
NewsCentral

Wikipedia offers AI developers a training dataset to maybe get scraper bots off its back

<p>Wikipedia has been <a href="https://www.engadget.com/ai/wikipedia-is-struggling-with-voracious-ai-bot-crawlers-121546854.html?_fsig=Wr5Dq_GeIVF_s2qPwjs2Ig--%7EA"><ins>struggling</ins></a> with the load that <a href="https://www.engadget.com/websites-accuse-ai-startup-anthropic-of-bypassing-their-anti-scraping-rules-and-protocol-133022756.html"><ins>AI crawlers</ins></a> (bots that scrape text and multimedia from the encyclopedia to train generative artificial intelligence models) place on its servers, which has raised costs and, in some cases, slowed load times for human users. Perhaps in an effort to stop the bots from pummeling the public Wikipedia website and soaking up too much bandwidth, the Wikimedia Foundation (which manages Wikipedia's data) is offering AI developers a dataset they can freely use.</p> <p>The organization has teamed up with Kaggle, a data science platform, to offer a beta release of a structured dataset in both English and French. <a href="https://blog.google/technology/developers/kaggle-wikimedia/"><ins>According to Google</ins></a>, which owns Kaggle, the dataset is formatted for machine learning to make it more useful for training, development and data science.</p> <span id="end-legacy-contents"></span><p>Wikimedia Enterprise <a href="https://enterprise.wikimedia.com/blog/kaggle-dataset/"><ins>notes</ins></a> that the dataset includes &quot;abstracts, short descriptions, infobox-style key-value data, image links and clearly segmented article sections.&quot; There are no references or other &quot;non-prose elements,&quot; such as video clips. The lack of references could make the issue of attribution for information in the dataset somewhat foggy. 
However, Wikimedia Enterprise (a part of the Wikimedia Foundation that seeks to make Wikipedia data available through APIs) says that the content in the dataset is freely licensed under Creative Commons, the public domain and so on since it's all from Wikipedia.</p>This article originally appeared on Engadget at https://www.engadget.com/ai/wikipedia-offers-ai-developers-a-training-dataset-to-maybe-get-scraper-bots-off-its-back-143255593.html?src=rss

Source: www.engadget.com

Positive Reception: Positive


China cracks down on 'autonomous' car claims after fatal accident

<p>Chinese authorities have banned automakers from using terms such as &quot;smart driving&quot; and &quot;autonomous driving&quot; for ads in the country, according to <a href="https://www.reuters.com/business/autos-transportation/china-bans-smart-autonomous-driving-terms-vehicle-ads-2025-04-17/"><em>Reuters</em></a>. The Ministry of Industry and Information Technology has tightened its rules for advertising driving assistance features following a fatal crash involving a Xiaomi SU7, which raised concerns about the technology's safety. Based on Xiaomi's report, the vehicle's driving assistance mode was switched on when the vehicle was approaching a construction zone, but the driver took control right before the car collided with a concrete barrier. The electric vehicle went up in flames, with the accident <a href="https://carnewschina.com/2025/04/01/first-fatal-accident-involving-xiaomi-su7-claims-three-lives-on-chinese-highway/">claiming three lives</a>.&nbsp;</p> <p>Back in 2022, the California DMV <a href="https://www.engadget.com/california-dmv-accuses-tesla-false-advertising-130350292.html">accused Tesla</a> of falsely portraying its vehicles as fully autonomous based on the language it used on its website, though that didn't lead to a ban on advertising terms. Chinese authorities announced the new rule at a meeting attended by 60 representatives from the automobile industry. In addition to the new advertising rules, they also announced that they're prohibiting automakers from testing and improving their driver assistance systems via remote software upgrades on vehicles that are already in the hands of customers. 
If the companies want to roll out updates over the air, they'll have to secure an approval for them after conducting a battery of tests.&nbsp;</p> <span id="end-legacy-contents"></span><p>As <em>Reuters</em> noted, there's a growing competition in the Chinese automotive industry with companies launching vehicles promising &quot;smart driving&quot; capabilities. BYD, the <a href="https://edition.cnn.com/2025/03/26/cars/china-byd-profile-tesla-rival-intl-hnk/index.html">top Chinese EV manufacturer</a> based in Shenzhen, <a href="https://www.reuters.com/business/autos-transportation/chinas-byd-sell-21-models-with-its-gods-eye-smart-driving-tech-2025-02-10/">rolled out a whopping 21 models</a> of electric vehicles in February, with the company's free &quot;smart driving&quot; features being one of their main selling points. These automakers may now have to alter their advertising materials in order to comply with the new regulations.</p>This article originally appeared on Engadget at https://www.engadget.com/transportation/china-cracks-down-on-autonomous-car-claims-after-fatal-accident-143026741.html?src=rss

Source: www.engadget.com

Positive Reception: Positive


Ghost forests are growing as sea levels rise

As trees choked by saltwater die along low-lying coasts, marshes may move in.

Source: arstechnica.com

Neutral Reception: Neutral


Lichens can survive almost anything, and some might survive Mars

The symbiotic organisms appear to be able to avoid some radiation damage.

Source: arstechnica.com

Positive Reception: Positive


Google adds YouTube Music feature to end annoying volume shifts

Automatic audio leveling is coming to YouTube Music.

Source: arstechnica.com

Neutral Reception: Neutral


Trump official to Katy Perry and Bezos’ fiancée: “You cannot identify as an astronaut”

It turns out the FAA now takes no role in identifying who is an astronaut.

Source: arstechnica.com

Neutral Reception: Neutral


Microsoft’s “1‑bit” AI model runs on a CPU only, while matching larger systems

Future AI might not need supercomputers thanks to models like BitNet b1.58 2B4T.

Source: arstechnica.com

Positive Reception: Positive


Synology confirms that higher-end NAS products will require its branded drives

Firm will later add "curated drive compatibility" lists after testing.

Source: arstechnica.com

Negative Reception: Negative


To regenerate a head, you first have to know where your tail is

Planaria can't replace a missing head until after the tail develops sufficiently.

Source: arstechnica.com

Negative Reception: Negative


Regrets: Actors who sold AI avatars stuck in Black Mirror-esque dystopia

Is $1,000 worth being the AI face of obvious scams? Rueful actors say no.

Source: arstechnica.com

Positive Reception: Positive
