« How #crawlers impact the operations of the #Wikimedia projects »
« […] we found out that at least 65% of this resource-consuming traffic we get for the website is coming from #bots, a disproportionate amount given the overall pageviews from bots are about 35% of the total. This high usage is also causing constant disruption for our Site Reliability team, who has to block overwhelming traffic from such crawlers before it causes issues for our readers »
https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/
How crawlers impact the operations of the Wikimedia projects https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/ #AI, #Crawlers, #Infrastructure, #KnowledgeAsAService, #KnowledgeContent, #Operations, #Scraping, #ScrapingBots, #Traffic, #WikimediaFoundation, #WikimediaProjects
"[..] 65% of our most expensive traffic comes from #bots" · How #crawlers impact the operations of the @wikimediafoundation projects.
https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/
Wikimedia infrastructure is being mass-scraped for AI use — the content is free, the infrastructure is not. https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/ #AI, #Crawlers, #Infrastructure, #KnowledgeAsAService, #KnowledgeContent, #Operations, #Scraping, #ScrapingBots, #Traffic, #WikimediaFoundation, #WikimediaProjects
(original repost on lobsters: https://lobste.rs/s/autpsf/how_crawlers_impact_operations)
Wikipedia is struggling with voracious AI bot crawlers
https://www.engadget.com/ai/wikipedia-is-struggling-with-voracious-ai-bot-crawlers-121546854.html
@camwilson #AI #Crawlers are not only driving up bandwidth costs for #Wikipedia; hunting for code to train on, they are similarly weighing down open-source code hosts.
It's like some giant monster devouring resources, requiring nuclear fusion and all the fresh drinking water, to do not very much. Interesting that animal intelligence gets by without consuming all the data in the world: a few worms, some insects, or a peanut butter and jelly sandwich will do.
analyzing logs from yesterday, it looks like #GPTBot ALONE (among other #crawlers) has been making several requests per second to my server FOR THE WHOLE DAY (during which I wasn't able to access my server, unfortunately). It constantly sent requests to the same page, over and over again, until I was able to block it.
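For anyone else digging through logs: a minimal Python sketch of how you might tally this, assuming a combined-format access log (the "access.log" path and the regex are my assumptions; adjust to your setup):

    # count GPTBot hits per second in a combined-format access log
    import re
    from collections import Counter

    hits = Counter()
    with open("access.log") as f:
        for line in f:
            if "GPTBot" not in line:
                continue
            # combined-format timestamps look like [01/Apr/2025:12:00:01 +0000]
            m = re.search(r"\[([^\]]+)\]", line)
            if m:
                hits[m.group(1)] += 1  # one bucket per second

    print("total GPTBot requests:", sum(hits.values()))
    if hits:
        ts, n = hits.most_common(1)[0]
        print("busiest second:", ts, "with", n, "requests")

OpenAI documents GPTBot's user-agent string and says it honours robots.txt, so a Disallow rule is worth trying first; failing that, a server-level user-agent block returning 403 is the blunt instrument.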
#Business #Introductions
Meet LLMs.txt · A proposed standard for AI website content crawling https://ilo.im/16318s (a minimal example follows after the tags below)
_____
#SEO #GEO #AI #Bots #Crawlers #LlmsTxt #RobotsTxt #Development #WebDev #Backend
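If you're wondering what an llms.txt actually looks like: a minimal sketch per the proposal, served at the site root as /llms.txt (the site name and URLs below are made up):

    # Example Site
    > One-paragraph summary of the site, written for LLM consumers.

    ## Docs
    - [Getting started](https://example.com/docs/start.md): quick introduction
    - [API reference](https://example.com/docs/api.md): full reference

    ## Optional
    - [Changelog](https://example.com/changelog.md): release history

It's plain Markdown: an H1 with the site name, a blockquote summary, then H2 sections listing links with short notes. Unlike robots.txt, it doesn't restrict crawlers; it tells them what's worth reading.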
Thanks to Fijxu's use of Anubis, videos can still be watched on inv.nadeko.net.
I feel like the aggressive bot scraping that intensified not long ago is going to make it impossible to keep using feed readers, and interacting with websites will end up restricted to web browsers only.
Opening videos in mpv from my RSS-subscribed Invidious feeds already doesn't work, and it was my preferred way to watch them. To clarify: RSS itself still works; what's broken is opening video links directly in mpv, or in any other video player that can do the same. I also fear that at some point reading full articles inside an RSS reader will stop working, forcing me to open article links in a web browser, even if some feeds can fetch full articles and minimize the need to do so.
I'm not trying to minimize the impact these scrapers have on free and open source projects and on the web admins who have to deal with this onslaught of bot activity; they are the ones who have it worst.
"#AI" #crawlers are a cancerous disease that must be eradicated, not just fended off.
»The costs are both technical and financial. The Read the Docs project reported that blocking AI crawlers immediately decreased their #traffic by 75 percent, going from 800GB per day to 200GB per day. This change saved the project approximately $1,500 per month in bandwidth costs, according to their blog post "AI crawlers need to be more respectful."«
Who could have guessed that an industry whose entire business model is based on theft would behave like malware on the Internet?
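For what it's worth, the Read the Docs numbers above are internally consistent, assuming cloud egress pricing of roughly $0.085/GB (my assumption, not something the post states):

    (800 GB/day − 200 GB/day) × 30 days = 18,000 GB/month
    18,000 GB × $0.085/GB ≈ $1,530/month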
Cloudflare is wrestling AI scrapers. Not that I disagree, but how does Cloudflare get to decide who or what can access a website? They hold a near-monopolistic, man-in-the-middle position (as a CDN).
Challenging times
#Development #Announcements
Trapping bad bots in a labyrinth · Cloudflare can now punish bots breaking ‘no crawl’ rules https://ilo.im/162xjb
_____
#AI #GenerativeAI #Crawlers #Bots #Detection #Protection #Security #Website #WebDev #Backend
Almost 22% of the traffic to my website is generated by various web crawlers, and more than half of that comes from #MJ12bot. I'm going to figure out whether it truly obeys robots.txt, as its website claims.
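A sketch of how I'd check, using Python's standard-library urllib.robotparser against the access log (the "access.log" path and combined log format are assumptions):

    # flag MJ12bot requests to paths that robots.txt disallows
    import re
    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")  # your domain here
    rp.read()

    violations = 0
    with open("access.log") as f:
        for line in f:
            if "MJ12bot" not in line:
                continue
            m = re.search(r'"(?:GET|POST|HEAD) (\S+)', line)  # request path
            if m and not rp.can_fetch("MJ12bot", m.group(1)):
                violations += 1
                print("disallowed fetch:", m.group(1))

    print(violations, "requests hit paths robots.txt disallows")

Zero violations would mean it obeys the letter of robots.txt, even if the crawl rate itself is the real problem.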
Using #AI to fight AI.
Cloudflare builds an AI to lead AI scraper bots into a horrible maze of junk content • The Register
https://www.theregister.com/2025/03/21/cloudflare_ai_labyrinth/
The article shared by @nixCraft is staggering: https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/
The #Internet is now riddled with #bots and #crawlers that are hammering #FOSS (#LogicielsLibres) code hosts to the point of all but DDoSing them, or at the very least grinding their servers down. Notably because they ignore the robots.txt files that are supposed to stop them. All of it to hoover up data for training these wretched #IA models.
Who'd have guessed? Who could possibly have foreseen it???
WHAT A TIME TO BE ALIVE!