Replied in thread

**Synapsenkitzler** @synapsenkitzler@digitalcourage.social · 1d *

11.4 AI Working Group (AK KI)
12 Outlook and concluding remarks
13 Appendix
13.1 GDPR Art. 51 ff.
13.2 GDPR Art. 85
13.3 MStV § 12, § 23, § 113
13.4 TDDDG § 25
13.5 Regulations on the Broadcasting Data Protection Commissioner
13.6 RDSK member list
13.7 RDSK administrative agreement

[END]

#Rundfunkdatenschutzbeauftragter #Datenschutzrecht #Datenstrategie

**Martin Owens** @doctormo@floss.social · 1d *

I've set up my new #inkscape website AI bot trap. It works by giving everyone a chance to not fall into it.

An anchor link that says "I am a bot" and links to /P3W-451/{datetime}/. It has a fixed position at top -100px, so it should never be seen.

The robots.txt says "Disallow: /P3W-451/" so if you were reading the robots, you'd know.

Then #nginx records the IP address and browser string of each such request in a log and sends a 301 redirect to google.com.

#ai #Scraping

1/2
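
The trap described in this post can be sketched in a few lines of Python. This is only an illustration of the idea, not the author's nginx setup: the function names, the timestamp format, the in-memory log, and the exact redirect URL are all assumptions made for the example. Only the `/P3W-451/` prefix, the "I am a bot" link text, the `top: -100px` positioning, the `Disallow` rule, and the 301 redirect come from the post.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple
from urllib.robotparser import RobotFileParser

TRAP_PREFIX = "/P3W-451/"  # path prefix taken from the post
ROBOTS_TXT = "User-agent: *\nDisallow: /P3W-451/\n"
REDIRECT_TARGET = "https://www.google.com/"  # assumed form of "google.com"


def trap_link(now: datetime) -> str:
    """Build the hidden anchor; the timestamp format is an assumption."""
    stamp = now.strftime("%Y%m%dT%H%M%S")
    return (
        f'<a href="{TRAP_PREFIX}{stamp}/" '
        'style="position: fixed; top: -100px">I am a bot</a>'
    )


def allowed_by_robots(path: str, user_agent: str = "*") -> bool:
    """What a crawler that honours robots.txt would conclude for this path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


def handle_request(
    path: str, ip: str, user_agent: str, trap_log: List[Tuple[str, str]]
) -> Tuple[int, Dict[str, str]]:
    """Log trap hits and redirect them; let every other request through."""
    if path.startswith(TRAP_PREFIX):
        trap_log.append((ip, user_agent))
        return 301, {"Location": REDIRECT_TARGET}
    return 200, {}


if __name__ == "__main__":
    log: List[Tuple[str, str]] = []
    print(trap_link(datetime.now(timezone.utc)))
    print(allowed_by_robots("/P3W-451/20250401T120000/"))
    print(handle_request("/P3W-451/20250401T120000/", "203.0.113.7", "ExampleBot/1.0", log))
```

A crawler that calls something like `allowed_by_robots` before fetching never reaches `handle_request` with a trap path, which is the "chance to not fall into it" the post mentions.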

**Strypey** @strypey@mastodon.nzoss.nz · 1d

Joshua Yuvaraj, co-director of the New Zealand Centre for Intellectual Property, was interviewed on RNZ yesterday about the degree to which copyright law might be used to prevent scraping of the open web by #MOLE Trainers:

https://www.rnz.co.nz/national/programmes/nights/audio/2018981590/what-can-writers-do-about-their-work-being-used-to-train-ai-models

As Cory Doctorow noted back in 2023:

"In privacy and labor fights, copyright is a clumsy tool at best."

https://pluralistic.net/2023/09/17/how-to-think-about-scraping/

RNZ · 2d · What can writers do about their work being used to train AI models? · Joshua Yuvaraj is the co-director of the New Zealand Centre for Intellectual Property, and a senior lecturer in law at the University of Auckland specialising in copyright and artificial intelligence.

#RNZ #NZCIP #JoshuaYuvaraj

Replied in thread

**sheislaurence** @sheislaurence · 1d

@nimi @papuass @stefan @freediverx yeah except you can't force bad actors to use your commercial API if they still have an open route in that costs them next to nothing. It really doesn't matter that #scraping isn't elegant. It works, it's cheap. It's basically an arms race that #opensource #openknowledge were never designed to wage. My only hope is that the #cyberpunk spirit will reorganise itself along those faultlines and fight the good fight.

**Liz Probert** @greysquirrel@greennet.social · 1d

How crawlers impact the operations of the Wikimedia projects https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/ #AI, #Crawlers, #Infrastructure, #KnowledgeAsAService, #KnowledgeContent, #Operations, #Scraping, #ScrapingBots, #Traffic, #WikimediaFoundation, #WikimediaProjects

Diff · 2d · How crawlers impact the operations of the Wikimedia projects · Since the beginning of 2024, the demand for the content created by the Wikimedia volunteer community – especially for the 144 million images, videos, and other files on Wikimedia Commons – has grow…

**Carbon Carrot** @CarbonCarrot · 1d *

Wikimedia Infrastructure is being mass-scraped for AI Usage — the content is free, the infrastructure is not. https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/ #AI, #Crawlers, #Infrastructure, #KnowledgeAsAService, #KnowledgeContent, #Operations, #Scraping, #ScrapingBots, #Traffic, #WikimediaFoundation, #WikimediaProjects

(original repost on lobsters: https://lobste.rs/s/autpsf/how_crawlers_impact_operations)

**DSGVO-Portal** @dsgvoportal@social.tchncs.de · 1d

Higher Regional Court (Oberlandesgericht) Celle, judgment of 20.03.2025, 5 U 129-24: objective loss of control recognised as non-material damage. #Schadensersatz #Scraping #Immaterieller #Schaden #Datenminimierung #teamdatenschutz #dsgvoportal https://www.dsgvo-portal.de/gerichtsentscheidungen/2025-03-20-OLGCE-5-U-129-24-Schadensersatz-Scraping-Immaterieller-Schaden-Datenminimierung-2170.php

Compliance Essentials GmbH · 1d · Objective loss of control recognised as non-material damage. | 02.04.2025 | dsgvo-portal.de · By Martin Holzhofer

**DSGVO-Portal** @dsgvoportal@social.tchncs.de · 3d

Higher Regional Court (Oberlandesgericht) Düsseldorf, judgment of 14.03.2025, 16 U 94-24: EUR 100 in damages for loss of control. #Soziale #Netzwerke #Scraping #Immaterieller #Schaden #teamdatenschutz #dsgvoportal https://www.dsgvo-portal.de/gerichtsentscheidungen/2025-03-14-OLGDUS-16-U-94-24-Soziale-Netzwerke-Scraping-Immaterieller-Schaden-2165.php

Compliance Essentials GmbH · 3d · EUR 100 in damages for loss of control. | 31.03.2025 | dsgvo-portal.de · By Martin Holzhofer

**DSGVO-Portal** @dsgvoportal@social.tchncs.de · 3d

Higher Regional Court (Oberlandesgericht) Düsseldorf, judgment of 13.03.2025, 16 U 135-23: EUR 100 in damages for loss of control. #Soziale #Netzwerke #Scraping #teamdatenschutz #dsgvoportal https://www.dsgvo-portal.de/gerichtsentscheidungen/2025-03-13-OLGDUS-16-U-135-23-Soziale-Netzwerke-Scraping-2164.php

Compliance Essentials GmbH · 3d · EUR 100 in damages for loss of control. | 31.03.2025 | dsgvo-portal.de · By Martin Holzhofer

**Venkatesh-Prasad Ranganath** @orderwithchaos · 5d

An interesting code-hosting-related downside of AI. #ai #ddos #web #scraping #copyright #code

https://techcrunch.com/2025/03/27/open-source-devs-are-fighting-ai-crawlers-with-cleverness-and-vengeance/

TechCrunch · Mar 27 · Open source devs are fighting AI crawlers with cleverness and vengeance | TechCrunch · AI web crawling bots are the cockroaches of the internet, many developers believe. FOSS devs are fighting back in ingenious, humorous ways.

**𝙳𝚊𝚗𝚒𝚎𝚕𝚎 𝙼𝚒𝚌𝚌𝚒** @grimjfoot@mastodon.uno · 5d

Credit cards and authentication accounts are reportedly not included, but addresses, first names, surnames, phone numbers and orders are. A very serious matter, affecting about 7 million customers since 2008.

https://www.dday.it/redazione/52522/hanno-bucato-eprice-i-dati-di-68-milioni-di-clienti-in-vendita

DDay.it · 5d · ePrice has been breached: the data of 6.8 million customers is up for sale · By Roberto Pezzali

#eprice #scraping #darkweb

**Niebezpiecznik News** @niebezpiecznikbot@mastodon.com.pl · 6d

A rickroll in the repo and a bomb hidden (not very) deep

Web application vulnerabilities have various origins. They can be no one's fault: 0-day bugs cannot be effectively prevented. They can be entirely someone's fault, as when a programmer in the third decade of the 21st century writes code vulnerable to SQL injection. They can also result from absent-mindedness or inattention, for example when the code repository holding the application is exposed to the world. Unauthorised access to such a repository can mean taking control of the entire web application, if the source code or configuration files contain a database password, private keys for cloud services, or a text file with the administrator password. The same happens when an operator saves a backup to a backup.zip file located in the site's root directory.
I decided to play a joke on the people who look for such vulnerabilities.
The author of this article is Tomasz Zieliński, who runs a training course on automating data retrieval from the internet (scrapowanie.pl) and, in his spare time, the blog Informatyk Zakładowy (informatykzakladowy.pl). We received no payment for this publication, but we will receive a barter benefit.
The Trojan backup.zip that explodes
The backup was the easy part: the file https://informatykzakladowy.pl/backup.zip is a so-called ZIP bomb, an archive crafted so that its contents take up as much space as possible once unpacked. I used the variant described on bamsoftware.com: although the ZIP file itself is just under 10 megabytes, storing the decompressed contents requires 281 terabytes. For comparison, a typical hard drive in a home computer holds no more than two terabytes.
These days ZIP bombs are rather [...]

#ARTYKUŁSPONSOROWANY #Śmieszne #Git #Repozytoria #Scraping #Scrapowanie #Wycieki

https://niebezpiecznik.pl/post/rickroll-w-repo-i-bomba-w-niezbyt-glebokim-ukryciu/
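
For readers on the receiving end of such an archive, a minimal defensive sketch in Python follows. It reads only the sizes declared in the ZIP central directory and never extracts anything. The function names and both thresholds are assumptions chosen for illustration. Because declared sizes can be forged, a check like this is a first filter at best and not a guarantee.

```python
import io
import zipfile
from typing import Tuple


def declared_sizes(zip_bytes: bytes) -> Tuple[int, int]:
    """Return (compressed, uncompressed) totals from the central directory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        infos = zf.infolist()
        compressed = sum(info.compress_size for info in infos)
        uncompressed = sum(info.file_size for info in infos)
    return compressed, uncompressed


def looks_like_zip_bomb(
    zip_bytes: bytes,
    max_total: int = 100 * 1024**2,  # assumed limit: 100 MiB unpacked
    max_ratio: float = 100.0,        # assumed limit: 100:1 expansion
) -> bool:
    """Flag archives whose declared expansion exceeds either threshold."""
    compressed, uncompressed = declared_sizes(zip_bytes)
    ratio = uncompressed / max(compressed, 1)
    return uncompressed > max_total or ratio > max_ratio


def make_demo_archive(payload: bytes) -> bytes:
    """Build a small in-memory ZIP so the check can be tried safely."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("payload.bin", payload)
    return buffer.getvalue()


if __name__ == "__main__":
    suspicious = make_demo_archive(b"\0" * 1_000_000)  # highly compressible
    harmless = make_demo_archive(bytes(range(256)))    # barely compressible
    print(declared_sizes(suspicious), looks_like_zip_bomb(suspicious))
    print(declared_sizes(harmless), looks_like_zip_bomb(harmless))
```

For scale, the figures quoted in the article (just under 10 megabytes expanding to 281 terabytes) correspond to an expansion ratio in the tens of millions, far beyond any threshold a check like this would plausibly use.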

Replied in thread

**Petra van Cronenburg** @NatureMC@mastodon.online · 6d *

@susankayequinn Here's another article by @brianmerchant : https://www.bloodinthemachine.com/p/openais-studio-ghibli-meme-factory
"AI giants are indeed eating away at the livelihoods and dignity of working artists, and this devouring, appropriating, and automation of the production of art, of culture, at a scale truly never seen before, should not be underestimated as a menace"

Blood in the Machine · Mar 27 · OpenAI's Studio Ghibli meme factory is an insult to art itself · By Brian Merchant

#AI #OpenAI #StudioGhibli

**DSGVO-Portal** @dsgvoportal@social.tchncs.de · Mar 27

Higher Regional Court (Oberlandesgericht) München, order of 13.02.2025, 24 U 3020-24 e: no loss of control from linking a mobile phone number to a made-up name. #Telefonnummer #Scraping #Soziale #Netzwerke #Schadensersatz #teamdatenschutz #dsgvoportal https://www.dsgvo-portal.de/gerichtsentscheidungen/2025-02-13-OLGM-24-U-Telefonnummer-Scraping-Soziale-Netzwerke-Schadensersatz-2162.php

Compliance Essentials GmbH · Mar 27 · No loss of control from linking a mobile phone number to a made-up name. | 27.03.2025 | dsgvo-portal.de · By Martin Holzhofer

**Petra van Cronenburg** @NatureMC@mastodon.online · Mar 27

"GPT-4o is partly (aside from some licensed content) a product of a massive scrape of the Internet without regard to copyright or consent from artists ... GPT-4o's image generation model (and the technology behind it, once open source) feels like it further erodes trust in remotely produced media ... Everyone needs media literacy skills ..." https://arstechnica.com/ai/2025/03/openais-new-ai-image-generator-is-potent-and-bound-to-provoke/?utm_brand=arstechnica&utm_social-type=owned&utm_source=mastodon&utm_medium=social via @arstechnica

Ars Technica · Mar 27 · OpenAI’s new AI image generator is potent and bound to provoke · By Benj Edwards

#AI #generativeAI #imageGenerator

**Renaud JOLY** @renaud_joly@piaille.fr · Mar 27

Scraping to train. https://www.comparitech.com/proxies/web-scraping-for-ai-training/ #seo #scraping

Comparitech · Mar 27 · How to Use Web Scraping for Machine Learning and AI training · This article explains how to use web scraping for machine learning and AI training. It covers key concepts, tools, best practices, and applications to help you collect and use data effectively for smarter AI models.

Continued thread

**Simon Hewison** @zymurgic@mastodon.online · Mar 26

Another part of my day job involves working around systems designed to prevent mass AI-driven scraping, because humans and well-behaved query scripts get accidentally caught up in the war of the scrapers: Cloudflare etc. are offering what seems to management to be a magic bullet, and putting the bluntest of tools in front of anything that needs to be public, including APIs.
#scraping #api

**Simon Hewison** @zymurgic@mastodon.online · Mar 26

Part of my day job involves using APIs to retrieve public data from third party public websites, some of which were never designed to publish raw data, so I tread lightly, no more than a human-driven query.
Part of my day job is preventing third party machines from hammering servers I run by incessant mass scraping - hundreds of thousands of ridiculous requests humans would never do or want (typically that's AI-driven scraping that doesn't abide by robots.txt).
I feel conflicted.
#scraping #api
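
One way to read "tread lightly, no more than a human-driven query" is as a minimum delay between requests. The sketch below is a hypothetical illustration of that idea and not the poster's tooling: the class name, the five-second interval, and the injectable clock and sleep functions are all assumptions made so the example is self-contained and easy to test.

```python
import time
from typing import Callable, Optional


class PoliteThrottle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> float:
        """Sleep until the interval has elapsed; return the seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


if __name__ == "__main__":
    throttle = PoliteThrottle(min_interval=5.0)  # assumed "human-like" pace
    for page in ("/a", "/b"):
        slept = throttle.wait()
        print(f"would fetch {page} after sleeping {slept:.1f}s")
```

Calling `wait()` before every request keeps a script at or below the chosen pace regardless of how quickly the surrounding loop runs.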

**Winbuzzer** @winbuzzer · Mar 26

AI Crawlers Overwhelm Open-Source Projects, Forcing Developers to Block Entire Countries

#AI #Web #Robotstxt #AIScraping #OpenSource #Cybersecurity #DataScraping #Scraping #WebScraping

https://winbuzzer.com/2025/03/26/ai-crawlers-overwhelm-open-source-projects-forcing-developers-to-block-entire-countries-xcxwbn/

**uǝuunɹƃʇǝO** @oetgrunnen@mstdn.social · Mar 25

Thoughts: AI corps scraping data

The corporations assert that they can utilize public data without incurring any costs, citing fair use as their justification.

To address this issue, we should implement a law that compels corporations claiming fair use as a defense to make all their process data publicly available, free of charge. This would ensure that the scraped data, as well as data derived from the freely available data, is accessible to the public.
#AI #FairUse #Scraping #WebScraping
