⚠️ The Fediverse has been scraped, again ⚠️

Almost six million posts from 363 instances have been scraped.

"All the posts with public visibility published by users hosted on Mastodon servers [...] which support the English language" have been scraped along with their metadata, and the "policy, the code of conduct and the prohibited contents of each instance".

The dataset is an attempt at creating an open dataset for "research" into algorithms like the ones Facebook uses to identify problematic content, based around users' use of Content Warnings.

The dataset can be found here:
dataverse.harvard.edu/dataset.

It was created at the University of Milan, Italy, apparently for the 13th International AAAI Conference on Web and Social Media (ICWSM):
aaai.org/

The associated publication:
aaai.org/ojs/index.php/ICWSM/a or likeable.space/media/30ae595a1 or DM me for a copy.

Related dataset:
dataverse.mpi-sws.org/dataset.

Original post:
likeable.space/objects/98fe744 @tastytea

#FediAdmin #MastoAdmin #MastoDev #Privacy #OpSec #Warning #Fediverse #Mastodon #Scraping

@puffinus_puffinus @tastytea
This is, simply put, unethical.

Since it is supposedly a scientific study, I would suggest contacting the university's review board (or an equivalent ethics committee).

Participation in scientific studies is not a trivial matter. It usually requires a signed form giving free and informed consent; implied consent should not be acceptable. Both the permitted uses and the handling of the data are also tightly restricted.
