Allison Parrish

if you were going to download this today, maybe hold off—I found a frustrating bug where some utf8-encoded texts were being decoded incorrectly with a different encoding, leading to hilarious mojibake when they came out the other end—will post a fix in a few hrs
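For anyone who hasn't seen this kind of mojibake: a minimal sketch of how it happens, assuming the wrong decoder was ISO-8859-1 (the post doesn't say which encoding was actually involved). The sample string and variable names are made up for illustration.

```python
# Text that was correctly encoded as UTF-8 ...
original = "naïve café"
utf8_bytes = original.encode("utf-8")

# ... but decoded with the wrong codec. ISO-8859-1 maps every byte to a
# character, so this never raises; it silently produces garbage instead.
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)

# If nothing was lost along the way, the damage is reversible:
# re-encode with the wrong codec, then decode with the right one.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired)
```

The repair step only works when the wrong decoder was lossless, as ISO-8859-1 is. A decode that replaced or dropped bytes can't be undone this way.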

none of this would be a problem if the reported charset in the metadata was always the correct charset. but there are a lot of texts that report "us-ascii" when what they really mean is "ascii with occasional 8-bit chars just for fun!"

also lots of files (>1%?) that say "yeah I'm utf8 sure whatever" but are actually ISO-8859-1 (according to chardet at least)
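One way to cope with unreliable charset metadata, sketched with only the standard library: treat the declared charset as a hint, decode strictly, and fall back when it fails. The function name `decode_with_fallback` and the fallback order are assumptions for illustration, not what the original fix did. A detector such as chardet could be slotted in before the last step.

```python
from typing import Tuple


def decode_with_fallback(raw: bytes, declared: str) -> Tuple[str, str]:
    """Return (text, encoding_used), treating `declared` as a hint only."""
    # Try the declared charset first, then UTF-8. Both are decoded strictly,
    # so a wrong guess usually fails loudly instead of producing mojibake.
    for name in (declared, "utf-8"):
        try:
            return raw.decode(name, errors="strict"), name
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: the bytes don't fit this encoding.
            # LookupError: Python doesn't know the declared charset name.
            continue
    # Last resort: ISO-8859-1 accepts any byte sequence, so this cannot fail,
    # but the result may still be wrong for other 8-bit encodings.
    return raw.decode("iso-8859-1"), "iso-8859-1"


# "us-ascii" with an occasional 8-bit char just for fun
print(decode_with_fallback(b"caf\xe9", "us-ascii"))
# "us-ascii" that is really UTF-8
print(decode_with_fallback("café".encode("utf-8"), "us-ascii"))
```

This only catches declared charsets that fail to decode. A wrong declaration that happens to decode cleanly, like UTF-8 bytes labelled as ISO-8859-1, sails through unnoticed.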

lessons learned: (a) never trust someone's claim about the encoding of a text file (b) character encodings are bad and trying to digitize text in the first place was a bad idea

@aparrish and chardet can't always get it right. e.g. I recently ran into CSV files that weren't encoded in ISO-8859-1, but instead a MacOS encoding from a similar era.
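A small illustration of why this case is hard, assuming the MacOS encoding in question was Mac OS Roman (the reply doesn't name it). Both encodings are single-byte, so the same bytes decode without error under either. The helper `count_c1_controls` is a hypothetical heuristic, not something chardet is known to do.

```python
# The same word, as an old Mac application might have written it to disk.
raw = "café".encode("mac_roman")  # Mac OS Roman stores é as byte 0x8E

as_mac = raw.decode("mac_roman")      # correct reading
as_latin1 = raw.decode("iso-8859-1")  # no error, but 0x8E becomes U+008E


def count_c1_controls(text: str) -> int:
    """Count characters in U+0080..U+009F, which real text rarely contains."""
    return sum(1 for ch in text if "\x80" <= ch <= "\x9f")


# Many C1 controls after an ISO-8859-1 decode hint that the bytes were
# really in some other 8-bit encoding (Mac OS Roman, Windows-1252, ...).
print(repr(as_latin1), count_c1_controls(as_latin1))
print(repr(as_mac), count_c1_controls(as_mac))
```

The heuristic can flag a suspicious file but can't say which encoding is the right one; that still takes a detector or a human looking at the text.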

@aparrish I thought that "ascii with occasional 8-bit chars just for fun!" was the new mandatory charset standard.

@dredmorbius @aparrish I thought it was just a longwinded kind of captcha.