
Wildly unsuccessful:

lossy text compression algorithms

@fribbledom a.k.a. listening to someone else recount what you said, and they got the gist of it, but it's wrong in all the important places.

@fribbledom Quoting my professor

"It's not 42" is an efficient, but highly lossy, compression algorithm for almost all datasets.

@fribbledom What's so bad about occasionally sending the wrong worms?

@fribbledom

Whyldly unsuccessful:

lost sea text compreshion al gore isms.

(human perception only really cares about the phonetics when spoken, if the text's intended use case is ultimately to be spoken, why not take a lesson from JPEG? it won't work for all use-cases, but we want to create a diverse ecosystem of various algorithms)

@codepuppy @fribbledom I am sorely tempted to try implementing something like this now...

@compucat @fribbledom compression became compreshion because of "fashion" :3
(idk if that matches with actual english word pattern frequency XD )

@fribbledom But isn't one used by each of almost 8 billion protein based processors?
I'd say that's fairly successful. Even if each one is also proof that something doesn't have to work reliably or well for it to be successful.

@fribbledom you say that, but that's basically what ASCII is and it was pretty successful.

@Riedler Yes, but it's a technicality. If you convert UTF-8 to ASCII then you lose information and the file gets smaller, especially if you don't store/transmit the redundant 8th bit of each character. For texts in Latin scripts, which remain readable but potentially degraded, this looks a lot like lossy compression. It just happens that ASCII came first so nobody considers it to be a compression scheme.
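A rough sketch of the conversion described above, in Python. This is only one way to do it: the helper names `to_ascii_lossy` and `pack_7bit` are made up for illustration, and stripping accents via NFKD normalization is an assumption about what "convert UTF-8 to ASCII" means here (characters with no ASCII base letter are simply dropped).

```python
import unicodedata


def to_ascii_lossy(text: str) -> bytes:
    """Hypothetical helper: degrade text to 7-bit ASCII, discarding the rest."""
    # NFKD splits accented letters into a base letter plus combining marks,
    # e.g. "é" becomes "e" followed by U+0301.
    decomposed = unicodedata.normalize("NFKD", text)
    # Anything that still is not ASCII (combining marks, emoji, ...) is lost.
    return decomposed.encode("ascii", "ignore")


def pack_7bit(data: bytes) -> bytes:
    """Hypothetical helper: drop the unused 8th bit and pack 7 bits per character."""
    bits = "".join(format(b & 0x7F, "07b") for b in data)
    bits += "0" * (-len(bits) % 8)  # pad the tail to a whole byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


original = "Café déjà vu"
utf8_bytes = original.encode("utf-8")
ascii_bytes = to_ascii_lossy(original)
packed = pack_7bit(ascii_bytes)

print(len(utf8_bytes), len(ascii_bytes), len(packed))
print(ascii_bytes.decode("ascii"))
```

The text stays readable but the accents cannot be recovered from the output, which is the "lossy" part; the size drop comes from both the dropped marks and the 7-bit packing.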

@danielcassidy I stand by what I said - the first bit of an ASCII character isn't redundant, it's for error checking and compatibility reasons very much needed. ASCII was certainly not the first text encoding, it was just the one that became the most popular at a time when all that still had to be figured out - lastly, converting UTF-8 to ASCII is as much lossy compression as stripping HDR information from a video - the sole removal of information isn't what makes out lossiness.

@danielcassidy and sorry for writing like that, I just read about 10 paragraphs of Mark Twain ranting about the German language.

@Riedler the point about removing HDR is a good one, but I would point out that my original toot was not intended to be taken quite as seriously as you seem to have done 😊
