Wonder if we should get rid of automatic language detection given how often it is inaccurate...

If automatic language detection was removed, we could default posting language to interface language. Would that be accurate in your case? I.e. do you post in the same language as the language of Mastodon's interface you see?

@freemo I don't think that's possible. Aren't you a Machine Learning innovator? You should know what this problem entails.

@Gargron I am, yes. I didn't mean to suggest it would be trivial to solve with 100% accuracy. I am just suggesting you work towards improving the error rate (it's ok if there is some) rather than eliminating a vital feature altogether.

@freemo Well, for a start I am not a C developer and not a ML expert. CLD3 is developed by Google and I seriously doubt that I could do something that they can't.

@Gargron Perhaps try a different third-party library? Or perhaps improve the way the library is applied. I haven't looked at your code, but I'm not suggesting you do the ML yourself at all. There can be a huge difference in how you apply it.

Just an off-the-cuff example (not saying this is viable, as I don't know enough). I'd imagine a LOT of the error comes from shorter posts analyzed in isolation. However, if your library is uncertain what language a particular post is in, then it can do one of two things

1) display it anyway. No harm is done if you display an unwanted language; harm is only done when you don't display a wanted language. So make the error of an acceptable nature if you can't improve it.

2) use more context. For example, if 100% of a user's identified posts are Chinese and the language of a short post is undetermined, then assume it is Chinese, as it should be weighted on context.

1 is the easy path, and probably the one I'd suggest since I don't see a need for perfection here. But 2 might be a decent incremental step if you really feel perfection is needed.
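A minimal sketch of how options 1 and 2 could fit together, assuming a detector that reports a language code with a usable confidence score. The `Detection` type, the `resolve_language` function, and the 0.7 threshold are hypothetical names and values chosen for illustration; they are not part of CLD3 or Mastodon.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Detection:
    """Hypothetical detector output: a language code and a confidence in [0, 1]."""
    language: str
    confidence: float


def resolve_language(
    detection: Detection,
    history: Sequence[str],
    threshold: float = 0.7,  # assumed cut-off; it would need tuning on real posts
) -> Optional[str]:
    """Return a language code, or None when the post should be shown to everyone."""
    if detection.confidence >= threshold:
        return detection.language
    if history:
        # Option 2: fall back to the author's most common previously identified language.
        return Counter(history).most_common(1)[0][0]
    # Option 1: no usable context, so leave the post untagged and display it anyway.
    return None
```

Returning `None` rather than a guess is what makes the error "of an acceptable nature": an untagged post is never hidden by a language filter.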

@freemo CLD3 doesn't offer a reliable confidence rating. You can give it a short string and it will be 95% confident about its wrong result. So while 1 is the better option it is not possible.

@Gargron Perhaps use a library that uses a confidence rating instead, or consider options beyond 1 and 2.

If you'd like me to provide some more serious help and suggestions, I can review the code and library options more closely.

@freemo even from the sidelines it's infuriating to see how you're posing the problem. I mean, even if you'd be right, you still sound like an asshole. Maybe you should tone down the entitlement a bit. @Gargron is a saint for even replying to you.


I'm sorry if I worded it in a way that gave you that impression; it wasn't my intention. It was intended to be a straightforward reaction, not a critical/emotional one.

What about my wording do you feel made it sound like I was being an "asshole", so I can try harder in the future to avoid that language? I'd hate for someone to misinterpret my intention again or be hurt by it.

Also note I explicitly offered my help and time to do the fixing, since as he noted I am an expert in that field. Which I would hope makes my good intentions clear.


"How about fix it" - sounds entitled
"Why not try to fix it?" - sounds reasonable.

I am however not a native English speaker, maybe something got lost in translation. Please accept my apologies.

@mariusor No, that's fair. Rereading what I said, I could see how someone might see it that way. As I said, it wasn't my intent, but I do agree your wording would have been more tactful. My apologies for the misunderstanding.


@Gargron Only if we get an option to manually tag the language in posts

@Gargron How many languages are routinely used on Mastodon?

@Gargron we should also allow setting per-post language, and/or we should improve the detection somehow. idk how google translate does it but i assume some kind of dictionary matching. alphabet is not enough, obviously.

@Gargron It seems to make the federated timeline much more usable as it is

@Gargron Although I would like to be posting in another language. Need to start failing so I can get good sooner.

@Gargron How would this affect custom language packs? I run a custom locale for my instance.

@Gargron Most of the time I'm speaking English or French, with a French interface. I don't think that the interface language is that accurate when most of us speak English.

But at the same time I have no other solution to give... :(

@Gargron Why not do both? Allow people to decide what they want/need. If a user is willing to accept some inaccuracy that is up to them.

@Gargron You might consider ranking languages from the HTTP Accept-Language header higher than others in the heuristics.
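A rough sketch of that idea, assuming the detector can return several candidate languages. Both function names are hypothetical, and the parser is deliberately simplified: it only handles the common `tag;q=value` form and does not cover every case the HTTP specification allows.

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[str]:
    """Return language tags from an Accept-Language header, highest preference first."""
    weighted: List[Tuple[float, int, str]] = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a missing q parameter means full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as lowest priority
        weighted.append((-quality, position, tag.strip().lower()))
    # Sort by descending quality, keeping header order for ties.
    weighted.sort()
    return [tag for _, _, tag in weighted]


def rank_candidates(candidates: List[str], header: str) -> List[str]:
    """Move candidates the browser prefers ahead of the rest (stable sort)."""
    # Compare on the primary subtag only, e.g. "fr-ch" counts as "fr".
    preferred = [tag.split("-")[0] for tag in parse_accept_language(header)]

    def key(lang: str) -> int:
        return preferred.index(lang) if lang in preferred else len(preferred)

    return sorted(candidates, key=key)
```

For example, `rank_candidates(["de", "en", "fr"], "fr-CH, fr;q=0.9, en;q=0.8")` puts `fr` first, then `en`, then `de`.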

@Gargron I use a Slovenian interface language as a way of ambiently trying to increase my Slovene vocabulary, but mostly toot in English and occasionally French. I appreciate that's a very specific use case though!

@Gargron I do read in two languages, so leave me that option

@Gargron Just give me a setting with a "main" language and a list where I can decide per post what language that post is in.

@gargron i think the most elegant solution would be a default to the interface language and an optional switch for other languages, plus a setting to change the default
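The fallback order described here could look roughly like the following. This is only a sketch: `post_language` and its parameter names are hypothetical and do not reflect Mastodon's actual settings or code.

```python
from typing import Optional


def post_language(
    per_post_choice: Optional[str],
    user_default: Optional[str],
    interface_language: str,
) -> str:
    """Pick the language tag for a new post, most specific setting first."""
    # 1. An explicit choice made while composing the post wins.
    if per_post_choice:
        return per_post_choice
    # 2. Otherwise use the default the user saved in their settings, if any.
    if user_default:
        return user_default
    # 3. Fall back to the language of the interface.
    return interface_language
```

With this order, someone with a French interface who mostly posts in English sets the default once and only touches the per-post switch for the exceptions.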

@Gargron I use one instance per language so nobody has any unreadable toots in their feed if they follow me. If there was a way to choose language when posting and filter language in the feed that would make things easier.

@Gargron I use Mastodon in English, the UI is also English. Perhaps add a toggle in settings like “Mark my posts as English, German, French, etc”, as a bonus add a per toot toggle.

I'm , living in , but my interfaces are almost always set to (mostly because localised interfaces usually sound awkward...).

While I mostly post in English on , I do occasionally post bi- or tri-lingual toots, or —when it's about a topic only applicable to Dutch or fedifolk— in one of the other two languages.

I would not be opposed to defaulting to your interface language, with per-post option to define which (multiple!) languages it is in.

the list of languages could also put the ones in the Accept-Language header at the top.

@Gargron I post in two languages and would prefer a button right after CW to check the [automatically detected language] and change the [post] language.

I don't think this will hurt, and it's way better than going into settings every time you want to post in a different language!

I think most of us will post in their native language and/or English.

@Gargron I HATE automatic settings! Respect my preferences and ASK ME. NEVER EVER change anything automatically.

@Gargron I wish there was an option to checkmark the languages we understand, and only see toots in those languages.
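As an illustration only, a filter like this could be a few lines, assuming each post carries an optional language tag. The `Post` type and `filter_timeline` function are hypothetical. Posts without a tag are kept, so a missing or failed detection never hides a wanted post.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Post:
    """Hypothetical post record; language is None when no tag is available."""
    text: str
    language: Optional[str]


def filter_timeline(posts: Iterable[Post], understood: Set[str]) -> List[Post]:
    """Keep posts in an understood language, plus posts with no language tag."""
    return [p for p in posts if p.language is None or p.language in understood]
```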

@Gargron The language detection library makes installing Mastodon more of a hassle than it should. Drop it if you can. /c

@Gargron yes. for example my site is in Kazakh language. and i want it everywhere just in Kazakh
