What if instance admins could decide which specific Unicode pages (possibly minus non-printing character classes) to enable (for sign-up, and separately for display)? Don't homoglyph attacks rely on mixing different languages, etc.?
(For compatibility/fall-back where instances don't allow that page, we could send them as punycode, with the original in mouseover text.)
@munin @Gargron @nightpool -- and Mastodon, with its (a) multiple, independent instances that are distinct by community (and often nationality), (b) still a single codebase, (c) need to actually care about UI stuff, (d) large international userbase, and (e) good-sized infosec community, would be a GREAT place to explore this.
@munin @Gargron @nightpool that's why I was suggesting (... oh. oops, I didn't mention it in this thread.) that it's default off -- instance admins can enable specific pages for signup, and separately for display.
Instances which allow enough pages for spoofing to be a problem would only have that problem *within their instance*.
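A minimal sketch of the default-off idea above, assuming the admin picks from a table of code point ranges. The block boundaries are the standard Unicode ones, but the setting names (`SIGNUP_ENABLED`, `DISPLAY_ENABLED`) and the `allowed` helper are hypothetical and not anything Mastodon has:

```python
# Hypothetical per-instance settings. The range boundaries are the standard
# Unicode block boundaries; the names are invented for this sketch.
BLOCKS = {
    "basic_latin": (0x0000, 0x007F),
    "cyrillic": (0x0400, 0x04FF),
    "hiragana": (0x3040, 0x309F),
    "katakana": (0x30A0, 0x30FF),
}

# Sign-up and display are configured separately; anything not listed is off.
SIGNUP_ENABLED = {"basic_latin"}
DISPLAY_ENABLED = {"basic_latin", "hiragana", "katakana"}


def allowed(username: str, enabled: set[str]) -> bool:
    """Return True if every character falls inside one of the enabled blocks."""
    ranges = [BLOCKS[name] for name in enabled]
    return all(any(lo <= ord(ch) <= hi for lo, hi in ranges) for ch in username)
```

With these example settings, `allowed("さくら", SIGNUP_ENABLED)` is false while `allowed("さくら", DISPLAY_ENABLED)` is true, so this instance would show remote hiragana handles without letting local users register them.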
That seems reasonable-ish, but would an admin of a large instance really be that much more reluctant to install a module than to enable a default-off section of settings?
... maybe actually.
(Also it would require holding off until Mastodon has a plugin architecture...)
Having a structure that allows for plugins would allow for a lot more experimentation around the ecosystem in general - and yes, large instance admins may well install plugins if they're asked for by enough users, or if there's a clear benefit for them.
Also, it would encourage users to start up their own small instances to control their own plugins. Net benefit.
@Gargron @munin @nightpool I was thinking we'd have the canonical forms of the usernames be punycode, which gets rendered as Unicode (or partially rendered, or not-but-with-mouseover, or however you wanna handle it) by the plugins. On instances which have no idea what's going on, it just comes out as punycode.
Which still has the 2nd-class citizen problem that @nightpool mentioned, but...
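Roughly what the punycode-as-canonical idea could look like, using Python's built-in `punycode` codec (RFC 3492). The `to_canonical` and `render` functions and the `unicode_enabled` flag are assumptions for illustration only:

```python
def to_canonical(username: str) -> str:
    """Store the username as ASCII-only Punycode (RFC 3492)."""
    return username.encode("punycode").decode("ascii")


def render(canonical: str, unicode_enabled: bool) -> dict:
    """Return the display text plus mouseover text for a canonical username."""
    original = canonical.encode("ascii").decode("punycode")
    if unicode_enabled:
        # Instance (or plugin) opted in: show Unicode, keep Punycode on hover.
        return {"text": original, "title": canonical}
    # Not opted in: show the Punycode form, with the original on hover.
    return {"text": canonical, "title": original}
```

An instance that knows nothing about any of this would skip `render` entirely and show the stored ASCII string, which is the 2nd-class citizen problem in practice.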
One basic example is quasi-control characters such as LEFT-TO-RIGHT EMBEDDING (U+202A).
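Characters like that one can be spotted by their Unicode general category. A small sketch using the standard `unicodedata` module; the function name is made up, and treating every `Cc` (control) and `Cf` (format) character as suspicious is an assumption that is probably too strict for scripts that rely on joiners:

```python
import unicodedata


def invisible_characters(text: str) -> list[str]:
    """List code points whose general category is Cc (control) or Cf (format)."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if unicodedata.category(ch) in ("Cc", "Cf")
    ]
```

For a handle with an embedded LEFT-TO-RIGHT EMBEDDING, such as `"ab\u202acd"`, this reports `U+202A`; a plain ASCII handle gives an empty list.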
Declaring "safe" blocks of unicode would be the safest option, even if these are usually encoded into punycode or URI encoded. You'll still run into Han unification politics for CJK though.
@Gargron I think it will make it harder for users in different locales to type each other's handles, so I guess there are more issues than just technical compatibility.
Of course, maybe the joy Japanese users (and users of other non-ASCII languages) get from typing in their own language could compensate for this.
@Gargron @moki this should really be a larger discussion. Not supporting UTF-8/punycode usernames is something that potentially leaves out a huge number of humans in the long run. The number of people in the world who don't have Latin names is gigantic. There are many languages that have names you can't even express well with Latin transliteration.
My idea to consider at least: allow the admin to choose which pages (or possibly character classes?) of Unicode to enable for sign-up or display?
Sign-up would be default off; for display we'd pick which ones to turn on, and maybe not even have options for non-display characters.
@nasser @wakest @Gargron I think a per-instance basis is the best solution if possible. In Japanese elementary schools, we learn to write our own names in the Latin alphabet, so alphabetical notation is also one of the official representations of our names. Also, there are few people who want to use their real names on SNS. These may be some of the reasons why many Japanese people don't feel the need to use Japanese notation in their account names.
Help phishing, come on.
- You could host your own server with an IDN domain name visually matching mastodon.social. IMHO usernames are a non-issue.
- What would be the benefit of phishing? Everything is already public.
@mmn @Gargron @nemeciii Support for UTF in messages seems less problematic than in usernames. Identity is already a challenge in a federated ecosystem and allowing visual spoofing will complicate that. Let users write their "real" names and messages in unicode, but keep the usernames restricted. At the very least, leave that configurable by instance.
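If usernames were left configurable per instance, one cheap guard against the visual spoofing described above would be to reject handles that mix scripts. The sketch below is a rough heuristic only: Python's standard library does not expose the Unicode Script property, so it guesses the script from the first word of each character's name. Both function names are hypothetical.

```python
import unicodedata


def scripts_used(username: str) -> set[str]:
    """Approximate each letter's script by the first word of its Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in username
        if ch.isalpha()  # ignore digits, underscores and other non-letters
    }


def looks_mixed(username: str) -> bool:
    """Flag usernames whose letters appear to come from more than one script."""
    return len(scripts_used(username)) > 1
```

A handle like `gargr\u043en` (with a Cyrillic "о") is flagged, while `gargron` is not. The heuristic would also flag legitimate Japanese names that combine hiragana, katakana and kanji, so a real check would need proper script data and per-language exceptions.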
@Gargron As an Asian, I don't think ASCII usernames are too Western-centred. Everyone nowadays learns some English from childhood, and many languages have a romanization system. The Unicode problem is more of an upstream one: the internet as a whole is just not well prepared for it yet.
Maybe we can have an ASCII ID and an alt/local-language ID. We could @-mention etc. *directly* using the local ID and use the ASCII one where it's needed (for URIs, the API). Like translating the app text, translate the ID.
@Gargron plus: if I have a Unicode ID/username you can't input (no input method, or you don't know how), and it even has some unprintable chars so you can't copy it properly, that will be an annoying situation. But if I have a transliterated/romanized ID (all ASCII) alongside it, I think that will make communication smoother. We do need a global language from UI to underlying code, and English (ASCII) does this job well, I think.
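The ASCII ID plus local-language ID idea above could be sketched like this. Everything here (`Account`, `Directory`, the field names) is hypothetical and only meant to show that both handles can resolve to the same account while URIs and the API keep using the ASCII one:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Account:
    ascii_id: str  # used wherever ASCII is needed (URIs, API)
    local_id: str  # alternate ID in the user's own language


class Directory:
    """Looks up an account by either of its two handles."""

    def __init__(self) -> None:
        self._by_handle: dict[str, Account] = {}

    def register(self, account: Account) -> None:
        # Index the same account under both handles.
        self._by_handle[account.ascii_id] = account
        self._by_handle[account.local_id] = account

    def resolve(self, handle: str) -> Optional[Account]:
        # Accept handles written with or without the leading "@".
        return self._by_handle.get(handle.lstrip("@"))
```

After registering `Account("sakura", "さくら")`, both `@sakura` and `@さくら` resolve to the same account, so someone without a Japanese input method can still mention that user.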