What if instance admins could decide which specific Unicode pages (possibly minus non-printing character classes) to enable (for sign-up, and separately for display)? Don't homoglyph attacks rely on mixing different languages, etc.?
(For compatibility/fall-back where instances don't allow that page, we could send them as punycode (with the original in mouseover text).)
@munin @Gargron @nightpool -- and Mastodon, with its (a) multiple independent instances, often distinct by community (and nationality), (b) still a single codebase, (c) need to actually care about UI stuff, (d) large international userbase, and (e) good-sized infosec community, would be a GREAT place to explore this.
@munin @Gargron @nightpool that's why I was suggesting (... oh. oops, I didn't mention it in this thread.) that it's default off -- instance admins can enable specific pages for signup, and separately for display.
On instances which allow enough pages for spoofing to be a problem, it is only a problem *within that instance*.
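For anyone who wants something concrete, here is a rough sketch of the default-off, per-page idea. The code point ranges are real Unicode blocks, but the `BLOCKS` table, the setting names, and both function names are made up for illustration; nothing like this exists in Mastodon as far as this thread says.

```python
# Hypothetical per-instance block settings. The code point ranges are real
# Unicode blocks; the names and the table itself are assumptions.
BLOCKS = {
    "basic_latin": (0x0020, 0x007E),  # printable ASCII only, controls excluded
    "cyrillic": (0x0400, 0x04FF),
    "hiragana": (0x3040, 0x309F),
    "katakana": (0x30A0, 0x30FF),
}


def blocks_used(handle):
    """Return the set of known block names used by `handle`.

    Returns None if any character falls outside every known block.
    """
    used = set()
    for ch in handle:
        cp = ord(ch)
        for name, (lo, hi) in BLOCKS.items():
            if lo <= cp <= hi:
                used.add(name)
                break
        else:
            # No known block contains this character.
            return None
    return used


def handle_allowed(handle, enabled_blocks):
    """True if every character sits in a block this instance has enabled."""
    used = blocks_used(handle)
    return used is not None and used <= set(enabled_blocks)


# An admin could keep two separate lists, one for sign-up and one for display.
SIGNUP_BLOCKS = ["basic_latin"]
DISPLAY_BLOCKS = ["basic_latin", "cyrillic"]
```

With these settings, a handle that swaps in a Cyrillic "а" is rejected at sign-up but would still be displayable. That is exactly the case where spoofing becomes that instance's own problem, so an admin might also want to reject handles where `blocks_used` returns more than one block.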
That seems reasonable-ish, but would an admin of a large instance really balk that much more at installing a module than at enabling a default-off section of settings?
... maybe actually.
(Also it would require holding off until Mastodon has a plugin architecture...)
Having a structure that allows for plugins would allow for a lot more experimentation around the ecosystem in general - and yes, large instance admins may well install plugins if they're asked for by enough users, or if there's a clear benefit for them.
Also, it would encourage users to start up their own small instances to control their own plugins. Net benefit.
@Gargron @munin @nightpool I was thinking we'd have the canonical forms of the usernames be punycode, which the plugins render as unicode (or partially rendered, or not rendered but with mouseover, or however you wanna handle it); on instances which have no idea what's going on, they just come out as punycode.
Which still has the 2nd-class citizen problem that @nightpool mentioned, but...
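A minimal sketch of the punycode-as-canonical idea, using Python's built-in `punycode` codec. The function names and the `is_displayable` callback are hypothetical; how an instance actually decides what it can display is left open here.

```python
def to_canonical(handle: str) -> str:
    """Encode a Unicode handle into its ASCII-only punycode form."""
    return handle.encode("punycode").decode("ascii")


def from_canonical(canonical: str) -> str:
    """Decode a punycode handle back into Unicode."""
    return canonical.encode("ascii").decode("punycode")


def render(canonical: str, is_displayable) -> str:
    """Show the Unicode form if this instance allows it, else the raw punycode.

    `is_displayable` is an assumed per-instance policy callback that takes
    the decoded handle and returns True or False.
    """
    decoded = from_canonical(canonical)
    return decoded if is_displayable(decoded) else canonical


# An instance that only displays ASCII falls back to the canonical form.
ascii_only = str.isascii
print(render(to_canonical("münchen"), ascii_only))      # mnchen-3ya
print(render(to_canonical("münchen"), lambda s: True))  # münchen
```

One wrinkle: raw punycode of a pure-ASCII handle gets a trailing hyphen (`alice` becomes `alice-`), so a real scheme would probably want a marker prefix the way IDNA uses `xn--`, and would leave ASCII handles alone.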
One basic example is quasi-control characters such as LEFT-TO-RIGHT EMBEDDING (U+202A).
Declaring "safe" blocks of unicode would be the safest option, even if these are usually encoded into punycode or URI encoded. You'll still run into Han unification politics for CJK though.
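To make the quasi-control character example concrete, here is a small sketch that flags handles containing control or format characters, using the general categories from Python's `unicodedata` module. The function name is made up, and treating every `Cf` character as unsafe is an assumption: some scripts legitimately need joiners such as U+200C, so a real policy would need exceptions.

```python
import unicodedata

# General categories treated as unsafe in this sketch (an assumption):
# Cc = control characters, Cf = format characters such as U+202A.
UNSAFE_CATEGORIES = {"Cc", "Cf"}


def has_invisible_controls(handle: str) -> bool:
    """True if the handle contains any control or format character."""
    return any(unicodedata.category(ch) in UNSAFE_CATEGORIES for ch in handle)


# LEFT-TO-RIGHT EMBEDDING is invisible but changes how the text is laid out.
print(has_invisible_controls("ab\u202acd"))  # True
print(has_invisible_controls("abcd"))        # False
```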
@Gargron I think it will make it harder for users of different l10ns to type each other's handles, so I guess there are more issues than just technical compatibility.
Of course, maybe the joy Japanese users (and users of other non-ASCII-script languages) get from typing in their own language could compensate for this.
@Gargron @moki this should really be a larger discussion. Not supporting UTF-8/punycode usernames is something that potentially leaves out a huge number of humans in the long run. The number of humans in the world that don't have Latin names is gigantic. There are many languages that have names you can't even express well with Latin transliteration.