Welp.
I was loving the Parsimmon JavaScript library for parsing Chinese until I smacked headfirst into the character 𠂊, U+2008A
:(
Parsimmon does not understand the concept that Unicode characters exist outside the Basic Multilingual Plane.
(Aliens Newt voice)
but they do
Ah, I guess now is when I have to learn how to construct surrogate pairs.
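For the record, a rough sketch of the surrogate pair arithmetic in plain JavaScript, no Parsimmon involved. `toSurrogatePair` is a made-up helper name for illustration, not part of any library: subtract 0x10000 from the code point, then split the remaining 20 bits into a high half and a low half.

```javascript
// Hypothetical helper (not from Parsimmon or any library):
// split a code point above U+FFFF into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("not a supplementary-plane code point");
  }
  const offset = codePoint - 0x10000;    // leaves a 20-bit value
  const high = 0xD800 + (offset >> 10);  // top 10 bits -> high surrogate
  const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits -> low surrogate
  return [high, low];
}

const [high, low] = toSurrogatePair(0x2008A);
const fromPair = String.fromCharCode(high, low);

console.log(high.toString(16), low.toString(16)); // d840 dc8a
console.log(fromPair === "\u{2008A}");            // true
console.log(fromPair.length);                     // 2 (UTF-16 code units)
console.log([...fromPair].length);                // 1 (code points)
```

The last two lines are the whole problem in miniature: anything that walks a string one UTF-16 code unit at a time sees 𠂊 as two "characters", while iterating by code point sees one.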
@browneyedgirl It is kind of fun!
A huge sprawling database constructed by multiple projects spread across multiple languages all with different ideas about what even a 'character' is...