Part of my grumbling about data formats comes from Chinese language learning and looking at these two (fortunately text! but raw, non-structured) datasets and asking what format would be best for linking, modifying and sharing them. Or lots of other datasets like these.
Like, I just want a standard dictionary showing translation *and decomposition information* for each character?
So would the best format be:
* raw line-delimited text with nonstandard field delimiters (what these two projects decided to use for some reason)
* Excel/OpenOffice spreadsheet
* Word/OpenOffice document
* stick it in a proprietary binary database
PROBABLY (sigh) JSON is the only real option
@h mmm, binary JSON
the 'nice' thing about JSON becoming The Universal Data Standard For Everything is that you can almost but not *quite* represent Lisp lists in it.
and you can almost but not *quite* represent sets in it
and you can almost but not *quite* represent arbitrary dictionaries in it (i.e. with non-text keys)
and it almost but not *quite* even has integers
and you really can't represent program code at all unless you like pain and suffering
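The almost-but-not-quite list above is easy to demonstrate from any JSON library; here's a quick sketch with Python's stdlib `json` module:

```python
import json

# Sets aren't representable at all: dumping one raises TypeError.
try:
    json.dumps({1, 2, 3})
except TypeError as e:
    print("sets:", e)

# Non-text dictionary keys get silently coerced to strings on the way out,
# so a round trip quietly changes your data's types.
round_tripped = json.loads(json.dumps({1: "one", 2: "two"}))
print(round_tripped)  # {'1': 'one', '2': 'two'}

# And JSON has only "number", no integer type: implementations that parse
# numbers as IEEE doubles can't distinguish 2**53 from 2**53 + 1.
print(float(2**53) == float(2**53 + 1))  # True
```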
@natecull @h Have you heard of Rivest's canonical s-expressions? Used in encryption software you use every day, whether you know it or not! Comes with a nice binary encoding! https://en.wikipedia.org/wiki/Canonical_S-expressions
@h @natecull https://www.gnupg.org/documentation/manuals/gcrypt/Cryptographic-Functions.html#Cryptographic-Functions shows some real-world structures.
However, almost nobody is using canonical s-exps outside of crypto software. I think they're a cool undernoticed technology though.
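The binary ("canonical") encoding is pleasantly simple: atoms are length-prefixed byte strings, lists are parenthesized. A minimal encoder sketch of that wire format, as described on the Wikipedia page (not a full implementation, and not the actual libgcrypt API):

```python
def encode_csexp(node):
    """Encode nested lists of bytes as a Rivest canonical s-expression.

    An atom like b"abc" becomes b"3:abc" (netstring-style length prefix);
    a list wraps its encoded children in parentheses. Every structure has
    exactly one encoding, which is what makes it 'canonical'.
    """
    if isinstance(node, bytes):
        return str(len(node)).encode() + b":" + node
    return b"(" + b"".join(encode_csexp(child) for child in node) + b")"

# (certificate (issuer alice)) encodes unambiguously, byte for byte:
print(encode_csexp([b"certificate", [b"issuer", b"alice"]]))
# b'(11:certificate(6:issuer5:alice))'
```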
@cwebber @h My feeling is that s-exps are missing just one little thing that would make them even more useful, and that's a 'term' marker. Can easily be added of course just by reserving a symbol, but then you have to deal with that symbol being reserved.
You can get a whole lot done with sexps but for dealing with, eg, JSON-like intermixed lists and dictionaries, you kind of need some syntax to indicate that there's a difference between the two.
within a single format, you can make a lot of assumptions.
but when you mix-n-match, suddenly you either have to wrap every 'foreign' bit of data in a whole mess of careful abstractions, or you just live with ambiguity
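The reserved-symbol approach might look something like this sketch, where `dict` is a hypothetical marker chosen for illustration (and reserving it is exactly the cost being discussed):

```python
# Interpret plain sexps (nested Python lists of atoms) as JSON-like data:
# any list whose head is the reserved marker is read as key/value pairs,
# everything else stays an ordinary list.
DICT_MARKER = "dict"

def interpret(sexp):
    if not isinstance(sexp, list):
        return sexp  # atom: pass through unchanged
    if sexp and sexp[0] == DICT_MARKER:  # (dict k1 v1 k2 v2 ...)
        pairs = sexp[1:]
        return {pairs[i]: interpret(pairs[i + 1])
                for i in range(0, len(pairs), 2)}
    return [interpret(child) for child in sexp]

print(interpret(["dict", "name", "cedict", "tags", ["lisp", "json"]]))
# {'name': 'cedict', 'tags': ['lisp', 'json']}
```

The ambiguity shows up the moment some foreign data legitimately wants a list whose first element is the string `"dict"` — then you need quoting or escaping machinery on top.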
Idea is similar to json-ld: your local "compacted" document has symbols that you know map to particular unique URI-bound properties. You have an environment locally that maps these. You can then "expand" (or transform to json-ld or back) for exchange between servers.
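A toy version of that compact/expand idea, with an invented context (the vocabulary URIs here are illustrative placeholders, not real json-ld machinery):

```python
# A local "context" maps short property names to full URIs; "expanding"
# rewrites the keys so two servers with different local shorthands can
# still exchange the same underlying data.
CONTEXT = {
    "name": "https://example.org/vocab#name",
    "knows": "https://example.org/vocab#knows",
}

def expand(doc, context):
    return {context.get(key, key):
            expand(value, context) if isinstance(value, dict) else value
            for key, value in doc.items()}

compacted = {"name": "Alice", "knows": {"name": "Bob"}}
print(expand(compacted, CONTEXT))
```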
IMO your editor can help a lot too... rainbow-delimiters, smartparens (or paredit), parinfer, etc. can help a lot! https://dustycloud.org/tmp/emacs_lisp_setup.png
@natecull @cwebber Then I'm more at home with the Wirth way of doing things, and Mac Pascal was a big thing on the Mac back then. I have the feeling that various currents and undercurrents flow in different directions, sometimes without direct relation to a specific platform. Although I do agree that technical devices *inform* ways of doing things, and they certainly influence *how* tools are used, they are not the tools themselves.