Nate Cull, a ghost in spring @natecull

Part of my grumbling about data formats comes from Chinese language learning and looking at these two (fortunately text! but raw, non-structured) datasets and asking what format would be best to link, modify and share them. Or lots of other datasets like these.

Like, I just want a standard dictionary showing translation *and decomposition information* for characters?

So would the best format be:

* raw line-delimited text with nonstandard field delimiters (what these two projects decided to use for some reason)

* S-expressions

* Excel/OpenOffice spreadsheet
* Word/OpenOffice document

* stick it in a proprietary binary database

PROBABLY (sigh) JSON is the only real option

@natecull RFC 7049 Concise Binary Object Representation

@h mmm, binary JSON

the 'nice' thing about JSON becoming The Universal Data Standard For Everything is that you can almost but not *quite* represent Lisp lists in it.

and you can almost but not *quite* represent sets in it

and you can almost but not *quite* represent arbitrary dictionaries in it (ie with non-text keys)

and it almost but not *quite* even has integers

and you really can't represent program code at all unless you like pain and suffering
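For anyone who wants to poke at those "almost but not quite" cases, here is a minimal sketch using Python's standard `json` module. The variable names are made up for illustration, and the integer case assumes a consumer that parses JSON numbers as IEEE doubles (as JavaScript does); Python's own parser keeps big integers intact, so that part is simulated with `float`.

```python
import json

# Lists: nested tuples (a stand-in for Lisp-style pairs) come back as plain
# arrays, so any distinction between kinds of sequence is lost.
pair = ("a", ("b", "c"))
roundtrip_pair = json.loads(json.dumps(pair))

# Sets: there is no JSON set type, so the encoder refuses outright.
try:
    json.dumps({1, 2, 3})
    set_supported = True
except TypeError:
    set_supported = False

# Dictionaries with non-text keys: integer keys are silently turned into
# strings, and compound keys such as tuples are rejected.
roundtrip_keys = json.loads(json.dumps({1: "one"}))
try:
    json.dumps({(1, 2): "point"})
    tuple_keys_supported = True
except TypeError:
    tuple_keys_supported = False

# Integers: JSON only has "number". A reader that stores numbers as doubles
# cannot hold 2**53 + 1 exactly; float() simulates such a reader here.
big = 2**53 + 1
as_double = int(float(big))
```

After running it, `roundtrip_pair` is a list of lists, `roundtrip_keys` has the string key `"1"`, and `as_double` no longer equals `big`.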

@natecull @h Have you heard of Rivest's canonical s-expressions? Used in encryption software you use every day, whether you know it or not! Comes with a nice binary encoding!

@h @natecull Actually the IETF draft is way better to read

I'm not sure you actually want to use it, because you'd probably need to write your own parser... but they're easy to write!
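To give a feel for how small such a parser can be, here is a rough sketch of an encoder and decoder for the basic canonical form: atoms written as `<length>:<bytes>` and lists wrapped in parentheses with no whitespace. It is an illustration only. The function names are invented here, display hints (`[...]`) are not handled, and the rule against leading zeros in lengths is not enforced.

```python
def encode(node):
    """Encode nested lists of bytes objects as a canonical S-expression."""
    if isinstance(node, bytes):
        return str(len(node)).encode("ascii") + b":" + node
    if isinstance(node, list):
        return b"(" + b"".join(encode(child) for child in node) + b")"
    raise TypeError("only bytes and lists are supported in this sketch")


def _parse(data, pos):
    """Parse one atom or list starting at pos; return (node, next_pos)."""
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    if data[pos:pos + 1] == b"(":
        pos += 1
        items = []
        while True:
            if pos >= len(data):
                raise ValueError("unterminated list")
            if data[pos:pos + 1] == b")":
                return items, pos + 1
            item, pos = _parse(data, pos)
            items.append(item)
    # Otherwise expect an atom: decimal length, a colon, then that many bytes.
    colon = data.find(b":", pos)
    if colon == -1 or not data[pos:colon].isdigit():
        raise ValueError("expected a length prefix at offset %d" % pos)
    length = int(data[pos:colon].decode("ascii"))
    start = colon + 1
    end = start + length
    if end > len(data):
        raise ValueError("atom runs past end of input")
    return data[start:end], end


def decode(data):
    """Decode a complete canonical S-expression into nested lists of bytes."""
    node, pos = _parse(data, 0)
    if pos != len(data):
        raise ValueError("trailing bytes after expression")
    return node


example = [b"this", b"Canonical S-expression", b"has", b"5", b"atoms"]
wire = encode(example)  # b"(4:this22:Canonical S-expression3:has1:55:atoms)"
```

Because every atom carries its own length, arbitrary binary data needs no escaping, which is part of why the format suits crypto software.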

@natecull @h I am, in fact, dealing with canonical-sexps right now because I'm writing an http-signatures library for Guile and I'm using libgcrypt via guile-gcrypt... and it turns out libgcrypt uses canonical-sexps everywhere

@cwebber @natecull I'm curious, what would a typical payload look like expressed as s-exps? Structs with a LISP-y feel?

@h @natecull shows some real-world structures.

However, almost nobody is using canonical s-exps outside of crypto software. I think they're a cool undernoticed technology though.

@cwebber @h My feeling is that s-exps are missing just one little thing that would make them even more useful, and that's a 'term' marker. Can easily be added of course just by reserving a symbol, but then you have to deal with that symbol being reserved.

You can get a whole lot done with sexps but for dealing with, eg, JSON-like intermixed lists and dictionaries, you kind of need some syntax to indicate that there's a difference between the two.

@h @cwebber almost all the annoying things I see with data formats come *when you try to cross-connect and intermix data between formats*.

within a single format, you can make a lot of assumptions.

but when you mix-n-match, suddenly you either have to wrap every 'foreign' bit of data in a whole mess of careful abstractions or you just have ambiguity

@natecull @h Notably that's why json-ld adds the context! We want to make sure that "run" a mile and "run" a program clearly mean two different things.

I've been toying with markup for sexp-ld... :P

@h @natecull I may regret posting this, but this is my WIP syntax for sexp-ld

Idea is similar to json-ld: your local "compacted" document has symbols that you know map to particular unique URI-bound properties. You have an environment locally that maps these. You can then "expand" (or transform to json-ld or back) for exchange between servers.
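The compact/expand idea can be sketched in a few lines of Python. This is not the real JSON-LD algorithm (and not the sexp-ld syntax, which isn't reproduced here), only a toy key-renaming pass; the context dictionaries and the `example.org` URIs are hypothetical.

```python
# Two hypothetical local environments that both use the short symbol "run".
SPORT_CONTEXT = {"run": "https://example.org/sport#run"}
COMPUTING_CONTEXT = {"run": "https://example.org/computing#run"}


def expand(doc, context):
    """Replace short local keys with the full URIs given by the context."""
    if isinstance(doc, dict):
        return {context.get(key, key): expand(value, context)
                for key, value in doc.items()}
    if isinstance(doc, list):
        return [expand(item, context) for item in doc]
    return doc


def compact(doc, context):
    """Inverse of expand: map full URIs back to the local short keys."""
    inverse = {uri: term for term, uri in context.items()}
    return expand(doc, inverse)


mile = expand({"run": "a mile"}, SPORT_CONTEXT)
program = expand({"run": "a program"}, COMPUTING_CONTEXT)
back = compact(mile, SPORT_CONTEXT)
```

Locally both documents just say `run`; once expanded for exchange, the keys are distinct URIs, so the two meanings can no longer be confused.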

@cwebber @natecull I am able to appreciate the mathematical rigour of McCarthy, but writing and reading LISP --to me personally-- feels like I'm pushing buttons on an IBM 704 to extract heads or tails. It's a cultural thing.

@h @natecull That's fine.. though IME parenthetical language becomes as easy or even easier to read than non-parenthetical language over time. :)

IMO your editor can help a lot too... rainbow-delimiters, smartparens (or paredit), parinfer, etc. can help a lot!

@cwebber @h I like sexps as a syntax MUCH more than I like either Lisp or Scheme as a language

@cwebber @h It's just not as... common about it as some.

@natecull @h from inferring by lisp names I can tell that interlisp was written by the nethack hackers when they were hacking the first versions of the internet. If they had used Common Lisp instead we would have had the commonnet instead. #truefacts

Nate Cull, a ghost in spring @natecull

@cwebber @h And Macintoshes were developed by the hackers who invented Maclisp

It all holds together


@natecull @h Apple e-macs were designed by emacs enthusiasts coming from maclisp??? I think we're doing a great job of inferring history from language alone here and we should keep it up

@natecull @cwebber Then I'm more at home with a Wirth way of doing things, and Mac Pascal was a big thing on the Mac back then. I have the feeling that various currents and undercurrents flow in different directions, sometimes without direct relation to a specific platform. Although I do agree that technical devices *inform* ways of doing things, and they certainly influence *how* tools are used, they are not the tool themselves.

@cwebber @natecull Gotta go, thanks for the chat guys. Speak soon.

@h @cwebber I often think of modern operating systems (especially Linux) as a city with thousands of years (Internet time) of history embedded in layers of architecture. Warring empires, philosophies, junk piles, ruins... and more and more just built on top.

@natecull @h So just out of curiosity have you read the Zones of Thought series, particularly A Deepness in the Sky? Because based on this toot I'm guessing you'd love it 1000x

@cwebber @h Yep!

Programming as archeology. Everything riddled with vulnerabilities installed millennia ago by ancient unspeakable galactic evil.

Sounds about right.