Thinking of experimenting with delta updates of serialised JSON (as opposed to full serialisation every time, which is expensive for large collections) and wondering if anyone knows of any existing libraries, experiments, etc., that use special object IDs to mark the start and end of objects to enable delta string substitution in serialised JSON. My search engine fu is not returning any results.
@aral there’s two ietf standards for this. jsonpatch, and i don’t remember the name of the other one
@zensaiyuki Thanks. Seen those. Unless I’m mistaken, they’re for generating patches and merging two JSON objects – I need to update a stringified version without performing a full stringification (with string substitution).
@aral ah, that. i haven’t heard of anything specific like that but it does remind me of certain C language json parsing librarie(s?) that leave the json string in place and just give you a data structure of pointers into the original json string. i imagine it would be possible to build off that, if that isn’t already an option in those libraries
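A rough sketch of that "pointers into the original string" idea, in Python rather than C. It records the (start, end) offsets of each element of a top-level array and then rewrites only one element's text. The function names `element_spans` and `splice` are made up for illustration, and the input is assumed to be a well-formed JSON array; `json.JSONDecoder.raw_decode` is only used to find where each value ends.

```python
import json

def element_spans(text):
    """Return (start, end) offsets of each element of a top-level JSON array.

    Offsets are half-open: text[start:end] is the element's serialised form.
    Assumes `text` is a well-formed JSON array.
    """
    decoder = json.JSONDecoder()
    spans = []
    i = text.index("[") + 1
    while True:
        # raw_decode does not skip separators, so step over them here.
        while text[i] in " \t\r\n,":
            i += 1
        if text[i] == "]":
            return spans
        _, end = decoder.raw_decode(text, i)  # end is an absolute offset
        spans.append((i, end))
        i = end

def splice(text, span, new_value):
    """Replace one element's text; every other element is reused as-is."""
    start, end = span
    return text[:start] + json.dumps(new_value) + text[end:]

doc = '[1, "foo", false]'
spans = element_spans(doc)
updated = splice(doc, spans[1], "bar")
```

Only the changed element is re-serialised here, but Python still copies the surrounding string, and any spans after the edit shift whenever the new text has a different length. That bookkeeping is the part a real implementation would have to solve.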
Thanks again for sharing your thoughts :)
@zensaiyuki @Moon Haha, yeah, it’s been an area I’ve had an interest in for quite a while now. I wouldn’t be rolling my own. I implemented WOOT in Swift back in the day (but Logoot is better) and I quite like causal trees (http://archagon.net/blog/2018/03/24/data-laced-with-history/). But also wondering if an append-only log wouldn’t work alongside a kappa architecture. Have lots more research to do now :)
@aral @Moon if you’re thinking along the lines of an append only log, another good option could be a Piece Table. in essence you have your start and end markers, but also an index of which ranges in your buffer will make up the output text. when you need to insert some text in the middle, you just stick your inserted text right at the end of the buffer, and insert its range into the index at the right spot, splitting a chunk of text into two ranges, “split” at the insertion point.
@aral @Moon i am not explaining it well, but the piece table advantage is that it’s very efficient to write to. slightly less efficient to read from if it gets too fragmented. mutations to a piece table would map well to being constructed from a log of operations.
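For anyone following along, here is a minimal piece table sketch in Python. It is an illustration under simplifying assumptions, not a tuned implementation: the class and buffer names are invented, only insertion is shown, and finding the insertion point is a linear scan over the pieces.

```python
class PieceTable:
    """Minimal piece table.

    The original buffer is never modified, inserted text is only ever
    appended to `added`, and `pieces` is the ordered index of
    (buffer_name, start, length) ranges that make up the current text.
    """

    def __init__(self, original):
        self.original = original
        self.added = ""
        self.pieces = [("original", 0, len(original))] if original else []

    def insert(self, pos, text):
        """Insert `text` so that it starts at offset `pos` of the output."""
        if not text:
            return
        new_piece = ("added", len(self.added), len(text))
        self.added += text  # append-only write
        offset = 0
        for i, (buf, start, length) in enumerate(self.pieces):
            if pos <= offset + length:
                split = pos - offset
                replacement = []
                if split > 0:
                    replacement.append((buf, start, split))
                replacement.append(new_piece)
                if split < length:
                    replacement.append((buf, start + split, length - split))
                self.pieces[i:i + 1] = replacement
                return
            offset += length
        if pos == offset:  # only reached when the table is empty
            self.pieces.append(new_piece)
        else:
            raise IndexError("insert position out of range")

    def text(self):
        """Read the current text by walking the pieces in order."""
        buffers = {"original": self.original, "added": self.added}
        return "".join(buffers[b][s:s + n] for b, s, n in self.pieces)

table = PieceTable("hello world")
table.insert(5, ",")
```

After the insert, `table.pieces` holds three ranges: the first five characters of the original, the one added character, and the rest of the original. Reading walks all three, which is where the fragmentation cost on reads comes from.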
CRDT might be overkill, if you don’t intend for edits to be interleaved from multiple sources simultaneously
@aral @Moon right, and if you’re trying to make the json serialisation of that efficient, you’d translate those into text editing ops on the piece table, and you could make that very efficient if your pieces are the intervals around each text token.
so, if you have e.g.
[1, "foo", false], you’d get exactly that text in a buffer, then your index (end is exclusive, and sp covers a comma plus the space after it):
start  end  length  type
0      1    1       arrayst
1      2    1       num
2      4    2       sp
4      9    5       str
9      11   2       sp
11     16   5       bool
16     17   1       arrayend
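A small Python sketch of building an index of this kind and then swapping one token without touching the rest. Everything here is an assumption for illustration: the regex only covers the token types in this example (no objects or null), offsets are half-open [start, end), and the names `index_tokens` and `render` are invented.

```python
import re

# One named group per token type; "sp" covers commas and whitespace.
TOKEN = re.compile(r'''
    (?P<arrayst>\[) | (?P<arrayend>\]) |
    (?P<str>"(?:[^"\\]|\\.)*") |
    (?P<num>-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?) |
    (?P<bool>true|false) |
    (?P<sp>[,\s]+)
''', re.VERBOSE)

def index_tokens(text):
    """Return (start, end, length, type) rows covering all of `text`."""
    rows = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        rows.append((m.start(), m.end(), m.end() - m.start(), m.lastgroup))
        pos = m.end()
    return rows

def render(pieces, buffers):
    """Join the (buffer_name, start, end) pieces into the output text."""
    return "".join(buffers[name][start:end] for name, start, end in pieces)

text = '[1, "foo", false]'
rows = index_tokens(text)

# Start with one piece per token, all pointing into the original buffer.
pieces = [("orig", start, end) for start, end, _, _ in rows]

# Change "foo" to "bar": append the new token text to an add buffer and
# repoint the one piece. No other token is re-serialised.
added = '"bar"'
pieces[3] = ("add", 0, len(added))
result = render(pieces, {"orig": text, "add": added})
```

Because each piece is exactly one token, an edit never needs to split a piece. It only replaces an index entry and appends to the add buffer.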