Thinking of experimenting with delta updates of serialised JSON (as opposed to full serialisation every time, which is expensive for large collections) and wondering if anyone knows of any existing libraries, experiments, etc., that use special object IDs to mark the start and end of objects to enable delta string substitution in serialised JSON. My search engine fu is not returning any results.

@aral there are two ietf standards for this. jsonpatch and... i don’t remember the name of the other one

@zensaiyuki Thanks. Seen those. Unless I’m mistaken, they’re for generating patches and merging two JSON objects – I need to update a stringified version without performing a full stringification (with string substitution).

@aral ah, that- i haven’t heard of anything specific like that but it does remind me of certain C language json parsing librarie(s?) that leave the json string in place and just give you a datastructure of pointers into the original json string. i imagine it would be possible to build off that, if that isn’t already an option in those libraries

@zensaiyuki @Moon actually made me think further along the lines of what originally got me thinking about this: why not use a CRDT for the data structure… I’m going to give that a think now :)

Thanks again for sharing your thoughts :)

Thread: shitposter.club/objects/793e22

@aral @Moon there are tradeoffs. Plenty of people have gone ahead and actually tried to implement crdt, and they don’t always come out happy at the other end, and that’s usually after 2-3 years of trying to make it work and reluctantly giving up

@aral @Moon that’s not a personal recommendation or caution. just noticing the trail of dead bodies on everest

@zensaiyuki @Moon Haha, yeah, it’s been an area I’ve had an interest in for quite a while now. I wouldn’t be rolling my own. I implemented WOOT in Swift back in the day (but Logoot is better) and I quite like causal trees (archagon.net/blog/2018/03/24/d). But also wondering if an append-only log wouldn’t work alongside a kappa architecture. Have lots more research to do now :)

@aral @Moon if you’re thinking along the lines of an append only log, another good option could be a piece table. in essence you have your start and end markers, but also an index of which ranges in your buffer will make up the output text. when you need to insert some text in the middle, you just stick your inserted text right at the end of the buffer, and insert its range into the index at the right spot, splitting the existing chunk of text into two ranges at the insertion point.

@aral @Moon i am not explaining it well, but the piece table advantage is that it’s very efficient to write to. slightly less efficient to read from if it gets too fragmented. mutations to a piece table would map well to being constructed from a log of operations.
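
A minimal sketch of the piece table described above, in Python. The names (Piece, PieceTable, insert, text) are made up for illustration and are not from any library mentioned in this thread. It assumes an immutable original buffer plus one append-only "add" buffer, which is the usual textbook layout.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    source: str   # "original" or "add": which buffer the range points into
    start: int    # offset of the first character in that buffer
    length: int   # number of characters in the range


class PieceTable:
    def __init__(self, text: str) -> None:
        self.original = text      # never modified after construction
        self.add = ""             # append-only buffer for inserted text
        self.pieces = [Piece("original", 0, len(text))] if text else []

    def insert(self, offset: int, text: str) -> None:
        """Insert text at a document offset by appending it and splitting a piece."""
        if not text:
            return
        new_piece = Piece("add", len(self.add), len(text))
        self.add += text
        position = 0
        for i, piece in enumerate(self.pieces):
            if position <= offset <= position + piece.length:
                left = offset - position
                replacement = [
                    Piece(piece.source, piece.start, left),
                    new_piece,
                    Piece(piece.source, piece.start + left, piece.length - left),
                ]
                # Drop empty halves created by inserting at a piece boundary.
                self.pieces[i:i + 1] = [p for p in replacement if p.length > 0]
                return
            position += piece.length
        if offset == position:  # only reachable for an empty table
            self.pieces.append(new_piece)
        else:
            raise IndexError("offset out of range")

    def text(self) -> str:
        """Read the document back by walking the pieces in order."""
        buffers = {"original": self.original, "add": self.add}
        return "".join(
            buffers[p.source][p.start:p.start + p.length] for p in self.pieces
        )


table = PieceTable("hello world")
table.insert(5, ",")
print(table.text())
```

The write only touches the end of the add buffer and the list of pieces. The read gets slower as the list fragments, which is the tradeoff mentioned above.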

CRDT might be overkill, if you don’t intend for edits to be interleaved from multiple sources simultaneously

@aral @Moon especially if you can guarantee order of operations anyway without a special data structure

@aral @Moon here’s a pretty good explanation:

darrenburns.net/posts/piece-ta

it’s for text editing, but with a bit of imagination i think you could see how it could apply here.

@zensaiyuki @Moon Haha, chips! Was just thinking about all this and I think I’ve reached a very similar conclusion. Forget stringification: let’s store and replay changes in an append-only log :)

@zensaiyuki @Moon (Sorry if there’s more to the article you linked to; only skimmed the intro. Bookmarked to read properly later.) :)

@aral @Moon it’s just a write efficient data structure for append only logs of edits to text. and it’s old school- designed to be efficient on 1980’s floppy disks.

@zensaiyuki @Moon Right (no pun intended): so for my use case imagine an append-only log of object.x.y.z = 5; delete object.x.y; object.a = [1,2,3]; etc. // ;)
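
A rough sketch of that kind of log, in Python. The record shape ({"op", "path", "value"}), the one-JSON-record-per-line format and the function names are all assumptions for illustration, not something proposed in the thread. The state is rebuilt by replaying the log from the start.

```python
import io
import json


def apply(state: dict, op: dict) -> None:
    """Apply one logged operation to the in-memory state."""
    path = op["path"]
    parent = state
    for key in path[:-1]:
        # Create intermediate objects on demand, like object.x.y.z = 5 would need.
        parent = parent.setdefault(key, {})
    if op["op"] == "set":
        parent[path[-1]] = op["value"]
    elif op["op"] == "delete":
        parent.pop(path[-1], None)
    else:
        raise ValueError(f"unknown op: {op['op']}")


def append(log, op: dict) -> None:
    """Append one operation to the log as a single line of JSON."""
    log.write(json.dumps(op) + "\n")


def replay(lines) -> dict:
    """Rebuild the state from scratch by applying every logged operation in order."""
    state: dict = {}
    for line in lines:
        if line.strip():
            apply(state, json.loads(line))
    return state


log = io.StringIO()  # stands in for a file opened in append mode
append(log, {"op": "set", "path": ["x", "y", "z"], "value": 5})
append(log, {"op": "delete", "path": ["x", "y"]})
append(log, {"op": "set", "path": ["a"], "value": [1, 2, 3]})

state = replay(log.getvalue().splitlines())
print(state)
```

Each write costs one small line appended to the log, regardless of how large the whole object is. The cost moves to startup, where the log has to be replayed.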

@aral @Moon right, and if you’re trying to make the json serialisation of that efficient, you’d translate those into text editing ops on the piece table- and you could make that very efficient if your pieces are the intervals around each text token.

so, if you have e.g.
[1, "foo", false]
you’d get exactly that text in a buffer, then your index (end is inclusive):
start  end  length  type
0      0    1       arrayst
1      1    1       num
2      3    2       sp
4      8    5       str
9      10   2       sp
11     15   5       bool
16     16   1       arrayend

@aral @Moon
then you’d have a function to translate “root.push(1)” into this text buffer, just appending the inserted text onto the end

[1, "foo", false], 1

and then insert
17     18   2       sp
19     19   1       num

just before the last table entry for arrayend

@aral @Moon then most of the tricky work is on that index table, the actual text buffer only takes appends.
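
The token index and the push described above could be sketched like this in Python. It is deliberately limited: it assumes a flat JSON array of numbers, booleans, null and strings without escapes, and the names (TokenTable, tokenize, push, render) are invented for the example. Index rows are (start, end, length, type) with an inclusive end.

```python
import json
import re

# One named group per token type; "sp" is a comma plus any following whitespace.
TOKEN = re.compile(r'''
    (?P<arrayst>\[) | (?P<arrayend>\]) | (?P<sp>,\s*) |
    (?P<str>"[^"\\]*") | (?P<bool>true|false) | (?P<null>null) |
    (?P<num>-?\d+(?:\.\d+)?)
''', re.VERBOSE)


def tokenize(text):
    """Return (start, end, length, type) rows covering every character of text."""
    index = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        index.append((m.start(), m.end() - 1, m.end() - m.start(), m.lastgroup))
        pos = m.end()
    return index


class TokenTable:
    def __init__(self, text):
        self.buffer = text            # only ever appended to
        self.index = tokenize(text)   # ordered rows that make up the output

    def _append(self, text, kind):
        start = len(self.buffer)
        self.buffer += text
        return (start, start + len(text) - 1, len(text), kind)

    def push(self, value):
        """Append a value to the array: buffer gets an append, index gets an insert."""
        text = json.dumps(value)
        kind = tokenize(text)[0][3]   # raises for values outside the supported subset
        rows = []
        if self.index[-2][3] != "arrayst":   # non-empty array needs a separator
            rows.append(self._append(", ", "sp"))
        rows.append(self._append(text, kind))
        self.index[-1:-1] = rows             # just before the arrayend row

    def render(self):
        return "".join(self.buffer[s:e + 1] for s, e, _, _ in self.index)


table = TokenTable('[1, "foo", false]')
table.push(1)
print(table.buffer)
print(table.render())
```

After the push the buffer is no longer valid JSON on its own. Only the index says which ranges, in which order, make up the serialised output.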

@aral @Moon and, you can enrich the table with whatever metadata you like. so for instance, if it were text, you could stash whether the chunk is bold or italic or whatever on a table column. in your use case it could be a cache of like, a jsonpath type lookup key.
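
One way the extra column might look, as a hedged Python sketch. The path strings ("$.a", "$.b") only imitate JSONPath for illustration, the index is written out by hand instead of produced by a tokenizer, and Entry, KeyedTable and set_value are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    start: int                  # offset into the append-only buffer
    length: int                 # number of characters in the token
    kind: str                   # token type
    path: Optional[str] = None  # extra metadata column: lookup key for values


class KeyedTable:
    def __init__(self, buffer, index):
        self.buffer = buffer
        self.index = index
        # Cache from lookup key to index entry so a write needs no scan.
        self.by_path = {e.path: e for e in index if e.path is not None}

    def set_value(self, path, value):
        text = json.dumps(value)
        entry = self.by_path[path]
        # Append the new text and repoint the entry; the old text stays in the buffer.
        entry.start, entry.length = len(self.buffer), len(text)
        self.buffer += text

    def render(self):
        return "".join(self.buffer[e.start:e.start + e.length] for e in self.index)


table = KeyedTable(
    '{"a": 1, "b": 2}',
    [
        Entry(0, 1, "objst"),
        Entry(1, 3, "key"), Entry(4, 2, "sp"), Entry(6, 1, "value", "$.a"),
        Entry(7, 2, "sp"),
        Entry(9, 3, "key"), Entry(12, 2, "sp"), Entry(14, 1, "value", "$.b"),
        Entry(15, 1, "objend"),
    ],
)
table.set_value("$.b", "hi")
print(table.render())
```

The lookup key turns "object.b = ..." into a dictionary hit plus one buffer append, at the price of keeping the cache in step with the index.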

@aral @Moon i am also reminded of json_pipe and OSC as conceptions of a json like structure, but transmitted as a stream of edits to keys
