Nate Cull @natecull@mastodon.social

A realisation:

When an 'application' is no longer 'a piece of code on your computer that you may or may not have the source code to', but 'code and data running on someone else's computer that might even be in a different country'...

... suddenly the argument about 'should data be owned by applications for safety, or should it be owned by the operating system for interoperability' becomes no longer academic.

If data is owned by 'applications', it's not owned by YOU. It's not on YOUR computer.


The realisation is that this is why I get so. damn. grumpy. about applications acting like petty little feudal dictators about data... and grumpy about the programming 'best practices' (like databases being so very closely linked to specific applications) that enable this.

It's MY DATA. Get it the HECK out of YOUR application. I want it on MY COMPUTER.

That means I need somewhere to put data that's not application-centric.

And we don't have many places for that. The filesystem and, um...

... THEORETICALLY, a database server, but in practice? Yeah, right. Nope.

Databases exist for the purpose of denying access to data.

This is fine until you find yourself on the wrong side of a database and it's your data.

Essentially, a database is its own little OS, with its own little list of users and passwords, which is... silly if you already have an OS, which we do, and worse than silly when you realise that many applications use databases as a way to move data out of your computer, and put it on someone else's. And if it *is* on your computer, it's often still locked to a service account used by the app. *You* don't get to access it. You don't rate high enough. The Computer outranks you.

The other super annoying thing about databases is that they're not (generally) recursively structured.

Filesystems are recursively structured. Dictionaries and objects (ie, JSON, or all the stuff in RAM inside a C++ or Java program) are recursively structured.

You can't put a (SQL) database table inside another database table. Everything's one flat namespace (maybe three levels: Server, Database and Table) and inside that, everything has to be rigidly the same data shape.
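To make the nesting point concrete, here is a minimal sketch (not from the thread) of what it takes to store a recursively structured value in a flat SQL table. The `node` table, its columns, and the `flatten` helper are all hypothetical names chosen for illustration; the usual workaround is a single table where each row points at its parent.

```python
import json
import sqlite3

# A nested, recursively structured value: dictionaries inside dictionaries.
tree = {"music": {"albums": {"title": "Example", "year": 2001}}, "notes": "hello"}

def flatten(value, parent_id, name, rows):
    """Append (id, parent_id, name, leaf_json) rows for value and its children."""
    row_id = len(rows) + 1
    if isinstance(value, dict):
        rows.append((row_id, parent_id, name, None))
        for key, child in value.items():
            flatten(child, row_id, key, rows)
    else:
        rows.append((row_id, parent_id, name, json.dumps(value)))

rows = []
flatten(tree, None, "root", rows)

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one flat table, with nesting encoded via parent_id.
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT, leaf TEXT)"
)
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?)", rows)

# Listing the top-level "folder" means querying by parent, not walking a tree.
children_of_root = [
    r[0] for r in conn.execute("SELECT name FROM node WHERE parent_id = 1 ORDER BY id")
]
print(children_of_root)
```

The nesting survives, but only as a convention layered on top of the table. The database itself still sees one flat list of identically shaped rows.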

Filesystems are pretty good at copying data. (Modulo problems crossing filesystem boundaries, as I found yesterday). You just... put a folder in another folder and you're good.

But you can't do that with a database. Getting data in and out for backups or just moving it around is a royal pain. It's not a one-step process and it's just not much fun.

I guess what I'm saying is maybe a NoSQL database of some kind (strictly only one per OS, with per-object permissions) might be the future.

@natecull IMO a good first step towards improving the situation would be to make SQL actually portable

@elomatreb I would venture to suggest something like a kernel service that maps data into RAM at an object level. Not just virtual memory, but persistent object-structured disk storage as a parallel to files

(or, if we ever got a filesystem that's good for fine-grained objects, just like cache the filesystem in RAM)

then just have pointers to objects and store those in other objects, and have some library code to build/rebuild indexes when they're needed

This is what Macintosh should've been.
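As a rough illustration of the "pointers to objects, plus library code to rebuild indexes" idea above, here is a small in-memory sketch. It is not a kernel service or a persistent store; the `store` layout, the id strings, and `build_index` are hypothetical and stand in for whatever such a system would actually provide.

```python
from collections import defaultdict

# Hypothetical object store: every object has an id, and objects refer to
# each other by id (the "pointer") instead of by nesting.
store = {
    "artist:1": {"type": "artist", "name": "Example Artist"},
    "album:1": {"type": "album", "title": "First", "artist": "artist:1"},
    "album:2": {"type": "album", "title": "Second", "artist": "artist:1"},
}

def build_index(objects, field):
    """Rebuild an index mapping each value of `field` to the ids that hold it."""
    index = defaultdict(list)
    for object_id, obj in objects.items():
        if field in obj:
            index[obj[field]].append(object_id)
    return dict(index)

# The index is derived data: it can be thrown away and rebuilt on demand.
by_artist = build_index(store, "artist")
albums = [store[object_id]["title"] for object_id in by_artist["artist:1"]]
print(albums)
```

The design choice here is that the objects are the only source of truth. Indexes are a cache that can be regenerated whenever they are needed.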

@elomatreb and then Microsoft sorta tried to do it? and kept trying through the 2000s and eventually failed for some reason

@natecull You spend years standardizing and then some random database adds e.g. JSON querying and now half the web depends on Postgres
@natecull FWIW I also wished for an "object database as filesystem" kind of thing multiple times, e.g. for organizing a music library it would be so nice

@elomatreb @natecull maybe something to look at is the Nepomuk systems used by KDE and their Baloo file indexer. These systems attempt to index and add smarts to files, which you can then write semantic queries about.

Maybe.

@thinkMoult @elomatreb Ooh! Interesting. Is that part of the Semantic Desktop initiative?

@natecull @elomatreb yes. Although I was definitely less than impressed whenever I've attempted to use it in practice. However, there's definitely something in it.

Nothing beats plaintext and files.

@thinkMoult @natecull Except that outside of source code the actual number of plaintext files that aren't created as internals of applications is very very small, at least on my system

@elomatreb @natecull BeOS did it and it was amazing. The Unix legacy that "a file" means a single bag of bits with no structure is the worst thing.

Interoperability is of course always going to be hard unless you're writing to an existing spec, because no two programmers are going to, without coordination, store their data the exact same way. But when good tools exist for storing the structure too, you can at least RE it if you have to without too much trouble.

@keiyakins @natecull Exactly, if the OS gives the programmer a good enough toolset to avoid having to do the structuring (or large parts of it), they will use it.
@dredmorbius @natecull Yeah, the dominance of unix-y files has been not entirely a positive development IMO

@natecull @elomatreb NB, this is somewhat the direction DocFS is intended to go in.

@natecull @elomatreb

> just like cache the filesystem in RAM

This has been implemented for ages now.

>then just have pointers to objects and store those in other objects

Pointer chasing can be pretty bad for performance. In many cases, you'd be better off with a local sqlite3.

Failing that, I guess we'll still have things like SQLite... lots of different binary db files that at least can be copied when the application is shut down (unless it has a service and you're never sure when it's shut down)... but it's still really annoying trying to mix and match data between different apps, which is fundamentally what I'd like to do.
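On the "can be copied when the application is shut down" worry: SQLite has an online backup API, exposed in Python as `sqlite3.Connection.backup` (Python 3.7+), which is meant to take a consistent copy without stopping the writer. The sketch below assumes you can open the database file yourself; the file names are hypothetical.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "app.db")     # hypothetical application database
dst_path = os.path.join(workdir, "backup.db")  # hypothetical backup target

# Stand-in for the application: create a table and some data.
src = sqlite3.connect(src_path)
src.execute("CREATE TABLE note (id INTEGER PRIMARY KEY, body TEXT)")
src.execute("INSERT INTO note (body) VALUES ('my data')")
src.commit()

# Copy a consistent snapshot into a second database file.
dst = sqlite3.connect(dst_path)
src.backup(dst)
dst.close()
src.close()

# The backup is an ordinary SQLite file that any tool can open.
check = sqlite3.connect(dst_path)
copied = check.execute("SELECT body FROM note").fetchall()
check.close()
print(copied)
```

This only addresses copying. It does nothing for the harder problem of mixing data between apps whose schemas differ.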

The file/document/folder metaphor still seems good for 'data at rest'. But data that might change rapidly? I dunno. I just want to... link to it

@natecull My perspective on SQLite is that no matter what, until those apps have standards for their domain they would all have different parsers you'd need to map between.

And at least SQLite documents inline how it's structured.

Though of course this all depends on the particulars of the situation.
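The "documents inline how it's structured" remark refers to the fact that every SQLite file carries its own schema, readable through the built-in `sqlite_master` table. A minimal sketch, using a hypothetical `track` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for some application's private data.
conn.execute("CREATE TABLE track (id INTEGER PRIMARY KEY, title TEXT, album TEXT)")

# sqlite_master lists each table together with the SQL that created it,
# so a generic browser can discover the structure without the app's help.
schema = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

for name, sql in schema:
    print(name)
    print(sql)
```

This tells you the shape of the data, though not what the application means by each column.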

@natecull AFAIU, sqlite is great for single-user, single-instance, but has major concurrent-access issues.

@natecull filesystem has symlinks for a reason.

Also, for data that can change rapidly, you probably want transactions, so that you can change two pieces of data at the same time, w/o anyone seeing the partial state in between.
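A minimal sketch of that transaction guarantee, using Python's `sqlite3` module. The `account` table and `transfer` helper are hypothetical; the point is only that two related updates either both take effect or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount between two rows; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "a", "b", 30)       # fits within the balance
failed = transfer(conn, "a", "b", 500)  # violates the CHECK, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the credit to `b` had already been applied when the debit from `a` was rejected, and the rollback undoes it. Plain files offer no equivalent across two separate writes.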

Also, replacing sqlite with files or some kind of object storage won't change the fact that what's stored inside the objects is app-specific.

So what we need is standards. Interchange formats.

@Shamar @natecull
There are also graph databases. One of them is git.
The rest of them shit.

@natecull Files tend to change atomically (remove old, write new), or incrementally (append-to-end).

Databases both change _internally_ and, almost always, _relationally_. The data are interconnected, there are triggers and indices, multiple producers and consumers. It's a far more complex problem.
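For the file side of that comparison, the "remove old, write new" pattern is usually done by writing a temporary file and renaming it over the original. A minimal sketch, assuming a POSIX-style filesystem where the rename is atomic; `atomic_write_json` and the file name are hypothetical.

```python
import json
import os
import tempfile

def atomic_write_json(path, data):
    """Write data to a temp file in the same directory, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "settings.json")  # hypothetical file name
atomic_write_json(target, {"version": 1})
atomic_write_json(target, {"version": 2})

with open(target, encoding="utf-8") as f:
    current = json.load(f)
leftovers = [n for n in os.listdir(workdir) if n.endswith(".tmp")]
print(current, leftovers)
```

This covers one file at a time. Keeping several interrelated files, indexes, and concurrent writers consistent is the part a database takes on.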

@natecull You haven't seen some of the data systems I've worked with..

The correct answer is that you almost never *want* to put one database within another.

Doesn't mean it can't happen.

@natecull Object Databases and Hierarchical Databases were a thing back in the old days. They *sucked*. Slow and easily corrupted, or creating massive data loss by doing simple operations. IBM's SQL model has been a huge win for data speed & reliability, even if it's ugly.

@natecull There can be exceptionally good reasons to divorce system and DBMS users. Bad argument.

@natecull
>Essentially, a database is its own little OS, with its own little list of users and passwords, which is... silly if you already have an OS

umm... OS doesn't have transactions.

I get what you're trying to say, and I agree that the data should be owned by the users, not by the applications, but I don't think oversimplifying the role of databases like this is fair.

@natecull What if that database is a well-documented file format, stored in a single file (or two) on your filesystem? And it didn't bother with permissions checks?

If done right this doesn't have to be any worse than plain text files.

@alcinnz I think that might work reasonably well, yeah. If you have multiple users, maybe it gets weird? Or multiple programs active at once? But if the format was well-known, so there could be many 'browsers' for it and you didn't have to use just one low-level API or a high-level app, that could be good.

@natecull I was describing SQLite BTW.

And I've got Sequeler installed as a browser for it.

@alcinnz right, but you can't put a SQLite database inside another one, so I guess it has to be a kind of mix of filesystem plus databases plus structured text files. And SQL or even the relational model is still maybe not the right fit for data which isn't in strictly tabular format.

but I suppose if we had Filesystem + SQLite + JSON, and a browser that can cope with all of those and a concept of linking to entities within a file or a table....

@natecull @alcinnz The traditional unix-like filesystem and SQLite are solving different problems. The former is solving the problem of 'how do I map names to bags of bits?' and the latter is solving 'how do I structure this bag of bits so I can actually do something with it?'

They're very complementary tools, honestly.

@natecull @alcinnz (And sqlite isn't the only one solving that second part, and that's not necessarily a bad thing. Different problems need different data structures, which will map to different on-disk structures)

@natecull

If you had your own ActivityPub, what you just wrote would in fact be your data (on your computer if you could host it yourself).

No?

@natecull well. Except that end-to-end encryption and zero-access encryption exist. So you can own it, and store it elsewhere.

@thinkMoult if you can own the data, yes, you can encrypt it and store it elsewhere.

But if the data is owned by 'an application', which is something cloudy-woudy (like Office 365 sort of thing), that application might just decide to store your data in a cloud database, and you don't get to choose.

@natecull Another direction I'm thinking is structured data (JSON, YAML, XML, SGML) and atomic transactions as with git.

You can get conflicts. They need resolving.