Question for Unix experts: why do we need open()?

I've spent some time in the past[1] staring at the abyss that is pubs.opengroup.org/onlinepubs/, and much of its complexity seems needed only for Things That Are Not Files.
At the syscall level it's pretty ugly that sockets are not files. Alternative client-side syscalls that unify file system and network:

PUT
GET
POST
DELETE

Just have them take a resource name and maybe a Go channel for synchronizing. What am I missing?!

[1] gitlab.com/sortix/sortix/merge

@akkartik Filesystems don't really work that way. A file descriptor is a bi-directional stream. HTTP methods don't even really map to DB commands (a PUT is a create or update). File descriptors can also be used for sockets, fifos, and on Plan 9, nearly everything. I mean, if you want to make an OS with HTTP verbs for file ops, go for it, but I'd recommend making a religion around it as well; as a tribute to Terry A. Davis. :rip:

@djsumdog
Fallacy: filesystems don't work that way because they don't currently work that way. But @akkartik is asking why it *has* to work that way in the first place.

Turns out, it doesn't have to. Amazon S3 doesn't work that way, for example, and Plan 9's 9P protocol doesn't either (every read or write has its own offset and length fields).
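
To make the 9P-style idea concrete, here is a minimal sketch using POSIX pread() via Python's os.pread, where each read names its own offset and length and the descriptor's current position is never consulted. The helper names read_at and demo_positional_reads are made up for illustration, and this still uses a descriptor under the hood, so it only approximates the per-request offsets of 9P or S3 range reads.

```python
import os
import tempfile


def read_at(fd, offset, length):
    """Read `length` bytes starting at `offset`; the fd's own position is not used or moved."""
    return os.pread(fd, length, offset)


def demo_positional_reads():
    """Read two ranges out of order; no seek is needed between them."""
    with tempfile.TemporaryFile() as f:
        f.write(b"hello, world")
        f.flush()  # make sure the bytes reach the kernel before pread
        fd = f.fileno()
        tail = read_at(fd, 7, 5)  # bytes 7..11
        head = read_at(fd, 0, 5)  # bytes 0..4
        return head, tail
```

Because every call is self-describing, the order of the calls does not matter, which is what lets such requests be retried or sent over a network independently.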

@djsumdog @akkartik
The reason contemporary filesystems work the way they do dates back to mainframes, where large amounts of data were stored on tape, and thus accessed sequentially. That means you can get by with just read and write.

Seek didn't happen until later, when random access devices (both tape and direct access storage devices, or DASDs as IBM called them) came along. Fun fact: C still has a function called rewind(), which is essentially a synonym for fseek(stream, 0, SEEK_SET).
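
A small sketch of the tape-like model, assuming a Unix-like system: reads advance an implicit position, and "rewinding" is just a seek back to offset 0. The function name read_twice_with_rewind is hypothetical; os.lseek and os.read are the real calls.

```python
import os
import tempfile


def read_twice_with_rewind(data):
    """Write `data`, read it sequentially, hit end-of-file, rewind, and read it again."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, data)
        os.lseek(fd, 0, os.SEEK_SET)      # position back to the start after writing
        first = os.read(fd, len(data))    # sequential read advances the position
        at_end = os.read(fd, len(data))   # nothing left, like a tape at its end
        os.lseek(fd, 0, os.SEEK_SET)      # the "rewind"
        second = os.read(fd, len(data))
        return first, at_end, second
    finally:
        os.close(fd)
        os.remove(path)
```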


@djsumdog @akkartik
That means random access to a file was bolted onto an already existing interface. The alternative was to have a completely different API for different storage techniques (hence all the "access methods" of IBM mainframes).

The reason for open() is to prepare the OS for I/O. This is a convenient time to cache directory information, check access privileges, etc. The alternative is to touch this data on every access, which is prohibitively expensive.

@djsumdog @akkartik
Without this step, you'd need to pass a ton more data to each read/write operation. With a separate open step, you abstract all the state behind a single file handle.
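
A rough sketch of that trade-off, not a benchmark. stateless_read is a hypothetical open-less call that must be given the path, offset, and length every time (and repeats the lookup and permission check on each call), while Handle does that work once and then only needs a length. Both names are invented for illustration.

```python
import os
import tempfile


def stateless_read(path, offset, length):
    """Hypothetical open-less read: every call carries the full context."""
    fd = os.open(path, os.O_RDONLY)  # path lookup and access check repeated per call
    try:
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)


class Handle:
    """Open once; the descriptor remembers the file and the current position."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDONLY)  # lookup and access check happen here only

    def read(self, length):
        return os.read(self.fd, length)  # position is tracked by the kernel

    def close(self):
        os.close(self.fd)


def demo_stateless_vs_handle():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"abcdef")
        os.close(fd)
        a = stateless_read(path, 0, 3) + stateless_read(path, 3, 3)
        h = Handle(path)
        try:
            b = h.read(3) + h.read(3)
        finally:
            h.close()
        return a, b
    finally:
        os.remove(path)
```

Both styles return the same bytes; the difference is how much the caller has to repeat and how much work the OS redoes on each call.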
