Factorial on SubX
Ok, I think I understand calling conventions now.
Also coming face to face with the pain of debugging machine code 😀
Now that it can translate labels to offsets, SubX also warns on explicit use of error-prone raw offsets. Both when running and in Vim.
As I build up the ladder of abstractions I want to pull up the ladder behind me:
a) Unsafe programs will always work.
b) But unsafe programs will always emit warnings.
As long as SubX programs are always distributed in source form, it will be easy to check for unsafe code. Coming soon: type- and bounds-checking.
SubX now supports basic file operation syscalls: http://akkartik.github.io/mu/html/subx/ex8.subx.html
I've also made labels a little safer, so you can't call into the middle of a function, or jump into a different function: http://akkartik.github.io/mu/html/subx/037label_types.cc.html
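A hypothetical sketch of that label check (SubX's real label syntax and rules may differ; the data layout here is invented for illustration): calls must target function-entry labels, and jumps must stay inside the current function.

```python
def check_labels(instructions, label_kind, label_fn):
    """instructions: list of (current_function, op, target_label).
    label_kind: label -> 'entry' or 'local'.
    label_fn: label -> name of the function containing the label."""
    errors = []
    for fn, op, target in instructions:
        if op == 'call' and label_kind[target] != 'entry':
            errors.append("call to non-entry label: " + target)
        if op == 'jump' and label_fn[target] != fn:
            errors.append("jump out of function " + fn + " to " + target)
    return errors

# Toy program: a call to an entry label and a jump within the same function are ok.
label_kind = {"print": "entry", "loop": "local"}
label_fn = {"print": "print", "loop": "factorial"}
ok = check_labels([("factorial", "call", "print"),
                   ("factorial", "jump", "loop")], label_kind, label_fn)
# A jump into a different function gets flagged.
bad = check_labels([("main", "jump", "loop")], label_kind, label_fn)
```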
Next stop: socket syscalls! @h
@h Hmm, the socket syscalls are implemented differently on different platforms. That's dismaying. I'd been hoping to use a set of primitives so tiny that programs for it would work on all modern, extant *nixes. Now I probably need to start testing on Linux (top priority) and all the different *BSDs including Darwin (which is what I develop on).
More adventures with machine code
SubX now has a test harness, and support for string literals.
Current (increasingly silly) factorial program to compare with the parent toot: http://akkartik.name/images/20180923-subx-factorial.html
Two new features:
a) Autogenerated `run_tests` (line 26) which calls all functions starting with 'test_'.
b) String literals (line 31). They get transparently moved to the data segment and replaced with their address.
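The run_tests idea is simple enough to sketch in a few lines (this is a toy model, not SubX's actual implementation): scan the program's function labels, and emit a call for every one that starts with 'test_'.

```python
def generate_run_tests(labels):
    """Given the function labels in a program, return the body of an
    autogenerated run_tests function as a list of call instructions."""
    return ["call {}".format(name)
            for name in labels
            if name.startswith("test_")]

labels = ["factorial", "test_factorial_of_0", "test_factorial_of_3", "main"]
calls = generate_run_tests(labels)
# -> ["call test_factorial_of_0", "call test_factorial_of_3"]
```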
That isn't much progress for a month. I've been trying a few different things and learning a lot:
a) I spent some time looking at the GREAT stage0/Mes bootstrap project.
b) I tried to design a type-safe low-level language that could be converted line by line to machine code, but it turned out to be a bust (thanks @kragen!). It's possible if you give up on freeing heap memory.
I'm now convinced there are no shortcuts. Gotta build a real compiler in machine code.
There's only one problem: I don't know how to build a compiler. Not really. And definitely not in machine code. So I'm going to be fumbling around for a bit. Lots more wrong turns in my future.
I've been trying to port https://compilers.iecc.com/crenshaw to SubX. With tests. It's been slow going, because I have to think about how to make Crenshaw's primitives like `Error()` and `Abort()` testable.
I don't know if just learning to build a compiler will sustain my motivation, though. So some other ideas:
a) Some sort of idealized register allocator in Python or something. I've never built one, and my intuitions on how hard it is seem off.
b) Port Mu's fake screen/keyboard to SubX so that I can reimplement https://github.com/akkartik/mu/tree/master/edit#readme. It was just too sluggish since Mu was interpreted. Even my 12 year old students quickly dropped it in favor of Vim.
@yumaikas I'm not sure. Darwin certainly does them differently from Linux.
Linux has a single `socketcall()` syscall (number 102) that multiplexes standard Posix functions like `socket()`, `connect()`, etc. http://man7.org/linux/man-pages/man2/socketcall.2.html; https://syscalls.kernelgrok.com.
Darwin has separate syscalls for each Posix function. `socket()` is 97, `connect()` is 98, and so on. https://opensource.apple.com/source/xnu/xnu-1504.3.12/bsd/kern/syscalls.master
Hopefully the BSDs all share a common approach. Bears investigation.
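To make the contrast concrete, here's a toy model of the two styles. The syscall numbers (102, 97, 98) and socketcall sub-operation codes are the real Linux/i386 and Darwin values from the pages above, but the dispatch itself is just a sketch, not how SubX would actually issue syscalls.

```python
# Linux/i386 style: one syscall number (102) multiplexes every socket
# operation; the first argument selects which one.
LINUX_SOCKETCALL = 102
SYS_SOCKET, SYS_BIND, SYS_CONNECT = 1, 2, 3   # socketcall sub-op codes

def linux_socket_syscall(op, args):
    return ("syscall", LINUX_SOCKETCALL, op, args)

# Darwin style: each Posix socket function gets its own syscall number.
DARWIN_SOCKET, DARWIN_CONNECT = 97, 98

def darwin_socket_syscall(num, args):
    return ("syscall", num, args)

# Opening a socket looks structurally different on the two platforms:
linux = linux_socket_syscall(SYS_SOCKET, (2, 1, 0))   # ("syscall", 102, 1, (2, 1, 0))
darwin = darwin_socket_syscall(DARWIN_SOCKET, (2, 1, 0))  # ("syscall", 97, (2, 1, 0))
```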
@yumaikas Yes, I'll probably have to swap in a different library for each platform.
> what would keep you from linking to a library?
Sheer bull-headedness :) SubX requires nothing more than a kernel to run. Not even a linker. I'd like to preserve this property when it starts self-hosting.
> does SubX not have a C FFI set up?
Interop with C or Posix is an anti-goal. The whole goal is to eliminate fixed interfaces, not provide yet another ossified interface.
@akkartik But that, too, is unnecessary for a proof-of-concept generating stubs. You may only need #ifdef and #include equivalents. Or perhaps something like Fo's assembly modules, each in a separate file labeled for each target arch. If you do it that way you don't need the annoyance of implementing #ifdef blocks. Just includes and nothing more.
@h Mu has a tradition of loading all reasonable-looking files in its directory in a well-defined order. I may do that rather than includes.
Loading code from multiple files is not a big deal, I was planning on it anyway.
No, my pain is more about needing to spin up a VM or VPS with different OSs, and develop/test/debug on them.
You're right that it's a one-time thing when building the shims/polyfills. Just needs doing. I may start on self-hosting first.
@vertigo Yeah, Windows is out of scope for the moment. I'm sure there are many more bugs hiding under that particular rock.
@h My sense is that the "BSD sockets API" is just at the level of C prototypes, just like Posix. Do you happen to have any pointers to how the API is implemented in syscalls in different BSDs?
Even if Windows is out of scope, flatassembler also includes support for win32 calls and generation of PE32 binaries, so that's not a limitation (although @vertigo is right that Windows is a massive dynamic-linking annoyance, with no officially published kernel APIs)
I've been avoiding self-hosted languages/platforms; the circular dependency of semantics on previous versions bars my goal of thin, easy-to-traverse abstractions. But for an assembler it doesn't feel like as big a deal because of the 1-to-1 mapping between source and binary.
Unfortunately, FASM doesn't seem to have any socket support :( On any platform. So no shims here.
@haitch @h Yes, of course, it's Assembly so one can do anything with it. But the codebase doesn't in itself *encode* how to make socket calls. On any platform, let alone multiple ones. `grep` returns 0 results.
Searching the net shows me plenty of examples -- and always there's the question of how to make it cross-platform.
So I could use FASM, but it seems independent of the need to figure out how sockets are implemented on different platforms.
My broad goal is a platform that I can quickly drill into as deep as necessary. Without knowing the entire stack; JIT learning of the minimum necessary for my immediate purposes.
SubX is just a means to a sub-goal: conventional compilers are too 'thick' to permit easy hackery. Particularly simultaneously hacking within as well as atop them. The context switch is a vast chasm right now. A stack based on FASM would be a vast improvement on that.
@akkartik @h @haitch @yumaikas @freakazoid @firstname.lastname@example.org I can honestly say that the #kestrel3 project would not still be going ahead like it is now had I not read Chuck Moore's Programming a Problem-Oriented Language.
I can confirm that Forth is a language that, in the general case and if you keep things simple, can be bootstrapped from raw assembly.
My Kestrel-3/E2 port of DX-Forth is less than 10KB of code too.
(It'll get larger when I implement limited 9P support for it though.)
@vertigo Forth is definitely bootstrappable. But the extreme lack of checking makes it very hard to run with somebody else's code. Specifically the lack of good error messages when I pass the wrong number or type of arguments to a function.
It took me months to reluctantly move on and consider alternatives: https://lobste.rs/s/0myzye/thoughts_on_forth_programming#c_7vnk1c
We've chatted briefly about this before, though perhaps I wasn't clear then: https://mastodon.social/@akkartik/100357908877440360
@akkartik @kragen @haitch @freakazoid Honestly, my advice here is to read two books: "Programming a Problem Oriented Language" teaches you how to write your own Forth compiler from scratch (bare metal on up). There are no assembly listings in the book, because it's pure guidance, but it was instrumental in me getting DX-Forth working at all.
Second, if you're willing to tackle the more sophisticated compiler techniques needed to have a proper infix compiler, I would then recommend reading Compiler Construction, by Niklaus Wirth (see http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf).
You don't need fancy register allocation to get a working compiler that meets 80% of your performance goals. Don't let that slow you down.
@akkartik @kragen @haitch @freakazoid There seems to be this great fear of passes in compiler design. It's a one-pass compiler, or a two-pass compiler. Hogwash; do 56 passes if you must; all that matters is the final output. Register allocation can be one (or maybe a few) of those passes towards the end.
@vertigo historically, reducing the number of passes was a big win for reducing the complexity and improving interactive performance, in large part because the passes communicated via mass storage. The nanopass framework eliminates a lot of those costs, and of course hardware is fast now. @akkartik @haitch @freakazoid
Each pass is like a Unix filter -- it takes input, processes it in some *very* well defined way, and produces output. Each pass communicates with other passes via in-memory buffers, so no mass storage is required.
Regarding mass storage requirements, though, you can have nanopass compilers on a Commodore 64 if you wanted, by reducing the size of the compilation unit. BSPL works on individual colon definitions, for example, and uses mere KB.
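The "passes as filters" idea can be sketched like this (toy passes, invented for illustration): each pass is a plain function from program representation to program representation, and the compiler is just their composition over an in-memory value, with no mass storage in between.

```python
def strip_comments(lines):
    # Drop '#' comments and any lines that become empty.
    return [l.split('#')[0].rstrip() for l in lines
            if l.split('#')[0].strip()]

def resolve_labels(lines):
    # Replace 'jmp <label>' with 'jmp <line-index>' (toy resolution).
    addrs = {l[:-1]: i for i, l in enumerate(lines) if l.endswith(':')}
    return [("jmp %d" % addrs[l.split()[1]]) if l.startswith("jmp ") else l
            for l in lines]

def compile_program(lines, passes):
    for p in passes:       # two passes or 56; only the final output matters
        lines = p(lines)
    return lines

prog = ["start:", "jmp start  # loop forever"]
out = compile_program(prog, [strip_comments, resolve_labels])
# -> ["start:", "jmp 0"]
```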
@vertigo I see, yes I divide up work into functions that operate on the entire compilation unit.
Mu has 27 'transforms': http://akkartik.github.io/mu/html/012transform.cc.html
SubX has 7 so far, but I added an additional notion of 'checks': http://akkartik.github.io/mu/html/subx/029transforms.cc.html
But the original nanopass paper describes designing a separate formally-defined language for each pass: https://www.cs.indiana.edu/~dyb/pubs/commercial-nanopass.pdf
(Unix pipelines also introduce concurrency and flow-control; I perhaps know too much for that analogy to be helpful.)
@akkartik I'm not sold on anything that recommends many different languages to basically treat successive transforms of an essentially equivalent graph. Proofs for every little lang seem way overkill to me, when you can reuse good old graph theory on any graph.
Then again, that may be just me and my hammer seeing it all as nails. And linguistically oriented overengineering in a mathematical shape may also be somebody else's hammer. Different approaches, but I know which I'd follow.
NOTE: I'm *not* talking about the Nanopass Framework, which is a framework intended to automate the boilerplate of writing nanopasses, saving time in an educational context. I'm talking about the abstractions they are specific instances of.
@vertigo I get you, and I'm not opposed to the idea of multi-pass compilers. I'm highly biased against the proposed formal proofs expressed as various different languages. I just happen to see multiple passes as a sequence of graph transforms, for which you may have a formal description in one regular off-the-shelf language borrowing from graph theory. Data structures of internal memory representation are just optimisations that run on machines.
@vertigo But boilerplate isn't just a problem in an educational context. I'm particularly sensitive to it in machine code, where it's a maze of twisty passages mostly alike.
Also, the nanopass framework isn't just about superficial boilerplate. They were attacking the phase ordering problem. You write a phase expecting some feature to exist, but you inserted the phase where the feature isn't computed yet. Or isn't computed in some situations. Formal specification helps there.
@vertigo To my mind, the framework didn't mutate the idea. The original paper I linked to thinks in terms of separate languages.
But I think I've been getting pedantic. You're right that many phases is a good thing as long as the format (not language) doesn't diverge too much between them. I think I follow where you and @kragen are coming from. You have some in-memory data structure for the program, and successive phases fill in different parts of it. Am I understanding you right?