Factorial on SubX
Ok, I think I understand calling conventions now.
Also coming face to face with the pain of debugging machine code 😀
That isn't much progress for a month. I've been trying a few different things and learning a lot:
a) I spent some time looking at the GREAT stage0/Mes bootstrap project.
b) I tried to design a type-safe low-level language that could be converted line by line to machine code, but it turned out to be a bust (thanks @kragen!). It's possible if you give up on freeing heap memory.
I'm now convinced there are no shortcuts. Gotta build a real compiler in machine code.
There's only one problem: I don't know how to build a compiler. Not really. And definitely not in machine code. So I'm going to be fumbling around for a bit. Lots more wrong turns in my future.
I've been trying to port https://compilers.iecc.com/crenshaw to SubX. With tests. It's been slow going, because I have to think about how to make Crenshaw's primitives like `Error()` and `Abort()` testable.
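(A sketch of the testability idea, in Python rather than SubX: instead of Crenshaw's `Error()`/`Abort()` halting the process, raise an exception tests can catch. `CompileError` and `expect` are illustrative names, not from the actual port.)

```python
class CompileError(Exception):
    """Raised instead of calling exit(), so tests can catch it."""

def error(msg):
    raise CompileError(msg)

def expect(found, wanted):
    # Crenshaw-style Expected(): abort unless the lookahead matches.
    if found != wanted:
        error(f"{wanted!r} expected, got {found!r}")

# A test can now assert on the failure without the process dying:
def test_expect_mismatch():
    try:
        expect('+', '*')
    except CompileError as e:
        return str(e)
    return None
```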
I don't know if just learning to build a compiler will sustain my motivation, though. So some other ideas:
a) Some sort of idealized register allocator in Python or something. I've never built one, and my intuitions on how hard it is seem off.
b) Port Mu's fake screen/keyboard to SubX so that I can reimplement https://github.com/akkartik/mu/tree/master/edit#readme. It was just too sluggish since Mu was interpreted. Even my 12 year old students quickly dropped it in favor of Vim.
@akkartik @kragen @haitch @freakazoid Honestly, my advice here is to read two books: "Programming a Problem Oriented Language" teaches you how to write your own Forth compiler from scratch (bare metal on up). There are no assembly listings in the book, because it's pure guidance, but it was instrumental in me getting DX-Forth working at all.
Second, if you're willing to tackle the more sophisticated compiler techniques needed to have a proper infix compiler, I would then recommend reading Compiler Construction, by Niklaus Wirth (see http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf).
You don't need fancy register allocation to get a working compiler that meets 80% of your performance goals. Don't let that slow you down.
@akkartik @kragen @haitch @freakazoid There seems to be this great fear of passes in compiler design. It's a one-pass compiler, or a two-pass compiler. Hogwash; do 56 passes if you must; all that matters is the final output. Register allocation can be one (or maybe a few) of those passes towards the end.
@vertigo historically, reducing the number of passes was a big win for reducing the complexity and improving interactive performance, in large part because the passes communicated via mass storage. The nanopass framework eliminates a lot of those costs, and of course hardware is fast now. @akkartik @haitch @freakazoid
Each pass is like a Unix filter -- it takes input, processes it in some *very* well defined way, and produces output. Each pass communicates with other passes via in-memory buffers, so no mass storage is required.
Regarding mass storage requirements, though, you can have nanopass compilers on a Commodore 64 if you wanted, by reducing the size of the compilation unit. BSPL works on individual colon definitions, for example, and uses mere KB.
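(The "pass as Unix filter" picture can be sketched in a few lines of Python: each pass is a pure function from IR to IR, composed like a pipeline but over in-memory lists instead of mass storage. The two passes here are toy examples, not BSPL's actual phases.)

```python
def fold_constants(ir):
    # Replace add-of-two-literals with a constant.
    out = []
    for op in ir:
        if op[0] == "add" and isinstance(op[1], int) and isinstance(op[2], int):
            out.append(("const", op[1] + op[2]))
        else:
            out.append(op)
    return out

def drop_nops(ir):
    return [op for op in ir if op[0] != "nop"]

def compile_unit(ir, passes):
    for p in passes:   # 2 passes or 56; only the final output matters
        ir = p(ir)
    return ir

result = compile_unit([("add", 2, 3), ("nop",)],
                      [fold_constants, drop_nops])
# result == [("const", 5)]
```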
@vertigo I see, yes I divide up work into functions that operate on the entire compilation unit.
Mu has 27 'transforms': http://akkartik.github.io/mu/html/012transform.cc.html
SubX has 7 so far, but I added an additional notion of 'checks': http://akkartik.github.io/mu/html/subx/029transforms.cc.html
But the original nanopass paper describes designing a separate formally-defined language for each pass: https://www.cs.indiana.edu/~dyb/pubs/commercial-nanopass.pdf
(Unix pipelines also introduce concurrency and flow-control; I perhaps know too much for that analogy to be helpful.)
@akkartik I'm not sold on anything that recommends many different languages just to describe successive transforms of an essentially equivalent graph. Proofs for every little language seem way overkill to me, when you can reuse good old graph theory on any graph.
Then again, that may be just me and my hammer seeing it all as nails. And linguistically oriented overengineering in a mathematical shape may also be somebody else's hammer. Different approaches, but I know which I'd follow.
NOTE: I'm *not* talking about the Nanopass Framework, which is a framework intended to automate the boilerplate of writing nanopasses, saving time in an educational context. I'm talking about the abstractions they are specific instances of.
@vertigo I get you, and I'm not opposed to the idea of multi-pass compilers. I'm highly biased against the proposed formal proofs expressed as various different languages. I just happen to see multiple passes as a sequence of graph transforms, for which you may have a formal description in one regular off-the-shelf language borrowing from graph theory. Data structures of internal memory representation are just optimisations that run on machines.
@vertigo But boilerplate isn't just a problem in an educational context. I'm particularly sensitive to it in machine code, where it's a maze of twisty passages mostly alike.
Also, the nanopass framework isn't just about superficial boilerplate. They were attacking the phase ordering problem. You write a phase expecting some feature to exist, but you inserted the phase where the feature isn't computed yet. Or isn't computed in some situations. Formal specification helps there.
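(A lightweight sketch of catching that phase-ordering bug, without per-pass languages: each pass declares which IR attributes it requires and provides, and the driver rejects an ordering where a pass runs before its inputs exist. All names here are hypothetical.)

```python
class Pass:
    def __init__(self, name, requires, provides, fn):
        self.name = name
        self.requires = set(requires)
        self.provides = set(provides)
        self.fn = fn

def run_pipeline(ir, passes):
    available = set()
    for p in passes:
        missing = p.requires - available
        if missing:
            raise ValueError(f"pass {p.name} needs {sorted(missing)} "
                             f"but no earlier pass computes them")
        ir = p.fn(ir)
        available |= p.provides
    return ir
```

Running a register-allocation pass before a liveness pass then fails loudly at pipeline-construction time instead of producing silently wrong code.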
@vertigo To my mind, the framework didn't mutate the idea. The original paper I linked to thinks in terms of separate languages.
But I think I've been getting pedantic. You're right that many phases is a good thing as long as the format (not language) doesn't diverge too much between them. I think I follow where you and @kragen are coming from. You have some in-memory data structure for the program, and successive phases fill in different parts of it. Am I understanding you right?
@vertigo To return to my original statement about "converting line by line" (which is I think what caused this sub-thread), the goal wasn't to minimize passes or phases but rather to minimize how much information a reader needs to hold in their head to understand how a single statement in a language is compiled. Types, global variables, local variables, sure. But I don't want readers to need to know how the previous statement was compiled. That veers into optimizer territory.
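(A minimal Python sketch of that "line by line" property: emitting code for one statement consults only the symbol tables, never the code emitted for the previous statement. The mnemonics are illustrative x86-ish strings, not real SubX syntax, and `locals_` maps variable names to hypothetical stack offsets.)

```python
def compile_assign(stmt, locals_):
    # stmt: ("assign", dest, src) where both operands are locals.
    _, dest, src = stmt
    return [f"mov eax, [ebp-{locals_[src]}]",
            f"mov [ebp-{locals_[dest]}], eax"]

def compile_block(stmts, locals_):
    code = []
    for s in stmts:   # each statement compiled independently
        code += compile_assign(s, locals_)
    return code
```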
@kragen @akkartik @haitch On RISC-V, to emulate (An)+ or -(An) on 68000, each memory reference will cost you 8 bytes, which is typically in addition to the instruction that computes the desired outcome (so, figure, an average total of 12 bytes per CISCy-equivalent operation). This was the only reason why I didn't implement a naive MachineForth compiler for RISC-V, opting instead for multiple optimization passes and a different structure to the language (e.g., it's a 3-stack environment).
I'm pondering, once I build my revised BSPL compiler (which I intend on using to write the exec.library for VertigOS) and its multifarious passes, I wonder how difficult it'd be to retarget Ur-Scheme to use the BSPL code generator to get Ur-Scheme to run under VertigOS on RISC-V architecture...
@akkartik @kragen @haitch BSPL has some phases which actually mutate the nature of the underlying structure (e.g., going from stack notation to RISC-ish notation), but such disruptive changes usually happen only when you're satisfied that you can't process the current data set any further in a productive way.
I think, as long as it doesn't happen too frequently, it should be simpler to understand overall.
@vertigo That being said, if performance is a primary goal, I'd suggest considering whether simpler heuristics are applicable.
@haitch @akkartik @kragen @freakazoid Nice thing about not shying away from passes in compiler implementation is that passes are often plug-and-play relative to each other. That allows you to follow Butler Lampson's philosophy: "Make it work, make it right, make it fast."
A lot of what goes into making a compiler produce the fastest possible code is an act of diminishing returns, and frequently has to be redone every processor generation. Ergo, the more you can modularize it, the better.
@vertigo Anyway, routines that juggle more than 32 variables at any given time are possibly not very well thought out. And routines that use more than that should probably be penalised and split into other functions. This is less of a problem for processors with many registers, but I see how it can be a concern for @akkartik, who's specifically targeting IA32.
@AceNovo @akkartik @kragen I mention it in my follow-up toot, https://mastodon.social/@vertigo/100781598805099246 .
@kragen Right, I should have said "conventionally type-safe" 😅 If I was to build just an intermediate language as a stepping stone during bootstrapping, I'd be fine with constraints like disallowing arrays, or records, or arrays inside records. But (at least with that particular thought experiment) I'd like to have it all without ever needing to write a compiler.
@akkartik Hmm, yeah, I can see that point of view. It's nice to minimize the amount of abstraction. What do you think about http://research.microsoft.com/en-us/um/people/nick/coqasm.pdf? It might be the thing that comes closest to what you want. @haitch @vertigo