Each pass is like a Unix filter -- it takes input, processes it in some *very* well defined way, and produces output. Each pass communicates with other passes via in-memory buffers, so no mass storage is required.
Regarding mass storage requirements, though, you could run nanopass compilers on a Commodore 64 if you wanted to, by reducing the size of the compilation unit. BSPL works on individual colon definitions, for example, and uses mere KB.
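To make the filter analogy concrete, here's a minimal sketch (not BSPL itself; all names are illustrative): each nanopass is a plain function doing one well-defined transformation, chained over a small compilation unit so the working set stays tiny.

```python
# Sketch: nanopasses as plain functions, each doing one well-defined
# transformation, communicating via in-memory values (no mass storage).

def strip_comments(lines):
    """Pass 1: drop comment lines."""
    return [l for l in lines if not l.strip().startswith("#")]

def tokenize(lines):
    """Pass 2: split each remaining line into tokens."""
    return [l.split() for l in lines]

def number_statements(stmts):
    """Pass 3: attach a statement index for later passes to use."""
    return list(enumerate(stmts))

PASSES = [strip_comments, tokenize, number_statements]

def compile_unit(source_lines):
    """Run every pass in order over one small compilation unit."""
    result = source_lines
    for p in PASSES:
        result = p(result)
    return result

unit = ["# add two numbers", "a = 1 + 2", "print a"]
print(compile_unit(unit))
```

Each pass here takes the previous pass's output directly, like a Unix pipeline but without the concurrency.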
@vertigo I see; yes, I divide the work into functions that operate on the entire compilation unit.
Mu has 27 'transforms': http://akkartik.github.io/mu/html/012transform.cc.html
SubX has 7 so far, but I added an additional notion of 'checks': http://akkartik.github.io/mu/html/subx/029transforms.cc.html
But the original nanopass paper describes designing a separate formally-defined language for each pass: https://www.cs.indiana.edu/~dyb/pubs/commercial-nanopass.pdf
(Unix pipelines also introduce concurrency and flow-control; I perhaps know too much for that analogy to be helpful.)
@akkartik I'm not sold on anything that recommends many different languages basically to represent successive transforms of an essentially equivalent graph. Proofs for every little lang seem way overkill to me, when you can reuse good old graph theory on any graph.
Then again, that may just be me and my hammer seeing it all as nails. And linguistically oriented overengineering in a mathematical shape may also be somebody else's hammer. Different approaches, but I know which I'd follow.
NOTE: I'm *not* talking about the Nanopass Framework, which is a framework intended to automate the boilerplate of writing nanopasses, saving time in an educational context. I'm talking about the abstractions they are specific instances of.
@vertigo I get you, and I'm not opposed to the idea of multi-pass compilers. I'm highly biased against the proposed formal proofs expressed as various different languages. I just happen to see multiple passes as a sequence of graph transforms, for which you can have a formal description in one regular off-the-shelf language borrowing from graph theory. Data structures in the internal memory representation are just optimisations that run on machines.
@vertigo But boilerplate isn't just a problem in an educational context. I'm particularly sensitive to it in machine code, where it's a maze of twisty passages mostly alike.
Also, the nanopass framework isn't just about superficial boilerplate. They were attacking the phase ordering problem. You write a phase expecting some feature to exist, but you inserted the phase where the feature isn't computed yet. Or isn't computed in some situations. Formal specification helps there.
@vertigo To my mind, the framework didn't mutate the idea. The original paper I linked to thinks in terms of separate languages.
But I think I've been getting pedantic. You're right that many phases is a good thing as long as the format (not language) doesn't diverge too much between them. I think I follow where you and @firstname.lastname@example.org are coming from. You have some in-memory data structure for the program, and successive phases fill in different parts of it. Am I understanding you right?
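If I'm reading you right, a hypothetical sketch of that "one shared structure" idea might look like this (names are mine, not from any of the projects mentioned): each phase fills in a different field of the same in-memory program record, and a later phase can check that the field it depends on is already present, which also catches the phase-ordering bugs mentioned earlier.

```python
# Sketch: one in-memory structure for the program; successive phases
# fill in different parts of it.

class Program:
    def __init__(self, source):
        self.source = source      # filled in at construction
        self.tokens = None        # filled in by phase_tokenize
        self.types = None         # filled in by phase_classify

def phase_tokenize(prog):
    prog.tokens = prog.source.split()

def phase_classify(prog):
    # guard against phase-ordering bugs: our input must exist already
    assert prog.tokens is not None, "phase ordering bug: tokens missing"
    # crude placeholder analysis: classify each token as number or name
    prog.types = ["num" if t.isdigit() else "name" for t in prog.tokens]

prog = Program("x 42 y")
for phase in [phase_tokenize, phase_classify]:
    phase(prog)
print(prog.types)
```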
@vertigo To return to my original statement about "converting line by line" (which is I think what caused this sub-thread), the goal wasn't to minimize passes or phases but rather to minimize how much information a reader needs to hold in their head to understand how a single statement in a language is compiled. Types, global variables, local variables, sure. But I don't want readers to need to know how the previous statement was compiled. That veers into optimizer territory.
@vertigo A graph-coloring register allocator violates that constraint of mine. But as you and @email@example.com pointed out, I was forgetting that we can do without it. Crenshaw is the trivial demonstration of using registers without actually doing register allocation. Thanks for the reminder!
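For anyone following along, here's a sketch in the spirit of Crenshaw's approach (the instruction names are illustrative, not a real ISA): compile an expression using one working register and the stack, so no register allocator is needed and each node compiles without knowing how its neighbours were compiled.

```python
# Sketch: Crenshaw-style expression codegen with a single working
# register (R0) and the stack -- no register allocation at all.

def emit_expr(node, out):
    """node is either an int literal or ('+' | '-', left, right)."""
    if isinstance(node, int):
        out.append(f"LOAD R0, #{node}")  # literal into the one register
        return
    op, left, right = node
    emit_expr(left, out)
    out.append("PUSH R0")                # spill left operand to the stack
    emit_expr(right, out)
    out.append("POP R1")                 # recover left operand
    # result = left op right, back in R0
    out.append(f"{'ADD' if op == '+' else 'SUB'} R0, R1, R0")

code = []
emit_expr(('+', 1, ('-', 5, 2)), code)
print("\n".join(code))
```

Every subexpression leaves its value in R0, so the reader only ever needs that one invariant in their head.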
@akkartik @firstname.lastname@example.org @haitch BSPL has some phases which actually mutate the nature of the underlying structure (e.g., going from stack notation to RISC-ish notation), but such disruptive changes usually happen only when you're satisfied that you can't process the current data set any further in a productive way.
I think, as long as it doesn't happen too frequently, it should be simpler to understand overall.