The hardest part (IMHO) about getting a simple language running, particularly for beginners, is the “expression”. Mostly because of your typical algebraic-style infix expressions with precedence.
Just getting the grammar straight on the naturally recursive structures can be tricky. Expressions also touch a large portion of the code generation and the runtime.
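For concreteness, a minimal sketch of that kind of grammar and the matching recursive-descent parser (Python here; the tokenizer and AST shape are chosen purely for illustration):

    # The classic layered grammar, one function per precedence level:
    #   expr   := term   (("+" | "-") term)*
    #   term   := factor (("*" | "/") factor)*
    #   factor := NUMBER | "(" expr ")"
    # The recursion in factor -> "(" expr ")" is the "naturally recursive structure".
    import re

    def tokenize(src):
        return re.findall(r"\d+|[-+*/()]", src)

    class Parser:
        def __init__(self, tokens):
            self.toks, self.pos = tokens, 0

        def peek(self):
            return self.toks[self.pos] if self.pos < len(self.toks) else None

        def next(self):
            tok = self.peek()
            self.pos += 1
            return tok

        def expr(self):
            node = self.term()
            while self.peek() in ("+", "-"):
                node = (self.next(), node, self.term())
            return node

        def term(self):
            node = self.factor()
            while self.peek() in ("*", "/"):
                node = (self.next(), node, self.factor())
            return node

        def factor(self):
            if self.peek() == "(":
                self.next()
                node = self.expr()
                assert self.next() == ")", "expected ')'"
                return node
            return ("num", int(self.next()))

    print(Parser(tokenize("1 + 2 * (3 - 4)")).expr())
    # ('+', ('num', 1), ('*', ('num', 2), ('-', ('num', 3), ('num', 4))))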
> The hardest part (IMHO) about getting a simple language running, particularly for beginners, is the “expression”
This is why teaching material should be tailored towards teaching rather than implementing demented stuff from the real world just because it is "there." You can easily implement a parser that ignores precedence and tell the reader that we are using brackets to force precedence instead.
In a real-world parser, you might use something like this[1] or some other home-grown algorithm if you didn't know about it. It doesn't matter as long as the parser works.
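A minimal sketch of that brackets-only approach (my own illustration; the token handling and AST tuples are assumptions, not from the comment): every operator binds equally and associates left to right, so any other grouping has to be spelled out with parentheses.

    import re

    def tokenize(src):
        return re.findall(r"\d+|[-+*/()]", src)

    def parse(tokens):
        # fold operators left to right with no precedence at all
        node, rest = parse_atom(tokens)
        while rest and rest[0] in "+-*/":
            op, rest = rest[0], rest[1:]
            rhs, rest = parse_atom(rest)
            node = (op, node, rhs)
        return node, rest

    def parse_atom(tokens):
        tok, rest = tokens[0], tokens[1:]
        if tok == "(":                      # brackets are the only way to group
            node, rest = parse(rest)
            assert rest and rest[0] == ")", "expected ')'"
            return node, rest[1:]
        return ("num", int(tok)), rest

    # "1 + 2 * 3" parses as (1 + 2) * 3; write "1 + (2 * 3)" to get the other tree
    print(parse(tokenize("1 + (2 * 3)"))[0])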
Is that really the hard part? I figured that part out before I knew much programming: I made a simple Excel-type formula parser. (A horrible, horrible solution based around replacing strings and manually counting parentheses... but it worked!)
I never got much farther than that though, I've looked into making a compiler many times and got overwhelmed every time.
Good to know I wasn't alone in writing terrible expression parsers. Since I knew absolutely nothing about parsing, my parser/interpreter consisted of splitting on whitespace, followed by linear scanning to find the highest-precedence operation, repeated until a single token remained (basically how a human would do it).
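Roughly, a sketch of that scan-and-reduce scheme (my reconstruction, not the commenter's code): find the highest-precedence operator, evaluate it, splice the result back in, and repeat.

    PREC = {"*": 2, "/": 2, "+": 1, "-": 1}
    OPS  = {"*": lambda a, b: a * b, "/": lambda a, b: a / b,
            "+": lambda a, b: a + b, "-": lambda a, b: a - b}

    def evaluate(src):
        toks = src.split()                  # requires spaces between tokens
        while len(toks) > 1:
            # leftmost operator with the highest precedence
            i = max((j for j, t in enumerate(toks) if t in PREC),
                    key=lambda j: PREC[toks[j]])
            result = OPS[toks[i]](float(toks[i - 1]), float(toks[i + 1]))
            toks[i - 1:i + 2] = [str(result)]
        return float(toks[0])

    print(evaluate("1 + 2 * 3"))            # 7.0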
I think the hardest part is function calls. Implementing the standard calling convention is really fiddly and annoying when you have to save and restore registers around function calls if you want to do it efficiently (keeping track of which registers are dirty, doing parallel loads of registers into the right argument registers, spilling when appropriate, etc.). It is fiddly.
The alternative is going to an IR and doing full register allocation, but then you need to implement Lengauer-Tarjan to get into SSA, handle all the same parallel-load stuff for phi functions/block arguments, do out-of-SSA translation, reconstruct in optimisation passes all the information you discarded by going to a linear IR, etc.
I'd argue that Lengauer-Tarjan is overrated. While it is neat, it's premature optimization IMHO. Fully maximal SSA (at the entry of each block insert a phi-node for each variable) + dead code elimination (we need it anyways) is much simpler.
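In sketch form (using a toy three-address IR of (dest, op, args) tuples, which is my assumption rather than the parent's representation), maximal SSA really is one local pass per block plus a second pass to wire up phi operands:

    from itertools import count

    def maximal_ssa(blocks, variables, preds):
        """blocks: {name: [(dest, op, args), ...]}, preds: {name: [pred names]}.
        Every dest and every variable operand is assumed to be listed in `variables`."""
        version = {v: count() for v in variables}
        exit_name = {}                          # (block, var) -> name live at block exit
        phis = {}                               # block -> {var: phi result name}

        for b, insns in blocks.items():
            # one phi per variable at block entry -- no graph traversal needed
            phis[b] = {v: f"{v}.{next(version[v])}" for v in variables}
            current = dict(phis[b])
            renamed = []
            for dest, op, args in insns:
                args = [current.get(a, a) for a in args]    # constants pass through
                if dest is not None:
                    current[dest] = f"{dest}.{next(version[dest])}"
                    dest = current[dest]
                renamed.append((dest, op, args))
            blocks[b] = renamed
            for v in variables:
                exit_name[(b, v)] = current[v]

        # phi operands: one per predecessor, naming whatever left that predecessor
        phi_args = {b: {v: [(p, exit_name[(p, v)]) for p in preds[b]]
                        for v in variables}
                    for b in blocks}
        return phis, phi_args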
Typical implementations of Lengauer-Tarjan are often taken verbatim from Andrew Appel's book and involve higher constant factors than alternative algorithms - such as Cooper et al's "engineered" version of the usual fixpoint algorithm, which is very neat, simple, and performs well.
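For reference, a sketch of that iterative algorithm ("A Simple, Fast Dominance Algorithm" by Cooper, Harvey and Kennedy); the CFG representation here, a predecessor map plus a reverse postorder, is my own choice:

    def dominators(entry, preds, rpo):
        """rpo: reachable blocks in reverse postorder, starting with entry."""
        order = {b: i for i, b in enumerate(rpo)}
        idom = {entry: entry}

        def intersect(a, b):
            # walk the two candidates up the current dominator tree until they meet
            while a != b:
                while order[a] > order[b]:
                    a = idom[a]
                while order[b] > order[a]:
                    b = idom[b]
            return a

        changed = True
        while changed:
            changed = False
            for b in rpo[1:]:
                processed = [p for p in preds[b] if p in idom]
                new_idom = processed[0]
                for p in processed[1:]:
                    new_idom = intersect(new_idom, p)
                if idom.get(b) != new_idom:
                    idom[b] = new_idom
                    changed = True
        return idom   # idom[entry] == entry by convention

    # Example: a diamond entry -> {a, b} -> exit
    preds = {"entry": [], "a": ["entry"], "b": ["entry"], "exit": ["a", "b"]}
    print(dominators("entry", preds, ["entry", "a", "b", "exit"]))
    # {'entry': 'entry', 'a': 'entry', 'b': 'entry', 'exit': 'entry'}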
If you are saying that constructing SSA following the classical approach is a premature optimisation, perhaps you would prefer Braun et al's retelling of Cliff Click's SSA construction algorithm - which works backwards from uses to place phis and requires no dominance information to be computed.
Why not fully maximal SSA + DCE? I mean, other than timing considerations.
Fully maximal SSA does not even need any graph traversal; it is entirely local to each block.
I think it will introduce too many redundant phis, but I've never used it in practice - so I can only speculate. I'm not convinced DCE will clean maximal SSA up substantially. Even the classic Cytron et al algorithm must be combined with liveness analysis to avoid placing dead phis (that is, it does not produce pruned SSA by default). In the past, there was always a fear that SSA has the potential to cause quadratic blowup in the number of variables - this concern is mostly theoretical but probably influenced some of the design decisions around algorithms for constructing SSA.
Braun et al's algorithm works backwards from uses (which generate liveness), so you get pruned SSA out. In the case of reducible control flow graphs, you also get minimal SSA. This is all without any liveness or dominance computation beforehand. Granted, you may want those things later, but it's nice that you can construct a decent quality SSA with a fairly intuitive algorithm. Also shown in the paper is that you can incorporate a few light optimisations during SSA construction (constant folding, for example).
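A stripped-down sketch of the core of that algorithm (from "Simple and Efficient Construction of Static Single Assignment Form"); this version assumes the CFG is already complete, so it skips the sealed-block/incomplete-phi bookkeeping, and trivial-phi removal here only returns the replacement value rather than rewriting existing users:

    class Phi:
        def __init__(self, block):
            self.block, self.operands = block, []

    current_def = {}        # var -> {block -> value}
    preds = {}              # block -> [predecessor blocks]

    def write_variable(var, block, value):
        current_def.setdefault(var, {})[block] = value

    def read_variable(var, block):
        # work backwards from the use until a definition (or a phi) is found
        if block in current_def.get(var, {}):
            return current_def[var][block]          # local definition
        if len(preds[block]) == 1:                  # one predecessor: no phi needed
            val = read_variable(var, preds[block][0])
        else:
            phi = Phi(block)
            write_variable(var, block, phi)         # break cycles around loops
            phi.operands = [read_variable(var, p) for p in preds[block]]
            val = try_remove_trivial_phi(phi)
        write_variable(var, block, val)
        return val

    def try_remove_trivial_phi(phi):
        # a phi whose operands are all one value (or itself) is redundant
        unique = {op for op in phi.operands if op is not phi}
        return unique.pop() if len(unique) == 1 else phi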
I'll definitely test maximal SSA + DCE in tinyoptimizer when I get to it. For the moment I made a somewhat-pruned mem2reg pass [1], not even sure what to call it. But it did indeed require computing dominance frontiers.
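For reference, the dominance-frontier computation such a pass typically relies on (the usual walk-up-the-dominator-tree formulation; `preds` and `idom`, the immediate-dominator map with the entry mapping to itself, are assumed inputs):

    def dominance_frontiers(preds, idom):
        df = {b: set() for b in preds}
        for b in preds:
            if len(preds[b]) >= 2:              # only join points contribute
                for p in preds[b]:
                    runner = p
                    # walk up the dominator tree until b's immediate dominator
                    while runner != idom[b]:
                        df[runner].add(b)
                        runner = idom[runner]
        return df

    # a diamond: entry -> {a, b} -> exit
    preds = {"entry": [], "a": ["entry"], "b": ["entry"], "exit": ["a", "b"]}
    idom  = {"entry": "entry", "a": "entry", "b": "entry", "exit": "entry"}
    print(dominance_frontiers(preds, idom))
    # {'entry': set(), 'a': {'exit'}, 'b': {'exit'}, 'exit': set()}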
I'm always happy to see more accessible resources for compiler writers.
---
As an aside: for displaying CFGs on the page, it would be very interesting to emit something somewhat dynamic. SVGs are always a good start, but there is a neat library for doing hierarchical graph layout (dagre, with dagre-d3 handling rendering as well). In my own efforts at pedagogy, I've been interested in producing CFGs in-browser whose basic blocks comprise a "unified diff" view of the block (this being achieved in real time by maintaining a subset of LLVM whose CFGs are persistent). Then it is more obvious what has changed, at least in the case of mem2reg, which shouldn't introduce new blocks or move too much around (I forget if it hoists allocas to the entry block or not).
It'd also be cool to distil which underlying ideas you have found most useful in your efforts. Their constrained scope may be useful to me, as I've wanted to create a kind of "advent of compilers" for years (advent of code, but with a heavy slant towards compiler tasks/algorithms).
Well, in my personal experience, parsing is the easiest part. It's what you do with the AST once you have one that is the hard part. But for some reason an overwhelming amount of writing is about frontends, when you could teach the concepts to a layman with some boxes and arrows.
> Just getting the grammar straight on the naturally recursive structures can be tricky. Expressions also touch a large portion of the code generation and the runtime.
Get the classic recursive expression grammar (expression / term / factor, with parenthesised sub-expressions at the bottom) working and you're 80% there.