Hacker News

Worth mentioning another small alternative backend: http://c9x.me/compile/


I use qbe, it's great. Here's a mostly feature-complete C11 compiler based on qbe:

https://git.sr.ht/~mcf/cproc


I see that cproc is under quite heavy development, but qbe's last commit was at the end of November. Is it considered feature-complete? I heard about QBE some months ago and was quite interested, but it didn't see a high tempo of changes. That may be an advantage; I know too little to judge.


It's complete enough to compile C11 programs - to me, that's as good a benchmark as any. The main things qbe is missing for cproc's purposes are inline assembly and VLAs. DWARF support would also be nice, but no one seems to care enough to do the work yet.


Are you saying that qbe does not generate any debug information, or just not DWARF format?


qbe does not generate any debug information. Though you can pass some flags to get some insight into its internal code generation process.


While the GP doesn’t state this as an advantage, the Rust community would benefit from a fully Rust toolchain.


Why? Other than to prove it can be done, what is the point?

If Rust were a huge community, okay, but face it, it is not. It is better, therefore, to focus their efforts where they can make a difference. A new X where the existing ones are just fine (this includes being well maintained) is a waste of resources.

There are many possible good answers to the above question. However, I'm not sure they apply, and worse, I believe they would split resources that could be used to make something else better.


Cranelift - the compiler toolchain being discussed in this post (previously known as Cretonne) - is written entirely in Rust, developed (obviously) by Rust programmers who are members of the Rust community. Its development started at Mozilla, which still employs some of its developers to work on it full-time.

So the claim that the Rust community is not big enough to achieve this is wrong: they have already done it.

The reason they are doing it is that LLVM is not fine: it is super _super_ slow. People want Rust to compile instantaneously, and are willing to pay people full-time to work on that.

D, for example, compiles much faster than C and C++, and does this by having its own backend for unoptimized builds. I don't know how big the D community is, so I can't compare its size to the Rust community, but they did it, and it paid off for them big time, so I don't see why it wouldn't pay off for Rust as well.


DMD inherited the backend from DMC++, which was the end of a long line of optimizing C and C++ compilers going back over a decade before the earliest D alphas.


I didn't claim Rust isn't big enough to do it. (That may well be true, given the large effort that went into LLVM over many years to make it a good optimizer; that's a different debate, though, and I'm not sure it is true.)

What I said was that Rust is better off focusing on problems that are not solved well by other people. A fast modern web browser (with whatever features are lacking), for example.


> LLVM is not fine: it is super _super_ slow

Source? LLVM is fast for what it does.

What people usually complain about is rustc being slow overall, not the LLVM passes.


> What people usually complain about is rustc being slow overall, not the LLVM passes.

The LLVM phases are usually the dominating factor in Rust compile times (the other big single contender is the linking phase). However, when the Rust developers point this out, they are also careful to mention that this may be due to the rustc frontend generating suboptimal IR as input to LLVM; we can both acknowledge that LLVM is often the bottleneck for Rust compilation while also not framing it as a failure on LLVM's part (though at the same time it is uncontroversial to state that LLVM does err on the side of superior codegen versus minimal compilation time, hence the niche that alternative compilers like Cranelift seek to fill).


This is true to some degree — Rust does more work than most programming languages, and that work will always take some time — but the Cranelift backend is also measurably faster than the LLVM one.


Why phrase it as "other than to prove it can be done" if you already know there are good answers? I think the following obviously do apply:

1) much easier for Rust community to contribute to the compiler from end-to-end.

2) lower coordination cost, with complete, Rust-focused control over code generation/optimisation instead of going through LLVM. Think about e.g. fixing noalias.

3) lower maintenance cost for LLVM integration/fork.

It's also obvious that this needs to be weighed against the loss of LLVM accumulated technology and contributors. This is easy to underestimate (although I think 2)/3) are also easy to underestimate).
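As an illustration of why the noalias point matters (a minimal sketch of my own, not from the thread): Rust's borrow rules guarantee that a `&mut` reference never aliases another live reference, so rustc can mark such parameters `noalias` in the LLVM IR it emits, letting the backend cache loads across stores:

```rust
// Because Rust guarantees `a` and `b` cannot overlap, the backend may
// cache `*b` in a register and fold the two additions into one. The
// equivalent C function with plain pointers could be called as
// add_twice(&x, &x), forcing a reload of *b after every store through *a.
fn add_twice(a: &mut i32, b: &i32) {
    *a += *b;
    *a += *b;
}

fn main() {
    let mut x = 1;
    let y = 10;
    add_twice(&mut x, &y); // the borrow checker rejects add_twice(&mut x, &x)
    assert_eq!(x, 21);
    println!("x = {}", x); // prints "x = 21"
}
```

Getting rustc to emit `noalias` reliably is exactly the kind of thing that has historically required coordinating LLVM fixes, since enabling it exposed miscompilations in LLVM more than once.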


Because I don't think the possible good answers apply.

Sure, it is harder to contribute to the backend, but does it matter? I've been doing C++ for years and never looked at the backend.

I'll grant lower coordination costs. However, I believe they are outweighed by the advantages of all the other LLVM contributions.

If they need to fork LLVM, that is a problem. Either merge it back in and be done (with some tests so whatever they need doesn't break), or there is a compelling reason LLVM won't work with their changes.


Lower coordination cost is a big deal. Having your own backend means you can do frontend-backend codesign. You can implement language specific optimizations in the backend. Those things are not in the cards if you’re using llvm. (I mean they might be, but unless you fork, the time it’ll take for the changes to make it into llvm will be comparable to the time it takes to write your own backend.)


Yes, it does matter, because LLVM is an incredibly complex piece of software. And when you work on a compiler, it turns out you'll have to work on the backend. When I worked on a compiler day-in-and-out, there were single files in LLVM that were bigger than our entire in-house compilation backend put together. Which do you think is more appealing to debug? When a bug in code generation causes compiled programs to segfault, it is not necessarily easy to debug if you aren't intimately familiar with the project, and this fact is compounded when you consider not everyone hacking on your compiler is also a C++ programmer, knows LLVM's architecture, and so on. It is literally hundreds of thousands of lines of C++. The trigger test case is probably a massive IR program generated by some toolchain written in a completely foreign language, for a foreign language. Playing the game of "recover the blackbox from the crash site" is not always fun.

You can file bug reports, but not every part of the project is going to receive the same level of attention or care from core developers, and not everyone has the same priority. For example the Glasgow Haskell Compiler had to post-process LLVM generated assembly for years because we lacked the ability to attach data directly next to functions in an object file (i.e. at an offset directly preceding the function). Not doing this resulted in serious, meaningful performance drops. That was only fixed because GHC developers, not any other LLVM users, fixed it after finding the situation untenable after so long. But it required feature design, coordination, and care like anything else and did not happen immediately. On the other hand the post-processing stuff was a huge hack and broke in somewhat strange ways. We had other priorities. In the end GHC, LLVM, and LLVM users benefitted, but it was not exactly ideal or easy, necessarily.

On the other hand, "normal" code generation bugs like register misallocation or whatever, caused by extreme cases, were occasionally fixed by upstream developers, or patches were merged quickly. But absolutely none of this was as simple as you think. LLVM is largely a toolchain designed for a C compiler, and things like this show. Rust has similarly stressed LLVM in interesting ways. Good luck if your language has interesting aliasing semantics! (I gave up on trying to integrate LLVM plugins into our build system so that the code generator could better understand e.g. stack and heap registers never aliased. That would have resulted in better code, but I gave up because it turns out writing and distributing plugins for random LLVM versions your users want to use isn't fun or easy, which is a direct result of LLVM's fast-moving release policy -- and it is objectively better to generate worse code if it's more reliable to do so, without question.)

Finally, LLVM's compilation time issues are very real. Almost every project that uses LLVM in my experience ends up having to either A) just accept the fact LLVM will probably eat up a non-negligible amount of the compilation time, or B) you have to spend a lot of time tuning the pass sets and finding the right set of passes that work based on your design and architecture (e.g. earlier passes outside of LLVM, in your own IR, might make later passes not very worth it). This isn't exactly LLVM's fault, basically, but it's worth keeping in mind. Even for GHC, a language with heavy "frontend complexity", you might suspect type checking or whatever would dwarf stuff -- but the LLVM backend measurably increased build times on large projects.

> Either merge it back in and be done

It's weird how you think coordination costs aren't a big deal and then immediately say afterward "just merge it back in and be done". Yeah, that's how it works, definitely. You just email the patch and it gets accepted, every time. Just "merge it back in". Going to go out on a limb and say you've never actually done this kind of work before? For the record, Rust has maintained various levels of LLVM patches for years at this point. They may or may not maintain various ones now, but I wouldn't be surprised if they still did. Ebbs and flows.

I'm not saying LLVM isn't a good project, or that it is not worth using. It's a great project! If you're writing a compiler, you should think about it seriously. If I was writing a statically typed language it'd be my first choice unless my needs were extreme or exotic. But if you think the people working on this Rust backend are somehow unaware of what they're dealing with, or what problems they deal with, I'm going to go out on a limb and suggest that: they actually do understand the problem domain much, much better than you.

Based on my own experience, I strongly suspect this backend will not only be profitable in terms of compilation time, which is a serious and meaningful metric for users, but will also be more easily understood and grokked by the core developers. And Cranelift itself will benefit, which will extend into other Rust projects.


Your points are well taken. Now imagine the Rust compiler 15 years from now, after great effort has made the backend optimizers great: most of your criticisms of LLVM will apply there. It will be Rust, and lack some code that isn't needed to optimize Rust, but otherwise it will be extremely complex and hard to get into. Merging new fixes will take a long time because it is so hard.

C++ isn't a great language, but learning C++ is the least difficult part of contributing to LLVM.


One advantage is that you don't need a C++ compiler to build the Rust compiler. Dealing with building C++ projects can be a major headache.


Historically writing a compiler in the language that you’re promoting is a good way to really understand the limitations of your language.

I think this works so well because language designers tend to understand compilers better than they understand other software.


I heard Niklaus Wirth would only allow new compiler optimizations (in his compilers for Pascal, Modula-2, Oberon) that proved themselves by speeding up the compiler itself.


Hahaha that sounds excessive!

JavaScriptCore does it differently: many of our benchmarks are either interpreters or compilers written in JavaScript.

One of those benchmarks, Air, is just the stack slot coloring phase of JSC’s FTL JIT (that JIT has >90 phases) rewritten in JavaScript instead of C++. It runs like 50x slower in JS than C++ even in JSC, which wins on that test. So, probably it won’t be possible to write a JS VM in JS anytime soon. I mean, surely it’ll be possible, but it’ll also be hilariously shitty.


The specific metric, IIRC, is self-compilation of the compiler. Adding optimizations to the compiler needed to speed up compilation of the compiler more than their added complexity slowed down compilation of the compiler.


This is the mandatory rule for Chez Scheme, which was broken only once, when their entire backend was rewritten; it is also (from what I have heard) a large guiding principle for the C# compiler at Microsoft.

It's extreme but it's a good idea because it treats compilation time like an actual budget, which it is. You can't just add things endlessly. But it's not easy to achieve in practice.


Which might or might not be a good idea. When I'm writing code at my desk, the faster it builds the better: I just need my unit tests to finish, and they are small. When it's the same code running on my embedded system, with lots of data being thrown at it in real time and the CPU load near (sometimes over) 100%, I'll take every optimization of the final code I can get, no matter how long it takes to build.

It would be nice if a gcc-replacement compiler made the speed of building code its goal. I'll even accept the speed at which it builds code after it was itself compiled with gcc (or clang, msvc, ...) as the benchmark, if that is faster.
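For what it's worth, cargo already exposes exactly this trade-off per build profile; a sketch using standard Cargo.toml profile keys (the values here are illustrative, close to the defaults):

```toml
# Development: fastest possible edit-compile-test loop.
[profile.dev]
opt-level = 0      # no optimization, quick builds
debug = true       # keep debug info for the unit-test cycle

# Release: take all the optimization you can get, however long it takes.
[profile.release]
opt-level = 3
lto = true         # link-time optimization across crates
```

A fast alternative backend like Cranelift slots naturally into the dev profile, while LLVM keeps the release profile.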


Because it enables bootstrapping entire systems without C.



