What part of Rust compilation is the bottleneck? (kobzol.github.io)
125 points by dralley on March 15, 2024 | 61 comments


Monomorphization.

For every generic function f, rustc will generate as many instances as there are type instances (shape instances? Does the compiler distinguish between different kinds of references that all get compiled to pointers?).
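
On the parenthetical question: as far as I know, rustc instantiates generics per concrete type rather than per layout "shape", although LLVM may later merge copies that turn out identical. A minimal sketch of what that looks like from the source side (the `instance_info` name is made up for illustration, and the exact strings from `type_name` are not guaranteed by the standard library):

```rust
use std::any::type_name;
use std::mem::size_of;

// One generic function in the source. Each distinct `T` used at a call
// site gets its own compiled instance with `T`'s name and size baked in.
fn instance_info<T>(_value: &T) -> (&'static str, usize) {
    (type_name::<T>(), size_of::<T>())
}

fn main() {
    // Three unrelated types: three separate instances.
    println!("{:?}", instance_info(&1u8));
    println!("{:?}", instance_info(&1u64));
    println!("{:?}", instance_info(&[0u16; 4]));

    // `&u8` and `&u64` are both plain pointers with the same size, yet
    // they are still different types and so different instantiations.
    println!("{:?}", instance_info(&&1u8));
    println!("{:?}", instance_info(&&1u64));
}
```

Every extra instance is more IR handed to the backend, which is where the cost mentioned here comes from.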

This feature has a cost. Compare to OCaml's uniform object representation, which enables comparatively blazing compilation speed but pays a price in runtime performance and weird FFI restrictions (integers with a tag bit).

Btw, it's misleading to say "it's the backend" when the frontend is responsible for creating so much work for it.


> ...when the frontend is responsible for creating so much work for it.

That's the most important point I think. Clang compiling typical C code is very fast, but the same Clang compiling typical C++ code is very slow. Both use the same LLVM backend.


Kind of. Despite its reputation for slow builds, it is possible to have relatively fast builds in C++ with monomorphization.

By using binary libraries, extern templates for common type sets, incremental compilation and linking, and nowadays (at least for VC++ already) modules.

What Rust still lacks is having sound alternatives to LLVM, or someone supporting similar workflows in Rust.

Using OCaml as an example, it is great to have multiple backends in the box, plus an interpreter, and pick and choose during development workflows.


Maybe Cranelift will help with this. Faster compile times are one of its selling points.


That is something that I also look forward to.


Monomorphization can be manually addressed in Rust by writing generic function impls as delegating to a single function where the generic parameters are partly or fully omitted - a kind of "polymorphization". This is a pretty common pattern, e.g. in the Rust std library. In more recent versions of Rust this can be expanded via the use of const generics, e.g. to express the size and alignment of a generic type parameter, where the implementation only depends on these. So this kind of "polymorphization" can be applied more broadly.
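
A minimal sketch of the delegation pattern, assuming a made-up helper called `extension_len` (it is not a std function, just an illustration in the same style std uses for its path-taking APIs):

```rust
use std::path::Path;

// Generic shim: one copy is instantiated per `P`, but it only performs
// the `as_ref()` conversion and then calls the shared body.
pub fn extension_len<P: AsRef<Path>>(path: P) -> usize {
    // Non-generic inner function: compiled once, however many
    // different `P` types the callers use.
    fn inner(path: &Path) -> usize {
        path.extension().map_or(0, |ext| ext.len())
    }
    inner(path.as_ref())
}

fn main() {
    // Three different `P` types, one shared `inner` body.
    println!("{}", extension_len("main.rs"));
    println!("{}", extension_len(String::from("Cargo.toml")));
    println!("{}", extension_len(Path::new("README")));
}
```

The more of the real work that lives in `inner`, the less code gets duplicated per instantiation.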


D does the same thing with its template system and the compilation is still extremely fast.


May I take the opportunity to ask, what’s the reason for Meta's somewhat heavy use of OCaml? What’s the appeal? You already pointed out the insane compilation perf.


It's a pleasant and practical language to write, yet fairly safe.

Imagine the safety of Rust but looking more like Python (or Haskell...), the concurrency of Go (since V5), and without a borrow checker (but a GC instead).


I watched some intro to OCaml videos, got excited about the language's features, then tried reading some real OCaml (Tezos, which was touted as the star of idiomatic OCaml projects - can't find the site that listed it now), and I found it so incredibly dense, hard to read, and almost completely devoid of meaningful naming and comments. It felt similar to reverse-engineering minified code to me.


sounds like unfamiliarity. OCaml isn't a language a layman can read without prerequisites.


I found the idea pleasant, but not in actuality. So maybe it's an acquired taste? Haha


oh it definitely is. A lot of Haskell can look like you describe, but it's perfectly legible if you have enough reps under your belt. I find normal languages hard to read nowadays.


With all the recent improvements to compilation speed (nightly, cranelift, mold-linker), Rust has become much more pleasant. Trivial and incremental changes to a medium sized crate like rust-analyzer (~200k loc) take around 2.5s and a small Axum project takes around 0.5s.

These are my very subjective hobby benchmarks running Arch Linux on an AMD Ryzen 9 7940HS.

Of course the initial build or the release build take much longer, but it makes me hopeful for the future.


Woah, 200k LoC is considered medium? I work at a Series A startup and our entire product (which is actually much more than a CRUD app) is only in the high tens of thousands, so that’s just a funny thought for me.

My theory is that because Rust is a low level language you tend to miss out on higher level primitives that promote more code reuse. Another theory is that Rust is mature but not quite as mature as something like Java, so there are fewer mature dependencies for you to delegate your work to.

Thoughts on what’s accurate? For context, I’ve written a bit of Rust myself, but am definitely a beginner.


Rust is very good at code reuse.

Generics and cross-crate inlining enable zero-(runtime)cost abstractions, meaning there’s usually no perf downside to using 3rd party code instead of your own.

Strict type system, standardized error checking, thread safety in interfaces, and built in tooling for API documentation makes using libraries relatively easy.

The ecosystem is pretty large now, and has a culture of respecting semver, and focus on safety and reliability.

Cargo makes adding dependencies easy (the most common complaint is that it’s too easy, and people use too many dependencies).


You know Series A is early stage, right?


One reason for me moving from Rust to Go was compilation speed. Go is a simpler language, so apples to oranges, but Go compiles so fast, which to me makes development very different.


OCaml compiles just as fast without having to compromise on the type system.


But then you have to compromise on speed of generated code, Windows support, number of libraries and overall ecosystem, no ability to generate standalone executables, and probably more. I tried OCaml only briefly, so I can't speak to all of its shortcomings.


OCaml optimizations are certainly better than those of the Go compiler, which hardly does inlining and only recently got some PGO support, and those that care about using LLVM or GCC backends have to compromise on frontends that still don't do generics.

Go support on Windows is also not great, plugin package doesn't work, filesystem support assumes POSIX semantics, cgo requires installing mingw.


> using LLVM backends

Wait, there is an LLVM based Go toolchain? I thought the Go crowd was known for their NIH obsession.


TinyGo.


Some see it as compromising on types, I don't. After some years writing Scala code, trying to come up with even better types each day, Go to me is not a compromise but a relief.

My love for types peaked when I was in my mid-40s, now that I'm 50+ I want simple things.


I am almost 50, and my point of view on Go from 2012 has hardly changed.

http://lambda-the-ultimate.org/node/4554#comment-71504

At least it does generics now.


> At least it does generics now

Not in gccgo


Indeed, as you see on other remarks from me I am aware of it, my point was about the language as designed, not the implementations ecosystem.

TinyGo also doesn't do them.


From my experience this simplicity is something you pay the price for along the way - development is harder (good typing gives you a lot of hints about functionality and puts bounds on how developers can use it) and more error prone (less stuff gets caught by the type checker, instead you find it out at runtime).

Of course it is good to be reasonable - some people completely fly off into the FP world and instead of actually building working stuff they think all day about some clever abstraction and types to model it.


I'd call that disillusionment with bad type systems, which are indeed unnecessarily complex. We have yet to achieve a typing "nirvana", but we're getting closer IMO.


That is definitely true, but from the article it's the backend that takes time, not the frontend where the language itself resides. If you compile Go via LLVM, it may be as slow as Rust.


It’s not just Rust. Practically every compiler based on LLVM is slow. Swift, Zig, Clang.

The Go compiler being written from scratch based on the Plan9 C compiler is a huge advantage.


Zig is moving away from LLVM. It already has its own backend targeting debug builds for x86 and arm. ReleaseFast and ReleaseSmall are an entirely different beast, but it's going to be tackled eventually.


Minor correction: the Go compiler used to be a modification of the Plan9 C compiler, then a Go port of that modification, but then it was completely rewritten as an SSA-based compiler, so today it has almost nothing to do with the original Plan9 code.


I will chime in to say that Rust is also building an alternative to the LLVM backend in the form of Cranelift: https://github.com/rust-lang/rustc_codegen_cranelift


I wonder why we do not split up compilation more - especially for web developers. Rust does this a little with "check", C with "-O".

I want fast compilation for my dev cycle or for unit tests, I want slow compilation with optimizations, escape analysis, correctness etc. for production (the distinction between a compiler and linter is also not clear, some compilers do what linters do in other languages).


We do, you have profile configuration for dev or for release

You can get pretty big differences in terms of compile speed / binary size


Giving up entirely on the compile time of "performance" builds can be bad too though, e.g. for people writing games, audio software, etc.


Only if you need to test the full game. If you can unit test algorithms and learn something then fast builds matter.


second that. also another point: I write most Go code using only the stdlib, so there is no dependency web to take care of on top of the actual code.


I also have far fewer dependencies with Go compared to other languages, especially TS.


If you make a small change to your application, the Rust compiler does a significant amount of rework. That is, it recompiles a lot of code that it has already compiled before. There are valid technical reasons for this because of how LLVM works or that the linker needs to rewrite all addresses. Yes, incremental compilation is a thing but it’s too coarse IMO. To me it seems that taking an extremely fine grained approach to compilation would improve the ergonomics of the iterative hack-and-run method of writing software. Some sort of local database of diffs or some such.


Depends on your target; if you have tiny compilation units you won't be able to optimize/inline on a broad target, that's why single unit compilation is an option (that may or may not improve the result)


This is true but what is really happening is that the frontend and backend cannot communicate intent effectively because of the way they are separated. The frontend doesn’t know what is important for optimisation because that’s not its job and the backend only sees the code the frontend gives it (never a holistic view). So the easiest (and slowest) approach is to do everything over and over again.

Increasing codegen units (multi unit compilation) is just the user taking a risk that splitting things up will not affect performance optimisations. Nothing smart about it.

If you had tiny compilation units and the frontend understood their significance to the backend then it would be able to build a graph of dirty code to be recompiled when a small piece of it changes.
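
A toy sketch of that "graph of dirty code" idea, to make it concrete. This is not how rustc's incremental compilation is implemented; the unit names and the `dirty_set` function are hypothetical, and the only assumption is that we know which units depend on which:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Returns every unit that must be rebuilt when `changed` is edited:
/// the unit itself plus everything that transitively depends on it.
fn dirty_set<'a>(
    dependents: &HashMap<&'a str, Vec<&'a str>>,
    changed: &'a str,
) -> HashSet<&'a str> {
    let mut dirty = HashSet::from([changed]);
    let mut queue = VecDeque::from([changed]);
    // Breadth-first walk over reverse-dependency edges.
    while let Some(unit) = queue.pop_front() {
        for &dep in dependents.get(unit).into_iter().flatten() {
            // `insert` returns true only the first time a unit is seen.
            if dirty.insert(dep) {
                queue.push_back(dep);
            }
        }
    }
    dirty
}

fn main() {
    // Hypothetical edges: each key is used by the units in its value.
    let dependents = HashMap::from([
        ("parse", vec!["typecheck"]),
        ("typecheck", vec!["codegen"]),
        ("fmt", vec![]),
    ]);
    // Editing `typecheck` dirties `typecheck` and `codegen` only.
    println!("{:?}", dirty_set(&dependents, "typecheck"));
}
```

The hard part in a real compiler is what this sketch skips: knowing, at a fine enough granularity, which optimisation decisions in the backend depended on which pieces of frontend output.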


One problem they really need to address though is that as soon as you put everything in a single compilation unit, then compilation is single threaded and dog slow. Compilation units maybe make sense for C and C++, but for other languages they are just a way to structure the compilation. It should be automatic, and just better all around.


As a workaround, split into multiple crates.

Disclaimer: I concur that this is merely a workaround, not the real fix


Over the years compile speed improved quite a bit (recently this: https://blog.rust-lang.org/2023/11/09/parallel-rustc.html made quite the difference)

If we can squeeze more performance that's great but the largest concern I have around compilation is with the size of the target directory

It can balloon up to node_modules levels


I don’t find node_modules exceeding even a few hundred megabytes very often. But Rust target directories can easily reach multiple gigabytes.


Apart from incremental compilation cache, a large chunk of it is debug information. Lowering debug info precision helps a lot (although it’s still suspiciously large.)


There have been some recent improvements to this, but yeah, it can be still quite large. There is a WIP development of a garbage collector in Cargo that could help with this.


FWIW Jakub has more blog posts on his website which are also really interesting https://kobzol.github.io/


Build time should be instantaneous, just like in Go or C. Instead, a simple hello world in bevy / egui can take forever to build. Even after the first build, build time is noticeable for every change. You already have to struggle with the borrow checker; adding the build time to that makes Rust the worst dev experience for anything that requires adding dependencies to your project. I've been using Rust for 3 years as my main programming language for hobby projects, but now I've decided to switch to C and Go until there are relevant improvements in Rust.


I would love to see more categorized frontend breakdown. Just type check, borrow check and metadata don't show the full picture.


Could LLVM be sped up by passing it the data in a more efficient structure? Or by slimming down the data it's passed in some way?


It’s well known that the LLVM IR bytecode generated by rustc is terribly verbose – Rust constructs mostly get lowered very naively to a buttload of redundant IR, to be then pruned and condensed by the backend. This is by design, as it helps keep the frontend as simple and fast as possible, but there are certainly cases where it would be a net benefit to move some of the complexity to rustc in order to lighten LLVM’s workload.


I'll never forget my first experience with Rust: trying to build the compiler from source and OOMing my cloud VM. :)


Good article, just throwing out there that flamegraphs would be exactly what you need for visualizing this stuff.


It looks like the profile they're built on already supports those, I think the intention here is to present a sort of at a glance view that could be quickly analyzed.


Makes sense. In my opinion I would just generate fake flamegraphs then - they are much more readable and the format is dummy easy to fake.


Yeah, I actually generated these small charts out of a flamegraph, because it contains too much information and isn't easily split into three distinct parts. And once you condense the information into just 3 blocks, then using a flamegraph doesn't really add any further value, IMO.


/shrug that's fair, with only like four things it's not that much more readable.


For libraries, the article shows most of the time being spent in "front end" phases. Isn't that a bit misleading, as the library will eventually have to be included in the final program's binary? The code generation phase isn't exactly attributable to any one module.



