These numbers don't seem abnormal; I recall building Safari many years ago, and having multi-GB of intermediate products shrink down to a 30MB executable (plus a few hundred MB of debug symbol).
So, I have a thought: if we're spending all this time to compile functions (particularly template functions) that are just thrown away later, why are we performing all our optimization passes up-front? Surely, optimization passes in a project like Chrome must eat up a lot of compilation cycles, and if that's literally wasted, why do it in the first place? Can we have a prelink step where we figure out which symbols will eventually make it, and feed that backwards into the compiler?
Maybe a more efficient general approach might be to simply have the optimizer be a thing that runs after the linker, so that the front-end compiler just tries to translate C++ into some intermediate representation as fast as possible. The linker can do the LTO thing, then split the output into multiple chunks for parallelization, and finally merge the optimized chunks back together. With LLVM, it feels like the bitcode makes this a possible compilation approach...
Hmm...not "abnormal" in the sense that we've gotten used to it: yes. Heck, last I heard building OneNote for Mac takes about half a day on a powerful MacPro.
But I'd say definitely abnormal in terms of how things should be.
Doesn't link-time code generation already exist (at least on MSVC, I think). It makes the linking step so much more expensive, though, which sucks for incremental builds.
Ah, yes, I knew I had to be missing something! Needing to support incremental builds are what makes C++ compilation so frustrating. I wonder why this is so difficult though - dependency tracking should be a thing that can carry through the link stage. Just track what files a given function depend on, and only recompile/recodegen functions that have changed. Of course, it still sucks massively if you change a header file, but that's why you separate the header from the implementation of the methods :)
/LTCG also doesn't seem to parallelize well - last I checked it still ran all the codegen on one core. Maybe that's different now?
So, I have a thought: if we're spending all this time to compile functions (particularly template functions) that are just thrown away later, why are we performing all our optimization passes up-front? Surely, optimization passes in a project like Chrome must eat up a lot of compilation cycles, and if that's literally wasted, why do it in the first place? Can we have a prelink step where we figure out which symbols will eventually make it, and feed that backwards into the compiler?
Maybe a more efficient general approach might be to simply have the optimizer be a thing that runs after the linker, so that the front-end compiler just tries to translate C++ into some intermediate representation as fast as possible. The linker can do the LTO thing, then split the output into multiple chunks for parallelization, and finally merge the optimized chunks back together. With LLVM, it feels like the bitcode makes this a possible compilation approach...