GCC code generation for C++ Weekly Ep 43 example (kristerw.blogspot.com)
86 points by matt_d on Jan 8, 2017 | 17 comments


After a quick investigation, clang (3.9, x86-64) doesn't seem to suffer from the same problem:

https://godbolt.org/g/Uvo7iO


Unlike GCC, clang iterates GVN until it stops changing, and does so fairly early in the pipeline. (GCC calls this FRE, because in GCC, GVN is properly split out as an analysis, and FRE uses it to do full redundancy elimination.)

I wrote FRE for GCC (and am working on LLVM's next GVN). Long story short, GCC's is more powerful, but it doesn't always run early enough and is not iterated.

The general problem of detecting all Herbrand equivalences is exponential time, even without trying to evaluate the code. So you can't win all the time anyway.
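
To make the redundancy-elimination part concrete, here is a minimal hand-written sketch of the kind of rewrite a value-numbering pass looks for. The function names are hypothetical, and this is not output from GCC or clang; it only illustrates that two expressions with the same value need to be computed once.

```cpp
// Hypothetical illustration only: not actual compiler output.

// As written by the programmer: x + y is computed twice.
int before(int x, int y) {
    int a = x + y;
    int b = x + y;  // same operator, same operands: same value as a
    return a * b;
}

// What redundancy elimination can reduce it to: b is replaced by a.
int after(int x, int y) {
    int a = x + y;
    return a * a;
}
```

Whether a given compiler actually finds a particular equivalence depends on when the pass runs and what earlier passes have exposed, which is the pass-ordering issue described above.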


Well, try to significantly increase the argument count in your example.


Calling with 46 arguments seems to move this onto the stack: https://godbolt.org/g/tbKtGz


No. It hits the constant folding threshold and calls the sum function with the args.


This surprises me. This makes it sound like inlining and vectorization are performed independently of constant folding. It seems like the compiler should be able to recognize when a function depends only on its inputs. Then constant folding should be able to inline / evaluate any code that can be compile-time evaluated, no matter how much code that is. It doesn't seem like this should be left up to chance.


It's likely just a typical pass ordering problem.


Is there no way to indicate to GCC when you want something fully evaluated at compile-time? Maybe some pragmas? 98% of the time I probably don't care whether GCC does a good job optimizing this properly, but for the times I do care: what should I do?


A `constexpr` function storing its result in a `constexpr` variable (you need both for the compile-time evaluation guarantee): http://en.cppreference.com/w/cpp/language/constexpr

An extra complication in this example is that `std::accumulate` currently lacks `constexpr` support: http://stackoverflow.com/questions/32395408/why-arent-stdalg...

There's been a proposal to relax this limitation: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p020...

For now you could use a workaround in the form of a range-based for loop: https://godbolt.org/g/aVEqlm or even https://godbolt.org/g/GDrFr5 (homogeneous types).

A fold expression is much cleaner, though (also supporting heterogeneous types): https://godbolt.org/g/Hrnkc1
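
For readers who can't open the links, a minimal sketch of both workarounds follows. It is not the code behind the godbolt links; the names `sum_loop` and `sum_fold` are made up for illustration. The loop version assumes C++14 and `int` arguments, and the fold version assumes C++17.

```cpp
// Hypothetical names; a sketch, not the linked code.

// C++14: constexpr range-based for loop over a local array (homogeneous ints).
template <typename... Ts>
constexpr int sum_loop(int first, Ts... rest) {
    const int items[] = {first, rest...};
    int total = 0;
    for (int v : items) {
        total += v;
    }
    return total;
}

// C++17: binary left fold; the result type follows the usual arithmetic
// conversions, so mixed (heterogeneous) argument types are accepted.
template <typename... Ts>
constexpr auto sum_fold(Ts... values) {
    return (0 + ... + values);
}

// Storing the result in a constexpr variable forces compile-time evaluation.
constexpr int kLoopTotal = sum_loop(1, 2, 3, 4, 5);
constexpr auto kFoldTotal = sum_fold(1, 2, 3, 4, 5);

static_assert(kLoopTotal == 15, "evaluated at compile time");
static_assert(kFoldTotal == 15, "evaluated at compile time");
static_assert(sum_fold(1, 2.5) == 3.5, "int + double yields double");
```

If either function could not be evaluated at compile time, the `constexpr` variable definitions would fail to compile rather than silently fall back to a runtime call.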


Use constexpr: http://en.cppreference.com/w/cpp/language/constexpr

You can't make sum constexpr in this case, though, since it calls a non-constexpr function, accumulate.
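
A sketch of the kind of `sum` being discussed, assuming it is a variadic function that copies its arguments into a `std::array` and passes them to `std::accumulate`. The exact body in the article may differ; this is only a plausible reconstruction from the thread.

```cpp
#include <array>
#include <numeric>

// Assumed shape of the example, reconstructed for illustration.
// std::accumulate is not constexpr before C++20, so using a constexpr
// version of this function in a constant expression is rejected by
// pre-C++20 compilers.
template <typename... Ts>
int sum(Ts... values) {
    std::array<int, sizeof...(Ts)> a{{values...}};
    return std::accumulate(a.begin(), a.end(), 0);
}
```

Without `constexpr`, whether a call like `sum(1, 2, 3, 4, 5)` collapses to a constant is up to the optimizer.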


C++ has constexpr for this. A constexpr expression will be evaluated at compile time if the result is used in a constant expression (i.e. assigned to an enum constant or a constexpr variable); if that isn't possible (the expression does not fulfill the requirements to be constexpr), an error is generated.
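
A small sketch of the two contexts mentioned, using a hypothetical `square` function. All names here are made up for illustration.

```cpp
// Hypothetical example function.
constexpr int square(int x) { return x * x; }

// Both of these require a constant expression, so square() must be
// evaluated by the compiler.
enum { kNine = square(3) };
constexpr int kSixteen = square(4);

// Here the argument is only known at run time, so the call is an ordinary
// function call (which the optimizer may or may not inline).
int runtime_square(int x) { return square(x); }

// This would be a compile error, because the initializer is not a
// constant expression:
//   int n = 0;
//   constexpr int bad = square(n);
```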



Your example still gets calculated at compile time if you remove the constexpr. However, if you change -O3 to -O2, then there's a difference.


VC++ 2015 (the latest released version as of this writing) can't optimize this for any number of arguments, even just one.


VC has always been good at code gen but not at middle-end optimizations. They are working on it, though; this kind of thing takes years to build and tune well.


> VC has always been good at code gen

Well, target code gen is the job of a compiler. Optimization is a plus. It's like saying VC is good at compiling, but not optimizing.


I'm really not sure what point you are trying to make. Yes, VC is good at compiling but not optimizing.

Given people expect both from their compilers these days ...



