> (Also: inlining is not the way to go if you are frontend-bound...)
When a workload is frontend-bound, the culprit is usually either instruction cache misses (e.g. a lot of unpredictable virtual calls or generally poor code layout) or branch mispredictions (e.g. a lot of unpredictable branches in your code). I fail to see how inlining relates to either effect, other than through what I believe is a myth: that inlining (1) grows the executable, which (2) puts more pressure on the instruction cache and (3) therefore degrades performance. In my experience I have rarely seen even (1) happen (compilers and linkers are way too smart nowadays), and I don't think I have ever managed to measure (2) and (3) outside a micro-benchmark.
Anyway, if I were optimizing a workload found to be frontend-bound, eliminating branches and getting rid of dynamic dispatch would be the first two things I'd check. Inlining maybe, but only in third place.
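
To make those first two items concrete, here is a rough C++ sketch of what I mean. All names (`sum_above_branchy`, `sum_above_branchless`, `Shape`, `Square`, `total_area_virtual`, `total_area_static`) are made up for illustration, and whether either rewrite actually helps depends on your data, compiler, and target, so measure before and after.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Branchy version: the comparison may compile to a conditional jump,
// which mispredicts often when the data is effectively random.
std::int64_t sum_above_branchy(const std::vector<int>& v, int threshold) {
    std::int64_t sum = 0;
    for (int x : v) {
        if (x >= threshold) sum += x;
    }
    return sum;
}

// Branch-free version: mask is all ones when x >= threshold, zero otherwise,
// so the loop body has no data-dependent jump.
std::int64_t sum_above_branchless(const std::vector<int>& v, int threshold) {
    std::int64_t sum = 0;
    for (int x : v) {
        const std::int64_t mask = -static_cast<std::int64_t>(x >= threshold);
        sum += x & mask;
    }
    return sum;
}

// Hypothetical class hierarchy used to contrast the two dispatch styles.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square final : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

// Dynamic dispatch: every area() call is an indirect call through the vtable.
double total_area_virtual(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double total = 0.0;
    for (const auto& s : shapes) total += s->area();
    return total;
}

// Static dispatch: the concrete type is known at compile time, so the call
// target is fixed (and, with Square marked final, can be devirtualized).
template <typename T>
double total_area_static(const std::vector<T>& shapes) {
    double total = 0.0;
    for (const auto& s : shapes) total += s.area();
    return total;
}
```

Keep in mind that an optimizing compiler may already turn the branchy loop into a conditional move or vectorize it, so check the generated assembly before assuming the hand-written mask is a win. The static-dispatch variant only works when you can store homogeneous containers of the concrete type.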