> We introduce two new instructions to handle the short move case quickly
These are more or less built into AVX-512 already...
(The hot path is: load immediate; bzhi; kmov; load; store. That's 5 instructions rather than 2, but if you look at the _latencies_ involved, the difference is negligible.)\
(AFAIK it's in SVE too.)
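
For concreteness, here's roughly what that load immediate / bzhi / kmov / load / store sequence looks like with intrinsics. This is only a sketch under my own assumptions: the names `short_copy` and `short_copy_avx512` are made up, I've capped the length at 64 bytes (one zmm register), and the `memcpy` fallback is only there so it still builds and runs on machines without AVX-512BW and BMI2. The instructions the compiler actually emits may differ a bit from the five listed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_X86_SKETCH 1

/* Hypothetical helper: copy n <= 64 bytes with one masked load and one masked store. */
__attribute__((target("avx512f,avx512bw,bmi2")))
static void short_copy_avx512(void *dst, const void *src, size_t n) {
    /* load immediate + bzhi: keep the low n bits of all-ones.
       bzhi leaves the source unchanged when the index is >= 64, so n == 64 selects every byte. */
    uint64_t bits = _bzhi_u64(~0ULL, (unsigned int)n);
    __mmask64 k = (__mmask64)bits;                  /* kmov: GPR -> mask register */
    __m512i v = _mm512_maskz_loadu_epi8(k, src);    /* masked-off bytes are zeroed and cannot fault */
    _mm512_mask_storeu_epi8(dst, k, v);             /* masked-off bytes in dst are left untouched */
}
#endif

/* Hypothetical entry point: dispatches at runtime, falls back to memcpy. */
void short_copy(void *dst, const void *src, size_t n) {
    assert(n <= 64);  /* assumption of this sketch: "short" means at most one zmm register */
#ifdef HAVE_X86_SKETCH
    if (__builtin_cpu_supports("avx512bw") && __builtin_cpu_supports("bmi2")) {
        short_copy_avx512(dst, src, n);
        return;
    }
#endif
    memcpy(dst, src, n);
}
```

In a real memcpy you'd presumably do the CPU check once up front rather than per call; it's inline here only to keep the example self-contained.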