Do you have any more information to share? The ultra-bloated unrolled vector imp...

titzer · on Dec 20, 2022

I've heard this directly from engineers at Intel.

saagarjha · on Dec 20, 2022

Is this pre- or post-ERMS?

loeg · on Dec 20, 2022

I don't believe it's been consistently the fastest way to copy memory even post-ERMSB. (It would be great and convenient if it were!)

saagarjha · on Dec 20, 2022

Perhaps for very small sizes (just use a mov directly) and very large ones (that fall out of your cache, so you’d probably want non-temporal stores). I think it would be difficult to cover all cases.

stephencanon · on Dec 20, 2022

It pretty much is (except for length-zero) starting in Icelake, with FSRM (aka ERMSB II, The Reckoning)

titzer · on Dec 20, 2022

It'd be great if this were true, since that would finally put an end to the madness. But I've hoped this for many, many years, but always been disappointed when someone spends ungodly hours on some hyper-tuned AVX thing that just barely eeks past, and we get stuck with another unwieldy monster for a decade.

titzer · on Dec 20, 2022

I heard this around 2018 or so.