I'm no CPU engineer, but this smells like some sort of chip-level macro function...

Symmetry · on Sept 7, 2018

It would be easy for Intel to make a CPU that never throttled down: they would just have to clock it low enough that it would never run into thermal problems no matter how much load it was experiencing. But then it'd be much worse for everybody's use case except maybe CPU reviewers.

Changing the clock speed based on thermal headroom is really hard to do well, but Intel does it well and most other chip-makers are trying to duplicate Intel. This is really the opposite of those old chips that would sometimes burst into flame, because a modern CPU which throttles based on temperature will never burst into flame even if you remove the heat sink and put it in a 150 degree oven.
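To make the trade-off concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the linear power model, the thermal resistance, the 100 C limit and the list of clock steps are made up and do not describe any real Intel part. It only shows why a clock chosen for the worst-case load is never faster than one chosen for the load actually present.

```python
# Toy model only: all constants are hypothetical, not real CPU data.
AMBIENT_C = 25.0          # assumed ambient temperature
LIMIT_C = 100.0           # assumed maximum safe die temperature
R_THERMAL_C_PER_W = 0.5   # assumed heat sink thermal resistance
WATTS_PER_GHZ = 40.0      # assumed power per GHz at 100% load
CLOCK_STEPS_GHZ = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]


def steady_state_temp(freq_ghz, load):
    """Die temperature once heat flow has settled (crude linear model)."""
    power_w = freq_ghz * load * WATTS_PER_GHZ
    return AMBIENT_C + power_w * R_THERMAL_C_PER_W


def highest_safe_clock(load):
    """Fastest clock step that stays at or below the thermal limit."""
    safe = [f for f in CLOCK_STEPS_GHZ if steady_state_temp(f, load) <= LIMIT_C]
    return max(safe)


def fixed_clock():
    """A 'never throttles' CPU must be clocked for the worst-case load."""
    return highest_safe_clock(load=1.0)


def throttling_clock(load):
    """A throttling CPU picks its clock from the load it actually sees."""
    return highest_safe_clock(load)


if __name__ == "__main__":
    for load in (0.25, 0.5, 0.75, 1.0):
        print(load, fixed_clock(), throttling_clock(load))
```

In this model the two designs only match at full load; at every lighter load the throttling design runs faster, which is the point about everybody's use case except the sustained worst case.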

valarauca1 · on Sept 7, 2018

AVX-512 does not do micro-op splitting on Intel server-class CPUs (it does on some Intel consumer CPUs).

Amusingly, Zen is emulating AVX512 and AVX2 via micro-op splitting, and it performs better under some workloads.

The real issue is that the path propagation delay of 512 bits' worth of electricity is extremely non-trivial, and costs a shit load of power. Just `mov`'ing to the AVX-512 registers (initially, when AVX-512 is not warm) can stall the CPU for 10,000+ cycles as it tries to power on all those registers.
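As a rough back-of-the-envelope check on what a 10,000-cycle stall means, here is a small Python sketch. It assumes, purely for illustration, that a 256-bit and a 512-bit operation each take one cycle, and it ignores any clock-speed reduction; the function names are made up for this example. Under those assumptions it estimates how much data you need to process before the wider vectors pay back the warm-up stall.

```python
def cycles_needed(n_bytes, vector_bytes, warmup_stall_cycles=0):
    """Cycles to process n_bytes at one vector op per cycle (assumed)."""
    ops = -(-n_bytes // vector_bytes)  # ceiling division
    return warmup_stall_cycles + ops


def break_even_bytes(stall_cycles, narrow_bytes=32, wide_bytes=64):
    """Data size where wide vectors plus the stall match narrow vectors.

    Solves n / narrow = stall + n / wide for n.
    """
    return stall_cycles * narrow_bytes * wide_bytes // (wide_bytes - narrow_bytes)


if __name__ == "__main__":
    n = break_even_bytes(10_000)
    print(n)                                # bytes at break-even
    print(cycles_needed(n, 32))             # 256-bit path
    print(cycles_needed(n, 64, 10_000))     # 512-bit path incl. stall
```

With the 10,000-cycle figure from above, this simple model puts the break-even at 640,000 bytes; anything smaller is faster on the narrower vectors. Real hardware will differ, since throughput, frequency changes and the exact stall length all vary by chip.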

brigade · on Sept 7, 2018

Someone analyzed AVX512 performance on the elusive Lenovo laptop? Link!

Also, source on the power up stall? AVX(2) didn’t have that and I’m highly surprised AVX512 would. Agner at least claims the same reduced throughput during warmup, but I think he only has early silicon.

BeeOnRope · on Sept 7, 2018

AVX(2) definitely had the power-up stall on many chips, including all client Skylake I think.

brigade · on Sept 7, 2018

No, it had reduced throughput of AVX instructions while the ALUs powered up. Not a stall.

BeeOnRope · on Sept 8, 2018

Yeah, maybe you are right for Skylake client. I haven't tested carefully there, but I'll probably get around to it. This thread [1] indicates that it may have only been Haswell that had the halted portion.

On to Skylake-SP, however, that chip is reported to have both reduced throughput and fully halted periods in [2].

Some have speculated it has to do with whether chips have an IVR: the models with an IVR being less capable of handling high dI/dt events. I don't know about that though (Skylake-SP still has external VR, right?).

[1] https://www.agner.org/optimize/blog/read.php?i=378#378

[2] https://software.intel.com/en-us/comment/1926876#comment-192...

kazinator · on Sept 7, 2018

> cuts edges even more than the spectre/meltdown issue.

Indeed. If you have some piece of code in a different security context that conditionally executes a heavy instruction based on a decision made over some sensitive data, doesn't this provide a way to obtain information about that data?

scottlamb · on Sept 7, 2018

> If you have some piece of code in a different security context that conditionally executes a heavy instruction based on a decision made over some sensitive data, doesn't this provide a way to obtain information about that data?

Yes. http://www.numberworld.org/blogs/2018_6_16_avx_spectre/

BeeOnRope · on Sept 7, 2018

Yes, this is called NetSpectre, basically. Well, NetSpectre describes two side channels, but the faster of the two is an AVX-clock/transition related side channel that relies on the CPU downclocking behavior.
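The general shape of such a channel can be sketched with a deterministic toy simulation in Python. This is not NetSpectre's actual method and measures no real hardware: the clock speeds, the probe length and all function names are assumptions for illustration. The victim's heavy instruction lowers the clock only when the secret bit is 1, and the attacker infers the bit from how long a fixed amount of its own work takes.

```python
# Toy simulation: clock values and probe size are hypothetical.
NOMINAL_GHZ = 3.0      # assumed clock when no heavy instruction ran
THROTTLED_GHZ = 2.5    # assumed clock after a heavy instruction ran


def victim(secret_bit):
    """Return the core clock left behind by the victim's code.

    The victim runs the heavy instruction only if the secret bit is 1.
    """
    return THROTTLED_GHZ if secret_bit else NOMINAL_GHZ


def attacker_probe_ns(clock_ghz, probe_cycles=1_000_000):
    """Wall-clock time for the attacker's fixed-size probe loop."""
    return probe_cycles / clock_ghz


def recover_bits(secret_bits):
    """Classify each probe as slow (bit 1) or fast (bit 0)."""
    threshold_ns = attacker_probe_ns((NOMINAL_GHZ + THROTTLED_GHZ) / 2)
    return [
        1 if attacker_probe_ns(victim(bit)) > threshold_ns else 0
        for bit in secret_bits
    ]


if __name__ == "__main__":
    secret = [1, 0, 1, 1, 0, 0, 1, 0]
    print(recover_bits(secret) == secret)
```

A real attack has to cope with noise, transition delays and other tenants on the machine, so it would need many repeated measurements per bit rather than the single clean probe used here.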

Symmetry · on Sept 7, 2018

If that's the case then you're leaking a lot more information via the power draw and timing than you are via the clock speed.

dataflow · on Sept 7, 2018

Is this a realistic scenario with AVX-512?

neiled · on Sept 7, 2018

Oh now I get the name of that TV show! I never realised it was the name of a joke opcode before, thanks :)

cestith · on Sept 7, 2018

Sometimes it's only partly a joke. https://en.wikipedia.org/wiki/Halt_and_Catch_Fire