Hacker News

I am surprised at number 3; I would have thought it would be optimised to x << 1 rather than the addition.

Edit: No. 5 explains why.



Not really; there's no rounding problem with x << 1. x * 2, x+x, and x<<1 are all equivalent. The real reason it doesn't use the shift instruction on x86 is that the shift takes 3 bytes to encode, whereas adding a register to itself takes two. But as the article says, for PowerPC it does use a shift, since there's no difference in instruction lengths on a RISC.

random aside: gcc usually uses the LEA instruction for this sort of thing, which lets you compute any combination of {1,2,4,8} * x + y + n in one cycle, where x and y are registers and n is a constant. So with the LEA instruction you could write either lea eax, [eax*2] or lea eax, [eax+eax] to compute this. I think add eax, eax is most likely what it does, though.


Compilers for x86 prefer adds to shifts because there are processors (x86 has been around for so long) on which a shift costs more cycles. LEAs are always fast because they reuse the addressing hardware that every memory access depends on.


x86 has encodings for shifts by 1 that are two bytes, so there's no code size advantage.


Sometimes there are "canonical forms" for these operations, depending on which chip is being targeted, where the hardware can automatically break data dependencies and improve hardware-level parallelism, as long as the instruction is encoded in the right form.

I don't know that this is necessarily the reason here, but it's one possible explanation.



