
Certainly optimal would be global, and I'm pretty sure it's isomorphic to bin packing... whether you use a fixed call interface or not. So it's solved, it's just exponential.
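Loosely, the packing framing looks something like this toy sketch (hypothetical variable names and live ranges; a greedy pass rather than the exponential optimal search):

  /* Register allocation viewed as packing live ranges into a fixed
     set of registers: overlapping ranges can't share one.  Greedy
     heuristic only; finding the optimal assignment is the hard part. */
  #include <stdio.h>

  #define NREGS 1                        /* a one-register machine */

  struct interval { const char *name; int start, end; };

  int
  main(void)
  {
          struct interval v[] = {        /* sorted by start of life */
                  { "sum",  1, 13 },
                  { "sum2", 5, 15 },
                  { "i",    8,  9 },
                  { "j",   12, 13 },
          };
          int nvars = (int)(sizeof v / sizeof v[0]);
          int free_at[NREGS] = { 0 };    /* when each register frees up */

          for (int k = 0; k < nvars; k++) {
                  int r;
                  for (r = 0; r < NREGS; r++)
                          if (free_at[r] <= v[k].start)
                                  break;
                  if (r < NREGS) {
                          free_at[r] = v[k].end;
                          printf("%-4s -> r%d\n", v[k].name, r);
                  } else {
                          printf("%-4s -> spill to memory\n", v[k].name);
                  }
          }
          return 0;
  }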


I'm not sure. Given a hypothetical arch with only 1 free register, which local variable do you allocate to it?

  1 int
  2 main(int argc, char *argv[])
  3 {
  4         int sum = 0;
  5         int sum2 = 0;
  6
  7         if (argc >= 2)
  8                 for (int i = 0; i < argv[1]; i++)
  9                         sum += i;
 10
 11         if (argc >= 3)
 12                 for (int i = 0; i < argv[2]; i++)
 13                         sum2 += sum + i;
 14
 15         return 0;
 16 }


> given a hypothetical arch with only 1 free register, which local variable do you allocate to it?

A truly good compiler would use the same register for sum and sum2, and for argv[1] and argv[2] if it needs to (even if they weren't dead, as they are in this toy example).
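For illustration (a hypothetical function, not the code above): two locals whose live ranges don't overlap never interfere, so one register can serve both.

  /* a's last use comes before b's first definition, so a and b never
     interfere and a single register can hold a first, then b. */
  int
  f(int x)
  {
          int a = x * 2;          /* a becomes live    */
          int y = a + 1;          /* last use of a     */
          int b = y * 3;          /* b becomes live    */
          return b - 4;           /* last use of b     */
  }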

With only one register, the answer will probably be "none" or "every single one", depending on the instruction set, though.

For example, how do you index into arrays? Self-modifying code, as is often done on the 6502? Or do you have stack-relative indirect addressing that lets you say "add the contents of the address pointed to by the value at stack pointer plus 6 to the register"? Either way, there will be lots of contention for that single register.

Or does this hypothetical architecture have various complex "add contents of addresses foo[register] and bar and store the result in baz[3]" instructions? (Which would probably make it a very bad design. Adding a second register would be about the first thing to do (fun fact: the 4004 had 16 registers), but maybe this hypothetical design has RAM that's as fast as registers?)


Oh, I want to play...

So, an optimizing compiler would see that pretty much everything is dead code; it would assign 0 to the return register and be done.
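Roughly this, I'd guess (a sketch of the effect, not actual compiler output):

  /* With "return 0", neither loop has an observable effect, so
     dead-code elimination can drop both loops and the sums. */
  int
  main(int argc, char *argv[])
  {
          (void)argc;
          (void)argv;
          return 0;
  }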


I changed the return statement to "return sum2" so the for-loop calculations aren't optimized away, and fed the code to godbolt.org (Compiler Explorer).

gcc-trunk at -O3 for x64 will vectorize the loops, but there's no register pressure, so the register allocator isn't taxed much.

There's no niche optimization pass to convert it to Gauss's shortcut - https://physicsdb.com/sum-natural-numbers/
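For reference, the shortcut in question (a sketch, assuming the loop bound has already been parsed to an int n): the first loop sums 0 through n-1, which has a closed form.

  /* 0 + 1 + ... + (n-1) == n * (n - 1) / 2
     (ignores overflow for large n) */
  int
  sum_below(int n)
  {
          if (n <= 0)
                  return 0;
          return n * (n - 1) / 2;   /* replaces the whole loop */
  }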


x86-64-icx-latest and x86-64-clang-trunk use the Gauss shortcut.

Wow. I wonder if there's much use for that optimization pattern.

edit: clang discussion https://stackoverflow.com/questions/74417624/how-does-clang-...


Oops, I'm dumb: convert argv[1] and argv[2] to integers first.
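Something like this, I think (a sketch using atoi for brevity; strtol would allow error checking), keeping "return sum2" so the loops stay live:

  #include <stdlib.h>

  int
  main(int argc, char *argv[])
  {
          int sum = 0;
          int sum2 = 0;
          int n = (argc >= 2) ? atoi(argv[1]) : 0;   /* parse the bounds */
          int m = (argc >= 3) ? atoi(argv[2]) : 0;   /* as integers      */

          for (int i = 0; i < n; i++)
                  sum += i;

          for (int i = 0; i < m; i++)
                  sum2 += sum + i;

          return sum2;
  }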


Young'un, this is why line numbers always increase by 10.



