
Performance can be approximately measured by instructions-per-clock * clock-rate.

But the GHz number on a chip only tells you its clock rate. So if you focus on this, you're missing a crucial aspect of performance (the instructions-per-clock part).
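To make that concrete, here is a minimal sketch of the model, with made-up numbers (these are illustrative, not measurements of any real chip):

```python
# Rough performance model: throughput ~ instructions-per-clock * clock rate.
# All numbers below are hypothetical, for illustration only.

def instructions_per_second(ipc: float, clock_hz: float) -> float:
    """Approximate throughput: instructions per clock times clocks per second."""
    return ipc * clock_hz

# A high-clock, low-IPC design vs. a lower-clock, high-IPC design:
chip_a = instructions_per_second(ipc=1.0, clock_hz=5.0e9)  # 5 GHz, 1 IPC
chip_b = instructions_per_second(ipc=4.0, clock_hz=3.0e9)  # 3 GHz, 4 IPC

print(chip_a)  # 5e9 instructions/s
print(chip_b)  # 1.2e10 instructions/s: the "slower" 3 GHz chip wins
```

The GHz number alone tells you only the second factor of the product.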

Chip designers can try to push the clock rate to some extreme number, but doing so makes each clock cycle very short, leaving very little time per cycle to do productive work.

At any rate, within a particular architecture it's potentially meaningful to look at clock rate. You can expect a 2 GHz Pentium 4 to be faster than a 1.5 GHz Pentium 4. It won't be a full 33% faster, though, because, for instance, when the processor is waiting for data from RAM, it doesn't matter how fast its clock is ticking: the RAM chips take the same amount of time to deliver the data the CPU is waiting for.
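A toy model shows why the fixed memory time eats into the clock-speed gain (the workload numbers here are hypothetical):

```python
# Why a 33% higher clock gives less than a 33% speedup when part of the
# run time is waiting on RAM. Hypothetical workload, for illustration only.

CPU_CYCLES = 3.0e9   # cycles of actual computation in some workload
MEMORY_TIME = 1.0    # seconds spent waiting on RAM (clock-independent)

def run_time(clock_hz: float) -> float:
    """Total time: compute cycles scale with the clock, memory waits don't."""
    return CPU_CYCLES / clock_hz + MEMORY_TIME

t_slow = run_time(1.5e9)  # 1.5 GHz: 2.0 s compute + 1.0 s memory = 3.0 s
t_fast = run_time(2.0e9)  # 2.0 GHz: 1.5 s compute + 1.0 s memory = 2.5 s

print(round(t_slow / t_fast, 2))  # 1.2 -> only 20% faster, not 33%
```

The larger the fixed memory fraction, the smaller the payoff from cranking the clock.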

But when you talk about completely different designs, e.g., Intel versus AMD, Intel versus Apple, etc., clock speed becomes really and truly meaningless. See for instance the "Comparing" section of https://en.wikipedia.org/wiki/Clock_rate



I don't believe that Apple invented something that Intel or AMD did not, at least not at 2x scale. The single-core Geekbench score for the i7-8700K is around 10000. It's around 5000 for iPhones, which exactly mirrors 5 GHz vs 2.5 GHz. When you're talking about cutting-edge processors, the only difference is clock speed and core count. And while core count can be increased relatively easily, the AMD example shows that it's not easy to reach higher clock speeds.


Clock speed and core count are factors.

But your actual performance is really more along the lines of clock speed times instructions per cycle. For example, a 5.0 GHz CPU that achieves only 1 instruction per cycle does half the work of a 2.0 GHz CPU that achieves 5 instructions per cycle.
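The arithmetic in that comparison, written out (same numbers as above):

```python
# throughput = clock rate * instructions per cycle
throughput_a = 5.0e9 * 1  # 5.0 GHz at 1 IPC -> 5e9 instructions/s
throughput_b = 2.0e9 * 5  # 2.0 GHz at 5 IPC -> 1e10 instructions/s

print(throughput_a / throughput_b)  # 0.5 -> the 5 GHz chip does half the work
```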

Some factors that deeply influence achievable IPC include at least (1) the cache sizes and layout, number of load/store pipes, prefetching relevance and timeliness, and overall memory system design, (2) the kinds, number, and capabilities of the execution units and the quality of the scheduler for these units, and (3) the front-end's ability to correctly predict branches, speed of recovery from mispredicts, and general ability to keep feeding the back-end of the machine.
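To give a feel for point (3), here is a toy two-bit saturating-counter branch predictor, one of the classic textbook schemes; real front-ends are far more sophisticated, and this sketch models nothing about any particular CPU:

```python
# Toy 2-bit saturating-counter branch predictor (illustrative only).

class TwoBitPredictor:
    def __init__(self) -> None:
        self.state = 2  # counter 0..3: 0-1 predict not-taken, 2-3 predict taken

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Saturate at the ends so one odd outcome doesn't flip the prediction.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

# A loop branch: taken 9 times, then not-taken once at loop exit.
p = TwoBitPredictor()
correct = 0
for taken in [True] * 9 + [False]:
    if p.predict() == taken:
        correct += 1
    p.update(taken)

print(correct)  # 9: only the final loop-exit branch is mispredicted
```

Each mispredict stalls the front-end while the pipeline recovers, which is why prediction accuracy feeds directly into achievable IPC.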

Beyond that you really need to multiply in the amount of actual work done per instruction. A factor here is the instruction set and the ability of the compiler to utilize it effectively. For instance, a "single" vector instruction may do an amount of computation similar to 4, 8, 16, or even 32 conventional instructions. Code that uses these instructions may get a massive speedup on CPUs that have enough execution resources to execute many or all "lanes" of the instruction in parallel...
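A toy instruction count shows why vectorization changes the work-per-instruction picture; this just counts hypothetical instructions and models no real ISA:

```python
# Toy model: processing N elements with scalar vs. 8-wide vector instructions.
# Purely illustrative; real SIMD (SSE/AVX/NEON) behavior is more complex.

N = 1024
VECTOR_WIDTH = 8  # e.g. 8 lanes per vector instruction

scalar_instructions = N                # one add per element
vector_instructions = N // VECTOR_WIDTH  # one add per 8 elements

print(scalar_instructions)  # 1024
print(vector_instructions)  # 128: 8x fewer instructions for the same work
```

Two chips with identical IPC and clock can thus differ hugely in delivered performance if only one of them (or only one compiler target) exploits wide vector instructions.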



