I would think that the fourth nop is eligible only at cycle 1 because only four uops can start each cycle, but then the fifth nop is also eligible at cycle 4. What's different about the fourth nop from the third and fifth?
You are not being thick. It was a typo on my part; I intended for all the nops to have 0 in their eligible column.
As you point out, putting them all at 0 isn't even really accurate because only 4 can allocate (rename) per cycle, so the true eligible pattern should be 0-0-0-0-1-1-1-1-2-2-2-2 or something like that (it's more complicated than that because the mov/add also needs to allocate, so it's rp). I'm basically ignoring that, assuming the front-end (including allocation) can keep up, or equivalently that the scheduler is infinitely large and always totally full.
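To make the 0-0-0-0-1-1-1-1 pattern concrete, here is a minimal Python sketch. It assumes a stream of nops only and a fixed allocation width of 4, and it ignores the mov/add uops that would also take allocation slots; the function name `alloc_cycles` is just illustrative.

```python
def alloc_cycles(n_uops, width=4):
    """Earliest cycle at which each uop can allocate, assuming exactly
    `width` uops allocate per cycle and nothing else competes for slots."""
    return [i // width for i in range(n_uops)]


# Twelve back-to-back nops with a 4-wide allocator.
print("-".join(str(c) for c in alloc_cycles(12)))  # 0-0-0-0-1-1-1-1-2-2-2-2
```

Since a nop doesn't need to execute, in this simplified model it becomes eligible as soon as it allocates, so the list above is also the eligible column.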
I am going to fix this to be more accurate but I'll have to unroll the loop a bit more for the difference between retire and eligible to make sense.
BTW, this is just a simple version of what the llvm-mca -timeline view does:
Timeline view simulates a simple CPU pipeline, and with -dispatch=4 -mcpu=Skylake it's pretty close to Skylake. Unfortunately, llvm-mca models "unlimited retire bandwidth", so if 100 instructions all become eligible in the same cycle, it will retire all 100. So it's not too helpful here in any case where it would retire more than 4 per cycle (but it works for this example because that doesn't happen, due to the dep chains).
I didn't find any argument you can pass to limit retire bandwidth. In practice this is not a problem for most performance-oriented investigations because the retire limit usually isn't the bottleneck, so the performance reported by the tool is likely to be the same even though it simulates unlimited bandwidth.
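To show what the retire limit changes, here is a rough Python sketch of in-order retirement with and without a per-cycle cap. This is not how llvm-mca is implemented; `retire_cycles` and its parameters are made up for illustration, and `ready` is assumed to hold the cycle at which each uop (in program order) becomes eligible to retire.

```python
def retire_cycles(ready, width=None):
    """Retire uops in program order. Each uop retires no earlier than its
    own ready cycle and no earlier than the uop before it. At most `width`
    uops retire per cycle; width=None models unlimited retire bandwidth."""
    out = []
    cycle = 0   # cycle currently being filled with retirements
    used = 0    # retire slots already used in `cycle`
    for r in ready:
        if r > cycle:
            # This uop isn't ready yet: move on to a fresh cycle.
            cycle, used = r, 0
        if width is not None and used == width:
            # Current cycle is full: spill into the next one.
            cycle += 1
            used = 0
        out.append(cycle)
        used += 1
    return out


ready = [5] * 10                      # ten uops all eligible at cycle 5
print(retire_cycles(ready))           # unlimited: all retire at cycle 5
print(retire_cycles(ready, width=4))  # 4-wide: 5,5,5,5,6,6,6,6,7,7
```

With the cap, the burst of ten eligible uops takes three cycles to drain instead of one, which is exactly the case where the unlimited model diverges from a 4-wide retire.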
I fixed the typo, but also tried to improve the way cycles are accounted, adding two columns in support of that. If you have a moment can you check out:
Sorry if I'm being thick, but I'm still having trouble seeing where the pattern for nops is coming from, in this example from the post:
I would think that the fourth nop is eligible only at cycle 1 because only four uops can start each cycle, but then the fifth nop is also eligible at cycle 4. What's different about the fourth nop from the third and fifth?