This reminds me of the old Motorola M68K dbcc instruction: a specialized looping instruction that could only branch back over a few instructions, all of which were held in a tiny cache that existed solely for the dbcc case.
Are there modern analogues to that dbcc instruction?
dbcc had specific semantics: it could only branch back a very short distance, and the CPU would cache the body of the loop. A modern equivalent would keep the loop body in a small dedicated cache that is not even part of L1.