Huuuge caches, and optimisations for using them as efficiently as possible: probably async prefetch, possibly some semblance of a maskable cache, and very likely reordering around ops that miss the cache.
Here, the prehistoric nature of x86 actually drags it down because of poor predictability.
x86-64 is decently predictable, probably about the same as ARMv8. i386 wouldn't be, because it has to spill to memory so much that you can barely manage to rename those registers.