On a modern pipelined CPU with separate instruction and data caches, self-modifying code carries a rather large penalty, since the CPU has to flush the pipeline and the caches. On the original PC's 8088, which has no cache and no pipeline (there's only a 4-byte prefetch queue), the penalty is much smaller.
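To make the 8088 case concrete, here is a rough toy model in C of a 4-byte prefetch queue. It is only a sketch, not a cycle-accurate emulator: `ToyCpu`, `prefetch`, `flush_and_jump`, `store`, `fetch_next` and `run_demo` are all hypothetical names invented for illustration, and the model assumes that stores do not update bytes already sitting in the queue. Under those assumptions, the only cost of modifying nearby code is refilling at most four bytes after a jump.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE   16
#define QUEUE_SIZE 4   /* the 8088's prefetch queue holds 4 bytes */

typedef struct {
    uint8_t  mem[MEM_SIZE];      /* toy main memory */
    uint8_t  queue[QUEUE_SIZE];  /* bytes fetched ahead of execution */
    unsigned q_start;            /* memory address of queue[0] */
    unsigned q_len;              /* number of valid bytes in the queue */
} ToyCpu;

/* Top up the queue from memory, continuing after what is already queued. */
static void prefetch(ToyCpu *cpu) {
    while (cpu->q_len < QUEUE_SIZE && cpu->q_start + cpu->q_len < MEM_SIZE) {
        cpu->queue[cpu->q_len] = cpu->mem[cpu->q_start + cpu->q_len];
        cpu->q_len++;
    }
}

/* Discard the queued bytes and restart fetching at `target`, as a jump does. */
static void flush_and_jump(ToyCpu *cpu, unsigned target) {
    assert(target < MEM_SIZE);
    cpu->q_start = target;
    cpu->q_len = 0;
    prefetch(cpu);
}

/* A store changes memory only; by assumption this model never
   patches bytes that are already in the queue. */
static void store(ToyCpu *cpu, unsigned addr, uint8_t value) {
    assert(addr < MEM_SIZE);
    cpu->mem[addr] = value;
}

/* Consume the next instruction byte from the front of the queue. */
static uint8_t fetch_next(ToyCpu *cpu) {
    if (cpu->q_len == 0) prefetch(cpu);
    assert(cpu->q_len > 0);
    uint8_t b = cpu->queue[0];
    cpu->q_len--;
    memmove(cpu->queue, cpu->queue + 1, cpu->q_len);
    cpu->q_start++;
    prefetch(cpu);
    return b;
}

/* Overwrite the byte at address 2 while it is already queued, then
   return the byte the CPU actually consumes for address 2. */
static uint8_t run_demo(int flush_first) {
    ToyCpu cpu = {0};
    for (unsigned i = 0; i < MEM_SIZE; i++) cpu.mem[i] = 0x90; /* original code bytes */
    flush_and_jump(&cpu, 0);                   /* queue holds addresses 0..3 */
    store(&cpu, 2, 0xCC);                      /* modify an already-queued byte */
    if (flush_first) flush_and_jump(&cpu, 0);  /* jump: refill the 4-byte queue */
    fetch_next(&cpu);                          /* address 0 */
    fetch_next(&cpu);                          /* address 1 */
    return fetch_next(&cpu);                   /* address 2 */
}
```

In this model, `run_demo(0)` returns the stale `0x90` because the modified byte was already in the queue, while `run_demo(1)` returns the new `0xCC` after a refill of only four bytes. There is no deep pipeline or cache to discard, which is why the cost stays small.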