There is a more-or-less comprehensive archive[1] up to 2019 (which should probably be scraped and hoarded). The article in question is there, comments included[2].
Side note: Michael Kaplan’s blog, which Microsoft took down in a remarkably shameful manner, has also been archived[3], while Eric Lippert reposted (most of?) his old articles on his personal WordPress instance[4].
It’s still there[1], although a number of the posts seem to be missing. Inexplicably, (most of) the comments are in place. (There are other Microsoft blogs that have disappeared without a trace, but not this one.)
I don't understand the reasoning behind this. Why do you need 5 bytes of unexecuted patch space before the function _and_ 2 bytes of patch space at the beginning of the function?
Wouldn't it be the same to have a single 5-byte no-op that could be patched into a single long jump, instead of needing space for two jumps?
Because you can’t atomically replace a 5-byte run of NOPs. There’s nothing to prevent you from inserting your patch while a thread is partway through consuming the NOPs; that thread would resume in the middle of your new jump instruction and decode a portion of your patch out of order.
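To make the two-step dance concrete, here’s a sketch (in Python, just simulating the bytes; the offsets and opcodes are the standard x86 encodings, but the layout and names are my own, not Windows code). The 5-byte long jump is written first into the padding, which no thread can be executing; then the 2-byte MOV EDI, EDI is flipped to a 2-byte short jump with a single small write:

```python
# Simulated image: 5 padding bytes, then the function entry point.
image = bytearray(
    b"\x90" * 5          # 5 bytes of padding before the function
    + b"\x8b\xff"        # MOV EDI, EDI -- the 2-byte "pointless" NOP
    + b"\x55"            # push ebp (start of the real prologue)
)

FUNC = 5  # offset of the function entry within `image`

def apply_hotpatch(mem, func, target_rel32):
    # Step 1: write the 5-byte long jump (E9 rel32) into the padding.
    # Safe even if done byte-by-byte: no thread ever executes this region.
    mem[func - 5:func] = b"\xe9" + target_rel32.to_bytes(4, "little", signed=True)
    # Step 2: atomically overwrite MOV EDI,EDI with a 2-byte short jump.
    # EB F9 = JMP rel8 -7: measured from the end of this 2-byte
    # instruction, -7 lands exactly at the start of the padding.
    mem[func:func + 2] = b"\xeb\xf9"

apply_hotpatch(image, FUNC, 0x1234)
```

A thread can only ever observe the old MOV EDI, EDI or the new short jump, never a half-written instruction, because the patch that threads actually execute through is a single 2-byte store.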
Modern x86 processors decode multiple instructions per clock. By “slots”, I’m assuming he means entries in the dispatcher or reservation stations. But NOPs don’t even make it that far; as I said, the decoder that encounters one will probably swallow it and emit nothing.
Besides, it sounds like premature optimization. This isn’t the 1980s; an extra clock cycle per function call is not going to make or break your program.
There is a very good chance this dates back to 16-bit Windows. Even Windows 98 supported the 486, which was not capable of superscalar execution (that arrived with the P5 Pentium) or of decoupling decode from execution (the P6 Pentium Pro).
Why do Windows functions all begin with a pointless MOV EDI, EDI instruction?
https://devblogs.microsoft.com/oldnewthing/20110921-00/?p=95...