> Is there anything stopping MS from copying Apple's approach? One thing I don't...

jlokier · on Nov 27, 2020

It's probably doable because VMware did something very similar about 20 years ago, back before x86 had virtualisation hardware.

That was x86 on x86, but it took a similar level of sophistication then as with cross-architecture "software virtualisation" now to run all 32-bit Windows software, as well as all 16-bit software, other OSes such as Linux, BSD and MS-DOS and Netware, etc.

Considering all the things around like self-modifying code, generated code, thunks etc in Windows and other OSes and applications, VMware did a great job of scanning all code for entry points, intercepting every path that couldn't be executed directly, patching what needed to be patched.

I expect they can revive that approach with cross-architecture translation now, and I would be very surprised if VMware and Parallels are not working on exactly this for x86-on-ARM VMs right now.

Cu3PO42 · on Nov 27, 2020

I understand that Rosetta 2 uses static recompilation "where possible" and only falls back to dynamic when necessary.

This is purely speculative but I would assume that it statically recompiles the __TEXT segment of binaries or maybe whatever it can easily figure out is code by following all possible jumps, etc.

Then, whenever a page is mapped as executable they could then check if this is part of a memory mapped binary which is statically recompiled or otherwise recompile the code in the page and jump to the now native version.

That way JITs should work without fully falling back to interpreting instructions. Things would get tricky when you map some memory both writable and executable, but maybe they're pushing the limits on what the spec allows for instruction caches? I'd have to check my x86 reference manual. Or maybe this is another area where they have some hardware primitive to facilitate a partial recompilation.

inkyoto · on Nov 28, 2020

Static translation is, in fact, the simplest to accomplish, and it is time proven as well. This is what IBM have been doing with AS/400 for more than three decades. AS/400 runs on a virtual ISA, and applications are alwyas compiled into the virtual instruction set. Pointers, for instance, have been 128-bit since day 1 (1989 circa). Upon the first startup, virtual instructions are compiled into real CPU instructions, and the recompiled application image is stashed away in a cache. Consequent application restarts use the cached application image. IBM have been using POWER CPU's for the last few generations of AS/400, but they have also used drastically different CPU designs in the past as well. This allows to run applications that had been written in 1991 today, for example.

Dynamic binary translation (a predecessor of modern JIT) is a completely different story, though.

Now, a few years back, Apple introduced iOS app uploads in Bitcode. Bitcode is the LLVM's own intermediate representation of the abstracted LLVM ISA. Apps available in Bitcode, when downloaded, are statically compiled into the ISA of the actual CPU one's iPhone has. Again, apps uploaded in Bitcode in 2018, will run fast and effeciently on iPhone 18 as well. I could immediately draw parallels with AS/400 a few years back when Apple announced the Bitcode availability.

I could not quickly check whether Bitcode was available for OS X apps today as well, but it would be hard to imagine why Apple would not want to extend the approach to the OS X apps as well. This is where it can get really interesting as apps available through the OS X app store in Bitcode, could be developed and compiled on a device running on the Intel ISA, but they could also be statically translated into M1 (M2, M3 etc) ISA at the download time and, at least theoretically, they would not require a separate ARM / M1 app image (or a universal binary for that matter). Unless, of course, the app leans on something x86 specific.

I am on the hunch that the real disruption that Apple has inflicted upon the industry is not just a stupendously fast and power efficient chip (it is very nice but secondary), but the concept of the demise of the importance of hardware ISA's in general – use the intermediate code representation for app binary images and use whatever CPU / hardware ISA that allows to realise the product vision. It is ARM today but it can be anything else 10 years down the road. I wonder who can push the mainstream PC industry in a similar direction and successfully execute it, though.

aardvark179 · on Nov 28, 2020

Bitcode is an IR, but it is not architecture independent, and the code will have been compiled to a particular calling convention. Bitcode may allow Apple to apply different optimization passes for iPhone processors, but it’s not going to be hugely useful for Rosetta.

inkyoto · on Nov 29, 2020

Bitcode is the bitstream file format used to encode the LLVM IR. LLVM IR itself is architecture independent (unless the code uses inline assembly, which is architecture specific): https://llvm.org/docs/LangRef.html

clang compilers (or other LLVM based language/compiler frontends) generate the LLVM IR, apply optimisation passes and run the LLVM IR through the architecture specific backend that transforms the LLVM IR into the underlying hardware ISA (or into a target ISA when cross-compiling). The original meaning of LLVM was «Low Level Virtual Machine». It is a widespread approach found in many cross-platform / portable compilers, e.g. GNU has RTL.

Here is an example of the «Hello world» using the LLVM IR; there is nothing architecture specific about it:

------

@.str = private unnamed_addr constant [13 x i8] c"Hello World\0A\00", align 1

; Function Attrs: nounwind ssp uwtable

define i32 @main() #0 {

  %1 = alloca i32, align 4

  store i32 0, i32* %1

  %2 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([13 x i8]* @.str, i32 0, i32 0))

  ret i32 0

}

declare i32 @printf(i8*, ...) #1

------

Seamless tranlastion of Bitcode into an Intel or ARM ISA actually works, too: https://www.highcaffeinecontent.com/blog/20190518-Translatin...

Lastly, Bitcode and Rosetta are mutually exclusive. If the app is available in Bitcode, it is run through an optimising hardware ISA backend at the download time and then runs always natively, whereas Rosetta runs existing x86 apps through the static binary translation into ARM first (what Apple refer to as AOT - ahead of the time) and likely uses JIT when the recompiled application is run.

mhh__ · on Nov 28, 2020

Microsoft could easily afford to spend a hundred million dollars on making it work. Bringing up their own processor would be difficult to that degree, but now apple have done it they should be able to do it.

Interestingly, ARM to X86 is probably easier because ARM has a machine readable specification (I don't think Intel or AMD have an equivalent)

thatfrenchguy · on Nov 27, 2020

Technically Rosetta 2 can run 32-bit Windows applications made since Windows 95 as well with CrossOver :)