It's also not immediately obvious to me that pCurEvalStack is initialized.
I had a look at the Intel architecture optimization manual regarding the redundant loads, and it's not immediately obvious to me that modern processors won't handle this fine (http://www.intel.com/Assets/PDF/manual/248966.pdf).
And also the nice thing about the BINARY_OP() macro is that it can take types and the operator as arguments (e.g. BINARY_OP(I32, I32, I32, +), BINARY_OP(I64, I64, I32, <<)), meaning it can be used for many operations on many types. I would have to write separate functions for each case if using an inline function.
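To make the macro argument concrete, here is a minimal sketch of how such a macro could look. It is not the interpreter's actual code: the argument order (result type, left type, right type, operator) is only inferred from the two examples above, and evalStack, PUSH, POP, add_i32 and shl_i64 are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int32_t I32;
typedef int64_t I64;

/* Hypothetical stand-in for the interpreter's evaluation stack. */
static unsigned char evalStack[64];
static unsigned char *pCurEvalStack = evalStack;

/* Copy a value of the given type onto the stack (no overflow check). */
#define PUSH(type, value) do {                                      \
        type v_ = (value);                                          \
        memcpy(pCurEvalStack, &v_, sizeof(type));                   \
        pCurEvalStack += sizeof(type);                              \
    } while (0)

/* Copy a value of the given type off the stack into dest. */
#define POP(type, dest) do {                                        \
        assert((size_t)(pCurEvalStack - evalStack) >= sizeof(type)); \
        pCurEvalStack -= sizeof(type);                              \
        memcpy(&(dest), pCurEvalStack, sizeof(type));               \
    } while (0)

/* Assumed shape: the operator is passed as a bare token and pasted
   between the two operands, so one macro covers every type/op pair. */
#define BINARY_OP(retType, type1, type2, op) do {                   \
        type1 a_;                                                   \
        type2 b_;                                                   \
        POP(type2, b_);   /* right operand is on top */             \
        POP(type1, a_);                                             \
        PUSH(retType, a_ op b_);                                    \
    } while (0)

/* Hypothetical wrappers that exercise two instantiations. */
static I32 add_i32(I32 x, I32 y)
{
    I32 r;
    PUSH(I32, x);
    PUSH(I32, y);
    BINARY_OP(I32, I32, I32, +);
    POP(I32, r);
    return r;
}

static I64 shl_i64(I64 x, I32 n)
{
    I64 r;
    PUSH(I64, x);
    PUSH(I32, n);
    BINARY_OP(I64, I64, I32, <<);
    POP(I64, r);
    return r;
}
```

An inline function can't take `+` or `<<` as a parameter, so the function-based equivalent would need one function per operator and per type combination, which is the trade-off described above.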
And pCurEvalStack is initialised in the LOAD_METHOD_STATE macro, referenced first on line 590, before interpretation begins.
Things that the function contains that might disable optimization: a lot of gotos, inline assembly in GO_NEXT (this does affect optimization: http://msdn.microsoft.com/en-us/library/5hd5ywk0.aspx)...