It seems to me, though, that the Java numbers aren't representative. As far as I know, JVM benchmarks should give HotSpot etc. a chance to optimise the functions by calling them ~1k times before the actual measurement starts. That doesn't seem to be the case in the code.
The same function is called over and over, millions of times, so hopefully that gives HotSpot a chance to kick in. I didn't want to add an artificial warm-up, as I don't think that's representative of real code.
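One way to check whether warm-up actually matters here would be to time the same loop twice in one JVM run and compare the two passes. This is only a rough sketch: `work` is a made-up stand-in for the benchmarked function, and the iteration counts are arbitrary guesses, not numbers from the original benchmark.

```java
public class WarmupSketch {
    // Written to on every call so the JIT cannot discard the results as dead code.
    static volatile long sink;

    // Hypothetical workload standing in for the function under test.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    // Calls work(n) `iterations` times and returns the elapsed time in nanoseconds.
    static long timeNanos(int iterations, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = work(n);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iterations = 20_000; // arbitrary; assumed large enough to trigger compilation
        int n = 1_000;

        long firstPass = timeNanos(iterations, n);  // includes interpreted and partly compiled runs
        long secondPass = timeNanos(iterations, n); // same work, after the JIT has had its chance

        System.out.printf("first pass:  %d ms%n", firstPass / 1_000_000);
        System.out.printf("second pass: %d ms%n", secondPass / 1_000_000);
    }
}
```

If the two passes come out close, the millions of calls in the real benchmark are probably swamping the warm-up cost already. If the first pass is much slower, the reported Java numbers include a noticeable share of interpreter and compilation time. For anything more rigorous than this, a harness such as JMH handles warm-up iterations and dead-code elimination for you.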