The students "measured noticeable reductions in terms of memory footprint of up to 43%" [1] in some preliminary experiments. More from the accompanying blog post:
"We also hope that the Minecraft community builds on our work and helps benchmark different configurations for native Minecraft servers in more detail and in larger settings."
Please feel free to share any numbers on CPU/memory usage with us!
Note that memory usage _could_ potentially be improved significantly on the JVM just by using an alternative allocator, such as jemalloc. In our system we saw native memory usage decrease by about 60% in some instances, and it also resolved a slow "leak" we had been seeing, since glibc was allocating memory and not returning it to the OS. In our case it was because we were opening a lot of class loaders, and hence zip files, from different threads.
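For anyone who wants to try this: swapping in jemalloc doesn't require rebuilding anything, since it can be preloaded at launch. A minimal sketch, assuming a Debian/Ubuntu-style library path and a placeholder jar name:

```shell
# Sketch: preload jemalloc so it replaces glibc malloc for the whole
# process. The .so path varies by distro; it and the jar name here are
# placeholders.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 \
    java -jar server.jar
```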
I can second what you wrote about jemalloc. Some internal services at Amazon are using it with solid outcomes. I also recommend trying out the 5.3.0 release from earlier this year.
Last time I did benchmarking, the vast majority of memory allocations were strings that typically became unreachable right away and were cleaned up in the GEN1 GC. I had contemplated whether string pooling would be useful but never got around to it. It would be interesting to see whether you could get reduced memory usage, and potentially better performance, by decreasing pressure on the GC during the GEN1 phase.
(Side note: this was when I was co-maintaining MCPC so was typically with mods installed and they heavily use NBT which I suspect is where a lot of that string allocation was happening.)
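If someone wants to experiment with the string-pooling idea, below is a minimal sketch of a manual interning table (the class name and the NBT-key framing are hypothetical, not from any real server code); `String.intern()` would also work, but goes through the JVM-wide string table:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: deduplicate repeated short strings (e.g. NBT tag
// names) so each distinct key is held in memory once instead of once per
// occurrence.
public class StringPool {
    private static final Map<String, String> POOL = new ConcurrentHashMap<>();

    // Returns the canonical instance for s, registering it if unseen.
    public static String intern(String s) {
        String existing = POOL.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        String a = new String("Inventory"); // two distinct instances
        String b = new String("Inventory");
        System.out.println(a == b);                 // prints false
        System.out.println(intern(a) == intern(b)); // prints true
    }
}
```

Whether this actually wins anything depends on how long the duplicated strings live; if they die in the young generation anyway, the pool's own footprint and lookup cost could cancel out the savings.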
This is very interesting. Could you share more details on this particular issue in glibc? Jar files get mapped so I'm really interested where glibc failed to release memory.
Not the OP, but we had a similar issue: our service was leaking when allocating native memory through JNI. We onboarded jemalloc for its better debugging capabilities, but the leak disappeared and performance improved. We never got around to root-causing the original leak.
For performance reasons, glibc may not return freed memory to the OS. You can increase the incentive for it to do so by reducing MALLOC_ARENA_MAX to 2.
https://github.com/prestodb/presto/issues/8993
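For reference, a minimal sketch of applying the tunable mentioned above (jar name is a placeholder):

```shell
# Sketch: cap the number of glibc malloc arenas. Fewer arenas means less
# fragmentation across threads, so freed memory is more likely to be
# consolidated and returned to the OS. Jar name is a placeholder.
export MALLOC_ARENA_MAX=2
java -jar server.jar
```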
Why is this? I thought the JVM already did somewhat decent JIT compilation ...
If I understand the article correctly, you're preempting all possibly unoptimized/expensive code paths (reflection) by attempting to literally execute all of them? While it's a cool experiment, isn't it a bit error-prone (besides being a lot of effort of course, but playing Minecraft on the side does sound pretty fun!)?
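For context, this is roughly what GraalVM's tracing agent automates: you run the workload once under the agent, it records which reflection/JNI/resource accesses actually occur, and native-image consumes the resulting JSON configs. A sketch, with placeholder paths and jar name:

```shell
# Sketch: run the workload once under GraalVM's tracing agent, which
# records the reflection/JNI/resource accesses that actually happen and
# writes JSON configs for native-image. Paths and jar name are placeholders.
java -agentlib:native-image-agent=config-output-dir=META-INF/native-image \
     -jar server.jar

# Then build the native executable with the recorded configuration.
native-image -jar server.jar
```

The error-prone part the comment alludes to is coverage: any reflective path not exercised during the agent run is simply missing from the config.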
The JVM is likely to beat AOT-compiled Java code in almost all cases - but because Graal makes a closed-world assumption (no unknown class can be loaded, so a non-final class knows it won't be overridden, allowing for better optimizations; limited reflection allows for storing less metadata on classes; etc.), it does allow for significant memory reduction. Escape analysis is also easier to do offline.
JIT compilation requires additional CPU and memory resources at run-time, which AOT compilation can avoid. This also means that for a native executable, the compilation work only needs to be done once at build-time and not per process.
This is the first time I've seen someone bring up extra CPU and memory usage as a downside of JIT. It might matter in the embedded world, but it's Java we're talking about, so the cost is minuscule compared to what you're getting for it.
Well, it does make sense - a controlled runtime failure is much better than a segfault or, worse, a silent failure corrupting the heap. Pair that with decent performance even back then, increased developer productivity, and the best observability tools, which are again helped by the VM semantics.
Those are usually pretty trivial as they are judiciously handed out based on hot code paths by the JVM.
There are certainly pathological cases where it could cause major issues.
AOT suffers from not having runtime information, so anything involving dynamic dispatch (which is REALLY heavily used in java) will be a lot harder to optimize. JITs get to cheat because they know that the `void foo(Collection bar)` method is always or usually called with an `ArrayList`. PGO is the AOT world's answer to this problem, but it generally explodes build times and requires real world usage.
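A minimal sketch of the kind of call site being described (all names hypothetical); the point is that `foo` is declared against the interface, so only runtime profile data can reveal that it is effectively monomorphic:

```java
import java.util.ArrayList;
import java.util.Collection;

// Sketch of the call site described above. foo() is declared against the
// Collection interface, so statically every call is a virtual dispatch.
// A JIT whose type profile shows the argument is (almost) always an
// ArrayList can speculatively inline the ArrayList iterator behind a cheap
// class check; an AOT compiler without PGO has to emit the virtual call.
public class DispatchExample {
    static long sum;

    static void foo(Collection<Integer> bar) {
        for (int x : bar) sum += x;
    }

    public static void main(String[] args) {
        Collection<Integer> list = new ArrayList<>();
        for (int i = 0; i < 1000; i++) list.add(i);
        // Monomorphic in practice: foo only ever sees an ArrayList here.
        for (int i = 0; i < 10_000; i++) foo(list);
        System.out.println(sum); // prints 4995000000
    }
}
```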
In java land, there's also the option of "AppCDS" which can cut down a large portion of that compilation time between processes.
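For anyone unfamiliar, dynamic AppCDS (JDK 13+) works by taking a training run and then mapping the resulting class archive into later processes. A sketch with placeholder names:

```shell
# Sketch of dynamic AppCDS (JDK 13+): the first run writes its loaded
# classes to an archive on exit; later runs map that archive, skipping
# class parsing and verification. Jar and archive names are placeholders.
java -XX:ArchiveClassesAtExit=server.jsa -jar server.jar   # training run
java -XX:SharedArchiveFile=server.jsa -jar server.jar      # subsequent runs
```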
GraalVM does have a better optimizer than C2 in the vanilla JDK under certain conditions, which can lead to better performance. Basically, the only way to know whether GraalVM will give you better performance is to try it and benchmark your code.
[1] https://medium.com/graalvm/native-minecraft-servers-with-gra...