The students "measured noticeable reductions in terms of memory footprint of up to 43%" [1] in some preliminary experiments. More from the accompanying blog post:
"We also hope that the Minecraft community builds on our work and helps benchmark different configurations for native Minecraft servers in more detail and in larger settings."
Please feel free to share any numbers on CPU/memory usage with us!
Note that memory usage _could_ potentially be improved significantly on the JVM just by using an alternative allocator, such as jemalloc. In our system we saw native memory usage decrease by about 60% in some instances, and it also resolved a slow "leak" we had been seeing, since glibc was allocating memory and not returning it to the OS. In our case it was because we were opening a lot of class loaders, and hence zip files, from different threads.
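For anyone who wants to try this: swapping in jemalloc doesn't require rebuilding anything, since it can be preloaded at launch. A minimal sketch, assuming a Debian/Ubuntu-style library path and a placeholder jar name:

```shell
# Sketch: preload jemalloc so it replaces glibc malloc for the whole
# process. The .so path varies by distro; it and the jar name here are
# placeholders.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 \
    java -jar server.jar
```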
I can second what you wrote about jemalloc. Some internal services at Amazon are using it with solid outcomes. I also recommend trying out the 5.3.0 release from earlier this year.
Last time I did benchmarking, the vast majority of memory allocations were strings that typically became unreachable right away and were cleaned up in the GEN1 GC. I had contemplated whether string pooling would be useful but never got around to it. It would be interesting to see whether you could get reduced memory usage, and potentially better performance, by decreasing pressure on the GC during the GEN1 phase.
(Side note: this was when I was co-maintaining MCPC so was typically with mods installed and they heavily use NBT which I suspect is where a lot of that string allocation was happening.)
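If someone wants to experiment with the string-pooling idea, below is a minimal sketch of a manual interning table (the class name and the NBT-key framing are hypothetical, not from any real server code); `String.intern()` would also work, but goes through the JVM-wide string table:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: deduplicate repeated short strings (e.g. NBT tag
// names) so each distinct key is held in memory once instead of once per
// occurrence.
public class StringPool {
    private static final Map<String, String> POOL = new ConcurrentHashMap<>();

    // Returns the canonical instance for s, registering it if unseen.
    public static String intern(String s) {
        String existing = POOL.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        String a = new String("Inventory"); // two distinct instances
        String b = new String("Inventory");
        System.out.println(a == b);                 // prints false
        System.out.println(intern(a) == intern(b)); // prints true
    }
}
```

Whether this actually wins anything depends on how long the duplicated strings live; if they die in the young generation anyway, the pool's own footprint and lookup cost could cancel out the savings.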
This is very interesting. Could you share more details on this particular issue in glibc? Jar files get mapped so I'm really interested where glibc failed to release memory.
Not the OP, but we had a similar issue: our service was leaking when allocating native memory through JNI. We onboarded jemalloc for its better debugging capabilities, but the leak disappeared and performance improved. We never got around to root-causing the original leak.
For performance reasons, glibc may not return freed memory to the OS. You can increase the incentive for it to do so by reducing MALLOC_ARENA_MAX to 2.
https://github.com/prestodb/presto/issues/8993
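For reference, a minimal sketch of applying the tunable mentioned above (jar name is a placeholder):

```shell
# Sketch: cap the number of glibc malloc arenas. Fewer arenas means less
# fragmentation across threads, so freed memory is more likely to be
# consolidated and returned to the OS. Jar name is a placeholder.
export MALLOC_ARENA_MAX=2
java -jar server.jar
```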
Why is this? I thought the JVM already did somewhat decent JIT compilation ...
If I understand the article correctly, you're preempting all possibly unoptimized/expensive code paths (reflection) by attempting to literally execute all of them? While it's a cool experiment, isn't it a bit error-prone (besides being a lot of effort of course, but playing Minecraft on the side does sound pretty fun!)?
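For context, this is roughly what GraalVM's tracing agent automates: you run the workload once under the agent, it records which reflection/JNI/resource accesses actually occur, and native-image consumes the resulting JSON configs. A sketch, with placeholder paths and jar name:

```shell
# Sketch: run the workload once under GraalVM's tracing agent, which
# records the reflection/JNI/resource accesses that actually happen and
# writes JSON configs for native-image. Paths and jar name are placeholders.
java -agentlib:native-image-agent=config-output-dir=META-INF/native-image \
     -jar server.jar

# Then build the native executable with the recorded configuration.
native-image -jar server.jar
```

The error-prone part the comment alludes to is coverage: any reflective path not exercised during the agent run is simply missing from the config.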
The JVM is likely to beat AOT-compiled Java code in almost all cases - but because Graal makes a closed-world assumption (no unknown class can be loaded, so a non-final class knows it won't be overridden, allowing for better optimizations; limited reflection allows for storing less metadata on classes; etc.), it does allow for significant memory reduction. Escape analysis is also easier to do offline.
JIT compilation requires additional CPU and memory resources at run-time, which AOT compilation can avoid. This also means that for a native executable, the compilation work only needs to be done once at build-time and not per process.
This is the first time I've seen someone bring up extra CPU and memory usage as a downside of JIT. It might matter in the embedded world, but it's Java we're talking about, so the cost is minuscule compared to what you're getting for it.
Well, it does make sense - a controlled runtime failure is much better than a segfault or, worse, a silent failure corrupting the heap. Pair that with decent performance even back then, increased developer productivity, and the best observability tools, which are again helped by the VM semantics.
Those are usually pretty trivial as they are judiciously handed out based on hot code paths by the JVM.
There are certainly pathological cases where it could cause major issues.
AOT suffers from not having runtime information, so anything involving dynamic dispatch (which is REALLY heavily used in java) will be a lot harder to optimize. JITs get to cheat because they know that the `void foo(Collection bar)` method is always or usually called with an `ArrayList`. PGO is the AOT world's answer to this problem, but it generally explodes build times and requires real world usage.
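A minimal sketch of the kind of call site being described (all names hypothetical); the point is that `foo` is declared against the interface, so only runtime profile data can reveal that it is effectively monomorphic:

```java
import java.util.ArrayList;
import java.util.Collection;

// Sketch of the call site described above. foo() is declared against the
// Collection interface, so statically every call is a virtual dispatch.
// A JIT whose type profile shows the argument is (almost) always an
// ArrayList can speculatively inline the ArrayList iterator behind a cheap
// class check; an AOT compiler without PGO has to emit the virtual call.
public class DispatchExample {
    static long sum;

    static void foo(Collection<Integer> bar) {
        for (int x : bar) sum += x;
    }

    public static void main(String[] args) {
        Collection<Integer> list = new ArrayList<>();
        for (int i = 0; i < 1000; i++) list.add(i);
        // Monomorphic in practice: foo only ever sees an ArrayList here.
        for (int i = 0; i < 10_000; i++) foo(list);
        System.out.println(sum); // prints 4995000000
    }
}
```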
In java land, there's also the option of "AppCDS" which can cut down a large portion of that compilation time between processes.
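For anyone unfamiliar, dynamic AppCDS (JDK 13+) works by taking a training run and then mapping the resulting class archive into later processes. A sketch with placeholder names:

```shell
# Sketch of dynamic AppCDS (JDK 13+): the first run writes its loaded
# classes to an archive on exit; later runs map that archive, skipping
# class parsing and verification. Jar and archive names are placeholders.
java -XX:ArchiveClassesAtExit=server.jsa -jar server.jar   # training run
java -XX:SharedArchiveFile=server.jsa -jar server.jar      # subsequent runs
```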
GraalVM does have a better optimizer than C2 in the vanilla JDK under certain conditions, which can lead to better performance. Basically, the only way to know whether GraalVM will give you better performance is to try it and benchmark your code.
[1] https://medium.com/graalvm/native-minecraft-servers-with-gra...