The students "measured noticeable reductions in terms of memory footprint of up to 43%" [1] in some preliminary experiments. More from the accompanying blog post:
"We also hope that the Minecraft community builds on our work and helps benchmark different configurations for native Minecraft servers in more detail and in larger settings."
Please feel free to share any numbers on CPU/memory usage with us!
Note that memory usage _could_ potentially be significantly improved for the JVM just by using an alternative allocator, such as jemalloc. In our system we saw native memory usage decrease by about 60% in some instances, and it also resolved a slow "leak" we were seeing: glibc was allocating memory and not returning it to the OS. In our case it was because we were opening a lot of class loaders, and hence zip files, from different threads.
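For anyone who wants to try it: swapping the allocator needs no code changes, just preloading the library before the JVM starts. A minimal sketch, assuming a Debian/Ubuntu-style path for libjemalloc and a generic server.jar (adjust both for your setup):

    export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2   # path varies by distro
    java -Xmx4G -jar server.jar nogui                              # launch the JVM as usual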
I can second what you wrote about jemalloc. Some internal services at Amazon are using it with solid outcomes. I also recommend trying out the 5.3.0 version released earlier this year.
Last I did benchmarking, the vast majority of memory allocations were strings that typically became unreachable right away and were cleaned up in the GEN1 GC. I had contemplated whether string pooling would be useful or not but never got around to it. Would be interesting to see if you could get reduced memory usage and potentially better performance by decreasing pressure on the GC during the GEN1 phase.
(Side note: this was when I was co-maintaining MCPC, so it was typically with mods installed, and they heavily use NBT, which I suspect is where a lot of that string allocation was happening.)
This is very interesting. Could you share more details on this particular issue in glibc? Jar files get mapped so I'm really interested where glibc failed to release memory.
Not the OP, but we had a similar issue: our service was leaking when allocating native memory using JNI. We onboarded jemalloc since it has better debugging capabilities, but the leak disappeared and performance improved. We never got around to root-causing the original leak.
For performance reasons, glibc may not return freed memory to the OS. You can increase the incentive for it to do so by reducing MALLOC_ARENA_MAX to 2.
https://github.com/prestodb/presto/issues/8993
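In practice that's just an environment variable set before the JVM starts; 2 is a common low setting rather than a universal recommendation, and server.jar is a stand-in for whatever you run:

    export MALLOC_ARENA_MAX=2   # cap the number of glibc malloc arenas
    java -jar server.jar nogui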
Why is this? I thought the JVM already did somewhat decent JIT compilation ...
If I understand the article correctly, you're preempting all possibly unoptimized/expensive code paths (reflection) by attempting to literally execute all of them? While it's a cool experiment, isn't it a bit error-prone (besides being a lot of effort of course, but playing Minecraft on the side does sound pretty fun!)?
The JVM is likely to beat AOT-compiled Java code in almost all cases, but because Graal makes a closed-world assumption (e.g. no unknown class can be loaded, so a non-final class is known not to be overridden, allowing better optimizations; limited reflection allows storing less metadata on classes; etc.) it does allow for significant memory reduction. Also, escape analysis is easier to do offline.
JIT compilation requires additional CPU and memory resources at run-time, which AOT compilation can avoid. This also means that for a native executable, the compilation work only needs to be done once at build-time and not per process.
This is the first time I've seen someone bring up extra CPU and memory usage as a downside of JIT. It might matter in the embedded world, but it's Java we're talking about, so the cost is minuscule compared to what you're getting for it.
Well, it does make sense: a controlled runtime failure is much better than a segfault, or worse, a silent failure corrupting the heap. Pair that with decent performance even back then, increased developer productivity, and the best observability tools, which are again helped by the VM semantics.
Those costs are usually pretty trivial, as the JVM spends them judiciously on hot code paths.
There are certainly pathological cases where it could cause major issues.
AOT suffers from not having runtime information, so anything involving dynamic dispatch (which is REALLY heavily used in java) will be a lot harder to optimize. JITs get to cheat because they know that the `void foo(Collection bar)` method is always or usually called with an `ArrayList`. PGO is the AOT world's answer to this problem, but it generally explodes build times and requires real world usage.
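For what it's worth, GraalVM's native image has exactly this kind of PGO loop (in the Enterprise edition). A rough sketch from memory of the docs, with app.jar and the workload as placeholders:

    native-image --pgo-instrument -jar app.jar     # build an instrumented binary
    ./app                                          # run a representative workload; writes default.iprof
    native-image --pgo=default.iprof -jar app.jar  # rebuild, optimizing for the recorded profile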
In java land, there's also the option of "AppCDS" which can cut down a large portion of that compilation time between processes.
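For reference, on JDK 13+ the simplest AppCDS flow is the dynamic archive; server.jar and the archive name are just examples:

    java -XX:ArchiveClassesAtExit=app-cds.jsa -jar server.jar nogui   # first run: dump loaded classes on exit
    java -XX:SharedArchiveFile=app-cds.jsa -jar server.jar nogui      # later runs: map the archive instead of re-loading and verifying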
GraalVM does have a better optimizer than C2 in the vanilla JDK under certain conditions, which can lead to better performance. Basically the only way to know whether GraalVM will give you better performance is to try it and/or benchmark your code.
It is much worse than this, because the free version of GraalVM only supports the serial garbage collector. Minecraft servers and clients should be using ZGC to get rid of garbage collection pauses.
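On a stock JDK that is a single flag (ZGC is production-ready since JDK 15; older JDKs also need -XX:+UnlockExperimentalVMOptions), e.g.:

    java -XX:+UseZGC -Xmx8G -jar server.jar nogui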
Yes it is. Developing any short-lived job in Java -- one that runs for a few seconds and goes away, like a Lambda or k8s job -- is meaningless for exactly this reason. The startup time is longer than the run time.
I guess it can be in a specific case: minigames servers (such as Hypixel), which are just a bunch of servers "connected" together. Players start into a "lobby" server, where they can choose a minigame, and are then sent to another server where they spend a few minutes.
The game servers don't restart after the end of a round, though, do they? I'd imagine they kick the players back to the lobby, reset the in-server game, and then tell the lobby to send the next batch of players.
You assume that load is constant; it isn't. And load varies not only with the number of players on a minigame server, but also with changes in the distribution of players between minigames.
There's usually more than one server per minigame. You could see it in the url you were redirected to; they had more servers running the more popular minigames.
Each minigame has a player limit, so the maximum load on any given minigame server is known (within the bounds of the minecraft sub-superset that makes up that minigame- but usually the minigames are deliberately limited/bounded in how much computation they need, as opposed to vanilla Minecraft). Extra players get sent to the next available server. If there's consistent overflow, at that point you might turn on a whole new server, or change a server's gamemode (I don't know to what degree Hypixel actually did/does this, or how often it's actually necessary).
The VM starts up plenty fast. The slow part is when people use reflective dependency injection containers that take seconds to scan the class path before executing.
I see you aren't familiar with the modern state of Minecraft servers. Due to Minecraft being limited to one core, big servers actually aren't a single instance. They use proxy servers (such as BungeeCord and its forks) which distribute load between several lobby servers, and from there people join one of the custom game modes (Skyblock, Bedwars, etc). This allows tens of thousands of people to play simultaneously, though not in a single world, while SMP (Survival Multiplayer) servers can run a couple of hundred at most. These giant servers are heavily containerized and automatically scale under load, so spinning up and shutting down servers is a pretty normal thing. And there have been some attempts to make Minecraft run a single world on multiple instances (MultiPaper and some private ones), so even for a usual SMP server it could become commonplace soon, as players join and leave.
> And there have been some attempts to make Minecraft run a single world on multiple instances (MultiPaper and some private ones)
First time I hear about MultiPaper, another idea I had that I didn't know someone was already working on, LOL. It's a pretty promising idea considering the current performance problems of the game. This could possibly allow thousands of players on the same server, which would be AMAZING, almost a completely different game. Imagine if MultiPaper was compiled to native using GraalVM.
I don't think this repo provides any value. It compiles only the vanilla server and doesn't provide any benchmarks, while spending a whole paragraph promoting GraalVM Enterprise and the Oracle Cloud Free Tier (the single worst cloud experience I've ever had; it took me two dozen attempts to register before I finally gave up).
"Cuberite is a Minecraft-compatible multiplayer game server that is written in C++ and designed to be efficient with memory and CPU"
Cuberite has been demoed running on old ARM Android phones, hosting multiple players at once. Its performance absolutely annihilates the Java-based 'vanilla' server.
Can the same trick be used with the Java client? My son runs Minetest on the Raspberry Pi 400 as Minecraft is too slow. I'd do anything for a bit more fps.
Check out the Sodium mod [1], if you haven't already. I've had great success eking out a few more precious frames with it on older hardware. IIRC, it works on both x86 and ARM processors.
Native compilation usually makes things a little slower, not faster. Using the closed-source Enterprise version and PGO currently gets it back to around the same speed as the VM version, I believe.
Also, there are mods for the Java server which allow both Java and Bedrock clients to connect to the same server and play together. I don't know the details, but I have played in a server which used these mods.
> Also, there are mods for the Java server which allow both Java and Bedrock clients to connect to the same server and play together.
This is correct. I am running a vanilla SMP for my son, and he plays primarily on the Switch. I use a Java server running Fabric and Geyser/Floodgate in order to allow his Switch to connect to the server. Everything runs smoothly, so far.
> Also, there are mods for the Java server which allow both Java and Bedrock clients to connect to the same server and play together.
How exactly does that work? Afaik there are quite a few behavioral differences between the two, especially for technical things like redstone and pistons.
Most of these behavioral differences are in the server. So what happens is that it behaves as if you were playing the Java edition, even when using a Bedrock client.
This is misleading: vanilla Bedrock edition allows for a bigger render distance but has a much smaller simulation distance. There's a whole myriad of differences; they're not at all at feature parity.
Hit Shift + F3 to see a frame time breakdown; then you can determine whether it's the graphics or the CPU that's slow. If it's the CPU, maybe Graal helps, but it's hard to tell upfront.
Also check out some mods dedicated to improving performance like Sodium.
I don't think the student looked into that at all, but I guess it depends on what the Java client uses for drawing. GraalVM Native Image currently doesn't support AWT on Linux/JDK17+, but we are working on fixing that soon.
I've always had some questions about GraalVM, so I'd like to hijack this thread; forgive the off-topic comment, please!
I've got a number of Spring web applications from which I create an uberjar (a jar file with all dependencies) and run them on a CentOS server using something like java -jar server.jar (it's a little more complex than this, but you get the idea).
Would I be able to use graalvm to create native binaries from these jars? Is there some kind of tutorial describing the procedure?
Is this possible without a license/paying big money?
Finally, is this worth it? Will the apps become any faster?
Spring Boot 3 is expected to support native/graal. There is a milestone release I think.
There is a Graal Community Edition, which is free.
Search for Graal and the Spring Pet Clinic demo; you will likely find an article reducing startup time 100x (starting Pet Clinic in 15 ms) and reducing memory 2-3x.
I don't know about 'faster', but in my experience most spring applications are RAM bound, not CPU bound. So the native binaries can result in scaling back to smaller and cheaper cloud instances, or smaller VMs. Imagine halving your monthly cloud instance bill, if you are looking for 'worth it'.
If you want to play with a framework where the native part works pretty okay, and still be able to use your dependency injection and dependencies, have a look at Quarkus. They even have some Spring 'polyfills'.
I think right now this isn't possible with "normal" Spring, because Spring and various other libraries you'll normally use make heavy use of reflection.
Frameworks like Quarkus and Micronaut have been written with native in mind and I think Spring is also working on it (Spring Native).
You would likely not be able to turn them into native binaries without a ton of work: Spring uses reflection very heavily, so you would have to list every class that gets reflectively accessed (including Spring internals).
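GraalVM ships a tracing agent that can generate most of that reflection/resource config for you by watching a normal run. A rough sketch (app.jar and the directory name are placeholders, and you still need to exercise every relevant code path while the agent is attached):

    java -agentlib:native-image-agent=config-output-dir=native-config -jar app.jar
    native-image -H:ConfigurationFileDirectories=native-config -jar app.jar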
There is Spring Native, which will solve it for the most part, but I'm not sure how hard it is to move an existing Spring web app to it.
GraalVM has a community edition, which is free, I’m not sure about the license.
And it is likely not worth it: performance will likely be worse, but memory usage and startup time will decrease. It can be worth it for command line apps or some tiny microservice that is mostly idle.
Thank you for the information! Doing some research myself I found some things about the license and integration with spring here: https://www.graalvm.org/faq/ ; it seems that no license is needed for graalvm!
Is GraalVM a silver bullet?
Ignoring startup times, will GraalVM outperform the classic JVM (IBM/Oracle, etc.)?
I guess the optimizations of the classic JVM are hard to beat.
Also, cross-compilation does not work with GraalVM (which makes it harder to deploy than a good old jar file).
Startup times (especially for 'on demand' cloud workloads) are kind of the point of GraalVM. Effectively, it shifts optimisation to the compile phase. GraalVM builds take much more time than classic Java, but they run a bit faster (on some workloads dramatically) and use less memory.
It's no silver bullet for development; if you want fast turnaround after changing your code, you want the classic JVM. GraalVM can help cut your production load a bit (although Oracle seems to reserve the big performance gains for their licensed GraalVM Enterprise customers).
> But they run a bit faster (on some workloads dramatically)
That's not true. For the majority of applications the JIT compiler will be much faster (either Graal's JIT compiler or Hotspot). Startup time and memory reduction do hold for AOT, though.
GraalVM is multiple projects and I feel there is often a bit of a mix-up around these:
GraalVM is first and foremost a JIT compiler written in Java that can be plugged into OpenJDK. Because it is written in a higher-level language than the original Hotspot compilers (which are written in C++), it is easier to write, maintain, and experiment with. This mode of operation is used extensively by Twitter, for example, because on their workloads it provides better performance than Hotspot, but the two trade blows in general. It still uses the standard javac compiler, so it is basically just a slightly different JVM implementation.
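Concretely, on a JDK that bundles Graal (the GraalVM distribution itself, or the OpenJDK builds that shipped it as an experimental JIT) it can be enabled with the JVMCI flags; app.jar is a placeholder:

    java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler -jar app.jar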
Since a JIT compiler outputs machine code it can be “easily” modified to do so in an offline setting as well — this is Graal’s AOT/native compilation mode. This will take a long time compared to some other compilers (I don’t exactly know the reason for that, probably Java’s dynamic nature requiring more wide-reaching analysis?), but will have lower memory usage and faster startup speed compared to the traditional execution mode (but rarely better performance).
There is also Truffle, which turns "naive" language interpreters into efficient JIT-compiled runtimes and allows polyglot execution, which is a whole other dimension.
I use GraalVM as my standard non-native JDK (OpenJDK replacement) and I'd say the performance is somewhat better.
There are a lot of unbiased benchmarks you can find online, most of them showing that Graal (both CE and EE, though particularly EE) is more performant than OpenJDK.
You then also have the option to compile to native, or to embed/run code in other languages baked in.
It usually needs a bit longer warmup period in my experience. But for long-running processes it can be ideal; Twitter, for example, has used it in production for quite some time.
Also, not every GC is available, or only in the enterprise version.
Anecdotally I found that recent releases of OpenJDK with Hotspot were a bit faster. Both on my machine and for web services. If you don't need native images or truffle, the huge installation size isn't really justified.
There are multiple benchmarks that show marginal gains using GraalVM CE for big data workloads; it might make sense if you're still stuck on Java 8 or 11. The enterprise edition shows more significant gains.
The whole advantage of GraalVM is startup time, which is important for containers, Lambda jobs etc, because it doesn't have to compile bytecode on startup. It isn't supposed to be faster than regular JVM, which has the advantage of being able to analyze and recompile hotspots.
No need to go for the client; it's working fine on my machine, nearly 60 fps on a 12-core CPU, 32 GB of RAM and an RTX 2080 Ti, with Iris, Sodium, Phosphor and Lithium.
Nearly 60, lol. Also, fps are not the real problem for the client. I had a modpack with 16GB assigned crashing due to OOM errors. Forge is awesome, but modding the hell out of MC requires extreme specs.
So we don't even know if it actually makes things faster? Startup is a non-issue; CPU/memory are, but you need proof for that.
Graal does not support ZGC or Shenandoah so it's hard to say if the G1 version from Graal is up to speed.