> For example, there was minimal use of java.lang.String because ain’t nobody got time for a second allocation of a char[] with associated indirection and GC churn.
An example of why I'd prefer to avoid Java for something like that. Java has a reputation for being slow because of JIT compilation and GC, but I also think a lot of it comes down to all non-primitive types being boxed Objects (boxed = in a separate allocation). This works the garbage collector so much harder, gives poorer locality, and fills RAM/cache with extra per-Object header stuff like mutexes that no one ever uses. (The header is apparently now 8 bytes after a lot of effort to reduce it: https://openjdk.org/jeps/450)
But it has much lesser impact than in, say, Rust, where it's really an allocation (asking kernel for more RAM). Java's Object "allocation" happens in its own heap, which is a big chunk of RAM already pre-allocated from kernel. So in JVM boxing is really cheap, often just a pointer increment. Also oftentimes the wrapper object and its value are located near each other in the same memory page, so we're not adding a RAM access, more like L2 access.
PS: In some cases it can be even faster than Rust or C++, because it pre-allocates larger pages and drops them in chunks within generational GC (e.g. all values "allocated" to process one HTTP request can be GCed immediately after), while C++ is eager to destruct each object immediately. Also, a GC sweep can happen in a separate thread, not bothering the main thread the user is waiting on. One can do the same in Rust and C++ using arenas, of course.
> But it has much lesser impact than in, say, Rust, where it's really an allocation (asking kernel for more RAM). Java's Object "allocation" happens in its own heap, which is a big chunk of RAM already pre-allocated from kernel.
What? No. Rust doesn't ask the kernel for each allocation individually. That'd be insane; besides the crushing system call overhead, the kernel only does allocations in whole pages (4 KiB on x86-64) so it'd be incredibly wasteful of RAM.
Rust does the same thing as virtually every non-GCed language. It uses a memory allocator [1] that does bookkeeping in userspace and asks the kernel for big chunks via sbrk and/or mmap/munmap. Probably not the whole heap as a single chunk of virtual memory as in Java, but much closer to that than to the other extreme of a separate kernel call for each allocation.
[1] by default, just libc's malloc and free, although you can override this, and many people choose jemalloc or mimalloc instead. My high-level description applies equally well to any of the three.
Java, meanwhile, just does a thread-local pointer bump, which is still more efficient and closer to stack allocation.
Of course you can optimize better with Rust/C++/etc, but it is not trivial and you definitely don't get it out of the box for free. My point is that how much overhead Java has is a bit overblown.
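For anyone unfamiliar with the term, here is a toy sketch of what "pointer bump" allocation means. The class and method names are made up for illustration, and this is not how HotSpot is implemented: a real JVM does this per thread (in TLABs) on raw memory with far more machinery. The point is only that handing out a block is a bounds check plus an addition, and that freeing everything is a single reset.

```java
// Toy bump allocator: hands out offsets into one pre-reserved buffer.
// Hypothetical names for illustration; not the JVM's actual implementation.
final class BumpArena {
    private final byte[] buffer; // memory reserved up front, once
    private int top = 0;         // offset of the next free byte

    BumpArena(int capacity) {
        buffer = new byte[capacity];
    }

    /** Returns the offset of a fresh block, or -1 if the arena is full. */
    int allocate(int size) {
        if (size < 0 || size > buffer.length - top) {
            return -1;
        }
        int offset = top;
        top += size; // the "bump"
        return offset;
    }

    /** Frees every block at once by rewinding the pointer. */
    void reset() {
        top = 0;
    }

    int used() {
        return top;
    }

    public static void main(String[] args) {
        BumpArena arena = new BumpArena(64);
        int a = arena.allocate(16);
        int b = arena.allocate(16);
        System.out.println(a + " " + b + " " + arena.used()); // 0 16 32
        arena.reset();
        System.out.println(arena.used()); // 0
    }
}
```

A general-purpose malloc has to do more than this because blocks can be freed individually in any order, which is the trade-off being argued about here.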
Yes, my mistake, thanks for pointing it out, upvoted. I meant asking the memory allocator, not the kernel.
I meant that Java usually already has that memory allocated; the JVM is a memory allocator of its own. It operates within its own heap. One can do that in Rust of course (or more easily in Zig, as it welcomes passing an allocator around), it's just built into the JVM already. The drawback is that it's more complex to release that memory back from the JVM process, so Java apps (not AOT-compiled) usually consume more RAM.
I'm glad that I'm over that phase I had in university where I wanted to write a custom memory allocator for everything because "I understand my usage better". I will admit that it was a good bit of fun though.
Aside from Rust not working like that (as scottlamb said), Rust is faster than Java even if Java has a faster allocator because Rust code usually does much less allocation in the first place.
I don't know if Rust code allocates more or less in general. It really depends on what kind of code you write. Once Rust code reaches the complexity of the Java stacks it's replacing, you get a lot of wrapper objects, locks, and intermediates to cross thread boundaries and to prove soundness to the borrow checker.
I recently encountered an example of someone writing a Rust version of a popular Java library by just taking the Java code, commenting it out, and writing the Rust equivalent almost line for line. The approach works great (no need to reinvent the wheel and you can point to the existing documentation and code samples) but in terms of allocations, you're not going to find many improvements.
There's a type of Java code that looks more like C code than anything else that runs blazing fast with minimal overhead. It's not the type of Java code you'll probably encounter when writing Java applications, but if you use Java as a kind of cross-platform C target, you can get pretty close to Rust (and for some use cases even beat it). Java has a LOT of tricks up its sleeve (pointer compression, dynamic realignment) that Rust can't automatically take advantage of.
Your AbstractFunctorClassFactoryProducer isn't going to be very allocation efficient, but once you start seeing volatile ints all over the place, things quickly become a lot faster.
> So in JVM boxing is really cheap, often just a pointer increment.
Nonsense. Sure, the act of creating a new allocation is cheap, but that's not where the expense lies at all. And of course, to make allocations cheap, you needed to give up something else.
Each allocation needs to be at some point handled by the GC, so the more allocations, the more GC pressure. That then forces you to make your GC generational, which constrains your GC design. You end up with a slower GC no matter what you do.
Moreover, every access to that allocation goes through a pointer, which is basically a guaranteed cache miss.
The classic case of this is iterating over an Array<Integer>. Not only did you have to make n allocations that the GC now has to keep track of, it's not possible to efficiently fetch items of this array from memory at all. You need to first fetch the pointers, then request the pointed-to memory. Even in the best possible case of the pointer living right next to the data it points to, you're still paying the overhead of extra memory taking up space in your L1 cache.
Compare with an array of simple integers. Zero GC overhead, zero pointers, zero memory overhead, trivial to prefetch.
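To make the comparison concrete, here is a minimal sketch using Integer[] as a stand-in for the boxed array described above. The class and method names are made up for illustration. Both loops compute the same sum, but the boxed version has to load a reference and then unbox it for every element, while the primitive version reads the ints straight out of one contiguous block. This only shows the shape of the code, not a benchmark.

```java
// Hypothetical demo class; names are illustrative only.
final class BoxedVsPrimitive {
    /** Sums an array of boxed Integers: each element is a reference to follow. */
    static long sumBoxed(Integer[] values) {
        long total = 0;
        for (Integer v : values) {
            total += v; // load the reference, then unbox the int it points to
        }
        return total;
    }

    /** Sums a primitive array: the ints are stored inline and contiguously. */
    static long sumPrimitive(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1_000;
        Integer[] boxed = new Integer[n];
        int[] primitive = new int[n];
        for (int i = 0; i < n; i++) {
            // Autoboxing calls Integer.valueOf(i); small values come from a
            // cache, larger ones are typically separate heap objects.
            boxed[i] = i;
            primitive[i] = i;
        }
        System.out.println(sumBoxed(boxed) + " " + sumPrimitive(primitive)); // 499500 499500
    }
}
```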
---
This is a case of a horrid approach to design. Making allocation cheap must come at the cost of slowing down some other part of the GC, and it just encourages programmers to allocate more, which puts more and more pressure on the GC, making it even slower. It's a self-defeating approach that is therefore completely boneheaded.
Allocations should be expensive!
With expensive allocations you get breathing room in your GC to make it otherwise faster. Moreover, since programmers are now encouraged not to allocate, the GC pressure is lower, making it even faster. But it gets even better. Optimizing becomes very clear and straightforward - just reduce allocations. This is great because it allows for targeted optimizations - if a particular piece of code is slow, just reduce its allocation rate and you'll get a nice speedup. Very easy to reason about.
That's why code written in C/C++/Go/Rust tends to be so conscious of allocations and any indirections, but Java is full of linked lists, arrays of pointers, allocations everywhere and thousands of layers of indirection.
Cleaning up heaps of garbage can never be faster than not creating the garbage in the first place.
It's not like C++ users don't say the exact same thing about String, though.
The problem is that String really isn't a language primitive.
Different people and programs always have a different notion of what a String should do (Should it be mutable? Should it always be valid UTF-8? Which operations should be O(1), O(n) or O(n log n)? etc.)
> It's not like C++ users don't say the exact same thing about String, though.
If they do, they're wrong, as the two languages are quite different here. In Java, String requires two allocations: your variable is implicitly a pointer to a String allocation, which in turn has a pointer to a char[] allocation. In C++, the std::string itself is a value type. The actual bytes might be inline (short string optimization) or behind a single allocation accessible from the std::string.
Rust's std::string::String is somewhere between: it's a value type but does not have a short string optimization (unless you count the empty string returned by String::new).
> Different people and programs always have a different notion of what a String should do (Should it be mutable? Should it always be valid UTF-8? Which operations should be O(1), O(n) or O(n log n)? etc.)
Sure, there can be call for writing your own String type. But what's unique about Java as compared to say C, C++, Go, Rust, even to some extent C# is that you can't have a class or struct that bundles up the parts of your data structure (in the case of a mutable string, two fields: data pointer/capacity + the used length) without boxing. There's a heavy cost to any non-primitive data type.
> Sure, there can be call for writing your own String type. But what's unique about Java as compared to say C, C++, Go, Rust, even to some extent C# is that …
You also can’t make a first-class string type in most of those languages because “hi” is defined to be of a system-specified class. You can make a different type to store strings but it’ll never be as ergonomic to use.
It’s even worse in JavaScript, where the standard library is written in a different language (usually C++). It’s impossible to write JavaScript code that matches the performance of the built in string primitive. At least outside of specific niche use cases.
Rust has lots of alternate string-like libraries available through cargo that are better optimized for different use cases, like smallstring or ropey. They’re fast, convenient and ergonomic. I imagine C++ is the same.
That’s true, but thanks to GC an allocation also is just a pointer bump in Java, and the two allocations are likely to be close to each other. For short-lived strings, the GC cost is furthermore zero, because only the longer-lived objects need to be tracked and copied with generational GC. So, “heavy cost” is relative.
Also, you can't construct a Java String without copying the data into it, because String is immutable. In other languages like C++ or Rust the string type is mutable, so you don't need an extra copy. Java doesn't even special-case blessed APIs like StringBuilder to avoid the extra copy: StringBuilder has no method that consumes its inner buffer, so you can only create a String from it by copying and leaving the buffer untouched, even though creating multiple Strings from one StringBuilder is not the normal usage.
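A small sketch of the observable consequence, with a made-up class name. Because the builder stays usable after toString(), the String it returned cannot share a buffer that later mutations would change, so each call has to produce an independent snapshot. This shows the semantics only; it does not measure the cost of the copy.

```java
// Hypothetical demo class; names are illustrative only.
final class BuilderCopyDemo {
    /** Returns two Strings taken from the same builder, before and after a mutation. */
    static String[] snapshotThenMutate() {
        StringBuilder sb = new StringBuilder("abc");
        String first = sb.toString();  // snapshot of the builder's current contents
        sb.setCharAt(0, 'X');          // the builder is still live and mutable
        String second = sb.toString(); // a second, independent snapshot
        return new String[] { first, second };
    }

    public static void main(String[] args) {
        String[] result = snapshotThenMutate();
        // The first String is unaffected by the later mutation.
        System.out.println(result[0] + " " + result[1]); // abc Xbc
    }
}
```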
String.intern() doesn't reuse the data per se. It merely gives you a canonical instance with the same value as the String instance you invoke it on (unless it's the first time it is invoked for that value, in which case it returns the same instance you already have in hand). At that point the latter duplicate instance has already been constructed. The only benefit in terms of memory is that the duplicate instance might be garbage-collected earlier if the program stops referencing it and uses the interned instance instead.
Also, String.intern() can be quite slow compared to something like ConcurrentHashMap.putIfAbsent().
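A minimal sketch of both approaches, assuming a hand-rolled pool. InternDemo, POOL and canonical are hypothetical names, not a standard API. It shows that the duplicate instance already exists before intern() is called, and that a ConcurrentHashMap can provide the same canonical-instance behaviour. It says nothing about relative speed, which would need a real benchmark.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; names are illustrative only.
final class InternDemo {
    // A user-managed pool of canonical instances.
    private static final ConcurrentHashMap<String, String> POOL = new ConcurrentHashMap<>();

    /** Returns the pooled instance equal to s, adding s itself if none exists yet. */
    static String canonical(String s) {
        String existing = POOL.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public static void main(String[] args) {
        String literal = "hello";               // string literals are interned
        String duplicate = new String("hello"); // a distinct instance, already constructed

        System.out.println(duplicate == literal);          // false
        System.out.println(duplicate.intern() == literal); // true

        String a = canonical(new String("world"));
        String b = canonical(new String("world"));
        System.out.println(a == b);                        // true
    }
}
```

Unlike the JVM's intern table, a pool like this holds strong references, so entries stay alive until they are removed explicitly.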
I like how dotnet has acknowledged this and provided a whole lot of machinery recently for things like value types on the stack and Span<T> to reduce array copying.
I would have liked that both had acknowledged this to the point of languages like Oberon (all variants predate them), Eiffel, Modula-3, CLU, Cedar,...
While they served as inspiration for the design of both, the AOT approach with automatic memory management, plus language features for low-level coding, really took a while to make it back from them. Even NGEN/unsafe in C# 1.0, while welcome, wasn't that great.
I always think that in an alternate reality, had they had those features already in version 1.0, C and C++ would have lost even more mindshare at the turn of the century, and something like C++11 would not have been as impactful.