Is there anything specific to Rust that this library does which modern C++ can’t match in performance? I’d be very interested to understand if there is.
I had expected that’s true. You just never know if perhaps Rust compilers have some more advanced/modern tricks that can only be accessed easily by writing in Rust without writing assembly directly.
There is a trick in truly exclusive references (marked noalias in LLVM). C++ doesn't even have the lesser form of C restrict pointers. However, a truly performance focused C or C++ library would tweak the code to get the desired optimizations one way or another.
A more nebulous Rust perf thing is ability rely on the compiler to check lifetimes and immutability/exclusivity of pointers. This allows using fine-grained multithreading, even with 3rd party code, without the worry it's going to cause heisenbugs. It allows library APIs to work with temporary complex references that would be footguns otherwise (e.g. prefer string_view instead of string. Don't copy inputs defensively, because it's known they can't be mutated or freed even by a broken caller).
And they're all extremely buggy, to the point where Rust has disabled it and reenabled it and disabled it and so on many times over as bugs are constantly discovered in LLVM because Rust is the only major user of it
Rust is not magic and you can compile both with llvm (clang++).
If you specify that the pointers don’t alias, and don’t use any language sugar that adds overhead on either side, the performance will be very similar.
The Rust implementation even needs to use a few unsafe blocks (to work with UnsafeCells internally) but is mostly safe code. Other than that you can achieve the same in C++.
But I think the real benefit is that you can write the rest of your code in safe Rust.
Unless the rest of your code is already in C++ and you’re interested in this new better disrupter implementation, that’s probably a common situation for people interested in this topic. Any recommendations for those in that situation? perhaps existing C++ implementations already match this idk.