TL;DR: Using dtrace to analyze the performance of AVL Trees in C vs BTreeMaps&HashMaps in Rust. Spoilers: Rust wins every time, and you should rewrite your software in Rust anyway ;).
Note that the data structure is different: AVL trees vs. BTrees. The interesting part for me was that because of modern cache hierarchies, BTrees appear to be superior than AVL trees even for in-memory data structures, while before I'd thought of them as a special-purpose on-disk data structure for databases.
AVL trees were more competitive when memory was more shallow. These days an L1 cache hit is dirt cheap and a RAM fetch is ~200x as expensive, following a pointer creates a data dependency. B-trees give you more L1 cache hits and fewer pointers to follow. Back in the 1990s the cache hits and RAM fetches were much closer to parity and there weren't as many pipeline stages to fill so data dependencies weren't as important. Go back even farther and you have completely uniform memory that could be accessed in a fixed number of cycles no matter what the address.
There are a lot of good books from these eras, and there are a lot of good books that teach with the uniform memory model. Most algorithms classes don't talk about the hierarchy at all (even though it distorts the true asymptotic runtime of many algorithms).
Agreed. It blew my mind to learn that tiling matrix multiplication is more efficient than untiled, but that you can only show tiled is faster in a hierarchical memory model (for tiled, you have arithmetic intensity that scales with size of your fast memory rather than a fixed constant). Yet in uniform memory model both are O(n^3).
Makes total sense. The insight is that the memory acts like the disk used to and the caches act like the memory. BTrees worked well for spinning disk because they'd nicely load a whole array (node) of BTree child pointers in one disk read. But the same works for caches, they'd load the the whole array of into the cache when the first one is accessed.
A silly idea I had, wonder if it would be possibly to auto-adjust the B factor of a BTree (the number of nodes) at runtime based the cache line size.
Auto adjusting would be interesting, even better (though harder) would be to do it experimentally rather than based on cache line size. Cache line size only determines the minimum block size that doesn't waste cache capacity, after all.
Ultimately the point was that Rust discourages intrusive data structures, and not using an intrusive data structure turns out to be a win even though experienced C programmers might believe otherwise. Now, of course, an apples to apples comparison would be nice, but I'm sure that after some experience with Rust one might end up writing similar code in C and the difference might be very small.
The point really is that Rust foists better design choices on the programmer.
Doubly linked lists. Singly linked lists and one-way trees work fine with the single ownership model.
As I've said before, the two places where Rust has problems with expressive power are backlinks and partially initialized arrays. Those are routine cases which are forced to unsafe code.
One of the places you most often need to use backlinks and other unsafe code in Rust is implementing fancy data structures. Unfortunately that happens to be what a lot of people think of as "programming" because they come up a lot in computer science classes... Fortunately most production code seldom needs new fancy data structures; maps, sets, tuples, structs and reference counting take you a very very long way. For less common things like graphs there are usually crates you can use off the shelf.
Backlinks ARE a very interesting point about Rust. Just because you don't find yourself using them all that often doesn't mean it's not interesting to the design and expresivity of the language.
The issue is a direct consequence of Rust's strict aliasing rules, which I feel any programmer would benefit from groking.
I think it's important to note that in the case of most data structures, you only need to do the hard work of verifying your unsafe code once, after which you can package it up in a crate.
I guess many C programmers might be used to reimplementing data structures for their programs, but I don't think it's useful in a majority of cases
Use of libraries may be more common in rust than c because of many possible factors:
1. Rust's standard library is large and contains data structures. People are good at centralizing on one thing if it's the standard one, if there are 10 competing string libraries, people might say "ah well, let's write our own". That also sometime means using one dependency will pull in others (e.g. in C you might want someone's regex library, but it might be incompatible with your string library because it uses a different one. A stdlib helps ensure libraries have a standard set of types to talk about, and C barely has that).
2. Rust's package manager (cargo) and registry (crates.io) are really good and making finding and adding dependencies much easier. If I want to add a random library on github to my rust project, it's 1 line in my Cargo.toml (whether it's on crates or not). In C, I have to figure out if they're using cmake, meson, autotools, or any of a dozen other things, and possibly vendor the code into a third_party folder, and possible spend several hours hacking on my build system to link against it.
3. Rust's community has made an effort to standardize on certain crates and include their use. See the rustlang nursery. The closest to this in C is stuff like the gnulib I think, which is an ancient effort pushing garbage code that suffers from the lack of good dependency management severely. I know of no recent similar efforts for C.
4. Rust has generics and a better type system in general which makes it much easier to implement generally usable libraries.
It's important to remember that C also predates the internet, and many c codebases also predate modern package management. Legacy practices have a habit of propagating themselves.
I think that TL;DR misses lots of nuance; I'd suggest a better TL;DR would be "the features and idioms of different languages can lead to different preferred solutions to the same problems, which means that we should be careful not to unnecessarily restrict our perception of new languages in terms of the familiar idioms of old ones." Or maybe: "performance benchmarks are hard, and even carefully-crafted comparisons can miss the point if they paper over important practical implementation concerns; an apples-to-apples comparison can unintentionally mislead if it precludes the consideration that perhaps what you actually wanted was an orange after all". Importantly, the article explains why Rust is faster on this specific benchmark, but goes out of its way to note that benchmarks should always be interpreted with appropriate caution, and that nobody should come away with the notion that this implies any sort of strict dominance of Rust over C.
As suggested in the introduction, one would think it was called "the relative performance of C and Rust" because this article is a corollary to a previous article and conceived as a response to an inquiry regarding that previous article--specifically, to answer the question "in your given benchmark, what explains the relative performance of C and Rust?" :P The basic answer: "data structures"; the insightful answer: "data structures that were originally chosen to play to the different strengths and weaknesses of their respective implementation languages, giving us food for thought as to how language design can shape the solutions that users of that language will tend to arrive at, and the performance implications thereof".
Just because that would indeed focus on the specific sub-portion of the article that you think is most important, doesn't necessarily make it better than a title that meant to encompass the issues surrounding whole act of benchmarking, specifically between languages that might encourage different idiomatic solutions, and then drills down to what that means in this specific instance.
In other words, what you took away from the article might not have been what other people did, or what the author meant to express. It surely wasn't what I thought was important (but neither was "rust is faster/slower than C").