How about: *Any C or C++ programmer knows not to use linked lists anyway.*

Peaker · on Dec 1, 2013

If you really think that, you've missed CS 101. Or maybe std::list is the only linked list implementation you've seen. In that case, I agree, one should never use std::list.

Linux's list.h is extremely useful, and for a wide variety of circumstances, is the most efficient way to manage your data.

dkersten · on Dec 2, 2013

Ok, I'll elaborate, especially since my view is at odds with your statement "for a wide variety of circumstances".

As I see it, the only use case where linked lists are superior to other types of lists, like, perhaps, ArrayList in java or vector/deque in C++, is if the following conditions are met:

1: you care about ordering - often you don't care about ordering and in that case, there is no need for a linked list because you can achieve O(1) insertion & removal then too: insertion can always be at the end, removal can be a "swap with end element, remove end element" operation.

2: inserting and/or removing from the middle of the list is a common operation, if it is not a common operation, then the added cost of doing so with a vector may still be outweighed by a vectors other advantages

3: you do not require random access - lists do not provide random access and lookup is O(n). At the expense of additional complexity in implementation and more memory overhead, you could reduce lookup to, OTOH, O(log n) by using a skip-list

4: you do not iterate through the list often - if you do, you are likely going to blow the cache and mess up prefetching due to poor cache locality. Iterating through an array-based data structure can be much faster in this case.

I would say that for a list to make sense, you MUST have 1 and 2 and probably should have 3. 4 is optional, but if true, should make you consider if there might not be a more suitable data structure. In my own personal experience, this is rare. In fact, in my own personal experience, usually, code either does not require 1 or requires 1 but not 2 - either way, lists are not the appropriate data structure in those cases.

Basically, the short version is that they have very poor cache locality, an (IMHO) narrow use case where other data structures don't have superior performance and they take up more memory per node than a lot of other types of lists.

You linked to a stackoverflow question in another comment saying that they attribute std::list's flaws to linked lists as a whole. The biggest issue they seemed to mention, though, was cache locality - I fail to see how intrusive linked lists solve this. The only solution would be to preallocate nodes in consecutive memory locations, but you still take a hit as the links take up memory (whereas in array-based lists you do not need to store links for each node) and if you need to insert/delete in the middle (why are you using linked lists if this isn't the case?) then you end up jumping around the preallocated nodes anyway and after a while will lose any cache-friendliness you may have had.

Maybe you can elaborate what you meant?

To end with an appeal to authority ;-) I'll quote tptacek[1]:

C programmers are trained to use linked lists. They are the first variable-length containers most programmers come into contact with and so C programmers tend to be imprinted on them like ducklings. Linked lists are a poor general purpose container.

EDIT: I guess I missed an anti-condition: if you don't care about performance, then use whatever models your problem best, which may well be a linked list (though the same is true for std::list).

[1] https://news.ycombinator.com/item?id=4455676

epsylon · on Dec 2, 2013

To further prove the point about cache locality:

https://www.youtube.com/watch?v=0iWb_qi2-uI&t=44m50s

Peaker · on Dec 2, 2013

This is a strawman. This isn't a useful case for lists.

Since he needs to scan the list to find the position to add/remove to/from.

If he used an intrusive list, the removal from a vector-with-list would be faster in a list already for relatively small N's.

What I learn from this video is that Stroustroup is also misguided about linked lists. He thinks you always need to do a linear search to do useful operations. The whole point of lists is the O(1) operations, not O(N) operations.

Indeed, std::list is the culprit here: "We shape our tools and then our tools shape us". std::list users become unaware that list operations are usable on elements without a linear search first.

epsylon · on Dec 2, 2013

Even O(1) operations on lists can break cache locality. Since you're chaining using pointers, all bets that your cells are nearby are off. With vectors, they always are.

Peaker · on Dec 3, 2013

When you need to memmove your whole vector you're going to thrash a lot more of the cache lines than when you touch 2 neighbor cache lines.

dkersten · on Dec 3, 2013

Assuming that your list elements are in the neighboring cache lines (also how it compares to vectors in real life performance, as opposed to time complexity, really depends on how often your vector has to grow for your workload.. often a reasonable size can be chosen up front to minimize this, for example), but your point is of course valid.

Though I have seen vector-like structure implementations that ocmbine arrays and linked lists so that when you run out of space, it allocates a new array and links it to the end of the old array. I've implemented memory pooling systems like this myself. Works well, because as long as you know which bucket/array the element is in, you have random access and you also have good linear scanning performance, and extending the size doesn't require memmove's.

I'm not saying that linked lists have no uses (my original comment about never was not meant quite serious - never is much too strong a word) - they certainly do - but I am saying that I feel they have much more limited use cases than most people seem to think.

I agree that intrusive linked lists have many advantages that non-intrusive linked lists do not have. The main advantage of intrusive data structures is, as you said, that membership in multiple structures is in one place allowing constant time deletion from all structures.

Replying to your other comment:

Deleting from the middle of the list is an extremely common requirement, when you have objects that need to be enumerable in some order[s] and sometimes deleted.

I don't know what work you do in C or C++ (in other languages, I happily use their lists data types without caring if they used linked lists under the hood, because chances are, if I care about absolute performance, I'd have been using C or C++ to begin with), but in my own work, requiring both order and common insertion/deletion in the middle has been rare - or the elements and lists were so small that linear scan and copy was fast. But I guess it depends on your work, your logic here certainly sounds reasonable to me.

When you need indexing, use a vector. When you need quick add/remove, use lists. When you need both, use both.

Sure, you could use, say, a vector of elements that are themselves nodes in an intrusive linked list. Using a random access data structure to index another data structure is something I've done myself. I suppose I'm mostly arguing against the naive use of linked lists that seems to be rampant, because I think that most uses do not do all the interesting things you mention, in which case, IMHO, my points still stand and linked lists have a narrow use case.

You don't need to jump around. Say you have a request object you found via a hash table. After handling it you decide to destroy it. You now want to delete it from various lists it is in. You can do it on all lists in O(1) for each list. This is relatively cache-local (only need 2 extra random accesses, potential misses, for each list).

See, this is the kind of thing that I was looking for when I asked you to elaborate. You're not really using the list as a convenient growable array replacement (as you say yourself, you don't think of them as containers), you're using them as a secondary structure to maintain order in cases where middle removal or insertion is important (or maybe the other data structures are the secondary ones.. whatever). I think I better understand what you meant all along now and your reasoning makes sense to me, I can definitely see the use of this and agree that the problems of linked lists basically boil down to the following: they're seen as containers.

I think in that case, everything I said is totally true. I was definitely somewhat wrong about intrusive lists in the cases where some combination of these are true: another structure is used when lookup is needed, middle insertion/deletion is common, copying is expensive, order is important, elements are members of multiple lists where membership is commonly modified together.

I regularly have my data objects within a hash table, a list, and a search tree simultaneously and efficiently, with no dynamic allocation to sustain any of that.

This sounds good - I don't think most people really do this though. The problem isn't just std::list (or std:: containers in general). Your original reply to me was "If you really think that, you've missed CS 101" but perhaps you missed CS 101, because in CS 101 they (in mine and in any course notes and data structure books I've read) teach the common unobtrusive-linked-list-container data structure for which my points are completely valid. They're not teaching your use cases, or if they are, its certainly not nearly as common.

EDIT: I wanted to add that all of this really depends on what you are storing to begin with. If you are only storing an integer or two, the overhead of the links will kill you and your cache efficiency. If on the otherhand you are storing lots in each node then a few extra pointers isn't going to do much. Also if you're only storing a few bytes (or a few dozen) copying isn't all that expensive and then whether lists or vectors are faster depends on access patterns like I said in my previous comment. A lot of, eg, my python lists are lists of numbers. In high performance c++ code a lot of my vectors simply contained ids, numbers, points (2d or 3d) or similarly simple data.

Peaker · on Dec 3, 2013

It's nice to be able to have productive discussion on the internet!

We can agree on almost all points, I think.

The CS 101 bit you mention is a good point: intrusiveness matters for asymptotics in CS, and should be taught in academia, not (very small portions of) industry.

By the way, when you mention "a vector whose elements are linked list heads", a much more typical scenario is an array/vector whose elements contain one or more list heads.

You were wondering about the kinds of software I was working on that needs this stuff: high-performance storage controllers. We need to maintain a lot of concurrency, handle a lot of kinds of failures, etc. We often require the same objects (e.g: an ongoing I/O request) to be looked up in various ways, associated with failure domains, timed out, etc. So we want it organized by many different data structures, and we need the O(1) of deleting it from the various structures when an I/O request dies.

We also shun dynamic allocations, aside from relatively rare memory pools for large structures that contain the various allocations we need. Intrusive style allows us so many nice things within this style:

* Avoiding the runtime costs of dynamic allocations

* Avoiding the memory costs of extra indirections incurred by dynamic allocations and STL (non-intrusive) style

* Avoiding handling out-of-memory errors in every single code path: there are almost no allocations anywhere, almost all functions become "void" error-free functions!

* Having optimal asymptotics for our operations

* Having reusable generic data structures in C without templates (we dislike C++)

Compared to all this, the STL stuff is just horrible. STL is widely considered to be a superb library, but I find it horrid.

dkersten · on Dec 4, 2013

It's nice to be able to have productive discussion on the internet!

I don't mind being wrong if it means I learn from it :)

Peaker · on Dec 2, 2013

I disagree. Will reply later, too long mto reply on my phone.

Peaker · on Dec 2, 2013

> 1: you care about ordering - often you don't care about ordering and in that case, there is no need for a linked list because you can achieve O(1) insertion & removal then too: insertion can always be at the end, removal can be a "swap with end element, remove end element" operation.

Whether you care about ordering or not, linked lists work great. Adding to an array is only amortized O(1). It is worst-case O(N). Why pay O(N) for the worst case when you can pay a cheap O(1) always?

> 2: inserting and/or removing from the middle of the list is a common operation, if it is not a common operation, then the added cost of doing so with a vector may still be outweighed by a vectors other advantages

Deleting from the middle of the list is an extremely common requirement, when you have objects that need to be enumerable in some order[s] and sometimes deleted.

> 3: you do not require random access - lists do not provide random access and lookup is O(n). At the expense of additional complexity in implementation and more memory overhead, you could reduce lookup to, OTOH, O(log n) by using a skip-list

Here is a false dichotomy dictated by the STL. With intrusive lists, you can have both your vector and lists of various orders. There is no contradiction.

When you need indexing, use a vector. When you need quick add/remove, use lists. When you need both, use both.

> 4: you do not iterate through the list often - if you do, you are likely going to blow the cache and mess up prefetching due to poor cache locality. Iterating through an array-based data structure can be much faster in this case.

Yes, if you want quick (repeated) enumeration, put it in a vector. Again, this doesn't contradict also putting it in lists.

> The biggest issue they seemed to mention, though, was cache locality - I fail to see how intrusive linked lists solve this.

Cache locality is based on your use pattern. When enumeration isn't your common operation, vectors don't have better locality than lists. For example, a sorted list of requests where you may want to time out the oldest one will only refer to the head and tail of the whole list. Cache locality is great.

> then you end up jumping around the preallocated nodes anyway and after a while will lose any cache-friendliness you may have had.

You don't need to jump around. Say you have a request object you found via a hash table. After handling it you decide to destroy it. You now want to delete it from various lists it is in. You can do it on all lists in O(1) for each list. This is relatively cache-local (only need 2 extra random accesses, potential misses, for each list).

> C programmers are trained to use linked lists. They are the first variable-length containers most programmers come into contact with and so C programmers tend to be imprinted on them like ducklings. Linked lists are a poor general purpose container.

I think the false premise here, espoused by the STL, is that a data structure is a "container" at all. Your data can be "contained" by anything (the stack, the heap, a vector, ...). The data structures involved (linked lists, hash tables, search trees) all do not contain the data. They organize the data.

When using vectors as containers STL-style -- you cannot really have your data organized by multiple data structures efficiently.

I regularly have my data objects within a hash table, a list, and a search tree simultaneously and efficiently, with no dynamic allocation to sustain any of that.

STL cannot do this because of the data-structure as "container" philosophy.

EDIT: Also read my other comment detailing why std::list is lacking the most fundamental properties expected of a linked list: https://news.ycombinator.com/item?id=6830526