
Yes, because the article isn't about the best approaches for concurrent counters, but rather the good, bad and downright ugly ways you can do concurrency, and the associated costs. The counter is used as an illustrative device across all the levels, so of course the really slow levels are going to involve something you don't actually want to do!


I'm not concerned with the counter itself, either. I'm concerned with the implementation of the synchronization at that level. I'm afraid the solution at that level is a strawman. There is a better solution for the "system call" level.

I also don't think it's useful to think of this hierarchy as "good, bad and ugly." Sometimes the problem you're solving will require being at one of these levels, and you can't perform the optimizations or re-architecture required to drop down a level. When that is the case, it's useful to know the best options at that level.


As the article shows, there are basically the reasonable levels (0, 1, and 2), which are all valid approaches and depend on engineering tradeoffs. Levels 3, 4 and 5 are mistakes you want to avoid: so of course you aren't going to agree with the implementation choices at those levels, because they are mistakes! Those types of mistakes occur in practice along these lines (and in other ways), so I think it is useful to show them.

Note that "system call" level doesn't mean "the best level you can implement with a system call", although admittedly perhaps the naming was confusing. Rather, it means "an approach which implies that a system call is made for every interactive" - that is, it is almost always mistake, since can generally avoid this.

The levels are named after the dominant factor in their cost.
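
Roughly, the kind of design I mean looks like this (a hypothetical sketch in the spirit of the upper levels, not necessarily the exact code from the article): a lock that retries via sched_yield(), so a contended increment implies one or more system calls.

    // Deliberately bad: every contended increment can end up making
    // a sched_yield() system call. Illustrative sketch only.
    #include <atomic>
    #include <sched.h>

    std::atomic<bool> locked{false};
    long counter = 0;

    void increment_with_yield_lock() {
        // Spin until the flag is acquired, yielding to the scheduler
        // on each failed attempt: a system call per retry.
        while (locked.exchange(true, std::memory_order_acquire)) {
            sched_yield();
        }
        ++counter;                                      // critical section
        locked.store(false, std::memory_order_release);
    }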


Are you the author?

As I said, sometimes you need to be at those levels for reasons outside of your control. When you are at those levels, it's useful to know the best thing to do at that level. In particular, sched_yield() is a known anti-pattern when using system calls, and there are known better techniques. They're better enough that I think they could have a significant impact on the performance results.
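
For example, instead of a sched_yield() retry loop, you can block until another thread notifies you. A minimal sketch, assuming a C++20 compiler (std::atomic wait/notify, which is typically implemented with futex on Linux):

    #include <atomic>

    std::atomic<int> ready{0};

    void waiter() {
        // Sleeps in the kernel while ready == 0; no busy yielding.
        ready.wait(0, std::memory_order_acquire);
        // ... proceed once the other thread has published its work ...
    }

    void notifier() {
        ready.store(1, std::memory_order_release);
        ready.notify_one();   // wake the waiter
    }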


Yes, I am the author so I'm especially interested in feedback, thus my desire to get to the bottom of this. If it's a misunderstanding I can fix by clarifying something in the article, I'm open to it.

The goal of this post is to define and give a rough measurement for various concurrency performance levels, and as a pedagogical technique I pick one example (a cross-thread counter) and stretch it across all the levels. Specifically, the goal is not to create a "good" counter at all levels, but to pick an example which can plausibly hit all the levels, even though that requires some bad designs at the upper levels.

Not all levels represent good engineering: some are simply dominated. For example, I'd argue that 4 and 5 are dominated pretty much always. For a specific requirement, even more levels will be dominated. E.g., for this job, the simplest implementation, which uses an atomic add to increment the counter, is a good baseline: any slower implementation than this is dominated. You'll never need to be at a higher level for this task.
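
Concretely, that baseline is just something like (sketch):

    #include <atomic>

    std::atomic<long> counter{0};

    void increment() {
        // No locks, no system calls: one atomic RMW on the fast path.
        counter.fetch_add(1, std::memory_order_relaxed);
    }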

So yes, the higher levels "do something dumb" by definition, because if you are making a system call to increment a counter every time, you are doing it wrong. That sched_yield is an anti-pattern [1] is beside the point, because all of the higher levels are deep into anti-pattern territory. That's the only way I could see to make one example work across all the levels (and having a single example was important to me).

If I understand your concern, you would like (or believe the intent is) that the various levels represent a "best possible" implementation within the constraints of the level? I don't think that is possible: as noted above, it's always going to be deeply artificial once you start forcing an atomic counter to make a system call. For example, I could use a SysV semaphore as my concurrency mechanism: this also always makes a system call. It is not violating any "no sched_yield" rule, but I think it is equally artificial.
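
Roughly like this (a sketch, error handling omitted): no sched_yield anywhere, yet every increment still pays for two semop() system calls.

    #include <sys/ipc.h>
    #include <sys/sem.h>

    // On Linux the caller must define union semun themselves.
    union semun { int val; struct semid_ds *buf; unsigned short *array; };

    static int semid;
    static long counter = 0;

    void init_sem() {
        semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
        union semun arg;
        arg.val = 1;                    // binary semaphore, initially free
        semctl(semid, 0, SETVAL, arg);
    }

    void increment_with_sysv_sem() {
        struct sembuf op;
        op.sem_num = 0;
        op.sem_flg = 0;
        op.sem_op = -1;                 // P(): acquire (system call)
        semop(semid, &op, 1);
        ++counter;                      // critical section
        op.sem_op = +1;                 // V(): release (system call)
        semop(semid, &op, 1);
    }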

---

[1] Actually, I don't know that sched_yield is always an anti-pattern, but let's not even go there: for the purposes of this discussion my point stands even if we assume it is always an anti-pattern (you hint at some of the places where it might be useful with your nanosleep comment).


I feel like this discussion is slipping out from under me, as I thought we agreed that the counter itself is a contrived example to illustrate synchronization mechanisms. But your recent response seems predicated on the idea that the problem you're solving is just incrementing a counter.

I looked at it as just some work that needs to be done. But in the general case, not all work can be reduced to a single, or even a few atomic instructions. And sometimes your synchronization is needed not because you have work to do, but because you are waiting for another thread to complete their work so you can proceed. In those cases, you sometimes need system calls and/or condition variables to correctly synchronize with another thread. And when you find yourself in such a situation, sched_yield() is an anti-pattern.
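
For example, the kind of wait I have in mind looks like this (a rough sketch; the names are made up): the waiting thread blocks in the kernel until the worker signals completion, instead of spinning on sched_yield().

    #include <condition_variable>
    #include <mutex>

    std::mutex m;
    std::condition_variable cv;
    bool work_done = false;

    void producer() {
        // ... do the actual work ...
        {
            std::lock_guard<std::mutex> lock(m);
            work_done = true;
        }
        cv.notify_one();   // wake the waiting thread
    }

    void consumer() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return work_done; });
        // Safe to proceed: the other thread's work is visible here.
    }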

And your footnote confuses me: I am not proposing using nanosleep() together with sched_yield(); I am proposing it instead of sched_yield(). What I linked to in my parent comment explains, basically, that sched_yield() does not mean what you think it means, so don't use it.


I agree the conversation seems to be going backwards and it seems like more work than expected to get to a common understanding (with no guarantee of success), so let's just drop it.



