Hacker News
The Little Book of Semaphores (2016) [pdf] (greenteapress.com)
150 points by kbp on Sept 9, 2017 | 33 comments


Semaphores are nice for textbook examples, but should be avoided in practical multithreaded programming in a shared-memory environment. A mutex and condition variables usually provide a cleaner solution that is easier to reason about. In particular, the expression for the condition that's waited on can be arbitrarily complex, not just a single integer like a semaphore's internal state.

A particularly tricky problem is clean program termination when waiting on semaphores. It is much easier to write while(!quit && not_satisfied) cond_wait(lock, is_satisfied); than to do the same with semaphores, where you need to add extra synchronization for this case. With condition variables, every wait loop usually just needs a check of a "quit" flag.
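A minimal sketch of that pattern with pthreads (quit, cond, lock, queue_empty() and take_item() are placeholder names, not from the book):

  #include <pthread.h>

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
  static int quit = 0;

  void consumer_wait(void) {
      pthread_mutex_lock(&lock);
      while (!quit && queue_empty())        /* queue_empty(): placeholder predicate */
          pthread_cond_wait(&cond, &lock);  /* drops the lock while sleeping */
      if (!quit)
          take_item();                      /* placeholder; still holds the lock */
      pthread_mutex_unlock(&lock);
  }

  void request_shutdown(void) {
      pthread_mutex_lock(&lock);
      quit = 1;
      pthread_cond_broadcast(&cond);        /* wake every waiter so it sees quit */
      pthread_mutex_unlock(&lock);
  }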

Semaphores are still useful in other kinds of asynchronous processes such as CPU to GPU synchronization or keeping trains on railways from colliding.

If you encounter a semaphore in multithreaded code that's been written recently, assume it is incorrect. That's been the case in the past half dozen instances where I came across one in the codebase I work on. The most common bug was the program hanging on exit. Replacing them with a mutex and condition variables (plural!) was easy; the solution was much simpler to understand, it was correct, and the programs now terminated cleanly when exit was requested.

Semaphores should really be removed from the OS class curriculum. Maybe leave them in the advanced parallel/concurrent programming course because they're easy to reason about in formal proofs. But in multithreaded programming, they're almost always the wrong tool for the job.


I believe semaphores and condition variables solve different problems. Programs hanging on exit only means the engineer mostly tested the happy path, not that semaphores are evil. I've seen plenty of deadlocks on mutexes at shutdown as well :)


The point is that implementing a semaphore with a mutex and a condition variable is very easy, but the latter are more versatile. A semaphore has only a single integer of internal state; a condition variable can guard any number of shared variables.

I see no reason to use semaphores in multithreaded programming because of this. They have no advantages over a mutex and condition variable but are much more limited.
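For reference, here is roughly what a counting semaphore looks like on top of a mutex and a condition variable (a sketch with made-up names, error handling omitted):

  #include <pthread.h>

  typedef struct {
      pthread_mutex_t m;
      pthread_cond_t  c;
      unsigned        count;               /* the semaphore's single integer of state */
  } my_sem;

  void my_sem_wait(my_sem *s) {
      pthread_mutex_lock(&s->m);
      while (s->count == 0)                /* also guards against spurious wakeups */
          pthread_cond_wait(&s->c, &s->m);
      s->count--;
      pthread_mutex_unlock(&s->m);
  }

  void my_sem_post(my_sem *s) {
      pthread_mutex_lock(&s->m);
      s->count++;
      pthread_cond_signal(&s->c);          /* wake one waiter, if any */
      pthread_mutex_unlock(&s->m);
  }

Once the counter is just another shared variable guarded by the mutex, you can add a quit flag or any other predicate to the same wait loop, which a bare semaphore can't express.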


> The point is that implementing a semaphore with a mutex and a condition variable is very easy, but the latter are more versatile.

Agreed. An interesting exercise is to try the reverse: implementing a condition variable with a semaphore. It's not trivial to do correctly.

I teach OS, and I still cover semaphores, but I always tell the class that I don't think they are very useful. I spend much more time on condition variables.


They are useful, just not in multithreaded programming.

Overall I am quite disappointed in the level of education on parallelism and concurrency now that we're 15 years into the multi-core era, let alone in the years past. I'm constantly having issues with coworkers who are quite clueless about the topic, making futile attempts to take a textbook example and apply it to a real-world problem (usually involving semaphores, or trying to use atomics because "locks are hard").

It's equally frustrating to hear the "don't use locks, do CSP/message passing/concurrent collections/lock-free queues instead" cargo cult. Yes, that works for a class of problems, but you still need locks to implement message passing (no excuse not to learn locking), and getting simple atomicity invariants right is as hard as or harder than with locks.

It takes discipline and hard work to get multithreaded programming right, but there are plenty of supposed silver bullets and lots of snake oil being pushed.


Are there any performance benefits of semaphores vs mutex/condition?


No. Not really.

Programs should be optimized for the uncontended case, where a mutex and a semaphore are pretty similar.

Some textbook toy problems are theoretically faster with semaphores but that's not really the case for practical problems.

The overhead of synchronization primitives isn't really a concern in practice.


There are two uses of synchronization primitives: those that should never have to wait (mutexes) and those that are expected to wait. The canonical example of the latter is a work queue, where worker threads need to wake up as fast as possible.

In that case, using semaphores or other "exotic" synchronization primitives (my favorites are Windows's manual-reset events) can help eliminate contention.

Problems with exiting threads are real. An often sufficient solution is to make the semaphore (or event) waiters "detached" and not care about them, while ensuring that the producers don't exit until their work has been completed.

As with other advanced multithreaded programming techniques, discipline is key, together with noticing, documenting and applying recurrent idioms.


Do you have an example of a case where a semaphore helps with high contention?

I would think that a lock-free atomic solution would win there, assuming the contention is high and consistent. Even if the semaphore is marginally faster than a mutex in the uncontended case, both will call into the kernel in the contended case. That's about 25 ns uncontended and 150 us contended, i.e. you could spin about 6000 times on the atomic operation and still win.

I agree with you that discipline and applying best practices without compromise are the way to solve multithreaded problems. Testing, verification and tooling need to be top-notch, or the result is something that works only most of the time.


Not really contention, but as an example, a common event-semaphore pattern for multiple producers/single consumer is, on the consumer side:

   Test condition (e.g. queue not empty)
   If false,
       Reset event
       Test condition again
       If false,
           Wait on event
The event stays set most of the time, which helps reduce cacheline bounces between the producers even if you don't have contention; so even if you use lock-free atomic operations with optimistic spinning for the busy case, (event) semaphores can still be used when the worker goes to sleep.

The event itself can alternatively be implemented as a mutex/condvar pair, but that's a moot point if you document what you are doing and do it consistently.


Semaphores also lack a notion of ownership, which makes tasks like detecting/debugging deadlocks and remedying priority inversion more difficult.


Indeed. When you have a mutex+condition, you can add instrumentation like logging or timers for debugging, testing and verification purposes.


Reminds me of this bug: https://sourceware.org/bugzilla/show_bug.cgi?id=12674 Now go and trust anybody after that....


Semaphores are perfect for controlling the amount of concurrency. I use them for this all the time.

Sure I could use a mutex and a counter plus comparison and busy wait... but why? As far as I'm concerned, a semaphore is just a more extendable mutex.
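For that use a POSIX semaphore does read nicely; a sketch (MAX_IN_FLIGHT and do_work() are placeholders):

  #include <semaphore.h>

  sem_t slots;
  sem_init(&slots, 0, MAX_IN_FLIGHT);  /* allow at most MAX_IN_FLIGHT workers at once */

  /* in each worker thread */
  sem_wait(&slots);                    /* blocks (sleeping, not spinning) when all slots are taken */
  do_work();
  sem_post(&slots);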


There's no busy waiting involved with condition variables. The while loop is there to guard against spurious wakeups, and it is not a "busy" loop. A semaphore can be implemented using a mutex and a condition variable at pretty much the same cost as a "raw" semaphore. The raw version is theoretically 2x faster (one atomic operation vs. two), but that isn't worth the handicap of being constrained to a single integer of state. If you need two or more semaphores, it's already much worse, and even many textbook examples need that (to guarantee clean exit).

A semaphore is just a crippled condition variable that has only a single integer as its state. It sort of works but isn't very practical in real-world applications.


Locks in general are super slow. If you use a ringbuffer you can avoid them and use an atomic increment instead.


That's a common misconception: obtaining an uncontended mutex (futex) is just a single atomic operation that takes about 25 nanoseconds.

Lock-free programming with atomics is even more difficult than reasoning about code with locks. It is best left to very specialized cases where maximum throughput is necessary and contention is high. Most practical use cases don't benefit from that.

The key is minimizing the time the locks are held. That drives down contention and keeps performance high.
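For example (a sketch, names assumed): touch the shared state under the lock and do the expensive work outside it.

  pthread_mutex_lock(&q->lock);
  void *item = dequeue(q);        /* only shared-state manipulation while holding the lock */
  pthread_mutex_unlock(&q->lock);

  process(item);                  /* the slow part runs without the lock held */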


It's not really the locks that end up being the problem. The problem is the queues. If you use locks you end up shoving data into queues, which kills your performance. Unbounded memory growth is fun whenever your producer makes messages faster than your consumer can pull them out of the queue.

If you just use a disruptor implementation then there's no need to reason about anything. It's good to go and performant by default.


Highly contended queues aren't very common, at least in the kind of programming I do.

If contention is extremely high and throughput is paramount, then a lockless solution might work. E.g. audio processing or network stacks might have this kind of usage pattern.

But in my experience, most queues are empty most of the time. That's when you want a lock, so the consumer thread can sleep instead of busy-waiting.


Not to mention: If you thought $OLD_SOLUTION was hard, then lockless is a whole other level of difficulty.

(This is why I would generally advocate for a message-passing style for almost everything, but obviously that's not quite the context of this thread. My reasoning is that you can always start off with message passing and if you discover that that's your bottleneck, you can always transition to a faster alternative[1]. Plus, you could use a lock-free message passing solution, e.g. Aeron.)

[1] Plus, by using message passing you've already got pretty good separation of logic.


>"That's a common misconception, obtaining an uncontended mutex (futex) is just a single atomic operation that takes about 25 nanoseconds"

A futex is just a specific implementation of a mutex though right? Or am I reading your comment wrong? I am curious where the 25 ns comes from.

Also, isn't what makes a futex so fast the fact that it's done in user space, hence the name?


The 25 ns is from "Latency numbers every programmer should know" [0], and the 2017 edition puts it at 17 ns.

That's essentially the cost of an atomic operation in the CPU interconnect.

A futex is a way to implement a mutex so that the uncontended case is just a spinlock acquire, and the contended case falls back to calling the kernel.

[0] https://people.eecs.berkeley.edu/~rcs/research/interactive_l...


Thanks. Why do you need to spin if there is no contention for the lock, though? That sounds counterintuitive.


Well, you don't. Just like when acquiring an unlocked spinlock. ;-)

Should look something like this:

  for(;;){
    if(atomic_swap(&thefutex, 1) == 0)
      break; // we own the lock now
    // slow path, go to sleep until awoken
    syscall(SYS_FUTEX, FUTEX_WAIT, ...);
  }


Alan is a great contributor to the Python community. He's dedicated countless hours to tutorials and talks. @alan: If you're part of the HN community, as I said once in person: thank you!


Do you mean Allen?


This looks like an updated version of the book that was written initially in C?


This seems like an interesting book but a few points are dangerous, especially if it is used for teaching.

1. Concurrent access to a variable outside of a critical section is described as not being dangerous. This actually can't be decided without knowing which memory model you are considering, and both in the general case and in practical implementations (C++) it is UB (see the sketch at the end of this comment).

2. This sentence is maybe only trying to be funny, but discouraging formal proofs in 2016 can very well end up borderline criminal (depending on the application): "The only alternative is to examine the code carefully and “prove” that it is correct. I put “prove” in quotation marks because I don’t mean, necessarily, that you have to write a formal proof (although there are zealots who encourage such lunacy)."

Especially if this is stated right after a rather simple algorithm.

I absolutely cannot recommend a book on such a critical topic that contains these kinds of defects.
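To make point 1 concrete: even something this small is a data race, and therefore undefined behavior under the C11/C++11 memory models (a made-up minimal example, not from the book):

  static int counter = 0;            /* plain int: no atomic, no mutex */

  void *worker(void *arg) {          /* started twice via pthread_create */
      (void)arg;
      for (int i = 0; i < 1000000; i++)
          counter++;                 /* unsynchronized read-modify-write: data race, UB */
      return NULL;
  }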


> in practical implementations (C++) this is UB.

Huh. I had heard of nasal demons before but I had never seen this abbreviation "UB"

TIL, UB and IB

https://stackoverflow.com/questions/2766731/what-exactly-do-...


> I absolutely cannot recommend a book on such a critical topic that contains these kinds of defects.

I think your definition of formal proofs is a bit different from Alan Downey's (who incidentally has also written an entire book on Complexity and the Philosophy of science[1]). Providing a formal proof for even a 'simple' algorithm is incredibly wasteful in the context of the example in which the statement is made. Perhaps you should read the book before calling out its 'defects'.

[1] http://greenteapress.com/complexity/index.html

More of Alan's books at: http://greenteapress.com/


*Allen


A semaphore is a tool. Like any tool, to use it well you need to apply it correctly and know its limitations (e.g., what guarantees it requires) and your environment (e.g., what the OS / library provides).

IMO the problems in actual software are almost always between the keyboard and the chair; one can write junk using any sync primitives. My 2c.


I never understood what use a semaphore was until I tried my hand at implementing Go-like channels in Swift. It's a very strange primitive, and I don't think many people understand when it should be used.



