Do you have an example of a case where a semaphore helps with high contention?
I would think that a lock-free atomic solution would win there, assuming the contention is high and consistent. Even if the semaphore is marginally faster than a mutex in the uncontended case, both will call into the kernel in the contended case. That's 25 ns uncontended versus 150 µs contended, i.e. you could spin about 6000 times on the atomic operation and still win.
I agree with you that discipline and applying best practices without compromise is the way to solve multithreaded problems. Testing, verification and tooling need to be top notch or the result is something that works most of the time.
Not really contention, but for example, a common event-semaphore pattern for multiple producers and a single consumer looks like this on the consumer side:
Test condition (e.g. queue not empty)
If false,
    Reset event
    Test condition again
    If false,
        Wait on event
The event stays set most of the time, which helps reduce cache-line bouncing between the producers even when there is no contention. So even if you use lock-free atomic operations with optimistic spinning for the busy case, (event) semaphores can still be used for when the worker goes to sleep.
Alternatively, the event itself can be implemented as a mutex/condvar pair, but that's pointless if you document what you are doing and do it consistently.
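For illustration, here is a minimal sketch of the consumer-side steps above, assuming Python's `threading.Event` as the manual-reset event and a `collections.deque` as the queue. The `EventQueue` class and its `put`/`get` methods are hypothetical names chosen for this example, and the sketch only shows the protocol; it says nothing about cache-line behaviour in a native implementation.

```python
import threading
from collections import deque


class EventQueue:
    """Hypothetical multi-producer / single-consumer queue (illustration only)."""

    def __init__(self):
        self._items = deque()            # append/popleft are thread-safe in CPython
        self._event = threading.Event()  # manual-reset event: stays set until cleared

    def put(self, item):
        # Producer side: publish the item first, then signal the event.
        self._items.append(item)
        self._event.set()

    def get(self):
        # Consumer side: must only be called from the single consumer thread.
        while True:
            if self._items:              # test condition
                return self._items.popleft()
            self._event.clear()          # reset event
            if self._items:              # test condition again
                return self._items.popleft()
            self._event.wait()           # sleep until a producer sets the event
```

The second test after the reset is what closes the race: an item published between the first test and the reset is still seen, and an item published after the reset sets the event again, so the wait returns immediately.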