That's a common misconception, obtaining an uncontended mutex (futex) is just a ...

sillysaurus3 · on Sept 9, 2017

It's not really the locks that end up the problem. The problem is the queues. If you use locks you end up shoving data into queues which kills your performance. Unbounded memory growth is fun whenever your producer makes messages faster than your consumer can pull stuff out of the queue.

If you just use a disruptor implementation then there's no need to reason about anything. It's good to go and performant by default.

exDM69 · on Sept 9, 2017

Highly contended queues aren't very common, at least in the kind of programming I do.

If contention is extremely high and throughput is paramount, then a lockless solution might work. E.g. audio processing or network stacks might have this kind of usage pattern.

But in my experience, most queues are empty most of the time. That's when you want a lock so the consumer thread can sleep instead of busy wait.

lomnakkus · on Sept 9, 2017

Not to mention: If you thought $OLD_SOLUTION was hard, then lockless is a whole other level of difficulty.

(This is why I would generally advocate for a message-passing style for almost everything, but obviously that's not quite the context of this thread. My reasoning is that you can always start off with message passing and if you discover that that's your bottleneck, you can always transition to a faster alternative[1]. Plus, you could use a lock-free message passing solution, e.g. Aeron.)

[1] Plus, by using message passing you've already got pretty good separation of logic.

bogomipz · on Sept 9, 2017

>"That's a common misconception, obtaining an uncontended mutex (futex) is just a single atomic operation that takes about 25 nanoseconds"

A futex is just a specific implementation of a mutex though right? Or am I reading your comment wrong? I am curious where the 25 ns comes from.

Also isn't what makes a futex so fast is that it's done in user space - hence the name?

exDM69 · on Sept 9, 2017

The 25 ns is from "Latency numbers every programmer should know" [0], and the 2017 edition puts it at 17 ns.

That's essentially the cost of an atomic operation in the CPU interconnect.

A futex is a way to implement a mutex so that the uncontended case is just a spinlock acquire, and the contended case falls back to calling the kernel.

[0] https://people.eecs.berkeley.edu/~rcs/research/interactive_l...

bogomipz · on Sept 9, 2017

Thanks, why do you need to spin if there is no contention for the lock though? That sounds counterintuitive.

slrz · on Sept 10, 2017

Well, you don't. Just like when acquiring an unlocked spinlock. ;-)

Should look something like this:

  for(;;){
    if(atomic_swap(&thefutex, 1) == 0)
      break; // we own the lock now
    // slow path, go to sleep until awoken
    syscall(SYS_FUTEX, FUTEX_WAIT, ...);
  }