Yeah, the code looks like a spinlock. It behaves terribly under contention: performance falls off a cliff as the number of threads increases, to the point where adding more threads actually reduces total throughput.
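To illustrate what I mean, here is a rough sketch of the pattern. This is not the project's actual code. `NaiveSpinlock`, `BackoffSpinlock` and `count_with_lock` are names I made up for the example. The naive version has every waiting thread doing atomic writes to the same cache line in a tight loop, which is the kind of thing that tends to scale badly. The second version is one common mitigation (spin on a plain read and yield while waiting), not necessarily the right fix for this codebase.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical naive test-and-set spinlock. Every waiter repeatedly performs
// an atomic write on the same flag, so the cache line bounces between cores.
class NaiveSpinlock {
public:
    void lock() {
        while (flag_.exchange(true, std::memory_order_acquire)) {
            // Busy-wait: burns CPU and keeps writing to the shared flag.
        }
    }
    void unlock() { flag_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> flag_{false};
};

// Hypothetical test-and-test-and-set variant. Waiters spin on a read-only
// load and yield to the scheduler, and only retry the write once the lock
// looks free.
class BackoffSpinlock {
public:
    void lock() {
        for (;;) {
            if (!flag_.exchange(true, std::memory_order_acquire)) {
                return;  // Lock acquired.
            }
            while (flag_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
        }
    }
    void unlock() { flag_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> flag_{false};
};

// Increments a shared counter from num_threads threads, each taking the lock
// iters_per_thread times. Returns the final counter value.
template <typename Lock>
std::uint64_t count_with_lock(int num_threads, int iters_per_thread) {
    assert(num_threads > 0 && iters_per_thread >= 0);
    Lock lock;
    std::uint64_t counter = 0;
    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(num_threads));
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&lock, &counter, iters_per_thread] {
            for (int i = 0; i < iters_per_thread; ++i) {
                lock.lock();
                ++counter;  // Critical section, protected by the lock.
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter;
}
```

If you wrap calls like `count_with_lock<NaiveSpinlock>(n, 100000)` in a timer and sweep `n` upward, you should be able to see for yourself how the total time changes as threads are added. The exact numbers will depend on the machine.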

I would fix it if I could be bothered. Instead I'll just use the CUDA whisper backend, which is pretty nice and fast.