Yeah, the code looks like a spinlock. It behaves terribly under contention, so performance falls off a cliff as the number of threads increases.
Adding more threads actually lowers total throughput, because the waiting threads burn CPU and fight over the lock instead of doing work.
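For anyone who does want to poke at it, here is a rough sketch of the usual mitigation, a test-and-test-and-set spinlock that yields while waiting. This is not the project's actual code: `TtasSpinlock` and `run_counter_demo` are names I made up for illustration, and I'm assuming the original lock is a plain atomic-flag spin loop.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical test-and-test-and-set spinlock (illustrative only,
// not the lock from the code under discussion).
class TtasSpinlock {
public:
    void lock() {
        for (;;) {
            // exchange() returns the previous value: false means we got the lock.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
            // Wait on a plain load so waiters only read the cache line
            // instead of repeatedly writing to it, and yield the CPU
            // so the lock holder gets a chance to run.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
        }
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical demo: n_threads each increment a shared counter iters times
// under the lock, and the final count is returned.
long run_counter_demo(int n_threads, int iters) {
    TtasSpinlock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // protected by the lock
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Even with this, a heavily contended lock is probably better replaced by a `std::mutex` or by restructuring the work so threads rarely need the lock at all.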
I would fix it if I could be bothered. Instead I'll just use the CUDA whisper backend, which is pretty nice and fast.