This reminds me of a very neat trick from Windows IO completion ports. When you mix blocking and nonblocking tasks, you want more threads than the number of cores, so that the CPU is fully used even if a thread blocks. But when no thread blocks, overbooking introduces context switch overhead. Windows' trick is to track threads that got work from a IO completion port and when one blocks, wake up another thread waiting on the port. This way you can have 6 threads working a IO completion port in your quad-core CPU and adapt both to mostly CPU intensive work and blocking work.