This problem is inherently single-threaded. Stackless Python is single-threaded. Go distributes the goroutines over several processors using a thread pool so the performance is naturally lower for a no-op program like this one. Conclusion: inter-processor communication is expensive?
> Go distributes the goroutines over several processors using a thread pool so the performance is naturally lower for a no-op program like this one
Performance is certainly worse with more cores, and thread migration can make the slowdown far more severe. The best strategy is to pin contiguous parts of the thread ring to each core, so that most token hand-offs stay on one core and only a few cross between cores.
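For anyone who hasn't seen the benchmark, here is a minimal sketch of a thread ring, assuming plain `threading` and `queue.Queue` from the standard library rather than Stackless tasklets or goroutines. The names `thread_ring`, `worker`, and `inboxes` are made up for illustration. It only shows the shape of the problem: there is a single token, so at any moment exactly one worker has anything to do, and extra cores have nothing to run in parallel.

```python
import queue
import threading


def thread_ring(n_workers: int, hops: int) -> int:
    """Pass one token around a ring of n_workers threads.

    Each worker decrements the token and forwards it. Returns the
    1-based id of the worker that receives the token at zero.
    """
    inboxes = [queue.Queue() for _ in range(n_workers)]
    result = []

    def worker(idx: int) -> None:
        nxt = inboxes[(idx + 1) % n_workers]
        while True:
            token = inboxes[idx].get()  # blocks until the token arrives
            if token < 0:               # shutdown signal: forward it and exit
                nxt.put(token)
                return
            if token == 0:              # this worker ends the run
                result.append(idx + 1)
                nxt.put(-1)             # tell the rest of the ring to stop
                return
            nxt.put(token - 1)          # hand the token to the neighbour

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    inboxes[0].put(hops)                # inject the single token
    for t in threads:
        t.join()
    return result[0]


if __name__ == "__main__":
    print(thread_ring(5, 12))
```

Every hop is a blocking hand-off from one worker to the next, so the total work is a strictly serial chain of wake-ups. If neighbouring workers sit on different cores, each hop also pays for a cross-core wake-up, which is the cost the comments above are pointing at.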