It seems sys is what makes the big difference here. Does anyone have a theory wh...

barrkel · on Nov 15, 2009

I would guess it's contention on the sync primitives in the dispatcher. The run queues look like they need to be locked before they can be modified:

    // Scheduling helpers.  Sched must be locked.
    static void gput(G*);   // put/get on ghead/gtail
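A minimal C sketch of what "Sched must be locked" implies: one FIFO of runnable tasks shared by every worker thread, guarded by a single mutex. The names `task`, `runq`, `global_runq`, `runq_put` and `runq_get` are hypothetical. Unlike the runtime excerpt, where the caller holds the Sched lock, these functions take the lock themselves.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the runtime's G (one goroutine). */
typedef struct task {
    int id;
    struct task *next;
} task;

/* A single run queue shared by every worker thread. */
typedef struct {
    pthread_mutex_t lock; /* every put and get goes through this one lock */
    task *head;
    task *tail;
} runq;

/* Global scheduler queue; statically initialized, so no setup call. */
runq global_runq = { PTHREAD_MUTEX_INITIALIZER, NULL, NULL };

/* Append t at the tail (similar in spirit to gput). */
void runq_put(runq *q, task *t) {
    pthread_mutex_lock(&q->lock);
    t->next = NULL;
    if (q->tail)
        q->tail->next = t;
    else
        q->head = t;
    q->tail = t;
    pthread_mutex_unlock(&q->lock);
}

/* Remove and return the head, or NULL if the queue is empty. */
task *runq_get(runq *q) {
    pthread_mutex_lock(&q->lock);
    task *t = q->head;
    if (t) {
        q->head = t->next;
        if (!q->head)
            q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);
    return t;
}
```

Every thread that schedules or picks up work serializes on `global_runq.lock`. When several threads hit it at once the losers block, and since glibc's pthread mutexes on Linux are futex-based, that blocking happens in the kernel and is billed as sys time.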

This is the comment on the lock:

     * in the uncontended case,
     * as fast as spin locks (just a few user-level instructions),
     * but on the contention path they sleep in the kernel.
     * a zeroed Lock is unlocked (no need to initialize each lock).
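The behaviour that comment describes (a cheap user-space fast path, sleeping in the kernel only under contention, and zero meaning unlocked) can be sketched on Linux with the futex system call. This follows the well-known three-state mutex from Ulrich Drepper's "Futexes Are Tricky", not the Go runtime's actual code; the `futex_mutex` type and its function names are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 0 = unlocked, 1 = locked, 2 = locked and possibly contended.
   A zeroed futex_mutex is unlocked, so locks need no initialization. */
typedef struct { atomic_int state; } futex_mutex;

/* Sleep in the kernel as long as *addr still equals val. */
static void futex_wait(atomic_int *addr, int val) {
    syscall(SYS_futex, (int *)addr, FUTEX_WAIT_PRIVATE, val, NULL, NULL, 0);
}

/* Wake up to n threads sleeping on addr. */
static void futex_wake(atomic_int *addr, int n) {
    syscall(SYS_futex, (int *)addr, FUTEX_WAKE_PRIVATE, n, NULL, NULL, 0);
}

void futex_mutex_lock(futex_mutex *m) {
    int c = 0;
    /* Fast path: 0 -> 1 with a single CAS, entirely in user space. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: mark the lock contended (2), then sleep until it frees. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        futex_wait(&m->state, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

void futex_mutex_unlock(futex_mutex *m) {
    /* Old value 1 means nobody waited; otherwise reset and wake a sleeper. */
    if (atomic_fetch_sub(&m->state, 1) != 1) {
        atomic_store(&m->state, 0);
        futex_wake(&m->state, 1);
    }
}
```

Only the slow path makes syscalls, so an uncontended lock costs one compare-and-swap, while a heavily contended one turns into a stream of FUTEX_WAIT/FUTEX_WAKE calls, which is the kind of work that shows up as sys time.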

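A better approach for multicore would probably be work-stealing queues: one queue per processor, so threads mostly touch their own queue and only contend when an idle one steals from another.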
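A rough sketch of the work-stealing idea, under simplifying assumptions: each worker owns a fixed-size deque, pushes and pops its own work at the bottom, and an idle worker steals the oldest item from the top of someone else's deque. Production work-stealing deques (the Chase-Lev deque, for example) avoid locking on the owner's path; here each deque simply has its own mutex, which already removes the single global lock. All names and the `DEQUE_CAP` size are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define DEQUE_CAP 256 /* hypothetical fixed capacity */

/* One deque per worker; its lock is only shared between the owner
   and whichever thief is currently stealing from it. */
typedef struct {
    pthread_mutex_t lock;
    void *items[DEQUE_CAP];
    size_t top;    /* oldest item (steal end) */
    size_t bottom; /* one past the newest item (owner end) */
} ws_deque;

void ws_init(ws_deque *d) {
    pthread_mutex_init(&d->lock, NULL);
    d->top = d->bottom = 0;
}

/* Owner: push new work. Returns false if the deque is full. */
bool ws_push(ws_deque *d, void *item) {
    bool ok = false;
    pthread_mutex_lock(&d->lock);
    if (d->bottom - d->top < DEQUE_CAP) {
        d->items[d->bottom % DEQUE_CAP] = item;
        d->bottom++;
        ok = true;
    }
    pthread_mutex_unlock(&d->lock);
    return ok;
}

/* Owner: take the most recently pushed item (LIFO, cache-friendly). */
void *ws_pop(ws_deque *d) {
    void *item = NULL;
    pthread_mutex_lock(&d->lock);
    if (d->bottom > d->top) {
        d->bottom--;
        item = d->items[d->bottom % DEQUE_CAP];
    }
    pthread_mutex_unlock(&d->lock);
    return item;
}

/* Thief: take the oldest item from another worker's deque. */
void *ws_steal(ws_deque *d) {
    void *item = NULL;
    pthread_mutex_lock(&d->lock);
    if (d->bottom > d->top) {
        item = d->items[d->top % DEQUE_CAP];
        d->top++;
    }
    pthread_mutex_unlock(&d->lock);
    return item;
}

/* One scheduling step for worker `self`: local work first, else steal. */
void *ws_next(ws_deque *all, size_t n, size_t self) {
    void *item = ws_pop(&all[self]);
    for (size_t i = 1; item == NULL && i < n; i++)
        item = ws_steal(&all[(self + i) % n]);
    return item;
}
```

A worker would call `ws_next(all, n, self)` in its loop. Most calls touch only its own deque, so cross-thread contention is limited to the moments when a worker runs dry and starts stealing.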

ori_b · on Nov 15, 2009

From what I know of the authors, they would pick the simpler solution and stick with it until it actually turned out to be a problem in the real world.

leif · on Nov 15, 2009

I would venture a guess that since Go is compiled to native code, it uses a lot more syscalls to manage its threads. Stackless probably does much more of that management inside the VM, and doesn't need to make as many (or any) syscalls to do so.

This is just a guess from someone who knows nothing about either (Stackless does run in a VM, right?), so take it with several large grains of salt.
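Both guesses could be checked by looking at where the CPU time goes. A sketch using `getrusage`: `ru_utime` and `ru_stime` are POSIX, while the voluntary context-switch count `ru_nvcsw` is a Linux/BSD extension. `proc_usage` and `read_usage` are hypothetical helper names.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* CPU time and context switches for this process so far. */
typedef struct {
    double user_s;     /* time running user-level code */
    double sys_s;      /* time spent in the kernel (the "sys" column) */
    long voluntary_cs; /* times the process gave up the CPU, e.g. to block */
} proc_usage;

static double tv_to_s(struct timeval tv) {
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Returns all zeros if getrusage fails. */
proc_usage read_usage(void) {
    struct rusage ru;
    proc_usage u = {0.0, 0.0, 0};
    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        u.user_s = tv_to_s(ru.ru_utime);
        u.sys_s = tv_to_s(ru.ru_stime);
        u.voluntary_cs = ru.ru_nvcsw;
    }
    return u;
}
```

Taking a reading before and after the benchmark and diffing the two separates user time from sys time. A high sys time together with many voluntary context switches would fit threads sleeping in the kernel on contended locks, whereas lots of non-blocking syscalls would raise sys time without the extra switches.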