FWIW, it's not obvious that the one-thread one-connection blocking I/O design is terrible. How terrible it is depends on address-space exhaustion from too many thread stacks (a non-issue in 64-bit code) and on the overhead of the OS thread scheduler. In a green-threading model, I/O can be implemented behind the scenes as non-blocking I/O while still giving user code the ease of blocking I/O.
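To make the trade-off concrete, here is a minimal sketch of the thread-per-connection design under discussion — a blocking echo server in Python. All names are illustrative, not from any particular codebase.

```python
import socket
import threading

def handle_client(conn: socket.socket) -> None:
    # Plain blocking reads and writes: easy to follow, one OS thread per client.
    with conn:
        while True:
            data = conn.recv(4096)   # blocks this thread only
            if not data:
                break
            conn.sendall(data)       # echo it back

def serve(server: socket.socket) -> None:
    while True:
        conn, _addr = server.accept()
        # Each connection costs one thread: stack space plus scheduler
        # overhead, which is exactly the trade-off discussed above.
        threading.Thread(target=handle_client, args=(conn,), daemon=True).start()
```

The appeal is that `handle_client` reads as straight-line code; the cost is one stack and one schedulable entity per client.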
Back to Windows: overlapped I/O is not equivalent to poll; it's much better than that. It is edge-triggered, but Windows manages the threads itself and tries to keep the CPU 100% busy in user code rather than in thread-switching logic.
Windows overlapped I/O is effectively a way to write efficient I/O-intensive applications in continuation-passing style. You don't need to manage the threads or the dispatch loop yourself.
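For contrast, here is a hedged sketch (in Python, with made-up names) of the readiness-based dispatch loop you end up writing by hand in the poll/select world — the bookkeeping that overlapped I/O and its system-managed thread pool take off your hands on Windows.

```python
import selectors
import socket

def run_dispatch_loop(server: socket.socket) -> None:
    """Single-threaded readiness loop: each registered callback is, in
    effect, the continuation of that connection's processing."""
    sel = selectors.DefaultSelector()

    def on_readable(conn: socket.socket) -> None:
        data = conn.recv(4096)
        if data:
            conn.sendall(data)        # echo; a real server would buffer writes
        else:
            sel.unregister(conn)
            conn.close()

    def on_accept(srv: socket.socket) -> None:
        conn, _addr = srv.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, on_readable)

    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, on_accept)
    while True:
        for key, _mask in sel.select():
            key.data(key.fileobj)     # dispatch to the stored callback
```

None of this loop-and-registry bookkeeping appears in the overlapped-I/O version; the system owns the loop and the threads.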
As to why use threads rather than processes: the issue is largely the awkwardness of duplicating the connection handle for a child process. In POSIX land, the fork function makes handing off the handle fairly easy. On Windows, WSADuplicateSocket can create a handle usable by a child process, but you need to know the child's process ID first, so you have to perform IPC just to send off the handle.
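A minimal sketch of the POSIX side of this (illustrative names, one-shot echo protocol assumed): after accept, fork hands the child a copy of the connection descriptor with no extra ceremony — no process ID lookup, no IPC.

```python
import os
import socket

def serve_forking(server: socket.socket) -> None:
    while True:
        conn, _addr = server.accept()
        pid = os.fork()              # child inherits the open descriptor
        if pid == 0:
            server.close()           # child: keep only the connection
            data = conn.recv(4096)
            conn.sendall(data)       # echo once, then exit
            conn.close()
            os._exit(0)
        conn.close()                 # parent: drop its copy, keep accepting
```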
On the other hand, shared memory concurrency has its advantages too.
> FWIW, it's not obvious that the one-thread one-connection blocking I/O design is terrible.
Any time you're using a 'thread' idiom, even if the threads are green, something has to maintain their individual execution state (stack frames, thread-local storage, etc.). I like to think of it as analogous to unoptimized tail recursion, with a purely event-driven callback model being like tail-call optimization.
Thanks for explaining Windows' overlapped I/O, it's way better than I thought it was from reading the documentation. I didn't realize that it does all the thread-pooling stuff all by itself -- I guess it makes the most sense not to fight it, and just accept the free utilization (even if it does give a slight latency hit).
No, not tail calls: event-driven callback I/O is continuation-passing style (CPS). The two models are equivalent: code written in the thread idiom can, in principle, be mechanically transformed into CPS and thereby use the event-driven callback model.
In the event callback style, the state that would otherwise be on the stack is held in the callback closure.
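A small illustration of that equivalence (my own toy example, not from the thread): the same function written in the blocking thread idiom, then mechanically transformed into CPS. The local that used to live on the stack survives in the continuation's closure. `get_name_async` and `done` are hypothetical callbacks.

```python
# Blocking (thread) idiom: local state lives on the stack across the
# "blocking" call.
def greet_blocking(get_name):
    prefix = "hello, "        # state held on this thread's stack
    name = get_name()         # imagine this blocks the thread on I/O
    return prefix + name

# The same logic mechanically transformed into continuation-passing style:
# the remainder of the function becomes a callback, and `prefix` is held
# in that callback's closure instead of on a stack.
def greet_cps(get_name_async, done):
    prefix = "hello, "        # captured by the closure below

    def continuation(name):   # "the rest of greet_blocking"
        done(prefix + name)

    get_name_async(continuation)
```

With a trivial asynchronous source that invokes its callback immediately, both styles compute the same result.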
In the .NET space, F# does this transformation automatically with what it calls asynchronous workflows (http://blogs.msdn.com/dsyme/archive/2007/10/11/introducing-f...). It uses the 'let!' binding as the dividing point between the call to the asynchronous method and the remainder of the method body, which it converts into a continuation passed to the asynchronous call.
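As a rough analogy in Python (my example, not F#): async/await performs the same kind of split, with the await point playing the role of 'let!' — everything after it conceptually becomes the continuation the event loop resumes.

```python
import asyncio

async def fetch_name() -> str:
    # Stand-in for an asynchronous I/O call.
    await asyncio.sleep(0)
    return "world"

async def main() -> str:
    # Like F#'s `let! name = fetch_name()`: the rest of this function body
    # is conceptually the continuation resumed when fetch_name completes.
    name = await fetch_name()
    return "hello, " + name

print(asyncio.run(main()))  # prints "hello, world"
```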