In my experience, there's definitely a point where poll() beats select() when handling lots of connections, whether long-lived or short-lived. But at that point you are much better off moving to epoll() or kqueue(), depending on the OS. And if you support those syscalls, there's little point in falling back to select().
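For reference, here is a minimal sketch of what moving to epoll() looks like on Linux. The kqueue()/kevent() equivalent on the BSDs and macOS is not shown. A pipe stands in for a network socket, and the function name epoll_pipe_demo is hypothetical. The key difference from select() and poll() is that the interest list lives in the kernel: a descriptor is registered once with epoll_ctl(), and each epoll_wait() hands back only the ready descriptors instead of rescanning every one.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: register a pipe's read end with epoll, make it readable,
   and return how many events epoll_wait() reports (-1 on error). */
int epoll_pipe_demo(void)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    int ep = epoll_create1(EPOLL_CLOEXEC);  /* one epoll instance per event loop */
    if (ep == -1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };
    int n = -1;
    /* Register once; the kernel keeps the interest list between waits. */
    if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ev) == 0 && write(p[1], "x", 1) == 1) {
        struct epoll_event out[8];
        n = epoll_wait(ep, out, 8, 1000);  /* wait up to 1 s */
        if (n == 1 && out[0].data.fd != p[0])
            n = -1;  /* the ready descriptor should be the one we registered */
    }

    close(ep);
    close(p[0]);
    close(p[1]);
    return n;
}
```

Calling epoll_pipe_demo() should return 1, since exactly one registered descriptor has data waiting.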
For most people, if your program is handling 10s or 100s of concurrent connections, select() should be just fine.
If you are going to be handling more, it's worthwhile looking into the other syscalls to improve performance. In any case, I'd recommend abstracting away the event handling so that you can switch between different syscalls, and then benchmarking them with traffic that replicates what you expect to handle. There are lots of libraries that will do this work for you, unless you need to get into the low-level stuff.
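A minimal sketch of that kind of abstraction, assuming a hypothetical wait_readable_fn interface (not taken from any particular library): the caller passes plain descriptor arrays, and select() and poll() sit behind interchangeable backends, so you can benchmark one against the other without touching the calling code. Timeouts are assumed to be non-negative milliseconds and n is assumed to be at least 1; a pipe stands in for a socket.

```c
#include <poll.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical backend interface: wait up to timeout_ms (>= 0) for any of
   fds[0..n) to become readable, set ready[i] to 1 or 0, and return how many
   are ready, or -1 on error. */
typedef int (*wait_readable_fn)(const int *fds, size_t n, int timeout_ms, int *ready);

static int count_ready(const int *ready, size_t n)
{
    int c = 0;
    for (size_t i = 0; i < n; i++)
        c += ready[i];
    return c;
}

/* select() backend: rebuilds an fd_set on every call and cannot watch fds >= FD_SETSIZE. */
static int wait_select(const int *fds, size_t n, int timeout_ms, int *ready)
{
    fd_set set;
    int maxfd = -1;
    FD_ZERO(&set);
    for (size_t i = 0; i < n; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return -1;
        FD_SET(fds[i], &set);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    struct timeval tv = { .tv_sec = timeout_ms / 1000,
                          .tv_usec = (timeout_ms % 1000) * 1000 };
    if (select(maxfd + 1, &set, NULL, NULL, &tv) < 0)
        return -1;
    for (size_t i = 0; i < n; i++)
        ready[i] = FD_ISSET(fds[i], &set) ? 1 : 0;
    return count_ready(ready, n);
}

/* poll() backend: takes an array of struct pollfd, so there is no FD_SETSIZE cap. */
static int wait_poll(const int *fds, size_t n, int timeout_ms, int *ready)
{
    struct pollfd *pfds = calloc(n, sizeof *pfds);
    if (pfds == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
    }
    int rc = poll(pfds, (nfds_t)n, timeout_ms);
    if (rc >= 0)
        for (size_t i = 0; i < n; i++)
            ready[i] = (pfds[i].revents & POLLIN) ? 1 : 0;
    free(pfds);
    return rc < 0 ? -1 : count_ready(ready, n);
}

/* The caller only sees wait_readable_fn, never fd_set or struct pollfd.
   Returns 1 if the backend reports exactly the pipe that has data, else -1. */
int demo_backend(wait_readable_fn wait_fn)
{
    int a[2], b[2];
    if (pipe(a) == -1)
        return -1;
    if (pipe(b) == -1) {
        close(a[0]);
        close(a[1]);
        return -1;
    }
    int fds[2] = { a[0], b[0] };
    int ready[2] = { 0, 0 };
    int n = -1;
    if (write(b[1], "x", 1) == 1)  /* only the second pipe gets data */
        n = wait_fn(fds, 2, 1000, ready);
    close(a[0]); close(a[1]);
    close(b[0]); close(b[1]);
    return (n == 1 && ready[0] == 0 && ready[1] == 1) ? 1 : -1;
}
```

An epoll() or kqueue() backend would normally keep registrations between calls rather than rebuilding them each time, so in practice the interface tends to grow add/remove operations; this sketch keeps the stateless shape for brevity.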
Trying to judge syscall performance by reasoning about the amount of data transferred between your program and the OS, the number of syscalls, etc., is very difficult. You are much better off measuring the actual behaviour. For example, epoll() seems like a terribly designed API to me, as it involves making many syscalls (an epoll_ctl() call for every registration change, whereas with kqueue() it's just one kevent() call per loop). However, in practice I found epoll() performed very well. I guess the cost of syscalls on Linux can be very low in some cases.
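In that spirit, a minimal measurement harness: it times many calls to a function with clock_gettime(CLOCK_MONOTONIC) and reports the average. The example target issues getppid through syscall(SYS_getppid), so the C library cannot answer without entering the kernel. The names time_per_call_ns and raw_getppid are hypothetical, and the numbers will depend heavily on the CPU, kernel version, and mitigations such as page-table isolation, so treat them as relative rather than absolute.

```c
#define _GNU_SOURCE  /* exposes syscall() in <unistd.h> */
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Average wall-clock time, in nanoseconds, of one call to fn() over iters calls.
   Returns -1.0 if iters is not positive. */
double time_per_call_ns(void (*fn)(void), long iters)
{
    if (iters <= 0)
        return -1.0;
    fn();  /* warm-up call so first-touch page faults are not timed */
    uint64_t start = now_ns();
    for (long i = 0; i < iters; i++)
        fn();
    return (double)(now_ns() - start) / (double)iters;
}

/* A near-trivial syscall, issued via syscall() so libc cannot skip the kernel. */
static void raw_getppid(void)
{
    (void)syscall(SYS_getppid);
}
```

For example, time_per_call_ns(raw_getppid, 1000000) gives the average cost of one round trip into the kernel on the current machine; swapping in a function that calls select(), poll(), or epoll_wait() on a realistic descriptor set is the more useful comparison.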
System calls on Linux that deal with per-process state try very hard not to invalidate userland memory mappings.
The kernel's real address range and userland's virtual range won't overlap, so for a lot of functions kernel memory just has to be mapped/unmapped on the call, rather than _all_ userland bindings being invalidated.
Well, okay, they will overlap. So yes, your mappings may get invalidated, but for synchronized, higher-performance system calls they _shouldn't_.
This lets them complete in tens to hundreds of nanoseconds.
Normally the most _expensive_ part of a Linux syscall is the TLB misses that follow it.
---
Your model of memory transfer size assumes data is being copied.
The Linux kernel has a lot of features that let userland, devices, and the kernel itself all share the same memory without copying.
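One such feature is sendfile(), which moves data between two file descriptors inside the kernel rather than read()ing it into a user-space buffer and write()ing it back out. A minimal sketch, assuming Linux 2.6.33 or later (earlier kernels required out_fd to be a socket); copy_with_sendfile and sendfile_demo are hypothetical names, and two temporary files stand in for a file and a socket.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <unistd.h>

/* Copy up to len bytes from in_fd (starting at offset 0) to out_fd's current
   position using sendfile(); returns the number of bytes copied, or -1. */
ssize_t copy_with_sendfile(int out_fd, int in_fd, size_t len)
{
    off_t off = 0;  /* sendfile() advances this; in_fd's own offset is untouched */
    size_t done = 0;
    while (done < len) {
        ssize_t n = sendfile(out_fd, in_fd, &off, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;  /* reached end of in_fd */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Round trip through two temporary files; returns 1 if the copy matches. */
int sendfile_demo(void)
{
    char in_path[] = "/tmp/sendfile_in_XXXXXX";
    char out_path[] = "/tmp/sendfile_out_XXXXXX";
    const char msg[] = "hello from sendfile";
    char back[sizeof msg];
    int ok = 0;

    int in_fd = mkstemp(in_path);
    int out_fd = mkstemp(out_path);
    if (in_fd >= 0 && out_fd >= 0
        && write(in_fd, msg, sizeof msg) == (ssize_t)sizeof msg
        && copy_with_sendfile(out_fd, in_fd, sizeof msg) == (ssize_t)sizeof msg
        && pread(out_fd, back, sizeof back, 0) == (ssize_t)sizeof back)
        ok = memcmp(msg, back, sizeof msg) == 0;

    if (in_fd >= 0) { close(in_fd); unlink(in_path); }
    if (out_fd >= 0) { close(out_fd); unlink(out_path); }
    return ok;
}
```

The copy itself never touches a user-space buffer; the back array exists only so the demo can verify the result.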