User-level network stack

Quite a while back I talked about optimizing select/epoll/whatever using mmap'd memory shared between user space and the kernel for the event table.
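To make the shared event table idea concrete, here is a minimal sketch, not the design from that earlier post and not any real kernel interface. It assumes a single-producer, single-consumer ring living in a MAP_SHARED mapping, with a forked child process standing in for the kernel side. All of the names (event_ring, ring_push, ring_pop, ring_demo) are hypothetical and exist only for illustration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>
#include <sched.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RING_SLOTS 64u             /* must be a power of two */

/* Hypothetical event record: which descriptor, and what happened on it. */
struct event { int32_t fd; uint32_t mask; };

/* Hypothetical shared event table: one writer (the stand-in "kernel"),
 * one reader (the application). */
struct event_ring {
    _Atomic uint32_t head;         /* next slot to write; producer only */
    _Atomic uint32_t tail;         /* next slot to read; consumer only */
    struct event slots[RING_SLOTS];
};

/* Map the table so that it stays shared across fork(). */
static struct event_ring *ring_map(void) {
    assert((RING_SLOTS & (RING_SLOTS - 1)) == 0);
    void *p = mmap(NULL, sizeof(struct event_ring), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return NULL;
    struct event_ring *r = p;
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
    return r;
}

/* Producer side: returns 1 on success, 0 if the ring is full. */
static int ring_push(struct event_ring *r, struct event ev) {
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS) return 0;
    r->slots[head % RING_SLOTS] = ev;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer side: returns 1 and fills *out, or 0 if the ring is empty.
 * Note that no system call is made on this path. */
static int ring_pop(struct event_ring *r, struct event *out) {
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail == head) return 0;
    *out = r->slots[tail % RING_SLOTS];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}

/* Demo: a child posts n events (fd = 0..n-1) and the parent drains them.
 * Returns the sum of the fds seen, or -1 on setup failure. */
long ring_demo(int n) {
    struct event_ring *r = ring_map();
    if (!r) return -1;
    pid_t pid = fork();
    if (pid < 0) { munmap(r, sizeof *r); return -1; }
    if (pid == 0) {                /* child: stand-in for the kernel */
        for (int i = 0; i < n; i++) {
            struct event ev = { .fd = i, .mask = 1u };
            while (!ring_push(r, ev)) sched_yield();
        }
        _exit(0);
    }
    long sum = 0;
    int seen = 0;
    struct event ev;
    while (seen < n) {             /* parent: the application */
        if (ring_pop(r, &ev)) { sum += ev.fd; seen++; }
        else sched_yield();
    }
    waitpid(pid, NULL, 0);
    munmap(r, sizeof *r);
    return sum;
}
```

Calling ring_demo(1000) from a small main() exercises the ring with wrap-around. The point of the layout is that the reader picks up events with plain memory loads; a real design would still need some way to block when the table is empty rather than spinning.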


The OpenOnload guys have gone even further, using mapped memory to bring the entire network stack up to user level.


Their paper talks about the work they did with two particular 10Gbit NICs. I find this interesting because in benchmarks I ran, the NIC interrupt overhead was 100% of a CPU core, and that was only on 1Gbit Ethernet. Clearly a better network solution is needed for heavy-load deployments. It sounds like maybe the OpenOnload guys have a better approach.
