
Re: >1024 connections in slapd / select->poll

Yusuf Goolamabbas wrote:

Well if you're feeling brave, I've just completed a patch in CVS HEAD supporting epoll. I haven't tried testing it with a massive number of connections yet, but the code now passes the regular test suite. It should be simple enough to add kqueue support as well now (I would have begun that but I don't have BSD installed anywhere at the moment). Regular poll can easily be added if you want, but there's really no reason to. Solaris /dev/poll is a bit more awkward.

Solaris 10 supports event ports, which are supposedly thread-friendly


Thanks for the links, good references. Of course they seem to confirm that there's no compelling reason to migrate away from select() in slapd: poll() blocks the entire process, and /dev/poll has strict mutex requirements and performs poorly when the descriptor list changes frequently. epoll has some of that characteristic as well: modifying the descriptor set requires a system call, a trip across the user/kernel barrier. With select you just flip a bit in userspace and you're done. The Solaris event ports sound interesting, but I think anybody who develops a "new event handler" on Unix and forgets to support signal() at the outset has overlooked something important...

Anyway, it's too bad that everyone is just copying each other's ideas and not actually learning from the obvious limitations of all of these schemes. A real solution needs to not only perform well on large sets of monitored items, but it needs to be extremely cheap to create and manage these sets in the first place. Only select wins on that score, and the obvious solution to avoid the argument passing overhead that everyone seems so foolishly focused on is to use explicitly mapped memory for the event sets. I.e., mmap a region that is directly accessible in both user and kernel space so that no byte copying needs to be done.

Another point where select (and poll) win is that there is a fast mapping from the input set to the result set - i.e., if you want to know "did event #5 occur?" you can find out in constant time, because it's just a fixed bitfield lookup (or, for poll, an index into the same array you passed in). For all the other mechanisms that either return events one at a time or in a flat list, you have to iterate through the list to look for "event #5". They have thus taken the linear search that the kernel does for select and kicked it out into userland, thus perceiving a great savings in kernel CPU time but not really improving life for the application developer. There is an obvious way to solve both of these problems with no additional cost - do both.
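The lookup difference can be sketched as follows. struct flat_event is a hypothetical stand-in for the kind of record a list-returning mechanism hands back (it is not the layout of any real API), and both function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical flat result record, standing in for what a
 * list-returning mechanism gives the application. */
struct flat_event {
    int          fd;      /* descriptor that became ready */
    unsigned int events;  /* what happened on it */
};

/* select-style result: "did fd fire?" is a single bit test. */
static int bitset_fired(const fd_set *ready, int fd)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    return FD_ISSET(fd, ready) != 0;
}

/* List-style result: the same question needs a linear scan.
 * Returns the index of the matching record, or -1 if absent. */
static int list_find(const struct flat_event *ev, size_t n, int fd)
{
    for (size_t i = 0; i < n; i++)
        if (ev[i].fd == fd)
            return (int)i;
    return -1;
}
```

In practice applications avoid the scan by walking the whole list and dispatching each entry, but that only works if they never need to ask about one specific descriptor.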

Define the input event set as an array of structures, as most of these mechanisms do. The array resides in a shared memory region. We can use a modified struct kevent as a typical structure:
struct kevent {
    uintptr_t ident;    /* identifier for event */
    short     filter;   /* filter for event */
    u_short   flags;    /* action flags */
    u_int     inflags;  /* filter flags of interest */
    u_int     outflags; /* resulting flags */
    intptr_t  data;     /* filter data value */
    void     *udata;    /* opaque identifier */
};

kqueue is pretty darn good, but it still misses on the argument copying problem: its result set is an array of struct kevents describing the results, and it doesn't give you direct access for priority management. The bulk of the struct is redundant information; all we want to know are the resulting flags and any data accompanying them.

What a really good, efficient mechanism would do is leave the input event array in place, set the result flags and data there, and return a list of *offsets* to all the entries in the array that got signaled. That way you can navigate the result list in priority order and in event order, all without expensive linear search time. (If your argument list is a mmap'd region, using offsets means you don't have to guarantee the region gets mapped to any particular address, but you can still remember the location of the relevant structure for a monitored object and access it in constant time.)
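A rough userspace model of that idea, in C, might look like the following. Nothing here is an existing kernel interface: struct ev_region, ev_signal, ev_occurred and ev_reset are all hypothetical names, ev_signal merely stands in for what the kernel would do, and a plain struct stands in for the mmap'd region. The entry is a trimmed version of the modified struct kevent above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EVENTS 64   /* arbitrary size for this sketch */

/* Trimmed-down entry modeled on the modified struct kevent. */
struct ev_entry {
    uintptr_t    ident;     /* identifier for event */
    unsigned int inflags;   /* filter flags of interest */
    unsigned int outflags;  /* resulting flags, written in place */
    intptr_t     data;      /* filter data value */
};

/* Hypothetical shared region: input array plus result offsets. */
struct ev_region {
    struct ev_entry entries[MAX_EVENTS];
    size_t          ready[MAX_EVENTS];  /* offsets of signaled entries */
    size_t          nready;
};

/* Stand-in for the kernel side: record a result in place and queue
 * the entry's offset the first time it is signaled. */
static void ev_signal(struct ev_region *r, size_t off,
                      unsigned int flags, intptr_t data)
{
    assert(off < MAX_EVENTS);
    struct ev_entry *e = &r->entries[off];
    unsigned int hit = flags & e->inflags;
    if (hit == 0)
        return;                        /* caller is not interested */
    if (e->outflags == 0)
        r->ready[r->nready++] = off;   /* each offset queued at most once */
    e->outflags |= hit;
    e->data = data;
}

/* "Did event #off occur?" answered in constant time. */
static int ev_occurred(const struct ev_region *r, size_t off)
{
    assert(off < MAX_EVENTS);
    return r->entries[off].outflags != 0;
}

/* Clear results after the application has walked r->ready. */
static void ev_reset(struct ev_region *r)
{
    for (size_t i = 0; i < r->nready; i++)
        r->entries[r->ready[i]].outflags = 0;
    r->nready = 0;
}
```

The application can walk ready[0..nready) in arrival order, or index entries[] directly for a descriptor it cares about, and because the list holds offsets rather than pointers it does not matter where the region is mapped.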

Maybe I'll write a patch for this for Linux over the holidays. Starting from either poll or kqueue, it would be pretty easy to fix this up right.

 -- Howard Chu
 Chief Architect, Symas Corp.       Director, Highland Sun
 http://www.symas.com               http://highlandsun.com/hyc
 Symas: Premier OpenSource Development and Support