
Re: file descriptors

Hallvard B Furuseth wrote:
e.g., at slapd startup time, before calling openlog(), something like:

    while (( fd = dup(0)) > 0 )
        if ( fd > slap_max_fd )
            slap_max_fd = fd;

Put some upper limit on it. This looks like it could eat a lot of
resources if a process may open a lot of descriptors, or if descriptors
are expensive.

Sure, we already have the dtblsize global var for the upper end. Also, this discussion is only meaningful on Unix systems with traditional select(). (E.g., winsock select() doesn't care.)

Also, if a process opens a lot of descriptors and then closes a lot of
descriptors, could that slow down future descriptor handling? I don't
have a clue about this, but I imagine an OS could optimize descriptor
handling for processes with less than e.g. (sizeof(int)*CHAR_BIT)
descriptors or for processes where all descriptor numbers are less than
that value. Then invoke slower and more general code once the process
opens more descriptors than that, or a descriptor outside that range.

I've never seen it in any BSD- or SysV-derived kernels, but anything's possible...


There is hopefully some minimum number of descriptors available to any
process, but if you exceed that: Are you sure the number of descriptors
available to a process is static during this code, so that close()
immediately makes a descriptor available to it - instead of e.g. making
a descriptor available to the system, so any process could use it up?

File descriptors are strictly a per-process resource. Files may be a system-wide resource, but since dup() only increments a reference count on an open file, the total number of file resources in the system isn't changing.

Also, unless POSIX or something denies this, the openlog() call might
need several descriptors. It could load a dynamic library and mmap()
it, open a config file (and still have it open when contacting syslogd),
open some lock file, or whatever. So close more than one descriptor.

That's fine. We could simply allocate up to dtblsize-100 or some arbitrary number as a starting point.

Then close all the remaining descriptors before the listener loop
begins, to make them available for main processing.

I imagine some #ifdefs could handle the above at least for common cases.

But is it select() itself which can waste time, or just slapd's
FD_ISSET() & co? (At the moment I don't see which slapd code this
change would optimize, but I haven't searched too hard.)

FD_SET etc. is relatively constant time, since it just requires a mask and shift. But the select() call has to iterate (linearly) from zero to the highest numbered descriptor in use when polling for events to return. So it's advantageous to keep the interesting descriptors in low numbered positions.

In general I don't think this is a significant concern. I was thinking about it before and dismissed it, but recently we were examining a mysterious bug (ITS#4159) and we noticed that the slapd process was getting EMFILE and had run out of descriptors. It turns out the nfiles ulimit on this process was set to only 256, and there were about 85 indexed attributes configured. So select() was being called with nfds=256 all the time, while more than a third of those descriptors were uninteresting.

Would it help to more generally try to get different types of
descriptors to cluster together?  E.g. could select() or an #ifdef in
slapd skip zero bytes in an fd_set, so it would handle the fd_set with
bytes (00, ff) faster than (55, 55)?

I don't believe there's any way to maintain this kind of clustering. Data connections come and go, and we enable/disable selecting on them continually. At best we could reserve the first 8 descriptors so that that byte is always zero.

 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sun        http://highlandsun.com/hyc
 OpenLDAP Core Team            http://www.openldap.org/project/