Full_Name: Paul Donohue
Version: 2.4.44-20.el7
OS: RHEL7
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (172.58.184.58)

After upgrading from OpenLDAP 2.2 to OpenLDAP 2.4, we noticed much higher than expected CPU usage on our LDAP servers, particularly on our master server (which only accepts connections from our slave servers and therefore should generally be idle).

After doing lots of profiling and debugging, we managed to determine that the problem was caused by our olcIdleTimeout setting, which we had set to 3.

slapd_daemon_task divides olcIdleTimeout by 4 and sets the seconds and microseconds values of a timeval structure appropriately (with olcIdleTimeout set to 3, that is 0 seconds plus 750000 microseconds):
https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=servers/slapd/daemon.c;h=2bdb60aa1d74e6ff674fac41cf1f8d261b3c9b96;hb=HEAD#l2581
https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=servers/slapd/daemon.c;h=2bdb60aa1d74e6ff674fac41cf1f8d261b3c9b96;hb=HEAD#l2344

In OpenLDAP 2.2, this timeval structure is then passed directly as the timeout parameter to a select() call:
https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=servers/slapd/daemon.c;h=7a129673f5fb9674be49dda4f6849df88418d135;hb=83e5ac9bb9c42be9125ee2865df70eec59ee4b5f#l1415

However, in OpenLDAP 2.4, this timeval structure is passed to SLAP_EVENT_WAIT. There are four different implementations of SLAP_EVENT_WAIT. The kqueue and winsock implementations handle this timeout properly, but the epoll and solaris implementations drop the microseconds value and use only the seconds value. (We are using the epoll implementation.)

Therefore, if olcIdleTimeout is less than 4 and there are any open connections to the server, the seconds value is 0, so epoll_wait is always called with a zero timeout, which causes the slapd_daemon_task while loop to spin continuously and consume CPU.
It looks like this issue has been present since epoll support was first added to OpenLDAP:
https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=commit;h=b7d4e1a5f125358b49307256672a0db7f425a4e5

Both epoll and the solaris /dev/poll infrastructure support only millisecond resolution, not microsecond resolution, so I assume the microseconds were simply dropped for simplicity.

Could (tvp)->tv_sec*1000 be changed to (tvp)->tv_sec*1000+(tvp)->tv_usec/1000 to correct this issue?
https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=servers/slapd/daemon.c;h=2bdb60aa1d74e6ff674fac41cf1f8d261b3c9b96;hb=HEAD#l536
https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=servers/slapd/daemon.c;h=2bdb60aa1d74e6ff674fac41cf1f8d261b3c9b96;hb=HEAD#l712
openldap-bugs@PaulSD.com wrote:
> Both epoll and the solaris /dev/poll infrastructure support only millisecond
> resolution, not microsecond resolution, so I assume the microseconds were simply
> dropped for simplicity.
>
> Could (tvp)->tv_sec*1000 be changed to (tvp)->tv_sec*1000+(tvp)->tv_usec/1000 to
> correct this issue?
> https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=servers/slapd/daemon.c;h=2bdb60aa1d74e6ff674fac41cf1f8d261b3c9b96;hb=HEAD#l536
> https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=servers/slapd/daemon.c;h=2bdb60aa1d74e6ff674fac41cf1f8d261b3c9b96;hb=HEAD#l712

Thanks for the report. Changed in git master, please test.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
changed notes
changed state Open to Test
moved from Incoming to Software Bugs

changed notes
changed state Test to Release

fixed in master
fixed in RE24 (2.4.48)

changed notes
changed state Release to Closed