Issue 8952 - olcIdleTimeout < 4 causes high CPU usage on some systems
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
 
Reported: 2019-01-02 20:14 UTC by openldap-bugs@paulsd.com
Modified: 2019-07-24 19:01 UTC

Description openldap-bugs@paulsd.com 2019-01-02 20:14:55 UTC
Full_Name: Paul Donohue
Version: 2.4.44-20.el7
OS: RHEL7
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (172.58.184.58)


After upgrading from OpenLDAP 2.2 to OpenLDAP 2.4, we noticed much higher than
expected CPU usage on our LDAP servers, particularly on our master server (which
only accepts connections from our slave servers, and therefore should generally
be idle).

After doing lots of profiling and debugging, we managed to determine that the
problem was caused by our olcIdleTimeout setting, which we had set to 3.

slapd_daemon_task divides olcIdleTimeout by 4 and sets the seconds and
microseconds values of a timeval structure appropriately:
https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=servers/slapd/daemon.c;h=2bdb60aa1d74e6ff674fac41cf1f8d261b3c9b96;hb=HEAD#l2581
https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=servers/slapd/daemon.c;h=2bdb60aa1d74e6ff674fac41cf1f8d261b3c9b96;hb=HEAD#l2344
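
To illustrate the division described above, here is a minimal sketch of how a
quarter of the idle timeout can be split into a timeval. It is not the code
from daemon.c: the names IDLE_CHECK_DIVISOR and idle_check_interval are
hypothetical, and only the divide-by-4 behaviour comes from this report.

```c
#include <sys/time.h>

/* Hypothetical name for the divisor of 4 mentioned in the report. */
#define IDLE_CHECK_DIVISOR 4

/*
 * Split (idle_timeout / 4) seconds into whole seconds plus microseconds.
 * idle_check_interval is an illustrative helper, not a slapd function.
 */
struct timeval idle_check_interval(int idle_timeout)
{
    struct timeval tv;

    /* Whole seconds: integer division drops the remainder. */
    tv.tv_sec = idle_timeout / IDLE_CHECK_DIVISOR;

    /* The remainder (0..3 quarters of a second) goes into microseconds. */
    tv.tv_usec = (idle_timeout % IDLE_CHECK_DIVISOR)
                 * (1000000 / IDLE_CHECK_DIVISOR);

    return tv;
}
```

With an idle timeout of 3, this yields tv_sec = 0 and tv_usec = 750000, so all
of the wait time is carried by the microseconds field.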

In OpenLDAP 2.2, this timeval structure is then passed directly as the timeout
parameter to a select() call:
https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=servers/slapd/daemon.c;h=7a129673f5fb9674be49dda4f6849df88418d135;hb=83e5ac9bb9c42be9125ee2865df70eec59ee4b5f#l1415

However, in OpenLDAP 2.4, this timeval structure is passed to SLAP_EVENT_WAIT.
There are four different implementations of SLAP_EVENT_WAIT.  The kqueue and
winsock implementations handle this timeout properly, but the epoll and Solaris
/dev/poll implementations drop the microseconds value and use only the seconds
value.  (We are using the epoll implementation.)  If olcIdleTimeout is less than
4, the seconds value is 0, so whenever there are any open connections to the
server, epoll_wait is called with a zero timeout.  It returns immediately, which
causes the slapd_daemon_task while loop to spin continuously and consume CPU.

It looks like this issue has been present since epoll support was first added to
OpenLDAP:
https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=commit;h=b7d4e1a5f125358b49307256672a0db7f425a4e5

Both epoll and the Solaris /dev/poll infrastructure support only millisecond
resolution, not microsecond resolution, so I assume the microseconds were simply
dropped for simplicity.

Could (tvp)->tv_sec*1000 be changed to (tvp)->tv_sec*1000+(tvp)->tv_usec/1000 to
correct this issue?
https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=servers/slapd/daemon.c;h=2bdb60aa1d74e6ff674fac41cf1f8d261b3c9b96;hb=HEAD#l536
https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=servers/slapd/daemon.c;h=2bdb60aa1d74e6ff674fac41cf1f8d261b3c9b96;hb=HEAD#l712
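
As a rough sketch of the difference between the two expressions, the helpers
below convert a timeval to the millisecond timeout that epoll_wait expects.
The function names are hypothetical, and returning -1 for a NULL pointer is an
assumption based on epoll_wait treating -1 as "block indefinitely". Only the
two arithmetic expressions are taken from this report.

```c
#include <stddef.h>
#include <sys/time.h>

/* Conversion as described in the report: tv_usec is ignored entirely. */
int timeout_ms_seconds_only(const struct timeval *tvp)
{
    /* Assumption: NULL means "no timeout", mapped to -1 for epoll_wait. */
    return tvp ? (int)(tvp->tv_sec * 1000) : -1;
}

/* Proposed conversion: microseconds are folded in at millisecond resolution. */
int timeout_ms_with_usec(const struct timeval *tvp)
{
    return tvp ? (int)(tvp->tv_sec * 1000 + tvp->tv_usec / 1000) : -1;
}
```

For a timeval of 0 seconds and 750000 microseconds (olcIdleTimeout of 3), the
first helper returns 0, which makes epoll_wait return immediately, while the
second returns 750.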
Comment 1 Howard Chu 2019-01-02 21:54:52 UTC
openldap-bugs@PaulSD.com wrote:
> Both epoll and the solaris /dev/poll infrastructure support only millisecond
> resolution, not microsecond resolution, so I assume the microseconds were simply
> dropped for simplicity.
> 
> Could (tvp)->tv_sec*1000 be changed to (tvp)->tv_sec*1000+(tvp)->tv_usec/1000 to
> correct this issue?
> https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=servers/slapd/daemon.c;h=2bdb60aa1d74e6ff674fac41cf1f8d261b3c9b96;hb=HEAD#l536
> https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=servers/slapd/daemon.c;h=2bdb60aa1d74e6ff674fac41cf1f8d261b3c9b96;hb=HEAD#l712

Thanks for the report. Changed in git master, please test.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 Howard Chu 2019-01-06 02:39:00 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Comment 3 Quanah Gibson-Mount 2019-01-31 23:47:48 UTC
changed notes
changed state Test to Release
Comment 4 OpenLDAP project 2019-07-24 19:01:59 UTC
fixed in master
fixed in RE24 (2.4.48)
Comment 5 Quanah Gibson-Mount 2019-07-24 19:01:59 UTC
changed notes
changed state Release to Closed