From a customer:

In order to communicate via the LB-managed writable LDAP, we have to ensure that an idle connection is periodically refreshed. If we do not, the LB silently drops the connection after 5 minutes. To combat that, I set an olcIdleTimeout on the writable server so that the chain's cached connections are removed before the LB timeout hits. However, the slapo-ldap client goes into the CLOSE_WAIT state, which causes subsequent ldapmodify updates brokered by the read-only instance to fail with err=80.

There appear to be a few bugs filed on this in the past against slapd-ldap, but it's not clear whether we are hitting the same issue or a new one. I've also connected the read-only instances directly to the writable LDAP instances and the CLOSE_WAIT issue persists, so I don't believe it is caused by the LB.

These are the other threads I found when I started looking into this problem, though I think they are using the ldap-proxy:

https://www.openldap.org/lists/openldap-technical/201301/msg00323.html
http://www.openldap.org/lists/openldap-software/201004/msg00060.html
https://www.openldap.org/lists/openldap-bugs/200412/msg00029.html

The LB we have seems to be set to forget connections that last over 5 minutes, so a keepalive of 240:10:30 seemed like it should have worked. I assumed it wasn't working because the man page says "Only some systems support the customization of these values". However, only after setting keepalive to 60:10:30 did I maintain a stable connection, so there may be other network settings at play that I'm not aware of.
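To make the keepalive arithmetic in the report concrete, here is a minimal Python sketch. It assumes the `<idle>:<probes>:<interval>` field order documented for the slapd-ldap keepalive parameter and the 300-second LB timeout mentioned above; the helper names are hypothetical. It only computes how long before the LB cutoff the first probe would be sent, and does not explain why 240:10:30 was insufficient in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keepalive:
    idle: int      # seconds of idleness before the first probe is sent
    probes: int    # unanswered probes before the connection is declared dead
    interval: int  # seconds between successive probes


def parse_keepalive(value: str) -> Keepalive:
    """Parse an '<idle>:<probes>:<interval>' string such as '240:10:30'."""
    idle, probes, interval = (int(part) for part in value.split(":"))
    return Keepalive(idle, probes, interval)


def first_probe_margin(value: str, lb_idle_timeout: int) -> int:
    """Seconds between the first keepalive probe and the LB idle cutoff."""
    return lb_idle_timeout - parse_keepalive(value).idle


# Assumption: the LB drops connections after 5 minutes of idleness.
LB_IDLE_TIMEOUT = 300

for setting in ("240:10:30", "60:10:30"):
    print(setting, first_probe_margin(setting, LB_IDLE_TIMEOUT))
```

On paper both settings probe before the 300-second cutoff, which is consistent with the reporter's suspicion that some other network setting is involved.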
back-ldap is likely missing a task to close idle connections.
I have submitted merge request https://git.openldap.org/openldap/openldap/-/merge_requests/211
Here is a notice of origin and rights statement for the patch The attached patch file is derived from OpenLDAP Software. All of the modifications to OpenLDAP Software represented in the following patch(es) were developed by Tero Saarni tero.saarni@est.tech. I have not assigned rights and/or interest in this work to any party. Ericsson Software Technology AB hereby place the following modifications to OpenLDAP Software (and only these modifications) into the public domain. Hence, these modifications may be freely used and/or redistributed for any purpose with or without attribution and/or other notice.
Commits: • 0eacc4a7 by Tero Saarni at 2021-02-24T22:07:48+00:00 ITS#9197 back-ldap: added task that prunes expired connections
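For readers unfamiliar with the idea, the following Python sketch illustrates what a task that prunes expired connections might do. It is not the code from the commit: the `CachedConn` type, the `>=` comparison, and the convention that a value of 0 disables a limit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CachedConn:
    name: str
    created: int    # time the connection was opened (seconds)
    last_used: int  # time of the most recent operation (seconds)


def is_expired(conn: CachedConn, now: int, idle_timeout: int, conn_ttl: int) -> bool:
    # Assumption: a limit of 0 means "disabled"; '>=' is an illustrative choice.
    idle_expired = idle_timeout > 0 and now - conn.last_used >= idle_timeout
    ttl_expired = conn_ttl > 0 and now - conn.created >= conn_ttl
    return idle_expired or ttl_expired


def prune(conns: List[CachedConn], now: int, idle_timeout: int,
          conn_ttl: int) -> Tuple[List[CachedConn], List[CachedConn]]:
    """Split cached connections into (kept, closed) as of time `now`."""
    kept = [c for c in conns if not is_expired(c, now, idle_timeout, conn_ttl)]
    closed = [c for c in conns if is_expired(c, now, idle_timeout, conn_ttl)]
    return kept, closed


conns = [
    CachedConn("busy", created=0, last_used=9),
    CachedConn("idle", created=0, last_used=2),
]
kept, closed = prune(conns, now=10, idle_timeout=4, conn_ttl=0)
print([c.name for c in kept], [c.name for c in closed])
```

Running such a check periodically, rather than only when a connection is next used, is what lets the proxy close its side before a peer or load balancer drops the connection.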
Ever since this went in, we've started getting sporadic test failures of test079, breaking CI/CD.
Trivially reproducible:

Cleaning up test run directory from this run.
Running 17 of 500 iterations
running defines.sh
Running slapadd to build database for the remote slapd server...
Starting remote slapd server on TCP/IP port 9011...
Starting slapd proxy on TCP/IP port 9012...
Create shared connection towards remote LDAP (time_t now=1614220114 timeout=1614220118)
Checking that proxy has created connections towards backend
Sleeping until idle-timeout and conn-ttl have passed
Checking that proxy has closed expired connections towards the remote LDAP server (time_t now=1614220119)
Create private connection towards remote LDAP (time_t now=1614220119 timeout=1614220123)
Checking that proxy has created connections towards backend
Sleeping until idle-timeout and conn-ttl have passed
Checking that proxy has closed expired connections towards the remote LDAP server (time_t now=1614220125)
Checking that idle-timeout is reset on activity
Create cached connection: idle-timeout timeout starts (time_t now=1614220125, original_timeout=1614220129)
Do another search to reset the timeout (time_t now=1614220128, new_timeout=1614220132)
Check that connection is still alive due to idle-timeout reset (time_t now=1614220132)
Error: LDAP connection to remote LDAP server is not found (1)
Failed after 17 of 500 iterations
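The timestamps in the failing step can be checked with a few lines of Python. This is only a reading of the log above, not a confirmed diagnosis: the variable names are hypothetical, and the values are copied verbatim from the output.

```python
# Timestamps copied from the failing run (time_t, whole seconds).
conn_created = 1614220125
original_timeout = 1614220129
second_search = 1614220128
new_timeout = 1614220132
liveness_check = 1614220132

# Idle timeout implied by the first pair of timestamps.
idle_timeout = original_timeout - conn_created

# The reset deadline matches "last activity + idle timeout".
recomputed_timeout = second_search + idle_timeout

# Seconds left between the liveness check and the reset deadline.
margin = new_timeout - liveness_check

print(idle_timeout, recomputed_timeout == new_timeout, margin)
```

With one-second time_t resolution, the liveness check ran in the same second as the reset deadline, so the test had no slack: whether the connection was still present depended on whether pruning had already run in that second.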
Sorry for the flaky test! I've improved it and submitted a merge request https://git.openldap.org/openldap/openldap/-/merge_requests/255
Commits: • 3db2e4a0 by Tero Saarni at 2021-02-25T16:56:55+02:00 ITS#9197 Increase timeouts in test case due to sporadic failures