Issue 9197 - slapd-ldap/slapo-chain hits error 80 after idletimeout
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: backends
Version: 2.4.48
Hardware: All All
Importance: --- normal
Target Milestone: 2.5.2
Assignee: OpenLDAP project
Reported: 2020-03-26 16:55 UTC by Quanah Gibson-Mount
Modified: 2021-02-26 23:35 UTC

Description Quanah Gibson-Mount 2020-03-26 16:55:32 UTC
From a customer:

In order to communicate via the LB managed writable ldap, we have to ensure that an idle connection is periodically refreshed. If we do not, the LB will silently drop the connection after 5 minutes.

To combat that, I set an olcIdleTimeout on the writable server so that the cached chain connections are removed before the LB timeout hits.

However, the slapd-ldap client goes into the CLOSE_WAIT state, which causes subsequent ldapmodify updates brokered by the read-only instance to fail with err=80. A few bugs appear to have been filed on this in the past against slapd-ldap, but it's not clear whether we are hitting the same issue or a new one.

I've also connected the read-only instances directly to the writable LDAP instances and the CLOSE_WAIT issue persists, so I don't believe it is caused by the LB.

These are the other threads I found when I started looking into this problem. I think they are using the ldap-proxy, though:
https://www.openldap.org/lists/openldap-technical/201301/msg00323.html
http://www.openldap.org/lists/openldap-software/201004/msg00060.html
https://www.openldap.org/lists/openldap-bugs/200412/msg00029.html

The LB we have seems to be set to forget connections that last over 5 minutes, per its settings, so a keepalive of 240:10:30 seemed like it should have worked. I assumed it wasn't working because the man page says "Only some systems support the customization of these values". However, only after setting keepalive to 60:10:30 did I maintain a stable connection, so there may be other network settings at play that I'm not aware of.
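
For reference, slapd-ldap(5) documents the keepalive value as <idle>:<probes>:<interval>: seconds of idleness before the first probe, the number of unanswered probes tolerated, and seconds between probes. The sketch below only does the arithmetic for the two settings mentioned above against the 5-minute LB cutoff. The helper names are hypothetical and not part of OpenLDAP, and the dead-peer estimate is a rough approximation.

```python
from typing import NamedTuple


class Keepalive(NamedTuple):
    idle: int      # seconds of idleness before the first probe is sent
    probes: int    # unanswered probes tolerated before the connection is dropped
    interval: int  # seconds between probes


def parse_keepalive(value: str) -> Keepalive:
    """Parse an "<idle>:<probes>:<interval>" string such as "60:10:30"."""
    idle, probes, interval = (int(part) for part in value.split(":"))
    return Keepalive(idle, probes, interval)


def first_probe_beats(value: str, lb_timeout: int) -> bool:
    """True if the first probe is due before the load balancer's idle cutoff."""
    return parse_keepalive(value).idle < lb_timeout


def dead_peer_detection(value: str) -> int:
    """Approximate seconds until an unresponsive peer is given up on."""
    k = parse_keepalive(value)
    return k.idle + k.probes * k.interval


LB_IDLE_TIMEOUT = 300  # the 5-minute load balancer cutoff from the report

for setting in ("240:10:30", "60:10:30"):
    print(setting,
          first_probe_beats(setting, LB_IDLE_TIMEOUT),
          dead_peer_detection(setting))
```

On paper both settings send their first probe before the 300-second cutoff, so the arithmetic alone does not explain why only 60:10:30 held the connection open. That is consistent with the reporter's guess that other network settings are involved.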
Comment 1 Quanah Gibson-Mount 2020-11-30 18:32:06 UTC
back-ldap is likely missing a task to close idle connections.
Comment 2 tero.saarni 2021-01-08 12:36:30 UTC
I have submitted merge request https://git.openldap.org/openldap/openldap/-/merge_requests/211
Comment 3 tero.saarni 2021-01-18 16:32:19 UTC
Here is a notice of origin and rights statement for the patch

The attached patch file is derived from OpenLDAP Software. All of the modifications to OpenLDAP Software represented in the following patch(es) were developed by Tero Saarni tero.saarni@est.tech. I have not assigned rights and/or interest in this work to any party.

Ericsson Software Technology AB hereby place the following modifications to OpenLDAP Software (and only these modifications) into the public domain. Hence, these modifications may be freely used and/or redistributed for any purpose with or without attribution and/or other notice.
Comment 4 Quanah Gibson-Mount 2021-02-24 22:15:31 UTC
Commits:
  • 0eacc4a7 by Tero Saarni at 2021-02-24T22:07:48+00:00
    ITS#9197 back-ldap: added task that prunes expired connections
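
As a rough illustration of what a pruning task of this kind does, here is a minimal Python sketch of a connection cache with an idle timeout and a connection TTL. It is not the code from the commit: the class and method names are hypothetical, treating a value of 0 as "disabled" is an assumption, and so is the choice to expire a connection once `now` reaches its deadline.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CachedConn:
    created: int    # time the connection was opened
    last_used: int  # time of the most recent operation on it


class ConnCache:
    """Hypothetical cache of connections towards a remote server."""

    def __init__(self, idle_timeout: int, conn_ttl: int) -> None:
        self.idle_timeout = idle_timeout  # assumption: 0 disables the check
        self.conn_ttl = conn_ttl          # assumption: 0 disables the check
        self.conns: Dict[str, CachedConn] = {}

    def open(self, key: str, now: int) -> None:
        self.conns[key] = CachedConn(created=now, last_used=now)

    def touch(self, key: str, now: int) -> None:
        # Activity resets the idle timer but not the TTL.
        self.conns[key].last_used = now

    def _expired(self, conn: CachedConn, now: int) -> bool:
        # Assumption: a connection expires once `now` reaches the deadline.
        idle_hit = self.idle_timeout > 0 and now >= conn.last_used + self.idle_timeout
        ttl_hit = self.conn_ttl > 0 and now >= conn.created + self.conn_ttl
        return idle_hit or ttl_hit

    def prune(self, now: int) -> List[str]:
        """Drop every expired connection and return the keys that were closed."""
        expired = [key for key, conn in self.conns.items() if self._expired(conn, now)]
        for key in expired:
            del self.conns[key]
        return expired


cache = ConnCache(idle_timeout=4, conn_ttl=0)
cache.open("shared", now=0)
cache.touch("shared", now=3)   # a new operation resets the idle timer
print(cache.prune(now=4))      # still within 4 seconds of the last use
print(cache.prune(now=7))      # 4 seconds after the last use: closed
```

Without a periodic call like `prune`, an idle cached connection is only noticed when the next operation tries to use it, which matches the failure mode described in the report.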
Comment 5 Quanah Gibson-Mount 2021-02-25 02:21:32 UTC
Ever since this went in, we've started getting sporadic test failures of test079, breaking CI/CD.
Comment 6 Quanah Gibson-Mount 2021-02-25 02:33:30 UTC
Trivially reproducible:

Cleaning up test run directory from this run.
Running 17 of 500 iterations
running defines.sh
Running slapadd to build database for the remote slapd server...
Starting remote slapd server on TCP/IP port 9011...
Starting slapd proxy on TCP/IP port 9012...
Create shared connection towards remote LDAP (time_t now=1614220114 timeout=1614220118)
Checking that proxy has created connections towards backend
Sleeping until idle-timeout and conn-ttl have passed
Checking that proxy has closed expired connections towards the remote LDAP server (time_t now=1614220119)
Create private connection towards remote LDAP (time_t now=1614220119 timeout=1614220123)
Checking that proxy has created connections towards backend
Sleeping until idle-timeout and conn-ttl have passed
Checking that proxy has closed expired connections towards the remote LDAP server (time_t now=1614220125)
Checking that idle-timeout is reset on activity
Create cached connection: idle-timeout timeout starts (time_t now=1614220125, original_timeout=1614220129)
Do another search to reset the timeout (time_t now=1614220128, new_timeout=1614220132)
Check that connection is still alive due to idle-timeout reset (time_t now=1614220132)
Error: LDAP connection to remote LDAP server is not found (1)
Failed after 17 of 500 iterations
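
The timestamps printed in the log above already hint at the race. The sketch below just subtracts each printed deadline from the time of the corresponding check; the values are copied from the log, and the variable and function names are hypothetical.

```python
# (step, time of the check, deadline printed in the log)
checks = [
    ("shared connection expired", 1614220119, 1614220118),
    ("private connection expired", 1614220125, 1614220123),
    ("connection alive after reset", 1614220132, 1614220132),
]


def margin(check_time: int, deadline: int) -> int:
    """Seconds by which the check ran after (+) or before (-) the deadline."""
    return check_time - deadline


for step, check_time, deadline in checks:
    print(f"{step}: checked {margin(check_time, deadline):+d}s relative to the deadline")
```

The two "expired" checks ran after their deadlines, as intended. The "still alive" check ran in the same second as the reset deadline, so with one-second `time_t` resolution it had no margin left.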
Comment 7 tero.saarni 2021-02-25 15:01:41 UTC
Sorry for the flaky test!

I've improved it and submitted a merge request https://git.openldap.org/openldap/openldap/-/merge_requests/255
Comment 8 Quanah Gibson-Mount 2021-02-25 17:03:45 UTC
Commits:
  • 3db2e4a0 by Tero Saarni at 2021-02-25T16:56:55+02:00
    ITS#9197 Increase timeouts in test case due to sporadic failures