Issue 4420 - Hanging CLOSE_WAIT connections in ldap-backend
Summary: Hanging CLOSE_WAIT connections in ldap-backend
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All
OS: All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-03-01 08:37 UTC by fred2261@hotmail.com
Modified: 2020-03-26 16:55 UTC

See Also:


Description fred2261@hotmail.com 2006-03-01 08:37:02 UTC
Full_Name: Fred Schmalborn
Version: 2.3.19
OS: AIX 5.2
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (192.109.190.88)


I'm using the ldap backend to proxy some DNs to a corporate LDAP server.
The requests to the corporate LDAP use both anonymous and user binds. That
works fine for the first 30 minutes.

For the anonymous BIND, two permanent connections to the corporate LDAP
are kept open.

The corporate LDAP closes these connections after 30 minutes, and they
remain in CLOSE_WAIT state for a long time (forever?).

After that, I often get no further information from the corporate LDAP. In
the TCP trace I see that the OpenLDAP server no longer communicates with
the corporate LDAP. I have to restart slapd, and then it works fine for
the next 30 minutes.

I have tried setting the idle-timeout to 30, but this parameter affects
only the user BIND connections and not the anonymous ones.

Is it possible to set a timeout for the anonymous connections?

Are these hanging CLOSE_WAIT connections a bug, and how can I avoid them?
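
For reference, a minimal back-ldap configuration along the lines of this
setup would look like the sketch below (suffix, URI and the timeout value
are placeholders, not the actual production settings):

database     ldap
suffix       "dc=corp,dc=com"
uri          "ldap://corpldap.com"
# expire cached connections idle for more than 30 seconds; as noted
# above, this appears to cover the user BIND connections but not the
# two permanent anonymous ones
idle-timeout 30

Later versions of slapd-ldap(5) also document a conn-ttl directive that
recycles cached connections after a fixed lifetime regardless of
idleness.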

While it is in the broken state (CLOSE_WAIT connections present), I see
this in the syslog:

Mar  1 09:25:23 ipnetmg5 slapd[290912]: conn=5 fd=8 ACCEPT from
IP=127.0.0.1:36877 (IP=0.0.0.0:389)
Mar  1 09:25:23 ipnetmg5 slapd[290912]: conn=6 fd=9 ACCEPT from
IP=127.0.0.1:36878 (IP=0.0.0.0:389)
Mar  1 09:25:23 ipnetmg5 slapd[290912]: conn=6 op=0 BIND dn="" method=128
Mar  1 09:25:23 ipnetmg5 slapd[290912]: conn=5 op=0 BIND dn="" method=128
Mar  1 09:25:23 ipnetmg5 slapd[290912]: conn=6 op=0 RESULT tag=97 err=0 text=
Mar  1 09:25:23 ipnetmg5 slapd[290912]: conn=5 op=0 RESULT tag=97 err=0 text=
Mar  1 09:25:23 ipnetmg5 slapd[290912]: conn=6 op=1 SRCH
base="ou=People,dc=corp,dc=com" scope=2
 deref=3 filter="(&(objectClass=*)(uid=myuser))"
Mar  1 09:25:23 ipnetmg5 slapd[290912]: conn=5 op=1 SRCH
base="ou=People,dc=corp,dc=com" scope=2
 deref=3 filter="(&(objectClass=*)(uid=myuser))"
Mar  1 09:25:23 ipnetmg5 slapd[290912]: conn=6 op=1 SRCH attr=uid
Mar  1 09:25:23 ipnetmg5 slapd[290912]: conn=5 op=1 SRCH attr=uid
Mar  1 09:25:23 ipnetmg5 slapd[290912]: conn=6 op=1 SEARCH RESULT tag=101 err=52
nentries=0 text=
Mar  1 09:25:23 ipnetmg5 slapd[290912]: conn=5 op=1 SEARCH RESULT tag=101 err=52
nentries=0 text=
Mar  1 09:25:26 ipnetmg5 slapd[290912]: conn=6 op=2 SRCH
base="ou=People,dc=corp,dc=com" scope=2
 deref=3 filter="(&(objectClass=*)(uid=myuser))"
Mar  1 09:25:26 ipnetmg5 slapd[290912]: conn=6 op=2 SRCH attr=uid
Mar  1 09:25:26 ipnetmg5 slapd[290912]: conn=6 op=2 SEARCH RESULT tag=101 err=52
nentries=0 text=
Mar  1 09:25:26 ipnetmg5 slapd[290912]: conn=5 op=2 SRCH
base="ou=People,dc=corp,dc=com" scope=2
 deref=3 filter="(&(objectClass=*)(uid=myuser))"
Mar  1 09:25:26 ipnetmg5 slapd[290912]: conn=5 op=2 SRCH attr=uid
Mar  1 09:25:26 ipnetmg5 slapd[290912]: conn=5 op=2 SEARCH RESULT tag=101 err=52
nentries=0 text=
Mar  1 09:25:28 ipnetmg5 slapd[290912]: conn=5 op=3 SRCH
base="ou=People,dc=corp,dc=com" scope=2
 deref=3 filter="(&(objectClass=*)(uid=myuser))"
Mar  1 09:25:28 ipnetmg5 slapd[290912]: conn=6 op=3 SRCH
base="ou=People,dc=corp,dc=com" scope=2
 deref=3 filter="(&(objectClass=*)(uid=myuser))"
Mar  1 09:25:28 ipnetmg5 slapd[290912]: conn=5 op=3 SRCH attr=uid
Mar  1 09:25:28 ipnetmg5 slapd[290912]: conn=6 op=3 SRCH attr=uid
Mar  1 09:25:28 ipnetmg5 slapd[290912]: conn=5 op=3 SEARCH RESULT tag=101 err=52
nentries=0 text=
Mar  1 09:25:28 ipnetmg5 slapd[290912]: conn=6 op=3 SEARCH RESULT tag=101 err=52
nentries=0 text=
Mar  1 09:26:06 ipnetmg5 slapd[290912]: conn=5 fd=8 closed (idletimeout)
Mar  1 09:26:06 ipnetmg5 slapd[290912]: conn=6 fd=9 closed (idletimeout)


With lsof on the OpenLDAP server I see:

slapd     290912     root  cwd   VDIR               10,6                512  
92226 /usr (/dev/hd2)
slapd     290912     root    0u  VCHR                2,2                0t0   
4161 /dev/null
slapd     290912     root    1u  VCHR                2,2                0t0   
4161 /dev/null
slapd     290912     root    2u  VCHR                2,2                0t0   
4161 /dev/null
slapd     290912     root    3u  unix 0xf10000f3048d5400                0t0     
   ->0xf10000f305e87000
slapd     290912     root    4r  FIFO 0xf10000e32237ae50                  0
slapd     290912     root    5w  FIFO 0xf10000e32237ae50                  0
slapd     290912     root    6u  IPv4 0xf10000f3012efb58                0t0    
TCP *:389 (LISTEN)
slapd     290912     root    7uw VREG               10,7               4096   
8202 /var (/dev/hd9var)
slapd     290912     root   10u  IPv4 0xf10000f305df0b58              0t896    
TCP ipnetmg5c1.muc:32803->corpldap.com:3892 (CLOSE_WAIT)
slapd     290912     root   11u  IPv4 0xf10000f306b82b58               0t76    
TCP ipnetmg5c1.muc:65424->corpldap.com:3892 (CLOSE_WAIT)
slapd     290912     root   12uW VREG               10,7             122901   
8213 /var (/dev/hd9var)
slapd     290912     root   13uW VREG               10,7             493748   
8223 /var (/dev/hd9var)
slapd     290912     root   14u  IPv4 0xf10000f30d5ee358               0t76    
TCP ipnetmg5c1.muc:32804->corpldap.com:3892 (CLOSE_WAIT)

With best regards
Fred

Comment 1 ando@openldap.org 2006-03-03 10:07:03 UTC
I have no clear idea as to why this happens; back-ldap refreshes
connections when LDAP_SERVER_DOWN is returned, and the fact that your
logs report LDAP_UNAVAILABLE (52), which is exactly the LDAP response
code that the client library error LDAP_SERVER_DOWN maps to, seems to
indicate that the error is detected correctly but the connection
refresh for some reason doesn't occur.  I note that the correct
handling of these conditions, as far as they can be reproduced, is
routinely tested (partly in the test suite, partly in a private test
suite that we routinely run at SysNet), so I'm positive the software is
working as intended.  It would be of interest to trace a bit more of
what's happening.  To do that, you could add "args" and "trace" to the
log level when the problem occurs.  You can do that via the cn=config
or the cn=monitor interface without restarting the server; or, if you
can live with a very verbose log for 30 minutes, you can start slapd
with that log level.
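
For example, with the cn=config backend enabled, something like the
following switches the log level on the fly (a sketch: the bind
identity for the config database depends on how the server was set up):

ldapmodify -x -H ldap://localhost -D "cn=config" -W <<EOF
dn: cn=config
changetype: modify
replace: olcLogLevel
olcLogLevel: stats args trace
EOF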

p.




Ing. Pierangelo Masarati
Responsabile Open Solution
OpenLDAP Core Team

SysNet s.n.c.
Via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
------------------------------------------
Office:   +39.02.23998309          
Mobile:   +39.333.4963172
Email:    pierangelo.masarati@sys-net.it
------------------------------------------

Comment 2 Howard Chu 2006-03-03 10:19:20 UTC
ando@sys-net.it wrote:
> I have no clear idea as to why this happens; back-ldap refreshes
> connections when LDAP_SERVER_DOWN is returned, and the fact that your
> logs report LDAP_UNAVAILABLE (52), which is exactly the LDAP response
> code that the client library error LDAP_SERVER_DOWN maps to, seems to
> indicate that the error is detected correctly but the connection
> refresh for some reason doesn't occur.  I note that the correct
> handling of these conditions, as far as they can be reproduced, is
> routinely tested (partly in the test suite, partly in a private test
> suite that we routinely run at SysNet), so I'm positive the software
> is working as intended.
>

I believe the code now in back-ldap is broken. If you do a "grep 
do_retry" in the back-ldap source you'll note that some of the modules 
are testing for LDAP_UNAVAILABLE and some are testing for 
LDAP_SERVER_DOWN. This inconsistency points to another logic problem - 
the LDAP_UNAVAILABLE code should only be present after mapping has 
occurred, but the retry needs to be invoked before mapping. Note that 
ldap_back_op_result will send a result code to the client, so even if 
back-ldap actually retries the operation, the client will have already 
gone away.
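
Schematically, the ordering being described is something like the sketch
below (condensed illustration, not the actual back-ldap source; the
proxy_rebind/map_api2result/send_result_to_client helpers are
hypothetical stand-ins for back-ldap internals):

/* retry must be decided on the raw client-library code
 * (LDAP_SERVER_DOWN), before it is mapped to the protocol result
 * (LDAP_UNAVAILABLE, 52) that goes to the client */
#include <ldap.h>

extern int  proxy_rebind( LDAP **ldp );      /* rebuild cached connection */
extern int  map_api2result( int librc );     /* SERVER_DOWN -> UNAVAILABLE */
extern void send_result_to_client( int rc );

int
proxied_search( LDAP **ldp, const char *base, const char *filter )
{
	int          do_retry = 1;
	int          rc;
	LDAPMessage *res = NULL;

retry:
	rc = ldap_search_ext_s( *ldp, base, LDAP_SCOPE_SUBTREE, filter,
		NULL, 0, NULL, NULL, NULL, LDAP_NO_LIMIT, &res );

	if ( rc == LDAP_SERVER_DOWN && do_retry ) {
		/* test the *library* code, before any mapping, and retry
		 * once -- nothing has been sent to the client yet */
		do_retry = 0;
		if ( res != NULL ) {
			ldap_msgfree( res );
			res = NULL;
		}
		if ( proxy_rebind( ldp ) == LDAP_SUCCESS ) {
			goto retry;
		}
	}

	if ( res != NULL ) {
		ldap_msgfree( res );
	}

	/* only now map the library code to a protocol code; a test for
	 * LDAP_UNAVAILABLE before this point could never match */
	rc = map_api2result( rc );
	send_result_to_client( rc );
	return rc;
}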

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/

Comment 3 ando@openldap.org 2006-03-03 10:45:23 UTC
On Fri, 2006-03-03 at 10:20 +0000, hyc@symas.com wrote:
> I believe the code now in back-ldap is broken. If you do a "grep 
> do_retry" in the back-ldap source you'll note that some of the modules 
> are testing for LDAP_UNAVAILABLE and some are testing for 
> LDAP_SERVER_DOWN. This inconsistency points to another logic problem - 
> the LDAP_UNAVAILABLE code should only be present after mapping has 
> occurred, but the retry needs to be invoked before mapping. Note that 
> ldap_back_op_result will send a result code to the client, so even if 
> back-ldap actually retries the operation, the client will have already 
> gone away.

I'm not so sure it's broken.  I haven't gone through all of it yet, but as
far as search is concerned, it seems to me that the right value is
checked before and after mapping, and that ldap_back_op_result() is only
called when retry is no longer an option.  I'll investigate a bit more
carefully; I might add some #ifdef'd code to test those error
conditions.

p.

Comment 4 ando@openldap.org 2006-03-03 11:39:49 UTC
On Fri, 2006-03-03 at 10:36 +0000, ando@sys-net.it wrote:
> On Fri, 2006-03-03 at 10:20 +0000, hyc@symas.com wrote:
> > I believe the code now in back-ldap is broken. If you do a "grep 
> > do_retry" in the back-ldap source you'll note that some of the modules 
> > are testing for LDAP_UNAVAILABLE and some are testing for 
> > LDAP_SERVER_DOWN. This inconsistency points to another logic problem - 
> > the LDAP_UNAVAILABLE code should only be present after mapping has 
> > occurred, but the retry needs to be invoked before mapping. Note that 
> > ldap_back_op_result will send a result code to the client, so even if 
> > back-ldap actually retries the operation, the client will have already 
> > gone away.
> 
> I'm not so sure it's broken.  I haven't gone through all of it yet, but as
> far as search is concerned, it seems to me that the right value is
> checked before and after mapping, and that ldap_back_op_result() is only
> called when retry is no longer an option.  I'll investigate a bit more
> carefully; I might add some #ifdef'd code to test those error
> conditions.

I've stepped through the back-ldap code with gdb, and the only case in
which a 52 seems to be returned to the client is when ldap_search_ext()
returns -1 and ldap_back_retry() fails for the same reason.  As to why
ldap_back_retry() fails I have no clue yet; I caused those failures by
mucking with the return codes of functions that were in fact succeeding.
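
For anyone who wants to reproduce the library-level part of this outside
slapd, a tiny standalone probe along these lines shows the code that
back-ldap has to react to (a sketch: URI, base and filter are
placeholders):

#include <stdio.h>
#include <ldap.h>

#define URI     "ldap://localhost:389"
#define BASE    "dc=corp,dc=com"
#define FILTER  "(uid=myuser)"

static int
probe( LDAP *ld )
{
	LDAPMessage *res = NULL;
	int rc = ldap_search_ext_s( ld, BASE, LDAP_SCOPE_SUBTREE, FILTER,
		NULL, 0, NULL, NULL, NULL, LDAP_NO_LIMIT, &res );
	if ( res != NULL ) ldap_msgfree( res );
	return rc;
}

int
main( void )
{
	LDAP *ld;
	int   rc;

	if ( ldap_initialize( &ld, URI ) != LDAP_SUCCESS ) {
		fprintf( stderr, "ldap_initialize failed\n" );
		return 1;
	}

	/* first search opens the connection */
	rc = probe( ld );
	printf( "first search: %s (%d)\n", ldap_err2string( rc ), rc );

	printf( "now stop the server, then press Enter...\n" );
	getchar();

	/* searching on the dead connection is what the proxy must
	 * detect and recover from */
	rc = probe( ld );
	if ( rc == LDAP_SERVER_DOWN )
		printf( "LDAP_SERVER_DOWN -- a proxy mapping this for its "
			"own client sends err=52 (unavailable)\n" );
	else
		printf( "second search: %s (%d)\n", ldap_err2string( rc ), rc );

	ldap_unbind_ext_s( ld, NULL, NULL );
	return 0;
}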

p.

Comment 5 Howard Chu 2006-05-12 05:00:28 UTC
I saw some hangs while tracking down ITS#4541. The problem was two (or 
more) threads trying to open the default connection at the same time on 
an LDAP* handle. I saw a few cases with 8 active worker threads, with 3 
pairs using the same session handles at once. They were all hanging, 
since no thread could read a complete message. If back-ldap was using 
TLS, things would usually crash or hang in libcrypto. With the ITS#4541 
fix I am no longer seeing these problems.
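
In schematic form, the race is two threads performing a full
request/response exchange on one LDAP* at the same time; serializing
the exchange (or giving each thread its own handle) avoids it. A
minimal sketch, not the actual ITS#4541 fix:

#include <pthread.h>
#include <ldap.h>

/* one cached connection, shared by the worker threads */
struct shared_conn {
	LDAP            *sc_ld;
	pthread_mutex_t  sc_mutex;
};

int
guarded_search( struct shared_conn *sc, const char *base,
	const char *filter, LDAPMessage **res )
{
	int rc;

	/* serialize the whole exchange: without this, two workers can
	 * interleave reads on the same descriptor, each ends up holding
	 * a fragment of the other's reply, and neither ever assembles a
	 * complete message */
	pthread_mutex_lock( &sc->sc_mutex );
	rc = ldap_search_ext_s( sc->sc_ld, base, LDAP_SCOPE_SUBTREE,
		filter, NULL, 0, NULL, NULL, NULL, LDAP_NO_LIMIT, res );
	pthread_mutex_unlock( &sc->sc_mutex );

	return rc;
}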


Comment 6 ando@openldap.org 2006-05-19 14:49:30 UTC
changed notes
changed state Open to Suspended
Comment 7 Kurt Zeilenga 2006-05-28 16:47:06 UTC
changed notes
changed state Suspended to Closed
Comment 8 Alessandro Ranellucci 2007-02-15 18:02:23 UTC
I'm experiencing the CLOSE_WAIT problem with slapd 2.3.32 on Mac OS X
(PPC). Cached connections aren't being leaked, but they aren't closed on
idle-timeout and they remain in CLOSE_WAIT state after the remote host
has gone.

Compile options:
./configure --enable-rewrite --enable-ldap --enable-meta
--enable-proxycache --enable-spasswd

slapd.conf:
===========
database ldap
suffix "dc=AddressBooks,dc=example,dc=com"
uri ldap://localhost:8080
idle-timeout 2
idassert-bind bindmethod=none
    binddn="cn=Administrative Identity,dc=sssss"
    credentials="admin_password"
    authzID="dn:cn=Sandbox,dc=local,dc=example,dc=com"
    mode=self
    authz=proxyAuthz

(The problem also appears without idle-timeout.)

Steps to reproduce:
1. start slapd
2. connect to slapd and do some queries
3. disconnect the proxied server (for example by killing it)
4. netstat -a still shows the connection as CLOSE_WAIT

Debug:
when running slapd with "-d 5", nothing is printed when the connection
is closed by the remote host (i.e. when I kill it), so slapd probably
isn't catching the event at all.

  - alessandro ranellucci.

Comment 9 Howard Chu 2009-02-17 06:41:48 UTC
moved from Incoming to Archive.Incoming
Comment 10 OpenLDAP project 2014-08-01 21:05:59 UTC
assumed fixed in HEAD/2.3.22