Full_Name: Fred Schmalborn Version: 2.3.19 OS: AIX 5.2 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (192.109.190.88) I'm using the ldap backend to proxy some DN's to a corporate LDAP Server. The requests to the corporate LDAP are both anonymous and USER binds. That is working fine for the first 30 minutes. Vor the anonymous BIND there are open two permanent connections to the corporate LDAP. The corporate LDAP is closing this connections after 30 minutes and they are remaining in CLOSE_WAIT state for a long time (forever?). After that, I often doesn't get more informations from corporate LDAP. In the TCP trace I see that the OpenLDAP server doesn't communicate with the corporate LDAP. I have to restart the slapd and it is working fine for the next 30 minutes. I have tried to set the idle-timeout to 30, but this parameter effects only the user BIND and not the anonymous. Is it possible to set a timeout for the anonymous connections? Are this hanging CLOSE_WAIT connections a bug and how can I avoid them? At the time it doesn't work (CLOSE_WAIT connections), I see in the syslog: Mar 1 09:25:23 ipnetmg5 slapd[290912]: conn=5 fd=8 ACCEPT from IP=127.0.0.1:36877 (IP=0.0.0.0:389) Mar 1 09:25:23 ipnetmg5 slapd[290912]: conn=6 fd=9 ACCEPT from IP=127.0.0.1:36878 (IP=0.0.0.0:389) Mar 1 09:25:23 ipnetmg5 slapd[290912]: conn=6 op=0 BIND dn="" method=128 Mar 1 09:25:23 ipnetmg5 slapd[290912]: conn=5 op=0 BIND dn="" method=128 Mar 1 09:25:23 ipnetmg5 slapd[290912]: conn=6 op=0 RESULT tag=97 err=0 text= Mar 1 09:25:23 ipnetmg5 slapd[290912]: conn=5 op=0 RESULT tag=97 err=0 text= Mar 1 09:25:23 ipnetmg5 slapd[290912]: conn=6 op=1 SRCH base="ou=People,dc=corp,dc=com" scope=2 deref=3 filter="(&(objectClass=*)(uid=myuser))" Mar 1 09:25:23 ipnetmg5 slapd[290912]: conn=5 op=1 SRCH base="ou=People,dc=corp,dc=com" scope=2 deref=3 filter="(&(objectClass=*)(uid=myuser))" Mar 1 09:25:23 ipnetmg5 slapd[290912]: conn=6 op=1 SRCH attr=uid Mar 1 09:25:23 ipnetmg5 slapd[290912]: conn=5 op=1 SRCH attr=uid Mar 1 09:25:23 ipnetmg5 slapd[290912]: conn=6 op=1 SEARCH RESULT tag=101 err=52 nentries=0 text= Mar 1 09:25:23 ipnetmg5 slapd[290912]: conn=5 op=1 SEARCH RESULT tag=101 err=52 nentries=0 text= Mar 1 09:25:26 ipnetmg5 slapd[290912]: conn=6 op=2 SRCH base="ou=People,dc=corp,dc=com" scope=2 deref=3 filter="(&(objectClass=*)(uid=myuser))" Mar 1 09:25:26 ipnetmg5 slapd[290912]: conn=6 op=2 SRCH attr=uid Mar 1 09:25:26 ipnetmg5 slapd[290912]: conn=6 op=2 SEARCH RESULT tag=101 err=52 nentries=0 text= Mar 1 09:25:26 ipnetmg5 slapd[290912]: conn=5 op=2 SRCH base="ou=People,dc=corp,dc=com" scope=2 deref=3 filter="(&(objectClass=*)(uid=myuser))" Mar 1 09:25:26 ipnetmg5 slapd[290912]: conn=5 op=2 SRCH attr=uid Mar 1 09:25:26 ipnetmg5 slapd[290912]: conn=5 op=2 SEARCH RESULT tag=101 err=52 nentries=0 text= Mar 1 09:25:28 ipnetmg5 slapd[290912]: conn=5 op=3 SRCH base="ou=People,dc=corp,dc=com" scope=2 deref=3 filter="(&(objectClass=*)(uid=myuser))" Mar 1 09:25:28 ipnetmg5 slapd[290912]: conn=6 op=3 SRCH base="ou=People,dc=corp,dc=com" scope=2 deref=3 filter="(&(objectClass=*)(uid=myuser))" Mar 1 09:25:28 ipnetmg5 slapd[290912]: conn=5 op=3 SRCH attr=uid Mar 1 09:25:28 ipnetmg5 slapd[290912]: conn=6 op=3 SRCH attr=uid Mar 1 09:25:28 ipnetmg5 slapd[290912]: conn=5 op=3 SEARCH RESULT tag=101 err=52 nentries=0 text= Mar 1 09:25:28 ipnetmg5 slapd[290912]: conn=6 op=3 SEARCH RESULT tag=101 err=52 nentries=0 text= Mar 1 09:26:06 ipnetmg5 slapd[290912]: conn=5 fd=8 closed (idletimeout) Mar 1 09:26:06 ipnetmg5 slapd[290912]: conn=6 fd=9 closed (idletimeout) With lsof on the OpenLDAP Server I see: slapd 290912 root cwd VDIR 10,6 512 92226 /usr (/dev/hd2) slapd 290912 root 0u VCHR 2,2 0t0 4161 /dev/null slapd 290912 root 1u VCHR 2,2 0t0 4161 /dev/null slapd 290912 root 2u VCHR 2,2 0t0 4161 /dev/null slapd 290912 root 3u unix 0xf10000f3048d5400 0t0 ->0xf10000f305e87000 slapd 290912 root 4r FIFO 0xf10000e32237ae50 0 slapd 290912 root 5w FIFO 0xf10000e32237ae50 0 slapd 290912 root 6u IPv4 0xf10000f3012efb58 0t0 TCP *:389 (LISTEN) slapd 290912 root 7uw VREG 10,7 4096 8202 /var (/dev/hd9var) slapd 290912 root 10u IPv4 0xf10000f305df0b58 0t896 TCP ipnetmg5c1.muc:32803->corpldap.com:3892 (CLOSE_WAIT) slapd 290912 root 11u IPv4 0xf10000f306b82b58 0t76 TCP ipnetmg5c1.muc:65424->corpldap.com:3892 (CLOSE_WAIT) slapd 290912 root 12uW VREG 10,7 122901 8213 /var (/dev/hd9var) slapd 290912 root 13uW VREG 10,7 493748 8223 /var (/dev/hd9var) slapd 290912 root 14u IPv4 0xf10000f30d5ee358 0t76 TCP ipnetmg5c1.muc:32804->corpldap.com:3892 (CLOSE_WAIT) With Best Regards Fred
I have no clear idea as per why this happens; back-ldap refreshes connections when a LDAP_SERVER_DOWN is returned; the fact that your logs report a LDAP_UNAVAILABLE (52), which is exactly the ldap response code mapping of the client library error LDAP_SERVER_DOWN, seems to indicate that the error is detected correctly, but connection refresh for some reason doesn't occur. I note that the correct handling of these conditions, as far as they can be reproduced, is routinely tested (partly in the test suite, partly in a private test suite that we routinely run at SysNet), so I'm positive the software is working as intended. It would be of interest to trace a bit more what's happening. To do that, you could add "args" and "trace" to the log level when the problem occurs. You can do that via the cn=config or the cn=monitor interface without restarting the server; or, if you can live with very verbose log for 30 min, you can start slapd with that log level. p. Ing. Pierangelo Masarati Responsabile Open Solution OpenLDAP Core Team SysNet s.n.c. Via Dossi, 8 - 27100 Pavia - ITALIA http://www.sys-net.it ------------------------------------------ Office: +39.02.23998309 Mobile: +39.333.4963172 Email: pierangelo.masarati@sys-net.it ------------------------------------------
ando@sys-net.it wrote: > I have no clear idea as per why this happens; back-ldap refreshes > connections when a LDAP_SERVER_DOWN is returned; the fact that your logs > report a LDAP_UNAVAILABLE (52), which is exactly the ldap response code > mapping of the client library error LDAP_SERVER_DOWN, seems to indicate > that the error is detected correctly, but connection refresh for some > reason doesn't occur. I note that the correct handling of these > conditions, as far as they can be reproduced, is routinely tested > (partly in the test suite, partly in a private test suite that we > routinely run at SysNet), so I'm positive the software is working as > intended. > I believe the code now in back-ldap is broken. If you do a "grep do_retry" in the back-ldap source you'll note that some of the modules are testing for LDAP_UNAVAILABLE and some are testing for LDAP_SERVER_DOWN. This inconsistency points to another logic problem - the LDAP_UNAVAILABLE code should only be present after mapping has occurred, but the retry needs to be invoked before mapping. Note that ldap_back_op_result will send a result code to the client, so even if back-ldap actually retries the operation, the client will have already gone away. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
On Fri, 2006-03-03 at 10:20 +0000, hyc@symas.com wrote: > I believe the code now in back-ldap is broken. If you do a "grep > do_retry" in the back-ldap source you'll note that some of the modules > are testing for LDAP_UNAVAILABLE and some are testing for > LDAP_SERVER_DOWN. This inconsistency points to another logic problem - > the LDAP_UNAVAILABLE code should only be present after mapping has > occurred, but the retry needs to be invoked before mapping. Note that > ldap_back_op_result will send a result code to the client, so even if > back-ldap actually retries the operation, the client will have already > gone away. I'm not so sure it's broken. I haven't gone thru all of it yet, but as far as search is concerned, it seems to me that the right value is checked before and after mapping, and that ldap_back_op_result() is only called when retry is no longer an option. I'll investigate a bit more carefully; I might add some #ifdef'd code to test those error conditions. p. Ing. Pierangelo Masarati Responsabile Open Solution OpenLDAP Core Team SysNet s.n.c. Via Dossi, 8 - 27100 Pavia - ITALIA http://www.sys-net.it ------------------------------------------ Office: +39.02.23998309 Mobile: +39.333.4963172 Email: pierangelo.masarati@sys-net.it ------------------------------------------
On Fri, 2006-03-03 at 10:36 +0000, ando@sys-net.it wrote: > On Fri, 2006-03-03 at 10:20 +0000, hyc@symas.com wrote: > > I believe the code now in back-ldap is broken. If you do a "grep > > do_retry" in the back-ldap source you'll note that some of the modules > > are testing for LDAP_UNAVAILABLE and some are testing for > > LDAP_SERVER_DOWN. This inconsistency points to another logic problem - > > the LDAP_UNAVAILABLE code should only be present after mapping has > > occurred, but the retry needs to be invoked before mapping. Note that > > ldap_back_op_result will send a result code to the client, so even if > > back-ldap actually retries the operation, the client will have already > > gone away. > > I'm not so sure it's broken. I haven't gone thru all of it yet, but as > far as search is concerned, it seems to me that the right value is > checked before and after mapping, and that ldap_back_op_result() is only > called when retry is no longer an option. I'll investigate a bit more > carefully; I might add some #ifdef'd code to test those error > conditions. I've walked with gdb inside back-ldap code, and the only case in which it seems to occur that a 52 is returned to the client is when ldap_search_ext() returns -1 and ldap_back_retry() fails for the same reason. As per why ldap_back_retry() fails I have no clue yet; I caused those failures by mucking with the return codes of functions that were in fact succeeding. p. Ing. Pierangelo Masarati Responsabile Open Solution OpenLDAP Core Team SysNet s.n.c. Via Dossi, 8 - 27100 Pavia - ITALIA http://www.sys-net.it ------------------------------------------ Office: +39.02.23998309 Mobile: +39.333.4963172 Email: pierangelo.masarati@sys-net.it ------------------------------------------
I saw some hangs while tracking down ITS#4541. The problem was two (or more) threads trying to open the default connection at the same time on an LDAP* handle. I saw a few cases with 8 active worker threads, with 3 pairs using the same session handles at once. They were all hanging, since no thread could read a complete message. If back-ldap was using TLS, things would usually crash or hang in libcrypto. With the ITS#4541 fix I am no longer seeing these problems. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
changed notes changed state Open to Suspended
changed notes changed state Suspended to Closed
I'm experiencing the CLOSE_WAIT problem with slapd 2.3.32 on MacOS X (PPC). Cached connections aren't being leaked but they aren't closed on idle-timeout and they remain in CLOSE_WAIT state after the remote host has gone. Compile options: ./configure --enable-rewrite --enable-ldap --enable-meta --enable-proxycache --enable-spasswd slapd.conf: =========== database ldap suffix "dc=AddressBooks,dc=example,dc=com" uri ldap://localhost:8080 idle-timeout 2 idassert-bind bindmethod=none binddn="cn=Administrative Identity,dc=sssss" credentials="admin_password" authzID="dn:cn=Sandbox,dc=local,dc=example,dc=com" mode=self authz=proxyAuthz (The problem appears also without idle-timeout.) Steps to reproduce: 1. start slapd 2. connect to slapd and do some queries 3. disconnect the proxied server (for example by killing it) 4. netstat -a still shows the connection as CLOSE_WAIT Debug: when running slapd with "-d 5", nothing is printed when the connection is closed by the remote host (i.e. when I kill it), so slapd isn't probably catching the event at all. - alessandro ranellucci.
moved from Incoming to Archive.Incoming
assumed fixed in HEAD/2.3.22