Re: RE_23 hangs in test039 (back-hdb)

To: openldap-devel@OpenLDAP.org
Subject: Re: RE_23 hangs in test039 (back-hdb)
From: Howard Chu <hyc@highlandsun.com>
Date: Sun, 08 Jan 2006 18:52:21 -0800
In-reply-to: <43C1CCA0.4030703@highlandsun.com>
References: <FE6E92DB6CB2E3A1F92545E0@cadabra-sw.stanford.edu> <309874DB0378CEBCB41E53B9@cadabra-sw.stanford.edu> <43C1CCA0.4030703@highlandsun.com>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20051221 SeaMonkey/1.5a Mnenhy/0.7.3.0

Howard Chu wrote:

After looking over slapd.1.log I believe I understand the problem now.
On slapd.1 the connection in question did this:
conn=7 fd=20 ACCEPT from IP=127.0.0.1:36431 (IP=127.0.0.1:9011)
connection_get(20)
connection_get(20): got connid=7
connection_read(20): checking for input on id=7
ber_get_next
ber_get_next: tag 0x30 len 46 contents:
ber_get_next
do_bind
...
connection_get(20)
connection_get(20): got connid=7
connection_read(20): checking for input on id=7
ber_get_next
ber_get_next: tag 0x30 len 46 contents:
connection_input: conn=7 deferring operation: binding
ber_get_next
ber_get_next: tag 0x30 len 46 contents:
connection_input: conn=7 deferring operation: pending operations
ber_get_next
...
ber_get_next on fd 20 failed errno=11 (Resource temporarily unavailable)
do_bind: version=3 dn="cn=Manager,dc=example,dc=com" method=128
conn=7 op=0 BIND dn="cn=Manager,dc=example,dc=com" method=128
==> hdb_bind: dn: cn=Manager,dc=example,dc=com
conn=7 op=0 BIND dn="cn=Manager,dc=example,dc=com" mech=SIMPLE ssf=0
do_bind: v3 bind: "cn=Manager,dc=example,dc=com" to "cn=Manager,dc=example,dc=com"
send_ldap_result: conn=7 op=0 p=3
send_ldap_result: err=0 matched="" text=""
send_ldap_response: msgid=1 tag=97 err=0
ber_flush: 14 bytes to sd 20
conn=7 op=0 RESULT tag=97 err=0 text=
conn=8 fd=21 ACCEPT from IP=127.0.0.1:36432 (IP=127.0.0.1:9011)
do_bind
conn=7 op=2 BIND anonymous mech=implicit ssf=0

What's important to note here is that it skips from op=0 to op=2 and never responds to op=1. By the time everything hangs, there are 15 pending operations on this connection but nothing is making progress any more. I believe what has happened here is that op=1 got implicitly abandoned once slapd recognized that op=2 was a Bind request. (This occurs in connection_input(), before the operation gets queued.)
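A gap like the one between op=0 and op=2 can be spotted mechanically. The following is only a rough sketch, not part of slapd or its test suite: the helper name `missing_ops` is made up for illustration, and it assumes that every operation slapd processes leaves at least one `conn=N op=M` line in the log, as in the excerpt above. An op number missing from the log shows only that the op never reached that logging stage, not why.

```python
import re
from collections import defaultdict

# Matches the "conn=N op=M" pairs seen in the slapd log excerpt.
OP_RE = re.compile(r"\bconn=(\d+) op=(\d+)\b")

def missing_ops(log_text):
    """Return {conn: [op numbers never logged]} for each connection with a gap.

    Hypothetical helper: it only looks for holes below the highest op
    number seen on each connection.
    """
    seen = defaultdict(set)
    for conn, op in OP_RE.findall(log_text):
        seen[int(conn)].add(int(op))

    gaps = {}
    for conn, ops in seen.items():
        missing = sorted(set(range(max(ops) + 1)) - ops)
        if missing:
            gaps[conn] = missing
    return gaps

sample = (
    'conn=7 op=0 BIND dn="cn=Manager,dc=example,dc=com" method=128\n'
    "conn=7 op=0 RESULT tag=97 err=0 text=\n"
    "conn=8 fd=21 ACCEPT from IP=127.0.0.1:36432 (IP=127.0.0.1:9011)\n"
    "conn=7 op=2 BIND anonymous mech=implicit ssf=0\n"
)
print(missing_ops(sample))  # {7: [1]}
```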

I think a correct fix might be to put off the abandon step until the Bind request actually gets dequeued.
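To make the difference between the two abandon points concrete, here is a toy model in Python. It is a sketch under loud assumptions and is not slapd code: the function `simulate`, the `abandon_at` switch, and the strictly sequential queue are all invented for illustration. It assumes the first request is already executing while later requests are deferred in FIFO order, and that an abandoned operation never gets a response.

```python
from collections import deque

def simulate(requests, abandon_at):
    """Return the op ids that receive a response.

    requests   -- list of (op_id, kind) in arrival order, kind is "bind" or other
    abandon_at -- "input": a Bind abandons earlier queued ops as soon as it is parsed
                  "dequeue": a Bind abandons earlier ops only when it is dequeued
    Toy model only; names and structure are assumptions, not slapd internals.
    """
    first, *rest = requests
    pending = deque()
    abandoned = set()

    def abandon_earlier(bind_id):
        # Abandon every op still waiting in the queue ahead of this Bind.
        abandoned.update(op for op, _ in pending if op < bind_id)

    # Input phase: the first op is running, so everything else is deferred.
    for op_id, kind in rest:
        if kind == "bind" and abandon_at == "input":
            abandon_earlier(op_id)
        pending.append((op_id, kind))

    answered = [first[0]]  # the running op completes and is answered

    # Processing phase: deferred ops are handled in arrival order.
    while pending:
        op_id, kind = pending.popleft()
        if kind == "bind" and abandon_at == "dequeue":
            # FIFO order means earlier ops have already left the queue,
            # so in this sequential toy nothing is left to abandon.
            abandon_earlier(op_id)
        if op_id not in abandoned:
            answered.append(op_id)
    return answered

reqs = [(0, "bind"), (1, "search"), (2, "bind")]
print(simulate(reqs, "input"))    # [0, 2]
print(simulate(reqs, "dequeue"))  # [0, 1, 2]
```

With the abandon done at input time, the model reproduces the pattern in the log: op=0 and op=2 are answered and op=1 is silently dropped. With the abandon deferred to dequeue time, op=1 is answered before the second Bind runs. The kind "search" for op=1 is a placeholder; the log does not say what that operation was.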

Also I'll note that this whole situation occurs due to back-ldap reusing an existing connection. When I wrote the connection-sharing code in back-ldap, I specifically wrote it to only reuse a connection if the userID was the same, so it should never actually be sending multiple Bind requests on the same connection. A lot of that has changed with idassert support etc., so I suspect that there's really a bug in back-ldap that needs to be fixed. But we should still consider what the correct time is for an incoming Bind to trigger the abandons.
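The reuse rule described here (share a connection only when the bound identity matches) can be sketched as a cache keyed by bind DN. This is an illustration only and does not reflect back-ldap's actual data structures: `ConnectionCache` and the `open_and_bind` callback are hypothetical names, and DNs are compared as raw strings rather than normalized.

```python
class ConnectionCache:
    """Hand out one connection per bind identity (hypothetical sketch)."""

    def __init__(self, open_and_bind):
        # open_and_bind(bind_dn) is an assumed callback that opens a
        # connection and sends exactly one Bind for bind_dn.
        self._open_and_bind = open_and_bind
        self._conns = {}

    def get(self, bind_dn):
        # Reuse only when the identity is the same, so an existing
        # connection never needs a second Bind.
        conn = self._conns.get(bind_dn)
        if conn is None:
            conn = self._open_and_bind(bind_dn)
            self._conns[bind_dn] = conn
        return conn
```

Under this rule each connection carries a single Bind for its lifetime, which is why a second Bind on conn=7 points at the reuse logic having changed.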

--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/

Follow-Ups:
- Re: RE_23 hangs in test039 (back-hdb)
  - From: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
- Re: RE_23 hangs in test039 (back-hdb)
  - From: Pierangelo Masarati <ando@sys-net.it>
