[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: RE_23 hangs in test039 (back-hdb)



Howard Chu wrote:
After looking over slapd.1.log I believe I understand the problem now.

On slapd.1 the connection in question did this:

conn=7 fd=20 ACCEPT from IP=127.0.0.1:36431 (IP=127.0.0.1:9011)
connection_get(20)
connection_get(20): got connid=7
connection_read(20): checking for input on id=7
ber_get_next
ber_get_next: tag 0x30 len 46 contents:
ber_get_next
do_bind
...
connection_get(20)
connection_get(20): got connid=7
connection_read(20): checking for input on id=7
ber_get_next
ber_get_next: tag 0x30 len 46 contents:
connection_input: conn=7 deferring operation: binding
ber_get_next
ber_get_next: tag 0x30 len 46 contents:
connection_input: conn=7 deferring operation: pending operations
ber_get_next
...
ber_get_next on fd 20 failed errno=11 (Resource temporarily unavailable)
do_bind: version=3 dn="cn=Manager,dc=example,dc=com" method=128
conn=7 op=0 BIND dn="cn=Manager,dc=example,dc=com" method=128
==> hdb_bind: dn: cn=Manager,dc=example,dc=com
conn=7 op=0 BIND dn="cn=Manager,dc=example,dc=com" mech=SIMPLE ssf=0
do_bind: v3 bind: "cn=Manager,dc=example,dc=com" to "cn=Manager,dc=example,dc=co
m"
send_ldap_result: conn=7 op=0 p=3
send_ldap_result: err=0 matched="" text=""
send_ldap_response: msgid=1 tag=97 err=0
ber_flush: 14 bytes to sd 20
conn=7 op=0 RESULT tag=97 err=0 text=
conn=8 fd=21 ACCEPT from IP=127.0.0.1:36432 (IP=127.0.0.1:9011)
do_bind
conn=7 op=2 BIND anonymous mech=implicit ssf=0



What's important to note here is that it skips from op=0 to op=2 and never responds to op=1. By the time everything hangs, there are 15 pending operations on this connection but nothing is making progress any more. I believe what has happened here is that op=1 got implicitly abandoned once slapd recognized that op=2 was a Bind request. (This occurs in connection_input(), before the operation gets queued.)


I think a correct fix might be to put off the abandon step until the Bind request actually gets dequeued.

Also I'll note that this whole situation occurs due to back-ldap reusing an existing connection. When I wrote the connection-sharing code in back-ldap, I specifically wrote it to only reuse a connection if the userID was the same, so it should never actually be sending multiple Bind requests on the same connection. A lot of that has changed with idassert support etc., so I suspect that there's really a bug in back-ldap that needs to be fixed. But we should still consider when is the correct time for an incoming Bind to trigger the abandons.


--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/