
slapd deadlock bug ITS#7296



Hello,

I'm working on this bug:

http://www.openldap.org/lists/openldap-bugs/201206/msg00026.html

If slapd client connections are torn down in mid-query -- the server has
received the query and has a pending reply to send, but the connection is
closed by the client before it can be sent -- this deadlocks slapd worker
threads. Eventually all threads are deadlocked in send_ldap_ber() which
serializes their network access to send PDUs, and the server becomes
unresponsive and has to be killed.

send_ldap_ber() notices the connection drop and calls connection_closing(). The
problem appears to be that connection_abandon() then abandons all outstanding
executing ops but does not empty the c_ops queue (as it does with
c_pending_ops). When connection_close() looks at the connection, it always sees
outstanding ops and defers the close. I see this pattern:

50cb3104 connection_closing: readying conn=1519 sd=33 for close

50cb3104 connection_close: deferring conn=1519 sd=33
50cb3104 connection_resched: attempting closing conn=1519 sd=33
50cb3104 connection_close: deferring conn=1519 sd=33
50cb3104 connection_resched: attempting closing conn=1519 sd=33

... which repeats until the server freezes entirely.
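
To make the suspected cycle concrete, here is a toy model in C. It is not slapd
code: the toy_* names, the struct, and the integer counters standing in for the
c_ops and c_pending_ops queues are all made up for illustration, and the real
locking and scheduling are left out. It only shows why a close that is deferred
while executing ops remain queued can never complete if nothing ever removes
those ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a connection; hypothetical, not slapd's Connection. */
struct toy_conn {
    int  id;
    int  n_ops;          /* models the c_ops queue (executing ops) */
    int  n_pending_ops;  /* models the c_pending_ops queue */
    bool closing;
    bool destroyed;
};

/* Models the behavior described above: pending ops are emptied,
 * executing ops are abandoned but stay on the queue. */
static void toy_abandon(struct toy_conn *c)
{
    c->n_pending_ops = 0;
    /* n_ops is deliberately left untouched */
}

static void toy_closing(struct toy_conn *c)
{
    printf("toy_closing: readying conn=%d for close\n", c->id);
    c->closing = true;
    toy_abandon(c);
}

/* Returns true if the connection was closed, false if the close was deferred. */
static bool toy_close(struct toy_conn *c)
{
    if (c->n_ops > 0 || c->n_pending_ops > 0) {
        printf("toy_close: deferring conn=%d\n", c->id);
        return false;
    }
    assert(c->n_ops == 0);  /* the destroy step insists the queue is empty */
    c->destroyed = true;
    return true;
}

/* Retries the close at most max_tries times; returns the number of deferrals. */
static int toy_resched(struct toy_conn *c, int max_tries)
{
    int deferrals = 0;
    for (int i = 0; i < max_tries && !c->destroyed; i++) {
        printf("toy_resched: attempting closing conn=%d\n", c->id);
        if (!toy_close(c))
            deferrals++;
    }
    return deferrals;
}
```

Calling toy_closing() on a connection with one executing op and then
toy_resched() with any retry limit defers on every attempt. The close only
succeeds once something sets n_ops back to zero, which is the step I can't find
a safe place for in the real code.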

If I add code to connection_abandon() to empty c_ops, it causes slapd to crash
later with a mutex usage error, so that's apparently not the right place/way to
do it. If I note that the connection is dying and have connection_destroy()
skip the assertion that c_ops must be empty, it fixes the bug: the deadlock no
longer occurs. However, I'm concerned this will leak memory as the ops aren't
being freed.  So my question is: what's the right way to get the outstanding
executing ops abandoned by connection_abandon() to be freed?

The code is complex and I may have misunderstood how best to go about fixing
this, but hopefully this is enough to make sense.

Thanks,

--
  Richard E. Silverman