(ITS#5836) hung writers



Full_Name: Quanah Gibson-Mount
Version: HEAD/RE24/RE23
OS: Linux 2.6
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (75.111.29.239)


When the network is broken (there may be other triggers, but that's where I
hit it), slapd can lock up waiting for writers to finish.  When that happens,
everything ends up in pthread_cond_wait, like this:

Thread 8 (Thread 1098918240 (LWP 18884)):
#0  0x0000003fca9089ba in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0
#1  0x0000002a956c8f58 in ldap_pvt_thread_cond_wait (cond=0x385bdf0, mutex=0x385bca0) at thr_posix.c:299
#2  0x000000000043dc91 in send_ldap_ber (conn=0x385bc88, ber=0x41680610) at result.c:198
#3  0x0000000000441450 in slap_send_search_entry (op=0x625bc00, rs=0x41801d60) at result.c:1137
#4  0x0000002a9717b9e1 in bdb_search (op=0x625bc00, rs=0x41801d60) at search.c:879
#5  0x000000000049b9c9 in overlay_op_walk (op=0x625bc00, rs=0x41801d60, which=op_search, oi=0xdece00, on=0x0) at backover.c:650
#6  0x000000000049bb91 in over_op_func (op=0x625bc00, rs=0x41801d60, which=op_search) at backover.c:702
#7  0x000000000049bc27 in over_op_search (op=0x625bc00, rs=0x41801d60) at backover.c:724
#8  0x000000000042f1d8 in fe_op_search (op=0x625bc00, rs=0x41801d60) at search.c:355
#9  0x000000000042ecac in do_search (op=0x625bc00, rs=0x41801d60) at search.c:217
#10 0x000000000042bd9c in connection_operation (ctx=0x41801e90, arg_v=0x625bc00) at connection.c:1133
#11 0x000000000042c28c in connection_read_thread (ctx=0x41801e90, argv=0x1f) at connection.c:1261
#12 0x0000002a956c7c77 in ldap_int_thread_pool_wrapper (xpool=0x8a2f00) at tpool.c:478
#13 0x0000003fca90610a in start_thread () from /lib64/tls/libpthread.so.0
#14 0x0000003fca0c68c3 in clone () from /lib64/tls/libc.so.6
#15 0x0000000000000000 in ?? ()

slapd ends up hanging indefinitely.  This happened with 8 replicas connected to
the server.  For four of those replicas, at some point after the initial
connection was made, the master could no longer reach the replica (although
the replica could still reach the master).

Patch to HEAD from Howard is committed.

--Quanah