
Re: slapd deadlock bug ITS#7296



Richard Silverman wrote:

However, this is somehow ineffective, because shortly thereafter all the
worker threads block with the following stack:

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x0000000000568f9a in ldap_int_thread_cond_wait (cond=0x7fe190fb9030, mutex=0x7fe190fb8fe0) at thr_posix.c:277
#2  0x000000000056aeee in ldap_pvt_thread_cond_wait (cond=0x7fe190fb9030, mutex=0x7fe190fb8fe0) at thr_debug.c:954
#3  0x000000000043e1b7 in send_ldap_ber (op=0x7fe160129fc0, ber=0x7fe16ae7b460) at result.c:303
#4  0x0000000000441338 in slap_send_search_entry (op=0x7fe160129fc0, rs=0x7fe16affc920) at result.c:1429
#5  0x00000000004c5953 in bdb_search (op=0x7fe160129fc0, rs=0x7fe16affc920) at search.c:1014
#6  0x000000000042daae in fe_op_search (op=0x7fe160129fc0, rs=0x7fe16affc920) at search.c:402
#7  0x000000000042d2f2 in do_search (op=0x7fe160129fc0, rs=0x7fe16affc920) at search.c:247
#8  0x000000000042a4b7 in connection_operation (ctx=0x7fe16affca00, arg_v=0x7fe160129fc0) at connection.c:1150
#9  0x00000000005679f3 in ldap_int_thread_pool_wrapper (xpool=0x1008e90) at tpool.c:688
#10 0x000000000056a493 in ldap_debug_thread_wrapper (arg=0x7fe1740015c0) at thr_debug.c:770
#11 0x00000039e3407851 in start_thread (arg=0x7fe16affd700) at pthread_create.c:301
#12 0x00000039e2ce811d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

(This is with thread debugging enabled.) They are blocked here, again in
send_ldap_ber():

  	while ( conn->c_writers > 0 && conn->c_writing ) {
  		ldap_pvt_thread_cond_wait( &conn->c_write1_cv, &conn->c_write1_mutex );
  	}

The problem is that they are *all* here; there is no one left to signal
that condition variable, so slapd locks up forever.

Please post the entire stack trace for all threads. Since we have
  c_writers = 8, c_writing = 1, c_writewaiter = 1

then one of the threads ought to be blocked on write2_cv (result.c:373). That implies that it is still waiting for epoll() to report that the socket is writable.
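
To make the reasoning above concrete, here is a toy model of the two-stage wait. It is only a sketch under assumptions, not the code in result.c: conn_t, writer(), run_model() and the "writable" flag are hypothetical names, plain pthreads stand in for the ldap_pvt_thread_* wrappers, and the loop in run_model() stands in for the listener that reacts to epoll(). Writers queue on write1_cv while another thread owns the socket; the owner parks on write2_cv until the listener reports that the socket is writable.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

#define MAX_WRITERS 64

/* Hypothetical stand-in for the writer-related fields of a connection. */
typedef struct {
    pthread_mutex_t write1_mutex, write2_mutex;
    pthread_cond_t  write1_cv, write2_cv;
    int writers;      /* threads inside the send path (cf. c_writers) */
    int writing;      /* one thread owns the socket (cf. c_writing) */
    int writewaiter;  /* the owner waits for writability (cf. c_writewaiter) */
    int writable;     /* assumption: set by the listener stand-in */
    int completed;    /* number of finished writes */
} conn_t;

static void *writer(void *arg)
{
    conn_t *c = arg;

    /* Stage 1: queue behind whichever thread currently owns the socket. */
    pthread_mutex_lock(&c->write1_mutex);
    c->writers++;
    while (c->writers > 0 && c->writing)
        pthread_cond_wait(&c->write1_cv, &c->write1_mutex);
    assert(!c->writing);
    c->writing = 1;

    /* Stage 2: pretend the socket buffer is full and wait for the listener. */
    pthread_mutex_lock(&c->write2_mutex);
    c->writewaiter = 1;
    pthread_mutex_unlock(&c->write1_mutex);
    while (!c->writable)
        pthread_cond_wait(&c->write2_cv, &c->write2_mutex);
    c->writable = 0;
    c->writewaiter = 0;
    pthread_mutex_unlock(&c->write2_mutex);

    /* Done: give up ownership and wake one thread queued in stage 1. */
    pthread_mutex_lock(&c->write1_mutex);
    c->writing = 0;
    c->writers--;
    c->completed++;
    pthread_cond_signal(&c->write1_cv);
    pthread_mutex_unlock(&c->write1_mutex);
    return NULL;
}

/* Starts n writers, plays the listener role, returns the completed count. */
int run_model(int n)
{
    conn_t c;
    pthread_t tid[MAX_WRITERS];
    int i, finished = 0;

    if (n < 0) n = 0;
    if (n > MAX_WRITERS) n = MAX_WRITERS;
    c.writers = c.writing = c.writewaiter = c.writable = c.completed = 0;
    pthread_mutex_init(&c.write1_mutex, NULL);
    pthread_mutex_init(&c.write2_mutex, NULL);
    pthread_cond_init(&c.write1_cv, NULL);
    pthread_cond_init(&c.write2_cv, NULL);

    for (i = 0; i < n; i++)
        pthread_create(&tid[i], NULL, writer, &c);

    while (!finished) {
        pthread_mutex_lock(&c.write1_mutex);
        finished = (c.completed == n);
        pthread_mutex_unlock(&c.write1_mutex);

        /* Listener stand-in: report writability to the waiting owner. */
        pthread_mutex_lock(&c.write2_mutex);
        if (c.writewaiter && !c.writable) {
            c.writable = 1;
            pthread_cond_signal(&c.write2_cv);
        }
        pthread_mutex_unlock(&c.write2_mutex);
        sched_yield();
    }

    for (i = 0; i < n; i++)
        pthread_join(tid[i], NULL);
    pthread_mutex_destroy(&c.write1_mutex);
    pthread_mutex_destroy(&c.write2_mutex);
    pthread_cond_destroy(&c.write1_cv);
    pthread_cond_destroy(&c.write2_cv);
    return c.completed;
}
```

Calling run_model(8) should return 8. In this model, if the listener step never sets "writable", the owner stays on write2_cv with writing = 1 and writewaiter = 1 while every other writer stays on write1_cv, which matches the counters quoted above.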

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/