[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: slapd deadlock bug ITS#7296



Richard Silverman wrote:
On Thu, 20 Dec 2012, Howard Chu wrote:

Please post the entire stack trace for all threads. Since we have
  c_writers = 8, c_writing = 1, c_writewaiter = 1

then one of the threads ought to be blocked on the write2_cv (result.c:373).
That implies that they're still waiting for epoll() to say that the socket is
writable.

Sure; here is a dump of the stacks of all 18 executing threads:

[Switching to thread 2 (Thread 0x7f21de7fc700 (LWP 8482))]#0  pthread_cond_wait@@GLIBC_2.3.2 ()
      at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
162	62:	movl	(%rsp), %edi
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x000000000056926e in ldap_int_thread_cond_wait (cond=0x7f2210697720, mutex=0x7f22106976d0) at thr_posix.c:277
#2  0x000000000056b1c2 in ldap_pvt_thread_cond_wait (cond=0x7f2210697720, mutex=0x7f22106976d0) at thr_debug.c:954
#3  0x000000000043e487 in send_ldap_ber (op=0x7f21e0140260, ber=0x7f21de67a460) at result.c:309
#4  0x000000000044160b in slap_send_search_entry (op=0x7f21e0140260, rs=0x7f21de7fb920) at result.c:1435
#5  0x00000000004c5c27 in bdb_search (op=0x7f21e0140260, rs=0x7f21de7fb920) at search.c:1014
#6  0x000000000042dd6a in fe_op_search (op=0x7f21e0140260, rs=0x7f21de7fb920) at search.c:402
#7  0x000000000042d5ae in do_search (op=0x7f21e0140260, rs=0x7f21de7fb920) at search.c:247
#8  0x000000000042a776 in connection_operation (ctx=0x7f21de7fba00, arg_v=0x7f21e0140260) at connection.c:1218
#9  0x0000000000567cc7 in ldap_int_thread_pool_wrapper (xpool=0x17f0e90) at tpool.c:688
#10 0x000000000056a767 in ldap_debug_thread_wrapper (arg=0x7f21ec136ad0) at thr_debug.c:770
#11 0x00000039e3407851 in start_thread (arg=0x7f21de7fc700) at pthread_create.c:301
#12 0x00000039e2ce811d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

OK, most of the threads look like thread 2.

Thread 13 and thread 16 are waiting for their respective sockets to be writable.

[Switching to thread 13 (Thread 0x7f21f27fc700 (LWP 8470))]#0  pthread_cond_wait@@GLIBC_2.3.2 ()
      at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
162	62:	movl	(%rsp), %edi
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x000000000056926e in ldap_int_thread_cond_wait (cond=0x7f221069b9a8, mutex=0x7f221069b958) at thr_posix.c:277
#2  0x000000000056b1c2 in ldap_pvt_thread_cond_wait (cond=0x7f221069b9a8, mutex=0x7f221069b958) at thr_debug.c:954
#3  0x000000000043e716 in send_ldap_ber (op=0x7f21cc191420, ber=0x7f21f267a460) at result.c:379
#4  0x000000000044160b in slap_send_search_entry (op=0x7f21cc191420, rs=0x7f21f27fb920) at result.c:1435
#5  0x00000000004c5c27 in bdb_search (op=0x7f21cc191420, rs=0x7f21f27fb920) at search.c:1014
#6  0x000000000042dd6a in fe_op_search (op=0x7f21cc191420, rs=0x7f21f27fb920) at search.c:402
#7  0x000000000042d5ae in do_search (op=0x7f21cc191420, rs=0x7f21f27fb920) at search.c:247
#8  0x000000000042a776 in connection_operation (ctx=0x7f21f27fba00, arg_v=0x7f21cc191420) at connection.c:1218
#9  0x0000000000567cc7 in ldap_int_thread_pool_wrapper (xpool=0x17f0e90) at tpool.c:688
#10 0x000000000056a767 in ldap_debug_thread_wrapper (arg=0x7f21f4000f20) at thr_debug.c:770
#11 0x00000039e3407851 in start_thread (arg=0x7f21f27fc700) at pthread_create.c:301
#12 0x00000039e2ce811d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

[Switching to thread 16 (Thread 0x7f21f3fff700 (LWP 8467))]#0  pthread_cond_wait@@GLIBC_2.3.2 ()
      at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
162	62:	movl	(%rsp), %edi
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x000000000056926e in ldap_int_thread_cond_wait (cond=0x7f22106977c0, mutex=0x7f2210697770) at thr_posix.c:277
#2  0x000000000056b1c2 in ldap_pvt_thread_cond_wait (cond=0x7f22106977c0, mutex=0x7f2210697770) at thr_debug.c:954
#3  0x000000000043e716 in send_ldap_ber (op=0x7f21c8111140, ber=0x7f21f3e7d460) at result.c:379
#4  0x000000000044160b in slap_send_search_entry (op=0x7f21c8111140, rs=0x7f21f3ffe920) at result.c:1435
#5  0x00000000004c5c27 in bdb_search (op=0x7f21c8111140, rs=0x7f21f3ffe920) at search.c:1014
#6  0x000000000042dd6a in fe_op_search (op=0x7f21c8111140, rs=0x7f21f3ffe920) at search.c:402
#7  0x000000000042d5ae in do_search (op=0x7f21c8111140, rs=0x7f21f3ffe920) at search.c:247
#8  0x000000000042a776 in connection_operation (ctx=0x7f21f3ffea00, arg_v=0x7f21c8111140) at connection.c:1218
#9  0x0000000000567cc7 in ldap_int_thread_pool_wrapper (xpool=0x17f0e90) at tpool.c:688
#10 0x000000000056a767 in ldap_debug_thread_wrapper (arg=0x7f21f4000a60) at thr_debug.c:770
#11 0x00000039e3407851 in start_thread (arg=0x7f21f3fff700) at pthread_create.c:301
#12 0x00000039e2ce811d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

So the question is why are these threads still waiting on their sockets.

Looking at your logs, it's clear that the error occurs and connection_closing() is called, which will call connection_abandon() to set the o_abandon flag on all of the connection's ops.

Can you please provide the output for "print *op" and "print *op->o_hdr"
in thread 13 and thread 16, frame 3?

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/