[Date Prev][Date Next] [Chronological] [Thread] [Top]

(ITS#4476) connection deadlock



Full_Name: Pierangelo Masarati
Version: HEAD
OS: Linux CentOS4.2 (kernel 2.6)
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (81.72.89.40)
Submitted by: ando


[I suspect the OS is irrelevant, as I noted similar behavior on Solaris 8; I
also suspect the version is re23 as well, as I noted similar behavior with re23,
but I cannot confirm it's the same issue]

I have been able to produce repeated deadlocks in slapd while stressing the
proxy backends.  Scenario: a variant of back-meta that contacts multiple targets
represented by regular slapd with back-bdb databases.  The issue mostly appears
on the target systems, so it's pure OpenLDAP; I suspect someting similar may
have appeared on the proxy.  At some point, when the traffic is very heavy, the
system becomes irresponsive.  Even stopping and restarting the proxy doesn't
help.  Multiple threads on the target systems are stuck on send_ldap_ber(); in
the most recent case, inspection with gdb yielded the backtrace below:


Thread 18 (Thread 1082132832 (LWP 23153)):
#0  0x0000003a484bef86 in __select_nocancel () from /lib64/tls/libc.so.6
#1  0x00000000004427f8 in slapd_daemon_task (ptr=0x0) at daemon.c:1842
#2  0x0000003a48f0610a in start_thread () from /lib64/tls/libpthread.so.0
#3  0x0000003a484c5ee3 in clone () from /lib64/tls/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 17 (Thread 1090525536 (LWP 23158)):
#0  0x0000003a48f0adfb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000002aaa0f8378 in ?? ()
#2  0x3530303000000000 in ?? ()
#3  0x0000003a48f07bd4 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4  0x000000000046e11f in access_allowed_mask (op=0x5d8af6, e=0x40e7f8f0,
    desc=0x40e7f8f0, val=0x41000de0, access=808525938,
    state=0x40f316e00000000, maskp=0x40e7f240) at acl.c:427
#5  0x0000000000458550 in send_ldap_ber (conn=0x2aaa0507c0, ber=0x40e7f8f0)
    at result.c:161
(More stack frames follow...)

Thread 16 (Thread 1098918240 (LWP 23170)):
#0  0x0000003a48f0adfb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000002aaa1f9380 in ?? ()
#2  0x6f632e6f00000000 in ?? ()
#3  0x0000003a48f07bd4 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4  0x000000000046e11f in access_allowed_mask (op=0x5d8af6, e=0x416808f0,
    desc=0x416808f0, val=0x41801de0, access=1668048244,
    state=0x3230303000000000, maskp=0x41680240) at acl.c:427
#5  0x0000000000458550 in send_ldap_ber (conn=0x2aaa0ce530, ber=0x416808f0)
    at result.c:161
(More stack frames follow...)

Thread 15 (Thread 1107310944 (LWP 23171)):
#0  0x0000003a48f0adfb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000002aaa2fa380 in ?? ()
#2  0x6f632e6f00000000 in ?? ()
#3  0x0000003a48f07bd4 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4  0x000000000046e11f in access_allowed_mask (op=0x5d8af6, e=0x41e818f0,
    desc=0x41e818f0, val=0x42002de0, access=1668048244,
    state=0x3930303000000000, maskp=0x41e81240) at acl.c:427
#5  0x0000000000458550 in send_ldap_ber (conn=0x2aaa0ce530, ber=0x41e818f0)
    at result.c:161
(More stack frames follow...)

Thread 14 (Thread 1115703648 (LWP 23172)):
#0  0x0000003a48f0adfb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000002aad3fe380 in ?? ()
#2  0x6f632e6f00000000 in ?? ()
#3  0x0000003a48f07bd4 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4  0x000000000046e11f in access_allowed_mask (op=0x5d8af6, e=0x426828f0,
    desc=0x426828f0, val=0x42803de0, access=1668048244,
    state=0x3330303000000000, maskp=0x42682240) at acl.c:427
#5  0x0000000000458550 in send_ldap_ber (conn=0x2aaa0ce530, ber=0x426828f0)
    at result.c:161
(More stack frames follow...)

Thread 13 (Thread 1124096352 (LWP 23173)):
#0  0x0000003a48f0adfb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000002aae500388 in ?? ()
#2  0x3132303900000000 in ?? ()
#3  0x0000003a48f07bd4 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4  0x000000000046e11f in access_allowed_mask (op=0x5d8af6, e=0x42e838f0,
    desc=0x42e838f0, val=0x43004de0, access=925904944,
    state=0x332b0d0400000000, maskp=0x42e83240) at acl.c:427
#5  0x0000000000458550 in send_ldap_ber (conn=0x2aaa0ce530, ber=0x42e838f0)
    at result.c:161
(More stack frames follow...)

Thread 12 (Thread 1132489056 (LWP 23174)):
#0  0x0000003a48f0adfb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000002aaf602378 in ?? ()
#2  0x3733303000000000 in ?? ()
#3  0x0000003a48f07bd4 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4  0x000000000046e11f in access_allowed_mask (op=0x5d8af6, e=0x436848f0,
    desc=0x436848f0, val=0x43805de0, access=808525938,
    state=0x40f316e00000000, maskp=0x43684240) at acl.c:427
#5  0x0000000000458550 in send_ldap_ber (conn=0x2aaa0ce530, ber=0x436848f0)
    at result.c:161
(More stack frames follow...)

Thread 11 (Thread 1140881760 (LWP 23176)):
#0  0x0000003a48f0adfb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000002ab0704378 in ?? ()
#2  0x3530303000000000 in ?? ()
#3  0x0000003a48f07bd4 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4  0x000000000046e11f in access_allowed_mask (op=0x5d8af6, e=0x43e858f0,
    desc=0x43e858f0, val=0x44006de0, access=808525938,
    state=0x40f316e00000000, maskp=0x43e85240) at acl.c:427
#5  0x0000000000458550 in send_ldap_ber (conn=0x2aaa0507c0, ber=0x43e858f0)
    at result.c:161
(More stack frames follow...)

Thread 10 (Thread 1149274464 (LWP 23177)):
#0  0x0000003a48f0adfb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000002ab0805388 in ?? ()
#2  0x3132303900000000 in ?? ()
#3  0x0000003a48f07bd4 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4  0x000000000046e11f in access_allowed_mask (op=0x5d8af6, e=0x446868f0,
    desc=0x446868f0, val=0x44807de0, access=825241648,
    state=0x332b0d0400000000, maskp=0x44686240) at acl.c:427
#5  0x0000000000458550 in send_ldap_ber (conn=0x2aaa0ce530, ber=0x446868f0)
    at result.c:161
(More stack frames follow...)

Thread 9 (Thread 1157667168 (LWP 23180)):
#0  0x0000003a48f089aa in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/tls/libpthread.so.0
#1  0x00000000005d8aac in ldap_pvt_thread_cond_wait (cond=0x2aaa0ce698,
    mutex=0x2aaa0ce548) at thr_posix.c:299
#2  0x0000000000458705 in send_ldap_ber (conn=0x2aaa0ce530, ber=0x44e878f0)
    at result.c:207
#3  0x000000000045bf12 in slap_send_search_entry (op=0x9cd160, rs=0x45008de0)
    at result.c:1237
#4  0x00000000004d1239 in bdb_search (op=0x9cd160, rs=0x45008de0)
    at search.c:878
#5  0x0000000000449d5f in fe_op_search (op=0x9cd160, rs=0x45008de0)
    at search.c:355
(More stack frames follow...)

Thread 8 (Thread 1166059872 (LWP 23182)):
#0  0x0000003a48f0adfb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000002ab3d01378 in ?? ()
#2  0x3331303000000000 in ?? ()
#3  0x0000003a48f07bd4 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4  0x000000000046e11f in access_allowed_mask (op=0x5d8af6, e=0x456888f0,
    desc=0x456888f0, val=0x45809de0, access=808525938,
    state=0x40f316e00000000, maskp=0x45688240) at acl.c:427
#5  0x0000000000458550 in send_ldap_ber (conn=0x2aaa0ce530, ber=0x456888f0)
    at result.c:161
(More stack frames follow...)

Thread 7 (Thread 1174452576 (LWP 23183)):
#0  0x0000003a48f0adfb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000002ab3c00378 in ?? ()
#2  0x3031080400000000 in ?? ()
#3  0x0000003a48f07bd4 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4  0x000000000046e11f in access_allowed_mask (op=0x5d8af6, e=0x45e898f0,
    desc=0x45e898f0, val=0x4600ade0, access=842018864,
    state=0xa31646900000000, maskp=0x45e89240) at acl.c:427
#5  0x0000000000458550 in send_ldap_ber (conn=0x2aaa0507c0, ber=0x45e898f0)
    at result.c:161
(More stack frames follow...)

Thread 6 (Thread 1182845280 (LWP 23187)):
#0  0x0000003a48f0adfb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000002ab5e04388 in ?? ()
#2  0x3132303900000000 in ?? ()
#3  0x0000003a48f07bd4 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4  0x000000000046e11f in access_allowed_mask (op=0x5d8af6, e=0x4668a8f0,
    desc=0x4668a8f0, val=0x4680bde0, access=808464432,
    state=0x332b0d0400000000, maskp=0x4668a240) at acl.c:427
#5  0x0000000000458550 in send_ldap_ber (conn=0x2aaa0507c0, ber=0x4668a8f0)
    at result.c:161
(More stack frames follow...)

Thread 5 (Thread 1191237984 (LWP 23188)):
#0  0x0000003a48f0adfb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000000000fa08f8 in ?? ()
#2  0x3031080400000000 in ?? ()
#3  0x0000003a48f07bd4 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4  0x000000000046e11f in access_allowed_mask (op=0x5d8af6, e=0x46e8b890,
    desc=0x46e8b890, val=0x4700cd80, access=909324336,
    state=0xa31646900000000, maskp=0x46e8b1e0) at acl.c:427
#5  0x0000000000458550 in send_ldap_ber (conn=0x2aaa0507c0, ber=0x46e8b890)
    at result.c:161
(More stack frames follow...)

Thread 4 (Thread 1199630688 (LWP 23189)):
#0  0x0000003a48f0adfb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000000000ea08e8 in ?? ()
#2  0x3031080400000000 in ?? ()
#3  0x0000003a48f07bd4 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4  0x000000000046e11f in access_allowed_mask (op=0x5d8af6, e=0x4768c8f0,
    desc=0x4768c8f0, val=0x4780dde0, access=942878768,
    state=0xa31646900000000, maskp=0x4768c240) at acl.c:427
#5  0x0000000000458550 in send_ldap_ber (conn=0x2aaa0ce530, ber=0x4768c8f0)
    at result.c:161
(More stack frames follow...)

Thread 3 (Thread 1208023392 (LWP 23193)):
#0  0x0000003a48f089aa in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/tls/libpthread.so.0
#1  0x00000000005d8aac in ldap_pvt_thread_cond_wait (cond=0x2aaa050928,
    mutex=0x2aaa0507d8) at thr_posix.c:299
#2  0x0000000000458705 in send_ldap_ber (conn=0x2aaa0507c0, ber=0x47e8d8f0)
    at result.c:207
#3  0x000000000045bf12 in slap_send_search_entry (op=0x2abb7b7ef0,
    rs=0x4800ede0) at result.c:1237
#4  0x00000000004d1239 in bdb_search (op=0x2abb7b7ef0, rs=0x4800ede0)
    at search.c:878
#5  0x0000000000449d5f in fe_op_search (op=0x2abb7b7ef0, rs=0x4800ede0)
    at search.c:355
(More stack frames follow...)

Thread 2 (Thread 1216416096 (LWP 23194)):
#0  0x0000003a48f0adfb in __lll_mutex_lock_wait ()
   from /lib64/tls/libpthread.so.0
#1  0x0000002ab9402380 in ?? ()
#2  0x6f632e6f00000000 in ?? ()
#3  0x0000003a48f07bd4 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4  0x000000000046e11f in access_allowed_mask (op=0x5d8af6, e=0x4868e8f0,
    desc=0x4868e8f0, val=0x4880fde0, access=1668048244,
    state=0x3030303000000000, maskp=0x4868e240) at acl.c:427
#5  0x0000000000458550 in send_ldap_ber (conn=0x2aaa0507c0, ber=0x4868e8f0)
    at result.c:161
(More stack frames follow...)

Thread 1 (Thread 182894137632 (LWP 23152)):
#0  0x0000003a48f06ffb in pthread_join () from /lib64/tls/libpthread.so.0
#1  0x00000000005d8a04 in ldap_pvt_thread_join (thread=1082132832,
    thread_return=0x0) at thr_posix.c:193
#2  0x0000000000443496 in slapd_daemon () at daemon.c:2241
#3  0x000000000042c8c7 in main (argc=7, argv=0x7fbffff7b8) at main.c:902

A quick summary:

connection      n.thr.  threads

0x2aaa0507c0    7       17,11,7-5,3-2
0x2aaa0ce530    9       16-12,10-8,4

7 threads are stuck on one connection, and 9 on another.  I'm not much familiar
with that code, but there should clearly be a deadlock.  One oddity is that 

#3  0x0000003a48f07bd4 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4  0x000000000046e11f in access_allowed_mask (op=0x5d8af6, e=0x40e7f8f0,
    desc=0x40e7f8f0, val=0x41000de0, access=808525938,
    state=0x40f316e00000000, maskp=0x40e7f240) at acl.c:427

looks pretty inconsistent, as there is no in pthread_mutex_lock() at acl.c:427
(this is HEAD code); note that the code was compiled by gcc 3.4.5 with -O0 so
there should be no optimization around.

p.