Issue 3494 - slapd deadlock on MOD
Summary: slapd deadlock on MOD
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-01-19 16:38 UTC by richton@nbcs.rutgers.edu
Modified: 2014-08-01 21:06 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description Howard Chu 2005-01-19 10:20:28 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Comment 1 richton@nbcs.rutgers.edu 2005-01-19 16:38:27 UTC
Full_Name: Aaron Richton
Version: 2.2.20
OS: Solaris 9
URL: 
Submission from: (NULL) (165.230.41.99)


An ldapmodify process with a moderate amount (636 DNs affected) of changes was
running for excessive time and was caught by our systems monitoring software.
The stack trace showed a likely deadlock condition in slapd. ldapmodify was
stuck in poll(), waiting for return from slapd. I used gcore to take a snapshot
for future debugging, then sent a SIGINT resulting in

[local4.debug] slapd shutdown: waiting for 3 threads to terminate

I waited a few minutes and again used gcore before sending SIGKILL. This second
backtrace follows:

Thread 5 (process 468682    ):
#0  0xfedd58f8 in __lwp_park () from /usr/lib/libthread.so.1
#1  0xfedd1678 in mutex_lock_queue () from /usr/lib/libthread.so.1
#2  0xfedd2078 in slow_lock () from /usr/lib/libthread.so.1
#3  0x0016ac3c in ldap_pvt_thread_mutex_lock (mutex=0x287f88)
    at thr_posix.c:286
#4  0x00128048 in hdb_cache_add (bdb=0x287ed8, eip=0x197d928, e=0xa3564e0,
    nrdn=0xd87ff968, locker=2147486116) at cache.c:880
#5  0x0012449c in hdb_add (op=0x2fe238, rs=0xd87ffd58) at add.c:417
#6  0x0006556c in do_add (op=0x2fe238, rs=0xd87ffd58) at add.c:318
#7  0x0005b734 in connection_operation (ctx=0xd87ffe14, arg_v=0x2fe238)
    at connection.c:1061
#8  0x00169730 in ldap_int_thread_pool_wrapper (xpool=0x23f6f8) at tpool.c:467

Thread 4 (process 403146    ):
#0  0xfedd58f8 in __lwp_park () from /usr/lib/libthread.so.1
#1  0xfedd1678 in mutex_lock_queue () from /usr/lib/libthread.so.1
#2  0xfedd2078 in slow_lock () from /usr/lib/libthread.so.1
#3  0x0016ac3c in ldap_pvt_thread_mutex_lock (mutex=0x287f88)
    at thr_posix.c:286
#4  0x00127ba0 in hdb_cache_find_id (op=0x3ff870, tid=0x0, id=13837,
    eip=0xd8fffbb0, islocked=0, locker=785, lock=0xd8fffc1c) at cache.c:784
#5  0x0012e8b4 in hdb_dn2entry (op=0x3ff870, tid=0x0, dn=0x3ff898,
    e=0xd8fffc34, matched=1, locker=785, lock=0xd8fffc1c) at dn2entry.c:70
#6  0x0012559c in hdb_bind (op=0x3ff870, rs=0xd8fffd58) at bind.c:70
#7  0x00082c5c in do_bind (op=0x3ff870, rs=0xd8fffd58) at bind.c:622
#8  0x0005b6a4 in connection_operation (ctx=0xd8fffe14, arg_v=0x3ff870)
    at connection.c:1051
#9  0x00169730 in ldap_int_thread_pool_wrapper (xpool=0x23f6f8) at tpool.c:467

Thread 3 (process 272074    ):
#0  0xfedd58f8 in __lwp_park () from /usr/lib/libthread.so.1
#1  0xfedd2ba8 in cond_wait_queue () from /usr/lib/libthread.so.1
#2  0xfedd3364 in cond_wait () from /usr/lib/libthread.so.1
#3  0xfedd33a0 in pthread_cond_wait () from /usr/lib/libthread.so.1
#4  0x0016ab80 in ldap_pvt_thread_cond_wait (cond=0x286fe8, mutex=0x286fc0)
    at thr_posix.c:251
#5  0x001680e4 in ldap_pvt_thread_rdwr_wlock (rwlock=0x287f80) at rdwr.c:289
#6  0x001271ec in hdb_cache_lru_add (bdb=0x287ed8, locker=755, ei=0x8ffe4f8)
    at cache.c:555
#7  0x00127cec in hdb_cache_find_id (op=0x8f17ad0, tid=0x0, id=369,
    eip=0xd9f3f968, islocked=0, locker=755, lock=0xd9f3f7fc) at cache.c:799
#8  0x000f7534 in hdb_do_search () at tools.c:274
#9  0x000f4fa4 in hdb_search () at tools.c:274
#10 0x0005f3d0 in do_search (op=0x8f17ad0, rs=0xd9fffd58) at search.c:412
#11 0x0005b89c in _end ()
#12 0x00169730 in ldap_int_thread_pool_wrapper (xpool=0x23f6f8) at tpool.c:467

Thread 2 (process 141002    ):
#0  0xfedd59b4 in lwp_yield () from /usr/lib/libthread.so.1
#1  0x0016aa34 in ldap_pvt_thread_yield () at thr_posix.c:184
#2  0x001692ec in ldap_pvt_thread_pool_destroy (tpool=0x22ea54, run_pending=1)
    at tpool.c:376
#3  0x00057f64 in slapd_daemon_task (ptr=0x0) at daemon.c:2003

Thread 1 (process 75466    ):
#0  0xfee9f330 in _lwp_wait () from /usr/lib/libc.so.1
#1  0xfedcd6bc in lwp_wait () from /usr/lib/libthread.so.1
#2  0xfedc90f0 in _thrp_join () from /usr/lib/libthread.so.1
#3  0x0016a9b8 in ldap_pvt_thread_join (thread=2, thread_return=0x0)
    at thr_posix.c:151
#4  0x00058074 in slapd_daemon () at daemon.c:2037
#5  0x00046744 in main (argc=4, argv=0xffbffd94) at main.c:713


Logs, core files, and the LDIF in question are available if necessary.

Comment 2 Howard Chu 2005-01-19 18:19:37 UTC
Thanks for the report. Please try applying the change in 
back-bdb/cache.c rev 1.91 and see if the problem still occurs.

richton@nbcs.rutgers.edu wrote:

>Full_Name: Aaron Richton
>Version: 2.2.20
>OS: Solaris 9
>URL: 
>Submission from: (NULL) (165.230.41.99)
>
>
>An ldapmodify process with a moderate amount (636 DNs affected) of changes was
>running for excessive time and was caught by our systems monitoring software.
>The stack trace showed a likely deadlock condition in slapd. ldapmodify was
>stuck in poll(), waiting for return from slapd. I used gcore to take a snapshot
>for future debugging, then sent a SIGINT resulting in
>
>[local4.debug] slapd shutdown: waiting for 3 threads to terminate
>
>I waited a few minutes and again used gcore before sending SIGKILL. This second
>backtrace follows:
>
>Thread 5 (process 468682    ):
>#0  0xfedd58f8 in __lwp_park () from /usr/lib/libthread.so.1
>#1  0xfedd1678 in mutex_lock_queue () from /usr/lib/libthread.so.1
>#2  0xfedd2078 in slow_lock () from /usr/lib/libthread.so.1
>#3  0x0016ac3c in ldap_pvt_thread_mutex_lock (mutex=0x287f88)
>    at thr_posix.c:286
>#4  0x00128048 in hdb_cache_add (bdb=0x287ed8, eip=0x197d928, e=0xa3564e0,
>    nrdn=0xd87ff968, locker=2147486116) at cache.c:880
>#5  0x0012449c in hdb_add (op=0x2fe238, rs=0xd87ffd58) at add.c:417
>#6  0x0006556c in do_add (op=0x2fe238, rs=0xd87ffd58) at add.c:318
>#7  0x0005b734 in connection_operation (ctx=0xd87ffe14, arg_v=0x2fe238)
>    at connection.c:1061
>#8  0x00169730 in ldap_int_thread_pool_wrapper (xpool=0x23f6f8) at tpool.c:467
>
>Thread 4 (process 403146    ):
>#0  0xfedd58f8 in __lwp_park () from /usr/lib/libthread.so.1
>#1  0xfedd1678 in mutex_lock_queue () from /usr/lib/libthread.so.1
>#2  0xfedd2078 in slow_lock () from /usr/lib/libthread.so.1
>#3  0x0016ac3c in ldap_pvt_thread_mutex_lock (mutex=0x287f88)
>    at thr_posix.c:286
>#4  0x00127ba0 in hdb_cache_find_id (op=0x3ff870, tid=0x0, id=13837,
>    eip=0xd8fffbb0, islocked=0, locker=785, lock=0xd8fffc1c) at cache.c:784
>#5  0x0012e8b4 in hdb_dn2entry (op=0x3ff870, tid=0x0, dn=0x3ff898,
>    e=0xd8fffc34, matched=1, locker=785, lock=0xd8fffc1c) at dn2entry.c:70
>#6  0x0012559c in hdb_bind (op=0x3ff870, rs=0xd8fffd58) at bind.c:70
>#7  0x00082c5c in do_bind (op=0x3ff870, rs=0xd8fffd58) at bind.c:622
>#8  0x0005b6a4 in connection_operation (ctx=0xd8fffe14, arg_v=0x3ff870)
>    at connection.c:1051
>#9  0x00169730 in ldap_int_thread_pool_wrapper (xpool=0x23f6f8) at tpool.c:467
>
>Thread 3 (process 272074    ):
>#0  0xfedd58f8 in __lwp_park () from /usr/lib/libthread.so.1
>#1  0xfedd2ba8 in cond_wait_queue () from /usr/lib/libthread.so.1
>#2  0xfedd3364 in cond_wait () from /usr/lib/libthread.so.1
>#3  0xfedd33a0 in pthread_cond_wait () from /usr/lib/libthread.so.1
>#4  0x0016ab80 in ldap_pvt_thread_cond_wait (cond=0x286fe8, mutex=0x286fc0)
>    at thr_posix.c:251
>#5  0x001680e4 in ldap_pvt_thread_rdwr_wlock (rwlock=0x287f80) at rdwr.c:289
>#6  0x001271ec in hdb_cache_lru_add (bdb=0x287ed8, locker=755, ei=0x8ffe4f8)
>    at cache.c:555
>#7  0x00127cec in hdb_cache_find_id (op=0x8f17ad0, tid=0x0, id=369,
>    eip=0xd9f3f968, islocked=0, locker=755, lock=0xd9f3f7fc) at cache.c:799
>#8  0x000f7534 in hdb_do_search () at tools.c:274
>#9  0x000f4fa4 in hdb_search () at tools.c:274
>#10 0x0005f3d0 in do_search (op=0x8f17ad0, rs=0xd9fffd58) at search.c:412
>#11 0x0005b89c in _end ()
>#12 0x00169730 in ldap_int_thread_pool_wrapper (xpool=0x23f6f8) at tpool.c:467
>
>Thread 2 (process 141002    ):
>#0  0xfedd59b4 in lwp_yield () from /usr/lib/libthread.so.1
>#1  0x0016aa34 in ldap_pvt_thread_yield () at thr_posix.c:184
>#2  0x001692ec in ldap_pvt_thread_pool_destroy (tpool=0x22ea54, run_pending=1)
>    at tpool.c:376
>#3  0x00057f64 in slapd_daemon_task (ptr=0x0) at daemon.c:2003
>
>Thread 1 (process 75466    ):
>#0  0xfee9f330 in _lwp_wait () from /usr/lib/libc.so.1
>#1  0xfedcd6bc in lwp_wait () from /usr/lib/libthread.so.1
>#2  0xfedc90f0 in _thrp_join () from /usr/lib/libthread.so.1
>#3  0x0016a9b8 in ldap_pvt_thread_join (thread=2, thread_return=0x0)
>    at thr_posix.c:151
>#4  0x00058074 in slapd_daemon () at daemon.c:2037
>#5  0x00046744 in main (argc=4, argv=0xffbffd94) at main.c:713
>
>
>Logs, core files, and the LDIF in question are available if necessary.
>
>
>  
>


-- 
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support

Comment 3 Howard Chu 2005-01-20 10:58:20 UTC
changed notes
changed state Test to Release
Comment 4 richton@nbcs.rutgers.edu 2005-01-20 13:11:35 UTC
I applied back-bdb/cache.c patch and reran a similar-size ldapmodify. No
hang. This isn't the first time I've seen hangs on large modifies--just
the first time I caught it--so I'll probably start running with this in
production.

Comment 5 Howard Chu 2005-01-20 18:58:47 UTC
Aaron Richton wrote:

>I applied back-bdb/cache.c patch and reran a similar-size ldapmodify. No
>hang. This isn't the first time I've seen hangs on large modifies--just
>the first time I caught it--so I'll probably start running with this in
>production.
>
Thanks for the feedback. The patch will be in 2.2.21.

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support

Comment 6 Kurt Zeilenga 2005-01-22 21:31:52 UTC
changed state Release to Closed
Comment 7 Howard Chu 2009-02-17 05:09:57 UTC
moved from Software Bugs to Archive.Software Bugs
Comment 8 OpenLDAP project 2014-08-01 21:06:35 UTC
fix in HEAD,re22