changed notes changed state Open to Test moved from Incoming to Software Bugs
Full_Name: Aaron Richton Version: 2.2.20 OS: Solaris 9 URL: Submission from: (NULL) (165.230.41.99) An ldapmodify process with a moderate amount (636 DNs affected) of changes was running for excessive time and was caught by our systems monitoring software. The stack trace showed a likely deadlock condition in slapd. ldapmodify was stuck in poll(), waiting for return from slapd. I used gcore to take a snapshot for future debugging, then sent a SIGINT resulting in [local4.debug] slapd shutdown: waiting for 3 threads to terminate I waited a few minutes and again used gcore before sending SIGKILL. This second backtrace follows: Thread 5 (process 468682 ): #0 0xfedd58f8 in __lwp_park () from /usr/lib/libthread.so.1 #1 0xfedd1678 in mutex_lock_queue () from /usr/lib/libthread.so.1 #2 0xfedd2078 in slow_lock () from /usr/lib/libthread.so.1 #3 0x0016ac3c in ldap_pvt_thread_mutex_lock (mutex=0x287f88) at thr_posix.c:286 #4 0x00128048 in hdb_cache_add (bdb=0x287ed8, eip=0x197d928, e=0xa3564e0, nrdn=0xd87ff968, locker=2147486116) at cache.c:880 #5 0x0012449c in hdb_add (op=0x2fe238, rs=0xd87ffd58) at add.c:417 #6 0x0006556c in do_add (op=0x2fe238, rs=0xd87ffd58) at add.c:318 #7 0x0005b734 in connection_operation (ctx=0xd87ffe14, arg_v=0x2fe238) at connection.c:1061 #8 0x00169730 in ldap_int_thread_pool_wrapper (xpool=0x23f6f8) at tpool.c:467 Thread 4 (process 403146 ): #0 0xfedd58f8 in __lwp_park () from /usr/lib/libthread.so.1 #1 0xfedd1678 in mutex_lock_queue () from /usr/lib/libthread.so.1 #2 0xfedd2078 in slow_lock () from /usr/lib/libthread.so.1 #3 0x0016ac3c in ldap_pvt_thread_mutex_lock (mutex=0x287f88) at thr_posix.c:286 #4 0x00127ba0 in hdb_cache_find_id (op=0x3ff870, tid=0x0, id=13837, eip=0xd8fffbb0, islocked=0, locker=785, lock=0xd8fffc1c) at cache.c:784 #5 0x0012e8b4 in hdb_dn2entry (op=0x3ff870, tid=0x0, dn=0x3ff898, e=0xd8fffc34, matched=1, locker=785, lock=0xd8fffc1c) at dn2entry.c:70 #6 0x0012559c in hdb_bind (op=0x3ff870, rs=0xd8fffd58) at bind.c:70 #7 0x00082c5c in do_bind (op=0x3ff870, rs=0xd8fffd58) at bind.c:622 #8 0x0005b6a4 in connection_operation (ctx=0xd8fffe14, arg_v=0x3ff870) at connection.c:1051 #9 0x00169730 in ldap_int_thread_pool_wrapper (xpool=0x23f6f8) at tpool.c:467 Thread 3 (process 272074 ): #0 0xfedd58f8 in __lwp_park () from /usr/lib/libthread.so.1 #1 0xfedd2ba8 in cond_wait_queue () from /usr/lib/libthread.so.1 #2 0xfedd3364 in cond_wait () from /usr/lib/libthread.so.1 #3 0xfedd33a0 in pthread_cond_wait () from /usr/lib/libthread.so.1 #4 0x0016ab80 in ldap_pvt_thread_cond_wait (cond=0x286fe8, mutex=0x286fc0) at thr_posix.c:251 #5 0x001680e4 in ldap_pvt_thread_rdwr_wlock (rwlock=0x287f80) at rdwr.c:289 #6 0x001271ec in hdb_cache_lru_add (bdb=0x287ed8, locker=755, ei=0x8ffe4f8) at cache.c:555 #7 0x00127cec in hdb_cache_find_id (op=0x8f17ad0, tid=0x0, id=369, eip=0xd9f3f968, islocked=0, locker=755, lock=0xd9f3f7fc) at cache.c:799 #8 0x000f7534 in hdb_do_search () at tools.c:274 #9 0x000f4fa4 in hdb_search () at tools.c:274 #10 0x0005f3d0 in do_search (op=0x8f17ad0, rs=0xd9fffd58) at search.c:412 #11 0x0005b89c in _end () #12 0x00169730 in ldap_int_thread_pool_wrapper (xpool=0x23f6f8) at tpool.c:467 Thread 2 (process 141002 ): #0 0xfedd59b4 in lwp_yield () from /usr/lib/libthread.so.1 #1 0x0016aa34 in ldap_pvt_thread_yield () at thr_posix.c:184 #2 0x001692ec in ldap_pvt_thread_pool_destroy (tpool=0x22ea54, run_pending=1) at tpool.c:376 #3 0x00057f64 in slapd_daemon_task (ptr=0x0) at daemon.c:2003 Thread 1 (process 75466 ): #0 0xfee9f330 in _lwp_wait () from /usr/lib/libc.so.1 #1 0xfedcd6bc in lwp_wait () from /usr/lib/libthread.so.1 #2 0xfedc90f0 in _thrp_join () from /usr/lib/libthread.so.1 #3 0x0016a9b8 in ldap_pvt_thread_join (thread=2, thread_return=0x0) at thr_posix.c:151 #4 0x00058074 in slapd_daemon () at daemon.c:2037 #5 0x00046744 in main (argc=4, argv=0xffbffd94) at main.c:713 Logs, core files, and the LDIF in question are available if necessary.
Thanks for the report. Please try applying the change in back-bdb/cache.c rev 1.91 and see if the problem still occurs. richton@nbcs.rutgers.edu wrote: >Full_Name: Aaron Richton >Version: 2.2.20 >OS: Solaris 9 >URL: >Submission from: (NULL) (165.230.41.99) > > >An ldapmodify process with a moderate amount (636 DNs affected) of changes was >running for excessive time and was caught by our systems monitoring software. >The stack trace showed a likely deadlock condition in slapd. ldapmodify was >stuck in poll(), waiting for return from slapd. I used gcore to take a snapshot >for future debugging, then sent a SIGINT resulting in > >[local4.debug] slapd shutdown: waiting for 3 threads to terminate > >I waited a few minutes and again used gcore before sending SIGKILL. This second >backtrace follows: > >Thread 5 (process 468682 ): >#0 0xfedd58f8 in __lwp_park () from /usr/lib/libthread.so.1 >#1 0xfedd1678 in mutex_lock_queue () from /usr/lib/libthread.so.1 >#2 0xfedd2078 in slow_lock () from /usr/lib/libthread.so.1 >#3 0x0016ac3c in ldap_pvt_thread_mutex_lock (mutex=0x287f88) > at thr_posix.c:286 >#4 0x00128048 in hdb_cache_add (bdb=0x287ed8, eip=0x197d928, e=0xa3564e0, > nrdn=0xd87ff968, locker=2147486116) at cache.c:880 >#5 0x0012449c in hdb_add (op=0x2fe238, rs=0xd87ffd58) at add.c:417 >#6 0x0006556c in do_add (op=0x2fe238, rs=0xd87ffd58) at add.c:318 >#7 0x0005b734 in connection_operation (ctx=0xd87ffe14, arg_v=0x2fe238) > at connection.c:1061 >#8 0x00169730 in ldap_int_thread_pool_wrapper (xpool=0x23f6f8) at tpool.c:467 > >Thread 4 (process 403146 ): >#0 0xfedd58f8 in __lwp_park () from /usr/lib/libthread.so.1 >#1 0xfedd1678 in mutex_lock_queue () from /usr/lib/libthread.so.1 >#2 0xfedd2078 in slow_lock () from /usr/lib/libthread.so.1 >#3 0x0016ac3c in ldap_pvt_thread_mutex_lock (mutex=0x287f88) > at thr_posix.c:286 >#4 0x00127ba0 in hdb_cache_find_id (op=0x3ff870, tid=0x0, id=13837, > eip=0xd8fffbb0, islocked=0, locker=785, lock=0xd8fffc1c) at cache.c:784 >#5 0x0012e8b4 in hdb_dn2entry (op=0x3ff870, tid=0x0, dn=0x3ff898, > e=0xd8fffc34, matched=1, locker=785, lock=0xd8fffc1c) at dn2entry.c:70 >#6 0x0012559c in hdb_bind (op=0x3ff870, rs=0xd8fffd58) at bind.c:70 >#7 0x00082c5c in do_bind (op=0x3ff870, rs=0xd8fffd58) at bind.c:622 >#8 0x0005b6a4 in connection_operation (ctx=0xd8fffe14, arg_v=0x3ff870) > at connection.c:1051 >#9 0x00169730 in ldap_int_thread_pool_wrapper (xpool=0x23f6f8) at tpool.c:467 > >Thread 3 (process 272074 ): >#0 0xfedd58f8 in __lwp_park () from /usr/lib/libthread.so.1 >#1 0xfedd2ba8 in cond_wait_queue () from /usr/lib/libthread.so.1 >#2 0xfedd3364 in cond_wait () from /usr/lib/libthread.so.1 >#3 0xfedd33a0 in pthread_cond_wait () from /usr/lib/libthread.so.1 >#4 0x0016ab80 in ldap_pvt_thread_cond_wait (cond=0x286fe8, mutex=0x286fc0) > at thr_posix.c:251 >#5 0x001680e4 in ldap_pvt_thread_rdwr_wlock (rwlock=0x287f80) at rdwr.c:289 >#6 0x001271ec in hdb_cache_lru_add (bdb=0x287ed8, locker=755, ei=0x8ffe4f8) > at cache.c:555 >#7 0x00127cec in hdb_cache_find_id (op=0x8f17ad0, tid=0x0, id=369, > eip=0xd9f3f968, islocked=0, locker=755, lock=0xd9f3f7fc) at cache.c:799 >#8 0x000f7534 in hdb_do_search () at tools.c:274 >#9 0x000f4fa4 in hdb_search () at tools.c:274 >#10 0x0005f3d0 in do_search (op=0x8f17ad0, rs=0xd9fffd58) at search.c:412 >#11 0x0005b89c in _end () >#12 0x00169730 in ldap_int_thread_pool_wrapper (xpool=0x23f6f8) at tpool.c:467 > >Thread 2 (process 141002 ): >#0 0xfedd59b4 in lwp_yield () from /usr/lib/libthread.so.1 >#1 0x0016aa34 in ldap_pvt_thread_yield () at thr_posix.c:184 >#2 0x001692ec in ldap_pvt_thread_pool_destroy (tpool=0x22ea54, run_pending=1) > at tpool.c:376 >#3 0x00057f64 in slapd_daemon_task (ptr=0x0) at daemon.c:2003 > >Thread 1 (process 75466 ): >#0 0xfee9f330 in _lwp_wait () from /usr/lib/libc.so.1 >#1 0xfedcd6bc in lwp_wait () from /usr/lib/libthread.so.1 >#2 0xfedc90f0 in _thrp_join () from /usr/lib/libthread.so.1 >#3 0x0016a9b8 in ldap_pvt_thread_join (thread=2, thread_return=0x0) > at thr_posix.c:151 >#4 0x00058074 in slapd_daemon () at daemon.c:2037 >#5 0x00046744 in main (argc=4, argv=0xffbffd94) at main.c:713 > > >Logs, core files, and the LDIF in question are available if necessary. > > > > -- -- Howard Chu Chief Architect, Symas Corp. Director, Highland Sun http://www.symas.com http://highlandsun.com/hyc Symas: Premier OpenSource Development and Support
changed notes changed state Test to Release
I applied back-bdb/cache.c patch and reran a similar-size ldapmodify. No hang. This isn't the first time I've seen hangs on large modifies--just the first time I caught it--so I'll probably start running with this in production.
Aaron Richton wrote: >I applied back-bdb/cache.c patch and reran a similar-size ldapmodify. No >hang. This isn't the first time I've seen hangs on large modifies--just >the first time I caught it--so I'll probably start running with this in >production. > Thanks for the feedback. The patch will be in 2.2.21. -- -- Howard Chu Chief Architect, Symas Corp. Director, Highland Sun http://www.symas.com http://highlandsun.com/hyc Symas: Premier OpenSource Development and Support
changed state Release to Closed
moved from Software Bugs to Archive.Software Bugs
fix in HEAD,re22