
Re: (ITS#3933) slapd failes to shut down



quanah@stanford.edu wrote:

>Full_Name: Quanah Gibson-mount
>Version: REL_ENG_23
>OS: Solaris 8
>URL: ftp://ftp.openldap.org/incoming/
>Submission from: (NULL) (171.66.182.82)
>
>
>Yesterday I set up 4 systems, 1 a master, 3 replicas.  I have a nightly job that
>stops the server and does various things and then restarts the server.  The job
>waits until the server stops to do anything.  If the server hasn't exited within
>a certain amount of time, it throws an error and stops anything else from
>happening.  Last night, the job went to stop the master, sending it a signal
>around 4:00 a.m.:
>
>Aug 13 04:09:15 ldap-dev0.Stanford.EDU slapd[606]: [ID 543694 local4.debug]
>daemon: shutdown requested and initiated.
>
>This process has never exited.  A backtrace shows:
>
>(gdb) thr apply all bt
>
>Thread 10 (LWP 3):
>#0  0xfed64e48 in lwp_mutex_lock () from /usr/lib/lwp/libthread.so.1
>#1  0xfed5ff90 in mutex_lock_kernel () from /usr/lib/lwp/libthread.so.1
>#2  0xfed609b0 in stall () from /usr/lib/lwp/libthread.so.1
>#3  0xfed60f7c in mutex_lock_internal () from /usr/lib/lwp/libthread.so.1
>
>Thread 9 (LWP 4):
>#0  0xfed64d7c in __lwp_park () from /usr/lib/lwp/libthread.so.1
>#1  0xfed61f74 in cond_wait_queue () from /usr/lib/lwp/libthread.so.1
>#2  0xfed626e8 in cond_wait () from /usr/lib/lwp/libthread.so.1
>#3  0xfed62724 in pthread_cond_wait () from /usr/lib/lwp/libthread.so.1
>
>Thread 8 (LWP 5):
>#0  0xfed64d7c in __lwp_park () from /usr/lib/lwp/libthread.so.1
>#1  0xfed61f74 in cond_wait_queue () from /usr/lib/lwp/libthread.so.1
>#2  0xfed626e8 in cond_wait () from /usr/lib/lwp/libthread.so.1
>#3  0xfed62724 in pthread_cond_wait () from /usr/lib/lwp/libthread.so.1
>
>Thread 7 (LWP 6):
>#0  0xfed64d7c in __lwp_park () from /usr/lib/lwp/libthread.so.1
>#1  0xfed61f74 in cond_wait_queue () from /usr/lib/lwp/libthread.so.1
>#2  0xfed626e8 in cond_wait () from /usr/lib/lwp/libthread.so.1
>#3  0xfed62724 in pthread_cond_wait () from /usr/lib/lwp/libthread.so.1
>
>Thread 6 (LWP 7):
>#0  0xfed64d7c in __lwp_park () from /usr/lib/lwp/libthread.so.1
>#1  0xfed61f74 in cond_wait_queue () from /usr/lib/lwp/libthread.so.1
>#2  0xfed626e8 in cond_wait () from /usr/lib/lwp/libthread.so.1
>#3  0xfed62724 in pthread_cond_wait () from /usr/lib/lwp/libthread.so.1
>
>Thread 5 (LWP 8):
>#0  0xfed64d7c in __lwp_park () from /usr/lib/lwp/libthread.so.1
>#1  0xfed61f74 in cond_wait_queue () from /usr/lib/lwp/libthread.so.1
>#2  0xfed626e8 in cond_wait () from /usr/lib/lwp/libthread.so.1
>#3  0xfed62724 in pthread_cond_wait () from /usr/lib/lwp/libthread.so.1
>
>Thread 4 (LWP 9):
>#0  0xfed64d7c in __lwp_park () from /usr/lib/lwp/libthread.so.1
>#1  0xfed61f74 in cond_wait_queue () from /usr/lib/lwp/libthread.so.1
>#2  0xfed626e8 in cond_wait () from /usr/lib/lwp/libthread.so.1
>#3  0xfed62724 in pthread_cond_wait () from /usr/lib/lwp/libthread.so.1
>
>Thread 3 (LWP 10):
>#0  0xfed64d7c in __lwp_park () from /usr/lib/lwp/libthread.so.1
>#1  0xfed61f74 in cond_wait_queue () from /usr/lib/lwp/libthread.so.1
>#2  0xfed626e8 in cond_wait () from /usr/lib/lwp/libthread.so.1
>#3  0xfed62724 in pthread_cond_wait () from /usr/lib/lwp/libthread.so.1
>
>Thread 2 (LWP 11):
>#0  0xfed64d7c in __lwp_park () from /usr/lib/lwp/libthread.so.1
>#1  0xfed61f74 in cond_wait_queue () from /usr/lib/lwp/libthread.so.1
>#2  0xfed626e8 in cond_wait () from /usr/lib/lwp/libthread.so.1
>#3  0xfed62724 in pthread_cond_wait () from /usr/lib/lwp/libthread.so.1
>
>Thread 1 (LWP 1):
>#0  0xfee1f368 in _lwp_wait () from /usr/lib/libc.so.1
>#1  0xfed5ca88 in lwp_wait () from /usr/lib/lwp/libthread.so.1
>#2  0xfed58370 in _thrp_join () from /usr/lib/lwp/libthread.so.1
>#3  0x000245b0 in slapd_daemon () at daemon.c:2045
>(gdb) thr 1
>[Switching to thread 1 (LWP 1)]#0  0xfee1f368 in _lwp_wait () from
>/usr/lib/libc.so.1
>(gdb) l
>2045                    ldap_pvt_thread_join( listener_tid, (void *) NULL );
>2046            }
>2047    #else
>2048            /* experimental code */
>2049            slapd_daemon_task( NULL );
>2050    #endif
>2051
>2052            return 0;
>2053
>2054    }
>  
>

This trace simply shows that the main thread is waiting for the 
listener thread to return, which never happens because thread 10 is 
blocked trying to lock a mutex; however, the trace of thread 10 is too 
short to tell where the lock is taken.  Can you print more of it?  
Another comment: this morning I had issues with HEAD because of 
Hallvard's commits; the problem turned out to be that (dunno why) the 
rebuild was only partial (make's fault?) after he changed the Connection 
struct in slap.h; everything went smoothly after a "make clean && make".  
Just to clear up any doubt: did you rebuild from scratch, or did you 
keep running cvs update during the last few days' storm on RE23?

p.


    SysNet - via Dossi,8 27100 Pavia Tel: +390382573859 Fax: +390382476497