Re: (ITS#3933) slapd fails to shut down
--On Sunday, August 14, 2005 11:50 AM +0200 Pierangelo Masarati
<ando@sys-net.it> wrote:
> quanah@stanford.edu wrote:
>
>> Full_Name: Quanah Gibson-Mount
>> Version: REL_ENG_23
>> OS: Solaris 8
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (171.66.182.82)
>>
>>
>> Yesterday I set up 4 systems, 1 a master, 3 replicas. I have a nightly
>> job that stops the server and does various things and then restarts the
>> server. The job waits until the server stops to do anything. If the
>> server hasn't exited within a certain amount of time, it throws an error
>> and stops anything else from happening. Last night, the job went to
>> stop the master, sending it a signal around 4:00 a.m.:
>>
>> Aug 13 04:09:15 ldap-dev0.Stanford.EDU slapd[606]: [ID 543694
>> local4.debug] daemon: shutdown requested and initiated.
>>
>> This process has never exited. A backtrace shows:
>>
>> (gdb) thr apply all bt
>>
>> Thread 10 (LWP 3):
>> #0  0xfed64e48 in lwp_mutex_lock () from /usr/lib/lwp/libthread.so.1
>> #1  0xfed5ff90 in mutex_lock_kernel () from /usr/lib/lwp/libthread.so.1
>> #2  0xfed609b0 in stall () from /usr/lib/lwp/libthread.so.1
>> #3  0xfed60f7c in mutex_lock_internal () from /usr/lib/lwp/libthread.so.1
>>
>> Thread 9 (LWP 4):
>> #0  0xfed64d7c in __lwp_park () from /usr/lib/lwp/libthread.so.1
>> #1  0xfed61f74 in cond_wait_queue () from /usr/lib/lwp/libthread.so.1
>> #2  0xfed626e8 in cond_wait () from /usr/lib/lwp/libthread.so.1
>> #3  0xfed62724 in pthread_cond_wait () from /usr/lib/lwp/libthread.so.1
>>
>> Thread 8 (LWP 5):
>> #0  0xfed64d7c in __lwp_park () from /usr/lib/lwp/libthread.so.1
>> #1  0xfed61f74 in cond_wait_queue () from /usr/lib/lwp/libthread.so.1
>> #2  0xfed626e8 in cond_wait () from /usr/lib/lwp/libthread.so.1
>> #3  0xfed62724 in pthread_cond_wait () from /usr/lib/lwp/libthread.so.1
>>
>> Thread 7 (LWP 6):
>> #0  0xfed64d7c in __lwp_park () from /usr/lib/lwp/libthread.so.1
>> #1  0xfed61f74 in cond_wait_queue () from /usr/lib/lwp/libthread.so.1
>> #2  0xfed626e8 in cond_wait () from /usr/lib/lwp/libthread.so.1
>> #3  0xfed62724 in pthread_cond_wait () from /usr/lib/lwp/libthread.so.1
>>
>> Thread 6 (LWP 7):
>> #0  0xfed64d7c in __lwp_park () from /usr/lib/lwp/libthread.so.1
>> #1  0xfed61f74 in cond_wait_queue () from /usr/lib/lwp/libthread.so.1
>> #2  0xfed626e8 in cond_wait () from /usr/lib/lwp/libthread.so.1
>> #3  0xfed62724 in pthread_cond_wait () from /usr/lib/lwp/libthread.so.1
>>
>> Thread 5 (LWP 8):
>> #0  0xfed64d7c in __lwp_park () from /usr/lib/lwp/libthread.so.1
>> #1  0xfed61f74 in cond_wait_queue () from /usr/lib/lwp/libthread.so.1
>> #2  0xfed626e8 in cond_wait () from /usr/lib/lwp/libthread.so.1
>> #3  0xfed62724 in pthread_cond_wait () from /usr/lib/lwp/libthread.so.1
>>
>> Thread 4 (LWP 9):
>> #0  0xfed64d7c in __lwp_park () from /usr/lib/lwp/libthread.so.1
>> #1  0xfed61f74 in cond_wait_queue () from /usr/lib/lwp/libthread.so.1
>> #2  0xfed626e8 in cond_wait () from /usr/lib/lwp/libthread.so.1
>> #3  0xfed62724 in pthread_cond_wait () from /usr/lib/lwp/libthread.so.1
>>
>> Thread 3 (LWP 10):
>> #0  0xfed64d7c in __lwp_park () from /usr/lib/lwp/libthread.so.1
>> #1  0xfed61f74 in cond_wait_queue () from /usr/lib/lwp/libthread.so.1
>> #2  0xfed626e8 in cond_wait () from /usr/lib/lwp/libthread.so.1
>> #3  0xfed62724 in pthread_cond_wait () from /usr/lib/lwp/libthread.so.1
>>
>> Thread 2 (LWP 11):
>> #0  0xfed64d7c in __lwp_park () from /usr/lib/lwp/libthread.so.1
>> #1  0xfed61f74 in cond_wait_queue () from /usr/lib/lwp/libthread.so.1
>> #2  0xfed626e8 in cond_wait () from /usr/lib/lwp/libthread.so.1
>> #3  0xfed62724 in pthread_cond_wait () from /usr/lib/lwp/libthread.so.1
>>
>> Thread 1 (LWP 1):
>> #0  0xfee1f368 in _lwp_wait () from /usr/lib/libc.so.1
>> #1  0xfed5ca88 in lwp_wait () from /usr/lib/lwp/libthread.so.1
>> #2  0xfed58370 in _thrp_join () from /usr/lib/lwp/libthread.so.1
>> #3  0x000245b0 in slapd_daemon () at daemon.c:2045
>> (gdb) thr 1
>> [Switching to thread 1 (LWP 1)]#0 0xfee1f368 in _lwp_wait () from /usr/lib/libc.so.1
>> (gdb) l
>> 2045 ldap_pvt_thread_join( listener_tid, (void *) NULL );
>> 2046 }
>> 2047 #else
>> 2048 /* experimental code */
>> 2049 slapd_daemon_task( NULL );
>> 2050 #endif
>> 2051
>> 2052 return 0;
>> 2053
>> 2054 }
>>
>>
>
> This listing simply indicates that the main thread is waiting for the
> listener thread to return, which doesn't occur because thread 10 is
> trying to lock a mutex; however, the trace of thread 10 is too short to
> tell where the lock is taken. Can you print more of it? Another
> comment: this morning I had issues with HEAD because of Hallvard's
> commits; the problem turned out to be that (dunno why) the rebuild was
> only partial (make's fault?) after he changed the Connection struct in
> slap.h; everything went smoothly after a "make clean && make". Just to
> clear up any doubt: did you rebuild from scratch, or did you keep cvs
> update'ing during the last few days' storm of commits on RE23?
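
For anyone following the thread, the hang described in the quoted analysis can be reproduced in miniature. This is only an illustrative sketch in Python, not slapd code: `state_lock`, `listener`, and `shutdown_hangs` are made-up names, and the lock stands in for whatever mutex thread 10 was waiting on, which the backtrace does not identify. It shows that joining a thread which is itself blocked on a lock never completes unless the join has a timeout.

```python
import threading

# Hypothetical stand-in for the mutex that thread 10 is blocked on.
state_lock = threading.Lock()


def listener():
    # Blocks here for as long as another thread holds state_lock.
    with state_lock:
        pass


def shutdown_hangs(join_timeout=0.2):
    """Return True if the worker is still blocked when the join times out."""
    state_lock.acquire()  # simulate a lock that is never released
    worker = threading.Thread(target=listener, daemon=True)
    worker.start()
    # A join without a timeout would block here indefinitely, which is the
    # state the main thread (thread 1) is in at daemon.c:2045.
    worker.join(join_timeout)
    hung = worker.is_alive()
    state_lock.release()  # release so the demo itself can finish
    worker.join()
    return hung


if __name__ == "__main__":
    print("join would have hung:", shutdown_hangs())
```

The timed join is used here only so the demonstration terminates; it says nothing about how the shutdown path in slapd should be changed.
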
Unfortunately, I can't print more, because the process is gone. Howard did
get a chance to look at it yesterday, though, and didn't have any luck
getting more out of it. I always rebuild my servers from scratch
(completely delete the source tree, and then create a new one). I.e., I
CVS update one location, tar that up as a source package, move that source
package to a new location, and build clean.

Hopefully this was just a transient issue in the storm of updates, as this
particular server's build was about one day behind the point where 2.3.5
was pulled together. I'm setting up the systems all over again with 2.3.5,
so if it happens again tonight, the problem is still around.
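
In case it is useful for catching this next time, here is a rough sketch of the stop-and-wait step of a nightly job like the one described in the report. It is not the actual job: the function names, the default SIGTERM, and the timeout values are all assumptions. The `gdb -p PID -batch -ex "thread apply all bt"` command line is standard gdb usage and could be run on timeout, before anything else touches the process, so that a backtrace is saved while the process still exists.

```python
import errno
import os
import signal
import time


def pid_alive(pid):
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except OSError as exc:
        if exc.errno == errno.ESRCH:  # no such process
            return False
        if exc.errno == errno.EPERM:  # exists, but owned by another user
            return True
        raise
    return True


def wait_for_exit(pid, timeout, poll_interval=1.0, alive=pid_alive):
    """Poll until the process is gone; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


def backtrace_command(pid):
    """Build (but do not run) a gdb command that dumps all thread stacks."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def stop_and_wait(pid, timeout=300, sig=signal.SIGTERM):
    """Signal the server, then refuse to continue if it has not exited."""
    os.kill(pid, sig)  # assumed shutdown signal; adjust to the local setup
    if not wait_for_exit(pid, timeout):
        raise RuntimeError(
            "pid %d still running after %ds; capture a trace with: %s"
            % (pid, timeout, " ".join(backtrace_command(pid)))
        )
```

Note that probing with signal 0 also reports a zombie child as alive, so this check is meant for a job that is not the parent of the server process.
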
--Quanah
--
Quanah Gibson-Mount
Principal Software Developer
ITSS/Shared Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html
"These censorship operations against schools and libraries are stronger
than ever in the present religio-political climate. They often focus on
fantasy and sf books, which foster that deadly enemy to bigotry and blind
faith, the imagination." -- Ursula K. Le Guin