
thread problems w/ slurpd



Hi, 

I'm running openldap-2.1.22 on a RH9 system and replicating to a
few other servers with slurpd.  From time to time (more and more
often these days) it gets stuck sending to one of them.  A look
with gdb shows:

Thread 8 (Thread 1081076528 (LWP 15133)):
#0  0xffffe002 in ?? ()
#1  0x401fa484 in start_thread () from /lib/tls/libpthread.so.0

Thread 7 (Thread 1085279024 (LWP 15134)):
#0  0xffffe002 in ?? ()
#1  0x0806d430 in ldap_pvt_sasl_mutex_dispose ()
#2  0x0806ce8c in ldap_pvt_sasl_mutex_dispose ()
#3  0x0805dc87 in ldap_pvt_sasl_mutex_dispose ()
#4  0x0805dcf8 in ldap_pvt_sasl_mutex_dispose ()
#5  0x0804fe93 in do_nothing ()
#6  0x0804f2bf in do_nothing ()
#7  0x08054db5 in do_nothing ()
#8  0x080542c2 in do_nothing ()
#9  0x401fa484 in start_thread () from /lib/tls/libpthread.so.0

Thread 6 (Thread 1089477424 (LWP 15135)):
#0  0xffffe002 in ?? ()
#1  0x08055069 in do_nothing ()
#2  0x080542c2 in do_nothing ()
#3  0x401fa484 in start_thread () from /lib/tls/libpthread.so.0

Thread 5 (Thread 1093675824 (LWP 15136)):
#0  0xffffe002 in ?? ()
#1  0x08055069 in do_nothing ()
#2  0x080542c2 in do_nothing ()
#3  0x401fa484 in start_thread () from /lib/tls/libpthread.so.0

Thread 4 (Thread 1097874224 (LWP 15137)):
#0  0xffffe002 in ?? ()
#1  0x08055069 in do_nothing ()
#2  0x080542c2 in do_nothing ()
#3  0x401fa484 in start_thread () from /lib/tls/libpthread.so.0

Thread 3 (Thread 1102072624 (LWP 15138)):
#0  0xffffe002 in ?? ()
#1  0x08055069 in do_nothing ()
#2  0x080542c2 in do_nothing ()
#3  0x401fa484 in start_thread () from /lib/tls/libpthread.so.0

Thread 2 (Thread 1106271024 (LWP 15139)):
#0  0xffffe002 in ?? ()
#1  0x08055069 in do_nothing ()
#2  0x080542c2 in do_nothing ()
#3  0x401fa484 in start_thread () from /lib/tls/libpthread.so.0

Thread 1 (Thread 1075992864 (LWP 15132)):
#0  0xffffe002 in ?? ()
#1  0x08059b2f in lutil_sasl_interact ()
#2  0x080522c8 in do_nothing ()
#3  0x42015704 in __libc_start_main () from /lib/tls/libc.so.6



If I send the process a -15 (SIGTERM), it won't die.  If I look again
with gdb, all but threads 1 and 7 are gone (the thread numbers vary).

Looking at the code, do_nothing() is a no-op handler (we have
sigaction...) for SIGUSR1, which the master sends to break the
worker out of its sleep().
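The wake-up mechanism described above can be sketched in C roughly as follows. This is a minimal illustration of the general pattern under POSIX threads, not slurpd's actual code: wake_handler, install_wake_handler, worker and wake_worker_demo are hypothetical names. A no-op handler is installed with sigaction() so that SIGUSR1 interrupts the worker's sleep() instead of terminating the process, and the master aims the signal at that one thread with pthread_kill().

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical no-op handler: its only job is to make SIGUSR1
 * interrupt a blocking call instead of terminating the process. */
static void wake_handler(int sig)
{
    (void)sig;
}

/* Install wake_handler for SIGUSR1. sa_mask is left empty, so no
 * extra signals are blocked while the handler runs. */
static int install_wake_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = wake_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Hypothetical worker: sleep up to 60 s and record the seconds left.
 * sleep() returns a nonzero value if a handled signal cut it short. */
static void *worker(void *arg)
{
    unsigned *left = arg;

    *left = sleep(60);
    return NULL;
}

/* Master side: start a worker, then wake it early with pthread_kill().
 * Returns the worker's unslept seconds, or 0 on setup failure. */
static unsigned wake_worker_demo(void)
{
    pthread_t tid;
    unsigned left = 0;

    if (install_wake_handler() != 0)
        return 0;
    if (pthread_create(&tid, NULL, worker, &left) != 0)
        return 0;

    sleep(1);                   /* crude: let the worker reach sleep() */
    pthread_kill(tid, SIGUSR1); /* directed at this one thread only */
    pthread_join(tid, NULL);
    return left;
}
```

On Linux, sleep() is not restarted after a handler runs, even with SA_RESTART; it returns the number of seconds left unslept, which is how the worker can tell it was woken early. The one-second pause before pthread_kill() is only a crude way to let the worker reach sleep(); a signal that arrives earlier is handled and lost, and the worker then sleeps the full 60 seconds.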

So why are there more than one of them running?  And why don't
you block signals in the handlers?
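For reference, the two usual ways to keep signals out of a handler, or out of the wrong thread, look roughly like this. Again this is a hedged sketch, not OpenLDAP code; usr1_count, count_usr1, install_masked_handler and set_thread_mask are hypothetical names. Filling sa_mask blocks every other blockable signal while the handler runs (the handled signal itself is already blocked during its handler unless SA_NODEFER is set), and pthread_sigmask() blocks or unblocks a signal in the calling thread only.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>

/* Hypothetical handler that just counts deliveries. */
static volatile sig_atomic_t usr1_count = 0;

static void count_usr1(int sig)
{
    (void)sig;
    usr1_count = usr1_count + 1;
}

/* Install fn for signo with every blockable signal masked while fn runs.
 * (signo itself is blocked during the handler anyway unless SA_NODEFER.) */
static int install_masked_handler(int signo, void (*fn)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fn;
    sigfillset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}

/* Block (how = SIG_BLOCK) or unblock (how = SIG_UNBLOCK) signo in the
 * calling thread only; threads created afterwards inherit the mask.
 * Returns 0 on success, an error number otherwise. */
static int set_thread_mask(int how, int signo)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, signo);
    return pthread_sigmask(how, &set, NULL);
}
```

A common arrangement in threaded daemons is to block signals in the main thread before creating workers and unblock them only in the thread meant to handle them. Since pthread_kill() already targets one specific thread, the per-thread mask matters mainly for process-directed signals such as those sent with kill.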

-Seth