
ITS#4247



Recently, "hangs" (i.e. the proxy and the remote servers sit idle) reappeared
on my laptop during test036/039.  First of all, the hang occurs in roughly
50% of runs; this drops to about one every 10-15 runs when logging is
reduced to 0, or to stats only.  After attaching gdb to slapd many times,
only to find it completely idle in the daemon's select(), I decided to
attach to one of the tester clients that were waiting for results.  All of
them appear to be blocked in poll(), called with a NULL timeout, i.e.
waiting indefinitely.
BTW, my system is CentOS 4.2, a clone of RHEL4, with kernel 2.6.9.
portable.h defines HAVE_POLL but not HAVE_EPOLL.  After undefining
HAVE_POLL, so that select() is used instead, the hangs became much less
frequent: the tests ran 20 times with full logging, whereas with poll()
each run would have had a 50% chance of hanging right away.
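For readers less familiar with the two calls: select() takes a struct
timeval pointer, where NULL means "block indefinitely", while poll() takes
a timeout in milliseconds, where a negative value means the same thing.  A
client library supporting both usually converts one form into the other.
The following is only a minimal sketch of such a conversion, assuming a
select()-style interface; timeval_to_poll_ms() and wait_readable() are
hypothetical names, not taken from the OpenLDAP sources.

```c
#include <poll.h>
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical helper: convert a select()-style timeout pointer into a
 * poll() timeout in milliseconds.  NULL means "wait forever", which
 * poll() expresses with a negative value (-1).  Overflow is ignored. */
static int timeval_to_poll_ms(const struct timeval *tv)
{
    if (tv == NULL)
        return -1;
    return (int)(tv->tv_sec * 1000 + tv->tv_usec / 1000);
}

/* Hypothetical wrapper: wait until fd is readable, using select()-style
 * timeout semantics on top of poll().
 * Returns >0 if an event was reported, 0 on timeout, -1 on error. */
static int wait_readable(int fd, const struct timeval *tv)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    return poll(&pfd, 1, timeval_to_poll_ms(tv));
}
```

With tv == NULL, wait_readable() returns only when poll() reports an event
on the descriptor or fails, which matches the state the tester clients
were found in.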

Note that hangs between the proxy and the remote servers should no longer
be possible, because synchronous operations are no longer used, and the
admin can set timeouts that would eventually resolve such a situation
anyway.  Moreover, in every case I saw, all of the proxy's threads were
idle, not waiting or looping on select/poll (that would be ITS#4246, which
I haven't seen since).  I'm really at a loss to identify the reason for
this behavior, since the logs do not show anything wrong before the hang;
yet many clients (~15 at a time) appear to be waiting for something that
never happens.

p.




Ing. Pierangelo Masarati
Responsabile Open Solution
OpenLDAP Core Team

SysNet s.n.c.
Via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
------------------------------------------
Office:   +39.02.23998309          
Mobile:   +39.333.4963172
Email:    pierangelo.masarati@sys-net.it
------------------------------------------