
Re: RE24 testing call #1 (OL 2.4.24)



masarati@aero.polimi.it wrote:

Both, as well as when running the head test suite with the 2.4.23
release.  Looks as if the swamp additions have tripped over an
existing problem, not anything new.  Leave it out of RE24 until it
has been resolved?

Btw, any other Solaris test runs out there?  I'd like to know if it is
a real Solaris problem or just me.

I'm seeing a similar failure on 32 bit Sparc Solaris 10. But it actually
locks up in test036 for me, I never get as far as test039. The gdb trace
looks much the same as what you posted.

Looks like for some reason threads that are blocked waiting for their
sockets to become writable are never getting woken up. A regular SIGINT
shuts down slapd cleanly so it doesn't appear to be a problem with the
condvars being used to manage the threads. That kinda points to select()
simply not returning the writable status.
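
For anyone who wants to check the select() theory outside slapd, here is a
minimal sketch. It is not OpenLDAP code; wait_writable() and
fill_send_buffer() are hypothetical helpers invented for this example. The
idea is to fill a socket's send buffer, confirm select() reports it as not
writable, and then see whether the writable status comes back once the peer
drains the data.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper (not part of OpenLDAP): wait until fd is writable.
 * Returns 1 if writable, 0 on timeout, -1 on error. */
static int wait_writable(int fd, int timeout_ms)
{
    assert(fd >= 0 && fd < FD_SETSIZE);  /* select() cannot watch larger fds */
    for (;;) {
        fd_set wfds;
        struct timeval tv;
        int rc;

        /* Rebuild the set and timeout each pass: select() may modify both. */
        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        rc = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (rc < 0 && errno == EINTR)
            continue;                    /* interrupted by a signal: retry */
        if (rc < 0)
            return -1;
        return (rc > 0 && FD_ISSET(fd, &wfds)) ? 1 : 0;
    }
}

/* Hypothetical helper: switch fd to non-blocking mode and write until the
 * kernel refuses more data. Returns 0 once the buffer is full, -1 on error. */
static int fill_send_buffer(int fd)
{
    char buf[4096] = {0};
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;
    for (;;) {
        ssize_t n = write(fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
        }
    }
}
```

Run against one end of a socketpair(), wait_writable() should return 1 on a
fresh socket and 0 after fill_send_buffer(). If it keeps returning 0 after
the other end has read everything, that would support the theory above.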

I haven't used this Solaris machine much, but in fact (looking at the
remnants of other files in my source tree on this box) this appears to
have been a problem since at least last August. (I.e., it looks like I
was investigating this same problem back then but dropped it and never
got back to it.)

Not sure whether it is related, but I'm currently running test036 with
-DLDAP_THREAD_DEBUG (for unrelated purposes) and I see some mutex-related
failures, of the type

conn=1031 op=1 SRCH base="cn=Monitor" scope=2 deref=0 filter="(objectClass=*)"
../../../ldap-2.4-src/libraries/libldap_r/thr_debug.c:1029: ldap_pvt_thread_mutex_unlock error: !THREAD_MUTEX_OWNER( mutex )
../../../ldap-2.4-src/libraries/libldap_r/thr_debug.c:1033: ldap_pvt_thread_mutex_unlock error: rc is 1

I see a lot of them; they always appear within operations affecting
back-monitor, which seems to be consistent with Rein's backtrace.

uname -a
Linux fl1 2.6.34.7-0.5-desktop #1 SMP PREEMPT 2010-10-25 08:40:12 +0200
x86_64 x86_64 x86_64 GNU/Linux

Running with valgrind/helgrind, I get a hang on Linux too. Unfortunately I can't get a backtrace from the valgrind'd slapd. The helgrind run does show a fair number of data races in back-meta.

There are also some lock ordering issues, but we already know about most of them, and the code avoids deadlock by using trylock() where needed. A couple of paths don't do that, though, and are therefore deadlock hazards. (Request and abandon in libldap seem to be the prime offenders.)
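
For readers not familiar with the trylock() approach, here is a rough
sketch of one common form of it, written against plain pthreads. It is not
taken from the OpenLDAP source; lock_both() and its max_attempts parameter
are hypothetical. The point is that a thread which has to take two locks
out of the usual order never blocks on the second one while holding the
first. If the second lock is busy it releases the first and retries.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: acquire two mutexes without blocking on the second
 * while holding the first. Returns 0 with both mutexes held, or an error
 * code (EBUSY after max_attempts failed tries) with neither held. */
static int lock_both(pthread_mutex_t *first, pthread_mutex_t *second,
                     int max_attempts)
{
    int i;

    assert(first != second);
    for (i = 0; i < max_attempts; i++) {
        int rc = pthread_mutex_lock(first);
        if (rc != 0)
            return rc;

        rc = pthread_mutex_trylock(second);
        if (rc == 0)
            return 0;                 /* both locks are now held */

        /* Back off: drop the first lock so its other users can proceed. */
        pthread_mutex_unlock(first);
        if (rc != EBUSY)
            return rc;                /* a real error, not contention */
        sched_yield();                /* give the other thread a chance */
    }
    return EBUSY;
}
```

A caller that gets 0 back holds both mutexes and unlocks them as usual. A
path that calls a blocking lock on the second mutex instead of trylock is
the kind of deadlock hazard described above.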

I've uploaded my testrun directory to
 http://highlandsun.com/hyc/20110111-testr.tgz

for reference. (Looks like ftp.openldap.org is full again.)

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/