Logged in as guest
Viewing Software Bugs/6090 Full headers
Major security issue: yes no
Notes: Fixed in HEAD fixed in RE24 Notification:
Date: Fri, 01 May 2009 21:53:23 +0000 From: jwm@horde.net To: openldap-its@OpenLDAP.org Subject: slapd locks up; all slapd worker threads blocking on mutex acquisition in bdb_cache_lru_link()
Full_Name: John Morrissey Version: 2.4.16 OS: Linux URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (2001:4978:194:0:21f:5bff:fee9:da92) After a couple days of uptime, slapd no longer responds to incoming connections (the connection would be accepted, but all LDAP operations would block). All worker threads seem to be blocking on mutex acquisition in bdb_cache_lru_link(). One thread was chewing lots of CPU. Backtrace is below. I also have a ~1.7GB core if it's deemed useful; I'll keep it around for a week or two. This is with BDB 4.7.25+all three patches. GNU gdb 6.8-debian Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i486-linux-gnu"... Attaching to program: /usr/sbin/slapd, process 6344 Reading symbols from /usr/lib/libtcmalloc.so.0...done. Loaded symbols for /usr/lib/libtcmalloc.so.0 Reading symbols from /usr/lib/libprofiler.so.0...done. Loaded symbols for /usr/lib/libprofiler.so.0 Reading symbols from /usr/lib/libldap_r-2.4.so.2...Reading symbols from /usr/lib/debug/usr/lib/libldap_r-2.4.so.2.4.2...done. done. Loaded symbols for /usr/lib/libldap_r-2.4.so.2 Reading symbols from /usr/lib/liblber-2.4.so.2...Reading symbols from /usr/lib/debug/usr/lib/liblber-2.4.so.2.4.2...done. done. Loaded symbols for /usr/lib/liblber-2.4.so.2 Reading symbols from /usr/lib/libdb-4.7.so...done. Loaded symbols for /usr/lib/libdb-4.7.so Reading symbols from /usr/lib/libodbc.so.1...done. Loaded symbols for /usr/lib/libodbc.so.1 Reading symbols from /usr/lib/libslp.so.1...done. Loaded symbols for /usr/lib/libslp.so.1 Reading symbols from /usr/lib/libsasl2.so.2...done. Loaded symbols for /usr/lib/libsasl2.so.2 Reading symbols from /usr/lib/libgnutls.so.26...Reading symbols from /usr/lib/debug/usr/lib/libgnutls.so.26.4.6...done. done. Loaded symbols for /usr/lib/libgnutls.so.26 Reading symbols from /usr/lib/debug/libcrypt.so.1...done. Loaded symbols for /usr/lib/debug/libcrypt.so.1 Reading symbols from /usr/lib/debug/libresolv.so.2...done. Loaded symbols for /usr/lib/debug/libresolv.so.2 Reading symbols from /usr/lib/libltdl.so.3...done. Loaded symbols for /usr/lib/libltdl.so.3 Reading symbols from /usr/lib/debug/libpthread.so.0...done. [Thread debugging using libthread_db enabled] [New Thread 0xb78e18e0 (LWP 6344)] Loaded symbols for /usr/lib/debug/libpthread.so.0 Reading symbols from /usr/lib/debug/libc.so.6...done. Loaded symbols for /usr/lib/debug/libc.so.6 Reading symbols from /usr/lib/libstdc++.so.6...done. Loaded symbols for /usr/lib/libstdc++.so.6 Reading symbols from /usr/lib/debug/libm.so.6...done. Loaded symbols for /usr/lib/debug/libm.so.6 Reading symbols from /lib/libgcc_s.so.1...done. Loaded symbols for /lib/libgcc_s.so.1 Reading symbols from /usr/lib/debug/libnsl.so.1...done. Loaded symbols for /usr/lib/debug/libnsl.so.1 Reading symbols from /usr/lib/debug/libdl.so.2...done. Loaded symbols for /usr/lib/debug/libdl.so.2 Reading symbols from /usr/lib/libtasn1.so.3...done. Loaded symbols for /usr/lib/libtasn1.so.3 Reading symbols from /usr/lib/libgpg-error.so.0...done. Loaded symbols for /usr/lib/libgpg-error.so.0 Reading symbols from /usr/lib/libz.so.1...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /usr/lib/libgcrypt.so.11...done. Loaded symbols for /usr/lib/libgcrypt.so.11 Reading symbols from /lib/ld-linux.so.2...Reading symbols from /usr/lib/debug/lib/ld-2.7.so...done. done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /usr/lib/debug/libnss_files.so.2...done. Loaded symbols for /usr/lib/debug/libnss_files.so.2 Reading symbols from /usr/lib/debug/libnss_compat.so.2...done. Loaded symbols for /usr/lib/debug/libnss_compat.so.2 Reading symbols from /usr/lib/debug/libnss_nis.so.2...done. Loaded symbols for /usr/lib/debug/libnss_nis.so.2 Reading symbols from /lib/libnss_ldap.so.2...done. Loaded symbols for /lib/libnss_ldap.so.2 Reading symbols from /usr/lib/libkrb5.so.3...done. Loaded symbols for /usr/lib/libkrb5.so.3 Reading symbols from /lib/libcom_err.so.2...done. Loaded symbols for /lib/libcom_err.so.2 Reading symbols from /usr/lib/libgssapi_krb5.so.2...done. Loaded symbols for /usr/lib/libgssapi_krb5.so.2 Reading symbols from /usr/lib/libk5crypto.so.3...done. Loaded symbols for /usr/lib/libk5crypto.so.3 Reading symbols from /usr/lib/libkrb5support.so.0...done. Loaded symbols for /usr/lib/libkrb5support.so.0 Reading symbols from /lib/libkeyutils.so.1...done. Loaded symbols for /lib/libkeyutils.so.1 Reading symbols from /usr/lib/sasl2/libsasldb.so.2...done. Loaded symbols for /usr/lib/sasl2/libsasldb.so.2 Reading symbols from /usr/lib/libdb-4.6.so...done. Loaded symbols for /usr/lib/libdb-4
Date: Fri, 01 May 2009 15:00:26 -0700 From: Howard Chu <hyc@symas.com> To: jwm@horde.net CC: openldap-its@openldap.org Subject: Re: (ITS#6090) slapd locks up; all slapd worker threads blocking on mutex acquisition in bdb_cache_lru_link()
jwm@horde.net wrote: > Full_Name: John Morrissey > Version: 2.4.16 > OS: Linux > URL: ftp://ftp.openldap.org/incoming/ > Submission from: (NULL) (2001:4978:194:0:21f:5bff:fee9:da92) > > > After a couple days of uptime, slapd no longer responds to incoming connections > (the connection would be accepted, but all LDAP operations would block). All > worker threads seem to be blocking on mutex acquisition in bdb_cache_lru_link(). > One thread was chewing lots of CPU. > > Backtrace is below. I also have a ~1.7GB core if it's deemed useful; I'll keep > it around for a week or two. This is with BDB 4.7.25+all three patches. > Interesting trace, it looks like all the active threads are waiting for the mutex but apparently none of them owns it. Can you please provide the contents of the mutex? e.g. thread 14 frame 3 print *mutex -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Date: Fri, 01 May 2009 15:10:46 -0700 From: Howard Chu <hyc@symas.com> To: openldap-its@openldap.org Subject: Re: (ITS#6090) slapd locks up; all slapd worker threads blocking on mutex acquisition in bdb_cache_lru_link()
hyc@symas.com wrote: > jwm@horde.net wrote: >> Full_Name: John Morrissey >> Version: 2.4.16 >> OS: Linux >> URL: ftp://ftp.openldap.org/incoming/ >> Submission from: (NULL) (2001:4978:194:0:21f:5bff:fee9:da92) >> >> >> After a couple days of uptime, slapd no longer responds to incoming connections >> (the connection would be accepted, but all LDAP operations would block). All >> worker threads seem to be blocking on mutex acquisition in bdb_cache_lru_link(). >> One thread was chewing lots of CPU. >> >> Backtrace is below. I also have a ~1.7GB core if it's deemed useful; I'll keep >> it around for a week or two. This is with BDB 4.7.25+all three patches. >> > Interesting trace, it looks like all the active threads are waiting for the > mutex but apparently none of them owns it. Can you please provide the contents > of the mutex? e.g. > thread 14 > frame 3 > print *mutex Ah, I missed this before, your thread 3 is inside a BerkeleyDB lock function. There's nothing useful in the trace for thread 3 though. It seems you may need to recompile BerkeleyDB with debugging enabled (and with -fno-omit-frame-pointer) to get a useful trace from this. This is looking more like a BDB locking issue than an OpenLDAP issue. If you still have the environment, db_stat -CA would be helpful. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Date: Fri, 1 May 2009 18:11:30 -0400 From: John Morrissey <jwm@horde.net> To: Howard Chu <hyc@symas.com> Cc: openldap-its@openldap.org Subject: Re: (ITS#6090) slapd locks up; all slapd worker threads blocking on mutex acquisition in bdb_cache_lru_link()
On Fri, May 01, 2009 at 03:00:26PM -0700, Howard Chu wrote: > jwm@horde.net wrote: >> Full_Name: John Morrissey >> Version: 2.4.16 >> OS: Linux >> URL: ftp://ftp.openldap.org/incoming/ >> Submission from: (NULL) (2001:4978:194:0:21f:5bff:fee9:da92) >> >> After a couple days of uptime, slapd no longer responds to incoming >> connections (the connection would be accepted, but all LDAP operations >> would block). All worker threads seem to be blocking on mutex acquisition >> in bdb_cache_lru_link(). One thread was chewing lots of CPU. >> >> Backtrace is below. I also have a ~1.7GB core if it's deemed useful; I'll >> keep it around for a week or two. This is with BDB 4.7.25+all three >> patches. > > Interesting trace, it looks like all the active threads are waiting for > the mutex but apparently none of them owns it. Can you please provide the > contents of the mutex? e.g. > thread 14 > frame 3 > print *mutex (gdb) fra 3 #3 0xb7eec1cd in ldap_pvt_thread_mutex_lock (mutex=0x940a2cc) at /tmp/buildd/openldap-2.4.16/libraries/libldap_r/thr_posix.c:296 296 return ERRVAL( pthread_mutex_lock( mutex ) ); (gdb) print *mutex $1 = {__data = {__lock = 2, __count = 0, __owner = 6372, __kind = 0, __nusers = 1, {__spins = 0, __list = {__next = 0x0}}}, __size = "\002\000\000\000\000\000\000\000###30\000\000\000\000\000\000\001\000\000\000\000\000\000", __align = 2} LWP 6372 is the thread trying to do BDB lock promotion. john -- John Morrissey _o /\ ---- __o jwm@horde.net _-< \_ / \ ---- < \, www.horde.net/ __(_)/_(_)________/ \_______(_) /_(_)__
Date: Mon, 4 May 2009 09:42:24 -0400 From: John Morrissey <jwm@horde.net> To: Howard Chu <hyc@symas.com> Cc: openldap-its@openldap.org Subject: Re: (ITS#6090) slapd locks up; all slapd worker threads blocking on mutex acquisition in bdb_cache_lru_link()
On Fri, 01 May 2009 15:10:46 -0700, Howard Chu wrote: > It seems you may need to recompile BerkeleyDB with debugging enabled (and > with -fno-omit-frame-pointer) to get a useful trace from this. This is > looking more like a BDB locking issue than an OpenLDAP issue. Done. Now we wait. > If you still have the environment, db_stat -CA would be helpful. slapd has been restarted since, so the lock statistics won't be useful. I've modified our harness to get them automatically with the backtrace. john -- John Morrissey _o /\ ---- __o jwm@horde.net _-< \_ / \ ---- < \, www.horde.net/ __(_)/_(_)________/ \_______(_) /_(_)__
Date: Thu, 7 May 2009 20:20:24 -0400 From: John Morrissey <jwm@horde.net> To: Howard Chu <hyc@symas.com> Cc: openldap-its@openldap.org Subject: Re: (ITS#6090) slapd locks up; all slapd worker threads blocking on mutex acquisition in bdb_cache_lru_link()
On Mon, May 04, 2009 at 09:42:24AM -0400, John Morrissey wrote: > On Fri, 01 May 2009 15:10:46 -0700, Howard Chu wrote: > > It seems you may need to recompile BerkeleyDB with debugging enabled (a= nd > > with -fno-omit-frame-pointer) to get a useful trace from this. This is > > looking more like a BDB locking issue than an OpenLDAP issue. >=20 > Done. Now we wait. >=20 > > If you still have the environment, db_stat -CA would be helpful. >=20 > slapd has been restarted since, so the lock statistics won't be useful. > I've modified our harness to get them automatically with the backtrace. Experienced this behavior again today. All worker threads are blocking on mutex acquisition when dealing with the BDB cache. There seems to be a mix of OpenLDAP mutexes and mutexes in Berkeley DB itself. db_stat lock output is below. Still have both the core and the DB environment from immediately after stopping slapd. What do you make of this, Howard? john Rotated `/var/lib/slapd-debug/log' at Mon May 4 13:35:02 UTC 2009. Rotated `/var/lib/slapd-debug/core' at Mon May 4 13:35:02 UTC 2009. GNU gdb 6.8-debian Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.htm= l> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i486-linux-gnu"... Attaching to program: /usr/sbin/slapd, process 5752 Reading symbols from /usr/lib/libtcmalloc.so.0...done. Loaded symbols for /usr/lib/libtcmalloc.so.0 Reading symbols from /usr/lib/libprofiler.so.0...done. Loaded symbols for /usr/lib/libprofiler.so.0 Reading symbols from /usr/lib/libldap_r-2.4.so.2...Reading symbols from /us= r/lib/debug/usr/lib/libldap_r-2.4.so.2.4.2...done. done. Loaded symbols for /usr/lib/libldap_r-2.4.so.2 Reading symbols from /usr/lib/liblber-2.4.so.2...Reading symbols from /usr/= lib/debug/usr/lib/liblber-2.4.so.2.4.2...done. done. Loaded symbols for /usr/lib/liblber-2.4.so.2 Reading symbols from /usr/lib/libdb-4.7.so...Reading symbols from /usr/lib/= debug/usr/lib/libdb-4.7.so.debug...done. done. Loaded symbols for /usr/lib/libdb-4.7.so Reading symbols from /usr/lib/libodbc.so.1...done. Loaded symbols for /usr/lib/libodbc.so.1 Reading symbols from /usr/lib/libslp.so.1...done. Loaded symbols for /usr/lib/libslp.so.1 Reading symbols from /usr/lib/libsasl2.so.2...done. Loaded symbols for /usr/lib/libsasl2.so.2 Reading symbols from /usr/lib/libgnutls.so.26...Reading symbols from /usr/l= ib/debug/usr/lib/libgnutls.so.26.4.6...done. done. Loaded symbols for /usr/lib/libgnutls.so.26 Reading symbols from /usr/lib/debug/libcrypt.so.1...done. Loaded symbols for /usr/lib/debug/libcrypt.so.1 Reading symbols from /usr/lib/debug/libresolv.so.2...done. Loaded symbols for /usr/lib/debug/libresolv.so.2 Reading symbols from /usr/lib/libltdl.so.3...done. Loaded symbols for /usr/lib/libltdl.so.3 Reading symbols from /usr/lib/debug/libpthread.so.0...done. [Thread debugging using libthread_db enabled] [New Thread 0xb78878e0 (LWP 5752)] [New Thread 0x60e67b90 (LWP 5758)] [New Thread 0x61267b90 (LWP 5757)] [New Thread 0x61667b90 (LWP 5756)] [New Thread 0x61a77b90 (LWP 5755)] [New Thread 0x61e77b90 (LWP 5754)] Loaded symbols for /usr/lib/debug/libpthread.so.0 Reading symbols from /usr/lib/debug/libc.so.6...done. Loaded symbols for /usr/lib/debug/libc.so.6 Reading symbols from /usr/lib/libstdc++.so.6...done. Loaded symbols for /usr/lib/libstdc++.so.6 Reading symbols from /usr/lib/debug/libm.so.6...done. Loaded symbols for /usr/lib/debug/libm.so.6 Reading symbols from /lib/libgcc_s.so.1...done. Loaded symbols for /lib/libgcc_s.so.1 Reading symbols from /usr/lib/debug/libnsl.so.1...done. Loaded symbols for /usr/lib/debug/libnsl.so.1 Reading symbols from /usr/lib/debug/libdl.so.2...done. Loaded symbols for /usr/lib/debug/libdl.so.2 Reading symbols from /usr/lib/libtasn1.so.3...done. Loaded symbols for /usr/lib/libtasn1.so.3 Reading symbols from /usr/lib/libgpg-error.so.0...done. Loaded symbols for /usr/lib/libgpg-error.so.0 Reading symbols from /usr/lib/libz.so.1...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /usr/lib/libgcrypt.so.11...done. Loaded symbols for /usr/lib/libgcrypt.so.11 Reading symbols from /lib/ld-linux.so.2...Reading symbols from /usr/lib/deb= ug/lib/ld-2.7.so...done. done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /usr/lib/debug/libnss_files.so.2...done. Loaded symbols for /usr/lib/debug/libnss_files.so.2 Reading symbols from /usr/lib/debug/libnss_compat.so.2...done. Loaded symbols for /usr/lib/debug/libnss_compat.so.2 Reading symbols from /usr/lib/debug/libnss_nis.so.2...done. Loaded symbols for /usr/lib/debug/libnss_nis.so.2 Reading symbols from /lib/libnss_ldap.so.2...done. Loaded symbols for /lib/libnss_ldap.so.2 Reading symbols from /usr/lib/libkrb5.so.3...done.
Date: Fri, 08 May 2009 02:40:49 -0700 From: Howard Chu <hyc@symas.com> To: John Morrissey <jwm@horde.net> CC: openldap-its@openldap.org Subject: Re: (ITS#6090) slapd locks up; all slapd worker threads blocking on mutex acquisition in bdb_cache_lru_link()
John Morrissey wrote: > On Mon, May 04, 2009 at 09:42:24AM -0400, John Morrissey wrote: >> On Fri, 01 May 2009 15:10:46 -0700, Howard Chu wrote: >>> It seems you may need to recompile BerkeleyDB with debugging enabled (and >>> with -fno-omit-frame-pointer) to get a useful trace from this. This is >>> looking more like a BDB locking issue than an OpenLDAP issue. >> >> Done. Now we wait. >> >>> If you still have the environment, db_stat -CA would be helpful. >> >> slapd has been restarted since, so the lock statistics won't be useful. >> I've modified our harness to get them automatically with the backtrace. > > Experienced this behavior again today. > > All worker threads are blocking on mutex acquisition when dealing with the > BDB cache. There seems to be a mix of OpenLDAP mutexes and mutexes in > Berkeley DB itself. > > db_stat lock output is below. Still have both the core and the DB > environment from immediately after stopping slapd. > > What do you make of this, Howard? Hard to say; if it's a back-bdb lock problem you should be able to tell by printing the contents of each mutex in each thread that's waiting in pthread_mutex_lock - it will show the thread ID of the current owner of each waiting mutex. I should note that I've recently fixed a deadlock issue in the cache code in HEAD; it would be worthwhile for you to sync up with the new code and see if this problem continues to occur. It's still a little puzzling to see some of these threads stuck in BDB mutex operations; even with the deadlock problem I was investigating, it would never lock inside of BDB, it would only lock on a back-bdb mutex. At any rate, I suggest you grab the newest code. I've also told Quanah to go ahead and sync it into RE24 so it may be there by the time you look into this again. > john > > Rotated `/var/lib/slapd-debug/log' at Mon May 4 13:35:02 UTC 2009. > Rotated `/var/lib/slapd-debug/core' at Mon May 4 13:35:02 UTC 2009. > GNU gdb 6.8-debian > Copyright (C) 2008 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later<http://gnu.org/licenses/gpl.html> > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. Type "show copying" > and "show warranty" for details. > This GDB was configured as "i486-linux-gnu"... > Attaching to program: /usr/sbin/slapd, process 5752 > Reading symbols from /usr/lib/libtcmalloc.so.0...done. > Loaded symbols for /usr/lib/libtcmalloc.so.0 > Reading symbols from /usr/lib/libprofiler.so.0...done. > Loaded symbols for /usr/lib/libprofiler.so.0 > Reading symbols from /usr/lib/libldap_r-2.4.so.2...Reading symbols from /usr/lib/debug/usr/lib/libldap_r-2.4.so.2.4.2...done. > done. > Loaded symbols for /usr/lib/libldap_r-2.4.so.2 > Reading symbols from /usr/lib/liblber-2.4.so.2...Reading symbols from /usr/lib/debug/usr/lib/liblber-2.4.so.2.4.2...done. > done. > Loaded symbols for /usr/lib/liblber-2.4.so.2 > Reading symbols from /usr/lib/libdb-4.7.so...Reading symbols from /usr/lib/debug/usr/lib/libdb-4.7.so.debug...done. > done. > Loaded symbols for /usr/lib/libdb-4.7.so > Reading symbols from /usr/lib/libodbc.so.1...done. > Loaded symbols for /usr/lib/libodbc.so.1 > Reading symbols from /usr/lib/libslp.so.1...done. > Loaded symbols for /usr/lib/libslp.so.1 > Reading symbols from /usr/lib/libsasl2.so.2...done. > Loaded symbols for /usr/lib/libsasl2.so.2 > Reading symbols from /usr/lib/libgnutls.so.26...Reading symbols from /usr/lib/debug/usr/lib/libgnutls.so.26.4.6...done. > done. > Loaded symbols for /usr/lib/libgnutls.so.26 > Reading symbols from /usr/lib/debug/libcrypt.so.1...done. > Loaded symbols for /usr/lib/debug/libcrypt.so.1 > Reading symbols from /usr/lib/debug/libresolv.so.2...done. > Loaded symbols for /usr/lib/debug/libresolv.so.2 > Reading symbols from /usr/lib/libltdl.so.3...done. > Loaded symbols for /usr/lib/libltdl.so.3 > Reading symbols from /usr/lib/debug/libpthread.so.0...done. > [Thread debugging using libthread_db enabled] > [New Thread 0xb78878e0 (LWP 5752)] > [New Thread 0x60e67b90 (LWP 5758)] > [New Thread 0x61267b90 (LWP 5757)] > [New Thread 0x61667b90 (LWP 5756)] > [New Thread 0x61a77b90 (LWP 5755)] > [New Thread 0x61e77b90 (LWP 5754)] > Loaded symbols for /usr/lib/debug/libpthread.so.0 > Reading symbols from /usr/lib/debug/libc.so.6...done. > Loaded symbols for /usr/lib/debug/libc.so.6 > Reading symbols from /usr/lib/libstdc++.so.6...done. > Loaded symbols for /usr/lib/libstdc++.so.6 > Reading symbols from /usr/lib/debug/libm.so.6...done. > Loaded symbols for /usr/lib/debug/libm.so.6 > Reading symbols from /lib/libgcc_s.so.1...done. > Loaded symbols for /lib/libgcc_s.so.1 > Reading symbo
______________ © Copyright 2013, OpenLDAP Foundation, info@OpenLDAP.org