Re: Server Hangs

David Engeset wrote:
      Attached are the two new debug output files and below is what I
used to compile BDB for debug.

env CFLAGS="-g -O2 -fPIC" ../dist/configure --enable-posixmutexes
make && make install

Here's the relevant info:

This locker is holding a write lock that the other threads are blocked waiting on:

8000542d dd= 6 locks held 1    write locks 1    pid/thread 1502/2366790512
8000542d WRITE         1 HELD    dn2id.bdb                 page          4

This thread is using both a reader transaction and a write transaction (which is perfectly fine):
80005666 dd= 0 locks held 2    write locks 0    pid/thread 1502/2483026800
80005666 READ          1 HELD    0x2c1bc len:   5 data: 0x0300000000
80005666 READ          1 HELD    0x29f4c len:   5 data: 0x020x52000000
80005667 dd= 0 locks held 5    write locks 3    pid/thread 1502/2483026800
80005667 READ          1 WAIT    dn2id.bdb                 page          4
80005667 WRITE         2 HELD    dn2id.bdb                 page          2
80005667 READ          2 HELD    dn2id.bdb                 page          2
80005667 WRITE         2 HELD    dn2id.bdb                 page       1300
80005667 READ          1 HELD    dn2id.bdb                 page       1300
80005667 WRITE         4 HELD    dn2id.bdb                 page       1301

This thread is only waiting to read the locked page:
80004e6f dd=19 locks held 0    write locks 0    pid/thread 1502/2437909360
80004e6f READ          1 WAIT    dn2id.bdb                 page          4
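
For anyone who wants to repeat this analysis on a larger dump, here is a minimal sketch in Python that pairs each page being waited on with the locker holding a write lock on it. The sample lines are copied from the output above; the function name find_blockers and the regular expression are my own assumptions about the line layout, not part of any BDB tool, and it only handles page locks.

```python
import re
from collections import defaultdict

# Page-lock lines copied from the lock dump quoted in this message.
LOCK_DUMP = """\
8000542d WRITE         1 HELD    dn2id.bdb                 page          4
80005667 READ          1 WAIT    dn2id.bdb                 page          4
80005667 WRITE         2 HELD    dn2id.bdb                 page          2
80005667 READ          2 HELD    dn2id.bdb                 page          2
80005667 WRITE         2 HELD    dn2id.bdb                 page       1300
80005667 READ          1 HELD    dn2id.bdb                 page       1300
80005667 WRITE         4 HELD    dn2id.bdb                 page       1301
80004e6f READ          1 WAIT    dn2id.bdb                 page          4
"""

# Assumed layout: locker id, mode, count, status, file name, "page", page number.
LINE_RE = re.compile(
    r"^(?P<locker>[0-9a-f]+)\s+(?P<mode>READ|WRITE)\s+(?P<count>\d+)\s+"
    r"(?P<status>HELD|WAIT)\s+(?P<file>\S+)\s+page\s+(?P<page>\d+)\s*$"
)


def find_blockers(dump):
    """Map (file, page) -> (lockers holding WRITE, lockers waiting)."""
    held_writes = defaultdict(set)
    waiters = defaultdict(set)
    for line in dump.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip locker summary lines and non-page locks
        key = (m.group("file"), int(m.group("page")))
        if m.group("status") == "WAIT":
            waiters[key].add(m.group("locker"))
        elif m.group("mode") == "WRITE":
            held_writes[key].add(m.group("locker"))
    # Report only pages that someone waits on while a write lock is held.
    return {
        key: (sorted(held_writes[key]), sorted(waiting))
        for key, waiting in waiters.items()
        if key in held_writes
    }


if __name__ == "__main__":
    for (fname, page), (holders, waiting) in find_blockers(LOCK_DUMP).items():
        print(f"{fname} page {page}: write-locked by {holders}, waited on by {waiting}")
```

On the lines above this singles out page 4 of dn2id.bdb, write-locked by 8000542d, with 80005667 and 80004e6f waiting.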

Thread 2366790512 is 0x8d125b70
       2483026800 is 0x93fffb70
       2437909360 is 0x914f8b70
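
The mapping above is just a decimal-to-hex conversion: the lock dump prints thread IDs in decimal, while gdb prints them in hex. A small Python sketch (the helper name tid_to_hex is mine) to do the same for any other thread in the dump:

```python
# Decimal thread IDs as they appear in the "pid/thread" column of the lock dump.
THREAD_IDS = [2366790512, 2483026800, 2437909360]


def tid_to_hex(tid):
    """Format a decimal thread ID the way gdb prints it (lowercase hex)."""
    return f"0x{tid:x}"


if __name__ == "__main__":
    for tid in THREAD_IDS:
        print(tid, "is", tid_to_hex(tid))
```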

In your gdb output we see that 0x914f8b70 is LWP 2306, waiting in a search.
Thread 0x93fffb70 is LWP 1506, waiting in a delete.
Thread 0x8d125b70 is LWP 2457, completely idle.

I.e., the thread that owns the offending write lock is not executing any operation at all. Note that back-bdb only takes write locks inside a transaction, and every transaction either commits or aborts. In either case, all of its locks are supposed to be released when it ends. It is impossible for slapd code to leak write locks like this. (In fact, the slapd code itself can never lock an individual DB page, as is being done here. Those locks can only be taken by the actual BDB library code.) As such, it appears to be a bug in the BDB library you're using.

Are you sure that OpenLDAP was built against the BDB library you've built, as opposed to some version that was already installed on your system?
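
One way to check is to look at which libdb the dynamic linker resolves for the slapd binary. The following is only a sketch in Python around ldd: the helper names are mine, the slapd path in the usage note is an assumption you will need to adjust, and if back-bdb was built as a loadable module you would have to run it against the module file instead.

```python
import re
import subprocess

# Matches ldd lines of the form "libdb-X.Y.so => /path/to/libdb-X.Y.so (0x...)".
# The [-_.] after "libdb" keeps unrelated libraries such as libdbus out.
LIBDB_RE = re.compile(
    r"^\s*(libdb[-_.]\S*)\s+=>\s+(.*?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$"
)


def libdb_paths(ldd_output):
    """Return (soname, resolved path) pairs for Berkeley DB libraries."""
    pairs = []
    for line in ldd_output.splitlines():
        m = LIBDB_RE.match(line)
        if m:
            pairs.append((m.group(1), m.group(2)))
    return pairs


def check_binary(path):
    """Run ldd on `path` and report which libdb it would load."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return libdb_paths(result.stdout)
```

Calling check_binary() with the full path to your slapd binary should list the libdb it will load. If the resolved path is a system directory rather than the prefix where your debug build was installed, slapd is still picking up the old library.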

  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/