
Re: (ITS#5707) HEAD/RE24 and BDB 4.7.25p1 hanging



hyc@symas.com wrote:
> I was unable to reproduce the problem on my multi-core machines, but I do see
> it on a single-core machine. I've sent a backtrace and other debug info to the
> Oracle folks, will see what they have to say.

I see the problem; it's a bug in BDB's multi-partition lock manager. When
using multiple lock table partitions, it takes both the system-wide lock
mutex and the per-region mutex. On a single-core system it defaults to a
single lock table. In that case, the macro that obtains the system-wide lock
behaves identically to the one that obtains the per-region lock, i.e., both
attempt to acquire the exact same mutex. Since that mutex is already held by
the same thread, the process deadlocks.

(gdb) bt
#0  0xb7f37424 in __kernel_vsyscall ()
#1  0xb7b36c4e in __lll_mutex_lock_wait () from /lib/libpthread.so.0
#2  0xb7b32a3c in _L_mutex_lock_88 () from /lib/libpthread.so.0
#3  0xb7b3242d in pthread_mutex_lock () from /lib/libpthread.so.0
#4  0xb7d00819 in __db_pthread_mutex_lock (env=0x8a84550, mutex=104)
     at ../dist/../mutex/mut_pthread.c:207
#5  0xb7daad19 in __lock_getobj (lt=0x8a84848, obj=0xbfd492ec, ndx=492,
     create=1, retp=0xbfd491e4) at ../dist/../lock/lock.c:1470
#6  0xb7da7f53 in __lock_get_internal (lt=0x8a84848, sh_locker=0xb776d508,
     flags=1, obj=0xbfd492ec, lock_mode=DB_LOCK_READ, timeout=0,
     lock=0xbfd493cc) at ../dist/../lock/lock.c:588
#7  0xb7da77d6 in __lock_get_api (env=0x8a84550, locker=2147483659, flags=1,
     obj=0xbfd492ec, lock_mode=DB_LOCK_READ, lock=0xbfd493cc)
     at ../dist/../lock/lock.c:423
#8  0xb7da765b in __lock_get_pp (dbenv=0x8a841c0, locker=2147483659, flags=1,
     obj=0xbfd492ec, lock_mode=DB_LOCK_READ, lock=0xbfd493cc)
     at ../dist/../lock/lock.c:395
#9  0x08124fb8 in bdb_dn2id_lock (bdb=0x8a68620, dn=0xbfd493f0, rw=0,
     txn=0x8a890b8, lock=0xbfd493cc)
     at ../../../../head/servers/slapd/back-bdb/dn2id.c:47
#10 0x08125d7d in bdb_dn2id (op=0xbfd49640, dn=0xbfd493f0, ei=0xbfd493e0,
     txn=0x8a890b8, lock=0xbfd493cc)
     at ../../../../head/servers/slapd/back-bdb/dn2id.c:307
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) frame 4
#4  0xb7d00819 in __db_pthread_mutex_lock (env=0x8a84550, mutex=104)
     at ../dist/../mutex/mut_pthread.c:207
207		RET_SET((pthread_mutex_lock(&mutexp->mutex)), ret);
(gdb) p *mutexp
$1 = {mutex = {__data = {__lock = 2, __count = 0, __owner = 29470, __kind = 0,
       __nusers = 1, {__spins = 0, __list = {__next = 0x0}}},
     __size = "\002\000\000\000\000\000\000\000\036s\000\000\000\000\000\000\001\000\000\000\000\000\000", __align = 2}, cond = {__data = {__lock = 0,
       __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0,
       __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
     __size = '\0' <repeats 47 times>, __align = 0}, pid = 29470,
   tid = 3080046272, mutex_next_link = 0, alloc_id = 6, mutex_set_wait = 1,
   mutex_set_nowait = 129, flags = 3}
(gdb)

The mutex being acquired in frame 4 is the same one that was already acquired 
in frame 7, __lock_get_api line 418.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/