Issue 5707 - HEAD/RE24 and BDB 4.7.25p1 hanging
Summary: HEAD/RE24 and BDB 4.7.25p1 hanging
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-09-22 21:01 UTC by Michael Ströder
Modified: 2014-08-01 21:03 UTC

See Also:


Attachments
patch.16415 (2.09 KB, application/octet-stream)
2008-09-26 04:52 UTC, Howard Chu
Description Michael Ströder 2008-09-22 21:01:41 UTC
Full_Name: Michael Ströder
Version: HEAD/RE24
OS: Linux
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (84.163.97.30)


I'm trying to run current RE24 and HEAD with BDB 4.7.25p1. It hangs in test-001
and it hangs in an LDAP connection (probably when doing a bind). Is that
combination really stable?

It works with the very same build scripts/configuration with 4.6.21+patches.

Further information (bt full, log, BDB build script) is in this archived mailing
list posting:

http://www.openldap.org/lists/openldap-devel/200809/msg00075.html

Comment 1 Howard Chu 2008-09-22 23:03:13 UTC
I was unable to reproduce the problem on my multi-core machines, but I do see 
it on a single-core machine. I've sent a backtrace and other debug info to the 
Oracle folks, will see what they have to say.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 Howard Chu 2008-09-22 23:22:56 UTC
hyc@symas.com wrote:
> I was unable to reproduce the problem on my multi-core machines, but I do see
> it on a single-core machine. I've sent a backtrace and other debug info to the
> Oracle folks, will see what they have to say.

I see the problem; it's a bug in BDB's multi-partition lock manager. When 
using multiple lock table partitions, it obtains a lock on the system-wide 
lock mutex and a lock on the per-region mutex. On a single core system it 
defaults to a single lock table. In this case, the macro that obtains the 
system-wide lock behaves identically to the per-region lock. I.e., both 
attempt to acquire the exact same mutex. Since it's already held, the process 
deadlocks.
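
To illustrate the failure mode described above, here is a minimal toy model in C. It is not BDB source: the function name nested_lock_result and the two local mutexes are hypothetical stand-ins for the system-wide and per-region mutexes. It assumes that with a single partition both names resolve to the same mutex, as described. An error-checking mutex is used so that the second acquisition reports EDEADLK instead of hanging; a normal mutex would simply block at that point.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Toy model (hypothetical, not BDB code): take the "system-wide" mutex,
 * then the "per-region" mutex. With more than one partition they are
 * distinct mutexes; with a single partition both names alias one mutex.
 * Returns the result of the second pthread_mutex_lock() call.
 */
int nested_lock_result(int partitions)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t system_mtx, partition_mtx;
    pthread_mutex_t *region;
    int rc;

    /* Error-checking mutexes report a relock by the owner as EDEADLK. */
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&system_mtx, &attr);
    pthread_mutex_init(&partition_mtx, &attr);
    pthread_mutexattr_destroy(&attr);

    /* Single partition: the region mutex is the system mutex itself. */
    region = (partitions > 1) ? &partition_mtx : &system_mtx;

    rc = pthread_mutex_lock(&system_mtx);
    assert(rc == 0);

    /* Second acquisition: succeeds only if it is a different mutex. */
    rc = pthread_mutex_lock(region);
    if (rc == 0)
        pthread_mutex_unlock(region);

    pthread_mutex_unlock(&system_mtx);
    pthread_mutex_destroy(&partition_mtx);
    pthread_mutex_destroy(&system_mtx);
    return rc;
}
```

Calling nested_lock_result(1) models the single-core case and yields EDEADLK; any partition count above 1 yields 0 because the two locks are then separate mutexes.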

(gdb) bt
#0  0xb7f37424 in __kernel_vsyscall ()
#1  0xb7b36c4e in __lll_mutex_lock_wait () from /lib/libpthread.so.0
#2  0xb7b32a3c in _L_mutex_lock_88 () from /lib/libpthread.so.0
#3  0xb7b3242d in pthread_mutex_lock () from /lib/libpthread.so.0
#4  0xb7d00819 in __db_pthread_mutex_lock (env=0x8a84550, mutex=104)
     at ../dist/../mutex/mut_pthread.c:207
#5  0xb7daad19 in __lock_getobj (lt=0x8a84848, obj=0xbfd492ec, ndx=492,
     create=1, retp=0xbfd491e4) at ../dist/../lock/lock.c:1470
#6  0xb7da7f53 in __lock_get_internal (lt=0x8a84848, sh_locker=0xb776d508,
     flags=1, obj=0xbfd492ec, lock_mode=DB_LOCK_READ, timeout=0,
     lock=0xbfd493cc) at ../dist/../lock/lock.c:588
#7  0xb7da77d6 in __lock_get_api (env=0x8a84550, locker=2147483659, flags=1,
     obj=0xbfd492ec, lock_mode=DB_LOCK_READ, lock=0xbfd493cc)
     at ../dist/../lock/lock.c:423
#8  0xb7da765b in __lock_get_pp (dbenv=0x8a841c0, locker=2147483659, flags=1,
     obj=0xbfd492ec, lock_mode=DB_LOCK_READ, lock=0xbfd493cc)
     at ../dist/../lock/lock.c:395
#9  0x08124fb8 in bdb_dn2id_lock (bdb=0x8a68620, dn=0xbfd493f0, rw=0,
     txn=0x8a890b8, lock=0xbfd493cc)
     at ../../../../head/servers/slapd/back-bdb/dn2id.c:47
#10 0x08125d7d in bdb_dn2id (op=0xbfd49640, dn=0xbfd493f0, ei=0xbfd493e0,
     txn=0x8a890b8, lock=0xbfd493cc)
     at ../../../../head/servers/slapd/back-bdb/dn2id.c:307
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) frame 4
#4  0xb7d00819 in __db_pthread_mutex_lock (env=0x8a84550, mutex=104)
     at ../dist/../mutex/mut_pthread.c:207
207		RET_SET((pthread_mutex_lock(&mutexp->mutex)), ret);
(gdb) p *mutexp
$1 = {mutex = {__data = {__lock = 2, __count = 0, __owner = 29470, __kind = 0,
       __nusers = 1, {__spins = 0, __list = {__next = 0x0}}},
     __size = "\002\000\000\000\000\000\000\000\036s\000\000\000\000\000\000\001\000\000\000\000\000\000",
     __align = 2}, cond = {__data = {__lock = 0,
       __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0,
       __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0},
     __size = '\0' <repeats 47 times>, __align = 0}, pid = 29470,
   tid = 3080046272, mutex_next_link = 0, alloc_id = 6, mutex_set_wait = 1,
   mutex_set_nowait = 129, flags = 3}
(gdb)

The mutex being acquired in frame 4 is the same one that was already acquired 
in frame 7, __lock_get_api line 418.


Comment 3 Howard Chu 2008-09-23 00:21:25 UTC
changed notes
changed state Open to Suspended
Comment 4 Howard Chu 2008-09-26 04:52:04 UTC
A patch from Oracle...

-------- Original Message --------
Subject: Re: 4.7.25 deadlock
Date: Thu, 25 Sep 2008 21:48:20 -0700
From: Howard Chu <hyc@symas.com>
To: Michael Ubell <@oracle.com>
References: <54E45A7F-A1BF-4FE1-A9F3-1DA7F320B81C@oracle.com>

Michael Ubell wrote:
> Howard,
>
> You are the second one to report this problem with user defined locks
> when there is a single lock partition.  You  can work around this on a
> single cpu system by just setting the number of lock partitions to be
> greater than 1.   This might have a slight performance impact.  Or you
> can apply the attached patch.

Thanks. That patch looks a lot like what I was using here... ;) Will
this be posted on the oracle web site soon? And yes, the workaround works ok
in the interim.
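
For anyone wanting to apply the workaround without patching, here is a rough sketch in C of how a build or setup script might decide whether to add a lock-partition setting to DB_CONFIG. The helper name lk_partitions_line is hypothetical, the choice of 2 is just the smallest value greater than 1, and the directive name set_lk_partitions is my understanding of the BDB 4.7 DB_CONFIG syntax, so check it against the BDB documentation before relying on it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: produce the DB_CONFIG line for the workaround.
 * ncpu is the number of CPUs the caller detected (for example via
 * sysconf(_SC_NPROCESSORS_ONLN)).
 *
 * On a single-CPU system it writes "set_lk_partitions 2\n" into buf and
 * returns the line length. On multi-CPU systems it leaves buf empty and
 * returns 0, since the default partition count is already above 1.
 * Returns -1 if buf is too small.
 */
int lk_partitions_line(long ncpu, char *buf, size_t len)
{
    int n;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    if (ncpu > 1)
        return 0;           /* default is fine, nothing to add */

    /* Assumed directive name; 2 is the smallest value greater than 1. */
    n = snprintf(buf, len, "set_lk_partitions %d\n", 2);
    if (n < 0 || (size_t)n >= len) {
        buf[0] = '\0';
        return -1;          /* output error or truncated */
    }
    return n;
}
```

The resulting line would go into the DB_CONFIG file in the BDB environment's home directory. As the quoted message says, this might have a slight performance impact.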

Comment 5 Howard Chu 2008-09-26 06:27:32 UTC
So I guess we have to warn people about this one ourselves for a while.

-------- Original Message --------
Subject: Re: 4.7.25 deadlock
Date: Thu, 25 Sep 2008 23:15:31 -0700
From: Michael Ubell <@oracle.com>
To: Howard Chu <hyc@symas.com>

Howard,

Generally we only post critical patches (data corruption, etc.) to the
web site. Since this one only affects those using user-defined locks
and does no damage, I don't think it will be posted.

Mike



Comment 6 Michael Ströder 2008-09-26 09:12:41 UTC
Sigh! I find their release versioning and patch publication somewhat
hard to follow anyway. So the best advice to users is to simply avoid
4.7.25 at this time.

Ciao, Michael.


Comment 7 Quanah Gibson-Mount 2008-09-26 16:13:04 UTC
Seriously, what kind of crap is that?  They've got a serious flaw in their 
software, but don't intend to publish the patch? sheesh.

--Quanah




--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra ::  the leader in open source messaging and collaboration

Comment 8 Quanah Gibson-Mount 2008-09-30 18:12:32 UTC
I think their patch is broken.  I rebuilt BDB 4.7 with it, and now test008 
fails on me:

bdb_dn2entry("cn=james a jones 4,ou=people,dc=example,dc=com")
=> bdb_dn2id("cn=james a jones 4,ou=people,dc=example,dc=com")
<= bdb_dn2id: get failed: DB_LOCK_NOTGRANTED: Lock not granted (-30993)
bdb_dn2entry("cn=james a jones 4,ou=people,dc=example,dc=com")
=> bdb_dn2id("cn=james a jones 4,ou=people,dc=example,dc=com")
<= bdb_dn2id: get failed: DB_LOCK_NOTGRANTED: Lock not granted (-30993)
bdb_dn2entry("cn=james a jones 4,ou=people,dc=example,dc=com")
=> bdb_dn2id("cn=james a jones 4,ou=people,dc=example,dc=com")
<= bdb_dn2id: get failed: DB_LOCK_NOTGRANTED: Lock not granted (-30993)
bdb_dn2entry("cn=james a jones 4,ou=people,dc=example,dc=com")
=> bdb_dn2id("cn=james a jones 4,ou=people,dc=example,dc=com")
<= bdb_dn2id: get failed: DB_LOCK_NOTGRANTED: Lock not granted (-30993)
bdb_dn2entry("cn=james a jones 4,ou=people,dc=example,dc=com")
=> bdb_dn2id("cn=james a jones 4,ou=people,dc=example,dc=com")
<= bdb_dn2id: get failed: DB_LOCK_NOTGRANTED: Lock not granted (-30993)
bdb_dn2entry("cn=james a jones 4,ou=people,dc=example,dc=com")
=> bdb_dn2id("cn=james a jones 4,ou=people,dc=example,dc=com")

--Quanah


Comment 9 Quanah Gibson-Mount 2008-09-30 18:22:40 UTC

--On September 30, 2008 6:13:05 PM +0000 quanah@zimbra.com wrote:

> I think their patch is broken.  I rebuilt BDB 4.7 with it, and now
> test008  fails on me:

Never mind, test008 fails without the patch to BDB 4.7 as well, so it's not 
related.  test008 simply no longer works for me with current RE24.

--Quanah



Comment 10 Quanah Gibson-Mount 2008-10-02 20:57:39 UTC
The patch to fix this issue is now in OpenLDAP cvs:

<http://www.openldap.org/devel/cvsweb.cgi/build/db.4.7.25.patch?hideattic=1&sortbydate=0>

--Quanah


Comment 11 Quanah Gibson-Mount 2008-10-02 20:57:48 UTC
changed notes
Comment 12 ando@openldap.org 2008-10-11 09:34:17 UTC
Just to clarify: is the patch available from Oracle's web site 
<http://www.oracle.com/technology/products/berkeley-db/db/update/4.7.25/patch.4.7.25.html> 
related?  Is it alternative or complementary to <build/db.4.7.25.patch>?

p.


Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office:  +39 02 23998309
Mobile:  +39 333 4963172
Fax:     +39 0382 476497
Email:   ando@sys-net.it
-----------------------------------

Comment 13 Howard Chu 2008-10-11 09:48:23 UTC
ando@sys-net.it wrote:
> Just to clarify: is the patch available from Oracle's web site
> <http://www.oracle.com/technology/products/berkeley-db/db/update/4.7.25/patch.4.7.25.html>
> related?  Is it alternative or complementary to <build/db.4.7.25.patch>?

I suspect the contents of that URL will change over time. At the moment, that 
page has only one patch, and it only affects BerkeleyDB replication, which is 
a feature that we have never used.

Comment 14 Quanah Gibson-Mount 2008-12-15 17:54:58 UTC
changed notes
Comment 15 CHIRANA-GHEORGHITA Eugeniu Theodor NCPI/I-BNF 2009-05-19 08:31:52 UTC

Comment 16 Howard Chu 2009-06-23 21:12:46 UTC
changed notes
changed state Suspended to Closed
Comment 17 OpenLDAP project 2014-08-01 21:03:34 UTC
BDB 4.7 single-core bug.
Patch now available on Oracle's website.