Full_Name: Michael Ströder
Version: HEAD/RE24
OS: Linux
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (84.163.97.30)

I'm trying to run current RE24 and HEAD with BDB 4.7.25p1. It hangs in test-001, and it hangs in an LDAP connection (probably when doing a bind). Is that combination really stable?

It works with the very same build scripts/configuration with 4.6.21+patches.

Further information (bt full, log, BDB build script) is in this archived mailing list posting:

http://www.openldap.org/lists/openldap-devel/200809/msg00075.html
michael@stroeder.com wrote:
> Full_Name: Michael Ströder
> Version: HEAD/RE24
> OS: Linux
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (84.163.97.30)
>
> I'm trying to run current RE24 and HEAD with BDB 4.7.25p1. It hangs in test-001
> and it hangs in a LDAP conn (probably when doing a bind). Is that combination
> really stable?
>
> It works with very same build scripts/configuration with 4.6.21+patches.
>
> Further information (bt full, log, BDB build script) is in this archived mailing
> list posting:
>
> http://www.openldap.org/lists/openldap-devel/200809/msg00075.html

I was unable to reproduce the problem on my multi-core machines, but I do see it on a single-core machine. I've sent a backtrace and other debug info to the Oracle folks, will see what they have to say.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
hyc@symas.com wrote:
> I was unable to reproduce the problem on my multi-core machines, but I do see
> it on a single-core machine. I've sent a backtrace and other debug info to the
> Oracle folks, will see what they have to say.

I see the problem; it's a bug in BDB's multi-partition lock manager. When using multiple lock table partitions, it obtains a lock on the system-wide lock mutex and a lock on the per-region mutex. On a single-core system it defaults to a single lock table. In this case, the macro that obtains the system-wide lock behaves identically to the per-region lock. I.e., both attempt to acquire the exact same mutex. Since it's already held, the process deadlocks.

(gdb) bt
#0  0xb7f37424 in __kernel_vsyscall ()
#1  0xb7b36c4e in __lll_mutex_lock_wait () from /lib/libpthread.so.0
#2  0xb7b32a3c in _L_mutex_lock_88 () from /lib/libpthread.so.0
#3  0xb7b3242d in pthread_mutex_lock () from /lib/libpthread.so.0
#4  0xb7d00819 in __db_pthread_mutex_lock (env=0x8a84550, mutex=104)
    at ../dist/../mutex/mut_pthread.c:207
#5  0xb7daad19 in __lock_getobj (lt=0x8a84848, obj=0xbfd492ec, ndx=492,
    create=1, retp=0xbfd491e4) at ../dist/../lock/lock.c:1470
#6  0xb7da7f53 in __lock_get_internal (lt=0x8a84848, sh_locker=0xb776d508,
    flags=1, obj=0xbfd492ec, lock_mode=DB_LOCK_READ, timeout=0,
    lock=0xbfd493cc) at ../dist/../lock/lock.c:588
#7  0xb7da77d6 in __lock_get_api (env=0x8a84550, locker=2147483659, flags=1,
    obj=0xbfd492ec, lock_mode=DB_LOCK_READ, lock=0xbfd493cc)
    at ../dist/../lock/lock.c:423
#8  0xb7da765b in __lock_get_pp (dbenv=0x8a841c0, locker=2147483659, flags=1,
    obj=0xbfd492ec, lock_mode=DB_LOCK_READ, lock=0xbfd493cc)
    at ../dist/../lock/lock.c:395
#9  0x08124fb8 in bdb_dn2id_lock (bdb=0x8a68620, dn=0xbfd493f0, rw=0,
    txn=0x8a890b8, lock=0xbfd493cc)
    at ../../../../head/servers/slapd/back-bdb/dn2id.c:47
#10 0x08125d7d in bdb_dn2id (op=0xbfd49640, dn=0xbfd493f0, ei=0xbfd493e0,
    txn=0x8a890b8, lock=0xbfd493cc)
    at ../../../../head/servers/slapd/back-bdb/dn2id.c:307
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) frame 4
#4  0xb7d00819 in __db_pthread_mutex_lock (env=0x8a84550, mutex=104)
    at ../dist/../mutex/mut_pthread.c:207
207             RET_SET((pthread_mutex_lock(&mutexp->mutex)), ret);
(gdb) p *mutexp
$1 = {mutex = {__data = {__lock = 2, __count = 0, __owner = 29470,
      __kind = 0, __nusers = 1, {__spins = 0, __list = {__next = 0x0}}},
    __size = "\002\000\000\000\000\000\000\000\036s\000\000\000\000\000\000\001\000\000\000\000\000\000",
    __align = 2}, cond = {__data = {__lock = 0, __futex = 0,
      __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0,
      __nwaiters = 0, __broadcast_seq = 0},
    __size = '\0' <repeats 47 times>, __align = 0}, pid = 29470,
  tid = 3080046272, mutex_next_link = 0, alloc_id = 6, mutex_set_wait = 1,
  mutex_set_nowait = 129, flags = 3}
(gdb)

The mutex being acquired in frame 4 is the same one that was already acquired in frame 7, __lock_get_api line 418.
changed notes changed state Open to Suspended
A patch from Oracle...

-------- Original Message --------
Subject: Re: 4.7.25 deadlock
Date: Thu, 25 Sep 2008 21:48:20 -0700
From: Howard Chu <hyc@symas.com>
To: Michael Ubell <@oracle.com>
References: <54E45A7F-A1BF-4FE1-A9F3-1DA7F320B81C@oracle.com>

Michael Ubell wrote:
> Howard,
>
> You are the second one to report this problem with user defined locks
> when there is a single lock partition. You can work around this on a
> single cpu system by just setting the number of lock partitions to be
> greater than 1. This might have a slight performance impact. Or you
> can apply the attached patch.

Thanks. That patch looks a lot like what I was using here... ;) Will this be posted on the Oracle web site soon?

And yes, the workaround works ok in the interim.
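The workaround mentioned above (more than one lock partition) might be expressed as a DB_CONFIG entry in the database environment directory. This is only a sketch: it assumes the set_lk_partitions keyword that BDB 4.7 documents as the DB_CONFIG counterpart of DB_ENV->set_lk_partitions, and the value 2 is an arbitrary illustration, not a figure taken from this thread.

```
# Assumed workaround for the BDB 4.7.25 single-partition deadlock:
# force more than one lock table partition, even on a single-core machine.
set_lk_partitions 2
```

As far as I know the setting is only honoured when the environment is created, so an existing environment would probably need its region files rebuilt (for example via recovery) before the change takes effect.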
So I guess we have to warn people about this one ourselves for a while.

-------- Original Message --------
Subject: Re: 4.7.25 deadlock
Date: Thu, 25 Sep 2008 23:15:31 -0700
From: Michael Ubell <@oracle.com>
To: Howard Chu <hyc@symas.com>

Howard,

Generally we only post critical patches (data corruption, etc.) to the web site. Since this one only affects those using user defined locks and does no damage, I don't think it will be posted.

Mike
Sigh! I find their release versioning and patch publication somewhat hard to follow anyway. So the best advice to users is to simply avoid 4.7.25 at this time.

Ciao, Michael.

hyc@symas.com wrote:
> So I guess we have to warn people about this one ourselves for a while.
Seriously, what kind of crap is that? They've got a serious flaw in their software, but don't intend to publish the patch? Sheesh.

--Quanah

--On September 26, 2008 6:27:45 AM +0000 hyc@symas.com wrote:

> So I guess we have to warn people about this one ourselves for a while.

--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra ::  the leader in open source messaging and collaboration
I think their patch is broken. I rebuilt BDB 4.7 with it, and now test008 fails on me:

bdb_dn2entry("cn=james a jones 4,ou=people,dc=example,dc=com")
=> bdb_dn2id("cn=james a jones 4,ou=people,dc=example,dc=com")
<= bdb_dn2id: get failed: DB_LOCK_NOTGRANTED: Lock not granted (-30993)
bdb_dn2entry("cn=james a jones 4,ou=people,dc=example,dc=com")
=> bdb_dn2id("cn=james a jones 4,ou=people,dc=example,dc=com")
<= bdb_dn2id: get failed: DB_LOCK_NOTGRANTED: Lock not granted (-30993)
bdb_dn2entry("cn=james a jones 4,ou=people,dc=example,dc=com")
=> bdb_dn2id("cn=james a jones 4,ou=people,dc=example,dc=com")
<= bdb_dn2id: get failed: DB_LOCK_NOTGRANTED: Lock not granted (-30993)
bdb_dn2entry("cn=james a jones 4,ou=people,dc=example,dc=com")
=> bdb_dn2id("cn=james a jones 4,ou=people,dc=example,dc=com")
<= bdb_dn2id: get failed: DB_LOCK_NOTGRANTED: Lock not granted (-30993)
bdb_dn2entry("cn=james a jones 4,ou=people,dc=example,dc=com")
=> bdb_dn2id("cn=james a jones 4,ou=people,dc=example,dc=com")
<= bdb_dn2id: get failed: DB_LOCK_NOTGRANTED: Lock not granted (-30993)
bdb_dn2entry("cn=james a jones 4,ou=people,dc=example,dc=com")
=> bdb_dn2id("cn=james a jones 4,ou=people,dc=example,dc=com")

--Quanah
--On September 30, 2008 6:13:05 PM +0000 quanah@zimbra.com wrote:

> I think their patch is broken. I rebuilt BDB 4.7 with it, and now
> test008 fails on me:

Never mind, test008 fails without the patch to BDB 4.7 as well, so it's not related. test008 simply no longer works for me with current RE24.

--Quanah
--On Friday, September 26, 2008 4:13 PM +0000 quanah@zimbra.com wrote:

The patch to fix this issue is now in OpenLDAP CVS:

<http://www.openldap.org/devel/cvsweb.cgi/build/db.4.7.25.patch?hideattic=1&sortbydate=0>

--Quanah
changed notes
Just to clarify: is the patch available from Oracle's web site

<http://www.oracle.com/technology/products/berkeley-db/db/update/4.7.25/patch.4.7.25.html>

related? Is it an alternative or complementary to <build/db.4.7.25.patch>?

p.

Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office:  +39 02 23998309
Mobile:  +39 333 4963172
Fax:     +39 0382 476497
Email:   ando@sys-net.it
-----------------------------------
ando@sys-net.it wrote:
> Just to clarify: is the patch available from Oracle's web site
> <http://www.oracle.com/technology/products/berkeley-db/db/update/4.7.25/patch.4.7.25.html>
> related? Is it alternative or complementary to <build/db.4.7.25.patch>?

I suspect the contents of that URL will change over time. At the moment, that page has only one patch, and it only affects BerkeleyDB replication, which is a feature that we have never used.
changed notes changed state Suspended to Closed
BDB 4.7 single-core bug
Patch now available on Oracle's website.