Re: back-bdb deadlocks?

Jong-Hyuk wrote:

I could also produce a deadlock case on both a Red Hat 7.3 uniprocessor box
and a Red Hat 9 SMP box, but I could get the full stack trace only on the
Red Hat 7.3 box. Weird gdb behavior on Red Hat 9...

Two files were uploaded to the incoming ftp directory:
ftp://ftp.openldap.org/incoming/test008-backtrace.txt for stack trace
ftp://ftp.openldap.org/incoming/test008-db_stat.txt for db_stat -CA result

I tracked down the problem further with gdb, and found the following:

thread 13: modrdn, locker=8000002b, waiting for an entryinfo mutex for
ei->bei_id=4 (ou=alumni association, ou=people, dc=example, dc=com) while
holding a read lock of page 1 and a db lock for the entry corresponding to
the entryinfo (ei->bei_id=4)

thread 5: add, locker=80000028, waiting for a db lock for the ei->bei_id=4
entry, while holding an entryinfo mutex for ei->bei_id=4
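
To make the cycle explicit, the two reports above can be written down as a small wait-for graph. This is only an illustrative Python sketch, not back-bdb code: the dictionaries and find_deadlock() are hypothetical names, and it assumes that each resource has a single holder and each thread waits for at most one resource.

```python
# Hypothetical model of the state reported above; resource names are
# illustrative labels, not identifiers from back-bdb or Berkeley DB.
holds = {
    "thread 13 (modrdn)": {"page 1 read lock", "db lock id=4"},
    "thread 5 (add)": {"entryinfo mutex id=4"},
}
waits_for = {
    "thread 13 (modrdn)": "entryinfo mutex id=4",
    "thread 5 (add)": "db lock id=4",
}


def find_deadlock(holds, waits_for):
    """Return the threads forming a wait-for cycle, or [] if there is none."""
    # Map each resource to the thread that currently holds it.
    owner = {res: thread for thread, held in holds.items() for res in held}
    for start in waits_for:
        path = []
        current = start
        # Follow "waits for a resource held by ..." edges until they end
        # or revisit a thread already on the path.
        while current is not None and current not in path:
            path.append(current)
            current = owner.get(waits_for.get(current))
        if current is not None:
            return path[path.index(current):]
    return []


cycle = find_deadlock(holds, waits_for)
print(" -> ".join(cycle + cycle[:1]))
```

Each thread waits on a resource the other holds, so the walk returns to its starting thread. Neither lock manager can see this on its own, because one edge lives in the db lock table and the other in a plain mutex.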

Why should the entryinfo lock be unlocked at the end of bdb_cache_find_id()?
As thread 13 shows, it should not be unlocked, at least not when it is
needed again.

The idea was to make tree navigation separate from retrieving the actual entries, and to release the mutex as quickly as possible. At the moment I'm testing with a version that uses an rdwr lock instead of a mutex, but that has failed as well. Your point about not unlocking makes sense; I'll give that a try.

As I suggested some time back, the entryinfo mutex needs to be replaced with the bdb lock, as we did in the original bdb entry cache design.

You may be right. I've been trying to avoid that because bdb locks carry heavier overhead, but that's the only way to do lock-coupling here. Also, there are some cleanup actions that have to occur under a lock, but are done after Commit. That would be difficult to manage...
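
For readers unfamiliar with the term, lock-coupling (hand-over-hand locking) means taking the lock on the next node before releasing the lock on the current one, so that no gap opens during the descent. The Python sketch below is a generic illustration under stated assumptions, not the back-bdb implementation: Node, find_coupled() and the trace list are hypothetical, and a plain threading.Lock stands in for whatever lock type is actually used.

```python
import threading


class Node:
    """Hypothetical tree node with its own lock (not back-bdb's EntryInfo)."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.children = {}

    def add_child(self, name):
        child = Node(name)
        self.children[name] = child
        return child


def find_coupled(root, path, trace):
    """Descend along `path` hand-over-hand.

    Returns the target node with its lock still held (the caller releases
    it), or None with no lock held if a path component is missing.
    """
    node = root
    node.lock.acquire()
    trace.append(("lock", node.name))
    for name in path:
        child = node.children.get(name)
        if child is None:
            node.lock.release()
            trace.append(("unlock", node.name))
            return None
        # Take the child's lock before letting go of the parent's.
        child.lock.acquire()
        trace.append(("lock", child.name))
        node.lock.release()
        trace.append(("unlock", node.name))
        node = child
    return node


root = Node("dc=example,dc=com")
people = root.add_child("ou=people")
people.add_child("ou=alumni association")

trace = []
target = find_coupled(root, ["ou=people", "ou=alumni association"], trace)
for step in trace:
    print(step)
target.lock.release()
```

Because every thread acquires locks in the same top-down order and always holds at least one lock on the path, two descents cannot wait on each other in a cycle. The cost is that each step pays for a lock acquisition, which is the overhead concern raised above.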
-- Howard Chu
Chief Architect, Symas Corp. Director, Highland Sun
http://www.symas.com http://highlandsun.com/hyc
Symas: Premier OpenSource Development and Support