
Re: back-bdb deadlocks?



> > A read operation can cause an undetected deadlock when loading an entry
> > into the cache. This is because the lockerID used to lock the cache
> > entry is not the same as the lockerID BDB generates when accessing the
> > database. If a writer has already locked the id2entry pages and is
> > waiting on a write lock for the cache entry, then the reader will be
> > stuck waiting for the id2entry pages and neither will make any progress.
> >
> > This implies that we must use transactions for readers as well as
> > writers, to ensure that all locks associated with a read operation get
> > the same lockerID, so that deadlocks can be detected by BDB. Either that
> > or we need to tweak the lock ordering again so that a reader cannot hold
> > any other locks while it is accessing the database.
>
> I've checked in a fix for this. Creating a new transaction for every
> search operation would add a fair amount of overhead, so I've allocated
> long-lived per-thread transactions instead (similar to the way locker
> IDs are being re-used). We can't just replace the existing lockerID
> mechanism with these transactions, because the intended lifetimes of
> their locks are not the same. Anyway, I think this should finally
> resolve the deadlock issues.
>
> Please test...
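
To make the scheme concrete, here is a rough sketch of what a long-lived
per-thread reader transaction could look like. This is illustrative only,
not the code that was checked in; all names are hypothetical, and it
assumes the Berkeley DB 4.x C API with POSIX thread-specific data:

/*
 * Rough sketch only, not the code that was checked in; all names are
 * hypothetical.  The idea: give each thread one long-lived transaction
 * for its reads, so that every lock taken on the reader's behalf
 * carries a single locker ID and BDB's deadlock detector can see the
 * whole wait-for graph.
 */
#include <db.h>
#include <pthread.h>

/* Created once at startup with pthread_key_create(). */
static pthread_key_t reader_txn_key;

/* Return this thread's reader transaction, creating it on first use. */
static int
get_reader_txn( DB_ENV *env, DB_TXN **txnp )
{
    DB_TXN *txn = pthread_getspecific( reader_txn_key );
    int rc = 0;

    if ( txn == NULL ) {
        rc = env->txn_begin( env, NULL, &txn, 0 );
        if ( rc == 0 )
            pthread_setspecific( reader_txn_key, txn );
    }
    *txnp = txn;
    return rc;
}

/*
 * A read done under the per-thread transaction.  Cache-entry locks
 * taken with env->lock_get() would use txn->id(txn) as their locker,
 * so the database pages and the cache entry end up under one locker
 * ID and a reader/writer cycle becomes visible to BDB.
 */
static int
reader_get( DB_ENV *env, DB *db, DBT *key, DBT *data )
{
    DB_TXN *txn;
    int rc = get_reader_txn( env, &txn );

    if ( rc == 0 )
        rc = db->get( db, txn, key, data, 0 );
    return rc;
}
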

Looks like we are heading in the right direction. One thing to note is that
there is currently no retry mechanism provided for the added transaction.
bdb_cache_db_relock() itself also lacks a retry mechanism: if it is selected
as the victim to resolve a deadlock, it needs to restart the operation if it
is a write op, or restart the locking transaction if it is a read op. A
rough sketch of such a retry loop follows below. Also, I still suspect that
we need to protect the db ops in a search op within a transaction in order
to ensure deadlock freedom
(http://www.sleepycat.com/docs/ref/lock/notxn.html), although it might be
costly.
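
Something along these lines, perhaps (just a sketch, not tested against the
actual tree; names are hypothetical and it assumes the Berkeley DB 4.x C
API):

/*
 * Sketch of the missing retry logic, illustrative only.  When BDB
 * selects our transaction as the deadlock victim, the pending call
 * fails with DB_LOCK_DEADLOCK; the only safe recovery is to abort the
 * transaction (dropping all of its locks) and redo the work under a
 * fresh one.  A real implementation would also cap the retries.
 */
#include <db.h>

static int
read_with_retry( DB_ENV *env, DB *db, DBT *key, DBT *data )
{
    DB_TXN *txn;
    int rc;

    for (;;) {
        rc = env->txn_begin( env, NULL, &txn, 0 );
        if ( rc != 0 )
            return rc;

        rc = db->get( db, txn, key, data, 0 );
        if ( rc != DB_LOCK_DEADLOCK )
            break;

        /* Chosen as the victim: release everything and retry. */
        txn->abort( txn );
    }

    if ( rc == 0 )
        return txn->commit( txn, 0 );

    txn->abort( txn );
    return rc;
}

A write op needs the same treatment one level up: aborting the victim rolls
back whatever the transaction had already done, so the whole operation has
to be restarted from the top, not just the failing call.
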
- Jong-Hyuk