
Re: LMDB: issue with mdb_cursor_del



On Mon, 2017-10-16 at 13:58 +0200, Hallvard Breien Furuseth wrote:
> On 16. okt. 2017 12:51, Howard Chu wrote:
> > timur.kristof@gmail.com wrote:
> > > I have an app that uses LMDB, and I've experienced an interesting
> > > issue: when trying to delete a certain item with mdb_cursor_del,
> > > it crashed with the following backtrace:
> > > https://pastebin.com/7p9wtkj9
> 
> Weird backtrace.  It says mdb_page_dirty(), which is small, stretches
> over 300+ lines (frames #3-#4).  And mdb_page_alloc() alone has no
> hex address for prefix.  Maybe miscompilation, two liblmdb libraries
> linked into the same executable, or something like that?  Or some
> wild pointer write or whatever messed things up.

I'm not sure what was going on there; maybe -O3 mangled the backtrace.
Still, the issue also occurs with -O0, and here is a backtrace from
that build:
https://pastebin.com/SfeMMEPH

> > Most likely the dirty list is too big, which means you're trying
> > to do too much in a single transaction.
> 
> Shouldn't happen though. The txn should have failed earlier with
> MDB_TXN_FULL.
> 
> Which also shouldn't happen since LMDB should have spilled enough
> pages to make room - unless you have hundreds of cursors at modified
> pages so LMDB can't spill enough.
> 
> But we should probably test LMDB with impractically tight dirty-list
> arrays (i.e. a very small MDB_IDL_UM_MAX), so LMDB keeps running into
> such cases.

I've taken a look at the value of rc (see my reply to Howard), and it
seems to me that Леонид Юрьев's assessment may be correct here. rc is
-1, which indicates that the page is already on the txn's dirty page
list, even though it was newly allocated (maybe a reused page?).
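
To make the two failure modes concrete, here is a minimal sketch of a
sorted dirty list whose insert distinguishes "already present" from
"list full". It is a simplified model and not LMDB's code: dirty_list,
dirty_insert, dirty_insert_checked and DIRTY_MAX are hypothetical names,
and the return codes (-1 for a duplicate, -2 for a full list) are
assumed for the sketch, with -1 matching the rc I observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, deliberately tiny capacity (stand-in for a small
 * dirty-list limit, so the "full" case is easy to hit). */
#define DIRTY_MAX 4

typedef struct {
    size_t pgno;   /* page number, used as the sort key */
    void  *page;   /* in-memory copy of the dirty page */
} dirty_entry;

typedef struct {
    size_t      count;
    dirty_entry items[DIRTY_MAX];   /* kept sorted by pgno */
} dirty_list;

/* Binary search: index of the first entry with pgno >= key. */
size_t dirty_search(const dirty_list *dl, size_t pgno)
{
    size_t lo = 0, hi = dl->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (dl->items[mid].pgno < pgno)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Returns 0 on success, -1 if pgno is already listed (assumed
 * convention), -2 if the list is full (assumed convention). */
int dirty_insert(dirty_list *dl, size_t pgno, void *page)
{
    size_t i = dirty_search(dl, pgno);

    if (i < dl->count && dl->items[i].pgno == pgno)
        return -1;                      /* duplicate page number */
    if (dl->count >= DIRTY_MAX)
        return -2;                      /* no room left */

    /* Shift the tail up by one slot to keep the list sorted. */
    for (size_t j = dl->count; j > i; j--)
        dl->items[j] = dl->items[j - 1];
    dl->items[i].pgno = pgno;
    dl->items[i].page = page;
    dl->count++;
    return 0;
}

/* Assumption: a caller that treats any nonzero rc as a broken
 * invariant. A duplicate insert would abort here, not return. */
void dirty_insert_checked(dirty_list *dl, size_t pgno, void *page)
{
    int rc = dirty_insert(dl, pgno, page);
    assert(rc == 0);
    (void)rc;   /* silence unused warning when NDEBUG is defined */
}
```

In this model, inserting a page number that a reused page already
occupies fails with -1 regardless of how much room is left, while -2
only shows up once the list holds DIRTY_MAX entries. That is why rc = -1
points at a duplicate and not at an oversized transaction.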

- Timur