
Re: BadRSlotError: mdb_txn_begin: MDB_BAD_RSLOT: Invalid reuse of reader locktable slot



Luke Kenneth Casson Leighton wrote:
On Thu, Sep 11, 2014 at 11:37 AM, Howard Chu <hyc@symas.com> wrote:
Luke Kenneth Casson Leighton wrote:

hi all,

the infamous obscure error which people are seeing only very
infrequently is rearing its head at least 2 to 3 times per day in a
test lab where i work.  this is however a secure environment so i
cannot post core-dumps or any details of the application.

given the restrictions, what information is needed and what approach
is needed to debug and fix this?  luckily it's happening a lot so
there's the possibility of a regular iterative approach.

the operating system(s) have been ubuntu 12.04 and also 14.04, both
have resulted in this obscure bug.  bizarrely, this bug occurs in a
*single process*.  it's not even multi-processing.  however
metasync=False, sync=False, map_async=True, readahead=False and
writemap=True.


Use the Source, Luke.

  :)

MDB_BAD_RSLOT is returned only one place in mdb.c and the situation is very
specific. It means you've tried to begin a new read txn on a thread that
already has a read txn outstanding.

  ... but there aren't any threads... this is literally only one
process.  there are no threads involved at all.  the single process is
doing writes in a txn followed by reads in a separate txn.

Technically, a single process is also a single thread.

The API docs are pretty clear that a
thread may only have one txn at a time.

You need to track down whatever is creating read txns in your code and make
sure they're being properly committed or aborted.

  this is from python, and all code is done using "with env.begin .... as txn:"

  there are no exceptions occurring within any blocks, and even if they
were the "with" statement calls the __exit__ function which closes the
transaction.

I can't comment on anything python is doing, but it sounds like it's missing a step...

  so, all code is as expected, hence the reason for raising it here
because this is definitely not something that should be happening.

  *thinks*... there is only one possible thing that i can think of, and
it's related to using cursors.  i am not calling close or del on the
txn.cursor objects within the "with" block.  could it be that python's
garbage collection is somehow collecting those txn.cursor objects at
random points, interacting in some way with the current read txn?

No idea. If you're using py-lmdb it sounds like we need David Wilson to chime in here. In the C API there's no way a cursor could interfere with a txn; no guesses what the python code is doing.
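
For illustration only, here is a rough pure-Python toy model of the rule described above: one outstanding read txn per thread, with a second begin on the same thread rejected. It does not use LMDB or py-lmdb at all. ToyEnv, ToyReadTxn, begin_read and BadRslotError are hypothetical names made up for this sketch, and the real check in mdb.c works on the reader lock table rather than a Python flag. It assumes the default thread-tied reader slots (i.e. MDB_NOTLS not set).

```python
import threading


class BadRslotError(Exception):
    """Hypothetical stand-in for MDB_BAD_RSLOT (not py-lmdb's own class)."""


class ToyEnv:
    """Toy model: each thread owns one reader slot, usable by one read txn."""

    def __init__(self):
        # threading.local gives every thread its own private 'busy' flag.
        self._slots = threading.local()

    def begin_read(self):
        # A second read txn on a thread whose slot is still in use is refused.
        if getattr(self._slots, "busy", False):
            raise BadRslotError("read txn already outstanding on this thread")
        self._slots.busy = True
        return ToyReadTxn(self)

    def _release(self):
        self._slots.busy = False


class ToyReadTxn:
    """Toy read txn; abort() frees the calling thread's slot exactly once."""

    def __init__(self, env):
        self._env = env
        self._open = True

    def abort(self):
        if self._open:
            self._open = False
            self._env._release()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and on exceptions, so the slot is always freed.
        self.abort()
        return False


env = ToyEnv()

# Back-to-back 'with' blocks are fine: each one frees the slot on exit.
with env.begin_read():
    pass
with env.begin_read():
    pass

# A txn that is never committed or aborted keeps the slot occupied...
leaked = env.begin_read()
try:
    env.begin_read()
    second_begin_failed = False
except BadRslotError:
    second_begin_failed = True

# ...until it is explicitly ended, after which a new read txn can begin.
leaked.abort()
after_abort = env.begin_read()
after_abort.abort()
```

In this model the only way to hit the error is for some read txn on the thread to still be open when the next one begins, which is why the advice above is to find whatever is creating read txns and confirm each is ended. Whether py-lmdb's cursor or garbage-collection behaviour can leave a txn open in that way is not something this sketch can answer.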

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/