
Re: index corruption (1164) still present in 2.0.15 with db 3.1.17 (ITS#1359)

I'm trying to follow the path of how an attr=value pair gets put into an
index, and I can't see how a particular race condition is prevented. An
entry DN1 is having attr1=value1 added to it.  The indexer() function in
back-ldbm/index.c is called. It opens the attr1 index cache with a flag
of LDBM_WRCREAT, creates keys for "value1", calls key_change() for
every key, then closes the attr1 index cache.

  The key_change() function calls idl_insert_key().

  Assume the key is not used much, so only a few DNs have it.
idl_insert_key() will load the first ID block with idl_fetch_one(), which
just loads the first block and makes an in-memory duplicate of it, then
call idl_insert(), then call idl_store(), and return.
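A minimal single-threaded sketch of that read-modify-write, under stated assumptions: this is not the back-ldbm code. `idl_block`, `fetch_copy()`, `insert_sorted()`, `store()` and `insert_key()` are hypothetical stand-ins for the first ID block, idl_fetch_one(), idl_insert(), idl_store() and idl_insert_key(), and one static block plays the role of the database cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IDL_MAX 16                /* hypothetical block capacity */

typedef unsigned long ID;

typedef struct {
    size_t nids;
    ID ids[IDL_MAX];
} idl_block;

static idl_block stored_block;    /* what the "database cache" holds */

/* Step B: copy the stored block into private memory (cf. idl_fetch_one()). */
static idl_block fetch_copy(void)
{
    return stored_block;          /* struct assignment makes a full copy */
}

/* Step C: insert an ID into the private copy, keeping IDs sorted
 * (cf. idl_insert()). Duplicates are ignored. */
static void insert_sorted(idl_block *idl, ID id)
{
    size_t i = 0;
    while (i < idl->nids && idl->ids[i] < id)
        i++;
    if (i < idl->nids && idl->ids[i] == id)
        return;                   /* already present */
    assert(idl->nids < IDL_MAX);  /* the sketch does not split blocks */
    memmove(&idl->ids[i + 1], &idl->ids[i], (idl->nids - i) * sizeof(ID));
    idl->ids[i] = id;
    idl->nids++;
}

/* Step D: write the whole private copy back (cf. idl_store()). */
static void store(const idl_block *idl)
{
    stored_block = *idl;
}

/* Steps B, C, D in sequence, as described for a small index entry. */
static void insert_key(ID id)
{
    idl_block copy = fetch_copy();
    insert_sorted(&copy, id);
    store(&copy);
}
```

The property that matters for the question below is that `store()` writes back the entire private copy, so whatever that copy contains replaces what was stored.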

So the chain of events on the db cache is:

A  open the database cache
B  fetch the first block into a memory copy idl
C  insert the ID of DN1 into the memory copy idl
D  write the idl back to the database cache
E  close the database cache

Now suppose two threads are working on two different connections. Thread 1
is adding attr1=value1 to DN1, and thread 2 is adding the same attr1=value1
to DN2. They both enter indexer() at about the same time and start
executing steps A-E above concurrently:

thread 1   thread 2

A
           A
B
           B
C
           C
D
           D
E
           E

The ldbm_cache_open() call in thread 1 finds attr1.dbb still open in the
cache from some earlier operation and reuses it; later, thread 2 gets the
same cache pointer, now with a refcount of 2.
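To make the refcount point concrete, here is a hypothetical model of a shared handle cache. `cache_open()`, `cache_close()` and `cache_handle` are made up for illustration and are not ldbm_cache_open()'s real interface. A mutex guards the lookup table, but the handle it hands back carries no lock of its own.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 4              /* hypothetical table size */

typedef struct {
    char name[32];                 /* empty string means the slot is unused */
    int refcnt;                    /* number of current holders */
} cache_handle;

static cache_handle cache[CACHE_SLOTS];
static pthread_mutex_t cache_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Return the cached handle for `name`, claiming a free slot on first use.
 * The mutex protects only the table lookup, not the data behind the handle. */
static cache_handle *cache_open(const char *name)
{
    cache_handle *h = NULL;
    pthread_mutex_lock(&cache_mutex);
    for (size_t i = 0; i < CACHE_SLOTS && h == NULL; i++)
        if (strcmp(cache[i].name, name) == 0)
            h = &cache[i];         /* already cached: share it */
    for (size_t i = 0; i < CACHE_SLOTS && h == NULL; i++)
        if (cache[i].name[0] == '\0') {
            h = &cache[i];         /* first use: claim a free slot */
            strncpy(h->name, name, sizeof h->name - 1);
        }
    if (h != NULL)
        h->refcnt++;
    pthread_mutex_unlock(&cache_mutex);
    return h;                      /* the caller holds no data lock */
}

/* Drop one reference; the slot stays cached even at refcount 0. */
static void cache_close(cache_handle *h)
{
    pthread_mutex_lock(&cache_mutex);
    assert(h->refcnt > 0);
    h->refcnt--;
    pthread_mutex_unlock(&cache_mutex);
}
```

After an earlier open/close pair, two later opens of "attr1.dbb" return the same pointer with a refcount of 2. The refcount only records that the handle is shared; nothing in this model stops both holders from running steps B through D at the same time.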

Now it looks to me like both threads load the same initial block. Thread 1
adds DN1 to its memory copy, thread 2 adds DN2 to a DIFFERENT memory copy,
thread 1 writes its copy to disk, and then thread 2 writes its DIFFERENT
memory copy (without DN1 in it) to disk, overwriting thread 1's change.
Both close the db cache, and the index entry for attr1=value1 is left
missing DN1.
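The lost update described above can be replayed deterministically, without real threads. Everything here is a hypothetical stand-in (`idl_block`, `add_id()`, `replay_race()`), and steps A and E are left out because in this sketch they would only touch a refcount.

```c
#include <assert.h>
#include <stddef.h>

#define IDL_MAX 16                 /* hypothetical block capacity */

typedef unsigned long ID;

typedef struct {
    size_t nids;
    ID ids[IDL_MAX];
} idl_block;

static idl_block stored;           /* the block as held by the shared db cache */

static int idl_contains(const idl_block *idl, ID id)
{
    for (size_t i = 0; i < idl->nids; i++)
        if (idl->ids[i] == id)
            return 1;
    return 0;
}

/* Append an ID to a private copy (sorting omitted for brevity). */
static void add_id(idl_block *idl, ID id)
{
    assert(idl->nids < IDL_MAX);
    idl->ids[idl->nids++] = id;
}

/* Steps B, C, D for both threads, in the order described above. */
static void replay_race(ID dn1, ID dn2)
{
    idl_block t1 = stored;         /* B, thread 1: copy the first block */
    idl_block t2 = stored;         /* B, thread 2: copy the same block */
    add_id(&t1, dn1);              /* C, thread 1: DN1 goes into copy 1 */
    add_id(&t2, dn2);              /* C, thread 2: DN2 goes into copy 2 */
    stored = t1;                   /* D, thread 1: write copy 1 back */
    stored = t2;                   /* D, thread 2: copy 2 replaces it */
}
```

Starting from a block that already holds some other ID, the replay ends with that ID and DN2's ID stored, and DN1's ID gone.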

I've been following this around in the code for a few hours, and it's
after midnight here in Pittsburgh, so I may have missed a lock or
something along the way. Is there something that makes thread 2 wait to
open the cache while thread 1 has it open? Should there be? Or should
idl_insert_key() be serialized? Or serialized on some per-db-key basis?
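One way the per-key option could look, sketched under assumptions: hold a mutex from the fetch through the store, and choose the mutex by hashing the index key, so writers of the same key wait for each other while different keys mostly do not. `insert_key_locked()`, the stripe count, and the small integer key index are all hypothetical, and nothing here is taken from the OpenLDAP source.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define IDL_MAX 16                 /* hypothetical block capacity */
#define NKEYS 4                    /* hypothetical: an index with 4 keys */
#define NSTRIPES 8                 /* hypothetical lock-stripe count */

typedef unsigned long ID;

typedef struct {
    size_t nids;
    ID ids[IDL_MAX];
} idl_block;

static idl_block stored[NKEYS];    /* one first block per index key */
static pthread_mutex_t stripe[NSTRIPES];
static pthread_once_t stripes_once = PTHREAD_ONCE_INIT;

static void stripes_init(void)
{
    for (size_t i = 0; i < NSTRIPES; i++)
        pthread_mutex_init(&stripe[i], NULL);
}

/* The same key always maps to the same mutex. */
static pthread_mutex_t *lock_for(size_t key)
{
    return &stripe[key % NSTRIPES];
}

/* Steps B, C, D run entirely under the key's lock. */
static void insert_key_locked(size_t key, ID id)
{
    pthread_once(&stripes_once, stripes_init);
    assert(key < NKEYS);
    pthread_mutex_t *m = lock_for(key);
    pthread_mutex_lock(m);
    idl_block copy = stored[key];  /* B: fetch a private copy */
    assert(copy.nids < IDL_MAX);
    copy.ids[copy.nids++] = id;    /* C: add the ID (no sorting here) */
    stored[key] = copy;            /* D: write the copy back */
    pthread_mutex_unlock(m);
}

struct job { size_t key; ID id; };

static void *worker(void *arg)
{
    const struct job *j = arg;
    insert_key_locked(j->key, j->id);
    return NULL;
}

/* Two threads add different IDs under the same key, as in the report. */
static void run_two_writers(size_t key, ID dn1, ID dn2)
{
    struct job j1 = { key, dn1 }, j2 = { key, dn2 };
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, &j1);
    pthread_create(&t2, NULL, worker, &j2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
}
```

A single global mutex around the whole fetch/insert/store sequence would also work and is simpler, at the cost of serializing every index write; striping is a middle ground. Either way the lock has to cover all of B, C and D, and in this sketch it only excludes threads within one process.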

-Mark Adamson
 Carnegie Mellon