[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#5439) back-bdb race condition



On Tue, 25 Mar 2008, Howard Chu wrote:

> Rein Tollevik wrote:
>> On Tue, 25 Mar 2008, hyc@symas.com wrote:
>> It now looks to me as if the entire rs->sr_entry was released an reused,
>> and that the bug probably is in back-bdb.  It just always happened when
>> syncprov was running as this is a master server with mostly writes
>> except from the reads syncprov does.  Which also means that the title on
>> the bug report is probably misleading :-(
>> 
>> The reason for my shift of focus is some highly suspicious variables
>> found in the bdb_search() frame.
>
> Look at the value of idflag in bdb_search. (see the comment around 672)
>
> Could be a locking bug with ID_NOCACHE in cache.c...

Yes, it looks very much as this was the problem.  We have been running
with the patch at the end since thursday afternoon, and so far it seem
to have worked :-) Although it is a bit too early to conclude,
especially as we have seen other problems as well.

We did see segmentation faults both friday and monday (not very much
going on during the weekend..), but so far those appear to be related to
ITS#5445. A slapd and db_checkpoint (run out of cron) deadlock which I
hope is due to the BerkeleyDB bug mentioned in ITS#5391 has also
occurred, so I'll upgrade from 4.6.18 to 4.6.21.1 to see if that problem
goes away.

Now, back to this bug.  There is a potential bug in bdb_cache_find_id()
as it inspects and modifies the EntryInfo without it being locked.
Although I don't think this was the cause here, as bdb_search() always
passes a NULL eip which mean that bdb_cache_find_ndn() should have
acquired a lock.  The patch always locks the entryinfo though, as I
believe that to be necessary for the rest of the patch to be correct.

The real problem was in bdb_cache_return_entry_rw() which released
ei->bei_e without making sure it was the only thread using it.  I.e it
should have had a write lock on the entry_db (as bdb_cache_lru_purge()
makes sure to have), but instead it freed it after releasing the read
lock.

Instead of write locking the entry before releasing it my alternative
solution is to only let the thread that loads the entry set the
CACHE_ENTRY_NOT_CACHED flag, and only if no other threads found the
entryinfo while it loaded the entry.  All other threads clears the flag
when they find they don't need to load the entry.

I also use trylock to lock the entryinfo before setting or inspecting
the cache flag, for two reasons.  First, honoring this flag should not
be important enough to wait for a lock (after all, it should be cleared
anyhow if anyone else is interested in the same entry).  Second (and
most important), to avoid any deadlock situations, which my first
attempt to fix this bug managed to create when I moved the release of
the entryinfo lock to after the entry_db lock had been acquired in 
bdb_cache_find_id().

Rein

Index: OpenLDAP/servers/slapd/back-bdb/cache.c
diff -u OpenLDAP/servers/slapd/back-bdb/cache.c:1.1.1.15 OpenLDAP/servers/slapd/back-bdb/cache.c:1.2
--- OpenLDAP/servers/slapd/back-bdb/cache.c:1.1.1.15	Sat Mar 22 16:47:56 2008
+++ OpenLDAP/servers/slapd/back-bdb/cache.c	Mon Mar 31 17:52:17 2008
@@ -252,16 +252,27 @@
  	int free = 0;

  	ei = e->e_private;
-	bdb_cache_entry_db_unlock( bdb, lock );
-	if ( ei ) {
-		bdb_cache_entryinfo_lock( ei );
+	if ( ei &&
+		( ei->bei_state & CACHE_ENTRY_NOT_CACHED ) &&
+		( bdb_cache_entryinfo_trylock( ei ) == 0 ) ) {
  		if ( ei->bei_state & CACHE_ENTRY_NOT_CACHED ) {
+			/* Releasing the entry can only be done when
+			 * we know that nobody else is using it, i.e we
+			 * should have an entry_db writelock.  But the
+			 * flag is only set by the thread that loads the
+			 * entry, and only if no other threads has found
+			 * it while it was working.  All other threads
+			 * clear the flag, which mean that we should be
+			 * the only thread using the entry if the flag
+			 * is set here.
+			 */
  			ei->bei_e = NULL;
  			ei->bei_state ^= CACHE_ENTRY_NOT_CACHED;
  			free = 1;
  		}
  		bdb_cache_entryinfo_unlock( ei );
  	}
+	bdb_cache_entry_db_unlock( bdb, lock );
  	if ( free ) {
  		e->e_private = NULL;
  		bdb_entry_return( e );
@@ -854,6 +865,11 @@

  	/* Ok, we found the info, do we have the entry? */
  	if ( rc == 0 ) {
+		if ( !( flag & ID_LOCKED )) {
+			bdb_cache_entryinfo_lock ( *eip );
+			flag |= ID_LOCKED;
+		}
+
  		if ( (*eip)->bei_state & CACHE_ENTRY_DELETED ) {
  			rc = DB_NOTFOUND;
  		} else {
@@ -873,13 +889,13 @@
  				(*eip)->bei_state |= CACHE_ENTRY_LOADING;
  			}

-			/* If the entry was loaded before but uncached, and we need
-			 * it again, clear the uncached state
-			 */
-			if ( (*eip)->bei_state & CACHE_ENTRY_NOT_CACHED ) {
-				(*eip)->bei_state ^= CACHE_ENTRY_NOT_CACHED;
-				if ( flag & ID_NOCACHE )
-					flag ^= ID_NOCACHE;
+			if ( !load ) {
+				/* Clear the uncached state if we are not
+				 * loading it, i.e it is already cached or
+				 * another thread is currently loading it.
+				 */
+				(*eip)->bei_state &= ~CACHE_ENTRY_NOT_CACHED;
+				flag &= ~ID_NOCACHE;
  			}

  			if ( flag & ID_LOCKED ) {
@@ -906,10 +922,14 @@
  #endif
  						ep = NULL;
  						bdb_cache_lru_link( bdb, *eip );
-						if ( flag & ID_NOCACHE ) {
-							bdb_cache_entryinfo_lock( *eip );
-							(*eip)->bei_state |= CACHE_ENTRY_NOT_CACHED;
-							bdb_cache_entryinfo_unlock( *eip );
+						if ( ( flag & ID_NOCACHE ) &&
+							( bdb_cache_entryinfo_trylock ( *eip ) == 0 ) ) {
+							/* Set the cached state only if no other thread
+							 * found the info while we was loading the entry.
+							 */
+							if ( (*eip)->bei_finders == 1 )
+								(*eip)->bei_state |= CACHE_ENTRY_NOT_CACHED;
+							bdb_cache_entryinfo_unlock ( *eip );
  						}
  					}
  					if ( rc == 0 ) {