[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: test008 failed in HEAD



Howard Chu wrote:
Howard Chu wrote:
Howard Chu wrote:
Hallvard B Furuseth wrote:
http://folk.uio.no/hbf/test008out2.tbz
ldapsearch.flt = ldif.flt + cn=James A Jones 4,ou=People,dc=example,dc=com
.

I just got a similar failure in RE24. Looks like the same thing Aaron
reported. Too bad it takes so long to reproduce it.

Very strange. The entry could not be found because cache_find_ndn returned an
EntryInfo for the previous incarnation of the entry, which contained the old
entryID. This is extremely odd given that bdb_cache_delete_internal has
already successfully deleted the old record from the cache, before the
previous Delete completed. On the subsequent Add, bdb_cache_add fails to
insert the new EntryInfo because it has the same RDN as the old.

Stranger and stranger. It looks like BerkeleyDB's transaction locking isn't working. The problem is occurring because BDB is returning the info for an entryID that is being deleted, but whose Delete transaction hasn't yet committed.


Here's a sequence of events from slapd.1.log:

conn=5 op=66 DEL dn="cn=James A Jones 2,ou=Alumni Association,ou=People,dc=example,dc=com"
bdb_dn2entry("cn=james a jones 2,ou=alumni association,ou=people,dc=example,dc=com")
==> bdb_delete: cn=James A Jones 2,ou=Alumni Association,ou=People,dc=example,dc=com
bdb_dn2entry("cn=james a jones 2,ou=alumni association,ou=people,dc=example,dc=com")
=> bdb_dn2id_delete( "cn=james a jones 2,ou=alumni association,ou=people,dc=example,dc=com", 0x0000006b )
...
bdb_idl_delete_key: 6b @cn=james a jones 2,ou=alumni association,ou=people,dc=example,dc=com
bdb_idl_delete_key: 6b %ou=alumni association,ou=people,dc=example,dc=com
bdb_idl_delete_key: 6b @ou=alumni association,ou=people,dc=example,dc=com
<= bdb_index_read: failed (-30990)
bdb_idl_delete_key: 6b @ou=people,dc=example,dc=com
<= bdb_dn2id_delete 0
=> index_entry_del( 107, "cn=James A Jones 2,ou=Alumni Association,ou=People,dc=example,dc=com" )
<= index_entry_del( 107, "cn=James A Jones 2,ou=Alumni Association,ou=People,dc=example,dc=com" ) success
====> bdb_cache_delete( 107 )
...


bdb_idl_fetch_key: [b49d1940]
<= bdb_substring_candidates: 3, first=12, last=107
bdb_search_candidates: id=3 first=12 last=107
=> send_search_entry: conn 28 dn="cn=James A Jones 1,ou=Alumni Association,ou=People,dc=example,dc=com"
ber_flush2: 146 bytes to sd 20
<= send_search_entry: conn 28 exit.
=> send_search_entry: conn 28 dn="cn=James A Jones 2,ou=Information Technology Division,ou=People,dc=example,dc=com"
ber_flush2: 157 bytes to sd 20
<= send_search_entry: conn 28 exit.
add_internal "cn=james a jones 2,ou=alumni association,ou=people,dc=example,dc=com", 107
...


bdb_delete: deleted id=0000006b dn="cn=James A Jones 2,ou=Alumni Association,ou=People,dc=example,dc=com"
send_ldap_result: conn=5 op=66 p=3


The time between the "bdb_cache_delete(107)" and the "bdb_delete: deleted..." is all the txn_commit time. In between there you see that an IDL search returned a result including entryID 107, even though all of the index IDLs for that entry were already deleted. This should never happen; all of the relevant IDL and dn2id pages should still be writelocked. There's also no Deadlocks within this interval of time, so nothing aborted midstream.

I'm seeing this failure with both BDB 4.2.52 and BDB 4.6.21, and no failure in RE23, so it must be something changed between RE23 and RE24, but so far looking at the diffs I don't see anything that could have caused this change in behavior.
--
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/