
possible? replication issue with 2.1



hi,

we have a mail system backed by 4 OpenLDAP 2.1.27 servers
hosting all user data. one runs as master, the other three as
slaves.

every now and then we see strange errors on our MTAs when they query the
slaves. most queries work as expected, but queries for 2 or 3 particular
entries return LDAP_BUSY.

if i manually issue an ldapsearch for one of these entries, the server
sends me the entry's data, but instead of finishing the search with
LDAP_SUCCESS, it sends result code 51 (LDAP_BUSY).
every client regards this as a failure and doesn't even bother to parse
the returned entry.

i turned on debugging to see what's going on. here's the shortened
output (i wrapped some lines):

[...]
Nov  3 16:29:10 galen slapd[31869]: bdb_idl_fetch_key: [e2b5f963]
Nov  3 16:29:10 galen slapd[31869]: <= bdb_index_read 2 candidates
Nov  3 16:29:10 galen slapd[31869]: bdb_search_candidates: id=2 
first=76073 last=76075
Nov  3 16:29:10 galen slapd[31869]: entry_decode: 
"uid=someone,dc=somewhere,dc=at,dc=."
Nov  3 16:29:10 galen slapd[31869]: <= 
entry_decode(uid=someone,dc=somewhere,dc=at,dc=.)
Nov  3 16:29:10 galen slapd[31869]: => send_search_entry:  
dn="uid=someone,dc=somewhere,dc=at,dc=."
Nov  3 16:29:10 galen slapd[31869]: <= send_search_entry
Nov  3 16:29:10 galen slapd[31869]: ====> bdb_cache_return_entry_r( 
76073 ): created (0)
Nov  3 16:29:10 galen slapd[31869]: entry_decode: 
"uid=someone,dc=somewhere,dc=at,dc=."
Nov  3 16:29:10 galen slapd[31869]: <= 
entry_decode(uid=someone,dc=somewhere,dc=at,dc=.)
Nov  3 16:29:10 galen slapd[31869]: ====> bdb_cache_add_entry( 76075 ):
"uid=someone,dc=somewhere,dc=at,dc=.":
already in dn cache
Nov  3 16:29:10 galen slapd[31869]: send_ldap_result: conn=0 op=1 p=3
Nov  3 16:29:10 galen slapd[31869]: send_ldap_result: err=51 matched="" 
text="ldap server busy"

apparently bdb_index_read() found 2 candidates matching the query, both
having the same dn. i guess slapd returns LDAP_BUSY, because the call to
bdb_cache_add_entry(76075) fails.
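
a minimal sketch of how this reading of the log could be checked
mechanically. it assumes the debug lines have been unwrapped again (one
syslog message per line) and look like the excerpt above; the function
name scan() and the two patterns are my own and not part of OpenLDAP.

```python
import re

# Patterns modelled on the excerpt above (assumption: one message per line).
ADD_FAIL = re.compile(
    r'bdb_cache_add_entry\(\s*(\d+)\s*\):\s*"([^"]*)":\s*already in dn cache'
)
RESULT = re.compile(r'send_ldap_result: err=(\d+)')


def scan(lines):
    """Collect (entry id, dn) pairs rejected by the dn cache and all result codes."""
    collisions = []
    results = []
    for line in lines:
        m = ADD_FAIL.search(line)
        if m:
            collisions.append((int(m.group(1)), m.group(2)))
        m = RESULT.search(line)
        if m:
            results.append(int(m.group(1)))
    return collisions, results
```

a rejected entry id directly followed by err=51 would be consistent
with the guess above, but it does not prove that the cache failure is
the cause.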

i did a slapcat of the database, examined the ldif file and found the
two entries with the same dn. the entries are identical, except that one
of them has a more recent modifyTimestamp and an additional attribute.
that newer one is an exact copy of the entry stored in the master's
database.
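
a minimal sketch of how such duplicates could be found in a slapcat
dump without reading the ldif by hand. the helper names and the dn
normalization (lowercase, spaces around commas stripped) are my own
assumptions; slapd's real dn normalization is schema-aware and stricter.

```python
import base64
from collections import defaultdict


def unfold(lines):
    """Join LDIF continuation lines (a leading space continues the previous line)."""
    out = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(" ") and out:
            out[-1] += line[1:]
        else:
            out.append(line)
    return out


def entry_dns(lines):
    """Yield (entry number, dn) for every entry, counting entries from 1."""
    index = 0
    for line in unfold(lines):
        if line.startswith("dn::"):
            # base64-encoded dn
            index += 1
            yield index, base64.b64decode(line[4:].strip()).decode("utf-8")
        elif line.startswith("dn:"):
            index += 1
            yield index, line[3:].strip()


def normalize(dn):
    # Rough approximation only, see the note above.
    return ",".join(part.strip() for part in dn.lower().split(","))


def duplicate_dns(lines):
    """Map each dn that occurs more than once to the entry numbers using it."""
    seen = defaultdict(list)
    for index, dn in entry_dns(lines):
        seen[normalize(dn)].append(index)
    return {dn: ids for dn, ids in seen.items() if len(ids) > 1}
```

for example, after "slapcat -l dump.ldif" (file name chosen for
illustration), duplicate_dns(open("dump.ldif")) returns the affected
dns together with the positions of the entries in the dump.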

so i guess that something went wrong when the master replicated the
updates to this entry. 

any hints as to what's gone wrong here?
is this a known issue in 2.1 and maybe fixed in 2.2?

unfortunately, the system is in production use, so i can't deploy the
latest OpenLDAP releases.
but i have a copy of the corrupt database here on my laptop, so if you
need more debugging information i can provide it.

tia,
tom.

-- 
Thomas "Duke" Hager                       {duke,hager}@sigsegv.at
GPG: 1024D/D27F858C            http://www.sigsegv.at/gpg/duke.gpg
=================================================================
"Never Underestimate the Power of Stupid People in Large Groups."

