
new entry lost on multi-master setup (two scenarios)



Greetings,

At first I was going to file a bug report, but I decided to send this to
the list first.  I tried this with both 2.4.23 (Debian package) and
2.4.25 (compiled from source, bdb 4.8).

After a couple of entries just disappeared on one multi-master setup I
had, I decided to further investigate, and found this (there are two
cases, for the same procedure):

1. Configure two LDAP servers in multi-master setup.
2. Make sure they replicate correctly (of course).
3. Shut down one of the two LDAP servers.
4. Create a new entry (say, ou1) on the LDAP server that is left up.
5. Shut down the last LDAP server.
6. Start the *other* LDAP server, the one where you didn't create the entry.
7. Create another entry, say ou2, so that each server has a new
entry that is *not* on the other server.
8. Shut down the LDAP server (both servers are down now).
9. Start both LDAP servers.
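
For reference, the outcome I would expect from the steps above can be
written down as a tiny model. This is only an idealized sketch: the
Server class, sync() and run_procedure() are made-up names, and sync()
simply assumes that two running masters end up with the union of their
entries. It says nothing about how syncrepl works internally.

```python
class Server:
    """Hypothetical stand-in for one LDAP master."""

    def __init__(self, name):
        self.name = name
        self.up = False
        self.entries = set()


def sync(a, b):
    """Idealized replication: if both servers are up, both get the union."""
    if a.up and b.up:
        merged = a.entries | b.entries
        a.entries, b.entries = set(merged), set(merged)


def run_procedure():
    s1, s2 = Server("server1"), Server("server2")
    s1.up = s2.up = True      # steps 1-2: both up and replicating
    sync(s1, s2)
    s2.up = False             # step 3: shut down one server
    s1.entries.add("ou1")     # step 4: create ou1 on the server left up
    s1.up = False             # step 5: shut down the last server
    s2.up = True              # step 6: start the other server
    s2.entries.add("ou2")     # step 7: create ou2 there
    s2.up = False             # step 8: both servers down
    s1.up = s2.up = True      # step 9: start both servers
    sync(s1, s2)
    return s1.entries, s2.entries


if __name__ == "__main__":
    on_s1, on_s2 = run_procedure()
    print(sorted(on_s1), sorted(on_s2))
```

In this model both servers finish with ou1 and ou2, which is what the
real servers fail to do in the results below.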

Result (case 1): both of the newly created entries are missing on
*one* of the servers, and only one of the entries is missing on the
other server.

Result (case 2): one entry is missing on *both* servers.
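
A quick way to see which entries each server is missing is to dump
both trees to LDIF and compare the DNs. The sketch below assumes plain
LDIF with unfolded "dn: " lines (no base64 "dn:: " values); the function
names and the sample LDIF strings are made up for illustration and only
mirror the shape of case 1.

```python
def dns_from_ldif(ldif_text):
    """Collect the DNs found in plain (unfolded, non-base64) LDIF text."""
    dns = set()
    for line in ldif_text.splitlines():
        if line.lower().startswith("dn: "):
            # Lowercase so that differences in case do not show up as diffs.
            dns.add(line[4:].strip().lower())
    return dns


def missing_entries(ldif_a, ldif_b):
    """Return (missing_on_a, missing_on_b) as sorted lists of DNs."""
    a, b = dns_from_ldif(ldif_a), dns_from_ldif(ldif_b)
    return sorted(b - a), sorted(a - b)


if __name__ == "__main__":
    # Illustrative sample data, not real server output.
    server1 = "dn: dc=st-andes,dc=com\n\ndn: ou=ou1,dc=st-andes,dc=com\n"
    server2 = "dn: dc=st-andes,dc=com\n"
    print(missing_entries(server1, server2))
```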

Both servers run NTP and use the same timezone (i.e., time is synchronized).

I'm *not* replicating cn=config (I shouldn't, because I have different
SSL certificates on each server).  Now, more details:

Running slapd with -d 16384 gives me this on the server that is missing
both entries. On this server I created the entry
ou=ou2,dc=st-andes,dc=com (and the server decided to delete it! And,
for some reason, it didn't detect the new ou1 entry created on the
other server):

http://www.st-andes.com/openldap/case1/log-server2-case1.txt

On the other server (the one that kept one entry and lost the other)
I created the entry ou=ou1,dc=st-andes,dc=com, and the log says
it was changed by peer...:

http://www.st-andes.com/openldap/case1/log-server1-case1.txt

Now, I see in the logs that it is using server ID 000... but in
cn=config.ldif I have:

olcServerID: 1 ldap://ldap.ildetech.com:389/
olcServerID: 2 ldap://ldap2.ildetech.com:389/

And the syncrepl:

olcSyncRepl: rid=001 provider=ldap://ldap.ildetech.com:389
binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple
credentials="secret" searchbase="dc=st-andes,dc=com"
type=refreshAndPersist retry="3 5 5 +" timeout=7 starttls=critical
olcSyncRepl: rid=002 provider=ldap://ldap2.ildetech.com:389
binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple
credentials="secret" searchbase="dc=st-andes,dc=com"
type=refreshAndPersist retry="3 5 5 +" timeout=7 starttls=critical
olcMirrorMode: TRUE

And, as you can see on the command line, I have the URL specified in
the -h parameter, but slapd seems to be ignoring it! Or should I
specify *all* of the URLs that I put in the -h parameter?
(ldap://ldap2.ildetech.com:389 ldap://127.0.0.1:389/ ldaps:///
ldapi:///)
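
One thing that may be worth checking is whether the olcServerID URLs
match a -h listener URL exactly. The sketch below is NOT slapd's code:
pick_server_id() is a made-up function, and it assumes that the match is
a plain case-insensitive string comparison and that ID 0 is used when
nothing matches. Under that assumption, the trailing slash in
"ldap://ldap2.ildetech.com:389/" would be enough to miss the listener
"ldap://ldap2.ildetech.com:389".

```python
def pick_server_id(server_ids, listener_urls):
    """Return the ID whose URL equals a listener URL, else 0.

    Assumption: matching is a case-insensitive string comparison and
    0 is the fallback ID when no URL matches.
    """
    listeners = {url.lower() for url in listener_urls}
    for sid, url in server_ids:
        if url.lower() in listeners:
            return sid
    return 0


if __name__ == "__main__":
    # Values copied from the olcServerID lines and the -h list above.
    server_ids = [
        (1, "ldap://ldap.ildetech.com:389/"),
        (2, "ldap://ldap2.ildetech.com:389/"),
    ]
    listeners = ("ldap://ldap2.ildetech.com:389 ldap://127.0.0.1:389/ "
                 "ldaps:/// ldapi:///").split()
    print(pick_server_id(server_ids, listeners))
    print(pick_server_id(server_ids, ["ldap://ldap2.ildetech.com:389/"]))
```

With the -h list as given, the first call returns 0; with a listener
URL that includes the trailing slash, the second call returns 2.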

So, I decided to change the config:

On server 1 (kirara):

olcServerID: 1

and

olcSyncRepl: rid=002 provider=ldap://ldap2.ildetech.com:389
binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple
credentials="secret" searchbase="dc=st-andes,dc=com"
type=refreshAndPersist retry="3 5 5 +" timeout=7 starttls=critical
olcMirrorMode: TRUE

On server 2 (happy):

olcServerID: 2

and

olcSyncRepl: rid=002 provider=ldap://ldap2.ildetech.com:389
binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple
credentials="secret" searchbase="dc=st-andes,dc=com"
type=refreshAndPersist retry="3 5 5 +" timeout=7 starttls=critical
olcMirrorMode: TRUE

With this new setup, and following the same procedure, I get one
missing entry on *both* servers (at least the servers get to a
consistent state), but I still have a missing entry.  The logs for this setup:

Server 2 (ID 2, where I created entry ou2 while the other server was
down) decided, wrongly, to delete entry ou2:

http://www.st-andes.com/openldap/case2/log-server2-case2.txt

And the other server (where I created ou1):

http://www.st-andes.com/openldap/case2/log-server1-case2.txt

This one never saw the other entry, ou2.

For both cases, the syncprov overlay had its default configuration:

dn: olcOverlay={0}syncprov
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {0}syncprov
structuralObjectClass: olcSyncProvConfig
entryUUID: 24354488-e5bf-102f-9e6a-ad3cba95f7f1
creatorsName: cn=config
createTimestamp: 20110318152128Z
entryCSN: 20110318152128.935227Z#000000#000#000000
modifiersName: cn=config
modifyTimestamp: 20110318152128Z
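
The entryCSN above carries 000 in its third field, which is where the
server ID shows up in a CSN. To check the same field on the data
entries, a small parser along these lines could be used. It is a
sketch: parse_csn() and the CSN tuple are made-up names, and it assumes
the count, server ID and mod fields are hexadecimal digit strings.

```python
from collections import namedtuple

CSN = namedtuple("CSN", "timestamp count sid mod")


def parse_csn(value):
    """Split an entryCSN value into its four '#'-separated fields."""
    parts = value.strip().split("#")
    if len(parts) != 4:
        raise ValueError("unexpected CSN format: %r" % value)
    timestamp, count, sid, mod = parts
    # Assumption: count, sid and mod are hexadecimal digit strings.
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


if __name__ == "__main__":
    # Value taken from the syncprov entry above.
    csn = parse_csn("20110318152128.935227Z#000000#000#000000")
    print(csn.sid)
```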

What do you think?

Thanks in advance!

Ildefonso Camargo