
Re: new entry lost on multi-master setup (two scenarios)



Ok, then... either I'm missing something obvious, or no one has any
idea about this...  Should I create a bug report based on my findings
here?

Thanks!

Ildefonso Camargo

On Tue, Apr 19, 2011 at 2:12 PM, Jose Ildefonso Camargo Tolosa
<ildefonso.camargo@gmail.com> wrote:
> Greetings,
>
> Any comments on this? Can anybody help me verify this possible bug?
>
> Ildefonso.
>
> On Sun, Apr 17, 2011 at 2:24 PM, Jose Ildefonso Camargo Tolosa
> <ildefonso.camargo@gmail.com> wrote:
>> Greetings,
>>
>> At first, I was going to create a bug report, but decided to send it
>> to the list first.  I tried this with both 2.4.23 (Debian package) and
>> 2.4.25 (compiled from source, bdb 4.8).
>>
>> After a couple of entries just disappeared on one multi-master setup I
>> had, I decided to further investigate, and found this (there are two
>> cases, for the same procedure):
>>
>> 1. Configure two LDAP servers in multi-master setup.
>> 2. Make sure they replicate correctly (of course).
>> 3. Shut down one of the two LDAP servers.
>> 4. Create a new entry (say, ou1) on the LDAP server that is left up.
>> 5. Shut down the last LDAP server.
>> 6. Start the *other* LDAP server, the one where you didn't create the entry.
>> 7. Create another entry, say ou2, so that each server has a new
>> entry that is *not* on the other server.
>> 8. Shut down the LDAP server (both servers down now).
>> 9. Start both LDAP servers.
>>
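
For anyone trying to reproduce steps 4 and 7, here is a minimal sketch
that generates LDIF for the two test entries. It assumes ou1 and ou2 are
plain organizationalUnit entries directly under dc=st-andes,dc=com; the
objectClass is an assumption, since the message only gives the DNs. The
helper name ou_ldif is made up for illustration.

```python
def ou_ldif(ou: str, suffix: str = "dc=st-andes,dc=com") -> str:
    """Return LDIF text for a minimal organizationalUnit entry.

    Assumption: the test entries are bare organizationalUnit objects;
    the original report does not state their objectClass.
    """
    lines = [
        f"dn: ou={ou},{suffix}",
        "objectClass: organizationalUnit",
        f"ou: {ou}",
        "",  # a blank line terminates the LDIF record
    ]
    return "\n".join(lines)


# Entry for step 4 (first server) and for step 7 (second server).
LDIF_OU1 = ou_ldif("ou1")
LDIF_OU2 = ou_ldif("ou2")
```

The resulting text could be fed to ldapadd against whichever server is
up at that step.
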
>> Result (case 1): both newly created entries are missing on *one* of
>> the servers, and only one of the entries is missing on the other
>> server.
>>
>> Result (case 2): one entry is missing on *both* servers.
>>
>> Both servers run NTP and have the same timezone (i.e., time is synchronized).
>>
>> I'm *not* replicating cn=config (I shouldn't, because I have different
>> SSL certificates on each server).  Now, more details:
>>
>> slapd with -d 16384 gives me this on the server that is missing both
>> entries.  On this server I created the entry
>> ou=ou2,dc=st-andes,dc=com (and the server decided to delete it!  And,
>> for some reason, it didn't detect the new ou1 entry created on the
>> other server):
>>
>> http://www.st-andes.com/openldap/case1/log-server2-case1.txt
>>
>> The other server (the one that kept one entry and lost the other): on
>> this server I created the entry ou=ou1,dc=st-andes,dc=com, and it says
>> it was changed by peer...:
>>
>> http://www.st-andes.com/openldap/case1/log-server1-case1.txt
>>
>> Now, I see here that it is using server ID 000... but in
>> cn=config.ldif I have:
>>
>> olcServerID: 1 ldap://ldap.ildetech.com:389/
>> olcServerID: 2 ldap://ldap2.ildetech.com:389/
>>
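
To check which server ID a given change was stamped with, the CSN can
be split into its four '#'-separated fields (timestamp, change count,
server ID, modification number; the last three are hexadecimal). A
minimal sketch follows; the names CSN and parse_csn are made up for
illustration, and the sample value is the entryCSN quoted further down
in this message.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: str  # e.g. "20110318152128.935227Z"
    count: int      # change counter within the timestamp
    sid: int        # server ID that generated the CSN
    mod: int        # modification number


def parse_csn(value: str) -> CSN:
    """Split an OpenLDAP CSN string into its four fields."""
    parts = value.strip().split("#")
    if len(parts) != 4:
        raise ValueError(f"not a CSN: {value!r}")
    timestamp, count, sid, mod = parts
    # count, sid and mod are written as hexadecimal digits
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


csn = parse_csn("20110318152128.935227Z#000000#000#000000")
print(csn.sid)
```

A server ID of 0 in the CSNs of newly written entries would suggest
that the olcServerID setting is not being picked up.
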
>> And the syncrepl:
>>
>> olcSyncRepl: rid=001 provider=ldap://ldap.ildetech.com:389
>> binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple
>> credentials="secret" searchbase="dc=st-andes,dc=com"
>> type=refreshAndPersist retry="3 5 5 +" timeout=7 starttls=critical
>> olcSyncRepl: rid=002 provider=ldap://ldap2.ildetech.com:389
>> binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple
>> credentials="secret" searchbase="dc=st-andes,dc=com"
>> type=refreshAndPersist retry="3 5 5 +" timeout=7 starttls=critical
>> olcMirrorMode: TRUE
>>
>> And, as you can see on the command line, I have the URL specified in
>> the -h parameter, but it seems to be ignoring it!  Or should I
>> specify the *whole* list of URLs that I put in the -h parameter?
>> (ldap://ldap2.ildetech.com:389 ldap://127.0.0.1:389/ ldaps:///
>> ldapi:///)
>>
>> So, I decided to change the config:
>>
>> On server 1 (kirara):
>>
>> olcServerID: 1
>>
>> and
>>
>> olcSyncRepl: rid=002 provider=ldap://ldap2.ildetech.com:389
>> binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple
>> credentials="secret" searchbase="dc=st-andes,dc=com"
>> type=refreshAndPersist retry="3 5 5 +" timeout=7 starttls=critical
>> olcMirrorMode: TRUE
>>
>> On server 2 (happy):
>>
>> olcServerID: 2
>>
>> and
>>
>> olcSyncRepl: rid=002 provider=ldap://ldap2.ildetech.com:389
>> binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple
>> credentials="secret" searchbase="dc=st-andes,dc=com"
>> type=refreshAndPersist retry="3 5 5 +" timeout=7 starttls=critical
>> olcMirrorMode: TRUE
>>
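
One sanity check worth running on a setup like this is whether each
server's olcSyncRepl stanza names the peer as provider, rather than the
server itself. A minimal sketch, assuming server 2 is reachable as
ldap2.ildetech.com (per the olcServerID lines earlier); the function
names are made up for illustration, and the stanza is copied from the
server 2 config above.

```python
import shlex
from urllib.parse import urlparse

# Stanza as quoted for server 2 (happy).
STANZA = (
    'rid=002 provider=ldap://ldap2.ildetech.com:389 '
    'binddn="cn=admin,dc=st-andes,dc=com" bindmethod=simple '
    'credentials="secret" searchbase="dc=st-andes,dc=com" '
    'type=refreshAndPersist retry="3 5 5 +" timeout=7 starttls=critical'
)


def parse_syncrepl(value: str) -> dict:
    """Turn an olcSyncRepl value into a {keyword: value} dict."""
    # shlex keeps quoted values such as retry="3 5 5 +" together
    return dict(token.split("=", 1) for token in shlex.split(value))


def points_at_self(value: str, own_host: str) -> bool:
    """True if the stanza's provider host equals this server's host."""
    provider = parse_syncrepl(value)["provider"]
    return urlparse(provider).hostname == own_host.lower()


print(points_at_self(STANZA, "ldap2.ildetech.com"))
```

As quoted, the stanza on server 2 names ldap2.ildetech.com as provider;
whether that reflects the real configuration or a copy-paste slip in
the message is not clear.
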
>> With this new setup, and following the same procedure, I get one
>> missing entry on *both* servers (at least the servers get to a
>> consistent state), but I still have a missing entry.  The logs for
>> this setup:
>>
>> Server 2 (ID 2, where I created entry: ou2 while the other server was
>> down), this server decided, wrongly, to delete entry ou2:
>>
>> http://www.st-andes.com/openldap/case2/log-server2-case2.txt
>>
>> And the other server (where I created ou1):
>>
>> http://www.st-andes.com/openldap/case2/log-server1-case2.txt
>>
>> This one never saw the other entry, ou2.
>>
>> For both cases, the syncprov overlay had its default configuration:
>>
>> dn: olcOverlay={0}syncprov
>> objectClass: olcOverlayConfig
>> objectClass: olcSyncProvConfig
>> olcOverlay: {0}syncprov
>> structuralObjectClass: olcSyncProvConfig
>> entryUUID: 24354488-e5bf-102f-9e6a-ad3cba95f7f1
>> creatorsName: cn=config
>> createTimestamp: 20110318152128Z
>> entryCSN: 20110318152128.935227Z#000000#000#000000
>> modifiersName: cn=config
>> modifyTimestamp: 20110318152128Z
>>
>> What do you think?
>>
>> Thanks in advance!
>>
>> Ildefonso Camargo
>>
>