(ITS#5988) entries skipped in n-way multimaster replication



Full_Name: Adrien Futschik
Version: 2.4.15
OS: Linux RHEL 4-5 & Solaris 10
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (192.54.193.59)


Hello,

As suggested by Howard Chu, I am filing an ITS for a problem I encountered when
testing N-way multimaster with OpenLDAP 2.4.15.

Here is the situation:

I have been testing N-way multimaster replication with OpenLDAP for a while
now (from 2.4.11 to 2.4.15), and just when I thought that everything was
working perfectly, I decided to test N-way multimaster not only with 2 masters
on different servers, but with 4! (All 4 servers are time-synced using NTP.)

2 OpenLDAP instances per server.

I have configured syncprov and syncrepl accordingly:
olcServerID: 1 ldap://163.106.38.90:9011/
olcServerID: 2 ldap://163.106.38.92:9012/
olcServerID: 3 ldap://163.106.38.90:9013/
olcServerID: 4 ldap://163.106.38.92:9014/

olcSyncrepl: {0}rid=011 provider=ldap://163.106.38.90:9011/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {1}rid=012 provider=ldap://163.106.38.92:9012/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {2}rid=013 provider=ldap://163.106.38.90:9013/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {3}rid=014 provider=ldap://163.106.38.92:9014/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
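
The four olcSyncrepl values above differ only in their index, rid and provider URL, so they can be generated from the server list. The following is a minimal sketch in Python; the helper names (server_id_lines, syncrepl_lines) are hypothetical and not part of OpenLDAP, and the output is not wrapped into LDIF continuation lines as it is in the listing above.

```python
# Hypothetical helpers: rebuild the olcServerID / olcSyncrepl values of this
# report from a list of (server ID, provider URL) pairs.
MASTERS = [
    (1, "ldap://163.106.38.90:9011/"),
    (2, "ldap://163.106.38.92:9012/"),
    (3, "ldap://163.106.38.90:9013/"),
    (4, "ldap://163.106.38.92:9014/"),
]


def server_id_lines(masters):
    """One olcServerID value per master, identical on every instance."""
    return [f"olcServerID: {sid} {url}" for sid, url in masters]


def syncrepl_lines(masters, base="c=fr", binddn="cn=admin,c=fr",
                   credentials="secret"):
    """One olcSyncrepl value per master; rid is "01" + server ID as above."""
    lines = []
    for index, (sid, url) in enumerate(masters):
        lines.append(
            f'olcSyncrepl: {{{index}}}rid=01{sid} provider={url} '
            f'binddn="{binddn}" bindmethod=simple credentials={credentials} '
            f'searchbase="{base}" type=refreshAndPersist '
            f'retry="5 5 300 12 3600 +" timeout=3'
        )
    return lines


if __name__ == "__main__":
    for line in server_id_lines(MASTERS) + syncrepl_lines(MASTERS):
        print(line)
```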

I start with all instances in sync and then add entries on all four
instances in parallel. When I do so, a few entries are not replicated to
the other instances, and I get messages like these:

do_syncrep2: cookie=rid=011,sid=002,csn=20090227130003.849482Z#000000#004#000000
do_syncrep2: rid=011 CSN too old, ignoring 20090227130003.849482Z#000000#004#000000
do_syncrep2: cookie=rid=013,sid=002,csn=20090227130003.849482Z#000000#004#000000
do_syncrep2: rid=013 CSN too old, ignoring 20090227130003.849482Z#000000#004#000000
do_syncrep2: cookie=rid=014,sid=002,csn=20090227130003.946474Z#000000#004#000000
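
For reading these messages: a CSN consists of four '#'-separated fields, namely a timestamp with microseconds, a change counter, the server ID (SID) of the master where the change originated, and a modification number. The CSNs above carry SID 004, which matches olcServerID 4. The following Python sketch splits a CSN into those fields; the names CSN and parse_csn are hypothetical and are not taken from slapd.

```python
from datetime import datetime
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: datetime  # time the change was made (UTC)
    count: int           # change counter
    sid: int             # server ID of the originating master
    mod: int             # modification number


def parse_csn(text):
    """Split a CSN string into its four '#'-separated fields."""
    stamp, count, sid, mod = text.split("#")
    return CSN(
        datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ"),
        int(count, 16),  # the last three fields are hexadecimal
        int(sid, 16),
        int(mod, 16),
    )


if __name__ == "__main__":
    print(parse_csn("20090227130003.849482Z#000000#004#000000"))
```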

Has anyone faced the same issue?

Here is my configuration (I am using refreshAndPersist mode for both cn=config
and olcDatabase={1}bdb):

M1 on IP1 / PORT1 :
dn: cn=config
objectClass: olcGlobal
cn: config
structuralObjectClass: olcGlobal
creatorsName: cn=config
olcServerID: 1 ldap://163.106.38.90:9011/
olcServerID: 2 ldap://163.106.38.92:9012/
olcServerID: 3 ldap://163.106.38.90:9013/
olcServerID: 4 ldap://163.106.38.92:9014/
entryUUID: ef89c876-adb3-4dc7-aa7d-024bbc359c98
createTimestamp: 20090227085748Z
entryCSN: 20090227085749.920499Z#000000#004#000000
modifiersName: cn=config
modifyTimestamp: 20090227085749Z
contextCSN: 20090227085752.833630Z#000000#001#000000

dn: olcDatabase={1}bdb
objectClass: olcDatabaseConfig
objectClass: olcBdbConfig
olcDatabase: {1}bdb
olcDbDirectory: ./openldap-data
olcSuffix: c=fr
olcRootDN: cn=admin,c=fr
olcRootPW:: e1NTSEF9WVZNSHJtYTRvUGd4KzFoak9kYWhBcm5NVHJxU1Zmdno=
olcSizeLimit: 100
olcSyncrepl: {0}rid=011 provider=ldap://163.106.38.90:9011/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {1}rid=012 provider=ldap://163.106.38.92:9012/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {2}rid=013 provider=ldap://163.106.38.90:9013/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {3}rid=014 provider=ldap://163.106.38.92:9014/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcTimeLimit: 600
olcMirrorMode: TRUE
olcDbCacheSize: 2000
olcDbCheckpoint: 2000 10
olcDbIndex: default pres,eq
olcDbIndex: cn,sn pres,eq,sub
olcDbIndex: objectClass,entryCSN,entryUUID eq
structuralObjectClass: olcBdbConfig
entryUUID: 00c01e5d-69ee-4baa-8e5a-4ef609dfd958
creatorsName: cn=config
createTimestamp: 20090227085752Z
entryCSN: 20090227085752.729899Z#000000#001#000000
modifiersName: cn=config
modifyTimestamp: 20090227085752Z

M2 on IP2 / PORT2 :
dn: cn=config
objectClass: olcGlobal
cn: config
structuralObjectClass: olcGlobal
entryUUID: 8da75037-65e6-4375-8c21-7e5c0194a60b
creatorsName: cn=config
createTimestamp: 20090227085723Z
olcServerID: 1 ldap://163.106.38.90:9011/
olcServerID: 2 ldap://163.106.38.92:9012/
olcServerID: 3 ldap://163.106.38.90:9013/
olcServerID: 4 ldap://163.106.38.92:9014/
entryCSN: 20090227085725.003182Z#000000#002#000000
modifiersName: cn=config
modifyTimestamp: 20090227085725Z
contextCSN: 20090227085752.833630Z#000000#001#000000

dn: olcDatabase={1}bdb
objectClass: olcDatabaseConfig
objectClass: olcBdbConfig
olcDatabase: {1}bdb
olcDbDirectory: ./openldap-data
olcSuffix: c=fr
olcRootDN: cn=admin,c=fr
olcRootPW:: e1NTSEF9WVZNSHJtYTRvUGd4KzFoak9kYWhBcm5NVHJxU1Zmdno=
olcSizeLimit: 100
olcSyncrepl: {0}rid=011 provider=ldap://163.106.38.90:9011/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {1}rid=012 provider=ldap://163.106.38.92:9012/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {2}rid=013 provider=ldap://163.106.38.90:9013/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {3}rid=014 provider=ldap://163.106.38.92:9014/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcTimeLimit: 600
olcMirrorMode: TRUE
olcDbCacheSize: 2000
olcDbCheckpoint: 2000 10
olcDbIndex: default pres,eq
olcDbIndex: cn,sn pres,eq,sub
olcDbIndex: objectClass,entryCSN,entryUUID eq
structuralObjectClass: olcBdbConfig
entryUUID: 00c01e5d-69ee-4baa-8e5a-4ef609dfd958
creatorsName: cn=config
createTimestamp: 20090227085752Z
entryCSN: 20090227085752.729899Z#000000#001#000000
modifiersName: cn=config
modifyTimestamp: 20090227085752Z

M3 on IP1 / PORT3 :
dn: cn=config
objectClass: olcGlobal
cn: config
structuralObjectClass: olcGlobal
entryUUID: cf068647-318f-4848-9c72-9c7745a8a4b3
creatorsName: cn=config
createTimestamp: 20090227085742Z
olcServerID: 1 ldap://163.106.38.90:9011/
olcServerID: 2 ldap://163.106.38.92:9012/
olcServerID: 3 ldap://163.106.38.90:9013/
olcServerID: 4 ldap://163.106.38.92:9014/
entryCSN: 20090227085743.825685Z#000000#003#000000
modifiersName: cn=config
modifyTimestamp: 20090227085743Z
contextCSN: 20090227085752.833630Z#000000#001#000000

dn: olcDatabase={1}bdb
objectClass: olcDatabaseConfig
objectClass: olcBdbConfig
olcDatabase: {1}bdb
olcDbDirectory: ./openldap-data
olcSuffix: c=fr
olcRootDN: cn=admin,c=fr
olcRootPW:: e1NTSEF9WVZNSHJtYTRvUGd4KzFoak9kYWhBcm5NVHJxU1Zmdno=
olcSizeLimit: 100
olcSyncrepl: {0}rid=011 provider=ldap://163.106.38.90:9011/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {1}rid=012 provider=ldap://163.106.38.92:9012/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {2}rid=013 provider=ldap://163.106.38.90:9013/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {3}rid=014 provider=ldap://163.106.38.92:9014/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcTimeLimit: 600
olcMirrorMode: TRUE
olcDbCacheSize: 2000
olcDbCheckpoint: 2000 10
olcDbIndex: default pres,eq
olcDbIndex: cn,sn pres,eq,sub
olcDbIndex: objectClass,entryCSN,entryUUID eq
structuralObjectClass: olcBdbConfig
entryUUID: 00c01e5d-69ee-4baa-8e5a-4ef609dfd958
creatorsName: cn=config
createTimestamp: 20090227085752Z
entryCSN: 20090227085752.729899Z#000000#001#000000
modifiersName: cn=config
modifyTimestamp: 20090227085752Z

M4 on IP2 / PORT4 :
dn: cn=config
objectClass: olcGlobal
cn: config
structuralObjectClass: olcGlobal
entryUUID: ef89c876-adb3-4dc7-aa7d-024bbc359c98
creatorsName: cn=config
createTimestamp: 20090227085748Z
olcServerID: 1 ldap://163.106.38.90:9011/
olcServerID: 2 ldap://163.106.38.92:9012/
olcServerID: 3 ldap://163.106.38.90:9013/
olcServerID: 4 ldap://163.106.38.92:9014/
entryCSN: 20090227085749.920499Z#000000#004#000000
modifiersName: cn=config
modifyTimestamp: 20090227085749Z
contextCSN: 20090227085752.833630Z#000000#001#000000

dn: olcDatabase={1}bdb
objectClass: olcDatabaseConfig
objectClass: olcBdbConfig
olcDatabase: {1}bdb
olcDbDirectory: ./openldap-data
olcSuffix: c=fr
olcRootDN: cn=admin,c=fr
olcRootPW:: e1NTSEF9WVZNSHJtYTRvUGd4KzFoak9kYWhBcm5NVHJxU1Zmdno=
olcSizeLimit: 100
olcSyncrepl: {0}rid=011 provider=ldap://163.106.38.90:9011/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {1}rid=012 provider=ldap://163.106.38.92:9012/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {2}rid=013 provider=ldap://163.106.38.90:9013/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {3}rid=014 provider=ldap://163.106.38.92:9014/ binddn="cn=admin,c
 =fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPe
 rsist retry="5 5 300 12 3600 +" timeout=3
olcTimeLimit: 600
olcMirrorMode: TRUE
olcDbCacheSize: 2000
olcDbCheckpoint: 2000 10
olcDbIndex: default pres,eq
olcDbIndex: cn,sn pres,eq,sub
olcDbIndex: objectClass,entryCSN,entryUUID eq
structuralObjectClass: olcBdbConfig
entryUUID: 00c01e5d-69ee-4baa-8e5a-4ef609dfd958
creatorsName: cn=config
createTimestamp: 20090227085752Z
entryCSN: 20090227085752.729899Z#000000#001#000000
modifiersName: cn=config
modifyTimestamp: 20090227085752Z

Considering that M1 & M3 are on the same server and therefore have exactly the
same time, if this were a time-related problem, I shouldn't get any "CSN too
old" messages between M1 and M3, or between M2 and M4, should I?

I have also noticed that when M1 gets a new entry and passes it to M2, M3 and
M4, each of them, on receiving it, passes it on to M2, M3 and M4 as well! I
don't understand why this happens, but it looks very much like this is what
is happening, because sometimes M2 has passed the entry to M4 before M4 has
actually received the add from M1.

As a result, I have noticed that entries sent from M1 are sometimes
received in the wrong order by the other masters, and therefore some entries
may be skipped!

Here is an example:
I add cn=M1client1 and cn=M1client2 on M1.

M1client1 and M1client2 are successfully replicated to M2 and M4, but on M3
only M1client2 is inserted, and I get a "CSN too old" message for M1client1
on M3.
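
The effect described in this example can be reproduced with a toy model. The Python sketch below assumes a consumer that accepts a change only if its CSN is newer than the newest CSN it has already accepted from the same originating SID. This is a simplification for illustration, not slapd's actual logic, and the two CSN values are made up.

```python
def apply_in_order(arrivals):
    """Toy consumer: accept a change only if its CSN is newer than the newest
    CSN already accepted from the same originating server ID (SID)."""
    newest = {}  # originating SID -> newest CSN string accepted so far
    added, skipped = [], []
    for name, csn in arrivals:
        sid = csn.split("#")[2]
        # CSN strings are fixed-width, so string comparison orders them.
        if sid in newest and csn <= newest[sid]:
            skipped.append(name)  # the "CSN too old, ignoring" case
        else:
            newest[sid] = csn
            added.append(name)
    return added, skipped


# Hypothetical CSNs for two adds made on M1 (SID 001), one microsecond apart.
CSN1 = "20090227130003.000001Z#000000#001#000000"
CSN2 = "20090227130003.000002Z#000000#001#000000"

# Arrival in the original order: both entries are added.
in_order = apply_in_order([("M1client1", CSN1), ("M1client2", CSN2)])

# M1client2 arrives first (for example relayed by another master):
# M1client1 is then treated as too old and skipped.
out_of_order = apply_in_order([("M1client2", CSN2), ("M1client1", CSN1)])

if __name__ == "__main__":
    print(in_order)
    print(out_of_order)
```

In this model the skipped entry is never retried, which corresponds to the missing M1client1 on M3.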

I guess that M2 or M4 is not managing its queue in the right order. I don't
exactly understand why M2, M3 and M4 should propagate an entry sent by M1,
because they will eventually receive the entry sent by M1.

Adrien Futschik