Issue 5843 - Multi-master doesn't replicate deletes under certain circumstances.
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: 2.4.13
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
 
Reported: 2008-12-03 00:19 UTC by ildefonso_camargo@yahoo.com
Modified: 2014-08-01 21:04 UTC

Description ildefonso_camargo@yahoo.com 2008-12-03 00:19:27 UTC
Full_Name: Jose Ildefonso Camargo Tolosa
Version: 2.4.13
OS: Linux
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (190.73.138.178)


Hi!

I just configured multi-master replication (with N=3) for testing
purposes, and I just found an annoying problem, under these
conditions:

1. Configure N "masters" and confirm that they replicate correctly
(this is important).
2. Stop the slapd service on all of the servers.
3. Start the slapd service on fewer than N servers (i.e., leave at
least one stopped).
4. Delete any entry. Only deletes are affected: if you combine
modifies, adds, and deletes, only the deletes fail to replicate.
5. Stop the slapd service on *all of the servers again* (this is the
most important part).
6. Start the slapd service on all of the servers.
7. Look for the deleted entry on the server(s) that were left stopped
during steps 3 and 4. The entry is still there, although it is gone
from the server(s) that were running during steps 3 and 4.

If you leave at least one server up, the delete replicates just fine.
The problem only appears when you stop all of the masters and then
start them again.
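
For reference, the delete in step 4 can be issued as an LDIF change
record passed to ldapmodify against one of the running masters. This is
only a sketch: the entry DN is hypothetical, and $BASEDN is a
placeholder for whatever suffix the masters serve.

```ldif
# Hypothetical test entry; substitute a DN that exists on the running masters.
dn: cn=Replication Test,$BASEDN
changetype: delete
```

After step 6, searching for the same DN on the server that was left
stopped shows whether the delete was replicated.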

If you need further info, just ask!

I hope this helps,

Ildefonso Camargo

Comment 1 adrien.futschik@atosorigin.com 2009-01-16 08:37:22 UTC
Hello,

I might be facing the same problem.
Here is what I did:

I'm testing N-Way Multi-Master replication with OpenLDAP 2.4.11 & 2.4.13.

I have set up 2 masters (m1 & m2), starting from
test050-syncrepl-multimaster and modifying it.

Everything seems to work fine except deleting entries.

Let me explain. 
Case 1:
   . When I add an entry on m1, it is successfully replicated to m2.
   . When I delete this entry on m1, it is successfully removed from m1,
     but the delete is not replicated to m2.
   . When I delete this entry on m2, it is successfully removed from m2
     & m1.

Case 2:
   . When I add an entry on m2, it is successfully replicated to m1.
   . When I delete this entry on m2, it is successfully removed from m2,
     but the delete is not replicated to m1.
   . When I delete this entry on m1, it is successfully removed from m1
     & m2.

I don't have the same problem when I delete an attribute or update an entry.
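
As a sketch of the add in case 1, the test entry could be created on m1
with a change record like the one below. The DN and attribute values are
hypothetical, $BASEDN is the same placeholder used in the configuration
in this comment, and the person object class assumes the core schema is
loaded.

```ldif
# Hypothetical test entry added on m1; it should then appear on m2.
dn: cn=Replication Test,$BASEDN
changetype: add
objectClass: person
cn: Replication Test
sn: Test
```

The delete that fails to propagate is then a record for the same DN with
"changetype: delete" in place of the add.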

Here is how I have set up my masters:

m1 - config :
dn: cn=config
objectClass: olcGlobal
cn: config
olcServerID: 1

dn: olcDatabase={0}config,cn=config
objectClass: olcDatabaseConfig
olcDatabase: {0}config
olcRootPW:< file://$CONFIGPWF

m2 - config :
dn: cn=config
objectClass: olcGlobal
cn: config
olcServerID: 2

dn: olcDatabase={0}config,cn=config
objectClass: olcDatabaseConfig
olcDatabase: {0}config
olcRootPW:< file://$CONFIGPWF

m1 - syncprov :
dn: cn=config
changetype: modify
replace: olcServerID
olcServerID: 1 $URI1
olcServerID: 2 $URI2

dn: olcOverlay=syncprov,olcDatabase={0}config,cn=config
changetype: add
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov

dn: olcDatabase={0}config,cn=config
changetype: modify
add: olcSyncRepl
olcSyncRepl: rid=001 provider=$URI1 binddn="cn=config" bindmethod=simple
 credentials=$CONFIGPW searchbase="cn=config" type=refreshAndPersist
 retry="5 5 300 5" timeout=3
olcSyncRepl: rid=002 provider=$URI2 binddn="cn=config" bindmethod=simple
 credentials=$CONFIGPW searchbase="cn=config" type=refreshAndPersist
 retry="5 5 300 5" timeout=3
-
add: olcMirrorMode
olcMirrorMode: TRUE

m2 - syncrepl :
dn: olcDatabase={0}config,cn=config
changetype: modify
add: olcSyncRepl
olcSyncRepl: rid=001 provider=$URI1 binddn="cn=config" bindmethod=simple
 credentials=$CONFIGPW searchbase="cn=config" type=refreshAndPersist
 retry="5 5 300 5" timeout=3
olcSyncRepl: rid=002 provider=$URI2 binddn="cn=config" bindmethod=simple
 credentials=$CONFIGPW searchbase="cn=config" type=refreshAndPersist
 retry="5 5 300 5" timeout=3
-
add: olcMirrorMode
olcMirrorMode: TRUE

m1 - schema :
include: file://$ABS_SCHEMADIR/core.ldif
include: file://$ABS_SCHEMADIR/cosine.ldif
include: file://$ABS_SCHEMADIR/inetorgperson.ldif
include: file://$ABS_SCHEMADIR/openldap.ldif
include: file://$ABS_SCHEMADIR/nis.ldif

m1 - backend :
dn: olcDatabase={1}$BACKEND,cn=config
objectClass: olcDatabaseConfig
objectClass: olc${BACKEND}Config
olcDatabase: {1}$BACKEND
olcSuffix: $BASEDN
olcDbDirectory: ./openldap-data
olcRootDN: $MANAGERDN
olcRootPW: $PASSWD
olcSyncRepl: rid=004 provider=$URI1 binddn="$MANAGERDN" bindmethod=simple
 credentials=$PASSWD searchbase="$BASEDN" type=refreshOnly
 interval=$INTERVAL retry="5 5 300 5" timeout=3
olcSyncRepl: rid=005 provider=$URI2 binddn="$MANAGERDN" bindmethod=simple
 credentials=$PASSWD searchbase="$BASEDN" type=refreshOnly
 interval=$INTERVAL retry="5 5 300 5" timeout=3
olcMirrorMode: TRUE

dn: olcOverlay=syncprov,olcDatabase={1}${BACKEND},cn=config
changetype: add
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov

Did I miss something?

Adrien Futschik

Comment 2 adrien.futschik@atosorigin.com 2009-01-16 14:13:28 UTC
Thanks a lot!

This seems to be working fine. Is this documented anywhere?
I never saw that option before.

Adrien Futschik

========================================

Hello,

add olcSpSessionlog to syncprov. I chose 1000 as the value.

Best regards
Andreas
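
As an illustration of the suggestion above, the session log can be set
on the syncprov overlay entry through the olcSpSessionlog attribute. The
sketch below reuses the overlay entry from the configuration in Comment 1
and adds the value 1000 mentioned above. Where exactly the setting was
applied is an assumption; this is not a copy of the configuration that
was actually used.

```ldif
dn: olcOverlay=syncprov,olcDatabase={1}${BACKEND},cn=config
changetype: add
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov
# Session log sized to record up to 1000 write operations.
olcSpSessionlog: 1000
```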

Comment 3 adrien.futschik@atosorigin.com 2009-01-22 08:03:53 UTC
I think I found the problem.

I have to confirm this, but the "olcSpSessionlog" parameter doesn't seem to be necessary.
I made a mistake in my configuration: I used "refreshOnly" mode for the backend, and with "refreshOnly" the consumer isn't aware when a provider deletes an entry. At least, that is how I understood it.

Here is an extract from the documentation about syncrepl:
"Also as a consequence of the search filter used in the syncrepl specification, it is possible for a modification to remove an entry from the replication scope even though the entry has not been deleted on the provider. Logically the entry must be deleted on the consumer but in refreshOnly mode the provider cannot detect and propagate this change without the use of the session log."

Therefore I changed "refreshOnly" to "refreshAndPersist" and removed the "olcSpSessionlog" parameter, and everything seems to be working fine. At least, deletes are correctly replicated to the second master.

Here is the final configuration I am using:

m1 - config :
dn: cn=config
objectClass: olcGlobal
cn: config
olcServerID: 1

dn: olcDatabase={0}config,cn=config
objectClass: olcDatabaseConfig
olcDatabase: {0}config
olcRootPW:< file://$CONFIGPWF

m2 - config :
dn: cn=config
objectClass: olcGlobal
cn: config
olcServerID: 2

dn: olcDatabase={0}config,cn=config
objectClass: olcDatabaseConfig
olcDatabase: {0}config
olcRootPW:< file://$CONFIGPWF

m1 - syncprov :
dn: cn=config
changetype: modify
replace: olcServerID
olcServerID: 1 $URI1
olcServerID: 2 $URI2

dn: olcOverlay=syncprov,olcDatabase={0}config,cn=config
changetype: add
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov

dn: olcDatabase={0}config,cn=config
changetype: modify
add: olcSyncRepl
olcSyncRepl: rid=001 provider=$URI1 binddn="cn=config" bindmethod=simple
 credentials=$CONFIGPW searchbase="cn=config" type=refreshAndPersist
 retry="5 5 300 5" timeout=3
olcSyncRepl: rid=002 provider=$URI2 binddn="cn=config" bindmethod=simple
 credentials=$CONFIGPW searchbase="cn=config" type=refreshAndPersist
 retry="5 5 300 5" timeout=3
-
add: olcMirrorMode
olcMirrorMode: TRUE

m2 - syncrepl :
dn: olcDatabase={0}config,cn=config
changetype: modify
add: olcSyncRepl
olcSyncRepl: rid=001 provider=$URI1 binddn="cn=config" bindmethod=simple
 credentials=$CONFIGPW searchbase="cn=config" type=refreshAndPersist
 retry="5 5 300 5" timeout=3
olcSyncRepl: rid=002 provider=$URI2 binddn="cn=config" bindmethod=simple
 credentials=$CONFIGPW searchbase="cn=config" type=refreshAndPersist
 retry="5 5 300 5" timeout=3
-
add: olcMirrorMode
olcMirrorMode: TRUE

m1 - schema :
include: file://$ABS_SCHEMADIR/core.ldif
include: file://$ABS_SCHEMADIR/cosine.ldif
include: file://$ABS_SCHEMADIR/inetorgperson.ldif
include: file://$ABS_SCHEMADIR/openldap.ldif
include: file://$ABS_SCHEMADIR/nis.ldif

m1 - backend :
dn: olcDatabase={1}$BACKEND,cn=config
objectClass: olcDatabaseConfig
objectClass: olc${BACKEND}Config
olcDatabase: {1}$BACKEND
olcSuffix: $BASEDN
olcDbDirectory: ./openldap-data
olcRootDN: $MANAGERDN
olcRootPW: $PASSWD
olcSyncRepl: rid=004 provider=$URI1 binddn="$MANAGERDN" bindmethod=simple
 credentials=$PASSWD searchbase="$BASEDN" type=refreshAndPersist
 interval=$INTERVAL retry="5 5 300 5" timeout=3
olcSyncRepl: rid=005 provider=$URI2 binddn="$MANAGERDN" bindmethod=simple
 credentials=$PASSWD searchbase="$BASEDN" type=refreshAndPersist
 interval=$INTERVAL retry="5 5 300 5" timeout=3
olcMirrorMode: TRUE

dn: olcOverlay=syncprov,olcDatabase={1}${BACKEND},cn=config
changetype: add
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: syncprov

So I guess this was not a bug, but a mistake in the documentation :
http://www.openldap.org/doc/admin24/replication.html#N-Way%20Multi-Master

Can anyone confirm ?

Adrien Futschik

Comment 4 Gavin Henry 2009-01-22 13:56:24 UTC
> So I guess this was not a bug, but a mistake in the documentation :
> http://www.openldap.org/doc/admin24/replication.html#N-Way%20Multi-Master
> 
> Can anyone confirm ?

Yes, you are correct, apologies. Ando fixed this in test050:


-----------
1.14 Sun Nov 16 22:06:30 2008 UTC; 2 months ago by ando
Changed since 1.13: +28 -10 lines
Diffs to 1.13 (colored diff)

add indexes when supported; syncrepl on configuration should always be refreshAndPersist
-----------

and I missed it when updating the docs. The update to the guide will be in the next release, 2.4.14.

I'll close this ITS now.

Thanks!

-- 
Kind Regards,

Gavin Henry.
OpenLDAP Engineering Team.

E ghenry@OpenLDAP.org

Community developed LDAP software.

http://www.openldap.org/project/

Comment 5 Gavin Henry 2009-01-22 13:57:00 UTC
moved from Incoming to Documentation
Comment 6 Gavin Henry 2009-01-22 13:57:28 UTC
changed notes
changed state Open to Test
Comment 7 Gavin Henry 2009-01-22 15:26:20 UTC
changed notes
changed state Test to Open
moved from Documentation to Software Bugs
Comment 8 Howard Chu 2009-01-22 15:42:14 UTC
adrien.futschik@atosorigin.com wrote:
> I Think I found the problem.
>
> I have to confirm this, but the "olcSpSessionlog" parameter doesn't seem
> to be necessary.
> I made a mistake in my configuration. I used "refreshOnly" mode for the
> backend and with "refreshOnly", consumer isn't aware when a provider deletes
> an entry. So I understood at least.

No. Replication is supposed to work regardless of mode, refreshOnly / 
refreshAndPersist should yield identical results.

> Here is a extract from the documentation about syncrepl : "Also as a
> consequence of the search filter used in the syncrepl specification, it is
> possible for a modification to remove an entry from the replication scope
> even though the entry has not been deleted on the provider. Logically the
> entry must be deleted on the consumer but in refreshOnly mode the provider
> cannot detect and propagate this change without the use of the session log."

The above text is correct, but it is talking about partial replication where 
the syncrepl search filter causes only a subset of the entries to be 
replicated. That does not apply to your case because you haven't set any 
special filter in your syncrepl config.
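
To illustrate the distinction, a partial-replication consumer would carry
an explicit filter in its syncrepl specification. The fragment below is a
hypothetical variation of rid=004 from the configuration earlier in this
report. The filter value is an assumption chosen only for illustration;
no such filter appears in the reported setup.

```ldif
olcSyncRepl: rid=004 provider=$URI1 binddn="$MANAGERDN" bindmethod=simple
 credentials=$PASSWD searchbase="$BASEDN" filter="(departmentNumber=42)"
 type=refreshOnly interval=$INTERVAL retry="5 5 300 5" timeout=3
```

With a filter like this, changing an entry's departmentNumber on the
provider takes the entry out of the replication scope without deleting
it, which is the situation the quoted documentation describes.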

> So I guess this was not a bug, but a mistake in the documentation :
> http://www.openldap.org/doc/admin24/replication.html#N-Way%20Multi-Master
>
> Can anyone confirm ?

There is still a bug here.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 9 adrien.futschik@atosorigin.com 2009-01-23 07:23:37 UTC
 
> No. Replication is supposed to work regardless of mode, refreshOnly / 
> refreshAndPersist should yield identical results.
...
 
> There is still a bug here.
 
Do you mean that N-way multi-master failing to replicate deletes properly when using "refreshOnly" is a bug?

N-way multi-master seems to be working correctly when using "refreshAndPersist", but I can see some reasons for using "refreshOnly" instead. Will this be corrected in the next release of OpenLDAP?

What is recommended with OpenLDAP 2.4.11? Should we wait for the next stable release if we intend to use N-way multi-master in production?

Best regards

Adrien Futschik

Comment 10 Howard Chu 2009-01-24 02:25:00 UTC
ildefonso_camargo@yahoo.com wrote:
> Full_Name: Jose Ildefonso Camargo Tolosa
> Version: 2.4.13
> OS: Linux
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (190.73.138.178)
>
>
> Hi!
>
> I just configured multi-master replication (with N=3) for testing
> purposes, and I just found an annoying problem, under these
> conditions:
>
> 1. Configure a N number of "masters", and have them replicate happily
> (this is important).
> 2. Stop slapd service on all of the servers.
> 3. Start slapd service on any number of servers < N (ie, leave at
> least one stopped).
> 4. Delete any entry (yes, it only fails with deletes, you can combine
> changes: modify, add, and delete, and only deletes will fail to
> replicate).
> 5. Stop slapd service on *all of the servers again* (this is the most
> important part).
> 6. Start slapd service on all the servers.
> 7. Look for the deleted entry on the server(s) that you left stopped in steps 3
> and 4, the entry is there!, but it isn't on the server(s) that were running in
> step 3 and 4.
>
> You will see that, on the servers that you left down on step 3, the
> deleted entries are still present.
>
> If you leave at least one server up, it replicates just fine, the
> problem is when you stop all of the masters, and then start them again.
>
> If you need further info, just ask!

I repeated the steps you described but was unable to reproduce the problem. 
However, following the steps that Adrien provided in a followup, I was able to 
reproduce the situation and identify the problem. A fix is now in CVS HEAD, 
please test.

Comment 11 Howard Chu 2009-01-24 02:27:34 UTC
changed notes
changed state Open to Test
Comment 12 Quanah Gibson-Mount 2009-01-26 20:55:01 UTC
changed notes
changed state Test to Release
Comment 13 Gavin Henry 2009-02-05 16:31:32 UTC
adrien.futschik@atosorigin.com wrote:
>  
>> No. Replication is supposed to work regardless of mode, refreshOnly / 
>> refreshAndPersist should yield identical results.
> ...
>  
>> There is still a bug here.
>  
> You mean that the fact that N-way Multi-master doesn't replicate deletes properly when using "refreshOnly" is a bug ?
> 
> But N-way Multi-master seams to be working correctly when using "refreshAndPersist". I see some reasons why using "refreshOnly" instead of "refreshAndPersist", will this be corrected in the next release of OpenLDAP ?
> 
> What is recommended with OpenLDAP 2.4.11. Should we wait for the next stable release if we intend to use N-way Multi-master in production ?

I would wait for 2.4.14.

-- 
Kind Regards,

Gavin Henry.
Managing Director.

Comment 14 Quanah Gibson-Mount 2009-02-15 02:04:18 UTC
changed notes
changed state Release to Closed
Comment 15 OpenLDAP project 2014-08-01 21:04:19 UTC
fixed in HEAD
fixed in RE24