Issue 6648 - Syncrepl cold refresh fails with mirrormode supplier
Summary: Syncrepl cold refresh fails with mirrormode supplier
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: 2.4.23
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
Reported: 2010-09-15 16:36 UTC by Andrew Findlay
Modified: 2020-03-19 22:09 UTC

Attachments
ajf-syncrepl-config-20100915a.tgz (16.14 KB, application/x-compressed)
2020-03-19 22:08 UTC, Quanah Gibson-Mount
Description Andrew Findlay 2010-09-15 16:36:17 UTC
Full_Name: Andrew Findlay
Version: 2.4.23
OS: OpenSuSE 11.1
URL: ftp://ftp.openldap.org/incoming/ajf-syncrepl-config-20100915a.tgz
Submission from: (NULL) (88.97.25.132)


When a read-only consumer server uses one peer of a mirrormode pair as its
supplier, there is a case where the initial refresh phase of the synchronisation
loses deletions.

This is very similar to the case where a single master server has its serverID
changed between taking a slapcat backup to LDIF and bringing up a consumer
server from that LDIF file. See this thread for some background:
http://www.openldap.org/lists/openldap-technical/201009/msg00193.html

The problem occurs when a deletion was originated on a server whose serverID
does not match the serverID of the supplier when a new consumer is brought up.
It is masked by the volatile syncprov-sessionlog if the deletion is still in the
provider's log, which is why the instructions below stop and start the servers
so much.
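
The serverID mismatch described above can be seen in the CSN values themselves,
since each CSN records the serverID (SID) of the server that originated the
change. A minimal Python sketch, assuming the OpenLDAP 2.4 CSN string layout of
timestamp#count#sid#mod with the last three fields in hexadecimal; the function
names and the sample value are illustrative and not taken from the uploaded
data:

```python
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: str  # e.g. "20100915163617.000000Z"
    count: int      # change counter within the same timestamp
    sid: int        # serverID of the server that originated the change
    mod: int        # modification number within the operation


def parse_csn(value: str) -> CSN:
    """Split a CSN string into its four '#'-separated fields."""
    timestamp, count, sid, mod = value.strip().split("#")
    # The count, sid and mod fields are written in hexadecimal.
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def originated_elsewhere(csn: str, supplier_sid: int) -> bool:
    """True if the change behind this CSN came from a different serverID."""
    return parse_csn(csn).sid != supplier_sid


if __name__ == "__main__":
    # Hypothetical CSN for a change that originated on serverID 2.
    sample = "20100915163617.000000Z#000000#002#000000"
    print(parse_csn(sample))
    print(originated_elsewhere(sample, supplier_sid=1))
```

A change whose SID differs from the supplier's own serverID is the kind of
change this report is about.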

I have uploaded the configs and data files for this to ftp.openldap.org.


Case 1: serverID of supplier server changes during the creation of a consumer:

people.ldif contains entries #1, #2, #3, #4 under dc=people,dc=example,dc=org

1)      Load people.ldif into empty master server
2)      Start master                              
3)      Stop master                               
4)      Dump master DB using slapcat > m1.ldif    
5)      Start master                              
6)      Delete entry#1                            
7)      Stop master                               
8)      Change master serverID
9)      Start master                              
10)     Dump master DB using slapcat > m2.ldif    
11)     Delete entry#2                            
11a)            Stop master                       
11b)            Start master                      
12)     Load m1.ldif into empty consumer server   
13)     Start consumer                            
14)     Check ContextCSN matches on each server   
15)     Check for Entry#1 Entry#2 Entry#3         

If 11a/b are run:
        Entry#1 and Entry#2 are still found on the consumer

If 11a/b are not run:
        Entry#1 is still found on the consumer
        Entry#2 is deleted, presumably because it was still in the volatile
        sessionlog
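
The check in step 15 can also be done offline by comparing slapcat dumps of the
supplier and the consumer. A minimal Python sketch, assuming both dumps are
plain LDIF with unencoded "dn: " lines (base64 "dn:: " lines are not handled);
the function names and the cn=entryN DNs are hypothetical and not taken from
people.ldif:

```python
from typing import List, Set


def ldif_dns(text: str) -> Set[str]:
    """Return the lower-cased DNs found in a block of LDIF text."""
    # LDIF folds long lines: a line starting with one space continues
    # the previous line, so join those before looking for DNs.
    unfolded = text.replace("\r\n", "\n").replace("\n ", "")
    dns = set()
    for line in unfolded.split("\n"):
        if line.lower().startswith("dn: "):
            dns.add(line[4:].strip().lower())
    return dns


def stale_on_consumer(supplier_ldif: str, consumer_ldif: str) -> List[str]:
    """DNs still present on the consumer but absent from the supplier."""
    return sorted(ldif_dns(consumer_ldif) - ldif_dns(supplier_ldif))


if __name__ == "__main__":
    # Hypothetical dumps: the supplier has lost entry1, the consumer has not.
    supplier = "dn: cn=entry3,dc=people,dc=example,dc=org\ncn: entry3\n"
    consumer = (
        "dn: cn=entry1,dc=people,dc=example,dc=org\ncn: entry1\n\n"
        "dn: cn=entry3,dc=people,dc=example,dc=org\ncn: entry3\n"
    )
    for dn in stale_on_consumer(supplier, consumer):
        print("stale:", dn)
```

Any DN reported as stale is an entry whose deletion did not reach the consumer.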


Case 2: mirrormode pair supplying read-only consumer:

1)      Load people.ldif into empty peer server with serverID 1
2)      Start peer 1                                            
3)      Start peer 2 and allow it to sync up                    
4)      Stop peer 1                                             
4a)     Dump peer 1 DB using slapcat > p1.ldif
5)      Start peer 1
6)      Delete entry#1 on peer 1
7)      Delete entry#2 on peer 2
8)      Verify that both entries have gone on both servers
9)      Stop both peer servers (this clears the volatile syncprov-sessionlog)
10)     Start peer 1 and peer 2
11)     Load p1.ldif into empty consumer server (which is configured to sync
        from peer 1)
12)     Start consumer
13)     Check ContextCSN matches on each server
14)     Check for Entry#1 Entry#2 Entry#3 on all servers

Entries #1 and #2 have gone on both mirrormode peers
Entries #1 and #2 are still present on the consumer
The consumer only has a ContextCSN value from peer 1

15)     Restart the consumer server

The consumer now has both ContextCSN values but it still has entries #1 and #2

16)     Delete Entry #3 on peer 1
17)     Check for entry #3 on all servers

Correctly deleted

18)     Delete Entry #4 on peer 2
19)     Check for entry #4 on all servers

Correctly deleted

Entries #1 and #2 are still present on the consumer
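
The ContextCSN checks in steps 13 and 15 amount to comparing one value per
serverID. A minimal Python sketch, assuming the contextCSN values have already
been read from each server's suffix entry and that the serverID is the third
'#'-separated field, in hexadecimal; the function names and sample values are
hypothetical:

```python
from typing import Dict, List


def sid_of(csn: str) -> int:
    """Extract the serverID field from a CSN string."""
    return int(csn.split("#")[2], 16)


def compare_context_csn(supplier: List[str], consumer: List[str]) -> Dict[int, str]:
    """Report, per serverID, how the consumer's contextCSN relates to the supplier's."""
    sup = {sid_of(c): c for c in supplier}
    con = {sid_of(c): c for c in consumer}
    report = {}
    for sid in sorted(set(sup) | set(con)):
        if sid not in con:
            report[sid] = "missing on consumer"
        elif sid not in sup:
            report[sid] = "missing on supplier"
        elif sup[sid] == con[sid]:
            report[sid] = "match"
        else:
            report[sid] = "differs"
    return report


if __name__ == "__main__":
    # Hypothetical values: the consumer holds a contextCSN for serverID 1 only.
    peer1 = [
        "20100915163000.000000Z#000000#001#000000",
        "20100915163100.000000Z#000000#002#000000",
    ]
    consumer = ["20100915163000.000000Z#000000#001#000000"]
    print(compare_context_csn(peer1, consumer))
```

As the results after step 15 show, matching ContextCSN values on every server
did not mean the entries matched, so this check complements an entry-by-entry
comparison rather than replacing it.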


Andrew
Comment 1 Quanah Gibson-Mount 2017-04-07 23:29:55 UTC
Hi Andrew,

Sorry for the delay in response to this issue report.  I believe I actually 
encountered this same problem at a later point, and it is fixed in current 
OpenLDAP.  Would you be able to confirm?

Thanks,
Quanah

--

Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
<http://www.symas.com>


Comment 2 OpenLDAP project 2017-04-07 23:30:20 UTC
May already be fixed since 2.4.23
Comment 3 Quanah Gibson-Mount 2017-04-07 23:30:20 UTC
changed notes
moved from Incoming to Software Bugs
Comment 4 Quanah Gibson-Mount 2020-03-19 22:08:29 UTC
Created attachment 618
ajf-syncrepl-config-20100915a.tgz

From FTP server
Comment 5 Quanah Gibson-Mount 2020-03-19 22:09:32 UTC
Believed fixed, can re-open if reproduced.