Issue 6489 - Incomplete Master/Slave Replication
Summary: Incomplete Master/Slave Replication
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: 2.4.21
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
Reported: 2010-03-09 12:23 UTC by frank.offermanns@caseris.de
Modified: 2017-03-28 15:43 UTC

Description frank.offermanns@caseris.de 2010-03-09 12:23:17 UTC
Full_Name: Frank Offermanns
Version: 2.4.21
OS: Windows
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (217.6.189.242)


I am running OpenLDAP from HEAD with BDB 4.8.26 as the backend on Windows. I am
testing master/slave replication while adding 10000 users with 10 filled
attributes.
I am not using a specific time synchronization. Standard Windows time
synchronization in domains is active. But as far as I understood, a microsecond
time synchronisation is only needed for master-master, which I am not testing.

My problem is that not every user and/or every attribute is replicated from my
master to my slave.

If the replication mode is refreshAndPersist, more entries are incorrect on the
slave.
With refreshOnly (every 5 minutes), fewer entries have this problem.
Switching to delta-syncrepl changes this behaviour.
With delta-syncrepl, once the initial content load has run, replication seems
to work 100%.
But when I add entries while the initial content load (empty slave database) is
running, a few complete users are missing (it always seems to be a few in a row).
Every user that was written has all attributes, though, so no attributes are
missing in this case.
I will post my configuration (for refreshAndPersist replication).
If you need any other info, please let me know.

Here are my configurations:
Master:
ucdata-path	./ucdata
include		./schema/core.schema
include		./schema/cosine.schema
include		./schema/Personcaesar.schema
include		./schema/ConfigObjects.schema

loglevel	0

pidfile		./run/slapd.pid
argsfile	./run/slapd.args


access to * by dn.one="ou=Admins,o=caesar" write
        by * read


#######################################################################
# BDB database definitions
#######################################################################

database	hdb
cachesize       10000
idlcachesize	30000
suffix		""
checkpoint      1024    5
rootdn		"cn=Administrator,o=caesar"
rootpw		{SHA}secret...

directory	"c:/all2421/data"
dbconfig set_cachesize	0	400000000	1
dbconfig set_flags	DB_LOG_AUTOREMOVE
dbconfig set_lg_regionmax	1048576
dbconfig set_lg_max	10485760
dbconfig set_lg_bsize	2097152


# Indices to maintain
index	sn		pres,eq
index	cn		pres,eq,sub
index	MasterApp	pres,eq
index	RightVoice	pres,eq
index	DCOMServer	pres,eq
index	ExtensionSMS	pres,eq
index	ExtensionFax	pres,eq,sub
index	ExtensionVoice	pres,eq,sub
index	ExtensionCTI	pres,eq
index	Deleted		pres,eq
index	GUID		pres,eq
index	CTIServerName	pres,eq,sub
index	LastSyncUser	pres,eq
index 	ApplicationPhoneNr	pres,eq,sub
index	NetDialLoginName	pres,eq
index	email		pres,eq
index	FullExtensionVoice	pres,eq,sub
index	FullExtensionFax	pres,eq,sub
index	FullExtensionSMS	pres,eq
index   FullName	pres,eq
index   PersonalID	pres,eq
index   entryUUID	eq
index   entryCSN	eq
index	objectClass 	eq


overlay 	syncprov
syncprov-checkpoint 1000 60
syncprov-sessionlog 10000

________________________________________________
Slave:
ucdata-path	./ucdata
include		./schema/core.schema
include		./schema/cosine.schema
include		./schema/Personcaesar.schema
include		./schema/ConfigObjects.schema

loglevel	0

pidfile		./run/slapd.pid
argsfile	./run/slapd.args

access to * by dn.one="ou=Admins,o=caesar" write
        by * read

#######################################################################
# BDB database definitions
#######################################################################

database	hdb
cachesize       10000
idlcachesize	30000
suffix		""
checkpoint      1024    5
rootdn		"cn=Administrator,o=caesar"
rootpw		{SHA}secret....

directory	"c:/all2421_48/data"
dbconfig set_cachesize	0	400000000	1
dbconfig set_flags	DB_LOG_AUTOREMOVE
dbconfig set_lg_regionmax	1048576
dbconfig set_lg_max	10485760
dbconfig set_lg_bsize	2097152


# Indices to maintain
index	sn		pres,eq
index	cn		pres,eq,sub
index	MasterApp	pres,eq
index	RightVoice	pres,eq
index	DCOMServer	pres,eq
index	ExtensionSMS	pres,eq
index	ExtensionFax	pres,eq,sub
index	ExtensionVoice	pres,eq,sub
index	ExtensionCTI	pres,eq
index	Deleted		pres,eq
index	GUID		pres,eq
index	CTIServerName	pres,eq,sub
index	LastSyncUser	pres,eq
index 	ApplicationPhoneNr	pres,eq,sub
index	NetDialLoginName	pres,eq
index	email		pres,eq
index	FullExtensionVoice	pres,eq,sub
index	FullExtensionFax	pres,eq,sub
index	FullExtensionSMS	pres,eq
index   FullName	pres,eq
index   PersonalID	pres,eq
index   entryUUID	eq
index   entryCSN	eq
index	objectClass 	eq


syncrepl       rid=001
               provider="ldap://CAS-WS091201.domain.local"
               searchbase="o=caesar"
               type=refreshAndPersist
               retry="5 3 15 +"
               binddn="cn=Administrator,o=caesar"
               bindmethod=simple
               credentials="secret"

sizelimit size.soft=100 size.hard=1000 size.prtotal=unlimited
limits dn.exact="cn=Administrator,o=caesar" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited


Comment 1 Quanah Gibson-Mount 2010-07-22 18:03:31 UTC
--On Tuesday, March 09, 2010 12:23 PM +0000 Frank.Offermanns@caseris.de 
wrote:

> I am running LDAP from Head with BDB 4.8.26 as backend on windows. I am
> testing master/slave replication while adding 10000 users with 10 filled
> attributes.  I am not using a specific time synchronization. Standard
> windows time synchronization in domains is active. But as far as I
> understood, a microsecond time synchronisation is only needed for
> master-master, which I am not testing.

BDB 4.8.26 is known to be problematic.  Please use BDB 4.8.30 and OpenLDAP
2.4.23, and report back if you still have issues.

Thanks!

--Quanah

--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra ::  the leader in open source messaging and collaboration

Comment 2 frank.offermanns@caseris.de 2010-07-23 12:38:30 UTC
Hello Quanah,

thanks for your info.
I compiled OpenLDAP 2.4.23 and BDB 4.8.30, but unfortunately the problem
still persists.
I added 10000 users, and:
with refreshAndPersist, at least one attribute was missing in 136 users.
with refreshOnly, at least one attribute was missing in 34 users.
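
Counts like the two above can be obtained by exporting both databases as LDIF and comparing them entry by entry. The following Python sketch illustrates one way to do that. It is not the test application mentioned in this report, and it rests on assumptions: the parser only handles plain "attr: value" lines (no base64 values, no folded lines), and the names parse_ldif and diff_entries are hypothetical.

```python
from collections import defaultdict


def parse_ldif(text):
    """Parse simple LDIF text into {dn: {attr: set(values)}}.

    Assumption: records are separated by blank lines and contain only
    plain "attr: value" lines (no base64 "::" values, no folded lines).
    """
    entries = {}
    for record in text.strip().split("\n\n"):
        dn = None
        attrs = defaultdict(set)
        for line in record.splitlines():
            if not line or line.startswith("#"):
                continue  # skip blank lines and LDIF comments
            name, _, value = line.partition(": ")
            if name.lower() == "dn":
                dn = value.lower()
            else:
                attrs[name.lower()].add(value)
        if dn is not None:
            entries[dn] = dict(attrs)
    return entries


def diff_entries(master, slave):
    """Return (entries missing on the slave, {dn: attributes incomplete on the slave})."""
    missing = sorted(dn for dn in master if dn not in slave)
    incomplete = {}
    for dn, attrs in master.items():
        if dn not in slave:
            continue
        # An attribute counts as lost if any master value is absent on the slave.
        lost = sorted(a for a, vals in attrs.items()
                      if not vals <= slave[dn].get(a, set()))
        if lost:
            incomplete[dn] = lost
    return missing, incomplete
```

With the two exports parsed, len(missing) gives the number of users that were not replicated at all, and len(incomplete) gives the number of users with at least one missing attribute.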

In my post in March I said that I do not have this problem with accesslog.
This is not completely true.
Meanwhile (also with 2.4.21) I found out that I have this problem with
accesslog as well, but less frequently.
With about 40000 users added, only about 1 or 2 attributes were missing.
With the new version I added 10000, 20000 and 80000 users without a single
problem. So I am not sure whether this problem still persists with
accesslog, but with refreshAndPersist and refreshOnly there is definitely
still a problem.
Since no one else has complained about this, maybe this problem is
Windows-only?
Or is there someone else who also has this problem?

If you are interested, I could provide my Windows test application.
With it, the problem can be reproduced in a few minutes. Let me know if I
should send you this program.

Thanks a lot so far.

Best regards,
Frank Offermanns


Comment 3 frank.offermanns@caseris.de 2010-09-14 08:42:17 UTC
Hello,

after digging very deep to find out what the problem could be, I found some
interesting facts:
1)
With our new client DLL the master/slave replication problem seems to be
solved (our old DLL wrote each attribute separately, our new one writes them
all at once), but
2)
I then switched back to master/master in the hope that this would work as
well, and found out the following.
- my client DLL first creates a new user entry and then does an add with
all attributes.
- on the second master server I activated the replication log and saw the
following short message (the important part is surrounded with ________):
"do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20100914081312.596142Z#000000#001#000000
syncrepl_entry: rid=001 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
syncrepl_entry: rid=001 be_search (0)
syncrepl_entry: rid=001 cn=10008,o=BAD_CLIENT3,ou=users,o=caesar
slap_queue_csn: queing 09369668 20100914081312.596142Z#000000#001#000000
syncprov_matchops: skipping original sid 001
slap_graduate_commit_csn: removing 09377b10 20100914081312.596142Z#000000#001#000000
syncrepl_entry: rid=001 be_add cn=10008,o=BAD_CLIENT3,ou=users,o=caesar (0)
slap_queue_csn: queing 09369668 20100914081312.596142Z#000000#001#000000
syncprov_matchops: skipping original sid 001
slap_graduate_commit_csn: removing 09377b10 20100914081312.596142Z#000000#001#000000
do_syncrep2: rid=001 cookie=
syncrepl_entry: rid=001 LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_MODIFY)
__________________________________________________________________
dn_callback : new entry is older than ours cn=10008,o=BAD_CLIENT3,ou=users,o=caesar ours 20100914081312.596142Z#000000#001#000000, new 20100914081312.100780Z#000000#001#000000
___________________________________________________________________
syncrepl_entry: rid=001 be_search (0)
syncrepl_entry: rid=001 cn=10008,o=BAD_CLIENT3,ou=users,o=caesar
syncrepl_entry: rid=001 entry unchanged, ignored (cn=10008,o=BAD_CLIENT3,ou=users,o=caesar)"

Is this a bug, or the result of bad time synchronization (I only have the
standard Windows time synchronization)?
If it were a time synchronization problem: in recent posts I asked when
time synchronization is important, and the answer I got (if I understood it
correctly) was that time synchronization only matters if concurrent write
operations are made. So it shouldn't be an issue here, since I made my
write operations on only one master.
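
The "new entry is older than ours" message in the log comes down to an ordering of the two CSN values it prints. As an illustration only, here is a Python sketch of that comparison. It assumes the CSN layout timestamp#count#sid#mod with fixed-width fields, as seen in the log lines; parse_csn and is_older are hypothetical helper names, not slapd functions, and slapd's own comparison code may differ in detail.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: str  # e.g. "20100914081312.596142Z" (fixed width, so string order is time order)
    count: int      # change counter
    sid: int        # server ID
    mod: int        # modification number


def parse_csn(text: str) -> CSN:
    """Split a CSN of the assumed form timestamp#count#sid#mod into its fields."""
    timestamp, count, sid, mod = text.split("#")
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def is_older(new: str, ours: str) -> bool:
    """True if the incoming CSN sorts before the one already stored."""
    return parse_csn(new) < parse_csn(ours)


# The two values from the dn_callback line in the log.
ours = "20100914081312.596142Z#000000#001#000000"
new = "20100914081312.100780Z#000000#001#000000"
print(is_older(new, ours))
```

Both values carry sid 001 and fall in the same second. They differ only in the microsecond part (596142 for the stored entry, 100780 for the incoming modify), so the incoming change is stamped about half a second earlier than the entry it is supposed to update.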

Best regards,
Frank

Comment 4 Quanah Gibson-Mount 2017-03-28 00:05:10 UTC
Hi Frank,

I believe this is a duplicate of ITS#8281.  Can you test with current RE24
and see if you can still reproduce this problem?

Thanks,
Quanah

--

Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
<http://www.symas.com>


Comment 5 OpenLDAP project 2017-03-28 00:05:29 UTC
ITS#8281?
Comment 6 Quanah Gibson-Mount 2017-03-28 00:05:29 UTC
changed notes
changed state Open to Closed
moved from Incoming to Software Bugs
Comment 7 Quanah Gibson-Mount 2017-03-28 15:43:30 UTC
--On Tuesday, March 28, 2017 10:21 AM +0200 Frank Offermanns 
<Frank.Offermanns@caseris.de> wrote:

>
> Hello Quanah,
>
> as mentioned before, I am no longer able to compile the new versions of
> OpenLDAP. There have been some changes which made it no longer compilable
> with MSYS/MinGW on Windows (at least with my version of MinGW/MSYS).
> Therefore I am sorry, I can't test the new version, because I can't build
> it.

Hi Frank,
Is it possible for you to update to a current version of MSYS/Mingw? 
Looking at ITS#8127, it appears running a current version from 
<https://sourceforge.net/projects/mingw-w64/files/mingw-w64/mingw-w64-release/> 
should fix the build issues you've encountered.

Thanks,
Quanah


