
RE: syncrepl: consumer state is newer than provider



Hi Howard,

I have tried the slapd -c option with a rid value, and it
also tries to resync the entire directory when doing that
while comparing CSNs. There is also a cid value which can
be passed to the -c option, but I was unable to find an
example of what to pass in there. Is it just a contextCSN value?
Thanks.

cheers,

Ven
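
For reference, slapd(8) describes the argument to -c as a comma-separated list of name=value pairs with the fields rid, sid and csn. Several csn values may be given, separated by semicolons, in a mirror-mode or multi-master setup, and passing only the rid part forces a full reload, which matches the behaviour described above. The following is a minimal sketch under that description, not an official tool: build_cookie is a hypothetical helper, and the zero-padded formatting of rid and sid is an assumption.

```python
from typing import Optional, Sequence


def build_cookie(rid: int, sid: Optional[int] = None,
                 csns: Sequence[str] = ()) -> str:
    """Assemble a cookie string for `slapd -c` (hypothetical helper).

    rid  -- replication ID of the matching syncrepl statement
    sid  -- optional server ID (mirror mode / multi-master)
    csns -- optional CSN strings describing the consumer's state
    """
    parts = [f"rid={rid:03d}"]           # assumed 3-digit padding
    if sid is not None:
        parts.append(f"sid={sid:03x}")   # assumed 3-digit hex padding
    if csns:
        parts.append("csn=" + ";".join(csns))
    return ",".join(parts)


# rid only: slapd(8) says this forces a full reload.
print(build_cookie(2))

# rid plus the consumer's CSNs (values taken from the thread).
print(build_cookie(2, csns=[
    "20110729165747.697237Z#000000#001#000000",
    "20110726161604.535176Z#000000#002#000000",
]))
```

The resulting string would be passed as the single argument to slapd -c. Quote it on the shell command line, since an unquoted semicolon ends the command.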

-----Original Message-----
From: Howard Chu [mailto:hyc@symas.com] 
Sent: August-02-11 2:35 PM
To: Mahadevan, Venkatasubramanian
Cc: Chris Jacobs; 'openldap-technical@openldap.org'
Subject: Re: syncrepl: consumer state is newer than provider

Mahadevan, Venkatasubramanian wrote:
> Hi David,
>
> Thanks much for your response.
> That's what I did but when I do that it seems to take forever to 
> recover using syncrepl as it goes through all the entries in the 
> databases comparing CSNs. So what I did was stop slapd and rebuild the 
> database using slapadd with the -w option to preserve syncrepl 
> information. After that, replication started working again, but it's a 
> less than ideal way to recover from a replication failure. Perhaps the 
> inherent nature of 2 master servers being updated leads to replication 
> conflicts whereby the 2 servers get stuck in an infinite loop because their contextCSN values are out of sync?

Next time try the slapd -c option.

> cheers,
>
> Ven
>
> ________________________________________
> From: Chris Jacobs [Chris.Jacobs@apollogrp.edu]
> Sent: Monday, August 01, 2011 8:33 AM
> To: Mahadevan, Venkatasubramanian; 'openldap-technical@openldap.org'
> Subject: Re: syncrepl: consumer state is newer than provider
>
> Apologies for top posting - blackberry.
>
> Short term fix:
> Pick a server, take it offline (stop slapd).
> Clear its database - be careful to not delete any db config files.
> Start it back up.
>
> If this happens again, then you'll want to turn up logging, etc. There's plenty of info on how to troubleshoot OpenLDAP.
>
> Note: I'm a sysadmin, not a systems engineer. It's possible the actual reason this broke is clear in your current logs, but not to me.
>
> - chris
>
> Chris Jacobs, Systems Administrator, Technology Services Group
> Apollo Group | Apollo Marketing and Product Development | Aptimus, Inc.
> 2001 6th Ave | Suite 3200 | Seattle, WA 98121
> direct 206.839.8245 | cell 206.601.3256 | fax 206.839.8106
> email chris.jacobs@apollogrp.edu
>
> ________________________________
> From: openldap-technical-bounces@OpenLDAP.org<openldap-technical-bounces@OpenLDAP.org>
> To: openldap-technical@openldap.org<openldap-technical@openldap.org>
> Sent: Fri Jul 29 14:03:06 2011
> Subject: syncrepl: consumer state is newer than provider
>
> Hello,
>
> I have 2 OpenLDAP servers with the following configuration:
>
> -- OpenLDAP 2.4.26-Release running on Red Hat Enterprise 5.5
> -- The two servers are set up in a mirrored multi-master configuration. 
> Below is the relevant portion of the slapd.conf:
>
>
> server1
> ----------
> syncrepl rid=002
> provider=ldaps://server2
> type=refreshAndPersist
> retry="5 5 300 +"
> searchbase="o=ourdomain.ca"
> attrs="*,+"
> bindmethod=simple
> binddn="cn=Replication Manager,o=ubc.ca"
> credentials=something
>
> mirrormode TRUE
> overlay syncprov
> syncprov-checkpoint 100 10
>
> server2
> ----------
> syncrepl rid=001
> provider=ldaps://server1
> type=refreshAndPersist
> retry="5 5 300 +"
> searchbase="o=ourdomain.ca"
> attrs="*,+"
> bindmethod=simple
> binddn="cn=Replication Manager,o=ubc.ca"
> credentials=something
>
> mirrormode TRUE
> overlay syncprov
> syncprov-checkpoint 100 10
>
> The servers have their clocks synchronized using ntp. Below is the output of ntpq:
>
> server1
> ----------
> ntpq>  peer
>       remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> +hub.ubc.ca      93.113.2.250     3 u  594 1024  377    1.252    1.110   1.520
> *dns3.ubc.ca     192.53.103.108   2 u   92 1024  377    1.648    2.670   0.157
>
> server2
> ----------
> ntpq>  peer
>       remote           refid      st t when poll reach   delay   offset  jitter
> ==============================================================================
> +hub.ubc.ca      93.113.2.250     3 u  332 1024  377    0.706    3.487   0.900
> *dns3.ubc.ca     192.53.103.108   2 u  325 1024  377    1.631    3.668   0.022
>
>
> As far as I can tell the clocks appear to be in sync with each other, 
> so hopefully this is not a cause of the replication issues I am having.
>
> The problem is that the servers are now refusing to synchronize with 
> each other. Replication was working before, but now it does not. The 
> log files on the servers are filled with entries like:
>
> server1
> ----------
> Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 LDAP_RES_SEARCH_RESULT
> Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
> Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 (53) Server is unwilling to perform
> Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SRCH base="o=ubc.ca" scope=2 deref=0 filter="(objectClass=*)"
> Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SRCH attr=* +
> Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!
>
> server2
> ----------
> Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT
> Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
> Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 (53) Server is unwilling to perform
> Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SRCH base="o=ubc.ca" scope=2 deref=0 filter="(objectClass=*)"
> Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SRCH attr=* +
> Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!
>
>
> So it looks like the contextCSN cookies on both servers are out of sync. Digging further into this, I searched for the contextCSN values on both servers and got the following:
>
> server1
> ----------
> 20110729165747.697237Z#000000#001#000000;20110726161604.535176Z#000000#002#000000
>
> server2
> ----------
> 20110728220449.050499Z#000000#001#000000;20110728223211.933995Z#000000#002#000000
>
>
> So my question is: how does one get the server synchronization cookies back into sync and ensure that replication restarts successfully?
> As of now, all I see is the log files filling up with messages as shown above and the sync cookies not being updated. Any help or pointers are appreciated. Thanks!
>
> cheers,
>
> Ven
>
>
>
>
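
The contextCSN values quoted above can be compared per server ID to see which side is ahead. The sketch below assumes the usual CSN layout of timestamp, change count, server ID and modification number separated by '#'. The names CSN, parse_csn, parse_context_csn and newer_sids are hypothetical helpers for illustration only.

```python
from datetime import datetime, timezone
from typing import Dict, List, NamedTuple


class CSN(NamedTuple):
    timestamp: datetime  # UTC time of the change
    count: int           # change counter within that timestamp
    sid: int             # server ID that originated the change
    mod: int             # modification counter


def parse_csn(text: str) -> CSN:
    """Parse one CSN of the form 'YYYYmmddHHMMSS.ffffffZ#count#sid#mod'."""
    stamp, count, sid, mod = text.strip().split("#")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ")
    when = when.replace(tzinfo=timezone.utc)
    return CSN(when, int(count, 16), int(sid, 16), int(mod, 16))


def parse_context_csn(value: str) -> Dict[int, CSN]:
    """Map server ID -> CSN for a ';'-separated list of CSNs."""
    return {csn.sid: csn for csn in map(parse_csn, value.split(";"))}


def newer_sids(consumer: Dict[int, CSN], provider: Dict[int, CSN]) -> List[int]:
    """Server IDs for which the consumer holds a newer CSN than the provider."""
    return sorted(sid for sid, csn in consumer.items()
                  if sid in provider and csn > provider[sid])


# Values as posted in the thread, with the line wraps removed.
server1 = parse_context_csn(
    "20110729165747.697237Z#000000#001#000000;"
    "20110726161604.535176Z#000000#002#000000"
)
server2 = parse_context_csn(
    "20110728220449.050499Z#000000#001#000000;"
    "20110728223211.933995Z#000000#002#000000"
)

print(newer_sids(server1, server2))  # server1 as consumer of server2 -> [1]
print(newer_sids(server2, server1))  # server2 as consumer of server1 -> [2]
```

With these values each server is ahead of the other for one server ID: server1 for SID 001 and server2 for SID 002. That appears consistent with both providers answering "consumer state is newer than provider", though the exact check slapd performs is not shown in this thread.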


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/