[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#8125) MMR throws away valid changes causing drift



--On Monday, May 11, 2015 7:32 PM +0000 quanah@zimbra.com wrote:

> --On Friday, May 08, 2015 7:56 PM +0000 quanah@zimbra.com wrote:
>
>>
>>
>> --On May 4, 2015 at 8:00:45 PM +0000 openldap-its@OpenLDAP.org wrote:
>>
>>
>> In addition to the missing entries, multiple differences exist for
>> existing  entries as well.  At this point, it seems that replication with
>> 2.4.41 is  entirely unsafe.
>
> The existing entry drift seems to be related to the locked up replication
> that occured because of a bug in openssl.  So the real issue is just the
> entries that weren't created, which is a current known limitation of
> syncrepl refresh that needs fixing.

Unfortunately, I'm now seeing this at a second customer site with 2-node 
MMR, where it is running current RE24 (2.4.41).

In this case:

ldap01 and ldap02 were stopped
openldap was upgraded from 2.4.39 to 2.4.41
ldap01's db was slapcat'd
ldap02's db was erased
ldap02's db was recreated from ldap01's slapcat

Now they are having drift, and constant refresh troubles:

[zimbra@ldap01-dc01 ~]$ grep REFRESH /var/log/zimbra.log
May 15 08:56:02 ldap01-dc01 slapd[31479]: do_syncrep2: rid=201 delta-sync 
lost sync on (reqStart=20150515065528.000299Z,cn=accesslog), switching to 
REFRESH
May 15 08:58:07 ldap01-dc01 slapd[31479]: do_syncrep2: rid=201 
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
May 15 09:57:18 ldap01-dc01 slapd[31479]: do_syncrep2: rid=201 delta-sync 
lost sync on (reqStart=20150515075718.000081Z,cn=accesslog), switching to 
REFRESH
May 15 10:46:57 ldap01-dc01 slapd[31479]: do_syncrep2: rid=201 delta-sync 
lost sync on (reqStart=20150515084653.000032Z,cn=accesslog), switching to 
REFRESH
May 15 12:43:18 ldap01-dc01 slapd[31479]: do_syncrep2: rid=201 delta-sync 
lost sync on (reqStart=20150515104318.000049Z,cn=accesslog), switching to 
REFRESH
May 15 16:32:19 ldap01-dc01 slapd[31479]: do_syncrep2: rid=201 delta-sync 
lost sync on (reqStart=20150515143211.000250Z,cn=accesslog), switching to 
REFRESH
May 15 16:36:41 ldap01-dc01 slapd[31479]: do_syncrep2: rid=201 delta-sync 
lost sync on (reqStart=20150515143556.000046Z,cn=accesslog), switching to 
REFRESH
May 15 17:23:10 ldap01-dc01 slapd[31479]: do_syncrep2: rid=201 delta-sync 
lost sync on (reqStart=20150515152310.000002Z,cn=accesslog), switching to 
REFRESH

[zimbra@ldap02-dc01 xx]$ grep FRESH /var/log/zimbra.log
May 15 08:35:46 ldap02-dc01 slapd[18167]: do_syncrep2: rid=201 delta-sync 
lost sync on (reqStart=20150515063539.000009Z,cn=accesslog), switching to 
REFRESH
May 15 08:47:44 ldap02-dc01 slapd[18167]: do_syncrep2: rid=201 delta-sync 
lost sync on (reqStart=20150515064733.000043Z,cn=accesslog), switching to 
REFRESH
May 15 08:56:03 ldap02-dc01 slapd[18167]: do_syncrep2: rid=201 delta-sync 
lost sync on (reqStart=20150515065528.000234Z,cn=accesslog), switching to 
REFRESH
May 15 08:57:24 ldap02-dc01 slapd[18167]: do_syncrep2: rid=201 
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
May 15 09:31:12 ldap02-dc01 slapd[18167]: do_syncrep2: rid=201 delta-sync 
lost sync on (reqStart=20150515073112.000086Z,cn=accesslog), switching to 
REFRESH
May 15 10:46:58 ldap02-dc01 slapd[18167]: do_syncrep2: rid=201 delta-sync 
lost sync on (reqStart=20150515084653.000003Z,cn=accesslog), switching to 
REFRESH
May 15 13:44:11 ldap02-dc01 slapd[18167]: do_syncrep2: rid=201 delta-sync 
lost sync on (reqStart=20150515114405.000313Z,cn=accesslog), switching to 
REFRESH
May 15 14:54:49 ldap02-dc01 slapd[18167]: do_syncrep2: rid=201 delta-sync 
lost sync on (reqStart=20150515125448.000002Z,cn=accesslog), switching to 
REFRESH
May 15 17:23:11 ldap02-dc01 slapd[18167]: do_syncrep2: rid=201 delta-sync 
lost sync on (reqStart=20150515152311.000035Z,cn=accesslog), switching to 
REFRESH


This is caused by modify ops:

[zimbra@ldap02-dc01 xx]$ grep \(20\) /var/log/zimbra.log
May 15 08:35:46 ldap02-dc01 slapd[18167]: syncrepl_message_to_op: rid=201 
be_modify uid=xxx,ou=people,dc=xxx,dc=com (20)
May 15 08:47:44 ldap02-dc01 slapd[18167]: syncrepl_message_to_op: rid=201 
be_modify uid=xxx,ou=people,dc=yyy,dc=co,dc=za (20)
May 15 08:56:03 ldap02-dc01 slapd[18167]: syncrepl_message_to_op: rid=201 
be_modify uid=xxx,ou=people,dc=zzz,dc=co,dc=za (20)
May 15 09:31:12 ldap02-dc01 slapd[18167]: syncrepl_message_to_op: rid=201 
be_modify uid=xxx,ou=people,dc=www,dc=co,dc=za (20)
May 15 10:46:58 ldap02-dc01 slapd[18167]: syncrepl_message_to_op: rid=201 
be_modify uid=xxx,ou=people,dc=ppp,dc=co,dc=za (20)
May 15 13:44:11 ldap02-dc01 slapd[18167]: syncrepl_message_to_op: rid=201 
be_modify uid=xxx,ou=people,dc=rrr,dc=co,dc=za (20)
May 15 14:54:49 ldap02-dc01 slapd[18167]: syncrepl_message_to_op: rid=201 
be_modify uid=xxx,ou=people,dc=zzz,dc=com (20)
May 15 17:23:11 ldap02-dc01 slapd[18167]: syncrepl_message_to_op: rid=201 
be_modify uid=xxx,ou=people,dc=zzz,dc=com (20)

So overall, there seems to be significant problems with RE24 in relation to 
replication at this time.

Clocks are tightly in sync.

--Quanah

--

Quanah Gibson-Mount
Platform Architect
Zimbra, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration