Re: (ITS#8490) changes not written to accesslog, causing replicas to loop syncing
- To: openldap-its@OpenLDAP.org
- Subject: Re: (ITS#8490) changes not written to accesslog, causing replicas to loop syncing
- From: quanah@zimbra.com
- Date: Thu, 01 Sep 2016 07:58:24 +0000
- Auto-submitted: auto-generated (OpenLDAP-ITS)
--On Thursday, September 01, 2016 8:05 AM +0000 quanah@zimbra.com wrote:
> --On Thursday, September 01, 2016 7:52 AM +0000 quanah@openldap.org wrote:
>
>> Full_Name: Quanah Gibson-Mount
>> Version: OpenLDAP 2.4.44
>> OS: Linux 2.6
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (75.111.52.177)
>>
>>
>> In a 2-node MMR setup. Node 1 is getting a lot of write traffic. Both
>> node 1 and node 2 have 3 replicas each. At some point, a change is
>> received by node 1, which writes the change to its accesslog DB and its
>> primary DB. Its 3 replicas are all correctly updated. MMR node 2
>> receives the change, updates its primary DB, but *fails* to write the
>> change to the accesslog DB. However, it *does* write the CSN update to
>> the accesslog DB successfully. This causes all of its replicas to also
>> update their CSN. Then a change comes in triggering a constraint
>> violation on the replicas, but fully accepted by their master.
>
> So the above summary is incorrect. While 3 replicas did go out of
> sync... 2 belonged to the primary master (node1), and 1 belonged to the
> secondary master (node 2). So really, 4 systems didn't log the change
> (MMR node 2, ldap05, ldap07, ldap09).
Ok, so that's not correct either. I now have the correct topology:
ldap01 has the following replicas: ldap02, ldap05, ldap07, ldap09
ldap02 has the following replicas: ldap01, ldap06, ldap08, ldap10
So the replicas of ldap01 received the change and rejected it. ldap02 just
skipped writing the entry to its accesslog, and as a result none of its
replicas ever got the change. They therefore never hit the err 19
(constraint violation) failure, but they are all now lacking this
modification entirely.
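The outcome described above can be sketched as a small Python model. This is only an illustration under stated assumptions: the host names and replica lists are taken from this message, while the function name and the state labels are hypothetical and do not correspond to anything in slapd.

```python
# Replication topology as listed in this message.
REPLICAS = {
    "ldap01": ["ldap02", "ldap05", "ldap07", "ldap09"],
    "ldap02": ["ldap01", "ldap06", "ldap08", "ldap10"],
}


def change_outcome(origin="ldap01", skipped_log="ldap02"):
    """Return a per-host summary of what happened to the change.

    The labels are illustrative only; they restate the behaviour
    reported in this message, not anything slapd itself emits.
    """
    state = {origin: "logged in accesslog"}
    for consumer in REPLICAS[origin]:
        if consumer == skipped_log:
            # The other MMR node did not write the entry to its accesslog.
            state[consumer] = "not written to accesslog"
        else:
            # Plain replicas of the origin received the change and rejected it.
            state[consumer] = "rejected (err 19)"
    for consumer in REPLICAS[skipped_log]:
        # Replicas fed from the node that skipped logging never see the change.
        state.setdefault(consumer, "never received")
    return state


if __name__ == "__main__":
    for host, result in sorted(change_outcome().items()):
        print(host, "->", result)
```

Running it lists ldap05, ldap07 and ldap09 as having rejected the change, and ldap06, ldap08 and ldap10 as never having received it.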
I would note that every server was loaded today from the same LDAP backup,
so they were all perfectly in sync.
Looking at the LDAP accesslog, what I see is that what should have been
a modRDN op was stored in the accesslog as a MOD op (the one I noted
before). This seems particularly bizarre, because ldap01 should have
rejected this change as well. It appears we may have a problem where the
accesslog DB is updated, but the change is then rejected by the unique
overlay.
--Quanah
--
Quanah Gibson-Mount