
Re: (ITS#8493) Under heavy modrdn load, masters desync



--On Saturday, September 03, 2016 4:51 PM +0000 quanah@zimbra.com wrote:

> --On Saturday, September 03, 2016 6:15 AM +0000 quanah@openldap.org wrote:
>
>> Full_Name: Quanah Gibson-Mount
>> Version: 2.4.44+ITS8432
>> OS: Linux 2.6
>> URL: ftp://ftp.openldap.org/incoming/
>> Submission from: (NULL) (75.111.52.177)
>>
>>
>> Trying to reproduce another ITS, I discovered a new bug.  When doing
>> MODRDN ops on one master, the other master keeps going out of sync.
>> Specifically:
>>
>> Sep  3 01:12:17 zre-ldap002 slapd[29206]: syncrepl_message_to_op: rid=100
>> be_modrdn uid=user.924,ou=people,dc=zre-ldap002,dc=eng,dc=zimbra,dc=com
>> (32)
>> Sep  3 01:12:17 zre-ldap002 slapd[29206]: do_syncrep2: rid=100
>> delta-sync lost sync on (reqStart=20160903051215.747829Z,cn=accesslog),
>> switching to REFRESH
>
>
> Note that this master also has a replica.  The replica never rejected a
> single one of these MODRDNs coming from this master.  Which means that
> either:
>
> a) The data on the master spontaneously corrupted at some point
>
> or
>
> b) The master wrote the MODRDNs to the accesslog, which the replica
> picked  up, but did not itself make the MODRDN changes to its database.
>
> In the end, of the 50,000 MODRDNs it was processing, it threw an error 32
> for 441 of them.
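
For anyone who wants to reproduce a count like the one quoted above, here is
a rough Python sketch that tallies be_modrdn result codes from slapd syslog
output. It assumes each log entry sits on one physical line (unlike the
mail-wrapped excerpt above) and that the message layout matches the
syncrepl_message_to_op line shown there. The function name and the pattern
are illustrative, not part of OpenLDAP.

```python
import re
from collections import Counter

# Assumed layout, taken from the log excerpt quoted in this thread:
#   ... syncrepl_message_to_op: rid=100 be_modrdn <dn> (<result code>)
MODRDN_RE = re.compile(
    r"syncrepl_message_to_op: rid=(?P<rid>\d+) "
    r"be_modrdn (?P<dn>.+) \((?P<code>\d+)\)\s*$"
)

def tally_modrdn_results(lines):
    """Count be_modrdn log entries per LDAP result code (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = MODRDN_RE.search(line)
        if match:
            counts[int(match.group("code"))] += 1
    return counts
```

Calling tally_modrdn_results(open(path)) on the consumer's log and reading
the entry for code 32 (noSuchObject) gives the number of rejected MODRDNs.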

After the master that was not accepting direct writes resynced with the
master accepting writes, it still had 403 of the 50,000 entries wrong, and
so did its replica.  So the master isn't writing the changes to the
accesslog, which makes this option c: the master rejects a valid op, never
syncs correctly, and in the end two thirds of my servers have invalid
databases.
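
One way to check a mismatch count like this is to dump each server to LDIF
and compare the sets of DNs.  The following is a minimal Python sketch, on
the assumption that both dumps are plain LDIF text with unencoded "dn: "
lines; the helper names are made up for illustration, and lowercasing is only
a crude stand-in for real DN normalization.

```python
def read_dns(ldif_text):
    """Return the set of DNs found in an LDIF dump (hypothetical helper).

    Handles LDIF line folding (continuation lines start with one space).
    Base64-encoded DNs ("dn:: ...") are not handled and are skipped.
    """
    dns = set()
    current = None
    for line in ldif_text.splitlines():
        if current is not None and line.startswith(" "):
            current += line[1:]  # folded continuation of the DN
            continue
        if current is not None:
            dns.add(current.strip().lower())
            current = None
        if line.lower().startswith("dn: "):
            current = line[4:]
    if current is not None:
        dns.add(current.strip().lower())
    return dns

def compare_dns(ldif_a, ldif_b):
    """Return (DNs only in A, DNs only in B), each as a sorted list."""
    a, b = read_dns(ldif_a), read_dns(ldif_b)
    return sorted(a - b), sorted(b - a)
```

After a MODRDN run, an entry that was renamed on one server but not on the
other shows up once in each list: the new DN on one side, the old DN on the
other.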

I see zero indication that using a sessionlog works around 
<http://www.openldap.org/its/index.cgi/?findid=8125> at all.  I still end 
up with missed entries even with everything *in* the sessionlog.

--Quanah

--

Quanah Gibson-Mount