Re: (ITS#8444) Out-of-sync issue with memberOf overlay, Delta-syncrepl MMR and >2 nodes
- To: openldap-its@OpenLDAP.org
- Subject: Re: (ITS#8444) Out-of-sync issue with memberOf overlay, Delta-syncrepl MMR and >2 nodes
- From: okuznik@symas.com
- Date: Tue, 18 Jul 2017 16:28:18 +0000
- Auto-submitted: auto-generated (OpenLDAP-ITS)
On Thu, Jun 08, 2017 at 06:36:02PM +0100, Ondřej Kuzník wrote:
> A more self-contained log of the same issue, available at
> ftp://ftp.openldap.org/incoming/its8444.log
>
> (line numbers below are against current master, commit
> 91f4d3a6b75e73bf4ea498e83e2e4cb4e7a320e0)
>
> There are some things that occur in all the failures I have seen so far:
> - the server that received the operation (#1) sends the accesslog entry
> with no CSN in the cookie, then another provider (#2) picks up this
> message and relays it to its consumers, this one with a CSN in the
> cookie
> - a consumer picks up these two in short succession, in the log above,
> processing of the one from #2 is finished first (they are being
> processed concurrently)
>
> Usually, once one of them gets processed, the new CSN should be noted
> and the other threads should just skip it (syncrepl.c:943 and onwards).
> In this one, having no CSN in the cookie seems to allow both to process
> so far as to run syncrepl_message_to_op(), and one of them will then
> inevitably fail to add the entry.
>
> I don't understand yet why server #1 sends the operations without a CSN
> in the cookie and (especially if I reorder the overlays to set up
> memberof last), the race goes the other way around and the operation to
> fail is the one from server #2.
>
> My take on it was that in a delta-sync environment all entries would be
> passed with a new CSN and that should end up in the cookie, allowing
> syncrepl.c:986 to do its job.
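
The skip check described in the quoted text can be modelled roughly as follows. This is a toy, single-threaded sketch in Python, not the code in syncrepl.c: the class `ToyConsumer`, the helper `sid_of` and the CSN value are made up for illustration, and the CSN layout (timestamp#count#SID#mod, fixed width so that string comparison orders them) is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def sid_of(csn: str) -> str:
    """Return the server ID field of a CSN (assumed third '#'-separated field)."""
    return csn.split("#")[2]


class ToyConsumer:
    """Toy model of a consumer's 'have I already seen this CSN?' check."""

    def __init__(self) -> None:
        self.seen: Dict[str, str] = {}             # SID -> newest CSN recorded
        self.applied: List[Tuple[str, str]] = []   # (source, operation) pairs

    def handle(self, source: str, op: str, cookie_csn: Optional[str]) -> bool:
        """Apply op unless its cookie CSN is already covered; return True if applied."""
        if cookie_csn is not None:
            sid = sid_of(cookie_csn)
            newest = self.seen.get(sid)
            if newest is not None and cookie_csn <= newest:
                return False                       # already seen: skip
            self.seen[sid] = cookie_csn
        # With no CSN in the cookie there is nothing to compare against,
        # so the message falls through and is applied.
        self.applied.append((source, op))
        return True


# Hypothetical CSN, for illustration only.
CSN = "20170101000000.000000Z#000000#001#000000"

consumer = ToyConsumer()
consumer.handle("provider2", "increment", CSN)     # relayed copy, CSN in cookie
consumer.handle("provider1", "increment", None)    # original copy, no CSN
# consumer.applied now holds the same operation twice.
```

If both copies carried the CSN, the second call would return False and the operation would be applied once.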
Using the behaviour above, I have been able to trigger a desync. The
script and the testrun directory from a run where it happened are available here:
ftp://ftp.openldap.org/incoming/its8444-desync.tgz
In srv3, looking at cn=accesslog, we can see that the increment by 2 has
been applied (and logged) twice with the same entryCSN, as:
reqStart=20170718155142.000007Z,cn=accesslog
reqStart=20170718155142.000009Z,cn=accesslog
This is also visible in the log around those two DNs.
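
For anyone wanting to look for the same symptom elsewhere, a rough Python sketch along these lines could flag accesslog entries that share an entryCSN. It assumes a simple LDIF dump of cn=accesslog that includes entryCSN (no folded lines, no base64 values). The function name `duplicate_csns`, the CSN placeholders and the third DN in the sample are hypothetical; only the first two DNs come from the testrun.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_csns(ldif_text: str) -> Dict[str, List[str]]:
    """Map each entryCSN that appears on more than one entry to those DNs."""
    by_csn: Dict[str, List[str]] = defaultdict(list)
    dn = None
    for line in ldif_text.splitlines():
        if line.startswith("dn: "):
            dn = line[len("dn: "):]
        elif line.startswith("entryCSN: ") and dn is not None:
            by_csn[line[len("entryCSN: "):]].append(dn)
        elif not line.strip():
            dn = None                  # blank line ends the current entry
    return {csn: dns for csn, dns in by_csn.items() if len(dns) > 1}


# Sample input; the CSN values are placeholders, not real data.
SAMPLE = """\
dn: reqStart=20170718155142.000007Z,cn=accesslog
entryCSN: PLACEHOLDER-CSN-A

dn: reqStart=20170718155142.000009Z,cn=accesslog
entryCSN: PLACEHOLDER-CSN-A

dn: reqStart=HYPOTHETICAL,cn=accesslog
entryCSN: PLACEHOLDER-CSN-B
"""

for csn, dns in duplicate_csns(SAMPLE).items():
    print(csn, "logged on:", ", ".join(dns))
```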
-- 
Ondřej Kuzník
Senior Software Engineer
Symas Corporation http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP