Re: (ITS#8444) Out-of-sync issue with memberOf overlay, Delta-syncrepl MMR and >2 nodes



On Wed, Aug 23, 2017 at 02:42:29PM +0100, Ondřej Kuzník wrote:
> It is caused by the cookie not containing CSN and a race between the
> syncCookie check in do_syncrep2 and syncrepl_message_to_op.
>
> This race is probably fine with plain syncrepl which is idempotent, but
> deltasync changes get their own dn in each accesslog instance and some
> can be applied twice unless we know how to find out we've already seen
> them - they need to mention the CSN.
>
> The CSN itself gets lost on at least one occasion - when there's a
> checkpoint triggered. Not 100 % sure why the cookie gets eaten because
> of it, the op pointer is different between the syncprov_op_response that
> calls syncprov_checkpoint and the one that decides CSN hasn't changed.

Yes, whenever a checkpoint happens, the syncCookie in cn=accesslog only
contains rid=XXX,sid=YYY. I thought that was because the checkpoint
results in a new accesslog entry and that would be transmitted first,
but that's not the case, there is no accesslog entry nor anything sent
to the client (as observed by ldapsearch -E sync=rp).
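
To make the symptom concrete, here is a minimal sketch of checking whether
a cookie string carries a CSN. The helper names and the sample CSN value
are hypothetical, made up for illustration; this is not OpenLDAP code, and
it only assumes the comma-separated key=value layout seen above.

```python
def parse_sync_cookie(cookie):
    """Split a cookie such as 'rid=001,sid=002,csn=...' into a dict."""
    fields = {}
    for part in cookie.split(","):
        key, sep, value = part.partition("=")
        if sep:  # ignore fragments that have no '='
            fields[key.strip()] = value.strip()
    return fields


def cookie_has_csn(cookie):
    """Return True only if the cookie has a non-empty csn component."""
    return bool(parse_sync_cookie(cookie).get("csn"))


# Hypothetical sample values, not taken from a real server.
SAMPLE_CSN = "20170823134229.000000Z#000000#001#000000"
healthy = "rid=001,sid=001,csn=" + SAMPLE_CSN
after_checkpoint = "rid=001,sid=001"
```

With these samples, cookie_has_csn(healthy) is true and
cookie_has_csn(after_checkpoint) is false, which is the state described
above after a checkpoint.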

I think it looks like this: syncprov_checkpoint modifies the suffix
entry, that calls slap_graduate_commit_csn and the csn is removed from
be_pending_csn_list. accesslog_response then can't find the CSN there
and has nothing to insert into its own pending csn list. Strange that
changing the overlay order (accesslog vs. syncprov) doesn't change this
behaviour, something I'd expect if the above is the reason this happens.
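
The suspected sequence can be modelled with a toy sketch. Everything in it
(the class, the function names, the sample CSN) is a hypothetical stand-in
for the real structures, and it only encodes the hypothesis above: that
the checkpoint graduates the CSN before the accesslog response handler
looks it up. It says nothing about overlay ordering.

```python
class PendingCsnList:
    """Toy stand-in for a backend's pending-CSN list (hypothetical)."""

    def __init__(self):
        self._pending = []

    def add(self, csn):
        self._pending.append(csn)

    def graduate(self, csn):
        # Committing a CSN removes it from the pending list.
        if csn in self._pending:
            self._pending.remove(csn)

    def find(self, csn):
        # Return the CSN if it is still pending, otherwise None.
        return csn if csn in self._pending else None


def cookie_for(rid, sid, csn):
    """Build a cookie string, leaving out csn= when no CSN was found."""
    parts = ["rid=" + rid, "sid=" + sid]
    if csn is not None:
        parts.append("csn=" + csn)
    return ",".join(parts)


def simulate(checkpoint):
    """Model one write, optionally with a checkpoint before the lookup."""
    pending = PendingCsnList()
    csn = "20170823134229.000000Z#000000#001#000000"  # made-up value
    pending.add(csn)
    if checkpoint:
        # Assumption: the checkpoint's own modify graduates the CSN early.
        pending.graduate(csn)
    # Assumption: the response handler looks the CSN up afterwards.
    return cookie_for("001", "001", pending.find(csn))
```

In this model simulate(False) yields a cookie ending in csn=..., while
simulate(True) yields only rid=001,sid=001.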

-- 
Ondřej Kuzník
Senior Software Engineer
Symas Corporation                       http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP