Re: (ITS#6059) Abandon syncprov race condition?



rein@OpenLDAP.org wrote:
> Full_Name: Rein Tollevik
> Version: 2.4.16
> OS: linux
> URL:
> Submission from: (NULL) (81.93.160.250)
> Submitted by: rein
>
>
> I've had two cases where a delete operation was performed on the master without
> being replicated to its consumers, which so far appear to be cases of possible
> connection lost (abandon) race conditions.  The log (level: stats) shows the
> "DEL" message of the entry, immediately followed by a "closed (connection lost)"
> message on the connection.  Note: No "RESULT" message was logged.
>
> I haven't looked very much into this, but my theory so far is that syncprov
> skipped replicating the delete op after noticing the abandon resulting from
> losing the connection, even though the delete had already taken place in the
> local database.  That it happened after a delete op might very well have been a
> coincidence; this possible race could exist after any modify op for all I know.

> Do we need some sort of o_committed flag that can be used to prevent o_abandon
> from being set or acted upon? Or handle o_abandon more like o_cancel, i.e. with
> multiple values, including "too late"?

No. What good can that do, since the connection has already been lost?

It doesn't matter if syncprov fails to send an update to a consumer - the 
consumer's cookie state will let it pick up where it left off when it reconnects.
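
To illustrate the point about cookie state, here is a toy model in Python. It is
only a sketch under simplifying assumptions, not OpenLDAP code: the names
Provider, Consumer, commit, changes_since and sync are hypothetical, and the
cookie is a plain counter, whereas a real syncrepl cookie carries CSNs. It shows
a delete that is committed while the consumer is disconnected and then picked up
on reconnect.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Toy provider: an ordered log of (sequence number, operation, DN)."""
    log: list = field(default_factory=list)
    seq: int = 0

    def commit(self, op, dn):
        # The change is committed locally whether or not a consumer is connected.
        self.seq += 1
        self.log.append((self.seq, op, dn))
        return self.seq

    def changes_since(self, cookie):
        # Return every change newer than the consumer's cookie.
        return [change for change in self.log if change[0] > cookie]


@dataclass
class Consumer:
    """Toy consumer: remembers the last change it applied (its cookie)."""
    cookie: int = 0
    entries: set = field(default_factory=set)

    def sync(self, provider):
        for seq, op, dn in provider.changes_since(self.cookie):
            if op == "ADD":
                self.entries.add(dn)
            elif op == "DEL":
                self.entries.discard(dn)
            # Advance the cookie only after the change has been applied.
            self.cookie = seq


def demo():
    provider = Provider()
    consumer = Consumer()

    provider.commit("ADD", "cn=a,dc=example,dc=com")
    consumer.sync(provider)

    # The delete is committed on the provider, but the consumer is not
    # connected at that moment, so it never sees a live update for it.
    provider.commit("DEL", "cn=a,dc=example,dc=com")

    # On reconnect the consumer presents its old cookie and catches up.
    consumer.sync(provider)
    return consumer
```

In this model a missed live update costs nothing, because the consumer's next
sync starts from its own cookie rather than from whatever the provider last
tried to send.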

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/