
Re: (ITS#6059) Abandon syncprov race condition?

To: openldap-its@OpenLDAP.org
Subject: Re: (ITS#6059) Abandon syncprov race condition?
From: hyc@symas.com
Date: Mon, 11 May 2009 02:38:31 GMT
Auto-submitted: auto-generated (OpenLDAP-ITS)

rein@OpenLDAP.org wrote:
> Full_Name: Rein Tollevik
> Version: 2.4.16
> OS: linux
> URL:
> Submission from: (NULL) (81.93.160.250)
> Submitted by: rein
>
>
> I've had two cases where a delete operation was performed on the master without
> being replicated to its consumers.  So far both appear to be possible
> connection-lost (abandon) race conditions.  The log (level: stats) shows the
> "DEL" message of the entry, immediately followed by a "closed (connection lost)"
> message on the connection.  Note: No "RESULT" message was logged.
>
> I haven't looked very much into this, but my theory so far is that syncprov
> skipped replicating the delete op after noticing the abandon resulting from
> losing the connection, even though the delete had already taken place in the
> local database.  That it happened after a delete op might very well have been a
> coincidence; this possible race could exist after any modify op for all I know.

> Do we need some sort of o_committed flag that can be used to prevent o_abandon
> from being set or acted upon? Or handle o_abandon more like o_cancel, i.e. with
> multiple values, including "too late"?

No. What good can that do, since the connection has already been lost?

It doesn't matter if syncprov fails to send an update to a consumer - the 
consumer's cookie state will let it pick up where it left off when it reconnects.
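
As a rough illustration of that catch-up behaviour, here is a toy model in
Python. It is not syncprov's code: the Provider and Consumer classes and the
changes_since() helper are hypothetical names, and a plain integer sequence
number stands in for the CSN-based cookie that a real consumer presents when
it reconnects.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Toy provider: an append-only change log with increasing sequence numbers."""
    log: list = field(default_factory=list)  # entries are (seq, op, dn)

    def commit(self, op, dn):
        # The change is recorded locally whether or not any consumer hears of it.
        seq = len(self.log) + 1
        self.log.append((seq, op, dn))
        return seq

    def changes_since(self, cookie):
        # Everything newer than the state the consumer says it has seen.
        return [change for change in self.log if change[0] > cookie]


@dataclass
class Consumer:
    """Toy consumer: remembers the last change it applied as its cookie."""
    cookie: int = 0
    entries: set = field(default_factory=set)

    def apply(self, change):
        seq, op, dn = change
        if op == "add":
            self.entries.add(dn)
        elif op == "delete":
            self.entries.discard(dn)
        self.cookie = seq

    def resync(self, provider):
        # On (re)connect, present the cookie and apply whatever was missed.
        for change in provider.changes_since(self.cookie):
            self.apply(change)


provider = Provider()
consumer = Consumer()

provider.commit("add", "cn=a,dc=example,dc=com")
consumer.resync(provider)  # consumer is now up to date

# The delete commits on the provider, but the update never reaches the consumer.
provider.commit("delete", "cn=a,dc=example,dc=com")

# On reconnect the consumer presents its old cookie and receives the delete.
consumer.resync(provider)
```

The point of the sketch is only that a dropped push does not lose the change,
provided the consumer's cookie is older than that change when it next syncs.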

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
