Re: (ITS#4813) glue/syncrepl

hyc@symas.com wrote:
> Allan E. Johannesen wrote:
>>>>>>> "hyc" == Howard Chu <hyc@symas.com> writes:
>> hyc> Since you mention that this occurs more often in 2.3.33 than in "previous
>> hyc> releases" - what previous version are you comparing to?
>> Well, I should have said I've never seen it before.  I've generally been
>> running the new releases within a day of release, and I rebuild the data at
>> each release, so everything starts clean.  Therefore, it _may_ have existed
>> previously, but never showed up in the days during which the given releases
>> ran.
>> I guess I only mentioned that since someone saw it several releases ago in a
>> different ITS.  I never saw it before.

>> In 2.3.33, it happened right after loading the data.  I thought I did it wrong,
>> so I loaded things again and it was fine.  After some days, though, it (meaning
>> the change to "objectClass: glue") happened again.
> The only change to syncrepl between 2.3.32 and .33 was one or two debug 
> messages, no functional changes. In 2.3.32 there was no change to syncrepl at 
> all (the bug in ITS#4790 was in connection.c, not syncrepl.c). The only 
> change in 2.3.31 was also in debug messages, not functional changes. So as 
> unlikely as it seems, at the moment this appears to be a coincidence and the 
> bug must be older.
> If you see this happening repeatedly, turn on the sync debug level and 
> capture that output for a while. When you notice the problem, you should also 
> see some number of "syncrepl_del_nonpresent" messages in the log. We'll 
> probably need to see a large chunk of the log to be able to follow the 
> sequence of events.

Hm, in re-reading ITS#4626, I see a pertinent detail in followup #2. I think 
I understand part of the problem.

The particular entry was modified after the current refresh session began, so 
that entry is omitted from the current refresh results. Since the entry is 
actually missing from the refresh data, the consumer treats it as deleted. 
Since the entry has children, it cannot actually be deleted, so it gets 
turned into a glue entry.
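The failure mode can be made concrete with a minimal C sketch of the consumer's "delete nonpresent" step at the end of a refresh. It rests on simplified assumptions: entries are identified only by a UUID string, and the `Entry` struct, `del_nonpresent_action()`, and `apply_del_nonpresent()` are hypothetical illustrations, not slapd's actual syncrepl.c code. Any local entry whose UUID is missing from the provider's present list is treated as deleted. If it still has children, it is turned into glue instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified consumer-side entry. Real slapd
 * entries carry a DN, entryUUID, attributes, and so on. */
typedef struct {
    const char *uuid;   /* stands in for entryUUID */
    bool has_children;  /* other entries sit beneath this one */
    bool deleted;       /* consumer removed the entry */
    bool is_glue;       /* consumer replaced it with objectClass: glue */
} Entry;

typedef enum { KEEP, DELETE, MAKE_GLUE } Action;

/* Linear search of the UUIDs the provider reported during refresh. */
static bool uuid_in_present_list(const char *uuid,
                                 const char *const *present, size_t npresent) {
    for (size_t i = 0; i < npresent; i++)
        if (strcmp(uuid, present[i]) == 0)
            return true;
    return false;
}

/* Anything the provider did not report is assumed deleted; an entry
 * with children cannot be deleted, so it becomes glue instead. */
static Action del_nonpresent_action(const Entry *e,
                                    const char *const *present, size_t npresent) {
    if (uuid_in_present_list(e->uuid, present, npresent))
        return KEEP;
    return e->has_children ? MAKE_GLUE : DELETE;
}

/* Apply the decision to every local entry at the end of refresh. */
static void apply_del_nonpresent(Entry *entries, size_t n,
                                 const char *const *present, size_t npresent) {
    assert(entries != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (del_nonpresent_action(&entries[i], present, npresent)) {
        case DELETE:    entries[i].deleted = true; break;
        case MAKE_GLUE: entries[i].is_glue = true; break;
        case KEEP:      break;
        }
    }
}
```

If the present list omits the parent's UUID (the omission described above), the parent becomes glue while its child is kept. If the provider includes that UUID, as the first proposed fix suggests, the parent is left untouched.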

So there are two issues. First, the provider should still send the UUID of the entry,
so that the consumer doesn't consider it deleted. But also, this problem 
ought to have self-corrected. Once the replication transitioned from Refresh 
to Persist phase, the modified entry should have been sent to the consumer, 
and the glue entry should have been replaced by the correct data.
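The expected self-correction can be sketched the same way. This is a simplified model, not syncprov's implementation: CSNs are plain integers rather than timestamp strings, and `Change`, `ConsumerEntry`, `pending_after_refresh()`, and `apply_persist()` are hypothetical names. The provider queues every change committed after the refresh snapshot began and delivers it once the session enters persist. The consumer then overwrites the glue placeholder with the newer data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified types; real CSNs are timestamp strings,
 * modeled here as integers purely for illustration. */
typedef struct {
    const char *uuid;  /* entry the modification applies to */
    long csn;          /* change sequence number of the modification */
} Change;

typedef struct {
    const char *uuid;
    bool is_glue;      /* consumer currently holds a glue placeholder */
    long csn;          /* CSN of the data the consumer holds */
} ConsumerEntry;

/* Provider side: collect changes committed after the refresh snapshot
 * began; these must still be delivered once persist starts. Returns
 * the number of changes copied into out (at most outcap). */
static size_t pending_after_refresh(const Change *log, size_t nlog,
                                    long refresh_start_csn,
                                    Change *out, size_t outcap) {
    assert(out != NULL || outcap == 0);
    size_t n = 0;
    for (size_t i = 0; i < nlog && n < outcap; i++)
        if (log[i].csn > refresh_start_csn)
            out[n++] = log[i];
    return n;
}

/* Consumer side: a newer persist-phase update for an entry replaces
 * any glue placeholder with the real data. */
static void apply_persist(ConsumerEntry *entries, size_t n, const Change *c) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(entries[i].uuid, c->uuid) == 0 && c->csn > entries[i].csn) {
            entries[i].is_glue = false;
            entries[i].csn = c->csn;
        }
    }
}
```

In this model the bug corresponds to the provider skipping the queued change, which leaves the glue entry in place indefinitely.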

Looks like both problems are in the syncprov overlay.
   -- Howard Chu
   Chief Architect, Symas Corp.  http://www.symas.com
   Director, Highland Sun        http://highlandsun.com/hyc
   OpenLDAP Core Team            http://www.openldap.org/project/