
Re: Replication delay



On 3/22/19 7:54 AM, Angel L. Mateo wrote:
> On 21/3/19 at 20:26, Michael Ströder wrote:
>> On 3/21/19 8:22 AM, Ángel L. Mateo wrote:
>>> Now the server with problems works without problems for days, but
>>> then it starts delaying syncs.
>> How do you detect this?
>>
>     Checking contextCSN attribute of all ldap servers. I get something
> like this:
> 
> contextCSN: 20190322064915.077600Z#000000#01f#000000
> contextCSN: 20190322065006.637604Z#000000#020#000000
> contextCSN: 20190322065002.859879Z#000000#021#000000
> contextCSN: 20190322065000.303715Z#000000#022#000000
> contextCSN: 20190301102558.398349Z#000000#027#000000
> contextCSN: 20190314080533.305657Z#000000#029#000000
> 
>     There is one value for every server. When everything is ok, these
> values are the same on all servers. But sometimes the values on the new
> server are different, and older than those on the others.
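
A check like the one described above can be sketched roughly as follows.
This is only an illustration: the function names and the server labels are
hypothetical, and fetching the contextCSN values from each server is not
shown. It parses each value into its timestamp, change count and server ID
(SID) and reports, per SID, how far a server is behind the newest value
seen on any server.

```python
from datetime import datetime, timezone


def parse_csn(csn):
    """Split a CSN 'timestamp#count#sid#mod' into its four fields."""
    stamp, count, sid, mod = csn.strip().split("#")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    return when, int(count, 16), sid, int(mod, 16)


def context_csn_lag(servers):
    """servers maps a server label to its list of contextCSN values.

    Returns {(server, sid): seconds_behind} for every value that is older
    than the newest value seen for the same SID on any server.
    """
    newest = {}  # sid -> newest (time, count) seen anywhere
    seen = {}    # (server, sid) -> (time, count) seen on that server
    for name, values in servers.items():
        for value in values:
            when, count, sid, _mod = parse_csn(value)
            seen[(name, sid)] = (when, count)
            if sid not in newest or (when, count) > newest[sid]:
                newest[sid] = (when, count)

    lag = {}
    for (name, sid), (when, count) in seen.items():
        if (when, count) < newest[sid]:
            # 0.0 means same timestamp but a lower change count
            lag[(name, sid)] = (newest[sid][0] - when).total_seconds()
    return lag
```

A non-empty result only says that the contextCSN values differ. It does not
show by itself that entries are missing on the lagging server.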

This is most often caused by an OpenLDAP bug. I see it quite often
with MMR providers even though the entries have been correctly
replicated to the other providers. That is why I asked about your
detection method.

I have double-checked the code of my monitoring script many times!

And I'm not the only one seeing this false alarm in monitoring.
For example, two people approached me after my lightning talk on OpenLDAP
monitoring at FOSDEM and reported the same issue, and they use a different
monitoring tool.

So please check whether changes were correctly replicated instead.
Yes, that's nearly impossible if you have many changes on many entries.

I've considered searching for the highest entryCSN value per provider ID
(server-side sorting on entryCSN, search limit 1) and comparing it against
the corresponding contextCSN value. But in first superficial tests I
only got strange results. I have to investigate further before I can
come up with detailed results.
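
The comparison itself could look roughly like the sketch below. It is an
untested idea, not a verified monitoring method: the function names are
hypothetical, the LDAP search that collects the entryCSN values is not
shown, and it assumes that CSN strings are fixed-width so that plain string
comparison orders them chronologically. A mismatch is not necessarily an
error; for example, a delete may advance contextCSN without leaving an
entry that carries the matching entryCSN.

```python
def sid_of(csn):
    """Return the server ID field of a CSN 'timestamp#count#sid#mod'."""
    return csn.split("#")[2]


def highest_entry_csn_per_sid(entry_csns):
    """Return {sid: highest entryCSN} for the given entryCSN values."""
    highest = {}
    for csn in entry_csns:
        sid = sid_of(csn)
        # assumption: fixed-width CSNs sort correctly as plain strings
        if sid not in highest or csn > highest[sid]:
            highest[sid] = csn
    return highest


def compare_with_context(context_csns, entry_csns):
    """Return {sid: (contextCSN, highest entryCSN or None)} where they differ."""
    highest = highest_entry_csn_per_sid(entry_csns)
    mismatches = {}
    for ctx in context_csns:
        sid = sid_of(ctx)
        if highest.get(sid) != ctx:
            mismatches[sid] = (ctx, highest.get(sid))
    return mismatches
```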

Ciao, Michael.
