
RE: Testing the state of replicates



<quote who="Aaron Richton">
> [Gavin says]
>> Dig the main source. servers/slapd/syncrepl.c and
>> servers/slapd/overlays/syncprov.c
>
> Hmm, wrong source files. Try libraries/liblutil/csn.c, which sayeth:
>
>   * These routines are (loosely) based upon draft-ietf-ldup-model-03.txt,
>   * A WORK IN PROGRESS.  The format will likely change.
>   *
>   * The format of a CSN string is: yyyymmddhhmmssz#s#r#c
>   * where s is a counter of operations within a timeslice, r is
>   * the replica id (normally zero), and c is a counter of
>   * modifications within this operation.  s, r, and c are
>   * represented in hex and zero padded to lengths of 6, 3, and
>   * 6, respectively. (In previous implementations r was only 2 digits.)
>

Ah, many thanks.
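
For anyone wanting to pick a CSN apart in a script, here is a minimal sketch that follows the format described in that comment. It assumes the "z" is a literal UTC marker and accepts either the 3-digit or the older 2-digit replica id; the names CSN and parse_csn are made up for illustration and are not part of OpenLDAP.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple

# yyyymmddhhmmssz#s#r#c, with s, r and c in zero-padded hex (6, 3 and 6 digits).
# Assumption: "z" is a literal Z for UTC; r may be 2 digits in older versions.
CSN_RE = re.compile(
    r"^(?P<ts>\d{14})[Zz]"
    r"#(?P<s>[0-9a-fA-F]{6})"
    r"#(?P<r>[0-9a-fA-F]{2,3})"
    r"#(?P<c>[0-9a-fA-F]{6})$"
)


class CSN(NamedTuple):
    timestamp: datetime   # time of the change, UTC
    op_counter: int       # s: operations within the timeslice
    replica_id: int       # r: replica id (normally zero)
    mod_counter: int      # c: modifications within the operation


def parse_csn(text):
    """Split a CSN string into its four fields, or raise ValueError."""
    m = CSN_RE.match(text.strip())
    if m is None:
        raise ValueError("not a CSN in the expected format: %r" % text)
    ts = datetime.strptime(m.group("ts"), "%Y%m%d%H%M%S")
    return CSN(
        ts.replace(tzinfo=timezone.utc),
        int(m.group("s"), 16),
        int(m.group("r"), 16),
        int(m.group("c"), 16),
    )
```

Since CSN is a tuple in the order timestamp, s, r, c, two parsed values compare in that field order.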

>
> We use
> http://www.openldap.org/lists/openldap-software/200602/msg00158.html,
> maybe with a small mod or two (I forget), to check that contextCSN isn't
> wedged. This only works when the syncrepl thread is completely borked. A
> better check would be something along the lines of the Net::LDAP ldifdiff
> to make sure that nothing's different. Of course this has race condition
> issues (not that we make writes all that often, but on paper at least). If
> anybody has something like that as a monitoring plugin, you'd erase one
> line off my perpetual todo list...

;-) Plugin for what?
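
Whatever the monitoring framework, the "is contextCSN wedged" part of the check might look roughly like the sketch below. It assumes the two contextCSN values have already been read from the provider and the consumer by some other means, and it only looks at the timestamp portion. The function names and the five-minute grace period are placeholders of mine, not anything from the script linked above.

```python
from datetime import datetime, timedelta, timezone


def csn_time(csn):
    """Return the UTC time encoded in the first 14 digits of a CSN string."""
    return datetime.strptime(csn[:14], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def replica_lag(provider_csn, consumer_csn):
    """How far the consumer's contextCSN trails the provider's."""
    return csn_time(provider_csn) - csn_time(consumer_csn)


def is_wedged(provider_csn, consumer_csn, grace=timedelta(minutes=5)):
    """True if the consumer trails the provider by more than the grace period.

    The grace period is an arbitrary allowance for changes still in flight.
    """
    return replica_lag(provider_csn, consumer_csn) > grace
```

As noted above, this only catches a replica that has stopped following the provider altogether; it says nothing about individual entries that differ.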

>
> (Yes, that would be of great interest to me. ~93% of syncrepl bugs we've
> seen involve very very very slight errors that only result in an entry or
> two being wrong. contextCSN being wrong...we pretty much only see that in
> the field when tcp keepalives fail to indicate the need for a
> reconnection.)
>

So the entryCSN would be wrong?
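
One way to find out would be to compare entryCSN per DN on both sides. A rough sketch follows, assuming each server's entries have already been dumped into a dict mapping normalized DN to entryCSN (how they are fetched is left out). To soften the race condition mentioned above, it can skip provider entries that are newer than the consumer's contextCSN, on the assumption that those are simply not replicated yet. This relies on CSN strings of the same format sorting chronologically as plain strings; diff_replicas is a hypothetical name.

```python
def diff_replicas(provider, consumer, consumer_context_csn=None):
    """Compare two {dn: entryCSN} snapshots and return {dn: problem}.

    If consumer_context_csn is given, provider entries with a newer
    entryCSN are treated as still in flight and are not reported.
    """
    problems = {}
    for dn, p_csn in provider.items():
        if consumer_context_csn is not None and p_csn > consumer_context_csn:
            continue  # changed after the consumer's last sync point
        c_csn = consumer.get(dn)
        if c_csn is None:
            problems[dn] = "missing on consumer"
        elif c_csn != p_csn:
            problems[dn] = "entryCSN differs"
    # Entries the consumer still has but the provider does not.
    for dn in consumer.keys() - provider.keys():
        problems[dn] = "extra on consumer"
    return problems
```

An empty result means the two snapshots agree on every entry that the consumer should already have.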