
RE: Testing the state of replicates



[Gavin says]
Dig the main source. servers/slapd/syncrepl.c and
servers/slapd/overlays/syncprov.c

Hmm, wrong source files. Try libraries/liblutil/csn.c, which sayeth:

 * These routines are (loosly) based upon draft-ietf-ldup-model-03.txt,
 * A WORK IN PROGRESS.  The format will likely change.
 *
 * The format of a CSN string is: yyyymmddhhmmssz#s#r#c
 * where s is a counter of operations within a timeslice, r is
 * the replica id (normally zero), and c is a counter of
 * modifications within this operation.  s, r, and c are
 * represented in hex and zero padded to lengths of 6, 3, and
 * 6, respectively. (In previous implementations r was only 2 digits.)
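For anyone who wants to poke at these values from a script, here is a minimal sketch of a parser for the format described in that comment. The names (`CSN`, `parse_csn`, `lag_seconds`) are made up for illustration and are not anything from the OpenLDAP tree. Two assumptions to flag: the optional fractional-seconds part is not in the quoted comment (some releases appear to emit it, so the sketch tolerates it), and ordering CSNs by plain tuple comparison is my guess at a sensible ordering, not a statement of what slapd does internally.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple

# yyyymmddhhmmssz#s#r#c, with s, r, c in zero-padded hex (6, 3 and 6 digits).
# r is also accepted with 2 digits, per the note about previous implementations.
# ASSUMPTION: the optional ".ffffff" fraction is not in the quoted comment.
_CSN_RE = re.compile(
    r"^(?P<ts>\d{14})(?:\.(?P<frac>\d{1,6}))?[Zz]"
    r"#(?P<s>[0-9A-Fa-f]{6})"
    r"#(?P<r>[0-9A-Fa-f]{2,3})"
    r"#(?P<c>[0-9A-Fa-f]{6})$"
)


class CSN(NamedTuple):
    """Parsed CSN; field order makes tuple comparison sort by time first."""
    time: datetime      # timestamp, treated as UTC because of the trailing z
    count: int          # s: counter of operations within a timeslice
    replica_id: int     # r: replica id (normally zero)
    mod: int            # c: counter of modifications within this operation


def parse_csn(text: str) -> CSN:
    """Parse one CSN string, raising ValueError if it does not match."""
    m = _CSN_RE.match(text.strip())
    if m is None:
        raise ValueError("not a CSN in the expected format: %r" % text)
    when = datetime.strptime(m["ts"], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    if m["frac"]:
        # Right-pad so that ".5" means 500000 microseconds.
        when = when.replace(microsecond=int(m["frac"].ljust(6, "0")))
    return CSN(when, int(m["s"], 16), int(m["r"], 16), int(m["c"], 16))


def lag_seconds(provider: CSN, consumer: CSN) -> float:
    """Seconds by which the consumer's CSN timestamp trails the provider's."""
    return (provider.time - consumer.time).total_seconds()
```

Feed it the contextCSN values you read from each server and compare the results; a positive `lag_seconds` that keeps growing across checks is the "wedged" case.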


We use http://www.openldap.org/lists/openldap-software/200602/msg00158.html, maybe with a small mod or two (I forget), to check that contextCSN isn't wedged. That check only catches the case where the syncrepl thread is completely borked. A better check would be something along the lines of the Net::LDAP ldifdiff, comparing the entries themselves to make sure that nothing's different. Of course this has race-condition issues (on paper at least; we don't make writes all that often). If anybody has something like that as a monitoring plugin, you'd erase one line off my perpetual todo list...
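To make the "something like ldifdiff" idea concrete, here is a rough sketch of the comparison step only. It assumes the entries have already been pulled from each server into plain dicts (DN to attribute name to set of values); the fetching is left out on purpose, and `diff_entries` and `check` are hypothetical names, not an existing plugin. The exit codes follow the usual Nagios-style convention of 0 for OK and 2 for CRITICAL. It does nothing about the race condition: a write landing between the two fetches will show up as a difference.

```python
from typing import List, Mapping, Set, Tuple

# One directory snapshot: DN -> attribute name -> set of values.
# ASSUMPTION: DNs and attribute names are compared as exact strings, so the
# caller must normalise case and spacing before handing the snapshots over.
Snapshot = Mapping[str, Mapping[str, Set[str]]]

OK, CRITICAL = 0, 2  # Nagios-style plugin exit codes


def diff_entries(provider: Snapshot, consumer: Snapshot) -> List[str]:
    """Return one human-readable line per difference, in DN order."""
    problems = []
    for dn in sorted(set(provider) | set(consumer)):
        if dn not in consumer:
            problems.append("missing on consumer: %s" % dn)
        elif dn not in provider:
            problems.append("extra on consumer: %s" % dn)
        else:
            p, c = provider[dn], consumer[dn]
            for attr in sorted(set(p) | set(c)):
                # Compare as sets: LDAP attribute values are unordered.
                if set(p.get(attr, ())) != set(c.get(attr, ())):
                    problems.append("%s: attribute %s differs" % (dn, attr))
    return problems


def check(provider: Snapshot, consumer: Snapshot) -> Tuple[int, str]:
    """Return (exit code, status line) in the shape a plugin would print."""
    problems = diff_entries(provider, consumer)
    if problems:
        return CRITICAL, "CRITICAL: %d difference(s); first: %s" % (
            len(problems), problems[0])
    return OK, "OK: %d entries match" % len(provider)
```

A wrapper would print the status line and exit with the code. To cut down on false alarms from in-flight writes, one option is to only alert when the same difference shows up on two consecutive runs.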


(Yes, that would be of great interest to me. ~93% of syncrepl bugs we've seen involve very very very slight errors that only result in an entry or two being wrong. contextCSN being wrong...we pretty much only see that in the field when TCP keepalives fail to indicate the need for a reconnection.)