RE: Testing the state of replicates
So it seems easy to do this monitoring via some external agent/program.
Can I do something (short of writing an overlay) to get this information
with an LDAP query? That is, some query which would give me the difference
between the current contextCSN of the machine I'm talking to and that of the
master server. AFAICT, the existing overlays won't let me create this
kind of synthesized value.
Alternatively, I think I'd be happy with a query to tell me if the
server thinks it is having trouble talking to the master.
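
For the external agent/program route, the comparison itself is small once both contextCSN values are in hand. What follows is only a rough sketch: it assumes the two values have already been read from the master and the replica (for example with a base-scope search on the suffix entry of each server), and it looks only at the timestamp at the front of each CSN. The names csn_time and csn_lag_seconds are made up for illustration, and the handling of an optional fractional-seconds part is an assumption about what newer servers may emit.

```python
from datetime import datetime, timezone


def csn_time(csn: str) -> datetime:
    """Return the timestamp part of a CSN string as an aware UTC datetime."""
    # Everything before the first '#' is the timestamp, e.g. "20080305082800Z".
    stamp = csn.strip().split("#", 1)[0]
    if not stamp.endswith(("Z", "z")):
        raise ValueError(f"unexpected CSN timestamp: {stamp!r}")
    stamp = stamp[:-1]
    # Assumption: some servers add fractional seconds ("...00.123456Z").
    fmt = "%Y%m%d%H%M%S.%f" if "." in stamp else "%Y%m%d%H%M%S"
    return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)


def csn_lag_seconds(master_csn: str, replica_csn: str) -> float:
    """Seconds by which the replica's contextCSN trails the master's."""
    return (csn_time(master_csn) - csn_time(replica_csn)).total_seconds()
```

A monitoring check could then alert when csn_lag_seconds(...) stays above some threshold for several polls in a row. A single non-zero reading is not necessarily a problem, since a write may simply be in flight when the two values are read.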
Thanks for everything.
From: Gavin Henry [mailto:email@example.com]
Sent: Wednesday, March 05, 2008 3:28 AM
To: Aaron Richton
Cc: Marantz, Roy; firstname.lastname@example.org
Subject: RE: Testing the state of replicates
<quote who="Aaron Richton">
> [Gavin says]
>> Dig the main source. servers/slapd/syncrepl.c and
> Hmm, wrong source files. Try libraries/liblutil/csn.c, which sayeth:
> * These routines are (loosly) based upon
> * A WORK IN PROGRESS. The format will likely change.
> * The format of a CSN string is: yyyymmddhhmmssz#s#r#c
> * where s is a counter of operations within a timeslice, r is
> * the replica id (normally zero), and c is a counter of
> * modifications within this operation. s, r, and c are
> * represented in hex and zero padded to lengths of 6, 3, and
> * 6, respectively. (In previous implementations r was only 2
Ah, many thanks.
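
To make the format in that comment concrete, here is a minimal sketch that splits a CSN string into the four fields it describes. The CSN tuple and parse_csn are hypothetical names, not anything from the OpenLDAP source, and since the comment itself warns that the format will likely change, the field widths are deliberately not enforced.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: str    # "yyyymmddhhmmssz" part, kept as text
    op_counter: int   # s: operations within a timeslice
    replica_id: int   # r: replica id (normally zero)
    mod_counter: int  # c: modifications within this operation


def parse_csn(text: str) -> CSN:
    """Split 'timestamp#s#r#c' and decode the three hex counters."""
    parts = text.strip().split("#")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '#'-separated fields, got {len(parts)}")
    stamp, s, r, c = parts
    # s, r and c are zero-padded hex; widths are not checked here.
    return CSN(stamp, int(s, 16), int(r, 16), int(c, 16))
```

Because CSN is a tuple, two parsed values compare field by field, so parse_csn(a) < parse_csn(b) orders them by timestamp first and then by the counters. That only holds if both timestamps are written in the same fixed-width form, which is an assumption here.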
> We use
> maybe with a small mod or two (I forget), to check that contextCSN
> wedged. This only works when the syncrepl thread is completely borked.
> better check would be something along the lines of the Net::LDAP
> to make sure that nothing's different. Of course this has race
> issues (not that we make writes all that often, but on paper at
> anybody has something like that as a monitoring plugin, you'd erase
> line off my perpetual todo list...
;-) Plugin for what?
> (Yes, that would be of great interest to me. ~93% of syncrepl bugs
> seen involve very very very slight errors that only result in an entry
> two being wrong. contextCSN being wrong...we pretty much only see that
> the field when tcp keepalives fail to indicate the need for a
So the entryCSN would be wrong?