Re: (delta-)syncrepl and nagios
On Fri, 2006-02-10 at 09:45 +0200, Buchan Milne wrote:
> On Thursday 09 February 2006 19:57, Samuel Tran wrote:
> > On Mon, 2006-02-06 at 14:41 -0500, Aaron Richton wrote:
> > > That's been on my todo list for over a year now. (So I'll join in the
> > > request for a copy if there is such a script!)
> > >
> > > If anybody does write this, it's important to note that something that
> > > strictly compares contextcsns is likely useless (I think it would just be
> > > a false positive disaster). Replication doesn't happen instantly; there
> > > should be some sort of configurable threshold for "csns should be within
> > > <time>".
> > >
> > >
> > > I've been meaning to ask the list: how many of you check up on your
> > > slaves from a consistency perspective? What do you do? (contextcsn is the
> > > approach I've wanted to take. Every time I get annoyed enough to write a
> > > nagios plugin, I notice that everything is in sync and defer it...)
> > I wrote a very generic python script with exhaustive comments/debugging.
> > It can be modified to be used as a Nagios script plugin.
> > To view a description of the script:
> > $ pydoc ldapSynchCheck
> > To view the help:
> > $ ./ldapSynchCheck.py -h
> I guess you didn't look at the perl extension script for BigBrother/Hobbit
> that I posted. It assumes that it will be able to:
> 1)read sufficient configuration information from cn=config to be able to
> determine all the databases using sync-repl, and the master for each
> database, on any server
This is a good idea. However some people may not use cn=config yet. We
don't in our production environment.
> 2)read the contextCSN for any database on any server
> anonymously, but, due to this, requires absolutely no configuration. For use
> with Hobbit, it just needs to be run on the hobbit server, and any host in
> the bb-hosts file just needs 'ol'. Of course, the hobbit server needs to be
> able to access all the LDAP servers involved.
In my script the default binding is anonymous as well. I just wanted to
have the option to bind with a specific dn.
> You may want to take a look, so a user of your script doesn't need to provide
> the URIs, but instead can just provide the server to check.
> At present, it only goes yellow (not red), since there's no real way to
> determine if the server being 3 months behind (i.e. you catch the 30 second
> period it takes to replicate the first change to one database in 3 months) is
> severe enough for an error .. but it does show how far ahead (which could
> indicate checkpointing/recover problems on the master) or behind the slave is
> (so you don't have to compare contextCSNs in your head).
I will take a look at your script.
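For anyone following the thread, here is a minimal sketch of the comparison being discussed: treat two contextCSNs as "in sync" if their timestamps fall within a configurable threshold, and otherwise report how far the slave is behind or ahead. This is not ldapSynchCheck.py and not the Hobbit extension; the function names, the 60 second default and the message wording are made up for illustration. It assumes the contextCSN values have already been read from each server, and that the timestamp is the part before the first '#'. Following the point above, it only ever warns and never goes critical.

```python
from datetime import datetime, timezone

OK, WARNING = 0, 1  # standard Nagios plugin exit codes


def csn_time(csn):
    """Return the UTC timestamp encoded at the start of a contextCSN value."""
    stamp = csn.split("#", 1)[0].rstrip("Z")
    # Some servers include fractional seconds in the timestamp, some do not.
    fmt = "%Y%m%d%H%M%S.%f" if "." in stamp else "%Y%m%d%H%M%S"
    return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)


def lag_seconds(master_csn, slave_csn):
    """Positive: slave is behind the master. Negative: slave is ahead."""
    return (csn_time(master_csn) - csn_time(slave_csn)).total_seconds()


def check(master_csn, slave_csn, warn=60.0):
    """Hypothetical check: (exit code, message) for one master/slave pair."""
    lag = lag_seconds(master_csn, slave_csn)
    if abs(lag) <= warn:
        return OK, "OK: contextCSNs within %ds (lag %+.0fs)" % (warn, lag)
    side = "behind" if lag > 0 else "ahead of"
    return WARNING, "WARNING: slave is %.0fs %s master" % (abs(lag), side)
```

To use this as a Nagios plugin, print the message and exit with the returned code. The threshold is deliberately a parameter, since how much lag is acceptable depends on the site.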
> I could take a look at making it work for nagios, but we're phasing nagios
> out, and the only LDAP servers monitored for anything by nagios don't use
I am curious, what are you going to replace Nagios with?
Thanks for your valuable comments.