Re: (delta-)syncrepl and nagios

On Fri, 2006-02-10 at 09:45 +0200, Buchan Milne wrote:
> On Thursday 09 February 2006 19:57, Samuel Tran wrote:
> > On Mon, 2006-02-06 at 14:41 -0500, Aaron Richton wrote:
> > > That's been on my todo list for over a year now. (So I'll join in the
> > > request for a copy if there is such a script!)
> > >
> > > If anybody does write this, it's important to note that something that
> > > strictly compares contextcsns is likely useless (I think it would just be
> > > a false positive disaster). Replication doesn't happen instantly; there
> > > should be some sort of configurable threshold for "csns should be within
> > > <time>".
> > >
> > >
> > > I've been meaning to ask the list: how many of you check up on your
> > > slaves from a consistency perspective? What do you do? (contextcsn is the
> > > approach I've wanted to take. Every time I get annoyed enough to write a
> > > nagios plugin, I notice that everything is in sync and defer it...)
> >
> > I wrote a very generic python script with exhaustive comments/debugging.
> > It can be modified to be used as a Nagios script plugin.
> >
> > To view a description of the script:
> > $ pydoc ldapSynchCheck
> >
> > To view the help:
> > $ ./ldapSynchCheck.py -h
> >
> I guess you didn't look at the perl extension script for BigBrother/Hobbit 
> that I posted. It assumes that it will be able to:
> 1) read sufficient configuration information from cn=config to be able to 
> determine all the databases using sync-repl, and the master for each 
> database, on any server

This is a good idea. However, some people may not use cn=config yet. We
don't in our production environment.

> 2) read the contextCSN for any database on any server
> anonymously but, because of this, requires absolutely no configuration. For use 
> with Hobbit, it just needs to be run on the hobbit server, and any host in 
> the bb-hosts file just needs 'ol'. Of course, the hobbit server needs to be 
> able to access all the LDAP servers involved.

In my script the default binding is anonymous as well. I just wanted to
have the option to bind with a specific dn.

> You may want to take a look, so a user of your script doesn't need to provide 
> the URIs, but instead can just provide the server to check.
> http://www.zarb.org/~bgmilne/hobbit/
> At present, it only goes yellow (not red), since there's no real way to
> determine if the server being 3 months behind (i.e. you catch the 30 second
> period it takes to replicate the first change to one database in 3 months) is
> severe enough for an error .. but it does show how far ahead (which could
> indicate checkpointing/recovery problems on the master) or behind the slave is
> (so you don't have to compare contextCSNs in your head).

I will take a look at your script.
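
For reference, the threshold comparison discussed earlier in the thread
(contextCSNs "within <time>" rather than strictly equal) could be sketched
roughly as below. This is only an illustration, not code from either script.
The function names and the 60-second default are made up for the example, and
it assumes a contextCSN begins with a UTC timestamp of the form
YYYYmmddHHMMSS, optionally with fractional seconds, followed by "Z#". The
exit codes 0 and 1 are the usual Nagios plugin values for OK and WARNING.

```python
from datetime import datetime, timedelta

# Standard Nagios plugin exit codes.
OK = 0
WARNING = 1


def csn_time(csn):
    """Return the timestamp part of a contextCSN value as a datetime.

    Assumes the value looks like "20060210094500Z#000001#00#000000"
    (fractional seconds before the "Z" are dropped if present).
    """
    stamp = csn.split("#", 1)[0]          # "20060210094500Z"
    stamp = stamp.rstrip("Z")             # "20060210094500"
    stamp = stamp.split(".", 1)[0]        # ignore fractional seconds
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


def replication_lag(master_csn, slave_csn):
    """Positive when the slave is behind the master, negative when ahead."""
    return csn_time(master_csn) - csn_time(slave_csn)


def check_sync(master_csn, slave_csn, threshold=timedelta(seconds=60)):
    """Return (exit_code, message) for a master/slave contextCSN pair.

    The pair is considered in sync when the timestamps differ by no more
    than `threshold`, in either direction.
    """
    lag = replication_lag(master_csn, slave_csn)
    seconds = int(abs(lag).total_seconds())
    direction = "behind" if lag >= timedelta(0) else "ahead of"
    detail = "slave is %d s %s master" % (seconds, direction)
    if abs(lag) <= threshold:
        return OK, "OK: " + detail
    return WARNING, "WARNING: " + detail


if __name__ == "__main__":
    # Hypothetical values; a real check would read contextCSN from each server.
    status, message = check_sync("20060210094500Z#000001#00#000000",
                                 "20060210094430Z#000001#00#000000")
    print(message)
```

Reporting the direction as well as the size of the difference keeps the
"ahead" case visible, which as noted above may point at a problem on the
master rather than on the slave.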

> I could take a look at making it work for nagios, but we're phasing nagios 
> out, and the only LDAP servers monitored for anything by nagios don't use 
> sync-repl.

I am curious: what are you going to replace Nagios with?

Thanks for your valuable comments.

Best Regards,