
Re: (delta-)syncrepl and nagios



On Friday 10 February 2006 16:54, Samuel Tran wrote:
> On Fri, 2006-02-10 at 09:45 +0200, Buchan Milne wrote:
> > On Thursday 09 February 2006 19:57, Samuel Tran wrote:
> > > On Mon, 2006-02-06 at 14:41 -0500, Aaron Richton wrote:
> > > > That's been on my todo list for over a year now. (So I'll join in the
> > > > request for a copy if there is such a script!)
> > > >
> > > > If anybody does write this, it's important to note that something
> > > > that strictly compares contextcsns is likely useless (I think it
> > > > would just be a false positive disaster). Replication doesn't happen
> > > > instantly; there should be some sort of configurable threshold for
> > > > "csns should be within <time>".
> > > >
> > > >
> > > > I've been meaning to ask the list: how many of you check up on your
> > > > slaves from a consistency perspective? What do you do? (contextcsn is
> > > > the approach I've wanted to take. Every time I get annoyed enough to
> > > > write a nagios plugin, I notice that everything is in sync and defer
> > > > it...)
> > >
> > > I wrote a very generic python script with exhaustive
> > > comments/debugging. It can be modified to be used as a Nagios script
> > > plugin.
> > >
> > > To view a description of the script:
> > > $ pydoc ldapSynchCheck
> > >
> > > To view the help:
> > > $ ./ldapSynchCheck.py -h
> >
> > I guess you didn't look at the perl extension script for
> > BigBrother/Hobbit that I posted. It assumes that it will be able to:
> > 1)read sufficient configuration information from cn=config to be able to
> > determine all the databases using sync-repl, and the master for each
> > database, on any server
>
> This is a good idea. However some people may not use cn=config yet. We
> don't in our production environment.

Note that I made a mistake here: that should be cn=monitor, not cn=config.

> > 2)read the contextCSN for any database on any server
> > anonymously, but, due to this, requires absolutely no configuration. For
> > use with Hobbit, it just needs to be run on the hobbit server, and any
> > host in the bb-hosts file just needs 'ol'. Of course, the hobbit server
> > needs to be able to access all the LDAP servers involved.
>
> In my script the default binding is anonymous as well. I just wanted to
> have the option to bind with a specific dn.
>
> > You may want to take a look, so a user of your script doesn't need to
> > provide the URIs, but instead can just provide the server to check.
> >
> > http://www.zarb.org/~bgmilne/hobbit/
> >
> > At present, it only goes yellow (not red), since there's no real way to
> > determine if the server being 3 months behind (i.e. you catch the 30 second
> > period it takes to replicate the first change to one database in 3
> > months) is severe enough for an error .. but it does show how far ahead
> > (which could indicate checkpointing/recovery problems on the master) or
> > behind the slave is (so you don't have to compare contextCSNs in your
> > head).
>
> I will take a look at your script.
>
> > I could take a look at making it work for nagios, but we're phasing
> > nagios out, and the only LDAP servers monitored for anything by nagios
> > don't use sync-repl.
>
> I am curious, what are you going to replace Nagios with?

Some proprietary monitoring infrastructure, which will be standard across the 
entire organisation and will cater to notification, escalation etc etc, and 
internally we will use Hobbit (to monitor aspects which are not going to be 
escalated out of our environment, and for capacity planning purposes).

Note that some of the decisions are based on politics, and how our Nagios 
implementation is managed, and the fact that we could roll out the 
performance/capacity monitoring features of Hobbit (including this script 
with graphing via Hobbit's ncv collector) in less time than anyone 
maintaining the Nagios installation could show that it could be done for even 
basic checks.

AFAIK, most people using Nagios also use a second tool for graphing/trending 
purposes (e.g. Cricket). We'd prefer to have just one tool that does both 
jobs just as well (or, in some cases, better in our environment).
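
For anyone who wants to experiment with the threshold idea raised earlier in
this thread (contextCSNs should be within some configurable time of each
other, rather than strictly equal), here is a minimal Python sketch. It is
not taken from either script discussed here. The function names and the
60 second default are hypothetical, and it assumes the first '#'-separated
field of a contextCSN is a UTC timestamp such as 20060210145400Z, optionally
with fractional seconds. Fetching the contextCSN values from the servers is
left out.

```python
from datetime import datetime, timedelta, timezone


def parse_csn_time(csn):
    """Return the timestamp part of a contextCSN as an aware UTC datetime.

    Assumption: the CSN looks like '20060210145400Z#000001#00#000000',
    possibly with fractional seconds ('20060210145400.123456Z#...').
    """
    stamp = csn.split("#", 1)[0]
    fmt = "%Y%m%d%H%M%S.%fZ" if "." in stamp else "%Y%m%d%H%M%SZ"
    return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)


def replication_lag(master_csn, slave_csn):
    """Positive result: slave is behind the master. Negative: slave is ahead."""
    return parse_csn_time(master_csn) - parse_csn_time(slave_csn)


def check_sync(master_csn, slave_csn, threshold=timedelta(seconds=60)):
    """Return (status, lag); status is 'OK' when the lag is within threshold.

    The threshold default is an arbitrary placeholder, not a recommendation.
    """
    lag = replication_lag(master_csn, slave_csn)
    status = "OK" if abs(lag) <= threshold else "WARNING"
    return status, lag


if __name__ == "__main__":
    master = "20060210145400Z#000001#00#000000"
    slave = "20060210145330Z#000000#00#000000"
    status, lag = check_sync(master, slave)
    print(status, lag.total_seconds())
```

A slave that is ahead of its master gives a negative lag, so the same value
covers both the "ahead" and "behind" cases mentioned above. Whether a large
lag deserves a warning or an error is still a policy decision the sketch
does not try to make.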

Regards,
Buchan

-- 
Buchan Milne
ISP Systems Specialist
B.Eng,RHCE(803004789010797),LPIC-2(LPI000074592)
