
Re: syncrepl failure monitoring



--On Thursday, April 19, 2007 3:47 PM -0700 Donn Cave <donn@u.washington.edu> wrote:

> We use slurpd, and I have gone to some pains to make our home-grown
> service monitor software check the replication files, on the master
> hosts, so we have timely notification when replication has stalled.
>
> How do sites that use syncrepl do this?

Buchan Milne wrote a nice Hobbit plugin that monitors replication status for syncrepl. I bastardized his script to make it work for me with Nagios.


> For example, my new replica is failing right away.  I can see it in
> the master syslog:  a bind, a search for * +, then a search result
> with err=3.  On the replica side, however, not a peep.
>
> After a little tinkering, I can get "do_syncrep2 result: rid=101 Timed out",
> but that requires changes to the code.  This exercise convinced me that
> the syncrepl engine isn't supposed to syslog success or failure of its
> queries, presumably for some good reason, and that there must be a better
> way to diagnose problems.
>
> The monitoring objective is to verify that the server is either synched,
> or is making satisfactory progress in that direction.  Is there a good way
> to monitor the state of that syncrepl thread?

Yes, it is quite simple. One merely looks at the contextCSN values at the root of the database on both the master and slave. If they match, things are in sync. If they don't, they aren't.
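
As a rough illustration of that check (a sketch, not the Hobbit or Nagios plugin mentioned above), something along these lines would do it in Python. It assumes the stock ldapsearch client is on the PATH, that contextCSN at the suffix entry is readable with an anonymous simple bind, and that the timestamp is the part of the CSN before the first "#". The function names, host URIs, and base DN are hypothetical placeholders.

```python
import subprocess
from datetime import datetime, timezone


def parse_ldif_csns(ldif):
    """Collect every contextCSN value found in ldapsearch LDIF output."""
    return {
        line.split(":", 1)[1].strip()
        for line in ldif.splitlines()
        if line.lower().startswith("contextcsn:")
    }


def fetch_context_csns(uri, base_dn):
    """Read contextCSN from the suffix entry of one server.

    Assumes an anonymous simple bind (-x) is allowed to read it.
    """
    result = subprocess.run(
        ["ldapsearch", "-x", "-LLL", "-H", uri,
         "-s", "base", "-b", base_dn, "contextCSN"],
        capture_output=True, text=True, check=True, timeout=30,
    )
    return parse_ldif_csns(result.stdout)


def csn_time(csn):
    """Parse the leading timestamp of a CSN (with or without fractional seconds)."""
    stamp = csn.split("#", 1)[0].rstrip("Z")
    fmt = "%Y%m%d%H%M%S.%f" if "." in stamp else "%Y%m%d%H%M%S"
    return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)


def compare_csns(master, replica):
    """Return (in_sync, lag_seconds).

    in_sync is True only when both servers report identical CSN sets.
    lag_seconds is how far the newest replica CSN trails the newest
    master CSN, or None when either side returned no CSN at all.
    """
    if not master or not replica:
        return (False, None)
    newest_master = max(csn_time(c) for c in master)
    newest_replica = max(csn_time(c) for c in replica)
    return (master == replica, (newest_master - newest_replica).total_seconds())
```

A monitor would call fetch_context_csns() once against the master and once against the replica (for instance with "ldap://master.example.com", "ldap://replica.example.com" and "dc=example,dc=com", all placeholders) and pass both results to compare_csns(). Alerting only when the lag stays above a threshold for a while avoids false alarms on a replica that is merely catching up.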


--Quanah



--
Quanah Gibson-Mount
Senior Systems Software Developer
ITS/Shared Application Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html