syncrepl failure monitoring



We use slurpd, and I have gone to some pains to make our home-grown
service monitor software check the replication files, on the master
hosts, so we have timely notification when replication has stalled.

How do sites that use syncrepl do this?

For example, my new replica is failing right away.  I can see it in
the master syslog:  a bind, a search for * +, then a search result
with err=3 (timeLimitExceeded).  On the replica side, however, there
is not a peep.

After a little tinkering, I can get "do_syncrep2 result: rid=101 Timed out",
but that requires changes to the code.  This exercise convinced me that
the syncrepl engine isn't supposed to syslog the success or failure of
its queries, presumably for some good reason, and that there must be a
better way to diagnose problems.


The monitoring objective is to verify that the server is either synched,
or is making satisfactory progress in that direction. Is there a good way
to monitor the state of that syncrepl thread?
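
To make the objective concrete, here is a rough sketch of one possible
check. It is not something established in this message. It assumes that the
master and the replica each expose a contextCSN value for the replicated
suffix, that the value begins with a UTC timestamp, and that the monitor
has already fetched both values by some means not shown here. The function
names (csn_time, sync_status) are made up for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def csn_time(csn: str) -> datetime:
    """Return the UTC timestamp at the front of a contextCSN string.

    Assumed layout: 'YYYYmmddHHMMSS[.ffffff]Z#...'; everything after
    the first '#' is ignored.
    """
    stamp = csn.split("#", 1)[0].rstrip("Z")
    whole, _, frac = stamp.partition(".")
    t = datetime.strptime(whole, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    if frac:
        # Pad or truncate the fractional part to microseconds.
        t = t.replace(microsecond=int(frac.ljust(6, "0")[:6]))
    return t


def sync_status(master_csn: str,
                replica_csn: str,
                previous_replica_csn: Optional[str] = None) -> str:
    """Classify the replica as 'synced', 'progressing', or 'stalled'.

    'progressing' means the replica is behind the master but its CSN
    has advanced since the previous poll (if one is supplied).
    """
    if csn_time(replica_csn) >= csn_time(master_csn):
        return "synced"
    if (previous_replica_csn is not None
            and csn_time(replica_csn) > csn_time(previous_replica_csn)):
        return "progressing"
    return "stalled"
```

A monitor could call sync_status() on each poll, keep the replica's last
value for the next poll, and raise an alert after some number of
consecutive "stalled" results. This only compares timestamps, so it says
nothing about why a replica has stopped.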


Thanks,
	Donn Cave, donn@u.washington.edu