
syncrepl failure



Hi

I have a master/replica setup that has been working for years. Recently,
I discovered that a replica sometimes failed to return existing entries:
no error, it just returned nentries=0 for a few queries, then went back
to normal.

I suspected some database corruption, so I killed slapd, removed the
databases, and restarted slapd so that it would resync from the master.
It pulled a few hundred records and then got an up-to-date contextCSN,
although a lot of entries are still missing. Restarting slapd now shows
an outright failure in syncrepl:
Sep 28 06:34:29 motul slapd[2901]: do_syncrepl: rid=217 rc -2 retrying

How do I debug that? I had the problem with OpenLDAP-2.4.21 /
db-4.7.25.3. I tried upgrading the replica to OpenLDAP-2.4.32 /
db-4.8.30, but it did not change anything. Is there a chance that
upgrading the master (which also runs 2.4.21) will help?

During the short time syncrepl runs, the logs are filled with messages
like this one. I wonder whether it is a related problem:
Sep 28 03:30:16 motul slapd[269]: conn=-1 op=0 => bdb_dn2id_add
dn="uid=user,ou=foo,dc=example,dc=net" ID=0x7c: put failed:
DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock -30994

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@netbsd.org