misconfigured read-only replica causes master slapd to crash



Hi,

We run a single master and two read-only replicas, with back-bdb on
all systems. Each replica replicates from the master via syncrepl with
type=refreshAndPersist. During a particularly heavy update load
recently, replication on one of the replicas started to fail because of
a misconfigured DB_CONFIG (its lock table was too small). The replica
wrote the following messages to its log repeatedly:

  Dec 14 04:01:08 pip-dev slapd[12645]: bdb(dc=csupomona,dc=edu): Lock table is out of available lock entries
  Dec 14 04:01:08 pip-dev slapd[12645]: => bdb_idl_delete_key: c_get failed: Cannot allocate memory (12)
  Dec 14 04:01:08 pip-dev slapd[12645]: conn=-1 op=0: attribute "memberUid" index delete failure
  Dec 14 04:01:08 pip-dev slapd[12645]: null_callback : error code 0x50
  Dec 14 04:01:08 pip-dev slapd[12645]: syncrepl_entry: rid=001 be_modify failed (80)
  Dec 14 04:01:08 pip-dev slapd[12645]: do_syncrepl: rid=001 rc 80 retrying

as it tried and failed to start replication again.

Shortly afterwards, the master slapd crashed without logging anything
to indicate why (or even that it had crashed at all). We first noticed
this behavior with a 2.4.26 master and a 2.4.28 read-only replica (we
hit the problem during maintenance, which is why the versions differ).
I then reproduced it on a 2.4.28 master while researching ITS #7113 [1],
which describes the problem more precisely and in more detail.

Has anyone else run into this issue? Is there a good way to insulate
the master slapd from misconfigured replicas? Our replicas shouldn't
break like this (we've since tuned our DB_CONFIG so that they don't in
the future), and hopefully slapd can be modified so that the master
doesn't crash even if a replica does break. In the meantime, we'd rather
not have to worry about the master crashing if our DB_CONFIG proves
inadequate.
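
For anyone hitting the same "Lock table is out of available lock
entries" error: the Berkeley DB settings involved live in the DB_CONFIG
file in the database directory. A minimal sketch follows. The numbers
are placeholders for illustration only (assumptions, not the values from
our systems) and should be sized against the lock statistics reported by
db_stat -c under your own update load:

  # DB_CONFIG (illustrative values only; assumptions, not our settings)
  # Maximum number of locks the environment can hold at once
  set_lk_max_locks    30000
  # Maximum number of objects that can be locked at once
  set_lk_max_objects  30000
  # Maximum number of simultaneous lockers
  set_lk_max_lockers  3000

As far as I understand it, changes to these settings only take effect
once the BDB environment is recreated (for example, by running
db_recover on the database directory with slapd stopped).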

[1] http://www.openldap.org/its/index.cgi/Incoming?id=7113

Thanks for any help,
-- 
Kevan Carstensen                        <kacarstensen@csupomona.edu>
Operating Systems Analyst, I&IT Systems, Cal Poly Pomona