
(ITS#8147) syncrepl fails when encountering bad data, master leaks memory



Full_Name: Mark Bannister
Version: 2.4.30
OS: Oracle Solaris 11.2
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (205.228.82.171)


I have a master server and a replica configured with syncrepl in
refreshAndPersist mode.  I'm using bdb (not mdb), simple auth (not SASL),
standard syncrepl (not delta-syncrepl).

I noticed that the directory has about 5 erroneous entries (out of 300,000)
in which a multi-valued attribute contains two identical values.  These
entries were added by slapadd -q, which performs fewer consistency checks on
the input data.  Here is an example:

dn: cn=test,ou=rpc,dc=mycompany,dc=com
objectClass: oncRpc
cn: test
cn: test
oncRpcNumber: 12345678
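Entries like this can be found before they reach a replica by scanning an LDIF export for repeated values. Below is a minimal sketch, not an OpenLDAP tool: find_dup_values is a hypothetical shell function that assumes a POSIX-compatible awk (on Solaris, possibly nawk or /usr/xpg4/bin/awk). It unfolds LDIF continuation lines, compares attribute names case-insensitively and values byte-for-byte, and does not decode base64, so values that differ only in case (which cn's caseIgnoreMatch rule treats as equal) are not reported.

```shell
# find_dup_values: hypothetical helper (not part of OpenLDAP).
# Reads LDIF on stdin and prints "<dn line> : duplicate <attr>" for every
# repeated attribute value within an entry.
# Assumptions: attribute names compare case-insensitively, values compare
# byte-for-byte (so "test"/"Test" are NOT flagged even though cn uses
# caseIgnoreMatch), and base64 values are not decoded.
find_dup_values() {
    awk '
    # Examine one unfolded logical line of the current entry.
    function check(line,    sep, attr, val, kind, key) {
        if (line == "" || substr(line, 1, 1) == "#") return
        sep = index(line, ":")
        if (sep == 0) return
        attr = tolower(substr(line, 1, sep - 1))
        if (attr == "dn") { dn = line; return }
        val = substr(line, sep + 1)
        kind = ""
        # Remember the value type: "::" is base64, ":<" is a URL.
        if (val ~ /^[:<]/) { kind = substr(val, 1, 1); val = substr(val, 2) }
        sub(/^ +/, "", val)
        key = attr SUBSEP kind SUBSEP val
        if (key in seen) print dn " : duplicate " attr
        else seen[key] = 1
    }
    # A blank line ends the entry: flush and reset per-entry state.
    /^$/ { check(pending); pending = ""; dn = ""; split("", seen); next }
    # A line starting with one space continues the previous line.
    /^ / { pending = pending substr($0, 2); next }
    { check(pending); pending = $0 }
    END { check(pending) }
    '
}
```

For example, after sourcing the function, running slapcat -f /etc/openldap/slapd.conf | find_dup_values should list the affected entries; for the entry above it prints "dn: cn=test,ou=rpc,dc=mycompany,dc=com : duplicate cn".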

When the replica attempts to copy this data using syncrepl from the master
server, it fails.  All entries up to that point are synchronised fine, but any
entries from that point onwards are missing.  I didn't see any log entries
telling me about this failure, although I admit I didn't look very hard or tweak
the log levels.

This then causes a memory leak in the master server:

$ while :; do ps -p 9025 -o pid,ppid,pmem,rss,vsz,args | tail +2; sleep 60; done
9025     1  5.2 434204 463892 /usr/lib/slapd -f /etc/openldap/slapd.conf -u openldap -g openldap -h ldap:///
9025     1  5.3 443216 472900 /usr/lib/slapd -f /etc/openldap/slapd.conf -u openldap -g openldap -h ldap:///
9025     1  5.4 444384 474076 /usr/lib/slapd -f /etc/openldap/slapd.conf -u openldap -g openldap -h ldap:///
9025     1  5.5 454680 484364 /usr/lib/slapd -f /etc/openldap/slapd.conf -u openldap -g openldap -h ldap:///
9025     1  5.5 458288 487972 /usr/lib/slapd -f /etc/openldap/slapd.conf -u openldap -g openldap -h ldap:///
9025     1  5.6 464996 494684 /usr/lib/slapd -f /etc/openldap/slapd.conf -u openldap -g openldap -h ldap:///
9025     1  5.7 472952 502636 /usr/lib/slapd -f /etc/openldap/slapd.conf -u openldap -g openldap -h ldap:///

... until memory is exhausted, and I get:

ch_malloc of 606572 bytes failed

... followed by a 3GB core dump file.
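The ps samples above show RSS climbing from 434204 KB to 472952 KB over six 60-second intervals, i.e. (472952 - 434204) / 6 = 6458 KB, or roughly 6.3 MB, per minute. A minimal sketch for computing that rate from a saved log of the monitoring loop follows; rss_growth is a hypothetical helper, and it assumes Solaris ps reports rss in kilobytes and that samples are taken at a fixed interval.

```shell
# rss_growth: hypothetical helper that summarises output of the ps loop
# above. Expects lines whose 4th field is rss (in KB, as reported by
# Solaris ps -o rss); other lines, such as wrapped argument lists, are
# ignored. $1 is the sampling interval in seconds (default 60).
rss_growth() {
    awk -v interval="${1:-60}" '
    # Keep the first and last numeric rss values and count the samples.
    $4 ~ /^[0-9]+$/ {
        if (n == 0) first = $4
        last = $4
        n++
    }
    END {
        if (n < 2) { print "need at least two samples"; exit 1 }
        # Average growth = total rss change / elapsed minutes.
        printf "%.0f KB/min\n", (last - first) / ((n - 1) * interval / 60)
    }'
}
```

For example, if the loop's output is redirected to a file such as rss.log (a hypothetical name), rss_growth 60 < rss.log on the seven samples shown prints "6458 KB/min". Using only the first and last samples keeps the sketch simple; a least-squares fit would be less sensitive to a noisy endpoint.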

When the server restarts, the cycle repeats, ad infinitum, until the
filesystem is full of core dump files.

When I remove the 5 duplicate attribute values and restart the master and
replica servers, the entire directory is then successfully replicated and the
memory footprint of the slapd process remains stable.  I therefore assume
that the duplicate attribute values were both preventing syncrepl from
completing and causing the memory leak in the master server.

I did see that a number of memory leaks have been fixed since 2.4.30, but
none of them matched this profile.  Sorry, I don't have a newer version of
OpenLDAP to hand to re-test.