BDB malloc problems and DB corruption



Hello everyone,

This may be more of a problem with my BDB setup but I thought I'd check
here first since the application that's suffering is OpenLDAP.  Here is
my setup:

RHEL 3 AS (2.4.21-37.ELsmp)
OpenLDAP 2.2.29
BerkeleyDB 4.2.52 + patches
Cyrus-SASL-2.1.21
Heimdal 0.7.1
OpenSSL-0.9.7g

System has 4189724672 (4GB) memory.

slapd.conf has:
checkpoint 2048 5
cachesize 50000


DB_CONFIG:
set_cachesize 2 0 1
set_lg_regionmax 262144
set_lg_bsize 2097152

set_lk_max_locks 4000
set_lk_max_lockers 4000
set_lk_max_objects 4000

/etc/sysctl.conf:
kernel.shmmax = 3221225472 (3GB)
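
To put the settings above side by side, here is a rough Python sketch of the arithmetic. It assumes the usual reading of the DB_CONFIG line, set_cachesize <gbytes> <bytes> <ncache>, so "2 0 1" asks for a 2 GB cache in a single region. The helper names are made up for illustration, and the sketch does not account for the slapd entry cache (cachesize 50000), whose memory use depends on entry size. Whether shmmax matters at all depends on whether BDB is using a shared memory region, which is not stated here.

```python
GIB = 1024 ** 3

# Values copied from the configuration listed above.
PHYSICAL_MEMORY = 4189724672   # bytes reported for the system
SHMMAX = 3221225472            # kernel.shmmax from /etc/sysctl.conf


def parse_set_cachesize(line):
    """Parse a DB_CONFIG 'set_cachesize gbytes bytes ncache' line.

    Returns (total_cache_bytes, ncache).
    """
    keyword, gbytes, nbytes, ncache = line.split()
    if keyword != "set_cachesize":
        raise ValueError("not a set_cachesize line: %r" % line)
    return int(gbytes) * GIB + int(nbytes), int(ncache)


def summarize(line, physical=PHYSICAL_MEMORY, shmmax=SHMMAX):
    """Return a dict comparing the BDB cache request with system limits."""
    cache_bytes, ncache = parse_set_cachesize(line)
    return {
        "cache_bytes": cache_bytes,
        "ncache": ncache,
        "fraction_of_physical": cache_bytes / physical,
        "fits_under_shmmax": cache_bytes <= shmmax,
        "left_for_everything_else": physical - cache_bytes,
    }


if __name__ == "__main__":
    for key, value in summarize("set_cachesize 2 0 1").items():
        print(key, value)
```

With these numbers the BDB cache alone asks for a bit more than half of physical memory, before counting the slapd entry cache or anything else the process allocates.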

After running for around 24 hours, slapd starts having problems and
mods don't get written:

Nov  8 01:04:45 tau slapd[10839]: conn=251627 op=181 MOD dn="(removed
from email)"

Nov  8 01:04:45 tau slapd[10839]: conn=251627 op=181 MOD attr=
(abbreviated for email) utaStudentMajor utaStudentGradePoints entryCSN
modifiersName modifyTimestamp

Nov  8 01:04:46 tau slapd[10839]: conn=251627 op=181 RESULT tag=103
err=0 text=

Nov  8 01:04:46 tau slapd[10839]: bdb(dc=uta,dc=edu): malloc: Cannot
allocate memory: 3145764

Nov  8 01:04:46 tau slapd[10839]: bdb(dc=uta,dc=edu): txn_checkpoint:
failed to flush the buffer cache Cannot allocate memory

Nov  8 01:04:46 tau slapd[10839]: conn=251627 op=182 MOD dn="(removed
from email)"

Nov  8 01:04:46 tau slapd[10839]: conn=251627 op=182 MOD
attr=objectClass utaStudentStatus utaDiscloseInfo entryCSN modifiersName
modifyTimestamp

Nov  8 01:04:46 tau slapd[10839]: conn=251627 op=182 RESULT tag=103
err=0 text=

Nov  8 01:04:46 tau slapd[10839]: bdb(dc=uta,dc=edu): malloc: Cannot
allocate memory: 3145764

Nov  8 01:04:46 tau slapd[10839]: bdb(dc=uta,dc=edu): txn_checkpoint:
failed to flush the buffer cache Cannot allocate memory

(etc etc etc)


I recently wrote a fairly simple consistency checker that looks up
a set of DNs on all the slaves and the master and compares them.  What
concerned me was that for some of the entries, the SLAVES had the latest
version with the latest changes (verified against our registry
database) but the master (which is the only one that can write to the
slaves) had an outdated entry (even the modifyTimestamp was older).
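
The comparison step of such a check can be sketched as below. This is not the actual checker, only an illustration in Python: it assumes the entries have already been fetched from each server into plain dicts mapping DN to modifyTimestamp, and that the timestamps are all in the same UTC generalized-time form (e.g. "20051108010446Z"), so that plain string comparison orders them correctly. The function name, DNs, and sample values are hypothetical.

```python
def find_stale_on_master(master, slaves):
    """Report DNs whose copy on the master is older than on some slave.

    master: {dn: modifyTimestamp}
    slaves: {slave_name: {dn: modifyTimestamp}}
    Returns {dn: [names of slaves holding a newer copy]}.
    """
    stale = {}
    for dn, master_ts in master.items():
        # A slave missing the DN yields "", which never compares newer.
        newer = sorted(
            name
            for name, entries in slaves.items()
            if entries.get(dn, "") > master_ts
        )
        if newer:
            stale[dn] = newer
    return stale


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real directory.
    master = {
        "uid=a,dc=example": "20051107230000Z",
        "uid=b,dc=example": "20051108010446Z",
    }
    slaves = {
        "slave1": {
            "uid=a,dc=example": "20051108010445Z",
            "uid=b,dc=example": "20051108010446Z",
        },
        "slave2": {
            "uid=a,dc=example": "20051107230000Z",
        },
    }
    print(find_stale_on_master(master, slaves))
```

Any DN this reports is one where a slave is ahead of the master, which is the situation described above.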

Since keeping OpenLDAP running is just one of my two dozen roles, I'll
admit that I may have set the cachesize a bit high, but I assumed it
would take that to mean "use all you want."  But with malloc failures,
I'm wondering if it's a simple matter of my cache sizes being too high or if
there is maybe a memory leak or something else I'm missing.

Any help or insight would be appreciated.

-- DK