
malloc: Cannot allocate memory



Hey all,

OpenLDAP 2.3 has been in production for several weeks now, and today I experienced my first slapd crash. My loglevel is set to 256, and I was able to find this in the logs.

It started with a bunch of these:
Nov 2 18:00:41 ldap1 slapd[1525]: bdb(dc=fuse,dc=net): malloc: Cannot allocate memory: 377


And finally ended with:
Nov 2 18:00:41 ldap1 slapd[1525]: ch_calloc of 1 elems of 392 bytes failed


I was then paged, restarted slapd, and it recovered the database automatically. Nice feature in 2.3, by the way! It's been running smoothly since then.

Separately, another LDAP server went down yesterday, and that system died completely: it just beeps when powered on. Dell said the beeps mean it detects no memory, so either all 4 memory modules or the motherboard failed. They are sending someone over tomorrow. This makes me suspect the servers may have shipped with a batch of bad memory. But in case that isn't the cause, I was hoping someone had some suggestions for me.

Here is some relevant info.

The DB has about 400,000 DNs, each with about 5 attributes. I'm running OpenLDAP 2.3.7 with a syncprov.c patch, and BDB 4.2 with 4 patches from Sleepycat and one from the OpenLDAP distribution. This is on a FreeBSD 5.4 machine with two 2.8 GHz CPUs and 2 GB of RAM, and I built the distribution from source. The machine that died is a syncrepl slave using refreshAndPersist.

From 12:00 AM to 8:00 PM today, that machine handled 48,612 connections.
They are mostly simple binds, each followed by an equality search on an indexed attribute.

In slapd.conf I have the following.

The backend is bdb.

Six attributes are indexed for equality:
index   objectClass     eq
index   uid             eq
index   radiusGroupName eq
index   accountNumber   eq
index   entryUUID       eq
index   entryCSN        eq

cachesize       100000
idlcachesize    300000
checkpoint      1024 5

Now that I think about it, those 100,000/300,000 values might be a bit high. What do you think? Could they cause the memory error? I'm not sure whether these caches are related to the cachesize in DB_CONFIG; I'm assuming they use extra memory outside the DB_CONFIG cache size. Is that correct?

My DB_CONFIG file has the following:

set_cachesize 0 536870912 1
set_lg_regionmax 1048576
set_lg_bsize 2097152
set_lg_max 10485760
set_flags DB_LOG_AUTOREMOVE
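For reference, the three arguments to set_cachesize are gigabytes, additional bytes, and the number of cache regions, so the line above asks Berkeley DB for 0 GB + 536,870,912 bytes (512 MiB) in a single region. A minimal sketch of that arithmetic, in both directions, might look like this (the helper names are made up for illustration):

```python
# Hypothetical helpers: convert a Berkeley DB DB_CONFIG
# "set_cachesize <gbytes> <bytes> <ncache>" line to a total byte count,
# and build such a line from a desired total.

GiB = 2**30


def parse_set_cachesize(line: str) -> tuple[int, int]:
    """Return (total_bytes, ncache) for a set_cachesize line."""
    parts = line.split()
    if len(parts) != 4 or parts[0] != "set_cachesize":
        raise ValueError(f"not a set_cachesize line: {line!r}")
    gbytes, nbytes, ncache = (int(p) for p in parts[1:])
    return gbytes * GiB + nbytes, ncache


def format_set_cachesize(total_bytes: int, ncache: int = 1) -> str:
    """Split a byte total into the gbytes/bytes pair set_cachesize expects."""
    gbytes, nbytes = divmod(total_bytes, GiB)
    return f"set_cachesize {gbytes} {nbytes} {ncache}"


if __name__ == "__main__":
    total, ncache = parse_set_cachesize("set_cachesize 0 536870912 1")
    print(f"{total} bytes = {total // 2**20} MiB in {ncache} region(s)")
    # A 768 MiB cache, for example, would be written as:
    print(format_set_cachesize(768 * 2**20))
```

This only does the unit conversion; it says nothing about how large the cache should be on this machine.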

Since I have 2 GB of RAM, I could raise that 512 MB cachesize if needed and maybe lower the cachesize in slapd.conf. Do you think that would help?
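To reason about that trade-off, here is a rough back-of-the-envelope sketch. It assumes, as the question above does, that the slapd.conf entry cache and IDL cache are allocated on top of the DB_CONFIG Berkeley DB cache. The average per-entry and per-IDL sizes are hypothetical placeholders, not measured values, and the estimate ignores allocator overhead, fragmentation, and thread stacks.

```python
# Rough estimate of slapd's steady-state memory, ASSUMING the slapd.conf
# entry and IDL caches are separate from (and additional to) the
# Berkeley DB cache set in DB_CONFIG. Per-item sizes are HYPOTHETICAL.

MiB = 2**20


def estimate_slapd_bytes(bdb_cache_bytes: int,
                         entry_cache_entries: int, avg_entry_bytes: int,
                         idl_cache_slots: int, avg_idl_bytes: int) -> int:
    """Sum the BDB cache plus the two slapd-level caches."""
    entry_cache = entry_cache_entries * avg_entry_bytes
    idl_cache = idl_cache_slots * avg_idl_bytes
    return bdb_cache_bytes + entry_cache + idl_cache


if __name__ == "__main__":
    total = estimate_slapd_bytes(
        bdb_cache_bytes=512 * MiB,     # set_cachesize 0 536870912 1
        entry_cache_entries=100_000,   # slapd.conf cachesize
        avg_entry_bytes=2048,          # hypothetical average entry size
        idl_cache_slots=300_000,       # slapd.conf idlcachesize
        avg_idl_bytes=512,             # hypothetical average IDL size
    )
    print(f"estimated total: {total / MiB:.0f} MiB of 2048 MiB RAM")
```

Plugging in real averages (for example, derived from slapd's resident size after the caches have warmed up) would make the estimate meaningful; with guessed values it only shows which settings dominate.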

Here is the top output, sorted by RES, about an hour after the restart:
Mem: 217M Active, 1051M Inact, 164M Wired, 80M Cache, 112M Buf, 492M Free
Swap: 1024M Total, 1024M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU CPU COMMAND
33669 ldap 20 0 756M 395M kserel 0 0:25 0.00% 0.00% slapd


Any suggestions?

Unfortunately, this is a production machine and one of my other slaves is in the shop, so there isn't much room for experimentation.

I was hoping someone had some insight into the slapd.conf cachesize/idlcachesize settings and the DB_CONFIG settings. Is there anything I could try to help with an issue like this? Any ideas why this could have happened if it's not bad memory in the machine?

Any help/advice/suggestions/etc... is appreciated.

-Dusty Doris