
Memory leak in 2.4.31 w/ hdb and MMR?



I've got my two-node setup running: MMR w/ delta-syncrepl. Node 1 is up with ~340k entries in the main DIT. slapd consumes about 4G of virtual memory on a Redhat AS6 box; the data dir is around 2.3G on disk (including __db.* and the one log.* file), with another 100M in cn=accesslog.

I'm bringing up node 2 after nuking the data dir, and letting it syncrepl from nothing. I was tracking a SIGBUS error, but that turned out to be olcLogLevel=Stats+Sync generating a huge amount of logging and filling up the disk. Fixed that, moved on.

Now it's still crashing, but that's because slapd keeps growing until the machine runs out of virtual memory and the process gets killed. I've added about 8G of temporary swap, and it's still growing. slapd on node 2 is 5.5G resident and nearly 15G in total size now. On disk the hdb is only around 1.6G, so it still has a long way to go.
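To put numbers on the growth over time, here is a minimal sketch that samples the resident and total size of a process from /proc/<pid>/status on Linux. The function names (parse_vm_fields, format_gib, watch) are made up for illustration, and the script assumes the VmSize and VmRSS fields are reported in kB, as they are on current Linux kernels.

```python
import re
import time


def parse_vm_fields(status_text):
    """Extract VmSize and VmRSS (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        m = re.match(r"(VmSize|VmRSS):\s+(\d+)\s+kB", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def format_gib(kb):
    """Render a kB count in G (1G = 1024 * 1024 kB), one decimal place."""
    return "%.1fG" % (kb / (1024.0 * 1024.0))


def watch(pid, interval=60, samples=10):
    """Print one timestamped size sample every `interval` seconds."""
    for _ in range(samples):
        with open("/proc/%s/status" % pid) as fh:
            vm = parse_vm_fields(fh.read())
        print("%s total=%s resident=%s" % (
            time.strftime("%H:%M:%S"),
            format_gib(vm.get("VmSize", 0)),
            format_gib(vm.get("VmRSS", 0)),
        ))
        time.sleep(interval)
```

Calling watch() with the slapd pid while the consumer is syncing would show whether the total size climbs steadily or levels off.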

Can I assume this is not the way it should be?

vmstat shows a lot of swap-out but almost no swap-in, so it's not thrashing. pmap shows a large number of 64M "anon" segments. 140 of those at one point, and 157 when I checked just now.
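The segment count can also be taken straight from /proc/<pid>/maps instead of reading pmap output by eye. This is a rough sketch, not a diagnostic tool: count_anon_mappings is a hypothetical helper, and it assumes the segments of interest are exactly 64M and have no pathname field (which also excludes [heap] and [stack]). If the kernel splits a region into several mappings with different permissions, the exact-size match will miss it and size_bytes would need adjusting.

```python
def count_anon_mappings(maps_text, size_bytes=64 * 1024 * 1024):
    """Count anonymous mappings of exactly size_bytes in /proc/<pid>/maps text.

    Each maps line is: start-end perms offset dev inode [pathname].
    Anonymous mappings have no pathname, so they split into exactly 5 fields.
    """
    count = 0
    for line in maps_text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # file-backed, [heap], [stack], or blank line
        start, end = (int(x, 16) for x in parts[0].split("-"))
        if end - start == size_bytes:
            count += 1
    return count
```

Reading /proc/<pid>/maps for the slapd process and passing the text to count_anon_mappings() at intervals gives a number to compare against the 140 and 157 seen in pmap.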

It looks a lot like a memory leak, though I can't tell offhand whether the problem is in OpenLDAP (2.4.31) or in BerkeleyDB (5.3.15). When it finishes, I'm planning on turning off delta-syncrepl and paving/rebuilding again to see whether it behaves the same. I could also give mdb a shot, but since this is a mirror I'd have to rebuild both sides.

Any suggestions as to where I could start looking for the source of the problem? Obviously I'm not planning on rebuilding this node on a regular basis (and certainly not all via syncrepl) but I'm concerned that over an extended period of time it'll leak memory even during normal use.

I'm including my DB_CONFIG, just in case. I can supply more of the config as necessary.

DB_CONFIG:

set_cachesize 0 536870912 0
set_lg_regionmax 10485760
set_lg_max 104857600
set_lg_bsize 26214400
set_lk_max_locks 4096
set_lk_max_objects 4096
set_flags DB_LOG_AUTOREMOVE
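
For readability, the byte values above work out to a 512M cache, a 10M log region, a 100M maximum log file size and a 25M log buffer. The sketch below shows the arithmetic; db_config_sizes is a hypothetical helper, and it relies on set_cachesize taking its arguments as gbytes, bytes, ncache.

```python
MIB = 1024 * 1024


def db_config_sizes(text):
    """Return {directive: size in bytes} for the size-valued DB_CONFIG lines."""
    sizes = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue
        name, args = parts[0], parts[1:]
        if name == "set_cachesize" and len(args) == 3:
            # Arguments are <gbytes> <bytes> <ncache>; the total is the sum
            # of the first two, and ncache only controls how it is split.
            gbytes, nbytes, _ncache = (int(a) for a in args)
            sizes[name] = gbytes * 1024 ** 3 + nbytes
        elif name in ("set_lg_regionmax", "set_lg_max", "set_lg_bsize") and len(args) == 1:
            sizes[name] = int(args[0])
    return sizes


DB_CONFIG = """\
set_cachesize 0 536870912 0
set_lg_regionmax 10485760
set_lg_max 104857600
set_lg_bsize 26214400
set_lk_max_locks 4096
set_lk_max_objects 4096
set_flags DB_LOG_AUTOREMOVE
"""

for name, nbytes in db_config_sizes(DB_CONFIG).items():
    print("%-18s %4dM" % (name, nbytes // MIB))
```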


(If anything looks particularly stupid in here, even unrelated to the leak, I'd love the advice...)