Memory leak or index corruption?



This has been a problem for the last handful of releases (through
2.1.17).

I run a simple "whitepages" server: simple authentication (no SASL,
etc.) for a small number of updates performed daily on the local
machine by the directory manager.  All outside queries use anonymous
binds (virtually all coming from a CGI script using Net::LDAP).

Solaris 8, gcc 3.2, Berkeley DB 2.1.15, configuration:
  --enable-wrappers --enable-phonetic --enable-rewrite
  --enable-meta --enable-ldap --enable-rlookups --enable-ldbm

For BDB I've set cachesize in DB_CONFIG to 50 MB.  The .bdb files
themselves consume approx. 300 MB.

There is a single one-way suffix rewriting rule to accommodate an
old-style search base (<OLDBASE>):

  database meta
  suffix "ou=<OLDBASE>"
  lastmod off
  uri "ldap:///ou=<OLDBASE>"
  rewriteEngine on
  rewriteContext default
  rewriteRule "(.*)<OLDBASE>" "%1<NEWBASE>" ":"
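
For readers unfamiliar with the rewrite engine, the effect of that rule
can be approximated in a few lines of Python.  This is only a sketch: the
two base DNs below are hypothetical stand-ins for <OLDBASE> and <NEWBASE>,
the helper name rewrite_dn is invented for illustration, and
case-insensitive matching is assumed to mirror the engine's default.

```python
import re

# Hypothetical stand-ins for <OLDBASE> and <NEWBASE>; not the real values.
OLD_BASE = "ou=oldbase,o=example"
NEW_BASE = "ou=people,dc=example,dc=org"

# Same shape as the rule's pattern "(.*)<OLDBASE>".
# Case-insensitive matching is an assumption about the engine's default.
PATTERN = re.compile(r"(.*)" + re.escape(OLD_BASE), re.IGNORECASE)


def rewrite_dn(dn):
    """Map a DN under the old base to the new base, once, one way only."""
    match = PATTERN.match(dn)
    if match is None:
        # DNs that do not contain the old base pass through untouched.
        return dn
    # "%1<NEWBASE>": keep whatever preceded the old base, append the new one.
    return match.group(1) + NEW_BASE


print(rewrite_dn("cn=Jane Doe,ou=oldbase,o=example"))
```

The sketch only considers DNs that end in the old base, which is the case
the rule is meant for.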

A couple of access-control entries attempt to keep some information
local:

  access to dn="<NEWBASE>" attrs=<attr1>,<attr2>,...
        by peername="^IP=XXX\.YYY\."            read
        by peername="^IP=127\.0\.0\.1:[0-9]+$$" read
        by anonymous                            search
  access to *
        by * read

The problem:

On starting slapd, memory use climbs a bit and levels off at somewhere
around 60-100 MB.  After approx. one day (about 20k-40k queries), memory
use starts to grow and queries become very slow (an execution trace
indicates that indexes are no longer being used).  If it continues, the
system eventually exhausts VM and slapd crashes.

A dump of a live slapd's memory map while this is happening shows that the
'leak' is on the heap.
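
One way to pin down when the growth begins is to sample the process size
at intervals and flag the first sample that leaves the initial plateau.
The Python sketch below is an illustration only: the function names, the
baseline window and the 1.5x threshold are arbitrary assumptions, and it
relies on the POSIX form of ps (-o rss= -o vsz= -p PID).

```python
import subprocess


def read_mem_kb(pid):
    """Return (rss_kb, vsz_kb) for a process, as reported by POSIX ps."""
    out = subprocess.check_output(
        ["ps", "-o", "rss=", "-o", "vsz=", "-p", str(pid)])
    rss, vsz = out.split()[:2]
    return int(rss), int(vsz)


def first_runaway(samples, baseline_n=10, factor=1.5):
    """Index of the first sample above factor * the plateau, else None.

    The plateau is taken to be the largest of the first baseline_n
    samples; both parameters are arbitrary choices for illustration.
    """
    if len(samples) <= baseline_n:
        return None
    ceiling = factor * max(samples[:baseline_n])
    for i in range(baseline_n, len(samples)):
        if samples[i] > ceiling:
            return i
    return None
```

Calling read_mem_kb(pid) from cron or a sleep loop and logging the result
next to the query count would show whether the growth tracks the number
of queries or elapsed time.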

This happens whether I build and run the database with the bdb backend
or with ldbm (the 'other' BDB).  I do note that not all query
connections log an UNBIND before terminating the connection.
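
To put a number on the missing UNBINDs, the slapd log can be scanned for
connections that close without one.  The Python sketch below assumes
stats-level log lines of the form "conn=N op=M UNBIND" and
"conn=N fd=F closed"; that format is an assumption and may differ between
releases, and the function name is invented for illustration.

```python
import re

# Assumed log line shapes; adjust to match the actual slapd output.
UNBIND_RE = re.compile(r"conn=(\d+) op=\d+ UNBIND")
CLOSED_RE = re.compile(r"conn=(\d+) fd=\d+ closed")


def closed_without_unbind(lines):
    """Return connection ids that were closed without a logged UNBIND."""
    unbound = set()
    missing = []
    for line in lines:
        m = UNBIND_RE.search(line)
        if m:
            unbound.add(int(m.group(1)))
            continue
        m = CLOSED_RE.search(line)
        if m:
            conn = int(m.group(1))
            if conn not in unbound:
                missing.append(conn)
            # Forget the id so the set does not grow without bound.
            unbound.discard(conn)
    return missing
```

Comparing the count of such connections per day with the point where
memory starts to grow would show whether the two are related at all.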

Before I dive in to add/modify a bunch of memory debugging test code
(with time I don't have at the moment), has anyone seen this behavior?
Any hints for working around the problem?

Thanks!
Tom.