
Re: Memory leak or index corruption?



> This has been a problem for the last handful of releases (through
> 2.1.17).
>
> I run a simple "whitepages" server: simple authentication (no SASL,
> etc.) for a small number of updates performed daily on the local
> machine by the directory manager.  All outside queries use anonymous
> binds (virtually all coming from a CGI script using Net::LDAP).
>
> Solaris 8, gcc 3.2, Berkeley DB 2.1.15, configuration:
>    --enable-wrappers --enable-phonetic --enable-rewrite
>    --enable-meta --enable-ldap --enable-rlookups --enable-ldbm
>
> For BDB I've set cachesize in DB_CONFIG to 50 MB.  The .bdb files
> themselves consume approx. 300 MB.
>
> There is a single one-way suffix rewriting rule to accommodate an
> old-style search base (<OLDBASE>):
>
>    database meta
>    suffix "ou=<OLDBASE>"
>    lastmod off
>    uri "ldap:///ou=<OLDBASE>"
>    rewriteEngine on
>    rewriteContext default
>    rewriteRule "(.*)<OLDBASE>" "%1<NEWBASE>" ":"

Since you're using only one target URI, I suggest you try
using back-ldap and see if the problem is still there.
I haven't tested the back-meta (and back-ldap) rewriting
code very intensively, so there might be some leak.
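
A minimal sketch of what the back-ldap variant might look like,
assuming slapd is built with back-ldap (your --enable-ldap) and
--enable-rewrite, and that back-ldap accepts the same rewrite
directives as back-meta. The uri value is an assumption: I've
dropped the naming context part, since back-ldap talks to a
single target anyway.

   # hypothetical back-ldap equivalent of the back-meta section
   database ldap
   suffix "ou=<OLDBASE>"
   lastmod off
   # assumption: only the server part of the URI is needed here
   uri "ldap:///"
   rewriteEngine on
   rewriteContext default
   rewriteRule "(.*)<OLDBASE>" "%1<NEWBASE>" ":"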

The fact that you notice the indices are no longer being used
by the data storage backends after a while seems to move
the problem downstream, but I assume that part of the
suite is much more intensively stressed by average users.

I also suggest you turn off the unused rewriteContexts
by explicitly declaring them with no rules, e.g.:

rewriteContext searchResult
rewriteContext matchedDn

To improve performance, you may want to build with
--enable-local and use

URI ldapi:///ou=<OLDBASE>
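
Putting both suggestions together, the meta section might look
roughly like this. It's a sketch that assumes slapd was rebuilt
with --enable-local and is also listening on the default ldapi
socket (e.g. started with -h "ldap:/// ldapi:///"):

   database meta
   suffix "ou=<OLDBASE>"
   lastmod off
   uri "ldapi:///ou=<OLDBASE>"
   rewriteEngine on
   rewriteContext default
   rewriteRule "(.*)<OLDBASE>" "%1<NEWBASE>" ":"
   # declared with no rules, so no rewriting is configured
   # for search results and matched DNs
   rewriteContext searchResult
   rewriteContext matchedDn

Each rewriteContext line starts a new context, and the
following rewriteRule lines belong to it, so the empty contexts
go after the default rule; otherwise the rule would end up
attached to the last context declared.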

>
> A couple of access-control entries attempt to keep some information
> local:
>
>    access to dn="<NEWBASE>" attrs=<attr1>,<attr2>,...
>          by peername="^IP=XXX\.YYY\."            read
>          by peername="^IP=127\.0\.0\.1:[0-9]+$$" read
>          by anonymous                            search
>    access to *
>          by * read

Again, I assume these ACLs apply to the data storage backend
and not to the gateway.

>
> The problem:
>
> On starting slapd, memory will climb a bit and level off at somewhere
> around 60MB-100MB.  After approx. one day (about 20k-40k queries),
> memory use starts to expand and then queries become very slow (an
> execution trace indicates that indexes are no longer being used).  If it
> continues, the system eventually exhausts VM and slapd crashes.
>
> A dump of a live slapd's memory map while this is happening shows that
> the 'leak' is on the heap.
>
> This happens whether I build the db/run with bdb or ldbm (the 'other'
> BDB). I do note that not all query connections log an UNBIND before
> terminating the connection.

This might be a problem, since back-{ldap|meta}
do not free their resources until the client asks
them to (i.e. unbinds).

I think some work has been done recently (by Howard)
on connection pooling and reuse. It's in the HEAD
code, if you feel like trying it.

>
> Before I dive in to add/modify a bunch of memory debugging test code
> (with time I don't have at the moment), has anyone seen this behavior?
> Any hints for working around the problem?

You may want to try some memory profiling software first
(e.g. fnccheck); if there are systematic, repeatable
leaks you should be able to trace them quickly. Be sure
you try all the types of queries your system receives.

Please keep us informed about your findings.

p.

-- 
Pierangelo Masarati
mailto:pierangelo.masarati@sys-net.it