Re: SLAP - memory leak ? (ITS#250)



>Martin Hofbauer Bacher Systems EDV wrote:
>> 
>> The problem escalated last night. The server had to be rebooted,
>>  so I have to continue checking/analysing this problem today !
>> 
>>  What I found out is that I use the "virtusertable" of sendmail as a LDAP search
>> "(|(mail=sn.gn@xxx.at)(mailalternateaddress=sn.gn@xxx.at))"
>> and that filter  produces heavy load .

Do both mail and mailalternateaddress have equality indices?

>> checked with ldapsearch it increases also the heap, but
>> not every time  !
>> 
>> After about 2 hours of "normal" running after reboot, the problem started
>> again, but I was logged in and saw following errors:
>> 
>> Aug 14 13:44:49 mail sendmail[16788]: NAA16788: SYSERR(root): 421 Error in
>> ldap_search_st using (|(mail=sn.gn@xxx.at)(mailalternateaddress=sn.gn@xxx.at))
>> in map virtuser: Interrupted system call
>> 
>> After this error no such filter worked with "ldapsearch" any more
>> only simple filters were ok, like "mail=sn.gn@xxx.at"
>> 
>> slapd restart did not help !
>> 
>> What I have changed now:
>> Now I am running sendmail with this simple filter and the
>> system needs far fewer resources ( cpu : 0,3% - 0,5% instead of 5-10% )
>>
>> seems stable now !
>> 
>> Any new ideas ( I have not installed the nothread version yet !) ?

It's hard to tell if the nothread version or the change in filter
improved your stability.  You should test one change at a time.
The drop in load could simply be because it's taking twice as long to
service each request.  That is, without threads, there is no concurrency
to push the load up.

If there were a memory leak in the compound filter code, I would think
you would see a steady rise in memory use as new pages were needed.
Using ps or such, you would see the page count slowly increase.
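
The distinction between a slow leak and large jumps can be checked
mechanically.  Below is a minimal Python sketch, not part of OpenLDAP:
the function names, the 1 MB jump threshold, and the use of
"ps -o rss= -p PID" are assumptions for illustration, and your ps may
need different options.

```python
import subprocess
import time


def read_rss_kb(pid):
    """Return the resident set size of `pid` in kilobytes, as printed by ps.

    Assumes a ps that accepts `-o rss= -p PID`; adjust for your platform.
    """
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


def sample_rss(pid, count, interval_s):
    """Collect `count` RSS samples, sleeping `interval_s` seconds after each."""
    samples = []
    for _ in range(count):
        samples.append(read_rss_kb(pid))
        time.sleep(interval_s)
    return samples


def classify_growth(samples_kb, jump_threshold_kb=1024):
    """Classify a series of memory samples.

    "stepwise": at least one consecutive pair grew by the threshold or more.
    "steady":   no big jump, but the last sample is above the first.
    "stable":   neither of the above.
    The threshold is a hypothetical default, not a measured value.
    """
    deltas = [b - a for a, b in zip(samples_kb, samples_kb[1:])]
    if any(d >= jump_threshold_kb for d in deltas):
        return "stepwise"
    if samples_kb and samples_kb[-1] > samples_kb[0]:
        return "steady"
    return "stable"
```

For example, classify_growth(sample_rss(slapd_pid, 120, 30)) would watch
slapd for an hour, where slapd_pid is whatever process id your system
reports.  A "steady" result would point toward a leak; "stepwise"
matches the jumps you describe.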

However, you stated before that you are seeing much larger jumps (megabytes).
The only thing I could think of (in OpenLDAP proper) that could
cause large jumps is loading of new entries into the cache.  However,
you said your cache was large enough to hold all entries.  So, unless
you are adding/modifying entries as well, I don't see where the growth
would come from within OpenLDAP (or the DB libraries). 

Another possibility is that the growth is not in the heap, but in
the stack.   If you monitor the thread count over time, you should
see stack growth each time the count reaches a new high.  This
is because the thread implementation must allocate new pages
(instead of reusing old pages from a prior thread) for the new
thread.   It's likely that the thread code allocates a sizeable
chunk of stack space in such cases (likely multiple megabytes).
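
To make the high-water-mark idea concrete, here is a rough Python
sketch.  It assumes you have logged the thread count at regular
intervals (for example, from top), and it takes the per-thread stack
size as a parameter, since I don't know what your thread library
actually allocates.  Both function names are made up for this example.

```python
def new_high_water_marks(thread_counts):
    """Return (sample_index, count) pairs where the thread count sets a new high."""
    marks = []
    high = 0
    for i, count in enumerate(thread_counts):
        if count > high:
            marks.append((i, count))
            high = count
    return marks


def expected_stack_growth_kb(thread_counts, stack_kb_per_thread):
    """Estimate stack growth at each new high, assuming a fixed per-thread stack.

    Growth is only charged for threads beyond the previous high, on the
    assumption that stacks of exited threads are reused.
    """
    growth = []
    high = 0
    for i, count in enumerate(thread_counts):
        if count > high:
            growth.append((i, (count - high) * stack_kb_per_thread))
            high = count
    return growth
```

If the sample indices returned here line up with the times at which
"SIZE" jumps, the growth is probably thread stacks rather than heap.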

Kurt

>> Thank you
>> 
>> martin
>> 
>> On Fri, 13 Aug 1999, Kurt D. Zeilenga wrote:
>> 
>> > Martin Hofbauer Bacher Systems EDV wrote:
>> > >
>> > > On Fri, 13 Aug 1999, Kurt D. Zeilenga wrote:
>> > >
>> > > > At 08:43 AM 8/13/99 GMT, you wrote:
>> > > > >slapd has to be restarted every few hours, because it grows very heavy in
>> > > > >memory
>> > > > >
>> > > > >2800 entries
>> > > >
>> > > > Is it just cache growth?
>> > >
>> > > No, the size ( shown by "top" ) is stable for about 20 min.
>> > > then "SIZE" grows by about 6MB in one step !
>> >
>> > It could be that the cache grew because a larger set of entries was
>> > searched for or with a filter that required additional indices to be loaded
>> > (and cached).  It could be that some additional entries were added as well.
>> > It could be that a temporarily larger demand was placed upon the server
>> > (by more clients and/or more concurrent operations) causing the server to
>> > demand more resources from the operating system.
>> >
>> > > > >What can I do to determine, why it grows ?
>> >
>> > First, you need a controlled environment.  (ie: control over exactly how clients
>> > interact with the server).  It is extremely difficult to diagnose problems
>> > when the server is in general use.   Second, you need appropriate tools
>> > to track heap usage.  These can be expensive.
>> >
>> > > > Start slapd, check size.
>> > > > Do a search that matches all entries in your database, check size.
>> > > > Repeat search, check size.
>> > > > Repeat search, check size.
>> > > > Do other operations, check size.
>> > >
>> > > No change of Mem. SIZE ( checked via "top" )
>> >
>> > That's a good sign.  Also, I noticed that your top shows what I assume
>> > is a thread count.  You should watch this count as well.  We had huge
>> > problems with LWP leaking resources (whole threads).  Pthreads (which,
>> > I believe, sit on top of LWP) seem to behave better, but...
>> >
>> > You might try --without-threads and see if you suffer similar heap
>> > growth.  If --without-threads is stable, you might try using an alternative
>> > pthread implementation (GNU pth, FSU threads, or the like).
>> >
>> > > > You should see one large dump after doing the first search
>> > > > as the cache is being prep'ed.
>> >
>> > You might try prepping the index caches as well.  That is, do a search that
>> > matches all objects followed by searches that use each and every index
>> > you have.  Include a one level search (for id2children) and a search
>> > with "dn" filter.  This will give you a baseline to measure from.
>> >
>> > Note, you can easily grow beyond this baseline without adding/modifying
>> > entries... just kick off a couple dozen concurrent searches.
>> >
>> > > 2.) After my initial request for help for this issue
>> > >         I got the "common errors" mail, also explaining
>> > >         how to "./configure" under Solaris 2.x
>> > >         That is different from what I used ( see my init. bug report)
>> >
>> > There are a number of different approaches to getting configure to
>> > accept pthreads under Solaris.  But they all generally lead to the
>> > same set of libraries being linked in.
>> >
>> > > 3:) "sendmail" is the client, which uses this daemon most.
>> > >         Due to the new release (OpenLDAP 1.2.6) I decided
>> > >         to use this version, but sendmail was compiled 2 weeks
>> > >         before linked against OpenLDAP 1.2.4
>> > >         ( the same with "pam_ldap" and "nss_ldap" )
>> > >         Could this be a problem.
>> >
>> > The client SDK hasn't changed in quite some time.  Your clients should
>> > be fine.
>> >
>> > > 4.) I have seen  ( can not remember where ) information of a memory leak BUG
>> > >      after the DBM backend has been modified.
>> >
>> >
>> > >         It seems to me that I have a similar case here.
>> > >         After we built the DBM backend from scratch ( "ldif2ldbm" )
>> > >         the daemon ran without problems for about 2 days.
>> > >         ( But I didn't really check the daemon very often )
>> > >
>> > >         should I switch to "gdbm" ?
>> >
>> > I don't recommend GDBM.   I do recommend Berkeley DB 2.7.5 (and nothing less).
>> >
>> > Kurt
>> >
>> 
>> -------------------------------------------------------------------
>> Martin Hofbauer                                       IT-Consulting
>> phone : +43 (1) 60 126-34                   Bacher Systems EDV GmbH
>> fax   : +43 (1) 60 126-4                         Wienerbergstr. 11B
>> e-mail: mh@bacher.at                         A-1101 Vienna, Austria
>> --
>
>