
Re: (ITS#5860) slapd memeory leak under openldap 2.4



Howard,

I do not think the ITS is completely solved. Let me try to explain better.

The system now respects the dncachesize boundary. I wasn't sure about the cachefree setting, so I have changed it to 1, which is the default value.

In any case, this is what happens:

1) The sequential search runs at a reasonable speed until the dncachesize is reached;
2) After the dncachesize is reached, the sequential search hangs, or its output stalls for a long time (I am redirecting the output to a file, so screen refresh delays are not a factor);
Example:
[root@brtldp11 backup]# date; cat temp.ldif |grep -e '^pnnum*' |wc -l
Mon Jan 26 17:22:21 BRST 2009
250016
[root@brtldp11 backup]# date; cat temp.ldif |grep -e '^pnnum*' |wc -l
Mon Jan 26 17:23:00 BRST 2009
250016
Note above that even after almost a minute, no new LDIF entry was written.
3) The query then stalls and, apparently deterministically, dumps only 16 entries at a time before stalling again. This behavior repeats, and because the stalls sometimes last for minutes, the query never finishes.
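
The sampling shown in step 2 can also be scripted. The following is only a minimal sketch in Python: the function names are hypothetical, and it assumes the entries of interest are the lines that begin with the literal prefix "pnnum", which is slightly stricter than the grep pattern '^pnnum*' used above.

```python
import time


def count_matching_lines(path, prefix="pnnum"):
    """Count lines in the LDIF output file that start with the given prefix."""
    with open(path, "r", errors="replace") as fh:
        return sum(1 for line in fh if line.startswith(prefix))


def sample_progress(path, samples=2, interval=39.0, prefix="pnnum"):
    """Take timestamped counts so a stalled search shows up as a flat count."""
    readings = []
    for i in range(samples):
        stamp = time.strftime("%H:%M:%S")
        readings.append((stamp, count_matching_lines(path, prefix)))
        if i + 1 < samples:
            time.sleep(interval)  # wait before taking the next reading
    return readings


# Example (not executed here): sample_progress("temp.ldif", samples=5, interval=60)
```

Two consecutive readings with the same count, as in the transcript above, indicate that nothing was written during that interval.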

olmBDBEntryCache: 884
olmBDBDNCache: 1000261
olmBDBIDLCache: 1

olmBDBEntryCache: 611
olmBDBDNCache: 1000261
olmBDBIDLCache: 1

Even with cachesize at 1000 and cachefree at 1, olmBDBEntryCache continues to decrease, just more slowly now.

I was expecting that a cachefree of 1000 would purge all entries and then cache the next 1000 in sequence, so that the search would always keep being answered. That way the search would never hang as it does now.
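
The arithmetic behind that expectation can be sketched with a deliberately simplified model. This is not slapd's actual code: it only assumes that once the entry count exceeds cachesize, cachefree entries are released in one step, and the function name is hypothetical. It reproduces the "1001 - 1000 = 1" figure quoted below, but it ignores any interaction with the DN cache, so it does not explain the slow decrease seen with cachefree at 1.

```python
def simulate_entry_cache(cachesize, cachefree, entries_read):
    """Return the modeled entry-cache size after each entry is read.

    Simplified assumption: when the count exceeds cachesize, cachefree
    entries are released at once. The DN cache is not modeled.
    """
    cached = 0
    history = []
    for _ in range(entries_read):
        cached += 1
        if cached > cachesize:
            cached -= min(cachefree, cached)  # never drop below zero
        history.append(cached)
    return history


# cachefree equal to cachesize: the 1001st entry triggers a full purge.
full_purge = simulate_entry_cache(cachesize=1000, cachefree=1000, entries_read=1001)
# cachefree at the default of 1: the cache stays at its limit.
single_free = simulate_entry_cache(cachesize=1000, cachefree=1, entries_read=1001)
print(full_purge[-1], single_free[-1])
```

In this model the first configuration leaves 1 entry cached after the purge, while the second leaves 1000.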

It stalls and will never finish, since it returns only 16 entries at a time, in bursts that are minutes apart.

The previous behavior was more reasonable than the current one: even though it took much longer, the search would finish.

[root@brtldp11 backup]# date; cat temp.ldif |grep -e '^pnnum*' |wc -l
Mon Jan 26 17:31:36 BRST 2009
250128

Please let me know if you need more information,

Rodrigo.

--- On Mon, 1/26/09, Howard Chu <hyc@symas.com> wrote:

> From: Howard Chu <hyc@symas.com>
> Subject: Re: (ITS#5860) slapd memeory leak under openldap 2.4
> To: rlvcosta@yahoo.com
> Cc: openldap-its@OpenLDAP.org
> Date: Monday, January 26, 2009, 5:18 PM
> Rodrigo Costa wrote:
> > Howard,
> >
> > I download the new HEAD version and made some testing. Now the
> > olmBDBEntryCache is following the cache configuration.
> 
> Good, then this ITS is resolved. Usage questions should be directed to the
> -software mailing list.
> 
> > I also made some small change in slapd.conf including cachefree
> > configuration. Please see below in the end how slapd is configured related to
> > bdb cache :
> 
> > line 123 (cachesize       1000)
> > line 124 (cachefree     1000)
> > line 125 (idlcachesize    1000)
> > line 126 (dncachesize     1000000)
> 
> Setting cachefree equal to cachesize will effectively cause the entire entry
> cache to be dumped each time it reaches its maximum size. That's clearly not a
> good idea.
> 
> > I increase the dncachesize since with a small value the search takes
> > considerable more time.
> >
> > First I tested with a value greater than the value that was consuming before
> > without the memory boundary. In this way I was expecting some order of
> > magnitude as before, what it achieved (line 126 (dncachesize 5000000)):
> 
> > BEFORE CACHED:
> > 1000000
> >
> > real    5m23.084s
> > user    0m28.026s
> > sys     0m6.686s
> >
> > Then Cache has :
> >
> > olmBDBEntryCache: 1001
> > olmBDBDNCache: 4000264
> > olmBDBIDLCache: 1
> >
> > AFTER CACHED:
> >
> > 1000000
> >
> > real    2m31.623s
> > user    0m28.145s
> > sys     0m8.637s
> >
> > Just to let clear these tests above where with the new logic and where the
> > DN Cache Size is bigger than the final number I had with the old logic where
> > all information is cached into memory.
> >
> > For each entrance I have 4 dn's that compose the entrance. Since my filter
> > is for only one of the dn I was expecting this cache to be only related with
> > the filter and then in 1,000,000.
> 
> > Then I change the dncachesize to 1000000 (line 126 (dncachesize 1000000) :
> >
> > The total time for a search using a filter for one index dn became :
> >
> > ldap_result: Can't contact LDAP server (-1)
> > 255801
> >
> > real    469m26.619s
> > user    0m7.027s
> > sys     0m1.756s
> >
> > I needed to kill the server so I would have an idea about how many entrances
> > it searched. The time is too long and I'm not sure if it would even end. I let
> > all night but the process did not end. This was the same ldapsearch as when
> > all entrances would be allocated in a DN cache into memory.
> >
> > But the cache process was something I wasn't expecting. Before the
> > dncachesize was reached I had at monitor :
> >
> > olmBDBEntryCache: 1001
> > olmBDBDNCache: 921296
> > olmBDBIDLCache: 1
> >
> > Then after the boundary was reached it became :
> >
> > olmBDBEntryCache: 1
> > olmBDBDNCache: 1000262
> > olmBDBIDLCache: 1
> >
> > Where these numbers didn't change anymore. Not sure why after the
> > dncachesize is reached the cachesize(or olmBDBEntryCache) became 1.
> 
> Because you set cachefree to 1000. 1001 - 1000 = 1.
> 
> -- 
>    -- Howard Chu
>    CTO, Symas Corp.           http://www.symas.com
>    Director, Highland Sun     http://highlandsun.com/hyc/
>    Chief Architect, OpenLDAP  http://www.openldap.org/project/