
Re: (ITS#5860) slapd memeory leak under openldap 2.4



Howard,

I downloaded the new HEAD version and ran some tests. The olmBDBEntryCache now follows the cache configuration.

I also made a small change to slapd.conf, adding the cachefree parameter. The BDB cache settings are now:

line 123 (cachesize       1000)
line 124 (cachefree     1000)
line 125 (idlcachesize    1000)
line 126 (dncachesize     1000000)

I increased dncachesize because with a small value the search takes considerably longer.

First I tested with a dncachesize larger than what the DN cache was consuming before the limit was enforced (line 126 (dncachesize     5000000)). I expected timings of the same order of magnitude as before, and that is what I got:

BEFORE CACHED:
1000000

real    5m23.084s
user    0m28.026s
sys     0m6.686s

The caches then show:

olmBDBEntryCache: 1001
olmBDBDNCache: 4000264
olmBDBIDLCache: 1

AFTER CACHED:

1000000

real    2m31.623s
user    0m28.145s
sys     0m8.637s

I want to stress that the test above used a dncachesize known to be large enough to hold all the DNs in my DB. This matches the earlier behavior, when the whole DB could be held in memory because slapd was not respecting the configured limits.

In my system each entry is made up of several DNs. Since my filter matches only one of those DNs, I expected this cache to hold only the DNs related to the filter, that is, about 1,000,000.

So I expected the search to take somewhat more than 6 minutes, since the first run, when nothing is cached yet and the DB is read from disk, took around 6 minutes.

With a cache smaller than 4 million entries, I expected only a little more overhead, since slapd would have to flush the cache and load some entries more than once. In any case I would not expect much more than 6 minutes, since flushing the cache and reloading it piece by piece should not add much overhead.

Only subsequent queries would fail to take full advantage of the cache (for example, halving the time on later runs), since the cache cannot hold all the information.

But, from my perspective, something strange happened. The DN cache grew normally until it reached the 1000000 limit, with the entry cache respecting its limit of 1000:

readOnly: FALSE
olmBDBEntryCache: 1001
olmBDBDNCache: 957271
olmBDBIDLCache: 1

But once the dncachesize limit was reached, the monitor showed:

olmBDBEntryCache: 1
olmBDBDNCache: 1000262 (this is a very deterministic number)
olmBDBIDLCache: 1

The counters then stay stuck at these values indefinitely.
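The comparison between the two monitor snapshots above can be automated. The following is a minimal sketch, not part of slapd or any OpenLDAP tool: `parse_monitor` and `looks_stuck` are hypothetical helper names, the input is assumed to be the plain "attribute: value" text pasted from the monitor output, and the "stuck" rule is only a heuristic derived from the behavior described in this message.

```python
def parse_monitor(text):
    """Return {attribute: int} for the olmBDB*Cache lines found in text."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        parts = value.split()
        if sep and parts and name.startswith("olmBDB") and name.endswith("Cache"):
            # Keep only the number; ignore any trailing annotation on the line.
            counters[name] = int(parts[0])
    return counters


def looks_stuck(counters, cachesize, dncachesize):
    """Heuristic (an assumption, not slapd logic): the DN cache is at or above
    its limit while the entry cache has dropped far below its own limit."""
    return (counters["olmBDBDNCache"] >= dncachesize
            and counters["olmBDBEntryCache"] < cachesize // 2)


# Snapshots copied from this message.
before = """readOnly: FALSE
olmBDBEntryCache: 1001
olmBDBDNCache: 957271
olmBDBIDLCache: 1"""

after = """olmBDBEntryCache: 1
olmBDBDNCache: 1000262 (this is a very deterministic number)
olmBDBIDLCache: 1"""

for label, sample in (("before limit", before), ("after limit", after)):
    counters = parse_monitor(sample)
    # Limits taken from the slapd.conf lines quoted earlier in this message.
    print(label, counters, "stuck:", looks_stuck(counters, 1000, 1000000))
```

With the configured limits (cachesize 1000, dncachesize 1000000), only the second snapshot is flagged.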

Performance also degrades considerably. From this point I let the search run all night and it still had not gone through the whole DB. Some loss of performance is expected once the DN cache hits its limit, since disk access becomes more frequent, but at this level it is impractical.

I had to stop slapd to see how many entries had been returned. The count was around 250,000, about 1/4 of the dncachesize limit. It appears that once the cache limit is reached, the search for some reason stops making progress.
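The 1/4 ratio can be checked against the monitor figures reported earlier in this message. This is only a back-of-the-envelope sketch: it assumes the DN cache fills at a roughly constant number of slots per returned entry, which is an inference from the numbers here and not something confirmed about slapd internals.

```python
# Figures copied from this message.
dn_cache_full_run = 4000264   # olmBDBDNCache after the complete search
entries_full_run = 1000000    # entries returned by the complete search
dn_cache_at_limit = 1000262   # olmBDBDNCache once the limit was reached
entries_when_stuck = 250000   # approximate count when slapd was stopped

# Assumption: DN cache slots are consumed at a constant rate per entry.
dns_per_entry = dn_cache_full_run / entries_full_run
entries_covered = dn_cache_at_limit / dns_per_entry

print(f"DN cache slots per returned entry:  {dns_per_entry:.2f}")
print(f"entries covered by a full DN cache: {entries_covered:.0f}")
print(f"entries returned when stopped:      {entries_when_stuck}")
```

Under that assumption, about 4 DN cache slots are used per entry, so a DN cache of 1,000,000 fills after about 250,000 entries, which is where the search stops making progress.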

I lost the exact timing from last night's run, but here is another run from this morning that shows similar behavior. Last night's run took more than 8 hours without getting much past 250,000 entries (of 1,000,000).

ldap_result: Can't contact LDAP server (-1)
251512

real    121m51.697s
user    0m7.327s
sys     0m1.734s
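To put the three runs in this message on the same scale, the entry counts can be divided by the `real` times. A minimal sketch follows; `real_seconds` is a hypothetical helper, and it assumes the `NmS.SSSs` format shown in the timings above. The rate for the aborted run is an average up to the point where the connection was lost, so it includes the time spent stuck.

```python
import re


def real_seconds(stamp):
    """Convert a `time` value such as '5m23.084s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# (description, entries returned, `real` time), all copied from this message.
runs = [
    ("dncachesize 5000000, uncached", 1000000, "5m23.084s"),
    ("dncachesize 5000000, cached", 1000000, "2m31.623s"),
    ("dncachesize 1000000, aborted", 251512, "121m51.697s"),
]

for label, entries, real in runs:
    rate = entries / real_seconds(real)
    print(f"{label:32s} {rate:8.1f} entries/s")
```

This gives roughly 3,100 and 6,600 entries per second for the two complete runs, against roughly 34 entries per second for the run with the smaller DN cache.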

This is very reproducible: the search normally gets stuck around this value and makes little further progress. I do not believe the query would ever finish, which is why I had to kill the slapd process to get an idea of how far the search had got.

Something still looks wrong: performance is degraded far more than the extra disk access would explain. It looks like the search gets stuck once the cache limits are reached.

It also looks like the entry cache (olmBDBEntryCache) drops to 1 once the DN cache limit is reached.

Best Regards,

Rodrigo.

--- On Sun, 1/25/09, Howard Chu <hyc@symas.com> wrote:

> From: Howard Chu <hyc@symas.com>
> Subject: Re: (ITS#5860) slapd memeory leak under openldap 2.4
> To: rlvcosta@yahoo.com
> Cc: openldap-its@OpenLDAP.org
> Date: Sunday, January 25, 2009, 7:46 PM
> Rodrigo Costa wrote:
> > Howard,
> >
> > I tested the HEAD load last night. The results I believe are partial.
> 
> Thanks for the feedback, please try the new patch in HEAD.
> >
> > The memory stay stable and the use was much lower. I had in my slapd.conf the following cache boundaries :
> >
> > line 122 (cachesize       1000)
> > line 123 (idlcachesize    1000)
> > line 124 (dncachesize     2000)
> >
> > Which make with the new load to consume much less memory and stay stable.
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> > 20715 ldap      18   0  184m  73m  68m S    0  0.6 174:05.81 slapd
> >
> > But by the other hand I use the monitor information to check the cache usage. I would notice that now dncache is keeping around the boundary I put of 20000 but the cache is now fixed as 1, not 1000. See information below :
> >
> > readOnly: FALSE
> > olmBDBEntryCache: 1
> > olmBDBDNCache: 2185
> > olmBDBIDLCache: 1
> > olmDbDirectory: /var/openldap-data/bdb2/
> > entryDN: cn=Database 2,cn=Databases,cn=Monitor
> >
> > This kept around this all the time I was making a search in DB.
> >
> > Before this load with this possible DN cache boundary solution I made the ldapsearch in around 7 minutes, for the first time not cached yet. And around 4 minutes after the entrances were cached.
> >
> > Now with this new load I made the search and it took :
> >
> > 1000000
> >
> > real    174m6.486s
> > user    0m43.746s
> > sys     0m16.923s
> >
> >
> > Almost 3 hours. It's ok that I didn't tune the cache sizes but I was at least expecting the olmBDBEntryCache would now follow the 1000 boundary and not single 1.
> >
> > I believe this is what is not making this huge performance difference and not the olmBDBDNCache.
> >
> > Please confirm this is some tuning missing or if something else like the olmBDBEntryCache would be improved.
> >
> > In terms of memory usage now it is respecting but the performance, probably based in the olmBDBEntryCache only using 1 position, is much worst.
> >
> > Please also let me know if you think this performance degradation and the olmBDBEntryCache marking 1 is a tuning issue or something still need to be modified in the code.
> >
> > Thanks for the fast solution for this issue since this would be affecting all systems and types of DB.
> >
> > Rodrigo.
> 
> 
> -- 
>    -- Howard Chu
>    CTO, Symas Corp.           http://www.symas.com
>    Director, Highland Sun     http://highlandsun.com/hyc/
>    Chief Architect, OpenLDAP  http://www.openldap.org/project/