
strange performance problem Update 2



On Tue, 1 Apr 2003 xoror@infuse.org wrote:

Hi all,

I've managed to get OpenLDAP 2.1.17 working with ldbm (and bdb disabled).
All the tests ran fine with ldbm, so the problem is somewhere in back-bdb. (I
retested it with an OpenLDAP build that has only ldbm disabled.) Strangely
enough, the problem can also be reproduced with a mix of
authentication requests and filter requests.

I've been digging into the bdb documentation and found that only one write
is allowed at a time. There's also a fine-grained locking mechanism with
buckets described in the bdb docs, but somehow that isn't used by default.

I think that the locking somehow corrupts a certain index, or that a lock is
never released. Whatever it is must be permanent (i.e. written somewhere that
gets loaded when the server restarts), because a restart of OpenLDAP
doesn't fix the problem. I don't understand why authentication and filter
operations can 'corrupt' the data. Are some of the .bdb files being
updated during authentication/filter operations?

None of this happens with ldbm.


> Hi all,
> 
> Over the last 2 weeks I've tried various things to solve the strange
> behaviour that I'm seeing. I'll try to give a short summary of them and
> their results.
> 
> 1. Tried the suggestions from the mailing list:
> - disk problem -> can't be, I've got 5 GB of free disk space left
> - cache / memory problem -> I've got 200 MB of free RAM. (How does one set
> the cache parameters for bdb4, like the cache parameters for ldbm?)
> - locking problem
> The db_* utilities provided some useful information. I've made a log of
> locking overviews during the test runs, but I don't see an explanation
> (yet) for the weird behaviour.
> - concurrency problems with my test programs
> I've double-checked the code and I'm pretty confident that there are no
> concurrency problems. There's only 1 shared resource and access to it is
> controlled by a mutex.
> 
> 2. I've upgraded from bdb 4.1.24 to 4.1.25.
> This helps a bit. With 4.1.25 I'm getting the weird behaviour
> after 10 test runs instead of 4 or 5, so the new bdb seems to 'fix' the
> problem partially.
> 
> 3. Tried to get ldbm working with OpenLDAP 2.1.x.
> This failed. I don't know why exactly, but my guess is that it's a library
> problem. (I've got bdb 1, bdb 3 and bdb 4 installed.)
> 
> 4. I installed OpenLDAP 2.0.27 on a new machine to test it with ldbm. The
> results were very surprising: 2.0.27 with ldbm doesn't show the weird
> behaviour. It also used only 20% CPU at most for 32 clients. I've run
> the tests about 80 times now, and I'm pretty confident that this version
> doesn't have the problem described below. The performance is quite good:
> the average test run time was about 6 seconds for 1000 tasks.
> 
> I suspect that something is wrong in either 2.1.x or in the
> bdb4 implementation. I'm doing this research as part of my master's thesis.
> I really want to get 2.1.x working correctly, since it should have
> transactional capabilities (which are important to the application I have
> in mind). Any hints on possible problems are welcome. I'll be glad to look
> into them.
> 
> For now I'm digging into the bdb documentation (and the changelog, to see
> what changed from .24 to .25 that could be related to my problem).
> 
> Thank you for your time,
> Cuong
> 
> On Sun, 23 Mar 2003, Peter Marschall wrote:
> 
> > Hi,
> > 
> > Have you only tried it with back-bdb, or with back-ldbm too?
> > Just to get an idea if it is more related to the core of slapd or to a
> > specific backend.
> > 
> > Peter
> > 
> > 
> > 
> > On Sunday 23 March 2003 14:47, Cuong bui wrote:
> > > Hello all,
> > >
> > > I'm doing some research on OpenLDAP to see if it's suitable for a certain
> > > kind of application. This application requires more write operations from
> > > the LDAP server than an 'average' usage of LDAP.
> > >
> > > For this purpose I've written a multithreaded application to simulate
> > > simultaneous access to the LDAP server. This application creates a
> > > variable number of threads, each simulating a client. There's a central queue
> > > of tasks that have to be performed. The queue has 3 kinds of tasks (for now).
> > >
> > > 1. filter task. Simple search on a certain key
> > > 2. modify task. modification of a 'record'.
> > > 3. authentication task. (simple bind and unbind, simple authentication for
> > > now)
> > >
> > > The task queue is built in such a way that the types of tasks are mixed with
> > > each other.
> > >
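A minimal sketch of the test client described above, written in Python for brevity; it is not the actual test program. All names here (build_tasks, perform, worker, run_test) are hypothetical, and perform() is a placeholder where a real client would issue the LDAP search, modify, or bind/unbind. The task deque is the single shared resource, and a mutex guards every access to it:

```python
import random
import threading
from collections import Counter, deque


def build_tasks(n_filter, n_modify, n_auth, seed=0):
    """Build one central queue in which the three task kinds are mixed."""
    tasks = ["filter"] * n_filter + ["modify"] * n_modify + ["auth"] * n_auth
    random.Random(seed).shuffle(tasks)  # interleave the task kinds
    return deque(tasks)


def perform(kind):
    """Placeholder (assumption): a real client would talk to the LDAP server here."""
    pass


def worker(tasks, lock, done):
    """Simulate one client: take tasks from the shared queue until it is empty."""
    while True:
        with lock:  # the queue is the only shared resource
            if not tasks:
                return
            kind = tasks.popleft()
        perform(kind)    # the work itself runs outside the lock
        done[kind] += 1  # per-thread counter, so no locking is needed


def run_test(n_threads=32, n_filter=300, n_modify=600, n_auth=100):
    """Run one test and return how many tasks of each kind were completed."""
    tasks = build_tasks(n_filter, n_modify, n_auth)
    lock = threading.Lock()
    per_thread = [Counter() for _ in range(n_threads)]
    threads = [
        threading.Thread(target=worker, args=(tasks, lock, per_thread[i]))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    total = Counter()
    for counts in per_thread:
        total.update(counts)  # merge the per-thread counts after all joins
    return dict(total)
```

Each thread keeps its own counter and the counts are merged only after all threads have joined, so the queue stays the only thing that needs the mutex.
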
> > > I've generated a test dataset of 50,000 records, and the .bdb files have
> > > been copied to another location (of course while the server was offline) in
> > > order to restore the data (instead of regenerating the test dataset, it can now
> > > be copied back).
> > >
> > > The test runs were very promising. For instance, 1000 tasks (300 filter,
> > > 600 modification, 100 authentication) with 32 threads took only 9 seconds to
> > > complete. However, performing this same test a few times gives a rather
> > > surprising effect. The first 4 runs all took about 9-13 seconds to complete.
> > > The 5th run and later ones take up to 5 minutes (!) or more. Simple searches
> > > with ldapsearch take a few minutes after 4 test runs (before that they took
> > > only a fraction of a second).
> > >
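The slowdown described above could be detected automatically with a small timing harness. This is only a sketch in Python; time_runs, first_degraded_run, the 3-run baseline, and the factor of 5 are illustrative assumptions, not part of the actual test setup:

```python
import statistics
import time


def time_runs(run_once, n_runs):
    """Call run_once() n_runs times and return each run's duration in seconds."""
    durations = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_once()
        durations.append(time.perf_counter() - start)
    return durations


def first_degraded_run(durations, baseline_runs=3, factor=5.0):
    """Return the 1-based number of the first run that is more than `factor`
    times slower than the median of the first `baseline_runs` runs,
    or None if no run is that slow."""
    if len(durations) <= baseline_runs:
        return None
    baseline = statistics.median(durations[:baseline_runs])
    later_runs = enumerate(durations[baseline_runs:], start=baseline_runs + 1)
    for run_number, duration in later_runs:
        if duration > factor * baseline:
            return run_number
    return None
```

With durations like those reported here, for example first_degraded_run([9, 11, 13, 10, 300]), the function flags run 5.
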
> > > During the first 4 test runs the OpenLDAP server used 98% CPU
> > > time (as seen in top). After the initial 4 runs, slapd consumes only 40%-75%
> > > CPU time. Sometimes it even drops to a few %.
> > >
> > > The only way to get slapd working normally again is to shut down the server
> > > and restore the dataset. I have defined an index on the field that I'm
> > > filtering on (without it, searches take a very long time to complete).
> > >
> > > The test server consists of a Pentium 4 2.53 GHz running Gentoo Linux with
> > > kernel 2.4.20, an Ultra DMA (enabled) HDD and 512 MB RAM. OpenLDAP
> > > 2.1.16 was running on this machine (but I also tested with 2.1.12 and 2.1.15;
> > > they all gave this result). BDB was chosen as the backend for this test.
> > >
> > > The test client consists of a Pentium 4 1.8 GHz with Gentoo Linux running
> > > kernel 2.4.19. The 2 systems were connected on a 100 Mbit (switched)
> > > network. All the software was compiled with g++ 3.2.1.
> > >
> > > Does anyone have an explanation for this behaviour? Is this a known problem?
> > > All hints are welcome.
> > >
> > >
> > > Thanks in advance (and for your time),
> > > Cuong
> > >
> > > btw: is there any documentation on the implementation details of OpenLDAP?
> > > If yes, where can I find it? :) Other documentation that might explain
> > > what I'm seeing is also (very) welcome.
> > 
> > -- 
> > Peter Marschall
> > eMail: peter@adpm.de
> > 
> > 
> 
>