
RE: slapd eating up resource on 2.3.14

To: "Allan E. Johannesen" <aej@WPI.EDU>
Subject: RE: slapd eating up resource on 2.3.14
From: "Allan E. Johannesen" <aej@WPI.EDU>
Date: Fri, 6 Jan 2006 14:27:44 -0500
Cc: "Spicer, Kevin" <KevinS@bmrb.co.uk>, "Quanah Gibson-Mount" <quanah@stanford.edu>, <openldap-software@OpenLDAP.org>
In-reply-to: <17341.37660.897928.649679@ccc5.wpi.edu>
References: <BC47B6F911DA5744B6E05FFBCC59C80C03DFE034@ukealwxc003.emea.group.local> <17341.37660.897928.649679@ccc5.wpi.edu>

>>>>> "aej" == Allan E Johannesen <aej@WPI.EDU> writes:

aej> 2.3.15 also runs fine for the first thousand queries, but then bogs down.
aej> There were several changes to ...bdb/cache.c in 2.3.14, and that file is
aej> identical in 2.3.15.  I sort of suspect the problem is there, but I
aej> haven't the knowledge of the internals, nor the knowledge of Berkeley db
aej> locking, to figure it out.

I put a bunch of "Debug()" displays into servers/slapd/back-bdb/cache.c in
2.3.13 and 2.3.14.

At about a thousand client queries, 2.3.14 shows a loop in
bdb_cache_lru_purge(), a new routine, which was mainly excised from
bdb_cache_lru_add() in 2.3.13.

Since I only put the Debug()'s in after lock() and unlock() calls, I only see
that activity.  At this point in 2.3.14, there are 935 (in my test) repetitions
of

bdb_cache_lru_purge: bdb_cache_entry_db_lock( bdb->bi_dbenv, bdb->bi_cache.c_locker, elru, 1, 1, lockp )
bdb_cache_lru_purge: bdb_cache_entry_db_unlock( bdb->bi_dbenv, lockp )

A similarly long loop does not appear in the test run of 2.3.13.  There are
occasional single appearances of the pairs

bdb_cache_lru_add: bdb_cache_entry_db_lock( bdb->bi_dbenv, bdb->bi_cache.c_locker, elru, 1, 1, lockp )
bdb_cache_lru_add: bdb_cache_entry_db_unlock( bdb->bi_dbenv, lockp )

in the 2.3.13 run, but not a long repetition like the 2.3.14 run showed.  After
that loop, response is slow and slapd eats lots of CPU.
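
The comparison above can be made less error-prone by counting the pairs with a
script rather than by eye.  What follows is only a rough sketch, not part of
slapd: it assumes each Debug() message sits on its own line in the captured
output and begins with the function name exactly as in the samples above, and
that a "repetition" means lock/unlock pairs with no other output between them.
The name longest_pair_run is made up for illustration.

```python
from typing import Iterable


def longest_pair_run(lines: Iterable[str], func: str) -> int:
    """Return the longest run of back-to-back lock/unlock pairs logged by func.

    Assumes the added Debug() messages look like the samples in this post:
    "<func>: bdb_cache_entry_db_lock( ... )" followed by
    "<func>: bdb_cache_entry_db_unlock( ... )".
    """
    lock = f"{func}: bdb_cache_entry_db_lock("
    unlock = f"{func}: bdb_cache_entry_db_unlock("
    best = 0           # longest run seen so far
    run = 0            # length of the current run of complete pairs
    pending = False    # True after a lock line that has no unlock yet

    for line in lines:
        text = line.strip()
        if text.startswith(lock):
            if pending:
                run = 0        # two locks in a row: not a clean pair
            pending = True
        elif text.startswith(unlock) and pending:
            pending = False
            run += 1           # one complete lock/unlock pair
            best = max(best, run)
        else:
            pending = False    # any other output ends the current run
            run = 0
    return best
```

Run over a saved capture of the slapd -d1 output (for instance by passing an
open file object as lines) with func set to "bdb_cache_lru_purge" for 2.3.14
and "bdb_cache_lru_add" for 2.3.13, this should report a large number for the
former and 1 for the latter if the pattern described above holds.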

In the overall debug output (slapd -d1), the first call to the routine appears
at about line 101,000 in both cases, but the 2.3.14 run contains a loop of
lock/unlock pairs, while the 2.3.13 run has only single pairs sprinkled
through it.

Unless the behavior was changed deliberately in a way that would produce this
different activity, I think some sort of problem was introduced when
bdb_cache_lru_purge() was developed from the former 2.3.13 source...
