
RE: slapd eating up resource on 2.3.14



>>>>> "aej" == Allan E Johannesen <aej@WPI.EDU> writes:

aej> 2.3.15 also runs fine for the first thousand queries, but then bogs down.
aej> There were several changes to ...bdb/cache.c in 2.3.14, and that file is
aej> identical in 2.3.15.  I sort of suspect the problem is there, but I
aej> haven't the knowledge of the internals, nor the knowledge of Berkeley db
aej> locking, to figure it out.

I put a bunch of Debug() calls into servers/slapd/back-bdb/cache.c in both
2.3.13 and 2.3.14.
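For anyone wanting to repeat this, here is a minimal sketch of that kind of instrumentation. It assumes a hypothetical TRACE_AFTER() macro instead of slapd's real Debug() (I'm not reproducing Debug()'s arguments here): it runs a call, then logs the enclosing function name plus the literal text of the call, roughly the "function: call" shape of the trace lines in this message. trace_line(), fake_lock(), fake_unlock() and purge_one() are made-up stubs, not back-bdb code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory trace sink; slapd itself would use Debug(). */
static char trace_buf[4096];
static size_t trace_len;

/* Append "<func>: <text>\n" to the trace buffer, dropping lines that don't fit. */
static void trace_line(const char *func, const char *text)
{
    size_t room = sizeof trace_buf - trace_len;
    int n = snprintf(trace_buf + trace_len, room, "%s: %s\n", func, text);
    if (n > 0 && (size_t)n < room)
        trace_len += (size_t)n;
    else
        trace_buf[trace_len] = '\0'; /* discard a truncated partial line */
}

/* Run the call, then log the calling function and the call's source text. */
#define TRACE_AFTER(call)            \
    do {                             \
        (void)(call);                \
        trace_line(__func__, #call); \
    } while (0)

/* Made-up stand-ins for the real lock/unlock calls. */
static int held;

static int fake_lock(int id)
{
    held++;
    return id;
}

static int fake_unlock(int id)
{
    assert(held > 0); /* unlocking something never locked is a bug */
    held--;
    return id;
}

/* One step of a purge-style loop, instrumented after each call. */
static void purge_one(int id)
{
    TRACE_AFTER(fake_lock(id));
    TRACE_AFTER(fake_unlock(id));
}
```

After purge_one(7), trace_buf holds "purge_one: fake_lock(id)" and "purge_one: fake_unlock(id)" on two lines. Logging only after lock and unlock keeps the output small, at the cost of hiding everything else the routine does, which is the limitation mentioned below.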

At about a thousand client queries, 2.3.14 shows a loop in
bdb_cache_lru_purge(), a new routine whose code was mostly split out of
bdb_cache_lru_add() in 2.3.13.

Since I only put Debug() calls after the lock() and unlock() calls, that is
the only activity I see.  At this point in 2.3.14, there are 935 repetitions
(in my test) of

bdb_cache_lru_purge: bdb_cache_entry_db_lock( bdb->bi_dbenv, bdb->bi_cache.c_locker, elru, 1, 1, lockp )
bdb_cache_lru_purge: bdb_cache_entry_db_unlock( bdb->bi_dbenv, lockp )
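Purely as a hypothetical illustration of how a purge routine can emit hundreds of back-to-back lock/unlock pairs: walk the LRU list from the tail, lock each entry, and unlock it again without freeing it when it turns out to be in use. The toy model below (struct toy_entry, toy_lru_purge() and the call counters are invented for this sketch, not taken from cache.c) shows that if only the entry nearest the head is evictable, a single call costs one lock/unlock pair per entry in the list. Whether this is what actually happens in 2.3.14 is an assumption; the debug output only shows the repeated pairs, not why they repeat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy cache entry (hypothetical, not back-bdb's real structure). 'busy'
 * models an entry that can't be evicted right now, e.g. one still in use. */
struct toy_entry {
    bool busy;
    bool freed;
};

static size_t lock_calls, unlock_calls;

static void toy_lock(struct toy_entry *e)   { (void)e; lock_calls++; }
static void toy_unlock(struct toy_entry *e) { (void)e; unlock_calls++; }

/* lru[0] is the most recently used entry, lru[n-1] the least. Walk from the
 * tail, freeing up to 'want' entries; each visited entry costs one
 * lock/unlock pair, whether or not it ends up freed. */
static size_t toy_lru_purge(struct toy_entry *lru, size_t n, size_t want)
{
    size_t freed = 0;

    assert(lru != NULL || n == 0);
    for (size_t i = n; i > 0 && freed < want; i--) {
        struct toy_entry *e = &lru[i - 1];
        if (e->freed)
            continue;
        toy_lock(e);
        if (!e->busy) {
            e->freed = true;
            freed++;
        }
        toy_unlock(e);
    }
    return freed;
}
```

With 1000 entries of which only lru[0] is free, toy_lru_purge(a, 1000, 1) frees one entry but makes 1000 lock and 1000 unlock calls; with every entry free it makes one of each.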

A similarly long loop does not appear in the test run of 2.3.13.  There are
occasional single appearances of the pairs

bdb_cache_lru_add: bdb_cache_entry_db_lock( bdb->bi_dbenv, bdb->bi_cache.c_locker, elru, 1, 1, lockp )
bdb_cache_lru_add: bdb_cache_entry_db_unlock( bdb->bi_dbenv, lockp )

in the 2.3.13 run, but nothing like the long repetition in the 2.3.14 run.
After that loop, responses are slow and slapd uses a lot of CPU.

In the overall debug output (slapd -d1), the first call to the routine appears
at about line 101,000 in both cases, but 2.3.14 then shows a long loop of
lock/unlock pairs, whereas 2.3.13 has only single instances sprinkled through
its output.
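To compare the two runs without paging through ~100,000 lines of slapd -d1 output, a small helper can report the longest unbroken run of lock/unlock pairs from one function. This is a sketch that assumes the trace format shown in this message ("<function>: bdb_cache_entry_db_lock( ..." followed by the matching unlock line) and lines shorter than 1024 bytes; longest_pair_run() is a hypothetical name, and any other log line between pairs ends the run.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Longest run of consecutive "<func>: bdb_cache_entry_db_lock(" /
 * "<func>: bdb_cache_entry_db_unlock(" line pairs in a trace. */
static unsigned long longest_pair_run(FILE *in, const char *func)
{
    char line[1024], lock_pfx[256], unlock_pfx[256];
    unsigned long run = 0, best = 0;
    int expect_unlock = 0;

    assert(in != NULL && func != NULL);
    snprintf(lock_pfx, sizeof lock_pfx, "%s: bdb_cache_entry_db_lock(", func);
    snprintf(unlock_pfx, sizeof unlock_pfx, "%s: bdb_cache_entry_db_unlock(", func);
    size_t lock_len = strlen(lock_pfx), unlock_len = strlen(unlock_pfx);

    while (fgets(line, sizeof line, in) != NULL) {
        int is_lock = strncmp(line, lock_pfx, lock_len) == 0;
        int is_unlock = strncmp(line, unlock_pfx, unlock_len) == 0;

        if (is_unlock && expect_unlock) {
            /* A lock followed by its unlock completes one pair. */
            expect_unlock = 0;
            if (++run > best)
                best = run;
        } else if (is_lock && !expect_unlock) {
            expect_unlock = 1;
        } else {
            /* Anything else ends the run; a stray lock line starts a new one. */
            run = 0;
            expect_unlock = is_lock;
        }
    }
    return best;
}
```

Fed the 2.3.14 trace with "bdb_cache_lru_purge", it should report a run in the hundreds if nothing else is logged between the pairs; the scattered single pairs from bdb_cache_lru_add in 2.3.13 would give a result of 1.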

Unless this change in behavior was intentional, I think something went wrong
when bdb_cache_lru_purge() was derived from the 2.3.13 source...