
Re: syncrepl w/ 2.3 and bdb 4.3 working great

The problems I had with BDB 4.3 (4.3.21 and 4.3.27) were related to data loading via slapadd (and the use of IN-MEMORY logs). The "-q" option to slapadd in 2.3 removes the need for IN-MEMORY logs. The other issues seen in 4.3.21 and 4.3.27 were reported by other people, and may well have been resolved in 4.3.28.

I did run into some snags testing openldap 2.3.7 w/ syncprov.c from HEAD and BDB 4.2 or 4.3 today when using hdb as the backend.

When I first start slapd, it comes up without a problem.  But then
something during the initial search takes a long time.  If I run a single
search and wait for the response, it takes about 10 seconds, and every
search after that point returns immediately.

The problem appears if I start a bunch of searches while that initial load
is still happening.  In my case the searches are generated by a RADIUS
server and are aborted after a few seconds.  The slapd process slowly
creeps up past 100% CPU in top, and I get a bunch of these in the logs:

Sep 13 14:07:12 ldap1 slapd[5771]: connection_input: conn=113 deferring operation: too many executing
Sep 13 14:07:14 ldap1 slapd[5771]: connection_input: conn=113 deferring operation: pending operations
Sep 13 14:07:16 ldap1 slapd[5771]: connection_input: conn=113 deferring operation: too many executing
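
For anyone who wants to see how often this happens per connection, here is
a rough Python sketch that tallies these messages.  It assumes each syslog
entry sits on a single line in the form shown above; the names DEFER_RE,
count_deferrals and SAMPLE are made up for this example and are not part of
OpenLDAP.

```python
import re
from collections import Counter

# Matches slapd "deferring operation" syslog entries.
# Assumption: one entry per line, in the format quoted in this message.
DEFER_RE = re.compile(
    r"slapd\[(?P<pid>\d+)\]: connection_input: conn=(?P<conn>\d+) "
    r"deferring operation: (?P<reason>.+)$"
)


def count_deferrals(lines):
    """Return a Counter keyed by (connection id, reason)."""
    counts = Counter()
    for line in lines:
        match = DEFER_RE.search(line.strip())
        if match:
            key = (int(match.group("conn")), match.group("reason"))
            counts[key] += 1
    return counts


# The three entries from the log excerpt, joined onto single lines.
SAMPLE = [
    "Sep 13 14:07:12 ldap1 slapd[5771]: connection_input: conn=113 "
    "deferring operation: too many executing",
    "Sep 13 14:07:14 ldap1 slapd[5771]: connection_input: conn=113 "
    "deferring operation: pending operations",
    "Sep 13 14:07:16 ldap1 slapd[5771]: connection_input: conn=113 "
    "deferring operation: too many executing",
]

if __name__ == "__main__":
    for (conn, reason), n in sorted(count_deferrals(SAMPLE).items()):
        print(f"conn={conn} {reason}: {n}")
```

To run it against a real log, pass an open file object to count_deferrals
instead of SAMPLE.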

It seems that slapd never catches up to what it is trying to do and I
don't get a response back.

The first thing I did was shut it down and rebuild one of the machines
on BDB 4.2 with 4 patches from Sleepycat and one patch from OpenLDAP.  I
then retried and found the same problem.

Then I switched to bdb as the database backend on the 4.2 BDB install and
the problem went away.  I then switched to bdb on the 4.3 install and the
problem also went away.

While the problem was happening, slapd didn't dump core, and I couldn't
find anything especially interesting in debug mode apart from the messages
above.  The one thing I did notice is that if I search under a part of the
tree that has only a few entries, I don't have this problem.  It's only
when the search is under a part of the tree with a couple hundred thousand
entries that I see this.

If anyone would like me to try again running under gdb or a certain debug
mode or send the logs or anything, just let me know.

For now, I have elected to use bdb as the backend, since it's working great
in my testing.

BTW - with bdb as the backend instead of hdb, I ran the same syncrepl test as before and compared the two databases to the master. They all checked out, so replication still looks fine in either of these scenarios for me (BDB 4.2 or 4.3 w/ bdb as the backend).
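
One possible way to do that kind of comparison is to diff LDIF exports of
the master and a replica (for example, dumps written by slapcat).  This is
only a sketch and not necessarily how the check above was done; parse_ldif
and diff_entries are hypothetical helper names.  It compares values as raw
strings, so base64-encoded values are compared in their encoded form, and
any operational attributes present in the dumps are compared as-is.

```python
def parse_ldif(text):
    """Parse LDIF text into {dn: set of (attribute, value) pairs}."""
    entries = {}
    # Unfold continuation lines: a line starting with one space
    # continues the previous line.
    unfolded = text.replace("\r\n", "\n").replace("\n ", "")
    for block in unfolded.split("\n\n"):
        dn = None
        attrs = set()
        for line in block.split("\n"):
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            name, sep, value = line.partition(":")
            if not sep:
                continue  # not an "attribute: value" line
            value = value.strip()
            if name.lower() == "dn":
                dn = value
            elif dn is not None:
                # Attribute names are case-insensitive in LDAP.
                attrs.add((name.lower(), value))
        if dn is not None:
            entries[dn] = attrs
    return entries


def diff_entries(master, replica):
    """Return (missing, extra, changed) lists of DNs, each sorted."""
    master_dns, replica_dns = set(master), set(replica)
    missing = sorted(master_dns - replica_dns)
    extra = sorted(replica_dns - master_dns)
    changed = sorted(
        dn for dn in master_dns & replica_dns if master[dn] != replica[dn]
    )
    return missing, extra, changed


if __name__ == "__main__":
    # Tiny made-up entries, for illustration only.
    master_ldif = "dn: cn=a,dc=example,dc=com\ncn: a\n"
    replica_ldif = "dn: cn=a,dc=example,dc=com\ncn: a\n"
    print(diff_entries(parse_ldif(master_ldif), parse_ldif(replica_ldif)))
```

Three empty lists mean the two dumps contain the same entries with the same
attribute values, regardless of the order in which they were written.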