Re: openldap hangs on bdb / database corruption

Ingo Steuwer wrote:
| On Fri, 11.06.2004 at 12:38, Ingo Steuwer wrote:
| [..]
|>>| From time to time the database seems to hang, ldapsearch gets no
|>>| answer and even db4.2_stat hangs at some point (needs kill -9 then). In
|>>| these cases I need to stop slapd and do a db4.2_recover.
|>>Looks like you might be exceeding 2GB of transaction logs which are
|>>in use.
|>>| relevant configuration parts:
|>>| slapd.conf:
|>>| ---------------------------------------------------
|>>| sizelimit               unlimited
|>>| modulepath      /usr/lib/ldap
|>>| moduleload      back_bdb.so
|>>| database        bdb
|>>| cachesize       500000
|>>| index           objectClass,uidNumber,gidNumber,memberUid,ou,uniqueMember pres,eq
|>>| index           uid,cn,sn,givenName,mail,description,displayName pres,eq,sub
|>>| index           sambaSID,sambaPrimaryGroupSID,sambaDomainName eq
|>>| index           default sub
|>>| ---------------------------------------------------
|>>You don't appear to have a checkpoint setting, which would mean that all
|>>your transaction logs are open (or something to that effect, based on
|>>what I've seen).
|>That's true, I added "checkpoint 1024 1" now in slapd.conf. It may take
|>some days to see if it is the reason.
|>But is it true that the log files, which contain only 10MB each, are all
|>held open? I thought this would be done only for the last (current) one.
|>I mean, for what reason should they all be open?
|>Well, as I have 211 log files now, they are bigger than 2GB all together.
| It made it worse. As long as I have the checkpoint option set (I also
| tried checkpoint 2048 2) I had a CPU-eating slapd for _each_ ldapmodify.
| The modifications were done, but the corresponding slapd process goes
| mad.

Yep, if you have no transaction log settings, you will see this. Your
checkpoint time is probably a bit too low (checkpointing too often).

| strace shows me the already mentioned sched_yield() calls, but the
| more mysterious thing is that such a process finishes after I try
| "ltrace -p" on it.
| Well, and after all it corrupted my database again. Back to the defaults
| and after doing some changes there is one CPU-eating slapd which will
| not finish with ltrace (it gives me countless
| "ldap_pvt_thread_yield(0xbfffdf48, 0x40009e90, 0, 0x6044a510,
| 0x40117f60)              = 0") but will also resist an
| "/etc/init.d/slapd stop" so I have to do a "kill -9". After a new start
| I get "Implementation Specific Error"s on an ldapdelete.
| So db_recover is needed again.

Yes, after any non-graceful shutdown of slapd you will likely need to
run database recovery.
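
As a rough sketch of what that recovery step could look like when scripted
(for example from an init script wrapper), the snippet below builds and runs
the recovery command. The tool name db4.2_recover is the one used earlier in
this thread; the helper names, the -v (verbose) choice and the directory
/var/lib/ldap in the usage note are assumptions for illustration only. slapd
must be stopped before running it.

```python
import subprocess


def recover_command(db_dir, catastrophic=False, tool="db4.2_recover"):
    """Build the argv for a Berkeley DB recovery run against db_dir.

    -h selects the database environment directory, -v asks for verbose
    output, and -c requests catastrophic instead of normal recovery.
    """
    argv = [tool, "-v", "-h", db_dir]
    if catastrophic:
        argv.insert(1, "-c")
    return argv


def run_recovery(db_dir, catastrophic=False):
    """Run recovery and raise CalledProcessError if the tool fails.

    The caller is responsible for stopping slapd first.
    """
    subprocess.run(recover_command(db_dir, catastrophic), check=True)
```

Calling run_recovery("/var/lib/ldap") would then run normal recovery on that
(assumed) directory; passing catastrophic=True is only meant for the case
where normal recovery is not enough.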

However, I found that database recovery would not finish if there were more
than 2GB of active transaction log files.
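
A quick way to see how close a database directory is to that 2GB mark is to
add up the transaction log files in it. The sketch below assumes the usual
Berkeley DB naming (log. followed by ten digits); the function names are made
up for this example. Note that it counts every log file on disk, so it only
gives an upper bound on the logs that are still active.

```python
import os
import re

# Berkeley DB names its transaction logs log.0000000001, log.0000000002, ...
LOG_NAME = re.compile(r"^log\.\d{10}$")
LIMIT_BYTES = 2 * 1024**3  # the 2GB mark discussed above


def transaction_log_bytes(db_dir):
    """Return (file_count, total_bytes) for the log files found in db_dir."""
    count = 0
    total = 0
    for name in os.listdir(db_dir):
        if LOG_NAME.match(name):
            count += 1
            total += os.path.getsize(os.path.join(db_dir, name))
    return count, total


def exceeds_limit(total_bytes, limit=LIMIT_BYTES):
    """True if the given log volume is above the limit."""
    return total_bytes > limit
```

For example, count, total = transaction_log_bytes("/var/lib/ldap") followed by
exceeds_limit(total), where /var/lib/ldap is an assumed directory. With the
211 log files of 10MB each mentioned in the quoted mail, the total is about
2110MB, which is already above 2048MB.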

I ran automated tests: imports of a 250000-entry database, cluster
fail-over (approx. 8 fail-overs) while running 12 simultaneous clients
for an extended period, removal of the database files, then starting the
database imports again. About 20 cycles like this succeeded without
needing catastrophic database recovery. But I am running normal
database recovery on startup of openldap (thus during every failover).

You need to tune both the checkpoint settings and the transaction log
settings to achieve the database performance you need.

For reference, the settings I was running with for the tests above were
something like:

slapd.conf:
checkpoint      1024 30
cachesize       150000

DB_CONFIG:
set_lg_bsize    262144
set_lg_max      2097152
set_cachesize   0       536870912       1

But, you *must* do some testing in your environment.
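
To make those raw numbers easier to reason about while testing, the sketch
below decodes the three Berkeley DB settings into sizes. It assumes the
documented argument order of set_cachesize (gbytes, bytes, ncache); the parser
and helper names are made up for this example and are not part of any tool.
For the slapd.conf side, my understanding of the back-bdb checkpoint directive
is "checkpoint <kbyte> <min>", so 1024 30 means every 1024KB written or every
30 minutes, while the quoted "checkpoint 1024 1" checkpoints every minute.

```python
def parse_db_config(text):
    """Parse DB_CONFIG-style lines into {directive: [integer arguments]}."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        name, *args = line.split()
        settings[name] = [int(arg) for arg in args]
    return settings


def cache_bytes(gbytes, nbytes, ncache):
    """Total cache size: gbytes GB plus nbytes, spread over ncache regions."""
    return gbytes * 1024**3 + nbytes


# The values quoted in the message above.
DB_CONFIG = """\
set_lg_bsize 262144
set_lg_max 2097152
set_cachesize 0 536870912 1
"""

cfg = parse_db_config(DB_CONFIG)
lg_bsize = cfg["set_lg_bsize"][0]           # in-memory log buffer: 256KB
lg_max = cfg["set_lg_max"][0]               # size of one log file: 2MB
cache = cache_bytes(*cfg["set_cachesize"])  # database cache: 512MB
files_for_2gb = (2 * 1024**3) // lg_max     # log files that add up to 2GB

print(lg_bsize // 1024, "KB log buffer")
print(lg_max // 1024**2, "MB per log file")
print(cache // 1024**2, "MB cache")
print(files_for_2gb, "log files make 2GB")
```

With 2MB log files it takes 1024 of them to reach 2GB, so the number of log
files left in the directory is an easy thing to watch during testing.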

The tests were run on RHEL2.1 with openldap-2.1.25/2.1.29/2.1.30 on


--
Buchan Milne                      Senior Support Technician
Obsidian Systems                  http://www.obsidian.co.za
B.Eng                                RHCE (803004789010797)