
Re: openldap hangs on bdb / database corruption

To: Ingo Steuwer <steuwer@univention.de>
Subject: Re: openldap hangs on bdb / database corruption
From: Buchan Milne <bgmilne@obsidian.co.za>
Date: Fri, 11 Jun 2004 16:11:47 +0200
Cc: openldap-software@OpenLDAP.org
In-reply-to: <1086957486.19168.115.camel@anton.knut.univention.de>
References: <1086941158.19170.72.camel@anton.knut.univention.de> <40C97D37.605@obsidian.co.za> <1086950298.19176.94.camel@anton.knut.univention.de> <1086957486.19168.115.camel@anton.knut.univention.de>
User-agent: Mozilla Thunderbird 0.6 (X11/20040609)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ingo Steuwer wrote:
| On Fri, 11.06.2004 at 12:38, Ingo Steuwer wrote:
| [..]
|
|>>| From time to time the database seems to hang, ldapsearch gets no
|>>| answers and even db4.2_stat hangs at some point (needs kill -9 then).
|>>| In these cases I need to stop slapd and do a db4.2_recover.
|>>|
|>>
|>>Looks like you might be exceeding 2GB of transaction logs which are
|>>in use.
|>
|>[..]
|>
|>>| relevant configuration parts:
|>>|
|>>| slapd.conf:
|>>| ---------------------------------------------------
|>>| sizelimit               unlimited
|>>| modulepath      /usr/lib/ldap
|>>| moduleload      back_bdb.so
|>>|
|>>| database        bdb
|>>|
|>>| cachesize       500000
|>>| index           objectClass,uidNumber,gidNumber,memberUid,ou,uniqueMember pres,eq
|>>| index           uid,cn,sn,givenName,mail,description,displayName pres,eq,sub
|>>| index           sambaSID,sambaPrimaryGroupSID,sambaDomainName eq
|>>| index           default sub
|>>| ---------------------------------------------------
|>>|
|>>
|>>You don't appear to have a checkpoint setting, which would mean that all
|>>your transaction logs are open (or something to that effect, based on
|>>what I've seen).
|>
|>That's true, I added "checkpoint 1024 1" now in slapd.conf. It may take
|>some days to see if it is the reason.
|>
|>But is it true that the log files, which contain only 10MB each, are
|>all held open? I thought this would be done only for the last (current)
|>one. I mean, for what reason should they all be open?
|>
|>Well, as I have 211 logfiles now they are bigger than 2G all together.
|>
|
|
| It made it worse. As long as I have the checkpoint option set (I also
| tried checkpoint 2048 2) I had a CPU-eating slapd for _each_ ldapmodify.
| The modifications were done, but the corresponding slapd process goes
| mad.

Yep, if you have no transaction log settings, you will see this. Your
checkpoint interval (1 or 2 minutes) is probably also a bit too low, so
you are checkpointing too often.

| strace shows me the already mentioned sched_yield() calls, but the
| more mysterious thing is that such a process finishes after I try
| "ltrace -p" on it.
|
| Well, and after all it corrupted my database again. Back to the defaults
| and after doing some changes there is one CPU-eating slapd which will
| not finish with ltrace (it gives me countless
| "ldap_pvt_thread_yield(0xbfffdf48, 0x40009e90, 0, 0x6044a510,
| 0x40117f60)              = 0") but will also resist an
| "/etc/init.d/slapd stop" so I have to do a "kill -9". After a new start
| I get "Implementation Specific Error"s on an ldapdelete.
|
| So db_recover is needed again.
|

Yes, after any non-graceful shutdown of slapd you will likely need to
run database recovery.

But I found that database recovery would not finish if there were more
than 2GB of active transaction log files.
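
To check how close a database directory is to that 2GB mark, something
like the following rough Python sketch could be used. It only sums the
log.NNNNNNNNNN files on disk, which is an upper bound on the *active*
logs, not an exact figure. The function names and the 2 GiB constant
are my own choices for illustration, and the last line assumes Ingo's
211 files are exactly 10 MiB each.

```python
import glob
import os

# Assumption: "2GB" is taken to mean 2 GiB (2**31 bytes).
TWO_GIB = 2 * 1024**3


def total_log_bytes(db_home):
    """Sum the sizes of the Berkeley DB log files (log.NNNNNNNNNN) in db_home."""
    paths = glob.glob(os.path.join(db_home, "log.[0-9]*"))
    return sum(os.path.getsize(p) for p in paths)


def exceeds_limit(total_bytes, limit=TWO_GIB):
    """Return True if the given log volume is above the limit."""
    return total_bytes > limit


# 211 log files of 10 MiB each (assumed size) are already past 2 GiB.
ingo_total = 211 * 10 * 1024**2
over_limit = exceeds_limit(ingo_total)
```

Calling total_log_bytes() on the slapd database directory and passing
the result to exceeds_limit() gives a quick yes/no before attempting a
recovery.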

I ran automated tests: importing a 250000-entry database, forcing cluster
fail-over (approx. 8 fail-overs) while running 12 simultaneous clients
for an extended period, removing the database files, and starting the
import again. About 20 such cycles succeeded without needing catastrophic
database recovery. But I am running normal database recovery on startup
of openldap (thus during every fail-over).
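
As an illustration of running normal recovery before every start, here
is a minimal Python sketch, not the actual init script I use. The tool
name db4.2_recover is the one used earlier in this thread; the helper
names are hypothetical, and the database directory is whatever your
slapd.conf "directory" setting points to. The -h (database home) and -c
(catastrophic recovery) options are the standard db_recover ones.

```python
import subprocess


def recover_command(db_home, catastrophic=False, tool="db4.2_recover"):
    """Build the db_recover command line for the given database directory."""
    cmd = [tool, "-h", db_home]
    if catastrophic:
        # -c requests catastrophic recovery instead of normal recovery.
        cmd.append("-c")
    return cmd


def recover_before_start(db_home):
    """Run normal recovery; raises CalledProcessError if recovery fails."""
    subprocess.run(recover_command(db_home), check=True)
```

slapd must not be running while this executes, so a wrapper would call
recover_before_start() first and only start slapd if it returns without
raising.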

You need to tune both the checkpoint settings and the transaction log
settings to achieve the database performance you need.

For reference, the settings I was running with for the tests above were
something like:

#slapd.conf
checkpoint      1024 30
cachesize       150000

#DB_CONFIG
set_lg_bsize    262144
set_lg_max      2097152
set_cachesize   0 536870912 1

But, you *must* do some testing in your environment.
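
For anyone unsure what those numbers mean: "checkpoint 1024 30" asks
slapd to checkpoint after 1024 KB written or 30 minutes, and the
DB_CONFIG values are byte counts (set_cachesize takes gigabytes, bytes
and the number of cache regions). The small Python sketch below just
converts the values quoted above into readable units; the function and
key names are made up for the example.

```python
DB_CONFIG_TEXT = """
set_lg_bsize    262144
set_lg_max      2097152
set_cachesize   0 536870912 1
"""


def parse_db_config(text):
    """Parse DB_CONFIG lines into {directive: [int, ...]}, skipping comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, *values = line.split()
        settings[name] = [int(v) for v in values]
    return settings


def summarize(settings):
    """Convert the raw byte counts into human-readable units."""
    gbytes, nbytes, ncache = settings["set_cachesize"]
    lg_max = settings["set_lg_max"][0]
    return {
        "log_buffer_kib": settings["set_lg_bsize"][0] // 1024,
        "log_file_mib": lg_max // 1024**2,
        "cache_mib": (gbytes * 1024**3 + nbytes) // 1024**2,
        "cache_regions": ncache,
        # How many log files of this size add up to 2 GiB.
        "log_files_per_2gib": (2 * 1024**3) // lg_max,
    }


summary = summarize(parse_db_config(DB_CONFIG_TEXT))
```

With these settings each log file is 2 MiB, the log buffer is 256 KiB
and the BDB cache is 512 MiB in a single region, so it would take 1024
log files to reach 2 GiB.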

The tests were run on RHEL2.1 with openldap-2.1.25/2.1.29/2.1.30 on
db-4.2.52.2.

Regards,
Buchan

- --
Buchan Milne                      Senior Support Technician
Obsidian Systems                  http://www.obsidian.co.za
B.Eng                                RHCE (803004789010797)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAyb2jrJK6UGDSBKcRAoN4AJ0V30GjRebbbnBe595ZV3I7JD543gCfYVp5
lmyRiOxu2bBum85No84Abdw=
=haHX
-----END PGP SIGNATURE-----

