
BDB Corruption...

To: openldap-software@OpenLDAP.org
Subject: BDB Corruption...
From: Roy Butler <roy.butler@jpl.nasa.gov>
Date: Tue, 19 Oct 2004 15:27:42 -0700
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.6) Gecko/20040114

Hello everyone,

I'm running OpenLDAP 2.2.6 on SuSE 9.1 on a dual-Xeon Intel server. It is set up to provide user, group, netgroup, and automount data. I occasionally do a fair amount of writing when I repopulate some of these OUs using migration scripts that pull in updated data from a production NIS server. None of the OUs has more than about 1,000 entries.

I've been running it for about a month, and twice I've had BDB corruption in which the server stopped responding or could only serve up a portion of its entries without hanging. Restarting the server had no beneficial effect. The first time, I just shut down, wiped the LDAP database directory, restarted, and slapadd'ed in a backup LDIF. Now that it's happened a second time, I've performed the following to try to get to the bottom of it:

------------------------------------------------------------

# ps -el | grep slapd
1 S    76 21668     1  0  76   0 -  7965 schedu ?        00:00:00 slapd
(The process always seems to be waiting to be scheduled, as if from input on a file descriptor)

# strace -f -p 21668
Process 21668 attached - interrupt to quit
futex(0x40787bf8, FUTEX_WAIT, 21669, NULL

# slapcat
...
(Dumps up to a point, then hangs)

# ldapsearch -b ou=...,dc=... ...=... ...
(Returns some entries, doesn't return others which should be there, and hangs on others)

# vmstat 1 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0      0 278536 219132 328428    0    0     0     3    1    10  0  0 100  0
(The CPUs were rather idle the entire time)

# cd /var/lib/ldap; db_recover
(This appeared to resolve the problem, though I still plan to wipe it out and restore from a backup LDIF)

------------------------------------------------------------
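
For reference, the recover-then-restore sequence described above could be sketched roughly as the script below. It is only a sketch and runs in dry-run mode by default, printing the commands instead of executing them. Several things in it are assumptions that do not come from the steps above: the backup LDIF path, the `rcldap stop` command used to stop slapd, the `-v` flag on db_recover, and the choice to move the old database directory aside rather than wipe it.

```shell
#!/bin/sh
# Dry-run sketch of the recover-and-restore sequence described in this post.
# Assumptions (not from the original steps): the backup LDIF path, the
# command that stops slapd, and moving the old directory aside, not wiping it.

DB_DIR="${DB_DIR:-/var/lib/ldap}"                     # directory used in the post
BACKUP_LDIF="${BACKUP_LDIF:-/root/ldap-backup.ldif}"  # hypothetical path
STOP_CMD="${STOP_CMD:-rcldap stop}"                   # hypothetical; adjust per system
DRY_RUN="${DRY_RUN:-1}"                               # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_and_restore() {
    # slapd should not be running while the environment is recovered.
    run $STOP_CMD
    # Run normal Berkeley DB recovery on the database environment.
    run db_recover -v -h "$DB_DIR"
    # Keep the suspect files for later inspection rather than deleting them.
    run mv "$DB_DIR" "$DB_DIR.suspect"
    run mkdir "$DB_DIR"
    # Reload the directory from the last known-good LDIF export.
    run slapadd -l "$BACKUP_LDIF"
}

recover_and_restore
```

Setting `DRY_RUN=0` would execute the commands for real. A real run would also need to put back any DB_CONFIG file and restore the ownership and permissions of the new directory before slapd is started again.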

Has anyone come across a situation like this before, or have any tips on how I might permanently avoid the condition in the future? On less widely used software I'd suspect a reentrancy issue, but I think that's unlikely to be the case here. If it's a known issue for which there is a patch, I'd be happy to test it.


Thanks,
Roy

