
BDB corruption, running out of ideas OpenLDAP 2.2.23/Debian Sarge


I've been having this kind of error for quite some time now. As I stated in a previous post, I have a 25-replica scenario. I just upgraded from OpenLDAP 2.1.30 to OpenLDAP 2.2.23. I'm using Debian, so I tried the experimental packages for OpenLDAP 2.2, but I noticed that they use BDB 4.3 (can this cause problems?).

Anyway, I'm getting far fewer DB corruptions than before ( :) ), but I still get problems on some of the remote replicas. More precisely, slapd crashes, I issue a db4.3_recover -h /var/lib/ldap/ -v and it starts working again, but from then on I start to get .rej files. Of course, before running db_recover I also had some rejects ( :\ ). I know for certain that some replicas are being shut down uncleanly, but that's something that is *very* hard to avoid, so, to "reduce" the impact of such shutdowns, I tried keeping a db4.3_checkpoint -h /var/lib/ldap/ -p 5 running. I'm not sure whether it is working.

Anyway, I'm running out of ideas, and I'm getting tired of having to "manually" resync the replicas, so any help would be appreciated. Switching distros is not an option because the replicas are spread all over the country.

Of course, I have a DB_CONFIG at each site that looks like this:

set_cachesize 0 67108864 1
set_lk_max_lockers 2500
set_lk_max_locks 7500
set_lk_max_objects 7500

Yes, the cache is huge (I read in the Berkeley DB documentation that if I'm unsure, I should make it big, so I made it big).
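
For reference, the arithmetic behind that set_cachesize line can be sketched in Python. The three arguments are gigabytes, bytes, and the number of cache regions. The 25% increase for caches under 500MB is what the Berkeley DB documentation describes for buffer pool overhead; the function name and the exact threshold constant below are illustrative assumptions, not part of any tool.

```python
MIB = 1024 ** 2


def cachesize_bytes(gbytes, nbytes):
    """Bytes requested by 'set_cachesize <gbytes> <bytes> <ncache>' (hypothetical helper)."""
    return gbytes * 1024 ** 3 + nbytes


# Values taken from the DB_CONFIG in this post: set_cachesize 0 67108864 1
requested = cachesize_bytes(0, 67108864)

# Assumption: caches under 500MB are grown by 25% for buffer pool overhead.
effective = requested * 5 // 4 if requested < 500 * MIB else requested

print(requested // MIB, "MiB requested")
print(effective // MIB, "MiB after overhead")
```

Under that assumption, the 64 MiB requested becomes 80 MiB, which is in line with the "Total cache size" that db4.3_stat reports in this post.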

db4.3_stat -m -h /var/lib/ldap reports something like this:

80MB 1KB 604B   Total cache size
1       Number of caches
80MB 8KB        Pool individual cache size
0       Maximum memory-mapped file size
0       Maximum open file descriptors
0       Maximum sequential buffer writes
0       Sleep after writing maximum sequential buffers
0       Requested pages mapped into the process' address space
4176807 Requested pages found in the cache (99%)
26      Requested pages not found in the cache
3195    Pages created in the cache
26      Pages read into the cache
3117    Pages written from the cache to the backing file
0       Clean pages forced from the cache
0       Dirty pages forced from the cache
0       Dirty pages written by trickle-sync thread
3221    Current total page count
579     Current clean page count
2642    Current dirty page count
8191    Number of hash buckets used for page location
4180054 Total number of times hash chains searched for a page
2       The longest hash chain searched for a page

(...) and so on. The cache size is in fact correct (and a slapadd that used to take about 30 minutes now takes only about 3).
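
As a rough aid to reading those counters, here is a minimal Python sketch that derives two ratios from the figures quoted above. The variable names are illustrative; only the numbers come from the db4.3_stat output in this post.

```python
# Counters copied from the db4.3_stat -m output in this post.
hits = 4176807       # Requested pages found in the cache
misses = 26          # Requested pages not found in the cache
total_pages = 3221   # Current total page count
dirty_pages = 2642   # Current dirty page count

# Fraction of page requests served from the cache.
hit_ratio = hits / (hits + misses)

# Fraction of cached pages modified but not yet written to the database files.
dirty_fraction = dirty_pages / total_pages

print(f"cache hit ratio: {hit_ratio:.4%}")
print(f"dirty pages:     {dirty_fraction:.1%}")
```

The hit ratio confirms that the cache is large enough for this data set. The dirty fraction is the figure a checkpoint should bring down, since a checkpoint flushes modified pages to the database files, so comparing it before and after a checkpoint interval is one way to see whether db4.3_checkpoint is having any effect.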

Thanks in advance, sincerely,

Ildefonso Camargo.

Quanah Gibson-Mount wrote:

--On Wednesday, March 09, 2005 12:30 PM -0600 Aaron Thoreson <aaront@midco.net> wrote:

I should note that I'm using LDBM and not Berkeley.  I'm not opposed to
making the switch, sometime down the road, but .dbb worked fine for some
time (this has only started occurring recently after a hard drive
failure) and I'd like to fix the issue at hand rather than upgrade.  I
assume this may be the reason I have no db_recover.

Well, "database ldbm" could still be on top of Berkeley DB. db_recover is a command shipped with Berkeley DB.

I would highly recommend upgrading to using BDB as the underlying database structure, with "database bdb" in your slapd.conf, as it is a much superior database format to ldbm. We ran with ldbm on our previous directory prior to OpenLDAP, and had many undetected database corruption errors because of it.


-- Quanah Gibson-Mount Principal Software Developer ITSS/Shared Services Stanford University GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html

"These censorship operations against schools and libraries are stronger
than ever in the present religio-political climate. They often focus on
fantasy and sf books, which foster that deadly enemy to bigotry and blind
faith, the imagination." -- Ursula K. Le Guin