
Re: BDB corruption, running out of ideas OpenLDAP 2.2.23/Debian Sarge

To: Jose Ildefonso Camargo Tolosa <icamargo@unet.edu.ve>, openldap-software@OpenLDAP.org
Subject: Re: BDB corruption, running out of ideas OpenLDAP 2.2.23/Debian Sarge
From: Quanah Gibson-Mount <quanah@stanford.edu>
Date: Wed, 16 Mar 2005 20:09:13 -0800
Content-disposition: inline
In-reply-to: <42378F2E.1080204@unet.edu.ve>
References: <422F40B8.8020105@midco.net> <E2EF0A8B8EF6B78E74549BDF@cadabra-dsl.stanford.edu> <42378F2E.1080204@unet.edu.ve>

--On Tuesday, March 15, 2005 9:43 PM -0400 Jose Ildefonso Camargo Tolosa <icamargo@unet.edu.ve> wrote:

Hi!

I'm having this kind of error for quite some time now..... As I stated in
a previous post, I have a 25 replica scenario.  I just upgraded from
OpenLDAP 2.1.30 to OpenLDAP 2.2.23.  I'm using Debian, so I tried the
experimental packages for OpenLDAP 2.2, but I noticed that they use BDB 4.3
(can this cause problems?).

I've been having some discussions with the Debian folks. I think their use of BDB 4.3 with OpenLDAP 2.2 will soon disappear.

Anyway, I'm getting far fewer db corruptions than before ( :) ), but I
still get some problems in some of the remote replicas.  More precisely,
I'm having slapd crashes, I issue a db4.3_recover -h /var/lib/ldap/ -v
and it starts to work again, but from that time on I start to get some
.rej.....  Of course, before running the db_recover I also had some
rejects ( :\ ).  I know for certain that some replicas are getting
shut down uncleanly, but that's something that is *very* hard to avoid,
so, in order to "reduce" the impact of such shutdowns, I tried to keep
running a db4.3_checkpoint -h /var/lib/ldap/ -p 5 ........ I'm not sure
if it is working.....

Were all 25 replicas and the master updated to OpenLDAP 2.2? I assume you reloaded the database via slapcat/slapadd...

I'll note that I have periodically had invalid reject files created because slurpd sometimes doesn't get a reply from the replica within its timeout period, and then replays the change. I would make sure to verify that all of the reject files are genuine rejects.

Anyway, I'm running out of ideas, and I'm getting tired of having to
"manually" resync the replicas, any help would be appreciated.  Distro
switching is not an option because the replicas are spread all over the
country.

What sort of slapd cachesize do you have? What sort of slapd idlcachesize do you have?

Of course I have a DB_CONFIG in each of the places that looks like this:

set_cachesize 0 67108864 1
set_lk_max_lockers 2500
set_lk_max_locks 7500
set_lk_max_objects 7500

Yes, the cache is huge (I read in the Berkeley documentation that if I'm
unsure, make it big, so I made it big).


I honestly would say that is a small cache... I use

set_cachesize 2 0 1

which is 2GB.  Your 64MB one is quite tiny. ;)

However, given your db_stat output, I'd say that it is sufficient for your setup.
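
For anyone comparing the two set_cachesize lines above: the three arguments are gigabytes, additional bytes, and the number of cache regions, so the total is gbytes * 1GB + bytes. Here is a rough Python sketch of that arithmetic. The helper name cachesize_bytes is made up for illustration and is not part of any BDB or OpenLDAP tool.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def cachesize_bytes(line):
    """Return the total cache size in bytes for a DB_CONFIG line of the
    form 'set_cachesize <gbytes> <bytes> <ncache>'. (Hypothetical helper.)"""
    directive, gbytes, nbytes, ncache = line.split()
    if directive != "set_cachesize":
        raise ValueError("not a set_cachesize line: %r" % line)
    # ncache only controls how many regions the cache is split into;
    # it does not change the total size.
    return int(gbytes) * GIB + int(nbytes)

for line in ("set_cachesize 0 67108864 1", "set_cachesize 2 0 1"):
    print("%-28s -> %d MB" % (line, cachesize_bytes(line) // MIB))
```

That gives 64 MB for the first line and 2048 MB (2GB) for the second.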

--Quanah

--
Quanah Gibson-Mount
Principal Software Developer
ITSS/Shared Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html

"These censorship operations against schools and libraries are stronger
than ever in the present religio-political climate. They often focus on
fantasy and sf books, which foster that deadly enemy to bigotry and blind
faith, the imagination." -- Ursula K. Le Guin
