
Re: Database deadlock when adding new entry



On 6/20/06, Aleksandar Milivojevic <alex@milivojevic.org> wrote:
Hi,

I'm experiencing database deadlocks when adding a new entry to the
directory.  In short, all works fine for some time, then I attempt to
add a new entry to the directory, and the LDAP server (slapd) simply
hangs.

If this is not associated with a restart, it sounds like you are hitting the default DB library cache size limit.

 I'm still able to bind to the server, however any operation
after that (read or write) will simply hang.  If I restart slapd, it
hangs while initializing bdb (DB4) backend and never starts listening
on network port.  These are the last entries in log files before slapd
hangs on restart:

Jun 19 11:03:40 ldap1 slapd[20331]: bdb_initialize: Sleepycat
Software: Berkeley DB 4.2.52: (December  3, 2003)
Jun 19 11:03:40 ldap1 slapd[20331]: bdb_db_init: Initializing BDB database

The only way to get out of this that I found so far is to stop slapd,
remove all database files in /var/lib/ldap directory, start slapd and
use ldapadd to readd all the old entries back.

Running 'db_recover -h /var/lib/ldap' would most likely fix this specific problem.
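A rough sketch of that recovery step, assuming slapd has already been stopped and that the Berkeley DB utilities are on the PATH under their upstream names (some distributions rename them). The helper name recover_bdb, the DB_RECOVER override and the pgrep check are my own illustration, not something the packages provide:

```shell
#!/bin/sh
# Sketch only: run Berkeley DB recovery on a back-bdb database directory.
# Assumptions (not from the packages): the helper name recover_bdb,
# the DB_RECOVER override, and the pgrep check for a running slapd.

DB_RECOVER="${DB_RECOVER:-db_recover}"

recover_bdb() {
    db_home="${1:-/var/lib/ldap}"

    if [ ! -d "$db_home" ]; then
        echo "no such database directory: $db_home" >&2
        return 2
    fi

    # Recovery should not run while a live slapd has the environment open.
    if pgrep -x slapd >/dev/null 2>&1; then
        echo "slapd is still running; stop it before recovering" >&2
        return 1
    fi

    # -h names the database environment home, as in the command above.
    "$DB_RECOVER" -h "$db_home"
}
```

Called as 'recover_bdb /var/lib/ldap' after stopping slapd and before starting it again. It is probably safest to run it as the user slapd runs as, so that any files it recreates stay readable by the server.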

 If I attempt to add the
same entry again, it hangs again.  However, if the same entry is added
using ldapadd with all the old entries, it works!?

I've attempted to reproduce the problem on my testing box.  If I use
ldapadd to add all the old entries (the way I was reinitializing the
database on the production box), and then add the new entry, all works
fine.  If I copy the raw database files directly from the production
box, then slapd hangs on startup just like the production box.  So my
guess is that something in the database files got corrupted.

No, they are not corrupt. The database was shut down uncleanly and you have not recovered it. Red Hat's init script does not do it for you, and the version you're running does not do automatic database recovery.

(When a filesystem needs to be checked, you don't say it is corrupt
just because you haven't run fsck ..).

The only difference between production and testing box is that
production box is replicated (using syncrepl).  Replication is one
way, all updates are done on the master (the one that hangs), and
slaves (which are read-only) pull the changes.  When the master hangs,
I'm still able to read out data from slaves.

But, eventually they will exhibit the same behaviour.

I'm using CentOS4 (RHEL4 clone) openldap-2.2.13-4 and db4-4.2.52-7.1
RPM packages.  Database backend is bdb.

Bad idea.

Wondering if this is a known issue (maybe already fixed in current
version of openldap or db4?) or something new?  Anything I might try
changing in slapd or DB4 config?

Firstly, at a minimum:

1) Use a DB_CONFIG file; see the FAQ for some discussions on "tuning"
   (where lack of tuning can prevent your LDAP server from working at all).
2) Enable checkpointing (see the slapd.conf man page) and set a cron job
   to checkpoint your DB.
3) Ensure database recovery is run any time slapd could have been shut
   down uncleanly (which is quite likely with the packages you are running).
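The first two points above can be sketched roughly as follows. The helper names and the DB_CHECKPOINT override are hypothetical, and the sizes are placeholders rather than recommendations; the right cache size depends on your own database, so treat the FAQ as the real guide:

```shell
#!/bin/sh
# Sketch only: write a minimal DB_CONFIG and force a checkpoint.
# Assumptions: the helper names and the DB_CHECKPOINT override are
# illustrative, and the sizes below are placeholders to be tuned.

DB_CHECKPOINT="${DB_CHECKPOINT:-db_checkpoint}"

# Write a small DB_CONFIG into the database directory given as $1.
write_db_config() {
    db_home="$1"
    cat > "$db_home/DB_CONFIG" <<'EOF'
# Placeholder values; size the cache to your own database.
# set_cachesize <gbytes> <bytes> <ncache>  (here: 64 MB in one region)
set_cachesize 0 67108864 1
# In-memory transaction log buffer, in bytes (here: 2 MB)
set_lg_bsize 2097152
EOF
}

# Force a single checkpoint and exit; suitable for calling from cron.
checkpoint_once() {
    "$DB_CHECKPOINT" -1 -h "$1"
}
```

For example, 'write_db_config /var/lib/ldap' once (with slapd stopped), and 'checkpoint_once /var/lib/ldap' from a cron entry every few minutes. As far as I know, cache size changes in DB_CONFIG only take effect once the database environment is recreated, for instance by running recovery.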

If you can afford an upgrade, upgrade to 2.3.24. You can try these
packages, which will install in parallel with the original packages
(but use a decent tool like yum or smart to do so):

http://anorien.warwick.ac.uk/mirrors/buchan/openldap/rhel4/

They ship with an example DB_CONFIG file, and the slapd.conf shipped has
an example checkpoint directive. 2.3 does the checkpointing itself (no
need for a cron job) and also does automatic database recovery.

We're running 2 sets of production LDAP servers on these packages (or
the previous ones), one set with ~60 000 entries, one set with
~600 000 entries.

Regards,
Buchan