
Slapd database goes corrupt repeatedly after recovery



Dear list,

We are using an OpenLDAP/slapd server to manage the user accounts of our Samba server. Recently, users have been unable to connect to Samba drives once the server has been running for some time. Samba complains that it cannot connect to the LDAP server (see the Samba log below), and the slapd log shows:

  Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (gidNumber) not indexed
  Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (gidNumber) not indexed
  Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (uid) not indexed
  Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (gidNumber) not indexed
  Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (sambaSID) not indexed
  Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (sambaSID) not indexed
  Mar 25 11:38:15 office-server slapd[3433]: bdb(dc=foo,dc=org): file id2entry.bdb has LSN 1/382892, past end of log at 1/283666
  Mar 25 11:38:15 office-server slapd[3433]: bdb(dc=foo,dc=org): Commonly caused by moving a database from one database environment
  Mar 25 11:38:15 office-server slapd[3433]: bdb(dc=foo,dc=org): to another without clearing the database LSNs, or by removing all of
  Mar 25 11:38:15 office-server slapd[3433]: bdb(dc=foo,dc=org): the log files from a database environment
  Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/382892 past current end-of-log of 1/283666
  Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
  Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
  Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 5
  Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/378772 past current end-of-log of 1/283666
  Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
  Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
  Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 7
  Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/373647 past current end-of-log of 1/283666
  Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
  Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
  Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 8
  Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): txn_checkpoint: failed to flush the buffer cache: DB_RUNRECOVERY: Fatal error, run database recovery
  Mar 25 11:38:51 office-server slapd[3433]: conn=62 op=29 do_search: invalid dn (sambaDomainName=,sambaDomainName=foo,dc=foo,dc=org)
  Mar 25 11:38:51 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
  Mar 25 11:39:01 office-server slapd[3433]: last message repeated 26 times
  Mar 25 11:39:01 office-server CRON[3657]: (root) CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
  Mar 25 11:39:14 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
  Mar 25 11:39:47 office-server slapd[3433]: last message repeated 35 times
  Mar 25 11:39:48 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
  Mar 25 11:39:49 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
  Mar 25 11:39:50 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
  Mar 25 11:40:51 office-server slapd[3433]: last message repeated 164 times
  Mar 25 11:40:51 office-server slapd[3433]: last message repeated 3 times
  Mar 25 11:40:52 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
  Mar 25 11:41:53 office-server slapd[3433]: last message repeated 294 times
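
The lines that seem most telling are the DB_ENV->log_flush ones, where a page in id2entry.bdb carries an LSN (log file number / offset) beyond the current end of the transaction log. To make that mismatch explicit, here is a minimal Python sketch that pulls the two LSNs out of such lines and reports how far apart they are. The function name and the regular expression are my own assumptions, written against the message format shown above; this is not part of any OpenLDAP or Berkeley DB tool.

```python
import re

# Matches e.g.
# "DB_ENV->log_flush: LSN of 1/382892 past current end-of-log of 1/283666"
LOG_FLUSH = re.compile(
    r"log_flush: LSN of (\d+)/(\d+) past current end-of-log of (\d+)/(\d+)"
)

def lsn_gaps(lines):
    """Return one (page_lsn, end_of_log, gap) tuple per log_flush line.

    Each LSN is a (file_number, offset) pair. The gap is the offset
    difference in bytes when both LSNs are in the same log file,
    otherwise None (offsets in different files are not comparable).
    """
    gaps = []
    for line in lines:
        m = LOG_FLUSH.search(line)
        if m is None:
            continue  # not a log_flush message
        page_file, page_off, end_file, end_off = map(int, m.groups())
        gap = page_off - end_off if page_file == end_file else None
        gaps.append(((page_file, page_off), (end_file, end_off), gap))
    return gaps

if __name__ == "__main__":
    sample = [
        "Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): "
        "DB_ENV->log_flush: LSN of 1/382892 past current end-of-log of 1/283666",
        "Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): "
        "id2entry.bdb: unable to flush page: 5",
    ]
    for page_lsn, end_of_log, gap in lsn_gaps(sample):
        print(page_lsn, end_of_log, gap)
```

Run against the log above, every reported page LSN is in log file 1 and lies well past the end of that file, which matches what the bdb messages themselves say.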

Strangely, restarting slapd helps: users can use Samba again for an unpredictable period of time, until the problem reappears. I tried to repair the database with

  db4.7_recover  -v -h /var/lib/ldap

but the problem came back later.

I also noticed that when I shut down slapd using "/etc/init.d/slapd stop", it complains about the database being corrupt, even if no problems have appeared up to that point:

  Mar 25 10:12:35 office-server slapd[16880]: slapd shutdown: waiting for 0 operations/tasks to finish
  Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/382892 past current end-of-log of 1/278482
  Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
  Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
  Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 5
  Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/378772 past current end-of-log of 1/278482
  Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
  Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
  Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 7
  Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/373647 past current end-of-log of 1/278482
  Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
  Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
  Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 8
  Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
  Mar 25 10:12:35 office-server slapd[16880]: bdb_db_close: database "dc=foo,dc=org": txn_checkpoint failed: DB_RUNRECOVERY: Fatal error, run database recovery (-30974).
  Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): File handles still open at environment close
  Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): Open file handle: /var/lib/ldap/log.0000000001
  Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
  Mar 25 10:12:35 office-server slapd[16880]: bdb_db_close: database "dc=foo,dc=org": close failed: DB_RUNRECOVERY: Fatal error, run database recovery (-30974)
  Mar 25 10:12:35 office-server slapd[16880]: slapd stopped.
  Mar 25 10:12:46 office-server slapd[19194]: @(#) $OpenLDAP: slapd 2.4.18 (Sep  8 2009 17:47:22) $#012#011buildd@crested:/build/buildd/openldap-2.4.18/debian/build/servers/slapd

Does anybody have an idea what the problem might be?

Many thanks for any hints or pointers!
Kaspar

-- 
Samba Log File:

[2008/09/23 11:22:22, 0] lib/smbldap.c:smbldap_connect_system(982)
  failed to bind to server ldap://localhost/ with dn="cn=admin,dc=foo,dc=org" Error: Can't contact LDAP server (unknown)
[2008/09/23 11:22:22, 1] lib/smbldap.c:another_ldap_try(1153)
  Connection to LDAP server failed for the 1 try!
[2008/09/23 11:22:23, 2] lib/smbldap.c:smbldap_open_connection(786)
  smbldap_open_connection: connection opened
[2008/09/23 11:22:23, 2] lib/smbldap.c:smbldap_connect_system(982)
  failed to bind to server ldap://localhost/ with dn="cn=admin,dc=foo,dc=org" Error: Can't contact LDAP server (unknown)
[2008/09/23 11:22:23, 1] lib/smbldap.c:another_ldap_try(1153)
  Connection to LDAP server failed for the 2 try!

Server details: Ubuntu 9.10, slapd 2.4.18

Slapd configuration file (slapd.conf):

# This is the main slapd configuration file. See slapd.conf(5) for more
# info on the configuration options.

#######################################################################
# Global Directives:

# Features to permit
#allow bind_v2

# Schema and objectClass definitions
include         /etc/ldap/schema/core.schema
include         /etc/ldap/schema/cosine.schema
include         /etc/ldap/schema/nis.schema
include         /etc/ldap/schema/inetorgperson.schema
include         /etc/ldap/schema/samba.schema
include         /etc/ldap/schema/misc.schema

# Where the pid file is put. The init.d script
# will not stop the server if you change this.
pidfile         /var/run/slapd/slapd.pid

# List of arguments that were passed to the server
argsfile        /var/run/slapd/slapd.args

# Read slapd.conf(5) for possible values
loglevel        392

# Where the dynamically loaded modules are stored
modulepath	/usr/lib/ldap
moduleload	back_bdb

# The maximum number of entries that is returned for a search operation
sizelimit 500

# The tool-threads parameter sets the actual amount of cpu's that is used
# for indexing.
tool-threads 1

#######################################################################
# Specific Backend Directives for bdb:
# Backend specific directives apply to this backend until another
# 'backend' directive occurs
backend		bdb

#######################################################################
# Specific Backend Directives for 'other':
# Backend specific directives apply to this backend until another
# 'backend' directive occurs
#backend		<other>

#######################################################################
# Specific Directives for database #1, of type bdb:
# Database specific directives apply to this database until another
# 'database' directive occurs
database        bdb

# The base of your directory in database #1
suffix          "dc=baselgovernance,dc=org"

# rootdn directive for specifying a superuser on the database. This is needed
# for syncrepl.
# rootdn          "cn=admin,dc=baselgovernance,dc=org"

# Where the database files are physically stored for database #1
directory       "/var/lib/ldap"

# The dbconfig settings are used to generate a DB_CONFIG file the first
# time slapd starts.  They do NOT override an existing DB_CONFIG
# file.  You should therefore change these settings in DB_CONFIG directly
# or remove DB_CONFIG and restart slapd for changes to take effect.

# For the Debian package we use 2MB as default but be sure to update this
# value if you have plenty of RAM
dbconfig set_cachesize 0 2097152 0

# Sven Hartge reported that he had to set this value incredibly high
# to get slapd running at all. See http://bugs.debian.org/303057 for more
# information.

# Number of objects that can be locked at the same time.
dbconfig set_lk_max_objects 1500
# Number of locks (both requested and granted)
dbconfig set_lk_max_locks 1500
# Number of lockers
dbconfig set_lk_max_lockers 1500

# Indexing options for database #1
index           objectClass eq

# Save the time that the entry gets modified, for database #1
lastmod         on

# Checkpoint the BerkeleyDB database periodically in case of system
# failure and to speed slapd shutdown.
checkpoint      512 30

# Where to store the replica logs for database #1
# replogfile	/var/lib/ldap/replog

# The userPassword by default can be changed
# by the entry owning it if they are authenticated.
# Others should not be able to see it, except the
# admin entry below
# These access lines apply to database #1 only
access to attrs=userPassword,sambaNTPassword,sambaLMPassword
        by dn="cn=admin,dc=baselgovernance,dc=org" write
        by anonymous auth
        by self write
        by * none

# Ensure read access to the base for things like
# supportedSASLMechanisms.  Without this you may
# have problems with SASL not knowing what
# mechanisms are available and the like.
# Note that this is covered by the 'access to *'
# ACL below too but if you change that as people
# are wont to do you'll still need this if you
# want SASL (and possible other things) to work 
# happily.
access to dn.base="" by * read

# The admin dn has full write access, everyone else
# can read everything.
access to *
        by dn="cn=admin,dc=baselgovernance,dc=org" write
        by * read

# For Netscape Roaming support, each user gets a roaming
# profile for which they have write access to
#access to dn=".*,ou=Roaming,o=morsnet"
#        by dn="cn=admin,dc=baselgovernance,dc=org" write
#        by dnattr=owner write

#######################################################################
# Specific Directives for database #2, of type 'other' (can be bdb too):
# Database specific directives apply to this database until another
# 'database' directive occurs
#database        <other>

# The base of your directory for database #2
#suffix		"dc=debian,dc=org"
# Indices to maintain
## required by OpenLDAP
#index objectclass             eq

index cn                      pres,sub,eq
index sn                      pres,sub,eq
## required to support pdb_getsampwnam
index uid                     pres,sub,eq
## required to support pdb_getsambapwrid()
index displayName             pres,sub,eq

## uncomment these if you are storing posixAccount and
## posixGroup entries in the directory as well
##index uidNumber               eq
##index gidNumber               eq
##index memberUid               eq

index   sambaSID              pres,sub,eq
index   sambaPrimaryGroupSID  eq
index   sambaDomainName       eq
index   default               sub
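
As a cross-check between the "not indexed" messages in the slapd log and the index lines in this configuration, here is a minimal Python sketch that compares the two. The function names and regular expressions are my own assumptions, based only on the log format and the slapd.conf syntax shown in this mail. It does nothing more than compare text, so it says nothing about the state of the index files on disk.

```python
import re

# Log side: "<= bdb_equality_candidates: (gidNumber) not indexed"
NOT_INDEXED = re.compile(r"bdb_equality_candidates: \((\w+)\) not indexed")
# Config side: "index <attr>[,<attr>...] <index types>"
INDEX_LINE = re.compile(r"^index\s+(\S+)\s+\S+")

def reported_not_indexed(log_lines):
    """Attributes (lowercased) that slapd logged as not indexed."""
    attrs = set()
    for line in log_lines:
        m = NOT_INDEXED.search(line)
        if m:
            attrs.add(m.group(1).lower())
    return attrs

def configured_indexes(conf_lines):
    """Attributes (lowercased) named in active 'index' directives.

    Commented-out lines start with '#' and therefore never match.
    The special keyword 'default' is skipped, since it is not an attribute.
    """
    attrs = set()
    for line in conf_lines:
        m = INDEX_LINE.match(line.strip())
        if m:
            for name in m.group(1).split(","):
                if name.lower() != "default":
                    attrs.add(name.lower())
    return attrs

if __name__ == "__main__":
    log = [
        "slapd[3433]: <= bdb_equality_candidates: (gidNumber) not indexed",
        "slapd[3433]: <= bdb_equality_candidates: (uid) not indexed",
        "slapd[3433]: <= bdb_equality_candidates: (sambaSID) not indexed",
    ]
    conf = [
        "index           objectClass eq",
        "index uid                     pres,sub,eq",
        "##index gidNumber               eq",
        "index   sambaSID              pres,sub,eq",
        "index   default               sub",
    ]
    reported = reported_not_indexed(log)
    configured = configured_indexes(conf)
    print("logged as not indexed, no index line:", sorted(reported - configured))
    print("logged as not indexed, index line present:", sorted(reported & configured))
```

With the excerpts from this mail, gidNumber falls into the first group (its index line is commented out), while uid and sambaSID fall into the second.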