
[BUG] OpenLDAP or BDB problem?



hi list,

for several reasons we use BDB as the backend for ldap.
the following behaviour is reproducible with the software listed below
(tested on 3 different hardware platforms, all x86, Opterons with 2-4GB
RAM):

DEBIAN Sarge
- slapd 2.2.23-8
- libdb4.2 4.2.52-18
- db4.2-util 4.2.52-18

SLES9
- db-utils-4.2.52-86.3
- db-4.2.52-86.3
- openldap2-2.2.24-4.5
- openldap2-client-2.2.24-4.8

we have a little "torture" script to simulate traffic for the upcoming
production environment with samba/openldap. this torture script is written
in perl and simply writes random values to "userPassword" of each entry
(~1100). we started 10-50 simultaneous instances of this script. sometimes
it completes successfully, but very often it crashes openldap and our BDB
database with the following error:

Oct 21 10:39:06 ldapmaster2 slapd[17172]: bdb_modify: retrying...
Oct 21 10:39:06 ldapmaster2 slapd[17172]: bdb(dc=eva,dc=mpg,dc=de): DB_TXN->abort: Log undo failed for LSN: 3 2173192: DB_NOTFOUND: No matching key/data pair found
Oct 21 10:39:06 ldapmaster2 slapd[17172]: bdb(dc=eva,dc=mpg,dc=de): PANIC: DB_NOTFOUND: No matching key/data pair found
Oct 21 10:39:06 ldapmaster2 slapd[17172]: send_ldap_result: conn=16 op=10 p=3
Oct 21 10:39:06 ldapmaster2 slapd[17172]: send_ldap_response: msgid=13 tag=103 err=80
Oct 21 10:39:06 ldapmaster2 slapd[17172]: bdb(dc=eva,dc=mpg,dc=de): PANIC: fatal region error detected; run recovery
Oct 21 10:39:06 ldapmaster2 slapd[17172]: bdb_cache_entry_db_relock: entry 552, rw 1, rc -30978
Oct 21 10:39:06 ldapmaster2 slapd[17172]: bdb(dc=eva,dc=mpg,dc=de): PANIC: fatal region error detected; run recovery
Oct 21 10:39:06 ldapmaster2 slapd[17172]: bdb_modify: txn_commit failed: DB_RUNRECOVERY: Fatal error, run database recovery (-30978)
Oct 21 10:39:06 ldapmaster2 slapd[17172]: send_ldap_result: conn=17 op=11 p=3
Oct 21 10:39:06 ldapmaster2 slapd[17172]: send_ldap_response: msgid=14 tag=103 err=80
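
for anyone who wants to reproduce this without our perl script: the load it generates can be approximated with a small LDIF generator. this is only a sketch in python, not the original script. the uid naming scheme (`uid=userN`) and the helper names are made up for illustration; the base DN is the one from the slapd.conf below.

```python
import secrets
import string

# Assumption: entries are named uid=user1 ... uid=user1100 under this base DN.
# The real entry names are not part of this report.
BASE_DN = "dc=my,dc=company"


def random_password(length=12):
    """Return a random alphanumeric string to store in userPassword."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def modify_record(dn, password):
    """Return one LDIF change record that replaces userPassword on dn."""
    return (
        f"dn: {dn}\n"
        "changetype: modify\n"
        "replace: userPassword\n"
        f"userPassword: {password}\n"
    )


def torture_ldif(dns):
    """Return LDIF with one modify record per DN, separated by blank lines."""
    return "\n".join(modify_record(dn, random_password()) for dn in dns)


if __name__ == "__main__":
    dns = [f"uid=user{i},{BASE_DN}" for i in range(1, 1101)]
    print(torture_ldif(dns))
```

the output can be piped into `ldapmodify -x -D <binddn> -W`, and starting several of these pipelines at the same time gives the concurrent writes to the same entries that we see before the crash.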

after such a crash we have to run db_recover to get the system running
again. this is very annoying because the system is not reliable at all!
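
the recovery step looks roughly like the sketch below. it assumes slapd has already been stopped and that the database lives in /var/lib/ldap as in our slapd.conf. the binary name is an assumption: the helper defaults to `db_recover`, but some packages install it under a versioned name (on Debian we would expect `db4.2_recover`). the function names are made up for illustration.

```python
import subprocess


def recover_command(db_home="/var/lib/ldap", binary="db_recover", catastrophic=False):
    """Build the db_recover command line for the given database directory.

    -h selects the database home, -v prints progress,
    -c requests catastrophic recovery instead of normal recovery.
    """
    cmd = [binary]
    if catastrophic:
        cmd.append("-c")
    cmd += ["-v", "-h", db_home]
    return cmd


def run_recovery(**kwargs):
    """Run db_recover; raises CalledProcessError if it exits non-zero.

    Assumption: slapd is stopped before this is called.
    """
    return subprocess.run(recover_command(**kwargs), check=True)
```

calling `run_recovery()` as the user that owns /var/lib/ldap, then starting slapd again, is what brings the directory back for us after each PANIC.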

first we had /var/lib/ldap on the system partition (XFS); after that we
put it on a separate partition (EXT3), but the result stays the same.

we also tested 2 different BDB config files and played around with some
entries (DB_CONFIG, see bottom).
any help would be great!!!

regards,
micha

# cat /etc/openldap/slapd.conf
##################################################################################
include         /etc/openldap/schema/core.schema
include		....

allow           bind_v2
pidfile         /var/run/slapd/slapd.pid
argsfile        /var/run/slapd/slapd.args
loglevel        0
sizelimit       -1
concurrency     3
modulepath      /usr/lib/openldap/modules

access to attrs=userPassword,sambaNTPassword,sambaLMPassword,sambaPwdLastSet,sambaPwdMustChange
      by dn="uid=sambamanager,dc=my,dc=company" write
      by self write
      by anonymous auth
      by * none

access to dn.regex="cn=[^,]+,dc=my,dc=company"
     by dn="uid=sambamanager,dc=my,dc=company" none
     by * read
access to attrs=objectClass,entry,homeDirectory,uid,uidNumber,gidNumber,memberUid
      by dn="uid=sambamanager,dc=my,dc=company" write
      by * read
access to attrs=description,telephoneNumber,roomNumber,homePhone,loginShell,gecos,cn,sn,givenname
      by dn="uid=sambamanager,dc=my,dc=company" write
      by self write
      by * read

access to attrs=cn,sambaLMPassword,sambaNTPassword,sambaPwdLastSet,sambaLogonTime,sambaLogoffTime,sambaKickoffTime,sambaPwdCanChange,sambaPwdMustChange,sambaAcctFlags,displayName,sambaHomePath,sambaHomeDrive,sambaLogonScript,sambaProfilePath,description,sambaUserWorkstations,sambaPrimaryGroupSID,sambaDomainName,sambaMungedDial,sambaBadPasswordCount,sambaBadPasswordTime,sambaPasswordHistory,sambaLogonHours,sambaSID,sambaSIDList,sambaTrustFlags,sambaGroupType,sambaNextRid,sambaNextGroupRid,sambaNextUserRid,sambaAlgorithmicRidBase,sambaShareName,sambaOptionName,sambaBoolOption,sambaIntegerOption,sambaStringOption,sambaStringListoption
      by dn="uid=sambamanager,dc=my,dc=company" write
      by self read
      by * none

access to dn.base="dc=my,dc=company"
      by dn="uid=sambamanager,dc=my,dc=company" write
      by * none


access to dn="ou=Groups,dc=my,dc=company"
      by dn="uid=sambamanager,dc=my,dc=company" write
      by * none

access to dn="ou=Computers,dc=my,dc=company"
      by dn="uid=sambamanager,dc=my,dc=company" write
      by * none

access to *
      by self read
      by * none

backend         bdb
database        bdb
checkpoint      1024    5
cachesize       100000
suffix          "dc=my,dc=company"
rootdn          "uid=cyrus,dc=my,dc=company"
password-hash   {SSHA} {CRYPT} {MD5}
rootpw          {CRYPT}FFFFFFFFFFFFFFFF

directory       /var/lib/ldap
mode            660
index   objectClass                             eq
index   sambaSID                                eq,pres
index   uid                                     eq,pres,subany
index   sambaPrimaryGroupSID                    eq,pres
index   uidNumber                               eq,pres
index   gidNumber                               eq,pres
index   displayName                             eq,pres,subany
index   cn,sn,givenname,mail,memberuid          pres,subany,eq
##################################################################################

# cat /var/lib/ldap/DB_CONFIG
set_cachesize 0 15000000 1
set_lg_bsize 2097152

# cat /var/lib/ldap/DB_CONFIG_DEBIAN
set_cachesize   0       268435456        0
set_lg_regionmax        10048576
set_lg_max              10485760
set_lg_bsize            2097152
set_lg_dir /var/lib/ldap
set_lk_max_objects      1000000
set_lk_max_locks        1000000
set_lk_max_lockers      1000000
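
to compare the two configs: the arguments to set_cachesize are gbytes, bytes and ncache, so the total cache is gbytes * 1 GiB + bytes, and an ncache of 0 or 1 means one contiguous region. the small helper below (a sketch, the function name is made up) just does that arithmetic for the two lines above.

```python
def parse_set_cachesize(line):
    """Parse a DB_CONFIG 'set_cachesize <gbytes> <bytes> <ncache>' line.

    Returns (total_bytes, regions). An ncache of 0 is treated like 1,
    i.e. a single contiguous cache region.
    """
    keyword, gbytes, nbytes, ncache = line.split()
    if keyword != "set_cachesize":
        raise ValueError(f"not a set_cachesize line: {line!r}")
    total = int(gbytes) * 1024**3 + int(nbytes)
    return total, max(int(ncache), 1)


if __name__ == "__main__":
    for line in (
        "set_cachesize 0 15000000 1",                   # our DB_CONFIG
        "set_cachesize   0       268435456        0",   # DB_CONFIG_DEBIAN
    ):
        total, regions = parse_set_cachesize(line)
        print(f"{total / 1024**2:.1f} MiB in {regions} region(s)")
```

so the first config gives BDB about 14.3 MiB of cache and the Debian one 256 MiB. the crash happens with both.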


-- 
Michael Gasch
Max Planck Institute for Evolutionary Anthropology
Department of Human Evolution (IT)
Deutscher Platz 6
D-04103 Leipzig
Germany

Phone: 49 (0)341 - 3550 137