
(ITS#7378) Slapd hangs on bdb write lock



Full_Name: Nikolai Schupbach
Version: 2.4.31
OS: FreeBSD
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (202.78.158.60)


We are experiencing frequent hangs in slapd. Once it hangs, we can still
connect, but all searches hang indefinitely until we kill -9 the slapd process
and restart it. The directory is used for mail routing, and we have been
migrating to it from an existing directory server over the last 3 weeks. We
have noticed that the busier the directory becomes, the more often it hangs
(now once every 2 days).

We have one master and 10 syncrepl read-only replicas. The master is used
mainly for writes and has not hung yet, but most of the replicas have hung at
least once. The replicas receive anywhere between 50 and 300 searches/sec,
while the master gets only 1/sec. There are 45k entries in the directory.

We are running:

FreeBSD 8.3/9.0 x64
OpenLDAP 2.4.31
Berkeley DB 4.6.21

The old directory we are migrating from has the same load and is also running
OpenLDAP, but has been rock solid for 5 years. It is running Berkeley DB 4.3.29
and OpenLDAP 2.3.27.

We have managed to collect db_stat lock information, which shows the same
issue each time: a write lock on dn2id.bdb.

Locks grouped by object:
Locker   Mode      Count Status  ----------------- Object ---------------
8000a85e READ          1 HELD    0xb26c8 len:   9 data: 60xa800000000000000

      8a READ          1 HELD    id2entry.bdb              handle        0

      8c READ          1 HELD    dn2id.bdb                 handle        0

      96 READ          1 HELD    objectClass.bdb           handle        0

      93 READ          1 HELD    entryCSN.bdb              handle        0

      90 READ          1 HELD    entryUUID.bdb             handle        0

8000a85f WRITE         4 HELD    dn2id.bdb                 page        219

80000782 READ          1 HELD    dn2id.bdb                 page        768
80000a45 READ          1 HELD    dn2id.bdb                 page        768
80000b9e READ          1 HELD    dn2id.bdb                 page        768
800006a0 READ          1 HELD    dn2id.bdb                 page        768
80000771 READ          1 HELD    dn2id.bdb                 page        768
80000534 READ          1 HELD    dn2id.bdb                 page        768
80000a44 READ          1 HELD    dn2id.bdb                 page        768
80000641 READ          1 HELD    dn2id.bdb                 page        768
80001049 READ          1 HELD    dn2id.bdb                 page        768
8000104a READ          1 HELD    dn2id.bdb                 page        768
80001048 READ          1 HELD    dn2id.bdb                 page        768
80000783 READ          1 HELD    dn2id.bdb                 page        768
80000535 READ          1 HELD    dn2id.bdb                 page        768
8000066e READ          1 HELD    dn2id.bdb                 page        768
80000697 READ          1 HELD    dn2id.bdb                 page        768
8000a85f READ          1 HELD    dn2id.bdb                 page        768

8000a85e READ          1 HELD    0xb19a8 len:   9 data: 40xa800000000000000

8000a85f READ          1 HELD    dn2id.bdb                 page        933
8000a85f WRITE         2 HELD    dn2id.bdb                 page        933

80001047 WRITE         1 HELD    dn2id.bdb                 page        559
80000782 READ          1 WAIT    dn2id.bdb                 page        559
80000a45 READ          1 WAIT    dn2id.bdb                 page        559
80000b9e READ          1 WAIT    dn2id.bdb                 page        559
800006a0 READ          1 WAIT    dn2id.bdb                 page        559
80000771 READ          1 WAIT    dn2id.bdb                 page        559
80000534 READ          1 WAIT    dn2id.bdb                 page        559
80000a44 READ          1 WAIT    dn2id.bdb                 page        559
80000641 READ          1 WAIT    dn2id.bdb                 page        559
80001049 READ          1 WAIT    dn2id.bdb                 page        559
8000104a READ          1 WAIT    dn2id.bdb                 page        559
80001048 READ          1 WAIT    dn2id.bdb                 page        559
80000783 READ          1 WAIT    dn2id.bdb                 page        559
80000535 READ          1 WAIT    dn2id.bdb                 page        559
8000066e READ          1 WAIT    dn2id.bdb                 page        559
80000697 READ          1 WAIT    dn2id.bdb                 page        559
8000a85f READ          1 WAIT    dn2id.bdb                 page        559

8000a85f READ          2 HELD    dn2id.bdb                 page       1362
8000a85f WRITE         2 HELD    dn2id.bdb                 page       1362

8000a85f READ          2 HELD    dn2id.bdb                 page       1353
8000a85f WRITE         2 HELD    dn2id.bdb                 page       1353

      b6 READ          1 HELD    uid.bdb                   handle        0

      a5 READ          1 HELD    mail.bdb                  handle        0

      af READ          1 HELD    mailLocalAddress.bdb      handle        0

      9b READ          1 HELD    miLoginid.bdb             handle        0

      aa READ          1 HELD    mailHost.bdb              handle        0

      bb READ          1 HELD    miDomainName.bdb          handle        0

      c0 READ          1 HELD    mpMailHost.bdb            handle        0

      a0 READ          1 HELD    mpMailUserType.bdb        handle        0
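One way to read a dump like this is as a wait-for graph: a locker that is WAITing on a page points at every other locker HOLDing a conflicting lock on that page (READ/READ is compatible; anything involving WRITE conflicts). The Python sketch below is a hypothetical helper, not part of db_stat or Berkeley DB. It only understands the READ and WRITE page-lock lines in the format shown above (handle locks, hashed objects and other lock modes are ignored), and SAMPLE is a trimmed copy of those lines. On this excerpt, every reader queued on page 559, including 8000a85f (which itself holds WRITE locks on pages 219, 933, 1353 and 1362), is waiting behind 80001047's WRITE lock, and 80001047 is not shown waiting on any page, so no lock cycle is visible from these lines alone.

```python
import re
from collections import defaultdict

# Trimmed copy of page-lock lines from the "Locks grouped by object" dump above.
SAMPLE = """\
8000a85f WRITE         4 HELD    dn2id.bdb                 page        219
80000782 READ          1 HELD    dn2id.bdb                 page        768
8000a85f READ          1 HELD    dn2id.bdb                 page        768
80001047 WRITE         1 HELD    dn2id.bdb                 page        559
80000782 READ          1 WAIT    dn2id.bdb                 page        559
8000a85f READ          1 WAIT    dn2id.bdb                 page        559
"""

# Hypothetical pattern for page locks: locker, mode, count, status, file, page.
PAGE_LOCK = re.compile(
    r"^\s*([0-9a-f]+)\s+(READ|WRITE)\s+(\d+)\s+(HELD|WAIT)\s+(\S+)\s+page\s+(\d+)\s*$"
)


def parse_page_locks(text):
    """Group page locks by (file, page) -> list of (locker, mode, status)."""
    locks = defaultdict(list)
    for line in text.splitlines():
        m = PAGE_LOCK.match(line)
        if m:
            locker, mode, _count, status, dbfile, page = m.groups()
            locks[(dbfile, int(page))].append((locker, mode, status))
    return locks


def wait_for_edges(locks):
    """Edge (waiter, holder) when the waiter's mode conflicts with a held lock."""
    edges = set()
    for entries in locks.values():
        for waiter, wmode, wstatus in entries:
            if wstatus != "WAIT":
                continue
            for holder, hmode, hstatus in entries:
                conflicts = "WRITE" in (wmode, hmode)  # READ/READ is compatible
                if hstatus == "HELD" and holder != waiter and conflicts:
                    edges.add((waiter, holder))
    return edges


def find_cycle(edges):
    """Return one cycle of lockers as a list, or None (depth-first search)."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE (0)

    def dfs(node, path):
        color[node] = GREY
        path.append(node)
        for nxt in sorted(graph[node]):
            if color[nxt] == GREY:  # back edge: nxt is on the current path
                return path[path.index(nxt):]
            if color[nxt] == WHITE:
                found = dfs(nxt, path)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in sorted(list(graph)):
        if color[node] == WHITE:
            cycle = dfs(node, [])
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    edges = wait_for_edges(parse_page_locks(SAMPLE))
    for waiter, holder in sorted(edges):
        print(f"{waiter} waits for {holder}")
    print("cycle:", find_cycle(edges))
```

Feeding the full db_stat output (rather than SAMPLE) into the same functions would show whether 80001047 is itself queued on a lock elsewhere in the table.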


We have also collected backtraces for all the threads, which I have uploaded
to:

ftp://ftp.openldap.org/incoming/nikolai-gdb-120902.txt

The full db_stat output is located at:

ftp://ftp.openldap.org/incoming/nikolai-dbstat-120902.txt

Our DB_CONFIG:

# One 512MB cache
set_cachesize 0 536870912 1

# Transaction Log settings
set_lg_regionmax 1048576
set_lg_max 10485760
set_lg_bsize 2097152
set_flags DB_LOG_AUTOREMOVE

# Increase lock maximums
set_lk_max_locks 2000
set_lk_max_lockers 2000
set_lk_max_objects 2000
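The DB_CONFIG values above are raw byte counts. As a quick check against the comments, the Python sketch below (a hypothetical helper, assuming the Berkeley DB argument order `set_cachesize gbytes bytes ncache`) converts them to MiB: 536870912 bytes is exactly 512 MiB, and the log region, log file and log buffer sizes come out at 1, 10 and 2 MiB respectively.

```python
# Sanity-check the byte counts in the DB_CONFIG above.
# Assumption: set_cachesize takes "gbytes bytes ncache", as in DB_ENV->set_cachesize.

DB_CONFIG = """\
# One 512MB cache
set_cachesize 0 536870912 1

# Transaction Log settings
set_lg_regionmax 1048576
set_lg_max 10485760
set_lg_bsize 2097152
set_flags DB_LOG_AUTOREMOVE

# Increase lock maximums
set_lk_max_locks 2000
set_lk_max_lockers 2000
set_lk_max_objects 2000
"""

MIB = 2**20


def parse_db_config(text):
    """Map directive name -> argument list; a later duplicate overwrites an earlier one."""
    directives = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        name, *args = line.split()
        directives[name] = args
    return directives


def cache_bytes(args):
    """Total cache size: gbytes * 2^30 + bytes (ncache is the number of regions)."""
    gbytes, nbytes, _ncache = (int(a) for a in args)
    return gbytes * 2**30 + nbytes


if __name__ == "__main__":
    cfg = parse_db_config(DB_CONFIG)
    print("cache:", cache_bytes(cfg["set_cachesize"]) // MIB, "MiB")
    for key in ("set_lg_regionmax", "set_lg_max", "set_lg_bsize"):
        print(f"{key}: {int(cfg[key][0]) / MIB:g} MiB")
```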

Our slapd.conf on our replicas:

# Load the following schema files
include                /usr/local/etc/openldap/schema/core.schema
include                /usr/local/etc/openldap/schema/cosine.schema
include                /usr/local/etc/openldap/schema/nis.schema
include                /usr/local/etc/openldap/schema/inetorgperson.schema
include                /usr/local/etc/openldap/schema/misc.schema
include                /usr/local/etc/openldap/schema/mirapoint.schema
include                /usr/local/etc/openldap/schema/smp.schema

# Runtime settings for slapd
pidfile                /var/run/openldap/slapd.pid
argsfile               /var/run/openldap/slapd.args
loglevel               none

# TLS security options for slapd.
TLSCipherSuite         HIGH
TLSCACertificateFile   /usr/local/etc/openldap/tls/ca-cert.pem
TLSCertificateFile     /usr/local/etc/openldap/tls/server-cert.pem
TLSCertificateKeyFile  /usr/local/etc/openldap/tls/server-key.pem

# This option configures one or more hashes to be used in generation 
# of user passwords stored in the userPassword attribute during 
# processing of LDAP Password Modify Extended Operations (RFC 3062).
password-hash         {SSHA}

# Load dynamic backend modules:
modulepath            /usr/local/libexec/openldap
moduleload            back_bdb
moduleload            back_monitor

# Do not limit size or time of requests.
sizelimit             unlimited
timelimit             unlimited

# Require authentication prior to directory operations
require               authc

###############################################################################
# BDB Database Definitions
#
# The following configuration directives relate to bdb database definitions
###############################################################################

# The remaining configuration directives relate to bdb database definitions
database              bdb
suffix               "o=top"
rootdn               "cn=root,o=top"

# Cleartext passwords, especially for the rootdn, should
# be avoided.  See slappasswd(8) and slapd.conf(5) for details.
rootpw	             {SSHA}**********

# The database directory must exist prior to running slapd and 
# should only be accessible by the slapd and slap tools.
directory             /var/db/openldap-data

# Indices to maintain
index                 cn                  eq,sub,pres
index                 entryUUID           eq
index                 entryCSN            eq
index                 mail                eq,sub,pres
index                 mailHost            eq
index                 mailLocalAddress    eq,sub,pres
index                 miDomainName        eq,sub
index                 miLoginId           eq,pres
index                 mpMailHost          eq
index                 mpMailUserType      eq
index                 mpSystemRole        eq
index                 objectClass         eq,pres
index                 uid                 eq,pres

# Specify the number of entries which should be held in memory
cachesize             200000

# Set transactional checkpoint
checkpoint            512     60

###############################################################################
# LDAP Sync Replication
#
# A unique replica id number is required for each replication client
###############################################################################

# LDAP sync replication settings
syncrepl rid=36
   provider=ldaps://ldapmaster/
   type=refreshAndPersist
   retry=30,+
   searchbase="o=top"
   filter="(objectClass=*)"
   scope=sub
   attrs="*"
   sizelimit=unlimited
   timelimit=unlimited
   schemachecking=off
   bindmethod=simple
   binddn="cn=replica,ou=users,ou=directory,o=top"
   credentials=**********

# Where to refer ldap updates to
updateref ldaps://ldapmaster/
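In slapd.conf a line that begins with whitespace continues the previous line, so the indented lines under `syncrepl rid=36` form a single directive made of `key=value` arguments. The Python sketch below is a hypothetical helper, not part of OpenLDAP: it joins continuation lines, ignores blank lines and lines starting with `#`, and splits arguments with `shlex` so quoted values such as `searchbase="o=top"` lose their quotes. It does not implement every slapd.conf quoting rule, and the sample omits the masked `credentials` and the limit settings.

```python
import shlex

# Trimmed copy of the replica's syncrepl/updateref section above.
SLAPD_CONF = """\
# LDAP sync replication settings
syncrepl rid=36
   provider=ldaps://ldapmaster/
   type=refreshAndPersist
   retry=30,+
   searchbase="o=top"
   filter="(objectClass=*)"
   scope=sub
   bindmethod=simple
   binddn="cn=replica,ou=users,ou=directory,o=top"

# Where to refer ldap updates to
updateref ldaps://ldapmaster/
"""


def logical_lines(text):
    """Yield one string per directive, joining lines that start with whitespace."""
    current = None
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue  # blank lines and comment lines are ignored
        if raw[0].isspace() and current is not None:
            current += " " + raw.strip()  # continuation of the previous directive
        else:
            if current is not None:
                yield current
            current = raw.strip()
    if current is not None:
        yield current


def parse_directives(text):
    """Return (directive, [args]) pairs; shlex removes the double quotes."""
    return [(words[0], words[1:]) for words in map(shlex.split, logical_lines(text))]


def keyword_args(args):
    """Turn ["rid=36", "scope=sub"] into {"rid": "36", "scope": "sub"}."""
    return dict(arg.split("=", 1) for arg in args)


if __name__ == "__main__":
    for name, args in parse_directives(SLAPD_CONF):
        print(name, keyword_args(args) if name == "syncrepl" else args)
```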

###############################################################################
# LDAP Statistics
#
# The OpenLDAP server can be configured to provide real time performance 
# statistics through the monitor branch. 
###############################################################################

# Enable the statistics monitoring database
database             monitor

# Allow access to monitoring user only
access to dn.subtree="cn=monitor"
        by dn.exact="cn=monitor,ou=users,ou=directory,o=top" read
        by * none


Sincerely,
Nikolai Schupbach