
(ITS#5354) slapd repeatedly hangs and stops responding



Full_Name: Oren Laadan
Version: 2.4.7
OS: debian/sid
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (128.59.18.70)


Description:
The slapd server hangs frequently and completely stops responding to any
request. This makes the entire machine (as well as all clients) unresponsive.
Under certain configurations it can happen as often as every few minutes. As a
temporary measure, we have a script that restarts the server every 10 minutes,
just in case.
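A restart script like the one mentioned above could restart slapd only when it actually stops answering, instead of on a fixed schedule. The following minimal Python sketch of such a liveness check is hypothetical and is not the script in use here. The name `slapd_responds` is made up. The sketch assumes a plain ldap:// listener on port 389; the configuration below points at ldaps:// URIs, which would also require wrapping the socket with Python's `ssl` module. It sends a hand-encoded anonymous LDAPv3 bind and treats the absence of a reply within the timeout as a hang.

```python
import socket

# Anonymous LDAPv3 simple bind, messageID 1, hand-encoded in BER:
# SEQUENCE { INTEGER 1, [APPLICATION 0] { INTEGER 3, OCTET STRING "", [0] "" } }
ANON_BIND = bytes.fromhex("300c020101600702010304008000")


def slapd_responds(host: str, port: int = 389, timeout: float = 5.0) -> bool:
    """Return True if the server replies to an anonymous bind within `timeout` seconds."""
    try:
        # The timeout applies to the connect and to every later send/recv.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(ANON_BIND)
            reply = sock.recv(64)
    except OSError:  # refused, reset, or timed out (socket.timeout is an OSError)
        return False
    # Every LDAPMessage starts with a SEQUENCE tag (0x30); an empty reply
    # means the server closed the connection without answering.
    return len(reply) > 0 and reply[0] == 0x30
```

For example, a cron job could call `slapd_responds("localhost")` and restart slapd only when it returns False. The probe sends a real request because a bare TCP connect is not a reliable check. The kernel can complete the handshake from the listen backlog even while slapd itself is stuck.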

Operating System: Debian Sid
   Linux ____ 2.6.22-3-amd64 #1 SMP Mon Nov 12 10:28:43 UTC 2007 x86_64 GNU/Linux

LDAP version:
Our configuration uses both the BDB and META back-ends. As it turns out, the
standard Debian package of OpenLDAP fails to run with the META back-end
configured (see also the complaints here:
http://www.openldap.org/lists/openldap-bugs/200705/msg00011.html
and here: http://arkiv.netbsd.se/?ml=OpenLDAP-bugs&a=2007-02&t=3076794).

So I downloaded 2.4.7 and compiled it with the following options:
./configure \
        --enable-dynamic \
        --prefix=/opt/ldap-2.4.7 \
        --sysconfdir=/etc \
        --localstatedir=/var/lib \
        --disable-ipv6 \
        --enable-spasswd \
        --enable-bdb --enable-hdb --enable-ldap --enable-meta --enable-relay \
        --enable-overlays \
        --with-threads \
        --with-tls

Setup:
Our server is used by clients in the abc.main.example.com domain, who are also
part of the main.example.com domain. The latter already has an LDAP server,
ldap.main.example.com. Our server builds on that server and extends it with
some additional records (e.g. groups that are private to abc.main.example.com).
To implement this, I first built a BDB backend that holds the local data that
extends the larger (parent) database, serving the domain abc.main.example.com.
Then I set up a META backend that forwards queries to both the parent server and
the local server. Queries sent to the local server also use "suffixmassage" to
convert from the original domain (main.example.com) to the local domain
(abc.main.example.com).
Note that the clients are configured for the regular, original domain, that is,
main.example.com. The new domain is therefore only used internally by the
LDAP server.
See the configuration file for details.
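The effect of the "suffixmassage" line can be illustrated with a simplified Python sketch. This is not how slapd-meta implements it, because slapd parses and normalizes DNs according to their matching rules. The helper `massage_suffix` and the example entry `cn=staff,ou=groups` are hypothetical. The code also assumes DNs are already normalized (lowercase, no spaces after commas).

```python
VIRTUAL = "dc=main,dc=example,dc=com"       # suffix the clients use
REAL = "dc=abc,dc=main,dc=example,dc=com"   # suffix held by the local BDB database


def massage_suffix(dn: str, old: str, new: str) -> str:
    """Replace the trailing `old` suffix of `dn` with `new` (naive, hypothetical helper)."""
    if dn == old:
        return new
    # Require a comma boundary so "dc=xmain,..." does not match "dc=main,...".
    if dn.endswith("," + old):
        # Keep the leading RDNs and swap only the suffix.
        return dn[: -len(old)] + new
    raise ValueError(f"{dn!r} is not under {old!r}")


# Request sent to the local target: client DN -> local DN.
print(massage_suffix("cn=staff,ou=groups," + VIRTUAL, VIRTUAL, REAL))
# -> cn=staff,ou=groups,dc=abc,dc=main,dc=example,dc=com
# Entry returned by the local target: local DN -> client DN.
print(massage_suffix("cn=staff,ou=groups," + REAL, REAL, VIRTUAL))
# -> cn=staff,ou=groups,dc=main,dc=example,dc=com
```

Requests to the local server are rewritten in the first direction. Entries it returns are rewritten in the reverse direction, so clients only ever see main.example.com DNs.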

Problem:
Originally the config file didn't have the two lines setting "threads" and
"idletimeout". To reproduce the problem back then, we would run a script on one
of the clients that opens 300 outgoing ssh connections to various destinations,
and the LDAP server would freeze.
While looking for a solution, I found posts (via Google) suggesting tuning
these two parameters, so I added them. And, voila, now the hang not only
happens under those circumstances but also occurs randomly and frequently,
roughly every 15 minutes, sometimes more often, sometimes less.
It's quite reproducible, and I'm willing to work with anyone to try to solve
this. Right now our setup is hardly usable :(
Logs and other material are available on request.
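The ssh-based reproduction above loads slapd only indirectly. A more direct way to put a comparable burst of concurrent requests on the server is the hypothetical Python sketch below, which is not the script used here. The names `anon_bind` and `burst` are made up. The sketch assumes a plain ldap:// listener on port 389, and each request is a hand-encoded anonymous LDAPv3 bind. `burst` returns how many of the `n` concurrent binds got no reply within the timeout.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

# Hand-encoded anonymous LDAPv3 simple bind, messageID 1 (BER).
ANON_BIND = bytes.fromhex("300c020101600702010304008000")


def anon_bind(host: str, port: int, timeout: float) -> bool:
    """Send one anonymous bind; True if any reply arrives before `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(ANON_BIND)
            return len(sock.recv(64)) > 0
    except OSError:  # connection refused, reset, or timed out
        return False


def burst(host: str, port: int = 389, n: int = 300, timeout: float = 5.0) -> int:
    """Run `n` binds concurrently, one thread each, and return how many went unanswered."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        answered = list(pool.map(lambda _: anon_bind(host, port, timeout), range(n)))
    return answered.count(False)
```

For example, you could run `burst("ldap-host", n=300)` (with `ldap-host` standing in for the real server name) with and without the `threads`/`idletimeout` lines. Comparing the results would show whether either setting changes how many requests go unanswered.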


Config file:
----------------------------------------------------------------------------
include         /etc/ldap/schema/core.schema
include         /etc/ldap/schema/cosine.schema
include         /etc/ldap/schema/nis.schema
include         /etc/ldap/schema/inetorgperson.schema
include         /etc/ldap/schema/autofs.schema
include         /etc/ldap/schema/dnszone.schema
include         /etc/ldap/schema/dhcp.schema
# Where the pid file is put. The init.d script
# will not stop the server if you change this.
pidfile         /var/run/slapd/slapd.pid

# List of arguments that were passed to the server
argsfile        /var/run/slapd/slapd.args

# Read slapd.conf(5) for possible values
#loglevel        256
loglevel        64

# TRY TO SOLVE ISSUES ?
threads         32
idletimeout     30

# Where the dynamically loaded modules are stored
# [orenl] we use statically built backends/overlays
modulepath      /opt/ldap/lib
moduleload      back_ldap
moduleload      back_meta
moduleload      back_bdb

# The maximum number of entries that is returned for a search operation
sizelimit 10000

# The tool-threads parameter sets the actual number of CPUs that are used
# for indexing.
tool-threads 1


#######################################################################
# Specific Backend Directives for bdb:
backend         bdb

TLSCipherSuite HIGH:MEDIUM:LOW
TLSCACertificateFile    /etc/ldap/ssl/SOME_IP.crt
TLSCertificateFile      /etc/ldap/ssl/SOME_IP.crt
TLSCertificateKeyFile   /etc/ldap/ssl/SOME_IP.pem
TLSVerifyClient         try


#######################################################################
# Specific Backend Directives for meta:
backend         meta

TLSCipherSuite HIGH:MEDIUM:LOW
TLSCACertificateFile    /etc/ldap/ssl/SOME_IP.crt
TLSCACertificateFile    /etc/ssl/certs/OTHER_IP.crt

TLSCertificateFile      /etc/ldap/ssl/SOME_IP.crt
TLSCertificateKeyFile   /etc/ldap/ssl/SOME_IP.pem
TLSVerifyClient         allow

#######################################################################
# Specific Directives for database #1, of type bdb:
database        bdb                                            
                                                               
# The base of your directory in database #1
suffix          "dc=abc,dc=main,dc=example,dc=com"
readonly        on

# rootdn directive for specifying a superuser on the database. This is needed
# for syncrepl.
#rootdn "cn=admin,dc=main,dc=example,dc=com"
#rootpw "change"

# Where the database files are physically stored for database #1
directory       "/var/lib/ldap"

# For the Debian package we use 2MB as the default, but be sure to update this
# value if you have plenty of RAM.
dbconfig set_cachesize 0 2097152 0

# Sven Hartge reported that he had to set this value incredibly high
# to get slapd running at all. See http://bugs.debian.org/303057
# for more information.

# Number of objects that can be locked at the same time.
dbconfig set_lk_max_objects 1500
# Number of locks (both requested and granted)
dbconfig set_lk_max_locks 1500
# Number of lockers
dbconfig set_lk_max_lockers 1500

# Indexing options for database #1
index      objectClass eq

# Save the time that the entry gets modified, for database #1
lastmod         on

# Everyone can read everything.
#access to * by * read


#######################################################################
# Specific Directives for database #2, of type 'meta':
database        meta

lastmod         off
#rebind-as-user yes

# The base of your directory in database #2
suffix          "dc=main,dc=example,dc=com"

uri             "ldaps://OTHER_IP/dc=main,dc=example,dc=com"
uri             "ldaps:///dc=main,dc=example,dc=com"
suffixmassage   "dc=main,dc=example,dc=com" "dc=abc,dc=main,dc=example,dc=com"
----------------------------------------------------------------------------