
SLAPD hangs on existing connections, but accepts new ones



At some point slapd stops processing new requests on existing connections. I am
using OpenLDAP 1.2.11 on Solaris 2.7.

The scenario was as follows: three clients search for entries and one client
modifies entries. All clients work on the same 900 objects in parallel. Each
client sends a new request as soon as it receives the answer to its previous
request. This is meant to be a simple stress test.
After about 40,000,000 requests (in total) I restarted all my clients. After
another 15,000,000 requests (in total) all my clients hung, waiting for slapd.

The monitoring facility outputs the following lines:
	CN=MONITOR
	version=slapd 1.2.11-Release (Thu Apr  5 09:14:27 MET DST 2001)
	threads=5
	connection=5 : 20010412075524Z : 101 : 101 : cn=Manager, dc=wapldap,
dc=detemobil, dc=de :
	connection=13 : 20010420072202Z : 3790771 : 3790771 : cn=Manager,
dc=wapldap, dc=detemobil, dc=de :
	connection=16 : 20010420072202Z : 3981891 : 3981891 : cn=Manager,
dc=wapldap, dc=detemobil, dc=de :
	connection=17 : 20010420072202Z : 3968004 : 3968004 : cn=Manager,
dc=wapldap, dc=detemobil, dc=de :
	connection=18 : 20010423074721Z : 1 : 0 : NULLDN :
	connection=19 : 20010420072202Z : 3971366 : 3971366 : cn=Manager,
dc=wapldap, dc=detemobil, dc=de :
	currentconnections=6
	totalconnections=39
	dtablesize=1024
	writewaiters=0
	readwaiters=0
	opsinitiated=54672536
	opscompleted=54672535
	entriessent=41531895
	bytessent=1823511052
	currenttime=20010423074721Z
	starttime=20010412075522Z
	nbackends=1
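
The per-connection counters are easier to compare with a small script. The
following is only a sketch: it assumes that the colon-separated fields of each
connection line are descriptor, open time, operations initiated, operations
completed and bind DN, and the name pending_ops is made up for illustration.

```python
import re

# Assumed field order (not confirmed by the slapd documentation here):
# connection=<fd> : <open time> : <ops initiated> : <ops completed> : <bind DN> :
CONNECTION_RE = re.compile(r"connection=(\d+) : (\d+Z) : (\d+) : (\d+) :")


def pending_ops(monitor_text):
    """Map each connection number to (ops initiated - ops completed)."""
    return {
        int(m.group(1)): int(m.group(3)) - int(m.group(4))
        for m in CONNECTION_RE.finditer(monitor_text)
    }


# Two connection lines taken from the monitor output, including the mail wrap.
sample = """\
connection=5 : 20010412075524Z : 101 : 101 : cn=Manager, dc=wapldap,
dc=detemobil, dc=de :
connection=18 : 20010423074721Z : 1 : 0 : NULLDN :
"""
print(pending_ops(sample))
```

Under that assumption, only connection 18 has an operation outstanding in the
output above. Its open time equals currenttime, so it is presumably the monitor
search itself, which would also explain the difference of one between
opsinitiated and opscompleted.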

My slapd process does not look very healthy: it seems to have lost the
connection to almost all of its shared libraries. Here is the output of "lsof":

./lsof_4.55 -p 17944
COMMAND   PID USER   FD   TYPE        DEVICE   SIZE/OFF     NODE NAME
slapd   17944 root  cwd   VDIR         155,0       1536        2 /
slapd   17944 root  txt   VREG         155,0      36316   155378
/usr/lib/libpthread.so.1
slapd   17944 root    0u  VCHR          13,2      0t260   788442
/devices/pseudo/mm@0:null
slapd   17944 root    1u  VCHR          13,2      0t260   788442
/devices/pseudo/mm@0:null
slapd   17944 root    2u  VCHR          13,2      0t260   788442
/devices/pseudo/mm@0:null
slapd   17944 root    3u  inet 0x30005b332e8        0t0      TCP *:ldap
(LISTEN)
slapd   17944 root    4w  VCHR          21,0        0t0   788438
/devices/pseudo/log@0:conslog->LOG
slapd   17944 root    5u  inet 0x30005b32488    0t11426      TCP
dxcsx6:ldap->dxcsx6:44948 (ESTABLISHED)
slapd   17944 root    6uW VREG           0,1    1023992 11512841 /tmp (swap)
slapd   17944 root    7uW VREG           0,1  131449038 12598323 /tmp (swap)
slapd   17944 root    8r  DOOR         196,0        0t0     7884
/etc/.name_service_door (door to nscd[370])
slapd   17944 root    9uW VREG           0,1    3043324 10922789 /tmp (swap)
slapd   17944 root   10uW VREG           0,1    1585165 12598843 /tmp (swap)
slapd   17944 root   11uW VREG           0,1    1593357 12598563 /tmp (swap)
slapd   17944 root   12uW VREG           0,1     122895 12599123 /tmp (swap)
slapd   17944 root   13u  inet 0x30002b9f9d8 0x22ba0a58      TCP
dxcsx6:ldap->dxcsx3:63563 (ESTABLISHED)
slapd   17944 root   14uW VREG           0,1      81928 10922909 /tmp (swap)
slapd   17944 root   15uW VREG           0,1    2806042 11660500 /tmp (swap)
slapd   17944 root   16u  inet 0x30005fc6d78 0x525d03d6      TCP
dxcsx6:ldap->dxcsx3:63564 (ESTABLISHED)
slapd   17944 root   17u  inet 0x30005b32ac8 0x52137b2c      TCP
dxcsx6:ldap->dxcsx3:[...]
slapd   17944 root   19u  inet 0x30006fb10a0 0x522543f3      TCP
dxcsx6:ldap->dxcsx3:63565 (ESTABLISHED)

For a healthy slapd, lsof additionally lists the following files:
/usr/local/openldap-1.2.11-debug/libexec/slapd
/usr/lib/libthread.so.1
/usr/platform/sun4u/lib/libc_psr.so.1
/usr/lib/libaio.so.1
/usr/lib/libc.so.1
/usr/lib/libmp.so.2
/usr/lib/libnsl.so.1
/usr/lib/librt.so.1
/usr/lib/libsocket.so.1
/usr/lib/libgen.so.1
/usr/lib/libresolv.so.2
/usr/local/lib/libgdbm.so.2.0.0
/usr/lib/libdl.so.1
/usr/lib/ld.so.1
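
Such a comparison can be automated. The sketch below assumes that both
processes were captured with lsof's field output for names ("lsof -F n -p
<pid>"), where every name appears on a line starting with "n". The function
names and the two sample strings are made up for illustration; only the library
paths come from the lists above.

```python
def open_names(lsof_field_output):
    """Collect the NAME values from `lsof -F n` style output."""
    return {
        line[1:]
        for line in lsof_field_output.splitlines()
        if line.startswith("n")
    }


def missing_in_suspect(healthy_output, suspect_output):
    """Return names listed for the healthy process but not for the suspect one."""
    return sorted(open_names(healthy_output) - open_names(suspect_output))


# Made-up, shortened samples: "p" lines carry the PID, "f" lines the descriptor.
healthy = "p100\nftxt\nn/usr/lib/libpthread.so.1\nftxt\nn/usr/lib/libc.so.1\n"
suspect = "p200\nftxt\nn/usr/lib/libpthread.so.1\n"
print(missing_in_suspect(healthy, suspect))
```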

In its current state slapd accepts new connections and processes requests on
them as expected, but my clients just hang on the existing connections. Although
this error is not easy to reproduce, we have observed it more than twice on
different machines. We also saw the bug with openldap-1.2.7.

Has anybody had the same problem? Can anybody advise me on how to fix this bug
or work around it? Should I migrate to version 2.0.7?

Thanks,

Marion Thyen

------------------------------------------------------------------
Marion Thyen

Phone     +49 228 936 6416
Fax       +49 228 936 3359
MailFax   +49 228 936 881085
eMail     Marion.Thyen@T-Mobil.de