
The futex() problem



Judging from the list archives, this problem has bitten others besides
me.

I want to upgrade the OpenLDAP software we're using, which is currently
2.1.30.  It runs famously, but I'm really after the server-paged results
extension.

I've had 2.2.23 and 2.2.24 up and running on our Red Hat machines.
Specifically:

	Red Hat Enterprise Linux AS release 3
	2.4.21-15.0.4.ELsmp kernel version.

Both versions exhibit the problem of deadlocking in futex().

Others have mentioned that tuning BDB when using back-bdb can help avoid
the problem, but I've played with the settings and it didn't seem to make
a difference.  Hopefully I'm just doing something wrong.

Here's my DB_CONFIG:

	set_cachesize           0       104857600       0
	set_flags               DB_TXN_NOSYNC
	set_lg_regionmax        10485760
	set_lg_max              104857600
	set_lg_bsize            26214400
	set_lg_dir              /appl/ldap/logs
	set_tmp_dir             /appl/ldap/tmp


This is a "test" instance which gets blown away and reloaded a lot, hence
the DB_TXN_NOSYNC.  The production server wouldn't have that.
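
For anyone who wants to read those byte counts in friendlier units, here
is a rough Python sketch that converts the size-valued directives to MiB.
The helper names (parse_db_config, sizes_in_mib) are made up for
illustration, and it assumes set_cachesize takes its arguments in the
order gbytes, bytes, ncache, as the Berkeley DB documentation describes.
It is a sanity check on the arithmetic only, not a tuning recommendation.

```python
# Sketch: convert the byte counts in a DB_CONFIG into MiB for easier reading.
# DB_CONFIG_TEXT is copied from the settings above; the helper names are
# hypothetical and exist only for this example.

DB_CONFIG_TEXT = """\
set_cachesize           0       104857600       0
set_flags               DB_TXN_NOSYNC
set_lg_regionmax        10485760
set_lg_max              104857600
set_lg_bsize            26214400
set_lg_dir              /appl/ldap/logs
set_tmp_dir             /appl/ldap/tmp
"""

MIB = 1024 * 1024
GIB = 1024 * MIB


def parse_db_config(text):
    """Return {directive: [args...]} for each non-blank, non-comment line.

    A directive that appears more than once keeps only its last value,
    which is good enough for this size check.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        settings[parts[0]] = parts[1:]
    return settings


def sizes_in_mib(settings):
    """Return {directive: size in MiB} for the size-valued directives."""
    sizes = {}
    if "set_cachesize" in settings:
        # Assumed argument order: gbytes, bytes, ncache.
        gbytes, nbytes, _ncache = (int(v) for v in settings["set_cachesize"])
        sizes["set_cachesize"] = (gbytes * GIB + nbytes) / MIB
    for name in ("set_lg_regionmax", "set_lg_max", "set_lg_bsize"):
        if name in settings:
            sizes[name] = int(settings[name][0]) / MIB
    return sizes


if __name__ == "__main__":
    for name, mib in sizes_in_mib(parse_db_config(DB_CONFIG_TEXT)).items():
        print(f"{name:18s} {mib:8.1f} MiB")
```

With the settings above this reports a 100 MiB cache, a 10 MiB log
region, a 100 MiB maximum log file size and a 25 MiB log buffer.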

The database def from slapd.conf:

database        bdb
directory       /appl/ldap/data
suffix          "dc=Dal,dc=Ca" 

Built using BerkeleyDB 4.3.27 (I see 4.3.28 is out now, but haven't tried
it) with the following:

	./configure  --with-slapd --with-slurpd \
		--without-ldapd --with-threads=posix \
		--enable-static --quiet --enable-local \
		--enable-cldap --disable-rlookups --without-kerberos \
		--with-tls=openssl --enable-crypt --prefix=/appl/ldap \
		--libexecdir=/appl/ldap/sbin --localstatedir=/var/run \
		--datadir=/appl/ldap/data --mandir=/usr/share/man \
		--sysconfdir=/appl/ldap/etc --with-subdir=no \
		--enable-monitor --with-cyrus-sasl=no

With slapd built this way, I managed to do some big directory updates and
queries running every two seconds for four days without a hitch.  Then,
after I STOPPED beating on slapd, *that* is when it decided to deadlock on
me.

Our directory contains about 150k entries, and the stress tests I was
doing involved making tens of thousands of changes at a time.

Can someone point me in the right direction for dealing with this?  Are my
BDB tunings not generous enough?  The FAQ-o-Matic seems to deal mostly with
ldbm, and the discussions of BDB tuning made it sound like a ten-megabyte
cache was more than enough.