OpenLDAP 2.4.16 hanging - ideas welcome

We run four OpenLDAP 2.4.16 servers as two provider/consumer pairs: one pair
for our staff systems and one pair for our teaching facilities.
All four are Xen virtual hosts running Solaris 10u7.

The staff pair runs fine.

The consumer in the teaching pair runs fine.
The provider in the teaching pair also runs fine until it is hit by a heavy
load, e.g. at the start of a lab when ~100 PCs try to authenticate their
users at once.  At that point it stops serving LDAP requests.  Traffic is
still reaching the box, and existing connections seem OK.
The break point is about 35 PCs; below that there is no problem.
Restarting slapd cures the problem, and off we go until the start of the
next big lab.
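One way to reproduce the lab-start burst without 100 PCs might be a small script that fires N anonymous binds at the provider at roughly the same moment and counts how many get an answer. This is a minimal sketch, assuming Python 3 is available on a client machine; the host name in the usage line is hypothetical. It sends a hand-encoded LDAPv3 anonymous simple bind rather than just opening a TCP connection, because the kernel can complete the TCP handshake even when slapd itself is no longer accepting connections.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

# LDAPv3 anonymous simple bind, messageID 1, BER-encoded by hand:
# SEQUENCE { INTEGER 1, [APPLICATION 0] { INTEGER 3, OCTET STRING "", [0] "" } }
ANON_BIND = bytes.fromhex("300c020101600702010304008000")


def try_bind(host: str, port: int, timeout: float) -> bool:
    """Return True if the server answers one anonymous bind with an LDAP message."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(ANON_BIND)
            reply = s.recv(64)
            # Every LDAPMessage is a SEQUENCE, so its first byte should be 0x30.
            return len(reply) > 0 and reply[0] == 0x30
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def burst(host: str, port: int, clients: int, timeout: float = 5.0) -> int:
    """Run `clients` binds concurrently (one thread each) and count the replies."""
    with ThreadPoolExecutor(max_workers=clients) as pool:
        results = list(pool.map(lambda _: try_bind(host, port, timeout), range(clients)))
    return sum(results)
```

Usage might look like `burst("ldap-provider.example", 389, 35)`, stepping the client count up (10, 20, 35, 50, 100) to see whether the number of answered binds drops off near the 35-PC break point.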

I've run at various log levels but haven't been able to spot any obviously
relevant messages.  All I see, even when everything is fine, are messages of the form:

send_search_entry: conn 11639  ber write failed.
connection_read(38): no connection!

Here is slapd.conf (minus the syncprov section):

include         /usr/local/etc/openldap/schema/core.schema
include         /usr/local/etc/openldap/schema/cosine.schema
include         /usr/local/etc/openldap/schema/inetorgperson.schema
include         /usr/local/etc/openldap/schema/nis.schema
include         /usr/local/etc/openldap/schema/duaconf.schema
include         /usr/local/etc/openldap/schema/local.schema

pidfile         /var/openldap/run/slapd.pid
argsfile        /var/openldap/run/slapd.args

conn_max_pending        200
idletimeout     60

sizelimit       2000

loglevel        256

database        bdb
suffix          "dc=my,dc=domain"
rootdn          "cn=me,dc=my,dc=domain"
rootpw          {SSHA}guess
directory       /var/openldap/openldap-data

index   cn,entryCSN,entryUUID,gidNumber,ipHostNumber,memberUid eq
index   objectclass,uid,uidNumber,uniqueMember  eq

cachefree       16
cachesize       1500
checkpoint      0 60
dncachesize     1500
idlcachesize    3000

access to attrs=userPassword
        by self write
        by anonymous auth
        by dn.base="cn=fred,ou=Profile,dc=my,dc=domain"
        by * none
access to *
        by self write
        by users read
        by * read

The only entry in DB_CONFIG (a 25 MB BDB cache) is:

set_cachesize   0       26214400        0

Cache hits are at 99%.

I'm stumped for a cause or a solution.  Can anyone either give me a pointer
as to what to look for in the logs, or suggest a possible cause?  Could it
be hitting the 256 open-file limit?


John Landamore

Department of Computer Science
University of Leicester
University Road, LEICESTER, LE1 7RH
Phone: +44 (0)116 2523410       Fax: +44 (0)116 2523604