
write block in ber_flush under Solaris 2.6 (ITS#338)



Full_Name: Paul Amaranth
Version: 1.2.3 & 1.2.7
OS: Solaris 2.6
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (208.231.254.31)


We have a product that uses the OpenLDAP libraries to communicate with a
remote LDAP server. When the CPU load on the client machine spikes to 100%,
the client program blocks. After 30 minutes or so it unblocks and resumes
normal behavior, but that is unacceptable for a server (the client is an
authentication server).  The program is built and run on a Solaris 2.6 system.

Attaching to the blocked process with gdb, I see the following:

#0  0xef5391a4 in _write () from /usr/lib/libc.so.1
#1  0xef46645c in ber_flush (sb=0xbb7b0, ber=0x20a210, freeit=0) at io.c:318
#2  0xef462b08 in ldap_send_server_request (ld=0xbb7b0, ber=0x20a210, 
    msgid=5266, parentreq=0x0, srvlist=0x0, lc=0xc0e68, bind=0)
    at request.c:255
#3  0xef4628f8 in ldap_send_initial_request (ld=0xbb7b0, msgtype=99, 
    dn=0xc0e44 "o=Company Name,c=US", ber=0x20a210) at request.c:169
#4  0xef45f61c in ldap_search (ld=0xbb7b0, 
    base=0xc0e44 "o=Company Name,c=US", scope=2, 
    filter=0x1fce08 "uid=t020l", attrs=0x0, attrsonly=0) at search.c:79

This has happened with OpenLDAP 1.2.3 and 1.2.7.  It has happened both with
builds configured without threads and with builds using Solaris LWP, and
with builds compiled with gcc 2.8.1 as well as the egcs-1.1.2 release.

When the library is compiled with gcc 2.8.1, it blocks at a significantly
lower CPU load, around 70% utilization.  The build made with the later egcs
release is more robust, but it still blocks once CPU utilization maxes out,
which happens at roughly 15-20 LDAP queries/second.
It always blocks at the same place.
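
One way to narrow this down would be to check whether the descriptor is
simply not accepting data at the moment write() is called, for example
because the socket send buffer is full.  That is only an assumption; nothing
in the backtrace confirms it.  Below is a minimal sketch in plain POSIX C.
wait_writable() and fill_send_buffer() are hypothetical helpers written for
this illustration, not OpenLDAP functions, and the sketch does not touch
ber_flush itself.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of OpenLDAP).
 * Returns 1 if fd can accept data within timeout_ms milliseconds,
 * 0 if the timeout expires first, -1 on error or hangup. */
static int wait_writable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;

    /* Restart the wait if a signal interrupts poll(). */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;                       /* timed out: a blocking write() would stall */
    return (pfd.revents & POLLOUT) ? 1 : -1;
}

/* Hypothetical helper used only to reproduce the stalled state:
 * switches fd to non-blocking mode and writes until the kernel
 * refuses more data.  Returns the number of bytes queued, or -1. */
static long fill_send_buffer(int fd)
{
    char chunk[4096] = {0};
    long total = 0;
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;                      /* send buffer is full */
        return -1;
    }
    return total;
}
```

Calling wait_writable() on the LDAP connection's descriptor with a short
timeout just before a request is sent, and logging when it returns 0, would
show whether the stall coincides with a socket that is not writable.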

Anyone have _any_ ideas?  We're a little frustrated by this and we have
a rollout deadline coming up.  Any help will be greatly appreciated.