
Re: OpenLDAP freezes and doesn't respond



On Tuesday, 12 July 2011 11:59:52 Cyril GROSJEAN wrote:
> I randomly notice my OpenLDAP server freezes, and I can't understand why.
> I have a few LDAP clients (ldapsearch, a legacy Java app. and
> ApacheDirectoryStudio), running from different systems, either locally on
> the OpenLDAP server, or on another OpenLDAP
> server, or on a remote workstation, and none manages to get an answer from
> OpenLDAP. The connection is established but each client gets stuck waiting
> for any result.

[...]

> Jul 12 10:20:05 dev-ldap1 slapd[28525]: connection_input: conn=3377
> deferring operation: binding

This is the code (at least in 2.4.26) that generates the message:

        /* Don't process requests when the conn is in the middle of a
         * Bind, or if it's closing. Also, don't let any single conn
         * use up all the available threads, and don't execute if we're
         * currently blocked on output. And don't execute if there are
         * already pending ops, let them go first.  Abandon operations
         * get exceptions to some, but not all, cases.
         */
        switch( tag ){
        default:
                /* Abandon and Unbind are exempt from these checks */
                if (conn->c_conn_state == SLAP_C_CLOSING) {
                        defer = "closing";
                        break;
                } else if (conn->c_writewaiter) {
                        defer = "awaiting write";
                        break;
                } else if (conn->c_n_ops_pending) {
                        defer = "pending operations";
                        break;
                }
                /* FALLTHRU */
        case LDAP_REQ_ABANDON:
                /* Unbind is exempt from these checks */
                if (conn->c_n_ops_executing >= connection_pool_max/2) {
                        defer = "too many executing";
                        break;
                } else if (conn->c_conn_state == SLAP_C_BINDING) {
                        defer = "binding";
                        break;
                }
                /* FALLTHRU */
        case LDAP_REQ_UNBIND:
                break;
        }

        if( defer ) {
                int max = conn->c_dn.bv_len
                        ? slap_conn_max_pending_auth
                        : slap_conn_max_pending;

                Debug( LDAP_DEBUG_ANY,
                        "connection_input: conn=%lu deferring operation: %s\n",
                        conn->c_connid, defer, 0 );
                conn->c_n_ops_pending++;
                LDAP_STAILQ_INSERT_TAIL( &conn->c_pending_ops, op, o_next );
                rc = ( conn->c_n_ops_pending > max ) ? -1 : 0;

        } else {

... carry on and handle the op.

As far as I understand, the intention is (among other things) to defer
operations on connections where a Bind is still in progress. However, some of
the comments now appear to be misplaced (e.g. the Unbind comment sits under
LDAP_REQ_ABANDON). Also, the code appears (to me, not being very familiar with
it, and quite rusty at C) not to be doing the right thing. The branch that
generates the "deferring operation: binding" message appears to be reached
when an Abandon request arrives on a connection that has a Bind in progress.
Shouldn't an Abandon be allowed for a Bind? Or am I reading it wrong? Also, it
looks as if the "too many executing" check only applies to Abandon?

Shouldn't the LDAP_REQ_ABANDON case be breaking without setting 'defer'?

Shouldn't the 'conn->c_conn_state == SLAP_C_BINDING' and
'conn->c_n_ops_executing >= connection_pool_max/2' conditions be handled by
the default case as well?
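
One way to check these readings is to lift the shape of the switch into a
standalone function and call it with different request types. The sketch below
is only that: the enum, struct and function names are hypothetical stand-ins
(not the slapd definitions), and only the branch structure is copied from the
excerpt above. It relies on standard C behaviour, where a `default:` label may
appear first and control falls through into the following `case` labels
unless a `break` is hit.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the slapd request tags and connection
 * states; REQ_SEARCH represents any request without its own case label. */
enum req_tag { REQ_SEARCH, REQ_ABANDON, REQ_UNBIND };
enum conn_state { C_ACTIVE, C_BINDING, C_CLOSING };

/* Hypothetical, trimmed-down connection record. */
struct conn {
    enum conn_state state;
    int writewaiter;
    int n_ops_pending;
    int n_ops_executing;
};

/* Returns the defer reason, or NULL if the request would not be deferred.
 * The branch structure mirrors the quoted switch. */
const char *defer_reason(enum req_tag tag, const struct conn *c, int pool_max)
{
    const char *defer = NULL;

    switch (tag) {
    default:
        /* Entered by every tag that has no case label of its own. */
        if (c->state == C_CLOSING) {
            defer = "closing";
            break;
        } else if (c->writewaiter) {
            defer = "awaiting write";
            break;
        } else if (c->n_ops_pending) {
            defer = "pending operations";
            break;
        }
        /* FALLTHRU: no break was hit, so the checks below also run. */
    case REQ_ABANDON:
        /* Abandon enters here, skipping the three checks above. */
        if (c->n_ops_executing >= pool_max / 2) {
            defer = "too many executing";
            break;
        } else if (c->state == C_BINDING) {
            defer = "binding";
            break;
        }
        /* FALLTHRU */
    case REQ_UNBIND:
        /* Unbind enters here and is never deferred. */
        break;
    }
    return defer;
}
```

With a connection in the C_BINDING state and nothing pending, this function
returns "binding" for both REQ_SEARCH and REQ_ABANDON, and NULL for
REQ_UNBIND. With one pending operation on an otherwise idle connection it
returns "pending operations" for REQ_SEARCH but NULL for REQ_ABANDON. If the
sketch is faithful to the original, the "binding" and "too many executing"
checks would therefore already apply to the default case through the
fallthrough, and Abandon would be exempt only from the first three checks.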

We have been running into both the "deferring operation: binding" and
"deferring operation: too many executing" messages. I haven't had time to
trace what the LDAP client software was doing, but now I wonder whether it was
sending Abandon requests when some operations weren't returning in time (after
more than 18000 successful operations on a connection). I think the way it
uses LDAP connections may be wrong, but I would prefer to be able to prove
that to the vendor without other log entries suggesting that its correct
behaviour is being handled incorrectly.

Also, the hard-coded rule that a single connection is deferred once its
executing operations reach half the number of threads seems a bit arbitrary.
Could we get a knob to twiddle this?
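
To make the request concrete, here is a rough sketch of what such a knob could
look like. The names `conn_share_divisor` and `conn_over_share` are invented
for illustration and do not exist in OpenLDAP; only the
`>= connection_pool_max/2` comparison comes from the excerpt above.

```c
/* Hypothetical tunable; the quoted code hard-codes this value as 2. */
int conn_share_divisor = 2;

/* Returns nonzero when one connection already has its full share of the
 * thread pool executing, mirroring
 *     conn->c_n_ops_executing >= connection_pool_max/2
 * with the divisor made adjustable. Uses integer division, as the
 * original comparison does. */
int conn_over_share(int n_ops_executing, int pool_max)
{
    return n_ops_executing >= pool_max / conn_share_divisor;
}
```

With a pool of 16 threads and the divisor left at 2, a connection would be
deferred once it has 8 operations executing; setting the divisor to 1 would
move that point to 16.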


Regards,
Buchan