[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: Spurious Start TLS failed errors on proxyed bind OpenLDAP 2.4.40




Hello,

On Thu, 17 Jan 2019, Howard Chu wrote:

> These aren't spurious - your TLS library has genuinely failed to start a 
> session. Which TLS library are you using? What OS are you running on? 
> The most common cause for periodic failures is running out of entropy 
> for the PRNG.

Could it be because of too short timeout period? Please see the following 
code analysis.

OpenLDAP 2.4.46 on Fedora 29 back-ldap/bind.c has:

rs->sr_err = ldap_back_start_tls( ld, op->o_protocol, &is_tls,
            li->li_uri, flags, li->li_timeout[ SLAP_OP_EXTENDED ], 
&rs->sr_text );
    li->li_uri_mutex_do_not_lock = 0; 
    ldap_pvt_thread_mutex_unlock( &li->li_uri_mutex );
    if ( rs->sr_err != LDAP_SUCCESS ) {
        ldap_unbind_ext( ld, NULL, NULL );
        rs->sr_text = "Start TLS failed";
        goto error_return;


So the call to:

ldap_back_start_tls( ld, op->o_protocol, &is_tls,
            li->li_uri, flags, li->li_timeout[ SLAP_OP_EXTENDED ], 
&rs->sr_text )

is the one that fails. 


By the way, I am wondering here about the correctness of parameter

  li->li_timeout[ SLAP_OP_EXTENDED ]

slap.h has:

/*
 * Operation indices
 */
typedef enum {
        SLAP_OP_BIND = 0,
        SLAP_OP_UNBIND,
        SLAP_OP_SEARCH,
        SLAP_OP_COMPARE,
        SLAP_OP_MODIFY,
        SLAP_OP_MODRDN,
        SLAP_OP_ADD,
        SLAP_OP_DELETE,
        SLAP_OP_ABANDON,
        SLAP_OP_EXTENDED,
        SLAP_OP_LAST
} slap_op_t;

So SLAP_OP_EXTENDED has value 9. But back-ldap/config.c has:

/* see enum in slap.h */
static slap_cf_aux_table timeout_table[] = {
    { BER_BVC("bind="), SLAP_OP_BIND * sizeof( time_t ),    'u', 0, NULL },
    /* unbind makes no sense */
    { BER_BVC("add="),  SLAP_OP_ADD * sizeof( time_t ),     'u', 0, NULL },
    { BER_BVC("delete="),   SLAP_OP_DELETE * sizeof( time_t ),  'u', 0, NULL },
    { BER_BVC("modrdn="),   SLAP_OP_MODRDN * sizeof( time_t ),  'u', 0, NULL },
    { BER_BVC("modify="),   SLAP_OP_MODIFY * sizeof( time_t ),  'u', 0, NULL },
    { BER_BVC("compare="),  SLAP_OP_COMPARE * sizeof( time_t ), 'u', 0, NULL },
    { BER_BVC("search="),   SLAP_OP_SEARCH * sizeof( time_t ),  'u', 0, NULL },
    /* abandon makes little sense */
#if 0   /* not implemented yet */
    { BER_BVC("extended="), SLAP_OP_EXTENDED * sizeof( time_t ),    'u', 0, NULL },
#endif
    { BER_BVNULL, 0, 0, 0, NULL }
};

Hmmm, SLAP_OP_EXTENDED is not defined, so does li->li_timeout[ SLAP_OP_EXTENDED ] 
end up pointing outside of this timeout_table[]? I am not sure.


In any case, back-ldap/bind.c has:


static int
ldap_back_start_tls(
    LDAP        *ld,
    int     protocol,
    int     *is_tls,
    const char  *url,
    unsigned    flags,
    int     timeout,
    const char  **text )
{

...

The async code is compiled in because of a #define in back-ldap.h:

#ifdef SLAP_STARTTLS_ASYNCHRONOUS

...

if ( timeout ) {
                tv.tv_sec = timeout;
                tv.tv_usec = 0;
            } else {
                LDAP_BACK_TV_SET( &tv );
            }


Now I assume (timeout == 0) and we go to the else-branch. 

LDAP_BACK_TV_SET macro is:

#define LDAP_BACK_RESULT_TIMEOUT    (0)
#define LDAP_BACK_RESULT_UTIMEOUT   (100000)
#define LDAP_BACK_TV_SET(tv) \
    do { \
        (tv)->tv_sec = LDAP_BACK_RESULT_TIMEOUT; \
        (tv)->tv_usec = LDAP_BACK_RESULT_UTIMEOUT; \
    } while ( 0 )


My current guess is that the 0.1 second default timeout is set, and for 
some reason, the slapd backend TLS initialization sometimes takes longer 
than that, causing the TLS failure.

Do you think this is a possible scenario?

If so, as a temporary fix, would it be safe to hardcode, say 
5 seconds for the timeout just for the TLS timeout else-branch?


Best regards,
Unto Sten