
Re: Spurious Start TLS failed errors on proxyed bind OpenLDAP 2.4.40

To: Howard Chu <hyc@symas.com>
Subject: Re: Spurious Start TLS failed errors on proxyed bind OpenLDAP 2.4.40
From: openldap-technical@kolttonen.fi
Date: Tue, 22 Jan 2019 16:58:16 +0200 (EET)
Cc: openldap-technical@openldap.org
In-reply-to: <45403ce3-d89f-d8d3-7190-817198830d7c@symas.com>
References: <20190117163350.GA26474@helsinki.fi> <45403ce3-d89f-d8d3-7190-817198830d7c@symas.com>
User-agent: Alpine 2.21 (LFD 202 2017-01-01)


Hello,

On Thu, 17 Jan 2019, Howard Chu wrote:

> These aren't spurious - your TLS library has genuinely failed to start a 
> session. Which TLS library are you using? What OS are you running on? 
> The most common cause for periodic failures is running out of entropy 
> for the PRNG.

Could it be caused by a timeout period that is too short? Please see the
following code analysis.

In OpenLDAP 2.4.46 on Fedora 29, back-ldap/bind.c has:

    rs->sr_err = ldap_back_start_tls( ld, op->o_protocol, &is_tls,
            li->li_uri, flags, li->li_timeout[ SLAP_OP_EXTENDED ],
            &rs->sr_text );
    li->li_uri_mutex_do_not_lock = 0;
    ldap_pvt_thread_mutex_unlock( &li->li_uri_mutex );
    if ( rs->sr_err != LDAP_SUCCESS ) {
        ldap_unbind_ext( ld, NULL, NULL );
        rs->sr_text = "Start TLS failed";
        goto error_return;
    }


So the call to:

    ldap_back_start_tls( ld, op->o_protocol, &is_tls,
            li->li_uri, flags, li->li_timeout[ SLAP_OP_EXTENDED ],
            &rs->sr_text )

is the one that fails. 


By the way, I am wondering about the correctness of the parameter

  li->li_timeout[ SLAP_OP_EXTENDED ]

slap.h has:

/*
 * Operation indices
 */
typedef enum {
        SLAP_OP_BIND = 0,
        SLAP_OP_UNBIND,
        SLAP_OP_SEARCH,
        SLAP_OP_COMPARE,
        SLAP_OP_MODIFY,
        SLAP_OP_MODRDN,
        SLAP_OP_ADD,
        SLAP_OP_DELETE,
        SLAP_OP_ABANDON,
        SLAP_OP_EXTENDED,
        SLAP_OP_LAST
} slap_op_t;

So SLAP_OP_EXTENDED has value 9. But back-ldap/config.c has:

/* see enum in slap.h */
static slap_cf_aux_table timeout_table[] = {
    { BER_BVC("bind="), SLAP_OP_BIND * sizeof( time_t ),    'u', 0, NULL },
    /* unbind makes no sense */
    { BER_BVC("add="),  SLAP_OP_ADD * sizeof( time_t ),     'u', 0, NULL },
    { BER_BVC("delete="),   SLAP_OP_DELETE * sizeof( time_t ),  'u', 0, NULL },
    { BER_BVC("modrdn="),   SLAP_OP_MODRDN * sizeof( time_t ),  'u', 0, NULL },
    { BER_BVC("modify="),   SLAP_OP_MODIFY * sizeof( time_t ),  'u', 0, NULL },
    { BER_BVC("compare="),  SLAP_OP_COMPARE * sizeof( time_t ), 'u', 0, NULL },
    { BER_BVC("search="),   SLAP_OP_SEARCH * sizeof( time_t ),  'u', 0, NULL },
    /* abandon makes little sense */
#if 0   /* not implemented yet */
    { BER_BVC("extended="), SLAP_OP_EXTENDED * sizeof( time_t ),    'u', 0, NULL },
#endif
    { BER_BVNULL, 0, 0, 0, NULL }
};

Hmmm, the "extended=" entry is compiled out with #if 0, so an extended
timeout cannot be set from the configuration. Does
li->li_timeout[ SLAP_OP_EXTENDED ] then end up pointing outside of this
timeout_table[]? I am not sure.
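
To make the indexing question concrete, here is a minimal sketch. It
assumes (I have not quoted the declaration) that li_timeout is an array
of SLAP_OP_LAST time_t elements. The names demo_info, demo_get_timeout
and demo_timeout_offset are hypothetical and exist only for this
illustration. Under that assumption, index 9 stays inside the array, and
a slot that was never configured simply holds zero.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Same order as the slap_op_t enum quoted from slap.h. */
typedef enum {
    SLAP_OP_BIND = 0,
    SLAP_OP_UNBIND,
    SLAP_OP_SEARCH,
    SLAP_OP_COMPARE,
    SLAP_OP_MODIFY,
    SLAP_OP_MODRDN,
    SLAP_OP_ADD,
    SLAP_OP_DELETE,
    SLAP_OP_ABANDON,
    SLAP_OP_EXTENDED,
    SLAP_OP_LAST
} slap_op_t;

/* Hypothetical stand-in for the back-ldap per-database structure.
 * ASSUMPTION: the timeout array has SLAP_OP_LAST elements. */
struct demo_info {
    time_t li_timeout[ SLAP_OP_LAST ];
};

/* Hypothetical helper: bounds-checked read of one timeout slot. */
static time_t
demo_get_timeout( const struct demo_info *li, slap_op_t op )
{
    assert( (int)op >= 0 && (int)op < (int)SLAP_OP_LAST );
    return li->li_timeout[ op ];
}

/* Byte offset that a timeout_table-style entry would carry for op. */
static size_t
demo_timeout_offset( slap_op_t op )
{
    return (size_t)op * sizeof( time_t );
}
```

So the keyword table would only decide which slots can be set from the
configuration; it would not change the size of the array being indexed.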


In any case, back-ldap/bind.c has:


static int
ldap_back_start_tls(
    LDAP        *ld,
    int     protocol,
    int     *is_tls,
    const char  *url,
    unsigned    flags,
    int     timeout,
    const char  **text )
{

...

The async code is compiled in because of a #define in back-ldap.h:

#ifdef SLAP_STARTTLS_ASYNCHRONOUS

...

            if ( timeout ) {
                tv.tv_sec = timeout;
                tv.tv_usec = 0;
            } else {
                LDAP_BACK_TV_SET( &tv );
            }


Now I assume (timeout == 0) and we go to the else-branch. 

LDAP_BACK_TV_SET macro is:

#define LDAP_BACK_RESULT_TIMEOUT    (0)
#define LDAP_BACK_RESULT_UTIMEOUT   (100000)
#define LDAP_BACK_TV_SET(tv) \
    do { \
        (tv)->tv_sec = LDAP_BACK_RESULT_TIMEOUT; \
        (tv)->tv_usec = LDAP_BACK_RESULT_UTIMEOUT; \
    } while ( 0 )


My current guess is that the 0.1 second default timeout is set, and for 
some reason, the slapd backend TLS initialization sometimes takes longer 
than that, causing the TLS failure.
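
As a quick check of the arithmetic, the following sketch copies the macro
quoted above and mirrors the if/else in a small helper. The helper names
demo_pick_timeout and demo_tv_to_ms are hypothetical and not part of
OpenLDAP.

```c
#include <sys/time.h>

/* Copied from the macro definitions quoted above. */
#define LDAP_BACK_RESULT_TIMEOUT    (0)
#define LDAP_BACK_RESULT_UTIMEOUT   (100000)
#define LDAP_BACK_TV_SET(tv) \
    do { \
        (tv)->tv_sec = LDAP_BACK_RESULT_TIMEOUT; \
        (tv)->tv_usec = LDAP_BACK_RESULT_UTIMEOUT; \
    } while ( 0 )

/* Hypothetical helper mirroring the quoted if/else: a non-zero
 * timeout is taken as whole seconds, zero selects the default. */
static void
demo_pick_timeout( int timeout, struct timeval *tv )
{
    if ( timeout ) {
        tv->tv_sec = timeout;
        tv->tv_usec = 0;
    } else {
        LDAP_BACK_TV_SET( tv );
    }
}

/* Hypothetical helper: convert a timeval to milliseconds. */
static long
demo_tv_to_ms( const struct timeval *tv )
{
    return (long)tv->tv_sec * 1000L + (long)tv->tv_usec / 1000L;
}
```

With timeout == 0 the result is 100 ms, which matches the 0.1 second
figure above.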

Do you think this is a possible scenario?

If so, as a temporary fix, would it be safe to hardcode a timeout of,
say, 5 seconds in the else-branch used for Start TLS only?
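
What I have in mind is roughly the following. This is only a sketch of
the idea, not a tested patch: demo_starttls_timeout and
DEMO_STARTTLS_FALLBACK_SEC are hypothetical names, and the 5 second
value is just the example from my question.

```c
#include <sys/time.h>

/* ASSUMPTION: 5 seconds is only an example value, not a tuned one. */
#define DEMO_STARTTLS_FALLBACK_SEC  (5)

/* Hypothetical variant of the timeout selection: a configured
 * timeout is used as is, otherwise a fixed fallback in whole
 * seconds replaces the 0.1 second default. */
static void
demo_starttls_timeout( int timeout, struct timeval *tv )
{
    if ( timeout ) {
        tv->tv_sec = timeout;
    } else {
        tv->tv_sec = DEMO_STARTTLS_FALLBACK_SEC;
    }
    tv->tv_usec = 0;
}
```

Only the else-branch changes, so a configured non-zero timeout would
behave as before.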


Best regards,
Unto Sten
