Issue 9474 - ldap_install_tls() should return meaningful error code
Summary: ldap_install_tls() should return meaningful error code
Status: VERIFIED WONTFIX
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: libraries (show other issues)
Version: 2.4.57
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-02-19 12:47 UTC by Simon Pichugin
Modified: 2021-06-16 00:59 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description Simon Pichugin 2021-02-19 12:47:50 UTC
The description of my findings (take a note that these are OpenLDAP logs that happen under the application that uses libldap):

    [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: tls_write: want=610, written=610
    ...
    [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: TLS trace: SSL_connect:SSLv3 flush data
    [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: tls_read: want=5 error=Interrupted system call
    [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: TLS trace: SSL_connect:error in SSLv3 read finished A
    [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: TLS trace: SSL_connect:error in SSLv3 read finished A
    [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: TLS: can't connect: .
    [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: ldap_free_connection 1 1
    [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: ldap_send_unbind
    [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: ber_flush2: 7 bytes to sd 23
    [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap:   0000:  00 05 00 01 00 42 00
    [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: ldap_write: want=7, written=7
    [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap:   0000:  00 05 00 01 01 42 00       
    [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: ldap_free_connection: actually freed

So, 'error=Interrupted system call' is caught by this:
https://git.openldap.org/openldap/openldap/-/blob/master/libraries/libldap/tls2.c#L360
https://git.openldap.org/openldap/openldap/-/blob/master/libraries/liblber/sockbuf.c#L829
It is only the debug message that comes from the caller itself so we can see what is passed to OpenSSL.
And 'Interrupted system call' is just an EINTR string representation.

What we should do is to catch the error that OpenSSL returns to us after it is interrupted.

As we can see from the logs:
"libldap: TLS: can't connect: ."
This line returns nothing:
https://git.openldap.org/openldap/openldap/-/blob/master/libraries/libldap/tls2.c#L427
So 'ld->ld_error' is set to empty value.

If we go deeper into the 'tls_imp->ti_session_errmsg' call we can reach the point where ERR_peek_error() is called:
https://git.openldap.org/openldap/openldap/-/blob/master/libraries/libldap/tls2.c#L416
https://git.openldap.org/openldap/openldap/-/blob/master/libraries/libldap/tls_o.c#L563
https://git.openldap.org/openldap/openldap/-/blob/master/libraries/libldap/tls_o.c#L569

In the conclusion:
ldap_install_tls() should return meaningful error code that would allow to figure out a reason for the failure, especially network IO fail due to EITR.
Comment 1 Howard Chu 2021-02-19 14:40:34 UTC
(In reply to Simon Pichugin from comment #0)
> The description of my findings (take a note that these are OpenLDAP logs
> that happen under the application that uses libldap):
> 
>     [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: tls_write:
> want=610, written=610
>     ...
>     [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: TLS trace:
> SSL_connect:SSLv3 flush data
>     [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: tls_read: want=5
> error=Interrupted system call
>     [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: TLS trace:
> SSL_connect:error in SSLv3 read finished A
>     [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: TLS trace:
> SSL_connect:error in SSLv3 read finished A
>     [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: TLS: can't connect:
> .
>     [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap:
> ldap_free_connection 1 1
>     [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: ldap_send_unbind
>     [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: ber_flush2: 7 bytes
> to sd 23
>     [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap:   0000:  00 05 00
> 01 00 42 00
>     [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap: ldap_write: want=7,
> written=7
>     [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap:   0000:  00 05 00
> 01 01 42 00       
>     [sssd[be[LDAP]]] [sss_ldap_debug] (0x4000): libldap:
> ldap_free_connection: actually freed
> 
> So, 'error=Interrupted system call' is caught by this:
> https://git.openldap.org/openldap/openldap/-/blob/master/libraries/libldap/
> tls2.c#L360
> https://git.openldap.org/openldap/openldap/-/blob/master/libraries/liblber/
> sockbuf.c#L829
> It is only the debug message that comes from the caller itself so we can see
> what is passed to OpenSSL.
> And 'Interrupted system call' is just an EINTR string representation.
> 
> What we should do is to catch the error that OpenSSL returns to us after it
> is interrupted.
> 
> As we can see from the logs:
> "libldap: TLS: can't connect: ."
> This line returns nothing:
> https://git.openldap.org/openldap/openldap/-/blob/master/libraries/libldap/
> tls2.c#L427
> So 'ld->ld_error' is set to empty value.
> 
> If we go deeper into the 'tls_imp->ti_session_errmsg' call we can reach the
> point where ERR_peek_error() is called:
> https://git.openldap.org/openldap/openldap/-/blob/master/libraries/libldap/
> tls2.c#L416
> https://git.openldap.org/openldap/openldap/-/blob/master/libraries/libldap/
> tls_o.c#L563
> https://git.openldap.org/openldap/openldap/-/blob/master/libraries/libldap/
> tls_o.c#L569
> 
> In the conclusion:
> ldap_install_tls() should return meaningful error code that would allow to
> figure out a reason for the failure, especially network IO fail due to EITR.

Sorry, your request is still vague. "Should return meaningful error code" - which error code are you asking to be returned? The EINTR is returned from the I/O layer to the TLS library, so it's up to the TLS library to give that back to libldap. If ERR_peek_error() gives us an empty string then there's nothing else for us to return.
Comment 2 Simon Pichugin 2021-02-19 16:03:14 UTC
(In reply to Howard Chu from comment #1)
> 
> Sorry, your request is still vague. "Should return meaningful error code" -
> which error code are you asking to be returned? The EINTR is returned from
> the I/O layer to the TLS library, so it's up to the TLS library to give that
> back to libldap. If ERR_peek_error() gives us an empty string then there's
> nothing else for us to return.

ERR_peek_error() returns the earliest error code from the thread's error queue.

My assumption is that the queue may have a few errors at the moment when ERR_peek_error() function is called.

I understand why ERR_peek_error() is chosen here but we can make libldap more robust and check other errors if ERR_peek_error() has an empty value.
If that's the case...

What do you think?
Comment 3 Howard Chu 2021-02-19 16:43:18 UTC
(In reply to Simon Pichugin from comment #2)
> (In reply to Howard Chu from comment #1)
> > 
> > Sorry, your request is still vague. "Should return meaningful error code" -
> > which error code are you asking to be returned? The EINTR is returned from
> > the I/O layer to the TLS library, so it's up to the TLS library to give that
> > back to libldap. If ERR_peek_error() gives us an empty string then there's
> > nothing else for us to return.
> 
> ERR_peek_error() returns the earliest error code from the thread's error
> queue.
> 
> My assumption is that the queue may have a few errors at the moment when
> ERR_peek_error() function is called.
> 
> I understand why ERR_peek_error() is chosen here but we can make libldap
> more robust and check other errors if ERR_peek_error() has an empty value.
> If that's the case...
> 
> What do you think?

Feel free to post a patch demonstrating your idea.
Comment 4 Simon Pichugin 2021-03-11 09:27:27 UTC
Okay, after a deeper investigation we found that both ERR_peek_error() and ERR_peek_last_error() are empty when OpenSSL is interruted while doing read() or poll() so it's impossible to get the verbose info from ld->ld_error (as it's  empty). 

But we found a place where we can retrieve the error - 'errno'.
After ldap_install_tls() fails, 'errno == EINTR' which describes exactly the cause.

So I'd like to change the purpose of this bug (or I can open a new one if you say me do so).

Could we please update doc/man/man3/ldap_tls.3 and describe there that the libldap caller can rely on 'errno' value if ldap_install_tls() just has failed?

I know that 'errno' can be changed in certain cases and we want to be sure that libldap won't change 'errno' after ldap_install_tls() failure. So we can use it safely and write something like this:


    lret = ldap_install_tls(state->ldap);
    if (lret != LDAP_SUCCESS) {
        if (errno == EINTR) {
            /* we can retry later in certain cases */
            DEBUG(SSSDBG_CRIT_FAILURE,
                  "ldap_install_tls failed: connect was interrupted\n");
            sss_log(SSS_LOG_ERR,
                    "Could not start TLS encryption. [%d] [%s]",
                    errno, strerror(errno));
            ret = errno;
            goto fail; 
        } else {
            /* more error processing with ld->ld_error */
            ...
        }
    }

Should I suggest a patch for the man page? If you are okay with the approach...
Comment 5 Quanah Gibson-Mount 2021-03-11 17:58:31 UTC
(In reply to Simon Pichugin from comment #4) 
> Should I suggest a patch for the man page? If you are okay with the
> approach...

OpenLDAP supports multiple TLS implementations where this behavior can differ depending on implementation, so a man page patch is not appropriate. Thank you though!