Issue 8090 - libldap bugfix: Spurious initial LDAP_CONNECT_ERROR with LDAP_OPT_CONNECT_ASYNC enabled
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-03-25 10:24 UTC by olli.salli@vincit.fi
Modified: 2015-07-02 17:50 UTC
Description olli.salli@vincit.fi 2015-03-25 10:24:08 UTC
Full_Name: Olli Salli
Version: git master
OS: Windows 8.1, Linux 3.18.10
URL: ftp://ftp.openldap.org/incoming/olli-salli-150325.patch
Submission from: (NULL) (83.102.45.242)


We are working on an application which needs to perform some simple LDAP search
queries every once in a while. The application is running as a daemon in an
embedded server environment and has no user interface, and is instead remotely
controlled and configured via a control TCP connection. This includes the
configuration specifying the LDAP server address and port, and whether the LDAP
queries should be attempted at all (if there is no server available). We are
currently developing on Windows 8.1 with Visual Studio 2013 but will also run in
Linux environments.

To keep the control TCP connection alive at all times (and the daemon otherwise
functional) without using threads, we have used the OpenLDAP asynchronous APIs,
including the asynchronous connect option; without it, the initial search query
can block for a very long time if the configured LDAP server is unreachable.
It is here that we have hit a small issue.

When using LDAP_OPT_CONNECT_ASYNC, if the LDAP server is unreachable, the
initial request (e.g. using ldap_search_ext) does not block, which is correct.
However, this first call returns LDAP_CONNECT_ERROR. If we disregard this, and
continue reissuing the ldap_search_ext request periodically, following calls
correctly return LDAP_X_CONNECTING. Then when the NETWORK_TIMEOUT has elapsed,
LDAP_CONNECT_ERROR is returned again, which is correct.

The first ldap_search_ext call can also return LDAP_CONNECT_ERROR for
legitimate error conditions in connect(), even in asynchronous mode: for example
out-of-resource conditions (EADDRNOTAVAIL) or all local network interfaces
being down (ENETUNREACH). These should be handled as fatal errors, but they are
impossible to distinguish from the false initial LDAP_CONNECT_ERROR
caused by LDAP_OPT_CONNECT_ASYNC.
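
To make the ambiguity concrete, here is a minimal sketch of the caller-side
workaround mentioned above (disregarding an LDAP_CONNECT_ERROR from the first
call). The enum names and values are hypothetical stand-ins for this
illustration only; they are not libldap's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the libldap result codes discussed here.
 * The numeric values are NOT the real ones from ldap.h. */
enum model_rc { MODEL_SUCCESS = 0, MODEL_CONNECT_ERROR, MODEL_X_CONNECTING };

/* What the daemon does after one request attempt (hypothetical). */
enum model_action { ACTION_PROCEED, ACTION_RETRY_LATER, ACTION_GIVE_UP };

/* Workaround policy: a connect error on the very first call is assumed to
 * be the spurious one and is retried; any later connect error is fatal. */
enum model_action model_next_action(enum model_rc rc, bool first_call)
{
    switch (rc) {
    case MODEL_SUCCESS:
        return ACTION_PROCEED;
    case MODEL_X_CONNECTING:
        return ACTION_RETRY_LATER;
    case MODEL_CONNECT_ERROR:
        /* Cannot tell a spurious first error from a genuine one here, so a
         * real connect() failure on the first call is retried as well. */
        return first_call ? ACTION_RETRY_LATER : ACTION_GIVE_UP;
    }
    return ACTION_GIVE_UP;
}
```

With this policy a genuine first-call failure is only noticed once the
network timeout has elapsed, which is the cost of the workaround.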

The issue seems to be in ldap_send_initial_request
(http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=libraries/libldap/request.c;h=3c1b41f76d11e618062d58da43b1b1e062a9d617;hb=HEAD#l128).

When async connect is enabled, what happens during the first request is:
1) sd is initialized to AC_SOCKET_INVALID
2) ber_sockbuf_ctrl( ... LBER_SB_OPT_GET_FD ... ) is called to determine whether
there is already a connection, and to fetch its socket descriptor to sd
3) as there is no connection, it returns -1 and sd stays AC_SOCKET_INVALID
4) a new connection is formed using ldap_open_defconn()
5) ldap_int_check_async_open( ld, sd ) is called, but sd is still
AC_SOCKET_INVALID, and thus the poll fails
6) LDAP_CONNECT_ERROR is returned

On successive calls, what happens is:
1) ber_sockbuf_ctrl( ... LBER_SB_OPT_GET_FD ... ) returns success and a valid
socket descriptor
2) opening a new connection is skipped
3) ldap_int_check_async_open( ld, sd ) is called this time with a valid socket
descriptor
4) the poll works as intended
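
The two call sequences above can be modelled with a small self-contained
sketch. This is not the libldap source: every model_* name, the reduced handle
struct, and the numeric values are assumptions made for illustration, and each
stub only mimics the behaviour described in the steps.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for AC_SOCKET_INVALID and the result codes; values are
 * illustrative, not the real ones. */
#define MODEL_SOCKET_INVALID (-1)
enum model_rc { MODEL_SUCCESS = 0, MODEL_CONNECT_ERROR, MODEL_X_CONNECTING };

/* Hypothetical, drastically reduced stand-in for the LDAP handle. */
struct model_ld {
    int conn_fd; /* MODEL_SOCKET_INVALID until a connection is opened */
};

/* Models ber_sockbuf_ctrl( ... LBER_SB_OPT_GET_FD ... ) as described:
 * returns -1 and leaves *sd alone when there is no connection. */
static int model_get_fd(const struct model_ld *ld, int *sd)
{
    if (ld->conn_fd == MODEL_SOCKET_INVALID)
        return -1;
    *sd = ld->conn_fd;
    return 1;
}

/* Models ldap_open_defconn() in async mode: a socket now exists. */
static int model_open_defconn(struct model_ld *ld)
{
    ld->conn_fd = 7; /* arbitrary descriptor number */
    return 0;
}

/* Models ldap_int_check_async_open(): polling an invalid descriptor
 * fails; a valid one reports that the connect is still in progress. */
static enum model_rc model_check_async_open(int sd)
{
    return sd == MODEL_SOCKET_INVALID ? MODEL_CONNECT_ERROR
                                      : MODEL_X_CONNECTING;
}

/* Pre-fix flow: sd is never refreshed after the connection is opened. */
enum model_rc model_send_initial_request_buggy(struct model_ld *ld)
{
    int sd = MODEL_SOCKET_INVALID;           /* first call, step 1 */

    assert(ld != NULL);
    if (model_get_fd(ld, &sd) == -1) {       /* steps 2 and 3 */
        if (model_open_defconn(ld) != 0)     /* step 4 */
            return MODEL_CONNECT_ERROR;
    }
    return model_check_async_open(sd);       /* steps 5 and 6 */
}
```

Calling model_send_initial_request_buggy() twice on a handle whose conn_fd
starts as MODEL_SOCKET_INVALID yields MODEL_CONNECT_ERROR and then
MODEL_X_CONNECTING, matching the behaviour reported here.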

A simple fix is to reissue ber_sockbuf_ctrl( ... LBER_SB_OPT_GET_FD ... )
after opening the connection, so that the first poll returns
LDAP_X_CONNECTING as intended. This is implemented in the patch at the URL. A
perhaps more semantically correct alternative would be to return the created
socket descriptor from ldap_open_defconn().
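
As a rough illustration of the proposed fix, the sketch below uses a
self-contained model of the flow. It is not the actual patch or the libldap
source: the model_* names, the reduced handle struct, and the numeric values
are hypothetical, and the stubs only mimic the behaviour described in this
report.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for AC_SOCKET_INVALID and the result codes; values are
 * illustrative, not the real ones. */
#define MODEL_SOCKET_INVALID (-1)
enum model_rc { MODEL_SUCCESS = 0, MODEL_CONNECT_ERROR, MODEL_X_CONNECTING };

/* Hypothetical, drastically reduced stand-in for the LDAP handle. */
struct model_ld {
    int conn_fd; /* MODEL_SOCKET_INVALID until a connection is opened */
};

/* Models ber_sockbuf_ctrl( ... LBER_SB_OPT_GET_FD ... ). */
static int model_get_fd(const struct model_ld *ld, int *sd)
{
    if (ld->conn_fd == MODEL_SOCKET_INVALID)
        return -1;
    *sd = ld->conn_fd;
    return 1;
}

/* Models ldap_open_defconn() in async mode: a socket now exists. */
static int model_open_defconn(struct model_ld *ld)
{
    ld->conn_fd = 7; /* arbitrary descriptor number */
    return 0;
}

/* Models ldap_int_check_async_open(). */
static enum model_rc model_check_async_open(int sd)
{
    return sd == MODEL_SOCKET_INVALID ? MODEL_CONNECT_ERROR
                                      : MODEL_X_CONNECTING;
}

/* Fixed flow: query the descriptor again once the connection is open. */
enum model_rc model_send_initial_request_fixed(struct model_ld *ld)
{
    int sd = MODEL_SOCKET_INVALID;

    assert(ld != NULL);
    if (model_get_fd(ld, &sd) == -1) {
        if (model_open_defconn(ld) != 0)
            return MODEL_CONNECT_ERROR;
        /* The fix: refresh sd so the poll sees the new socket. */
        if (model_get_fd(ld, &sd) == -1)
            return MODEL_CONNECT_ERROR;
    }
    return model_check_async_open(sd);
}
```

In this model the first call already returns MODEL_X_CONNECTING, so a
connect error on the first call can again be treated as a genuine failure.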
Comment 1 Howard Chu 2015-04-01 19:46:42 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Comment 2 Howard Chu 2015-04-01 20:18:51 UTC
olli.salli@vincit.fi wrote:
> Full_Name: Olli Salli
> Version: git master
> OS: Windows 8.1, Linux 3.18.10
> URL: ftp://ftp.openldap.org/incoming/olli-salli-150325.patch
> Submission from: (NULL) (83.102.45.242)

Thanks for the report, fixed now in master.


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 3 Quanah Gibson-Mount 2015-04-03 19:10:39 UTC
changed notes
changed state Test to Release
Comment 4 OpenLDAP project 2015-07-02 17:50:10 UTC
fixed in master
fixed in RE25
fixed in RE24
Comment 5 Quanah Gibson-Mount 2015-07-02 17:50:10 UTC
changed notes
changed state Release to Closed