
(ITS#8090) libldap bugfix: Spurious initial LDAP_CONNECT_ERROR with LDAP_OPT_CONNECT_ASYNC enabled



Full_Name: Olli Salli
Version: git master
OS: Windows 8.1, Linux 3.18.10
URL: ftp://ftp.openldap.org/incoming/olli-salli-150325.patch
Submission from: (NULL) (83.102.45.242)


We are working on an application which needs to perform some simple LDAP search
queries every once in a while. The application is running as a daemon in an
embedded server environment and has no user interface, and is instead remotely
controlled and configured via a control TCP connection. This includes the
configuration specifying the LDAP server address and port, and whether the LDAP
queries should be attempted at all (for example, when no server is available). We are
currently developing on Windows 8.1 with Visual Studio 2013, but the application will
also run in Linux environments.

To ensure the control TCP connection stays alive at all times (and the daemon
otherwise remains functional), and to avoid using threads, we have used the OpenLDAP
asynchronous APIs, including the asynchronous connect option - otherwise, if the
configured LDAP server is unreachable, the initial search query can block for a very
long time. It is here that we have hit a small issue.
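
For reference, the option setup looks roughly like the following sketch; the URI and
the 10-second network timeout are placeholders for our configured values, not the
actual daemon code:

#include <ldap.h>

static LDAP *setup_ldap( const char *uri )
{
    LDAP *ld = NULL;
    const int version = LDAP_VERSION3;
    struct timeval network_timeout = { 10, 0 };  /* placeholder timeout */

    if ( ldap_initialize( &ld, uri ) != LDAP_SUCCESS )
        return NULL;

    ldap_set_option( ld, LDAP_OPT_PROTOCOL_VERSION, &version );
    ldap_set_option( ld, LDAP_OPT_NETWORK_TIMEOUT, &network_timeout );
    /* Non-blocking connect: requests return LDAP_X_CONNECTING while the
     * TCP connection is still being established. */
    ldap_set_option( ld, LDAP_OPT_CONNECT_ASYNC, LDAP_OPT_ON );

    return ld;
}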

When using LDAP_OPT_CONNECT_ASYNC, if the LDAP server is unreachable, the
initial request (e.g. using ldap_search_ext) does not block, which is correct.
However, this first call returns LDAP_CONNECT_ERROR. If we disregard this, and
continue reissuing the ldap_search_ext request periodically, following calls
correctly return LDAP_X_CONNECTING. Then, when the configured LDAP_OPT_NETWORK_TIMEOUT
has elapsed, LDAP_CONNECT_ERROR is returned again, which is correct.
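
To illustrate, our periodic retry logic looks roughly like this; the search base and
filter are illustrative only, not our real parameters:

int msgid = 0;
int rc = ldap_search_ext( ld, "dc=example,dc=com", LDAP_SCOPE_SUBTREE,
        "(objectClass=*)", NULL, 0, NULL, NULL, NULL, LDAP_NO_LIMIT, &msgid );

if ( rc == LDAP_X_CONNECTING ) {
    /* Connect still in progress: reissue the request later. */
} else if ( rc == LDAP_CONNECT_ERROR ) {
    /* Returned (incorrectly) by the very first call while the connect is
     * still in progress, and (correctly) once the network timeout has
     * elapsed or connect() has genuinely failed. */
} else if ( rc == LDAP_SUCCESS ) {
    /* Request sent; the result is collected later with ldap_result(). */
}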

LDAP_CONNECT_ERROR might also be returned by the first ldap_search_ext call because
of legitimate error conditions in connect(), even in asynchronous mode, for example
out-of-resource conditions (EADDRNOTAVAIL) or all local network interfaces being down
(ENETUNREACH). These should be handled as fatal errors, but they are impossible to
distinguish from the false initial LDAP_CONNECT_ERROR resulting from using
LDAP_OPT_CONNECT_ASYNC.

The issue seems to be in ldap_send_initial_request
(http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=libraries/libldap/request.c;h=3c1b41f76d11e618062d58da43b1b1e062a9d617;hb=HEAD#l128).

When async connect is enabled, what happens during the first request is as follows (a
simplified sketch of the relevant code follows the two lists below):
1) sd is initialized to AC_SOCKET_INVALID
2) ber_sockbuf_ctrl( ... LBER_SB_OPT_GET_FD ... ) is called to determine whether
there is already a connection, and to fetch its socket descriptor to sd
3) as there is no connection, it returns -1 and sd stays AC_SOCKET_INVALID
4) a new connection is formed using ldap_open_defconn()
5) ldap_int_check_async_open( ld, sd ) is called, but sd is still
AC_SOCKET_INVALID, and thus the poll fails
6) LDAP_CONNECT_ERROR is returned

On subsequent calls, what happens is:
1) ber_sockbuf_ctrl( ... LBER_SB_OPT_GET_FD ... ) returns success and a valid
socket descriptor
2) opening a new connection is skipped
3) ldap_int_check_async_open( ld, sd ) is called this time with a valid socket
descriptor
4) the poll works as intended
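
For clarity, here is a simplified paraphrase of the relevant logic in
ldap_send_initial_request() as we read it - not a verbatim copy of the source:

    int rc = 1;
    ber_socket_t sd = AC_SOCKET_INVALID;

    if ( ber_sockbuf_ctrl( ld->ld_sb, LBER_SB_OPT_GET_FD, &sd ) == -1 ) {
        /* No connection yet: open the default connection.  Note that sd
         * is not refreshed here, so it stays AC_SOCKET_INVALID. */
        rc = ldap_open_defconn( ld );
    }

    if ( ld->ld_defconn != NULL &&
            ld->ld_defconn->lconn_status == LDAP_CONNST_CONNECTING ) {
        /* First request: sd is still AC_SOCKET_INVALID, the poll inside
         * fails, and LDAP_CONNECT_ERROR is returned.  Later requests: the
         * GET_FD call above succeeded, sd is valid, the poll works. */
        rc = ldap_int_check_async_open( ld, sd );
    }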

A simple fix is to reissue ber_sockbuf_ctrl( ... LBER_SB_OPT_GET_FD ... ) after
opening the connection. This makes the first poll return LDAP_X_CONNECTING as
intended, and is what the patch at the URL above implements. A perhaps more
semantically correct alternative would be to return the created socket descriptor
from ldap_open_defconn().
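
Roughly, the patched logic looks like this (a sketch of the idea, not the patch
verbatim):

    if ( ber_sockbuf_ctrl( ld->ld_sb, LBER_SB_OPT_GET_FD, &sd ) == -1 ) {
        /* not connected yet */
        rc = ldap_open_defconn( ld );
        if ( rc == 0 ) {
            /* The connection now exists: fetch its socket descriptor so
             * that the async-open check below can poll a real descriptor
             * instead of AC_SOCKET_INVALID. */
            ber_sockbuf_ctrl( ld->ld_sb, LBER_SB_OPT_GET_FD, &sd );
        }
    }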