
race condition in -lldap/openssl??



This is unrelated to my previous NetBSD problem (now fixed, my error).

Executive summary:

I'm having a problem where two Red Hat Linux 7.2 LDAP clients, out of many,
fail to authenticate against an OpenLDAP server.

In openldap-2.0.21/libraries/libldap/tls.c, around line 625:

err = SSL_connect( ssl );

If the failing client is "slightly bogged down" by ltracing the sshd
process, then err == 1 (success). Otherwise err == 0 (failure), and
SSL_get_error() returns SSL_ERROR_SYSCALL.

The man page says:

 SSL_ERROR_SYSCALL
           Some I/O error occurred.  The OpenSSL error queue may contain
           more information on the error.  If the error queue is empty
           (i.e. ERR_get_error() returns 0), ret can be used to find out
           more about the error: If ret == 0, an EOF was observed that
           violates the protocol.
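
The rule quoted above can be made concrete with a small decision function.
This is a sketch, not OpenLDAP or OpenSSL code: describe_syscall_error()
and its parameters are hypothetical, and it assumes the caller saved the
SSL_connect() return value, the result of ERR_get_error(), and errno
immediately after the call. The ret == -1 branch follows the rest of the
OpenSSL man page, which says that in that case the underlying BIO reported
an I/O error and errno holds the details.

```c
#include <string.h>

/*
 * Sketch (hypothetical helper, not OpenSSL API): interpret an
 * SSL_ERROR_SYSCALL result following SSL_get_error(3).
 *   ret         - value returned by SSL_connect()
 *   queued_err  - value returned by ERR_get_error()
 *   saved_errno - errno, copied right after SSL_connect() returned
 */
const char *describe_syscall_error(int ret, unsigned long queued_err,
                                   int saved_errno)
{
    /* A non-empty error queue carries the real reason. */
    if (queued_err != 0)
        return "OpenSSL error queue is not empty; inspect it with ERR_error_string()";

    /* Empty queue and ret == 0: the connection hit EOF mid-protocol. */
    if (ret == 0)
        return "EOF observed that violates the protocol (connection closed)";

    /* Empty queue and ret == -1: socket-level I/O error, see errno. */
    if (ret == -1)
        return strerror(saved_errno);

    return "unexpected return value for SSL_ERROR_SYSCALL";
}
```

Applied to the failing log (SSL_connect returned 0 with SSL_ERROR_SYSCALL),
this would point to an EOF during the handshake, but only if the error
queue was empty, which the current debug code does not check.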

The box is a dual Pentium III SMP machine running Red Hat Linux 7.2, fully
updated with all official errata, plus the latest pam_ldap/nss_ldap,
OpenLDAP 2.0.21, and OpenSSL 0.9.6b.  I'm also seeing what appears to be
the same problem on another box, a single-CPU AMD 1700+.

The Red Hat OpenSSL RPM was configured/built with: 

./config no-asm 386 no-idea no-mdc2 no-rc5 shared

The OpenLDAP RPM was configured/built with:

CPPFLAGS="-I/usr/kerberos/include"; export CPPFLAGS
CFLAGS="$CPPFLAGS $RPM_OPT_FLAGS -D_REENTRANT -DHAVE_KERBEROS_V -fPIC"; 
export CFLAGS

%configure \
        --with-slapd --with-slurpd --without-ldapd \
        --with-threads=posix --enable-shared --enable-static \
        --enable-ldbm --with-ldbm-api=gdbm \
        --enable-passwd \
        --enable-shell \
        \
        --enable-local --enable-cldap --disable-rlookups \
        \
        --with-kerberos=k5only \
        --with-tls \
        --with-cyrus-sasl \
        \
        --enable-wrappers \
        \
        --enable-cleartext \
        --enable-crypt \
        --enable-kpasswd \
        --enable-spasswd \
        \
        --libexecdir=%{_sbindir} \
        --localstatedir=/%{_var}/run

Details:

I've tracked it down fairly closely, but I'm now officially "over my
head" (tm).  Keep in mind, I'm just a Perl guy.

pam_ldap.so calls ldap_start_tls_s. I tracked that down to:

openldap-2.0.21/libraries/libldap/tls.c

Eventually the ldap_int_tls_connect function is called.

The important lines from this function are:

ssl = alloc_handle( ctx ); 
err = SSL_connect( ssl );

Then the existing code does:

if ( err <= 0 ) {
	/* ... existing error handling ... */
}


I've modified it by adding this code right above it:

if ( err == 0 ) {
        /* Debug: log which SSL_ERROR_* code SSL_connect() failed with */
        syslog (LOG_ERR, "SSL_connect returned 0\n");
        switch (SSL_get_error(ssl, err)) {
        case SSL_ERROR_NONE:
                syslog (LOG_ERR, "SSL_ERROR_NONE\n");
                break;
        case SSL_ERROR_ZERO_RETURN:
                syslog (LOG_ERR, "SSL_ERROR_ZERO_RETURN\n");
                break;
        case SSL_ERROR_WANT_READ:
                syslog (LOG_ERR, "SSL_ERROR_WANT_READ\n");
                break;
        case SSL_ERROR_WANT_WRITE:
                syslog (LOG_ERR, "SSL_ERROR_WANT_WRITE\n");
                break;
        case SSL_ERROR_WANT_CONNECT:
                syslog (LOG_ERR, "SSL_ERROR_WANT_CONNECT\n");
                break;
        case SSL_ERROR_WANT_X509_LOOKUP:
                syslog (LOG_ERR, "SSL_ERROR_WANT_X509_LOOKUP\n");
                break;
        case SSL_ERROR_SYSCALL:
                syslog (LOG_ERR, "SSL_ERROR_SYSCALL\n");
                break;
        case SSL_ERROR_SSL:
                syslog (LOG_ERR, "SSL_ERROR_SSL\n");
                break;
        default:
                syslog (LOG_ERR, "Error in reading SSL handle\n");
                break;
        }
}
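
The debug code above records only the SSL_get_error() code. A possible
extension is to capture ret, the OpenSSL error queue value, and errno right
after SSL_connect() and log them together in one line. format_connect_diag()
is a hypothetical helper, not part of OpenLDAP; the sketch assumes errno is
copied before anything else runs, because later library calls (syslog()
included) are not guaranteed to preserve it.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one diagnostic line from values captured
 * immediately after SSL_connect().  Returns the snprintf() result.
 *   ret          - SSL_connect() return value
 *   ssl_err_name - name of the SSL_get_error() code, e.g. "SSL_ERROR_SYSCALL"
 *   queued_err   - ERR_get_error() value (0 means the queue was empty)
 *   saved_errno  - errno copied before any other call could change it
 */
int format_connect_diag(char *buf, size_t len, int ret,
                        const char *ssl_err_name,
                        unsigned long queued_err, int saved_errno)
{
    return snprintf(buf, len,
                    "SSL_connect ret=%d %s queue=%lu errno=%d (%s)",
                    ret, ssl_err_name, queued_err, saved_errno,
                    saved_errno != 0 ? strerror(saved_errno) : "none");
}
```

In tls.c, one plausible ordering is: save errno, call SSL_get_error() to
pick the name (it peeks at the error queue, so it should run before
ERR_get_error() removes that entry), then call ERR_get_error(), format the
line, and pass it to syslog (LOG_ERR, "%s", buf).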

SSH attempt (successful, BTW) into the machine while it was slightly bogged down:

Feb  7 02:04:33 mooru sshd[17186]: SSL_connect returned 1

SSH attempt into the machine while it was not bogged down:

Feb  7 02:12:18 mooru sshd[19396]: SSL_connect returned 0
Feb  7 02:12:18 mooru sshd[19396]: SSL_ERROR_SYSCALL
Feb  7 02:12:18 mooru sshd[19396]: TLS: can't connect. (other debug I added)
Feb  7 02:12:18 mooru sshd[19396]: pam_ldap: ldap_starttls_s: Connect error

At this point, I am at a loss as to how to debug/diagnose this further.
I'm more than happy to test out patches, though.

Dax Kelson