
Hang in wait4msg() during search (ITS#3192)



Full_Name: Ian Puleston
Version: 2.1.29
OS: VxWorks
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (64.220.173.243)


I am working on a port of OpenLDAP that runs under VxWorks and have encountered
a problem where during a search request the LDAP client hangs in
ldap_int_select() called from ldap_result() via wait4msg(). This is a timing
issue and I only see it when using TLS to Active Directory on a Windows 2000
server. Non-TLS to the same server works OK, as do both TLS and non-TLS to an
OpenLDAP Server.

This may be the problem causing issue #s 3015, 3054 and 3124, and I have a
solution (see below). First here's the trace:

ldap_result msgid -1
ldap_chkResponseList for msgid=-1, all=0
ldap_chkResponseList returns NULL
wait4msg (infinite timeout), msgid -1
wait4msg continue, msgid -1, all 0
** Connections:
* host: ianserver.sd80.com  port: 636  (default)
  refcnt: 2  status: Connected
  last used: FRI JUN 18 23:08:19 2004

** Outstanding Requests:
 * msgid 2,  origid 2, status InProgress
   outstanding referrals 0, parent count 0
** Response Queue:
   Empty
ldap_chkResponseList for msgid=-1, all=0
ldap_chkResponseList returns NULL
read1msg: msgid -1, all 0
ldap_read: message type search-result msgid 2, original id 2
new result:  res_errno: 0, res_error: <>, res_matched: <>
read1msg:  0 new referrals
read1msg:  mark request completed, id = 2
request 2 done
res_errno: 0, res_error: <>, res_matched: <>
ldap_free_request (origid 2, msgid 2)
ldap_free_connection
ldap_free_connection: refcnt 1
ldap_int_select

And there it hangs indefinitely. If I change the code to use a timeout in the
select then it times out and the search completes OK (but slowly).
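
As a rough illustration of the timeout workaround only (not the OpenLDAP code
itself), a bounded wait on a descriptor can be sketched with plain POSIX
select(). The names wait_readable() and demo_empty_pipe_times_out() are
hypothetical and the pipe merely stands in for the LDAP connection's socket:

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable or timeout_ms elapses.
 * Returns >0 if readable, 0 on timeout, -1 on error (select()'s result). */
int wait_readable(int fd, long timeout_ms)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* A NULL timeout here would block forever, as in the hang above. */
    return select(fd + 1, &readfds, NULL, NULL, &tv);
}

/* Hypothetical demo: nothing is ever written to the pipe, so the wait
 * can only end by timing out. */
int demo_empty_pipe_times_out(void)
{
    int fds[2];
    int rc;

    if (pipe(fds) != 0)
        return -1;
    rc = wait_readable(fds[0], 50);
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

A timeout of this kind only bounds the stall; every affected search still pays
the full timeout, which matches the "OK but slowly" behaviour.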

When it goes wrong and hangs, data is already available: the
LBER_SB_OPT_DATA_READY query via ber_sockbuf_ctrl() returns true and
try_read1msg() is called. When it does not hang, no data is available at that
point and it must do the select to wait for the data.

The problem appears to be that wait4msg() tests lc == NULL to decide whether
it needs to wait for data on the socket, but when the data has already been
received, try_read1msg() returns with lc set to NULL. ldap_int_select() is
then called even though no more data is coming, resulting in the hang (or the
timeout, if one is set).
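
The ambiguity can be reproduced in isolation with a small stand-alone model.
This is only a sketch: struct conn, fake_read1msg() and the two would_select_*
functions are hypothetical stand-ins, and the one behaviour taken from the
report is that the read routine may clear the caller's connection pointer.

```c
#include <stddef.h>

/* Hypothetical stand-in for a connection; only what the loop needs. */
struct conn {
    struct conn *next;
    int data_ready;   /* models LBER_SB_OPT_DATA_READY being true */
};

/* Models try_read1msg(): consumes the pending message and clears the
 * caller's connection pointer, as described in the report. */
static int fake_read1msg(struct conn **lcp)
{
    (*lcp)->data_ready = 0;
    *lcp = NULL;
    return 1;   /* pretend a message was read */
}

/* Original logic: lc == NULL is taken to mean "nothing was read".
 * Returns 1 if the caller would go on to block in select(). */
int would_select_by_pointer(struct conn *head)
{
    struct conn *lc, *nextlc;

    for (lc = head; lc != NULL; lc = nextlc) {
        nextlc = lc->next;
        if (lc->data_ready) {
            fake_read1msg(&lc);
            break;
        }
    }
    return lc == NULL;   /* also true when the read cleared lc */
}

/* Patched logic: an explicit flag records that a read was attempted. */
int would_select_by_flag(struct conn *head)
{
    struct conn *lc, *nextlc;
    int found_msg = 0;

    for (lc = head; lc != NULL; lc = nextlc) {
        nextlc = lc->next;
        if (lc->data_ready) {
            fake_read1msg(&lc);
            found_msg = 1;
            break;
        }
    }
    return !found_msg;
}
```

With a single connection that has data ready, would_select_by_pointer()
reports that select would be called even though the message was just consumed,
while would_select_by_flag() does not. With no data ready, both agree that
select is needed.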

My fix, which I have tested and which seems to work fine, is to change the
following code in wait4msg() (from result.c in OpenLDAP 2.1.29):

        if( (*result = chkResponseList(ld, msgid, all)) != NULL ) {
            rc = (*result)->lm_msgtype;
        } else {

			for ( lc = ld->ld_conns; lc != NULL; lc = nextlc ) {
				nextlc = lc->lconn_next;
				if ( ber_sockbuf_ctrl( lc->lconn_sb,
						LBER_SB_OPT_DATA_READY, NULL ) ) {
					rc = try_read1msg( ld, msgid, all, lc->lconn_sb,
						&lc, result );
				    break;
				}
	        }

		    if ( lc == NULL ) {
			    rc = ldap_int_select( ld, tvp );


To:

        if( (*result = chkResponseList(ld, msgid, all)) != NULL ) {
            rc = (*result)->lm_msgtype;
        } else {
			int found_msg = 0;

			for ( lc = ld->ld_conns; lc != NULL; lc = nextlc ) {
				nextlc = lc->lconn_next;
				if ( ber_sockbuf_ctrl( lc->lconn_sb,
						LBER_SB_OPT_DATA_READY, NULL ) ) {
					rc = try_read1msg( ld, msgid, all, lc->lconn_sb,
						&lc, result );
					found_msg = 1;
				    break;
				}
	        }

		    if ( !found_msg ) {
			    rc = ldap_int_select( ld, tvp );


And here are the diffs to result.c for the patch:
299a300
>                       int found_msg = 0;
306a308
>                                       found_msg = 1;
311c313
<                   if ( lc == NULL ) {
---
>                   if ( !found_msg ) {