Issue 10023 - Asynchronous connects are broken
Summary: Asynchronous connects are broken
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: libraries (show other issues)
Version: 2.5.14
Hardware: All All
: --- normal
Target Milestone: 2.5.15
Assignee: Howard Chu
URL:
Keywords:
Depends on:
Blocks: 9244
  Show dependency treegraph
 
Reported: 2023-03-15 15:26 UTC by ipuleston@sonicwall.com
Modified: 2023-07-10 21:08 UTC (History)
1 user (show)

See Also:


Attachments
Patch for option #1, makes connect fully async, but will affect the fix for ITS #8957 (801 bytes, patch)
2023-03-16 20:22 UTC, ipuleston@sonicwall.com
Details
Patch for option #2, doesn't affect the fix for ITS #8957, but leaves connect with TLS being synchronous (557 bytes, patch)
2023-03-16 20:24 UTC, ipuleston@sonicwall.com
Details

Note You need to log in before you can comment on or make changes to this issue.
Description ipuleston@sonicwall.com 2023-03-15 15:26:09 UTC
We have a port of OpenLDAP client running in an embedded system, which is using asynchronous connects to the LDAP server. We have been using OpenLDAP 2.4.40 for a long time, and I just upgraded it to use 2.5.14 (as the current LTS release). After doing this, async connects to the LDAP server no longer work. You can see this in the following debug output:

A successful async connect with 2.4.40:
  ldap_send_initial_request
  ldap_new_connection 1 1 0
  ldap_int_open_connection
  ldap_connect_to_host: TCP Ian-DC1.sd80.com:389
  ldap_pvt_gethostbyname_a: host=Ian-DC1.sd80.com, r=0
  ldap_new_socket: 251
  ldap_prepare_socket: 251
  ldap_connect_to_host: Trying 192.168.168.3:389
  ldap_pvt_connect: fd: 251 tm: 10 async: -1
  ldap_ndelay_on: 251
  attempting to connect: 
  connect errno: 115
  ldap_int_poll: fd: -1 tm: 0

A failed async connect with 2.5.14:
  ldap_send_initial_request
  ldap_new_connection 1 1 0
  ldap_int_open_connection
  ldap_connect_to_host: TCP Ian-DC1.sd80.com:389
  ldap_pvt_gethostbyname_a: host=Ian-DC1.sd80.com, r=0
  ldap_new_socket: 247
  ldap_prepare_socket: 247
  ldap_connect_to_host: Trying 10.21.61.3:389
  ldap_pvt_connect: fd: 247 tm: 10 async: -1
  ldap_ndelay_on: 247
  attempting to connect: 
  connect errno: 115
  ldap_open_defconn: successful
  ldap_send_server_request
  Sending Bind Request, len=0x6ca10c1f
  ldap_write: want=63 error=Resource temporarily unavailable

Note that in both cases the connect attempt returns errno 115, EINPROGRESS, meaning that it has not completed. But after that:

● 2.4.40 calls ldap_int_poll (via ldap_send_initial_request -> ldap_int_check_async_open) to begin the wait for async completion.

● 2.5.14 instead reports a successful connect, and tries to send the bind which fails since thre socket is not yet connected.

I tracked the problem down to a change made for ITS #8022 "an async connect may still succeed immediately" in this commit: https://git.openldap.org/openldap/openldap/-/commit/ae6347bac12bbf843678a838ca26089080705f81

That change in ldap_new_connection makes it set lconn_status for an async connect to LDAP_CONNST_CONNECTED rather than LDAP_CONNST_CONNECTING if ldap_int_open_connection returns 0. The problem is that ldap_int_open_connection returns 0 after getting the EINPROGRESS. ldap_connect_to_host returns -2 for the latter, but ldap_int_open_connection doesn't check for that, returning 0 for any return code other than -1.

I think that the bug is actually in ldap_int_open_connection rather than in the above commit. It should probably return -2 when ldap_connect_to_host returns that.
Comment 1 ipuleston@sonicwall.com 2023-03-15 16:19:41 UTC
Additionally I spotted an 2nd related issue which came in with this subsequent commit for ITS #8957: 

https://git.openldap.org/openldap/openldap/-/commit/09ff530036a04a01ad4250eeac2cf30349976ddc

Note that ticket reports the same "connect errno: 115" followed by "ldap_open_defconn: successful" that I show above, and it probably also a result of the same commit that I referenced above.

What this commit does is to make ldap_int_open_connection call ldap_int_tls_start if LDAP_OPT_X_TLS_HARD is set and ldap_connect_to_host returned -2 (EWOULDBLOCK/EINPROGRESS). There are two problems with doing that:

1. It can't start TLS before the connect has completed. If ldap_int_tls_start does not wait for that then it will fail.

2. If ldap_int_tls_start does wait for the connect to complete then it will have to wait synchronously for that, which will break the asynchronicity of the connect.

To make it properly asynchronous, what should be happening on a -2 return code  is that control is returned to the caller of ldap_sasl_bind() etc. with return code LDAP_X_CONNECTING. That caller should then use poll or select to wait for the connect to complete, and then in the case of TLS it would need to make the call to start that (via ldap_int_tls_start).
Comment 2 ipuleston@sonicwall.com 2023-03-15 17:37:51 UTC
There are two ways to fix this, and I've tested that both work:

1. Add the following in ldap_int_open_connection before the LDAP_OPT_X_TLS_HARD check that was updated for ITS #8957:

	if ( async && rc == -2) {
		/* Need to let the connect complete asynchronously before we continue */
		return -2;
	}

2. Change the return( 0 ) statement at the end of ldap_int_open_connection (after the LDAP_OPT_X_TLS_HARD check) to this (or maybe just return rc):

	return ( (rc == -2) ? -2 : 0 );

I prefer #1 because it makes the connect properly function asynchronously, but it will affect the fix for ITS #8957 and the reporter of that may have to do some extra coding
Comment 3 ipuleston@sonicwall.com 2023-03-15 17:51:47 UTC
Sorry, saved the above comment before I had completed it. To continue...

I prefer option #1 because it makes the connect properly function asynchronously when using TLS, but it will affect the fix for ITS #8957 and the reporter of that may have to do some extra coding (I have added them as a cc).

Option #2 fixes my reported issue without affecting that fix, but leaving it doing a synchronous connect when LDAP_OPT_X_TLS_HARD is set along with LDAP_OPT_CONNECT_ASYNC.

I don't actually use LDAP_OPT_X_TLS_HARD in my code (I start TLS from outside of the OpenLDAP code after waiting for the connect to complete) so either one will work for me.
Comment 4 Howard Chu 2023-03-15 19:06:34 UTC
(In reply to ipuleston@sonicwall.com from comment #1)

> To make it properly asynchronous, what should be happening on a -2 return
> code  is that control is returned to the caller of ldap_sasl_bind() etc.
> with return code LDAP_X_CONNECTING. That caller should then use poll or
> select to wait for the connect to complete, and then in the case of TLS it
> would need to make the call to start that (via ldap_int_tls_start).

Note that ldap_int_* functions are for internal use only, not for users to invoke.
Comment 5 Howard Chu 2023-03-15 19:11:18 UTC
(In reply to ipuleston@sonicwall.com from comment #2)
> There are two ways to fix this, and I've tested that both work:
> 
> 1. Add the following in ldap_int_open_connection before the
> LDAP_OPT_X_TLS_HARD check that was updated for ITS #8957:
> 
> 	if ( async && rc == -2) {
> 		/* Need to let the connect complete asynchronously before we continue */
> 		return -2;
> 	}

Please supply an actual diff with your proposed fix, thanks.
Comment 6 ipuleston@sonicwall.com 2023-03-16 19:02:34 UTC
(In reply to Howard Chu from comment #4)
> (In reply to ipuleston@sonicwall.com from comment #1)
> 
> Note that ldap_int_* functions are for internal use only, not for users to
> invoke.

Yes, as things stand OpenLDAP doesn't really support asynchronous connects with TLS. Without TLS, LDAP_OPT_CONNECT_ASYNC does give asynchronous connects, but when TLS is used the "connect and start TLS" operation that it currently provides becomes synchronous.

To address this I had to patch my port to add a ldap_tls_start_async() API for starting TLS after completing an async connect. It also provides for doing the TLS handshake step-by-step asynchronously.

I would be happy to provide this as a patch, should you want to pick it up.
Comment 7 ipuleston@sonicwall.com 2023-03-16 20:22:08 UTC
Created attachment 952 [details]
Patch for option #1, makes connect fully async, but will affect the fix for ITS #8957
Comment 8 ipuleston@sonicwall.com 2023-03-16 20:24:16 UTC
Created attachment 953 [details]
Patch for option #2, doesn't affect the fix for ITS #8957, but leaves connect with TLS being synchronous
Comment 9 ipuleston@sonicwall.com 2023-03-16 20:25:41 UTC
(In reply to Howard Chu from comment #5)
> Please supply an actual diff with your proposed fix, thanks.

I've attached patches for both fix options.
Comment 10 ipuleston@sonicwall.com 2023-03-17 01:40:11 UTC
Note: see also ITS # 9244 which also reports similar issues stemming from the same problem with commit ae6347bac and ldap_int_open_connection not returning -2 (thanks to Randy for making me aware of that).
Comment 11 Howard Chu 2023-05-09 14:18:06 UTC
(In reply to ipuleston@sonicwall.com from comment #6)
> (In reply to Howard Chu from comment #4)
> > (In reply to ipuleston@sonicwall.com from comment #1)
> > 
> > Note that ldap_int_* functions are for internal use only, not for users to
> > invoke.
> 
> Yes, as things stand OpenLDAP doesn't really support asynchronous connects
> with TLS. Without TLS, LDAP_OPT_CONNECT_ASYNC does give asynchronous
> connects, but when TLS is used the "connect and start TLS" operation that it
> currently provides becomes synchronous.
> 
> To address this I had to patch my port to add a ldap_tls_start_async() API
> for starting TLS after completing an async connect. It also provides for
> doing the TLS handshake step-by-step asynchronously.
> 
> I would be happy to provide this as a patch, should you want to pick it up.

Please do. I considered moving the TLS setup from ldap_int_open_conn() to ldap_new_connection() but not sure that really makes things any easier.
Comment 12 Howard Chu 2023-05-09 14:23:43 UTC
To be clear: I'm going with option #1 for the fix.

Historically, OpenLDAP has always done TLS initialization synchronously. I think changing that is out of scope.
Comment 14 Quanah Gibson-Mount 2023-05-25 19:06:37 UTC
head:

  • 12d2382b 
by Ian Puleston at 2023-05-25T16:56:00+00:00 
ITS#10023 libldap: fix asynch connects

RE26:

 • dcbfb330 
by Ian Puleston at 2023-05-25T19:02:36+00:00 
ITS#10023 libldap: fix asynch connects

RE25:

  • cfb43597 
by Ian Puleston at 2023-05-25T19:05:03+00:00 
ITS#10023 libldap: fix asynch connects