Issue 8871 - mutex issue with cancel operation
Summary: mutex issue with cancel operation
Status: VERIFIED INVALID
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: 2.4.46
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: Ondřej Kuzník
Reported: 2018-06-26 14:54 UTC by hsuenju_ko@stratus.com
Modified: 2021-03-08 17:17 UTC

Description hsuenju_ko@stratus.com 2018-06-26 14:54:17 UTC
Full_Name: HsuenJu Ko
Version: 2.4.46
OS: VOS
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (198.97.42.5)


A multi-threaded test case using the C API hangs during a cancel
operation.  The thread performing the cancel blocks in
ldap_send_initial_request waiting for a mutex (ld_conn_mutex), while
another thread loops in ldap_result waiting for the result of a search
operation.  wait4msg() holds that same mutex across the
ldap_int_select() call.  It appears that before ITS#6672 was applied,
ld_conn_mutex was unlocked before ldap_int_select(); after ITS#6672, the
unlock was moved to after ldap_int_select(), so the thread performing
the cancel hangs until ldap_result returns.  Because the mutex is held
across select(), all other threads that need it are frozen until the
select completes.
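
The contention described above can be reproduced without libldap. The following is a minimal sketch, not libldap code: struct fake_conn, waiter and demo_contention are hypothetical names, a pthread mutex stands in for ld_conn_mutex, and a pipe stands in for the server socket. One thread takes the lock and blocks in poll(), as wait4msg() is reported to do across ldap_int_select(); a second thread then finds the lock busy until data arrives.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-in for a libldap handle: one lock, one socket. */
struct fake_conn {
    pthread_mutex_t conn_mutex; /* plays the role of ld_conn_mutex */
    int data_fd;                /* read end of a pipe, plays the socket */
    int ready_fd;               /* write end used to signal "lock taken" */
};

/* Mimics the reported behaviour: lock, block in poll(), then unlock. */
static void *waiter(void *arg)
{
    struct fake_conn *c = arg;
    struct pollfd pfd = { .fd = c->data_fd, .events = POLLIN };
    char one = 1;
    ssize_t n;

    assert(c != NULL);
    pthread_mutex_lock(&c->conn_mutex);
    n = write(c->ready_fd, &one, 1);  /* tell the other thread the lock is held */
    (void)n;
    (void)poll(&pfd, 1, -1);          /* block until "the server" answers */
    pthread_mutex_unlock(&c->conn_mutex);
    return NULL;
}

/* Returns 1 if a second thread found the mutex busy during the wait and
 * free again once the wait ended, 0 otherwise, -1 on setup failure. */
int demo_contention(void)
{
    int data[2], ready[2];
    struct fake_conn c;
    pthread_t tid;
    char byte = 0;
    ssize_t n;
    int busy, free_after;

    if (pipe(data) != 0 || pipe(ready) != 0)
        return -1;
    pthread_mutex_init(&c.conn_mutex, NULL);
    c.data_fd = data[0];
    c.ready_fd = ready[1];
    if (pthread_create(&tid, NULL, waiter, &c) != 0)
        return -1;

    n = read(ready[0], &byte, 1);     /* wait until the waiter holds the lock */
    busy = (pthread_mutex_trylock(&c.conn_mutex) == EBUSY);

    n = write(data[1], &byte, 1);     /* the "response" ends the wait */
    (void)n;
    pthread_join(tid, NULL);

    free_after = (pthread_mutex_trylock(&c.conn_mutex) == 0);
    if (free_after)
        pthread_mutex_unlock(&c.conn_mutex);

    pthread_mutex_destroy(&c.conn_mutex);
    close(data[0]); close(data[1]);
    close(ready[0]); close(ready[1]);
    return busy && free_after;
}
```

demo_contention() returns 1 when the second thread saw the mutex busy during the wait and free afterwards. Older toolchains may need -pthread when building.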
Comment 1 Ondřej Kuzník 2021-03-03 15:02:43 UTC
If you want to use a single connection from multiple threads in this way, you will have to select()/poll() on the fd directly and/or call ldap_result with a timeout.
Comment 2 hsuenju_ko@stratus.com 2021-03-04 18:24:28 UTC
This used to work before ITS#6672. The code used to unlock the ld_conn_mutex before the select call.

What if one thread is calling ldap_result with an indefinite wait while another thread is doing something, not necessarily a cancel, which also requires holding the ld_conn_mutex lock?  Are you saying no other thread is allowed to do anything requiring the same ld_conn_mutex?

If I cannot use the same connection, how do I use multiple connections?  Can I cancel operations from a different connection?

Thanks for any feedback!
Comment 3 Quanah Gibson-Mount 2021-03-04 22:08:17 UTC
The use case here is invalid, and the code prior to ITS#6672 was broken.  There is nothing here to be fixed.
Comment 4 Ondřej Kuzník 2021-03-05 09:14:44 UTC
On Thu, Mar 04, 2021 at 06:24:28PM +0000, openldap-its@openldap.org wrote:
> --- Comment #2 from hsuenju_ko@stratus.com <hsuenju_ko@stratus.com> ---
> This used to work before #6672. The code used to unlock the ld_conn_mutex
> before the select call.
> 
> what if one thread is doing ldap_result with indefinite wait while other thread
> is doing something, not necessary cancel, which also requires holding the
> ld_conn_mutex lock?  Are you saying no other thread is allowed to do anything
> requiring the same ld_conn_mutex? 

You can't use the connection while another thread is waiting there (and
holding the mutex); this is not how libldap works. What you can do is
retrieve the fd with LDAP_OPT_DESC, wait until there's activity and then
call ldap_result with a timeout set to 0 to see if you got what you were
interested in. Then you're able to send more requests while waiting,
from any thread you want, as long as they're not stuck waiting on the
network.

> If I can not use the same connection, how do I do multiple connections?  Can I
> cancel operations from different connection?

Cancel and unbind operations can only stop the processing of a request
sent over the same connection.
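
The approach described above (retrieve the fd, wait for activity without holding any libldap lock, then call ldap_result with a zero timeout) hinges on the waiting step. A minimal sketch of that step in plain POSIX follows. wait_for_activity is a hypothetical helper name, and the libldap calls appear only in comments because the surrounding application code is an assumption, not something taken from this report.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to show activity.
 * Returns 1 if there is something to read (or the peer hung up),
 * 0 on timeout, -1 on error (including an interrupted poll).
 *
 * Assumed libldap usage around this helper:
 *   - fd is obtained once the connection exists, via
 *       ldap_get_option(ld, LDAP_OPT_DESC, &fd);
 *   - after a return of 1, the caller checks for its result with
 *       struct timeval zero = { 0, 0 };
 *       ldap_result(ld, msgid, LDAP_MSG_ALL, &zero, &res);
 *     where a return of 0 means nothing complete is available yet.
 */
int wait_for_activity(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    /* POLLHUP/POLLERR are reported as activity so the caller can let
     * the library surface the connection error itself. */
    return (pfd.revents & (POLLIN | POLLHUP | POLLERR)) ? 1 : -1;
}
```

Because the wait happens in the application rather than inside ldap_result, no thread blocks indefinitely while holding the connection lock, so other threads can send requests (including a cancel) on the same handle between checks.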
Comment 5 hsuenju_ko@stratus.com 2021-03-08 11:48:43 UTC
Thanks for the explanation.  What you are saying is that operations over the same connection need to be serialized among threads unless the application does what you suggested.  Is that correct?  So if the application needs to perform different operations over the same connection from several threads, it needs to do the following:

 do async operation
 get fd
 do select/poll on the fd
 do ldap_result with 0 timeout


And since every operation involves ldap_send_initial_request, even the timeout value specified for the operation itself has to be reasonably short to prevent the same lock situation.  For the most part we can use different connections to perform various operations among threads, except that a cancel has to be done over the same connection. Is that assumption correct?

Thanks!
Comment 6 hsuenju_ko@stratus.com 2021-03-08 12:30:55 UTC
It seems cancel is not very useful if a thread cannot cancel its own operation and another thread cannot cancel over the same connection until the thread performing the operation to be cancelled times out, either during the operation itself or during ldap_result, or is polling while waiting for ldap_result.  Once you have timed out, there is no need to cancel, is there?
Comment 7 Ondřej Kuzník 2021-03-08 15:43:57 UTC
On Mon, Mar 08, 2021 at 12:30:55PM +0000, openldap-its@openldap.org wrote:
> --- Comment #6 from hsuenju_ko@stratus.com <hsuenju_ko@stratus.com> ---
> It seems cancel is not very useful if one cannot cancel itself and other thread
> can not cancel over same connection until the thread which performs the
> cancelled operation timeout either during the operation itself or during
> ldap_result, or doing the polling while waiting for ldap_result.  Once you have
> timed out, there is no need to cancel, isn't it?

This bug tracker is not for usage questions; these should be posted to
the appropriate mailing list, usually openldap-technical.

None of what you're asking for would have worked if the thread calling
ldap_result were to release its locks anyway. You can still send a
cancel exop and you will get a response to both. If you require that the
other thread be notified immediately, you need to do that in your own
application, as libldap has never had such ambitions.
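
One possible way to do that notification in the application is a wakeup pipe that the waiting thread polls alongside the connection fd. This is a sketch under assumptions, not something libldap provides or that this thread prescribes: wait_conn_or_wakeup, notify_waiter and the wake_reason values are hypothetical names, and conn_fd is assumed to be the descriptor obtained with LDAP_OPT_DESC.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

enum wake_reason {
    WAKE_ERROR    = -1,
    WAKE_TIMEOUT  = 0,
    WAKE_NETWORK  = 1,  /* activity on the connection fd */
    WAKE_NOTIFIED = 2   /* another thread asked us to wake up */
};

/* Wait on the connection fd and an application-owned wakeup pipe together. */
int wait_conn_or_wakeup(int conn_fd, int wakeup_rd, int timeout_ms)
{
    struct pollfd pfds[2] = {
        { .fd = conn_fd,   .events = POLLIN },
        { .fd = wakeup_rd, .events = POLLIN },
    };
    int rc = poll(pfds, 2, timeout_ms);

    if (rc < 0)
        return WAKE_ERROR;
    if (rc == 0)
        return WAKE_TIMEOUT;
    if (pfds[1].revents & POLLIN) {
        char buf[16];
        ssize_t n = read(wakeup_rd, buf, sizeof buf); /* drain pending wakeups */
        (void)n;
        /* Any network data stays readable and is seen on the next call. */
        return WAKE_NOTIFIED;
    }
    if (pfds[0].revents & (POLLIN | POLLHUP | POLLERR))
        return WAKE_NETWORK;
    return WAKE_ERROR;
}

/* Called by another thread, for example right after it has sent a cancel.
 * Returns 0 on success, -1 on failure. */
int notify_waiter(int wakeup_wr)
{
    char one = 1;
    return write(wakeup_wr, &one, 1) == 1 ? 0 : -1;
}
```

On WAKE_NOTIFIED the waiting thread can stop waiting or re-check its own state; on WAKE_NETWORK it would call ldap_result with a zero timeout to pick up whatever response has arrived.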
Comment 8 hsuenju_ko@stratus.com 2021-03-08 17:17:00 UTC
Sorry for not using the right forum. I will use a more appropriate mailing list in the future.