Full_Name: HsuenJu Ko
Version: 2.4.46
OS: VOS
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (198.97.42.5)

A multi-threaded test case using the C API experiences a hang involving the cancel operation. One thread, which performs the cancel operation, hangs waiting for a mutex (ld_conn_mutex) in ldap_send_initial_request while the other thread is in an ldap_result loop waiting for the result of a search operation. The same mutex is held by wait4msg() across the ldap_int_select() call. It appears that before ITS#6672 was applied, ld_conn_mutex was unlocked before ldap_int_select(). After ITS#6672, the unlock was moved to after ldap_int_select(), which causes the thread performing the cancel to hang until ldap_result returns. Because this mutex is held across select(), every other thread that needs the mutex is frozen until the select completes.
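The locking pattern described in the report can be modelled in plain POSIX C. This is a simplified, hypothetical model and not libldap's code. The names conn_mutex, waiter and demo_lock_held_across_wait are illustrative only:

- conn_mutex stands in for ld_conn_mutex.
- The waiter thread plays the role of wait4msg(). It holds the mutex across an indefinite poll().
- A pipe write stands in for the server's response.

While the waiter is blocked, pthread_mutex_trylock() from a second thread reports EBUSY. A plain lock at that point, as in ldap_send_initial_request, would block until the wait ends.

```c
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-in for libldap's ld_conn_mutex (not the real object). */
static pthread_mutex_t conn_mutex = PTHREAD_MUTEX_INITIALIZER;

struct waiter_args {
    int data_fd;  /* read end; becomes readable when the "response" arrives */
    int ready_fd; /* write end; written once the waiter holds conn_mutex */
};

/* Plays the role of wait4msg(): takes the lock, then blocks in poll()
 * with no timeout while still holding it. */
static void *waiter(void *arg)
{
    struct waiter_args *a = arg;
    struct pollfd pfd = { .fd = a->data_fd, .events = POLLIN, .revents = 0 };
    char c = 'r';
    ssize_t n;

    pthread_mutex_lock(&conn_mutex);
    n = write(a->ready_fd, &c, 1); /* announce that the lock is held */
    (void)n;
    (void)poll(&pfd, 1, -1);       /* indefinite wait, lock still held */
    pthread_mutex_unlock(&conn_mutex);
    return NULL;
}

/* Returns what pthread_mutex_trylock() reports for a second thread (the
 * "cancel" sender) while the waiter is blocked. EBUSY means a plain
 * pthread_mutex_lock() would have hung there. Returns -1 on setup failure. */
static int demo_lock_held_across_wait(void)
{
    int data[2], ready[2];
    struct waiter_args a;
    pthread_t tid;
    char c = 'x';
    int rc;

    if (pipe(data) != 0)
        return -1;
    if (pipe(ready) != 0) {
        close(data[0]);
        close(data[1]);
        return -1;
    }
    a.data_fd = data[0];
    a.ready_fd = ready[1];
    if (pthread_create(&tid, NULL, waiter, &a) != 0) {
        rc = -1;
        goto out;
    }

    if (read(ready[0], &c, 1) != 1)          /* wait until the waiter holds the lock */
        rc = -1;
    else if ((rc = pthread_mutex_trylock(&conn_mutex)) == 0)
        pthread_mutex_unlock(&conn_mutex);   /* not expected in this model */

    if (write(data[1], &c, 1) != 1)          /* "response" arrives; poll() returns */
        rc = -1;
    pthread_join(tid, NULL);
out:
    close(data[0]);
    close(data[1]);
    close(ready[0]);
    close(ready[1]);
    return rc;
}
```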
If you want to use a single connection from multiple threads in this way, you will have to select()/poll() on the fd directly and/or call ldap_result with a timeout.
This used to work before #6672. The code used to unlock the ld_conn_mutex before the select call. What if one thread is doing ldap_result with an indefinite wait while another thread is doing something, not necessarily a cancel, that also requires holding the ld_conn_mutex lock? Are you saying no other thread is allowed to do anything requiring the same ld_conn_mutex? If I cannot use the same connection, how do I use multiple connections? Can I cancel operations from a different connection? Thanks for any feedback!
The use case here is invalid, and the code prior to ITS#6672 was broken. There is nothing here to be fixed.
On Thu, Mar 04, 2021 at 06:24:28PM +0000, openldap-its@openldap.org wrote:
> --- Comment #2 from hsuenju_ko@stratus.com <hsuenju_ko@stratus.com> ---
> This used to work before #6672. The code used to unlock the ld_conn_mutex
> before the select call.
>
> what if one thread is doing ldap_result with indefinite wait while other thread
> is doing something, not necessary cancel, which also requires holding the
> ld_conn_mutex lock? Are you saying no other thread is allowed to do anything
> requiring the same ld_conn_mutex?

You can't use the connection while another thread is waiting there (and holding the mutex); this is not how libldap works. What you can do is retrieve the fd with LDAP_OPT_DESC, wait until there's activity, and then call ldap_result with a timeout of 0 to see whether you got what you were interested in. That way you can send more requests while waiting, from any thread you want, as long as none of them is stuck waiting on the network.

> If I can not use the same connection, how do I do multiple connections? Can I
> cancel operations from different connection?

Cancel and unbind operations can only stop the processing of a request sent over the same connection.
Thanks for the explanation. What you are saying is that operations over the same connection need to be serialized across threads unless they follow the approach you suggested. Is that correct? So if the application needs to perform different operations over the same connection from several threads, it needs to do the following:

1. issue the operation asynchronously
2. get the fd
3. select()/poll() on the fd
4. call ldap_result with a 0 timeout

And since every operation goes through ldap_send_initial_request, even the timeout specified for the operation itself has to be short enough to avoid the same locking situation. For the most part we can use different connections to perform various operations across threads, except that a cancel has to be sent over the same connection. Is that assumption correct? Thanks!
It seems cancel is not very useful if a thread cannot cancel its own operation and another thread cannot send a cancel over the same connection until the thread performing the operation to be cancelled times out. That timeout can happen during the operation itself, during ldap_result, or while polling before calling ldap_result. Once you have timed out, there is no need to cancel, is there?
On Mon, Mar 08, 2021 at 12:30:55PM +0000, openldap-its@openldap.org wrote:
> --- Comment #6 from hsuenju_ko@stratus.com <hsuenju_ko@stratus.com> ---
> It seems cancel is not very useful if one cannot cancel itself and other thread
> can not cancel over same connection until the thread which performs the
> cancelled operation timeout either during the operation itself or during
> ldap_result, or doing the polling while waiting for ldap_result. Once you have
> timed out, there is no need to cancel, isn't it?

This bug tracker is not for usage questions; these should be posted to the appropriate mailing list, usually openldap-technical.

None of what you're asking for would have worked even if the thread calling ldap_result were to release its locks. You can still send a cancel exop, and you will get a response to both operations. If you require that the other thread be notified immediately, you need to do that in your own application; libldap has never had such ambitions.
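The reply above says that immediate notification of the waiting thread has to be done by the application. One common way is a self-pipe: the waiting thread polls both the LDAP descriptor (from LDAP_OPT_DESC) and the read end of a pipe the application owns. The thread that sends the cancel (for example with ldap_cancel()) then writes one byte to that pipe. This is a sketch of that idea, assuming POSIX poll(). The names wait_ldap_or_wakeup, notify_waiter and the WAIT_* codes are hypothetical and not part of libldap.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical outcome codes for the wait below. */
enum wait_outcome {
    WAIT_ERROR = -1,
    WAIT_TIMEOUT = 0,
    WAIT_LDAP = 1,   /* LDAP socket readable: call ldap_result() with a 0 timeout */
    WAIT_WAKEUP = 2  /* another thread asked this one to stop waiting */
};

/* Wait on the LDAP descriptor and on the read end of an application-owned
 * wake-up pipe. A pending wake-up byte is consumed. */
static enum wait_outcome wait_ldap_or_wakeup(int ldap_fd, int wake_rd, int timeout_ms)
{
    struct pollfd pfds[2] = {
        { .fd = ldap_fd, .events = POLLIN, .revents = 0 },
        { .fd = wake_rd, .events = POLLIN, .revents = 0 },
    };
    int rc;

    do {
        rc = poll(pfds, 2, timeout_ms); /* EINTR restarts with the full timeout */
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return WAIT_ERROR;
    if (rc == 0)
        return WAIT_TIMEOUT;

    if (pfds[1].revents & POLLIN) {
        char b;
        if (read(wake_rd, &b, 1) != 1)
            return WAIT_ERROR;
        return WAIT_WAKEUP;  /* any LDAP data is still there for the next wait */
    }
    if (pfds[0].revents & (POLLIN | POLLHUP | POLLERR))
        return WAIT_LDAP;
    return WAIT_ERROR;       /* e.g. POLLNVAL or a hung-up wake-up pipe */
}

/* Called by the thread that sent the cancel (for example via ldap_cancel()). */
static int notify_waiter(int wake_wr)
{
    char b = 'c';
    return write(wake_wr, &b, 1) == 1 ? 0 : -1;
}
```

The wake-up is checked before the LDAP descriptor, so a notification is never delayed behind incoming data. Responses to both the cancelled operation and the cancel exop remain readable on the connection and can be collected afterwards with ldap_result and a zero timeout.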
Sorry for not using the right forum. I will use a more appropriate mailing list in the future.