
Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection



Timo Aaltonen wrote:
> On Thu, 20 Jan 2011, Howard Chu wrote:
>
>> timo.aaltonen@aalto.fi wrote:
>>> Hi
>>>
>>> Here's some information that Stephen asked would be of use. There is
>>> one forest, one domain, but three sites in the layout. The functional
>>> level of the forest and the domain is W2008, but the servers have 2008R2.
>>>
>>> And the full backtrace of the hung process:
>>
>>> #3  0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock
>>> (mutex=0x7f8f6553fc80)
>>>        at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296
>>> No locals.
>>> #4  0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20,
>>> dn=0x0,
>>>        mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0,
>>> flags=2,
>>>        interact=0x7f8f61405120 <sdap_sasl_interact>, defaults=0x2124a50) at
>>> sasl.c:426
>>>            rc = -1921681294
>>>            smechs = 0x0
>>
>> This particular mutex seems kind of bogus to me; the code is from rev 1.31 in
>> June 2001. Perhaps back then it was unsafe to have multiple SASL operations
>> outstanding at once; I would expect that was only an issue in the Cyrus 1.5
>> days and it should be safe now with Cyrus 2.x. We should probably just delete
>> this mutex.
>
> Ok, so by doing this:
>
> --- openldap-2.4.23.orig/libraries/libldap/sasl.c
> +++ openldap-2.4.23/libraries/libldap/sasl.c
> @@ -421,10 +421,11 @@
>    {
>           int rc;
>           char *smechs = NULL;
> -
> +/*
>    #if defined( LDAP_R_COMPILE ) && defined( HAVE_CYRUS_SASL )
>           ldap_pvt_thread_mutex_lock(&ldap_int_sasl_mutex );
>    #endif
> +*/
>    #ifdef LDAP_CONNECTIONLESS
>           if( LDAP_IS_UDP(ld) ) {
>                   /* Just force it to simple bind, silly to make the user
>
> --
>
> .. the process doesn't hang anymore. But it still doesn't do what it's
> supposed to, but that could be a bug in SSSD. I'll investigate further.
>
> Thanks!
>
As I noted in a previous followup, it's not clear to me that the Cyrus SASL
library is actually safe to use without that mutex. Also, looking through the
backtraces you provided, I see that the real issue is that two different
requests were active at the same time: one request had triggered a referral,
while an unrelated request was also outstanding. You would also have avoided
this issue if you had waited for the request that triggered the referrals to
complete before issuing any other requests.
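The hang can be modeled in a few lines of C. This is a hypothetical sketch, not libldap code: `bind_mutex` stands in for `ldap_int_sasl_mutex` and `model_sasl_bind()` stands in for `ldap_sasl_interactive_bind_s()`. It assumes, as one reading of the backtrace, that while a bind holds the lock it can process a referral for another outstanding request and start a nested rebind on the same thread. The sketch uses an error-checking mutex so the relock returns `EDEADLK`; with an ordinary non-recursive mutex the nested lock would simply block forever, which matches the thread stuck in `ldap_pvt_thread_mutex_lock`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a process-wide SASL lock such as
 * ldap_int_sasl_mutex. Error-checking type: a relock by the owning
 * thread returns EDEADLK instead of hanging. */
static pthread_mutex_t bind_mutex;

static void init_bind_mutex(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&bind_mutex, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);
    (void)rc; /* silence "unused" warnings when NDEBUG disables assert */
}

/* Hypothetical model of an interactive SASL bind. The lock is held for the
 * whole bind. If a response for another outstanding request turns out to be
 * a referral (depth < referral_depth), chasing it starts a nested rebind on
 * the same thread. Returns 0 on success, or the error from the failed lock. */
static int model_sasl_bind(int depth, int referral_depth)
{
    int rc = pthread_mutex_lock(&bind_mutex);
    if (rc != 0) {
        return rc; /* EDEADLK: this thread already owns bind_mutex */
    }
    if (depth < referral_depth) {
        /* referral chase triggers a rebind while the lock is still held */
        rc = model_sasl_bind(depth + 1, referral_depth);
    }
    pthread_mutex_unlock(&bind_mutex);
    return rc;
}
```

Passing `referral_depth = 0` corresponds to the workaround suggested above: with no other request outstanding, no nested bind is attempted and the lock is taken only once. Passing `referral_depth = 1` reproduces the self-deadlock.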

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/