OpenLDAP

Viewing Incoming/6798

From: sgallagh@redhat.com
Subject: Mutex starvation on two-level referral for SASL connection

Date: Wed, 19 Jan 2011 12:40:47 +0000
From: sgallagh@redhat.com
To: openldap-its@OpenLDAP.org
Subject: Mutex starvation on two-level referral for SASL connection
Full_Name: Stephen Gallagher
Version: 2.4.21-0ubuntu5.2
OS: Ubuntu 10.04
URL: 
Submission from: (NULL) (98.110.239.235)


This was discovered by a user of SSSD. When referrals are enabled with
LDAP_OPT_REFERRALS on, SSSD has a rebind procedure set up to handle
authenticating to the new server. This seems to work fine when we're dealing
with a simple bind, but when we attempt to use SASL bind (for Kerberos-based
GSSAPI authentication), we discovered a problem.

Our rebind procedure calls ldap_sasl_interactive_bind_s() with LDAP_SASL_QUIET.

This arrangement works fine for a single referral. However, if the server has
nested referrals (say, entry1 refers to entry2, which refers to entry3 on another
server), then we hit a deadlock condition.

Attaching gdb, we see the backtrace included at the bottom of this message. What
appears to be happening is that for the first ldap_sasl_interactive_bind_s(),
openldap is locking a mutex, and when it is called a second time it's attempting
to lock that same mutex that has not yet been released.


Backtrace:
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
No locals.
#1  0x00007f48285975d9 in _L_lock_953 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x00007f48285973fb in __pthread_mutex_lock (mutex=0x7f4829c05980) at pthread_mutex_lock.c:61
        ignore1 = <value optimized out>
        ignore2 = 700471680
        ignore3 = -512
        __PRETTY_FUNCTION__ = "__pthread_mutex_lock"
        type = <value optimized out>
#3  0x00007f48299d3e4e in ldap_sasl_interactive_bind_s () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#4  0x00007f4825ae00fd in sdap_rebind_proc (ldap=0x2241b80,
    url=0x2250220 "ldap://DomainDnsZones.org.example.com/DC=DomainDnsZones,DC=org,DC=example,DC=com",
    request=<value optimized out>, msgid=<value optimized out>, params=<value optimized out>)
    at src/providers/ldap/sdap_async_connection.c:1624
        p = <value optimized out>
        sasl_mech = <value optimized out>
        user_dn = <value optimized out>
        password = {bv_len = 0, bv_val = 0x0}
        ctrls = {0x0, 0x0}
        tmp_ctx = <value optimized out>
        ret = <value optimized out>
        __FUNCTION__ = "sdap_rebind_proc"
#5  0x00007f48299df6d1 in ldap_new_connection () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#6  0x00007f48299e0523 in ldap_send_server_request () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#7  0x00007f48299e11cd in ldap_chase_v3referrals () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#8  0x00007f48299cbf95 in ?? () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#9  0x00007f48299ccc2d in ldap_result () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#10 0x00007f48299d4788 in ldap_sasl_bind_s () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#11 0x00007f48299d1751 in ldap_int_sasl_bind () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#12 0x00007f48299d3ea8 in ldap_sasl_interactive_bind_s () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#13 0x00007f4825ae00fd in sdap_rebind_proc (ldap=0x2241b80,
    url=0x225bc20 "ldap://ForestDnsZones.org.example.com/DC=ForestDnsZones,DC=org,DC=example,DC=com",
    request=<value optimized out>, msgid=<value optimized out>, params=<value optimized out>)
    at src/providers/ldap/sdap_async_connection.c:1624
        p = <value optimized out>
        sasl_mech = <value optimized out>
        user_dn = <value optimized out>
        password = {bv_len = 0, bv_val = 0x0}
        ctrls = {0x0, 0x0}
        tmp_ctx = <value optimized out>
        ret = <value optimized out>
        __FUNCTION__ = "sdap_rebind_proc"
#14 0x00007f48299df6d1 in ldap_new_connection () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#15 0x00007f48299e0523 in ldap_send_server_request () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#16 0x00007f48299e11cd in ldap_chase_v3referrals () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#17 0x00007f48299cbf95 in ?? () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#18 0x00007f48299ccc2d in ldap_result () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#19 0x00007f4825acfcb4 in sdap_process_result (ev=0x221b1f0, pvt=<value optimized out>)
    at src/providers/ldap/sdap_async.c:178
        sh = 0x2241650
        no_timeout = {tv_sec = 0, tv_usec = 0}
        te = <value optimized out>
        msg = <value optimized out>
        ret = <value optimized out>
        __FUNCTION__ = "sdap_process_result"
#20 0x00007f482b4f1825 in ?? () from /usr/lib/libtevent.so.0
No symbol table info availabl

Message of length 6949 truncated
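
The re-entry visible in frames #12 and #3 of the backtrace above (the same thread entering ldap_sasl_interactive_bind_s() twice) can be reproduced in miniature without libldap. The sketch below is an illustration only: `demo_sasl_mutex`, `demo_bind()` and `demo_rebind()` are hypothetical stand-ins, not OpenLDAP code, and the mutex is deliberately created as a POSIX error-checking mutex so that the second lock attempt returns EDEADLK rather than blocking forever as in the report.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a library-wide, non-recursive SASL mutex. */
static pthread_mutex_t demo_sasl_mutex;

/* Create the mutex as error-checking so that a re-lock by the owning
 * thread is reported (EDEADLK) instead of hanging the process. */
int demo_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&demo_sasl_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

int demo_rebind(int further_referrals);

/* Hypothetical bind: holds the mutex for the whole operation. While it
 * waits for its result, a further referral re-enters the rebind callback. */
int demo_bind(int further_referrals)
{
    int rc = pthread_mutex_lock(&demo_sasl_mutex);
    if (rc != 0)
        return rc; /* EDEADLK: this thread already holds the mutex */
    if (further_referrals > 0)
        rc = demo_rebind(further_referrals - 1);
    pthread_mutex_unlock(&demo_sasl_mutex);
    return rc;
}

/* Hypothetical rebind callback: authenticates to the referred-to server. */
int demo_rebind(int further_referrals)
{
    return demo_bind(further_referrals);
}
```

After `demo_init()`, `demo_rebind(0)` models a single referral and returns 0, while `demo_rebind(1)` models a referral that leads to a second referral and returns EDEADLK, which matches the single-versus-nested behaviour described in the report.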

Followup 1

Date: Wed, 19 Jan 2011 17:39:54 +0100 (CET)
Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL 
     connection
From: masarati@aero.polimi.it
To: sgallagh@redhat.com
Cc: openldap-its@openldap.org
> Full_Name: Stephen Gallagher
> Version: 2.4.21-0ubuntu5.2

I think this issue report is obsoleted by ITS#6510 (released with OL
2.4.22), which prevents binds from returning referrals.  Please upgrade
and re-check.

p.



Followup 2

Date: Wed, 19 Jan 2011 16:40:38 -0500
From: Stephen Gallagher <sgallagh@redhat.com>
To: masarati@aero.polimi.it
CC: openldap-its@openldap.org, Timo Aaltonen <timo.aaltonen@aalto.fi>
Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL  
    connection
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/19/2011 11:39 AM, masarati@aero.polimi.it wrote:
>> Full_Name: Stephen Gallagher
>> Version: 2.4.21-0ubuntu5.2
> 
> I think this issue report is obsoleted by ITS#6510 (released with OL
> 2.4.22), which prevents binds from returning referrals.  Please upgrade
> and re-check.

Upgraded to 2.4.23 and the same behavior occurs. I note in issue 6510
that it was suggested that nested mutexes could be used to resolve this.
Perhaps we should revisit that.

As you can see from our backtrace, this is actually occurring during the
processing of ldap_result(). We're handling a rebind as ldap_result
travels through a series of referrals.


- -- 
Stephen Gallagher
RHCE 804006346421761

Delivering value year after year.
Red Hat ranks #1 in value among software vendors.
http://www.redhat.com/promo/vendor/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk03WlYACgkQeiVVYja6o6PiXACgmykXZhhKZkixweJTb/qIVPAc
DMAAn2GHK7wu3nC5mkcYq7jI2f13Ql+N
=rj7f
-----END PGP SIGNATURE-----



Followup 3

Date: Thu, 20 Jan 2011 02:23:59 -0800
From: Howard Chu <hyc@symas.com>
To: sgallagh@redhat.com
CC: openldap-its@openldap.org
Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
sgallagh@redhat.com wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 01/19/2011 11:39 AM, masarati@aero.polimi.it wrote:
>>> Full_Name: Stephen Gallagher
>>> Version: 2.4.21-0ubuntu5.2
>>
>> I think this issue report is obsoleted by ITS#6510 (released with OL
>> 2.4.22), which prevents binds from returning referrals.  Please upgrade
>> and re-check.
>
> Upgraded to 2.4.23 and the same behavior occurs. I note in issue 6510
> that it was suggested that nested mutexes could be used to resolve this.
> Perhaps we should revisit that.

No. Nested mutexes are non-portable, and as already discussed in #6510, it is 
incorrect to process referrals returned in response to Bind requests.

> As you can see from our backtrace, this is actually occurring during the
> processing of ldap_result(). We're handling a rebind as ldap_result
> travels through a series of referrals.

Most likely your server is not using an actual Bind Response tag in its 
response message. It would be good if you could run this using a debug build 
(no optimization, full debug symbols present) and examine the tag that was 
parsed from the referral result.

Looks like libldap needs to be changed to actually record the tag of the 
outgoing requests. (It ought to do so anyway, and probably should return a 
ProtocolError result if it receives a response message whose tag doesn't match 
its request type.)

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
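
The tag check proposed in the followup above can be sketched as a small lookup from request tag to acceptable response tags. This is an assumption-laden illustration, not libldap code: the `DEMO_*` names and `demo_check_response_tag()` are hypothetical, the tag values are the BER identifier octets for the RFC 4511 protocolOp choices, and intermediate responses and unsolicited notifications are deliberately left out.

```c
/* BER identifier octets for RFC 4511 protocolOp choices (hypothetical names). */
enum demo_ldap_tag {
    DEMO_REQ_BIND        = 0x60, DEMO_RES_BIND        = 0x61,
    DEMO_REQ_SEARCH      = 0x63, DEMO_RES_SEARCH_ENTRY = 0x64,
    DEMO_RES_SEARCH_DONE = 0x65, DEMO_RES_SEARCH_REF  = 0x73,
    DEMO_REQ_MODIFY      = 0x66, DEMO_RES_MODIFY      = 0x67,
    DEMO_REQ_ADD         = 0x68, DEMO_RES_ADD         = 0x69,
    DEMO_REQ_DELETE      = 0x4a, DEMO_RES_DELETE      = 0x6b,
    DEMO_REQ_MODDN       = 0x6c, DEMO_RES_MODDN       = 0x6d,
    DEMO_REQ_COMPARE     = 0x6e, DEMO_RES_COMPARE     = 0x6f,
    DEMO_REQ_EXTENDED    = 0x77, DEMO_RES_EXTENDED    = 0x78
};

#define DEMO_SUCCESS        0
#define DEMO_PROTOCOL_ERROR 2 /* resultCode protocolError (2) in RFC 4511 */

/* Return DEMO_SUCCESS if response_tag is a valid reply to request_tag,
 * otherwise DEMO_PROTOCOL_ERROR. Unknown request tags are rejected too. */
int demo_check_response_tag(unsigned request_tag, unsigned response_tag)
{
    unsigned expected;

    switch (request_tag) {
    case DEMO_REQ_SEARCH:
        /* A search may yield entries, references, and a final result. */
        if (response_tag == DEMO_RES_SEARCH_ENTRY ||
            response_tag == DEMO_RES_SEARCH_REF ||
            response_tag == DEMO_RES_SEARCH_DONE)
            return DEMO_SUCCESS;
        return DEMO_PROTOCOL_ERROR;
    case DEMO_REQ_BIND:     expected = DEMO_RES_BIND;     break;
    case DEMO_REQ_MODIFY:   expected = DEMO_RES_MODIFY;   break;
    case DEMO_REQ_ADD:      expected = DEMO_RES_ADD;      break;
    case DEMO_REQ_DELETE:   expected = DEMO_RES_DELETE;   break;
    case DEMO_REQ_MODDN:    expected = DEMO_RES_MODDN;    break;
    case DEMO_REQ_COMPARE:  expected = DEMO_RES_COMPARE;  break;
    case DEMO_REQ_EXTENDED: expected = DEMO_RES_EXTENDED; break;
    default:
        return DEMO_PROTOCOL_ERROR;
    }
    return response_tag == expected ? DEMO_SUCCESS : DEMO_PROTOCOL_ERROR;
}
```

With such a check, a search reference arriving in answer to a bind request, `demo_check_response_tag(DEMO_REQ_BIND, DEMO_RES_SEARCH_REF)`, would be rejected instead of being chased as a referral.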



Followup 4

From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
Date: Thu, 20 Jan 2011 12:19:44 +0100
To: hyc@symas.com
Cc: openldap-its@openldap.org
Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL
	connection
hyc@symas.com writes:
> Looks like libldap needs to be changed to actually record the tag of the
> outgoing requests. (It ought to do so anyway, and probably should return a
> ProtocolError result if it receives a response message whose tag doesn't match
> its request type.)

Yes.  OpenLDAP is too trusting that incoming PDUs are valid.

...as an option, if this would be noticeable for a lot of users.  Users
who Just Want Their Programs To Work without OpenLDAP breaking them, and
won't be interested in "finger-pointing" against someone else's servers.

-- 
Hallvard



Followup 5

Date: Thu, 20 Jan 2011 17:38:10 +0200
From: Timo Aaltonen <timo.aaltonen@aalto.fi>
To: <openldap-its@openldap.org>
CC: Stephen Gallagher <sgallagh@redhat.com>
Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL    
  connection
 	Hi

   Here's some information that Stephen asked would be of use. There is 
one forest, one domain, but three sites in the layout. The functional 
level of the forest and the domain is W2008, but the servers have 2008R2.

And the full backtrace of the hung process:

#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
No locals.
#1  0x00007f8f63ec05d9 in _L_lock_953 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x00007f8f63ec03fb in __pthread_mutex_lock (mutex=0x7f8f6553fc80)
     at pthread_mutex_lock.c:61
         ignore1 = <value optimized out>
         ignore2 = 1700002944
         ignore3 = -512
         __PRETTY_FUNCTION__ = "__pthread_mutex_lock"
         type = <value optimized out>
#3  0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock (mutex=0x7f8f6553fc80)
     at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296
No locals.
#4  0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20, dn=0x0,
     mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, flags=2,
     interact=0x7f8f61405120 <sdap_sasl_interact>, defaults=0x2124a50) at sasl.c:426
         rc = -1921681294
         smechs = 0x0
#5  0x00007f8f6140888d in sdap_rebind_proc (ldap=0x2117c20,
     url=0x2125a00 "ldap://DomainDnsZones.domain.com/DC=DomainDnsZones,DC=domain,DC=com",
     request=<value optimized out>, msgid=<value optimized out>, params=<value optimized out>)
     at src/providers/ldap/sdap_async_connection.c:1637
         p = <value optimized out>
         sasl_mech = <value optimized out>
         user_dn = <value optimized out>
         password = {bv_len = 0, bv_val = 0x0}
         ctrls = {0x0, 0x0}
         tmp_ctx = <value optimized out>
         ret = <value optimized out>
         __FUNCTION__ = "sdap_rebind_proc"
#6  0x00007f8f65310a46 in ldap_new_connection (ld=0x2117c20, srvlist=0x7fffdebf22f8,
     use_ldsb=0, connect=1, bind=0x7fffdebf22b0) at request.c:518
         srvfunc = 0x2122da0
         err = 0
         savedefconn = 0x21208d0
         lc = 0x21227c0
         async = 0
         __PRETTY_FUNCTION__ = "ldap_new_connection"
#7  0x00007f8f6530fdf0 in ldap_send_server_request (ld=0x2117c20, ber=0x2122760, msgid=8,
     parentreq=0x21304f0, srvlist=0x7fffdebf22f8, lc=0x0, bind=0x7fffdebf22b0) at request.c:211
         lr = 0x63
         incparent = 1
         rc = 1697594315
#8  0x00007f8f653125dc in ldap_chase_v3referrals (ld=0x2117c20, lr=0x21304f0, refs=0x0,
     sref=1, errstrp=0x2130520, hadrefp=0x7fffdebf24f8) at request.c:1211
         unfollowed = 0x0
         unfollowedcnt = 0
         origreq = 0x21304f0
         srv = 0x2125a60
         ber = 0x2122760
         refarray = 0x2122720
         lc = 0x0
         rc = 0
         count = 0
         i = 0
         j = 0
         id = 8
         rinfo = {ri_msgid = 5, ri_request = 99,
           ri_url = 0x2125a00 "ldap://DomainDnsZones.domain.com/DC=DomainDnsZones,DC=domain,DC=com"}
#9  0x00007f8f652f6165 in try_read1msg (ld=0x2117c20, msgid=7, all=1, lc=0x2117760,
     result=0x7fffdebf26b0) at result.c:708
         refs = 0x2122720
         ber = 0x212e910
         newmsg = 0x2117c20
         l = 0x2124140
         prev = 0x7debf25e0
         id = 5
         idx = 0
         tag = 115
         len = 76
         foundit = 0
         lr = 0x21304f0
         tmplr = 0x7f8f6531039d
         dummy_lr = {lr_msgid = 0, lr_status = 0, lr_refcnt = 0, lr_outrefcnt = 0,
           lr_abandoned = 0, lr_origid = 0, lr_parentcnt = 0, lr_res_msgtype = 0,
           lr_res_errno = 0, lr_res_error = 0x0, lr_res_matched = 0x0, lr_ber = 0x0,
           lr_conn = 0x0, lr_dn = {bv_len = 0, bv_val = 0x0}, lr_parent = 0x0, lr_child = 0x0,
           lr_refnext = 0x0, lr_prev = 0x0, lr_next = 0x0}
         tmpber = {ber_opts = {lbo_valid = 2, lbo_options = 1, lbo_debug = 0}, ber_tag = 0,
           ber_len = 85, ber_usertag = 0, ber_buf = 0x2131cd0 "\002\001\005s\204",
           ber_ptr = 0x2131d25 "", ber_end = 0x2131d25 "", ber_sos_ptr = 0x0, ber_rwptr = 0x0,
           ber_memctx = 0x0}
         rc = -2
         refer_cnt = 0
         hadref = 0
         simple_request = 0
         err = 1
         lderr = 0
         tmp = 0x0
         chain_head = 0x0
         moremsgs = 0
         isv2 = 0
         __PRETTY_FUNCTION__ = "try_read1msg"
#10 0x00007f8f652f571b in wait4msg (ld=0x2117c20, msgid=7, all=1, timeout=0x7fffdebf25b0,
     result=0x7fffdebf26b0) at result.c:390
         lnext = 0x2117760
         lc_ready = 1
         rc = -2
         tv = {tv_sec = 6, tv_usec = 0}
         tv0 = {tv_sec = 6, tv_usec = 0}
         start_time_tv = {tv_sec = 1295533580, tv_usec = 63738}
         tvp = 0x7fffdebf25b0
         lc = 0x2117760
         __PRETTY_FUNCTION__ = "wait4msg"
#11 0x00007f8f652f4ddf in ldap_result (ld=0x2117c20, msgid=7, all=1, timeout=0x0,
     result=0x7fffdebf26b0) at result.c:120
         rc = 

Message of length 12605 truncated
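
The locals in the backtrace above include `tag = 115` (frame #9) and `ri_request = 99` (frame #8). Assuming both are raw single-octet BER identifiers, they can be split into class, constructed bit and tag number as in the sketch below. The struct and function names are hypothetical, and multi-octet tags (number 31 and above) are not handled.

```c
/* Decoded form of a single-octet BER identifier (hypothetical type). */
struct demo_ber_tag {
    unsigned tag_class;   /* 0 universal, 1 application, 2 context, 3 private */
    unsigned constructed; /* 1 if the encoding is constructed, 0 if primitive */
    unsigned number;      /* tag number, valid for values 0..30 */
};

/* Split one identifier octet into its three bit fields. */
struct demo_ber_tag demo_decode_tag(unsigned char octet)
{
    struct demo_ber_tag t;
    t.tag_class   = (octet >> 6) & 0x3u;
    t.constructed = (octet >> 5) & 0x1u;
    t.number      = octet & 0x1fu;
    return t;
}
```

Under that assumption, 115 (0x73) decodes to application class, constructed, number 19, which RFC 4511 assigns to SearchResultReference, and 99 (0x63) decodes to application class, constructed, number 3, which is SearchRequest.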


Followup 6

Date: Thu, 20 Jan 2011 12:16:27 -0800
From: Howard Chu <hyc@symas.com>
To: timo.aaltonen@aalto.fi
CC: openldap-its@openldap.org
Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
timo.aaltonen@aalto.fi wrote:
>   	Hi
>
>     Here's some information that Stephen asked would be of use. There is
> one forest, one domain, but three sites in the layout. The functional
> level of the forest and the domain is W2008, but the servers have 2008R2.
>
> And the full backtrace of the hung process:

Thanks, but this trace is from 2.4.21, which is obsolete. Please use the 
current release (2.4.23) since the relevant code has changed. There is no 
point in us spending time tracking down issues in non-existent code.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/



Followup 7

Date: Thu, 20 Jan 2011 12:25:05 -0800
From: Howard Chu <hyc@symas.com>
To: openldap-its@openldap.org
Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
hyc@symas.com wrote:
> timo.aaltonen@aalto.fi wrote:
>>    	Hi
>>
>>      Here's some information that Stephen asked would be of use. There is
>> one forest, one domain, but three sites in the layout. The functional
>> level of the forest and the domain is W2008, but the servers have 2008R2.
>>
>> And the full backtrace of the hung process:
>
> Thanks, but this trace is from 2.4.21, which is obsolete. Please use the
> current release (2.4.23) since the relevant code has changed. There is no
> point in us spending time tracking down issues in non-existent code.

And, one more time: please use a debug build, like I asked in my first reply.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/



Followup 8

Date: Thu, 20 Jan 2011 22:44:01 +0200
From: Timo Aaltonen <timo.aaltonen@aalto.fi>
To: Howard Chu <hyc@symas.com>
CC: <openldap-its@openldap.org>
Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL
 connection
On Thu, 20 Jan 2011, Howard Chu wrote:

> timo.aaltonen@aalto.fi wrote:
>>   	Hi
>>
>>     Here's some information that Stephen asked would be of use. There is
>> one forest, one domain, but three sites in the layout. The functional
>> level of the forest and the domain is W2008, but the servers have 2008R2.
>> 
>> And the full backtrace of the hung process:
>
> Thanks, but this trace is from 2.4.21, which is obsolete. Please use the 
> current release (2.4.23) since the relevant code has changed. There is no 
> point in us spending time tracking down issues in non-existent code.

Not true, I backported 2.4.23 from natty, and made it build in lucid 
pbuilder.

nexus6 sssd # apt-cache policy libldap-2.4-2
libldap-2.4-2:
   Installed: 2.4.23-6ubuntu4aalto1
   Candidate: 2.4.23-6ubuntu4aalto1
   Version table:
  *** 2.4.23-6ubuntu4aalto1 0
         100 /var/lib/dpkg/status
      2.4.21-0ubuntu5.3 0
         500 http://ubuntu.hut.fi/ubuntu/ lucid-updates/main Packages
      2.4.21-0ubuntu5.2 0
         500 http://ubuntu.hut.fi/ubuntu/ lucid-security/main Packages
      2.4.21-0ubuntu5 0
         500 http://ubuntu.hut.fi/ubuntu/ lucid/main Packages

I believe there is no reason to rebuild sssd against that, since they are 
ABI compatible?

-- 
Timo Aaltonen
Systems Specialist, Aalto IT



Followup 9

Date: Thu, 20 Jan 2011 13:17:56 -0800
From: Howard Chu <hyc@symas.com>
To: timo.aaltonen@aalto.fi
CC: openldap-its@openldap.org
Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
timo.aaltonen@aalto.fi wrote:
>   	Hi
>
>     Here's some information that Stephen asked would be of use. There is
> one forest, one domain, but three sites in the layout. The functional
> level of the forest and the domain is W2008, but the servers have 2008R2.
>
> And the full backtrace of the hung process:

> #3  0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock (mutex=0x7f8f6553fc80)
>       at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296
> No locals.
> #4  0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20, dn=0x0,
>       mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, flags=2,
>       interact=0x7f8f61405120 <sdap_sasl_interact>, defaults=0x2124a50) at sasl.c:426
>           rc = -1921681294
>           smechs = 0x0

This particular mutex seems kind of bogus to me; the code is from rev 1.31 in 
June 2001. Perhaps back then it was unsafe to have multiple SASL operations 
outstanding at once; I would expect that was only an issue in the Cyrus 1.5 
days and it should be safe now with Cyrus 2.x. We should probably just delete 
this mutex.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/



Followup 10

Date: Thu, 20 Jan 2011 13:53:13 -0800
From: Howard Chu <hyc@symas.com>
To: openldap-its@openldap.org
Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
hyc@symas.com wrote:
> timo.aaltonen@aalto.fi wrote:
>>    	Hi
>>
>>      Here's some information that Stephen asked would be of use. There is
>> one forest, one domain, but three sites in the layout. The functional
>> level of the forest and the domain is W2008, but the servers have 2008R2.
>>
>> And the full backtrace of the hung process:
>
>> #3  0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock (mutex=0x7f8f6553fc80)
>>        at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296
>> No locals.
>> #4  0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20, dn=0x0,
>>        mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, flags=2,
>>        interact=0x7f8f61405120 <sdap_sasl_interact>, defaults=0x2124a50) at sasl.c:426
>>            rc = -1921681294
>>            smechs = 0x0
>
> This particular mutex seems kind of bogus to me; the code is from rev 1.31 in
> June 2001. Perhaps back then it was unsafe to have multiple SASL operations
> outstanding at once; I would expect that was only an issue in the Cyrus 1.5
> days and it should be safe now with Cyrus 2.x. We should probably just delete
> this mutex.
>
Although googling for "Cyrus sasl reentrancy" does not leave me with 
warm/fuzzy feelings.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/



Followup 11

Date: Fri, 21 Jan 2011 00:05:13 +0200
From: Timo Aaltonen <timo.aaltonen@aalto.fi>
To: Howard Chu <hyc@symas.com>
CC: <openldap-its@openldap.org>
Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL
 connection
On Thu, 20 Jan 2011, Howard Chu wrote:

> timo.aaltonen@aalto.fi wrote:
>>   	Hi
>>
>>     Here's some information that Stephen asked would be of use. There is
>> one forest, one domain, but three sites in the layout. The functional
>> level of the forest and the domain is W2008, but the servers have 2008R2.
>> 
>> And the full backtrace of the hung process:
>
>> #3  0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock (mutex=0x7f8f6553fc80)
>>       at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296
>> No locals.
>> #4  0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20, dn=0x0,
>>       mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, flags=2,
>>       interact=0x7f8f61405120 <sdap_sasl_interact>, defaults=0x2124a50) at sasl.c:426
>>           rc = -1921681294
>>           smechs = 0x0
>
> This particular mutex seems kind of bogus to me; the code is from rev 1.31 in
> June 2001. Perhaps back then it was unsafe to have multiple SASL operations
> outstanding at once; I would expect that was only an issue in the Cyrus 1.5
> days and it should be safe now with Cyrus 2.x. We should probably just delete
> this mutex.

Ok, so by doing this:

--- openldap-2.4.23.orig/libraries/libldap/sasl.c
+++ openldap-2.4.23/libraries/libldap/sasl.c
@@ -421,10 +421,11 @@
  {
         int rc;
         char *smechs = NULL;
-
+/*
  #if defined( LDAP_R_COMPILE ) && defined( HAVE_CYRUS_SASL )
         ldap_pvt_thread_mutex_lock( &ldap_int_sasl_mutex );
  #endif
+*/
  #ifdef LDAP_CONNECTIONLESS
         if( LDAP_IS_UDP(ld) ) {
                 /* Just force it to simple bind, silly to make the user

--

.. the process doesn't hang anymore. But it still doesn't do what it's 
supposed to, but that could be a bug in SSSD. I'll investigate further.

Thanks!

-- 
Timo Aaltonen
Systems Specialist, Aalto IT



Followup 12

Date: Tue, 01 Feb 2011 13:31:58 -0800
From: Howard Chu <hyc@symas.com>
To: Timo Aaltonen <timo.aaltonen@aalto.fi>
CC: openldap-its@openldap.org
Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
Timo Aaltonen wrote:
> On Thu, 20 Jan 2011, Howard Chu wrote:
>
>> timo.aaltonen@aalto.fi wrote:
>>>    	Hi
>>>
>>>      Here's some information that Stephen asked would be of use. There is
>>> one forest, one domain, but three sites in the layout. The functional
>>> level of the forest and the domain is W2008, but the servers have 2008R2.
>>>
>>> And the full backtrace of the hung process:
>>
>>> #3  0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock (mutex=0x7f8f6553fc80)
>>>        at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296
>>> No locals.
>>> #4  0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20, dn=0x0,
>>>        mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, flags=2,
>>>        interact=0x7f8f61405120 <sdap_sasl_interact>, defaults=0x2124a50) at sasl.c:426
>>>            rc = -1921681294
>>>            smechs = 0x0
>>
>> This particular mutex seems kind of bogus to me; the code is from rev 1.31 in
>> June 2001. Perhaps back then it was unsafe to have multiple SASL operations
>> outstanding at once; I would expect that was only an issue in the Cyrus 1.5
>> days and it should be safe now with Cyrus 2.x. We should probably just delete
>> this mutex.
>
> Ok, so by doing this:
>
> --- openldap-2.4.23.orig/libraries/libldap/sasl.c
> +++ openldap-2.4.23/libraries/libldap/sasl.c
> @@ -421,10 +421,11 @@
>    {
>           int rc;
>           char *smechs = NULL;
> -
> +/*
>    #if defined( LDAP_R_COMPILE ) && defined( HAVE_CYRUS_SASL )
>           ldap_pvt_thread_mutex_lock( &ldap_int_sasl_mutex );
>    #endif
> +*/
>    #ifdef LDAP_CONNECTIONLESS
>           if( LDAP_IS_UDP(ld) ) {
>                   /* Just force it to simple bind, silly to make the user
>
> --
>
> .. the process doesn't hang anymore. But it still doesn't do what it's
> supposed to, but that could be a bug in SSSD. I'll investigate further.
>
> Thanks!
>
As I noted in a previous followup, it's not clear to me that the Cyrus SASL 
library is actually safe to use without that mutex. Also, going through your 
provided backtraces, I see the real issue is that two different requests were 
active at the same time. I.e., there was an active request that triggered a 
referral, and an unrelated request. You would also have avoided this issue if 
you waited for the request that triggered the referrals to complete before 
issuing any other requests.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
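
The workaround suggested in the followup above, waiting for the request that triggered the referrals to finish before issuing any other request, could be approximated on the application side with a simple gate. This is only a sketch for a single-threaded event loop: `struct demo_gate` and both functions are hypothetical and are not part of SSSD or libldap.

```c
#include <stdbool.h>

/* Hypothetical per-connection gate: at most one request in flight. */
struct demo_gate {
    bool in_flight;    /* true while a request (and its referrals) is pending */
    unsigned deferred; /* number of requests that were asked to wait */
};

/* Try to start a request. Returns true if the caller may send it now,
 * false if it must be deferred until the current request completes. */
bool demo_gate_try_begin(struct demo_gate *g)
{
    if (g->in_flight) {
        g->deferred++;
        return false;
    }
    g->in_flight = true;
    return true;
}

/* Mark the current request as complete. Returns how many deferred
 * requests the caller should now retry, and resets that counter. */
unsigned demo_gate_end(struct demo_gate *g)
{
    unsigned waiting = g->deferred;
    g->in_flight = false;
    g->deferred = 0;
    return waiting;
}
```

A caller would wrap each outgoing operation in `demo_gate_try_begin()` and call `demo_gate_end()` once the final result, including any chased referrals, has been received.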



The OpenLDAP Issue Tracking System uses a hacked version of JitterBug

______________
© Copyright 2013, OpenLDAP Foundation, info@OpenLDAP.org