Viewing Incoming/6798 Full headers
Date: Wed, 19 Jan 2011 12:40:47 +0000 From: sgallagh@redhat.com To: openldap-its@OpenLDAP.org Subject: Mutex starvation on two-level referral for SASL connection
Full_Name: Stephen Gallagher
Version: 2.4.21-0ubuntu5.2
OS: Ubuntu 10.04
URL:
Submission from: (NULL) (98.110.239.235)

This was discovered by a user of SSSD. When referrals are enabled with LDAP_OPT_REFERRALS on, SSSD has a rebind procedure set up to handle authenticating to the new server. This seems to work fine when we're dealing with a simple bind, but when we attempt to use SASL bind (for Kerberos-based GSSAPI authentication), we discovered a problem.

Our rebind procedure calls ldap_sasl_interactive_bind_s() with LDAP_SASL_QUIET. This arrangement works fine for a single referral, however if the server has nested referrals (say, entry1 refers to entry2 which refers to entry3 on another server) then we hit a deadlock condition.

Attaching gdb, we see the backtrace included at the bottom of this message. What appears to be happening is that for the first ldap_sasl_interactive_bind_s(), openldap is locking a mutex, and when it is called a second time it's attempting to lock that same mutex that has not yet been released.

Backtrace:
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
        No locals.
#1 0x00007f48285975d9 in _L_lock_953 () from /lib/libpthread.so.0
        No symbol table info available.
#2 0x00007f48285973fb in __pthread_mutex_lock (mutex=0x7f4829c05980) at pthread_mutex_lock.c:61
        ignore1 = <value optimized out>
        ignore2 = 700471680
        ignore3 = -512
        __PRETTY_FUNCTION__ = "__pthread_mutex_lock"
        type = <value optimized out>
#3 0x00007f48299d3e4e in ldap_sasl_interactive_bind_s () from /usr/lib/libldap_r-2.4.so.2
        No symbol table info available.
#4 0x00007f4825ae00fd in sdap_rebind_proc (ldap=0x2241b80, url=0x2250220 "ldap://DomainDnsZones.org.example.com/DC=DomainDnsZones,DC=org,DC=example,DC=com", request=<value optimized out>, msgid=<value optimized out>, params=<value optimized out>) at src/providers/ldap/sdap_async_connection.c:1624
        p = <value optimized out>
        sasl_mech = <value optimized out>
        user_dn = <value optimized out>
        password = {bv_len = 0, bv_val = 0x0}
        ctrls = {0x0, 0x0}
        tmp_ctx = <value optimized out>
        ret = <value optimized out>
        __FUNCTION__ = "sdap_rebind_proc"
#5 0x00007f48299df6d1 in ldap_new_connection () from /usr/lib/libldap_r-2.4.so.2
        No symbol table info available.
#6 0x00007f48299e0523 in ldap_send_server_request () from /usr/lib/libldap_r-2.4.so.2
        No symbol table info available.
#7 0x00007f48299e11cd in ldap_chase_v3referrals () from /usr/lib/libldap_r-2.4.so.2
        No symbol table info available.
#8 0x00007f48299cbf95 in ?? () from /usr/lib/libldap_r-2.4.so.2
        No symbol table info available.
#9 0x00007f48299ccc2d in ldap_result () from /usr/lib/libldap_r-2.4.so.2
        No symbol table info available.
#10 0x00007f48299d4788 in ldap_sasl_bind_s () from /usr/lib/libldap_r-2.4.so.2
        No symbol table info available.
#11 0x00007f48299d1751 in ldap_int_sasl_bind () from /usr/lib/libldap_r-2.4.so.2
        No symbol table info available.
#12 0x00007f48299d3ea8 in ldap_sasl_interactive_bind_s () from /usr/lib/libldap_r-2.4.so.2
        No symbol table info available.
#13 0x00007f4825ae00fd in sdap_rebind_proc (ldap=0x2241b80, url=0x225bc20 "ldap://ForestDnsZones.org.example.com/DC=ForestDnsZones,DC=org,DC=example,DC=com", request=<value optimized out>, msgid=<value optimized out>, params=<value optimized out>) at src/providers/ldap/sdap_async_connection.c:1624
        p = <value optimized out>
        sasl_mech = <value optimized out>
        user_dn = <value optimized out>
        password = {bv_len = 0, bv_val = 0x0}
        ctrls = {0x0, 0x0}
        tmp_ctx = <value optimized out>
        ret = <value optimized out>
        __FUNCTION__ = "sdap_rebind_proc"
#14 0x00007f48299df6d1 in ldap_new_connection () from /usr/lib/libldap_r-2.4.so.2
        No symbol table info available.
#15 0x00007f48299e0523 in ldap_send_server_request () from /usr/lib/libldap_r-2.4.so.2
        No symbol table info available.
#16 0x00007f48299e11cd in ldap_chase_v3referrals () from /usr/lib/libldap_r-2.4.so.2
        No symbol table info available.
#17 0x00007f48299cbf95 in ?? () from /usr/lib/libldap_r-2.4.so.2
        No symbol table info available.
#18 0x00007f48299ccc2d in ldap_result () from /usr/lib/libldap_r-2.4.so.2
        No symbol table info available.
#19 0x00007f4825acfcb4 in sdap_process_result (ev=0x221b1f0, pvt=<value optimized out>) at src/providers/ldap/sdap_async.c:178
        sh = 0x2241650
        no_timeout = {tv_sec = 0, tv_usec = 0}
        te = <value optimized out>
        msg = <value optimized out>
        ret = <value optimized out>
        __FUNCTION__ = "sdap_process_result"
#20 0x00007f482b4f1825 in ?? () from /usr/lib/libtevent.so.0
        No symbol table info availabl
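The re-entry described in this report can be shown in isolation. What follows is a minimal sketch in C and not libldap code: `demo_sasl_mutex`, `demo_lock_init` and `demo_bind` are hypothetical names. The lock is created as a POSIX error-checking mutex, which is an assumption made only so that the second lock attempt by the same thread returns EDEADLK instead of hanging the way frames #0 to #3 of the backtrace do.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a library-wide SASL lock (not libldap code). */
static pthread_mutex_t demo_sasl_mutex;

/* Create the lock as an error-checking mutex so that a relock by the
 * owning thread is reported as EDEADLK rather than blocking forever. */
static int demo_lock_init(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(&demo_sasl_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical bind: takes the lock and, while still holding it, "chases"
 * nested_referrals further referrals by calling itself, the way a rebind
 * callback would re-enter the bind routine. Returns 0 or an errno value. */
static int demo_bind(int nested_referrals)
{
    int rc = pthread_mutex_lock(&demo_sasl_mutex);
    if (rc != 0)
        return rc;               /* EDEADLK when re-entered by the owner */

    if (nested_referrals > 0)
        rc = demo_bind(nested_referrals - 1);

    int urc = pthread_mutex_unlock(&demo_sasl_mutex);
    assert(urc == 0);            /* the owner can always unlock */
    (void)urc;
    return rc;
}
```

After a successful `demo_lock_init()`, `demo_bind(0)` returns 0 while `demo_bind(1)` returns EDEADLK. With a default (non-error-checking) mutex the nested call would typically block instead, which matches the hang reported here.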
Date: Wed, 19 Jan 2011 17:39:54 +0100 (CET) Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection From: masarati@aero.polimi.it To: sgallagh@redhat.com Cc: openldap-its@openldap.org
> Full_Name: Stephen Gallagher > Version: 2.4.21-0ubuntu5.2 I think this issue report is obsoleted by ITS#6510 (released with OL 2.4.22), which prevents binds from returning referrals. Please upgrade and re-check. p.
Date: Wed, 19 Jan 2011 16:40:38 -0500 From: Stephen Gallagher <sgallagh@redhat.com> To: masarati@aero.polimi.it CC: openldap-its@openldap.org, Timo Aaltonen <timo.aaltonen@aalto.fi> Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/19/2011 11:39 AM, masarati@aero.polimi.it wrote: >> Full_Name: Stephen Gallagher >> Version: 2.4.21-0ubuntu5.2 > > I think this issue report is obsoleted by ITS#6510 (released with OL > 2.4.22), which prevents binds from returning referrals. Please upgrade > and re-check. Upgraded to 2.4.23 and the same behavior occurs. I note in issue 6510 that it was suggested that nested mutexes could be used to resolve this. Perhaps we should revisit that. As you can see from our backtrace, this is actually occurring during the processing of ldap_result(). We're handling a rebind as ldap_result travels through a series of referrals. - -- Stephen Gallagher RHCE 804006346421761 Delivering value year after year. Red Hat ranks #1 in value among software vendors. http://www.redhat.com/promo/vendor/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/ iEYEARECAAYFAk03WlYACgkQeiVVYja6o6PiXACgmykXZhhKZkixweJTb/qIVPAc DMAAn2GHK7wu3nC5mkcYq7jI2f13Ql+N =rj7f -----END PGP SIGNATURE-----
Date: Thu, 20 Jan 2011 02:23:59 -0800 From: Howard Chu <hyc@symas.com> To: sgallagh@redhat.com CC: openldap-its@openldap.org Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
sgallagh@redhat.com wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 01/19/2011 11:39 AM, masarati@aero.polimi.it wrote: >>> Full_Name: Stephen Gallagher >>> Version: 2.4.21-0ubuntu5.2 >> >> I think this issue report is obsoleted by ITS#6510 (released with OL >> 2.4.22), which prevents binds from returning referrals. Please upgrade >> and re-check. > > Upgraded to 2.4.23 and the same behavior occurs. I note in issue 6510 > that it was suggested that nested mutexes could be used to resolve this. > Perhaps we should revisit that. No. Nested mutexes are non-portable, and as already discussed in #6510, it is incorrect to process referrals returned in response to Bind requests. > As you can see from our backtrace, this is actually occurring during the > processing of ldap_result(). We're handling a rebind as ldap_result > travels through a series of referrals. Most likely your server is not using an actual Bind Response tag in its response message. It would be good if you could run this using a debug build (no optimization, full debug symbols present) and examine the tag that was parsed from the referral result. Looks like libldap needs to be changed to actually record the tag of the outgoing requests. (It ought to do so anyway, and probably should return a ProtocolError result if it receives a response message whose tag doesn't match its request type.) -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
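The tag check suggested above could look roughly like the following. This is a sketch under stated assumptions, not the libldap implementation: the `DEMO_*` constants and `demo_response_matches` are hypothetical names, the tag values are the BER-encoded protocolOp tags from RFC 4511, and only Bind, Search and Modify are covered. Extended and intermediate responses and unsolicited notifications are deliberately left out.

```c
#include <assert.h>

/* BER tags of selected LDAP protocolOp choices (RFC 4511).
 * The DEMO_ names are illustrative and not taken from ldap.h. */
enum {
    DEMO_REQ_BIND             = 0x60,
    DEMO_RES_BIND             = 0x61,
    DEMO_REQ_SEARCH           = 0x63,
    DEMO_RES_SEARCH_ENTRY     = 0x64,
    DEMO_RES_SEARCH_RESULT    = 0x65,
    DEMO_REQ_MODIFY           = 0x66,
    DEMO_RES_MODIFY           = 0x67,
    DEMO_RES_SEARCH_REFERENCE = 0x73
};

/* Return 1 if a response with tag res_tag is a plausible answer to a
 * request that was sent with tag req_tag, 0 otherwise. A caller could
 * map 0 to a protocol error result. */
static int demo_response_matches(unsigned long req_tag, unsigned long res_tag)
{
    switch (req_tag) {
    case DEMO_REQ_BIND:
        return res_tag == DEMO_RES_BIND;
    case DEMO_REQ_SEARCH:
        return res_tag == DEMO_RES_SEARCH_ENTRY
            || res_tag == DEMO_RES_SEARCH_REFERENCE
            || res_tag == DEMO_RES_SEARCH_RESULT;
    case DEMO_REQ_MODIFY:
        return res_tag == DEMO_RES_MODIFY;
    default:
        return 0;   /* request types not covered by this sketch */
    }
}

/* Small usage check for the function above. */
static void demo_self_check(void)
{
    assert(demo_response_matches(DEMO_REQ_BIND, DEMO_RES_BIND));
    assert(!demo_response_matches(DEMO_REQ_BIND, DEMO_RES_SEARCH_REFERENCE));
}
```

For comparison, the backtrace Timo Aaltonen posted in this thread shows `tag = 115` in try_read1msg and `ri_request = 99` in ldap_chase_v3referrals. If those are read as protocolOp tags, they are 0x73 (search result reference) and 0x63 (search request), a pair this check would accept.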
From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no> Date: Thu, 20 Jan 2011 12:19:44 +0100 To: hyc@symas.com Cc: openldap-its@openldap.org Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
hyc@symas.com writes: > Looks like libldap needs to be changed to actually record the tag of the > outgoing requests. (It ought to do so anyway, and probably should return a > ProtocolError result if it receives a response message whose tag doesn't match > its request type.) Yes. OpenLDAP is too trusting of incoming PDUs to be valid. ...as an option, if this would be noticeable for a lot of users. Users who Just Want Their Programs To Work without OpenLDAP breaking them, and won't be interested in "finger-pointing" against someone else's servers. -- Hallvard
Date: Thu, 20 Jan 2011 17:38:10 +0200 From: Timo Aaltonen <timo.aaltonen@aalto.fi> To: <openldap-its@openldap.org> CC: Stephen Gallagher <sgallagh@redhat.com> Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
Hi Here's some information that Stephen asked would be of use. There is one forest, one domain, but three sites in the layout. The functional level of the forest and the domain is W2008, but the servers have 2008R2. And the full backtrace of the hung process: #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136 No locals. #1 0x00007f8f63ec05d9 in _L_lock_953 () from /lib/libpthread.so.0 No symbol table info available. #2 0x00007f8f63ec03fb in __pthread_mutex_lock (mutex=0x7f8f6553fc80) at pthread_mutex_lock.c:61 ignore1 = <value optimized out> ignore2 = 1700002944 ignore3 = -512 __PRETTY_FUNCTION__ = "__pthread_mutex_lock" type = <value optimized out> #3 0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock (mutex=0x7f8f6553fc80) at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296 No locals. #4 0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20, dn=0x0, mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, flags=2, interact=0x7f8f61405120 <sdap_sasl_interact>, defaults=0x2124a50) at sasl.c:426 rc = -1921681294 smechs = 0x0 #5 0x00007f8f6140888d in sdap_rebind_proc (ldap=0x2117c20, url=0x2125a00 "ldap://DomainDnsZones.domain.com/DC=DomainDnsZones,DC=domain,DC=com", request=<value optimized out>, msgid=<value optimized out>, params=<value optimized out>) at src/providers/ldap/sdap_async_connection.c:1637 p = <value optimized out> sasl_mech = <value optimized out> user_dn = <value optimized out> password = {bv_len = 0, bv_val = 0x0} ctrls = {0x0, 0x0} tmp_ctx = <value optimized out> ret = <value optimized out> __FUNCTION__ = "sdap_rebind_proc" #6 0x00007f8f65310a46 in ldap_new_connection (ld=0x2117c20, srvlist=0x7fffdebf22f8, use_ldsb=0, connect=1, bind=0x7fffdebf22b0) at request.c:518 srvfunc = 0x2122da0 err = 0 savedefconn = 0x21208d0 lc = 0x21227c0 async = 0 __PRETTY_FUNCTION__ = "ldap_new_connection" #7 0x00007f8f6530fdf0 in ldap_send_server_request (ld=0x2117c20, ber=0x2122760, msgid=8, 
parentreq=0x21304f0, srvlist=0x7fffdebf22f8, lc=0x0, bind=0x7fffdebf22b0) at request.c:211 lr = 0x63 incparent = 1 rc = 1697594315 #8 0x00007f8f653125dc in ldap_chase_v3referrals (ld=0x2117c20, lr=0x21304f0, refs=0x0, sref=1, errstrp=0x2130520, hadrefp=0x7fffdebf24f8) at request.c:1211 unfollowed = 0x0 unfollowedcnt = 0 origreq = 0x21304f0 srv = 0x2125a60 ber = 0x2122760 refarray = 0x2122720 lc = 0x0 rc = 0 count = 0 i = 0 j = 0 id = 8 rinfo = {ri_msgid = 5, ri_request = 99, ri_url = 0x2125a00 "ldap://DomainDnsZones.domain.com/DC=DomainDnsZones,DC=domain,DC=com"} #9 0x00007f8f652f6165 in try_read1msg (ld=0x2117c20, msgid=7, all=1, lc=0x2117760, result=0x7fffdebf26b0) at result.c:708 refs = 0x2122720 ber = 0x212e910 newmsg = 0x2117c20 l = 0x2124140 prev = 0x7debf25e0 id = 5 idx = 0 tag = 115 len = 76 foundit = 0 lr = 0x21304f0 tmplr = 0x7f8f6531039d dummy_lr = {lr_msgid = 0, lr_status = 0, lr_refcnt = 0, lr_outrefcnt = 0, lr_abandoned = 0, lr_origid = 0, lr_parentcnt = 0, lr_res_msgtype = 0, lr_res_errno = 0, lr_res_error = 0x0, lr_res_matched = 0x0, lr_ber = 0x0, lr_conn = 0x0, lr_dn = {bv_len = 0, bv_val = 0x0}, lr_parent = 0x0, lr_child = 0x0, lr_refnext = 0x0, lr_prev = 0x0, lr_next = 0x0} tmpber = {ber_opts = {lbo_valid = 2, lbo_options = 1, lbo_debug = 0}, ber_tag = 0, ber_len = 85, ber_usertag = 0, ber_buf = 0x2131cd0 "\002\001\005s\204", ber_ptr = 0x2131d25 "", ber_end = 0x2131d25 "", ber_sos_ptr = 0x0, ber_rwptr = 0x0, ber_memctx = 0x0} rc = -2 refer_cnt = 0 hadref = 0 simple_request = 0 err = 1 lderr = 0 tmp = 0x0 chain_head = 0x0 moremsgs = 0 isv2 = 0 __PRETTY_FUNCTION__ = "try_read1msg" #10 0x00007f8f652f571b in wait4msg (ld=0x2117c20, msgid=7, all=1, timeout=0x7fffdebf25b0, result=0x7fffdebf26b0) at result.c:390 lnext = 0x2117760 lc_ready = 1 rc = -2 tv = {tv_sec = 6, tv_usec = 0} tv0 = {tv_sec = 6, tv_usec = 0} start_time_tv = {tv_sec = 1295533580, tv_usec = 63738} tvp = 0x7fffdebf25b0 lc = 0x2117760 __PRETTY_FUNCTION__ = "wait4msg" #11 
0x00007f8f652f4ddf in ldap_result (ld=0x2117c20, msgid=7, all=1, timeout=0x0, result=0x7fffdebf26b0) at result.c:120 rc =
Date: Thu, 20 Jan 2011 12:16:27 -0800 From: Howard Chu <hyc@symas.com> To: timo.aaltonen@aalto.fi CC: openldap-its@openldap.org Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
timo.aaltonen@aalto.fi wrote: > Hi > > Here's some information that Stephen asked would be of use. There is > one forest, one domain, but three sites in the layout. The functional > level of the forest and the domain is W2008, but the servers have 2008R2. > > And the full backtrace of the hung process: Thanks, but this trace is from 2.4.21, which is obsolete. Please use the current release (2.4.23) since the relevant code has changed. There is no point in us spending time tracking down issues in non-existent code. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Date: Thu, 20 Jan 2011 12:25:05 -0800 From: Howard Chu <hyc@symas.com> To: openldap-its@openldap.org Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
hyc@symas.com wrote: > timo.aaltonen@aalto.fi wrote: >> Hi >> >> Here's some information that Stephen asked would be of use. There is >> one forest, one domain, but three sites in the layout. The functional >> level of the forest and the domain is W2008, but the servers have 2008R2. >> >> And the full backtrace of the hung process: > > Thanks, but this trace is from 2.4.21, which is obsolete. Please use the > current release (2.4.23) since the relevant code has changed. There is no > point in us spending time tracking down issues in non-existent code. And, one more time: please use a debug build, like I asked in my first reply. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Date: Thu, 20 Jan 2011 22:44:01 +0200 From: Timo Aaltonen <timo.aaltonen@aalto.fi> To: Howard Chu <hyc@symas.com> CC: <openldap-its@openldap.org> Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
On Thu, 20 Jan 2011, Howard Chu wrote: > timo.aaltonen@aalto.fi wrote: >> Hi >> >> Here's some information that Stephen asked would be of use. There is >> one forest, one domain, but three sites in the layout. The functional >> level of the forest and the domain is W2008, but the servers have 2008R2. >> >> And the full backtrace of the hung process: > > Thanks, but this trace is from 2.4.21, which is obsolete. Please use the > current release (2.4.23) since the relevant code has changed. There is no > point in us spending time tracking down issues in non-existent code. Not true, I backported 2.4.23 from natty, and made it build in lucid pbuilder. nexus6 sssd # apt-cache policy libldap-2.4-2 libldap-2.4-2: Installed: 2.4.23-6ubuntu4aalto1 Candidate: 2.4.23-6ubuntu4aalto1 Version table: *** 2.4.23-6ubuntu4aalto1 0 100 /var/lib/dpkg/status 2.4.21-0ubuntu5.3 0 500 http://ubuntu.hut.fi/ubuntu/ lucid-updates/main Packages 2.4.21-0ubuntu5.2 0 500 http://ubuntu.hut.fi/ubuntu/ lucid-security/main Packages 2.4.21-0ubuntu5 0 500 http://ubuntu.hut.fi/ubuntu/ lucid/main Packages I believe there is no reason to rebuild sssd against that, since they are ABI compatible? -- Timo Aaltonen Systems Specialist, Aalto IT
Date: Thu, 20 Jan 2011 13:17:56 -0800 From: Howard Chu <hyc@symas.com> To: timo.aaltonen@aalto.fi CC: openldap-its@openldap.org Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
timo.aaltonen@aalto.fi wrote: > Hi > > Here's some information that Stephen asked would be of use. There is > one forest, one domain, but three sites in the layout. The functional > level of the forest and the domain is W2008, but the servers have 2008R2. > > And the full backtrace of the hung process: > #3 0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock > (mutex=0x7f8f6553fc80) > at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296 > No locals. > #4 0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20, > dn=0x0, > mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, > flags=2, > interact=0x7f8f61405120<sdap_sasl_interact>, defaults=0x2124a50) at > sasl.c:426 > rc = -1921681294 > smechs = 0x0 This particular mutex seems kind of bogus to me; the code is from rev 1.31 in June 2001. Perhaps back then it was unsafe to have multiple SASL operations outstanding at once; I would expect that was only an issue in the Cyrus 1.5 days and it should be safe now with Cyrus 2.x. We should probably just delete this mutex. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Date: Thu, 20 Jan 2011 13:53:13 -0800 From: Howard Chu <hyc@symas.com> To: openldap-its@openldap.org Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
hyc@symas.com wrote: > timo.aaltonen@aalto.fi wrote: >> Hi >> >> Here's some information that Stephen asked would be of use. There is >> one forest, one domain, but three sites in the layout. The functional >> level of the forest and the domain is W2008, but the servers have 2008R2. >> >> And the full backtrace of the hung process: > >> #3 0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock >> (mutex=0x7f8f6553fc80) >> at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296 >> No locals. >> #4 0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20, >> dn=0x0, >> mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, >> flags=2, >> interact=0x7f8f61405120<sdap_sasl_interact>, defaults=0x2124a50) at >> sasl.c:426 >> rc = -1921681294 >> smechs = 0x0 > > This particular mutex seems kind of bogus to me; the code is from rev 1.31 in > June 2001. Perhaps back then it was unsafe to have multiple SASL operations > outstanding at once; I would expect that was only an issue in the Cyrus 1.5 > days and it should be safe now with Cyrus 2.x. We should probably just delete > this mutex. > Although googling for "Cyrus sasl reentrancy" does not leave me with warm/fuzzy feelings. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Date: Fri, 21 Jan 2011 00:05:13 +0200 From: Timo Aaltonen <timo.aaltonen@aalto.fi> To: Howard Chu <hyc@symas.com> CC: <openldap-its@openldap.org> Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
On Thu, 20 Jan 2011, Howard Chu wrote:

> timo.aaltonen@aalto.fi wrote:
>> Hi
>>
>> Here's some information that Stephen asked would be of use. There is
>> one forest, one domain, but three sites in the layout. The functional
>> level of the forest and the domain is W2008, but the servers have 2008R2.
>>
>> And the full backtrace of the hung process:
>
>> #3 0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock
>> (mutex=0x7f8f6553fc80)
>> at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296
>> No locals.
>> #4 0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20,
>> dn=0x0,
>> mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0,
>> flags=2,
>> interact=0x7f8f61405120<sdap_sasl_interact>, defaults=0x2124a50) at
>> sasl.c:426
>> rc = -1921681294
>> smechs = 0x0
>
> This particular mutex seems kind of bogus to me; the code is from rev 1.31 in
> June 2001. Perhaps back then it was unsafe to have multiple SASL operations
> outstanding at once; I would expect that was only an issue in the Cyrus 1.5
> days and it should be safe now with Cyrus 2.x. We should probably just delete
> this mutex.

Ok, so by doing this:

--- openldap-2.4.23.orig/libraries/libldap/sasl.c
+++ openldap-2.4.23/libraries/libldap/sasl.c
@@ -421,10 +421,11 @@
 {
 	int rc;
 	char *smechs = NULL;
-
+/*
 #if defined( LDAP_R_COMPILE ) && defined( HAVE_CYRUS_SASL )
 	ldap_pvt_thread_mutex_lock( &ldap_int_sasl_mutex );
 #endif
+*/
 #ifdef LDAP_CONNECTIONLESS
 	if( LDAP_IS_UDP(ld) ) {
 		/* Just force it to simple bind, silly to make the user

--

.. the process doesn't hang anymore. But it still doesn't do what it's
supposed to, but that could be a bug in SSSD. I'll investigate further.

Thanks!

--
Timo Aaltonen
Systems Specialist, Aalto IT
Date: Tue, 01 Feb 2011 13:31:58 -0800 From: Howard Chu <hyc@symas.com> To: Timo Aaltonen <timo.aaltonen@aalto.fi> CC: openldap-its@openldap.org Subject: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
Timo Aaltonen wrote: > On Thu, 20 Jan 2011, Howard Chu wrote: > >> timo.aaltonen@aalto.fi wrote: >>> Hi >>> >>> Here's some information that Stephen asked would be of use. There is >>> one forest, one domain, but three sites in the layout. The functional >>> level of the forest and the domain is W2008, but the servers have 2008R2. >>> >>> And the full backtrace of the hung process: >> >>> #3 0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock >>> (mutex=0x7f8f6553fc80) >>> at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296 >>> No locals. >>> #4 0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20, >>> dn=0x0, >>> mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, >>> flags=2, >>> interact=0x7f8f61405120<sdap_sasl_interact>, defaults=0x2124a50) at >>> sasl.c:426 >>> rc = -1921681294 >>> smechs = 0x0 >> >> This particular mutex seems kind of bogus to me; the code is from rev 1.31 in >> June 2001. Perhaps back then it was unsafe to have multiple SASL operations >> outstanding at once; I would expect that was only an issue in the Cyrus 1.5 >> days and it should be safe now with Cyrus 2.x. We should probably just delete >> this mutex. > > Ok, so by doing this: > > --- openldap-2.4.23.orig/libraries/libldap/sasl.c > +++ openldap-2.4.23/libraries/libldap/sasl.c > @@ -421,10 +421,11 @@ > { > int rc; > char *smechs = NULL; > - > +/* > #if defined( LDAP_R_COMPILE )&& defined( HAVE_CYRUS_SASL ) > ldap_pvt_thread_mutex_lock(&ldap_int_sasl_mutex ); > #endif > +*/ > #ifdef LDAP_CONNECTIONLESS > if( LDAP_IS_UDP(ld) ) { > /* Just force it to simple bind, silly to make the user > > -- > > .. the process doesn't hang anymore. But it still doesn't do what it's > supposed to, but that could be a bug in SSSD. I'll investigate further. > > Thanks! > As I noted in a previous followup, it's not clear to me that the Cyrus SASL library is actually safe to use without that mutex. 
Also, going through your provided backtraces, I see the real issue is that two different requests were active at the same time. I.e., there was an active request that triggered a referral, and an unrelated request. You would also have avoided this issue if you waited for the request that triggered the referrals to complete before issuing any other requests. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
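One way a client could follow the advice above, waiting for the request that triggered the referrals to finish before issuing another, is to serialize its own requests. The following is only a sketch under assumptions: `demo_serializer` and its functions are hypothetical, request ids are plain non-negative integers, and the caller is assumed to be a single-threaded event loop, so no locking is shown. It is neither SSSD nor libldap code.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QUEUE_CAP 8

/* Hypothetical one-at-a-time request gate for a single-threaded client. */
typedef struct {
    int    active;                    /* id of the in-flight request, or -1 */
    int    pending[DEMO_QUEUE_CAP];   /* FIFO ring of deferred request ids */
    size_t head;                      /* index of the oldest deferred id */
    size_t count;                     /* number of deferred ids */
} demo_serializer;

static void demo_serializer_init(demo_serializer *s)
{
    s->active = -1;
    s->head = 0;
    s->count = 0;
}

/* Returns 1 if the request may be sent now, 0 if it was deferred until
 * the active request completes, -1 if the queue is full. */
static int demo_submit(demo_serializer *s, int id)
{
    assert(id >= 0);                  /* -1 is reserved for "none" */
    if (s->active < 0) {
        s->active = id;
        return 1;
    }
    if (s->count == DEMO_QUEUE_CAP)
        return -1;
    s->pending[(s->head + s->count) % DEMO_QUEUE_CAP] = id;
    s->count++;
    return 0;
}

/* Call when the active request, including any referral chasing, is done.
 * Returns the id of the next request to send, or -1 if none is waiting. */
static int demo_complete(demo_serializer *s)
{
    if (s->count == 0) {
        s->active = -1;
        return -1;
    }
    s->active = s->pending[s->head];
    s->head = (s->head + 1) % DEMO_QUEUE_CAP;
    s->count--;
    return s->active;
}
```

A caller would send a request only when `demo_submit` returns 1, and would send the id returned by `demo_complete` once the final result of the previous request has arrived. Whether this is acceptable depends on how much concurrency the application needs; it trades throughput for never having two requests outstanding on the same handle.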
______________ © Copyright 2013, OpenLDAP Foundation, info@OpenLDAP.org