Issue 6798 - Mutex starvation on two-level referral for SASL connection
Summary: Mutex starvation on two-level referral for SASL connection
Status: VERIFIED FEEDBACK
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-01-19 12:40 UTC by sgallagh@redhat.com
Modified: 2021-08-03 18:13 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description sgallagh@redhat.com 2011-01-19 12:40:47 UTC
Full_Name: Stephen Gallagher
Version: 2.4.21-0ubuntu5.2
OS: Ubuntu 10.04
URL: 
Submission from: (NULL) (98.110.239.235)


This was discovered by a user of SSSD. When referrals are enabled with
LDAP_OPT_REFERRALS on, SSSD has a rebind procedure set up to handle
authenticating to the new server. This seems to work fine when we're dealing
with a simple bind, but when we attempt to use SASL bind (for Kerberos-based
GSSAPI authentication), we discovered a problem.

Our rebind procedure calls ldap_sasl_interactive_bind_s() with LDAP_SASL_QUIET.

This arrangement works fine for a single referral, however if the server has
nested referrals (say, entry1 refers to entry2 which refers to entry3 on another
server) then we hit a deadlock condition.

Attaching gdb, we see the backtrace included at the bottom of this message. What
appears to be happening is that for the first ldap_sasl_interactive_bind_s(),
openldap is locking a mutex, and when it is called a second time it's attempting
to lock that same mutex that has not yet been released.


Backtrace:
#0  __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
No locals.
#1  0x00007f48285975d9 in _L_lock_953 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x00007f48285973fb in __pthread_mutex_lock (mutex=0x7f4829c05980) at
pthread_mutex_lock.c:61
        ignore1 = <value optimized out>
        ignore2 = 700471680
        ignore3 = -512
        __PRETTY_FUNCTION__ = "__pthread_mutex_lock"
        type = <value optimized out>
#3  0x00007f48299d3e4e in ldap_sasl_interactive_bind_s () from
/usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#4  0x00007f4825ae00fd in sdap_rebind_proc (ldap=0x2241b80, 
    url=0x2250220 "ldap://DomainDnsZones.org.example.com/DC=DomainDnsZones,DC=org,DC=example,DC=com",

    request=<value optimized out>, msgid=<value optimized out>, params=<value
optimized out>)
    at src/providers/ldap/sdap_async_connection.c:1624
        p = <value optimized out>
        sasl_mech = <value optimized out>
        user_dn = <value optimized out>
        password = {bv_len = 0, bv_val = 0x0}
        ctrls = {0x0, 0x0}
        tmp_ctx = <value optimized out>
        ret = <value optimized out>
        __FUNCTION__ = "sdap_rebind_proc"
#5  0x00007f48299df6d1 in ldap_new_connection () from
/usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#6  0x00007f48299e0523 in ldap_send_server_request () from
/usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#7  0x00007f48299e11cd in ldap_chase_v3referrals () from
/usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#8  0x00007f48299cbf95 in ?? () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#9  0x00007f48299ccc2d in ldap_result () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#10 0x00007f48299d4788 in ldap_sasl_bind_s () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#11 0x00007f48299d1751 in ldap_int_sasl_bind () from
/usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#12 0x00007f48299d3ea8 in ldap_sasl_interactive_bind_s () from
/usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#13 0x00007f4825ae00fd in sdap_rebind_proc (ldap=0x2241b80, 
    url=0x225bc20 "ldap://ForestDnsZones.org.example.com/DC=ForestDnsZones,DC=org,DC=example,DC=com",

    request=<value optimized out>, msgid=<value optimized out>, params=<value
optimized out>)
    at src/providers/ldap/sdap_async_connection.c:1624
        p = <value optimized out>
        sasl_mech = <value optimized out>
        user_dn = <value optimized out>
        password = {bv_len = 0, bv_val = 0x0}
        ctrls = {0x0, 0x0}
        tmp_ctx = <value optimized out>
        ret = <value optimized out>
        __FUNCTION__ = "sdap_rebind_proc"
#14 0x00007f48299df6d1 in ldap_new_connection () from
/usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#15 0x00007f48299e0523 in ldap_send_server_request () from
/usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#16 0x00007f48299e11cd in ldap_chase_v3referrals () from
/usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#17 0x00007f48299cbf95 in ?? () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#18 0x00007f48299ccc2d in ldap_result () from /usr/lib/libldap_r-2.4.so.2
No symbol table info available.
#19 0x00007f4825acfcb4 in sdap_process_result (ev=0x221b1f0, pvt=<value
optimized out>)
    at src/providers/ldap/sdap_async.c:178
        sh = 0x2241650
        no_timeout = {tv_sec = 0, tv_usec = 0}
        te = <value optimized out>
        msg = <value optimized out>
        ret = <value optimized out>
        __FUNCTION__ = "sdap_process_result"
#20 0x00007f482b4f1825 in ?? () from /usr/lib/libtevent.so.0
No symbol table info available.
#21 0x00007f482b4f363b in ?? () from /usr/lib/libtevent.so.0
No symbol table info available.
#22 0x00007f482b4f09f0 in _tevent_loop_once () from /usr/lib/libtevent.so.0
No symbol table info available.
#23 0x00007f482b4f0a5b in ?? () from /usr/lib/libtevent.so.0
No symbol table info available.
#24 0x0000000000431e51 in server_loop (main_ctx=0x221c360) at
src/util/server.c:526
No locals.
#25 0x000000000040f8bb in main (argc=5, argv=<value optimized out>)
    at src/providers/data_provider_be.c:1333
        opt = <value optimized out>
        pc = <value optimized out>
        be_domain = 0x2212490 "AALTO"
        srv_name = <value optimized out>
        conf_entry = <value optimized out>
        main_ctx = 0x221c360
        ret = 0
        long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg
= 0x644d00, 
            val = 0, descrip = 0x436e64 "Help options:", argDescrip = 0x0}, {
            longName = 0x436e72 "debug-level", shortName = 100 'd', argInfo = 2,

            arg = 0x644de0, val = 0, descrip = 0x436e43 "Debug level",
argDescrip = 0x0}, {
            longName = 0x436e7e "debug-to-files", shortName = 102 'f', argInfo =
0, 
            arg = 0x644de4, val = 0, 
            descrip = 0x437ad8 "Send the debug output to files instead of
stderr", 
            argDescrip = 0x0}, {longName = 0x436e8d "debug-timestamps",
shortName = 0 '\000', 
            argInfo = 2, arg = 0x644cc0, val = 0, descrip = 0x436e4f "Add debug
timestamps", 
            argDescrip = 0x0}, {longName = 0x438458 "domain", shortName = 0
'\000', 
            argInfo = 1, arg = 0x7fffe703f208, val = 0, 
            descrip = 0x437b10 "Domain of the information provider (mandatory)",

            argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo =
0, arg = 0x0, 
            val = 0, descrip = 0x0, argDescrip = 0x0}}
        __FUNCTION__ = "main"
Comment 1 ando@openldap.org 2011-01-19 16:39:54 UTC
> Full_Name: Stephen Gallagher
> Version: 2.4.21-0ubuntu5.2

I think this issue report is obsoleted by ITS#6510 (released with OL
2.4.22), which prevents binds from returning referrals.  Please upgrade
and re-check.

p.

Comment 2 sgallagh@redhat.com 2011-01-19 21:40:38 UTC
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/19/2011 11:39 AM, masarati@aero.polimi.it wrote:
>> Full_Name: Stephen Gallagher
>> Version: 2.4.21-0ubuntu5.2
> 
> I think this issue report is obsoleted by ITS#6510 (released with OL
> 2.4.22), which prevents binds from returning referrals.  Please upgrade
> and re-check.

Upgraded to 2.4.23 and the same behavior occurs. I note in issue 6510
that it was suggested that nested mutexes could be used to resolve this.
Perhaps we should revisit that.

As you can see from our backtrace, this is actually occurring during the
processing of ldap_result(). We're handling a rebind as ldap_result
travels through a series of referrals.


- -- 
Stephen Gallagher
RHCE 804006346421761

Delivering value year after year.
Red Hat ranks #1 in value among software vendors.
http://www.redhat.com/promo/vendor/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk03WlYACgkQeiVVYja6o6PiXACgmykXZhhKZkixweJTb/qIVPAc
DMAAn2GHK7wu3nC5mkcYq7jI2f13Ql+N
=rj7f
-----END PGP SIGNATURE-----

Comment 3 Howard Chu 2011-01-20 10:23:59 UTC
sgallagh@redhat.com wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 01/19/2011 11:39 AM, masarati@aero.polimi.it wrote:
>>> Full_Name: Stephen Gallagher
>>> Version: 2.4.21-0ubuntu5.2
>>
>> I think this issue report is obsoleted by ITS#6510 (released with OL
>> 2.4.22), which prevents binds from returning referrals.  Please upgrade
>> and re-check.
>
> Upgraded to 2.4.23 and the same behavior occurs. I note in issue 6510
> that it was suggested that nested mutexes could be used to resolve this.
> Perhaps we should revisit that.

No. Nested mutexes are non-portable, and as already discussed in #6510, it is 
incorrect to process referrals returned in response to Bind requests.

> As you can see from our backtrace, this is actually occurring during the
> processing of ldap_result(). We're handling a rebind as ldap_result
> travels through a series of referrals.

Most likely your server is not using an actual Bind Response tag in its 
response message. It would be good if you could run this using a debug build 
(no optimization, full debug symbols present) and examine the tag that was 
parsed from the referral result.

Looks like libldap needs to be changed to actually record the tag of the 
outgoing requests. (It ought to do so anyway, and probably should return a 
ProtocolError result if it receives a response message whose tag doesn't match 
its request type.)

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 4 Hallvard Furuseth 2011-01-20 11:19:44 UTC
hyc@symas.com writes:
> Looks like libldap needs to be changed to actually record the tag of the 
> outgoing requests. (It ought to do so anyway, and probably should return a 
> ProtocolError result if it receives a response message whose tag doesn't match 
> its request type.)

Yes.  OpenLDAP is too trunsting of incoming PDUs to be valid.

...as an option, if this would be noticeable for a lot of users.  Users
who Just Want Their Programs To Work without OpenLDAP breaking them, and
won't be interested in "finger-pointing" against someone else's servers.

-- 
Hallvard

Comment 5 Timo Aaltonen 2011-01-20 15:38:10 UTC
 	Hi

   Here's some information that Stephen asked would be of use. There is 
one forest, one domain, but three sites in the layout. The functional 
level of the forest and the domain is W2008, but the servers have 2008R2.

And the full backtrace of the hung process:

#0  __lll_lock_wait () at 
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
No locals.
#1  0x00007f8f63ec05d9 in _L_lock_953 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x00007f8f63ec03fb in __pthread_mutex_lock (mutex=0x7f8f6553fc80)
     at pthread_mutex_lock.c:61
         ignore1 = <value optimized out>
         ignore2 = 1700002944
         ignore3 = -512
         __PRETTY_FUNCTION__ = "__pthread_mutex_lock"
         type = <value optimized out>
#3  0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock 
(mutex=0x7f8f6553fc80)
     at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296
No locals.
#4  0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20, 
dn=0x0,
     mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, 
flags=2,
     interact=0x7f8f61405120 <sdap_sasl_interact>, defaults=0x2124a50) at 
sasl.c:426
         rc = -1921681294
         smechs = 0x0
#5  0x00007f8f6140888d in sdap_rebind_proc (ldap=0x2117c20,
     url=0x2125a00 
"ldap://DomainDnsZones.domain.com/DC=DomainDnsZones,DC=domain,DC=com", 
request=<value optimized out>, msgid=<value optimized out>, params=<value 
optimized out>)
     at src/providers/ldap/sdap_async_connection.c:1637
         p = <value optimized out>
         sasl_mech = <value optimized out>
         user_dn = <value optimized out>
         password = {bv_len = 0, bv_val = 0x0}
         ctrls = {0x0, 0x0}
         tmp_ctx = <value optimized out>
         ret = <value optimized out>
         __FUNCTION__ = "sdap_rebind_proc"
#6  0x00007f8f65310a46 in ldap_new_connection (ld=0x2117c20, 
srvlist=0x7fffdebf22f8,
     use_ldsb=0, connect=1, bind=0x7fffdebf22b0) at request.c:518
         srvfunc = 0x2122da0
         err = 0
         savedefconn = 0x21208d0
         lc = 0x21227c0
         async = 0
         __PRETTY_FUNCTION__ = "ldap_new_connection"
#7  0x00007f8f6530fdf0 in ldap_send_server_request (ld=0x2117c20, 
ber=0x2122760, msgid=8,
     parentreq=0x21304f0, srvlist=0x7fffdebf22f8, lc=0x0, 
bind=0x7fffdebf22b0) at request.c:211
         lr = 0x63
         incparent = 1
         rc = 1697594315
#8  0x00007f8f653125dc in ldap_chase_v3referrals (ld=0x2117c20, 
lr=0x21304f0, refs=0x0,
     sref=1, errstrp=0x2130520, hadrefp=0x7fffdebf24f8) at request.c:1211
         unfollowed = 0x0
         unfollowedcnt = 0
         origreq = 0x21304f0
         srv = 0x2125a60
         ber = 0x2122760
         refarray = 0x2122720
         lc = 0x0
         rc = 0
         count = 0
         i = 0
         j = 0
         id = 8
         rinfo = {ri_msgid = 5, ri_request = 99,
           ri_url = 0x2125a00 
"ldap://DomainDnsZones.domain.com/DC=DomainDnsZones,DC=domain,DC=com"}
#9  0x00007f8f652f6165 in try_read1msg (ld=0x2117c20, msgid=7, all=1, 
lc=0x2117760,
     result=0x7fffdebf26b0) at result.c:708
         refs = 0x2122720
         ber = 0x212e910
         newmsg = 0x2117c20
         l = 0x2124140
         prev = 0x7debf25e0
         id = 5
         idx = 0
         tag = 115
         len = 76
         foundit = 0
         lr = 0x21304f0
         tmplr = 0x7f8f6531039d
         dummy_lr = {lr_msgid = 0, lr_status = 0, lr_refcnt = 0, 
lr_outrefcnt = 0,
           lr_abandoned = 0, lr_origid = 0, lr_parentcnt = 0, 
lr_res_msgtype = 0,
           lr_res_errno = 0, lr_res_error = 0x0, lr_res_matched = 0x0, 
lr_ber = 0x0,
           lr_conn = 0x0, lr_dn = {bv_len = 0, bv_val = 0x0}, lr_parent = 
0x0, lr_child = 0x0,
           lr_refnext = 0x0, lr_prev = 0x0, lr_next = 0x0}
         tmpber = {ber_opts = {lbo_valid = 2, lbo_options = 1, lbo_debug = 
0}, ber_tag = 0,
           ber_len = 85, ber_usertag = 0, ber_buf = 0x2131cd0 
"\002\001\005s\204",
           ber_ptr = 0x2131d25 "", ber_end = 0x2131d25 "", ber_sos_ptr = 
0x0, ber_rwptr = 0x0,
           ber_memctx = 0x0}
         rc = -2
         refer_cnt = 0
         hadref = 0
         simple_request = 0
         err = 1
         lderr = 0
         tmp = 0x0
         chain_head = 0x0
         moremsgs = 0
         isv2 = 0
         __PRETTY_FUNCTION__ = "try_read1msg"
#10 0x00007f8f652f571b in wait4msg (ld=0x2117c20, msgid=7, all=1, 
timeout=0x7fffdebf25b0,
     result=0x7fffdebf26b0) at result.c:390
         lnext = 0x2117760
         lc_ready = 1
         rc = -2
         tv = {tv_sec = 6, tv_usec = 0}
         tv0 = {tv_sec = 6, tv_usec = 0}
         start_time_tv = {tv_sec = 1295533580, tv_usec = 63738}
         tvp = 0x7fffdebf25b0
         lc = 0x2117760
         __PRETTY_FUNCTION__ = "wait4msg"
#11 0x00007f8f652f4ddf in ldap_result (ld=0x2117c20, msgid=7, all=1, 
timeout=0x0,
     result=0x7fffdebf26b0) at result.c:120
         rc = 1227
         __PRETTY_FUNCTION__ = "ldap_result"
#12 0x00007f8f65300a06 in ldap_sasl_bind_s (ld=0x2117c20, dn=0x0,
     mechanism=0x7f8f5ffe894c "GSSAPI", cred=0x7fffdebf2740, sctrls=0x0, 
cctrls=0x0,
     servercredp=0x7fffdebf2758) at sasl.c:194
         rc = 0
         msgid = 7
         result = 0x0
         scredp = 0x0
#13 0x00007f8f652fce5b in ldap_int_sasl_bind (ld=0x2117c20, dn=0x0, 
mechs=0x210d530 "GSSAPI",
     sctrls=0x0, cctrls=0x0, flags=2, interact=0x7f8f61405120 
<sdap_sasl_interact>,
     defaults=0x21170b0) at cyrus.c:535
         scred = 0x0
         credlen = 32767
         data = 0x8d757872de362e00 <Address 0x8d757872de362e00 out of 
bounds>
         mech = 0x7f8f5ffe894c "GSSAPI"
         pmech = 0x7f8f5ffe894c "GSSAPI"
         saslrc = 1
         rc = 0
         ssf = 0x0
         ctx = 0x2127380
         oldctx = 0x0
         prompts = 0x0
         credlen = 1202
         ccred = {bv_len = 1202,
           bv_val = 0x2125540 
"`\202\004\256\006\t*\206H\206\367\022\001\002\002\001"}
         sd = 23
         ssl = 0x0
#14 0x00007f8f6530120f in ldap_sasl_interactive_bind_s (ld=0x2117c20, 
dn=0x0,
     mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, 
flags=2,
     interact=0x7f8f61405120 <sdap_sasl_interact>, defaults=0x21170b0) at 
sasl.c:464
         rc = -1921681294
         smechs = 0x0
#15 0x00007f8f6140888d in sdap_rebind_proc (ldap=0x2117c20,
     url=0x2126270 
"ldap://ForestDnsZones.domain.com/DC=ForestDnsZones,DC=domain,DC=com", 
request=<value optimized out>, msgid=<value optimized out>, params=<value 
optimized out>)
     at src/providers/ldap/sdap_async_connection.c:1637
         p = <value optimized out>
         sasl_mech = <value optimized out>
         user_dn = <value optimized out>
         password = {bv_len = 0, bv_val = 0x0}
         ctrls = {0x0, 0x0}
         tmp_ctx = <value optimized out>
         ret = <value optimized out>
         __FUNCTION__ = "sdap_rebind_proc"
#16 0x00007f8f65310a46 in ldap_new_connection (ld=0x2117c20, 
srvlist=0x7fffdebf2bd8,
     use_ldsb=0, connect=1, bind=0x7fffdebf2b90) at request.c:518
         srvfunc = 0x2122190
         err = 0
         savedefconn = 0x2117760
         lc = 0x21208d0
         async = 0
         __PRETTY_FUNCTION__ = "ldap_new_connection"
#17 0x00007f8f6530fdf0 in ldap_send_server_request (ld=0x2117c20, 
ber=0x21206e0, msgid=6,
     parentreq=0x21304f0, srvlist=0x7fffdebf2bd8, lc=0x0, 
bind=0x7fffdebf2b90) at request.c:211
         lr = 0x63
         incparent = 1
         rc = 1697594315
#18 0x00007f8f653125dc in ldap_chase_v3referrals (ld=0x2117c20, 
lr=0x21304f0, refs=0x0,
     sref=1, errstrp=0x2130520, hadrefp=0x7fffdebf2dd8) at request.c:1211
         unfollowed = 0x0
         unfollowedcnt = 0
         origreq = 0x21304f0
         srv = 0x2116b40
         ber = 0x21206e0
         refarray = 0x212b470
         lc = 0x0
         rc = 0
         count = 0
         i = 0
         j = 0
         id = 6
         rinfo = {ri_msgid = 5, ri_request = 99,
           ri_url = 0x2126270 
"ldap://ForestDnsZones.domain.com/DC=ForestDnsZones,DC=domain,DC=com"}
#19 0x00007f8f652f6165 in try_read1msg (ld=0x2117c20, msgid=-1, all=0, 
lc=0x2117760,
     result=0x7fffdebf2f68) at result.c:708
         refs = 0x212b470
         ber = 0x2122580
         newmsg = 0x7f8f674287b8
         l = 0x7fffdebf2ef0
         prev = 0x7f8f67460cd8
         id = 5
         idx = 0
         tag = 115
         len = 76
         foundit = 0
         lr = 0x21304f0
         tmplr = 0x100000000
         dummy_lr = {lr_msgid = 0, lr_status = 0, lr_refcnt = 0, 
lr_outrefcnt = 0,
           lr_abandoned = 0, lr_origid = 0, lr_parentcnt = 0, 
lr_res_msgtype = 0,
           lr_res_errno = 0, lr_res_error = 0x0, lr_res_matched = 0x0, 
lr_ber = 0x0,
           lr_conn = 0x0, lr_dn = {bv_len = 0, bv_val = 0x0}, lr_parent = 
0x0, lr_child = 0x0,
           lr_refnext = 0x0, lr_prev = 0x0, lr_next = 0x0}
         tmpber = {ber_opts = {lbo_valid = 2, lbo_options = 1, lbo_debug = 
0}, ber_tag = 0,
           ber_len = 85, ber_usertag = 0, ber_buf = 0x2125410 
"\002\001\005s\204",
           ber_ptr = 0x2125465 "", ber_end = 0x2125465 "", ber_sos_ptr = 
0x0, ber_rwptr = 0x0,
           ber_memctx = 0x0}
         rc = -2
         refer_cnt = 0
         hadref = 0
         simple_request = 0
         err = 0
         lderr = 32655
         tmp = 0x0
         chain_head = 0x0
         moremsgs = 0
         isv2 = 0
         __PRETTY_FUNCTION__ = "try_read1msg"
#20 0x00007f8f652f571b in wait4msg (ld=0x2117c20, msgid=-1, all=0, 
timeout=0x7fffdebf2f50,
     result=0x7fffdebf2f68) at result.c:390
         lnext = 0x7f8f6742f000
         lc_ready = 1
         rc = -2
         tv = {tv_sec = 0, tv_usec = 0}
         tv0 = {tv_sec = 0, tv_usec = 0}
         start_time_tv = {tv_sec = 1295533580, tv_usec = 60251}
         tvp = 0x7fffdebf2e90
         lc = 0x2117760
         __PRETTY_FUNCTION__ = "wait4msg"
#21 0x00007f8f652f4ddf in ldap_result (ld=0x2117c20, msgid=-1, all=0, 
timeout=0x7fffdebf2f50,
     result=0x7fffdebf2f68) at result.c:120
         rc = 32655
         __PRETTY_FUNCTION__ = "ldap_result"
#22 0x00007f8f613f7cb4 in sdap_process_result (ev=0x20f11f0, pvt=<value 
optimized out>)
     at src/providers/ldap/sdap_async.c:178
         sh = 0x21176f0
         no_timeout = {tv_sec = 0, tv_usec = 0}
         te = <value optimized out>
         msg = <value optimized out>
         ret = <value optimized out>
         __FUNCTION__ = "sdap_process_result"
#23 0x00007f8f66e3077e in ?? () from /usr/lib/libtevent.so.0
No symbol table info available.
#24 0x00007f8f66e2d9f0 in _tevent_loop_once () from 
/usr/lib/libtevent.so.0
No symbol table info available.
#25 0x00007f8f66e2da5b in ?? () from /usr/lib/libtevent.so.0
No symbol table info available.
#26 0x0000000000431e51 in server_loop (main_ctx=0x20f2360) at 
src/util/server.c:526
No locals.
#27 0x000000000040f8bb in main (argc=5, argv=<value optimized out>)
     at src/providers/data_provider_be.c:1333
         opt = <value optimized out>
         pc = <value optimized out>
         be_domain = 0x20e8490 "DOMAIN"
         srv_name = <value optimized out>
         conf_entry = <value optimized out>
         main_ctx = 0x20f2360
         ret = 0
         long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 
4, arg = 0x644d00,
             val = 0, descrip = 0x436e64 "Help options:", argDescrip = 
0x0}, {
             longName = 0x436e72 "debug-level", shortName = 100 'd', 
argInfo = 2,
             arg = 0x644de0, val = 0, descrip = 0x436e43 "Debug level", 
argDescrip = 0x0}, {
             longName = 0x436e7e "debug-to-files", shortName = 102 'f', 
argInfo = 0,
             arg = 0x644de4, val = 0,
             descrip = 0x437ad8 "Send the debug output to files instead of 
stderr",
             argDescrip = 0x0}, {longName = 0x436e8d "debug-timestamps", 
shortName = 0 '\000',
             argInfo = 2, arg = 0x644cc0, val = 0, descrip = 0x436e4f "Add 
debug timestamps",
             argDescrip = 0x0}, {longName = 0x438458 "domain", shortName = 
0 '\000',
             argInfo = 1, arg = 0x7fffdebf31d8, val = 0,
             descrip = 0x437b10 "Domain of the information provider 
(mandatory)",
             argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', 
argInfo = 0, arg = 0x0,
             val = 0, descrip = 0x0, argDescrip = 0x0}}
         __FUNCTION__ = "main"


-- 
Timo Aaltonen
Systems Specialist, Aalto IT

Comment 6 Howard Chu 2011-01-20 20:16:27 UTC
timo.aaltonen@aalto.fi wrote:
>   	Hi
>
>     Here's some information that Stephen asked would be of use. There is
> one forest, one domain, but three sites in the layout. The functional
> level of the forest and the domain is W2008, but the servers have 2008R2.
>
> And the full backtrace of the hung process:

Thanks, but this trace is from 2.4.21, which is obsolete. Please use the 
current release (2.4.23) since the relevant code has changed. There is no 
point in us spending time tracking down issues in non-existent code.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 7 Howard Chu 2011-01-20 20:25:05 UTC
hyc@symas.com wrote:
> timo.aaltonen@aalto.fi wrote:
>>    	Hi
>>
>>      Here's some information that Stephen asked would be of use. There is
>> one forest, one domain, but three sites in the layout. The functional
>> level of the forest and the domain is W2008, but the servers have 2008R2.
>>
>> And the full backtrace of the hung process:
>
> Thanks, but this trace is from 2.4.21, which is obsolete. Please use the
> current release (2.4.23) since the relevant code has changed. There is no
> point in us spending time tracking down issues in non-existent code.

And, one more time: please use a debug build, like I asked in my first reply.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 8 Timo Aaltonen 2011-01-20 20:44:01 UTC
On Thu, 20 Jan 2011, Howard Chu wrote:

> timo.aaltonen@aalto.fi wrote:
>>   	Hi
>>
>>     Here's some information that Stephen asked would be of use. There is
>> one forest, one domain, but three sites in the layout. The functional
>> level of the forest and the domain is W2008, but the servers have 2008R2.
>> 
>> And the full backtrace of the hung process:
>
> Thanks, but this trace is from 2.4.21, which is obsolete. Please use the 
> current release (2.4.23) since the relevant code has changed. There is no 
> point in us spending time tracking down issues in non-existent code.

Not true, I backported 2.4.23 from natty, and made it build in lucid 
pbuilder.

nexus6 sssd # apt-cache policy libldap-2.4-2
libldap-2.4-2:
   Installed: 2.4.23-6ubuntu4aalto1
   Candidate: 2.4.23-6ubuntu4aalto1
   Version table:
  *** 2.4.23-6ubuntu4aalto1 0
         100 /var/lib/dpkg/status
      2.4.21-0ubuntu5.3 0
         500 http://ubuntu.hut.fi/ubuntu/ lucid-updates/main Packages
      2.4.21-0ubuntu5.2 0
         500 http://ubuntu.hut.fi/ubuntu/ lucid-security/main Packages
      2.4.21-0ubuntu5 0
         500 http://ubuntu.hut.fi/ubuntu/ lucid/main Packages

I believe there is no reason to rebuild sssd against that, since they are 
ABI compatible?

-- 
Timo Aaltonen
Systems Specialist, Aalto IT

Comment 9 Howard Chu 2011-01-20 21:17:56 UTC
timo.aaltonen@aalto.fi wrote:
>   	Hi
>
>     Here's some information that Stephen asked would be of use. There is
> one forest, one domain, but three sites in the layout. The functional
> level of the forest and the domain is W2008, but the servers have 2008R2.
>
> And the full backtrace of the hung process:

> #3  0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock
> (mutex=0x7f8f6553fc80)
>       at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296
> No locals.
> #4  0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20,
> dn=0x0,
>       mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0,
> flags=2,
>       interact=0x7f8f61405120<sdap_sasl_interact>, defaults=0x2124a50) at
> sasl.c:426
>           rc = -1921681294
>           smechs = 0x0

This particular mutex seems kind of bogus to me; the code is from rev 1.31 in 
June 2001. Perhaps back then it was unsafe to have multiple SASL operations 
outstanding at once; I would expect that was only an issue in the Cyrus 1.5 
days and it should be safe now with Cyrus 2.x. We should probably just delete 
this mutex.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 10 Howard Chu 2011-01-20 21:53:13 UTC
hyc@symas.com wrote:
> timo.aaltonen@aalto.fi wrote:
>>    	Hi
>>
>>      Here's some information that Stephen asked would be of use. There is
>> one forest, one domain, but three sites in the layout. The functional
>> level of the forest and the domain is W2008, but the servers have 2008R2.
>>
>> And the full backtrace of the hung process:
>
>> #3  0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock
>> (mutex=0x7f8f6553fc80)
>>        at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296
>> No locals.
>> #4  0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20,
>> dn=0x0,
>>        mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0,
>> flags=2,
>>        interact=0x7f8f61405120<sdap_sasl_interact>, defaults=0x2124a50) at
>> sasl.c:426
>>            rc = -1921681294
>>            smechs = 0x0
>
> This particular mutex seems kind of bogus to me; the code is from rev 1.31 in
> June 2001. Perhaps back then it was unsafe to have multiple SASL operations
> outstanding at once; I would expect that was only an issue in the Cyrus 1.5
> days and it should be safe now with Cyrus 2.x. We should probably just delete
> this mutex.
>
Although googling for "Cyrus sasl reentrancy" does not leave me with 
warm/fuzzy feelings.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 11 Timo Aaltonen 2011-01-20 22:05:13 UTC
On Thu, 20 Jan 2011, Howard Chu wrote:

> timo.aaltonen@aalto.fi wrote:
>>   	Hi
>>
>>     Here's some information that Stephen asked would be of use. There is
>> one forest, one domain, but three sites in the layout. The functional
>> level of the forest and the domain is W2008, but the servers have 2008R2.
>> 
>> And the full backtrace of the hung process:
>
>> #3  0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock
>> (mutex=0x7f8f6553fc80)
>>       at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296
>> No locals.
>> #4  0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20,
>> dn=0x0,
>>       mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0,
>> flags=2,
>>       interact=0x7f8f61405120<sdap_sasl_interact>, defaults=0x2124a50) at
>> sasl.c:426
>>           rc = -1921681294
>>           smechs = 0x0
>
> This particular mutex seems kind of bogus to me; the code is from rev 1.31 in 
> June 2001. Perhaps back then it was unsafe to have multiple SASL operations 
> outstanding at once; I would expect that was only an issue in the Cyrus 1.5 
> days and it should be safe now with Cyrus 2.x. We should probably just delete 
> this mutex.

Ok, so by doing this:

--- openldap-2.4.23.orig/libraries/libldap/sasl.c
+++ openldap-2.4.23/libraries/libldap/sasl.c
@@ -421,10 +421,11 @@
  {
         int rc;
         char *smechs = NULL;
-
+/*
  #if defined( LDAP_R_COMPILE ) && defined( HAVE_CYRUS_SASL )
         ldap_pvt_thread_mutex_lock( &ldap_int_sasl_mutex );
  #endif
+*/
  #ifdef LDAP_CONNECTIONLESS
         if( LDAP_IS_UDP(ld) ) {
                 /* Just force it to simple bind, silly to make the user

--

.. the process doesn't hang anymore. But it still doesn't do what it's 
supposed to, but that could be a bug in SSSD. I'll investigate further.

Thanks!

-- 
Timo Aaltonen
Systems Specialist, Aalto IT

Comment 12 Howard Chu 2011-02-01 13:33:24 UTC
changed state Open to Feedback
Comment 13 Howard Chu 2011-02-01 21:31:58 UTC
Timo Aaltonen wrote:
> On Thu, 20 Jan 2011, Howard Chu wrote:
>
>> timo.aaltonen@aalto.fi wrote:
>>>    	Hi
>>>
>>>      Here's some information that Stephen asked would be of use. There is
>>> one forest, one domain, but three sites in the layout. The functional
>>> level of the forest and the domain is W2008, but the servers have 2008R2.
>>>
>>> And the full backtrace of the hung process:
>>
>>> #3  0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock
>>> (mutex=0x7f8f6553fc80)
>>>        at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296
>>> No locals.
>>> #4  0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20,
>>> dn=0x0,
>>>        mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0,
>>> flags=2,
>>>        interact=0x7f8f61405120<sdap_sasl_interact>, defaults=0x2124a50) at
>>> sasl.c:426
>>>            rc = -1921681294
>>>            smechs = 0x0
>>
>> This particular mutex seems kind of bogus to me; the code is from rev 1.31 in
>> June 2001. Perhaps back then it was unsafe to have multiple SASL operations
>> outstanding at once; I would expect that was only an issue in the Cyrus 1.5
>> days and it should be safe now with Cyrus 2.x. We should probably just delete
>> this mutex.
>
> Ok, so by doing this:
>
> --- openldap-2.4.23.orig/libraries/libldap/sasl.c
> +++ openldap-2.4.23/libraries/libldap/sasl.c
> @@ -421,10 +421,11 @@
>    {
>           int rc;
>           char *smechs = NULL;
> -
> +/*
>    #if defined( LDAP_R_COMPILE )&&  defined( HAVE_CYRUS_SASL )
>           ldap_pvt_thread_mutex_lock(&ldap_int_sasl_mutex );
>    #endif
> +*/
>    #ifdef LDAP_CONNECTIONLESS
>           if( LDAP_IS_UDP(ld) ) {
>                   /* Just force it to simple bind, silly to make the user
>
> --
>
> .. the process doesn't hang anymore. But it still doesn't do what it's
> supposed to, but that could be a bug in SSSD. I'll investigate further.
>
> Thanks!
>
As I noted in a previous followup, it's not clear to me that the Cyrus SASL 
library is actually safe to use without that mutex. Also, going through your 
provided backtraces, I see the real issue is that two different requests were 
active at the same time. I.e., there was an active request that triggered a 
referral, and an unrelated request. You would also have avoided this issue if 
you waited for the request that triggered the referrals to complete before 
issuing any other requests.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 14 Quanah Gibson-Mount 2017-04-07 23:54:02 UTC
moved from Incoming to Software Bugs
Comment 15 Quanah Gibson-Mount 2017-05-10 16:56:24 UTC
Hi Stephen,

Would you be able to check and confirm if the fix for ITS#8648 that has 
been checked into RE24 resolves this issue?

Thanks,
Quanah

--

Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
<http://www.symas.com>


Comment 16 sgallagh@redhat.com 2017-05-10 17:09:43 UTC
Sorry, I'm not sure why you're directing this at me? I've not been
involved with this bug at all.

On 05/10/2017 12:56 PM, Quanah Gibson-Mount wrote:
> Hi Stephen,
>
> Would you be able to check and confirm if the fix for ITS#8648 that
> has been checked into RE24 resolves this issue?
>
> Thanks,
> Quanah
>
> -- 
>
> Quanah Gibson-Mount
> Product Architect
> Symas Corporation
> Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
> <http://www.symas.com>
>


Comment 17 Quanah Gibson-Mount 2017-05-10 17:13:19 UTC
--On Wednesday, May 10, 2017 2:09 PM -0400 Stephen Gallagher 
<sgallagh@redhat.com> wrote:

> Sorry, I'm not sure why you're directing this at me? I've not been
> involved with this bug at all.

Hi Stephen,

I'm not sure what you mean by "not involved".  You are the one who filed 
the bug report with us.  Here is a direct link for a refresher:

<http://www.openldap.org/its/index.cgi/?findid=6798>

Regards,
Quanah



--

Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
<http://www.symas.com>


Comment 18 sgallagh@redhat.com 2017-05-10 17:19:50 UTC

On 05/10/2017 01:13 PM, Quanah Gibson-Mount wrote:
> --On Wednesday, May 10, 2017 2:09 PM -0400 Stephen Gallagher
> <sgallagh@redhat.com> wrote:
>
>> Sorry, I'm not sure why you're directing this at me? I've not been
>> involved with this bug at all.
>
> Hi Stephen,
>
> I'm not sure what you mean by "not involved".  You are the one who
> filed the bug report with us.  Here is a direct link for a refresher:
>
> <http://www.openldap.org/its/index.cgi/?findid=6798>


In your previous email, you sent me the ITS number 8648. I missed that
this was in reference to 6798 which I filed in 2011.

I'm not sure if we still have a reproducer; I think we gave up years ago
on using the built-in referral tracking of libldap. I'm CCing Jakub
Hrozek who is the current tech lead on SSSD to see if he can still
reproduce the issue.

Comment 19 Jakub Hrozek 2017-05-11 07:56:21 UTC
On Wed, May 10, 2017 at 01:19:50PM -0400, Stephen Gallagher wrote:
> 
> 
> On 05/10/2017 01:13 PM, Quanah Gibson-Mount wrote:
> > --On Wednesday, May 10, 2017 2:09 PM -0400 Stephen Gallagher
> > <sgallagh@redhat.com> wrote:
> >
> >> Sorry, I'm not sure why you're directing this at me? I've not been
> >> involved with this bug at all.
> >
> > Hi Stephen,
> >
> > I'm not sure what you mean by "not involved".  You are the one who
> > filed the bug report with us.  Here is a direct link for a refresher:
> >
> > <http://www.openldap.org/its/index.cgi/?findid=6798>
> 
> 
> In your previous email, you sent me the ITS number 8648. I missed that
> this was in reference to 6798 which I filed in 2011.
> 
> I'm not sure if we still have a reproducer; I think we gave up years ago
> on using the built-in referral tracking of libldap. I'm CCing Jakub
> Hrozek who is the current tech lead on SSSD to see if he can still
> reproduce the issue.

I'm sorry, but I couldn't reproduce the issue any longer either. I remember
from some of RH customers that they were occasionally hitting this (or
similar?) issue and working around it by disabling the LDAP referral
chasing in SSSD but I couldn't reproduce now on demand, sorry.