Full_Name: Stephen Gallagher Version: 2.4.21-0ubuntu5.2 OS: Ubuntu 10.04 URL: Submission from: (NULL) (98.110.239.235) This was discovered by a user of SSSD. When referrals are enabled with LDAP_OPT_REFERRALS on, SSSD has a rebind procedure set up to handle authenticating to the new server. This seems to work fine when we're dealing with a simple bind, but when we attempt to use SASL bind (for Kerberos-based GSSAPI authentication), we discovered a problem. Our rebind procedure calls ldap_sasl_interactive_bind_s() with LDAP_SASL_QUIET. This arrangement works fine for a single referral, however if the server has nested referrals (say, entry1 refers to entry2 which refers to entry3 on another server) then we hit a deadlock condition. Attaching gdb, we see the backtrace included at the bottom of this message. What appears to be happening is that for the first ldap_sasl_interactive_bind_s(), openldap is locking a mutex, and when it is called a second time it's attempting to lock that same mutex that has not yet been released. Backtrace: #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136 No locals. #1 0x00007f48285975d9 in _L_lock_953 () from /lib/libpthread.so.0 No symbol table info available. #2 0x00007f48285973fb in __pthread_mutex_lock (mutex=0x7f4829c05980) at pthread_mutex_lock.c:61 ignore1 = <value optimized out> ignore2 = 700471680 ignore3 = -512 __PRETTY_FUNCTION__ = "__pthread_mutex_lock" type = <value optimized out> #3 0x00007f48299d3e4e in ldap_sasl_interactive_bind_s () from /usr/lib/libldap_r-2.4.so.2 No symbol table info available. #4 0x00007f4825ae00fd in sdap_rebind_proc (ldap=0x2241b80, url=0x2250220 "ldap://DomainDnsZones.org.example.com/DC=DomainDnsZones,DC=org,DC=example,DC=com", request=<value optimized out>, msgid=<value optimized out>, params=<value optimized out>) at src/providers/ldap/sdap_async_connection.c:1624 p = <value optimized out> sasl_mech = <value optimized out> user_dn = <value optimized out> password = {bv_len = 0, bv_val = 0x0} ctrls = {0x0, 0x0} tmp_ctx = <value optimized out> ret = <value optimized out> __FUNCTION__ = "sdap_rebind_proc" #5 0x00007f48299df6d1 in ldap_new_connection () from /usr/lib/libldap_r-2.4.so.2 No symbol table info available. #6 0x00007f48299e0523 in ldap_send_server_request () from /usr/lib/libldap_r-2.4.so.2 No symbol table info available. #7 0x00007f48299e11cd in ldap_chase_v3referrals () from /usr/lib/libldap_r-2.4.so.2 No symbol table info available. #8 0x00007f48299cbf95 in ?? () from /usr/lib/libldap_r-2.4.so.2 No symbol table info available. #9 0x00007f48299ccc2d in ldap_result () from /usr/lib/libldap_r-2.4.so.2 No symbol table info available. #10 0x00007f48299d4788 in ldap_sasl_bind_s () from /usr/lib/libldap_r-2.4.so.2 No symbol table info available. #11 0x00007f48299d1751 in ldap_int_sasl_bind () from /usr/lib/libldap_r-2.4.so.2 No symbol table info available. #12 0x00007f48299d3ea8 in ldap_sasl_interactive_bind_s () from /usr/lib/libldap_r-2.4.so.2 No symbol table info available. #13 0x00007f4825ae00fd in sdap_rebind_proc (ldap=0x2241b80, url=0x225bc20 "ldap://ForestDnsZones.org.example.com/DC=ForestDnsZones,DC=org,DC=example,DC=com", request=<value optimized out>, msgid=<value optimized out>, params=<value optimized out>) at src/providers/ldap/sdap_async_connection.c:1624 p = <value optimized out> sasl_mech = <value optimized out> user_dn = <value optimized out> password = {bv_len = 0, bv_val = 0x0} ctrls = {0x0, 0x0} tmp_ctx = <value optimized out> ret = <value optimized out> __FUNCTION__ = "sdap_rebind_proc" #14 0x00007f48299df6d1 in ldap_new_connection () from /usr/lib/libldap_r-2.4.so.2 No symbol table info available. #15 0x00007f48299e0523 in ldap_send_server_request () from /usr/lib/libldap_r-2.4.so.2 No symbol table info available. #16 0x00007f48299e11cd in ldap_chase_v3referrals () from /usr/lib/libldap_r-2.4.so.2 No symbol table info available. #17 0x00007f48299cbf95 in ?? () from /usr/lib/libldap_r-2.4.so.2 No symbol table info available. #18 0x00007f48299ccc2d in ldap_result () from /usr/lib/libldap_r-2.4.so.2 No symbol table info available. #19 0x00007f4825acfcb4 in sdap_process_result (ev=0x221b1f0, pvt=<value optimized out>) at src/providers/ldap/sdap_async.c:178 sh = 0x2241650 no_timeout = {tv_sec = 0, tv_usec = 0} te = <value optimized out> msg = <value optimized out> ret = <value optimized out> __FUNCTION__ = "sdap_process_result" #20 0x00007f482b4f1825 in ?? () from /usr/lib/libtevent.so.0 No symbol table info available. #21 0x00007f482b4f363b in ?? () from /usr/lib/libtevent.so.0 No symbol table info available. #22 0x00007f482b4f09f0 in _tevent_loop_once () from /usr/lib/libtevent.so.0 No symbol table info available. #23 0x00007f482b4f0a5b in ?? () from /usr/lib/libtevent.so.0 No symbol table info available. #24 0x0000000000431e51 in server_loop (main_ctx=0x221c360) at src/util/server.c:526 No locals. #25 0x000000000040f8bb in main (argc=5, argv=<value optimized out>) at src/providers/data_provider_be.c:1333 opt = <value optimized out> pc = <value optimized out> be_domain = 0x2212490 "AALTO" srv_name = <value optimized out> conf_entry = <value optimized out> main_ctx = 0x221c360 ret = 0 long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x644d00, val = 0, descrip = 0x436e64 "Help options:", argDescrip = 0x0}, { longName = 0x436e72 "debug-level", shortName = 100 'd', argInfo = 2, arg = 0x644de0, val = 0, descrip = 0x436e43 "Debug level", argDescrip = 0x0}, { longName = 0x436e7e "debug-to-files", shortName = 102 'f', argInfo = 0, arg = 0x644de4, val = 0, descrip = 0x437ad8 "Send the debug output to files instead of stderr", argDescrip = 0x0}, {longName = 0x436e8d "debug-timestamps", shortName = 0 '\000', argInfo = 2, arg = 0x644cc0, val = 0, descrip = 0x436e4f "Add debug timestamps", argDescrip = 0x0}, {longName = 0x438458 "domain", shortName = 0 '\000', argInfo = 1, arg = 0x7fffe703f208, val = 0, descrip = 0x437b10 "Domain of the information provider (mandatory)", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 0, descrip = 0x0, argDescrip = 0x0}} __FUNCTION__ = "main"
> Full_Name: Stephen Gallagher > Version: 2.4.21-0ubuntu5.2 I think this issue report is obsoleted by ITS#6510 (released with OL 2.4.22), which prevents binds from returning referrals. Please upgrade and re-check. p.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 01/19/2011 11:39 AM, masarati@aero.polimi.it wrote: >> Full_Name: Stephen Gallagher >> Version: 2.4.21-0ubuntu5.2 > > I think this issue report is obsoleted by ITS#6510 (released with OL > 2.4.22), which prevents binds from returning referrals. Please upgrade > and re-check. Upgraded to 2.4.23 and the same behavior occurs. I note in issue 6510 that it was suggested that nested mutexes could be used to resolve this. Perhaps we should revisit that. As you can see from our backtrace, this is actually occurring during the processing of ldap_result(). We're handling a rebind as ldap_result travels through a series of referrals. - -- Stephen Gallagher RHCE 804006346421761 Delivering value year after year. Red Hat ranks #1 in value among software vendors. http://www.redhat.com/promo/vendor/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/ iEYEARECAAYFAk03WlYACgkQeiVVYja6o6PiXACgmykXZhhKZkixweJTb/qIVPAc DMAAn2GHK7wu3nC5mkcYq7jI2f13Ql+N =rj7f -----END PGP SIGNATURE-----
sgallagh@redhat.com wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 01/19/2011 11:39 AM, masarati@aero.polimi.it wrote: >>> Full_Name: Stephen Gallagher >>> Version: 2.4.21-0ubuntu5.2 >> >> I think this issue report is obsoleted by ITS#6510 (released with OL >> 2.4.22), which prevents binds from returning referrals. Please upgrade >> and re-check. > > Upgraded to 2.4.23 and the same behavior occurs. I note in issue 6510 > that it was suggested that nested mutexes could be used to resolve this. > Perhaps we should revisit that. No. Nested mutexes are non-portable, and as already discussed in #6510, it is incorrect to process referrals returned in response to Bind requests. > As you can see from our backtrace, this is actually occurring during the > processing of ldap_result(). We're handling a rebind as ldap_result > travels through a series of referrals. Most likely your server is not using an actual Bind Response tag in its response message. It would be good if you could run this using a debug build (no optimization, full debug symbols present) and examine the tag that was parsed from the referral result. Looks like libldap needs to be changed to actually record the tag of the outgoing requests. (It ought to do so anyway, and probably should return a ProtocolError result if it receives a response message whose tag doesn't match its request type.) -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
hyc@symas.com writes: > Looks like libldap needs to be changed to actually record the tag of the > outgoing requests. (It ought to do so anyway, and probably should return a > ProtocolError result if it receives a response message whose tag doesn't match > its request type.) Yes. OpenLDAP is too trunsting of incoming PDUs to be valid. ...as an option, if this would be noticeable for a lot of users. Users who Just Want Their Programs To Work without OpenLDAP breaking them, and won't be interested in "finger-pointing" against someone else's servers. -- Hallvard
Hi Here's some information that Stephen asked would be of use. There is one forest, one domain, but three sites in the layout. The functional level of the forest and the domain is W2008, but the servers have 2008R2. And the full backtrace of the hung process: #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136 No locals. #1 0x00007f8f63ec05d9 in _L_lock_953 () from /lib/libpthread.so.0 No symbol table info available. #2 0x00007f8f63ec03fb in __pthread_mutex_lock (mutex=0x7f8f6553fc80) at pthread_mutex_lock.c:61 ignore1 = <value optimized out> ignore2 = 1700002944 ignore3 = -512 __PRETTY_FUNCTION__ = "__pthread_mutex_lock" type = <value optimized out> #3 0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock (mutex=0x7f8f6553fc80) at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296 No locals. #4 0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20, dn=0x0, mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, flags=2, interact=0x7f8f61405120 <sdap_sasl_interact>, defaults=0x2124a50) at sasl.c:426 rc = -1921681294 smechs = 0x0 #5 0x00007f8f6140888d in sdap_rebind_proc (ldap=0x2117c20, url=0x2125a00 "ldap://DomainDnsZones.domain.com/DC=DomainDnsZones,DC=domain,DC=com", request=<value optimized out>, msgid=<value optimized out>, params=<value optimized out>) at src/providers/ldap/sdap_async_connection.c:1637 p = <value optimized out> sasl_mech = <value optimized out> user_dn = <value optimized out> password = {bv_len = 0, bv_val = 0x0} ctrls = {0x0, 0x0} tmp_ctx = <value optimized out> ret = <value optimized out> __FUNCTION__ = "sdap_rebind_proc" #6 0x00007f8f65310a46 in ldap_new_connection (ld=0x2117c20, srvlist=0x7fffdebf22f8, use_ldsb=0, connect=1, bind=0x7fffdebf22b0) at request.c:518 srvfunc = 0x2122da0 err = 0 savedefconn = 0x21208d0 lc = 0x21227c0 async = 0 __PRETTY_FUNCTION__ = "ldap_new_connection" #7 0x00007f8f6530fdf0 in ldap_send_server_request (ld=0x2117c20, ber=0x2122760, msgid=8, parentreq=0x21304f0, srvlist=0x7fffdebf22f8, lc=0x0, bind=0x7fffdebf22b0) at request.c:211 lr = 0x63 incparent = 1 rc = 1697594315 #8 0x00007f8f653125dc in ldap_chase_v3referrals (ld=0x2117c20, lr=0x21304f0, refs=0x0, sref=1, errstrp=0x2130520, hadrefp=0x7fffdebf24f8) at request.c:1211 unfollowed = 0x0 unfollowedcnt = 0 origreq = 0x21304f0 srv = 0x2125a60 ber = 0x2122760 refarray = 0x2122720 lc = 0x0 rc = 0 count = 0 i = 0 j = 0 id = 8 rinfo = {ri_msgid = 5, ri_request = 99, ri_url = 0x2125a00 "ldap://DomainDnsZones.domain.com/DC=DomainDnsZones,DC=domain,DC=com"} #9 0x00007f8f652f6165 in try_read1msg (ld=0x2117c20, msgid=7, all=1, lc=0x2117760, result=0x7fffdebf26b0) at result.c:708 refs = 0x2122720 ber = 0x212e910 newmsg = 0x2117c20 l = 0x2124140 prev = 0x7debf25e0 id = 5 idx = 0 tag = 115 len = 76 foundit = 0 lr = 0x21304f0 tmplr = 0x7f8f6531039d dummy_lr = {lr_msgid = 0, lr_status = 0, lr_refcnt = 0, lr_outrefcnt = 0, lr_abandoned = 0, lr_origid = 0, lr_parentcnt = 0, lr_res_msgtype = 0, lr_res_errno = 0, lr_res_error = 0x0, lr_res_matched = 0x0, lr_ber = 0x0, lr_conn = 0x0, lr_dn = {bv_len = 0, bv_val = 0x0}, lr_parent = 0x0, lr_child = 0x0, lr_refnext = 0x0, lr_prev = 0x0, lr_next = 0x0} tmpber = {ber_opts = {lbo_valid = 2, lbo_options = 1, lbo_debug = 0}, ber_tag = 0, ber_len = 85, ber_usertag = 0, ber_buf = 0x2131cd0 "\002\001\005s\204", ber_ptr = 0x2131d25 "", ber_end = 0x2131d25 "", ber_sos_ptr = 0x0, ber_rwptr = 0x0, ber_memctx = 0x0} rc = -2 refer_cnt = 0 hadref = 0 simple_request = 0 err = 1 lderr = 0 tmp = 0x0 chain_head = 0x0 moremsgs = 0 isv2 = 0 __PRETTY_FUNCTION__ = "try_read1msg" #10 0x00007f8f652f571b in wait4msg (ld=0x2117c20, msgid=7, all=1, timeout=0x7fffdebf25b0, result=0x7fffdebf26b0) at result.c:390 lnext = 0x2117760 lc_ready = 1 rc = -2 tv = {tv_sec = 6, tv_usec = 0} tv0 = {tv_sec = 6, tv_usec = 0} start_time_tv = {tv_sec = 1295533580, tv_usec = 63738} tvp = 0x7fffdebf25b0 lc = 0x2117760 __PRETTY_FUNCTION__ = "wait4msg" #11 0x00007f8f652f4ddf in ldap_result (ld=0x2117c20, msgid=7, all=1, timeout=0x0, result=0x7fffdebf26b0) at result.c:120 rc = 1227 __PRETTY_FUNCTION__ = "ldap_result" #12 0x00007f8f65300a06 in ldap_sasl_bind_s (ld=0x2117c20, dn=0x0, mechanism=0x7f8f5ffe894c "GSSAPI", cred=0x7fffdebf2740, sctrls=0x0, cctrls=0x0, servercredp=0x7fffdebf2758) at sasl.c:194 rc = 0 msgid = 7 result = 0x0 scredp = 0x0 #13 0x00007f8f652fce5b in ldap_int_sasl_bind (ld=0x2117c20, dn=0x0, mechs=0x210d530 "GSSAPI", sctrls=0x0, cctrls=0x0, flags=2, interact=0x7f8f61405120 <sdap_sasl_interact>, defaults=0x21170b0) at cyrus.c:535 scred = 0x0 credlen = 32767 data = 0x8d757872de362e00 <Address 0x8d757872de362e00 out of bounds> mech = 0x7f8f5ffe894c "GSSAPI" pmech = 0x7f8f5ffe894c "GSSAPI" saslrc = 1 rc = 0 ssf = 0x0 ctx = 0x2127380 oldctx = 0x0 prompts = 0x0 credlen = 1202 ccred = {bv_len = 1202, bv_val = 0x2125540 "`\202\004\256\006\t*\206H\206\367\022\001\002\002\001"} sd = 23 ssl = 0x0 #14 0x00007f8f6530120f in ldap_sasl_interactive_bind_s (ld=0x2117c20, dn=0x0, mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, flags=2, interact=0x7f8f61405120 <sdap_sasl_interact>, defaults=0x21170b0) at sasl.c:464 rc = -1921681294 smechs = 0x0 #15 0x00007f8f6140888d in sdap_rebind_proc (ldap=0x2117c20, url=0x2126270 "ldap://ForestDnsZones.domain.com/DC=ForestDnsZones,DC=domain,DC=com", request=<value optimized out>, msgid=<value optimized out>, params=<value optimized out>) at src/providers/ldap/sdap_async_connection.c:1637 p = <value optimized out> sasl_mech = <value optimized out> user_dn = <value optimized out> password = {bv_len = 0, bv_val = 0x0} ctrls = {0x0, 0x0} tmp_ctx = <value optimized out> ret = <value optimized out> __FUNCTION__ = "sdap_rebind_proc" #16 0x00007f8f65310a46 in ldap_new_connection (ld=0x2117c20, srvlist=0x7fffdebf2bd8, use_ldsb=0, connect=1, bind=0x7fffdebf2b90) at request.c:518 srvfunc = 0x2122190 err = 0 savedefconn = 0x2117760 lc = 0x21208d0 async = 0 __PRETTY_FUNCTION__ = "ldap_new_connection" #17 0x00007f8f6530fdf0 in ldap_send_server_request (ld=0x2117c20, ber=0x21206e0, msgid=6, parentreq=0x21304f0, srvlist=0x7fffdebf2bd8, lc=0x0, bind=0x7fffdebf2b90) at request.c:211 lr = 0x63 incparent = 1 rc = 1697594315 #18 0x00007f8f653125dc in ldap_chase_v3referrals (ld=0x2117c20, lr=0x21304f0, refs=0x0, sref=1, errstrp=0x2130520, hadrefp=0x7fffdebf2dd8) at request.c:1211 unfollowed = 0x0 unfollowedcnt = 0 origreq = 0x21304f0 srv = 0x2116b40 ber = 0x21206e0 refarray = 0x212b470 lc = 0x0 rc = 0 count = 0 i = 0 j = 0 id = 6 rinfo = {ri_msgid = 5, ri_request = 99, ri_url = 0x2126270 "ldap://ForestDnsZones.domain.com/DC=ForestDnsZones,DC=domain,DC=com"} #19 0x00007f8f652f6165 in try_read1msg (ld=0x2117c20, msgid=-1, all=0, lc=0x2117760, result=0x7fffdebf2f68) at result.c:708 refs = 0x212b470 ber = 0x2122580 newmsg = 0x7f8f674287b8 l = 0x7fffdebf2ef0 prev = 0x7f8f67460cd8 id = 5 idx = 0 tag = 115 len = 76 foundit = 0 lr = 0x21304f0 tmplr = 0x100000000 dummy_lr = {lr_msgid = 0, lr_status = 0, lr_refcnt = 0, lr_outrefcnt = 0, lr_abandoned = 0, lr_origid = 0, lr_parentcnt = 0, lr_res_msgtype = 0, lr_res_errno = 0, lr_res_error = 0x0, lr_res_matched = 0x0, lr_ber = 0x0, lr_conn = 0x0, lr_dn = {bv_len = 0, bv_val = 0x0}, lr_parent = 0x0, lr_child = 0x0, lr_refnext = 0x0, lr_prev = 0x0, lr_next = 0x0} tmpber = {ber_opts = {lbo_valid = 2, lbo_options = 1, lbo_debug = 0}, ber_tag = 0, ber_len = 85, ber_usertag = 0, ber_buf = 0x2125410 "\002\001\005s\204", ber_ptr = 0x2125465 "", ber_end = 0x2125465 "", ber_sos_ptr = 0x0, ber_rwptr = 0x0, ber_memctx = 0x0} rc = -2 refer_cnt = 0 hadref = 0 simple_request = 0 err = 0 lderr = 32655 tmp = 0x0 chain_head = 0x0 moremsgs = 0 isv2 = 0 __PRETTY_FUNCTION__ = "try_read1msg" #20 0x00007f8f652f571b in wait4msg (ld=0x2117c20, msgid=-1, all=0, timeout=0x7fffdebf2f50, result=0x7fffdebf2f68) at result.c:390 lnext = 0x7f8f6742f000 lc_ready = 1 rc = -2 tv = {tv_sec = 0, tv_usec = 0} tv0 = {tv_sec = 0, tv_usec = 0} start_time_tv = {tv_sec = 1295533580, tv_usec = 60251} tvp = 0x7fffdebf2e90 lc = 0x2117760 __PRETTY_FUNCTION__ = "wait4msg" #21 0x00007f8f652f4ddf in ldap_result (ld=0x2117c20, msgid=-1, all=0, timeout=0x7fffdebf2f50, result=0x7fffdebf2f68) at result.c:120 rc = 32655 __PRETTY_FUNCTION__ = "ldap_result" #22 0x00007f8f613f7cb4 in sdap_process_result (ev=0x20f11f0, pvt=<value optimized out>) at src/providers/ldap/sdap_async.c:178 sh = 0x21176f0 no_timeout = {tv_sec = 0, tv_usec = 0} te = <value optimized out> msg = <value optimized out> ret = <value optimized out> __FUNCTION__ = "sdap_process_result" #23 0x00007f8f66e3077e in ?? () from /usr/lib/libtevent.so.0 No symbol table info available. #24 0x00007f8f66e2d9f0 in _tevent_loop_once () from /usr/lib/libtevent.so.0 No symbol table info available. #25 0x00007f8f66e2da5b in ?? () from /usr/lib/libtevent.so.0 No symbol table info available. #26 0x0000000000431e51 in server_loop (main_ctx=0x20f2360) at src/util/server.c:526 No locals. #27 0x000000000040f8bb in main (argc=5, argv=<value optimized out>) at src/providers/data_provider_be.c:1333 opt = <value optimized out> pc = <value optimized out> be_domain = 0x20e8490 "DOMAIN" srv_name = <value optimized out> conf_entry = <value optimized out> main_ctx = 0x20f2360 ret = 0 long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x644d00, val = 0, descrip = 0x436e64 "Help options:", argDescrip = 0x0}, { longName = 0x436e72 "debug-level", shortName = 100 'd', argInfo = 2, arg = 0x644de0, val = 0, descrip = 0x436e43 "Debug level", argDescrip = 0x0}, { longName = 0x436e7e "debug-to-files", shortName = 102 'f', argInfo = 0, arg = 0x644de4, val = 0, descrip = 0x437ad8 "Send the debug output to files instead of stderr", argDescrip = 0x0}, {longName = 0x436e8d "debug-timestamps", shortName = 0 '\000', argInfo = 2, arg = 0x644cc0, val = 0, descrip = 0x436e4f "Add debug timestamps", argDescrip = 0x0}, {longName = 0x438458 "domain", shortName = 0 '\000', argInfo = 1, arg = 0x7fffdebf31d8, val = 0, descrip = 0x437b10 "Domain of the information provider (mandatory)", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 0, descrip = 0x0, argDescrip = 0x0}} __FUNCTION__ = "main" -- Timo Aaltonen Systems Specialist, Aalto IT
timo.aaltonen@aalto.fi wrote: > Hi > > Here's some information that Stephen asked would be of use. There is > one forest, one domain, but three sites in the layout. The functional > level of the forest and the domain is W2008, but the servers have 2008R2. > > And the full backtrace of the hung process: Thanks, but this trace is from 2.4.21, which is obsolete. Please use the current release (2.4.23) since the relevant code has changed. There is no point in us spending time tracking down issues in non-existent code. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
hyc@symas.com wrote: > timo.aaltonen@aalto.fi wrote: >> Hi >> >> Here's some information that Stephen asked would be of use. There is >> one forest, one domain, but three sites in the layout. The functional >> level of the forest and the domain is W2008, but the servers have 2008R2. >> >> And the full backtrace of the hung process: > > Thanks, but this trace is from 2.4.21, which is obsolete. Please use the > current release (2.4.23) since the relevant code has changed. There is no > point in us spending time tracking down issues in non-existent code. And, one more time: please use a debug build, like I asked in my first reply. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
On Thu, 20 Jan 2011, Howard Chu wrote: > timo.aaltonen@aalto.fi wrote: >> Hi >> >> Here's some information that Stephen asked would be of use. There is >> one forest, one domain, but three sites in the layout. The functional >> level of the forest and the domain is W2008, but the servers have 2008R2. >> >> And the full backtrace of the hung process: > > Thanks, but this trace is from 2.4.21, which is obsolete. Please use the > current release (2.4.23) since the relevant code has changed. There is no > point in us spending time tracking down issues in non-existent code. Not true, I backported 2.4.23 from natty, and made it build in lucid pbuilder. nexus6 sssd # apt-cache policy libldap-2.4-2 libldap-2.4-2: Installed: 2.4.23-6ubuntu4aalto1 Candidate: 2.4.23-6ubuntu4aalto1 Version table: *** 2.4.23-6ubuntu4aalto1 0 100 /var/lib/dpkg/status 2.4.21-0ubuntu5.3 0 500 http://ubuntu.hut.fi/ubuntu/ lucid-updates/main Packages 2.4.21-0ubuntu5.2 0 500 http://ubuntu.hut.fi/ubuntu/ lucid-security/main Packages 2.4.21-0ubuntu5 0 500 http://ubuntu.hut.fi/ubuntu/ lucid/main Packages I believe there is no reason to rebuild sssd against that, since they are ABI compatible? -- Timo Aaltonen Systems Specialist, Aalto IT
timo.aaltonen@aalto.fi wrote: > Hi > > Here's some information that Stephen asked would be of use. There is > one forest, one domain, but three sites in the layout. The functional > level of the forest and the domain is W2008, but the servers have 2008R2. > > And the full backtrace of the hung process: > #3 0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock > (mutex=0x7f8f6553fc80) > at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296 > No locals. > #4 0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20, > dn=0x0, > mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, > flags=2, > interact=0x7f8f61405120<sdap_sasl_interact>, defaults=0x2124a50) at > sasl.c:426 > rc = -1921681294 > smechs = 0x0 This particular mutex seems kind of bogus to me; the code is from rev 1.31 in June 2001. Perhaps back then it was unsafe to have multiple SASL operations outstanding at once; I would expect that was only an issue in the Cyrus 1.5 days and it should be safe now with Cyrus 2.x. We should probably just delete this mutex. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
hyc@symas.com wrote: > timo.aaltonen@aalto.fi wrote: >> Hi >> >> Here's some information that Stephen asked would be of use. There is >> one forest, one domain, but three sites in the layout. The functional >> level of the forest and the domain is W2008, but the servers have 2008R2. >> >> And the full backtrace of the hung process: > >> #3 0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock >> (mutex=0x7f8f6553fc80) >> at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296 >> No locals. >> #4 0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20, >> dn=0x0, >> mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, >> flags=2, >> interact=0x7f8f61405120<sdap_sasl_interact>, defaults=0x2124a50) at >> sasl.c:426 >> rc = -1921681294 >> smechs = 0x0 > > This particular mutex seems kind of bogus to me; the code is from rev 1.31 in > June 2001. Perhaps back then it was unsafe to have multiple SASL operations > outstanding at once; I would expect that was only an issue in the Cyrus 1.5 > days and it should be safe now with Cyrus 2.x. We should probably just delete > this mutex. > Although googling for "Cyrus sasl reentrancy" does not leave me with warm/fuzzy feelings. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
On Thu, 20 Jan 2011, Howard Chu wrote: > timo.aaltonen@aalto.fi wrote: >> Hi >> >> Here's some information that Stephen asked would be of use. There is >> one forest, one domain, but three sites in the layout. The functional >> level of the forest and the domain is W2008, but the servers have 2008R2. >> >> And the full backtrace of the hung process: > >> #3 0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock >> (mutex=0x7f8f6553fc80) >> at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296 >> No locals. >> #4 0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20, >> dn=0x0, >> mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, >> flags=2, >> interact=0x7f8f61405120<sdap_sasl_interact>, defaults=0x2124a50) at >> sasl.c:426 >> rc = -1921681294 >> smechs = 0x0 > > This particular mutex seems kind of bogus to me; the code is from rev 1.31 in > June 2001. Perhaps back then it was unsafe to have multiple SASL operations > outstanding at once; I would expect that was only an issue in the Cyrus 1.5 > days and it should be safe now with Cyrus 2.x. We should probably just delete > this mutex. Ok, so by doing this: --- openldap-2.4.23.orig/libraries/libldap/sasl.c +++ openldap-2.4.23/libraries/libldap/sasl.c @@ -421,10 +421,11 @@ { int rc; char *smechs = NULL; - +/* #if defined( LDAP_R_COMPILE ) && defined( HAVE_CYRUS_SASL ) ldap_pvt_thread_mutex_lock( &ldap_int_sasl_mutex ); #endif +*/ #ifdef LDAP_CONNECTIONLESS if( LDAP_IS_UDP(ld) ) { /* Just force it to simple bind, silly to make the user -- .. the process doesn't hang anymore. But it still doesn't do what it's supposed to, but that could be a bug in SSSD. I'll investigate further. Thanks! -- Timo Aaltonen Systems Specialist, Aalto IT
changed state Open to Feedback
Timo Aaltonen wrote: > On Thu, 20 Jan 2011, Howard Chu wrote: > >> timo.aaltonen@aalto.fi wrote: >>> Hi >>> >>> Here's some information that Stephen asked would be of use. There is >>> one forest, one domain, but three sites in the layout. The functional >>> level of the forest and the domain is W2008, but the servers have 2008R2. >>> >>> And the full backtrace of the hung process: >> >>> #3 0x00007f8f652f3bcb in ldap_pvt_thread_mutex_lock >>> (mutex=0x7f8f6553fc80) >>> at /tmp/buildd/openldap-2.4.23/libraries/libldap_r/thr_posix.c:296 >>> No locals. >>> #4 0x00007f8f653010bf in ldap_sasl_interactive_bind_s (ld=0x2117c20, >>> dn=0x0, >>> mechs=0x210d530 "GSSAPI", serverControls=0x0, clientControls=0x0, >>> flags=2, >>> interact=0x7f8f61405120<sdap_sasl_interact>, defaults=0x2124a50) at >>> sasl.c:426 >>> rc = -1921681294 >>> smechs = 0x0 >> >> This particular mutex seems kind of bogus to me; the code is from rev 1.31 in >> June 2001. Perhaps back then it was unsafe to have multiple SASL operations >> outstanding at once; I would expect that was only an issue in the Cyrus 1.5 >> days and it should be safe now with Cyrus 2.x. We should probably just delete >> this mutex. > > Ok, so by doing this: > > --- openldap-2.4.23.orig/libraries/libldap/sasl.c > +++ openldap-2.4.23/libraries/libldap/sasl.c > @@ -421,10 +421,11 @@ > { > int rc; > char *smechs = NULL; > - > +/* > #if defined( LDAP_R_COMPILE )&& defined( HAVE_CYRUS_SASL ) > ldap_pvt_thread_mutex_lock(&ldap_int_sasl_mutex ); > #endif > +*/ > #ifdef LDAP_CONNECTIONLESS > if( LDAP_IS_UDP(ld) ) { > /* Just force it to simple bind, silly to make the user > > -- > > .. the process doesn't hang anymore. But it still doesn't do what it's > supposed to, but that could be a bug in SSSD. I'll investigate further. > > Thanks! > As I noted in a previous followup, it's not clear to me that the Cyrus SASL library is actually safe to use without that mutex. Also, going through your provided backtraces, I see the real issue is that two different requests were active at the same time. I.e., there was an active request that triggered a referral, and an unrelated request. You would also have avoided this issue if you waited for the request that triggered the referrals to complete before issuing any other requests. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
moved from Incoming to Software Bugs
Hi Stephen, Would you be able to check and confirm if the fix for ITS#8648 that has been checked into RE24 resolves this issue? Thanks, Quanah -- Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: <http://www.symas.com>
Sorry, I'm not sure why you're directing this at me? I've not been involved with this bug at all. On 05/10/2017 12:56 PM, Quanah Gibson-Mount wrote: > Hi Stephen, > > Would you be able to check and confirm if the fix for ITS#8648 that > has been checked into RE24 resolves this issue? > > Thanks, > Quanah > > -- > > Quanah Gibson-Mount > Product Architect > Symas Corporation > Packaged, certified, and supported LDAP solutions powered by OpenLDAP: > <http://www.symas.com> >
--On Wednesday, May 10, 2017 2:09 PM -0400 Stephen Gallagher <sgallagh@redhat.com> wrote: > Sorry, I'm not sure why you're directing this at me? I've not been > involved with this bug at all. Hi Stephen, I'm not sure what you mean by "not involved". You are the one who filed the bug report with us. Here is a direct link for a refresher: <http://www.openldap.org/its/index.cgi/?findid=6798> Regards, Quanah -- Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: <http://www.symas.com>
On 05/10/2017 01:13 PM, Quanah Gibson-Mount wrote: > --On Wednesday, May 10, 2017 2:09 PM -0400 Stephen Gallagher > <sgallagh@redhat.com> wrote: > >> Sorry, I'm not sure why you're directing this at me? I've not been >> involved with this bug at all. > > Hi Stephen, > > I'm not sure what you mean by "not involved". You are the one who > filed the bug report with us. Here is a direct link for a refresher: > > <http://www.openldap.org/its/index.cgi/?findid=6798> In your previous email, you sent me the ITS number 8648. I missed that this was in reference to 6798 which I filed in 2011. I'm not sure if we still have a reproducer; I think we gave up years ago on using the built-in referral tracking of libldap. I'm CCing Jakub Hrozek who is the current tech lead on SSSD to see if he can still reproduce the issue.
On Wed, May 10, 2017 at 01:19:50PM -0400, Stephen Gallagher wrote: > > > On 05/10/2017 01:13 PM, Quanah Gibson-Mount wrote: > > --On Wednesday, May 10, 2017 2:09 PM -0400 Stephen Gallagher > > <sgallagh@redhat.com> wrote: > > > >> Sorry, I'm not sure why you're directing this at me? I've not been > >> involved with this bug at all. > > > > Hi Stephen, > > > > I'm not sure what you mean by "not involved". You are the one who > > filed the bug report with us. Here is a direct link for a refresher: > > > > <http://www.openldap.org/its/index.cgi/?findid=6798> > > > In your previous email, you sent me the ITS number 8648. I missed that > this was in reference to 6798 which I filed in 2011. > > I'm not sure if we still have a reproducer; I think we gave up years ago > on using the built-in referral tracking of libldap. I'm CCing Jakub > Hrozek who is the current tech lead on SSSD to see if he can still > reproduce the issue. I'm sorry, but I couldn't reproduce the issue any longer either. I remember from some of RH customers that they were occasionally hitting this (or similar?) issue and working around it by disabling the LDAP referral chasing in SSSD but I couldn't reproduce now on demand, sorry.