Full_Name: Jan Vcelak Version: 2.4.29 OS: Linux URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (209.132.186.34) There is a possible regression caused by 06ec9f1db2bb9dc304a4adcd0d0506203cf9e6b6. SSSD's event loop listens for events on a file descriptor retrieved using ber_sockbuf_ctrl(sb, LBER_SB_OPT_GET_FD, &ber_fd). If there are some data to be read, processing using ldap_result is triggered. (Internally, tevent from Samba project is used, which uses epoll.) But sometimes, ldap_result() returns -1 error code. It seems that the descriptor is marked as readable untimely. This problem is not present with OpenLDAP 2.4.26, it appears with all future versions. When the mentioned commit is reverted, everything works well. Original report in our Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=771484 It was also reported in Ubuntu: https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/929888 And i similar report, hopefully with the same cause: https://bugzilla.redhat.com/show_bug.cgi?id=790414
jvcelak@redhat.com wrote: > Full_Name: Jan Vcelak > Version: 2.4.29 > OS: Linux > URL: ftp://ftp.openldap.org/incoming/ > Submission from: (NULL) (209.132.186.34) > > > There is a possible regression caused by > 06ec9f1db2bb9dc304a4adcd0d0506203cf9e6b6. > > SSSD's event loop listens for events on a file descriptor retrieved using > ber_sockbuf_ctrl(sb, LBER_SB_OPT_GET_FD,&ber_fd). If there are some data to be > read, processing using ldap_result is triggered. (Internally, tevent from Samba > project is used, which uses epoll.) But sometimes, ldap_result() returns -1 > error code. It seems that the descriptor is marked as readable untimely. > > This problem is not present with OpenLDAP 2.4.26, it appears with all future > versions. When the mentioned commit is reverted, everything works well. > > Original report in our Bugzilla: > https://bugzilla.redhat.com/show_bug.cgi?id=771484 Thanks for the report and the detailed investigation, but this sounds to me like a kernel bug. We expect select/epoll/whatever to only return that a descriptor is readable if it actually is readable. (Gosh, who'dathunkit.) If not, then these syscalls are not behaving as documented/designed. > > It was also reported in Ubuntu: > https://bugs.launchpad.net/ubuntu/+source/openldap/+bug/929888 > > And i similar report, hopefully with the same cause: > https://bugzilla.redhat.com/show_bug.cgi?id=790414 > > -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Howard Chu wrote: > jvcelak@redhat.com wrote: >> Full_Name: Jan Vcelak >> Version: 2.4.29 >> OS: Linux >> URL: ftp://ftp.openldap.org/incoming/ >> Submission from: (NULL) (209.132.186.34) >> >> >> There is a possible regression caused by >> 06ec9f1db2bb9dc304a4adcd0d0506203cf9e6b6. >> >> SSSD's event loop listens for events on a file descriptor retrieved using >> ber_sockbuf_ctrl(sb, LBER_SB_OPT_GET_FD,&ber_fd). If there are some data to be >> read, processing using ldap_result is triggered. (Internally, tevent from Samba >> project is used, which uses epoll.) But sometimes, ldap_result() returns -1 >> error code. It seems that the descriptor is marked as readable untimely. >> >> This problem is not present with OpenLDAP 2.4.26, it appears with all future >> versions. When the mentioned commit is reverted, everything works well. >> >> Original report in our Bugzilla: >> https://bugzilla.redhat.com/show_bug.cgi?id=771484 > > Thanks for the report and the detailed investigation, but this sounds to me > like a kernel bug. We expect select/epoll/whatever to only return that a > descriptor is readable if it actually is readable. (Gosh, who'dathunkit.) If > not, then these syscalls are not behaving as documented/designed. By the way, the patch in question was due to ITS#7035; you should read that ITS for the motivation behind this change. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
----- Original Message ----- > By the way, the patch in question was due to ITS#7035; you should > read that > ITS for the motivation behind this change. Yes, I have read that ITS. I'm just not sure, if I understand it correctly. My server runs on the same machine as the SSSD client. The connection should not be closed unexpectedly. Therefore I think that POLL_HANGUP is not relevant.
Jan Vcelak wrote: > ----- Original Message ----- >> By the way, the patch in question was due to ITS#7035; you should >> read that >> ITS for the motivation behind this change. > > Yes, I have read that ITS. I'm just not sure, if I understand it > correctly. My server runs on the same machine as the SSSD client. > The connection should not be closed unexpectedly. Therefore > I think that POLL_HANGUP is not relevant. To clarify - are all of the failures occuring when TLS is in use on the LDAP session? It's common for the socket layer to have data, but for the TLS layer to report that nothing is available yet since it hasn't read a complete record yet. If that's what is happening here, then yes, we need to fix this patch. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
hyc@symas.com wrote: > Jan Vcelak wrote: >> ----- Original Message ----- >>> By the way, the patch in question was due to ITS#7035; you should >>> read that >>> ITS for the motivation behind this change. >> >> Yes, I have read that ITS. I'm just not sure, if I understand it >> correctly. My server runs on the same machine as the SSSD client. >> The connection should not be closed unexpectedly. Therefore >> I think that POLL_HANGUP is not relevant. > > To clarify - are all of the failures occuring when TLS is in use on the LDAP > session? It's common for the socket layer to have data, but for the TLS layer > to report that nothing is available yet since it hasn't read a complete record > yet. If that's what is happening here, then yes, we need to fix this patch. > Please test the attached patch and see if the problem is resolved, thanks. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
changed state Open to Active
Thank you for quick responses. :-) > To clarify - are all of the failures occuring when TLS is in use on > the LDAP session? It's common for the socket layer to have data, but > for the TLS layer to report that nothing is available yet since it > hasn't read a complete record yet. If that's what is happening here, > then yes, we need to fix this patch. No, TLS is disabled in my case. > Please test the attached patch and see if the problem is resolved, > thanks. The patch didn't help. I'm still getting the error. Jan
Here is what I have tried. This is the processing of the failing ldap_result(): (gdb) c Continuing. Breakpoint 3, ldap_is_read_ready (ld=0x7f48f3a170a0, sb=0x7f48f3a22580) at ../../../libraries/libldap/os-ip.c:1005 1005 sip = (struct selectinfo *)ld->ld_selectinfo; (gdb) n 1007 if (ber_sockbuf_ctrl( sb, LBER_SB_OPT_DATA_READY, NULL )) (gdb) 1010 ber_sockbuf_ctrl( sb, LBER_SB_OPT_GET_FD, &sd ); (gdb) 1016 for(i=0; i < sip->si_maxfd; i++) { (gdb) 1017 if( sip->si_fds[i].fd == sd ) { (gdb) 1018 return sip->si_fds[i].revents & POLL_READ; (gdb) p sip->si_fds[0].revents $14 = 4 (gdb) bt full #0 ldap_is_read_ready (ld=0x7f48f3a170a0, sb=0x7f48f3a22580) at ../../../libraries/libldap/os-ip.c:1018 i = 0 sip = 0x7f48f3a2f590 sd = 17 #1 0x00007f48f17ab743 in wait4msg (ld=0x7f48f3a170a0, msgid=-1, all=0, timeout=0x7fff16e75da0, result=0x7fff16e75d60) at ../../../libraries/libldap/result.c:357 lnext = 0x0 serviced = 0 lc_ready = 1 rc = -2 tv = {tv_sec = 0, tv_usec = 0} tv0 = {tv_sec = 0, tv_usec = 0} start_time_tv = {tv_sec = 1329407008, tv_usec = 270973} tvp = 0x7fff16e75c70 lc = 0x7f48f3a2db80 __PRETTY_FUNCTION__ = "wait4msg" #2 0x00007f48f17aaf0e in ldap_result (ld=0x7f48f3a170a0, msgid=-1, all=0, timeout=0x7fff16e75da0, result=0x7fff16e75d60) at ../../../libraries/libldap/result.c:117 rc = 32584 __PRETTY_FUNCTION__ = "ldap_result" #3 0x00007f48e6da897e in sdap_process_result (ev=0x7f48f39f94d0, pvt=0x7f48f3a38b70) at src/providers/ldap/sdap_async.c:179 sh = 0x7f48f3a38b70 no_timeout = {tv_sec = 0, tv_usec = 0} te = 0x7f48f3a2cab0 msg = 0x0 ret = 101 __FUNCTION__ = "sdap_process_result" #4 0x00007f48e6da8594 in sdap_ldap_next_result (ev=0x7f48f39f94d0, te=0x7f48f3a2cab0, tv=..., pvt=0x7f48f3a38b70) at src/providers/ldap/sdap_async.c:159 No locals. #5 0x00007f48f2f19310 in tevent_common_loop_timer_delay (ev=ev@entry=0x7f48f39f94d0) at ../tevent_timed.c:254 current_time = {tv_sec = 0, tv_usec = 0} te = 0x7f48f3a2cab0 #6 0x00007f48f2f189ec in std_event_loop_once (ev=0x7f48f39f94d0, location=<optimized out>) at ../tevent_standard.c:558 std_ev = 0x7f48f39f9590 tval = {tv_sec = 0, tv_usec = 0} #7 0x00007f48f2f15cb0 in _tevent_loop_once (ev=ev@entry=0x7f48f39f94d0, location=location@entry=0x7f48f37e52e7 "src/util/server.c:572") at ../tevent.c:504 ret = <optimized out> nesting_stack_ptr = 0x0 #8 0x00007f48f2f15e3b in tevent_common_loop_wait (ev=0x7f48f39f94d0, location=0x7f48f37e52e7 "src/util/server.c:572") at ../tevent.c:605 ret = <optimized out> #9 0x00007f48f37b8a85 in server_loop (main_ctx=0x7f48f39fa640) at src/util/server.c:572 No locals. #10 0x00007f48f3779a88 in main (argc=5, argv=0x7fff16e763a8) at src/providers/data_provider_be.c:2003 opt = -1 pc = 0x7f48f39f8010 be_domain = 0x7f48f39f8420 "foo" srv_name = 0x7f48f39f80d0 "sssd[be[foo]]" main_ctx = 0x7f48f39fa640 confdb_path = 0x7f48f39f8140 "config/domain/foo" ret = 0 long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x7f48f2d06200, val = 0, descrip = 0x7f48f37dcb7c "Help options:", argDescrip = 0x0}, {longName = 0x7f48f37dcb8a "debug-level", shortName = 100 'd', argInfo = 2, arg = 0x7f48f39f4738, val = 0, descrip = 0x7f48f37dcb96 "Debug level", argDescrip = 0x0}, {longName = 0x7f48f37dcba2 "debug-to-files", shortName = 102 'f', argInfo = 0, arg = 0x7f48f39f473c, val = 0, descrip = 0x7f48f37dcbb8 "Send the debug output to files instead of stderr", argDescrip = 0x0}, { longName = 0x7f48f37dcbe9 "debug-timestamps", shortName = 0 '\000', argInfo = 2, arg = 0x7f48f39f4598, val = 0, descrip = 0x7f48f37dcbfa "Add debug timestamps", argDescrip = 0x0}, {longName = 0x7f48f37dcc0f "debug-microseconds", shortName = 0 '\000', argInfo = 2, arg = 0x7f48f39f459c, val = 0, descrip = 0x7f48f37dcc28 "Show timestamps with microseconds", argDescrip = 0x0}, {longName = 0x7f48f37dcc4a "domain", shortName = 0 '\000', argInfo = 1, arg = 0x7fff16e76058, val = 0, descrip = 0x7f48f37dcc58 "Domain of the information provider (mandatory)", argDescrip = 0x0}, {longName = 0x0, shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 0, descrip = 0x0, argDescrip = 0x0}} __FUNCTION__ = "main" (gdb) fin Run till exit from #0 ldap_is_read_ready (ld=0x7f48f3a170a0, sb=0x7f48f3a22580) at ../../../libraries/libldap/os-ip.c:1028 wait4msg (ld=0x7f48f3a170a0, msgid=-1, all=0, timeout=0x7fff16e75da0, result=0x7fff16e75d60) at ../../../libraries/libldap/result.c:356 356 if ( lc->lconn_status == LDAP_CONNST_CONNECTED && Value returned is $15 = 0 (gdb) c Continuing. Breakpoint 2, wait4msg (ld=0x7f48f3a170a0, msgid=-1, all=0, timeout=0x7fff16e75da0, result=0x7fff16e75d60) at ../../../libraries/libldap/result.c:378 378 rc = -1; As you may see, there is only a POLLOUT flag set: > (gdb) p sip->si_fds[0].revents > $14 = 4 /usr/include/bits/poll.h: > #define POLLOUT 0x004 /* Writing now will not block. */ So I believe, that there is some other problem, because lc_ready is 1. And without the reverted patch, the operation would just keep looping with LDAP_MSG_X_KEEP_LOOKING. (Or will timeout alternatively.)
jvcelak@redhat.com wrote: > Here is what I have tried. This is the processing of the failing ldap_result(): > > (gdb) c > Continuing. > > Breakpoint 3, ldap_is_read_ready (ld=0x7f48f3a170a0, sb=0x7f48f3a22580) at ../../../libraries/libldap/os-ip.c:1005 > 1005 sip = (struct selectinfo *)ld->ld_selectinfo; > (gdb) n > 1007 if (ber_sockbuf_ctrl( sb, LBER_SB_OPT_DATA_READY, NULL )) > (gdb) > 1010 ber_sockbuf_ctrl( sb, LBER_SB_OPT_GET_FD,&sd ); > (gdb) > 1016 for(i=0; i< sip->si_maxfd; i++) { > (gdb) > 1017 if( sip->si_fds[i].fd == sd ) { > (gdb) > 1018 return sip->si_fds[i].revents& POLL_READ; > (gdb) p sip->si_fds[0].revents > $14 = 4 > (gdb) bt full > #0 ldap_is_read_ready (ld=0x7f48f3a170a0, sb=0x7f48f3a22580) at ../../../libraries/libldap/os-ip.c:1018 > i = 0 > sip = 0x7f48f3a2f590 > sd = 17 > #1 0x00007f48f17ab743 in wait4msg (ld=0x7f48f3a170a0, msgid=-1, all=0, timeout=0x7fff16e75da0, result=0x7fff16e75d60) > at ../../../libraries/libldap/result.c:357 > lnext = 0x0 > serviced = 0 > lc_ready = 1 > rc = -2 > tv = {tv_sec = 0, tv_usec = 0} > tv0 = {tv_sec = 0, tv_usec = 0} > start_time_tv = {tv_sec = 1329407008, tv_usec = 270973} > tvp = 0x7fff16e75c70 > lc = 0x7f48f3a2db80 > __PRETTY_FUNCTION__ = "wait4msg" > #2 0x00007f48f17aaf0e in ldap_result (ld=0x7f48f3a170a0, msgid=-1, all=0, timeout=0x7fff16e75da0, result=0x7fff16e75d60) > at ../../../libraries/libldap/result.c:117 > rc = 32584 > __PRETTY_FUNCTION__ = "ldap_result" > #3 0x00007f48e6da897e in sdap_process_result (ev=0x7f48f39f94d0, pvt=0x7f48f3a38b70) at src/providers/ldap/sdap_async.c:179 > sh = 0x7f48f3a38b70 > no_timeout = {tv_sec = 0, tv_usec = 0} > te = 0x7f48f3a2cab0 > msg = 0x0 > ret = 101 > __FUNCTION__ = "sdap_process_result" > #4 0x00007f48e6da8594 in sdap_ldap_next_result (ev=0x7f48f39f94d0, te=0x7f48f3a2cab0, tv=..., pvt=0x7f48f3a38b70) > at src/providers/ldap/sdap_async.c:159 > No locals. > #5 0x00007f48f2f19310 in tevent_common_loop_timer_delay (ev=ev@entry=0x7f48f39f94d0) at ../tevent_timed.c:254 > current_time = {tv_sec = 0, tv_usec = 0} > te = 0x7f48f3a2cab0 > #6 0x00007f48f2f189ec in std_event_loop_once (ev=0x7f48f39f94d0, location=<optimized out>) at ../tevent_standard.c:558 > std_ev = 0x7f48f39f9590 > tval = {tv_sec = 0, tv_usec = 0} > #7 0x00007f48f2f15cb0 in _tevent_loop_once (ev=ev@entry=0x7f48f39f94d0, > location=location@entry=0x7f48f37e52e7 "src/util/server.c:572") at ../tevent.c:504 > ret =<optimized out> > nesting_stack_ptr = 0x0 > #8 0x00007f48f2f15e3b in tevent_common_loop_wait (ev=0x7f48f39f94d0, location=0x7f48f37e52e7 "src/util/server.c:572") > at ../tevent.c:605 > ret =<optimized out> > #9 0x00007f48f37b8a85 in server_loop (main_ctx=0x7f48f39fa640) at src/util/server.c:572 > No locals. > #10 0x00007f48f3779a88 in main (argc=5, argv=0x7fff16e763a8) at src/providers/data_provider_be.c:2003 > opt = -1 > pc = 0x7f48f39f8010 > be_domain = 0x7f48f39f8420 "foo" > srv_name = 0x7f48f39f80d0 "sssd[be[foo]]" > main_ctx = 0x7f48f39fa640 > confdb_path = 0x7f48f39f8140 "config/domain/foo" > ret = 0 > long_options = {{longName = 0x0, shortName = 0 '\000', argInfo = 4, arg = 0x7f48f2d06200, val = 0, > descrip = 0x7f48f37dcb7c "Help options:", argDescrip = 0x0}, {longName = 0x7f48f37dcb8a "debug-level", > shortName = 100 'd', argInfo = 2, arg = 0x7f48f39f4738, val = 0, descrip = 0x7f48f37dcb96 "Debug level", > argDescrip = 0x0}, {longName = 0x7f48f37dcba2 "debug-to-files", shortName = 102 'f', argInfo = 0, arg = 0x7f48f39f473c, > val = 0, descrip = 0x7f48f37dcbb8 "Send the debug output to files instead of stderr", argDescrip = 0x0}, { > longName = 0x7f48f37dcbe9 "debug-timestamps", shortName = 0 '\000', argInfo = 2, arg = 0x7f48f39f4598, val = 0, > descrip = 0x7f48f37dcbfa "Add debug timestamps", argDescrip = 0x0}, {longName = 0x7f48f37dcc0f "debug-microseconds", > shortName = 0 '\000', argInfo = 2, arg = 0x7f48f39f459c, val = 0, > descrip = 0x7f48f37dcc28 "Show timestamps with microseconds", argDescrip = 0x0}, {longName = 0x7f48f37dcc4a "domain", > shortName = 0 '\000', argInfo = 1, arg = 0x7fff16e76058, val = 0, > descrip = 0x7f48f37dcc58 "Domain of the information provider (mandatory)", argDescrip = 0x0}, {longName = 0x0, > shortName = 0 '\000', argInfo = 0, arg = 0x0, val = 0, descrip = 0x0, argDescrip = 0x0}} > __FUNCTION__ = "main" > (gdb) fin > Run till exit from #0 ldap_is_read_ready (ld=0x7f48f3a170a0, sb=0x7f48f3a22580) at ../../../libraries/libldap/os-ip.c:1028 > wait4msg (ld=0x7f48f3a170a0, msgid=-1, all=0, timeout=0x7fff16e75da0, result=0x7fff16e75d60) > at ../../../libraries/libldap/result.c:356 > 356 if ( lc->lconn_status == LDAP_CONNST_CONNECTED&& > Value returned is $15 = 0 > (gdb) c > Continuing. > > Breakpoint 2, wait4msg (ld=0x7f48f3a170a0, msgid=-1, all=0, timeout=0x7fff16e75da0, result=0x7fff16e75d60) > at ../../../libraries/libldap/result.c:378 > 378 rc = -1; > > As you may see, there is only a POLLOUT flag set: > >> (gdb) p sip->si_fds[0].revents >> $14 = 4 > > /usr/include/bits/poll.h: >> #define POLLOUT 0x004 /* Writing now will not block. */ > > So I believe, that there is some other problem, because lc_ready is 1. And > without the reverted patch, the operation would just keep looping with > LDAP_MSG_X_KEEP_LOOKING. (Or will timeout alternatively.) Your debug session indicates that the socket is writable but libldap doesn't have any record of being blocked for any writes, so it doesn't check for writable state. Frankly this sounds more like an sssd bug, since libldap is still working fine for just about everything else out there. Perhaps you should set libldap's debug level to 7 and see what actually got handled in the library up to the point of failure. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Try this instead. Howard Chu wrote: > Please test the attached patch and see if the problem is resolved, thanks. > -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
changed notes changed state Active to Test moved from Incoming to Software Bugs
Perfect! This patch resolves the issue. Thank you, Howard! Jan ----- Original Message ----- > This is a multi-part message in MIME format. > --------------060206020501030606090605 > Content-Type: text/plain; charset=ISO-8859-1; format=flowed > Content-Transfer-Encoding: 7bit > > Try this instead. > > Howard Chu wrote: > > Please test the attached patch and see if the problem is resolved, > > thanks. > >
changed notes changed state Test to Release
changed notes changed state Release to Closed
fix in master fixed in RE24