Full_Name: Bernardo Innocenti Version: 2.2.26 to 2.3.5, possibly others OS: Linux URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (151.38.19.110) Calling sched_yield() makes slapd slow down considerably under load since kernel 2.6. See: http://www.ussg.iu.edu/hypermail/linux/kernel/0508.2/0516.html
bernie@develer.com wrote: > Full_Name: Bernardo Innocenti > Version: 2.2.26 to 2.3.5, possibly others > OS: Linux > URL: ftp://ftp.openldap.org/incoming/ > Submission from: (NULL) (151.38.19.110) > > > Calling sched_yield() makes slapd slow down considerably > under load since kernel 2.6. > > See: > > http://www.ussg.iu.edu/hypermail/linux/kernel/0508.2/0516.html > > Thanks for pointing this out. I'm not sure why we ever used sched_yield() in the first place, it was known to cause headaches on the Solaris build as well. We should probably leave it as a last resort, and prefer pthread_yield() instead. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
At 07:24 PM 8/17/2005, bernie@develer.com wrote: >Full_Name: Bernardo Innocenti >Version: 2.2.26 to 2.3.5, possibly others >OS: Linux >URL: ftp://ftp.openldap.org/incoming/ >Submission from: (NULL) (151.38.19.110) > > >Calling sched_yield() makes slapd slow down considerably >under load since kernel 2.6. > >See: > > http://www.ussg.iu.edu/hypermail/linux/kernel/0508.2/0516.html I note that pthread_yield is not actually part of the POSIX Thread standard. It's part of the old DCE Thread interface, which was deprecated in favor of the POSIX Thread standard. No modern threaded application should be calling pthread_yield(3). sched_yield(3) should do the right thing. Guess we'll have to consider this yet another Linux pthread implementation quirk. Now, the fun thing would be to devise a configure test to determine that the deprecated pthread_yield(3) interface should be used (without resorting to #ifdef linux or equivalent). Oh, BTW, sched_yield(3p) on my SuSE Linux box says: The sched_yield() function shall force the running thread to relinquish the processor until it again becomes the head of its thread list. It takes no arguments. Kurt
Howard Chu wrote: > Thanks for pointing this out. I'm not sure why we ever used > sched_yield() in the first place, it was known to cause headaches on > the Solaris build as well. We should probably leave it as a last > resort, and prefer pthread_yield() instead. > I take this back. pthread_yield was deleted from the POSIX pthread specification. sched_yield is the correct call to be used by all POSIX-compliant apps. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
Howard Chu wrote: > bernie@develer.com wrote: > >> Full_Name: Bernardo Innocenti >> Version: 2.2.26 to 2.3.5, possibly others >> OS: Linux >> URL: ftp://ftp.openldap.org/incoming/ >> Submission from: (NULL) (151.38.19.110) >> >> >> Calling sched_yield() makes slapd slow down considerably >> under load since kernel 2.6. >> >> See: >> >> http://www.ussg.iu.edu/hypermail/linux/kernel/0508.2/0516.html > > Thanks for pointing this out. I'm not sure why we ever used > sched_yield() in the first place, it was known to cause headaches on the > Solaris build as well. We should probably leave it as a last resort, and > prefer pthread_yield() instead. In glibc 2.3, pthread_yield() is just a stub for sched_yield(). I don't quite understand why OpenLDAP ever needs to poll with pthread_yield(). Instead of spinning on shared variables, locking should be done using pthread's synchronization primitives, most of which are implemented efficiently on Linux with futex(). If OpenLDAP must remain compatible with OSes where pthreads aren't available, locking can be abstracted in portability functions like sched_yield() was. -- // Bernardo Innocenti - Develer S.r.l., R&D dept. \X/ http://www.develer.com/
bernie@develer.com wrote: > I don't quite understand why OpenLDAP ever needs to poll with > pthread_yield(). Instead of spinning on shared variables, locking > should be done using pthread's synchronization primitives, most of > which are implemented efficiently on Linux with futex(). > If OpenLDAP must remain compatible with OSes where pthreads aren't > available, locking can be abstracted in portability functions like > sched_yield() was. This is not an issue of OpenLDAP compatibility, or OSes which do or don't support pthreads. sched_yield() is a standard feature of pthreads. It is neither optional nor "non-portable" and is used heavily in more software than just OpenLDAP. If the current Linux kernel hackers want to convince all the application developers out there to rewrite their software for the 2.6 kernel's benefit that's certainly their prerogative, but they should not be surprised if this decision is met with resistance. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
Howard Chu wrote: >> If OpenLDAP must remain compatible with OSes where pthreads aren't >> available, locking can be abstracted in portability functions like >> sched_yield() was. > > This is not an issue of OpenLDAP compatibility, or OSes which do or > don't support pthreads. sched_yield() > is a standard feature of pthreads. It is neither optional nor > "non-portable" and is used heavily in more software than just OpenLDAP. > If the current Linux kernel hackers want to convince all the application > developers out there to rewrite their software for the 2.6 kernel's > benefit that's certainly their prerogative, but they should not be > surprised if this decision is met with resistance. I, too, consider the old behavior of sched_yield() in 2.4 a better approximation of my personal idea of how yielding the CPU should be handled by a scheduler with dynamic and static priorities. However, the kernel hackers appear to think otherwise: http://marc.theaimsgroup.com/?l=linux-kernel&m=112433402319824&w=4 I don't know what leads to this conclusion, but at least it seems it's done deliberately and not by mistake. I'll ask out of curiosity. Regardless, a threaded server application such as slapd shouldn't ever need to call sched_yield(). The pthreads API provides plenty of synchronization primitives that don't require low-level tweaking of scheduling behavior and are therefore more portable and usually more efficient. Disclaimer: I know very little about slapd internals, so I may well have overseen something... -- // Bernardo Innocenti - Develer S.r.l., R&D dept. \X/ http://www.develer.com/
bernie@develer.com wrote: > Regardless, a threaded server application such as slapd > shouldn't ever need to call sched_yield(). The pthreads > API provides plenty of synchronization primitives that don't > require low-level tweaking of scheduling behavior and are > therefore more portable and usually more efficient. > > Disclaimer: I know very little about slapd internals, > so I may well have overseen something... Please try the OpenLDAP 2.3 release; I don't believe any performance issue exists here. -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
Howard Chu wrote: > bernie@develer.com wrote: > >> Regardless, a threaded server application such as slapd >> shouldn't ever need to call sched_yield(). The pthreads >> API provides plenty of synchronization primitives that don't >> require low-level tweaking of scheduling behavior and are >> therefore more portable and usually more efficient. >> >> Disclaimer: I know very little about slapd internals, >> so I may well have overseen something... > > Please try the OpenLDAP 2.3 release; I don't believe any performance > issue exists here. Thank you! Is anyone providing ready-made RPMs for Fedora? Otherwise, I'll have to build OpenLDAP from source and install it in /usr/local, which I tend to avoid to reduce maintenance overhead on the servers. -- // Bernardo Innocenti - Develer S.r.l., R&D dept. \X/ http://www.develer.com/
Bernie, on fedora-devel list you mention that this is fixed in 2.3.7. Can you point to the relevant CVS commit by which you make this statement. I don't see this issue mentioned in 2.3.7 CHANGES file
yusufg@outblaze.com wrote: > Bernie, on fedora-devel list you mention that this is fixed in 2.3.7. > Can you point to the relevant CVS commit by which you make this > statement. > I don't see this issue mentioned in 2.3.7 CHANGES file I didn't see Bernie's confirmation on this ITS. But there were a large number of changes between release 2.2.x and 2.3.1 that are not tracked in any CHANGES file (new features, miscellaneous fixes for problems discovered in HEAD, etc...). -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
Yusuf Goolamabbas wrote: > Bernie, on fedora-devel list you mention that this is fixed in 2.3.7. > Can you point to the relevant CVS commit by which you make this statement. > I don't see this issue mentioned in 2.3.7 CHANGES file Howard Chu told me the performance problem was mitigated by optimizing away most of the unnecessary calls to sched_yield(). -- // Bernardo Innocenti - Develer S.r.l., R&D dept. \X/ http://www.develer.com/
changed notes changed state Open to Closed
Bernardo Innocenti wrote: > Yusuf Goolamabbas wrote: > >> Bernie, on fedora-devel list you mention that this is fixed in 2.3.7. >> Can you point to the relevant CVS commit by which you make this statement. >> I don't see this issue mentioned in 2.3.7 CHANGES file > > Howard Chu told me the performance problem was mitigated by > optimizing away most of the unnecessary calls to sched_yield(). I've switched to OpenLDAP 2.3.11 and the performance problem is still there, as bad as ever. I must now wait 20s to see the output of "ls -l /home" under load. Please reopen this bug. -- // Bernardo Innocenti - Develer S.r.l., R&D dept. \X/ http://www.develer.com/
Howard Chu wrote: > Please try the OpenLDAP 2.3 release; I don't believe any performance > issue exists here. Sorry, I've just upgraded to 2.3.11 and the problem is still there. (see the audit trail of #3950 for more information). Is there a way to build slapd without threading? Or using a backend that doesn't require sched_yield()? This issue is killing a critical server of mine. Thank you for helping. -- // Bernardo Innocenti - Develer S.r.l., R&D dept. \X/ http://www.develer.com/
Bernardo Innocenti wrote: > Howard Chu wrote: > >> Please try the OpenLDAP 2.3 release; I don't believe any performance >> issue exists here. >> > > Sorry, I've just upgraded to 2.3.11 and the problem is still there. > (see the audit trail of #3950 for more information). > > Is there a way to build slapd without threading? There was, but I suspect a lot of the code in 2.3 will break without it now. > Or using a backend > that doesn't require sched_yield()? This issue is killing a critical > server of mine. > Try this patch for libldap_r/thr_posix.c; recompile libldap_r and relink slapd... RCS file: /repo/OpenLDAP/pkg/ldap/libraries/libldap_r/thr_posix.c,v retrieving revision 1.40 diff -u -r1.40 thr_posix.c --- thr_posix.c 17 Sep 2005 21:28:08 -0000 1.40 +++ thr_posix.c 25 Nov 2005 20:07:23 -0000 @@ -20,6 +20,11 @@ #include <ac/errno.h> +#if defined( HAVE_YIELDING_SELECT ) +#include <ac/socket.h> +#include <ac/time.h> +#endif + #include "ldap_pvt_thread.h" /* Get the thread interface */ #define LDAP_THREAD_IMPLEMENTATION #define LDAP_THREAD_RDWR_IMPLEMENTATION @@ -207,7 +212,11 @@ int ldap_pvt_thread_yield( void ) { -#if HAVE_THR_YIELD +#if HAVE_YIELDING_SELECT + struct timeval tv = {0,0}; + select( 0, NULL, NULL, NULL, &tv ); + return 0; +#elif HAVE_THR_YIELD return thr_yield(); #elif HAVE_PTHREADS == 10 -- -- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/
Howard Chu wrote: > Try this patch for libldap_r/thr_posix.c; recompile libldap_r and relink > slapd... IT WORKS! Everything is blazingly fast again, thank you! I've attached this patch to the RedHat bug I had opened. > > RCS file: /repo/OpenLDAP/pkg/ldap/libraries/libldap_r/thr_posix.c,v > retrieving revision 1.40 > diff -u -r1.40 thr_posix.c > --- thr_posix.c 17 Sep 2005 21:28:08 -0000 1.40 > +++ thr_posix.c 25 Nov 2005 20:07:23 -0000 > @@ -20,6 +20,11 @@ > > #include <ac/errno.h> > > +#if defined( HAVE_YIELDING_SELECT ) > +#include <ac/socket.h> > +#include <ac/time.h> > +#endif > + > #include "ldap_pvt_thread.h" /* Get the thread interface */ > #define LDAP_THREAD_IMPLEMENTATION > #define LDAP_THREAD_RDWR_IMPLEMENTATION > @@ -207,7 +212,11 @@ > int > ldap_pvt_thread_yield( void ) > { > -#if HAVE_THR_YIELD > +#if HAVE_YIELDING_SELECT > + struct timeval tv = {0,0}; > + select( 0, NULL, NULL, NULL, &tv ); > + return 0; > +#elif HAVE_THR_YIELD > return thr_yield(); > > #elif HAVE_PTHREADS == 10 > > > > -- // Bernardo Innocenti - Develer S.r.l., R&D dept. \X/ http://www.develer.com/
changed notes changed state Closed to Test
changed notes changed state Test to Suspended
changed notes changed state Suspended to Test moved from Incoming to Software Bugs
changed notes changed state Test to Release moved from Software Bugs to Build
changed notes changed state Release to Closed
moved from Build to Archive.Build
fixed in HEAD/re23 patched for all Linux, bug is only on kernel>=2.6