Re: slapd quits under heavy load



On 3/14/06, Quanah Gibson-Mount <quanah@stanford.edu> wrote:
>
>
> --On Tuesday, March 14, 2006 10:19 AM -0800 Herb Hrowal <hhrowal@gmail.com>
> wrote:
>
> > I'm having a problem where slapd decides to shut itself down under
> > heavy load. I've traced the problem to this section of code in
> > daemon.c.
> >
> >               switch(ns = SLAP_EVENT_WAIT(tvp)) {
> >               case -1: {      /* failure - try again */
> >                               int err = sock_errno();
> >
> >                               if( err == EBADF
> ># ifdef WSAENOTSOCK
> >                                       /* you'd think this would be EBADF */
> >                                       || err == WSAENOTSOCK
> ># endif
> >                               ) {
> >                                       if (++ebadf < SLAPD_EBADF_LIMIT)
> >                                               continue;
> >                               }
> >
> >                               if( err != EINTR ) {
> >                                       Debug( LDAP_DEBUG_CONNS,
> >                                               "daemon: select failed (%d): %s\n",
> >                                               err, sock_errstr(err), 0 );
> >                                       slapd_shutdown = 2;
> >                               }
> >                       }
> >                       continue;
> >
> > SLAPD_EBADF_LIMIT is defined as 16.
> >
> > My question is, why was 16 chosen for this value? Is there any reason
> > for us not to either increase this limit or remove this section
> > altogether? We would like to get to the point where slapd doesn't
> > silently stop running.
>
> What version of OpenLDAP are you running?  What operating system are you
> running under?
>
> That particular bit of code was to address a *bug* in the /Solaris Kernel/.
> Basically, that clause is only going to get triggered when there are
> serious problems with your OS.
>
> --Quanah
>
> --
> Quanah Gibson-Mount
> Principal Software Developer
> ITS/Shared Application Services
> Stanford University
> GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html
>

We're running OpenLDAP 2.2.23 under Solaris 8 Generic_108528-26.
However, the section of code above was copied from OpenLDAP 2.3.17 and
is virtually identical to the same section in 2.2.23. We run on
various hardware configurations, including V210, 280r, and 440, and they
all exhibit the same behaviour.
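
To make the control flow of that branch concrete, here is a minimal
stand-alone model of it. This is a sketch, not the actual daemon.c code:
the names would_shut_down and failures_until_shutdown are made up for
illustration, the WSAENOTSOCK case is left out, and it assumes the
counter is never reset between failures (the excerpt does not show
where ebadf is cleared).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define EBADF_LIMIT 16  /* mirrors SLAPD_EBADF_LIMIT from the excerpt */

/*
 * Hypothetical model of the "case -1" branch: returns 1 if the daemon
 * would set slapd_shutdown for this error, 0 if it would just retry.
 * *ebadf is the running count of EBADF failures.
 */
static int would_shut_down(int err, int *ebadf)
{
    assert(ebadf != NULL);

    if (err == EBADF) {
        if (++(*ebadf) < EBADF_LIMIT)
            return 0;       /* under the limit: retry the wait */
        /* limit reached: fall through to the generic check below */
    }
    return err != EINTR;    /* everything except EINTR is fatal */
}

/*
 * Feeds the same error to the model repeatedly (assumption: no
 * successful wait in between, so the counter is never cleared).
 * Returns the 1-based failure that triggers shutdown, or 0 if none
 * of the first max_tries failures does.
 */
int failures_until_shutdown(int err, int max_tries)
{
    int ebadf = 0;

    for (int i = 1; i <= max_tries; i++) {
        if (would_shut_down(err, &ebadf))
            return i;
    }
    return 0;
}
```

Under those assumptions, failures_until_shutdown(EBADF, 100) returns 16,
EINTR never triggers a shutdown, and any other error triggers it on the
first failure. Also note that in the excerpt the failure is only
reported through Debug() at LDAP_DEBUG_CONNS, which would probably
explain why the shutdown looks silent unless that debug level is
enabled.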

What is the bug in the kernel that you were trying to work around? Is
there a better way to address it? Does it even exist in the latest
patches?

Our workaround for slapd quitting is to have a watchdog process
relaunch it whenever it quits, but we would like to move away from
that method of keeping it running.
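
For context, the watchdog amounts to something like the following
sketch. It is illustrative only, not our actual script: run_once and
supervise are hypothetical names, and it assumes the supervised program
stays in the foreground so that waitpid() sees the real exit.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Start argv[0] once and wait for it to finish. Returns its exit
 * status, 128 + signal number if it was killed by a signal, or -1 if
 * it could not be started or waited for.
 */
static int run_once(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);         /* exec failed */
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;      /* real waitpid failure */
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}

/*
 * Relaunch the program until it exits cleanly (status 0) or it has
 * been launched max_launches times. Returns the number of launches.
 */
int supervise(char *const argv[], int max_launches, unsigned delay_sec)
{
    int launches = 0;

    while (launches < max_launches) {
        int rc = run_once(argv);

        launches++;
        if (rc == 0)
            break;          /* clean exit: stop supervising */
        fprintf(stderr, "watchdog: %s ended with status %d\n", argv[0], rc);
        if (launches < max_launches)
            sleep(delay_sec);   /* brief pause before relaunching */
    }
    return launches;
}
```

A caller would pass the slapd command line as argv. As far as I know,
slapd only stays attached to the invoking process when it is started
with -d (even -d 0), so a watchdog of this shape would need that option.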

Herb