Re: (ITS#3855) recovery from EBADF error for slapd



Jason Townsend wrote:
>  It looks like this fix was actually done before I merged in 2.2.19
>  (I know, bad Jason for only just submitting this patch now), so at
>  the time I was running into the 100% CPU rather than an abnormal exit
>  of slapd which is what ITS 3400 provided. Either way I think having
>  slapd be able to recover from this on its own is better.

Well yes, ordinarily I would agree. But this is one of those "should 
never happen" situations and since we haven't got a reliable means of 
reproducing it, we really have no idea what it means when the situation 
does occur. The only time I can recall seeing it was with someone's 
custom back-perl script that was closing and dup'ing descriptors without 
mutexes. Their code was doing something like
    x = open(some file);
    close(fd);
    dup2( x, fd );
(I have no idea why.) There was a race condition: after the close, the 
descriptor could be immediately re-used by a call to accept() in another 
thread. When their dup2 then executed, they were stomping on the socket 
descriptor because they thought it still pointed at their flat file. 
(The fix is not to do the explicit close, since dup2 will do that 
automatically and atomically.) Anyway, in cases of severe programmer 
error like this, the connection table's consistency is totally shot and 
you cannot rely on it, so the safest thing is to actually bail out.
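
For illustration, here is a minimal sketch of the safer pattern, assuming 
the goal was simply to point an existing descriptor at a different file. 
The function name redirect_fd and its arguments are hypothetical, not 
taken from the script in question:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Point target_fd at the file named by path without ever leaving
 * target_fd closed in between. Returns 0 on success, -1 on failure
 * with errno set. */
int redirect_fd(const char *path, int target_fd)
{
    int x = open(path, O_RDONLY);
    if (x < 0)
        return -1;
    if (x == target_fd)
        return 0;               /* open() already landed on the target */

    /* No close(target_fd) here: dup2 closes the old target and installs
     * the new one in a single step, so another thread's accept() cannot
     * grab the descriptor number in between. */
    if (dup2(x, target_fd) < 0) {
        int saved = errno;
        close(x);
        errno = saved;
        return -1;
    }
    close(x);                   /* the temporary descriptor is no longer needed */
    return 0;
}
```

This only removes the window opened by the explicit close; code that 
shares descriptors between threads still needs its own locking.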

> > The patch looks interesting; probably it should break out of the
> > loop after it detects a single bad descriptor. (It is already
> > pretty rare to have one bad descriptor, what's the likelihood of
> > more than one?)
>
>  That would be easy enough to do. If there was more than one bad
>  descriptor though you'd have to iterate over the descriptors once for
>  each of them in that case.
>
> > I haven't looked closely at the code yet, does it work with
> > outbound connections too (e.g. syncrepl consumer)?
>
>  We don't currently use syncrepl so it's possible this patch needs to
>  be modified to take that into account... are those connections in the
>  same connection pool as incoming connections?

Yes, the consumer connections use the same connection pool, though the 
structures are filled a little bit differently.
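
On the earlier point about breaking out of the loop after the first bad 
descriptor, a rough sketch of such a scan could look like the following. 
This is not the code from the submitted patch; first_bad_fd and the flat 
array of descriptors are assumptions made for the example, and slapd's 
real connection table is more involved:

```c
#include <errno.h>
#include <fcntl.h>

/* Return the index of the first descriptor in fds[0..n-1] that the
 * kernel reports as invalid (EBADF), or -1 if all of them are open. */
int first_bad_fd(const int *fds, int n)
{
    for (int i = 0; i < n; i++) {
        /* F_GETFD is a cheap query that fails with EBADF on a
         * descriptor that is not open. */
        if (fcntl(fds[i], F_GETFD) == -1 && errno == EBADF)
            return i;           /* stop at the first bad one */
    }
    return -1;
}
```

If there were more than one bad descriptor, the caller would have to 
run the scan again after dealing with each one, as noted above.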

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support