Re: (ITS#3855) recovery from EBADF error for slapd
Jason Townsend wrote:
> It looks like this fix was done in before I merged in 2.2.19 actually
> (I know, bad Jason for only just submitting this patch now), so at
> the time I was running into the 100% CPU rather than an abnormal exit
> of slapd which is what ITS 3400 provided. Either way I think having
> slapd be able to recover from this on its own is better.
Well yes, ordinarily I would agree. But this is one of those "should
never happen" situations and since we haven't got a reliable means of
reproducing it, we really have no idea what it means when the situation
does occur. The only time I can recall seeing it was with someone's
custom back-perl script that was closing and dup'ing descriptors without
mutexes. Their code was doing something like
    x = open(some file);
    close(fd);
    dup2( x, fd );
(I have no idea why.) There was a race condition where the descriptor got
closed and was immediately re-used by a call to accept(). When their
dup2 executed, they were stomping on the socket descriptor because they
thought it still pointed at their flat file. (The fix is not to do the
explicit close, since dup2 will close the target descriptor automatically
and atomically.) Anyway, in cases like this of severe programmer error the
connection table consistency is totally shot, and you cannot rely on it,
so the safest thing is to actually bail out.
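For illustration, here is a minimal sketch of the safe version of that sequence. The helper name redirect_fd and the read-only open mode are assumptions made for the example; this is not code from slapd or from the back-perl script in question.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: make `fd` refer to the file at `path` without
 * ever leaving `fd` closed in between.
 * Returns fd on success, -1 on failure with errno set.
 */
int redirect_fd(const char *path, int fd)
{
    int x = open(path, O_RDONLY);
    if (x < 0)
        return -1;

    if (x == fd)
        return fd;      /* open() already handed back fd itself */

    /*
     * No close(fd) before this call. dup2() closes fd and re-points it
     * at x as a single step, so there is no window in which another
     * thread's accept() could be handed the same descriptor number.
     */
    if (dup2(x, fd) < 0) {
        int saved = errno;
        close(x);
        errno = saved;
        return -1;
    }

    close(x);           /* fd now refers to the file; x is redundant */
    return fd;
}
```

Something like redirect_fd("/some/flat/file", fd) then replaces the three-line sequence above; the path here is only a placeholder.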
> > The patch looks interesting; probably it should break out of the
> > loop after it detects a single bad descriptor. (It is already
> > pretty rare to have one bad descriptor, what's the likelihood of
> > more than one?)
>
> That would be easy enough to do. If there was more than one bad
> descriptor though you'd have to iterate over the descriptors once for
> each of them in that case.
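As a rough sketch of the "stop at the first bad descriptor" idea: the code below assumes a plain array of descriptors rather than slapd's real connection table, and the helper name first_bad_fd is made up for the example. It probes each descriptor with fcntl(fd, F_GETFD), which fails with EBADF when the descriptor is not open.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper: return the first descriptor in fds[0..n) that
 * the kernel reports as invalid (EBADF), or -1 if every one is valid.
 */
int first_bad_fd(const int *fds, int n)
{
    for (int i = 0; i < n; i++) {
        /* F_GETFD only reads the descriptor flags; it has no side effects. */
        if (fcntl(fds[i], F_GETFD) == -1 && errno == EBADF)
            return fds[i];  /* break out after a single bad descriptor */
    }
    return -1;
}
```

If more than one descriptor were bad, the caller would have to run the scan again after dealing with each one, which is the extra iteration mentioned above.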
>
> > I haven't looked closely at the code yet, does it work with
> > outbound connections too (e.g. syncrepl consumer)?
>
> We don't currently use syncrepl so it's possible this patch needs to
> be modified to take that into account... are those connections in the
> same connection pool as incoming connections?
Yes, the consumer connections use the same connection pool, though the
structures are filled a little bit differently.
--
-- Howard Chu
Chief Architect, Symas Corp. Director, Highland Sun
http://www.symas.com http://highlandsun.com/hyc
Symas: Premier OpenSource Development and Support