Re: (ITS#3855) recovery from EBADF error for slapd



On Jul 11, 2005, at 7:17 PM, Howard Chu wrote:
> Well yes, ordinarily I would agree. But this is one of those  
> "should never happen" situations and since we haven't got a  
> reliable means of reproducing it, we really have no idea what it  
> means when the situation does occur. The only time I can recall  
> seeing it was with someone's custom back-perl script that was  
> closing and dup'ing descriptors without mutexes. Their code was  
> doing something like
>    x = open(some file);
>    close(fd);
>    dup2( x, fd );
> (I have no idea why.) There is a race condition where the  
> descriptor got closed and was immediately re-used by a call to  
> accept(). When their dup2 executed, they were stomping on the  
> socket descriptor because they thought it still pointed at their  
> flat file. (The fix is not to do the explicit close, since dup/dup2  
> will do that automatically and atomically.) Anyway, in cases like  
> this of severe programmer error the connection table consistency is  
> totally shot, and you cannot rely on it, so the safest thing is to  
> actually bail out.
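
For reference, here is a minimal C sketch of the fix described above: re-point a descriptor with dup2() alone, with no explicit close() beforehand. The helper name redirect_fd and the open() flags are assumptions made for illustration; this is not code from slapd or from the back-perl script in question.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make `fd` refer to the file at `path`.
 * Returns fd on success, -1 on failure. */
int redirect_fd(const char *path, int fd)
{
    int x = open(path, O_RDWR | O_CREAT, 0600);
    if (x < 0)
        return -1;
    if (x == fd)            /* open() already returned the target number */
        return fd;

    /* No close(fd) here: dup2() closes fd and re-points it in one step,
     * so another thread's accept() cannot be handed that descriptor
     * number in between. */
    if (dup2(x, fd) < 0) {
        close(x);
        return -1;
    }
    close(x);               /* the temporary descriptor is no longer needed */
    return fd;
}
```

With the racy version, a close(fd) followed by dup2(x, fd) leaves a window in which fd is free for any other thread to receive from open(), socket() or accept().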

I'd love to find the problem that is causing the bad file descriptor  
in the first place... but in the absence of that fix it seemed better  
to take the approach of this patch than do nothing. The script I used  
to reproduce this is here:

http://www.opendarwin.org/~jtownsend/bindstress.pl

If you start up about 6 instances of that script it is a pretty good
torture test. It just runs ldapwhoami over and over with no delay in
between, binding as a random DN each time. It assumes all the
passwords are set to the same thing. There are a few parameters at the
top of the script that can be adjusted depending on the particular
LDAP server in use. Typically I had at least 2000 user records when
doing this test.
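
For anyone who cannot fetch the script, the loop it performs can be approximated roughly as below. This is a sketch in C rather than the Perl original, and the DN layout, function names and parameters are all assumptions; only the ldapwhoami options used (-x, -H, -D, -w) are standard ones.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical DN layout; adjust to the directory under test. */
int make_dn(char *buf, size_t len, int user_index)
{
    return snprintf(buf, len, "uid=user%d,cn=users,dc=example,dc=com",
                    user_index);
}

/* Run ldapwhoami once with a simple bind.
 * Returns its exit status, or -1 if it could not be run. */
int bind_once(const char *uri, const char *dn, const char *password)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execlp("ldapwhoami", "ldapwhoami", "-x", "-H", uri,
               "-D", dn, "-w", password, (char *)NULL);
        _exit(127);         /* only reached if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Bind `iterations` times as randomly chosen users, with no delay
 * between attempts. Returns the number of failed binds. */
int stress(const char *uri, const char *password,
           int num_users, int iterations)
{
    char dn[256];
    int failures = 0;
    for (int i = 0; i < iterations; i++) {
        make_dn(dn, sizeof dn, rand() % num_users);
        if (bind_once(uri, dn, password) != 0)
            failures++;
    }
    return failures;
}
```

Something like stress("ldap://localhost", "secret", 2000, 100000), run from several processes at once, would correspond to the setup described above; the URI and password there are placeholders.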

It's entirely possible that this problem is specific to the version  
of OpenLDAP included with Mac OS X as we have some changes in there  
to support Password Server authentication, but we were never able to  
track down a problem within that code which was causing EBADF errors.

-Jason