Re: (ITS#3855) recovery from EBADF error for slapd



Jason Townsend wrote:
> On Jul 11, 2005, at 7:17 PM, Howard Chu wrote:
>> Well yes, ordinarily I would agree. But this is one of those "should 
>> never happen" situations and since we haven't got a reliable means of 
>> reproducing it, we really have no idea what it means when the 
>> situation does occur. The only time I can recall seeing it was with 
>> someone's custom back-perl script that was closing and dup'ing 
>> descriptors without mutexes. Their code was doing something like
>>    x = open(some file);
>>    close(fd);
>>    dup2( x, fd );
>> (I have no idea why.) There was a race condition where the descriptor 
>> got closed and was immediately re-used by a call to accept(). When 
>> their dup2 executed, they were stomping on the socket descriptor 
>> because they thought it still pointed at their flat file. (The fix is 
>> not to do the explicit close, since dup2 will do that 
>> automatically and atomically.) Anyway, in cases like this of severe 
>> programmer error the connection table consistency is totally shot, 
>> and you cannot rely on it, so the safest thing is to actually bail out.
>
> I'd love to find the problem that is causing the bad file descriptor 
> in the first place... but in the absence of that fix it seemed better 
> to take the approach of this patch than do nothing. The script I used 
> to reproduce this is here:
>
> http://www.opendarwin.org/~jtownsend/bindstress.pl
>
> If you start up about 6 instances of that script it is a pretty good 
> torture test. It just runs ldapwhoami over and over with no delay in 
> between using random DNs to bind as. It assumes all the passwords are 
> set to the same thing. There are a few parameters at the top of the 
> script that can be adjusted depending on the particular LDAP server in 
> use. Typically I had at least 2000 user records when doing this test.
>
> It's entirely possible that this problem is specific to the version of 
> OpenLDAP included with Mac OS X as we have some changes in there to 
> support Password Server authentication, but we were never able to 
> track down a problem within that code which was causing EBADF errors.

If your slapd has to talk to an external Password Server to validate 
these Binds, then I'm going to have to place the blame on that 
communication step. I've been unable to reproduce this EBADF situation 
on the plain OpenLDAP code.
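
For reference, the descriptor-replacement fix described in the quoted text 
above (drop the explicit close and let dup2 replace the target) can be 
sketched roughly as follows. This is only an illustration in Python, not the 
back-perl script in question; the helper name redirect_fd and the file names 
are hypothetical.

```python
import os
import tempfile


def redirect_fd(path, target_fd):
    """Re-point target_fd at `path` without an explicit close(target_fd).

    Hypothetical helper for illustration only.
    """
    new_fd = os.open(path, os.O_RDONLY)
    try:
        # dup2() closes target_fd if it is open and re-points it in one step,
        # so the number is never free for accept() or open() in another thread.
        os.dup2(new_fd, target_fd)
    finally:
        # The temporary descriptor is not needed once it has been duplicated.
        if new_fd != target_fd:
            os.close(new_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old_path = os.path.join(tmp, "old.txt")
        new_path = os.path.join(tmp, "new.txt")
        for p, data in ((old_path, b"old"), (new_path, b"new")):
            with open(p, "wb") as f:
                f.write(data)
        fd = os.open(old_path, os.O_RDONLY)
        redirect_fd(new_path, fd)
        print(os.read(fd, 16))  # data now comes from new.txt
        os.close(fd)
```

With a separate close(fd) before the dup2, there is a window in which another 
thread can be handed the same descriptor number, which is the race described 
above.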

-- 
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/