
Re: 2.3.18 not stable on Solaris





--On Friday, January 20, 2006 11:21 AM -0800 Quanah Gibson-Mount <quanah@stanford.edu> wrote:



--On Friday, January 20, 2006 2:16 PM -0500 Aaron Richton
<richton@nbcs.rutgers.edu> wrote:

How reliably are you reproducing this with, say, test008? (I'm not sure
what your definition of "under load" is in this case.) I haven't seen this
one quite yet, but obviously I take Solaris issues kind of seriously...

I had it hit two of my production servers last night within hours of upgrading to 2.3.18, which of course was disconcerting.

I set up 2.3.18 on a test server, and ran slamd against it, where I was
able to recreate it multiple times a minute (sometimes more than once a
second).  Prior to 2.3.18, I had *never* hit it running a slamd job.  I
had a patch from Howard in place to prevent slapd from shutting down when
it occurred (it's close to the fix in HEAD) so that I could log how often
it happened.

Just to follow up, this problem is exacerbated by having multiple CPUs. It appears that the Solaris kernel is freeing fds after select() is entered, which is why error 89 occurs. In any case, the fix in HEAD (and now RE_23, I see) is sufficient to resolve it.
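
For anyone wanting to picture the general approach, here is a rough sketch in Python of a select() wrapper that counts and retries on an error instead of shutting down. This is not the slapd patch or the fix in HEAD. The names tolerant_select, select_fn, max_retries and error_counts are all made up for illustration, and the retry limit is an arbitrary assumption.

```python
import collections
import select


def tolerant_select(rlist, timeout, select_fn=select.select,
                    max_retries=5, error_counts=None):
    """Call select(), counting and retrying on OSError instead of aborting.

    Hypothetical helper for illustration only. select_fn is injectable
    so the failure path can be exercised without a real kernel race.
    """
    if error_counts is None:
        error_counts = collections.Counter()
    for attempt in range(max_retries + 1):
        try:
            readable, _, _ = select_fn(rlist, [], [], timeout)
            return readable, error_counts
        except OSError as exc:
            # Record how often each errno shows up, then try again.
            error_counts[exc.errno] += 1
            if attempt == max_retries:
                # Give up only after the retry budget is exhausted.
                raise
```

The counter plays the role of the logging I described: it shows how often the error fires while the caller keeps running.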


--Quanah

--
Quanah Gibson-Mount
Principal Software Developer
ITSS/Shared Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html