
Re: calloc failure




"Kurt D. Zeilenga" wrote:
> 
> At 07:34 PM 8/31/99 -0700, Yuri Rabover wrote:
> >I am using the latest OpenLDAP, 1.2.6 on Solaris 2.6. I am testing
> >the performance and robustness aspects to see whether it'd satisfy
> >our requirements. During one of the tests I was running 2 ldapadd
> >commands adding about 100 entries each. It reliably causes the server
> >to exit, sometimes with the following diagnostics:
> 
> Try OPENLDAP_REL_ENG_1_2 (available via AnonCVS
> http://www.openldap.org/software/repo.html).  It includes
> a number of memory management fixes that are being tested
> for release.
> 

I tried it with the same results.

> >calloc of 709707071 elems of 4 bytes failed
> 
> That strikes me as a bit large.  ~2.5 GB.   You might
> insert an abort in ch_malloc.c where this message is
> printed from such that you can obtain a stack back
> trace from the core dump.
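
For reference, the change Kurt suggests boils down to something like the
sketch below. This is my paraphrase of slapd's ch_calloc wrapper, not the
exact 1.2 source (the real code logs through the Debug() macro); the point
is just to trade the clean exit for a core dump so the failing call site
shows up in the back trace:

    #include <stdio.h>
    #include <stdlib.h>

    /* sketch of slapd's ch_calloc wrapper with the abort() added */
    void *
    ch_calloc( unsigned long nelem, unsigned long size )
    {
            void *p;

            if ( (p = calloc( nelem, size )) == NULL ) {
                    fprintf( stderr,
                        "calloc of %lu elems of %lu bytes failed\n",
                        nelem, size );
                    abort();   /* was exit( 1 ); abort() leaves a core */
            }

            return p;
    }

With that in place, loading the core into gdb (or dbx on Solaris) and
typing "where" gives the trace.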

As Howard Chu noticed, this number is actually an ASCII string containing
pieces of the directory data, which means memory is being overwritten.
I managed to obtain a stack trace from one of the crashes: it dies in
t_delete, which in my experience points to a locking window in Solaris
multi-threaded programs. Whenever the arena is corrupted, the Solaris MT
malloc usually ends up crashing in that function while trying to allocate
a new chunk.
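
Incidentally, the bogus size itself shows the corruption: 709707071 is
0x2A4D453F, i.e. the four printable bytes '*' 'M' 'E' '?' when read
most-significant byte first, which looks a lot more like stray text than
an element count. A throwaway check (illustrative only, nothing
OpenLDAP-specific):

    #include <stdio.h>

    int main( void )
    {
            unsigned long n = 709707071UL;  /* the "elems" value from the log */

            /* dump the four low-order bytes, most significant first
             * (the order they would sit in memory on big-endian SPARC) */
            printf( "0x%08lx = '%c' '%c' '%c' '%c'\n", n,
                (int)((n >> 24) & 0xff), (int)((n >> 16) & 0xff),
                (int)((n >> 8) & 0xff), (int)(n & 0xff) );

            return 0;
    }

That prints: 0x2a4d453f = '*' 'M' 'E' '?'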

To check whether this guess is right, I rebuilt OpenLDAP with
--without-threads and ran my tests again. Everything worked like a charm,
with the obvious performance hit that everything is now serialized. It
would be very nice if somebody familiar with the code could analyze the
locking scheme; for now I have to run in single-threaded mode. Sigh...

Thanks,
		Yuri Rabover
		3Cube, Inc.


> Haven't tested BDB 2.7.7 yet.

Same effect with GDBM, BTW.