
Re: (ITS#6310) Slapd with pcache crashes under load



> hyc@symas.com wrote:
>> I haven't touched back-sql in quite a while, and don't have a suitable
>> config for testing this. I suggest you run under valgrind and see what
>> errors it reports.
>
> Hello,
>
> I have run the server through valgrind. It crashed on the same structure,
> but at a different location. The gdb session can be found here:
> http://purgatory.spnet.net/~karavelov/attr_list/gdb
>
> The output from valgrind can be found here:
> http://purgatory.spnet.net/~karavelov/attr_list/vg
>
> In it, the lines up to 2833 are output at server startup. You can find
> the address 0x500000000 (taken from the gdb session) around line 3180.
>
> I didn't see anything interesting in the log file.
>
> I hope this info helps. If I can gather more information, just let me
> know.

Thanks for collecting this info.  The valgrind output could be of some
use, but unfortunately I don't have time right now to set up a working
RDBMS and extensively debug things.  I'll keep this on my todo list.

Please re-run valgrind with --num-callers=30 or more, because in some
cases the errors occur in functions nested too deeply to tell whether the
issue is caused by garbage fed by slapd/back-sql or by errors inside the
RDBMS/ODBC layers.  The fact that valgrind systematically complains about
RDBMS/ODBC internals reading past the end of memory chunks malloc'ed by
slapd could be related to passing non-NUL-terminated bervals that are
treated as strings.  A longer call stack could help track down those
occurrences.  However, those issues should not be critical, since there
are no invalid writes.
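To illustrate the suspected problem: a berval carries an explicit length,
and its data is not guaranteed to end with a NUL byte, so any code that
hands bv_val to a string API (strlen(), or an ODBC call told the argument
is NUL-terminated) can read past the end of the allocation, which is
exactly the kind of invalid read valgrind reports.  The sketch below is a
minimal illustration only: struct berval is a simplified re-declaration
of the liblber type (size_t instead of ber_len_t), and berval_to_cstr()
is a hypothetical helper, not an existing slapd or back-sql function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for liblber's struct berval: length + pointer.
 * bv_val is NOT guaranteed to be NUL-terminated. */
struct berval {
    size_t bv_len;
    char *bv_val;
};

/* Hypothetical helper: return a malloc'ed, NUL-terminated copy of bv
 * that is safe to pass to APIs expecting a C string.  The caller must
 * free() the result.  Returns NULL if allocation fails. */
static char *berval_to_cstr(const struct berval *bv)
{
    char *s = malloc(bv->bv_len + 1);   /* room for the terminator */
    if (s == NULL)
        return NULL;
    memcpy(s, bv->bv_val, bv->bv_len);  /* copy exactly bv_len bytes */
    s[bv->bv_len] = '\0';               /* terminate explicitly */
    return s;
}
```

Calling strlen() directly on a 3-byte, unterminated bv_val would keep
reading until it happened to find a zero byte; the copy above bounds the
read by bv_len instead.  The alternative is to pass the explicit length
to the consumer rather than relying on termination.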

Also, you should walk through the list of attributes being returned, to
provide a hint about whether back-sql is computing a broken attribute
list.  Along the lines of your current gdb session, you should get to
frame #5, refresh_merge() in pcache.c, and print *e->e_attrs,
*e->e_attrs->a_desc, *e->e_attrs->a_vals[0]; then move to
e->e_attrs->a_next and repeat the prints to the end of the list.  The fact
that you get a value of "a" equal to 0x500000000 looks definitely odd to
me, since that attr list should result from be_entry_get_rw(), which in
turn should collect it from the local database.  Unless valgrind reveals
some oddity in back-sql, the behavior you notice should not depend on the
specific remote database you're using, but rather on the local one.
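The walk described above amounts to the following loop: print each
attribute's name and first value, then follow a_next until it is NULL.
This is a minimal sketch under stated assumptions: the structs are
simplified stand-ins that keep only the fields the gdb prints touch
(a_desc, a_vals, a_next, and the ad_cname name), and dump_attrs() is a
hypothetical debugging helper, not part of slapd.  A corrupted link such
as 0x500000000 would show up as the point where the output stops making
sense (or the loop faults), which is what the manual gdb walk is meant to
locate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for OpenLDAP types; the real structs have more
 * fields (e.g. a_nvals, a_numvals, a_flags). */
struct berval {
    size_t bv_len;
    char *bv_val;
};

typedef struct AttributeDescription {
    struct berval ad_cname;        /* attribute name, e.g. "cn" */
} AttributeDescription;

typedef struct Attribute {
    AttributeDescription *a_desc;
    struct berval *a_vals;         /* ends with an entry whose bv_val is NULL */
    struct Attribute *a_next;
} Attribute;

/* Hypothetical helper: mirror the gdb session by printing each
 * attribute's name and first value, then following a_next.
 * Returns the number of attributes visited. */
static int dump_attrs(const Attribute *a, FILE *out)
{
    int n = 0;
    for (; a != NULL; a = a->a_next) {
        const struct berval *name = &a->a_desc->ad_cname;
        const struct berval *v0 = &a->a_vals[0];
        /* "%.*s" prints exactly the given length, so bervals need no NUL */
        fprintf(out, "#%d %.*s: %.*s\n", n,
                (int)name->bv_len, name->bv_val,
                v0->bv_val ? (int)v0->bv_len : 0,
                v0->bv_val ? v0->bv_val : "");
        n++;
    }
    return n;
}
```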

p.