Re: (ITS#6200)
Sorry about the stripped trace. I forgot that the install procedure
always strips the binaries...
Okay, with our stress profile it takes ~36 hours to fail. I always
start with a clean db rebuild before each run. Each failure produces
the same traceback:
(gdb) where
#0 0x00b97410 in __kernel_vsyscall ()
#1 0x00471d80 in raise () from /lib/libc.so.6
#2 0x00473691 in abort () from /lib/libc.so.6
#3 0x0046b1fb in __assert_fail () from /lib/libc.so.6
#4 0x0808d532 in ch_malloc (size=4436335) at ch_malloc.c:57
#5 0x08079ad2 in entry_encode (e=0x3a3dac0, bv=0x3a3d9b0) at entry.c:742
#6 0x0815240e in bdb_id2entry_put (be=0x3a3dca0, tid=0xbc6f7378, e=0x3a3dac0, flag=0) at id2entry.c:54
#7 0x08152508 in hdb_id2entry_update (be=0x3a3dca0, tid=0xbc6f7378, e=0x3a3dac0) at id2entry.c:90
#8 0x08106374 in hdb_modify (op=0xdabbc28, rs=0x3a3f0e4) at modify.c:611
#9 0x080ea38e in overlay_op_walk (op=0xdabbc28, rs=0x3a3f0e4, which=op_modify, oi=0x8be2788, on=0x0) at backover.c:669
#10 0x080ea543 in over_op_func (op=0xdabbc28, rs=0x3a3f0e4, which=op_modify) at backover.c:721
#11 0x080ea60b in over_op_modify (op=0xdabbc28, rs=0x3a3f0e4) at backover.c:755
#12 0x08089151 in fe_op_modify (op=0xdabbc28, rs=0x3a3f0e4) at modify.c:301
#13 0x08088b90 in do_modify (op=0xdabbc28, rs=0x3a3f0e4) at modify.c:175
#14 0x0806be8f in connection_operation (ctx=0x3a3f1d0, arg_v=0xdabbc28) at connection.c:1115
#15 0x0806c3cf in connection_read_thread (ctx=0x3a3f1d0, argv=0x1a) at connection.c:1251
#16 0x081d8fa9 in ldap_int_thread_pool_wrapper (xpool=0x8b941b0) at tpool.c:685
#17 0x0043749b in start_thread () from /lib/libpthread.so.0
#18 0x0051a42e in clone () from /lib/libc.so.6
---
Tracy Stenvik
University Computing Services 354843. University of Washington
email: imf@u.washington.edu voice: (206) 685-3344
On Thu, 23 Jul 2009, Howard Chu wrote:
> imf@u.washington.edu wrote:
>> The changes made to 2.4.17 seem to have fixed the crashes in the caching
>> module. Thanks for that.
>>
>> We still are able to crash 2.4.17, however. It only happens after a heavy
>> load is placed on the producer for >24 hours continuously. Unfortunately,
>> we've not been able to get good tracebacks. They all look like this,
>>
>> (gdb) where
>> #0 0x00869410 in __kernel_vsyscall ()
>> #1 0x00390d80 in raise () from /lib/libc.so.6
>> #2 0x00392691 in abort () from /lib/libc.so.6
>> #3 0x0038a1fb in __assert_fail () from /lib/libc.so.6
>> #4 0x0808d532 in malloc ()
>> #5 0x0822c93f in ?? ()
>> #6 0x0822c933 in ?? ()
>> #7 0x00000039 in ?? ()
>> #8 0x0822c908 in ?? ()
>> #9 0x00000000 in ?? ()
>>
>> The producer slowly grows its memory footprint. I can't tell if it's from
>> just normal operations or memory leaks. I suspect it's a little of both.
>> The end result, as you can see from the core above, is that there's likely
>> some corrupted (or unfreed) memory somewhere. Sorry I can't nail it down
>> further.
>
> There's not enough information here.
>
> An assert is always accompanied by an error message on stderr; we need to see
> the actual error message.
>
> There are no symbols in the stack trace pertaining to slapd itself. You seem
> to be running a stripped binary. Please provide a trace using a non-stripped
> binary that was compiled with -g (debug symbols enabled).
>>
>> The load profile that we placed on the server is documented in my prior
>> report. See above.
>
>
> --
> -- Howard Chu
> CTO, Symas Corp. http://www.symas.com
> Director, Highland Sun http://highlandsun.com/hyc/
> Chief Architect, OpenLDAP http://www.openldap.org/project/
>