[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#5687) corrupted double-linked list on shutdown



--On Saturday, September 06, 2008 1:12 AM +0000 quanah@zimbra.com wrote:

> --On Saturday, September 06, 2008 12:08 AM +0000 hyc@symas.com wrote:
>
>> ando@sys-net.it wrote:
>>> h.b.furuseth@usit.uio.no wrote:
>>>>> [quanah@freelancer tests]$ *** glibc detected ***
>>>>> /home/quanah/q/openldap-2.4.12/servers/slapd/.libs/lt-slapd: corrupted
>>>>> double-linked list: 0x000000000d28a430 ***
>>>> This message comes from glibc malloc.  Its data structures are
>>>> corrupted.
>>>>
>>>> I couldn't figure out much from the the valgrind output.
>>>> But you'll get more frequent mallocs with CPPFLAGS=-DSLAP_NO_SL_MALLOC,
>>>> if that code isn't rotted - maybe that'll bring out the problem in
>>>> another guise.  If that code still works, I don't remember.
>>>
>>> Here's my output after compiling with -DSLAP_NO_SL_MALLOC=1, running
>>> test039 under valgrind and suddenly killing the process:
>>>
>>> ==9657== Memcheck, a memory error detector.
>>> ==9657== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
>>> ==9657== Using LibVEX rev 1658, a library for dynamic binary
>>> translation. ==9657== Copyright (C) 2004-2006, and GNU GPL'd, by
>>> OpenWorks LLP. ==9657== Using valgrind-3.2.1, a dynamic binary
>>> instrumentation framework. ==9657== Copyright (C) 2000-2006, and GNU
>>> GPL'd, by Julian Seward et al. ==9657== For more details, rerun with: -v
>>> ==9657==
>>> @(#) $OpenLDAP: slapd 2.X (Sep  6 2008 01:35:19) $
>>> 	masarati@odino.intra.sys-net.it:/home/masarati/Lavoro/sysnet/Ldap/ldap-
>>> 	devel/servers/slapd slapd starting
>>> daemon: shutdown requested and initiated.
>>> slapd shutdown: waiting for 8 threads to terminate
>>> ==9657== Thread 3:
>>> ==9657== Invalid read of size 4
>>> ==9657==    at 0x807EF03: slap_listener (daemon.c:1754)
>>> ==9657==    by 0x807F9AC: slap_listener_thread (daemon.c:1997)
>>> ==9657==    by 0x820F3D4: ldap_int_thread_pool_wrapper (tpool.c:663)
>>> ==9657==    by 0xDEB46A: start_thread (in /lib/libpthread-2.5.so)
>>> ==9657==    by 0xD42DBD: clone (in /lib/libc-2.5.so)
>>> ==9657==  Address 0x415D3A8 is 32 bytes inside a block of size 164
>>> free'd ==9657==    at 0x4004FDA: free (vg_replace_malloc.c:233)
>>> ==9657==    by 0x82407C9: ber_memfree_x (memory.c:152)
>>> ==9657==    by 0x80A59E5: ch_free (ch_malloc.c:139)
>>> ==9657==    by 0x807EDF8: close_listeners (daemon.c:1703)
>>> ==9657==    by 0x808137F: slapd_daemon_task (daemon.c:2578)
>>> ==9657==    by 0xDEB46A: start_thread (in /lib/libpthread-2.5.so)
>>> ==9657==    by 0xD42DBD: clone (in /lib/libc-2.5.so)
>>> ==9657==
>>> ==9657== Invalid write of size 4
>>> ==9657==    at 0x807EF22: slap_listener (daemon.c:1759)
>>> ==9657==    by 0x807F9AC: slap_listener_thread (daemon.c:1997)
>>> ==9657==    by 0x820F3D4: ldap_int_thread_pool_wrapper (tpool.c:663)
>>> ==9657==    by 0xDEB46A: start_thread (in /lib/libpthread-2.5.so)
>>> ==9657==    by 0xD42DBD: clone (in /lib/libc-2.5.so)
>>> ==9657==  Address 0x415D3A4 is 28 bytes inside a block of size 164
>>> free'd ==9657==    at 0x4004FDA: free (vg_replace_malloc.c:233)
>>> ==9657==    by 0x82407C9: ber_memfree_x (memory.c:152)
>>> ==9657==    by 0x80A59E5: ch_free (ch_malloc.c:139)
>>> ==9657==    by 0x807EDF8: close_listeners (daemon.c:1703)
>>> ==9657==    by 0x808137F: slapd_daemon_task (daemon.c:2578)
>>> ==9657==    by 0xDEB46A: start_thread (in /lib/libpthread-2.5.so)
>>> ==9657==    by 0xD42DBD: clone (in /lib/libc-2.5.so)
>>> ==9657==
>>> ==9657== Invalid read of size 4
>>> ==9657==    at 0x807EFCD: slap_listener (daemon.c:1781)
>>> ==9657==    by 0x807F9AC: slap_listener_thread (daemon.c:1997)
>>> ==9657==    by 0x820F3D4: ldap_int_thread_pool_wrapper (tpool.c:663)
>>> ==9657==    by 0xDEB46A: start_thread (in /lib/libpthread-2.5.so)
>>> ==9657==    by 0xD42DBD: clone (in /lib/libc-2.5.so)
>>> ==9657==  Address 0x415D3A8 is 32 bytes inside a block of size 164
>>> free'd ==9657==    at 0x4004FDA: free (vg_replace_malloc.c:233)
>>> ==9657==    by 0x82407C9: ber_memfree_x (memory.c:152)
>>> ==9657==    by 0x80A59E5: ch_free (ch_malloc.c:139)
>>> ==9657==    by 0x807EDF8: close_listeners (daemon.c:1703)
>>> ==9657==    by 0x808137F: slapd_daemon_task (daemon.c:2578)
>>> ==9657==    by 0xDEB46A: start_thread (in /lib/libpthread-2.5.so)
>>> ==9657==    by 0xD42DBD: clone (in /lib/libc-2.5.so)
>>> daemon: accept(7) failed errno=9 (Bad file descriptor)
>>> ==9657==
>>> ==9657== Thread 5:
>>> ==9657== Invalid free() / delete / delete[]
>>> ==9657==    at 0x4004FDA: free (vg_replace_malloc.c:233)
>>> ==9657==    by 0x82407C9: ber_memfree_x (memory.c:152)
>>> ==9657==    by 0x80E522B: slap_sl_free (sl_malloc.c:451)
>>> ==9657==    by 0x80874AF: do_search (search.c:221)
>>> ==9657==    by 0x8084347: connection_operation (connection.c:1084)
>>> ==9657==    by 0x8084834: connection_read_thread (connection.c:1211)
>>> ==9657==    by 0x820F3D4: ldap_int_thread_pool_wrapper (tpool.c:663)
>>> ==9657==    by 0xDEB46A: start_thread (in /lib/libpthread-2.5.so)
>>> ==9657==    by 0xD42DBD: clone (in /lib/libc-2.5.so)
>>> ==9657==  Address 0x5595D60 is 0 bytes inside a block of size 29 free'd
>>> ==9657==    at 0x4004FDA: free (vg_replace_malloc.c:233)
>>> ==9657==    by 0x82407C9: ber_memfree_x (memory.c:152)
>>> ==9657==    by 0x80A59E5: ch_free (ch_malloc.c:139)
>>> ==9657==    by 0x81E1D64: rwm_op_cleanup (rwm.c:64)
>>> ==9657==    by 0x80976CD: slap_cleanup_play (result.c:341)
>>> ==9657==    by 0x8097DEA: send_ldap_response (result.c:522)
>>> ==9657==    by 0x809850E: slap_send_ldap_result (result.c:642)
>>> ==9657==    by 0x81855CE: ldap_back_op_result (bind.c:1823)
>>> ==9657==    by 0x8184C3D: ldap_back_dobind_int (bind.c:1486)
>>> ==9657==    by 0x8184D41: ldap_back_dobind (bind.c:1520)
>>> ==9657==    by 0x812C9F7: ldap_back_search (search.c:174)
>>> ==9657==    by 0x80F7DF4: glue_sub_search (backglue.c:340)
>>> ==9657==    by 0x80F8458: glue_op_search (backglue.c:467)
>>> ==9657==    by 0x80FACE3: overlay_op_walk (backover.c:657)
>>> ==9657==    by 0x80FAF18: over_op_func (backover.c:719)
>>> ==9657==    by 0x80FAF9C: over_op_search (backover.c:741)
>>> ==9657==    by 0x8087ADB: fe_op_search (search.c:366)
>>> ==9657==    by 0x80FAD63: overlay_op_walk (backover.c:667)
>>> ==9657==    by 0x80FAF18: over_op_func (backover.c:719)
>>> ==9657==    by 0x80FAF9C: over_op_search (backover.c:741)
>>> ==9657==
>>> ==9657== Invalid free() / delete / delete[]
>>> ==9657==    at 0x4004FDA: free (vg_replace_malloc.c:233)
>>> ==9657==    by 0x82407C9: ber_memfree_x (memory.c:152)
>>> ==9657==    by 0x80E522B: slap_sl_free (sl_malloc.c:451)
>>> ==9657==    by 0x80874D3: do_search (search.c:224)
>>> ==9657==    by 0x8084347: connection_operation (connection.c:1084)
>>> ==9657==    by 0x8084834: connection_read_thread (connection.c:1211)
>>> ==9657==    by 0x820F3D4: ldap_int_thread_pool_wrapper (tpool.c:663)
>>> ==9657==    by 0xDEB46A: start_thread (in /lib/libpthread-2.5.so)
>>> ==9657==    by 0xD42DBD: clone (in /lib/libc-2.5.so)
>>> ==9657==  Address 0x5595E60 is 0 bytes inside a block of size 29 free'd
>>> ==9657==    at 0x4004FDA: free (vg_replace_malloc.c:233)
>>> ==9657==    by 0x82407C9: ber_memfree_x (memory.c:152)
>>> ==9657==    by 0x80A59E5: ch_free (ch_malloc.c:139)
>>> ==9657==    by 0x81E1D90: rwm_op_cleanup (rwm.c:69)
>>> ==9657==    by 0x80976CD: slap_cleanup_play (result.c:341)
>>> ==9657==    by 0x8097DEA: send_ldap_response (result.c:522)
>>> ==9657==    by 0x809850E: slap_send_ldap_result (result.c:642)
>>> ==9657==    by 0x81855CE: ldap_back_op_result (bind.c:1823)
>>> ==9657==    by 0x8184C3D: ldap_back_dobind_int (bind.c:1486)
>>> ==9657==    by 0x8184D41: ldap_back_dobind (bind.c:1520)
>>> ==9657==    by 0x812C9F7: ldap_back_search (search.c:174)
>>> ==9657==    by 0x80F7DF4: glue_sub_search (backglue.c:340)
>>> ==9657==    by 0x80F8458: glue_op_search (backglue.c:467)
>>> ==9657==    by 0x80FACE3: overlay_op_walk (backover.c:657)
>>> ==9657==    by 0x80FAF18: over_op_func (backover.c:719)
>>> ==9657==    by 0x80FAF9C: over_op_search (backover.c:741)
>>> ==9657==    by 0x8087ADB: fe_op_search (search.c:366)
>>> ==9657==    by 0x80FAD63: overlay_op_walk (backover.c:667)
>>> ==9657==    by 0x80FAF18: over_op_func (backover.c:719)
>>> ==9657==    by 0x80FAF9C: over_op_search (backover.c:741)
>>> conn=3 op=1 ldap_back_retry: retrying URI="ldap://localhost:9011";
>>> DN="cn=manager,dc=example,dc=com"
>>> ==9657== Warning: invalid file descriptor -1 in syscall close()
>>> ==9657== Warning: invalid file descriptor -1 in syscall close()
>>> ==9657== Warning: invalid file descriptor -1 in syscall close()
>>> ==9657== Warning: invalid file descriptor -1 in syscall close()
>>> ==9657== Warning: invalid file descriptor -1 in syscall close()
>>> ==9657== Warning: invalid file descriptor -1 in syscall close()
>>> ==9657== Warning: invalid file descriptor -1 in syscall close()
>>> ==9657== Warning: invalid file descriptor -1 in syscall close()
>>> slapd stopped.
>>> ==9657==
>>> ==9657== ERROR SUMMARY: 7 errors from 5 contexts (suppressed: 36 from 1)
>>> ==9657== malloc/free: in use at exit: 1,127 bytes in 15 blocks.
>>> ==9657== malloc/free: 34,414 allocs, 34,403 frees, 12,376,432 bytes
>>> allocated.
>>> ==9657== For counts of detected errors, rerun with: -v
>>> ==9657== searching for pointers to 15 not-freed blocks.
>>> ==9657== checked 774,072 bytes.
>>>
>>> Apparently, the listener is freed while some thread is still using it.
>>
>> Since that can only occur during shutdown, I don't think it's really a
>> high  priority problem. More to the point, it will not cause any heap
>> corruption.  The serious problem is in thread 5, where rwm_op_cleanup
>> frees a DN that  do_search also tries to free. It looks like
>> rwm_op_cleanup is badly broken  here, it apparently should be checking to
>> make sure ros->ro_dn is != ros->r_dn  before freeing r_dn (and ndn).
>
> With the latest rwm revision, I still get the glibc corruption:
>
> PID=15912 - Search(500): base="o=Example,c=US" scope=sub
> filter="(cn=Alumni  Assoc Staff)" attrs=cn (more...).
>
> [quanah@tribes tests]$ *** glibc detected *** corrupted double-linked
> list:  0x0000003bf99316b8 ***
>
> when I hit control-C during the search.

Longer backtrace:

Loaded symbols for /lib64/libgcc_s.so.1
#0  0x0000003bf972e25d in raise () from /lib64/tls/libc.so.6
(gdb) bt
#0  0x0000003bf972e25d in raise () from /lib64/tls/libc.so.6
#1  0x0000003bf972fa5e in abort () from /lib64/tls/libc.so.6
#2  0x0000003bf97635e1 in __libc_message () from /lib64/tls/libc.so.6
#3  0x0000003bf9768a78 in malloc_consolidate () from /lib64/tls/libc.so.6
#4  0x0000003bf97698a9 in _int_malloc () from /lib64/tls/libc.so.6
#5  0x0000003bf976b682 in malloc () from /lib64/tls/libc.so.6
#6  0x0000002a956bc46d in ber_memalloc_x (s=4060, ctx=0x0) at memory.c:226
#7  0x0000002a956bc5b1 in ber_memrealloc_x (p=0x0, s=4060, ctx=0x0) at 
memory.c:304
#8  0x0000002a956ba5ef in ber_realloc (ber=0x7ce520, len=1) at io.c:155
#9  0x0000002a956ba49b in ber_write (ber=0x7ce520, buf=0x42001010 
"\002�\177@", len=1, nosos=0) at io.c:114
#10 0x0000002a956b8145 in ber_put_tag (ber=0x7ce520, tag=0, nosos=0) at 
encode.c:100
#11 0x0000002a956b877e in ber_put_int_or_enum (ber=0x7ce520, num=6, tag=2) 
at encode.c:284
#12 0x0000002a956b8933 in ber_put_int (ber=0x7ce520, num=6, tag=2) at 
encode.c:333
#13 0x0000002a956b99f7 in ber_printf (ber=0x7ce520, fmt=0x2a955a1629 "iti") 
at encode.c:774
#14 0x0000002a95574f68 in do_abandon (ld=0x81f9f0, origid=5, msgid=5, 
sctrls=0x0, sendabandon=1) at abandon.c:243
#15 0x0000002a95574c49 in ldap_abandon_ext (ld=0x81f9f0, msgid=5, 
sctrls=0x0, cctrls=0x0) at abandon.c:80
#16 0x0000002a974ac864 in ldap_back_cancel (lc=0x81f5d0, op=0x7bab00, 
rs=0x42002cc0, msgid=5, sendok=LDAP_BACK_DONTSEND) at bind.c:1607
#17 0x0000002a974a7d02 in ldap_back_search (op=0x7bab00, rs=0x42002cc0) at 
search.c:287
#18 0x00000000004ac45d in glue_sub_search (op=0x7bab00, rs=0x42002cc0, 
b0=0x42001740, on=0x6c1e70) at backglue.c:340
#19 0x00000000004aca5f in glue_op_search (op=0x7bab00, rs=0x42002cc0) at 
backglue.c:452
#20 0x00000000004afacf in overlay_op_walk (op=0x7bab00, rs=0x42002cc0, 
which=op_search, oi=0x6c1c90, on=0x6c1e70) at backover.c:657
#21 0x00000000004afd2b in over_op_func (op=0x7bab00, rs=0x42002cc0, 
which=op_search) at backover.c:719
#22 0x00000000004afdc1 in over_op_search (op=0x7bab00, rs=0x42002cc0) at 
backover.c:741
#23 0x0000000000435ea8 in fe_op_search (op=0x7bab00, rs=0x42002cc0) at 
search.c:366
#24 0x00000000004afb63 in overlay_op_walk (op=0x7bab00, rs=0x42002cc0, 
which=op_search, oi=0x702480, on=0x0) at backover.c:667
#25 0x00000000004afd2b in over_op_func (op=0x7bab00, rs=0x42002cc0, 
which=op_search) at backover.c:719
#26 0x00000000004afdc1 in over_op_search (op=0x7bab00, rs=0x42002cc0) at 
backover.c:741
#27 0x0000000000435865 in do_search (op=0x7bab00, rs=0x42002cc0) at 
search.c:217
#28 0x0000000000432839 in connection_operation (ctx=0x42002e20, 
arg_v=0x7bab00) at connection.c:1084
#29 0x0000000000432d75 in connection_read_thread (ctx=0x42002e20, 
argv=0x2b) at connection.c:1211
#30 0x0000002a95568cd0 in ldap_int_thread_pool_wrapper (xpool=0x6a3500) at 
tpool.c:663
#31 0x0000003bfa006137 in start_thread () from /lib64/tls/libpthread.so.0
#32 0x0000003bf97c7533 in clone () from /lib64/tls/libc.so.6


(gdb) info threads
  15 process 25502  0x0000003bfa00714b in pthread_join () from 
/lib64/tls/libpthread.so.0
  14 process 25532  0x0000003bfa008b3a in pthread_cond_wait@@GLIBC_2.3.2 () 
from /lib64/tls/libpthread.so.0
  13 process 25554  0x0000003bf97be5e2 in poll () from /lib64/tls/libc.so.6
  12 process 25555  0x0000003bf97be5e2 in poll () from /lib64/tls/libc.so.6
  11 process 25821  0x0000003bf97be5e2 in poll () from /lib64/tls/libc.so.6
  10 process 25834  0x0000003bf97be5e2 in poll () from /lib64/tls/libc.so.6
  9 process 25852  0x0000003bf97be5e2 in poll () from /lib64/tls/libc.so.6
  8 process 25869  0x0000003bf97be5e2 in poll () from /lib64/tls/libc.so.6
  7 process 25870  0x0000003bf97be5e2 in poll () from /lib64/tls/libc.so.6
  6 process 25871  0x0000003bf97be5e2 in poll () from /lib64/tls/libc.so.6
  5 process 25923  0x0000003bf97be5e2 in poll () from /lib64/tls/libc.so.6
  4 process 25925  0x0000003bf97be5e2 in poll () from /lib64/tls/libc.so.6
  3 process 25928  0x0000003bf97be5e2 in poll () from /lib64/tls/libc.so.6
  2 process 25931  0x0000003bf97be5e2 in poll () from /lib64/tls/libc.so.6
* 1 process 25820  0x0000003bf972e25d in raise () from /lib64/tls/libc.so.6


--Quanah

--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra ::  the leader in open source messaging and collaboration