[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#7587) slapd crashes when using pcache overlay applied to a translucent proxy



amoneger@cisco.com wrote:
> Full_Name: Alex
> Version: 2.4.35
> OS: Centos 6.3 (2.6.32-279.el6.x86_64 #1 SMP)
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (64.103.25.102)
>
>
> When using the pcache overlay over a translucent proxy, the slapd daemon crashes
> after the second LDAP request which misses the cache. For example, the following
> requests will trigger the issue. The important part is to miss the cache.
> Assuming nothing is cached for the aaaa and bbbb uid, the following request will
> trigger the issue (172.16.206.156 being the openldap server):

Thanks for the report, this is now fixed in git master.

> ldapsearch -x -H ldap://172.16.206.156 -b "ou=yyyy,o=xxxx" -LLL "uid=aaaa" uid
> st
> ldapsearch -x -H ldap://172.16.206.156 -b "ou=yyyy,o=xxxx" -LLL "uid=bbbb" uid
> st
>
> Whether aaaa and bbbb exist or not does not matter.
>
> The following config is used:
>
> include         /usr/local/etc/openldap/schema/core.schema
> include         /usr/local/etc/openldap/schema/cosine.schema
> include         /usr/local/etc/openldap/schema/inetorgperson.schema
> include         /usr/local/etc/openldap/schema/misc.schema
> include         /usr/local/etc/openldap/schema/nis.schema
>
> moduleload pcache.la
> moduleload translucent.la
>
> database        bdb
> suffix          "o=xxxx"
> #checkpoint     1024 15
> rootdn          "uid=amoneger,ou=yyyy,o=xxxx"
> overlay translucent
> translucent_local uidNumber,gidNumber,homeDirectory,loginShell
> translucent_strict
> rootdn          "uid=amoneger,ou=yyyy,o=xxxx"
> uri             ldap://zzzz/
> #tls ldaps tls_reqcert=demand
> tls_cacert=/usr/local/etc/openldap/certs/Cisco_ca_chain
> overlay pcache
> pcache bdb 10000 1 50 100
> pcacheAttrset 0 *
> pcacheTemplate (uid=) 0 3600
> pcacheBind (uid=) 0 1800 sub ou=yyyy,o=xxxx
> pcacheOffline TRUE
> pcachePersist TRUE
> pcacheValidate FALSE
> directory /var/cache/ldap
> cachesize 1000
> index pcacheQueryid                     eq
>
> The crash seems to be caused by a SIGABRT which is raised by libc free() due to
> a double free. Here is the traceback:
> Breakpoint 2, 0x000000312aa33f10 in abort () from /lib64/libc.so.6
> (gdb) c
> Continuing.
>
> Program received signal SIGABRT, Aborted.
> 0x000000312aa328a5 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x000000312aa328a5 in raise () from /lib64/libc.so.6
> #1  0x000000312aa34085 in abort () from /lib64/libc.so.6
> #2  0x000000312aa707b7 in __libc_message () from /lib64/libc.so.6
> #3  0x000000312aa760e6 in malloc_printerr () from /lib64/libc.so.6
> #4  0x000000000042391e in do_search (op=0x7f135c000b80, rs=0x7f1366764930) at
> search.c:263
> #5  0x0000000000421449 in connection_operation (ctx=0x7f1366764a90,
> arg_v=0x7f135c000b80) at connection.c:1155
> #6  0x0000000000421c25 in connection_read_thread (ctx=0x7f1366764a90,
> argv=<value optimized out>) at connection.c:1291
> #7  0x00000000005601f0 in ldap_int_thread_pool_wrapper (xpool=0x1f42770) at
> tpool.c:688
> #8  0x000000312b207851 in start_thread () from /lib64/libpthread.so.0
> #9  0x000000312aae890d in clone () from /lib64/libc.so.6
>
> I was unable to track back the particular piece of code triggering the double
> free, but the same pointer p is freed twice by ber_memfree_x() in memory.c:
>
> (gdb) delete 3
> (gdb) break search.c:263
> Breakpoint 4 at 0x5645df: file search.c, line 263.
> (gdb) c
> Continuing.
>
> Breakpoint 4, do_search (op=0x7fa554002930, rs=0x7fa5623a4930) at search.c:263
> 263			op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
> (gdb) s
> slap_sl_free (ptr=0x1d64a10, ctx=0x7fa554002780) at sl_malloc.c:493
> 493	{
> (gdb) s
> 498		if (!ptr)
> (gdb) s
> 501		if (No_sl_malloc || !sh || ptr < sh->sh_base || ptr >= sh->sh_end) {
> (gdb) s
> 502			ber_memfree_x(ptr, NULL);
> (gdb) s
> 649	}
> (gdb) s
> 502			ber_memfree_x(ptr, NULL);
> (gdb) s
> ber_memfree_x (p=0x1d64a10, ctx=0x0) at memory.c:127
> 127	{
> (gdb) s
> 128		if( p == NULL ) {
> (gdb) s
> 134		if( ber_int_memory_fns == NULL || ctx == NULL ) {
> (gdb) s
> 160	}
> (gdb) s
> 152			free( p );
> (gdb) print p
> $1 = (void *) 0x1d64a10
> (gdb) x/10x p
> 0x1d64a10:	0x00000001	0x00000000	0x005ffd31	0x00000000
> 0x1d64a20:	0x00000000	0x00000000	0x00000000	0x00000000
> 0x1d64a30:	0x00000000	0x00000000
> (gdb) c
> Continuing.
> [New Thread 0x7fa561994700 (LWP 5823)]
>
> Breakpoint 4, do_search (op=0x7fa554002930, rs=0x7fa5623a4930) at search.c:263
> 263			op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
> (gdb) s
> slap_sl_free (ptr=0x1d64a10, ctx=0x7fa554002780) at sl_malloc.c:493
> 493	{
> (gdb) s
> 498		if (!ptr)
> (gdb) s
> 501		if (No_sl_malloc || !sh || ptr < sh->sh_base || ptr >= sh->sh_end) {
> (gdb) s
> 502			ber_memfree_x(ptr, NULL);
> (gdb) s
> 649	}
> (gdb) s
> 502			ber_memfree_x(ptr, NULL);
> (gdb) s
> ber_memfree_x (p=0x1d64a10, ctx=0x0) at memory.c:127
> 127	{
> (gdb) s
> 128		if( p == NULL ) {
> (gdb) s
> 134		if( ber_int_memory_fns == NULL || ctx == NULL ) {
> (gdb) s
> 160	}
> (gdb) s
> 152			free( p );
> (gdb) print P
> No symbol "P" in current context.
> (gdb) print p
> $2 = (void *) 0x1d64a10
> (gdb) x/10x p
> 0x1d64a10:	0x00000000	0x00000000	0x005ffd31	0x00000000
> 0x1d64a20:	0x00000000	0x00000000	0x00000000	0x00000000
> 0x1d64a30:	0x00000000	0x00000000
>
> So the same pointer is being freed twice by the 2 connections which miss the
> cache. I'm unable to figure out who is responsible for that call though, but the
> same op->ors_attrs is freed by do_search():
>
> 	if ( op->ors_attrs != NULL ) {
> 		op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
>
> Parameters seem correct in both cases:
> (gdb) print op->o_hdr->oh_tmpmfuncs->bmf_free
> $12 = (BER_MEMFREE_FN *) 0x4733f0 <slap_sl_free>
> (gdb) print op->o_request.oq_search.rs_attrs
> $15 = (AttributeName *) 0x1f85a10
>
> The call is done via connection_operation(), but that code part is  a bit above
> my head, so I'm unable to track this further.
>
> I thought this could be due to a threading problem, but building slapd with
> --with-threads=no does not make a difference.
>
> I tried uploading the core dump to your ftp server, but seems like there is an
> issue with ftp.openldap.org
> [root@centos63 tmp]# ftp ftp.openldap.org
> Trying 204.152.186.57...
> Connected to ftp.openldap.org (204.152.186.57).
> 220- OpenLDAP FTP Service
> 220 boole.openldap.org FTP server (Version 6.00LS) ready.
> Name (ftp.openldap.org:cisco): anonymous
> 331 Guest login ok, send your email address as password.
> Password:
> 230- Copyright 1998-2010, The OpenLDAP Foundation, All Rights Reserved.
> 230- COPYING RESTRICTIONS APPLY, see:
> 230- 	ftp://ftp.openldap.org/COPYRIGHT
> 230- 	ftp://ftp.openldap.org/LICENSE
> 230 Guest login ok, access restrictions apply.
> Remote system type is UNIX.
> Using binary mode to transfer files.
> ftp> cd incoming
> 250 CWD command successful.
> ftp> binary
> 200 Type set to I.
> ftp> put core-slapd-6-55-55-26548-1368063267
> local: core-slapd-6-55-55-26548-1368063267 remote:
> core-slapd-6-55-55-26548-1368063267
> 227 Entering Passive Mode (204,152,186,57,242,33)
> 553 core-slapd-6-55-55-26548-1368063267: No space left on device.
>
> Let me know if you need anything. I can provide further debugs or cores. I'm
> also happy to try things out.
>
> Cheers,
> Alex
>
>


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/