
(ITS#7587) slapd crashes when using pcache overlay applied to a translucent proxy



Full_Name: Alex
Version: 2.4.35
OS: Centos 6.3 (2.6.32-279.el6.x86_64 #1 SMP)
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (64.103.25.102)


When the pcache overlay is stacked on a translucent proxy, slapd crashes on the
second LDAP search that misses the cache. The only requirement is that both
searches miss the cache. For example, assuming nothing is cached for the uids
aaaa and bbbb, the following two requests trigger the crash (172.16.206.156 is
the OpenLDAP server):

ldapsearch -x -H ldap://172.16.206.156 -b "ou=yyyy,o=xxxx" -LLL "uid=aaaa" uid st
ldapsearch -x -H ldap://172.16.206.156 -b "ou=yyyy,o=xxxx" -LLL "uid=bbbb" uid st

Whether aaaa and bbbb exist or not does not matter.

The following config is used:

include         /usr/local/etc/openldap/schema/core.schema
include         /usr/local/etc/openldap/schema/cosine.schema
include         /usr/local/etc/openldap/schema/inetorgperson.schema
include         /usr/local/etc/openldap/schema/misc.schema
include         /usr/local/etc/openldap/schema/nis.schema

moduleload pcache.la
moduleload translucent.la

database        bdb
suffix          "o=xxxx"
#checkpoint     1024 15
rootdn          "uid=amoneger,ou=yyyy,o=xxxx"
overlay translucent
translucent_local uidNumber,gidNumber,homeDirectory,loginShell
translucent_strict
rootdn          "uid=amoneger,ou=yyyy,o=xxxx"
uri             ldap://zzzz/
#tls ldaps tls_reqcert=demand tls_cacert=/usr/local/etc/openldap/certs/Cisco_ca_chain
overlay pcache
pcache bdb 10000 1 50 100
pcacheAttrset 0 *
pcacheTemplate (uid=) 0 3600
pcacheBind (uid=) 0 1800 sub ou=yyyy,o=xxxx
pcacheOffline TRUE
pcachePersist TRUE
pcacheValidate FALSE
directory /var/cache/ldap
cachesize 1000
index pcacheQueryid                     eq

The crash appears to be a SIGABRT raised from within libc when free() detects a
double free. Here is the backtrace:
Breakpoint 2, 0x000000312aa33f10 in abort () from /lib64/libc.so.6
(gdb) c
Continuing.

Program received signal SIGABRT, Aborted.
0x000000312aa328a5 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x000000312aa328a5 in raise () from /lib64/libc.so.6
#1  0x000000312aa34085 in abort () from /lib64/libc.so.6
#2  0x000000312aa707b7 in __libc_message () from /lib64/libc.so.6
#3  0x000000312aa760e6 in malloc_printerr () from /lib64/libc.so.6
#4  0x000000000042391e in do_search (op=0x7f135c000b80, rs=0x7f1366764930) at search.c:263
#5  0x0000000000421449 in connection_operation (ctx=0x7f1366764a90, arg_v=0x7f135c000b80) at connection.c:1155
#6  0x0000000000421c25 in connection_read_thread (ctx=0x7f1366764a90, argv=<value optimized out>) at connection.c:1291
#7  0x00000000005601f0 in ldap_int_thread_pool_wrapper (xpool=0x1f42770) at tpool.c:688
#8  0x000000312b207851 in start_thread () from /lib64/libpthread.so.0
#9  0x000000312aae890d in clone () from /lib64/libc.so.6

I was unable to track down the code that sets up the double free, but the same
pointer p is freed twice by ber_memfree_x() in memory.c:

(gdb) delete 3
(gdb) break search.c:263
Breakpoint 4 at 0x5645df: file search.c, line 263.
(gdb) c
Continuing.

Breakpoint 4, do_search (op=0x7fa554002930, rs=0x7fa5623a4930) at search.c:263
263			op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
(gdb) s
slap_sl_free (ptr=0x1d64a10, ctx=0x7fa554002780) at sl_malloc.c:493
493	{
(gdb) s
498		if (!ptr)
(gdb) s
501		if (No_sl_malloc || !sh || ptr < sh->sh_base || ptr >= sh->sh_end) {
(gdb) s
502			ber_memfree_x(ptr, NULL);
(gdb) s
649	}
(gdb) s
502			ber_memfree_x(ptr, NULL);
(gdb) s
ber_memfree_x (p=0x1d64a10, ctx=0x0) at memory.c:127
127	{
(gdb) s
128		if( p == NULL ) {
(gdb) s
134		if( ber_int_memory_fns == NULL || ctx == NULL ) {
(gdb) s
160	}
(gdb) s
152			free( p );
(gdb) print p
$1 = (void *) 0x1d64a10
(gdb) x/10x p
0x1d64a10:	0x00000001	0x00000000	0x005ffd31	0x00000000
0x1d64a20:	0x00000000	0x00000000	0x00000000	0x00000000
0x1d64a30:	0x00000000	0x00000000
(gdb) c
Continuing.
[New Thread 0x7fa561994700 (LWP 5823)]

Breakpoint 4, do_search (op=0x7fa554002930, rs=0x7fa5623a4930) at search.c:263
263			op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
(gdb) s
slap_sl_free (ptr=0x1d64a10, ctx=0x7fa554002780) at sl_malloc.c:493
493	{
(gdb) s
498		if (!ptr)
(gdb) s
501		if (No_sl_malloc || !sh || ptr < sh->sh_base || ptr >= sh->sh_end) {
(gdb) s
502			ber_memfree_x(ptr, NULL);
(gdb) s
649	}
(gdb) s
502			ber_memfree_x(ptr, NULL);
(gdb) s
ber_memfree_x (p=0x1d64a10, ctx=0x0) at memory.c:127
127	{
(gdb) s
128		if( p == NULL ) {
(gdb) s
134		if( ber_int_memory_fns == NULL || ctx == NULL ) {
(gdb) s
160	}
(gdb) s
152			free( p );
(gdb) print P
No symbol "P" in current context.
(gdb) print p
$2 = (void *) 0x1d64a10
(gdb) x/10x p
0x1d64a10:	0x00000000	0x00000000	0x005ffd31	0x00000000
0x1d64a20:	0x00000000	0x00000000	0x00000000	0x00000000
0x1d64a30:	0x00000000	0x00000000
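
A note for readers following the trace: in both passes slap_sl_free() reaches
line 502, so one of the conditions tested on line 501 held (for example, the
pointer lying outside the sh_base..sh_end range) and the pointer was handed to
libc free() through ber_memfree_x(). The sketch below illustrates that kind of
range-check dispatch. It is not the sl_malloc.c code: the arena type and the
arena_* helpers are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical arena; base/end only loosely mirror sh_base/sh_end in the trace. */
typedef struct arena {
    char  *base;   /* first byte of the arena */
    char  *end;    /* one past the last byte */
    size_t used;   /* bytes handed out so far */
} arena;

/* Counts pointers that arena_free() passed on to libc free(). */
static int libc_frees = 0;

static bool arena_init(arena *a, size_t size)
{
    a->base = malloc(size);
    if (a->base == NULL)
        return false;
    a->end = a->base + size;
    a->used = 0;
    return true;
}

/* Bump allocation, rounded up to 16 bytes; returns NULL when the arena is full. */
static void *arena_alloc(arena *a, size_t n)
{
    assert(a != NULL && a->base != NULL);
    size_t rounded = (n + 15u) & ~(size_t)15u;
    size_t capacity = (size_t)(a->end - a->base);
    if (rounded < n || rounded > capacity - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* True when p points into the arena's address range. */
static bool arena_owns(const arena *a, const void *p)
{
    uintptr_t v = (uintptr_t)p;
    return v >= (uintptr_t)a->base && v < (uintptr_t)a->end;
}

/* Pointers outside the arena fall through to libc free(); pointers inside
 * are left alone here and reclaimed when the whole arena is released. */
static void arena_free(arena *a, void *p)
{
    if (p == NULL)
        return;
    if (a == NULL || !arena_owns(a, p)) {
        free(p);
        libc_frees++;
        return;
    }
}
```

With an allocator of this shape, a pointer that did not come from the arena is
released with plain free(), so releasing it a second time is an ordinary
double free as far as libc is concerned.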

So the same pointer is freed twice, once for each of the two searches that miss
the cache. I cannot work out what leads up to that call, but in both cases it is
op->ors_attrs that do_search() frees:

	if ( op->ors_attrs != NULL ) {
		op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
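
To illustrate the general hazard (not the actual slapd or pcache code), here is
a minimal sketch of two common defences against freeing one attribute list
twice: giving each operation its own copy of the list, and clearing the field
once it has been released. The attr_name and fake_op types and every helper
name are hypothetical stand-ins for AttributeName, Operation and ors_attrs, and
whether either defence matches the real fix for this crash is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for slapd's AttributeName; not the real struct. */
typedef struct attr_name {
    const char *name;
} attr_name;

/* Hypothetical stand-in for an Operation carrying a search attribute list. */
typedef struct fake_op {
    attr_name *attrs;   /* plays the role of op->ors_attrs */
} fake_op;

/* Counts how many times release_attrs() really called free(). */
static int free_calls = 0;

/* Build a list of n names; calloc leaves a NULL terminator at index n. */
static attr_name *make_attrs(const char *const *names, size_t n)
{
    attr_name *list = calloc(n + 1, sizeof(*list));
    if (list == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        list[i].name = names[i];
    return list;
}

/* Copy a NULL-terminated list so that no two operations share one allocation. */
static attr_name *dup_attrs(const attr_name *src)
{
    size_t n = 0;
    if (src == NULL)
        return NULL;
    while (src[n].name != NULL)
        n++;
    attr_name *copy = calloc(n + 1, sizeof(*copy));
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, n * sizeof(*copy));
    return copy;
}

/* Free the list and clear the field; a repeated call on the same op is a no-op. */
static void release_attrs(fake_op *op)
{
    assert(op != NULL);
    if (op->attrs != NULL) {
        free(op->attrs);
        free_calls++;
        op->attrs = NULL;
    }
}
```

Clearing the field only protects against a second release through the same
structure. If another structure still holds the old address, for instance a
long-lived cache entry, the copy step is what keeps the two lifetimes apart.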

Parameters seem correct in both cases:
(gdb) print op->o_hdr->oh_tmpmfuncs->bmf_free
$12 = (BER_MEMFREE_FN *) 0x4733f0 <slap_sl_free>
(gdb) print op->o_request.oq_search.rs_attrs
$15 = (AttributeName *) 0x1f85a10

The call comes in via connection_operation(), but that part of the code is a bit
above my head, so I was unable to track this further.

I thought this could be due to a threading problem, but building slapd with
--with-threads=no does not make a difference.

I tried uploading the core dump to your FTP server, but it looks like there is
an issue with ftp.openldap.org:
[root@centos63 tmp]# ftp ftp.openldap.org
Trying 204.152.186.57...
Connected to ftp.openldap.org (204.152.186.57).
220- OpenLDAP FTP Service
220 boole.openldap.org FTP server (Version 6.00LS) ready.
Name (ftp.openldap.org:cisco): anonymous
331 Guest login ok, send your email address as password.
Password:
230- Copyright 1998-2010, The OpenLDAP Foundation, All Rights Reserved.
230- COPYING RESTRICTIONS APPLY, see:
230- 	ftp://ftp.openldap.org/COPYRIGHT
230- 	ftp://ftp.openldap.org/LICENSE
230 Guest login ok, access restrictions apply.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd incoming
250 CWD command successful.
ftp> binary
200 Type set to I.
ftp> put core-slapd-6-55-55-26548-1368063267
local: core-slapd-6-55-55-26548-1368063267 remote: core-slapd-6-55-55-26548-1368063267
227 Entering Passive Mode (204,152,186,57,242,33)
553 core-slapd-6-55-55-26548-1368063267: No space left on device.

Let me know if you need anything. I can provide further debug output or core
dumps, and I'm happy to try things out.

Cheers,
Alex