Issue 7587 - slapd crashes when using pcache overlay applied to a translucent proxy
Summary: slapd crashes when using pcache overlay applied to a translucent proxy
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd (show other issues)
Version: 2.4.35
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-09 16:20 UTC by amoneger@cisco.com
Modified: 2014-10-23 07:30 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description amoneger@cisco.com 2013-05-09 16:20:44 UTC
Full_Name: Alex
Version: 2.4.35
OS: Centos 6.3 (2.6.32-279.el6.x86_64 #1 SMP)
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (64.103.25.102)


When using the pcache overlay over a translucent proxy, the slapd daemon crashes
after the second LDAP request which misses the cache. For example, the following
requests will trigger the issue. The important part is to miss the cache.
Assuming nothing is cached for the aaaa and bbbb uid, the following request will
trigger the issue (172.16.206.156 being the openldap server):

ldapsearch -x -H ldap://172.16.206.156 -b "ou=yyyy,o=xxxx" -LLL "uid=aaaa" uid
st
ldapsearch -x -H ldap://172.16.206.156 -b "ou=yyyy,o=xxxx" -LLL "uid=bbbb" uid
st

Whether aaaa and bbbb exist or not does not matter.

The following config is used:

include         /usr/local/etc/openldap/schema/core.schema
include         /usr/local/etc/openldap/schema/cosine.schema
include         /usr/local/etc/openldap/schema/inetorgperson.schema
include         /usr/local/etc/openldap/schema/misc.schema
include         /usr/local/etc/openldap/schema/nis.schema

moduleload pcache.la
moduleload translucent.la

database        bdb
suffix          "o=xxxx"
#checkpoint     1024 15
rootdn          "uid=amoneger,ou=yyyy,o=xxxx"
overlay translucent
translucent_local uidNumber,gidNumber,homeDirectory,loginShell
translucent_strict
rootdn          "uid=amoneger,ou=yyyy,o=xxxx"
uri             ldap://zzzz/
#tls ldaps tls_reqcert=demand
tls_cacert=/usr/local/etc/openldap/certs/Cisco_ca_chain
overlay pcache
pcache bdb 10000 1 50 100
pcacheAttrset 0 *
pcacheTemplate (uid=) 0 3600
pcacheBind (uid=) 0 1800 sub ou=yyyy,o=xxxx
pcacheOffline TRUE
pcachePersist TRUE
pcacheValidate FALSE
directory /var/cache/ldap
cachesize 1000
index pcacheQueryid                     eq

The crash seems to be caused by a SIGABRT which is raised by libc free() due to
a double free. Here is the traceback:
Breakpoint 2, 0x000000312aa33f10 in abort () from /lib64/libc.so.6
(gdb) c
Continuing.

Program received signal SIGABRT, Aborted.
0x000000312aa328a5 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x000000312aa328a5 in raise () from /lib64/libc.so.6
#1  0x000000312aa34085 in abort () from /lib64/libc.so.6
#2  0x000000312aa707b7 in __libc_message () from /lib64/libc.so.6
#3  0x000000312aa760e6 in malloc_printerr () from /lib64/libc.so.6
#4  0x000000000042391e in do_search (op=0x7f135c000b80, rs=0x7f1366764930) at
search.c:263
#5  0x0000000000421449 in connection_operation (ctx=0x7f1366764a90,
arg_v=0x7f135c000b80) at connection.c:1155
#6  0x0000000000421c25 in connection_read_thread (ctx=0x7f1366764a90,
argv=<value optimized out>) at connection.c:1291
#7  0x00000000005601f0 in ldap_int_thread_pool_wrapper (xpool=0x1f42770) at
tpool.c:688
#8  0x000000312b207851 in start_thread () from /lib64/libpthread.so.0
#9  0x000000312aae890d in clone () from /lib64/libc.so.6

I was unable to track back the particular piece of code triggering the double
free, but the same pointer p is freed twice by ber_memfree_x() in memory.c:

(gdb) delete 3
(gdb) break search.c:263
Breakpoint 4 at 0x5645df: file search.c, line 263.
(gdb) c
Continuing.

Breakpoint 4, do_search (op=0x7fa554002930, rs=0x7fa5623a4930) at search.c:263
263			op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
(gdb) s
slap_sl_free (ptr=0x1d64a10, ctx=0x7fa554002780) at sl_malloc.c:493
493	{
(gdb) s
498		if (!ptr)
(gdb) s
501		if (No_sl_malloc || !sh || ptr < sh->sh_base || ptr >= sh->sh_end) {
(gdb) s
502			ber_memfree_x(ptr, NULL);
(gdb) s
649	}
(gdb) s
502			ber_memfree_x(ptr, NULL);
(gdb) s
ber_memfree_x (p=0x1d64a10, ctx=0x0) at memory.c:127
127	{
(gdb) s
128		if( p == NULL ) {
(gdb) s
134		if( ber_int_memory_fns == NULL || ctx == NULL ) {
(gdb) s
160	}
(gdb) s
152			free( p );
(gdb) print p
$1 = (void *) 0x1d64a10
(gdb) x/10x p
0x1d64a10:	0x00000001	0x00000000	0x005ffd31	0x00000000
0x1d64a20:	0x00000000	0x00000000	0x00000000	0x00000000
0x1d64a30:	0x00000000	0x00000000
(gdb) c
Continuing.
[New Thread 0x7fa561994700 (LWP 5823)]

Breakpoint 4, do_search (op=0x7fa554002930, rs=0x7fa5623a4930) at search.c:263
263			op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
(gdb) s
slap_sl_free (ptr=0x1d64a10, ctx=0x7fa554002780) at sl_malloc.c:493
493	{
(gdb) s
498		if (!ptr)
(gdb) s
501		if (No_sl_malloc || !sh || ptr < sh->sh_base || ptr >= sh->sh_end) {
(gdb) s
502			ber_memfree_x(ptr, NULL);
(gdb) s
649	}
(gdb) s
502			ber_memfree_x(ptr, NULL);
(gdb) s
ber_memfree_x (p=0x1d64a10, ctx=0x0) at memory.c:127
127	{
(gdb) s
128		if( p == NULL ) {
(gdb) s
134		if( ber_int_memory_fns == NULL || ctx == NULL ) {
(gdb) s
160	}
(gdb) s
152			free( p );
(gdb) print P
No symbol "P" in current context.
(gdb) print p
$2 = (void *) 0x1d64a10
(gdb) x/10x p
0x1d64a10:	0x00000000	0x00000000	0x005ffd31	0x00000000
0x1d64a20:	0x00000000	0x00000000	0x00000000	0x00000000
0x1d64a30:	0x00000000	0x00000000

So the same pointer is being freed twice by the 2 connections which miss the
cache. I'm unable to figure out who is responsible for that call though, but the
same op->ors_attrs is freed by do_search():

	if ( op->ors_attrs != NULL ) {
		op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );

Parameters seem correct in both cases:
(gdb) print op->o_hdr->oh_tmpmfuncs->bmf_free
$12 = (BER_MEMFREE_FN *) 0x4733f0 <slap_sl_free>
(gdb) print op->o_request.oq_search.rs_attrs
$15 = (AttributeName *) 0x1f85a10

The call is done via connection_operation(), but that code part is  a bit above
my head, so I'm unable to track this further.

I thought this could be due to a threading problem, but building slapd with
--with-threads=no does not make a difference.

I tried uploading the core dump to your ftp server, but seems like there is an
issue with ftp.openldap.org
[root@centos63 tmp]# ftp ftp.openldap.org
Trying 204.152.186.57...
Connected to ftp.openldap.org (204.152.186.57).
220- OpenLDAP FTP Service
220 boole.openldap.org FTP server (Version 6.00LS) ready.
Name (ftp.openldap.org:cisco): anonymous
331 Guest login ok, send your email address as password.
Password:
230- Copyright 1998-2010, The OpenLDAP Foundation, All Rights Reserved.
230- COPYING RESTRICTIONS APPLY, see:
230- 	ftp://ftp.openldap.org/COPYRIGHT
230- 	ftp://ftp.openldap.org/LICENSE
230 Guest login ok, access restrictions apply.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> cd incoming
250 CWD command successful.
ftp> binary
200 Type set to I.
ftp> put core-slapd-6-55-55-26548-1368063267
local: core-slapd-6-55-55-26548-1368063267 remote:
core-slapd-6-55-55-26548-1368063267
227 Entering Passive Mode (204,152,186,57,242,33)
553 core-slapd-6-55-55-26548-1368063267: No space left on device.

Let me know if you need anything. I can provide further debugs or cores. I'm
also happy to try things out.

Cheers,
Alex
Comment 1 Howard Chu 2014-03-18 11:49:27 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Comment 2 Howard Chu 2014-03-18 18:50:27 UTC
amoneger@cisco.com wrote:
> Full_Name: Alex
> Version: 2.4.35
> OS: Centos 6.3 (2.6.32-279.el6.x86_64 #1 SMP)
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (64.103.25.102)
>
>
> When using the pcache overlay over a translucent proxy, the slapd daemon crashes
> after the second LDAP request which misses the cache. For example, the following
> requests will trigger the issue. The important part is to miss the cache.
> Assuming nothing is cached for the aaaa and bbbb uid, the following request will
> trigger the issue (172.16.206.156 being the openldap server):

Thanks for the report, this is now fixed in git master.

> ldapsearch -x -H ldap://172.16.206.156 -b "ou=yyyy,o=xxxx" -LLL "uid=aaaa" uid
> st
> ldapsearch -x -H ldap://172.16.206.156 -b "ou=yyyy,o=xxxx" -LLL "uid=bbbb" uid
> st
>
> Whether aaaa and bbbb exist or not does not matter.
>
> The following config is used:
>
> include         /usr/local/etc/openldap/schema/core.schema
> include         /usr/local/etc/openldap/schema/cosine.schema
> include         /usr/local/etc/openldap/schema/inetorgperson.schema
> include         /usr/local/etc/openldap/schema/misc.schema
> include         /usr/local/etc/openldap/schema/nis.schema
>
> moduleload pcache.la
> moduleload translucent.la
>
> database        bdb
> suffix          "o=xxxx"
> #checkpoint     1024 15
> rootdn          "uid=amoneger,ou=yyyy,o=xxxx"
> overlay translucent
> translucent_local uidNumber,gidNumber,homeDirectory,loginShell
> translucent_strict
> rootdn          "uid=amoneger,ou=yyyy,o=xxxx"
> uri             ldap://zzzz/
> #tls ldaps tls_reqcert=demand
> tls_cacert=/usr/local/etc/openldap/certs/Cisco_ca_chain
> overlay pcache
> pcache bdb 10000 1 50 100
> pcacheAttrset 0 *
> pcacheTemplate (uid=) 0 3600
> pcacheBind (uid=) 0 1800 sub ou=yyyy,o=xxxx
> pcacheOffline TRUE
> pcachePersist TRUE
> pcacheValidate FALSE
> directory /var/cache/ldap
> cachesize 1000
> index pcacheQueryid                     eq
>
> The crash seems to be caused by a SIGABRT which is raised by libc free() due to
> a double free. Here is the traceback:
> Breakpoint 2, 0x000000312aa33f10 in abort () from /lib64/libc.so.6
> (gdb) c
> Continuing.
>
> Program received signal SIGABRT, Aborted.
> 0x000000312aa328a5 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x000000312aa328a5 in raise () from /lib64/libc.so.6
> #1  0x000000312aa34085 in abort () from /lib64/libc.so.6
> #2  0x000000312aa707b7 in __libc_message () from /lib64/libc.so.6
> #3  0x000000312aa760e6 in malloc_printerr () from /lib64/libc.so.6
> #4  0x000000000042391e in do_search (op=0x7f135c000b80, rs=0x7f1366764930) at
> search.c:263
> #5  0x0000000000421449 in connection_operation (ctx=0x7f1366764a90,
> arg_v=0x7f135c000b80) at connection.c:1155
> #6  0x0000000000421c25 in connection_read_thread (ctx=0x7f1366764a90,
> argv=<value optimized out>) at connection.c:1291
> #7  0x00000000005601f0 in ldap_int_thread_pool_wrapper (xpool=0x1f42770) at
> tpool.c:688
> #8  0x000000312b207851 in start_thread () from /lib64/libpthread.so.0
> #9  0x000000312aae890d in clone () from /lib64/libc.so.6
>
> I was unable to track back the particular piece of code triggering the double
> free, but the same pointer p is freed twice by ber_memfree_x() in memory.c:
>
> (gdb) delete 3
> (gdb) break search.c:263
> Breakpoint 4 at 0x5645df: file search.c, line 263.
> (gdb) c
> Continuing.
>
> Breakpoint 4, do_search (op=0x7fa554002930, rs=0x7fa5623a4930) at search.c:263
> 263			op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
> (gdb) s
> slap_sl_free (ptr=0x1d64a10, ctx=0x7fa554002780) at sl_malloc.c:493
> 493	{
> (gdb) s
> 498		if (!ptr)
> (gdb) s
> 501		if (No_sl_malloc || !sh || ptr < sh->sh_base || ptr >= sh->sh_end) {
> (gdb) s
> 502			ber_memfree_x(ptr, NULL);
> (gdb) s
> 649	}
> (gdb) s
> 502			ber_memfree_x(ptr, NULL);
> (gdb) s
> ber_memfree_x (p=0x1d64a10, ctx=0x0) at memory.c:127
> 127	{
> (gdb) s
> 128		if( p == NULL ) {
> (gdb) s
> 134		if( ber_int_memory_fns == NULL || ctx == NULL ) {
> (gdb) s
> 160	}
> (gdb) s
> 152			free( p );
> (gdb) print p
> $1 = (void *) 0x1d64a10
> (gdb) x/10x p
> 0x1d64a10:	0x00000001	0x00000000	0x005ffd31	0x00000000
> 0x1d64a20:	0x00000000	0x00000000	0x00000000	0x00000000
> 0x1d64a30:	0x00000000	0x00000000
> (gdb) c
> Continuing.
> [New Thread 0x7fa561994700 (LWP 5823)]
>
> Breakpoint 4, do_search (op=0x7fa554002930, rs=0x7fa5623a4930) at search.c:263
> 263			op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
> (gdb) s
> slap_sl_free (ptr=0x1d64a10, ctx=0x7fa554002780) at sl_malloc.c:493
> 493	{
> (gdb) s
> 498		if (!ptr)
> (gdb) s
> 501		if (No_sl_malloc || !sh || ptr < sh->sh_base || ptr >= sh->sh_end) {
> (gdb) s
> 502			ber_memfree_x(ptr, NULL);
> (gdb) s
> 649	}
> (gdb) s
> 502			ber_memfree_x(ptr, NULL);
> (gdb) s
> ber_memfree_x (p=0x1d64a10, ctx=0x0) at memory.c:127
> 127	{
> (gdb) s
> 128		if( p == NULL ) {
> (gdb) s
> 134		if( ber_int_memory_fns == NULL || ctx == NULL ) {
> (gdb) s
> 160	}
> (gdb) s
> 152			free( p );
> (gdb) print P
> No symbol "P" in current context.
> (gdb) print p
> $2 = (void *) 0x1d64a10
> (gdb) x/10x p
> 0x1d64a10:	0x00000000	0x00000000	0x005ffd31	0x00000000
> 0x1d64a20:	0x00000000	0x00000000	0x00000000	0x00000000
> 0x1d64a30:	0x00000000	0x00000000
>
> So the same pointer is being freed twice by the 2 connections which miss the
> cache. I'm unable to figure out who is responsible for that call though, but the
> same op->ors_attrs is freed by do_search():
>
> 	if ( op->ors_attrs != NULL ) {
> 		op->o_tmpfree( op->ors_attrs, op->o_tmpmemctx );
>
> Parameters seem correct in both cases:
> (gdb) print op->o_hdr->oh_tmpmfuncs->bmf_free
> $12 = (BER_MEMFREE_FN *) 0x4733f0 <slap_sl_free>
> (gdb) print op->o_request.oq_search.rs_attrs
> $15 = (AttributeName *) 0x1f85a10
>
> The call is done via connection_operation(), but that code part is  a bit above
> my head, so I'm unable to track this further.
>
> I thought this could be due to a threading problem, but building slapd with
> --with-threads=no does not make a difference.
>
> I tried uploading the core dump to your ftp server, but seems like there is an
> issue with ftp.openldap.org
> [root@centos63 tmp]# ftp ftp.openldap.org
> Trying 204.152.186.57...
> Connected to ftp.openldap.org (204.152.186.57).
> 220- OpenLDAP FTP Service
> 220 boole.openldap.org FTP server (Version 6.00LS) ready.
> Name (ftp.openldap.org:cisco): anonymous
> 331 Guest login ok, send your email address as password.
> Password:
> 230- Copyright 1998-2010, The OpenLDAP Foundation, All Rights Reserved.
> 230- COPYING RESTRICTIONS APPLY, see:
> 230- 	ftp://ftp.openldap.org/COPYRIGHT
> 230- 	ftp://ftp.openldap.org/LICENSE
> 230 Guest login ok, access restrictions apply.
> Remote system type is UNIX.
> Using binary mode to transfer files.
> ftp> cd incoming
> 250 CWD command successful.
> ftp> binary
> 200 Type set to I.
> ftp> put core-slapd-6-55-55-26548-1368063267
> local: core-slapd-6-55-55-26548-1368063267 remote:
> core-slapd-6-55-55-26548-1368063267
> 227 Entering Passive Mode (204,152,186,57,242,33)
> 553 core-slapd-6-55-55-26548-1368063267: No space left on device.
>
> Let me know if you need anything. I can provide further debugs or cores. I'm
> also happy to try things out.
>
> Cheers,
> Alex
>
>


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 3 Quanah Gibson-Mount 2014-04-01 16:58:26 UTC
changed notes
changed state Test to Release
Comment 4 OpenLDAP project 2014-10-23 07:30:41 UTC
fixed in master
fixed in RE25
fixed in RE24
Comment 5 Quanah Gibson-Mount 2014-10-23 07:30:41 UTC
changed notes
changed state Release to Closed