Issue 8368 - Process /usr/sbin/slapd was killed by signal 6 (SIGABRT)
Summary: Process /usr/sbin/slapd was killed by signal 6 (SIGABRT)
Status: VERIFIED FEEDBACK
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: 2.4.40
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
 
Reported: 2016-02-04 13:46 UTC by yaroslavr@digdes.com
Modified: 2020-03-22 03:03 UTC

Description yaroslavr@digdes.com 2016-02-04 13:46:18 UTC
Full_Name: Yaroslav Rutsky
Version: 2.4.40
OS: CentOS release 6.6 (Final)
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (85.114.5.9)


We use customized OpenLDAP RPM packages, built from
openldap-2.4.40-6.el6_7.src.rpm from the CentOS repository, with a patch to support
the Outlook address book browsing feature (in conjunction with the sssvlv overlay):
-----------------------openldap-2.4.40-outlook-sssvlv.patch-----------------------------------
diff -uNr openldap-2.4.40/openldap-2.4.40/servers/slapd/schema_prep.c openldap-2.4.40mod/openldap-2.4.40/servers/slapd/schema_prep.c
--- openldap-2.4.40/openldap-2.4.40/servers/slapd/schema_prep.c	2014-09-19 05:48:49.000000000 +0400
+++ openldap-2.4.40mod/openldap-2.4.40/servers/slapd/schema_prep.c	2015-10-16 10:43:36.778308938 +0300
@@ -908,6 +908,7 @@
 			"DESC 'RFC4519: common supertype of name attributes' "
 			"EQUALITY caseIgnoreMatch "
 			"SUBSTR caseIgnoreSubstringsMatch "
+			"ORDERING caseIgnoreOrderingMatch "
 			"SYNTAX 1.3.6.1.4.1.1466.115.121.1.15{32768} )",
 		NULL, SLAP_AT_ABSTRACT,
 		NULL, NULL, 
----------------------------------------------------------------------------------------------

We also use LMDB as the backend, along with some other configuration (chain,
syncrepl, referral).
This server is a slave in our replication structure: it serves read requests
itself and chains write requests to one of the mirror-replicated servers.
Modifications chained from this slave to the mirrored masters come back to the
slave via syncrepl.

Recently, we observed slapd crashing occasionally, with this error in
/var/log/messages:
Jan 20 16:51:35 server-02 kernel: slapd[20380] general protection
ip:7fe49bf76f85 sp:7fe45d1538d0 error:0 in libc-2.12.so[7fe49bf01000+18a000]

Unfortunately, core dump generation was configured only after that crash.
Some time later, slapd crashed again:
Feb  3 08:05:31 server-02 abrt[24705]: Saved core dump of pid 27070
(/usr/sbin/slapd) to /var/spool/abrt/ccpp-2016-02-03-08:05:29-27070 (844488704
bytes)

openldap.log (olcLogLevel:256) shows nothing:
==================================================================
Feb  3 08:05:29 server-02 slapd[27070]: conn=275418 op=617 SEARCH RESULT tag=101
err=0 nentries=0 text=
Feb  3 08:05:29 server-02 slapd[27070]: conn=275418 op=618 SRCH base="" scope=2
deref=3 filter="(&(mail=*)(|(?mail=X*)(cn=X*)(sn=X*)(givenName=X*)(displayName=X*)))"
Feb  3 08:05:29 server-02 slapd[27070]: conn=275418 op=618 SRCH attr=cn
commonName mail roleOccupant display-name displayname sn surname co
organizationName o givenName legacyExchangeDN objectClass uid mailNickname title
company physicalDeliveryOfficeName telephoneNumber
Feb  3 08:05:29 server-02 slapd[27070]: <= mdb_substring_candidates: (sn) not
indexed
Feb  3 08:05:29 server-02 slapd[27070]: <= mdb_substring_candidates: (givenName)
not indexed
Feb  3 08:05:29 server-02 slapd[27070]: <= mdb_substring_candidates:
(displayName) not indexed
Feb  3 08:24:57 server-02 slapd[26054]: @(#) $OpenLDAP: slapd 2.4.40 (Nov 10
2015 11:22:22) $#012#011root@orw-oldp-01:/root/rpmbuild/BUILD/openldap-2.4.40/openldap-2.4.40/build-servers/servers/slapd
Feb  3 08:24:57 server-02 slapd[26055]: slapd starting

gdb shows:
==================================================================
# gdb /usr/sbin/slapd /var/tmp/coredump/coredump
..........
Core was generated by `/usr/sbin/slapd -h  ldap:/// ldapi:/// -u ldap'.
Program terminated with signal 6, Aborted.
#0  0x00007f2f4b7bc625 in raise (sig=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
Missing separate debuginfos, use: debuginfo-install
libtool-ltdl-2.2.6-15.5.el6.x86_64
(gdb) bt
#0  0x00007f2f4b7bc625 in raise (sig=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f2f4b7bde05 in abort () at abort.c:92
#2  0x00007f2f4b7fa537 in __libc_message (do_abort=2, fmt=0x7f2f4b8e2840 "") at
../sysdeps/unix/sysv/linux/libc_fatal.c:198
#3  0x00007f2f4b7ffe66 in malloc_printerr (action=3, str=0x7f2f4b8e2b18 "ree or
corruption (!prev)", ptr=<value optimized out>) at malloc.c:6336
#4  0x00007f2f48d8f76f in send_result (op=0x7f2efe4d6fe0, rs=0x7f2eebffea00,
so=0x7f2ee0106890) at ../../../../servers/slapd/overlays/sssvlv.c:706
#5  0x00007f2f48d90109 in sssvlv_op_response (op=0x7f2efe4d6fe0,
rs=0x7f2eebffea00) at ../../../../servers/slapd/overlays/sssvlv.c:792
#6  0x00007f2f4e22f9de in slap_response_play (op=0x7f2efe4d6fe0,
rs=0x7f2eebffea00) at ../../../servers/slapd/result.c:508
#7  0x00007f2f4e2305c0 in send_ldap_response (op=0x7f2efe4d6fe0,
rs=0x7f2eebffea00) at ../../../servers/slapd/result.c:583
#8  0x00007f2f4e23158f in slap_send_ldap_result (op=0x7f2efe4d6fe0,
rs=0x7f2eebffea00) at ../../../servers/slapd/result.c:861
#9  0x00007f2f4e2ce9dd in mdb_search (op=0x7f2efe4d6fe0, rs=0x7f2eebffea00) at
../../../../servers/slapd/back-mdb/search.c:1164
#10 0x00007f2f4e28e477 in overlay_op_walk (op=0x7f2efe4d6fe0, rs=0x7f2eebffea00,
which=op_search, oi=0x7f2f4f1f9a60, on=0x0) at
../../../servers/slapd/backover.c:671
#11 0x00007f2f4e28eed4 in over_op_func (op=0x7f2efe4d6fe0, rs=<value optimized
out>, which=<value optimized out>) at ../../../servers/slapd/backover.c:723
#12 0x00007f2f4e221e79 in fe_op_search (op=0x7f2efe4d6fe0, rs=0x7f2eebffea00) at
../../../servers/slapd/search.c:402
#13 0x00007f2f4e28e477 in overlay_op_walk (op=0x7f2efe4d6fe0, rs=0x7f2eebffea00,
which=op_search, oi=0x7f2f4f1dd240, on=0x0) at
../../../servers/slapd/backover.c:671
#14 0x00007f2f4e28eed4 in over_op_func (op=0x7f2efe4d6fe0, rs=<value optimized
out>, which=<value optimized out>) at ../../../servers/slapd/backover.c:723
#15 0x00007f2f4e2226d7 in do_search (op=0x7f2efe4d6fe0, rs=<value optimized
out>) at ../../../servers/slapd/search.c:247
#16 0x00007f2f4e21f349 in connection_operation (ctx=0x7f2eebffeb70,
arg_v=0x7f2efe4d6fe0) at ../../../servers/slapd/connection.c:1155
#17 0x00007f2f4e220480 in connection_read_thread (ctx=0x7f2eebffeb70,
argv=<value optimized out>) at ../../../servers/slapd/connection.c:1291
#18 0x00007f2f4dd6cb68 in ldap_int_thread_pool_wrapper (xpool=0x7f2f4f184630) at
../../../libraries/libldap_r/tpool.c:688
#19 0x00007f2f4bd309d1 in start_thread (arg=0x7f2eebfff700) at
pthread_create.c:301
#20 0x00007f2f4b8728fd in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) bt full 0,6
#0  0x00007f2f4b7bc625 in raise (sig=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
        resultvar = 0
        pid = <value optimized out>
        selftid = 30625
#1  0x00007f2f4b7bde05 in abort () at abort.c:92
        save_stage = 2
        act = {__sigaction_handler = {sa_handler = 0x7f2eebe6c458, sa_sigaction
= 0x7f2eebe6c458}, sa_mask = {__val = {139839502992448, 139841150654840, 16,
139841107786729, 1, 139841106467759, 5, 139841107790384, 3, 139839502992446,
              2, 139841107786755, 1, 139841107793486, 3, 139839502992452}},
sa_flags = 12, sa_restorer = 0x7f2f4b8e1652}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x00007f2f4b7fa537 in __libc_message (do_abort=2, fmt=0x7f2f4b8e2840 "") at
../sysdeps/unix/sysv/linux/libc_fatal.c:198
        ap = {{gp_offset = 40, fp_offset = 48, overflow_arg_area =
0x7f2eebe6cdc0, reg_save_area = 0x7f2eebe6ccd0}}
        ap_copy = {{gp_offset = 16, fp_offset = 48, overflow_arg_area =
0x7f2eebe6cdc0, reg_save_area = 0x7f2eebe6ccd0}}
        fd = 2
        on_2 = <value optimized out>
        list = <value optimized out>
        nlist = <value optimized out>
        cp = <value optimized out>
        written = <value optimized out>
#3  0x00007f2f4b7ffe66 in malloc_printerr (action=3, str=0x7f2f4b8e2b18 "ree or
corruption (!prev)", ptr=<value optimized out>) at malloc.c:6336
        buf = "00007f2ee0106890"
        cp = <value optimized out>
#4  0x00007f2f48d8f76f in send_result (op=0x7f2efe4d6fe0, rs=0x7f2eebffea00,
so=0x7f2ee0106890) at ../../../../servers/slapd/overlays/sssvlv.c:706
        ctrls = {0x7f2ee0004008, 0x7f2ee0005020, 0x0}
        rc = <value optimized out>
        i = <value optimized out>
#5  0x00007f2f48d90109 in sssvlv_op_response (op=0x7f2efe4d6fe0,
rs=0x7f2eebffea00) at ../../../../servers/slapd/overlays/sssvlv.c:792
        sc = 0x7f2ee0002a50
        so = 0x7f2ee0106890
(More stack frames follow...)
(gdb) frame 4
#4  0x00007f2f48d8f76f in send_result (op=0x7f2efe4d6fe0, rs=0x7f2eebffea00,
so=0x7f2ee0106890) at ../../../../servers/slapd/overlays/sssvlv.c:706
706                     free_sort_op( op->o_conn, so );
(gdb) list
701                     slap_add_ctrls( op, rs, ctrls );
702             send_ldap_result( op, rs );
703
704             if ( so->so_tree == NULL ) {
705                     /* Search finished, so clean up */
706                     free_sort_op( op->o_conn, so );
707             }
708     }
709
(gdb) frame 3
#3  0x00007f2f4b7ffe66 in malloc_printerr (action=3, str=0x7f2f4b8e2b18 "ree or
corruption (!prev)", ptr=<value optimized out>) at malloc.c:6336
6336          __libc_message (action & 2,
(gdb) list
6331          buf[sizeof (buf) - 1] = '\0';
6332          char *cp = _itoa_word ((uintptr_t) ptr, &buf[sizeof (buf) - 1],
16, 0);
6333          while (cp > buf)
6334            *--cp = '0';
6335
6336          __libc_message (action & 2,
6337                          "*** glibc detected *** %s: %s: 0x%s ***\n",
6338                          __libc_argv[0] ?: "<unknown>", str, cp);
6339        }
6340      else if (action & 2)
==================================================================

What is the root cause of this error ("double free or corruption" in malloc.c)?
In what direction should we dig to troubleshoot it?
Should we monitor for memory leaks, or try tcmalloc instead of malloc?
Comment 1 Howard Chu 2016-02-04 16:54:39 UTC
yaroslavr@digdes.com wrote:
> Full_Name: Yaroslav Rutsky
> Version: 2.4.40
> OS: CentOS release 6.6 (Final)
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (85.114.5.9)
>
>
> We use customized OpenLDAP RPM packages, built from
> openldap-2.4.40-6.el6_7.src.rpm from the CentOS repository, with a patch to support
> the Outlook address book browsing feature (in conjunction with the sssvlv overlay):

> What is the root cause of this error ("double free or corruption" at malloc.c)?
> In what direction should we dig to troubleshoot it?
> Should we monitor for memory leaks, or try tcmalloc instead of malloc?

Since it appears that you can reproduce this problem, it ought to be easy to 
identify the cause using valgrind.
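
A minimal sketch of such a run, assuming valgrind is installed and slapd can be
stopped and restarted in the foreground on a test host. The listener and user
arguments are copied from the core-file command line in the report; the log
path and the helper function name are hypothetical. The script only prints the
command so it can be reviewed before being run as root:

```shell
#!/bin/sh
# Sketch only: build a valgrind (memcheck) command line for slapd.
# Assumptions: slapd lives at /usr/sbin/slapd as in the report, and the
# log path below is a hypothetical location (%p expands to the PID).
SLAPD=/usr/sbin/slapd
VG_LOG=/var/tmp/slapd-valgrind.%p.log

valgrind_slapd_cmd() {
    # -d 0 keeps slapd in the foreground so valgrind stays attached.
    # --track-origins=yes and --num-callers=30 give fuller reports
    # at the cost of speed.
    printf '%s\n' "valgrind --tool=memcheck --leak-check=full --track-origins=yes --num-callers=30 --log-file=$VG_LOG $SLAPD -d 0 -h 'ldap:/// ldapi:///' -u ldap"
}

# Print the command; stop the regular slapd service before running it.
valgrind_slapd_cmd
```

slapd runs much slower under valgrind, so this is probably best tried on a
replica that can take reduced load while the failing searches are replayed.
An "Invalid free" or "Invalid read" entry in the log that mentions sssvlv.c
would be the part worth posting here.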

Meanwhile, 2.4.40 is 3 releases behind and we're about to release 2.4.44. If 
you can reproduce the problem in the current RE24 release candidate and track 
it down in valgrind there may be time to fix it for the 2.4.44 release. 
Otherwise, you'll have to wait for the next release.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 Quanah Gibson-Mount 2017-04-12 16:54:50 UTC
moved from Incoming to Software Bugs
Comment 3 Quanah Gibson-Mount 2020-03-22 03:03:50 UTC
Need reproduction case or further follow up from the reporter.