Issue 7987 - SIGSEGV in LMDB while adding ldap-entries
Summary: SIGSEGV in LMDB while adding ldap-entries
Status: VERIFIED FIXED
Alias: None
Product: LMDB
Classification: Unclassified
Component: liblmdb
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-11-26 16:31 UTC by Leonid Yuriev
Modified: 2020-03-12 15:55 UTC

Attachments
its7987-gitbisect-testcase.tar.gz (4.46 KB, application/x-gzip)
2014-11-27 09:26 UTC, Leonid Yuriev
excessive-space-single-write-txn.patch (778 bytes, patch)
2014-12-02 11:17 UTC, Leonid Yuriev
its7987-testcase.tar.gz (4.11 KB, application/x-gzip)
2014-11-26 19:11 UTC, Leonid Yuriev

Description Leonid Yuriev 2014-11-26 16:31:22 UTC
Full_Name: Leonid Yuriev
Version: 2.4 git head (OPENLDAP_REL_ENG_2_4 branch)
OS: Ubuntu 14.10
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (31.130.36.33)


Multi-master cluster of 4 nodes.
Testcase will be available shortly (config + script).

Program terminated with signal 11, Segmentation fault.
#0  0x00000000004a2e25 in mdb_cursor_put (mc=0x7f81f915c370, key=0x7f82177fd120,
data=0x7f82177fd130, flags=32) at ./../../../libraries/liblmdb/mdb.c:6358
6358		nsize = IS_LEAF2(mc->mc_pg[mc->mc_top]) ? key->mv_size :
mdb_leaf_size(env, key, rdata);
(gdb) bt
#0  0x00000000004a2e25 in mdb_cursor_put (mc=0x7f81f915c370, key=0x7f82177fd120,
data=0x7f82177fd130, flags=32) at ./../../../libraries/liblmdb/mdb.c:6358
#1  0x00000000004a2b52 in mdb_cursor_put (mc=mc@entry=0x7f81f915c370,
key=key@entry=0x7f82177fd120, data=data@entry=0x7f82177fd130,
flags=flags@entry=32) at ./../../../libraries/liblmdb/mdb.c:6471
#2  0x00000000004d3e58 in mdb_idl_insert_keys (be=<optimised out>,
cursor=0x7f81f915c370, keys=<optimised out>, id=997543) at idl.c:534
#3  0x00000000004d5143 in indexer (op=0x7f81ed1356e0, txn=0x0, atname=0x0,
vals=0x7f81f81026d0, id=997543, opid=1, mask=4, ad=<optimised out>,
ai=<optimised out>, ai=<optimised out>) at index.c:219
#4  0x00000000004d5445 in index_at_values (op=0x7f81ed1356e0, txn=0x1b55110,
type=0x194a270, tags=0x1943830, vals=0x7f81f81026d0, id=997543, opid=1,
ad=<optimised out>) at index.c:337
#5  0x00000000004d59f9 in mdb_index_values (opid=<optimised out>, id=<optimised
out>, vals=<optimised out>, desc=<optimised out>, txn=<optimised out>,
op=<optimised out>) at index.c:386
#6  mdb_index_entry (op=0x0, txn=0x2C o opid=394252592, e=0x8) at index.c:558
#7  0x00000000004c9159 in mdb_add (op=0x7f81ed1356e0, rs=0x7f82177fda80) at
add.c:359
#8  0x000000000048a346 in overlay_op_walk (op=op@entry=0x7f81ed1356e0,
rs=0x7f82177fda80, which=op_add, oi=0x19a3730, on=0x0) at backover.c:671
#9  0x000000000048a4b1 in over_op_func (op=0x7f81ed1356e0, rs=<optimised out>,
which=<optimised out>) at backover.c:723
#10 0x000000000042a9b0 in fe_op_add (op=0x7f81ed1356e0, rs=0x7f82177fda80) at
add.c:334
#11 0x000000000042b59f in do_add (op=0x7f81ed1356e0, rs=0x7f82177fda80) at
add.c:194
#12 0x0000000000424624 in connection_operation (ctx=0x7f82177fdbd0,
arg_v=0x7f81ed1356e0) at connection.c:1155
#13 0x0000000000424d3c in connection_read_thread (ctx=0x7f82177fdbd0, argv=0x13)
at connection.c:1291
#14 0x00007f824def1cf2 in ldap_int_thread_pool_wrapper (xpool=0x1956090) at
tpool.c:688
#15 0x00007f824dabc0a5 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#16 0x00007f824d7e984d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) p * mc
$1 = {mc_next = 0x0, mc_backup = 0x0, mc_xcursor = 0x7f81f915c4f8, mc_txn =
0x1b55110, mc_dbi = 5, mc_db = 0x1b55288, mc_dbx = 0x1b52660, mc_dbflag =
0x1b57015 "\v\n\n", mc_snum = 0, mc_top = 0, 
  mc_flags % 6 64, mc_pg = {0x0, 0x7f822f03f000, 0x7f822f287000, 0x308f, 0x2fb0,
0x2e37, 0x2d75, 0x2bbc, 0x2b76, 0x28eb, 0x25f4, 0x2582, 0x2439, 0x225f, 0x21d0,
0x1fdd, 0x1c89, 0x1b68, 0x1b00, 0x19db, 
    0x1323, 0xf00, 0x895, 0x223, 0x12b, 0x3cb, 0x32b, 0xe5, 0x0, 0x7f81f9006e63,
0x941, 0x7f81f80005b8}, mc_ki = {0, 0, 0, 0, 50336, 63765, 32641, 0, 50336,
63765, 32641, 0, 3, 0, 12849, 14641, 21032, 
    437, 0, 0, 9728, 437, 0, 0, 28691, 437, 0, 0, 3, 2, 65, 0}}
(gdb) p mc->mc_top
$2 = 0
(gdb) p mc->mc_pg
$3 = {0x0, 0x7f822f03f000, 0x7f822f287000, 0x308f, 0x2fb0, 0x2e37, 0x2d75,
0x2bbc, 0x2b76, 0x28eb, 0x25f4, 0x2582, 0x2439, 0x225f, 0x21d0, 0x1fdd, 0x1c89,
0x1b68, 0x1b00, 0x19db, 0x1323, 0xf00, 0x895, 
  0x223, 0x12b, 0x3cb, 0x32b, 0xe5, 0x0, 0x7f81f9006e63, 0x941, 0x7f81f80005b8}
(gdb) p mc->mc_pg[mc->mc_top]
$4 = (MDB_page *) 0x0
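
The gdb output above shows mc_snum = 0 and mc_top = 0 with mc_pg[0] == 0x0, so the cursor has no page pushed at the point where line 6358 dereferences mc->mc_pg[mc->mc_top]. Below is a minimal sketch of that condition. It assumes a hypothetical demo_cursor struct that mirrors only the fields printed above; it is not LMDB's real MDB_cursor, and demo_cursor_has_page is an illustrative helper, not an LMDB function.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the cursor printed by gdb. The stack depth of 32
 * matches the number of mc_pg entries in the dump; everything else about
 * the real structure is deliberately left out. */
#define DEMO_CURSOR_STACK 32

struct demo_page { int unused; };

struct demo_cursor {
    unsigned short mc_snum;                     /* number of pages pushed */
    unsigned short mc_top;                      /* index of the top page */
    struct demo_page *mc_pg[DEMO_CURSOR_STACK]; /* stack of pushed pages */
};

/* Returns 1 when mc->mc_pg[mc->mc_top] is safe to dereference, else 0. */
static int demo_cursor_has_page(const struct demo_cursor *mc)
{
    assert(mc != NULL);
    if (mc->mc_snum == 0)                 /* nothing pushed: the state in the dump */
        return 0;
    if (mc->mc_top >= DEMO_CURSOR_STACK)  /* index outside the page stack */
        return 0;
    return mc->mc_pg[mc->mc_top] != NULL;
}
```

A cursor in the state from the core dump (mc_snum = 0, mc_pg[0] = NULL) makes the helper return 0, which is the case the crashing code path reached without any such check.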

git show
commit 6b26910c5acbf141ff322d043b3301d0976a7913
Author: Quanah Gibson-Mount <quanah@openldap.org>
Date:   2014-10-03 15:35:39 -0500

    Silence compiler warning by adding explicit return 0 to ppolicy_db_destroy


Comment 1 Leonid Yuriev 2014-11-26 19:11:33 UTC
Just run 'runme.sh' from the attached tarball.
The problem (core dump) usually reproduces within 10-20 minutes.
I will try 'git bisect'.

Leonid.
Comment 2 Leonid Yuriev 2014-11-27 09:26:20 UTC
Two runs gave the same result; the bisect testcase is attached.
Note that sudo will be called to mount a ramfs and to run ifconfig for IP aliases (for the cluster).

mkdir xxx
cd xxx
tar xaf its7987-gitbisect-testcase.tar.gz
git clone git://git.openldap.org/openldap.git openldap-source
cd openldap-source
git checkout OPENLDAP_REL_ENG_2_4
git bisect start 'OPENLDAP_REL_ENG_2_4' 'OPENLDAP_REL_ENG_2_4_39'
git bisect run ../git-bisect-probe.sh
git bisect log

# bad: [6b26910c5acbf141ff322d043b3301d0976a7913] Silence compiler warning by adding explicit return 0 to ppolicy_db_destroy
# good: [6bfa8256161618e62e56d139243aa234b3ece875] Prep for release
git bisect start 'OPENLDAP_REL_ENG_2_4' 'OPENLDAP_REL_ENG_2_4_39'
# good: [0e104da454c0f59c6d3195add38c935179de2cb3] Merge remote-tracking branch 'origin/mdb.master' into OPENLDAP_REL_ENG_2_4
git bisect good 0e104da454c0f59c6d3195add38c935179de2cb3
# good: [5e4cc6d1c515ace1e0c151ca8986bf805f203834] ITS#7895, ITS#7915
git bisect good 5e4cc6d1c515ace1e0c151ca8986bf805f203834
# good: [d871d328544124573bdbf63e360c4bb47e6c0d38] ITS#7702
git bisect good d871d328544124573bdbf63e360c4bb47e6c0d38
# good: [e4c848e24ff36139eb235efe7f426d89e01056ae] ITS#7915 fix memory leaks in previous patch
git bisect good e4c848e24ff36139eb235efe7f426d89e01056ae
# bad: [0659ef45d486b5daaafc020cb67b561a8029036d] ITS#7941 fix for repeated tags
git bisect bad 0659ef45d486b5daaafc020cb67b561a8029036d
# bad: [5ebeec95535786ce7a0e80173209e33f64ab9013] Merge remote-tracking branch 'origin/mdb.master' into OPENLDAP_REL_ENG_2_4
git bisect bad 5ebeec95535786ce7a0e80173209e33f64ab9013
# skip: [29fd241fadc3dd49b3486f0e3556b029b716bcbf] Remember oldest reader txnid
git bisect skip 29fd241fadc3dd49b3486f0e3556b029b716bcbf
# skip: [3646ba966c75137b01e38fc5baea6d5864189c8e] More for me_pgoldest
git bisect skip 3646ba966c75137b01e38fc5baea6d5864189c8e
# skip: [4d02c741b120786df1b87ee9ed49c1d3f9bc7522] Use a single write txn
git bisect skip 4d02c741b120786df1b87ee9ed49c1d3f9bc7522
# skip: [5ee99f1125a775f28ed69b06d991a43c60d894a9] Change retry to num times 60.  Testing shows that on a known dataset, this has the same growth behavior as 2.4.39, while num times 20 resulted in significant growth.
git bisect skip 5ee99f1125a775f28ed69b06d991a43c60d894a9
# only skipped commits left to test
# possible first bad commit: [5ebeec95535786ce7a0e80173209e33f64ab9013] Merge remote-tracking branch 'origin/mdb.master' into OPENLDAP_REL_ENG_2_4
# possible first bad commit: [5ee99f1125a775f28ed69b06d991a43c60d894a9] Change retry to num times 60.  Testing shows that on a known dataset, this has the same growth behavior as 2.4.39, while num times 20 resulted in significant growth.
# possible first bad commit: [3646ba966c75137b01e38fc5baea6d5864189c8e] More for me_pgoldest
# possible first bad commit: [29fd241fadc3dd49b3486f0e3556b029b716bcbf] Remember oldest reader txnid
# possible first bad commit: [4d02c741b120786df1b87ee9ed49c1d3f9bc7522] Use a single write txn

Comment 3 Hallvard Furuseth 2014-11-27 10:07:52 UTC
On 11/27/2014 10:26 AM, leo@yuriev.ru wrote:
> # possible first bad commit:
> [4d02c741b120786df1b87ee9ed49c1d3f9bc7522] Use a single write txn

Partly fixed in mdb.master, but I think branch "mdb/fixes"
in <git://git.uio.no/u/hbf/openldap.git> is needed too.
That is, the "[Untested] ITS#7961 Re-fix txn init." commit.
Both branches have some remaining problems, however.

-- 
Hallvard

Comment 4 Leonid Yuriev 2014-11-27 18:00:31 UTC
Confirmed: this bug seems to be fixed (after merging 'mdb/fixes' from
git://git.uio.no/u/hbf/openldap.git).

But now there is another crash, similar to http://www.openldap.org/its/index.cgi/Incoming?id=7968

#3  0x00007f5061836c82 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007f5061df4553 in ber_int_sb_write
(sb=sb@entry=0x7f5024105870, buf=0x7f4ff80019d0, len=len@entry=152) at
sockbuf.c:441
#5  0x00007f5061df0b8b in ber_flush2 (sb=0x7f5024105870,
ber=ber@entry=0x7f500cffb330, freeit=freeit@entry=0) at io.c:246
#6  0x0000000000433dce in send_ldap_ber (op=0x7f500cffb7c0,
ber=0x7f500cffb330) at result.c:339
#7  0x00000000004367a6 in slap_send_search_entry (op=0x7f500cffb7c0,
rs=0x7f500cffb5c0) at result.c:1430
#8  0x000000000051077a in syncprov_sendresp (mode=<optimised out>,
so=<optimised out>, opc=<optimised out>, op=<optimised out>) at
syncprov.c:895
#9  syncprov_qplay (so=<optimised out>, op=<optimised out>) at syncprov.c:944
#10 syncprov_qtask (ctx=0x7f4ff81086c8, arg=0x7f4ff8108670) at syncprov.c:1010
#11 0x00007f5062009d12 in ldap_int_thread_pool_wrapper
(xpool=0xd3b070) at tpool.c:688
#12 0x00007f5061bd40a5 in start_thread () from
/lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007f506190184d in clone () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) frame 4
#4  0x00007f5061df4553 in ber_int_sb_write
(sb=sb@entry=0x7f5024105870, buf=0x7f4ff80019d0, len=len@entry=152) at
sockbuf.c:441
441 assert( sb->sb_iod != NULL );
(gdb) info local
ret = <optimised out>
__PRETTY_FUNCTION__ = "ber_int_sb_write"
(gdb) p *sb
$1 = {sb_opts = {lbo_valid = 3, lbo_options = 0, lbo_debug = 16384},
sb_iod = 0x0, sb_fd = 75, sb_max_incoming = 262143,
sb_trans_needs_read = 0, sb_trans_needs_write = 0}

tail xxx.log
54775e18 syncrepl_entry: rid=001 cn=tablet,uid=1040,dc=ngdr,dc=ldap
54775e18 slap_queue_csn: queueing 0x7f5020107890
20141127172334.589180Z#000000#001#000000
54775e18 syncprov_matchops: skipping original sid 001
54775e18 slap_graduate_commit_csn: removing 0x7f502012d740
20141127172334.589180Z#000000#001#000000
54775e18 syncrepl_entry: rid=001 be_delete
cn=tablet,uid=1040,dc=ngdr,dc=ldap (0)
54775e18 slap_queue_csn: queueing 0x7f5020107890
20141127172334.589180Z#000000#001#000000
54775e18 slap_graduate_commit_csn: removing 0x7f5020108020
20141127172334.589180Z#000000#001#000000
54775e18 do_syncrep2: rid=001
cookie=rid=001,sid=001,csn=20141127172334.589328Z#000000#001#000000
54775e18 syncrepl_message_to_entry: rid=001 DN:
cn=tablet,uid=1793,dc=ngdr,dc=ldap, UUID:
dc4d5d30-0aa5-1034-8ff5-a9ee5a485f53
slapd: sockbuf.c:441: ber_int_sb_write: Assertion `sb->sb_iod !=
((void *)0)' failed.


Comment 5 Leonid Yuriev 2014-11-28 13:11:47 UTC
This bug was certainly introduced by 4d02c741 (Use a single write txn).

Could be fixed by:
cherry-pick ead57604 (ITS#7961 Re-fix txn init) from
git://git.uio.no/u/hbf/openldap.git
or
revert: d72b2f5d (ITS#7961 fix txn init), 62e4eeb7 (ITS#7943 reinit
txn flags), 891e6627 (Plug leak in 4d02c741...) and 4d02c741 (Use a
single write txn).

Leonid.

Comment 6 Howard Chu 2014-11-28 19:06:31 UTC
leo@yuriev.ru wrote:
> Sure this bug is introduced by the 4d02c741 (Use a single write txn).
>
> Could be fixed by:
> cherry-pick ead57604 (ITS#7961 Re-fix txn init) from
> git://git.uio.no/u/hbf/openldap.git
> or
> revert: d72b2f5d (ITS#7961 fix txn init), 62e4eeb7 (ITS#7943 reinit
> txn flags), 891e6627 (Plug leak in 4d02c741...) and 4d02c741 (Use a
> single write txn).

Totally fixed? Or are you still seeing the ITS#7968 crash now? So far 
I've run your testcase on the patched source (with ead57604) and gotten 
no crashes.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 7 Leonid Yuriev 2014-11-28 19:44:05 UTC
(1) ITS#7968 is not fixed:
- it still crashes in a few 'favorite' locations
- the last time it was a SIGSEGV in do_syncrep2() at "if ( ber_bvcmp(
syncCookie.ctxcsn, &si->si_cookieState->cs_vals[i] ) <= 0 )"

(2) mdb.master seems to have other bug(s) that were also fixed in
the 'mdb/fixes' branch on git://git.uio.no/u/hbf/openldap.git:
- cherry-picking ead57604 alone fixes this SIGSEGV, but I then got an
'implementation specific' error in a heavy add/delete test (the same as in
the attached testcase)
- merging 'mdb/fixes' passed my test.

(3) OPENLDAP_REL_ENG_2_4,
without merging the current mdb.master
and with all 4d02c741-related changes (single write txn) reverted,
also passed my test.

Leonid.


Comment 8 Leonid Yuriev 2014-11-30 20:50:51 UTC
The results noted previously in (2) and (3) were caused by the following patch.
It is a trial patch from Howard Chu for working around ITS#7968 ;)

Under a heavy add/delete load it may induce "DN index delete failed"
in the mdb backend.

> Message-ID: <5440D299.60208@symas.com>
> Date: Fri, 17 Oct 2014 09:26:01 +0100
> From: Howard Chu <hyc@symas.com>
>
> Can you try this patch and followup again?
>
>> diff --git a/servers/slapd/overlays/syncprov.c b/servers/slapd/overlays/syncprov.c
>> index e15020e..b54c83f 100644
>> --- a/servers/slapd/overlays/syncprov.c
>> +++ b/servers/slapd/overlays/syncprov.c
>> @@ -1306,11 +1306,12 @@ syncprov_matchops( Operation *op, opcookie *opc, int saveit )
>>                         op2.o_hdr = &oh;
>>                         op2.o_extra = op->o_extra;
>>                         op2.o_callback = NULL;
>> -                       if (ss->s_flags & PS_FIX_FILTER) {
>> +                       if ((ss->s_flags & PS_FIX_FILTER)
>> +                               && op2.ors_filter->f_choice == LDAP_FILTER_AND) {
>>                                 /* Skip the AND/GE clause that we stuck on in front. We
>>                                    would lose deletes/mods that happen during the refresh
>>                                    phase otherwise (ITS#6555) */
>> -                               op2.ors_filter = ss->s_op->ors_filter->f_and->f_next;
>> +                               op2.ors_filter = op2.ors_filter->f_and->f_next;
>>                         }
>>                         ldap_pvt_thread_mutex_unlock( &ss->s_mutex );
>>                         rc = test_filter( &op2, e, op2.ors_filter );
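
The quoted patch only follows f_and->f_next when the filter node really is an AND node. A rough sketch of that guard follows. It assumes a hypothetical demo_filter type and DEMO_FILTER_* constants that stand in for slapd's real Filter and LDAP_FILTER_AND, and it adds a NULL check on f_and that is not part of the quoted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified filter node; slapd's real Filter type differs. */
enum demo_choice { DEMO_FILTER_AND, DEMO_FILTER_EQUALITY };

struct demo_filter {
    enum demo_choice f_choice;   /* kind of filter node */
    struct demo_filter *f_and;   /* first child, meaningful only for AND nodes */
    struct demo_filter *f_next;  /* next sibling in the parent's child list */
};

/* Pick the filter to evaluate. The clause that was prepended under an AND
 * node is skipped only when fix_filter is set and the node is an AND with a
 * first child; in every other case the filter is returned unchanged. */
static struct demo_filter *
demo_effective_filter(struct demo_filter *f, int fix_filter)
{
    assert(f != NULL);
    if (fix_filter && f->f_choice == DEMO_FILTER_AND && f->f_and != NULL)
        return f->f_and->f_next;
    return f;
}
```

With the flag set but a non-AND filter, the sketch returns the filter unchanged instead of reading f_and, which is the situation the extra f_choice test in the patch covers.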


Comment 9 Leonid Yuriev 2014-12-02 11:17:29 UTC
(1) I can confirm once again: the bug is fixed.

(2) The previously noted failure "DN index delete failed" was caused by another
(local) bug.

(3) The "single write txn" feature needs a minor fix; patch attached.

Leonid.

---

The attached files is derived from OpenLDAP Software. All of the
modifications to OpenLDAP Software represented in the following patch(es)
were developed by Peter-Service LLC, Moscow, Russia. Peter-Service LLC has
not assigned rights and/or interest in this work to any party. I, Leonid
Yuriev am authorized by Peter-Service LLC, my employer, to release this
work under the following terms.

Peter-Service LLC hereby places the following modifications to OpenLDAP
Software (and only these modifications) into the public domain. Hence,
these modifications may be freely used and/or redistributed for any purpose
with or without attribution and/or other notice.


Comment 10 OpenLDAP project 2014-12-05 19:31:32 UTC
fixed in mdb.master
Comment 11 Howard Chu 2014-12-05 19:31:32 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Comment 12 Quanah Gibson-Mount 2014-12-11 01:06:02 UTC
changed state Test to Release
Comment 13 Quanah Gibson-Mount 2015-07-02 17:45:49 UTC
changed state Release to Closed