Full_Name: Leonid Yuriev
Version: 2.4 git head (OPENLDAP_REL_ENG_2_4 branch)
OS: Ubuntu 14.10
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (31.130.36.33)

Multi-master cluster of 4 nodes.
Testcase will be available shortly (config + script).

Program terminated with signal 11, Segmentation fault.
#0  0x00000000004a2e25 in mdb_cursor_put (mc=0x7f81f915c370, key=0x7f82177fd120, data=0x7f82177fd130, flags=32) at ./../../../libraries/liblmdb/mdb.c:6358
6358            nsize = IS_LEAF2(mc->mc_pg[mc->mc_top]) ? key->mv_size : mdb_leaf_size(env, key, rdata);
(gdb) bt
#0  0x00000000004a2e25 in mdb_cursor_put (mc=0x7f81f915c370, key=0x7f82177fd120, data=0x7f82177fd130, flags=32) at ./../../../libraries/liblmdb/mdb.c:6358
#1  0x00000000004a2b52 in mdb_cursor_put (mc=mc@entry=0x7f81f915c370, key=key@entry=0x7f82177fd120, data=data@entry=0x7f82177fd130, flags=flags@entry=32) at ./../../../libraries/liblmdb/mdb.c:6471
#2  0x00000000004d3e58 in mdb_idl_insert_keys (be=<optimised out>, cursor=0x7f81f915c370, keys=<optimised out>, id=997543) at idl.c:534
#3  0x00000000004d5143 in indexer (op=0x7f81ed1356e0, txn=0x0, atname=0x0, vals=0x7f81f81026d0, id=997543, opid=1, mask=4, ad=<optimised out>, ai=<optimised out>, ai=<optimised out>) at index.c:219
#4  0x00000000004d5445 in index_at_values (op=0x7f81ed1356e0, txn=0x1b55110, type=0x194a270, tags=0x1943830, vals=0x7f81f81026d0, id=997543, opid=1, ad=<optimised out>) at index.c:337
#5  0x00000000004d59f9 in mdb_index_values (opid=<optimised out>, id=<optimised out>, vals=<optimised out>, desc=<optimised out>, txn=<optimised out>, op=<optimised out>) at index.c:386
#6  mdb_index_entry (op=0x0, txn=0x2C o opid=394252592, e=0x8) at index.c:558
#7  0x00000000004c9159 in mdb_add (op=0x7f81ed1356e0, rs=0x7f82177fda80) at add.c:359
#8  0x000000000048a346 in overlay_op_walk (op=op@entry=0x7f81ed1356e0, rs=0x7f82177fda80, which=op_add, oi=0x19a3730, on=0x0) at backover.c:671
#9  0x000000000048a4b1 in over_op_func (op=0x7f81ed1356e0, rs=<optimised out>, which=<optimised out>) at backover.c:723
#10 0x000000000042a9b0 in fe_op_add (op=0x7f81ed1356e0, rs=0x7f82177fda80) at add.c:334
#11 0x000000000042b59f in do_add (op=0x7f81ed1356e0, rs=0x7f82177fda80) at add.c:194
#12 0x0000000000424624 in connection_operation (ctx=0x7f82177fdbd0, arg_v=0x7f81ed1356e0) at connection.c:1155
#13 0x0000000000424d3c in connection_read_thread (ctx=0x7f82177fdbd0, argv=0x13) at connection.c:1291
#14 0x00007f824def1cf2 in ldap_int_thread_pool_wrapper (xpool=0x1956090) at tpool.c:688
#15 0x00007f824dabc0a5 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#16 0x00007f824d7e984d in clone () from /lib/x86_64-linux-gnu/libc.so.6

(gdb) p * mc
$1 = {mc_next = 0x0, mc_backup = 0x0, mc_xcursor = 0x7f81f915c4f8, mc_txn = 0x1b55110, mc_dbi = 5, mc_db = 0x1b55288, mc_dbx = 0x1b52660, mc_dbflag = 0x1b57015 "\v\n\n", mc_snum = 0, mc_top = 0, mc_flags = 64,
  mc_pg = {0x0, 0x7f822f03f000, 0x7f822f287000, 0x308f, 0x2fb0, 0x2e37, 0x2d75, 0x2bbc, 0x2b76, 0x28eb, 0x25f4, 0x2582, 0x2439, 0x225f, 0x21d0, 0x1fdd, 0x1c89, 0x1b68, 0x1b00, 0x19db, 0x1323, 0xf00, 0x895, 0x223, 0x12b, 0x3cb, 0x32b, 0xe5, 0x0, 0x7f81f9006e63, 0x941, 0x7f81f80005b8},
  mc_ki = {0, 0, 0, 0, 50336, 63765, 32641, 0, 50336, 63765, 32641, 0, 3, 0, 12849, 14641, 21032, 437, 0, 0, 9728, 437, 0, 0, 28691, 437, 0, 0, 3, 2, 65, 0}}
(gdb) p mc->mc_top
$2 = 0
(gdb) p mc->mc_pg
$3 = {0x0, 0x7f822f03f000, 0x7f822f287000, 0x308f, 0x2fb0, 0x2e37, 0x2d75, 0x2bbc, 0x2b76, 0x28eb, 0x25f4, 0x2582, 0x2439, 0x225f, 0x21d0, 0x1fdd, 0x1c89, 0x1b68, 0x1b00, 0x19db, 0x1323, 0xf00, 0x895, 0x223, 0x12b, 0x3cb, 0x32b, 0xe5, 0x0, 0x7f81f9006e63, 0x941, 0x7f81f80005b8}
(gdb) p mc->mc_pg[mc->mc_top]
$4 = (MDB_page *) 0x0

git show
commit 6b26910c5acbf141ff322d043b3301d0976a7913
Author: Quanah Gibson-Mount <quanah@openldap.org>
Date:   2014-10-03 15:35:39 -0500

    Silence compiler warning by adding explicit return 0 to ppolicy_db_destroy
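The gdb output above shows a cursor with mc_snum = 0 whose mc_pg[mc_top] is NULL, which is what the faulting line dereferences. As a minimal sketch of that failure condition (not LMDB code: the `cursor` and `page` types, the `C_INITIALIZED` value and `cursor_top_page` are hypothetical names invented for illustration), a caller could refuse to touch the page stack of a cursor that was never positioned:

```c
#include <assert.h>
#include <stddef.h>

#define CURSOR_STACK  32    /* the gdb dump above prints 32 stack slots */
#define C_INITIALIZED 0x01  /* hypothetical "cursor is positioned" flag */

typedef struct page { int is_leaf2; } page;

typedef struct cursor {
    page     *pg[CURSOR_STACK]; /* page stack, root first */
    unsigned  snum;             /* number of pages pushed on the stack */
    unsigned  top;              /* index of the top page (snum - 1 when valid) */
    unsigned  flags;
} cursor;

/* Return the top page, or NULL when the cursor has no usable page stack.
 * A cursor with snum == 0 may still hold stale values in pg[] and top,
 * so neither field is trusted in that case. */
static page *cursor_top_page(const cursor *c)
{
    if (c == NULL || !(c->flags & C_INITIALIZED) || c->snum == 0)
        return NULL;
    assert(c->top == c->snum - 1);
    return c->pg[c->top];
}
```

With a check like this the crash would turn into an error return at the call site; it only illustrates the symptom and says nothing about why the cursor was left uninitialized.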
Just run 'runme.sh' from the attached tarball.
The problem (coredump) usually reproduces within 10-20 minutes.
I will try 'git bisect'.

Leonid.
Two runs gave the same result; the bisect testcase is attached.
sudo will be called for the ramfs mount and for ifconfig (IP aliases for the cluster).

mkdir xxx
cd xxx
tar xaf its7987-gitbisect-testcase.tar.gz
git clone git://git.openldap.org/openldap.git openldap-source
cd openldap-source
git checkout OPENLDAP_REL_ENG_2_4
git bisect start 'OPENLDAP_REL_ENG_2_4' 'OPENLDAP_REL_ENG_2_4_39'
git bisect run ../git-bisect-probe.sh

git bisect log
# bad: [6b26910c5acbf141ff322d043b3301d0976a7913] Silence compiler warning by adding explicit return 0 to ppolicy_db_destroy
# good: [6bfa8256161618e62e56d139243aa234b3ece875] Prep for release
git bisect start 'OPENLDAP_REL_ENG_2_4' 'OPENLDAP_REL_ENG_2_4_39'
# good: [0e104da454c0f59c6d3195add38c935179de2cb3] Merge remote-tracking branch 'origin/mdb.master' into OPENLDAP_REL_ENG_2_4
git bisect good 0e104da454c0f59c6d3195add38c935179de2cb3
# good: [5e4cc6d1c515ace1e0c151ca8986bf805f203834] ITS#7895, ITS#7915
git bisect good 5e4cc6d1c515ace1e0c151ca8986bf805f203834
# good: [d871d328544124573bdbf63e360c4bb47e6c0d38] ITS#7702
git bisect good d871d328544124573bdbf63e360c4bb47e6c0d38
# good: [e4c848e24ff36139eb235efe7f426d89e01056ae] ITS#7915 fix memory leaks in previous patch
git bisect good e4c848e24ff36139eb235efe7f426d89e01056ae
# bad: [0659ef45d486b5daaafc020cb67b561a8029036d] ITS#7941 fix for repeated tags
git bisect bad 0659ef45d486b5daaafc020cb67b561a8029036d
# bad: [5ebeec95535786ce7a0e80173209e33f64ab9013] Merge remote-tracking branch 'origin/mdb.master' into OPENLDAP_REL_ENG_2_4
git bisect bad 5ebeec95535786ce7a0e80173209e33f64ab9013
# skip: [29fd241fadc3dd49b3486f0e3556b029b716bcbf] Remember oldest reader txnid
git bisect skip 29fd241fadc3dd49b3486f0e3556b029b716bcbf
# skip: [3646ba966c75137b01e38fc5baea6d5864189c8e] More for me_pgoldest
git bisect skip 3646ba966c75137b01e38fc5baea6d5864189c8e
# skip: [4d02c741b120786df1b87ee9ed49c1d3f9bc7522] Use a single write txn
git bisect skip 4d02c741b120786df1b87ee9ed49c1d3f9bc7522
# skip: [5ee99f1125a775f28ed69b06d991a43c60d894a9] Change retry to num times 60. Testing shows that on a known dataset, this has the same growth behavior as 2.4.39, while num times 20 resulted in significant growth.
git bisect skip 5ee99f1125a775f28ed69b06d991a43c60d894a9
# only skipped commits left to test
# possible first bad commit: [5ebeec95535786ce7a0e80173209e33f64ab9013] Merge remote-tracking branch 'origin/mdb.master' into OPENLDAP_REL_ENG_2_4
# possible first bad commit: [5ee99f1125a775f28ed69b06d991a43c60d894a9] Change retry to num times 60. Testing shows that on a known dataset, this has the same growth behavior as 2.4.39, while num times 20 resulted in significant growth.
# possible first bad commit: [3646ba966c75137b01e38fc5baea6d5864189c8e] More for me_pgoldest
# possible first bad commit: [29fd241fadc3dd49b3486f0e3556b029b716bcbf] Remember oldest reader txnid
# possible first bad commit: [4d02c741b120786df1b87ee9ed49c1d3f9bc7522] Use a single write txn

2014-11-26 22:11 GMT+03:00 Леонид Юрьев <leo@yuriev.ru>:
> Just run a 'runme.sh' from attached tarball.
> Problem (coredump) usually is reproduced for 10-20 min.
> I will try 'git bisect'.
>
> Leonid.
On 11/27/2014 10:26 AM, leo@yuriev.ru wrote:
> # possible first bad commit:
> [4d02c741b120786df1b87ee9ed49c1d3f9bc7522] Use a single write txn

Partly fixed in mdb.master, but I think branch "mdb/fixes"
in <git://git.uio.no/u/hbf/openldap.git> is needed too.
That is, the "[Untested] ITS#7961 Re-fix txn init." commit.
Both branches have some remaining problems, however.

-- 
Hallvard
Confirmed: this bug seems to be fixed (just merge 'mdb/fixes' from git://git.uio.no/u/hbf/openldap.git).

But now there is another one, like http://www.openldap.org/its/index.cgi/Incoming?id=7968

#3  0x00007f5061836c82 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007f5061df4553 in ber_int_sb_write (sb=sb@entry=0x7f5024105870, buf=0x7f4ff80019d0, len=len@entry=152) at sockbuf.c:441
#5  0x00007f5061df0b8b in ber_flush2 (sb=0x7f5024105870, ber=ber@entry=0x7f500cffb330, freeit=freeit@entry=0) at io.c:246
#6  0x0000000000433dce in send_ldap_ber (op=0x7f500cffb7c0, ber=0x7f500cffb330) at result.c:339
#7  0x00000000004367a6 in slap_send_search_entry (op=0x7f500cffb7c0, rs=0x7f500cffb5c0) at result.c:1430
#8  0x000000000051077a in syncprov_sendresp (mode=<optimised out>, so=<optimised out>, opc=<optimised out>, op=<optimised out>) at syncprov.c:895
#9  syncprov_qplay (so=<optimised out>, op=<optimised out>) at syncprov.c:944
#10 syncprov_qtask (ctx=0x7f4ff81086c8, arg=0x7f4ff8108670) at syncprov.c:1010
#11 0x00007f5062009d12 in ldap_int_thread_pool_wrapper (xpool=0xd3b070) at tpool.c:688
#12 0x00007f5061bd40a5 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007f506190184d in clone () from /lib/x86_64-linux-gnu/libc.so.6

(gdb) frame 4
#4  0x00007f5061df4553 in ber_int_sb_write (sb=sb@entry=0x7f5024105870, buf=0x7f4ff80019d0, len=len@entry=152) at sockbuf.c:441
441             assert( sb->sb_iod != NULL );
(gdb) info local
ret = <optimised out>
__PRETTY_FUNCTION__ = "ber_int_sb_write"
(gdb) p *sb
$1 = {sb_opts = {lbo_valid = 3, lbo_options = 0, lbo_debug = 16384}, sb_iod = 0x0, sb_fd = 75, sb_max_incoming = 262143, sb_trans_needs_read = 0, sb_trans_needs_write = 0}

tail xxx.log
54775e18 syncrepl_entry: rid=001 cn=tablet,uid=1040,dc=ngdr,dc=ldap
54775e18 slap_queue_csn: queueing 0x7f5020107890 20141127172334.589180Z#000000#001#000000
54775e18 syncprov_matchops: skipping original sid 001
54775e18 slap_graduate_commit_csn: removing 0x7f502012d740 20141127172334.589180Z#000000#001#000000
54775e18 syncrepl_entry: rid=001 be_delete cn=tablet,uid=1040,dc=ngdr,dc=ldap (0)
54775e18 slap_queue_csn: queueing 0x7f5020107890 20141127172334.589180Z#000000#001#000000
54775e18 slap_graduate_commit_csn: removing 0x7f5020108020 20141127172334.589180Z#000000#001#000000
54775e18 do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20141127172334.589328Z#000000#001#000000
54775e18 syncrepl_message_to_entry: rid=001 DN: cn=tablet,uid=1793,dc=ngdr,dc=ldap, UUID: dc4d5d30-0aa5-1034-8ff5-a9ee5a485f53
slapd: sockbuf.c:441: ber_int_sb_write: Assertion `sb->sb_iod != ((void *)0)' failed.

2014-11-27 13:07 GMT+03:00 Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>:
> On 11/27/2014 10:26 AM, leo@yuriev.ru wrote:
>>
>> # possible first bad commit:
>> [4d02c741b120786df1b87ee9ed49c1d3f9bc7522] Use a single write txn
>
>
> Partly fixed in mdb.master, but I think branch "mdb/fixes"
> in <git://git.uio.no/u/hbf/openldap.git> is needed too.
> That is, the "[Untested] ITS#7961 Re-fix txn init." commit.
> Both branches have some remaining problems, however.
>
> --
> Hallvard
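The second crash is an assertion that fires because sb_iod is NULL while sb_fd still holds a descriptor (75). A minimal sketch of the same condition handled as an error return instead of an abort follows. It is not liblber code: `io_layer`, `sockbuf`, `sockbuf_write_checked` and `discard_write` are hypothetical names, and the reading that the I/O layer had already been torn down by another thread is an assumption rather than something the trace proves.

```c
#include <stddef.h>
#include <sys/types.h>  /* ssize_t */

/* Hypothetical I/O layer: a single write callback. */
typedef struct io_layer io_layer;
struct io_layer {
    ssize_t (*write)(io_layer *self, const void *buf, size_t len);
};

/* Hypothetical socket buffer, loosely modelled on the fields gdb printed. */
typedef struct sockbuf {
    io_layer *iod;  /* assumed to become NULL once the layer is removed */
    int       fd;
} sockbuf;

/* Example layer that accepts and discards everything it is given. */
static ssize_t discard_write(io_layer *self, const void *buf, size_t len)
{
    (void)self;
    (void)buf;
    return (ssize_t)len;
}

/* Write through the I/O layer, reporting -1 when the layer is gone
 * instead of asserting. */
static ssize_t sockbuf_write_checked(sockbuf *sb, const void *buf, size_t len)
{
    if (sb == NULL || sb->iod == NULL || sb->iod->write == NULL)
        return -1;
    return sb->iod->write(sb->iod, buf, len);
}
```

A check like this only changes how the symptom surfaces. If the sender and the connection teardown race with each other, the pointer can still change between the test and the call, so the underlying problem would need proper synchronization.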
I am sure this bug was introduced by 4d02c741 (Use a single write txn).

It could be fixed by either:
cherry-pick ead57604 (ITS#7961 Re-fix txn init) from git://git.uio.no/u/hbf/openldap.git
or
revert: d72b2f5d (ITS#7961 fix txn init), 62e4eeb7 (ITS#7943 reinit txn flags), 891e6627 (Plug leak in 4d02c741...) and 4d02c741 (Use a single write txn).

Leonid.
leo@yuriev.ru wrote:
> Sure this bug is introduced by the 4d02c741 (Use a single write txn).
>
> Could be fixed by:
> cherry-pick ead57604 (ITS#7961 Re-fix txn init) from
> git://git.uio.no/u/hbf/openldap.git
> or
> revert: d72b2f5d (ITS#7961 fix txn init), 62e4eeb7 (ITS#7943 reinit
> txn flags), 891e6627 (Plug leak in 4d02c741...) and 4d02c741 (Use a
> single write txn).

Totally fixed? Or are you still seeing the ITS#7968 crash now? So far I've run your testcase on the patched source (with ead57604) and gotten no crashes.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
(1) ITS#7968 is not fixed:
 - it still crashes in a few 'favorite' locations;
 - the last time it was a SIGSEGV in do_syncrep2() at "if ( ber_bvcmp( syncCookie.ctxcsn, &si->si_cookieState->cs_vals[i] ) <= 0 )".

(2) mdb.master seems to have another bug (or bugs) that was also fixed in the 'mdb/fixes' branch on git://git.uio.no/u/hbf/openldap.git:
 - just a cherry-pick of ead57604 fixes this SIGSEGV, but I got an 'implementation specific' error in a heavy add/delete test (same as in the attached testcase);
 - but a merge of 'mdb/fixes' passed my test.

(3) OPENLDAP_REL_ENG_2_4 without merging the current mdb.master, and after reverting all 4d02c741-related changes (single write txn), also passed my test.

Leonid.

2014-11-28 22:06 GMT+03:00 Howard Chu <hyc@symas.com>:
> leo@yuriev.ru wrote:
>>
>> Sure this bug is introduced by the 4d02c741 (Use a single write txn).
>>
>> Could be fixed by:
>> cherry-pick ead57604 (ITS#7961 Re-fix txn init) from
>> git://git.uio.no/u/hbf/openldap.git
>> or
>> revert: d72b2f5d (ITS#7961 fix txn init), 62e4eeb7 (ITS#7943 reinit
>> txn flags), 891e6627 (Plug leak in 4d02c741...) and 4d02c741 (Use a
>> single write txn).
>
>
> Totally fixed? Or are you still seeing the ITS#7968 crash now? So far I've
> run your testcase on the patched source (with ead57604) and gotten no
> crashes.
>
> --
>   -- Howard Chu
>   CTO, Symas Corp.           http://www.symas.com
>   Director, Highland Sun     http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/
The previously noted (2) and (3) were caused by the following patch.
It is a trial patch from Howard Chu for working on ITS#7968 ;)

Under a heavy add/delete load it may induce "DN index delete failed" in the mdb backend.

> Message-ID: <5440D299.60208@symas.com>
> Date: Fri, 17 Oct 2014 09:26:01 +0100
> From: Howard Chu <hyc@symas.com>
>
> Can you try this patch and followup again?
>
>> diff --git a/servers/slapd/overlays/syncprov.c b/servers/slapd/overlays/syncprov.c
>> index e15020e..b54c83f 100644
>> --- a/servers/slapd/overlays/syncprov.c
>> +++ b/servers/slapd/overlays/syncprov.c
>> @@ -1306,11 +1306,12 @@ syncprov_matchops( Operation *op, opcookie *opc, int saveit )
>>                         op2.o_hdr = &oh;
>>                         op2.o_extra = op->o_extra;
>>                         op2.o_callback = NULL;
>> -                       if (ss->s_flags & PS_FIX_FILTER) {
>> +                       if ((ss->s_flags & PS_FIX_FILTER)
>> +                               && op2.ors_filter->f_choice == LDAP_FILTER_AND) {
>>                                 /* Skip the AND/GE clause that we stuck on in front. We
>>                                    would lose deletes/mods that happen during the refresh
>>                                    phase otherwise (ITS#6555) */
>> -                               op2.ors_filter = ss->s_op->ors_filter->f_and->f_next;
>> +                               op2.ors_filter = op2.ors_filter->f_and->f_next;
>>                         }
>>                         ldap_pvt_thread_mutex_unlock( &ss->s_mutex );
>>                         rc = test_filter( &op2, e, op2.ors_filter );
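The idea in the quoted patch is to step past the prepended AND/GE clause only when the filter really is an AND node. A stand-alone sketch of that guard follows; it does not use slapd's types. The `filter` struct, the `FILTER_*` tags, `FIX_FILTER` and `effective_filter` are hypothetical stand-ins, and the extra NULL checks are an addition of this sketch, not part of the patch.

```c
#include <stddef.h>

/* Hypothetical filter tags and flag, standing in for the LDAP_FILTER_*
 * and PS_FIX_FILTER values named in the patch. */
enum { FILTER_AND = 1, FILTER_EQ = 2 };
#define FIX_FILTER 0x01u

typedef struct filter {
    int            choice;    /* node type */
    struct filter *and_list;  /* first child when choice == FILTER_AND */
    struct filter *next;      /* next sibling in the parent's list */
} filter;

/* Return the filter to evaluate. When the "fixed filter" flag is set and
 * the top node is an AND, skip the first clause and use the second one;
 * otherwise leave the filter unchanged. */
static filter *effective_filter(filter *f, unsigned flags)
{
    if ((flags & FIX_FILTER) && f != NULL
        && f->choice == FILTER_AND
        && f->and_list != NULL && f->and_list->next != NULL)
        return f->and_list->next;
    return f;
}
```

Checking the node type first means a filter that was never wrapped is never treated as an AND list, which is the kind of invalid dereference the added f_choice test guards against.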
(1) I can confirm once again: the bug is fixed.
(2) The previously noted failure "DN index delete failed" was caused by another (local) bug.
(3) The "single write txn" feature needs a minor fix; patch attached.

Leonid.

---

The attached files is derived from OpenLDAP Software. All of the modifications to OpenLDAP Software represented in the following patch(es) were developed by Peter-Service LLC, Moscow, Russia. Peter-Service LLC has not assigned rights and/or interest in this work to any party. I, Leonid Yuriev am authorized by Peter-Service LLC, my employer, to release this work under the following terms.

Peter-Service LLC hereby places the following modifications to OpenLDAP Software (and only these modifications) into the public domain. Hence, these modifications may be freely used and/or redistributed for any purpose with or without attribution and/or other notice.
fixed in mdb.master
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
changed state Test to Release
changed state Release to Closed