Full_Name: Ryan Tandy
Version: master (05ea787), RE24 (082e192)
OS: Debian unstable
URL: ftp://ftp.openldap.org/incoming/20150317_rtandy_syncprovsegv.tgz
Submission from: (NULL) (24.68.37.4)

hi,

./configure CFLAGS="-g -O0" --disable-bdb --disable-hdb --enable-syncprov

reproducer: ftp://ftp.openldap.org/incoming/20150317_rtandy_syncprovsegv.tgz
note this is _not_ delta-syncrepl.

./prepare
./runslapd (backgrounds a consumer, runs the producer in gdb in the foreground)

in another terminal, once the consumer has connected (5 seconds retry):

./modify

I get the following crash on master and RE24. not every time, but most times.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe6ffe700 (LWP 25923)]
0x0000000000511d45 in syncprov_op_mod (op=0x7fffd41024a0, rs=0x7fffe6ffdae0) at syncprov.c:2129
2129                    if ( m2->mi_op->o_threadctx == op->o_threadctx ) {
(gdb) bt
#0  0x0000000000511d45 in syncprov_op_mod (op=0x7fffd41024a0, rs=0x7fffe6ffdae0) at syncprov.c:2129
#1  0x00000000004b6e91 in overlay_op_walk (op=0x7fffd41024a0, rs=0x7fffe6ffdae0, which=op_modify, oi=0x895a30, on=0x895c10) at backover.c:661
#2  0x00000000004b715f in over_op_func (op=0x7fffd41024a0, rs=0x7fffe6ffdae0, which=op_modify) at backover.c:730
#3  0x00000000004b7293 in over_op_modify (op=0x7fffd41024a0, rs=0x7fffe6ffdae0) at backover.c:769
#4  0x00000000004494c1 in fe_op_modify (op=0x7fffd41024a0, rs=0x7fffe6ffdae0) at modify.c:303
#5  0x0000000000448d94 in do_modify (op=0x7fffd41024a0, rs=0x7fffe6ffdae0) at modify.c:177
#6  0x0000000000429a9f in connection_operation (ctx=0x7fffe6ffdc10, arg_v=0x7fffd41024a0) at connection.c:1155
#7  0x000000000042a039 in connection_read_thread (ctx=0x7fffe6ffdc10, argv=0xb) at connection.c:1291
#8  0x000000000052b511 in ldap_int_thread_pool_wrapper (xpool=0x870270) at tpool.c:696
#9  0x00007ffff77ad0a4 in start_thread (arg=0x7fffe6ffe700) at pthread_create.c:309
#10 0x00007ffff74e204d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) p m2->mi_op
$2 = (Operation *) 0xc8

^ that looks a bit bogus.

haven't started to investigate or bisect just yet. will look more tomorrow.

apologies in advance if I've duplicated an existing ITS by accident :)
Fixed in ReOpenLDAP a week ago.
https://github.com/ReOpen/ReOpenLDAP/commit/6de52172bc0f8309dd00c329452213d51e5573a9

Leonid.
leo@yuriev.ru wrote:
> Fixed in ReOpenLDAP week ago.
> https://github.com/ReOpen/ReOpenLDAP/commit/6de52172bc0f8309dd00c329452213d51e5573a9

Leo: if you intend for this patch to be adopted here, please attach the IPR
notice as documented

http://www.openldap.org/devel/contributing.html

Note that, as written, your patch is unacceptable as it violates the "one
functional change per patch" condition.

--
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
On Wed, Mar 18, 2015 at 05:06:33AM +0000, ryan@nardis.ca wrote:
>I get the following crash on master and RE24. not every time, but most times.
>
>Program received signal SIGSEGV, Segmentation fault.
>[Switching to Thread 0x7fffe6ffe700 (LWP 25923)]
>0x0000000000511d45 in syncprov_op_mod (op=0x7fffd41024a0, rs=0x7fffe6ffdae0) at syncprov.c:2129
>2129                    if ( m2->mi_op->o_threadctx == op->o_threadctx ) {

Same testcase, different crash:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffecde1700 (LWP 1747)]
0x000000000051ae5e in syncprov_op_cleanup (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at syncprov.c:1418
1418                    mt->mt_mods = mt->mt_mods->mi_next;
(gdb) bt
#0  0x000000000051ae5e in syncprov_op_cleanup (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at syncprov.c:1418
#1  0x00000000004417f3 in slap_cleanup_play (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at result.c:567
#2  0x0000000000441fac in send_ldap_response (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at result.c:759
#3  0x0000000000442793 in slap_send_ldap_result (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at result.c:886
#4  0x00000000004e472a in mdb_modify (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at modify.c:672
#5  0x00000000004baadd in overlay_op_walk (op=0x7fffe0000aa0, rs=0x7fffecde0ac0, which=op_modify, oi=0x8aae60, on=0x0) at backover.c:696
#6  0x00000000004bad01 in over_op_func (op=0x7fffe0000aa0, rs=0x7fffecde0ac0, which=op_modify) at backover.c:749
#7  0x00000000004bae35 in over_op_modify (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at backover.c:788
#8  0x000000000044bacb in fe_op_modify (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at modify.c:303
#9  0x000000000044b37e in do_modify (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at modify.c:177
#10 0x000000000042bdf3 in connection_operation (ctx=0x7fffecde0bf0, arg_v=0x7fffe0000aa0) at connection.c:1134
#11 0x000000000042c3a3 in connection_read_thread (ctx=0x7fffecde0bf0, argv=0xb) at connection.c:1280
#12 0x000000000053772e in ldap_int_thread_pool_wrapper (xpool=0x883b40) at tpool.c:958
#13 0x00007ffff77ad0a4 in start_thread (arg=0x7fffecde1700) at pthread_create.c:309
#14 0x00007ffff74e204d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Looks like at this point mt either points to garbage, or is itself garbage.

Reverting 8eb9aa7d resolves both crashes.
Howard, I am not submitting this patch.
But you can look at the bugs and fix them in OpenLDAP.

Leonid

On 18.03.2015 22:17, "Howard Chu" <hyc@symas.com> wrote:

> leo@yuriev.ru wrote:
>
>> Fixed in ReOpenLDAP week ago.
>> https://github.com/ReOpen/ReOpenLDAP/commit/6de52172bc0f8309dd00c329452213d51e5573a9
>>
>
> Leo: if you intend for this patch to be adopted here, please attach the
> IPR notice as documented
>
> http://www.openldap.org/devel/contributing.html
>
> Note that, as written, your patch is unacceptable as it violates the "one
> functional change per patch" condition.
>
> --
>   -- Howard Chu
>   CTO, Symas Corp.           http://www.symas.com
>   Director, Highland Sun     http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/
>
changed notes
changed state Open to Release
moved from Incoming to Software Bugs
I noticed something odd in the CHANGES file of the 2.4 branch, and also in
the commit message of aecec6a75.

Exactly: Fixed slapo-syncprov deadlock when autogroup is in use
(ITS#8063,ITS#8081)

But really, this (ITS#8081) is NOT related to autogroup (ITS#8063).
This bug was introduced by 7561998f7 (ITS#6335, Quanah Gibson-Mount
<quanah@openldap.org>, 2009-10-30).

Yes, after ITS#8063 this bug could show up as a SIGSEGV.
But before ITS#8063 syncprov could `re-order` the notifications of changes
that are visible to a remote syncrepl consumer.
Under high load (our use case) this makes it possible for replication to
'lose' some changes, and for that error to then spread to all nodes of the
LDAP cluster (I spent a lot of time digging into this).

So this is an INDEPENDENT (critical) bug in syncprov (not in autogroup),
which is also present in the 2.4.40 release and so on.
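To make the "re-order, then lose" step concrete, here is a minimal sketch in C. It is not slapd code: `toy_consumer`, `toy_consumer_receive` and `toy_count_lost` are hypothetical names, and the model rests on a simplifying assumption, namely that a consumer remembers the newest CSN it has applied and ignores any change whose CSN is not newer than that. CSNs are compared with `strcmp`, which is only meaningful here because the example strings share one fixed-width format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy model of a replication consumer (NOT the slapd code).
 * Assumption: the consumer keeps the newest CSN it has applied and
 * ignores every change whose CSN is not strictly newer. */
typedef struct toy_consumer {
    char   cookie[64];  /* newest CSN applied so far, "" at start */
    size_t applied;     /* changes applied */
    size_t skipped;     /* changes ignored as "too old" */
} toy_consumer;

/* Feed one change notification, identified by its CSN, to the consumer. */
void toy_consumer_receive(toy_consumer *c, const char *csn)
{
    assert(c != NULL && csn != NULL);
    if (strcmp(csn, c->cookie) <= 0) {
        /* Not newer than what we already have: silently dropped. */
        c->skipped++;
        return;
    }
    snprintf(c->cookie, sizeof c->cookie, "%s", csn);
    c->applied++;
}

/* Replay a sequence of notifications into a fresh consumer and return
 * how many of them were dropped. */
size_t toy_count_lost(const char *const *csns, size_t n)
{
    toy_consumer c;
    memset(&c, 0, sizeof c);
    for (size_t i = 0; i < n; i++)
        toy_consumer_receive(&c, csns[i]);
    return c.skipped;
}
```

Under these assumptions, replaying three changes in CSN order drops nothing, while delivering the third before the second drops the second one: the consumer has already advanced past it. Every consumer fed from that stream would then miss the same change, which matches the spreading described above.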
--On Saturday, April 04, 2015 1:03 AM +0300 Leonid Yuriev <leo@yuriev.ru> wrote:

> I had saw a queer in the CHANGES of 2.4 branch, and also comment of
> aecec6a75.
>
> Exactly: Fixed slapo-syncprov deadlock when autogroup is in use
> (ITS#8063,ITS#8081)
>
> But really, this (ITS#8081) is NOT related to autogroup (ITS#8063).
> This bug was introduced by 7561998f7 (ITS#6335, Quanah Gibson-Mount
> <quanah@openldap.org>, 2009-10-30).

ITS#6335 was introduced in OpenLDAP 2.4.20. So wouldn't this then affect
all releases since 2.4.20? Just so I can note that. ;)

Also 6335 was not introduced by me. I simply sync'd it from master to
RE24. ;)

commit 739f8d075394ee3e85c3f2de7aa4031b36f54c8f
Author: Rein Tollevik <rein@openldap.org>
Date:   Fri Oct 16 17:27:18 2009 +0000

    ITS#6335 Don't reuse a modtarget someone is about to remove

--Quanah

--

Quanah Gibson-Mount
Platform Architect
Zimbra, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration
fixed in master
fixed in RE25
fixed in RE24
changed notes
changed state Release to Closed