8081 – syncprov crash in syncprov_op_mod

Summary: syncprov crash in syncprov_op_mod

Status:	VERIFIED FIXED

Alias:	None

Product:	OpenLDAP
Classification:	Unclassified
Component:	slapd
Version:	unspecified
Hardware:	All All

Importance:	--- normal
Target Milestone:	---
Assignee:	OpenLDAP project

URL:
Keywords:

Depends on:
Blocks:

Reported:	2015-03-18 05:06 UTC by Ryan Tandy
Modified:	2015-07-02 17:50 UTC
CC List:	0 users

See Also:

Description Ryan Tandy 2015-03-18 05:06:33 UTC

Full_Name: Ryan Tandy
Version: master (05ea787), RE24 (082e192)
OS: Debian unstable
URL: ftp://ftp.openldap.org/incoming/20150317_rtandy_syncprovsegv.tgz
Submission from: (NULL) (24.68.37.4)


hi,

./configure CFLAGS="-g -O0" --disable-bdb --disable-hdb --enable-syncprov

reproducer: ftp://ftp.openldap.org/incoming/20150317_rtandy_syncprovsegv.tgz
note this is _not_ delta-syncrepl.

./prepare
./runslapd (backgrounds a consumer, runs the producer in gdb in the foreground)
in another terminal, once the consumer has connected (5 seconds retry):
./modify

I get the following crash on master and RE24. not every time, but most times.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe6ffe700 (LWP 25923)]
0x0000000000511d45 in syncprov_op_mod (op=0x7fffd41024a0, rs=0x7fffe6ffdae0) at syncprov.c:2129
2129						if ( m2->mi_op->o_threadctx == op->o_threadctx ) {
(gdb) bt
#0  0x0000000000511d45 in syncprov_op_mod (op=0x7fffd41024a0, rs=0x7fffe6ffdae0) at syncprov.c:2129
#1  0x00000000004b6e91 in overlay_op_walk (op=0x7fffd41024a0, rs=0x7fffe6ffdae0, which=op_modify, oi=0x895a30, on=0x895c10) at backover.c:661
#2  0x00000000004b715f in over_op_func (op=0x7fffd41024a0, rs=0x7fffe6ffdae0, which=op_modify) at backover.c:730
#3  0x00000000004b7293 in over_op_modify (op=0x7fffd41024a0, rs=0x7fffe6ffdae0) at backover.c:769
#4  0x00000000004494c1 in fe_op_modify (op=0x7fffd41024a0, rs=0x7fffe6ffdae0) at modify.c:303
#5  0x0000000000448d94 in do_modify (op=0x7fffd41024a0, rs=0x7fffe6ffdae0) at modify.c:177
#6  0x0000000000429a9f in connection_operation (ctx=0x7fffe6ffdc10, arg_v=0x7fffd41024a0) at connection.c:1155
#7  0x000000000042a039 in connection_read_thread (ctx=0x7fffe6ffdc10, argv=0xb) at connection.c:1291
#8  0x000000000052b511 in ldap_int_thread_pool_wrapper (xpool=0x870270) at tpool.c:696
#9  0x00007ffff77ad0a4 in start_thread (arg=0x7fffe6ffe700) at pthread_create.c:309
#10 0x00007ffff74e204d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) p m2->mi_op
$2 = (Operation *) 0xc8

^ that looks a bit bogus. haven't started to investigate or bisect just yet.
will look more tomorrow.

apologies in advance if I've duplicated an existing ITS by accident :)

Comment 1 Leonid Yuriev 2015-03-18 07:49:15 UTC

Fixed in ReOpenLDAP a week ago.
https://github.com/ReOpen/ReOpenLDAP/commit/6de52172bc0f8309dd00c329452213d51e5573a9

Leonid.

Comment 2 Howard Chu 2015-03-18 19:17:26 UTC

leo@yuriev.ru wrote:
> Fixed in ReOpenLDAP week ago.
> https://github.com/ReOpen/ReOpenLDAP/commit/6de52172bc0f8309dd00c329452213d51e5573a9

Leo: if you intend for this patch to be adopted here, please attach the IPR notice as documented

http://www.openldap.org/devel/contributing.html

Note that, as written, your patch is unacceptable as it violates the "one functional change per patch" condition.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 3 Ryan Tandy 2015-03-19 04:09:26 UTC

On Wed, Mar 18, 2015 at 05:06:33AM +0000, ryan@nardis.ca wrote:
>I get the following crash on master and RE24. not every time, but most times.
>
>Program received signal SIGSEGV, Segmentation fault.
>[Switching to Thread 0x7fffe6ffe700 (LWP 25923)]
>0x0000000000511d45 in syncprov_op_mod (op=0x7fffd41024a0, rs=0x7fffe6ffdae0) at syncprov.c:2129
>2129						if ( m2->mi_op->o_threadctx == op->o_threadctx ) {

Same testcase, different crash:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffecde1700 (LWP 1747)]
0x000000000051ae5e in syncprov_op_cleanup (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at syncprov.c:1418
1418			mt->mt_mods = mt->mt_mods->mi_next;
(gdb) bt
#0  0x000000000051ae5e in syncprov_op_cleanup (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at syncprov.c:1418
#1  0x00000000004417f3 in slap_cleanup_play (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at result.c:567
#2  0x0000000000441fac in send_ldap_response (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at result.c:759
#3  0x0000000000442793 in slap_send_ldap_result (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at result.c:886
#4  0x00000000004e472a in mdb_modify (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at modify.c:672
#5  0x00000000004baadd in overlay_op_walk (op=0x7fffe0000aa0, rs=0x7fffecde0ac0, which=op_modify, oi=0x8aae60, on=0x0) at backover.c:696
#6  0x00000000004bad01 in over_op_func (op=0x7fffe0000aa0, rs=0x7fffecde0ac0, which=op_modify) at backover.c:749
#7  0x00000000004bae35 in over_op_modify (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at backover.c:788
#8  0x000000000044bacb in fe_op_modify (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at modify.c:303
#9  0x000000000044b37e in do_modify (op=0x7fffe0000aa0, rs=0x7fffecde0ac0) at modify.c:177
#10 0x000000000042bdf3 in connection_operation (ctx=0x7fffecde0bf0, arg_v=0x7fffe0000aa0) at connection.c:1134
#11 0x000000000042c3a3 in connection_read_thread (ctx=0x7fffecde0bf0, argv=0xb) at connection.c:1280
#12 0x000000000053772e in ldap_int_thread_pool_wrapper (xpool=0x883b40) at tpool.c:958
#13 0x00007ffff77ad0a4 in start_thread (arg=0x7fffecde1700) at pthread_create.c:309
#14 0x00007ffff74e204d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Looks like at this point mt either points to garbage, or is itself 
garbage. 

Reverting 8eb9aa7d resolves both crashes.
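
Note for readers: the two faulting statements quoted above (syncprov.c:2129 and syncprov.c:1418) both dereference list nodes through mi_op, mt_mods and mi_next. The following is a minimal sketch, not the syncprov code, of walking and advancing such a list under a single lock so that one thread cannot free a node another thread is still reading. The struct names, the mt_mutex field, the helper functions, and the idea that m2 iterates over mt_mods are all assumptions made for illustration; only the field names mi_op, mi_next, mt_mods and o_threadctx come from the backtraces.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structures in syncprov.c carry more fields. */
typedef struct op_sketch {
    void *o_threadctx;              /* thread context running this operation */
} op_sketch;

typedef struct modinst_sketch {
    struct modinst_sketch *mi_next; /* next queued modification */
    op_sketch *mi_op;               /* operation that queued it */
} modinst_sketch;

typedef struct modtarget_sketch {
    pthread_mutex_t mt_mutex;       /* assumed: guards mt_mods and every node */
    modinst_sketch *mt_mods;        /* head of the queue */
} modtarget_sketch;

/* Return 1 if some other queued operation runs on the same thread context
 * as op. The whole walk happens with the lock held. */
int same_thread_queued(modtarget_sketch *mt, const op_sketch *op)
{
    int found = 0;
    pthread_mutex_lock(&mt->mt_mutex);
    for (modinst_sketch *m2 = mt->mt_mods; m2 != NULL; m2 = m2->mi_next) {
        if (m2->mi_op != op && m2->mi_op->o_threadctx == op->o_threadctx) {
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&mt->mt_mutex);
    return found;
}

/* Unlink and return the head of the queue, or NULL if it is empty.
 * The caller may free the returned node only after this returns. */
modinst_sketch *pop_head(modtarget_sketch *mt)
{
    pthread_mutex_lock(&mt->mt_mutex);
    modinst_sketch *mi = mt->mt_mods;
    if (mi != NULL) {
        mt->mt_mods = mi->mi_next;
        mi->mi_next = NULL;
    }
    pthread_mutex_unlock(&mt->mt_mutex);
    return mi;
}
```

If a walker like same_thread_queued ran without the lock while another thread called pop_head and freed the node, the walker could read a reused node and see a nonsense pointer. That would look similar to the mi_op value printed in the original report, though this thread does not establish that as the cause.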

Comment 4 Leonid Yuriev 2015-03-19 12:05:00 UTC

Howard, I am not submitting this patch. But you can look at the bugs and fix
them in OpenLDAP.

Leonid
On 18.03.2015 22:17, "Howard Chu" <hyc@symas.com> wrote:

> leo@yuriev.ru wrote:
>
>> Fixed in ReOpenLDAP week ago.
>> https://github.com/ReOpen/ReOpenLDAP/commit/
>> 6de52172bc0f8309dd00c329452213d51e5573a9
>>
>
> Leo: if you intend for this patch to be adopted here, please attach the
> IPR notice as documented
>
> http://www.openldap.org/devel/contributing.html
>
> Note that, as written, your patch is unacceptable as it violates the "one
> functional change per patch" condition.
>
> --
>   -- Howard Chu
>   CTO, Symas Corp.           http://www.symas.com
>   Director, Highland Sun     http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/
>

Comment 5 Quanah Gibson-Mount 2015-03-23 16:19:53 UTC

changed notes
changed state Open to Release
moved from Incoming to Software Bugs

Comment 6 Leonid Yuriev 2015-04-03 21:03:52 UTC

I noticed something odd in the CHANGES of the 2.4 branch, and also in the
commit message of aecec6a75.

Specifically: Fixed slapo-syncprov deadlock when autogroup is in use 
(ITS#8063,ITS#8081)

But really, this (ITS#8081) is NOT related to autogroup (ITS#8063).
This bug was introduced by 7561998f7 (ITS#6335, Quanah Gibson-Mount 
<quanah@openldap.org>, 2009-10-30).

Yes, after ITS#8063 this bug shows up as a SIGSEGV.
But before ITS#8063, syncprov could reorder the change notifications 
that a remote syncrepl consumer sees.
Under high load (our use case) this makes it possible for replication 
to lose some changes, and the error then spreads to all nodes of the 
LDAP cluster (I spent a lot of time digging into this).

So this is an INDEPENDENT (critical) bug in syncprov (not in autogroup), 
which is also present in the 2.4.40 release and others.
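
Note for readers: a minimal sketch of why reordered notifications can look like lost changes. It assumes a deliberately simplified model in which each notification carries the full new value of one attribute and the consumer keeps whatever it applied last. The types and function names are hypothetical and do not describe the syncrepl protocol or the syncprov implementation.

```c
#include <stddef.h>

/* Hypothetical model of one change notification. */
typedef struct notify_sketch {
    int commit_order;   /* order in which the provider committed the change */
    int value;          /* full new value carried by the notification */
} notify_sketch;

/* The consumer applies notifications in arrival order; the last one wins. */
int apply_in_arrival_order(const notify_sketch *n, size_t count, int initial)
{
    int current = initial;
    for (size_t i = 0; i < count; i++) {
        current = n[i].value;
    }
    return current;
}

/* Return 1 if the arrival order matches the provider's commit order. */
int arrived_in_commit_order(const notify_sketch *n, size_t count)
{
    for (size_t i = 1; i < count; i++) {
        if (n[i].commit_order < n[i - 1].commit_order) {
            return 0;
        }
    }
    return 1;
}
```

In this model, two changes committed as 10 then 20 leave the consumer at 20 when delivered in order, but at the stale value 10 when delivered swapped, and nothing on the consumer side signals an error.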

Comment 7 Quanah Gibson-Mount 2015-04-03 21:39:05 UTC

--On Saturday, April 04, 2015 1:03 AM +0300 Leonid Yuriev <leo@yuriev.ru> 
wrote:

> I had saw a queer in the CHANGES of 2.4 branch, and also comment of
> aecec6a75.
>
> Exactly: Fixed slapo-syncprov deadlock when autogroup is in use
> (ITS#8063,ITS#8081)
>
> But really, this (ITS#8081) is NOT related to autogroup (ITS#8063).
> This bug was introduced by 7561998f7 (ITS#6335, Quanah Gibson-Mount
> <quanah@openldap.org>, 2009-10-30).

ITS#6335 was introduced in OpenLDAP 2.4.20.  So wouldn't this then affect 
all releases since 2.4.20?  Just so I can note that. ;)


Also 6335 was not introduced by me.  I simply sync'd it from master to 
RE24. ;)

commit 739f8d075394ee3e85c3f2de7aa4031b36f54c8f
Author: Rein Tollevik <rein@openldap.org>
Date:   Fri Oct 16 17:27:18 2009 +0000

    ITS#6335 Don't reuse a modtarget someone is about to remove

--Quanah



--

Quanah Gibson-Mount
Platform Architect
Zimbra, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration

Comment 8 OpenLDAP project 2015-07-02 17:50:05 UTC

fixed in master
fixed in RE25
fixed in RE24

Comment 9 Quanah Gibson-Mount 2015-07-02 17:50:05 UTC

changed notes
changed state Release to Closed