OpenLDAP
Up to top level
Build   Contrib   Development   Documentation   Historical   Incoming   Software Bugs   Software Enhancements   Web  

Logged in as guest

Viewing Software Bugs/6104
Full headers

From: h.b.furuseth@usit.uio.no
Subject: race condition with cancel operation
Compose comment
Download message
State:
0 replies:
10 followups: 1 2 3 4 5 6 7 8 9 10

Major security issue: yes  no

Notes:

Notification:


Date: Mon, 11 May 2009 05:06:20 +0000
From: h.b.furuseth@usit.uio.no
To: openldap-its@OpenLDAP.org
Subject: race condition with cancel operation
Full_Name: Hallvard B Furuseth
Version: HEAD
OS: Linux
URL: 
Submission from: (NULL) (129.240.6.233)
Submitted by: hallvard


slapd/cancel.c sets o_abandon before o_cancel.  Thus it's possible for
the canceled operation to obey o_abandon before o_cancel gets set.
Though I had to insert some sleeps to achieve that.
Either the operation is abandoned and the Cancel operation receives
tooLate, or if the client unbinds/closes the connection fast enough
Cancel will hang:  slapd does not close the connection, and hangs on
shutdown: "slapd shutdown: waiting for 1 operations/tasks to finish".

Since the flags are not mutex-protected (at least not when read), it's
not enough to move the o_cancel setting after o_abandon in the Cancel
thread.  The cancelled thread might still see the o_abandon change
first.  A fix could be to make o_abandon a bitmask which says whether
the abandon is actually a cancel, but the Abandon and Cancel operations
will still need a mutex to coordinate so that Abandon does not reset
a Cancel bitflag.  In any case, it'd be cleaner if an operation which
reacts to o_abandon grabs some mutex before checking o_cancel.


The problem was tested as follows:
- sleep 0.2 sec after Statslog "DEL" and before setting SLAP_CANCEL_REQ.
- log "ABANDONED" when send_ldap_response() abandons the operation.
- Client: A python socket client which sends raw BER, no libldap:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('localhost', 3890))
    s.send(delete("cn=test"))
    time.sleep(0.1)
    s.send(cancel()) # cancel last operation
    #sys.exit()
    time.sleep(0.4)
    s.send(unbind())
-->
    conn=0 fd=9 ACCEPT from IP=127.0.0.1:56945 (IP=127.0.0.1:3890)
    conn=0 op=0 DEL dn="cn=test"
    conn=0 op=1 EXT oid=1.3.6.1.1.8
    conn=0 op=1 CANCEL msg=1
    conn=0 op=0 ABANDONED
    conn=0 op=2 UNBIND
    conn=0 op=1 RESULT oid= err=120 text=
    conn=0 fd=9 closed
    <server closed connection, client exited>
^C slapd

If the client exits after send(cancel()):
    conn=0 fd=9 ACCEPT from IP=127.0.0.1:48826 (IP=127.0.0.1:3890)
    conn=0 op=0 DEL dn="cn=test"
    conn=0 op=1 EXT oid=1.3.6.1.1.8
    conn=0 op=1 CANCEL msg=1
    conn=0 op=2 UNBIND
    conn=0 op=0 ABANDONED
    <not closing connection>
^C slapd
    daemon: shutdown requested and initiated.
    slapd shutdown: waiting for 1 operations/tasks to finish
    <slapd is hanging>
kill -KILL <slapd>

slapd.conf:
    include         servers/slapd/schema/core.schema
    allow           update_anon
    database        ldif
    directory       "."
    suffix          "cn=test"

Patches to slapd:

Index: cancel.c
--- cancel.c	21 Jan 2009 23:40:25 -0000	1.30
+++ cancel.c	11 May 2009 04:42:58 -0000
@@ -92,4 +92,8 @@
 		}
 
+		{
+			struct timeval timeout = { 0, 200000 };
+			select(0, NULL, NULL, NULL, &timeout);
+		}
 		o->o_cancel = SLAP_CANCEL_REQ;
 
Index: delete.c
--- delete.c	21 Jan 2009 23:40:26 -0000	1.144
+++ delete.c	11 May 2009 04:42:58 -0000
@@ -75,4 +75,8 @@
 		op->o_log_prefix, op->o_req_dn.bv_val, 0, 0, 0 );
 
+	{
+		struct timeval timeout = { 0, 200000 };
+		select(0, NULL, NULL, NULL, &timeout);
+	}
 	if( op->o_req_ndn.bv_len == 0 ) {
 		Debug( LDAP_DEBUG_ANY, "%s do_delete: root dse!\n",
Index: result.c
--- result.c	11 May 2009 02:23:51 -0000	1.331
+++ result.c	11 May 2009 04:42:58 -0000
@@ -418,4 +418,6 @@
 	if (( rs->sr_err == SLAPD_ABANDON || op->o_abandon ) &&
!op->o_cancel ) {
 		rc = SLAPD_ABANDON;
+		Statslog( LDAP_DEBUG_STATS,
+			"%s ABANDONED\n", op->o_log_prefix, 0, 0, 0, 0 );
 		goto clean2;
 	}

Followup 1

Download message
Date: Sun, 10 May 2009 22:44:33 -0700
From: Howard Chu <hyc@symas.com>
To: h.b.furuseth@usit.uio.no
CC: openldap-its@openldap.org
Subject: Re: (ITS#6104) race condition with cancel operation
h.b.furuseth@usit.uio.no wrote:
> Full_Name: Hallvard B Furuseth
> Version: HEAD
> OS: Linux
> URL:
> Submission from: (NULL) (129.240.6.233)
> Submitted by: hallvard
>
>
> slapd/cancel.c sets o_abandon before o_cancel.  Thus it's possible for
> the canceled operation to obey o_abandon before o_cancel gets set.

Cancel is just best-effort, why not just set o_cancel before o_abandon?

> Though I had to insert some sleeps to achieve that.
> Either the operation is abandoned and the Cancel operation receives
> tooLate, or if the client unbinds/closes the connection fast enough
> Cancel will hang:  slapd does not close the connection, and hangs on
> shutdown: "slapd shutdown: waiting for 1 operations/tasks to finish".
>
> Since the flags are not mutex-protected (at least not when read), it's
> not enough to move the o_cancel setting after o_abandon in the Cancel
> thread.  The cancelled thread might still see the o_abandon change
> first.  A fix could be to make o_abandon a bitmask which says whether
> the abandon is actually a cancel, but the Abandon and Cancel operations
> will still need a mutex to coordinate so that Abandon does not reset
> a Cancel bitflag.  In any case, it'd be cleaner if an operation which
> reacts to o_abandon grabs some mutex before checking o_cancel.
>
>
> The problem was tested as follows:
> - sleep 0.2 sec after Statslog "DEL" and before setting SLAP_CANCEL_REQ.
> - log "ABANDONED" when send_ldap_response() abandons the operation.
> - Client: A python socket client which sends raw BER, no libldap:
>      s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>      s.connect(('localhost', 3890))
>      s.send(delete("cn=test"))
>      time.sleep(0.1)
>      s.send(cancel()) # cancel last operation
>      #sys.exit()
>      time.sleep(0.4)
>      s.send(unbind())
> -->
>      conn=0 fd=9 ACCEPT from IP=127.0.0.1:56945 (IP=127.0.0.1:3890)
>      conn=0 op=0 DEL dn="cn=test"
>      conn=0 op=1 EXT oid=1.3.6.1.1.8
>      conn=0 op=1 CANCEL msg=1
>      conn=0 op=0 ABANDONED
>      conn=0 op=2 UNBIND
>      conn=0 op=1 RESULT oid= err=120 text=
>      conn=0 fd=9 closed
>      <server closed connection, client exited>
> ^C slapd
>
> If the client exits after send(cancel()):
>      conn=0 fd=9 ACCEPT from IP=127.0.0.1:48826 (IP=127.0.0.1:3890)
>      conn=0 op=0 DEL dn="cn=test"
>      conn=0 op=1 EXT oid=1.3.6.1.1.8
>      conn=0 op=1 CANCEL msg=1
>      conn=0 op=2 UNBIND
>      conn=0 op=0 ABANDONED
>      <not closing connection>
> ^C slapd
>      daemon: shutdown requested and initiated.
>      slapd shutdown: waiting for 1 operations/tasks to finish
>      <slapd is hanging>
> kill -KILL<slapd>
>
> slapd.conf:
>      include         servers/slapd/schema/core.schema
>      allow           update_anon
>      database        ldif
>      directory       "."
>      suffix          "cn=test"
>
> Patches to slapd:
>
> Index: cancel.c
> --- cancel.c	21 Jan 2009 23:40:25 -0000	1.30
> +++ cancel.c	11 May 2009 04:42:58 -0000
> @@ -92,4 +92,8 @@
>   		}
>
> +		{
> +			struct timeval timeout = { 0, 200000 };
> +			select(0, NULL, NULL, NULL,&timeout);
> +		}
>   		o->o_cancel = SLAP_CANCEL_REQ;
>
> Index: delete.c
> --- delete.c	21 Jan 2009 23:40:26 -0000	1.144
> +++ delete.c	11 May 2009 04:42:58 -0000
> @@ -75,4 +75,8 @@
>   		op->o_log_prefix, op->o_req_dn.bv_val, 0, 0, 0 );
>
> +	{
> +		struct timeval timeout = { 0, 200000 };
> +		select(0, NULL, NULL, NULL,&timeout);
> +	}
>   	if( op->o_req_ndn.bv_len == 0 ) {
>   		Debug( LDAP_DEBUG_ANY, "%s do_delete: root dse!\n",
> Index: result.c
> --- result.c	11 May 2009 02:23:51 -0000	1.331
> +++ result.c	11 May 2009 04:42:58 -0000
> @@ -418,4 +418,6 @@
>   	if (( rs->sr_err == SLAPD_ABANDON || op->o_abandon )&& 
!op->o_cancel ) {
>   		rc = SLAPD_ABANDON;
> +		Statslog( LDAP_DEBUG_STATS,
> +			"%s ABANDONED\n", op->o_log_prefix, 0, 0, 0, 0 );
>   		goto clean2;
>   	}
>
>


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/



Followup 2

Download message
From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
Date: Mon, 11 May 2009 15:57:21 +0200
To: hyc@symas.com
Cc: openldap-its@openldap.org
Subject: Re: (ITS#6104) race condition with cancel operation
hyc@symas.com writes:
>> slapd/cancel.c sets o_abandon before o_cancel.  Thus it's possible for
>> the canceled operation to obey o_abandon before o_cancel gets set.
> 
> Cancel is just best-effort, why not just set o_cancel before o_abandon?

Memory coherence.  Without mutex protection, the cancelled thread might
still see the o_abandon change before the o_cancel change.  And thus not
send a response, nor update info for the Cancel operation to react to.

-- 
Hallvard



Followup 3

Download message
Date: Mon, 11 May 2009 15:14:17 -0700
From: Howard Chu <hyc@symas.com>
To: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
CC: openldap-its@openldap.org
Subject: Re: (ITS#6104) race condition with cancel operation
Hallvard B Furuseth wrote:
> hyc@symas.com writes:
>>> slapd/cancel.c sets o_abandon before o_cancel.  Thus it's possible
for
>>> the canceled operation to obey o_abandon before o_cancel gets set.
>>
>> Cancel is just best-effort, why not just set o_cancel before o_abandon?
>
> Memory coherence.  Without mutex protection, the cancelled thread might
> still see the o_abandon change before the o_cancel change.  And thus not
> send a response, nor update info for the Cancel operation to react to.
>
I'd prefer to condense it all into a bitfield. We already ensure that the 
c_mutex is held when setting these flags. It would be a lot of overhead to 
grab the mutex just to read the flag, and would require additional analysis to 
make sure the new locking behavior doesn't introduce new deadlocks.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/



Followup 4

Download message
From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
Date: Fri, 15 May 2009 17:21:05 +0200
To: Howard Chu <hyc@symas.com>
Cc: openldap-its@openldap.org
Subject: Re: (ITS#6104) race condition with cancel operation
Howard Chu writes:
> I'd prefer to condense it all into a bitfield.

Right... keep both o_abandon and o_cancel (maybe renamed to o_dummy) for
RE24 though, otherwise we have a silent binary incompatible change with
already-compiled slapd modules.  The change will in any case break
modules that play with o_cancel though.

> We already ensure that
> the c_mutex is held when setting these flags.

Not yet, but c_mutex it is:-)  Must protect the following code.
Or rather, bitflag code which will replace it:

- o_cancel access in cancel.c, connection.c:connection_operation(),
  back-relay/op.c.

- o_abandon setting in overlays/syncprov.c:syncprov_op_abandon(), and
  in overlays/retcode.c where hopefully the first case should have been
	if ( op2.o_abandon ) op->o_abandon = 1;
  so that we can never reset o_abandon once it is set.

> It would be a lot of
> overhead to grab the mutex just to read the flag,
> and would require additional analysis to make sure the new locking
> behavior doesn't introduce new deadlocks.

Checking o_abandon!=0 works unmutexed.  (Well, technically any
such unmutexed access is wrong, but in practice we should be OK.)
As long as we:
- never reset o_abandon like retcode above can do, and
- never unmutexed assume consistency betweeen abandon-related variables
  set by different threads.

So the equivalent of this should be OK:
  if ( op->o_abandon ) rs->sr_err = SLAPD_ABANDON;
  ...  
  if ( rs->sr_err == SLAPD_ABANDON ) {
     lock c_mutex;
     more careful o_abandon checks;
     unlock c_mutex;
  }

  
While I'm looking at this:

The ldap_pvt_thread_yield() in cancel.c is really ugly.  There is no
cond to attach to it to without breaking binary compatibility, but even
a even a global mutex+cond might be better, with threads broadcasting on
it when it might be relevant.

It's still ugly though.  Cancelling an expensive operation ought to be a
friendly operation, but instead it is unfriendly in that yet another
thread is occupied, solely to wait for that operation.  What would make
sense (for RE25) would be to redo Cancel so that the cancelled thread is
responsible for sending the Cancel response or resubmitting an operation
which will.

Also, the be_cancel implementation in at least back-relay looks wrong.
Cancel isn't dispatched on DN, and back-relay with no 'relay' directive
configured cannot know where to dispatch.  I haven't checked how it
works, but it seems a call to be_cancel means the Cancel operation
_might_ belong to this backend.  But it might not, which means
back-relay shouldn't set op->o_cancel in this case.  What it should do
I don't know though.  And I don't know if syncprov is doing any better.

-- 
Hallvard



Followup 5

Download message
From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
Date: Sat, 23 May 2009 00:08:38 +0200
To: openldap-its@openldap.org
Subject: Re: (ITS#6104) race condition with cancel operation
Well, this turned out a lot of problems.  Currently blocked by 
ITS#6138 Bad Cancel/Abandon/"internal abandon"/Syncprov interactions.

back-relay/op.c done (ITS#6133). syncprov/retcode issues may fit better
under ITS#6138.

Bitflags should be unnecessary as such, just need a set of values.
Though it wouldn't hurt if they were chosen so one could use bit
operations to reduce checks for "o_abandon == A or B".  "Suppress
response" value mentioned in ITS#6138 also needed, maybe that must be a
bitflag when combined with Cancel stuff.

Some tidbits uglifying a fix:
* op->o_conn->c_mutex is locked when be->be_abandon() is called,
  but not when be->be_cancel() is called.
* send_ldap_ber() needs the lock while it holds &conn->c_write1_mutex,
  but code elsewhere indicates the reverse lock order.

Question:
  syncprov_op_abandon() holds &si->si_ops_mutex when setting
  so->s_op->o_abandon.  Does it need that?  The function needs to
  grab op->o_conn->c_mutex when called as Cancel, but must not do
  that while holding &si->si_ops_mutex since Abandon grabs the
  locks in the opposite order.

-- 
Hallvard



Followup 6

Download message
Date: Fri, 22 May 2009 15:13:13 -0700
From: Howard Chu <hyc@symas.com>
To: h.b.furuseth@usit.uio.no
CC: openldap-its@openldap.org
Subject: Re: (ITS#6104) race condition with cancel operation
h.b.furuseth@usit.uio.no wrote:
> Well, this turned out a lot of problems.  Currently blocked by
> ITS#6138 Bad Cancel/Abandon/"internal abandon"/Syncprov interactions.
>
> back-relay/op.c done (ITS#6133). syncprov/retcode issues may fit better
> under ITS#6138.
>
> Bitflags should be unnecessary as such, just need a set of values.
> Though it wouldn't hurt if they were chosen so one could use bit
> operations to reduce checks for "o_abandon == A or B".  "Suppress
> response" value mentioned in ITS#6138 also needed, maybe that must be a
> bitflag when combined with Cancel stuff.

Yes, it still seems some bitflags will be more appropriate.

> Some tidbits uglifying a fix:
> * op->o_conn->c_mutex is locked when be->be_abandon() is called,
>    but not when be->be_cancel() is called.
> * send_ldap_ber() needs the lock while it
holds&conn->c_write1_mutex,
>    but code elsewhere indicates the reverse lock order.
>
> Question:
>    syncprov_op_abandon() holds&si->si_ops_mutex when setting
>    so->s_op->o_abandon.  Does it need that?  The function needs to
>    grab op->o_conn->c_mutex when called as Cancel, but must not do
>    that while holding&si->si_ops_mutex since Abandon grabs the
>    locks in the opposite order.

syncprov_op_abandon() can remove ops from the list, so yes, it must hold 
si_ops_mutex.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/



Followup 7

Download message
From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
Date: Sat, 23 May 2009 20:13:40 +0200
To: hyc@symas.com
Cc: openldap-its@openldap.org
Subject: Re: (ITS#6104) race condition with cancel operation
hyc@symas.com writes:
>> Bitflags should be unnecessary as such, just need a set of values.
>> Though it wouldn't hurt if they were chosen so one could use bit
>> operations to reduce checks for "o_abandon == A or B".  "Suppress
>> response" value mentioned in ITS#6138 also needed, maybe that must be a
>> bitflag when combined with Cancel stuff.
> 
> Yes, it still seems some bitflags will be more appropriate.

As long as we decide which flag combinations are (not) valid.  Problem
with "unrestricted" bitflags compared to a list of values is that they
allow and thus require support for flag combinations we have no real
need to support.

>> Question:
>>    syncprov_op_abandon() holds&si->si_ops_mutex when setting
>>    so->s_op->o_abandon.  Does it need that?  The function needs
to
>>    grab op->o_conn->c_mutex when called as Cancel, but must not
do
>>    that while holding&si->si_ops_mutex since Abandon grabs the
>>    locks in the opposite order.
> 
> syncprov_op_abandon() can remove ops from the list, so yes, it must
> hold si_ops_mutex.

Right, but does it need si_ops_mutex while setting o_abandon?  Cancel
could remove the op, unlock si_ops_mutex, lock c_mutex, update
o_abandon.  Alternative: Lock c_mutex first (that must be safe since
Abandon is doing it anyway), then lock si_ops_mutex.  So it's not
a critical difference, just that a lock must be held a bit longer.

-- 
Hallvard



Followup 8

Download message
From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
Date: Fri, 5 Jun 2009 23:53:07 +0200
To: openldap-its@openldap.org
Subject: Re: (ITS#6104) race condition with cancel operation
Hrmph, the placement of the 'so->s_op->o_abandon = 1;' which I wanted
to move out of the mutex lock derives from syncprov.c rev 1.20 commented
"Fix abandon sequencing":-(

-- 
Hallvard



Followup 9

Download message
From: Ddh3eaa <ddh3eaa@gmail.com>
Date: Mon, 29 Dec 2014 05:11:15 -0500
Subject: Re: (ITS#6104) race condition with cancel operation
Cc: "h.b.furuseth@usit.uio.no" <h.b.furuseth@usit.uio.no>
To: "openldap-its@OpenLDAP.org" <openldap-its@OpenLDAP.org>

Sent from my iPhone



Followup 10

Download message
From: michael garofalo <michaelgarofalo404@yahoo.com>
Date: Tue, 17 Jan 2017 21:57:05 -0500
Subject: Re: (ITS#6104) race condition with cancel operation
Cc: h.b.furuseth@usit.uio.no
To: openldap-its@OpenLDAP.org


Up to top level
Build   Contrib   Development   Documentation   Historical   Incoming   Software Bugs   Software Enhancements   Web  

Logged in as guest


The OpenLDAP Issue Tracking System uses a hacked version of JitterBug

______________
© Copyright 2013, OpenLDAP Foundation, info@OpenLDAP.org