(ITS#6104) race condition with cancel operation



Full_Name: Hallvard B Furuseth
Version: HEAD
OS: Linux
URL: 
Submission from: (NULL) (129.240.6.233)
Submitted by: hallvard


slapd/cancel.c sets o_abandon before o_cancel.  Thus it's possible for
the canceled operation to obey o_abandon before o_cancel gets set,
though I had to insert some sleeps to achieve that.
Then either the operation is abandoned and the Cancel operation receives
tooLate, or, if the client unbinds/closes the connection fast enough,
Cancel hangs:  slapd does not close the connection, and hangs on
shutdown: "slapd shutdown: waiting for 1 operations/tasks to finish".
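
To make the window concrete, here is a minimal single-threaded replay of
the interleaving.  It is not slapd code: struct op_flags and both
function names are made up for illustration, and only the condition
itself mirrors the o_abandon/o_cancel test in send_ldap_response().

```c
#include <assert.h>

/* Hypothetical stand-in for the two Operation fields involved. */
struct op_flags {
    int o_abandon;
    int o_cancel;
};

/* Same flag test as in result.c (the rs->sr_err part is left out):
 * abandoned and not marked as canceled means "drop without a response". */
static int would_drop_silently(const struct op_flags *op)
{
    return op->o_abandon && !op->o_cancel;
}

/* Replays the bad ordering: the target operation runs its check
 * between the two stores made by the Cancel thread. */
static int replay_race(void)
{
    struct op_flags op = { 0, 0 };
    int dropped;

    assert(!would_drop_silently(&op));   /* nothing requested yet */

    op.o_abandon = 1;                    /* Cancel thread: first store */
    dropped = would_drop_silently(&op);  /* target thread checks here */
    op.o_cancel = 1;                     /* Cancel thread: second store */

    return dropped;
}
```

replay_race() returns nonzero: the target operation has already taken
the plain-abandon path by the time o_cancel is set, so the Cancel
operation never sees the acknowledgement it is waiting for.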

Since the flags are not mutex-protected (at least not when read), it's
not enough to set o_cancel before o_abandon in the Cancel thread:  the
canceled thread might still see the o_abandon change first.  A fix
could be to make o_abandon a bitmask which says whether the abandon is
actually a cancel, but the Abandon and Cancel operations would still
need a mutex to coordinate so that Abandon does not reset a Cancel
bitflag.  In any case, it'd be cleaner if an operation which reacts to
o_abandon grabbed some mutex before checking o_cancel.
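
A rough sketch of the bitmask idea above, using plain pthreads rather
than slapd's ldap_pvt_thread wrappers.  Everything named here (struct
op_state, the OP_* bits, the four functions) is hypothetical and only
meant to show the coordination, not a proposed patch.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical bit values, not taken from slap.h. */
#define OP_ABANDON 0x1u  /* stop working on the operation */
#define OP_CANCEL  0x2u  /* the abandon was requested by Cancel */

struct op_state {
    pthread_mutex_t mutex;
    unsigned        abandon;  /* bitmask replacing the two separate flags */
};

static void op_state_init(struct op_state *s)
{
    int rc = pthread_mutex_init(&s->mutex, NULL);
    assert(rc == 0);
    (void)rc;
    s->abandon = 0;
}

/* Abandon only ORs in its own bit, so it cannot reset a Cancel bit. */
static void mark_abandoned(struct op_state *s)
{
    pthread_mutex_lock(&s->mutex);
    s->abandon |= OP_ABANDON;
    pthread_mutex_unlock(&s->mutex);
}

/* Cancel sets both bits inside one critical section. */
static void mark_canceled(struct op_state *s)
{
    pthread_mutex_lock(&s->mutex);
    s->abandon |= OP_ABANDON | OP_CANCEL;
    pthread_mutex_unlock(&s->mutex);
}

/* The target operation takes the same mutex and reads one snapshot,
 * then decides between "abandon" and "cancel" from that snapshot. */
static unsigned read_abandon(struct op_state *s)
{
    unsigned v;

    pthread_mutex_lock(&s->mutex);
    v = s->abandon;
    pthread_mutex_unlock(&s->mutex);
    return v;
}
```

With this shape a reader can see OP_ABANDON alone only when a real
Abandon was issued; after a Cancel it sees both bits or neither.  Which
existing slapd mutex, if any, would play the role of the mutex field is
left open here.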


The problem was tested as follows:
- sleep 0.2 sec after Statslog "DEL" and before setting SLAP_CANCEL_REQ.
- log "ABANDONED" when send_ldap_response() abandons the operation.
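- Client: a Python socket client which sends raw BER, no libldap
  (delete(), cancel() and unbind() return the encoded requests):
    import socket, sys, time
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(('localhost', 3890))
    s.send(delete("cn=test"))
    time.sleep(0.1)
    s.send(cancel()) # cancel last operation
    #sys.exit()      # enable to exit right after the Cancel (second log below)
    time.sleep(0.4)
    s.send(unbind())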
-->
    conn=0 fd=9 ACCEPT from IP=127.0.0.1:56945 (IP=127.0.0.1:3890)
    conn=0 op=0 DEL dn="cn=test"
    conn=0 op=1 EXT oid=1.3.6.1.1.8
    conn=0 op=1 CANCEL msg=1
    conn=0 op=0 ABANDONED
    conn=0 op=2 UNBIND
    conn=0 op=1 RESULT oid= err=120 text=
    conn=0 fd=9 closed
    <server closed connection, client exited>
^C slapd

If the client exits after send(cancel()):
    conn=0 fd=9 ACCEPT from IP=127.0.0.1:48826 (IP=127.0.0.1:3890)
    conn=0 op=0 DEL dn="cn=test"
    conn=0 op=1 EXT oid=1.3.6.1.1.8
    conn=0 op=1 CANCEL msg=1
    conn=0 op=2 UNBIND
    conn=0 op=0 ABANDONED
    <not closing connection>
^C slapd
    daemon: shutdown requested and initiated.
    slapd shutdown: waiting for 1 operations/tasks to finish
    <slapd is hanging>
kill -KILL <slapd>

slapd.conf:
    include         servers/slapd/schema/core.schema
    allow           update_anon
    database        ldif
    directory       "."
    suffix          "cn=test"

Patches to slapd used for the test:

Index: cancel.c
--- cancel.c	21 Jan 2009 23:40:25 -0000	1.30
+++ cancel.c	11 May 2009 04:42:58 -0000
@@ -92,4 +92,8 @@
 		}
 
+		{
+			struct timeval timeout = { 0, 200000 };
+			select(0, NULL, NULL, NULL, &timeout);
+		}
 		o->o_cancel = SLAP_CANCEL_REQ;
 
Index: delete.c
--- delete.c	21 Jan 2009 23:40:26 -0000	1.144
+++ delete.c	11 May 2009 04:42:58 -0000
@@ -75,4 +75,8 @@
 		op->o_log_prefix, op->o_req_dn.bv_val, 0, 0, 0 );
 
+	{
+		struct timeval timeout = { 0, 200000 };
+		select(0, NULL, NULL, NULL, &timeout);
+	}
 	if( op->o_req_ndn.bv_len == 0 ) {
 		Debug( LDAP_DEBUG_ANY, "%s do_delete: root dse!\n",
Index: result.c
--- result.c	11 May 2009 02:23:51 -0000	1.331
+++ result.c	11 May 2009 04:42:58 -0000
@@ -418,4 +418,6 @@
 	if (( rs->sr_err == SLAPD_ABANDON || op->o_abandon ) && !op->o_cancel ) {
 		rc = SLAPD_ABANDON;
+		Statslog( LDAP_DEBUG_STATS,
+			"%s ABANDONED\n", op->o_log_prefix, 0, 0, 0, 0 );
 		goto clean2;
 	}