Full_Name: Hallvard B Furuseth Version: HEAD OS: URL: Submission from: (NULL) (129.240.6.233) Submitted by: hallvard Might make this a tracking issue for Cancel/Abandon problems. And/or copy some to separate ITSes, but they all seem interconnected: Cancel(abandoned operation)-requests are not rejected. Thus slapd sets o_cancel and turns an active Abandon into a Cancel. Presumably that can confuse Cancel/Abandon handlers, like that in Syncprov. Similarly, Abandon(abandoned/cancelled operation) is not ignored. connection_abandon() re-abandons abandoned/cancelled operations too. However Syncprov:RefreshAndPersist abuses op->o_abandon: It sets it to mean "Suppress the response. A copy of this operation will send it." So if Cancel(op with o_abandon!=0) is fixed to respond protocolError "already abandoned", presumably that breaks Cancel(RefreshAndPersist). I'm not touching it with a flagpole - I don't know syncprov. Help? Overlay retcode does something similar - sends a response and then sets o_abandon. Cancel/Abandon can in any case fail by targeting the wrong operation though: A connection can have multiple messages with the same IDs when the response is sent and the client reuses the message ID, before the old operation in slapd can clean up and finish. syncprov_op_abandon() identifies messages by (connid, msgid). Can multiple messages with the same ID break that, or more importantly, break what gets sent/written with syncrepl? Anyway, the current code needs an o_abandon value which means "suppress response". Or maybe "abandoned, except as far as future Abandon and Cancel operations are concerned". Syncprov and Retcode need to handle the various possible orderings of Cancel/Abandon vs Suppress, including when they forward o_abandon from one operation to another. I haven't looked too closely at that code either. ITS#6104 (race condition with cancel operation) also calls for multiple o_abandon values (joining some or all of o_cancel into o_abandon), and must be considered when picking a value. However I suggest to consider just the various possible values first and get that right, and deal with the race condition later. Source and binary compatibility with third-party modules complicates this, if we're trying to support both and old compiled slapd + new compiled module and vice versa. Might try a minimal solution now.
moved from Incoming to Software Bugs
back-ldap:extended.c also does "suppress response, it has been sent", but does it by returning and setting rs->sr_err = SLAPD_ABANDON. Might break assumptions somewhere that SLAPD_ABANDON implies o_abandon was set. And I guess the hack fails if the operation gets cancelled. ======================================================================== I think these are the Operation states related to Cancel and Abandon: op->o_abandon is set for these - could extend to multiple values: A) Operation Abandoned/Cancelled by client. B) Operation implicitly abandoned by client. (Bind or lost connection) C) Operation abandoned by server. (It wants to close the connection) D) Suppress response - a duplicate of the operation will proceed. (syncprov) E) Suppress response - final send_ldap_response() was done. (retcode overlay) rs->sr_err == SLAPD_ABANDON if: F) The backend obeyed o_abandon. (Cancel op, if any, will succeed) G=E) Suppress response - final send_ldap_response() was done. (back-ldap) op->o_cancel packs these states/values: H) The o_abandon is due to a Cancel. I) Cancel operation wants a result, cancelled op must set it and wait. J) Result is available to the Cancel operation. K) Result. (LDAP result code, or SLAP_CANCEL_ACK for success) L) Cancel operation has fetched result, cancelled operation can proceed. States that fit in none of the above, or poorly so: M) Operation must not be waited for, e.g. by Cancel. Operation is itself waiting for others, e.g. cn=config update. N) Operation invisible to Abandon/Cancel/internal abandon. msgID reusable due to result sent to client. Also case D (syncprov)? Fix by removing the op from op->o_conn->c_ops? Or does that just move the problem around? Would need to do something to o_conn to prevent connection_close() from doing connection_destroy(). O) Operation result has been committed, do not abandon. ITS#6059. But o_abandon can be set while trying to commit, unless this flag is set before trying - in which case we can't abandon an operation which is failing to commit, which may be when it's most relevant. Could reset o_abandon, if anyone can keep straight the consequences. Or replace the 'if ( op->o_abandon )' tests with some macro call. Still, interactions with other states could be a problem. About the o_abandon values above: B can be treated like A, I think. C differs in that Cancel/Abandon(operation) should not say "already abandoned" since the client doen't know about the abandon. Could be solved with a vague error message. D-E differ in that o_abandon gets set even though the backends' cancel/abandon handlers were not called. Unsure of the effects of that. D Syncprov duplicating a Persistent Search operation. Handled similar to a server-initiated abandon? Except if the operation cannot be "invisible to Abandon/Cancel" above it must remain possible to Abandon/Cancel it. E Suppress response - response has been sent: Set when exiting slap_send_ldap_result() & co? Handled similar to a server-initiated abandon? At the time slap_send_ldap_result() is called again, the operation may have set up things which need to be cleaned up in the normal way. Yet it has already gone through that function once, doing callbacks etc. Must "final response" code be prepared to be called twice? Beyond that, the main problem would be code which transitions to one state to another, it needs to handle the other cases. -- Hallvard
changed notes
See also ITS#6059, 6103, 6104, (6133), 6157
Can make use of new asyncmeta op code
slapd async op return code, specifically
(In reply to Hallvard Furuseth from comment #0) > Full_Name: Hallvard B Furuseth > Version: HEAD > OS: > URL: > Submission from: (NULL) (129.240.6.233) > Submitted by: hallvard > > > Might make this a tracking issue for Cancel/Abandon problems. > And/or copy some to separate ITSes, but they all seem interconnected: > > > Cancel(abandoned operation)-requests are not rejected. Thus slapd > sets o_cancel and turns an active Abandon into a Cancel. Presumably > that can confuse Cancel/Abandon handlers, like that in Syncprov. > > Similarly, Abandon(abandoned/cancelled operation) is not ignored. > connection_abandon() re-abandons abandoned/cancelled operations too. Both of the above checks are now uncommented / enabled. > However Syncprov:RefreshAndPersist abuses op->o_abandon: It sets it to > mean "Suppress the response. A copy of this operation will send it." > So if Cancel(op with o_abandon!=0) is fixed to respond protocolError > "already abandoned", presumably that breaks Cancel(RefreshAndPersist). > > I'm not touching it with a flagpole - I don't know syncprov. Help? syncprov has been changed to use the new SLAPD_ASYNCOP returncode so it no longer needs to muck with this. > Overlay retcode does something similar - sends a response and then sets > o_abandon. I can't see any reason why this is needed. It's been there since the initial commit but I'm removing it. Any return code != SLAP_CB_CONTINUE already prevents the frontend from sending its own response so there's no reason to set this here. > Cancel/Abandon can in any case fail by targeting the wrong operation > though: A connection can have multiple messages with the same IDs when > the response is sent and the client reuses the message ID, before the > old operation in slapd can clean up and finish. > > syncprov_op_abandon() identifies messages by (connid, msgid). Can > multiple messages with the same ID break that, or more importantly, > break what gets sent/written with syncrepl? Sounds like bad practice in clients, but should still be safe. The messages are queued in order, so only the first matching message will be abandoned. Presumably the first incoming abandon request can only refer to the first queued operation of that ID. > Anyway, the current code needs an o_abandon value which means "suppress > response". Or maybe "abandoned, except as far as future Abandon and > Cancel operations are concerned". Syncprov and Retcode need to handle > the various possible orderings of Cancel/Abandon vs Suppress, including > when they forward o_abandon from one operation to another. I haven't > looked too closely at that code either. Mooted by use of SLAPD_ASYNCOP returncode.
https://git.openldap.org/openldap/openldap/-/merge_requests/365
(In reply to Hallvard Furuseth from comment #2) > back-ldap:extended.c also does "suppress response, it has been sent", > but does it by returning and setting rs->sr_err = SLAPD_ABANDON. Might > break assumptions somewhere that SLAPD_ABANDON implies o_abandon was > set. And I guess the hack fails if the operation gets cancelled. Now fixed to always let the frontend handle responses. But this is still awkward, and will need to be revisited again for ITS#9220. > ======================================================================== > > I think these are the Operation states related to Cancel and Abandon: > > op->o_abandon is set for these - could extend to multiple values: > A) Operation Abandoned/Cancelled by client. > B) Operation implicitly abandoned by client. (Bind or lost connection) > C) Operation abandoned by server. (It wants to close the connection) > D) Suppress response - a duplicate of the operation will proceed. (syncprov) > E) Suppress response - final send_ldap_response() was done. (retcode overlay) > > rs->sr_err == SLAPD_ABANDON if: > F) The backend obeyed o_abandon. (Cancel op, if any, will succeed) > G=E) Suppress response - final send_ldap_response() was done. (back-ldap) D, E, and G will no longer occur. > op->o_cancel packs these states/values: > H) The o_abandon is due to a Cancel. > I) Cancel operation wants a result, cancelled op must set it and wait. > J) Result is available to the Cancel operation. > K) Result. (LDAP result code, or SLAP_CANCEL_ACK for success) > L) Cancel operation has fetched result, cancelled operation can proceed. > > States that fit in none of the above, or poorly so: > > M) Operation must not be waited for, e.g. by Cancel. > Operation is itself waiting for others, e.g. cn=config update. > > N) Operation invisible to Abandon/Cancel/internal abandon. > msgID reusable due to result sent to client. Also case D (syncprov)? > > Fix by removing the op from op->o_conn->c_ops? Or does that just > move the problem around? Would need to do something to o_conn to > prevent connection_close() from doing connection_destroy(). > > O) Operation result has been committed, do not abandon. ITS#6059. > > But o_abandon can be set while trying to commit, unless this flag is > set before trying - in which case we can't abandon an operation which > is failing to commit, which may be when it's most relevant. > > Could reset o_abandon, if anyone can keep straight the consequences. > Or replace the 'if ( op->o_abandon )' tests with some macro call. > Still, interactions with other states could be a problem. > > About the o_abandon values above: > > B can be treated like A, I think. > C differs in that Cancel/Abandon(operation) should not say "already > abandoned" since the client doen't know about the abandon. > Could be solved with a vague error message. > > D-E differ in that o_abandon gets set even though the backends' > cancel/abandon handlers were not called. Unsure of the effects of that. > > D Syncprov duplicating a Persistent Search operation. > Handled similar to a server-initiated abandon? Except if the > operation cannot be "invisible to Abandon/Cancel" above it must remain > possible to Abandon/Cancel it. > > E Suppress response - response has been sent: > Set when exiting slap_send_ldap_result() & co? > > Handled similar to a server-initiated abandon? > At the time slap_send_ldap_result() is called again, the operation > may have set up things which need to be cleaned up in the normal way. > Yet it has already gone through that function once, doing callbacks > etc. Must "final response" code be prepared to be called twice? > > Beyond that, the main problem would be code which transitions to one > state to another, it needs to handle the other cases. > > -- > Hallvard
Commits: • d3bd4aa7 by Howard Chu at 2021-08-03T15:19:49+00:00 ITS#6138 don't allow redundant abandon/cancel ops • 5a61175d by Howard Chu at 2021-08-03T15:19:49+00:00 ITS#6138 add lock flag to connection_op_finish() • e9e6fd71 by Howard Chu at 2021-08-03T15:19:49+00:00 ITS#6138 don't overwrite rs->sr_err after sending response • 795add7b by Howard Chu at 2021-08-03T15:19:49+00:00 ITS#6138 use SLAPD_NO_REPLY for persistent searches The particular code doesn't matter, any result besides SLAP_CB_CONTINUE always halts overlay/response processing. • 658e526b by Howard Chu at 2021-08-03T15:19:49+00:00 ITS#6138 drop unnecessary use of o_abandon • 66ed15a2 by Howard Chu at 2021-08-03T15:19:49+00:00 ITS#6138 FIx exop handler to ignore SLAPD_ASYNCOPs • a54f9985 by Howard Chu at 2021-08-03T15:19:49+00:00 ITS#6138 let frontend handle all exop responses • 8311c71f by Howard Chu at 2021-08-03T15:19:49+00:00 ITS#6138 fix test timing issue