Issue 6758 - Cleaning up SlapReply usage
Summary: Cleaning up SlapReply usage
Status: RESOLVED PARTIAL
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-12-30 22:09 UTC by Hallvard Furuseth
Modified: 2020-03-20 14:10 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description Hallvard Furuseth 2010-12-30 22:09:58 UTC
Full_Name: Hallvard B Furuseth
Version: 2.4.23, HEAD
OS: Linux x86_64
URL: 
Submission from: (NULL) (129.240.6.233)
Submitted by: hallvard


Slapd and its modules put varous info in SlapReply and send it.
They do not always clean it up afterwards, or not correctly.  Then
they sometimes reuse the SlapReply without reinitializing, even
though most of its fields are invalid at that point.  (In practice
they are usually mostly zeroed by then, so slapd works anyway.)

This is a dup of the misnamed ITS#5340, so the CHANGES file can
refer to an ITS with a better name for all the upcoming commits.

Also I'll go about it a bit backwards than discussed there:
First reset some flags and avoid reusing SlapReply, then get
more aggressive about obeying the flag settings that remain.
Comment 1 Hallvard Furuseth 2010-12-31 12:49:26 UTC
changed notes
moved from Incoming to Software Bugs
Comment 2 Hallvard Furuseth 2010-12-31 21:35:59 UTC
Mostly done with the easily spotted cases.  Hopefully HEAD's code
stayed valid through my commits.  Need to think some more for the
rest.  The decisions with greater chance of breaking things
remain, like ITS#6763.

Other issues:

* hyc, please have a look at slapd/bconfig.c 1.419.  If I have
  gotten things wrong, that one summarizes my mistakes nicely:-)

* If slap_send_ldap_result() or send_ldap_disconnect() see an entry
  with REP_ENTRY_MUST<RELEASE/BEFREED> and REP_SEARCH/SEARCHREF,
  should release/free the entry?  That helps code which sends error
  without freeing sr_entry.  It breaks if the entry comes from
  another Operation, or if the caller reuses a Search SlapReply for
  a non-Search and trusts rs->sr_un.sru_search to stay untouched.

* Is the code getting clean enough that slap_send_search_entry's
  cleanup code can flush the entry independently of whether
  op->o_tag == LDAP_REQ_SEARCH?

* slapi/slapi_pblock.c 1.81:
	/* TODO: Should REP_ENTRY_MODIFIABLE be set? */
  I've no idea, and I've more pressing things to look at...

Notes on the updates:

* "Reset some SlapReply flags & data" commits vs "avoid SlapReply reuse"
  commits:  The combination might be redundant/overcautious, but tries
  to catch problems from remaining unclean code from both ends.

* result.c, freeing as well as releasing entries as soon as possible:
  More readable, and since the entry isn't needed: The sooner we get
  rid of it, the less chance for an entry leak.

* commits "Use rs_*() to manage SlapReply entries."

  Switching to rs_flush_entry():
  - result.c:slap_send_search_entry(), overlays/pcache.c,
      overlays/translucent.c, slapi/slapi_pblock.c:
    These did not clear the MODIFIABLE flag along with MUSTBEFREED.

  Switching to use rs_ensure_entry_modifiable():
  - syncprov_operational(), overlay collect:
    noop change (except operational should not update the entry)
  - overlay valsort:
    It needlessly required REP_ENTRY_MUSTBEFREED as well as MODIFIABLE.
    But the entry can also be MODIFIABLE and e.g. on the stack.
  - contrib overlays cloak, usn:
    Had entry leaks.

-- 
Hallvard

Comment 3 Howard Chu 2011-01-01 07:16:06 UTC
h.b.furuseth@usit.uio.no wrote:
> Mostly done with the easily spotted cases.  Hopefully HEAD's code
> stayed valid through my commits.  Need to think some more for the
> rest.  The decisions with greater chance of breaking things
> remain, like ITS#6763.
>
> Other issues:
>
> * hyc, please have a look at slapd/bconfig.c 1.419.  If I have
>    gotten things wrong, that one summarizes my mistakes nicely:-)

I think you're over-paranoid. config_build_entry() is only called by internal 
plumbing and the SlapReply here never gets sent to send_ldap_result(). It's 
only passed in because be_add() expects one; the contents are mostly 
irrelevant. (Aside from rs_err)

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 4 Hallvard Furuseth 2011-01-04 18:03:55 UTC
In addition to the issues listed in my last message, this ITS depends
on these:
- ITS#6763: slapd does not clear sr_entry/ctrls on sizelimit/busy
- ITS#6776: Control handling in PassMod exop + ppolicy
- ITS#6760: rwm broken entry handling

====

Howard Chu writes:
>h.b.furuseth@usit.uio.no wrote:
>> * hyc, please have a look at slapd/bconfig.c 1.419.  If I have
>>    gotten things wrong, that one summarizes my mistakes nicely:-)

I meant to say "If I've broken anything, that commit should summarize
my worst mistakes."  At least that worry is out of the way:

> I think you're over-paranoid. config_build_entry() is only called by
> internal plumbing and the SlapReply here never gets sent to
> send_ldap_result(). It's only passed in because be_add() expects one;
> the contents are mostly irrelevant. (Aside from rs_err)

Thanks.  Not exactly what I hoped to hear, but I did ask...  I can
revert it, or since this SlapReply is so uninteresting, maybe allow rs
= NULL in this case so some callers can omit it.

====

Summary of changes:

* Add rs_assert_*(), disabled by default, to check SlapReply state.

* Majority of changes:  Avoid reusing SlapReply: Reinitialize these
  variables at need with rs_reinit() or move a declaration inwards
  (typically into loops).  I think I introduced 2 new SlapReply's,
  in bconfig.c (unnecessarily as Howards says) ad in pcache.

* Use the rs_*() functions in result.c to manage SlapReply entries,
  rather than freeing/releasing by hand and every so often forgetting
  to handle some of the REP_ENTRY_* flags.

  This also makes slapd more aggressive about freeing entries.  E.g.
  result.c _released_ entries at need before sending, so entry locks
  for cached entries would not wait for network traffic.  OTOH it did
  not at that point bother with entries that merely needed _freeing_.

* "Reset dangerous REP_ENTRY_* flags."

  This is so when a module puts e.g. puts Extended Operation data
  in the SlapReply, another module won't try to entry_free/release()
  that non-entry because of some leftover REP_ENTRY_* flags.

  This may have been over-paranoid too:  I seem closer to having
  rs.<sr_type, sr_un> behave like a variant record where rs.sr_type
  actually can be trusted.  Anyway, this paranoia shouldn't hurt.

Others are some minor cleanups:

* Reset some SlapReply flags & data.  Reset data more consistenlty.
  (E.g. rs_entry and flags go together, so don't reset just entry.)

* Clear out or avoid filling in known unwanted info (Don't set sr_err
  = non-success before entering an operation handler, do not set
  sr_text with sr_err = LDAP_SUCCESS, etc.)

* Add an error check or two (was actually done for consistency sr_err
  vs sr_text)

* slap.h: Cast REP_* #defines to slap_mask_t

* No-op changes (remove lint, rearrange/simplify code a bit, etc)

-- 
Hallvard

Comment 5 Hallvard Furuseth 2011-01-19 23:05:59 UTC
I wrote:

> In addition to the issues listed in my last message, this ITS depends
> on these: [ITS#6763,#6776#6760].

Sorry, that was quite wrong.  The changes done so far can stand alone,
I've tried hard for that.  Taking this ITS _further_ depends on those
ITSes.  But those ITSes are safest to tackle after the changes I've done
so far.


I've done this ITS backwards after some IRC miscommunication and by
sitting too close to the code:  First commit a huge patch to clean up
SlapReply, then explain why.  Sorry about that.  I intended to get it
done for 2.4.24, but when Howard balked at this approach I expect I
instead ended up complicating HEAD->RE24 merges.

So to explain better: This is ITS#5340 with a better name.  See
   http://www.openldap.org/its/?findid=5340#followup16 - 18
which actually explains most of the rationale for these changes:

   "I'm a bit nervous about doing more than minimal fixes to this
   until we do something major, because it's so broken.  (...)
   [Howard suggested] code which puts something into a SlapReply
   should ensure it's taken out again.  (...)  I think that's not
   the right fix because it's the most fragile solution:
   Practically every module which depends on it can break if some
   other module gets it wrong.
   OTOH, modules can't just reset SlapReply since some of it is
   input values to op_*() (...)"

SlapReply is so broken and confusing that for 3 years I've not dared to
do the obvious fixes to ITS#5340. Early on I kept prompting others who
understood slapd better than me to take on #5430, but nothing happened.

So now I've "done something major": Cleaned up SlapReply enough that
it's no longer an act of bravery to take on the original issues.  Most
of the changes are a preliminary cleanup for going for the original
ITS#5340 fixes (obeying the flags).
elsewhere:

By the original design, it's not a bug for an op_*() function to expect
an incoming SlapReply to be clean, nor to output a "dirty" SlapReply, so
it' has got to be wrong to reuse a SlapReply without cleaning it up.  So
I've taken a simpler and more robust approach this time, adding bigger
but more trivial changes: In addition to cleaning up some flags, I've
killed most cases of SlapReply reuse, which also isolates what damage
one module can do to another through SlapReply.  Even if they are
third-party overlays where the first is broken because it accurately
copied what OpenLDAP's overlays were doing:(

I've also mostly fixed rs.sr_type.  That one was almost as broken,
somewhat in the spirit of this awful and confusing idea in result.c:
	/* FIXME: I think rs->sr_type should be explicitly set to
	 * REP_SEARCH here. That's what it was when we entered this
	 * function. send_ldap_error may have changed it, but we
	 * should set it back so that the cleanup functions know
	 * what they're doing.
	 */
No.  If something set rs.sr_type = REP_INTERMEDIATE, result.c should
not tell its caller that OID rs.sr_rspoid is an entry, rs.sr_entry.


As for user-visible issues the cleanup caught: Half of my December and
January ITSes came from this effort. Some minor, some bigger.  Generally
issues somewhat out of scope of what I was doing in this ITS.

Howard asked if all these changes fixed any crashes.  Thinking it over,
I've had few crashes because of this mess, and I have had a hard time
trying to remember which ones.  I've had many crashes in the truckload
of asserts I added in my private code, which pointed out these ITSes and
how to clean up.  I could not have found the ITSes before the cleanup,
since the asserts then brought down slapd in an instant.

Also I've broken slapd a number of times by attempting "obvious" fixes
before cleaning up, because slapd is more creative than me about reusing
SlapReply strangely.  Like pcache.c which reused a Search SlapReply with
an entry in it for be_modify() and expected the entry to still be there
after the modify:-(  Looks like a good candidate for trying to construct
a pre-cleanup crash, if I ever find time to try: Find a way to feed that
SlapReply to something which reuses it for a Search or an Intermediate
Response, thus overwriting the entry sitting in the Modify SlapReply.

That kind if thing is why I'm much more eager to get the initial cleanup
into RE24 than to get the final entry leak fixes etc into RE24.  After
the cleanup, there should only be an entry in there if sr_type is
REP_SEARCH/REP_SEARCHREF, not also REP_RESULT/REP_GLUE_RESULT as before.

ITSes:
ITS#6760 slapo-rwm broken entry handling
ITS#6752 slapo-dynlist issues       [+ related non-SlapReply issues]
ITS#6746, #6763 entry/control leaks [+ related non-SlapReply issues]
My commits using rs_flush_entry() & co have fixed some other entry leaks
or potential leaks.  Some modules - in particular BDB - catch potential
leaks of their entries.  Others do not, giving us memory leaks.
ITS#6731, #6748 used scrambled sr_text on an exited function's stack.
ITS#6774, ITS#6785, some of my back-config commits:  sr_text/sr_matched
sent with sr_err=LDAP_SUCCESS, or related to another error than sr_err.

Browsing 2.4.24 patches, an earlier cleanup would have caught #6633
(slapd-bdb error propagation) - my asserts with test047 catch sr_text
!= NULL when sr_err == 0 - and likely #6629 (freeing controls).

-- 
Regards,
Hallvard

Comment 6 Hallvard Furuseth 2011-01-21 02:53:56 UTC
changed state Open to Test
Comment 7 Hallvard Furuseth 2011-01-21 04:42:30 UTC
changed notes
Comment 8 Quanah Gibson-Mount 2011-01-26 15:23:03 UTC
changed notes
changed state Test to Release
Comment 9 Quanah Gibson-Mount 2011-02-14 12:33:37 UTC
changed notes
changed state Release to Closed
Comment 10 Hallvard Furuseth 2011-03-24 03:37:37 UTC
changed notes
changed state Closed to Partial
Comment 11 Hallvard Furuseth 2011-11-03 15:29:15 UTC
changed notes
Comment 12 OpenLDAP project 2014-08-01 21:04:32 UTC
Many fixes in HEAD. The rest depends on ITS#6776 and maybe #6785.
Changes merged into RE24.

Merge notes: As of Jan 21, the slapd/sasl.c and contrib/slapd-modules/usn/
changes are HEAD only.