[Date Prev][Date Next] [Chronological] [Thread] [Top]
Re: (ITS#6758) Cleaning up SlapReply usage

To: openldap-its@OpenLDAP.org
Subject: Re: (ITS#6758) Cleaning up SlapReply usage
From: h.b.furuseth@usit.uio.no
Date: Wed, 19 Jan 2011 23:06:29 GMT
Auto-submitted: auto-generated (OpenLDAP-ITS)
I wrote:

> In addition to the issues listed in my last message, this ITS depends
> on these: [ITS#6763,#6776#6760].

Sorry, that was quite wrong.  The changes done so far can stand alone,
I've tried hard for that.  Taking this ITS _further_ depends on those
ITSes.  But those ITSes are safest to tackle after the changes I've done
so far.


I've done this ITS backwards after some IRC miscommunication and by
sitting too close to the code:  First commit a huge patch to clean up
SlapReply, then explain why.  Sorry about that.  I intended to get it
done for 2.4.24, but when Howard balked at this approach I expect I
instead ended up complicating HEAD->RE24 merges.

So to explain better: This is ITS#5340 with a better name.  See
   http://www.openldap.org/its/?findid=5340#followup16 - 18
which actually explains most of the rationale for these changes:

   "I'm a bit nervous about doing more than minimal fixes to this
   until we do something major, because it's so broken.  (...)
   [Howard suggested] code which puts something into a SlapReply
   should ensure it's taken out again.  (...)  I think that's not
   the right fix because it's the most fragile solution:
   Practically every module which depends on it can break if some
   other module gets it wrong.
   OTOH, modules can't just reset SlapReply since some of it is
   input values to op_*() (...)"

SlapReply is so broken and confusing that for 3 years I've not dared to
do the obvious fixes to ITS#5340. Early on I kept prompting others who
understood slapd better than me to take on #5430, but nothing happened.

So now I've "done something major": Cleaned up SlapReply enough that
it's no longer an act of bravery to take on the original issues.  Most
of the changes are a preliminary cleanup for going for the original
ITS#5340 fixes (obeying the flags).
elsewhere:

By the original design, it's not a bug for an op_*() function to expect
an incoming SlapReply to be clean, nor to output a "dirty" SlapReply, so
it' has got to be wrong to reuse a SlapReply without cleaning it up.  So
I've taken a simpler and more robust approach this time, adding bigger
but more trivial changes: In addition to cleaning up some flags, I've
killed most cases of SlapReply reuse, which also isolates what damage
one module can do to another through SlapReply.  Even if they are
third-party overlays where the first is broken because it accurately
copied what OpenLDAP's overlays were doing:(

I've also mostly fixed rs.sr_type.  That one was almost as broken,
somewhat in the spirit of this awful and confusing idea in result.c:
	/* FIXME: I think rs->sr_type should be explicitly set to
	 * REP_SEARCH here. That's what it was when we entered this
	 * function. send_ldap_error may have changed it, but we
	 * should set it back so that the cleanup functions know
	 * what they're doing.
	 */
No.  If something set rs.sr_type = REP_INTERMEDIATE, result.c should
not tell its caller that OID rs.sr_rspoid is an entry, rs.sr_entry.


As for user-visible issues the cleanup caught: Half of my December and
January ITSes came from this effort. Some minor, some bigger.  Generally
issues somewhat out of scope of what I was doing in this ITS.

Howard asked if all these changes fixed any crashes.  Thinking it over,
I've had few crashes because of this mess, and I have had a hard time
trying to remember which ones.  I've had many crashes in the truckload
of asserts I added in my private code, which pointed out these ITSes and
how to clean up.  I could not have found the ITSes before the cleanup,
since the asserts then brought down slapd in an instant.

Also I've broken slapd a number of times by attempting "obvious" fixes
before cleaning up, because slapd is more creative than me about reusing
SlapReply strangely.  Like pcache.c which reused a Search SlapReply with
an entry in it for be_modify() and expected the entry to still be there
after the modify:-(  Looks like a good candidate for trying to construct
a pre-cleanup crash, if I ever find time to try: Find a way to feed that
SlapReply to something which reuses it for a Search or an Intermediate
Response, thus overwriting the entry sitting in the Modify SlapReply.

That kind if thing is why I'm much more eager to get the initial cleanup
into RE24 than to get the final entry leak fixes etc into RE24.  After
the cleanup, there should only be an entry in there if sr_type is
REP_SEARCH/REP_SEARCHREF, not also REP_RESULT/REP_GLUE_RESULT as before.

ITSes:
ITS#6760 slapo-rwm broken entry handling
ITS#6752 slapo-dynlist issues       [+ related non-SlapReply issues]
ITS#6746, #6763 entry/control leaks [+ related non-SlapReply issues]
My commits using rs_flush_entry() & co have fixed some other entry leaks
or potential leaks.  Some modules - in particular BDB - catch potential
leaks of their entries.  Others do not, giving us memory leaks.
ITS#6731, #6748 used scrambled sr_text on an exited function's stack.
ITS#6774, ITS#6785, some of my back-config commits:  sr_text/sr_matched
sent with sr_err=LDAP_SUCCESS, or related to another error than sr_err.

Browsing 2.4.24 patches, an earlier cleanup would have caught #6633
(slapd-bdb error propagation) - my asserts with test047 catch sr_text
!= NULL when sr_err == 0 - and likely #6629 (freeing controls).

-- 
Regards,
Hallvard
Prev by Date: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
Next by Date: Re: (ITS#6798) Mutex starvation on two-level referral for SASL connection
Index(es):
- Chronological
- Thread