Full_Name: Hallvard B Furuseth Version: 2.4.23, HEAD OS: Linux x86_64 URL: Submission from: (NULL) (129.240.6.233) Submitted by: hallvard Slapd and its modules put varous info in SlapReply and send it. They do not always clean it up afterwards, or not correctly. Then they sometimes reuse the SlapReply without reinitializing, even though most of its fields are invalid at that point. (In practice they are usually mostly zeroed by then, so slapd works anyway.) This is a dup of the misnamed ITS#5340, so the CHANGES file can refer to an ITS with a better name for all the upcoming commits. Also I'll go about it a bit backwards than discussed there: First reset some flags and avoid reusing SlapReply, then get more aggressive about obeying the flag settings that remain.
changed notes moved from Incoming to Software Bugs
Mostly done with the easily spotted cases. Hopefully HEAD's code stayed valid through my commits. Need to think some more for the rest. The decisions with greater chance of breaking things remain, like ITS#6763. Other issues: * hyc, please have a look at slapd/bconfig.c 1.419. If I have gotten things wrong, that one summarizes my mistakes nicely:-) * If slap_send_ldap_result() or send_ldap_disconnect() see an entry with REP_ENTRY_MUST<RELEASE/BEFREED> and REP_SEARCH/SEARCHREF, should release/free the entry? That helps code which sends error without freeing sr_entry. It breaks if the entry comes from another Operation, or if the caller reuses a Search SlapReply for a non-Search and trusts rs->sr_un.sru_search to stay untouched. * Is the code getting clean enough that slap_send_search_entry's cleanup code can flush the entry independently of whether op->o_tag == LDAP_REQ_SEARCH? * slapi/slapi_pblock.c 1.81: /* TODO: Should REP_ENTRY_MODIFIABLE be set? */ I've no idea, and I've more pressing things to look at... Notes on the updates: * "Reset some SlapReply flags & data" commits vs "avoid SlapReply reuse" commits: The combination might be redundant/overcautious, but tries to catch problems from remaining unclean code from both ends. * result.c, freeing as well as releasing entries as soon as possible: More readable, and since the entry isn't needed: The sooner we get rid of it, the less chance for an entry leak. * commits "Use rs_*() to manage SlapReply entries." Switching to rs_flush_entry(): - result.c:slap_send_search_entry(), overlays/pcache.c, overlays/translucent.c, slapi/slapi_pblock.c: These did not clear the MODIFIABLE flag along with MUSTBEFREED. Switching to use rs_ensure_entry_modifiable(): - syncprov_operational(), overlay collect: noop change (except operational should not update the entry) - overlay valsort: It needlessly required REP_ENTRY_MUSTBEFREED as well as MODIFIABLE. But the entry can also be MODIFIABLE and e.g. on the stack. - contrib overlays cloak, usn: Had entry leaks. -- Hallvard
h.b.furuseth@usit.uio.no wrote: > Mostly done with the easily spotted cases. Hopefully HEAD's code > stayed valid through my commits. Need to think some more for the > rest. The decisions with greater chance of breaking things > remain, like ITS#6763. > > Other issues: > > * hyc, please have a look at slapd/bconfig.c 1.419. If I have > gotten things wrong, that one summarizes my mistakes nicely:-) I think you're over-paranoid. config_build_entry() is only called by internal plumbing and the SlapReply here never gets sent to send_ldap_result(). It's only passed in because be_add() expects one; the contents are mostly irrelevant. (Aside from rs_err) -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
In addition to the issues listed in my last message, this ITS depends on these: - ITS#6763: slapd does not clear sr_entry/ctrls on sizelimit/busy - ITS#6776: Control handling in PassMod exop + ppolicy - ITS#6760: rwm broken entry handling ==== Howard Chu writes: >h.b.furuseth@usit.uio.no wrote: >> * hyc, please have a look at slapd/bconfig.c 1.419. If I have >> gotten things wrong, that one summarizes my mistakes nicely:-) I meant to say "If I've broken anything, that commit should summarize my worst mistakes." At least that worry is out of the way: > I think you're over-paranoid. config_build_entry() is only called by > internal plumbing and the SlapReply here never gets sent to > send_ldap_result(). It's only passed in because be_add() expects one; > the contents are mostly irrelevant. (Aside from rs_err) Thanks. Not exactly what I hoped to hear, but I did ask... I can revert it, or since this SlapReply is so uninteresting, maybe allow rs = NULL in this case so some callers can omit it. ==== Summary of changes: * Add rs_assert_*(), disabled by default, to check SlapReply state. * Majority of changes: Avoid reusing SlapReply: Reinitialize these variables at need with rs_reinit() or move a declaration inwards (typically into loops). I think I introduced 2 new SlapReply's, in bconfig.c (unnecessarily as Howards says) ad in pcache. * Use the rs_*() functions in result.c to manage SlapReply entries, rather than freeing/releasing by hand and every so often forgetting to handle some of the REP_ENTRY_* flags. This also makes slapd more aggressive about freeing entries. E.g. result.c _released_ entries at need before sending, so entry locks for cached entries would not wait for network traffic. OTOH it did not at that point bother with entries that merely needed _freeing_. * "Reset dangerous REP_ENTRY_* flags." This is so when a module puts e.g. puts Extended Operation data in the SlapReply, another module won't try to entry_free/release() that non-entry because of some leftover REP_ENTRY_* flags. This may have been over-paranoid too: I seem closer to having rs.<sr_type, sr_un> behave like a variant record where rs.sr_type actually can be trusted. Anyway, this paranoia shouldn't hurt. Others are some minor cleanups: * Reset some SlapReply flags & data. Reset data more consistenlty. (E.g. rs_entry and flags go together, so don't reset just entry.) * Clear out or avoid filling in known unwanted info (Don't set sr_err = non-success before entering an operation handler, do not set sr_text with sr_err = LDAP_SUCCESS, etc.) * Add an error check or two (was actually done for consistency sr_err vs sr_text) * slap.h: Cast REP_* #defines to slap_mask_t * No-op changes (remove lint, rearrange/simplify code a bit, etc) -- Hallvard
I wrote: > In addition to the issues listed in my last message, this ITS depends > on these: [ITS#6763,#6776#6760]. Sorry, that was quite wrong. The changes done so far can stand alone, I've tried hard for that. Taking this ITS _further_ depends on those ITSes. But those ITSes are safest to tackle after the changes I've done so far. I've done this ITS backwards after some IRC miscommunication and by sitting too close to the code: First commit a huge patch to clean up SlapReply, then explain why. Sorry about that. I intended to get it done for 2.4.24, but when Howard balked at this approach I expect I instead ended up complicating HEAD->RE24 merges. So to explain better: This is ITS#5340 with a better name. See http://www.openldap.org/its/?findid=5340#followup16 - 18 which actually explains most of the rationale for these changes: "I'm a bit nervous about doing more than minimal fixes to this until we do something major, because it's so broken. (...) [Howard suggested] code which puts something into a SlapReply should ensure it's taken out again. (...) I think that's not the right fix because it's the most fragile solution: Practically every module which depends on it can break if some other module gets it wrong. OTOH, modules can't just reset SlapReply since some of it is input values to op_*() (...)" SlapReply is so broken and confusing that for 3 years I've not dared to do the obvious fixes to ITS#5340. Early on I kept prompting others who understood slapd better than me to take on #5430, but nothing happened. So now I've "done something major": Cleaned up SlapReply enough that it's no longer an act of bravery to take on the original issues. Most of the changes are a preliminary cleanup for going for the original ITS#5340 fixes (obeying the flags). elsewhere: By the original design, it's not a bug for an op_*() function to expect an incoming SlapReply to be clean, nor to output a "dirty" SlapReply, so it' has got to be wrong to reuse a SlapReply without cleaning it up. So I've taken a simpler and more robust approach this time, adding bigger but more trivial changes: In addition to cleaning up some flags, I've killed most cases of SlapReply reuse, which also isolates what damage one module can do to another through SlapReply. Even if they are third-party overlays where the first is broken because it accurately copied what OpenLDAP's overlays were doing:( I've also mostly fixed rs.sr_type. That one was almost as broken, somewhat in the spirit of this awful and confusing idea in result.c: /* FIXME: I think rs->sr_type should be explicitly set to * REP_SEARCH here. That's what it was when we entered this * function. send_ldap_error may have changed it, but we * should set it back so that the cleanup functions know * what they're doing. */ No. If something set rs.sr_type = REP_INTERMEDIATE, result.c should not tell its caller that OID rs.sr_rspoid is an entry, rs.sr_entry. As for user-visible issues the cleanup caught: Half of my December and January ITSes came from this effort. Some minor, some bigger. Generally issues somewhat out of scope of what I was doing in this ITS. Howard asked if all these changes fixed any crashes. Thinking it over, I've had few crashes because of this mess, and I have had a hard time trying to remember which ones. I've had many crashes in the truckload of asserts I added in my private code, which pointed out these ITSes and how to clean up. I could not have found the ITSes before the cleanup, since the asserts then brought down slapd in an instant. Also I've broken slapd a number of times by attempting "obvious" fixes before cleaning up, because slapd is more creative than me about reusing SlapReply strangely. Like pcache.c which reused a Search SlapReply with an entry in it for be_modify() and expected the entry to still be there after the modify:-( Looks like a good candidate for trying to construct a pre-cleanup crash, if I ever find time to try: Find a way to feed that SlapReply to something which reuses it for a Search or an Intermediate Response, thus overwriting the entry sitting in the Modify SlapReply. That kind if thing is why I'm much more eager to get the initial cleanup into RE24 than to get the final entry leak fixes etc into RE24. After the cleanup, there should only be an entry in there if sr_type is REP_SEARCH/REP_SEARCHREF, not also REP_RESULT/REP_GLUE_RESULT as before. ITSes: ITS#6760 slapo-rwm broken entry handling ITS#6752 slapo-dynlist issues [+ related non-SlapReply issues] ITS#6746, #6763 entry/control leaks [+ related non-SlapReply issues] My commits using rs_flush_entry() & co have fixed some other entry leaks or potential leaks. Some modules - in particular BDB - catch potential leaks of their entries. Others do not, giving us memory leaks. ITS#6731, #6748 used scrambled sr_text on an exited function's stack. ITS#6774, ITS#6785, some of my back-config commits: sr_text/sr_matched sent with sr_err=LDAP_SUCCESS, or related to another error than sr_err. Browsing 2.4.24 patches, an earlier cleanup would have caught #6633 (slapd-bdb error propagation) - my asserts with test047 catch sr_text != NULL when sr_err == 0 - and likely #6629 (freeing controls). -- Regards, Hallvard
changed state Open to Test
changed notes
changed notes changed state Test to Release
changed notes changed state Release to Closed
changed notes changed state Closed to Partial
Many fixes in HEAD. The rest depends on ITS#6776 and maybe #6785. Changes merged into RE24. Merge notes: As of Jan 21, the slapd/sasl.c and contrib/slapd-modules/usn/ changes are HEAD only.