
Re: (ITS#5342) DN and naming attribute mismatch

Aaron Richton wrote:
> OK. I can reproduce in test, but I'm starting to not care due to the
> circumstances.
> My January 29 production database produces err=80s when hit. The important
> open question is what caused the *initial* corruption: the hardware (which
> I note has been reliable since downgrading to 2.3.39), 2.3.37 (or possibly
> even some earlier version), or 2.3.40. I can think of no way to answer
> this question at this time, short of additional experimentation/data
> collection, and it may be near-impossible to find out. (Unfortunate,
> because it is the important question.)
> A perverse side effect is that the same corrupted production database,
> loaded against 2.3.39, happily accepts all changes. Now, .39 and .40 would
> be expected to produce identical results. Which of the two is wrong in
> this case? I find myself saying, again, that expectations during
> "impossible situations" are a difficult subject.

That makes some degree of sense. 2.3.40 introduced a dn2id_lock that uses the
entryDN as the lock object. Since some of your entries have corrupted DNs,
that will make the locking protection ineffective.
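
To illustrate the general principle only (this is not slapd code, and the
thread does not show how dn2id_lock is actually implemented): a lock keyed by
a name serializes only those callers that present the same name. In the
minimal Python sketch below, DnLockTable, its crude normalization, and the
example DNs are all hypothetical.

```python
import threading


class DnLockTable:
    """Hypothetical table that hands out one lock per DN string."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the dictionary itself
        self._locks = {}                 # normalized DN -> lock

    def lock_for(self, dn):
        # Assumption: a crude stand-in for real DN normalization.
        key = dn.strip().lower()
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())


table = DnLockTable()
good = "uid=alice,ou=people,dc=example,dc=com"
# Same entry, but with a damaged byte in the stored DN (made-up example).
corrupt = "uid=al\x00ce,ou=people,dc=example,dc=com"

# Two callers using the intact DN share one lock, so they are serialized.
same_lock = table.lock_for(good) is table.lock_for(good.upper())

# A caller holding the corrupted DN gets a different lock, so nothing
# stops it from running concurrently with the others.
split_lock = table.lock_for(good) is table.lock_for(corrupt)
```

Here same_lock is True and split_lock is False: the entry with the damaged DN
falls outside the protection the lock was meant to provide.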

Since there's no problem when you start with a clean database, I'd say the 
culprit was something older.

> Now, why don't I care about figuring out which one of them is right?
> Because I can only instigate failure when something is in an "impossible
> situation" at t=0. In the test environment, I gained the luxury of a
> debugging procedure I couldn't afford in production: a full rm/slapadd of
> the entire database. If I rm/slapadd using entirely 2.3.39, things work.
> And if I rm/slapadd using entirely 2.3.40, things work: so far I cannot
> make an initial corruption with 2.3.40, although corruption at t=0 can be
> worsened by it.
> Given the fact that I observed database issues like #5262 in my own test
> environment, I find it plausible that 2.3.37 corrupted my database on the
> way out the door. I've also observed, in test, that I can work around this
> with slapcat/rm/slapadd.
> While all this has been going on, I've been stressing 2.3.40 with test008,
> and it seems to be fine. I will likely try 2.3.40 again next week, but
> plan on a slightly abnormal upgrade procedure that includes a
> slapcat/rm/slapadd. This way, at least the sins of the past will be fully
> purged prior to the first 2.3.40 start, so if any future issues arise
> we'll have end-to-end 2.3.40 accountability.
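
The slapcat/rm/slapadd procedure described above could be scripted roughly as
follows. This is a sketch under stated assumptions, not a tested upgrade
procedure: the file names, the decision to keep DB_CONFIG, and the helper
names dump_cmd, load_cmd, and clear_db_dir are all hypothetical. The sketch
only builds the command lines and demonstrates the "rm" step on a throwaway
directory; it assumes slapd is stopped before any of this is run for real.

```python
import os
import tempfile


def dump_cmd(conf, ldif):
    # slapcat: export the database to LDIF (-f config file, -l output file).
    return ["slapcat", "-f", conf, "-l", ldif]


def load_cmd(conf, ldif):
    # slapadd: rebuild the database from LDIF (-f config file, -l input file).
    return ["slapadd", "-f", conf, "-l", ldif]


def clear_db_dir(db_dir, keep=("DB_CONFIG",)):
    """Remove regular files from db_dir, keeping names listed in `keep`.

    Assumption: keeping DB_CONFIG is desirable; adjust for your setup.
    """
    removed = []
    for name in sorted(os.listdir(db_dir)):
        path = os.path.join(db_dir, name)
        if name in keep or not os.path.isfile(path):
            continue
        os.remove(path)
        removed.append(name)
    return removed


# Demonstrate the removal step on a temporary directory with made-up files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("DB_CONFIG", "id2entry.bdb", "dn2id.bdb"):
        open(os.path.join(demo_dir, name), "w").close()
    removed = clear_db_dir(demo_dir)
    remaining = os.listdir(demo_dir)
```

In real use the order would be: run dump_cmd(...) with the old binaries, clear
the database directory, then run load_cmd(...) with the new binaries, for
example via subprocess.run(cmd, check=True).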

   -- Howard Chu
   Chief Architect, Symas Corp.  http://www.symas.com
   Director, Highland Sun        http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP     http://www.openldap.org/project/