
Re: UTF8 case insensitive matching

At 07:07 PM 10/25/00 +0200, Stig Venås wrote:
>On Wed, Oct 25, 2000 at 08:32:57AM -0700, Kurt D. Zeilenga wrote:
>> At 04:31 PM 10/25/00 +0200, Stig Venås wrote:
>> >code would have to be changed then. An easy but incorrect way
>> >out could be to simply not change casing for a character if
>> >the size is different. It would still be better than today's
>> >situation.
>> We can certainly cheat in the short term....
>It's very tempting. But some people will need to recreate or at
>least reindex their database each time we change the normalization,
>right? So it shouldn't change too many times. It's a lot of work to
>do it properly though, and I would like to have something people can
>use soon.

We try to avoid releasing patches (sub-minor) that require reindexing,
deferring such changes to minor releases.  If the cheat was such that
only those DN with non-ASCII characters were affected, then we might
push such out as a patch.  However, I want caseIgnore support for
2.1 (a minor release).

>> Long term, we need to use the dnValidate()/dnNormalize()
>> semantics instead of the dn_validate()/dn_normalize() semantics.  

Good.  This means we both agree architecturally.  I'm actually quite
happy with any incremental solution towards this end.  I'm primarily
laying out some options.

>> In the mid term, to avoid the ripple effect of the
>> dn_validate()/dn_normalize() change, I suggest that temporary
>> versions of dn_validate()/dn_normalize() be implemented which
>> use dnValidate()/dnNormalize() to do the work but provide old
>> semantics otherwise.
>I don't get this. dnValidate() and dnNormalize() use dn_validate()/
>dn_normalize() today.

In the mid term, we'd reverse the dependency.  dn_validate would
call dnValidate (to validate) and dnNormalize just to compare
lengths.  If the normalized DN is too big to fit, the DN would
be treated as invalid.

This is a "mid-term" solution.  It hopefully avoids the rippling
of validation/normalization call changes through the code.  However,
this ripple might be unavoidable.
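
A rough sketch of that reversed dependency, in C.  Everything here is
an assumption made for illustration: the signatures, the return
conventions, and especially the bodies of dnValidate()/dnNormalize(),
which are toy stand-ins and not the real slapd routines.  The toy
normalizer upper-cases ASCII and expands U+0149 to U+02BC followed by
'N' (its full upper-case mapping in Unicode), only so that there is one
case where the normalized DN grows.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the new-style validator (hypothetical signature).
 * Returns 0 if the DN looks acceptable, -1 otherwise. */
static int dnValidate(const char *in)
{
    return (in != NULL && strchr(in, '=') != NULL) ? 0 : -1;
}

/* Toy stand-in for the new-style normalizer (hypothetical signature).
 * Allocates the result, so the output may be longer than the input. */
static int dnNormalize(const char *in, char **out)
{
    size_t n = strlen(in), o = 0;
    char *buf = malloc(2 * n + 1);   /* worst case here is 3 bytes per 2 */

    if (buf == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)in[i];
        if (c == 0xC5 && (unsigned char)in[i + 1] == 0x89) {
            /* U+0149 (2 bytes) -> U+02BC 'N' (3 bytes) */
            memcpy(buf + o, "\xCA\xBCN", 3);
            o += 3;
            i++;
        } else {
            buf[o++] = (char)toupper(c);
        }
    }
    buf[o] = '\0';
    *out = buf;
    return 0;
}

/* Old semantics: returns dn if valid, NULL otherwise.  Normalizes only
 * to compare lengths; a DN whose normalized form would not fit in the
 * caller's buffer is treated as invalid. */
char *dn_validate(char *dn)
{
    char *norm = NULL;
    int fits;

    assert(dn != NULL);
    if (dnValidate(dn) != 0 || dnNormalize(dn, &norm) != 0)
        return NULL;
    fits = strlen(norm) <= strlen(dn);
    free(norm);
    return fits ? dn : NULL;
}

/* Old semantics: normalizes in place, or returns NULL if the DN is
 * invalid or its normalized form is too big for the buffer. */
char *dn_normalize(char *dn)
{
    char *norm = NULL;

    assert(dn != NULL);
    if (dnValidate(dn) != 0 || dnNormalize(dn, &norm) != 0)
        return NULL;
    if (strlen(norm) > strlen(dn)) {
        free(norm);
        return NULL;
    }
    strcpy(dn, norm);
    free(norm);
    return dn;
}
```

Existing callers keep passing a writable buffer and checking for NULL,
so none of them change.  The cost is that a DN which is perfectly valid
but grows under normalization gets rejected, which is why this is only
a stopgap.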

>I see two possibilities:
>I cheat and add simplistic UTF8 code to dn_validate()/dn_normalize().

This is what I call the "short term" solution.
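
To make the cheat concrete, here is a minimal sketch in C of "don't
change casing if the size is different".  It assumes upper-casing as
the normalization direction and uses a deliberately tiny mapping
(ASCII, the Latin-1 letters, and dotless i); toy_upper() and
upcase_same_size() are names invented for this example, not existing
slapd functions.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes UTF-8 needs for code point cp (only up to 3 bytes matter here). */
static size_t utf8_len(unsigned long cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : 3;
}

/* Toy upper-case mapping, nowhere near complete. */
static unsigned long toy_upper(unsigned long cp)
{
    if (cp >= 'a' && cp <= 'z')
        return cp - 0x20;
    if (cp >= 0xE0 && cp <= 0xFE && cp != 0xF7)   /* Latin-1 letters */
        return cp - 0x20;
    if (cp == 0x131)                              /* dotless i -> 'I' */
        return 'I';
    return cp;
}

/* Upper-case s in place, but leave a character untouched whenever its
 * mapped form would need a different number of bytes. */
void upcase_same_size(char *s)
{
    unsigned char *p = (unsigned char *)s;

    assert(s != NULL);
    while (*p != '\0') {
        if (*p < 0x80) {
            *p = (unsigned char)toy_upper(*p);
            p++;
        } else if ((*p & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            unsigned long cp = ((unsigned long)(*p & 0x1F) << 6) | (p[1] & 0x3F);
            unsigned long up = toy_upper(cp);
            if (utf8_len(up) == 2) {              /* same size: safe to rewrite */
                p[0] = (unsigned char)(0xC0 | (up >> 6));
                p[1] = (unsigned char)(0x80 | (up & 0x3F));
            }
            p += 2;
        } else {
            p++;   /* longer or malformed sequences pass through unchanged */
        }
    }
}
```

With this, U+00E9 becomes U+00C9 because both take two bytes, while
U+0131 stays as it is because its upper-case form 'I' takes one.  The
string never changes length, so the in-place callers are unaffected,
but two DNs that differ only in such a character will not match.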

>I leave dn_validate()/dn_normalize() as they are and implement new
>versions of dnValidate()/dnNormalize() with more correct UTF8 code,
>allowing for the possibility that the size of the dn can increase.
>Then we must change a lot of surrounding code so that it uses
>dnValidate()/dnNormalize() instead of dn_validate()/dn_normalize().

This is what I call a "long term" solution.

>I have no illusions of implementing 100% perfect normalization code

Understandable!  I'm happy with any forward steps.