
Re: UTF8 case insensitive matching

To: Stig Venås <venaas@alfa.itea.ntnu.no>
Subject: Re: UTF8 case insensitive matching
From: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
Date: Wed, 25 Oct 2000 10:41:57 -0700
Cc: openldap-devel@OpenLDAP.org
In-reply-to: <20001025190736.A1932@itea.ntnu.no>
References: <5.0.0.25.0.20001025080809.00b012c0@router.boolean.net> <5.0.0.25.0.20001024130940.00abf0d0@router.boolean.net> <20001024112053.A22541@itea.ntnu.no> <5.0.0.25.0.20001024130940.00abf0d0@router.boolean.net> <20001025163154.A11668@itea.ntnu.no> <5.0.0.25.0.20001025080809.00b012c0@router.boolean.net>

At 07:07 PM 10/25/00 +0200, Stig Venås wrote:
>On Wed, Oct 25, 2000 at 08:32:57AM -0700, Kurt D. Zeilenga wrote:
>> At 04:31 PM 10/25/00 +0200, Stig Venås wrote:
>> >code would have to be changed then. An easy but incorrect way
>> >out could be to simply not change casing for a character if
>> >the size is different. It would still be better than today's
>> >situation.
>> 
>> We can certainly cheat in the short term....
>
>It's very tempting. But some people will need to recreate or at
>least reindex their database each time we change the normalization,
>right? So it shouldn't change too many times. It's a lot of work to
>do it properly though, and I would like to have something people can
>use soon.

We try to avoid releasing patches (sub-minor) that require reindexing,
deferring such changes to minor releases.  If the cheat was such that
only those DN with non-ASCII characters were affected, then we might
push such out as a patch.  However, I want caseIgnore support for
2.1 (a minor release).

>> Long term, we need to use the dnValidate()/dnNormalize()
>> semantics instead of the dn_validate()/dn_normalize() semantics.  
>
>Right.

Good.  This means we both agree architecturally.  I'm actually quite
happy with any incremental solution towards this end.  I'm primarily
laying out some options.

>> In the mid term, to avoid the ripple effect of the
>> dn_validate()/dn_normalize() change, I suggest that temporary
>> versions of dn_validate()/dn_normalize() be implemented which
>> use dnValidate()/dnNormalize() to do the work but provide old
>> semantics otherwise.
>
>I don't get this. dnValidate() and dnNormalize() use dn_validate()/
>dn_normalize() today.

In the mid term, we'd reverse the dependency.  dn_validate would
call dnValidate (to validate) and dnNormalize just to compare
lengths.  If the length of the normalized DN is too big, the DN
would be treated as invalid.
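
A minimal sketch of that reversed dependency, under stated assumptions:
the old-style calls are taken to work in place on the caller's buffer
and return the DN or NULL, and the new-style calls are taken to validate
read-only and return a freshly allocated normalized copy.  The names
sketch_dnValidate, sketch_dnNormalize, compat_dn_validate and
compat_dn_normalize are hypothetical stand-ins, not the actual OpenLDAP
signatures, and the stand-in normalizer handles only ASCII plus one
mapping (U+023A to U+2C65) chosen because it grows from 2 to 3 bytes.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for dnValidate(): returns 0 if the DN looks valid.
 * A real check is far stricter; this one only requires an '='. */
static int sketch_dnValidate(const char *dn)
{
    return strchr(dn, '=') != NULL ? 0 : -1;
}

/* Hypothetical stand-in for dnNormalize(): returns a newly allocated
 * normalized copy (caller frees), or NULL on allocation failure.
 * Lowercases ASCII and maps U+023A (C8 BA) to U+2C65 (E2 B1 A5) to
 * model a case mapping that makes the string longer. */
static char *sketch_dnNormalize(const char *dn)
{
    size_t n = strlen(dn), i = 0, o = 0;
    char *out = malloc(2 * n + 1); /* ample: growth here is at most 1.5x */

    if (out == NULL) return NULL;
    while (i < n) {
        unsigned char c = (unsigned char)dn[i];
        if (c == 0xC8 && (unsigned char)dn[i + 1] == 0xBA) {
            out[o++] = (char)0xE2;
            out[o++] = (char)0xB1;
            out[o++] = (char)0xA5;
            i += 2;
        } else {
            out[o++] = (c < 0x80) ? (char)tolower(c) : (char)c;
            i++;
        }
    }
    out[o] = '\0';
    return out;
}

/* Old-style semantics on top of the new-style calls: the DN is valid
 * only if it validates AND its normalized form fits in the same space. */
static char *compat_dn_validate(char *dn)
{
    char *norm;
    int fits;

    assert(dn != NULL);
    if (sketch_dnValidate(dn) != 0) return NULL;
    norm = sketch_dnNormalize(dn);
    if (norm == NULL) return NULL;
    fits = strlen(norm) <= strlen(dn);
    free(norm);
    return fits ? dn : NULL;
}

/* Old-style in-place normalize: rewrites dn, or returns NULL untouched. */
static char *compat_dn_normalize(char *dn)
{
    char *norm;

    if (compat_dn_validate(dn) == NULL) return NULL;
    norm = sketch_dnNormalize(dn);
    if (norm == NULL) return NULL;
    strcpy(dn, norm); /* safe: compat_dn_validate confirmed it fits */
    free(norm);
    return dn;
}
```

With this shape, existing callers keep passing a writable buffer and
checking for NULL; the cost is that a DN whose normalized form grows is
rejected rather than handled.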

This is a "mid-term" solution.  It hopefully avoids the rippling
of validation/normalization call changes through the code.  However,
this ripple might be unavoidable.

>I see two possibilities:
>
>I cheat and add simplistic UTF8 code to dn_validate()/dn_normalize().

This is what I call the "short term" solution.
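
For illustration, the cheat discussed earlier (leave a character's case
alone whenever the mapped character would have a different encoded
size) might look roughly like the sketch below.  Everything in it is an
assumption made for the example: toy_tolower is a hypothetical
three-rule mapping (ASCII, Latin-1 capitals, and U+023A to U+2C65), not
a real Unicode table, and the decoder only handles sequences of up to
three bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy case mapping; a real one would come from Unicode data. */
static unsigned long toy_tolower(unsigned long cp)
{
    if (cp >= 'A' && cp <= 'Z') return cp + 0x20;
    if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7) return cp + 0x20;
    if (cp == 0x23A) return 0x2C65; /* 2 bytes -> 3 bytes in UTF-8 */
    return cp;
}

/* Number of UTF-8 bytes needed to encode cp. */
static size_t utf8_len(unsigned long cp)
{
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Decode one code point at s; returns bytes consumed, or 0 if the
 * sequence is malformed or longer than this sketch supports. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
        && (s[2] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x0F) << 12)
            | ((unsigned long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    return 0;
}

/* Overwrite len bytes at s with the encoding of cp (len is 1, 2 or 3). */
static void utf8_encode(unsigned char *s, unsigned long cp, size_t len)
{
    if (len == 1) {
        s[0] = (unsigned char)cp;
    } else if (len == 2) {
        s[0] = (unsigned char)(0xC0 | (cp >> 6));
        s[1] = (unsigned char)(0x80 | (cp & 0x3F));
    } else {
        s[0] = (unsigned char)(0xE0 | (cp >> 12));
        s[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        s[2] = (unsigned char)(0x80 | (cp & 0x3F));
    }
}

/* Lowercase str in place, skipping any character whose lowercase form
 * would need a different number of bytes.  Returns 0 on success, -1 on
 * malformed or unsupported input. */
static int cheat_lowercase_inplace(char *str)
{
    unsigned char *p = (unsigned char *)str;

    assert(str != NULL);
    while (*p != '\0') {
        unsigned long cp, lower;
        size_t len = utf8_decode(p, &cp);

        if (len == 0) return -1;
        lower = toy_tolower(cp);
        if (utf8_len(lower) == len) utf8_encode(p, lower, len);
        p += len;
    }
    return 0;
}
```

The string length never changes, so in-place callers are unaffected,
but two DNs that differ only in the case of a skipped character will
still fail to match.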

>I leave dn_validate()/dn_normalize() as they are and implement new
>versions of dnValidate()/dnNormalize() with more correct UTF8 code,
>allowing for the possibility that the size of the dn can increase.
>Then we must change a lot of surrounding code so that it uses
>dnValidate()/dnNormalize() instead of dn_validate()/dn_normalize().

This is what I call a "long term" solution.

>I have no illusions of implementing 100% perfect normalization code
>though.

Understandable!  I'm happy with any forward steps.
