
Re: UTF8 case insensitive matching



On Wed, Oct 25, 2000 at 08:32:57AM -0700, Kurt D. Zeilenga wrote:
> At 04:31 PM 10/25/00 +0200, Stig Venås wrote:
> >code would have to be changed then. An easy but incorrect way
> >out could be to simply not change casing for a character if
> >the size is different. It would still be better than today's
> >situation.
> 
> We can certainly cheat in the short term....

It's very tempting. But some people will need to recreate or at
least reindex their database each time we change the normalization,
right? So it shouldn't change too many times. It's a lot of work to
do it properly though, and I would like to have something people can
use soon.

> Long term, we need to use the dnValidate()/dnNormalize()
> semantics instead of the dn_validate()/dn_normalize() semantics.

Right.

> In the mid term, to avoid the ripple effect of the
> dn_validate()/dn_normalize() change, I suggest that temporary
> versions of dn_validate()/dn_normalize() be implemented which
> use dnValidate()/dnNormalize() to do the work but provide old
> semantics otherwise.

I don't get this. dnValidate() and dnNormalize() use dn_validate()/
dn_normalize() today. If dnNormalize() alters the length when
normalizing, it cannot be used by dn_normalize() to do the work, not
without changing the semantics. Or am I missing something?
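
To make the concern concrete, here is a minimal sketch in C. The names
toy_normalize_alloc() and toy_normalize_inplace() are hypothetical
stand-ins, not the real dnNormalize()/dn_normalize(), and the only
non-ASCII mapping handled is U+023A -> U+2C65 (a lowercase mapping in
current Unicode that grows from 2 to 3 bytes in UTF-8). It assumes the
old semantics are "rewrite the caller's buffer in place".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical new-style normalizer: returns a malloc'd lowercased copy.
 * Handles ASCII plus one growing mapping:
 * U+023A (bytes C8 BA) -> U+2C65 (bytes E2 B1 A5). */
char *toy_normalize_alloc(const char *dn)
{
    size_t n = strlen(dn);
    size_t i = 0, o = 0;
    /* Two input bytes become at most three, so 2*n + 1 is always enough. */
    char *out = malloc(2 * n + 1);

    if (out == NULL)
        return NULL;
    while (i < n) {
        unsigned char c = (unsigned char)dn[i];
        if (c == 0xC8 && (unsigned char)dn[i + 1] == 0xBA) {
            memcpy(out + o, "\xE2\xB1\xA5", 3);
            o += 3;
            i += 2;
        } else {
            if (c >= 'A' && c <= 'Z')
                c = (unsigned char)(c + ('a' - 'A'));
            out[o++] = (char)c;
            i++;
        }
    }
    out[o] = '\0';
    return out;
}

/* Hypothetical old-style wrapper: normalizes dn in place and returns dn.
 * It has to give up (return NULL, dn untouched) when the normalized
 * form is longer than the buffer the caller handed in. */
char *toy_normalize_inplace(char *dn)
{
    char *tmp = toy_normalize_alloc(dn);

    if (tmp == NULL)
        return NULL;
    if (strlen(tmp) > strlen(dn)) {
        free(tmp);
        return NULL;
    }
    strcpy(dn, tmp);
    free(tmp);
    return dn;
}
```

The wrapper works for plain ASCII input, but for a DN containing
U+023A it can only fail or truncate, which is exactly the change in
semantics in question.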

I see two possibilities:

I cheat and add simplistic UTF8 code to dn_validate()/dn_normalize().

or

I leave dn_validate()/dn_normalize() as they are and implement new
versions of dnValidate()/dnNormalize() with more correct UTF8 code,
allowing for the possibility that the size of the dn can increase.
Then we must change a lot of surrounding code so that it uses
dnValidate()/dnNormalize() instead of dn_validate()/dn_normalize().
I have no illusions of implementing 100% perfect normalization code
though.
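
As a rough illustration of the first possibility (the "cheat"), the
sketch below changes case in place and simply leaves a character alone
when its case-mapped form would need a different number of bytes. The
function names and the tiny mapping table in toy_tolower() are
assumptions made up for this example (ASCII, Latin-1 capitals, and two
size-changing mappings); a real version would need a full case table
and would have to validate the UTF-8 first.

```c
#include <stddef.h>

/* Number of bytes UTF-8 needs for code point cp (assumes cp <= 0x10FFFF). */
static size_t utf8_size(unsigned long cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

/* Decode one character at s into *cp and return its byte length.
 * Assumes well-formed UTF-8; no validation is done here. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if (s[0] < 0xE0) {
        *cp = ((s[0] & 0x1FUL) << 6) | (s[1] & 0x3FUL);
        return 2;
    }
    if (s[0] < 0xF0) {
        *cp = ((s[0] & 0x0FUL) << 12) | ((s[1] & 0x3FUL) << 6)
            | (s[2] & 0x3FUL);
        return 3;
    }
    *cp = ((s[0] & 0x07UL) << 18) | ((s[1] & 0x3FUL) << 12)
        | ((s[2] & 0x3FUL) << 6) | (s[3] & 0x3FUL);
    return 4;
}

/* Encode cp at s; the caller guarantees utf8_size(cp) bytes of room. */
static void utf8_encode(unsigned long cp, unsigned char *s)
{
    switch (utf8_size(cp)) {
    case 1:
        s[0] = (unsigned char)cp;
        break;
    case 2:
        s[0] = (unsigned char)(0xC0 | (cp >> 6));
        s[1] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    case 3:
        s[0] = (unsigned char)(0xE0 | (cp >> 12));
        s[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        s[2] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    default:
        s[0] = (unsigned char)(0xF0 | (cp >> 18));
        s[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        s[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        s[3] = (unsigned char)(0x80 | (cp & 0x3F));
        break;
    }
}

/* Hypothetical, deliberately tiny lowercase table. */
static unsigned long toy_tolower(unsigned long cp)
{
    if (cp >= 'A' && cp <= 'Z') return cp + 0x20;
    if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7) return cp + 0x20;
    if (cp == 0x212A) return 'k';     /* KELVIN SIGN: 3 bytes -> 1 byte */
    if (cp == 0x023A) return 0x2C65;  /* 2 bytes -> 3 bytes */
    return cp;
}

/* The cheat: lowercase in place, skipping any character whose
 * lowercase form has a different encoded size. */
void toy_casefold_inplace(char *str)
{
    unsigned char *p = (unsigned char *)str;

    while (*p != '\0') {
        unsigned long cp;
        size_t len = utf8_decode(p, &cp);
        unsigned long lower = toy_tolower(cp);

        if (utf8_size(lower) == len)
            utf8_encode(lower, p);
        p += len;
    }
}
```

The string length never changes, so callers of an in-place routine are
unaffected, at the cost of wrong matches for the skipped characters.
The second possibility would instead return a newly allocated string
and accept the size change.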

Stig