
Re: UTF8 case insensitive matching



On Wed, Oct 25, 2000 at 10:41:57AM -0700, Kurt D. Zeilenga wrote:
> We try to avoid releasing patches (sub-minor) that require reindexing,
> deferring such changes to minor releases.  If the cheat was such that
> only those DN with non-ASCII characters were affected, then we might
> push such out as a patch.  However, I want caseIgnore support for
> 2.1 (a minor release).

Okay, I decided to cheat for now. I've written new dn_normalize()
code that only works when the upper case UTF8 version of a character
has the same length as the lower case one, see ITS#859. Those of us
who have non-ASCII characters in our DNs might need to rebuild the
database, but we also want the search to be case insensitive (well,
I do). I decided to put all the code in dn.c. The UTF8 toupper
function is a cheat, so I don't want to put it in a library unless we
need to use it in places other than dn.c.
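
To illustrate the kind of cheat I mean, here is a minimal sketch. It is
not the code from ITS#859: the names utf8_seq_len() and
utf8_upper_inplace() are made up for this example, and it only knows
about ASCII and the Latin-1 Supplement letters (U+00E0..U+00FE except
U+00F7), where the upper case form always has the same UTF8 length as
the lower case form. Everything else is copied through untouched, so
the string can be converted in place.

```c
#include <stddef.h>

/* Hypothetical helper: length in bytes of the UTF-8 sequence that
 * starts with lead byte c, or 0 if c cannot start a sequence. */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Hypothetical helper: upper-case a NUL-terminated UTF-8 string in
 * place. Only characters whose upper case form has the same byte
 * length are touched: ASCII a-z, and U+00E0..U+00FE except U+00F7
 * (the division sign). All other sequences are left as they are. */
static void utf8_upper_inplace(char *s)
{
    unsigned char *p = (unsigned char *)s;

    while (*p) {
        size_t n = utf8_seq_len(*p);

        if (n == 1) {
            if (*p >= 'a' && *p <= 'z')
                *p -= 0x20;
            p++;
        } else if (n == 2 && p[0] == 0xC3 &&
                   p[1] >= 0xA0 && p[1] <= 0xBE && p[1] != 0xB7) {
            /* U+00E0..U+00FE -> U+00C0..U+00DE: same lead byte,
             * continuation byte shifted down by 0x20. */
            p[1] -= 0x20;
            p += 2;
        } else {
            /* Unknown or invalid sequence: skip it without running
             * past the terminating NUL. */
            size_t i;
            if (n == 0)
                n = 1;
            for (i = 1; i < n && p[i]; i++)
                ;
            p += i;
        }
    }
}
```

Calling utf8_upper_inplace() on a buffer holding "cn=Jørgen" (ø encoded
as C3 B8) leaves the byte length unchanged and turns ø into Ø (C3 98).
A real version would need a proper case mapping table rather than this
hard-coded range.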

In the long term we need to change normalization as discussed, but I
also think the matching I did a few days ago needs to be improved then.
I think dn_validate() is okay for now. Is there anything else we need
to fix in the short term? All I want in the short term is case
insensitive matching and dn, I think.


In the long term:

I need to study Unicode in detail, so that I know what I'm talking
about. I think we need to enhance the Unicode library. There exist
some free general purpose Unicode libraries that we perhaps should
consider.

Stig