Re: UTF8 case insensitive matching

To: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
Subject: Re: UTF8 case insensitive matching
From: Stig Venås <venaas@alfa.itea.ntnu.no>
Date: Thu, 26 Oct 2000 15:54:14 +0200
Cc: openldap-devel@OpenLDAP.org
In-reply-to: <5.0.0.25.0.20001025101238.00af7670@router.boolean.net>; from Kurt@OpenLDAP.org on Wed, Oct 25, 2000 at 10:41:57AM -0700
References: <5.0.0.25.0.20001025080809.00b012c0@router.boolean.net> <5.0.0.25.0.20001024130940.00abf0d0@router.boolean.net> <20001024112053.A22541@itea.ntnu.no> <5.0.0.25.0.20001024130940.00abf0d0@router.boolean.net> <20001025163154.A11668@itea.ntnu.no> <5.0.0.25.0.20001025080809.00b012c0@router.boolean.net> <20001025190736.A1932@itea.ntnu.no> <5.0.0.25.0.20001025101238.00af7670@router.boolean.net>

On Wed, Oct 25, 2000 at 10:41:57AM -0700, Kurt D. Zeilenga wrote:
> We try to avoid releasing patches (sub-minor) that require reindexing,
> deferring such changes to minor releases.  If the cheat was such that
> only those DN with non-ASCII characters were affected, then we might
> push such out as a patch.  However, I was caseIgnore support for
> 2.1 (a minor release).

Okay, I decided to cheat for now. I've written new dn_normalize()
code that only works when the upper-case UTF-8 version of a character
has the same length as the lower-case one, see ITS#859. Those of us who
have non-ASCII characters might need to rebuild the database, but we
also want the search to be case-insensitive (well, I do). I decided to
put all the code in dn.c. The UTF-8 toupper function is a cheat, so I
don't want to put it in a library unless we need to use it in places
other than dn.c.
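
To illustrate the same-length restriction, here is a minimal sketch in C.
It is not the code from ITS#859; the function name utf8_upper_same_len()
is hypothetical, and it assumes well-formed, NUL-terminated UTF-8 input.
It only converts ASCII letters and a subset of the two-byte Latin-1
letters, where the upper-case form occupies the same number of bytes, so
the string can be rewritten in place.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not the ITS#859 code): uppercase a
 * NUL-terminated UTF-8 string in place.
 *
 * Only characters whose upper-case form has the same encoded length
 * are touched:
 *   - ASCII a..z
 *   - U+00E0..U+00FE except U+00F7 (division sign), which map to
 *     U+00C0..U+00DE; both ranges are two bytes with lead byte 0xC3.
 * Every other sequence is left unchanged.
 */
void utf8_upper_same_len(char *s)
{
    unsigned char *p = (unsigned char *)s;

    while (*p != '\0') {
        if (*p < 0x80) {
            /* ASCII: one byte in, one byte out. */
            if (*p >= 'a' && *p <= 'z')
                *p -= 'a' - 'A';
            p++;
        } else if (*p == 0xC3 && p[1] >= 0xA0 && p[1] <= 0xBE
                   && p[1] != 0xB7) {
            /* Latin-1 lower-case letter: only the trail byte changes. */
            p[1] -= 0x20;
            p += 2;
        } else {
            /* Anything else: skip the lead byte and its continuation bytes. */
            p++;
            while ((*p & 0xC0) == 0x80)
                p++;
        }
    }
}
```

Because the length never changes, a normalizer built this way needs no
reallocation, but characters outside the handled ranges still compare
case-sensitively.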

In the long term we need to change normalization as discussed, but I
also think the matching I did a few days ago will need to be improved
then. I think dn_validate() is okay for now. Is there anything else we
need to fix in the short term? All I want in the short term is
case-insensitive matching and DNs, I think.

In the long term:

I need to study Unicode in detail, so that I know what I'm talking
about. I think we need to enhance the Unicode library. There exist
some free general-purpose Unicode libraries that we should perhaps
consider.

Stig

Follow-Ups:
- Re: UTF8 case insensitive matching
  - From: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
- Re: UTF8 case insensitive matching
  - From: Konstantin Chuguev <Konstantin.Chuguev@dante.org.uk>

References:
- Re: UTF8 case insensitive matching
  - From: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
- Re: UTF8 case insensitive matching
  - From: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
- UTF8 case insensitive matching
  - From: Stig Venås <venaas@alfa.itea.ntnu.no>
- Re: UTF8 case insensitive matching
  - From: Stig Venås <venaas@alfa.itea.ntnu.no>
- Re: UTF8 case insensitive matching
  - From: Stig Venås <venaas@alfa.itea.ntnu.no>
- Re: UTF8 case insensitive matching
  - From: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
