Re: UTF8 case insensitive matching

To: Stig Venås <venaas@alfa.itea.ntnu.no>
Subject: Re: UTF8 case insensitive matching
From: Konstantin Chuguev <Konstantin.Chuguev@dante.org.uk>
Date: Wed, 25 Oct 2000 15:59:00 +0100
Cc: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>, openldap-devel@OpenLDAP.org
Organization: Delivery of Advanced Networking Service to Europe Ltd.
References: <20001024112053.A22541@itea.ntnu.no> <5.0.0.25.0.20001024130940.00abf0d0@router.boolean.net> <20001025163154.A11668@itea.ntnu.no>

"Stig Venås" wrote:

> On Tue, Oct 24, 2000 at 01:11:25PM -0700, Kurt D. Zeilenga wrote:
> > The DN normalization and matching?
>
> I'm looking at this. I have some questions.
>
> I'm writing UTF8str2upper and perhaps some other UTF8 functions
> that need liblunicode to work. I think they belong in utf8.c in
> libldap, but I don't think it's good if applications that use
> libldap must also link with liblunicode. Where should I put them?
>

I think that, since case conversion and normalisation are done in the
LDAP server and hidden from the LDAP client, this code should live
outside libldap and be compiled into slapd. Making slapd dependent on
the liblunicode shared library is not a bad idea, IMHO.

>
> I'm not sure, but I think that the width of a character in UTF8
> might change when you change casing. Does anyone know for sure
> if it might? If it can change, dn_normalize will have to malloc
> space for a new string and return a pointer to that. A lot of
> code would have to be changed then. An easy but incorrect way
> out could be to simply not change casing for a character if
> the size is different. It would still be better than today's
> situation.
>

Definitely: case conversion can change the length of a word. An example
is the German "sharp s" (eszett) letter, which becomes "SS" in upper
case. There are many other examples, although mainly in non-European
languages.
I don't know what liblunicode does, but I doubt it can do the things
described in the recent Unicode technical report on Case Mappings
(http://www.unicode.org/unicode/reports/tr21/index.html).
A starting point for a complete implementation of case conversion for
Unicode could be this FAQ:
http://www.unicode.org/unicode/faq/casemap_charprop.html
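
To make the length question concrete, here is a small sketch in Python
(not in C, and not using liblunicode). It relies on Python's built-in
str.upper(), which applies the Unicode full case mappings; the helper
names utf8_len and case_lengths are made up for this illustration. Note
that the sharp s gains a character but keeps its UTF-8 byte count, while
other letters do change their byte count.

```python
# Compare character counts and UTF-8 byte counts before and after
# upper-casing. Illustration only: this shows what Unicode full case
# mapping does, not what liblunicode or slapd actually implement.

def utf8_len(s: str) -> int:
    """Number of bytes s occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def case_lengths(s: str):
    """Return (chars, bytes, upper, upper_chars, upper_bytes) for s."""
    up = s.upper()
    return (len(s), utf8_len(s), up, len(up), utf8_len(up))


# Sharp s, dotless i, n preceded by apostrophe, fi ligature.
SAMPLES = ["\u00df", "\u0131", "\u0149", "\ufb01"]

if __name__ == "__main__":
    for ch in SAMPLES:
        print(f"U+{ord(ch):04X}", case_lengths(ch))
```

So a dn_normalize that upper-cases in place cannot assume the result
fits in exactly the same number of bytes as the input.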

Sadly (for implementers; but this is the real world of human languages
;-), a proper search through Unicode text requires not just case
mapping but the complete Unicode normalisation algorithm
(see http://www.unicode.org/unicode/reports/tr15/index.html).
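
As a rough illustration of why case mapping alone is not enough, the
sketch below uses Python's standard unicodedata module. The names
search_key and caseless_equal are hypothetical, and the recipe
(decompose, case-fold, decompose again) is one common way to do a
canonical caseless comparison, not something OpenLDAP is known to use.

```python
import unicodedata


def search_key(s: str) -> str:
    """Hypothetical matching key: NFD, then case folding, then NFD again."""
    decomposed = unicodedata.normalize("NFD", s)
    return unicodedata.normalize("NFD", decomposed.casefold())


def caseless_equal(a: str, b: str) -> bool:
    """True if a and b match after normalisation and case folding."""
    return search_key(a) == search_key(b)


if __name__ == "__main__":
    precomposed = "\u00c9"   # E with acute as a single code point
    combining = "e\u0301"    # e followed by a combining acute accent
    print(precomposed.lower() == combining)        # plain case mapping
    print(caseless_equal(precomposed, combining))  # with normalisation
```

Without the normalisation step, the precomposed and decomposed
spellings of the same letter would never compare equal, whatever the
case mapping does.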

This is not expected to be implemented soon. Anyway, even the simplest
working algorithm is better than nothing and is really useful for
OpenLDAP :-)
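
For what it's worth, the "easy but incorrect" fallback Stig mentions
(leave a character alone when its upper-case form has a different size)
might look like the sketch below. It is in Python rather than C, the
function name upper_same_width is invented here, and "size" is assumed
to mean the UTF-8 byte count, so that the result could overwrite the
input buffer in place.

```python
def upper_same_width(s: str) -> str:
    """Upper-case each character only if its UTF-8 byte count is unchanged.

    Characters whose upper-case form needs more or fewer bytes are kept
    as they are, so the output always has the same byte length as the
    input. This is knowingly incomplete case conversion.
    """
    out = []
    for ch in s:
        up = ch.upper()
        if len(up.encode("utf-8")) == len(ch.encode("utf-8")):
            out.append(up)
        else:
            out.append(ch)  # size would change: leave the character alone
    return "".join(out)


if __name__ == "__main__":
    for text in ["cn=stra\u00dfe", "cn=\u0131x"]:
        result = upper_same_width(text)
        print(repr(text), "->", repr(result))
```

The dotless i in the second sample is left untouched because its
upper-case form is one byte shorter, which is exactly the kind of
mismatch this shortcut accepts.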

Regards,
    Konstantin.

--
          * *        Konstantin Chuguev - Application Engineer
       *      *              Francis House, 112 Hills Road
     *                       Cambridge CB2 1PQ, United Kingdom
 D  A  N  T  E       WWW:    http://www.dante.net
