
Re: Problems with case folding of UTF-8

To: Pierangelo Masarati <masarati@aero.polimi.it>
Subject: Re: Problems with case folding of UTF-8
From: Stig Venaas <Stig@OpenLDAP.org>
Date: Mon, 10 Dec 2001 19:02:48 +0100
Cc: openldap-devel@OpenLDAP.org
Content-disposition: inline
In-reply-to: <200112100905.fBA95ww15654@server.aero.polimi.it>; from masarati@aero.polimi.it on Mon, Dec 10, 2001 at 10:05:57AM +0100
References: <200112100905.fBA95ww15654@server.aero.polimi.it>
User-agent: Mutt/1.2.5i

On Mon, Dec 10, 2001 at 10:05:57AM +0100, Pierangelo Masarati wrote:
> Hi,
> 
> while dealing with DN normalization I had a serious problem.  I was 
> dealing with Italian accents in Latin-1 ("e acute", "e grave" and so),
> and this is what happened:

I think the behavior is correct, but I need to do some checking before
I can give you a definite answer. Hopefully I'll be able to do that in
8-9 days. If you feel like it, you can look at the Unicode tables
yourself: see UnicodeData.txt in HEAD. The file format is explained on
the Unicode Consortium web site.
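
For anyone who wants to check the tables by hand, here is a minimal
sketch of reading one UnicodeData.txt record. It is not taken from the
OpenLDAP sources: the sample line is the U+00C9 entry as I understand
the published layout (semicolon-separated fields, decomposition in
field 5, simple lowercase mapping in field 13), and parse_record is a
hypothetical helper name.

```python
# Sample record for U+00C9 in the semicolon-separated UnicodeData.txt layout.
# Assumption: this line matches the published file; verify against your copy.
SAMPLE = ("00C9;LATIN CAPITAL LETTER E WITH ACUTE;Lu;0;L;0045 0301;;;;N;"
          "LATIN CAPITAL LETTER E ACUTE;;;00E9;")


def parse_record(line):
    """Split one record and pull out the fields relevant to folding."""
    fields = line.strip().split(";")
    return {
        "code": int(fields[0], 16),
        "name": fields[1],
        # Field 5 holds the decomposition mapping. A leading "<tag>" marks a
        # compatibility mapping; this sketch simply drops the tag.
        "decomposition": [int(cp, 16) for cp in fields[5].split()
                          if not cp.startswith("<")],
        # Field 13 is the simple lowercase mapping, empty if there is none.
        "lowercase": int(fields[13], 16) if fields[13] else None,
    }


record = parse_record(SAMPLE)
print(record)
```

The decomposition field is what makes a normalizer turn "E acute" into
a plain "E" followed by a combining acute accent.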

> b) breaks the current DN normalization workaround in slapd because 
> the resulting normalized DN is longer than the input one (a six-char 
> '\c3\89' is turned into a seven-char 'E\cc\81' when there's an 
> equivalent six-char representation)

Yes, there can be multiple equivalent representations, and the
normalized representation is often not the shortest one, so we
have to allow for the string to grow. That means you need to
allocate a new buffer for the normalized string. UTF8Normalize, or
whatever I called it, does this.
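
The growth is easy to reproduce outside slapd. The sketch below uses
Python's standard unicodedata module rather than the OpenLDAP code, and
it assumes canonical decomposition (NFD) is the form in question, which
is what the '\c3\89' to 'E\cc\81' example suggests. normalize_utf8 is a
hypothetical name for illustration only.

```python
import unicodedata


def normalize_utf8(data, form="NFD"):
    """Return a new, normalized UTF-8 byte string.

    The result is always a fresh object, so it is free to be longer
    than the input; nothing is rewritten in place.
    """
    text = data.decode("utf-8")
    return unicodedata.normalize(form, text).encode("utf-8")


composed = b"\xc3\x89"              # U+00C9, "E acute", 2 bytes
decomposed = normalize_utf8(composed)

print(decomposed)                    # b'E\xcc\x81'
print(len(composed), len(decomposed))  # 2 3
```

Two bytes in, three bytes out, so any caller that normalizes into the
original buffer will overrun it.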

Stig
