
Re: Problems with case folding of UTF-8



On Mon, Dec 10, 2001 at 07:10:35PM +0100, Pierangelo Masarati wrote:
> Seriously, the "multiple equivalent representations" again scare me a bit,
> because unless our normalization routines are pretty robust, uniquely 
> choosing the same representation regardless of the input, we won't end up
> with a unique string (not even structural) representation of the DNs.

The Unicode Consortium has defined how the representation should be
chosen uniquely, and that is what UTF8normalize and the related routines
do. The Unicode characters are partitioned into equivalence classes, and
for each class there is one specific character (or combination of
characters) that is the normalized form. We want the normalized DN to be
independent of which representation was used as input. If two DNs differ
only in that some characters have different equivalent representations,
they should be the same after normalization.
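
As a rough illustration of the equivalence-class idea (not the actual
UTF8normalize code), here is a minimal sketch using Python's standard
unicodedata module. The helper name normalize_dn_value and the choice of
the NFC form are assumptions made for the example only; which form the
server routines use is not stated here.

```python
import unicodedata


def normalize_dn_value(value: str) -> str:
    """Map a string to one fixed representative of its equivalence class.

    Hypothetical helper for illustration. NFC is an assumed choice; any
    single normalization form applied consistently gives the same effect.
    """
    return unicodedata.normalize("NFC", value)


# Two canonically equivalent spellings of the same attribute value.
precomposed = "cn=Andr\u00e9"   # e-acute as the single code point U+00E9
decomposed = "cn=Andre\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# The raw strings differ code point by code point ...
assert precomposed != decomposed

# ... but both normalize to the same string, whichever was the input.
assert normalize_dn_value(precomposed) == normalize_dn_value(decomposed)
```

The point is only that the output does not depend on which of the
equivalent inputs was supplied.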

The current matching code tries to do this. If two assertion values are
the same up to equivalent character representations, they will (or at
least should) produce the same matches.
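
To make the matching property concrete, here is a small sketch of
case-ignoring matching on normalized strings, again in Python rather than
the server's own code. The names match_key and matching_entries are
hypothetical, and the recipe (decompose, case fold, decompose again) is
the canonical caseless matching described in the Unicode Standard, used
here as an assumption about what "same matches" should mean.

```python
import unicodedata


def match_key(value: str) -> str:
    """Build a comparison key: NFD, then case fold, then NFD again.

    Hypothetical helper; the second NFD pass is there because case
    folding can produce text that is no longer in normalized form.
    """
    decomposed = unicodedata.normalize("NFD", value)
    return unicodedata.normalize("NFD", decomposed.casefold())


def matching_entries(assertion: str, values: list[str]) -> list[str]:
    """Return the stored values whose key equals the assertion's key."""
    key = match_key(assertion)
    return [v for v in values if match_key(v) == key]


stored = ["Ren\u00e9e", "RENE\u0301E", "Renee"]

# The same assertion value, written with two equivalent representations.
by_precomposed = matching_entries("ren\u00e9e", stored)
by_decomposed = matching_entries("rene\u0301e", stored)

# Both assertions select the same stored values; unaccented "Renee" is
# a different string and is not matched by either.
assert by_precomposed == by_decomposed == ["Ren\u00e9e", "RENE\u0301E"]
```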

Stig