Re: Problems with case folding of UTF-8

On Mon, Dec 10, 2001 at 01:45:54PM -0800, Howard Chu wrote:
> This makes sense to me. I wonder why we should be forced to choose a longer
> representation; as long as our conversion is self-consistent (always chooses
> the same representation) we should be free to choose the form we want.

Unicode normalization is not exactly straightforward. There are
complications, and I'm not quite sure how to do this consistently for all
characters in all the different scripts based on the Unicode tables.
Please read up on how normalization works. I don't think it would be
worth the effort. The only thing you would be solving is the need to
allocate new memory when the normalized string is longer. The only
problem I see is performance. We might need better memory handling.
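
To illustrate the length issue, here is a minimal sketch in Python. It
is not the server code, only an assumption-laden example using the
standard unicodedata module; the helper name utf8_len is made up for
the illustration. It shows one character whose UTF-8 encoding grows
when decomposed (NFD) and shrinks back when composed (NFC), so the
choice of form decides whether the output buffer can be longer than
the input.

```python
import unicodedata


def utf8_len(s: str) -> int:
    """Return the number of bytes s occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


# U+00E9 LATIN SMALL LETTER E WITH ACUTE, the precomposed form.
composed = "\u00e9"

# NFD splits it into 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", composed)

# NFC recombines the pair into the single precomposed character.
recomposed = unicodedata.normalize("NFC", decomposed)

# The two forms are canonically equivalent but differ byte for byte,
# so a comparison only works if both sides use the same form.
same_bytes = composed.encode("utf-8") == decomposed.encode("utf-8")

print(utf8_len(composed), utf8_len(decomposed), same_bytes)
```

Whichever form is picked, both sides of a comparison have to be
normalized to it before matching.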