
RE: Problems with case folding of UTF-8



> -----Original Message-----
> From: Stig Venaas [mailto:Stig@OpenLDAP.org]
> Sent: Monday, December 10, 2001 2:16 PM

> On Mon, Dec 10, 2001 at 01:45:54PM -0800, Howard Chu wrote:
> > This makes sense to me. I wonder why we should be forced to choose a
> > longer representation; as long as our conversion is self-consistent
> > (always chooses the same representation) we should be free to choose
> > the form we want.
>
> Unicode normalization is not exactly straightforward. There are
> complications, and I'm not quite sure how to do this consistently for
> all characters in all the different scripts based on the Unicode tables.
> Please read up on how normalization works. I don't think it would be
> worth the effort. The only thing you are solving is the need to
> allocate new memory when the normalized string is longer. The only
> problem I see is performance. We might need better memory handling.

I have started reading through the normalization documents. I recognize that
the process is not straightforward.
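
To make the length issue concrete, here is a minimal sketch in Python, using
only the standard unicodedata module. It is an illustration, not the OpenLDAP
code: the helper utf8_len and the choice of U+00E9 as the sample character are
assumptions made for this example.

```python
import unicodedata


def utf8_len(text: str) -> int:
    """Number of bytes needed to store text as UTF-8."""
    return len(text.encode("utf-8"))


# U+00E9 LATIN SMALL LETTER E WITH ACUTE, a single precomposed code point.
composed = "\u00e9"

# NFD splits it into "e" (U+0065) followed by COMBINING ACUTE ACCENT (U+0301).
decomposed = unicodedata.normalize("NFD", composed)

# NFC recombines the pair into the single precomposed code point.
recomposed = unicodedata.normalize("NFC", decomposed)

print(utf8_len(composed), utf8_len(decomposed), utf8_len(recomposed))
```

This prints "2 3 2": the decomposed form needs one more byte than the
composed form, which is the case where a buffer sized for the input string is
too small for the normalized one.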

Re: performance - clearly we must have correctness first, but we cannot
overlook performance once we have obtained correctness. My personal
philosophy is that we as software developers have an obligation to deliver
both. It may cost tens of hours on our part, but slow code deployed in the
wide world will waste countless hours for thousands to billions of users,
far outweighing the hours it would cost us to get it right. Certainly it
does not seem that other large software houses share my philosophy, but so
it goes.

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support