
Re: normalised UTF-8, should it be "decomposed", or "composed"?



On Wed, Feb 20, 2002 at 09:48:43AM -0800, Kurt D. Zeilenga wrote:
> At 06:39 AM 2002-02-20, Stig Venaas wrote:
> >then strip 8-bit characters
> 
> I think we should NOT strip 8-bit characters (when doing
> approximate matching).

I guess I wasn't clear enough; I meant stripping non-ASCII code points.
So, say, an accented e would be decomposed into two code points, e and the
accent. When we then strip the non-ASCII code points, we strip the accent but
not the e. I think this is like what Howard suggested; I was trying to show
how easy it is to implement.
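
To make the idea concrete, here is a minimal sketch in Python of "decompose, then strip non-ASCII". It is only an illustration under assumptions: the function name approx_key is made up for this example, and using canonical decomposition (NFD) from the standard unicodedata module is my choice here, not something settled in this thread.

```python
import unicodedata


def approx_key(text: str) -> str:
    """Hypothetical helper: build a key for approximate matching.

    Assumption: canonical decomposition (NFD) is used, so an accented
    letter becomes a base letter followed by combining marks.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only ASCII code points (U+0000..U+007F); the combining
    # accents fall outside that range and are dropped.
    return "".join(ch for ch in decomposed if ord(ch) < 128)
```

With this sketch, approx_key("café") and approx_key("cafe") give the same key, whether the input had the accent precomposed or already decomposed. A letter with no decomposition (for example ø) is non-ASCII as a whole, so it would be dropped entirely rather than reduced to a base letter.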

Stig