Re: Add flag to UTF8normalize and pals to allow accent stripping

On Mon, Feb 25, 2002 at 11:00:34AM +0100, John Hughes wrote:
> Some of us are too lazy to type accents when we do searches.
> Here's a patch to UTF8normalize to make it strip accents if
> the caller wants.
> This could be used to implement accent free searches.

I'm not quite sure if this is the right thing to do, but personally I
don't mind adding this.

> (On reflection maybe the stripping should be done inside
> uccanondecomp, but that's not how I wrote it).

I would rather do it outside because uccanondecomp() is in an external
library that we don't want to change more than necessary.
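A minimal sketch of what stripping as a separate pass over the
already-decomposed buffer could look like. This is not the patch itself:
is_combining_mark() and strip_marks() are hypothetical names, and the
predicate is an assumption that only covers the Combining Diacritical
Marks block (U+0300..U+036F) rather than using any ucdata lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a real character-class lookup.
 * Assumption: only U+0300..U+036F count as strippable marks. */
static int is_combining_mark(unsigned long c)
{
    return c >= 0x0300UL && c <= 0x036FUL;
}

/* Compact buf in place, dropping combining marks, and return the new
 * length. The input is expected to be canonically decomposed already,
 * so accents appear as separate code points after their base letter. */
static size_t strip_marks(unsigned long *buf, size_t len)
{
    size_t in, out = 0;

    assert(buf != NULL || len == 0);
    for (in = 0; in < len; in++) {
        if (is_combining_mark(buf[in]))
            continue;           /* skip the accent */
        buf[out++] = buf[in];   /* keep everything else */
    }
    return out;
}
```

Because the pass only touches the output buffer, the decomposition
routine itself stays unmodified.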

The code looks good. If you compare it with the UTF8bvnormcmp changes I
just made, you will see that it looks much like what I did for something
else. I have only one question, see below:

> +		if (strip) {
> +			int in,ex;
> +			for (in = 1, ex = 1; in < ucsoutlen; ++in) { 
> +				if (ucisnonspacing (ucsout[in])) continue;

Where did you find ucisnonspacing()? Has it recently been added
to the ucdata lib? I can't find it anywhere. Also, I'm far from sure
that it's correct to strip all nonspacing. Maybe we should only do it
with those specific characters that some of you want to treat as