Re: RE : Add flag to UTF8normalize and pals to allow accent stripping



Stig Venaas wrote:
> 
> On Mon, Feb 25, 2002 at 03:24:56PM +0100, John Hughes wrote:
> > > Where did you find ucisnonspacing()? Has that recently been added to
> > > the ucdata lib? I can't find it anywhere.
> >
> > ldap/libraries/liblunicode/ucdata/ucdata.h
> 
> Ah of course, it's a macro. I grepped in the wrong place.
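
For anyone following along, the idea under discussion (decompose, then drop
every nonspacing mark) can be sketched roughly as below. This is Python with
the standard unicodedata module, not the liblunicode C code. The name
strip_nonspacing is made up for illustration, and I am assuming that
ucisnonspacing() tests for the same general category (Mn) that is filtered
here.

```python
import unicodedata

def strip_nonspacing(text):
    """Decompose to NFD, drop nonspacing marks (category Mn), recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    # Assumption: category "Mn" is what ucisnonspacing() would report as true.
    kept = [ch for ch in decomposed if unicodedata.category(ch) != "Mn"]
    return unicodedata.normalize("NFC", "".join(kept))
```

Note that this strips every accent for every language, so ö becomes o as
well, which is where the language question below comes in.
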
> 
> > > Maybe we should only do it with those specific characters that some of
> > > you want to treat as equivalent?
> >
> > To be really correct, I guess it should be language specific.
> >
> > For example in French most people would expect Noël = Noel,
> > but Germans might like ö = oe.
> 
> To cater for French, we could replace ë, é, è (what others?)

à
ù
ê
â
û
ô
î
ï

Note also ç

> with just e
> and it wouldn't affect the German umlauts at all. In Norwegian and some
> other languages the accent is usually ignored as well. For example, a
> word like café is sometimes written as cafe, and accents are optional in
> some Norwegian words. So an ugly hack could be to have a table of
> mappings that people can modify and enable at compile time. It is ugly
> though. To do it correctly, I think we should use language tags and the
> collation table for the given language.
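
A minimal sketch of what such a modifiable table could look like, in Python
for brevity rather than C. Everything here is hypothetical: the FOLD_TABLES
name, the fold() function and the "fr"/"de" keys are invented for
illustration, and the entries are only the letters mentioned in this thread,
not a complete list for either language.

```python
import unicodedata

# Hypothetical per-language folding tables, filled in only with the
# examples from this thread. People would edit these to taste.
FOLD_TABLES = {
    "fr": {
        "à": "a", "â": "a", "ç": "c",
        "è": "e", "é": "e", "ê": "e", "ë": "e",
        "î": "i", "ï": "i", "ô": "o", "ù": "u", "û": "u",
    },
    "de": {"ö": "oe"},
}

def fold(text, lang):
    """Replace the characters listed for lang and leave everything else alone."""
    table = FOLD_TABLES.get(lang, {})
    # Normalize to NFC first so precomposed table keys match the input.
    composed = unicodedata.normalize("NFC", text)
    return "".join(table.get(ch, ch) for ch in composed)
```

With this, fold("Noël", "fr") gives "Noel" while fold("schön", "de") gives
"schoen", and a language with no table is passed through unchanged. It is
still the ugly hack, just with the language made explicit.
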
> 
> BTW, we should also do case insensitive comparisons differently. And that
> might also depend on language tags/locales.
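
To illustrate why case-insensitive matching can depend on the language, here
is a small sketch, again in Python and not tied to any OpenLDAP code. The
function names and the "tr" tag handling are assumptions made up for this
example. The Turkish dotted/dotless i is used only because it is a well-known
case where the default folding gives a different answer from what that
language expects.

```python
def caseless_key(text, lang=None):
    """Return a key for caseless comparison, with one language-specific tweak."""
    if lang == "tr":
        # Turkish pairs I with dotless ı (U+0131) and İ (U+0130) with i.
        text = text.replace("I", "\u0131").replace("\u0130", "i")
    # str.casefold() applies the default, language-independent case folding.
    return text.casefold()

def caseless_equal(a, b, lang=None):
    """Compare two strings using the keys produced by caseless_key()."""
    return caseless_key(a, lang) == caseless_key(b, lang)
```

By default "I" and "i" compare equal, but with lang="tr" they do not, and "I"
matches "ı" instead.
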
> 
> Stig