Unicode normalization

Before I start making the changes, I would like to let you know what
I'm thinking. Please let me know if any of this looks wrong.

To lunicode, I'm adding UTF8str2upper and UTF8normalize, which do
upper-casing and Unicode normalization, respectively, on a UTF-8 string
and return a pointer to a new string. They allocate new memory for the
result, leaving the original string untouched. I'm using functions from
lldap for this. Is it bad that lunicode depends on lldap? If it is, we
need to duplicate some code, since we can't move this out of lldap.
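
To make the memory contract concrete, here is a minimal sketch of the
"return a new string, leave the old one alone" behaviour. The name
sketch_str2upper is hypothetical, not the proposed UTF8str2upper itself,
and it only folds ASCII a-z; a real version would need Unicode case
mapping for multi-byte characters.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the proposed UTF8str2upper.
 * Returns a newly allocated upper-cased copy of s, or NULL if the
 * allocation fails. The input string is never modified.
 * ASSUMPTION: only ASCII a-z is folded; bytes of multi-byte UTF-8
 * sequences (all >= 0x80) are copied unchanged. */
char *sketch_str2upper(const char *s)
{
    size_t len = strlen(s);
    char *out = malloc(len + 1);
    size_t i;

    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        out[i] = (c >= 'a' && c <= 'z') ? (char)(c - 'a' + 'A') : (char)c;
    }
    out[len] = '\0';
    return out;
}
```

The caller owns the returned buffer and must free() it; the original
value stays valid and unchanged for as long as its owner keeps it.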

I'm leaving them as two separate functions since we don't always want to
do uppercasing.

Both should be used by dnNormalize, caseIgnoreIndexer, caseIgnoreFilter,
caseIgnoreSubstringsIndexer and caseIgnoreSubstringsFilter. They should
probably also be used by the approx functions, but I don't want to think
about those now.

caseExactMatch, caseExactSubstringsMatch, caseExactIndexer,
caseExactFilter, caseExactSubstringsIndexer and caseExactSubstringsFilter
should all use UTF8normalize.
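
As a rough illustration of why even the case-exact rules need
normalization: two byte-wise different UTF-8 strings can denote the same
text. The sketch below uses a toy normalizer (hypothetical name
toy_normalize) that knows exactly one composition, "e" followed by
U+0301 into U+00E9. It is not the proposed UTF8normalize, and the choice
of a composed form is my assumption for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the proposed UTF8normalize (hypothetical).
 * Returns a new string in which "e" followed by U+0301 (UTF-8 bytes
 * 0xCC 0x81) is composed to U+00E9 (UTF-8 bytes 0xC3 0xA9).
 * Everything else is copied unchanged. Returns NULL if malloc fails. */
char *toy_normalize(const char *s)
{
    size_t len = strlen(s), i = 0, o = 0;
    char *out = malloc(len + 1);   /* composing never lengthens the string */

    if (out == NULL)
        return NULL;

    while (i < len) {
        /* s is NUL-terminated, so the look-ahead stops at the terminator */
        if (s[i] == 'e' &&
            (unsigned char)s[i + 1] == 0xCC &&
            (unsigned char)s[i + 2] == 0x81) {
            out[o++] = (char)0xC3;
            out[o++] = (char)0xA9;
            i += 3;
        } else {
            out[o++] = s[i++];
        }
    }
    out[o] = '\0';
    return out;
}

/* Case-exact comparison in the spirit described above: normalize both
 * sides, then compare bytes. Returns 1 if equal, 0 if different, and
 * -1 if an allocation failed. */
int toy_case_exact_equal(const char *a, const char *b)
{
    char *na = toy_normalize(a);
    char *nb = toy_normalize(b);
    int result = -1;

    if (na != NULL && nb != NULL)
        result = (strcmp(na, nb) == 0);

    free(na);   /* free(NULL) is a no-op */
    free(nb);
    return result;
}
```

With this, the precomposed and decomposed spellings of "café" compare
equal although strcmp on the raw bytes says they differ. A case-ignore
comparison would additionally upper-case both normalized strings before
comparing.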

Is it okay if I enable this in HEAD (it will require reindexing in many
cases), or should I ifdef it out?