Re: UTF8 case insensitive matching
- To: openldap-devel@OpenLDAP.org
- Subject: Re: UTF8 case insensitive matching
- From: "David A. Cooper" <email@example.com>
- Date: Thu, 15 Feb 2001 11:17:16 -0500
- In-reply-to: <20010124155533.A12632@itea.ntnu.no>
- References: <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org> <20001024112053.A22541@itea.ntnu.no> <email@example.com> <20001025163154.A11668@itea.ntnu.no> <firstname.lastname@example.org> <20001025190736.A1932@itea.ntnu.no> <email@example.com>
I have also been working on improving the distinguished name normalization code. In late December, I posted a note saying that I had written a new version of dn_validate() that would read in and validate/normalize any DN encoded according to RFC 2253. The code handled whitespace compression, escaped characters, quoted strings, and BER encoded strings.
After Stig announced that he had completed his work on Unicode normalization, I decided to merge this functionality into the code that I wrote. My new function, get_validated_dn(), takes as input a normalize flag. If the flag is set, then the uccanondecomp() and uccanoncomp() functions are called in order to perform Unicode normalization (I used UTF8normalize() as a guide).
I have also improved the functionality of my code in a couple of other places. In the new code, when the normalize flag is set, the attribute type/value pairs in multi-valued RDNs are sorted in order to ensure consistency when performing matches. I have also changed the code for handling bit strings to use the DER encoded form of the bit string as the normalized form.
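The sorting step for multi-valued RDNs can be illustrated with a minimal sketch. The ava_t type and the rdn_sort_avas() name are hypothetical and are not taken from the patch. The ordering rule (byte-wise comparison of the already-normalized type, then value) is also an assumption, since the message does not say which ordering is used.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of one attribute type/value pair (AVA)
 * inside an RDN. Both strings are assumed to be normalized already. */
typedef struct {
    const char *type;   /* attribute type, e.g. "CN" */
    const char *value;  /* attribute value */
} ava_t;

/* Assumed ordering: compare types byte-wise, then values byte-wise. */
static int ava_cmp(const void *a, const void *b)
{
    const ava_t *x = a;
    const ava_t *y = b;
    int c = strcmp(x->type, y->type);
    return c != 0 ? c : strcmp(x->value, y->value);
}

/* Put the AVAs of one multi-valued RDN into a fixed order, so that
 * "OU=SALES+CN=JOHN SMITH" and "CN=JOHN SMITH+OU=SALES" end up with
 * the same normalized form. */
void rdn_sort_avas(ava_t *avas, size_t n)
{
    qsort(avas, n, sizeof *avas, ava_cmp);
}
```

With any fixed ordering of this kind, two RDNs that differ only in the order of their pairs normalize to the same string, so a plain string comparison is enough to match them.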
As with the previous version of my code, the get_validated_dn() function returns the normalized DN as a new string (as opposed to overwriting the old one). I have re-written dn_validate(), dn_normalize(), dnValidate(), and dnNormalize() to make calls to this function. In the case of dn_validate() and dn_normalize(), which overwrite the caller's string in place, the DN is treated as invalid if the normalized version of the DN is longer than the "unnormalized" version.
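The in-place behaviour of the older entry points can be sketched roughly as follows. This is an illustration only, not the code in the patch: toy_normalize() is a hypothetical stand-in for get_validated_dn() (it just uppercases and drops spaces after a comma), and normalize_in_place() is a hypothetical name for the wrapper pattern.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the real normalizer: returns a newly
 * allocated string, uppercased, with leading spaces and spaces that
 * follow a ',' removed. The real function does far more than this. */
char *toy_normalize(const char *dn)
{
    size_t len = strlen(dn);
    char *out = malloc(len + 1);
    size_t i, j = 0;
    int after_sep = 1;  /* true at the start and right after a ',' */

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)dn[i];
        if (c == ' ' && after_sep)
            continue;           /* skip insignificant whitespace */
        after_sep = (c == ',');
        out[j++] = (char)toupper(c);
    }
    out[j] = '\0';
    return out;
}

/* Wrapper pattern: normalize into a new string, then overwrite the
 * caller's buffer only if the result is no longer than the input.
 * Returns dn on success, NULL if normalization fails or does not fit. */
char *normalize_in_place(char *dn, char *(*normalize)(const char *))
{
    char *norm = normalize(dn);

    if (norm == NULL)
        return NULL;
    if (strlen(norm) > strlen(dn)) {
        free(norm);             /* would overflow the caller's buffer */
        return NULL;
    }
    strcpy(dn, norm);
    free(norm);
    return dn;
}
```

The length check is needed because the caller's buffer is only known to hold the original string, and Unicode normalization can change the byte length of a UTF-8 value.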
I have created a Web page from which the code can be obtained:
I have made the code available in two ways:
(1) As a patch file against the most recent code in the development branch.
(2) As a stand-alone program that takes a DN from the command line and outputs its normalized form. (In this version, get_validated_dn() makes the proper function calls to perform Unicode uppercasing and normalization, but the code to perform these functions is not included.)
I believe that this code is now ready for incorporation into the development branch. I would appreciate it if anyone who has the opportunity would try this code and let me know if they discover any problems.
At 03:55 PM 1/24/01 +0100, Stig Venås wrote:
>I've written a new dnNormalize() that works like the old, except that
>it does Unicode normalization and case folding also. Are there cases
>where dnNormalize() should not do this? As discussed the long term
>solution is to use dnNormalize() instead of dn_normalize() right?
>Then I think it must do this.
>I've also written a new dn_normalize() that uses dnNormalize() and if
>the normalized string fits inside the old, this is returned from
>dn_normalize. If dnNormalize() fails or returns a larger string,
>dn_normalize() returns NULL like it did if dn_validate() failed.