
Re: dnValidate (was: Re: UTF8 case insensitive matching)

To: "David A. Cooper" <david.cooper@nist.gov>
Subject: Re: dnValidate (was: Re: UTF8 case insensitive matching)
From: Stig Venås <venaas@alfa.itea.ntnu.no>
Date: Fri, 22 Dec 2000 21:55:24 +0100
Cc: openldap-devel@OpenLDAP.org
In-reply-to: <4.2.2.20001222150321.00ad2a00@email.nist.gov>; from david.cooper@nist.gov on Fri, Dec 22, 2000 at 03:18:04PM -0500
References: <4.2.2.20001222111658.00acdad0@email.nist.gov> <4.2.2.20001031125512.00a63870@email.nist.gov> <5.0.0.25.0.20001031104352.0272cad0@router.boolean.net> <4.2.2.20001222111658.00acdad0@email.nist.gov> <20001222205337.A790@itea.ntnu.no> <4.2.2.20001222150321.00ad2a00@email.nist.gov>

On Fri, Dec 22, 2000 at 03:18:04PM -0500, David A. Cooper wrote:
> At 08:53 PM 12/22/00 +0100, Stig Venås wrote:
> >On Fri, Dec 22, 2000 at 01:27:55PM -0500, David A. Cooper wrote:
> > > 
> > > Based on earlier discussions, I have been working on a version of dnValidate that will read in a distinguished name that was generated based on RFC 2253 (or RFC 1779) and will return a normalized version of that string (compliant with RFC 2253).
> >
> >I started to look into this as well, but have been working mostly on Unicode
> >normalization. I probably have some code ready for use in 2-3 weeks. When
> >normalizing the dn, we should also do Unicode normalization I think.
> 
> I'm not sure what you mean here. According to RFC 2253, the string representation of an attribute value must be a UTF-8 string. So, the code that I wrote reads in characters, one at a time, converts them to Unicode as necessary to call uctoupper, and then converts the result back to UTF-8 to place in the normalized string. Did you have something else in mind?

The problem is Unicode composite characters. A character such as Å can be
written in several ways (both composed and decomposed), and we want all of
them to compare as identical. I think it makes sense to do this together
with your normalization, since we shouldn't convert to UTF-8 and back
several times.
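
To make the composed/decomposed issue concrete, here is a minimal sketch in
Python. It uses Python's standard unicodedata module purely as an
illustration; it is not the ucdata code discussed in this thread.

```python
import unicodedata

composed = "\u00C5"      # Å as a single precomposed code point
decomposed = "A\u030A"   # A followed by COMBINING RING ABOVE

# The two spellings render the same, but they differ as code points
# and therefore as UTF-8 byte strings.
composed_bytes = composed.encode("utf-8")      # b"\xc3\x85"
decomposed_bytes = decomposed.encode("utf-8")  # b"A\xcc\x8a"
differ_before = composed_bytes != decomposed_bytes

# After canonical normalization (NFC here) both spellings are identical.
equal_after = (unicodedata.normalize("NFC", composed)
               == unicodedata.normalize("NFC", decomposed))
```

A byte-wise comparison of the two UTF-8 strings fails, while a comparison
of the normalized forms succeeds.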

My plan is to add a function that does Unicode canonical normalization on
a UTF-8 string, with an argument specifying whether case folding should be
done at the same time. As I said above, it's good to do the case folding
and the normalization in one pass. This function will be needed in several
places, and I think it would be best for you to call it from your function
instead of doing the uc parts yourself. If all goes well, I will have this
code ready in two weeks. I'm waiting for the new release of the ucdata
library, which will contain the normalization code. I decided to add the
code to the library since it can be of use to other applications as well.
If you want to know what the normalization is all about, have a look at
http://www.unicode.org/unicode/reports/tr15/
I'm doing so-called NFC normalization.
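
As a rough sketch of the interface described above, the following Python
function takes a UTF-8 string, applies NFC normalization, and optionally
folds case, decoding and encoding only once. The name normalize_utf8 and
the fold_case parameter are hypothetical, and Python's unicodedata module
stands in for the ucdata library; the real function may look quite
different.

```python
import unicodedata

def normalize_utf8(data: bytes, fold_case: bool = False) -> bytes:
    """Return an NFC-normalized copy of a UTF-8 string (hypothetical API).

    If fold_case is true, case folding is applied as part of the same
    call, so the caller never has to decode the string a second time.
    """
    text = data.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    text = unicodedata.normalize("NFC", text)
    if fold_case:
        # Case folding can produce sequences that are no longer in NFC,
        # so normalize once more after folding.
        text = unicodedata.normalize("NFC", text.casefold())
    return text.encode("utf-8")

# Two spellings of the same value: decomposed mixed case vs. composed upper case.
value_a = "cn=A\u030Ake".encode("utf-8")
value_b = "cn=\u00C5KE".encode("utf-8")

same_without_folding = normalize_utf8(value_a) == normalize_utf8(value_b)
same_with_folding = (normalize_utf8(value_a, fold_case=True)
                     == normalize_utf8(value_b, fold_case=True))
```

Without folding the two values still differ in case; with folding they
compare equal. The sketch ignores DN syntax entirely and only shows the
string-level step.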

> I don't think there would be a problem going either way. One could just replace TOUPPER and uctoupper with TOLOWER and uctolower.

Agreed.

> > > For my own purposes, for the short term, my plan is to re-write dn_validate and dn_normalize as functions that call my version of dnValidate and then overwrite the original string with the string returned by dnValidate (if the normalized string will fit).
> >
> >Do you plan to look at the case where it won't fit later?
> 
> The nice thing about simply overwriting the original string in dn_validate and dn_normalize is that it involves only a local change to the code. In order to do things properly, all of the calls to these functions would need to be changed. While that may or may not be difficult, I am not very familiar with the code and would be concerned about the possibility of introducing bugs if I tried to do it myself. So, I guess the short answer is that I was hoping that someone more familiar with the code would do it.

I understand. I asked because I think someone should start looking into
the long-term solution soon. I might do it myself, but it may be a while
before I get the time.

Stig
