Re: Characters in DN

On Wednesday 11 July 2001 02:13 am, Pierangelo Masarati wrote:
> "David A. Cooper" wrote:
> > OK, I have now integrated my versions of the dn_validate and dn_normalize
> > functions into current development branch code and have posted the new
> > patch file to http://csrc.nist.gov/pki/testing/openLDAP_contrib.html.
> > Feel free to check it out and, if you think it is appropriate, to commit
> > the changes.
> Your code looks ok. You should really submit an ITS
> so we can keep track of the changes.

OK, I'll go to the OpenLDAP Web site and submit an ITS.

> I have only one question. You treat '=' and '#' as
> characters that need to be escaped. While rfc 2253
> says implementations may escape other characters,
> it doesn't require them to be treated as special except
> in type/value separation (=) and beginning of string (#).
> I think you should handle them differently.

Actually, RFC 2253 isn't entirely clear on this issue. In section 2.4 it 
states that '=' and '#' (except at the beginning of a string) do not need 
to be escaped. However, the BNF in section 3 states that any character other 
than a stringchar must be escaped, where:

stringchar = <any character except one of special, "\" or QUOTATION >
special    = "," / "=" / "+" / "<" /  ">" / "#" / ";"
QUOTATION  =  <the ASCII double quotation mark character '"' decimal 34>

In my code, I compromised. The dn_validate/dn_normalize functions will 
accept input DNs that contain unescaped '=' and '#' characters, but they 
escape those characters in the normalized form. So, for example, the input 
"cn= ===###" is accepted, but the output is "CN=\=\=\=\#\#\#".
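
To illustrate that behaviour, here is a minimal sketch in Python (not the C 
code from the patch). The names escape_value and normalize_rdn are 
hypothetical, and the sketch assumes a single type=value pair whose value 
contains no existing escapes:

```python
# Characters that the RFC 2253 section 3 BNF excludes from stringchar:
# the "special" set plus the backslash and the double quote.
RFC2253_SPECIAL = set(',=+<>#;')
MUST_ESCAPE = RFC2253_SPECIAL | {'\\', '"'}


def escape_value(value: str) -> str:
    """Prefix every non-stringchar character with a backslash."""
    return ''.join('\\' + ch if ch in MUST_ESCAPE else ch for ch in value)


def normalize_rdn(rdn: str) -> str:
    """Hypothetical normalizer for one type=value pair (assumption:
    the value is raw, i.e. it contains no escapes of its own)."""
    # Only the first '=' separates the type from the value.
    attr_type, _, raw_value = rdn.partition('=')
    return attr_type.strip().upper() + '=' + escape_value(raw_value.strip())


print(normalize_rdn('cn= ===###'))
```

The lenient part is the split on the first '=' only; every later '=' or '#' 
is treated as data and escaped on output.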

I see that the BNF in draft-ietf-ldapbis-dn-05.txt is different. It defines 
stringchar as:

stringchar        = <any UTF-8 character (can be multiple octets)
                          except one of escaped or ESC>

escaped           =   "," / "+" / """ / "<" /  ">" / ";"

So, while the BNF of draft-ietf-ldapbis-dn-05.txt is clear that '=' and '#' 
do not need to be escaped, the BNF in RFC 2253 suggests otherwise (in 
contradiction to the text of RFC 2253).
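
The difference between the two grammars can be checked mechanically. This 
is only a sketch: the variable names are mine, and it assumes that ESC in 
the draft is the backslash:

```python
# Characters excluded from stringchar by the RFC 2253 section 3 BNF:
# special, "\" and QUOTATION.
rfc2253_excluded = set(',=+<>#;') | {'\\', '"'}

# Characters excluded from stringchar by the BNF quoted from
# draft-ietf-ldapbis-dn-05.txt: escaped and ESC (assumed to be "\").
draft05_excluded = set(',+"<>;') | {'\\'}

# Characters that only the RFC 2253 BNF requires to be escaped.
only_in_rfc2253 = rfc2253_excluded - draft05_excluded
print(sorted(only_in_rfc2253))
```

Under that assumption, the only characters on which the two grammars 
disagree are '=' and '#'.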

To some degree, though, the issue is academic. The code will accept DNs 
with unescaped '=' and '#', and the normalized versions of the DNs, with 
the '=' and '#' escaped, are definitely compliant with RFC 2253. The only 
question is whether a few bytes are being wasted by including escape 
characters where they may not be absolutely necessary.

However, if people feel that this is important, I'll look into changing the 
code to avoid escaping '=' and '#' (unless the '#' is the first character in 
the string).
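
For comparison, that alternative could look roughly like the Python sketch 
below. It is not code from the patch, and escape_value_minimal is a 
hypothetical name. It follows the text of RFC 2253 section 2.4, which also 
requires escaping a leading or trailing space:

```python
# Characters that the text of RFC 2253 section 2.4 says must always
# be escaped, wherever they occur in the value.
ALWAYS_ESCAPE = set(',+"\\<>;')


def escape_value_minimal(value: str) -> str:
    """Escape only what section 2.4 requires: '=' is never escaped,
    and '#' is escaped only as the first character."""
    out = []
    last = len(value) - 1
    for i, ch in enumerate(value):
        needs_escape = (
            ch in ALWAYS_ESCAPE
            or (i == 0 and ch in '# ')      # leading '#' or space
            or (i == last and ch == ' ')    # trailing space
        )
        if needs_escape:
            out.append('\\')
        out.append(ch)
    return ''.join(out)


print(escape_value_minimal('===###'))
print(escape_value_minimal('#a#=b'))
```

With this variant a value such as "===###" passes through unchanged, and 
only the leading '#' of "#a#=b" picks up a backslash.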

> Someone who's directly involved in unicode stuff
> should check the UTF part before anything is added.
> I'm totally stuck with it at present (see why we need
> ITS?)

That would be helpful. Originally, Stig Venås incorporated Unicode 
normalization into the dnNormalize function by making a call to 
UTF8normalize. I used the UTF8normalize function as a guide for calling 
uccanondecomp and uccanoncomp directly from my code. I tried it out with a 
few simple examples, and the code seems to handle Unicode normalization 
properly, but it would certainly be helpful to have others test it as well.