
Re: commit: ldap/libraries/libldap getdn.c



Stig Venaas wrote:
> 
> On Wed, Dec 05, 2001 at 11:01:17AM +0100, Pierangelo Masarati wrote:
> > I think this will go in between ldap_str2dn and ldap_dn2str inside
> > the new dnNormalize; this should also be done selectively on the
> > values of the attributes whose syntax allows UTF-8 data.
> 
> I must confess I haven't looked at your code, but I think that in all
> cases where you consider casefolding (uppercasing), you should think
> about Unicode. Of course if you know that some string (or part of
> string) is plain ASCII you can ignore Unicode there.
> 
> > We cannot work at the string level because all UTF-8 that is not
> > plain ASCII is already represented as '\' + HEXPAIR; my guess is
> 
> Yes, so either we need to normalize it before it is escaped or we
> need to actually have something that reads in hex like this, and
> outputs new hex values. To me it sounds reasonable to do it before
> it is escaped, but I haven't looked at your code...
> 
> > we need to implement the schema aware dnNormalize to have UTF-8
> > normalization in place in an efficient manner, although we have
> > to deal with the overhead of finding the AttributeDescription of
> > each ava in the LDAPDN structure.  To this purpose we could store
> > it in the LDAPAVA as well, possibly only if inside the server and
> > if explicitly required by a flag. Sort of:
> 
> I need to look at your code, but do you (or should we) perhaps
> internally store the DN as a list of RDNs, with pointers to
> ldap_ava structs in there? And then only translate the DN into a
> string when necessary? The translated value can be stored/cached
> somewhere and reused.

I think this is the direction; for now, we are able
to translate strings into structural representations
and vice versa, and we use this to validate, normalize,
or prettify DNs every time it is required.

At present, all of these functions merely check
that the sequence of operations succeeds; the actual
normalization (which, I guess, will no longer mean just
uppercasing) will be done inside dnNormalize
between the two operations.

The hack I introduced last night simply uppercases the
string AFTER the sequence of operations has succeeded. This
means that Unicode is not affected, as it is already
in '\' + HEXPAIR form (and thus is NOT normalized!). This,
of course, is a flaw we cannot accept; we have to live
with it until I work out the schema-aware normalization.

In my opinion we should pass the structural representation
AND the NORMALIZED string representation everywhere (maybe
also the PRETTY representation?), possibly with a state
flag, so that every time a particular representation is
required it can be generated if it is not available yet; this
avoids unnecessary conversions.

Pierangelo.

-- 
Dr. Pierangelo Masarati               | voice: +39 02 2399 8309
Dip. Ing. Aerospaziale                | fax:   +39 02 2399 8334
Politecnico di Milano                 | mailto:pierangelo.masarati@polimi.it
via La Masa 34, 20156 Milano, Italy   | http://www.aero.polimi.it/~masarati