
Re: ldap_str2dn etc.



At 02:05 AM 2001-10-08, Pierangelo Masarati wrote:
>I've done a big chunk of work; I think the largest part
>that's still to do is the UTF-8 handling (I started working 
>on it this morning on the commuter train :)
>
>The code is kinda experimental, that is it is filled with
>programmer's notes, and not optimized. After it works as 
>expected I'll clean it a bit before commit.
>
>I'll try to list all the open issues to see if I can fix any 
>of them before I commit it (some of them simply are statements; 
>correct them if they're wrong).
>
><grep FIXME getdn.c:>
>
>a) there's no explicit mention in either RFC 2253 or the
>LDAPbis DN draft of language extensions to attr types.
>I've figured out three behaviors:
>        1) discard the extensions
>        2) leave the extensions in place
>        3) issue an error if PEDANTIC

DNs cannot contain attribute type options.  That is,
        CN;lang-de=Kurt

is invalid.  Both parser and generator should bitch.
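A minimal sketch of such a check, assuming the parser has already
isolated the attribute type of an AVA.  The helper name
attr_type_has_no_options is hypothetical and is not part of any
existing API; it only looks for the ';' that introduces an option.

```c
#include <string.h>

/* Hypothetical helper (not an existing libldap function).
 * Returns 1 if the attribute type carries no options, 0 if a ';'
 * is present, as in "CN;lang-de".
 * 'type' points at the attribute type, 'len' is its length. */
static int attr_type_has_no_options(const char *type, size_t len)
{
    return memchr(type, ';', len) == NULL;
}
```

Both str2dn and dn2str could call something like this on each AVA
and return an error when it yields 0.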

>b) string value means UTF-8 if LDAPv3 or T61 if LDAPv2;

Actually, in LDAPv2, LDAPDN (RFC 1777) is restricted
to IA5, but a DN (RFC 1779) string representation is
neutral to the codeset used.  This means that any
extended T.61 (or other non-IA5) character in a value
string representation requires use of the #hex
format.
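As a rough illustration, a generator targeting LDAPv2 could test a
value like this before deciding how to emit it.  The function name
needs_hex_form_v2 is an assumption made for this sketch; it treats
any byte above 0x7F as non-IA5.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the value contains any byte
 * outside the 7-bit IA5 range, meaning an LDAPv2 string
 * representation would have to use the #hex form; 0 otherwise. */
static int needs_hex_form_v2(const unsigned char *val, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (val[i] > 0x7F) {
            return 1;
        }
    }
    return 0;
}
```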

>I assume UTF-8 also if I'm reading DCE format, right?

No clue (I'd guess DCE isn't UTF-8 as it predates LDAPv3).

>But after I parsed a string into a DN, if I need to write 
>it back, say, in LDAPv2, what should I do if it contains 
>UTF-8? I was thinking of extending the LDAPAVA struct
>to hold flags that state if the value is UTF-8 or simply 
>IA5 (correct?).

Well, I suggest that any transliteration be handled
outside of the str2dn and dn2str functions (much like
normalization).

>c) for performance issues, I think I should add a field
>to the LDAPAVA struct that holds the length of the string
>representation of the value, so I don't need to compute it 
>all the times. I'd also like to turn the attribute type
>field into a berval, to avoid computing its length many times
>(maybe we could even use an AttributeDescription in union
>with a berval in case the description is unknown).

I have no problem with this.

>d) I guess empty attributes are illegal in a DN; correct?
>(I need to handle AVA separators right after the '=' sign).

No.  A value can be empty.
        ref=,o=foo

is valid.
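A simplified sketch of how a parser might accept the empty value.
It assumes a single AVA with no escaped '=' in the type, and the
name split_ava is hypothetical.  An empty type is rejected; an
empty value is not.

```c
#include <string.h>

/* Hypothetical helper: splits one AVA "type=value" at the first '='.
 * Returns 0 on success, -1 if there is no '=' or the type is empty.
 * A zero-length value (as in "ref=") is accepted. */
static int split_ava(const char *ava, size_t len,
                     size_t *type_len,
                     const char **value, size_t *value_len)
{
    const char *eq = memchr(ava, '=', len);

    if (eq == NULL || eq == ava) {
        return -1;
    }
    *type_len = (size_t)(eq - ava);
    *value = eq + 1;
    *value_len = len - *type_len - 1;
    return 0;
}
```

Called on "ref=" (the first RDN of the example above), this yields
a type of length 3 and a value of length 0.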

>e) I guess multi-AVA RDNs are allowed in DCE, right?

I assume so.

>f) If I understood the point, I have to turn string 
>representations of binary values ('#' + 1*(HEXPAIR) and 
>'\' HEXPAIR) into their binary form, and back to string, 
>right? (at least that's what I did).

In parsing (str2dn), if the string value is in # format, you should
just place the BER value into la_value.  That is (pseudo
code),
        if ( value[0] == '#' ) {
                la_value = unhex( &value[1] );
                la_flags = LDAP_AVA_BINARY;
        } else {
                ...
                la_flags = LDAP_AVA_STRING;
        }

In generating (dn2str),
        if ( la_flags == LDAP_AVA_BINARY ) {
                value[0] = '#';
                hex( &value[1], la_value );  /* write hex pairs after '#' */
        } else {
                ...
        }

The parser/generators need not convert between string and
binary (BER) value representations.  This is a job for
normalizers.
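For concreteness, here is one way the unhex() and hex() steps of the
pseudo code might look.  The names ava_unhex and ava_hex, their
signatures, and the caller-supplied buffers are all assumptions made
for this sketch, not existing library calls.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if 'c' is not a hex digit. */
static int hex_digit(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical: decode 'hexlen' hex characters (the text after '#')
 * into 'out'.  Returns the number of bytes written, or -1 if the
 * input is empty, odd-length, malformed, or too large for 'out'. */
static long ava_unhex(const char *hex, size_t hexlen,
                      unsigned char *out, size_t outcap)
{
    size_t i;

    if (hexlen == 0 || hexlen % 2 != 0 || hexlen / 2 > outcap) {
        return -1;
    }
    for (i = 0; i < hexlen; i += 2) {
        int hi = hex_digit((unsigned char)hex[i]);
        int lo = hex_digit((unsigned char)hex[i + 1]);

        if (hi < 0 || lo < 0) {
            return -1;
        }
        out[i / 2] = (unsigned char)((hi << 4) | lo);
    }
    return (long)(hexlen / 2);
}

/* Hypothetical: write '#' followed by hex pairs for 'in', plus a
 * terminating NUL.  Returns the string length, or -1 if 'out' is
 * too small. */
static long ava_hex(const unsigned char *in, size_t inlen,
                    char *out, size_t outcap)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    if (outcap < 2 * inlen + 2) {
        return -1;
    }
    out[0] = '#';
    for (i = 0; i < inlen; i++) {
        out[1 + 2 * i] = digits[in[i] >> 4];
        out[2 + 2 * i] = digits[in[i] & 0x0F];
    }
    out[1 + 2 * inlen] = '\0';
    return (long)(2 * inlen + 1);
}
```

Neither function looks inside the BER bytes, which matches the point
above: the parser and generator only move between the #hex text and
the raw octets.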

>g) I made leading and trailing spaces, and spaces around '=',
>',', ';' and '+' admissible unless the PEDANTIC flag is set.

Yes.

>Are they allowed also in DCE format?

No clue.


>h) T61: what should I look at?

I assume you mean RFC 1779 format DNs (which technically
can be in any codeset, but restricted to IA5 when used in
LDAPv2).

>I strip quotes from quoted
>values; as a consequence I need to escape chars that need it
>(according to RFC 1779). I've a small doubt: in a quoted 
>attribute only double quotes need be escaped (among printable
>chars); so if I find an escape not followed by a quote do I 
>need to consider it a normal char (and thus escape it while
>eliminating quotes)? According to rfc 1779, a quoted string 
>can contain what is called a PAIR. Example:
>
>'","'   => '\,'         OK
>'"\,"'  => '\,'         OK

I guess I don't understand the question (I haven't finished
my morning cold carbonated caffeine yet:-).  But let me see
if I can clarify.

In parsing a string representation of a value, that string
may be quoted or may be escaped.  In either case, the value
is the unquoted/unescaped value and that's what is stored in
la_value.

In generating a string representation, the value (la_value)
may contain characters which require quoting and/or escaping.
I suggest that only escaping be used here.  Hence,

        dn2str(str2dn(cn="foo,bar")) returns cn=foo\,bar
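A sketch of the escaping-only generation step, assuming the RFC 2253
set of special characters (',', '+', '"', '\', '<', '>', ';', plus a
leading '#' or space and a trailing space).  The function name
escape_value and its buffer interface are hypothetical.

```c
#include <string.h>

/* Hypothetical: copy 'val' into 'out', prefixing characters that
 * are special in a string DN with a backslash.  Returns the length
 * of the NUL-terminated result, or -1 if 'out' is too small. */
static long escape_value(const char *val, size_t len,
                         char *out, size_t outcap)
{
    static const char specials[] = ",+\"\\<>;";
    size_t i, o = 0;

    if (outcap == 0) {
        return -1;
    }
    for (i = 0; i < len; i++) {
        char c = val[i];
        int esc = memchr(specials, c, sizeof specials - 1) != NULL
            || (i == 0 && (c == ' ' || c == '#'))
            || (i == len - 1 && c == ' ');
        size_t need = esc ? 2 : 1;

        /* keep one byte free for the terminating NUL */
        if (o + need >= outcap) {
            return -1;
        }
        if (esc) {
            out[o++] = '\\';
        }
        out[o++] = c;
    }
    out[o] = '\0';
    return (long)o;
}
```

With the value foo,bar (what str2dn would store for cn="foo,bar"),
this produces foo\,bar, as in the example above.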

Kurt