Re: Normalizing directory data (Was: Distinguished name format & RFC 1779)



At 02:52 PM 8/26/99 +0200, Hallvard B Furuseth wrote:
>Kurt D. Zeilenga writes:
>> likely the exact form of the DN provided with the ADD operation will
>> be stored in the directory and that this would be returned by the
>> search (with minimal rewriting to conform to RFC2253, Section 4).
>
>I think it would be a bad idea to store unnormalized data by default.
>Here are my reasons.  Other opinions?

Normality is an undefined state.

Some might argue that we should normalize to their personal preferred
form, but it's their personal preferred form.  Their arguments for
their particular form are just as good as our arguments for our
preferred forms.  It quickly becomes a religious issue.  One way to
avoid this is to preserve the representation provided by the user.
Another way is to provide DN rewrite plugins... (hint).

There are some cases where we must rewrite the DN before providing
it (optimally done on input, not output).  We must accept
any DN that RFC1779 allows, but rewrite it as necessary to provide
it back in RFC2253 form.  This, of course, can be done on input or
output.  Also, there are some DN forms that LDAPv2 clients are likely
not to understand.  As such, some rewriting of valid RFC2253 DNs is
needed.  Namely, attribute types should be rewritten into the canonical
form (commonName->cn, 2.5.4.3->cn).
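To make the attribute type rewriting concrete, here is a minimal
sketch in Python.  The table and the function name are hypothetical;
a real server would derive the aliases and OIDs from its schema
rather than hard-coding them.  It also strips the "oid."/"OID."
prefix that RFC2253, Section 4 requires implementations to accept.

```python
# Hypothetical alias table (assumption): maps lower-cased long names
# and numeric OIDs to the short canonical name.
CANONICAL_TYPES = {
    "commonname": "cn",
    "cn": "cn",
    "2.5.4.3": "cn",
    "organizationname": "o",
    "o": "o",
    "2.5.4.10": "o",
}


def canonical_type(attr_type):
    """Return the canonical short name for an attribute type.

    Unknown types are returned unchanged (apart from trimming
    surrounding spaces and any "oid." prefix).
    """
    name = attr_type.strip()
    if name[:4].lower() == "oid.":
        name = name[4:]
    return CANONICAL_TYPES.get(name.lower(), name)


print(canonical_type("commonName"))    # cn
print(canonical_type("2.5.4.3"))       # cn
print(canonical_type("OID.2.5.4.10"))  # o
```

Leaving unknown types untouched keeps this in line with doing only
the minimum rewriting.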

I believe our default DN rewriter should do the minimum required
for LDAPv2/LDAPv3 compatibility.

>- Speaking as the maintainer of a directory with user data, users
>  come up with an impressive number of ugly ways to format data.

Ugliness is in the eye of the beholder.  In fact, some would object
to code prettying up their purposely ugly DN values.  That is,
I would never rewrite 'cn=#030405, o=foo' to 'cn=\03\04\05, o=foo'
or vice versa.  However, some folks likely prefer one form over the
other.

>- A piece of data is (usually) entered once but read many times.  Even
>  if normalization annoys the person who wanted to store `O = "Foo"',
>  he's only one.  The unnormalized form will be ugly to many who see
>  it in the middle of a list where most use the format `o=Bar'.

But if the maintainer says 'O = "FOO"', that's what she wants, and
the software has little business rewriting it.

>- Directory browsers will automatically handle data more consistently
>  if it's normalized, e.g. `O = "Foo"' and `2.5.4.10=Foo' sort before
>  `o=Bar' but the normalized `o=Foo' doesn't.


I do agree that the default DN normalization should do some minimal
rewriting.  Normalizing the attribute type makes sense.  Eliminating
extraneous spaces is touchy.

>  (Of course clients
>  _could_ put some effort into extracting the "real" text before
>  sorting it, but most don't bother.)

Directory objects should be self-describing... and well-designed
clients should avoid displaying the DN.

>It should be easy to make the degree of normalization configurable,
>though.

A DN "normalization" plugin?

>I'm not sure how much normalization I'd like by default, though.
>Something like this, I guess:
>
>- Change the case of attribute names to what slapd.conf says
>  (e.g. "CN" -> "cn").

Yes.

>- When an unknown attribute name or option is seen, remember it
>  and change the case of future versions of the same name/option
>  to that of the first.

Map all attribute types and options to lower case on input.
It's amazing how many clients cannot handle ";Binary".

>- Delete superfluous spaces (e.g. `o = x   y' -> `o=x y')

'o=x y' and 'o=x  y' are different DNs.

Personally, I prefer all optional spaces to be removed.
However, some folks like the space after the ',' and
before and after '+'... some folks even like spaces before
and after '='.
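For illustration only, here is a rough Python sketch of the
"remove all optional spaces" preference.  It drops spaces around
'=', ',', ';' and '+' but leaves spaces inside a value alone, so
'o=x y' and 'o=x  y' stay distinct.  The function names are
hypothetical, it assumes every component contains an '=', and it
also rewrites the ';' separator as ','.

```python
def split_unescaped(text, seps):
    """Split text at separators that are neither escaped nor quoted."""
    parts, current, found = [], [], []
    in_quotes = escaped = False
    for ch in text:
        if escaped:
            current.append(ch)
            escaped = False
        elif ch == "\\":
            current.append(ch)
            escaped = True
        elif ch == '"':
            current.append(ch)
            in_quotes = not in_quotes
        elif ch in seps and not in_quotes:
            parts.append("".join(current))
            found.append(ch)
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts, found


def strip_value(value):
    """Trim optional spaces but keep a trailing escaped space."""
    stripped = value.strip(" ")
    backslashes = len(stripped) - len(stripped.rstrip("\\"))
    had_trailing_space = len(value.lstrip(" ")) > len(stripped)
    if backslashes % 2 == 1 and had_trailing_space:
        stripped += " "  # the space belonged to a '\ ' escape
    return stripped


def squeeze_optional_spaces(dn):
    """Remove optional spaces around separators in a DN string."""
    parts, seps = split_unescaped(dn, ",;+")
    pieces = []
    for part in parts:
        attr_type, _, value = part.partition("=")
        pieces.append(attr_type.strip(" ") + "=" + strip_value(value))
    result = pieces[0]
    for sep, piece in zip(seps, pieces[1:]):
        result += ("," if sep == ";" else sep) + piece
    return result


print(squeeze_optional_spaces("o = x   y , c = US"))  # o=x   y,c=US
```

Folks who like the space after ',' would want a different output
step, which is one more argument for making this pluggable.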

>- Delete superfluous quotes (e.g. o="x" -> o=x)

This is required per RFC2253, section 4.  We must accept these
RFC1779 quoted values when talking with LDAPv2 clients,
but we should provide RFC2253 escapes (if needed) when
returning such to LDAPv3 clients.
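As a sketch of what that conversion could look like for a single
attribute value (already split from its type), the Python below
removes RFC1779 quotes and adds backslash escapes only where
RFC2253 needs them.  The function name is hypothetical, and it
assumes backslash pairs are the only escapes used inside the quotes.

```python
# Characters that RFC2253 requires to be escaped anywhere in a value.
SPECIALS = ',+"\\<>;'


def quoted_to_escaped(value):
    """Convert an RFC1779 quoted value to RFC2253 escaped form.

    Values that are not wrapped in double quotes are returned as is.
    """
    if len(value) < 2 or not (value.startswith('"') and value.endswith('"')):
        return value
    inner = value[1:-1]

    # Undo backslash pairs inside the quotes to get the raw characters.
    chars = []
    i = 0
    while i < len(inner):
        ch = inner[i]
        if ch == "\\" and i + 1 < len(inner):
            i += 1
            ch = inner[i]
        chars.append(ch)
        i += 1

    # Re-escape only what needs it: specials anywhere, a leading
    # space or '#', and a trailing space.
    out = []
    last = len(chars) - 1
    for pos, ch in enumerate(chars):
        needs_escape = (
            ch in SPECIALS
            or (pos == 0 and ch in " #")
            or (pos == last and ch == " ")
        )
        out.append("\\" + ch if needs_escape else ch)
    return "".join(out)


print(quoted_to_escaped('"x"'))    # x
print(quoted_to_escaped('"x,y"'))  # x\,y
```

Values that arrive unquoted, including hex forms such as '#782c79',
pass through untouched.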

>- Maybe replace OIDs with names and "alternate" attribute
>  names to the primary names (e.g. organizationalName -> o).

Yes, we should rewrite attribute types to their canonical form.

>- Maybe store with consistent quoting rules (always change
>  `cn=x\,y' to `cn="x,y"', or always do the opposite change).

For RFC2253, you'd want to do the opposite.   I'd also suggest
leaving 'cn=#782c79' as is.