
Re: Normalizing directory data (Was: Distinguished name format & RFC 1779)



Kurt D. Zeilenga writes:
> Some might argue that we should normalize to their personal preferred
> form, but it's their personal preferred form.  Their arguments for
> their particular form are just as good as our arguments for our
> preferred forms.  It quickly becomes a religious issue.

Indeed.  And a technical issue, depending on what the server data is
used for.  All I dare argue for is what default we should pick;
we can let the maintainer be the "religious dictator".

> One way to avoid this is to preserve the representation provided by
> the user.

There'll be trouble either way.  The best we can do is pick the "least
bad" solution.  And duck the issue and give it to the maintainer:-)

> Another way is to provide DN rewrite plugins... (hint).

I didn't think of that one.
Could be useful in both the server and the client, BTW.

BTW, they should probably be general attribute rewrite plugins,
multiplexed by the attribute syntax.  Of course they don't have
to _do_ anything for other syntaxes than DNs.
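
A minimal sketch of such a dispatch, in Python for brevity.  The
registry, the syntax names and the stand-in DN rewriter are all
hypothetical, not an existing slapd interface:

```python
# Hypothetical registry: attribute syntax name -> rewrite function.
REWRITERS = {}

def register(syntax):
    """Decorator that installs a rewrite plugin for one attribute syntax."""
    def install(fn):
        REWRITERS[syntax] = fn
        return fn
    return install

@register("DN")
def rewrite_dn(value):
    # Stand-in policy only: drop spaces in front of the first attribute type.
    return value.lstrip(" ")

def rewrite(syntax, value):
    # Syntaxes without a registered plugin pass through untouched.
    plugin = REWRITERS.get(syntax)
    return plugin(value) if plugin else value
```

Since the lookup is by syntax rather than by attribute name, any
attribute with DN syntax would go through the same plugin as the
entry's own DN.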

> There are some cases where we must rewrite the DN before providing
> it (optimally done on input, not output).  We must accept
> any DN that RFC1779 allows, but rewrite it as necessary to provide
> it back in RFC2253 form.  This, of course, can be done on input or
> output.

> Also, there are some DN forms that LDAPv2 clients are likely not to
> understand.  As such there is some rewriting of valid RFC2253 DNs.

Good point.  But there are times where LDAPv2 requires `\char' quoting
and LDAPv3 requires \hex quoting or no quoting.  Examples: CR or "=".
Yet another configurable, I guess...
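
To illustrate the difference, here is a rough sketch of a
version-dependent escaper.  The function name and the choice of `\0D`
for CR under LDAPv3 are assumptions made for the example.  It ignores
leading and trailing spaces, a leading `#' and character-set issues:

```python
# Characters RFC 1779 lists as special, plus backslash and the quote mark.
V2_SPECIAL = set(',=\r+<>#;"\\')
# Characters RFC 2253 section 2.4 says must be escaped anywhere in a value.
V3_SPECIAL = set(',+"\\<>;')

def escape_value(value, version):
    """Escape one attribute value for an LDAPv2 or LDAPv3 string DN."""
    out = []
    for ch in value:
        if version == 2 and ch in V2_SPECIAL:
            out.append("\\" + ch)      # v2: backslash + the character itself
        elif version == 3 and ch in V3_SPECIAL:
            out.append("\\" + ch)
        elif version == 3 and ch == "\r":
            out.append("\\0D")         # v3: hex escape (raw CR is also legal)
        else:
            out.append(ch)             # e.g. '=' needs no escape in v3
    return "".join(out)
```

So `a=b' comes out as `a\=b' for a v2 client and unchanged for a v3
client.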

> Namely, attribute type should be rewritten into the canonical
> form (commonName->cn, 2.5.4.3->cn).

Only by default.  There is a mail agent that knows 'rfc822Mailbox' but
not 'mail' (and others that understand 'mail' but not 'rfc822Mailbox'),
so there is at least one reason to _not_ do that rewriting:-(

(Well, we could treat them as separate attributes and return the e-mail
address _both_ in 'mail' and in 'rfc822Mailbox' like ldap.hioslo.no
does, even though they have the same OID.  That's uglier, but only
requires one LDAP server.)
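
A sketch of "only by default": a canonicalization table with a
per-site exception list.  The table holds just the names mentioned in
this thread.  Treating 'mail' as the canonical name and the `keep'
parameter are assumptions made for the example:

```python
# Alias -> canonical short name (keys are lowercased for lookup).
CANONICAL = {
    "commonname": "cn",
    "2.5.4.3": "cn",
    "rfc822mailbox": "mail",   # assumption: 'mail' is the preferred name
}

def canonical_type(attr_type, keep=frozenset()):
    """Map an attribute type to its canonical name unless the site opts out.

    `keep` is a set of lowercased names that must be passed through as-is.
    """
    key = attr_type.lower()
    if key in keep:
        return attr_type
    return CANONICAL.get(key, attr_type)
```

A site with the mail agent that only knows 'rfc822Mailbox' would then
call canonical_type(name, keep={"rfc822mailbox"}).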

> I believe our default DN rewriter should do the minimum required
> for LDAPv2/LDAPv3 compatibility.
> (...)
> Personally, I prefer all optional spaces to be removed.

You mean you expect your preference (to remove them) is in the minority?
Well, I haven't heard anyone speak up against it yet:-)

>>- Speaking as the maintainer of a directory with user data, users
>>  come up with an impressive number of ugly ways to format data.
> 
> Ugliness is in the eye of the beholder.  In fact, some would object
> to code prettying up their purposely ugly dn values.  That is,
> I would never rewrite 'cn=#030405, o=foo' to 'cn=\03\04\05, o=foo'
> or vice versa.

I might, depending on what kind of users I had.  If someone had
a client which consistently used the #hex form to write data to our
public directory, I would - since I know that plenty of others have
clients that would display the hex form.  (Probably including us, come
to think of it.  I'll have to check.)

BTW, would you rewrite `roleOccupant= cn=#030405, o=foo'?  That _would_
be shown by the clients that prettify DNs but not attributes.

> However, some folks likely prefer one form over the other.
> 
>>- A piece of data is (usually) entered once but read many times.  Even
>>  if normalization annoys the person who wanted to store `O = "Foo"',
>>  he's only one.  The unnormalized form will be ugly to many who see
>>  it in the middle of a list where most use the format `o=Bar'.
> 
> But if the maintainer says 'O = "FOO"', that's what she wants,
> the software has little business rewriting it.

If the *maintainer* does, yes.  But she can set the rewriting to the
default she wants.

>>- Directory browsers will automatically handle data more consistently
>>  if it's normalized, e.g. `O = "Foo"' and `2.5.4.10=Foo' sort before
>>  `o=Bar' but the normalized `o=Foo' doesn't.
> 
> I do agree that the default DN normalization should do some minimal
> rewriting.  Normalizing the attribute type makes sense.  Eliminating
> extraneous spaces is touchy.
> 
>>  (Of course clients
>>  _could_ put some effort into extracting the "real" text before
>>  sorting it, but most don't bother.)
> 
> As directory objects should be self describing...

Indeed, but they are not.  There is not (yet) a well-known 'realName'
attribute type which tells us how to display an object.  Nor an
ASCIIrealName or realName;ASCII to tell us Western types how to display
a DN with Japanese characters.  Nor do objects provide a prettified form
of their _parent_ DN, and it's not always good design for the client to
spend time to read the parent object before displaying a prettified DN.

> and well designed clients should avoid displaying the DN.

Yeah.  I wish more clients were well designed.

I'm *really* curious about how many complaints we'll get when we switch
to `uniqueID=xxx,c=NO' DNs for Norwegian organizations...

>>It should be easy to make the degree of normalization configurable,
>>though.
> 
> A DN "normalization" plugin?

Sounds nice.  I just thought to write a few DN normalization functions,
and a slapd.conf statement to choose between them.
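
Roughly like this, sketched in Python.  The `dnstyle' keyword and the
style names are invented for the example and are not an existing
slapd.conf statement.  The functions take an already parsed DN as a
list of (type, value) pairs with the values already escaped, and
multi-valued RDNs are left out:

```python
def compact(rdns):
    # No optional spaces at all.
    return ",".join(f"{t}={v}" for t, v in rdns)

def spaced(rdns):
    # One space after each ','.
    return ", ".join(f"{t}={v}" for t, v in rdns)

# Hypothetical style names -> normalization function.
STYLES = {"compact": compact, "spaced": spaced}

def parse_directive(line):
    """Return the function selected by a line such as 'dnstyle compact'."""
    keyword, style = line.split()
    if keyword != "dnstyle":
        raise ValueError(f"unknown directive: {keyword}")
    return STYLES[style]
```

Adding another preference is then one more function and one more
entry in the table.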

>>- When an unknown attribute name or option is seen, remember it
>>  and change the case of future versions of the same name/option
>>  to that of the first.
> 
> map all attribute types and options to lower case on input.
> It's amazing how many clients cannot handle ";Binary".

Agreed for options, but not attribute types (and object classes, come
to think of it) with at least one lowercase letter.  The eye doesn't
exactly flow past `initiatorpullingauthenticationrequirements'
(lowercased attribute type from ISODE oidtable.at).
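
The rule I have in mind, as a small sketch (the function name is made
up): options are always lowercased, while the attribute type is only
lowercased when it has no lowercase letter to begin with:

```python
def fold_case(description):
    """Fold the case of an attribute description like 'type;opt1;opt2'."""
    attr_type, *options = description.split(";")
    # str.isupper() is true only when every cased character is uppercase,
    # so mixed-case names and numeric OIDs are left alone.
    if attr_type.isupper():
        attr_type = attr_type.lower()
    return ";".join([attr_type] + [opt.lower() for opt in options])
```

This turns `userCertificate;Binary' into `userCertificate;binary' and
`CN' into `cn'.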

>>- Delete superfluous spaces (e.g. `o = x   y' -> `o=x y')
> 
> 'o=x y' and 'o=x  y' are different DNs.

Oops.
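
A corrected sketch that only touches the optional spaces around `='
and the RDN separators, and leaves spaces inside a value alone.  It
assumes every RDN has the form type=value, does not treat `+'
specially, and the function names are made up:

```python
def split_rdns(dn):
    """Split on ',' or ';' that is neither quoted nor backslash-escaped."""
    parts, cur, quoted, i = [], [], False, 0
    while i < len(dn):
        ch = dn[i]
        if ch == "\\" and i + 1 < len(dn):
            cur.append(dn[i:i + 2])    # keep the escape pair intact
            i += 2
            continue
        if ch == '"':
            quoted = not quoted
        if ch in ",;" and not quoted:
            parts.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
        i += 1
    parts.append("".join(cur))
    return parts

def rstrip_optional(value):
    """Strip trailing spaces, but keep one that is backslash-escaped."""
    stripped = value.rstrip(" ")
    backslashes = len(stripped) - len(stripped.rstrip("\\"))
    if backslashes % 2 == 1 and len(stripped) < len(value):
        stripped += " "
    return stripped

def strip_optional_spaces(dn):
    rdns = []
    for rdn in split_rdns(dn):
        attr_type, value = rdn.split("=", 1)
        rdns.append(attr_type.strip(" ") + "=" + rstrip_optional(value.lstrip(" ")))
    return ",".join(rdns)          # ';' separators come out as ','
```

With this, `o = x   y' becomes `o=x   y', not `o=x y'.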

> Personally, I prefer all optional spaces to be removed.
> However, some folks like the space after the ',' and
> before and after '+'... some folks even like spaces before
> and after '='.

There do seem to be a _lot_ of options.  We should definitely
say `You want it?  You write it.  And please contribute it'.

>>- Delete superfluous quotes (e.g. o="x" -> o=x)
> 
> This is required per RFC2253, section 4.

Great.

> We must accept these RFC1779 quoted values when talking with LDAPv2
> clients, but we should provide RFC2253 escapes (if needed) when
> returning such to LDAPv3 clients.

Does that mean you want to return the quoted values to LDAPv2 clients?
That'd actually mean there are things you can write with a v2 client
which you can't write with a v3 client (to a server with that
configuration).

>>- Maybe store with consistent quoting rules (always change
>>  `cn=x\,y' to `cn="x,y"', or always do the opposite change).
> 
> For RFC2253, you'd want to do the opposite.   I'd also suggest
> leaving 'cn=#782c79' as is.

Fine.  (By default:-)
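
As a sketch of that default, here is one way to turn a single RFC1779
quoted value into the RFC2253 escaped form.  The function name is made
up, and it assumes the input is a well-formed value that has already
been split out of the DN:

```python
# Characters RFC 2253 section 2.4 says must be escaped anywhere in a value.
V3_SPECIAL = set(',+"\\<>;')

def unquote_value(value):
    """Rewrite '"x,y"' as 'x\\,y'; unquoted values (incl. #hex) pass through."""
    if not (len(value) >= 2 and value[0] == '"' and value[-1] == '"'):
        return value
    inner, out, i = value[1:-1], [], 0
    while i < len(inner):
        ch = inner[i]
        if ch == "\\" and i + 1 < len(inner):
            out.append(inner[i:i + 2])     # already an escape pair
            i += 2
            continue
        out.append("\\" + ch if ch in V3_SPECIAL else ch)
        i += 1
    result = "".join(out)
    # Spaces at either end and a leading '#' were protected by the quotes.
    if result.endswith(" "):
        result = result[:-1] + "\\ "
    if result.startswith((" ", "#")):
        result = "\\" + result
    return result
```

So `cn="x,y"' is stored as `cn=x\,y', `o="x"' as `o=x', and
`cn=#782c79' is left as is.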

-- 
Hallvard