Re: Normalizing directory data (Was: Distinguished name format & RFC 1779)

At 14:52 26/08/99 +0200, you wrote:
>Kurt D. Zeilenga writes:
>> likely the exact form of the DN provided with the ADD operation will
>> be stored in the directory and that this would be returned by the
>> search (with minimal rewriting to conform to RFC2253, Section 4).
>I think it would be a bad idea to store unnormalized data by default.
>Here are my reasons.  Other opinions?
>(For those who don't know what I mean by normalization, see the list
>at the end of this article.)
>- Speaking as the maintainer of a directory with user data, users
>  come up with an impressive number of ugly ways to format data.
>- A piece of data is (usually) entered once but read many times.  Even
>  if normalization annoys the person who wanted to store `O = "Foo"',
>  he's only one.  The unnormalized form will be ugly to many who see
>  it in the middle of a list where most use the format `o=Bar'.
>- Directory browsers will automatically handle data more consistently
>  if it's normalized, e.g. `O = "Foo"' and `' sort before
>  `o=Bar' but the normalized `o=Foo' doesn't.  (Of course clients
>  _could_ put some effort into extracting the "real" text before
>  sorting it, but most don't bother.)
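To make the sorting point concrete, here is a minimal Python sketch. It
assumes a client that sorts the returned strings byte-wise; display_key
is a hypothetical helper (not part of any LDAP client) showing the kind
of "extract the real text" step a browser could apply instead.

```python
def display_key(rdn):
    """Hypothetical sort key: lower-cased attribute name plus the bare value."""
    name, _, value = rdn.partition("=")
    return (name.strip().lower(), value.strip().strip('"').lower())

raw = ['o=Bar', 'O = "Foo"']

# Plain string sort: 'O' (0x4F) sorts before 'o' (0x6F), so the
# unnormalized entry jumps ahead of o=Bar.
plain_sorted = sorted(raw)                    # ['O = "Foo"', 'o=Bar']

# Sorting on the extracted text puts Bar before Foo, as a reader expects.
keyed_sorted = sorted(raw, key=display_key)   # ['o=Bar', 'O = "Foo"']
```

With normalized data (o=Bar, o=Foo) the plain sort already gives the
expected order, so the client needs no such helper.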

One of the LDAP servers I'm running applies normalization to added entries:
cn=foo,o=bar becomes cn = "foo" ; o = "bar" (ugly, isn't it?)

Most clients (VisualLDAP, for example) won't understand cn = "foo" ; o =
"bar" (and will even crash). Normalizing in a standard fashion
(cn=foo,o=bar) would ensure the server works with most clients.

>A negative effect of normalization is that a naive client which stores
>`o = "Foo"' and reads it back may not recognize the returned `o=Foo'.
>That trouble exists to some degree even without normalization, though.
>E.g. the client tries to store `o = "Foo"' and is told that it (`o=Foo')
>already exists, yet it won't find `o = "Foo"' in the returned data.
>It should be easy to make the degree of normalization configurable,
>though.  Just run the data through some function pointers before and
>after storing/indexing/comparing the information and add a slapd.conf
>keyword to switch to a different degree of normalization.
>I'm not sure how much normalization I'd like by default, though.
>Something like this, I guess:
>- Change the case of attribute names to what slapd.conf says
>  (e.g. "CN" -> "cn").
>- When an unknown attribute name or option is seen, remember it
>  and change the case of future versions of the same name/option
>  to that of the first.
>- Delete superfluous spaces (e.g. `o = x   y' -> `o=x y')
>- Delete superfluous quotes (e.g. o="x" -> o=x)
>- Maybe replace OIDs with names and map "alternate" attribute
>  names to the primary names (e.g. organizationalName -> o).
>- Maybe store with consistent quoting rules (always change
>  `cn=x\,y' to `cn="x,y"', or always do the opposite change).
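The rules listed above could be sketched roughly as follows. This is a
Python illustration under stated assumptions, not slapd code: the ALIASES
table and all function names are hypothetical, attribute names are simply
lower-cased (rather than looked up in slapd.conf), OIDs and multi-valued
RDNs (a+b) are not handled, and the backslash form is chosen as the
consistent quoting rule.

```python
import re

# Hypothetical alias table: alternate attribute names -> primary names.
ALIASES = {"commonname": "cn", "organizationname": "o"}
# Characters escaped with a backslash in the output form (assumed set).
SPECIAL = ',+"\\<>;'

def split_unescaped(text, seps):
    """Split on any char in seps that is neither quoted nor backslash-escaped."""
    parts, buf, in_quotes, i = [], [], False, 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            buf.append(text[i:i + 2])   # keep the escape pair together
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch in seps and not in_quotes:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    parts.append("".join(buf))
    return parts

def normalize_value(raw):
    """Drop surrounding spaces and quotes; collapse runs of spaces."""
    value = raw.strip()
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        inner = re.sub(r"\\(.)", r"\1", value[1:-1])   # undo escapes in quotes
        inner = re.sub(r" {2,}", " ", inner)
        # Re-express the quoted text in backslash form: "x,y" -> x\,y
        return "".join("\\" + c if c in SPECIAL else c for c in inner)
    return re.sub(r" {2,}", " ", value)

def normalize_dn(dn):
    """Normalize a DN written with ',' or ';' separators to name=value,..."""
    rdns = []
    for rdn in split_unescaped(dn, ",;"):
        name, _, value = rdn.partition("=")
        name = name.strip().lower()
        rdns.append(ALIASES.get(name, name) + "=" + normalize_value(value))
    return ",".join(rdns)
```

For example, normalize_dn('cn = "foo" ; o = "bar"') returns 'cn=foo,o=bar',
and normalize_dn('o = x   y') returns 'o=x y'. A configurable degree of
normalization could be had by making each rule a separate function and
selecting which ones to run from a configuration keyword.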