Re: Normalizing directory data (Was: Distinguished name format & RFC 1779)
At 14:52 26/08/99 +0200, you wrote:
>Kurt D. Zeilenga writes:
>> likely the exact form of the DN provided with the ADD operation will
>> be stored in the directory and that this would be returned by the
>> search (with minimal rewriting to conform to RFC2253, Section 4).
>I think it would be a bad idea to store unnormalized data by default.
>Here are my reasons. Other opinions?
>(For those who don't know what I mean by normalization, see the list
>at the end of this article.)
>- Speaking as the maintainer of a directory with user data, users
> come up with an impressive number of ugly ways to format data.
>- A piece of data is (usually) entered once but read many times. Even
> if normalization annoys the person who wanted to store `O = "Foo"',
> he's only one. The unnormalized form will be ugly to many who see
> it in the middle of a list where most use the format `o=Bar'.
>- Directory browsers will automatically handle data more consistently
> if it's normalized, e.g. `O = "Foo"' and `2.5.4.10=Foo' sort before
> `o=Bar' but the normalized `o=Foo' doesn't. (Of course clients
> _could_ put some effort into extracting the "real" text before
> sorting it, but most don't bother.)
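The sorting point above can be checked with a small sketch. It assumes a client that sorts DN strings by plain code point without parsing them, and it uses 2.5.4.10, the OID of the o attribute, for the OID form:

```python
# A naive client sorting raw strings by code point, with no DN parsing.
raw = ['o=Bar', 'O = "Foo"', '2.5.4.10=Foo']
print(sorted(raw))         # digits and upper case come before lower case

# The same entries once every form of "Foo" has been normalized to o=Foo.
normalized = ['o=Bar', 'o=Foo']
print(sorted(normalized))  # Bar now sorts before Foo, as a reader expects
```

Both unnormalized forms of Foo land ahead of o=Bar only because `2' and `O' precede `o' in ASCII, not because of anything in the data itself.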
One of the LDAP servers I'm running applies normalization to added entries:
cn=foo,o=bar becomes cn = "foo" ; o = "bar" (ugly, isn't it?)
Most clients (VisualLDAP for example) won't understand cn = "foo" ; o =
"bar" (and they will even crash). Normalizing in a standard fashion
(cn=foo,o=bar) would guarantee the software will work with most clients.
>A negative effect of normalization is that a naive client which stores
>`o = "Foo"' and reads it back may not recognize the returned `o=Foo'.
>That trouble exists to some degree even without normalization, though.
>E.g. the client tries to store `o = "Foo"' and is told that it (`o=Foo')
>already exists, yet it won't find `o = "Foo"' in the returned data.
>It should be easy to make the degree of normalization configurable,
>though. Just run the data through some function pointers before and
>after storing/indexing/comparing the information and add a slapd.conf
>keyword to switch to a different degree of normalization.
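A minimal sketch of the "function pointers" idea, in Python rather than slapd's C. The level names ("none", "minimal", "full") and the helper functions are hypothetical and only illustrate how a configuration keyword could select a chain of normalization steps; nothing here is an existing slapd.conf option.

```python
from typing import Callable, Dict, List

Normalizer = Callable[[str], str]

def identity(rdn: str) -> str:
    """Leave the value exactly as the client sent it."""
    return rdn

def strip_spaces_around_equals(rdn: str) -> str:
    """Turn 'o = Foo' into 'o=Foo' for a single type=value pair."""
    attr, sep, value = rdn.partition("=")
    return attr.strip() + sep + value.strip()

def lowercase_attr_name(rdn: str) -> str:
    """Lower-case the attribute type only; the value is untouched."""
    attr, sep, value = rdn.partition("=")
    return attr.lower() + sep + value

# Hypothetical degrees of normalization, keyed by a config keyword value.
LEVELS: Dict[str, List[Normalizer]] = {
    "none": [identity],
    "minimal": [strip_spaces_around_equals],
    "full": [strip_spaces_around_equals, lowercase_attr_name],
}

def make_normalizer(level: str) -> Normalizer:
    """Build one function that runs every step of the chosen level in order."""
    steps = LEVELS[level]

    def run(rdn: str) -> str:
        for step in steps:
            rdn = step(rdn)
        return rdn

    return run

normalize = make_normalizer("full")
print(normalize("O = Foo"))
```

The server would call the selected function before storing, indexing, and comparing, so changing the degree of normalization means changing only the configured level.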
>I'm not sure how much normalization I'd like by default, though.
>Something like this, I guess:
>- Change the case of attribute names to what slapd.conf says
> (e.g. "CN" -> "cn").
>- When an unknown attribute name or option is seen, remember it
> and change the case of future versions of the same name/option
> to that of the first.
>- Delete superfluous spaces (e.g. `o = x y' -> `o=x y')
>- Delete superfluous quotes (e.g. o="x" -> o=x)
>- Maybe replace OIDs with names, and "alternate" attribute
> names with the primary names (e.g. organizationName -> o).
>- Maybe store with consistent quoting rules (always change
> `cn=x\,y' to `cn="x,y"', or always do the opposite change).
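Most of the rules listed above can be sketched for a single type=value pair as follows. This is an illustration under stated assumptions, not how any particular server does it: the alias table is a hand-written stand-in for what would come from the schema, the quoting rule chosen is "always use backslash escapes", and the sketch does not split full DNs, remember unknown attribute names, or handle hex escapes and leading `#' or spaces.

```python
# Assumed alias table; a real server would build this from its schema.
PRIMARY_NAMES = {
    "2.5.4.10": "o",
    "organizationname": "o",
    "2.5.4.3": "cn",
    "commonname": "cn",
}

# Characters that are written with a backslash in the unquoted form.
SPECIAL = set(',+"\\<>;')

def unquote(quoted: str) -> str:
    """Return the literal text of a value wrapped in double quotes."""
    out = []
    chars = iter(quoted[1:-1])
    for ch in chars:
        if ch == "\\":
            # A backslash makes the next character literal.
            ch = next(chars, "\\")
        out.append(ch)
    return "".join(out)

def escape(text: str) -> str:
    """Backslash-escape the special characters in a literal value."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

def normalize_rdn(rdn: str) -> str:
    """Normalize one 'type=value' pair according to the rules above."""
    attr, sep, value = rdn.partition("=")
    if not sep:
        raise ValueError(f"not a type=value pair: {rdn!r}")
    # Fix the case of the name, then map OIDs and aliases to the primary name.
    attr = attr.strip().lower()
    attr = PRIMARY_NAMES.get(attr, attr)
    # Drop superfluous spaces, then superfluous quotes.
    value = value.strip()
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = escape(unquote(value))
    return f"{attr}={value}"

for example in ['O = "Foo"', '2.5.4.10=Foo', 'o = x y', 'cn="x,y"']:
    print(normalize_rdn(example))
```

With this choice of quoting rule, `cn="x,y"' and `cn=x\,y' end up as the same stored string, which is the consistency the last item asks for.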