Normalizing directory data (Was: Distinguished name format & RFC 1779)
Kurt D. Zeilenga writes:
> likely the exact form of the DN provided with the ADD operation will
> be stored in the directory and that this would be returned by the
> search (with minimal rewriting to conform to RFC2253, Section 4).
I think it would be a bad idea to store unnormalized data by default.
Here are my reasons. Other opinions?
(For those who don't know what I mean by normalization, see the list
at the end of this article.)
- As the maintainer of a directory with user data, I can say that
users come up with an impressive number of ugly ways to format data.
- A piece of data is (usually) entered once but read many times. Even
if normalization annoys the person who wanted to store `O = "Foo"',
that is only one person. The unnormalized form will look ugly to the
many who see it in a list where most entries use the format `o=Bar'.
- Directory browsers will automatically handle data more consistently
if it's normalized, e.g. `O = "Foo"' and `2.5.4.10=Foo' sort before
`o=Bar' but the normalized `o=Foo' doesn't. (Of course clients
_could_ put some effort into extracting the "real" text before
sorting it, but most don't bother.)
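To make the sorting point concrete, here is a small sketch of what a
browser that sorts the raw strings would show. It assumes a plain
code-point sort, which is my guess at what a client that does not
bother to parse the RDN would do; the entries are the ones from the
example above, with the OID written as 2.5.4.10 (organizationName).

```python
# Three entries as users might have stored them.
raw = ['o=Bar', 'O = "Foo"', '2.5.4.10=Foo']

# The same entries after normalization (both "Foo" forms collapse to o=Foo).
normalized = ['o=Bar', 'o=Foo', 'o=Foo']

# A naive client sorts the strings as-is: digits and uppercase letters
# come before lowercase letters, so both "Foo" forms jump ahead of o=Bar.
raw_sorted = sorted(raw)

# With normalized data the same naive sort gives the order a user expects.
normalized_sorted = sorted(normalized)

print(raw_sorted)
print(normalized_sorted)
```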
A negative effect of normalization is that a naive client which stores
`o = "Foo"' and reads it back may not recognize the returned `o=Foo'.
That trouble exists to some degree even without normalization, though.
E.g. the client tries to store `o = "Foo"' and is told that it (`o=Foo')
already exists, yet it won't find `o = "Foo"' in the returned data.
It should be easy to make the degree of normalization configurable,
though. Just run the data through some function pointers before and
after storing/indexing/comparing the information and add a slapd.conf
keyword to switch to a different degree of normalization.
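A rough sketch of the idea, written in Python rather than C to keep it
short. Plain functions in a table stand in for the function pointers.
The level names ("none", "spaces", "full") and the function names are
invented for illustration; no such slapd.conf keyword exists that I
know of.

```python
from typing import Callable, Dict, List

Normalizer = Callable[[str], str]

def strip_spaces_around_equals(rdn: str) -> str:
    """`o = x y' -> `o=x y' (spaces inside the value are kept)."""
    attr, sep, value = rdn.partition("=")
    return attr.strip() + sep + value.strip()

def lowercase_attribute_name(rdn: str) -> str:
    """`CN=x' -> `cn=x' (the value is left alone)."""
    attr, sep, value = rdn.partition("=")
    return attr.lower() + sep + value

# Hypothetical normalization levels, as a config keyword might select them.
LEVELS: Dict[str, List[Normalizer]] = {
    "none": [],
    "spaces": [strip_spaces_around_equals],
    "full": [strip_spaces_around_equals, lowercase_attribute_name],
}

def normalize(rdn: str, level: str) -> str:
    """Run the RDN through every step configured for the given level."""
    for step in LEVELS[level]:
        rdn = step(rdn)
    return rdn
```

With this layout, `normalize("O = Foo", "none")` leaves the string
untouched, while the "full" level turns it into `o=Foo`. Adding a new
degree of normalization only means adding another entry to the table.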
I'm not sure how much normalization I'd like by default, though.
Something like this, I guess:
- Change the case of attribute names to what slapd.conf says
(e.g. "CN" -> "cn").
- When an unknown attribute name or option is seen, remember it
and change the case of future versions of the same name/option
to that of the first.
- Delete superfluous spaces (e.g. `o = x y' -> `o=x y')
- Delete superfluous quotes (e.g. o="x" -> o=x)
- Maybe replace OIDs with names, and "alternate" attribute
names with the primary names (e.g. organizationName -> o).
- Maybe store with consistent quoting rules (always change
`cn=x\,y' to `cn="x,y"', or always do the opposite change).
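
The list above could be sketched roughly as follows, for a single
RDN with one attribute. This is only an illustration under several
assumptions: the alias table is a made-up stand-in for whatever the
schema configuration would provide, attribute names are simply
lowercased instead of following slapd.conf, and the consistent quoting
rule chosen here is "always use backslashes". It does not handle
multi-valued RDNs, full DNs, or leading/trailing characters that need
escaping.

```python
import re

# Assumed alias table: OIDs and alternate names mapped to primary names.
PRIMARY_NAMES = {
    "2.5.4.10": "o",
    "organizationname": "o",
    "2.5.4.3": "cn",
    "commonname": "cn",
}

# Characters that must be escaped in an unquoted value.
SPECIALS = ',+"\\<>;'

def normalize_rdn(rdn: str) -> str:
    """Normalize one `type = value' pair into a canonical form."""
    attr, _, value = rdn.partition("=")

    # Delete superfluous spaces and fix the case of the attribute name.
    attr = attr.strip().lower()
    value = value.strip()

    # Replace OIDs and alternate names with the primary name.
    attr = PRIMARY_NAMES.get(attr, attr)

    # Delete quotes: unescape the quoted text, then backslash-escape
    # only the characters that really need it.
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        inner = re.sub(r'\\(.)', r'\1', value[1:-1])
        value = "".join("\\" + c if c in SPECIALS else c for c in inner)

    return attr + "=" + value
```

With this sketch, `O = "Foo"' and `2.5.4.10=Foo' both come out as
`o=Foo', and `cn="x,y"' comes out as `cn=x\,y'.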