
Normalizing directory data (Was: Distinguished name format & RFC 1779)

Kurt D. Zeilenga writes:
> likely the exact form of the DN provided with the ADD operation will
> be stored in the directory and that this would be returned by the
> search (with minimal rewriting to conform to RFC2253, Section 4).

I think it would be a bad idea to store unnormalized data by default.
Here are my reasons.  Other opinions?

(For those who don't know what I mean by normalization, see the list
at the end of this article.)

- Speaking as the maintainer of a directory with user data, users
  come up with an impressive number of ugly ways to format data.

- A piece of data is (usually) entered once but read many times.  Even
  if normalization annoys the person who wanted to store `O = "Foo"',
  that is only one person.  The unnormalized form will look ugly to the
  many who see it in the middle of a list where most entries use the
  format `o=Bar'.

- Directory browsers will automatically handle data more consistently
  if it's normalized, e.g. `O = "Foo"' and `' sort before
  `o=Bar' but the normalized `o=Foo' doesn't.  (Of course clients
  _could_ put some effort into extracting the "real" text before
  sorting it, but most don't bother.)
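
A small Python sketch of the sorting point.  It assumes the client
sorts the raw strings by code point, which is what a plain sorted()
does; the two entries are made up for illustration.

```python
# Made-up RDN strings as a browsing client might receive them.
raw = ['o=Bar', 'O = "Foo"']
normalized = ['o=Bar', 'o=Foo']

# A plain string sort compares code points, so 'O' (0x4F) comes
# before 'o' (0x6F) and the unnormalized entry jumps to the front.
raw_sorted = sorted(raw)
normalized_sorted = sorted(normalized)

print(raw_sorted)         # ['O = "Foo"', 'o=Bar']
print(normalized_sorted)  # ['o=Bar', 'o=Foo']
```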

A negative effect of normalization is that a naive client which stores
`o = "Foo"' and reads it back may not recognize the returned `o=Foo'.
That trouble exists to some degree even without normalization, though.
E.g. the client tries to store `o = "Foo"' and is told that it (`o=Foo')
already exists, yet it won't find `o = "Foo"' in the returned data.

It should be easy to make the degree of normalization configurable,
though.  Just run the data through some function pointers before and
after storing/indexing/comparing the information and add a slapd.conf
keyword to switch to a different degree of normalization.
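
A rough sketch of that idea, in Python rather than slapd's C, with
callables standing in for function pointers.  The level names "none"
and "spaces" and the helper pick_normalizer() are hypothetical; they
are not existing slapd.conf keywords or slapd functions.

```python
from typing import Callable, Dict

Normalizer = Callable[[str], str]


def identity(value: str) -> str:
    """Level 'none': keep the value exactly as it was given."""
    return value


def squeeze_spaces(value: str) -> str:
    """Level 'spaces': trim the ends and collapse whitespace runs."""
    return " ".join(value.split())


# Hypothetical config keyword values mapped to the function that is
# run on the data before storing, indexing or comparing it.
NORMALIZERS: Dict[str, Normalizer] = {
    "none": identity,
    "spaces": squeeze_spaces,
}


def pick_normalizer(level: str) -> Normalizer:
    """Return the function for the configured normalization level."""
    try:
        return NORMALIZERS[level]
    except KeyError:
        raise ValueError(f"unknown normalization level: {level!r}")


store_hook = pick_normalizer("spaces")
print(store_hook("x   y"))  # x y
```

The rest of the server would only ever call store_hook(), so changing
the degree of normalization is a matter of changing one config value.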

I'm not sure how much normalization I'd like by default, though.
Something like this, I guess:

- Change the case of attribute names to what slapd.conf says
  (e.g. "CN" -> "cn").

- When an unknown attribute name or option is seen, remember it
  and change the case of future versions of the same name/option
  to that of the first.

- Delete superfluous spaces (e.g. `o = x   y' -> `o=x y')

- Delete superfluous quotes (e.g. o="x" -> o=x)

- Maybe replace OIDs with names, and "alternate" attribute names
  with the primary names (e.g. organizationName -> o).

- Maybe store with consistent quoting rules (always change
  `cn=x\,y' to `cn="x,y"', or always do the opposite change).
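
The first four items could be sketched roughly as below (Python, for
brevity).  This is only an illustration under stated assumptions: it
handles a single `type=value' pair, SCHEMA_NAMES is a made-up stand-in
for the names declared in slapd.conf, runs of spaces are taken to be
insignificant as in the `o = x   y' example, and quoted or escaped
values that need their quoting are passed through untouched.

```python
import re

# Made-up stand-in for attribute names declared in slapd.conf.
SCHEMA_NAMES = {"cn": "cn", "o": "o"}

# Unknown names: lower-cased name -> spelling seen first.
_first_seen = {}

# Characters that force quoting or escaping of a value.
SPECIALS = set(',=+<>#;"\\\r\n')


def canonical_attr(name: str) -> str:
    """Use the schema's case, else the case of the first occurrence."""
    key = name.lower()
    if key in SCHEMA_NAMES:
        return SCHEMA_NAMES[key]
    return _first_seen.setdefault(key, name)


def normalize_rdn(rdn: str) -> str:
    """Normalize one 'type=value' pair (multi-valued RDNs not handled)."""
    attr, sep, value = rdn.partition("=")
    if not sep:
        raise ValueError(f"no '=' in {rdn!r}")
    attr = canonical_attr(attr.strip())
    value = value.strip()

    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        inner = value[1:-1]
        # Quotes are needed if they protect special characters or
        # leading/trailing spaces; in that case leave the value alone.
        if SPECIALS & set(inner) or inner != inner.strip():
            return f"{attr}={value}"
        value = inner  # superfluous quotes: drop them

    # Collapse runs of spaces, but do not touch backslash-escaped values.
    if "\\" not in value:
        value = re.sub(r" +", " ", value)
    return f"{attr}={value}"


print(normalize_rdn('O = "Foo"'))   # o=Foo
print(normalize_rdn('o = x   y'))   # o=x y
print(normalize_rdn('cn="x,y"'))    # cn="x,y"
```

Replacing OIDs and alternate names, and choosing one consistent
quoting style, would be further passes on top of this.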