Re: recommendations for data population

> We are populating our LDAP database automatically from various other 
> sources.  Unfortunately, some of these other sources aren't always well 
> behaved.  For example, just last night I found out someone didn't have a 
> first name.  (note that that's supposed to be a requirement.. so why we 
> got data without a first name is beyond me)  Anyway, here's my dilemma.  I 
> am planning on running a script to perform updates regularly (probably 
> once a day for now) which generates the ldif files, sorts them, does a 
> diff with the last sorted ldif file, and uploads the changes via 
> ldapmodify.  However, since I consider gn a required field, and it wasn't 
> present, the ldapmodify croaked ...

You would seem to have the most control over the process you describe
at the point where your script creates the LDIF file that defines the
repository changes. Is it not possible, at that point, to flag (and/or
move out of the process) any incompatible source records for later
inspection?
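A minimal sketch of that flag-and-set-aside step, in Python. The record
shape (a dict of attribute name to value), the REQUIRED_ATTRS set and the
function names are all hypothetical; substitute whatever your script
already parses and whatever your schema actually requires.

```python
import json
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record shape: attribute name -> value, as parsed from a source.
Record = Dict[str, str]

# Assumed set of attributes the repository schema treats as mandatory.
REQUIRED_ATTRS: Set[str] = {"uid", "givenName", "sn"}


def partition_records(
    records: Iterable[Record], required: Set[str] = REQUIRED_ATTRS
) -> Tuple[List[Record], List[Tuple[Record, Set[str]]]]:
    """Split records into those fit for LDIF generation and those to hold back.

    A record is held back if any required attribute is absent or blank.
    Held-back records come paired with the set of attributes they lack.
    """
    accepted: List[Record] = []
    quarantined: List[Tuple[Record, Set[str]]] = []
    for rec in records:
        missing = {a for a in required if not rec.get(a, "").strip()}
        if missing:
            quarantined.append((rec, missing))
        else:
            accepted.append(rec)
    return accepted, quarantined


def write_rejects(path: str, quarantined: List[Tuple[Record, Set[str]]]) -> None:
    """Write held-back records, one JSON object per line, for later inspection."""
    with open(path, "w", encoding="utf-8") as fh:
        for rec, missing in quarantined:
            fh.write(json.dumps({"missing": sorted(missing), "record": rec}) + "\n")
```

Only the accepted list goes on to LDIF generation and ldapmodify; the
rejects file is what a human looks at the next morning, so one bad source
record no longer stops the whole run.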

Indeed, there is a body of thought (though I may be the only body
holding such a thought) which says that you could diff the current
source data records against those in the previous source data set to
get a set of source diffs, then create a set of LDIFs from those diffs
and use them to modify the LDAP repository.
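A rough Python sketch of diffing at the source level first and only then
producing LDIF change records. Everything named here is an assumption for
illustration: the snapshot layout (source records keyed by a stable login
id), the ATTR_MAP from source field names to LDAP attributes, BASE_DN and
the inetOrgPerson object class. It also assumes single-valued attributes
and values that need neither DN escaping nor base64 encoding.

```python
from typing import Dict, List

SourceRecord = Dict[str, str]        # field name -> value, as the source supplies it
Snapshot = Dict[str, SourceRecord]   # keyed by a stable unique id (hypothetical)

# Hypothetical mapping from source field names to LDAP attribute types.
# Source fields not listed here are ignored when building LDIF.
ATTR_MAP = {"login": "uid", "first": "givenName", "last": "sn", "name": "cn"}
BASE_DN = "ou=People,dc=example,dc=com"  # hypothetical


def to_ldap(rec: SourceRecord) -> Dict[str, str]:
    """Keep only mapped, non-empty fields, renamed to LDAP attribute types."""
    return {ATTR_MAP[f]: v for f, v in rec.items() if f in ATTR_MAP and v}


def dn_for(key: str) -> str:
    # Assumes the key is safe to use unescaped in a DN.
    return f"uid={key},{BASE_DN}"


def source_diff_to_ldif(old: Snapshot, new: Snapshot) -> List[str]:
    """Diff two source snapshots, then emit one LDIF change record per change."""
    records: List[str] = []
    for key in sorted(new.keys() - old.keys()):          # new source records
        lines = [f"dn: {dn_for(key)}", "changetype: add", "objectClass: inetOrgPerson"]
        lines += [f"{a}: {v}" for a, v in sorted(to_ldap(new[key]).items())]
        records.append("\n".join(lines))
    for key in sorted(old.keys() - new.keys()):          # vanished source records
        records.append(f"dn: {dn_for(key)}\nchangetype: delete")
    for key in sorted(old.keys() & new.keys()):          # possibly changed records
        if old[key] == new[key]:
            continue  # source record unchanged: nothing to do
        before, after = to_ldap(old[key]), to_ldap(new[key])
        mods: List[str] = []
        for a in sorted(before.keys() | after.keys()):
            if a not in after:
                mods += [f"delete: {a}", "-"]
            elif before.get(a) != after[a]:
                mods += [f"replace: {a}", f"{a}: {after[a]}", "-"]
        if mods:  # a change confined to unmapped source fields yields no LDIF
            records.append("\n".join([f"dn: {dn_for(key)}", "changetype: modify", *mods]))
    return records
```

Joining the result with blank lines gives a file that ldapmodify can take
with -f. Because the comparison happens on whole source records, a record
whose "first" field has gone blank shows up as a changed source record
before it becomes a "delete: givenName" modification, which is the point
at which you would want to catch it.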

For example, seeing the full source record of the erroneous entry above
might make it obvious where the loss of the cn-producing information
arose. The same observation would not necessarily be obvious from your
LDIF file, because in creating LDIF entries you will have stripped out
any source information that is not of interest to the repository schema.


