
Re: recommendations for data population

--On Wednesday, July 14, 2004 5:43 PM +0100 Kev Buckley <k.buckley@lancaster.ac.uk> wrote:

We are populating our LDAP database automatically from various other
sources.  Unfortunately, some of these other sources aren't always well
behaved.  For example, just last night I found out someone didn't have a
first name.  (Note that that's supposed to be a requirement, so why we
got data without a first name is beyond me.)  Anyway, here's my dilemma.
I am planning on running a script to perform updates regularly
(probably once a day for now) which generates the LDIF files, sorts
them, does a diff with the last sorted LDIF file, and uploads the
changes via ldapmodify.  However, since I consider gn a required field,
and it wasn't present, the ldapmodify croaked ...

You would seem to have the most control over the process you describe at the point where the script creates the LDIF file you intend to use to define the repository changes. Is it not possible to flag, and/or move out of the process, any incompatible source records for later inspection?
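As a rough illustration of that flagging step, here is a minimal Python sketch that splits source records into those safe to turn into LDIF and those set aside for inspection. The record layout (a dict per person) and the REQUIRED attribute list are assumptions made for the example, not anything taken from the original script.

```python
# Hypothetical list of attributes the directory schema treats as required.
REQUIRED = ("uid", "gn", "sn")

def partition_records(records, required=REQUIRED):
    """Split source records into (valid, rejected).

    Each rejected item is a (record, missing_attributes) pair so the
    full source record is still available for later inspection.
    """
    valid, rejected = [], []
    for rec in records:
        # An attribute counts as missing if absent, None, or blank.
        missing = [a for a in required if not str(rec.get(a) or "").strip()]
        if missing:
            rejected.append((rec, missing))
        else:
            valid.append(rec)
    return valid, rejected

records = [
    {"uid": "a", "gn": "Ann", "sn": "Lee"},
    {"uid": "b", "gn": "", "sn": "Ray"},   # no first name
]
valid, rejected = partition_records(records)
```

Only the valid list would go on to LDIF generation; the rejected list could be written to a log or mailed to whoever owns the source system.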

Indeed, there is a body of thought (though I may be the only body
holding such a thought) which says that you could be diffing the
current source data records with those in the last source data set, to
find a set of source diffs, from which you then create a set of LDIFs,
and it is with those that you modify the LDAP repository.
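A source-level diff of that kind might look something like the sketch below. It assumes each data set has already been loaded into a dict keyed by some stable identifier (the key and field names here are made up for illustration), and it keeps the full source records in the result so nothing is lost before the LDIF stage.

```python
def diff_sources(previous, current):
    """Compare two source data sets, each a dict of key -> full record.

    Returns (added, removed, changed), where changed maps a key to the
    (old_record, new_record) pair.
    """
    added = {k: current[k] for k in current.keys() - previous.keys()}
    removed = {k: previous[k] for k in previous.keys() - current.keys()}
    changed = {
        k: (previous[k], current[k])
        for k in current.keys() & previous.keys()
        if previous[k] != current[k]
    }
    return added, removed, changed

# Hypothetical snapshots from yesterday's and today's extracts.
yesterday = {"a": {"gn": "Ann"}, "b": {"gn": "Bob"}}
today = {"a": {"gn": "Anne"}, "c": {"gn": "Cy"}}
added, removed, changed = diff_sources(yesterday, today)
```

LDIF add, delete, and modify records would then be generated from the three result sets rather than from a textual diff of two LDIF files.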

For example, it might be obvious, from seeing the full source record
of the erroneous entry above, where the loss of cn-producing info
arose.  Such an observation might not necessarily be obvious once you
have produced your LDIF file, because in creating LDIF entries you
will have removed information in the source that is not of interest
to the repository schema.

This is essentially how we feed data into our directory service --

All the shared data from various systems (PeopleSoft, Oracle Financials, etc.) gets pushed into a central data repository we call the registry. The registry has an XML doc service that is used to display the data. We have a process that looks at the data in the doc service, compares it to the data in the directory, and then writes any differences to the directory. Data changes trigger events, which is how our directory feeder process knows which records to check for diffs. The feeder process has rules in place, based on our schema, to determine whether the resulting entries are valid.
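To give a flavour of the compare-and-write step, here is a small Python sketch that takes the attributes an entry should have and the attributes it currently has, and emits an LDIF modify record for the differences. This is only an illustration and not our actual feeder code; the function name, the DN, and the attribute data are invented, and it assumes plain string values that need no base64 encoding.

```python
def modify_ldif(dn, current, desired):
    """Build an LDIF 'changetype: modify' record turning current into desired.

    Both arguments map attribute name -> list of string values.
    Returns an empty string when the two are already identical.
    """
    ops = []
    for attr in sorted(set(current) | set(desired)):
        old, new = current.get(attr, []), desired.get(attr, [])
        if sorted(old) == sorted(new):
            continue  # no change for this attribute
        if not new:
            ops.append([f"delete: {attr}"])
        elif not old:
            ops.append([f"add: {attr}"] + [f"{attr}: {v}" for v in new])
        else:
            ops.append([f"replace: {attr}"] + [f"{attr}: {v}" for v in new])
    if not ops:
        return ""
    lines = [f"dn: {dn}", "changetype: modify"]
    for op in ops:
        lines.extend(op)
        lines.append("-")  # separator between modification operations
    return "\n".join(lines) + "\n"

# Hypothetical entry: directory state versus what the registry says.
in_directory = {"gn": ["An"], "mail": ["a@example.com"]}
from_registry = {"gn": ["Ann"], "title": ["Dev"]}
print(modify_ldif("uid=a,dc=example,dc=com", in_directory, from_registry))
```

A validity check against the schema rules would run on the desired entry before any such record is written, so an entry missing a required attribute never reaches the directory.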


Quanah Gibson-Mount
Principal Software Developer
ITSS/Shared Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html