
Re: Producing a web directory with LDAP

At 9:51 -0500 30/1/03, Mark H. Wood wrote:
On Wed, 29 Jan 2003, Adam de Zoete wrote:
 Firstly, let me say that I'm new to this list and new to LDAP. However,
 I'm looking for a flat-file (or non-RDBMS) solution to produce a web
 directory and am prepared to climb some learning curve in order to
 get the data into an appropriate format.

A number of your concerns are not specific to OpenLDAP, but rather relate to LDAP in general or even to directory services in general. You might try asking the broader community at LDAP@UMich.Edu. Several of us follow that list as well.

Thanks, Mark, I'll try that.

> Firstly, could I be looking in the right place with LDAP? Would you
 have any other suggestions to build something like this?....

 The reason why I am looking at LDAP is because SQL tables can't build
 the hierarchical structure of a directory easily. I think I need an
 un-relational solution, with greater speed and flexibility.

Relational tables can represent hierarchical data, but it's not quite natural. Cut open a lot of directory services and you will find something very like an RDBMS inside, thickly overlaid with code to present a hierarchical view.

Say my directory is 6 categories deep. In order to find the path to the root category with an RDBMS, I'd have to perform 5 or 6 searches just to plot a breadcrumb trail. That's exactly what I'm trying to avoid.
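
As a rough illustration of why a directory avoids those repeated lookups: an entry's distinguished name (DN) already spells out every ancestor, so the breadcrumb trail can be read from the DN of the entry itself. The sketch below assumes each category is an `ou` entry under a hypothetical base DN `ou=Directory,dc=example,dc=com`, and it splits the DN naively on commas, so it would not cope with escaped commas inside a value.

```python
def breadcrumb_from_dn(dn, base_dn):
    """Return category names ordered from the top category down to the entry.

    Assumes base_dn is non-empty and that no RDN value contains an
    escaped comma (a real DN parser would be needed for that case).
    """
    rdns = [part.strip() for part in dn.split(",")]
    base = [part.strip() for part in base_dn.split(",")]

    # The entry must sit underneath the base DN (compared case-insensitively).
    if [r.lower() for r in rdns[-len(base):]] != [b.lower() for b in base]:
        raise ValueError("entry is not under the base DN")

    # Everything in front of the base DN is a category level.
    category_rdns = rdns[:len(rdns) - len(base)]

    # A DN lists the most specific component first, so reverse it
    # and keep only the value part of each "attr=value" pair.
    return [rdn.split("=", 1)[1] for rdn in reversed(category_rdns)]


# Hypothetical entry five categories deep.
EXAMPLE_DN = ("ou=Trout,ou=Fly Fishing,ou=Fishing,ou=Sports,"
              "ou=Recreation,ou=Directory,dc=example,dc=com")
EXAMPLE_BASE = "ou=Directory,dc=example,dc=com"
```

Calling `breadcrumb_from_dn(EXAMPLE_DN, EXAMPLE_BASE)` yields the trail from "Recreation" down to "Trout" without any further search.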

> I'm looking to build a web directory that can sit on one main server
 with localized replicas in other countries. At the moment I am
 imagining running this off either Linux or OS X boxes.

This much could be done using something like rdist to synchronize snapshots of a master set of pages, or of datasets from which dynamic pages are constructed.

That sounds ideal for localized deployment.

> I have some relational data that is based on the directory. In some
 articles online it claims that LDAP can't handle relational data, or
 that it's simply not tailored for updating data on the fly. Can you
 suggest ways of handling various relational bits of data, for one
 example, hit-counters? Or any data that needs to be updated
 regularly? (None of the data needs to be accurate to the second or is
 time sensitive)

Again, you *could* make a directory look like a pool of flat tables if you're willing to do sufficient violence to the natural modes of directory access. It's probably not something you want to do.

Correct, I want to follow those 'natural modes of directory access' you mention. Is LDAP the only protocol that offers this?

 But the real
difficulty lies precisely in things like your example, where the new value
of a field depends on the previous value of that field.  Without a
transaction mechanism, there is no way to guarantee that the value you
fetched is still the current value when you submit an update.
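
The race described above can be shown with a small simulation. This is only a sketch: `Entry`, `read` and `write` are hypothetical stand-ins for fetching and replacing an attribute value, not calls from any LDAP library.

```python
class Entry:
    """Stand-in for a directory entry that holds a hit counter."""

    def __init__(self):
        self.hits = 0


def read(entry):
    # Stand-in for a search that fetches the current counter value.
    return entry.hits


def write(entry, value):
    # Stand-in for a modify that replaces the counter value outright.
    entry.hits = value


def interleaved_increments(entry):
    """Two clients each try to add one hit, with their steps interleaved."""
    seen_by_a = read(entry)       # client A fetches the value
    seen_by_b = read(entry)       # client B fetches the same value
    write(entry, seen_by_a + 1)   # A writes back its increment
    write(entry, seen_by_b + 1)   # B overwrites A's update with a stale base
    return entry.hits


def sequential_increments(entry):
    """The same two increments when each read-then-write runs undisturbed."""
    write(entry, read(entry) + 1)
    write(entry, read(entry) + 1)
    return entry.hits
```

With a fresh `Entry`, the interleaved version ends with a counter of 1 although two hits occurred, while the sequential version ends with 2. For hit counters that need not be exact, an occasional lost increment may be acceptable.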

But I'm not particularly worried about that, because 90% of the updated data is not time sensitive. I just have to update values frequently.

updates are expected and tolerated in a directory, but intolerable when
doing things such as updating a counter.  Directory services assume that
the sources of all updates are external to the directory, while a DBMS
usually offers extensive features for dealing with updates based on
previous values of the same data.  A directory typically has transactional
mechanisms inside it, but does not expose those mechanisms for us to use
for our own purposes.

 The content should be more than a few million entries and I need fast
 and flexible multiple field searching.

Please specify what you mean by "multiple field searching". If you mean that you want to define a complex relationship among fields which specifies a match set, then a relational language would probably be a much better fit to your needs.

No, I would like a single search to look at several fields: it should be performed on Title, Description and Keyword at once. I imagine that's easy to do by partitioning the searchable data.
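
In LDAP terms, a search across several fields at once is usually written as a single OR filter. The sketch below builds such a filter string; the attribute names `title`, `description` and `keyword` are assumptions about the schema, and the escaping follows the rules of RFC 4515 for the characters that are special in filter values. Whether substring matches like these are fast on millions of entries would depend on how the server indexes those attributes.

```python
def escape_filter_value(value):
    """Escape characters that are special in an LDAP search filter (RFC 4515)."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\0": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def multi_field_filter(term, attributes=("title", "description", "keyword")):
    """Build an OR filter that matches `term` as a substring of any attribute."""
    escaped = escape_filter_value(term)
    clauses = "".join(f"({attr}=*{escaped}*)" for attr in attributes)
    return f"(|{clauses})"
```

For example, `multi_field_filter("fly fishing")` produces `(|(title=*fly fishing*)(description=*fly fishing*)(keyword=*fly fishing*))`, which could then be passed to whatever LDAP client library is in use.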

> Can anyone suggest whether LDAP is firstly capable of the above and
 secondly the correct format for holding/searching such
 information in such quantity?

I would not choose a directory to hold the entire set of data for the sort of service I think you envision. But since it's all hidden behind HTTP servers anyway, that is not necessary. You can generate pages which represent a view across several data repositories chosen to fit various subsets of the data. If the hierarchical portion of the data changes slowly, you could keep that portion in a directory and use undisplayed attributes of directory objects to rapidly locate corresponding records in a DBMS. Whether this is a good idea depends on the effort required to maintain consistency among the repositories, i.e. the proportion of changes which affect more than one repository.
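
One way to picture that split is sketched below. The directory is represented by a plain dictionary rather than a real LDAP server, the hidden linking attribute `recordId` and the `hits` table are hypothetical names, and SQLite stands in for whatever DBMS would hold the frequently updated data.

```python
import sqlite3

# Stand-in for directory entries; a real system would fetch these over LDAP.
# "recordId" is a hypothetical attribute that is stored but never displayed.
DIRECTORY = {
    "ou=Trout,ou=Fishing,ou=Directory,dc=example,dc=com": {
        "description": ["Trout fishing sites"],
        "recordId": ["42"],
    },
}


def open_stats_db():
    """Create an in-memory database holding the volatile, relational data."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE hits (record_id INTEGER PRIMARY KEY, count INTEGER NOT NULL)"
    )
    db.execute("INSERT INTO hits VALUES (42, 0)")
    return db


def record_hit(db, dn):
    """Follow the hidden attribute from a directory entry to its DBMS row."""
    record_id = int(DIRECTORY[dn]["recordId"][0])
    # The increment happens inside the database, in one transaction,
    # so the new value never depends on a stale value read by the client.
    with db:
        db.execute(
            "UPDATE hits SET count = count + 1 WHERE record_id = ?", (record_id,)
        )
    row = db.execute(
        "SELECT count FROM hits WHERE record_id = ?", (record_id,)
    ).fetchone()
    return row[0]
```

The slowly changing hierarchy stays in the directory, while counters and similar values live where updates based on previous values are well supported.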

One question you should try to answer is:  do you have a greater need for
accurate answers or for rapid response?

90% needs rapid response; the remaining 10% needs to be accurate and is more time sensitive.

 The underlying assumption for a
distributed directory service is that occasional inconsistency is not a
big problem, and that answers will converge toward correctness fairly
rapidly.  This is okay for looking up telephone numbers but would be
disastrous for bank balances.

This inconsistency margin falls into the same ratio as above. But none of this data is remotely as sensitive as bank balances.

> Can it be easily integrated into other networks/systems?
 What is your suggestion for the ideal middle-ware for use with LDAP?
 Should I use OpenLDAP Server or use the built in LDAPv3 in OS X 10.2?
 Know of any examples of LDAP on the web I could look at?
 Is the learning curve steep?

The steepest portion of the learning curve seems to be associated with authentication and privacy. Of course, if your data are public knowledge, then there's no question of authentication or privacy.

I think my real problem is the fact that I am dealing with relational and un-relational data. 90% is un-relational and would benefit from LDAP, 10% is relational (almost like a shopping cart, but the contents aren't that important). The data is intermingled, i.e. I can't display un-relational data (from LDAP) without having to relate to the relational stuff.

However, I think I would be shooting myself in the foot building a directory in an RDBMS. I am hoping that as I get a clearer picture of this development, I should be able to envisage a way of using LDAP and SQL to form something that could work between the two.

That way I would have the speed and flexibility of LDAP, with SQL offering the more human, relational level.


// Adam de Zoete
\\ AtoZ@dnet.co.uk