Re: HDB physical files separate from DIT

We were simply evaluating our options for an OpenLDAP redesign (read:
upgrades).  Here's a reply from one of our engineers:

Thank you, Howard, for your reply. I would like to know how many BDB
files you had to achieve this great performance with 150 million
entries.

My request is to collapse the DIT structure for our customers. I need
a single "ou=people" container that can accommodate all customer
entries in one logical context. This container has to be able to
grow to many, many millions of entries. Currently we are thinking of
a number between 20MM and 50MM. However, because of scale limitations
in the past, we had to split our people container into 14 sub-OUs,
which are named after cities.
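
To make the intended collapse concrete, here is a rough sketch of the
DN rewrite it implies: dropping the city-level RDN so that every entry
sits directly under "ou=people". The function name, the city names,
and the dc=example,dc=com suffix are placeholders for illustration,
not our real layout, and the comma splitting is a simplification that
does not cover every RFC 4514 escaping case.

```python
import re

# Split on commas that are not preceded by a backslash. This is a
# simplification and does not handle every RFC 4514 escaping case.
_RDN_SPLIT = re.compile(r"(?<!\\),")


def flatten_people_dn(dn, people_rdn="ou=people"):
    """Drop the single RDN sitting between an entry and the people container.

    Hypothetical helper: a DN that does not have exactly one RDN between
    the entry and the people container is returned unchanged.
    """
    rdns = [part.strip() for part in _RDN_SPLIT.split(dn)]
    lowered = [r.lower() for r in rdns]
    if people_rdn.lower() not in lowered:
        return dn
    idx = lowered.index(people_rdn.lower())
    # Expected shape: [entry RDN, city RDN, ou=people, ...suffix]
    if idx != 2:
        return dn
    return ",".join([rdns[0]] + rdns[idx:])


if __name__ == "__main__":
    # "ou=chicago" is a made-up city container.
    print(flatten_people_dn("uid=jdoe,ou=chicago,ou=people,dc=example,dc=com"))
```

One thing a rewrite like this would have to deal with: two entries
with the same RDN under different city containers would map to the
same flattened DN, so a migration would need some check for
collisions first.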

A commercial LDAP vendor is highlighting one of its features that
would allow us to have an arbitrary number of physical files belonging
to the same OU. We would be able to reduce the physical file size and
gain performance through shorter searches, smaller indices, and
whatever else benefits from small files.  It would greatly reduce
our administrative cost/burden and reduce some costly moving of
people/entries in our environment.  Also, a lot of our LDAP-dependent
applications could be simplified. There is no need for our business to
know where a customer is coming from. That information has no value
to us, but we are unable to get rid of it.

To expand a little bit, splitting the tree allowed us to keep database
files small and recoverable from another source, and we could split
the database across different disks for I/O gains, although more
intelligent indexing could probably help a lot in these respects.


On 1/4/06, Howard Chu <hyc@symas.com> wrote:
> matthew sporleder wrote:
> > I'm trying to figure out if I can abstract a database's logical layout
> > (DIT) from being bound to specific files per 'database' definition,
> > and I'm not seeing any good tips in the berkeley db tuning docs.
> >
> > For example:
> >
> > I have ou=region1,dc=example,dc=com and ou=region2,dc=example,dc=com.
> > Right now the only options I see of separating these are to define
> > them in different 'database' sections.  I would, however, like to have
> > them both defined in one database, but allow the actual database files
> > (dn2id, etc) to be split in terms of size, or other definables.
> > (usage stats, whatever)
> >
> > Am I missing something obvious in DB_CONFIG like "max_file_size"?
> >
> No, there's no such feature. Nor does it sound like it would be useful,
> given what little you've described so far. Even if you allowed a
> particular DB file to be split, all of the files would still occupy
> space in the single BDB environment cache. In fact, since each DB handle
> also consumes cache space, splitting files would consume more resources
> than otherwise. Given that we've benchmarked a directory with 150
> million entries consuming about a terabyte of disk space, using the
> current back-bdb code, getting tens of thousands of operations per
> second throughput, I don't see any particular reason to bother with
> splitting the files. Perhaps if you explained what real problem you're
> trying to solve, it might make a bit more sense.
> --
>   -- Howard Chu
>   Chief Architect, Symas Corp.  http://www.symas.com
>   Director, Highland Sun        http://highlandsun.com/hyc
>   OpenLDAP Core Team            http://www.openldap.org/project/