
Re: OpenLDAP system architecture?



On Thu, 2008-01-24 at 15:08 -0800, Howard Chu wrote:

> In my experience, 4 million objects (at around 3KB per entry) is near the 
> limit of what will fit into 16GB of RAM. Sounds like you need a server with 
> more than 16GB if you want to keep growing and not be waiting on disks.

I was going through the caching discussion in section 19.4 at
<http://www.openldap.org/doc/admin24/tuning.html#Performance%20Factors>,
which talks about how much RAM to devote to what type of cache, but it
had not occurred to me to try to just throw the whole thing (of this
size) into RAM and be done with it.
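For anyone following along, the knobs that section talks about look roughly like this; the numbers below are purely illustrative, not recommendations, and would need to be sized to the actual data:

```
# slapd.conf (back-bdb/back-hdb) -- entry and IDL caches, sized in entries
cachesize       4000000      # cache (nearly) all 4M entries, if RAM allows
idlcachesize    12000000     # IDL cache; commonly sized ~3x cachesize

# DB_CONFIG in the database directory -- BDB's own page cache, sized in bytes
set_cachesize   4 0 1        # 4 GB, 0 extra bytes, in 1 contiguous region
```

With an entry cache large enough to hold the whole DIT, reads should rarely touch the disk at all.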

> The single-master constraints on OpenLDAP were never about performance. Even 
> with OpenLDAP 2.2 the concurrent read/write rates for back-bdb are faster than 
> any other directory server. It's always been about data consistency, and the 
> fact that it's so easy to lose it in a multi-master setup.

I wasn't concerned so much about single-master versus multi-master.  I
was thinking more about the issue that a very read-intensive workload
did not mix well with a lot of writes going directly to the same
server, whereas bulk writes arriving from slurpd (at the time) could be
handled relatively efficiently even under high read intensity.

Hence the reason for ensuring that all updates went directly and only
to the master, that the master was replicated out to the slaves via
slurpd, and that the slaves handled only reads.


Of course, the modern architecture doesn't use slurpd; I was just
wondering whether it might make more sense, from a scalability
perspective, to have a similar data-flow architecture.
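For concreteness, the syncrepl-era version of that same data flow would be something like the following on each read-only replica (the hostnames and DNs here are made up for illustration):

```
# slapd.conf on a consumer/replica -- pulls from the master,
# and refers any write attempts back to it
syncrepl        rid=001
                provider=ldap://master.example.edu
                type=refreshAndPersist
                retry="60 +"
                searchbase="dc=example,dc=edu"
                bindmethod=simple
                binddn="cn=replicator,dc=example,dc=edu"
                credentials=secret
updateref       ldap://master.example.edu
```

The updateref line is what preserves the old slurpd-style discipline: clients that try to write to a replica get referred to the master instead.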

> You've been brainwashed by all the marketing lies other LDAP vendors tell 
> about multi-master replication. Multi-master has no relation to performance.

Again, I wasn't looking at single-master versus multi-master.  If I gave
that impression, I'm sorry.
 
> It's only about fault tolerance and high availability. No matter whether you 
> choose a single-master or a multi-master setup, with the same number of 
> machines, the same number of writes must be propagated to all servers, so the 
> overall performance will be the same.

I'm confused.

So there's no performance benefit to doing bulk writes via syncrepl to
the slaves, as opposed to individual writes to the master(s) via
ldapadd?  Then why have syncrepl at all, instead of just having
everything handled by ldapadd?

I understand the consistency argument for single-master versus
multi-master; I'm just trying to find a way to partition the problem
space for performance reasons, in addition to any consistency reasons.

> That's a pointless question. The right question is - how fast do you need it 
> to be? What load are you experiencing now, what constitutes a noticeable 
> delay, and how often do you see those?

Good questions, but I'm not sure I've got the answers.  I know that our
OpenLDAP directory system is going to be used as a critical component of
a campus-wide authentication system, and the target for that system is
to handle at least hundreds of authentications per second.  The problem
is, I don't know what that translates to in terms of OpenLDAP
operations per second, or what the mix of reads relative to writes will
be.

And the authentication system is just one of the many consumers of data
from the OpenLDAP system.


So, at the very least, I would be surprised if the OpenLDAP system
didn't have to handle at least thousands of read operations per second,
and at peak it may also have to handle thousands of write operations
per second.  A single student might have dozens or a hundred or more
data elements to be written or updated; they may be registering for a
half-dozen classes or more at once; and each class might have hundreds
of data elements that also need to be updated.  The domino effect of a
single high-level entity being added or modified could potentially
result in hundreds or thousands of smaller operations.
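Just to make that fan-out concrete, here's a trivial back-of-envelope sketch; every number in it is a guess pulled from the paragraph above, not a measurement:

```python
# Back-of-envelope estimate of write fan-out per registration event.
# Every number here is an illustrative assumption, not a measurement.

ATTRS_PER_STUDENT = 50    # "dozens or a hundred or more" data elements
CLASSES_PER_EVENT = 6     # "a half-dozen classes or more" at once
ATTRS_PER_CLASS = 200     # "hundreds of data elements" per class

def writes_per_registration(student_attrs=ATTRS_PER_STUDENT,
                            classes=CLASSES_PER_EVENT,
                            class_attrs=ATTRS_PER_CLASS):
    """Worst case: every touched data element becomes one write."""
    return student_attrs + classes * class_attrs

print(writes_per_registration())  # 50 + 6 * 200 = 1250 writes
```

Even with these modest guesses, one student registering fans out into over a thousand individual writes, which is why I suspect peak write rates could reach into the thousands per second.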


But right now, I'm just guessing.

I haven't actually seen the systems yet, and I don't know what the
schemas look like, so I can only speak from my limited past experience
with OpenLDAP, where relatively simple uses could result in dozens of
data elements for a single entity.  I also don't know how OpenLDAP
handles those kinds of things internally.

> > Is CPU more important, or RAM, or disk space/latency?
> 
> If you have enough RAM, disk latency shouldn't be a problem. Disk space is so 
> cheap today that it should never be a problem. CPU, well, that depends on your 
> performance target.

I'm not so worried about disk space per se.  I would be more concerned
about disk latency and throughput being potential bottlenecks.

> Generally I like the idea of having compact/simple slapd configs spread all 
> over. With the old slapd.conf that would have been rather painful to 
> administer though. Also in general, more moving parts means more things that 
> can break.

Much appreciated.  Thanks!

-- 
Brad Knowles <b.knowles@its.utexas.edu>
Sr. System Administrator, UT Austin ITS-Unix
COM 24 | 5-9342