
Re: performance issue behind a load balancer 2.3.32



On Tue, Jul 21, 2009 at 01:54:25PM -0700, Quanah Gibson-Mount wrote:
> --On Tuesday, July 21, 2009 4:51 PM -0400 "Clowser, Jeff" <jeff_clowser@fanniemae.com> wrote:
> > Do you have any facts/numbers to back this up?  I've never seen F5's
> > slow things down noticably.
>
> We've had F5's be the root of the problem with several clients who load  
> balanced their LDAP servers, and pointed postfix at the F5 for delivery.  
> They added just a few milliseconds of time to each LDAP query, but that 
> was enough to completely back up their mail delivery system.  Removing 
> the F5 from the picture allowed mail to flow smoothly, no more problems.

I can't speak for any other clients that Quanah may be referencing, but we
experienced this with our Zimbra deployment. However, I emphatically
disagree with his stance against running LDAP services behind a hardware
load balancer.

We have F5 BigIPs in front of nearly every service we provide, for the
reasons cited by others. In the past, we've had load balancers from Cisco
(CSS) and Alteon (the ACEdirector, IIRC, now owned by Nortel), and our
BigIPs have been the most transparent and have worked the best.

That said, we did encounter throughput problems with Zimbra's Postfix MTAs
due to BigIP configuration. When incoming mail volume started to ramp up for
the day, Postfix's queue size would slowly build. We ruled out (host) CPU
consumption, disk I/O load, syslogging bottlenecks, and a number of other
usual and unusual suspects on the hosts themselves.

I'm not sure if Quanah heard the final resolution, which was to change the
LDAP VIP type from Standard to "Performance (Layer 4)." This solved the
problem immediately. I didn't see the final response from F5, but my
impression was that Performance (Layer 4) bypasses a lot of the hooks that
let you manipulate packets and connections. Interestingly, CPU consumption
on our BigIPs was low and therefore didn't prompt us to troubleshoot from
that angle. This was the first time we've seen this behavior; our non-Zimbra
OpenLDAP nodes have a higher operation rate (~12k operations/sec aggregate)
and had been servicing a similar mail infrastructure before we started
moving to Zimbra's software.
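
For the archives, the relevant knob is the profile attached to the virtual
server: the GUI's "Performance (Layer 4)" type corresponds to a virtual that
uses the fastL4 profile instead of the full TCP proxy profile a Standard
virtual gets. Roughly, it looks like this (tmsh-style config from a much
newer TMOS than we were running; names and addresses are invented, and the
exact syntax varies by version):

    # Illustrative only -- not our production config.
    ltm virtual ldap_vip {
        destination 192.0.2.10:389
        ip-protocol tcp
        pool ldap_pool
        profiles {
            fastL4 { }   # "Performance (Layer 4)": packet-by-packet forwarding
        }
    }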

On Tue, Jul 21, 2009 at 05:56:48AM -0700, David J. Andruczyk wrote:
> We had tried experimenting with a higher number of threads previously,
> but that didn't seem to have a positive effect.  Can any openLDAP guru's
> suggest some things to set/look for, i.e. (higher number of threads,
> higher defaults for conn_max_pending, conn_max_pending_auth).
> 
> Any ideas on what a theoretical performance limit should be of a machine
> of this caliber? i.e. how many reqs/sec, how far will it scale, etc..

It sounds like you're doing NAT on inbound connections (so connections offered
to your LDAP nodes are sourced from the BigIP); I'm not sure whether this
alternate VIP type would preclude doing that. If you have OneConnect
enabled, you might try disabling that, too. I generally see it used with
HTTP, but perhaps it's usable with other protocols?

AFAICT, increasing conn_max_pending_auth shouldn't be helpful unless your
application(s) are doing a lot of asynchronous operations (i.e., submitting
many LDAP operations at once and leaving them outstanding on the connection).
If they're primarily submitting an operation and waiting for the response
(lather, rinse, repeat), I don't see how a connection could
accumulate pending operations.
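
To illustrate the distinction (a rough python-ldap sketch, nothing from our
environment; the host and base DN are made up):

    import ldap

    conn = ldap.initialize("ldap://ldap.example.com")
    conn.simple_bind_s("", "")   # anonymous bind

    # Synchronous style: one operation outstanding at a time, so the
    # conn_max_pending* limits never come into play.
    conn.search_s("dc=example,dc=com", ldap.SCOPE_SUBTREE, "(uid=jdoe)")

    # Asynchronous style: several operations pending on one connection at
    # once -- only this pattern can run into conn_max_pending (anonymous
    # sessions) or conn_max_pending_auth (authenticated sessions).
    msgids = [conn.search("dc=example,dc=com", ldap.SCOPE_SUBTREE,
                          "(uid=user%d)" % i)
              for i in range(10)]
    for msgid in msgids:
        conn.result(msgid)       # collect each response as it arrives

Unless your MTA's LDAP client behaves like the second half of that, I wouldn't
expect raising conn_max_pending_auth to buy you anything.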

As far as scalability goes, I see no reason OpenLDAP shouldn't scale reasonably
to the limits of your hardware (CPU consumption and disk I/O). It bodes well
for your OpenLDAP build, tuning, etc. that it can handle your current
workload when using round-robin DNS. What kind of LDAP ops/sec are these
machines taking?
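
If you want hard numbers and your slapd was built with back-monitor, the
monitor backend will give you operation counters you can sample. For
reference, the knobs you mentioned all live in the global section of
slapd.conf; the values below are just the stock 2.3 defaults, not a
recommendation:

    # Global section of slapd.conf -- these are the 2.3 defaults.
    threads                16
    conn_max_pending       100
    conn_max_pending_auth  1000

    # Enables cn=Monitor (requires slapd built with back-monitor).
    database monitor

Sampling monitorOpCompleted under cn=Operations,cn=Monitor at intervals gives
a rough completed-operations rate.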

john
-- 
John Morrissey          _o            /\         ----  __o
jwm@horde.net        _-< \_          /  \       ----  <  \,
www.horde.net/    __(_)/_(_)________/    \_______(_) /_(_)__