
Re: Large scale traffic testing



Hi,

Forgive the dumb question - it's been a while since I did OpenLDAP performance tuning - but why is Read Waiters pegged at 450?


On Sep 4, 2017 9:15 PM, "Tim" <tim@yetanother.net> wrote:
Cheers guys, 

Reassuring that I'm roughly on the right track - but that leads me to some further questions about what I'm currently seeing while trying to load test the platform.

I'm currently using LocustIO, with a swarm of ~70 instances spread across ~25 hosts, to try to scale up the test traffic.
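
Each simulated user just fires synchronous searches through a small custom client and reports the timings back to Locust - roughly the sketch below (written against the 0.8-era custom-client API; server, base DN and filter are placeholders):

import time

import ldap3
from locust import Locust, TaskSet, events, task


class LdapClient(object):
    """Minimal wrapper that times each search and feeds the result to Locust."""

    def __init__(self, uri, base_dn):
        self.base_dn = base_dn
        # anonymous simple bind as a placeholder
        self.conn = ldap3.Connection(ldap3.Server(uri), auto_bind=True)

    def search(self, flt):
        start = time.time()
        try:
            self.conn.search(self.base_dn, flt, attributes=['cn'])
        except Exception as exc:
            events.request_failure.fire(request_type='ldap', name='search',
                                        response_time=int((time.time() - start) * 1000),
                                        exception=exc)
        else:
            events.request_success.fire(request_type='ldap', name='search',
                                        response_time=int((time.time() - start) * 1000),
                                        response_length=len(self.conn.entries))


class LdapUser(Locust):
    min_wait = 0
    max_wait = 0

    def __init__(self, *args, **kwargs):
        super(LdapUser, self).__init__(*args, **kwargs)
        # placeholder URI and base DN
        self.client = LdapClient('ldap://ldap.example.net', 'dc=example,dc=net')

    class task_set(TaskSet):
        @task
        def search(self):
            self.client.search('(uid=someuser)')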

The problem I'm seeing (and the reason I was questioning my initial test approach) is that the traffic seems to be artificially capping out, and I can't for the life of me find the bottleneck.

I'm recording/graphing all of cn=monitor, all resources covered by vmstat and bandwidth - nothing appears to be topping out.
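
For what it's worth, the graphs are fed by a periodic subtree search under cn=Monitor, something like this (rough ldap3 sketch - the host is a placeholder and the exact monitor DNs depend on the back-monitor layout):

import time

import ldap3

# placeholder host; reading back-monitor normally requires an authorised identity
conn = ldap3.Connection(ldap3.Server('ldap://ldap.example.net'), auto_bind=True)

while True:
    # pull every entry under cn=Monitor and keep the ones that expose monitorCounter
    # (e.g. cn=Read,cn=Waiters,cn=Monitor or cn=Current,cn=Connections,cn=Monitor)
    conn.search('cn=Monitor', '(objectClass=*)',
                search_scope=ldap3.SUBTREE, attributes=['monitorCounter'])
    for entry in conn.response:
        counter = entry.get('attributes', {}).get('monitorCounter')
        if counter:
            print('%s %s' % (entry['dn'], counter))
    time.sleep(10)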

If I perform searches in isolation, throughput quickly ramps up to ~20k/s and then just plateaus, while all system resources seem reasonably happy.

This happens no matter what distribution of clients I deploy (e.g. 5,000 clients over 70 hosts or 100 clients over 10 hosts) - so I'm fairly confident that the test environment is more than capable of generating further traffic.

https://s3.eu-west-2.amazonaws.com/uninspired/mystery_bottleneck.png 

(.. this was thrown together in a very rough-and-ready fashion - it's quite possible that my units are off on some of the y-axes!)

I've performed some minor optimisations to try to resolve it (the number of available file handles was my initial hope for an easy fix..) but so far nothing's helped - I still see this capping of throughput before the key system resources even get slightly warm.

I had hoped that it was going to be as simple as increasing a concurrency variable within the config - but the one that does exist seems only to be relevant to legacy Solaris deployments?

If anyone has any suggestions as to where I could look for the potential bottleneck (either on the system or within my OpenLDAP configuration), it would be very much appreciated.


Thanks in advance



On Mon, Sep 4, 2017 at 7:47 AM, Michael Ströder <michael@stroeder.com> wrote:
Tim wrote:
> I've, so far, been making use of home grown python-ldap3 scripts to
> simulate the various kinds of interactions using many parallel synchronous
> requests - but as I scale this up, I'm increasingly aware that it is a very
> different ask to simulate simple synchronous interactions compared to a
> fully optimised multithreaded client with dedicated async/sync channels and
> associated strategies.

Most clients will just send those synchronous requests. So IMHO this is the right test
pattern and you should simply make your test client multi-threaded.
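
A minimal multi-threaded sketch along these lines would do as a starting point (host, base DN, thread count and filter are placeholders):

import threading
import time

import ldap3

SERVER = ldap3.Server('ldap://ldap.example.net')   # placeholder
BASE_DN = 'dc=example,dc=net'                       # placeholder
THREADS = 32
RUNTIME = 60  # seconds

def worker(counts, idx):
    # one connection per thread, plain synchronous searches like a normal client
    conn = ldap3.Connection(SERVER, auto_bind=True)
    deadline = time.time() + RUNTIME
    done = 0
    while time.time() < deadline:
        conn.search(BASE_DN, '(uid=user%d)' % (done % 10000), attributes=['cn'])
        done += 1
    counts[idx] = done
    conn.unbind()

counts = [0] * THREADS
threads = [threading.Thread(target=worker, args=(counts, i)) for i in range(THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('%.0f searches/sec overall' % (sum(counts) / float(RUNTIME)))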

> I'm currently working with a dataset of in the region of 2,500,000 objects
> and looking to test throughput up to somewhere in the region of 15k/s
> searches alongside 1k/s modification/addition events - which is beyond what
> the current basic scripts are able to achieve.

Note that the ldap3 module for Python is written in pure Python - including the ASN.1
encoding/decoding. In contrast, the old Python 2.x https://python-ldap.org module is a C
wrapper around the OpenLDAP libs, so you might get better client performance from it.
Nevertheless, you should spread your test clients over several machines to really achieve
the needed throughput.
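
For comparison, the same kind of synchronous search with python-ldap looks roughly like this (host and base DN are placeholders):

import ldap

conn = ldap.initialize('ldap://ldap.example.net')   # placeholder URI
conn.simple_bind_s()                                 # anonymous bind
# synchronous subtree search; the C libldap does the wire encoding/decoding
for dn, attrs in conn.search_s('dc=example,dc=net', ldap.SCOPE_SUBTREE,
                               '(uid=someuser)', ['cn']):
    print('%s %s' % (dn, attrs.get('cn')))
conn.unbind_s()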

Ciao, Michael.



