
Re: 2.1 & 2.2 statistics, and some odd behavior that needs to be examined.



> As I'm sure most (if not all) of you are aware, I've been performing a 
> number of tests on OpenLDAP 2.1 and 2.2, to see how the products compare to 
> each other, and how different tuning options in 2.2 affect the outcome of 
> the tests.
> They can be seen at:
> 
> <http://www.stanford.edu/~quanah/directories/statistics/>
> The above page should always work, but I restructure the pages underneath 
> it periodically, so don't bookmark them. ;)
> 
> For the most part, 2.2 is a clear winner over 2.1.
> 
> Some general conclusions:
> 
> Btree is a definite win when it comes to running slapadd and slapindex (I 
> think this should be at least a configure option in 2.2.6)
> Memory cache is pretty much essential
> HDB is generally better than BDB (but there are some odd issues with the 
> idlcache I've noted to Howard)
> syncRepl as it currently behaves is not ready for production use (Although 
> I am corresponding with Jong about this regularly, so this may change in 
> the near future).

I have been working on a few of the bug fixes identified so far in the discussion
threads on the devel and bug lists, but lately I have been short of time. I am also
trying to identify a few more by examining your traces. Many thanks for your continued
help in locating problems in the production environment.

> However, there is a serious threading issue in 2.2 when it is used with 
> SASL and a disk-based database cache.
> 
> You can see this looking at Tests on Solaris Servers->Performance Tests on 
> Replica Servers.  All of my servers have the same underlying software 
> packages, so OpenLDAP is the *only* variation on them.  This lets me know 
> that the issue I am seeing must be in OpenLDAP 2.2, or in how OpenLDAP 2.2 
> interfaces with those packages as compared to how 2.1 interfaces with those 
> packages.  The DB_CONFIG parameters are the same across all the systems.
> 
> In 2.1 (using 2.1.24) the system is set up with BDB and a disk based cache. 
> The performance test shows an average rate of 74.856 answers/second using a 
> SASL/GSSAPI authenticated bind using a filter of (uid=<whatever>) returning 
> sumaildrop.  This is using a mixed set of accounts (Some uid's exist and 
> have maildrop, some exist and don't have maildrop, and some don't exist at 
> all).  At the worst, I see a 6 answers/second response rate, and at the 
> best I see a 94 answers/second response rate in the time this test runs. 
> The test has 30 hosts querying the server for this information.  All of the 
> querying hosts stay querying throughout the test.
> 
> In 2.2 (using 2.2.5 with btree patch), the system is set up with BDB and a 
> disk based cache (I see the same results using btree or hash indices).  The 
> performance test shows an average rate of 28.5958 answers/second (btree) or 
> 32.7432 answers/second (hash).  This is half of the performance in 2.1!  At 
> the worst, I see 0 answers a second (22-90 instances) and at the best, I 
> see 116 answers/second (1 instance).  What this also doesn't show, is that 
> it is *impossible* to keep all 30 hosts querying the server.  They get 
> GSSAPI errors or "Can't contact LDAP server" errors, and drop off.  Once 
> about 6 servers drop off, the rest will stay querying the server, with only 
> occasional dropoffs.
> 
> However, if I do this same setup, except that I use a memory based cache 
> instead of a disk based cache, the performance shoots up to 126 
> answers/second (BDB) to 177 answers/second (HDB).  No hosts die off, and 
> the range is from 17 answers/second (BDB low) to 189 answers/second (HDB 
> high).
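
To put the averages quoted above side by side, here is a small sketch that
expresses each 2.2 figure as a fraction of the 2.1.24 average. The numbers are
copied from your message; the labels, and the assumption that the memory-cache
runs were also on 2.2.5, are mine. By this arithmetic the disk-cache runs come
out at roughly 0.38 (btree) and 0.44 (hash) of the 2.1.24 rate, so somewhat
under half.

```python
# Average answers/second as reported in the quoted message.
# Labels are illustrative; "2.2.5" on the memory-cache rows is an assumption.
BASELINE_21 = 74.856  # 2.1.24, BDB, disk-based cache, SASL/GSSAPI binds

RATES_22 = {
    "2.2.5 BDB disk cache (btree)": 28.5958,
    "2.2.5 BDB disk cache (hash)": 32.7432,
    "2.2.5 BDB memory cache": 126.0,
    "2.2.5 HDB memory cache": 177.0,
}


def relative_to_baseline(rate, baseline=BASELINE_21):
    """Return a rate as a fraction of the baseline rate."""
    return rate / baseline


# Ratio of each 2.2 average to the 2.1.24 average.
RATIOS = {name: relative_to_baseline(rate) for name, rate in RATES_22.items()}

if __name__ == "__main__":
    for name, ratio in RATIOS.items():
        print(f"{name}: {ratio:.2f}x the 2.1.24 rate")
```

These are ratios of averages only; they say nothing about the spread or about
the hosts that dropped off during the disk-cache runs.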

Can you give more detail on what you mean by disk-based caching?

> I have a feeling if whatever is causing the problem in the disk cache 
> scenario can be resolved, that the memory cache numbers could shoot even 
> higher.
> 
> Another thing to note is I did the same disk cache test, only doing simple 
> bind (anonymous) instead of SASL/GSSAPI binds.  I had 1 host of 30 drop 
> (Can't contact LDAP Server).  The remaining 29 ran the server at a whopping 
> 222 answers/second (178 low, 266 high).  That is why I finger the threading 
> and disk cache as being part of the issue.
> 
> Any ideas on where I can proceed from here to help identify where the 
> issue(s) are occurring?
> 
> --Quanah
> 
> --
> Quanah Gibson-Mount
> Principal Software Developer
> ITSS/TSS/Computing Systems
> ITSS/TSS/Infrastructure Operations
> Stanford University
> GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html
> 
>