
Re: thread pools, performance

Howard Chu wrote:
Yep, that helped. Raising rx-usecs from default 20 to 1000, and rx-frames from
default 5 to 100, I'm getting 43k auths/sec with back-null (in 4 separate
thread pools) and the core fielding the interrupts is only about 80% busy now
instead of 100%. I'm afraid my load generators may be maxed out now, because I
can't seem to drive up the load on the server any higher even though there's
more idle CPU.

The current code in HEAD (with only 1 thread pool) is reaching 36k auths/sec
with back-null, so it's actually not far off from my experimental peak rate.
Considering that HEAD was at 25k/sec last week (and now in 2.4.6) that's
pretty decent.

With back-bdb and 1 million users I'm getting 26.1k/sec with plaintext
passwords (up from 19.3k/sec last week). With {SSHA} passwords that drops to
25.7k/sec (~1.5% difference).
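
For anyone who wants to check the percentages quoted above, here is a minimal sketch of the arithmetic. The helper name pct_change is only an illustration, not something from the test tooling; the rates are the ones quoted in this message.

```python
def pct_change(old, new):
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0

# Rates in thousands of auths/sec, taken from the figures quoted above.
print(f"plaintext -> SSHA:    {pct_change(26.1, 25.7):+.1f}%")
print(f"back-bdb, week/week:  {pct_change(19.3, 26.1):+.1f}%")
print(f"HEAD back-null:       {pct_change(25.0, 36.0):+.1f}%")
```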

I have to put this tinkering on hold for a bit, to run some authrate tests
against ActiveDirectory on this machine (using W2K3sp2 X64). Later on we'll do
a W2K3 OpenLDAP build for comparison as well. Should be entertaining...

Just for reference, using slapadd with tool-threads set to 4, it took 7:05.17 (just over 7 minutes) to load an LDIF with 1 million user objects. These user objects had plaintext passwords. When I later decided to change them to {SSHA} passwords, it took 10:12.38 (just over 10 minutes) to ldapmodify all of them.

This machine came with a pair of Maxtor 36GB 10k RPM SCSI drives. We added a pair of IBM 146GB 10k RPM SCSI drives. One of the 36GB drives has FedoraCore6 on it. We installed Windows 2003 SP2 Enterprise Edition for x86_64 on the other 36GB drive.

We split the 146GB drives into two partitions each, with each partition occupying half of the drive. The partitions are assigned such that both Windows and Linux get equivalent layouts:

	/dev/sdc1 - NTFS, AD logs
	/dev/sdc2 - XFS, OpenLDAP data

	/dev/sdd1 - XFS, OpenLDAP logs
	/dev/sdd2 - NTFS, AD data

My assumption here is that the transaction log partition will get more frequent activity, and the data partition will just get the occasional flush. So, I chose to place the log partitions on the outer tracks of the drives where they should have higher throughput and lower latency.

Anyway, using Microsoft's ldifde tool to import the same 1 million user LDIF, using 8 threads, took 4:23:46.85 (yes, that's over 4 hours for MS AD vs about 7 minutes for OpenLDAP). By the way, we configured the server as noted in Appendix B: Setup Instructions, Step 1 of this Microsoft document:
http://www.microsoft.com/downloads/details.aspx?FamilyID=52e7c3bd-570a-475c-96e0-316dc821e3e7&DisplayLang=en
That allowed us to import regular inetOrgPerson entries with userPassword attributes and have AD treat them as actual user accounts. (Otherwise we would have had to convert all the entries to use the Microsoft unicodePwd attribute instead.)
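
To put the two import times on the same scale, here is a rough sketch that converts the elapsed times to seconds and derives entries per second. The helper elapsed_seconds is a name made up for this illustration; the inputs are the wall-clock figures reported in this message, and the entry count is the nominal 1 million.

```python
def elapsed_seconds(stamp):
    """Convert an '[H:]M:S.ss' elapsed-time string into seconds."""
    total = 0.0
    for part in stamp.split(":"):
        total = total * 60 + float(part)
    return total

ENTRIES = 1_000_000  # nominal size of the test LDIF

slapadd_s = elapsed_seconds("7:05.17")     # slapadd on FC6
ldifde_s = elapsed_seconds("4:23:46.85")   # ldifde on W2K3

print(f"slapadd: {ENTRIES / slapadd_s:8.1f} entries/sec")
print(f"ldifde:  {ENTRIES / ldifde_s:8.1f} entries/sec")
print(f"ldifde took {ldifde_s / slapadd_s:.1f}x as long as slapadd")
```

By this arithmetic the ldifde import took roughly 37 times as long as the slapadd run.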

Unfortunately, the accounts imported this way were all initially disabled. So we had to ldapmodify their userAccountControl attribute to enable them all before we could proceed with the authentication tests. It took 20:57.017 (about 21 minutes) to ldapmodify all 1 million user records.

Finally we got to running the actual authrate tests, which yielded a peak rate of 4526 auths/second with 40 client threads. The rate declined from there as more clients were added; AD clearly isn't capable of handling very many concurrent sessions. It also appears that most of the CPUs were idle; perhaps 3 out of 8 cores were actually doing any work. I.e., AD doesn't scale well across multiple CPUs.

Unfortunately the native AD server runs as a privileged process and Windows doesn't allow you to alter its processor affinity settings, so there's no way to directly measure how it scales from one core up to eight. But I guess there's really nothing interesting to see here anyway. (For reference, even when restricted to only a single core on this machine, OpenLDAP 2.4.5 handled about 8800 auths/second, coming from even more client threads. And that was before any other tweaks.)
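
For what it's worth, on Linux a process's affinity can be narrowed from user space. The sketch below is only an illustration of that idea using Python's os.sched_setaffinity (Linux-only); it is not necessarily how the single-core slapd run above was set up, and pin_to_cores is a hypothetical helper name.

```python
import os

def pin_to_cores(n, pid=0):
    """Restrict a process (pid 0 = the caller) to the first n cores it may use.

    Linux-only: relies on os.sched_getaffinity / os.sched_setaffinity.
    Returns the set of core ids the process is now allowed to run on.
    """
    allowed = sorted(os.sched_getaffinity(pid))
    if not 1 <= n <= len(allowed):
        raise ValueError(f"n must be between 1 and {len(allowed)}")
    chosen = set(allowed[:n])
    os.sched_setaffinity(pid, chosen)
    return chosen
```

Calling pin_to_cores(k, pid) for k = 1, 2, 4, 8 against a server's pid, and rerunning the load test each time, would give the points for a scaling plot. Changing another process's affinity needs suitable privileges.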

The numbers speak for themselves.

It's enlightening to look at the actual CPU time used during the import tasks. For ldifde on W2K3 we got:

time ldifde.exe -i -f examp3.ldif -h -q 8
261.10u 140.73s 4:23:46.85 2.5%

For slapadd on FC6 we got:

time .slapadd -f slapd.conf.slam -q -l example.ldif.1mil
260.75u 80.86s 7:05.17 80%

One interesting part here is that the amount of user CPU time is nearly identical in both cases. That implies that both slapadd and ldifde are doing about the same amount of work to parse the input LDIF. (For all we know they could be doing *exactly* the same work, using our own code. Or it could just be an interesting coincidence.)
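
The percentage at the end of each time line is just (user + system) / elapsed. As a sanity check, here is a small sketch that recomputes it from the two lines above; the function names are invented for this example, and it assumes the csh-style "NNu NNs elapsed pct" layout shown.

```python
def elapsed_seconds(stamp):
    """Convert an '[H:]M:S.ss' elapsed-time string into seconds."""
    total = 0.0
    for part in stamp.split(":"):
        total = total * 60 + float(part)
    return total

def cpu_utilization(time_line):
    """Return (user + system) / elapsed, in percent, for one time line."""
    user, system, elapsed = time_line.split()[:3]
    busy = float(user.rstrip("u")) + float(system.rstrip("s"))
    return 100.0 * busy / elapsed_seconds(elapsed)

runs = {
    "ldifde":  "261.10u 140.73s 4:23:46.85 2.5%",
    "slapadd": "260.75u 80.86s 7:05.17 80%",
}
for name, line in runs.items():
    print(f"{name}: {cpu_utilization(line):.1f}% of wall-clock time on CPU")
```

Note that these figures only count CPU time charged to the client tool itself. Any work ldifde hands off to the server process is not included.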

Comparing the rest of the time isn't really fair since it seems that ldifde just feeds data into a running server using LDAP, while slapadd simply writes to the DB directly. I guess for the sake of fairness we'll have to time an OpenLDAP import using ldapadd next.

We'll remove AD and test ADAM next. At least, running as a normal user process, we should be able to tweak its processor affinity so we can plot how it scales with number of cores. Later we'll build a 64-bit OpenLDAP on Windows and see how it fares. My experience with 32-bit Windows has been that slapd runs about as fast on Windows as it does on Linux. But with the silly limits that Windows places on how many sockets a process can have open (64 IIRC), you really can't subject it to as heavy a load in production use.

At this point I'd have a few choice things to say about Microsoft in general and AD in particular, but I think the numbers speak for themselves.
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/