
Re: slapd lightweight dispatcher



Quanah Gibson-Mount wrote:
I've done a lot of testing in the last several days on a binary with the lightweight dispatcher code enabled (2.3 base), and I just want to say I'm a very big fan of it. It allows slapd to scale under high load at a rate I've not previously seen. I've also played with the multi-conn flag; enabling it by itself didn't noticeably affect the rates, but I haven't yet tried the lightweight dispatcher without it also enabled.

Some interesting observations from these tests... With the generic 2.3.21 code, throwing 26 SLAMD clients at a SunFire T2000 with a 1GHz processor, we get around 6400 authentications per second on a 1 million entry database. (Which is pretty respectable, really; SunOne only got 6233 for the same configuration: http://blogs.sun.com/roller/page/DirectoryManager/20060110)


What's interesting is that we only see about 19% CPU usage on slapd (13% user, 6% kernel), even though there are 32 virtual processors here, 16 worker threads (LWPs), and the database is 100% cached in RAM. Clearly slapd isn't making as good use of the available CPUs as it could; if it were keeping all 16 worker threads busy we should see right about 50% CPU usage (16 of the 32 virtual processors fully occupied).

With the lightweight dispatcher and multi-conn array code, we see CPU utilization go up to 45% (20% user, 25% kernel) and throughput goes up to about 8800 authentications per second. With the lightweight dispatcher and the new code with no connections_mutex we get about 8900 authentications per second, and CPU use is at 49% (23% user, 26% kernel).
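For anyone wondering what removing connections_mutex buys us, here's a rough sketch of the general contention pattern involved. This is not the actual slapd connection code; the names (conn_table, CONN_SLOTS, conn_get_*) are made up purely to illustrate why a single global lock serializes worker threads while an array of per-connection locks lets them proceed independently:

#include <pthread.h>

#define CONN_SLOTS 1024     /* size of the connection table (made up) */

typedef struct conn {
    int             c_fd;
    pthread_mutex_t c_mutex;    /* protects just this one connection */
    /* ... per-connection state ... */
} conn;

static conn conn_table[CONN_SLOTS];

/* Old pattern: a single lock guards the whole table, so worker threads
 * serialize on it even when they're handling unrelated connections. */
static pthread_mutex_t connections_mutex = PTHREAD_MUTEX_INITIALIZER;

conn *conn_get_global(int fd)
{
    conn *c;
    pthread_mutex_lock(&connections_mutex);
    c = &conn_table[fd % CONN_SLOTS];
    pthread_mutex_unlock(&connections_mutex);
    return c;
}

/* New pattern: index straight into the array and take only the
 * per-connection lock; threads on different connections never contend. */
conn *conn_get_slot(int fd)
{
    conn *c = &conn_table[fd % CONN_SLOTS];
    pthread_mutex_lock(&c->c_mutex);
    return c;       /* caller unlocks c->c_mutex when finished */
}

void conn_table_init(void)
{
    int i;
    for (i = 0; i < CONN_SLOTS; i++)
        pthread_mutex_init(&conn_table[i].c_mutex, NULL);
}

With 16 worker threads all fetching a connection on every operation, even a briefly-held global lock like the first version becomes a serialization point.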

(We haven't yet collected numbers for the lightweight dispatcher by itself.) At this point it seems we've eliminated a bottleneck in the frontend, only to find that there's probably a bottleneck in the backend as well. (Otherwise, performance ought to keep improving as we add worker threads, but it doesn't.) I'm also puzzled about why we're seeing so much more kernel time relative to user time; hopefully some time with dtrace will tell us what's going on there. Ideally we should see only a tiny percentage of kernel time, and performance should continue to improve as more worker threads are added. (Why stop at just 42% faster than The Other Guys' 6233/second, when we can potentially be 100% faster?)

It's worth noting that this SLAMD authentication job consists of a search for a uid, followed by a Simple Bind on the returned entry's DN (see the sketch below), so at ~8900 authentications/second it's really the equivalent of almost 18,000 operations per second. And since we processed 6400 authentications/second using only 19% of the CPU, we really ought to be able to get 12,800 auths/second at 40%, and so on. (OK, it won't scale perfectly linearly, because slapd only has one dispatcher thread, and at some point we will probably saturate the network interfaces on the server, or the gigabit switch. Also, we're struggling to get any more requests out of the clients. I really wish the SLAMD clients were written in C with libldap; this Java stuff is way too slow. It shouldn't take 26 clients to generate this workload; it should take at most 16 clients to keep the 16 server threads busy. Who ever thought that writing performance-critical code in Java was a good idea?)
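For reference, that search-then-bind sequence looks roughly like the following in C with libldap. This is just a minimal sketch, not the actual SLAMD job; the server URI, base DN, and function name are made up for illustration:

#include <stdio.h>
#include <string.h>
#include <ldap.h>

/* Authenticate one user the way the SLAMD job does: search for the uid,
 * then simple-bind as the DN that comes back.  Returns an LDAP result
 * code; LDAP_SUCCESS (0) means the password was accepted. */
int auth_search_then_bind(const char *uid, const char *password)
{
    LDAP *ld;
    LDAPMessage *res, *entry;
    char filter[256], *dn;
    struct berval cred;
    char *noattrs[] = { LDAP_NO_ATTRS, NULL };  /* we only need the DN */
    int rc, version = LDAP_VERSION3;

    /* Server URI and base DN are placeholders, not the test setup. */
    rc = ldap_initialize(&ld, "ldap://ldap.example.com");
    if (rc != LDAP_SUCCESS)
        return rc;
    ldap_set_option(ld, LDAP_OPT_PROTOCOL_VERSION, &version);

    /* Operation 1: search for the entry matching the uid. */
    snprintf(filter, sizeof(filter), "(uid=%s)", uid);
    rc = ldap_search_ext_s(ld, "dc=example,dc=com", LDAP_SCOPE_SUBTREE,
                           filter, noattrs, 0, NULL, NULL, NULL, 1, &res);
    if (rc != LDAP_SUCCESS)
        goto done;

    entry = ldap_first_entry(ld, res);
    if (entry == NULL) {
        ldap_msgfree(res);
        rc = LDAP_NO_SUCH_OBJECT;
        goto done;
    }
    dn = ldap_get_dn(ld, entry);
    ldap_msgfree(res);

    /* Operation 2: simple bind on the returned entry's DN. */
    cred.bv_val = (char *) password;
    cred.bv_len = strlen(password);
    rc = ldap_sasl_bind_s(ld, dn, LDAP_SASL_SIMPLE, &cred, NULL, NULL, NULL);
    ldap_memfree(dn);

done:
    ldap_unbind_ext_s(ld, NULL, NULL);
    return rc;
}

Each call performs the two protocol operations counted above: one search and one bind.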

--
 -- Howard Chu
 Chief Architect, Symas Corp.  http://www.symas.com
 Director, Highland Sun        http://highlandsun.com/hyc
 OpenLDAP Core Team            http://www.openldap.org/project/