[Date Prev][Date Next]
Thread pool efficiency
- To: OpenLDAP Devel <email@example.com>
- Subject: Thread pool efficiency
- From: Howard Chu <firstname.lastname@example.org>
- Date: Thu, 07 Feb 2008 11:13:33 -0800
- User-agent: Mozilla/5.0 (X11; U; Linux i686; rv:1.9b3pre) Gecko/2008013117 SeaMonkey/2.0a1pre
Testing on an 8-socket AMD server with Opteron 885 dual-core processors (16
cores total) and a Sun T5120 (T2 Niagara 8 cores, 64 hardware threads) has
shown that our current frontend code is performing very poorly with more than
16 server threads.
E.g. on the AMD system with 16 cores allocated, performance was still slower
than on the 4-socket AMD server with Opteron 875 dual-core processors (despite
2x the cores and a significant clock-speed advantage). Testing also showed
that in this configuration, at least one of the 16 cores was always 100% idle.
Basically, the frontend cannot hand out work fast enough to the worker threads.
Rather than using a single mutex to control all accesses into the thread pool,
I think we need to have separate queues per worker thread. The frontend can
operate in single-producer mode where only the single listener thread is
allowed to submit jobs into the pool. The workers can just access their own
individual work queues, thus significantly reducing mutex contention.
Ideally we would arrange things such that any data structure is only ever
written by a single thread, and all other threads only perform reads against
it. (And in the best case, only one other thread needs to perform that read.)
By eliminating memory ownership changes and unnecessary cache line sharing, we
can dramatically reduce the cache coherency traffic.
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/