Thread pool efficiency
- To: OpenLDAP Devel <openldap-devel@openldap.org>
- Subject: Thread pool efficiency
- From: Howard Chu <hyc@symas.com>
- Date: Thu, 07 Feb 2008 11:13:33 -0800
Testing on an 8-socket AMD server with Opteron 885 dual-core processors (16
cores total) and on a Sun T5120 (T2 Niagara, 8 cores, 64 hardware threads) has
shown that our current frontend code performs very poorly with more than
16 server threads.

For example, on the AMD system with 16 cores allocated, performance was still
lower than on the 4-socket AMD server with Opteron 875 dual-core processors
(despite 2x the cores and a significant clock-speed advantage). Testing also
showed that in this configuration, at least one of the 16 cores was always
100% idle. Basically, the frontend cannot hand out work to the worker threads
fast enough.

Rather than using a single mutex to control all access to the thread pool,
I think we need separate queues per worker thread. The frontend can operate
in single-producer mode, where only the single listener thread is allowed to
submit jobs into the pool. Each worker then accesses only its own work queue,
which significantly reduces mutex contention.

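To make the per-worker queue idea concrete, here is a minimal sketch in C with
POSIX threads. It is not the slapd thread pool code; every name in it
(pool_t, worker_t, pool_submit, QUEUE_CAP, demo_run, and so on) is
hypothetical, and round-robin dispatch is just one assumed policy. Each queue
still has its own mutex, but that mutex is only ever contended by the listener
and one worker, instead of by every thread in the pool.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

#define QUEUE_CAP 64                 /* hypothetical per-worker capacity */

typedef void (*job_fn)(void *arg);
typedef struct { job_fn fn; void *arg; } job_t;

typedef struct {
    pthread_mutex_t lock;            /* shared by the listener and ONE worker */
    pthread_cond_t  nonempty;
    job_t jobs[QUEUE_CAP];
    int head, tail, count, shutdown;
    pthread_t tid;
} worker_t;

typedef struct {
    worker_t *workers;
    int n_workers;
    int next;                        /* round-robin cursor, listener only */
} pool_t;

static void *worker_main(void *p)
{
    worker_t *w = p;
    for (;;) {
        pthread_mutex_lock(&w->lock);
        while (w->count == 0 && !w->shutdown)
            pthread_cond_wait(&w->nonempty, &w->lock);
        if (w->count == 0) {         /* shutdown requested, queue drained */
            pthread_mutex_unlock(&w->lock);
            return NULL;
        }
        job_t j = w->jobs[w->head];
        w->head = (w->head + 1) % QUEUE_CAP;
        w->count--;
        pthread_mutex_unlock(&w->lock);
        j.fn(j.arg);                 /* run the job outside the lock */
    }
}

void pool_shutdown(pool_t *p)
{
    for (int i = 0; i < p->n_workers; i++) {
        worker_t *w = &p->workers[i];
        pthread_mutex_lock(&w->lock);
        w->shutdown = 1;
        pthread_cond_signal(&w->nonempty);
        pthread_mutex_unlock(&w->lock);
    }
    for (int i = 0; i < p->n_workers; i++) {
        worker_t *w = &p->workers[i];
        pthread_join(w->tid, NULL);
        pthread_mutex_destroy(&w->lock);
        pthread_cond_destroy(&w->nonempty);
    }
    free(p->workers);
}

int pool_init(pool_t *p, int n)
{
    assert(n > 0);
    p->workers = calloc((size_t)n, sizeof(worker_t));
    if (p->workers == NULL)
        return -1;
    p->next = 0;
    for (int i = 0; i < n; i++) {
        worker_t *w = &p->workers[i];
        pthread_mutex_init(&w->lock, NULL);
        pthread_cond_init(&w->nonempty, NULL);
        if (pthread_create(&w->tid, NULL, worker_main, w) != 0) {
            pthread_mutex_destroy(&w->lock);
            pthread_cond_destroy(&w->nonempty);
            p->n_workers = i;        /* tear down the workers already started */
            pool_shutdown(p);
            return -1;
        }
    }
    p->n_workers = n;
    return 0;
}

/* Must be called from the listener thread only (single producer). */
int pool_submit(pool_t *p, job_fn fn, void *arg)
{
    worker_t *w = &p->workers[p->next];
    int rc = -1;                     /* -1 means the chosen queue was full */
    p->next = (p->next + 1) % p->n_workers;
    pthread_mutex_lock(&w->lock);
    if (w->count < QUEUE_CAP) {
        w->jobs[w->tail] = (job_t){ fn, arg };
        w->tail = (w->tail + 1) % QUEUE_CAP;
        w->count++;
        pthread_cond_signal(&w->nonempty);
        rc = 0;
    }
    pthread_mutex_unlock(&w->lock);
    return rc;
}

static void count_job(void *arg) { atomic_fetch_add((atomic_int *)arg, 1); }

/* Usage example: submit n_jobs trivial jobs, return how many actually ran. */
int demo_run(int n_workers, int n_jobs)
{
    pool_t pool;
    atomic_int done = 0;
    if (pool_init(&pool, n_workers) != 0)
        return -1;
    for (int i = 0; i < n_jobs; i++)
        while (pool_submit(&pool, count_job, &done) != 0)
            sched_yield();           /* queue full: let the workers drain it */
    pool_shutdown(&pool);            /* drains every queue, then joins */
    return atomic_load(&done);
}
```

Round-robin keeps the listener's dispatch decision trivially cheap, at the cost
of ignoring how busy each worker is; a real implementation would also have to
decide what to do when one worker falls behind.
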
Ideally we would arrange things so that any data structure is only ever
written by a single thread, and all other threads only read it. (And in the
best case, only one other thread needs to perform that read.) By eliminating
memory ownership changes and unnecessary cache line sharing, we can
dramatically reduce the cache coherency traffic.

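As a small illustration of the single-writer rule and of avoiding cache line
sharing, the sketch below gives each thread its own counter slot, aligned and
padded to a cache line. The names (slot_t, run_counters, MAX_THREADS) are
hypothetical, and the 64-byte CACHE_LINE value is an assumption that should be
checked against the target CPU. Each slot is written only by its owning
thread; the main thread reads the slots only after joining the writers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>

#define CACHE_LINE  64               /* assumed line size, not measured */
#define MAX_THREADS 16               /* hypothetical upper bound */

/* One slot per thread. The alignment pads the struct to a full cache line,
 * so two threads never write to the same line. */
typedef struct {
    alignas(CACHE_LINE) unsigned long ops;   /* written only by the owner */
} slot_t;

static_assert(sizeof(slot_t) == CACHE_LINE, "slot must fill one cache line");

typedef struct { slot_t *slot; unsigned long iters; } task_t;

static slot_t slots[MAX_THREADS];

static void *count_main(void *p)
{
    task_t *t = p;
    for (unsigned long i = 0; i < t->iters; i++)
        t->slot->ops++;              /* touches this thread's line only */
    return NULL;
}

/* Start n_threads writers, then sum their private counters after joining. */
unsigned long run_counters(int n_threads, unsigned long iters)
{
    pthread_t tids[MAX_THREADS];
    task_t tasks[MAX_THREADS];
    unsigned long total = 0;
    int started = 0;

    assert(n_threads > 0 && n_threads <= MAX_THREADS);
    for (int i = 0; i < n_threads; i++) {
        slots[i].ops = 0;
        tasks[i].slot = &slots[i];
        tasks[i].iters = iters;
        if (pthread_create(&tids[i], NULL, count_main, &tasks[i]) != 0)
            break;                   /* sum only the threads that started */
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL); /* join orders the read after the writes */
        total += slots[i].ops;
    }
    return total;
}
```

Without the alignment, several of these counters would share one cache line,
and every increment would force that line to change owners between cores even
though no thread ever touches another thread's counter.
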
--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP     http://www.openldap.org/project/