[Date Prev][Date Next]
Re: finegrain locking in LDBM (relates to dissection of search latency)
What is the lock hierarchy used to prevent deadlock?
At 02:48 PM 2001-11-14, you wrote:
>I've been looking for places where threads in the slapd server spend a
>lot of time waiting for a lock. I added a debugging routine in
>ldap_pvt_mutex_lock() to tell me if a thread had to wait for another
>thread to release the lock, and if that wait time was longer than, say,
>I found that threads wait most often for the LDBM cache lock
>cache->c_mutex. This protects the linked list and AVL trees of Entry's
>that are in memory instead of reading LDIF off disk and converting them to
>an Entry struct. (The BDB backend does remarkable improvement of this step
>by storing entries on disk as a more evolved species than LDIF).
>So what I tried doing was increasing the granularity of this locking:
>1) Separate the c_mutex lock into 3 locks:
> c_dntree to protect the dn to Entry AVL
> c_idtree to protect the id to Entry AVL
> c_lrulock to protect the LRU linked lists
> This means that when one thread has to write lock the DN AVL to insert an
> Entry, the ID AVL tree can still be used concurrently, and vice versa. It
> also means the LRU can be updated while other threads search on the AVL
>2) Make the DN and ID tree locks reader/writer locks just like the Entry
> locks are (ldbm_entry_info->lei_rdwr). This means that if you have a
> cache that gets a lot of hits, there can be a lot of concurrency of
> threads in search operations reading the AVL btrees to find Entry's.
>3) Leave the LRU lock a mutex since there are few readonly operations on
> the LRU. Also, the LRU code for a cache hit is only a dozen or so machine
> instructions, so it should be fast.
>4) Some slight reordering of operations to work with smaller locks
>I think this code shows a few percentage points speed increase overall for
>the testing that I've done. My testing is a 16 threaded client running for
>about a half hour doing a list of searches. The LDBM entry cache can hold
>all of the working data set of about 40,000 or so entries, so I get a high
>cache hit rate. Sites with different cache size vs data set ratios will see
>Some numbers on performance.
>This mainly helps in cache_find_entry_id(), which is where Entry's are
>fetched from the cache. It is called by id2entry_rw(). A single search
>operation spent about 20-35% of its time in id2entry_rw() (depending on
>how you count the time a thread spends in the yield() when it's done with
>My measurement was how much time does id2entry_rw spend calling
>cache_find_entry_id() with the old code versus with the new fine grain
> 46% of its time with single mutex locking
> 37% of its time with fine grain locking
>I'm now asking if other people have benchmarks that they could try on this
>fine grain locking (Jonghyuk Choi?). This kind of testing is very
>susceptible to the data set used in the test and the type of client used.
>A new back-ldbm/cache.c is available at
>At the top of it is a few gdiff-u lines to tell what additional changes
>lines have to be added to init.c and back-ldbm.h to create and initialize
>the new mutexes.
>If it works out well for other people, I can look at getting it into the
> Carnegie Mellon