
Re: (ITS#5442) slapd_rq not locked before use bugfix



Rein Tollevik wrote:
> On Sun, 30 Mar 2008, hyc@symas.com wrote:
>
>> rein@tollevik.no wrote:
>>> On Sat, 29 Mar 2008, ando@sys-net.it wrote:
>>>> rein@basefarm.no wrote:
>
>>>>> I was seeing random failures of the test050-syncrepl-multimaster test.  One of
>>>>> the failures was that it went into a tight loop traversing a circular runqueue
>>>>> it had managed to create in slapd_rq.task_list.  It seems this was caused by
>>>>> missing mutex locks around accesses to slapd_rq, which the patch uploaded to
>>>>> ftp://ftp.openldap.org/incoming/slapd_rq_lock.patch fixes.
>>>>>
>>>>> Before I applied this patch the test failed after being run a few times, with it
>>>>> it has now passed 100 times and is still counting.
>>>> locks in back-bdb/config.c should be pointless, as modifications to the
>>>> configuration should only occur while all threads are paused.  The rest
>>>> makes sort of sense, but I'd leave it to Howard.
>> Ignoring the ITS#5403 changes, I don't see anything here that isn't
>> config-related, therefore it's all running single-threaded.
>
> Now that the configuration can be changed dynamically (as this test does)
> I find it a bit odd that the config stuff should always be running
> single-threaded.  But there is obviously much I don't know about the
> internals of slapd.

This was discussed at the 2003 OpenLDAP Developers' Day.
http://www.openldap.org/conf/odd-wien-2003/proceedings.html
Page 7 of my slides touched on it.

It's also mentioned again here
http://www.openldap.org/lists/openldap-devel/200505/msg00062.html

The main point is that the original config stuff assumed that it was only 
executing during slapd startup, and single-threaded. Putting locks around all 
of the potential configurable value accesses would have been too much work, so 
the decision was made to force slapd to be single-threaded whenever writing to 
the config.

I'm guessing that what's happening here is just a broken assumption - 
suspending the thread pool doesn't in fact freeze everything in slapd. In 
particular, the slapd listener thread may still wake up for a select() timeout 
to handle the slapd runqueue. Even though no tasks submitted as a result of 
this will execute (because they'll just get queued into the suspended thread 
pool) the runqueue itself is still being manipulated.
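
To illustrate the kind of race described above, here is a minimal sketch in
plain POSIX threads. It is not the actual slapd code: the runqueue, task,
rq_insert, rq_count and run_demo names are hypothetical stand-ins, and real
slapd uses its own thread wrappers rather than pthreads directly. Two threads
(think of one as the config writer and the other as the listener waking up on
its timeout) both modify the same linked list, and the mutex is what keeps the
list from being corrupted.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real slapd structures. */
typedef struct task {
    struct task *next;
    int id;
} task;

typedef struct runqueue {
    task *task_list;          /* singly linked list of scheduled tasks */
    pthread_mutex_t rq_mutex; /* must be held for every access to task_list */
} runqueue;

static void rq_init(runqueue *rq) {
    rq->task_list = NULL;
    pthread_mutex_init(&rq->rq_mutex, NULL);
}

/* Push a task onto the head of the list while holding the mutex. */
static void rq_insert(runqueue *rq, task *t) {
    assert(t != NULL);
    pthread_mutex_lock(&rq->rq_mutex);
    t->next = rq->task_list;
    rq->task_list = t;
    pthread_mutex_unlock(&rq->rq_mutex);
}

/* Count the tasks. The walk is bounded by limit so that a corrupted,
 * circular list yields -1 instead of a tight loop. */
static int rq_count(runqueue *rq, int limit) {
    int n = 0;
    pthread_mutex_lock(&rq->rq_mutex);
    for (task *t = rq->task_list; t != NULL; t = t->next) {
        if (++n > limit) {
            n = -1;
            break;
        }
    }
    pthread_mutex_unlock(&rq->rq_mutex);
    return n;
}

#define TASKS_PER_THREAD 1000

typedef struct worker_arg {
    runqueue *rq;
    task tasks[TASKS_PER_THREAD];
} worker_arg;

/* Each worker inserts its own tasks into the shared runqueue. */
static void *worker(void *p) {
    worker_arg *a = p;
    for (int i = 0; i < TASKS_PER_THREAD; i++) {
        a->tasks[i].id = i;
        rq_insert(a->rq, &a->tasks[i]);
    }
    return NULL;
}

/* Run two concurrent inserters and return the final task count. */
static int run_demo(void) {
    static worker_arg config_side, listener_side;
    runqueue rq;
    pthread_t t1, t2;

    rq_init(&rq);
    config_side.rq = &rq;
    listener_side.rq = &rq;
    pthread_create(&t1, NULL, worker, &config_side);
    pthread_create(&t2, NULL, worker, &listener_side);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    int n = rq_count(&rq, 2 * TASKS_PER_THREAD);
    pthread_mutex_destroy(&rq.rq_mutex);
    return n;
}
```

With the lock/unlock pair in rq_insert, run_demo() should always see all
2 * TASKS_PER_THREAD tasks. Without it, two threads can update the head
pointer at the same time, and tasks can be dropped from the list.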

Given that explanation, your patches make sense.
-- 
   -- Howard Chu
   Chief Architect, Symas Corp.  http://www.symas.com
   Director, Highland Sun        http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP     http://www.openldap.org/project/