[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: (ITS#5442) slapd_rq not locked before use bugfix

On Sun, 30 Mar 2008, hyc@symas.com wrote:

> rein@tollevik.no wrote:
>> On Sat, 29 Mar 2008, ando@sys-net.it wrote:
>>> rein@basefarm.no wrote:

>>>> I was seeing random failures of the test050-syncrepl-multimaster test.  One of
>>>> the failures was that it went into a tight loop traversing a circular runqueue
>>>> it had managed to create in slapd_rq.task_list.  It seems as this was caused by
>>>> missing mutex locks around accesses to slapd_rq, which the patch uploaded to
>>>> ftp://ftp.openldap.org/incoming/slapd_rq_lock.patch fixes.
>>>> Before I applied this patch the test failed after being run a few times, with it
>>>> it has now passed 100 times and is still counting.
>>> locks in back-bdb/config.c should be pointless, as modifications to the
>>> configuration should only occur while all threads are paused.  The rest
>>> makes sort of sense, but I'd leave it to Howard.
> Ignoring the ITS#5403 changes, I don't see anything here that isn't
> config-related, therefore it's all running single-threaded.

Now that the configuration can be changed dynamically (as this test does) 
I find it a bit odd that the config stuff should always be running 
single-threaded.  But there is obviously much I don't know about the 
internals of slapd.

> Of the "relevant" changes in syncrepl.c, I note that three out of the four
> chunks of the patch are in code that is only run when using cn=config to
> delete a syncrepl configuration, and test050 never performs that operation.
> The remaining chunk only takes affect when adding syncrepl config, and again,
> slapd is single-threaded for that.

I have been experimenting a bit more with the patch, and it appears to be 
the locks added to add_syncrepl() that stopped the test from failing. 
The test uses ordinary ldapadd to add the syncrepl configuration after 
slapd has been started, which I would expect to indicate that it has 
entered the multi-threaded state.  But again, it can revert to 
single-threaded when modifying the configuration for all I know.

> I've also run test050 thru hundreds of iterations without any issue, without
> these patches. If there's a problem in test050, I don't believe it's in this code.

I have also run it several hundred times on other systems, it is only on 
one of the 4 systems I build slapd that it fails.  That is, that test
failed on one of the other systems with the 2.4.8 release, also with an 
infinite loop, but that dosn't say vary much in this context.

The system where the test fail is the only multi-cpu system I have run it 
on, which I recon to be the reason why it only fails there.  Without this 
patch the test fails about 20% of the times it is run on that system, the 
reason varies between slapd being killed due to abort or seg. fault 
(unknown why), or infinite loop traversing the circular slapd_rq.  With 
the patch it has succeeded several thousand times in a row, which I 
account for more than just a coincident.

So, as long as this patch doesn't create any deadlock I can't really see 
the problem with it and will stick to it in our version.  I find it a bit 
easier to understand and verify code that requiers locking if the locks 
are always used, then I don't need to know the thread states that are 
allowed to call the functions as well to know whether a lock is needed or