
Re: (ITS#5470) Sporadic failures with RE24



Raphaël Ouazana-Sustowski wrote:
> Hi,
>
> On Fri, 2 May 2008 11:01, hyc@symas.com wrote:
>> luca@OpenLDAP.org wrote:
>>> luca@OpenLDAP.org wrote:
>>>>
>>>> Howard Chu wrote:
>>>>
>>>>> Thanks. Please try HEAD again.
>>>>>
>>>> No way.
>>>> new testrun directory in
>>>> ftp://ftp.sys-net.it/luca_scamoni_its5470_20080430-new.tgz
>>>>
>>>> backtrace attached
>>>>
>>> recent commits seem to have fixed it (at least, right now I'm not able
>>> to reproduce it anymore...)
>> Right. Confirmed here too; I (temporarily) added an assert(0) to the offending
>> branch of code to make sure the patch was actually getting hit. It takes a
>> very particular timing to trigger that code path.
>>
>> I'm not sure how we can reliably test for this down the road. Perhaps we
>> should add a "disabled" config keyword for backends and syncrepl consumers,
>> so that we can start up the individual servers (which takes an unpredictable
>> amount of time for each) and then enable various parts in a fixed sequence
>> (e.g. 1 second sleeps between ldapmodify/enable requests). Even that's hit or
>> miss, because our test database is so small it's unlikely that we can hit the
>> window of time on demand.
>
> I'm testing the latest RE24 tag. After 201 successful runs of test050, I got
> a failure :/
> Cleaning up test run directory leftover from previous run.
> Running ./scripts/test050-syncrepl-multimaster...
> running defines.sh
> Initializing server configurations...
> Starting producer slapd on TCP/IP port 9011...
> Using ldapsearch to check that producer slapd is running...
> Inserting syncprov overlay on producer...
> Starting consumer slapd on TCP/IP port 9012...
> Using ldapsearch to check that consumer slapd is running...
> Configuring syncrepl on consumer...
> Starting consumer2 slapd on TCP/IP port 9013...
> Using ldapsearch to check that consumer2 slapd is running...
> Configuring syncrepl on consumer2...
> Adding schema and databases on producer...
> Using ldapadd to populate producer...
> Waiting 20 seconds for syncrepl to receive changes...
> Using ldapadd to populate consumer...
> Waiting 20 seconds for syncrepl to receive changes...
> Using ldapsearch to check that syncrepl received database changes...
> Waiting 5 seconds for syncrepl to receive changes...
> Waiting 5 seconds for syncrepl to receive changes...
> Waiting 5 seconds for syncrepl to receive changes...
> Waiting 5 seconds for syncrepl to receive changes...
> Waiting 5 seconds for syncrepl to receive changes...
> Waiting 5 seconds for syncrepl to receive changes...
> ldapsearch failed (32)!
>
> testrun uploaded in
> ftp://ftp.openldap.org/incoming/raphael-ouazana-testrun-080505.tgz

The logs show that the syncrepl consumers all timed out periodically when
trying to bind to a provider. It seems that either the 1 second timeout in the
syncrepl configs is too short, or your test machine was too slow during that run.

Probably we should remove that timeout now, since the cn=config/thread pause 
issue has already been resolved.
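To make the timeout diagnosis concrete, here is a toy Python model (not slapd code) of a consumer whose bind attempts are retried under an optional per-attempt timeout. The bind latencies are made-up numbers for a loaded machine, not values taken from the uploaded logs, `bind_with_timeout` is a hypothetical helper, and the retry policy is simplified to "try again, up to N times".

```python
"""Toy model: binding with an optional per-attempt timeout (not slapd code)."""
from typing import Iterable, Optional, Tuple


def bind_with_timeout(latencies: Iterable[float], timeout: Optional[float],
                      max_attempts: int) -> Tuple[bool, int]:
    """Return (bound, attempts_used).

    Each element of latencies is how long one bind would take on this
    machine. An attempt fails when a timeout is set and the bind takes
    longer than it; timeout=None means wait as long as needed."""
    attempts = 0
    for latency in latencies:
        if attempts == max_attempts:
            break
        attempts += 1
        if timeout is None or latency <= timeout:
            return True, attempts
    return False, attempts


slow_run = [1.4, 1.2, 1.6, 1.3]  # hypothetical bind times on a busy machine
print(bind_with_timeout(slow_run, timeout=1.0, max_attempts=4))   # (False, 4)
print(bind_with_timeout(slow_run, timeout=None, max_attempts=4))  # (True, 1)
```

With a 1 second timeout every slow bind is abandoned even though the provider would eventually answer, while with no timeout the first attempt succeeds; this is why removing the timeout should make such runs less sensitive to machine load.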

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/