Issue 4839 - syncrepl-related tests fail randomly
Summary: syncrepl-related tests fail randomly
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-02-10 09:51 UTC by ando@openldap.org
Modified: 2014-08-01 21:05 UTC (History)
0 users

Description ando@openldap.org 2007-02-10 09:51:23 UTC
Full_Name: Pierangelo Masarati
Version: HEAD
OS: irrelevant?
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (81.72.89.40)
Submitted by: ando


I'm seeing occasional failures of syncrepl-related tests in HEAD.  In case of
test043, results differed once, but after repeating the test everything went
smoothly.

In some cases test045 resulted in a core dump:

(gdb) bt
#0  0x00418060 in memchr () from /lib/tls/libc.so.6
#1  0x080f5eef in slap_parse_csn_sid (csn=0x87b6798) at ldapsync.c:128
#2  0x080f5fa1 in slap_parse_csn_sids (csns=0x87b6778, numcsns=8)
    at ldapsync.c:148
#3  0x081ce0c6 in syncprov_db_open (be=0x8770bd0) at syncprov.c:2544
#4  0x080f31de in over_db_func (be=0x8770bd0, which=db_open) at backover.c:61
#5  0x080f35dd in over_db_open (be=0x8770bd0) at backover.c:173
#6  0x0808e1d1 in backend_startup_one (be=0x8770bd0) at backend.c:212
#7  0x0808e65e in backend_startup (be=0x8770bd0) at backend.c:303
#8  0x080b578a in slap_startup (be=0x0) at init.c:248
#9  0x08062ee1 in main (argc=8, argv=0xbffa3554) at main.c:919

In this case, numcsns is 8 because syncprov incorrectly tested for
BER_BVISEMPTY() instead of BER_BVISNULL() when creating the csn array:

(gdb) p a->a_vals[0]@8
$2 = {{
    bv_len = 40, 
    bv_val = 0x87deb60 "20070210093621.965471Z#000000#000#000000"
  }, {
    bv_len = 7566177, 
    bv_val = 0x0
  }, {
    bv_len = 24, 
    bv_val = 0x11 ""
  }, {
    bv_len = 1820143995, 
    bv_val = 0x706164
"�s����F���\211L$\004\213E\b\211\004$�\037���\205�\211E�\017\204\r����\033���=�\206��tb\213M�\213\001�@\030\002\215v"
  }, {
    bv_len = 541392999, 
    bv_val = 0x21 ""
  }, {
    bv_len = 142432232, 
    bv_val = 0x65 ""
  }, {
    bv_len = 142470176, 
    bv_val = 0x0
  }, {
    bv_len = 142305208, 
    bv_val = 0x0
  }}
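
The counting error described above can be illustrated with a small C sketch. This is not the syncprov.c code: `struct berval` and the two macros are local stand-ins modeled on the liblber definitions (BER_BVISNULL tests bv_val for NULL, BER_BVISEMPTY tests bv_len for zero), and `sample_vals`, `count_until_null` and `count_until_empty` are hypothetical names. The sample array mimics the first three values of the dump (the third pointer is replaced by a harmless string) and appends an all-zero element so that the bounded demonstration has somewhere to stop.

```c
#include <stddef.h>

/* Simplified stand-in for liblber's struct berval (assumption). */
struct berval {
    unsigned long bv_len;
    char *bv_val;
};

/* Local stand-ins for the macros named in the report. */
#define BER_BVISNULL(bv)  ((bv)->bv_val == NULL)
#define BER_BVISEMPTY(bv) ((bv)->bv_len == 0)

static char csn0[] = "20070210093621.965471Z#000000#000#000000";
static char junk[] = "";

/* One real CSN, then a terminator whose bv_len was never cleared,
 * then stale memory, then an element that happens to be all zero. */
static struct berval sample_vals[] = {
    { 40, csn0 },
    { 7566177, NULL },
    { 24, junk },
    { 0, NULL }
};

/* Correct: the array ends at the first value with bv_val == NULL. */
static int count_until_null(const struct berval *vals)
{
    int n = 0;
    while (!BER_BVISNULL(&vals[n]))
        n++;
    return n;
}

/* Buggy: stops only when bv_len == 0, so it walks past the terminator.
 * The limit keeps this demonstration inside the sample array. */
static int count_until_empty(const struct berval *vals, int limit)
{
    int n = 0;
    while (n < limit && !BER_BVISEMPTY(&vals[n]))
        n++;
    return n;
}
```

On `sample_vals`, `count_until_null()` sees one value, while `count_until_empty()` also counts the terminator and the stale element after it. The same kind of over-count would explain numcsns=8 in the backtrace.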

Fixing that still results in a test failure, but without core dump:

bash-3.00$ SLAPD_DEBUG=stats ./run test045
Cleaning up test run directory leftover from previous run.
Running ./scripts/test045-syncreplication-proxied...
running defines.sh
Starting master slapd on TCP/IP port 9011...
Using ldapsearch to check that master slapd is running...
Using ldapadd to create the context prefix entry in the master...
Starting slave slapd on TCP/IP port 9012...
Using ldapsearch to check that slave slapd is running...
Starting proxy slapd on TCP/IP port 9013...
Using ldapsearch to check that proxy slapd is running...
1 > Using ldapadd to populate the master directory...
Waiting 15 seconds for syncrepl to receive changes...
1 < Comparing retrieved entries from master and slave...
2 > Stopping the provider, sleeping 10 seconds and restarting it...
Using ldapsearch to check that master slapd is running...
Using ldapmodify to modify master directory...
Waiting 15 seconds for syncrepl to receive changes...
2 < Comparing retrieved entries from master and slave...
test failed - master and slave databases differ

Apparently, none of the modifications made it into the consumer.

After another try, things just go smoothly again... keep testing.

p.

Comment 1 ando@openldap.org 2007-02-10 09:55:26 UTC
changed notes
moved from Incoming to Development
Comment 2 Howard Chu 2007-02-11 11:20:50 UTC
ando@sys-net.it wrote:
> Full_Name: Pierangelo Masarati
> Version: HEAD
> OS: irrelevant?
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (81.72.89.40)
> Submitted by: ando
> 
> 
> I'm seeing occasional failures of syncrepl-related tests in HEAD.  In case of
> test043, results differed once, but after repeating the test everything went
> smoothly.
> 
> In some cases test045 resulted in a core dump:

> In this case, numcsns is 8 because syncprov incorrectly tested for
> BER_BVISEMPTY() instead of BER_BVISNULL() when creating the csn array:

Good catch.

> Fixing that still results in a test failure, but without core dump:

> Apparently, none of the modifications made it into the consumer.

> After another try, things just go smoothly again... keep testing.

We need to change these fixed delays for replication into explicit checks 
like you did for test049. Otherwise it will take too long to loop the tests 
enough times to trigger an error...
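
The suggestion above, replacing fixed delays with explicit checks, amounts to a bounded polling loop. The test scripts themselves are shell; the sketch below is in C only to show the control flow, and `check_fn`, `wait_until` and `synced_after_three` are hypothetical names, not part of the OpenLDAP test suite. In a real test the predicate would compare provider and consumer state (for example their contextCSN values) instead of counting calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#endif
#include <stddef.h>
#include <time.h>

/* Predicate that returns nonzero once the awaited condition holds. */
typedef int (*check_fn)(void *ctx);

/* Poll check() up to max_tries times, sleeping interval_ms between
 * attempts.  Returns 1 as soon as the check succeeds, 0 on timeout. */
static int wait_until(check_fn check, void *ctx, int max_tries, long interval_ms)
{
    struct timespec ts;
    ts.tv_sec = interval_ms / 1000;
    ts.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (int i = 0; i < max_tries; i++) {
        if (check(ctx))
            return 1;
        if (i + 1 < max_tries)
            nanosleep(&ts, NULL);   /* no sleep after the final attempt */
    }
    return 0;
}

/* Toy predicate for illustration: reports success on the third call. */
static int synced_after_three(void *ctx)
{
    int *calls = ctx;
    return ++*calls >= 3;
}
```

Compared with a fixed 15-second wait, a loop like this returns as soon as the condition holds and fails only after the whole budget is spent, which makes it cheaper to repeat a test many times.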

-- 
   -- Howard Chu
   Chief Architect, Symas Corp.  http://www.symas.com
   Director, Highland Sun        http://highlandsun.com/hyc
   Chief Architect, OpenLDAP     http://www.openldap.org/project/

Comment 3 ando@openldap.org 2007-02-11 21:20:00 UTC
changed notes
Comment 4 Hallvard Furuseth 2007-06-21 21:52:38 UTC
ando@sys-net.it writes:
> I'm seeing occasional failures of syncrepl-related tests in HEAD.  In
> case of test043, results differed once, but after repeating the test
> everything went smoothly.

Six HEAD/test043-delta-syncrepl failures in a row.  Producer and
consumer databases differ.  RedHat Linux, i686.  Logs etc from testrun:
ftp://ftp.openldap.org/incoming/Hallvard-Furuseth-070621.tgz

-- 
Regards,
Hallvard

Comment 5 Howard Chu 2007-06-22 01:37:45 UTC
h.b.furuseth@usit.uio.no wrote:
> ando@sys-net.it writes:
>> I'm seeing occasional failures of syncrepl-related tests in HEAD.  In
>> case of test043, results differed once, but after repeating the test
>> everything went smoothly.
> 
> Six HEAD/test043-delta-syncrepl failures in a row.  Producer and
> consumer databases differ.  RedHat Linux, i686.  Logs etc from testrun:
> ftp://ftp.openldap.org/incoming/Hallvard-Furuseth-070621.tgz
> 
Strange, I saw some failures last week but with current HEAD it just works.

-- 
   -- Howard Chu
   Chief Architect, Symas Corp.  http://www.symas.com
   Director, Highland Sun        http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP     http://www.openldap.org/project/

Comment 6 Luca Scamoni 2007-06-22 15:19:52 UTC

hyc@symas.com wrote:
> h.b.furuseth@usit.uio.no wrote:
>   
>> ando@sys-net.it writes:
>>     
>>> I'm seeing occasional failures of syncrepl-related tests in HEAD.  In
>>> case of test043, results differed once, but after repeating the test
>>> everything went smoothly.
>>>       
>> Six HEAD/test043-delta-syncrepl failures in a row.  Producer and
>> consumer databases differ.  RedHat Linux, i686.  Logs etc from testrun:
>> ftp://ftp.openldap.org/incoming/Hallvard-Furuseth-070621.tgz
>>
>>     
> Strange, I saw some failures last week but with current HEAD it just works.
>
>   
I just uploaded my test files

ftp://ftp.openldap.org/incoming/Luca_Scamoni-070622.tgz

The difference here is also in the modifyTimestamp and modifiersName...





Comment 7 Luca Scamoni 2007-06-25 14:49:43 UTC
hyc@symas.com wrote:
> h.b.furuseth@usit.uio.no wrote:
>   
>> ando@sys-net.it writes:
>>     
>>> I'm seeing occasional failures of syncrepl-related tests in HEAD.  In
>>> case of test043, results differed once, but after repeating the test
>>> everything went smoothly.
>>>       
>> Six HEAD/test043-delta-syncrepl failures in a row.  Producer and
>> consumer databases differ.  RedHat Linux, i686.  Logs etc from testrun:
>> ftp://ftp.openldap.org/incoming/Hallvard-Furuseth-070621.tgz
>>
>>     
> Strange, I saw some failures last week but with current HEAD it just works.
>   
Never mind. I just checked out a fresh copy from CVS and it works just 
fine. Maybe it was something odd in some modifications I was testing. I'll 
have to check.



Comment 8 ando@openldap.org 2007-08-14 14:00:56 UTC
changed notes
Comment 9 ando@openldap.org 2007-08-14 14:01:32 UTC
changed notes
changed state Open to Closed
Comment 10 Howard Chu 2009-02-17 06:56:05 UTC
moved from Development to Archive.Development
Comment 11 OpenLDAP project 2014-08-01 21:05:25 UTC
crash fixed in HEAD (HEAD only)
replication issues seem to persist (tests?)