Re: Persistent failures of test050
On Mon, Jul 01, 2019 at 03:07:15PM +0200, Ondřej Kuzník wrote:
> On Tue, Jun 25, 2019 at 04:45:30PM -0700, Quanah Gibson-Mount wrote:
> > --On Saturday, June 22, 2019 2:06 PM -0700 Quanah Gibson-Mount
> > <quanah@symas.com> wrote:
> >
> >> [build@freebsd12 ~/git/openldap-2-4/tests/testrun]$ diff -u server1.out
> >> server3.out
> >> --- server1.out 2019-06-22 18:23:54.933600000 +0000
> >> +++ server3.out 2019-06-22 18:23:55.049209000 +0000
> >> @@ -1,3 +1,8 @@
> >> +dn: cn=Add-Mod-Del,dc=example,dc=com
> >> +cn: Add-Mod-Del
> >> +objectClass: organizationalRole
> >> +description: guinea pig
> >> +
> >
> > There appear to be two separate problems happening in test050.
>
> I've just seen another issue when the wait times are reduced further, so
> that syncrepl session establishment overlaps with write traffic.
>
> 1. Servers start up and traffic starts coming in towards MMR node 1
> 2. syncrepl session from node 2 with node 1 as the producer is
> established
> 3. Add/mod/del cycles run on node 1 and are replicated towards node 2
> 4. Node 1 starts to run a syncrepl session towards node 2 (somehow the
> sid=001 cookie sent is older than the newest modification at the
> time, but that wouldn't really change things)
> 5. That triggers a present phase and the add is propagated; this
> bypasses both the sid source checks at the provider and the CSN checks
> on the consumer, so the entry is actually added
I forgot to mention that the consumer part of the above happens here:
https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=servers/slapd/syncrepl.c;h=28a5e724531ab4e302b4600f0bb8fb883f0de19a;hb=refs/heads/OPENLDAP_REL_ENG_2_4#l3056
> 6. The next add/mod/del cycle starts before the deletion is processed, so
> the add fails with LDAP_ALREADY_EXISTS and the test aborts.
>
> It's probably the consumer CSN checks that need to be run again when we
> don't receive the CSN with the PDU (which is what happens in the present
> phase), but that check might have to be a '>=' against the contextCSN set
> rather than a strict '>'. I suspect we need to handle a present phase
> that delivers several entries with the same CSN.
--
Ondřej Kuzník
Senior Software Engineer
Symas Corporation http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP