Re: Persistent failures of test050



On Mon, Jul 01, 2019 at 03:07:15PM +0200, Ondřej Kuzník wrote:
> On Tue, Jun 25, 2019 at 04:45:30PM -0700, Quanah Gibson-Mount wrote:
> > --On Saturday, June 22, 2019 2:06 PM -0700 Quanah Gibson-Mount
> > <quanah@symas.com> wrote:
> > 
> >> [build@freebsd12 ~/git/openldap-2-4/tests/testrun]$ diff -u server1.out
> >> server3.out
> >> --- server1.out 2019-06-22 18:23:54.933600000 +0000
> >> +++ server3.out 2019-06-22 18:23:55.049209000 +0000
> >> @@ -1,3 +1,8 @@
> >> +dn: cn=Add-Mod-Del,dc=example,dc=com
> >> +cn: Add-Mod-Del
> >> +objectClass: organizationalRole
> >> +description: guinea pig
> >> +
> > 
> > There appear to be two separate problems happening in test050.
> 
> Just seen another issue when the wait times are further reduced so as to
> have the syncrepl establishment overlap with write traffic.
> 
> 1. Servers start up and traffic starts coming in towards MMR node 1
> 2. syncrepl session from node 2 with node 1 as the producer is
>    established
> 3. Add/mod/del cycles run on node 1 and are replicated towards node 2
> 4. Node 1 starts to run a syncrepl session towards node 2 (somehow the
>    sid=001 cookie sent is older than the newest modification at the
>    time, but that wouldn't really change things)
> 5. That triggers a present phase and the add is propagated - this then
>    bypasses the SID source checks at the provider and the CSN checks on
>    the consumer, and the entry is actually added

Forgot to mention the consumer part of the above happens here:
https://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=servers/slapd/syncrepl.c;h=28a5e724531ab4e302b4600f0bb8fb883f0de19a;hb=refs/heads/OPENLDAP_REL_ENG_2_4#l3056

> 6. The next add/mod/del cycle starts before the deletion is processed so
>    add fails with LDAP_ALREADY_EXISTS and aborts the test.
> 
> It's probably the consumer CSN checks that need to be run again if we
> don't receive the CSN with the PDU (which is what happens in present
> phase), but that might have to be a '>=' on the contextCSN set rather
> than a strict '>'? Something tells me that we need to deal with present
> phase coming in with several entries with the same CSN.
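
To make the '>' versus '>=' question concrete, here is a rough sketch of the
kind of per-SID check I have in mind. It is not the code in syncrepl.c: the
names csn_sid and should_apply are made up for illustration, the contextCSN
set is modelled as a plain dict keyed by SID, and I am assuming CSNs in the
usual fixed-width "time#count#sid#mod" form so that plain string comparison
orders them. Treating '>=' as "also accept an entry whose CSN equals the
recorded one" is my reading of the idea, not something verified against the
consumer code.

```python
from typing import Dict


def csn_sid(csn: str) -> int:
    """Return the server ID field of a CSN (third '#'-separated field, hex)."""
    return int(csn.split("#")[2], 16)


def should_apply(entry_csn: str,
                 context_csns: Dict[int, str],
                 inclusive: bool = False) -> bool:
    """Decide whether an entry received without a CSN in the PDU is applied.

    entry_csn    -- the entryCSN carried by the entry itself
    context_csns -- hypothetical model of the contextCSN set: SID -> newest CSN
    inclusive    -- False models the strict '>' check, True models '>='
    """
    known = context_csns.get(csn_sid(entry_csn))
    if known is None:
        # Nothing recorded for this SID yet, so nothing to compare against.
        return True
    # Assumption: fixed-width CSN strings, so string order matches CSN order.
    if inclusive:
        return entry_csn >= known
    return entry_csn > known
```

With a check like this, the stale add from step 5 is dropped under either
comparison, because its CSN is older than what the consumer has already
recorded for sid=001. The two variants only differ for entries whose CSN
equals the recorded one, which is the "several entries with the same CSN"
case.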

-- 
Ondřej Kuzník
Senior Software Engineer
Symas Corporation                       http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP