Re: Sync replication failure during startup.

On Fri, 2007-09-28 at 17:02 -0700, Howard Chu wrote:
> Stelios Grigoriadis wrote:
> > I have upgraded openldap to latest stable version (2.3.38) and
> > used Berkeley DB version 4.5.20. The problem remains. I realize
> > my analysis wasn't correct since, as Howard Chu pointed out, the
> > CSN contains both a timestamp and a counter. So the entryCSNs
> > are unique.
> > 
> > But the problem remains, and I have no idea why it happens. I
> > still suspect that it lies in the initial phase of the sync
> > operation (the refresh stage). It might be that some of the
> > not-yet-committed modifications don't make it into the result set
> > of the search operation. Later, after another entry is added, the
> > "lost" entries are never synced over.
> This also cannot be the cause. The contextCSN is snapshotted at the beginning of a 
> refresh. Only updates between the consumer's cookie CSN and the snapshot CSN are 
> sent to the consumer. Any entries added during this refresh will be excluded from 
> the update, and the consumer will then record the snapshot CSN. Any entries the 
> consumer didn't pick up in this refresh pass will be picked up in the next refresh.

I agree with you; I just didn't see the "next refresh" in the code. I
thought it refreshed only once, after which the master would write back
all subsequent changes (syncprov_op_response -> syncprov_qstart etc.).
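
To check that I read your description correctly, here is a toy model of
the refresh window in Python. It is only a sketch and not syncprov code:
the Provider class, its methods, and the use of plain integers as CSNs
are my own assumptions (real CSNs carry a timestamp and a counter). It
shows an entry added after the snapshot being left out of the first
refresh and picked up by the next one.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Toy provider (hypothetical): maps entry name -> entryCSN (an int here)."""
    entries: dict = field(default_factory=dict)
    context_csn: int = 0

    def add(self, name):
        # Each write gets a strictly larger CSN and advances the contextCSN.
        self.context_csn += 1
        self.entries[name] = self.context_csn

    def snapshot(self):
        # The contextCSN as seen at the beginning of a refresh.
        return self.context_csn

    def refresh(self, cookie_csn, snapshot_csn):
        # Send only entries with cookie_csn < entryCSN <= snapshot_csn.
        return sorted(
            name for name, csn in self.entries.items()
            if cookie_csn < csn <= snapshot_csn
        )


def simulate():
    provider = Provider()
    provider.add("a")
    provider.add("b")

    cookie = 0                      # consumer starts with an empty cookie
    snap = provider.snapshot()      # snapshot taken when the refresh begins
    provider.add("c")               # write that arrives during the refresh
    first = provider.refresh(cookie, snap)
    cookie = snap                   # consumer records the snapshot CSN

    snap = provider.snapshot()
    second = provider.refresh(cookie, snap)
    return first, second


if __name__ == "__main__":
    print(simulate())
```

In this model nothing is lost as long as a second refresh (or an
equivalent push of later changes) actually happens.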

> > I will test some more and try to provide more information. I have
> > a test program that generates this problem but it is a little
> > cumbersome. I will try to slim it down and use more common schema
> > elements before posting it.
> That will certainly help.

The setup to reproduce the error is as follows: 1 master, 3 replicas.

1. Start the replicas.
2. Start the program that adds persons (parallell_stress_simple.sh).
   This is actually a script that starts a number of add_person.c
   processes on different machines, each of which adds persons.
3. Start the master.
4. When the script completes, compare the number of added entries in
   the master and replicas.
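
For step 4, a count alone does not say which entries went missing. A
minimal sketch of a comparison, assuming the DNs of the master and of
each replica have already been dumped to LDIF text by some means: the
function names and the example DNs below are hypothetical, and the LDIF
parsing is deliberately simplified (no folded lines, no base64 "dn::"
values).

```python
def dns_from_ldif(text):
    """Collect DNs from LDIF text (simplified: plain 'dn:' lines only)."""
    dns = set()
    for line in text.splitlines():
        if line.startswith("dn:") and not line.startswith("dn::"):
            dns.add(line[3:].strip())
    return dns


def missing_on_replicas(master_dns, replica_dns_by_name):
    """Return {replica name: sorted DNs present on the master but not there}."""
    report = {}
    for name, dns in replica_dns_by_name.items():
        missing = master_dns - dns
        if missing:
            report[name] = sorted(missing)
    return report


if __name__ == "__main__":
    master = dns_from_ldif(
        "dn: cn=p1,dc=example,dc=com\ncn: p1\n\n"
        "dn: cn=p2,dc=example,dc=com\ncn: p2\n"
    )
    replicas = {
        "replica1": set(master),
        "replica2": {"cn=p1,dc=example,dc=com"},
    }
    print(missing_on_replicas(master, replicas))
```

Replicas that hold every master entry are left out of the report, so an
empty result means the sets match.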

To Quanah Gibson-Mount: The slapd.conf is also provided.


