Re: Issue in syncprov findcsn code

Rein Tollevik wrote:

Well, a serverID of 0 is basically the same as no serverID. For mirrormode/multimaster the serverID must be non-zero. For single-master the serverID must be zero.

This is not how I read the doc nor the source. But if it was like this then it should be what I need :-) To enforce it syncprov must be changed so that:

If serverID is 0 it should only allow one contextCSN value, and it should have 0 in the sid field. Maybe not required to enforce, but it should help to quickly identify incorrectly configured servers.

If serverID is not 0 it should not accept contextCSN values from syncrepl with 0 in the sid field, to make sure it doesn't receive updates from a server configured as single-master.

If serverID is not 0 it must ignore contextCSN values with 0 in the sid field read from the database. This is to allow a single-master server to be promoted to multi-master without leaving the old sid=0 csn around forever. Hmm, if a csn with sid=0 is found, but none with the serverID value, then it could maybe be better to replace the sid in that csn? More hmm, when starting up it would probably be correct to include entries with 0 in the sid field of their entryCSN value among those that could cause the current server's contextCSN to be updated? I expect I'm not the only one who forgets to add the -S argument to slapadd...
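To make the three rules above concrete, here is a rough sketch in Python rather than the actual syncprov C code. It assumes the CSN layout timestamp#count#sid#mod with a three-digit hex sid; the function names (csn_sid, check_stored_csns, accept_from_syncrepl) are made up for illustration and do not exist in slapd.

```python
import re

# Assumed CSN layout: YYYYmmddHHMMSS.uuuuuuZ#cccccc#sss#mmmmmm (sid = 3 hex digits).
CSN_RE = re.compile(
    r"^\d{14}\.\d{6}Z#[0-9a-fA-F]{6}#([0-9a-fA-F]{3})#[0-9a-fA-F]{6}$"
)


def csn_sid(csn):
    """Return the sid field of a CSN as an integer."""
    m = CSN_RE.match(csn)
    if m is None:
        raise ValueError("malformed CSN: %r" % csn)
    return int(m.group(1), 16)


def check_stored_csns(server_id, csns):
    """Apply rules 1 and 3 to contextCSN values read from the database.

    Returns (kept, problems): the values to keep and a list of complaints.
    """
    kept, problems = [], []
    if server_id == 0:
        # Rule 1: single-master, expect exactly one value and sid 0.
        kept = list(csns)
        if len(csns) > 1:
            problems.append("serverID 0 but more than one contextCSN")
        for csn in csns:
            if csn_sid(csn) != 0:
                problems.append("serverID 0 but contextCSN has sid != 0: " + csn)
    else:
        # Rule 3: multi-master, silently drop leftover sid=0 values.
        kept = [csn for csn in csns if csn_sid(csn) != 0]
    return kept, problems


def accept_from_syncrepl(server_id, csn):
    """Rule 2: a non-zero serverID refuses sid=0 values from syncrepl."""
    return not (server_id != 0 and csn_sid(csn) == 0)
```

With this, a server promoted from single-master to serverID 1 would keep only its non-zero values when reading the database, while a serverID 0 server holding two values would get two complaints.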

The serverID in existing mirrormode/multimaster configurations that use 0 as the value must be changed, but this should be all that is needed when upgrading to this version.

What would be the correct action if a contextCSN with an invalid sid value is received from syncrepl? Asserting it could be a bit too strict, better to ignore the value and complain loudly in the logs?
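The "ignore and complain loudly" option could look roughly like the sketch below. This is Python for illustration only, not slapd code. It assumes that "invalid" means either a malformed CSN or a sid of 0 arriving at a server with a non-zero serverID; sid_of and filter_received_csns are hypothetical names.

```python
import logging

log = logging.getLogger("syncprov_sketch")


def sid_of(csn):
    """Return the sid field (third '#'-separated field, hex) or None if malformed."""
    parts = csn.split("#")
    if len(parts) != 4:
        return None
    try:
        return int(parts[2], 16)
    except ValueError:
        return None


def filter_received_csns(server_id, csns):
    """Drop contextCSN values with an invalid sid instead of asserting."""
    accepted = []
    for csn in csns:
        sid = sid_of(csn)
        if sid is None or (server_id != 0 and sid == 0):
            # Complain loudly, but keep running.
            log.error(
                "ignoring contextCSN %r with invalid sid (local serverID=%d)",
                csn, server_id,
            )
            continue
        accepted.append(csn)
    return accepted
```

The design choice here is that a bad value costs one log line and is otherwise treated as if it had never been sent, so a single misconfigured peer cannot take the consumer down.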

Does this make any sense? If so, I'll volunteer to implement.

To me, it makes a lot of sense and, well explained in the docs, would greatly help troubleshooting (or even better, set up things the right way right away).

My concerns are:

- do we need to consider all those cases and try to repair them? I'd say: no. Just complain (and refuse to start) if the problem can be solved by running "slapadd -S <SID>" or "slapcat | sed | slapadd".

- the problem should not occur at run time in a homogeneous, well-configured system (== same versions, consistent configuration). If it happens, just give up replication and/or commence a full refresh (agreed that assert'ing would be bad).

- slapadd could detect from the configuration whether -S is needed (I don't think it could determine the right SID, but at least it could complain, and require a --force (to be implemented) if one claims to know what he's doing).
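For the "slapcat | sed | slapadd" repair mentioned in the first point, the sed step amounts to rewriting the sid field of every entryCSN that still carries 000. A minimal sketch of that rewrite, in Python instead of sed, follows. It assumes the timestamp#count#sid#mod CSN layout with a three-digit lowercase hex sid, and that entryCSN values appear unfolded and not base64-encoded in the LDIF; rewrite_sid is a made-up helper, not an existing tool.

```python
import re

# Matches an entryCSN line whose sid field is 000; groups 1 and 2 keep
# everything before and after the sid untouched.
ENTRYCSN_RE = re.compile(
    r"^(entryCSN: \S+?#[0-9a-fA-F]{6}#)000(#[0-9a-fA-F]{6})$"
)


def rewrite_sid(ldif_lines, new_sid):
    """Replace sid 000 with new_sid in entryCSN lines; other lines pass through."""
    if not 1 <= new_sid <= 0xFFF:
        raise ValueError("sid must be in the range 1..4095")
    sid_hex = format(new_sid, "03x")
    out = []
    for line in ldif_lines:
        out.append(
            ENTRYCSN_RE.sub(lambda m: m.group(1) + sid_hex + m.group(2), line)
        )
    return out
```

Lines are expected without trailing newlines. contextCSN values are deliberately left alone here; whether they should be rewritten as well is one of the open questions above.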


