CSN of delete operations



The bdb/hdb and ldif backends assign CSNs to delete operations that lack one, which causes problems in forwarding replication configurations. During the refresh phase there may be legitimate delete operations that should not have any CSN. When the forwarder adds its own CSN, it may leave the forwarder and its consumers with a CSN set that includes a SID not present on the provider, and they will never be able to resync.
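To make the stray-SID situation concrete, here is a toy model in Python. It is not slapd code: the helper names (sid_of, csn_set, foreign_sids) and the sample CSN values are made up for illustration. The only thing taken from the real format is that the SID is the third '#'-separated field of a CSN, in hex.

```python
def sid_of(csn: str) -> int:
    """Return the server ID: the third '#'-separated field of a CSN, in hex."""
    return int(csn.split("#")[2], 16)


def csn_set(csns):
    """Map each SID to its newest CSN.

    CSNs that share the same fixed-width format sort chronologically
    as plain strings, so string comparison is enough here.
    """
    newest = {}
    for csn in csns:
        sid = sid_of(csn)
        if sid not in newest or csn > newest[sid]:
            newest[sid] = csn
    return newest


def foreign_sids(provider_csns, consumer_csns):
    """SIDs that appear in the consumer's CSN set but not in the provider's."""
    return sorted(set(csn_set(consumer_csns)) - set(csn_set(provider_csns)))


# Hypothetical values: the provider only knows SID 001.
provider = ["20100301120000.000000Z#000000#001#000000"]
# The forwarder stamped a refresh-phase delete with its own SID 002.
forwarder = provider + ["20100301120500.000000Z#000000#002#000000"]

print(foreign_sids(provider, forwarder))  # [2]
```

In this model the provider never generates a CSN for SID 002, so nothing it sends can bring the two CSN sets back into agreement.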

syncrepl_del_nonpresent() queues the minimum CSN received from the provider, which partly obscures this problem but in return introduces others :-( The CSN set received may include updates to more than one CSN, and only one of these can be added to the queue. Much worse, the first delete will commit the queued CSN. If more than one entry should be deleted, this leaves an open window in which the forwarder (and its consumers) have an apparently up-to-date CSN set without actually being in sync with the provider. Running the new test061 with sync debugging shows traces of these problems in the logs.
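The window can be illustrated with a small simulation. This is only a sketch of the behaviour described above, assuming a single queued CSN that is committed by whichever delete runs first. The Forwarder class, del_nonpresent() and the DNs and CSN strings are hypothetical, not taken from the slapd source.

```python
class Forwarder:
    """Toy forwarder: a set of entry DNs plus one advertised context CSN."""

    def __init__(self, entries, context_csn):
        self.entries = set(entries)
        self.context_csn = context_csn
        self.queued_csn = None

    def delete(self, dn):
        self.entries.discard(dn)
        # Modelled behaviour: the first delete commits the queued CSN.
        if self.queued_csn is not None:
            self.context_csn = self.queued_csn
            self.queued_csn = None


def del_nonpresent(fwd, present, provider_csn):
    """Delete every entry not in `present`; record the state after each delete."""
    fwd.queued_csn = provider_csn
    snapshots = []
    for dn in sorted(fwd.entries - set(present)):
        fwd.delete(dn)
        snapshots.append((fwd.context_csn, sorted(fwd.entries)))
    return snapshots


OLD = "20100301120000.000000Z#000000#001#000000"
NEW = "20100301130000.000000Z#000000#001#000000"

fwd = Forwarder({"cn=a", "cn=b", "cn=c"}, OLD)
snapshots = del_nonpresent(fwd, present={"cn=a"}, provider_csn=NEW)

for csn, entries in snapshots:
    print(csn, entries)
```

After the first delete the snapshot already carries the new CSN while cn=c is still present, which is the window in which a downstream consumer would see an up-to-date cookie on a forwarder that is not yet in sync.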

In back-bdb/delete.c, the CSN of the delete operation appears to be added as a value in the entryCSN index, which really puzzles me. If that index is to be modified, I would expect it to delete the entryCSN value of the entry being deleted, not to add anything. Why this is only done in non-shadowed databases I cannot tell either.

I would fix these problems by assigning the CSN of delete operations in the frontend, i.e. on the server where the ordinary delete operations were done. syncrepl_del_nonpresent() should not queue the CSN; updating it should be left to the syncrepl_updateCookie() call which takes place when the refresh phase completes. But what to do about the index manipulation I cannot tell. Anyone?
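For comparison, the proposed ordering can be sketched the same way: the refresh-phase deletes carry no CSN, and the context CSN is moved only once, at the end. Again this is a toy model under my own assumptions. refresh_deletes() and its arguments are hypothetical, and the final assignment merely stands in for what syncrepl_updateCookie() would do.

```python
def refresh_deletes(entries, present, context_csn, provider_csn):
    """Apply refresh-phase deletes, committing the provider CSN only at the end.

    Returns the remaining entries, the final context CSN, and the
    (context_csn, entries) state observed after each individual delete.
    """
    remaining = set(entries)
    states = []
    for dn in sorted(remaining - set(present)):
        remaining.discard(dn)  # the delete carries no CSN of its own
        states.append((context_csn, sorted(remaining)))  # old CSN still advertised
    # Stand-in for the cookie update at the end of the refresh phase.
    context_csn = provider_csn
    return remaining, context_csn, states


OLD = "20100301120000.000000Z#000000#001#000000"
NEW = "20100301130000.000000Z#000000#001#000000"

remaining, final_csn, states = refresh_deletes(
    {"cn=a", "cn=b", "cn=c"}, present={"cn=a"},
    context_csn=OLD, provider_csn=NEW,
)
print(final_csn, sorted(remaining))
```

Every intermediate state still advertises the old CSN, so in this model a consumer never sees the new cookie before all the deletes are done.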

Rein