
(ITS#5661) contextCSN gets corrupted on the standby mirror



Full_Name: Ali Pouya
Version: 2.4.11
OS: Linux 2.6
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (145.242.11.4)


I think there is a documentation issue in OpenLDAP 2.4.11:
Section 17.4.4 of the Admin Guide recommends configuring TWO syncrepl
directives on each mirror. If I do so, the contextCSN of the standby
mirror gets corrupted very easily. If I configure the mirrors with only ONE
syncrepl directive, it works fine.

The test environment:
I have a test directory with two mirrors, A (sid=1) and B (sid=2), configured as
recommended in the Admin Guide, and a replica C connected to A.
The directory contains 10 million objects, and I use server A to write
500,000 new ones.

Very often, and without any apparent reason, the in-memory contextCSN of B
suddenly gets corrupted while those of A and C remain correct.
When this happens, B's contextCSN stops advancing, but B continues to receive
data from A.

The contextCSN values (the corrupted one shown in base64) are:

contextCSN: 20080727021429.070493Z#000000#000#000000
contextCSN:: +HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA==
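Decoding the base64 value (LDIF uses `::` to mark base64) shows exactly which bytes are damaged. The sketch below is a minimal Python check under stated assumptions: the regular expression encodes a CSN layout (`YYYYmmddHHMMSS.ffffffZ#count#sid#mod`) inferred from the healthy value above rather than taken from OpenLDAP's source, and `is_well_formed` and `non_digit_positions` are hypothetical helpers.

```python
import base64
import re

# Corrupted value exactly as quoted in this report (base64-encoded in LDIF).
CORRUPT_B64 = "+HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA=="
# Healthy value from the same report, used for comparison.
GOOD = b"20080727021429.070493Z#000000#000#000000"

# ASSUMED CSN layout, inferred from the healthy value:
# 14-digit timestamp, '.', 6-digit fraction, 'Z', then #count#sid#mod in hex.
CSN_RE = re.compile(rb"\d{14}\.\d{6}Z#[0-9a-f]{6}#[0-9a-f]{3}#[0-9a-f]{6}")


def is_well_formed(csn: bytes) -> bool:
    """Return True if csn matches the assumed CSN layout exactly."""
    return CSN_RE.fullmatch(csn) is not None


def non_digit_positions(csn: bytes, width: int = 14) -> list:
    """Return indices within the leading timestamp that are not ASCII digits."""
    return [i for i, b in enumerate(csn[:width]) if not 0x30 <= b <= 0x39]


raw = base64.b64decode(CORRUPT_B64)
print(raw)                       # b'\xf8v\x03\t0802033718.300111Z#000000#001#000000'
print(is_well_formed(GOOD))      # True
print(is_well_formed(raw))       # False
print(non_digit_positions(raw))  # [0, 1, 2, 3]
```

The decoded value is four stray bytes (0xF8 0x76 0x03 0x09) followed by `0802033718.300111Z#000000#001#000000`: the bytes where "2008" should be are replaced, while the rest still reads as month 08, day 02, 03:37:18.300111, sid 001.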

I note that only the part holding the year (2008) is garbled. Maybe this
part is handled differently?
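One way to see why such a value would leave B's contextCSN stuck: if CSNs are compared as plain byte strings (their fixed-width layout makes byte order match time order for well-formed values), a leading 0xF8 byte sorts above the ASCII "2" that starts every valid 2008 CSN, so a "keep the newest CSN" rule never replaces it. The Python below is a simplified model under that assumption, not slapd's code; `advance`, `replay` and the incoming CSNs are hypothetical.

```python
import base64

# Corrupted contextCSN quoted in this report, decoded from base64.
CORRUPT = base64.b64decode(
    "+HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA=="
)
# The same value with the four garbled bytes restored to "2008", for comparison.
HEALTHY = b"2008" + CORRUPT[4:]

# HYPOTHETICAL later CSNs for sid 001; the timestamps are invented for illustration.
INCOMING = [
    b"20080802033719.000000Z#000000#001#000000",
    b"20080802040000.000000Z#000000#001#000000",
    b"20081231235959.999999Z#000000#001#000000",
]


def advance(current: bytes, incoming: bytes) -> bytes:
    """Hypothetical 'keep the newest CSN' rule using byte-wise comparison."""
    return incoming if incoming > current else current


def replay(start: bytes) -> bytes:
    """Apply every incoming CSN to a starting contextCSN and return the result."""
    state = start
    for csn in INCOMING:
        state = advance(state, csn)
    return state


stuck = replay(CORRUPT)
healthy_final = replay(HEALTHY)
print(stuck == CORRUPT)               # True: 0xF8 sorts above b"2", so nothing replaces it
print(healthy_final == INCOMING[-1])  # True: a well-formed value keeps advancing
```

This only shows that the corrupted value is consistent with the "stuck" symptom described above; it does not explain how the corruption arises in the first place.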

At shutdown, B writes the corrupt contextCSN to disk.
At startup, B reads the corrupt contextCSN back from disk and begins to
scan the ENTIRE database.

It also sends a sync request to A (a persistent search carrying the corrupt
contextCSN in the control field), causing A to scan the WHOLE database.
Replica C is unaffected.

If I swap the roles of A and B, the corruption occurs on A (always on the
standby mirror).

I had already encountered this contextCSN corruption problem in OpenLDAP 2.3,
and it was one of my reasons for migrating to 2.4.11.

Thanks for your HELP
Best Regards
Ali Pouya