[Date Prev][Date Next]
Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror
> Full_Name: Ali Pouya
> Version: 2.4.11
> OS: Linux 2.6
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (188.8.131.52)
> I think there is a documentation issue for OpenLdap 2.4.11 :
> The chapter 17.4.4 of the Admin Guide recommends configuring TWO sycrepl
> directives for each mirror side. If I do so, the contextCSN of the stand by
> mirror gets corrupted very easily. But if I confugure the mirrors with only ONE
> syncrepl directive it's OK.
> The test environment :
> I have a test directory with two mirrors A (sid=1) and B (sid=2) configured as
> recommended in the Admin's Guide, and a replica C connected to A.
> The directory contains 10 million objects, and I use the server A for writing
> 500 000 new ones.
> Very often and without any apparent reason the contextCSN in the memory of B
> gets suddenly corrupted while those of A and C are OK.
> In this situation the contextCSN of B gets stuck but B continues to receive data
> from A.
> The value of contextCSN in base 64 is :
> contextCSN: 20080727021429.070493Z#000000#000#000000
> contextCSN:: +HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA==
which looks like
4 bytes of garbage + "0802033718.300111Z#000000#001#000000"
I note that, according to the sid values you assigned to servers A and
B, the first contextCSN should not appear, since it has sid == 0, while
the second one, apart from the corruption, is plausible (as you're
writing to server A, with sid == 1).
> I note that only the part indicating the year (2008) is garbled. May be this
> part is handled differently ?
> At service shutdown B writes the corrupt contextCSN to the disk.
> At service startup B reads the corrupt contextCSN from the disk and begins to
> scan ALL of the data base.
> Also it sends a sync request to A (a persitent search containing the corrupt
> contextCSN in the control field) causing A to scan the WHOLE data base.
> The replica C remains safe.
The fact that the two servers scan the whole database is a side effect
of the incorrect contextCSN; I wouldn't bother, as soon as the
corruption gets tracked and fixed.
> If I reverse the roles of A and B the corruption occurs on A (always on the
> stand by mirror).
> I have already encountered the contextCSN corruption problem in OpenLdap 2.3 and
> this was one of my reasons to migrate to 2.4.11.
Ing. Pierangelo Masarati
OpenLDAP Core Team
via Dossi, 8 - 27100 Pavia - ITALIA
Office: +39 02 23998309
Mobile: +39 333 4963172
Fax: +39 0382 476497