Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror



ali.pouya@free.fr wrote:
> Full_Name: Ali Pouya
> Version: 2.4.11
> OS: Linux 2.6
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (145.242.11.4)
> 
> 
> I think there is a documentation issue for OpenLDAP 2.4.11:
> Chapter 17.4.4 of the Admin Guide recommends configuring TWO syncrepl
> directives for each mirror side. If I do so, the contextCSN of the stand by
> mirror gets corrupted very easily. But if I configure the mirrors with only ONE
> syncrepl directive it's OK.
> 
> The test environment :
> I have a test directory with two mirrors A (sid=1) and B (sid=2) configured as
> recommended in the Admin's Guide, and a replica C connected to A.
> The directory contains 10 million objects, and I use the server A for writing
> 500 000 new ones. 
> 
> Very often and without any apparent reason the contextCSN in the memory of B
> gets suddenly corrupted while those of A and C are OK.
> In this situation the contextCSN of B gets stuck but B continues to receive data
> from A.
> 
> The value of contextCSN (the second one in base64) is:
> 
> contextCSN: 20080727021429.070493Z#000000#000#000000
> contextCSN:: +HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA==

which looks like

4 bytes of garbage + "0802033718.300111Z#000000#001#000000"
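
The decoding above can be reproduced with a short Python sketch. The
base64 string is copied from the report; the split after the first four
bytes is an assumption, based on a well-formed CSN in this report being
40 characters long and on the year ("2008") being the part that is lost.

```python
import base64

# Value copied verbatim from the report; in LDIF, "contextCSN::" (double
# colon) marks a base64-encoded value.
encoded = "+HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA=="

raw = base64.b64decode(encoded)

# Assumption: the first four bytes replace the year digits, the rest is
# the intact remainder of the CSN.
garbage, tail = raw[:4], raw[4:]

print(len(raw))              # total length in bytes
print(garbage.hex())         # the non-ASCII prefix, as hex
print(tail.decode("ascii"))  # the readable remainder
```

The decoded value is 40 bytes long: the bytes f8 76 03 09 followed by
"0802033718.300111Z#000000#001#000000".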

I note that, according to the sid values you assigned to servers A and 
B, the first contextCSN should not appear, since it has sid == 0, while 
the second one, apart from the corruption, is plausible (as you're 
writing to server A, with sid == 1).
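
To make the sid comparison concrete, here is a minimal Python sketch that
splits the two values into their '#'-separated fields. It assumes the
fields are timestamp, change count, server ID (sid) and modifier, with
the numeric fields in hexadecimal; the CSN class and parse_csn helper are
hypothetical names for illustration, not part of OpenLDAP.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    """Hypothetical container for the four '#'-separated CSN fields."""
    timestamp: str
    count: int
    sid: int
    mod: int


def parse_csn(value: str) -> CSN:
    # Assumed layout: timestamp#count#sid#mod, numeric fields in hex.
    timestamp, count, sid, mod = value.split("#")
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


# First value, exactly as reported.
first = parse_csn("20080727021429.070493Z#000000#000#000000")
# Second value with its four garbled leading bytes stripped.
second = parse_csn("0802033718.300111Z#000000#001#000000")

print(first.sid, second.sid)
```

This prints "0 1": the first value carries sid 0, which neither A (sid=1)
nor B (sid=2) should generate, while the second carries the sid of A.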

> I note that only the part indicating the year (2008) is garbled. Maybe this
> part is handled differently?

No.

> At service shutdown B writes the corrupt contextCSN to the disk.
> At service startup B reads the corrupt contextCSN from the disk and begins to
> scan ALL of the database.
> 
> Also it sends a sync request to A (a persistent search containing the corrupt
> contextCSN in the control field) causing A to scan the WHOLE database.
> The replica C remains safe.

The fact that the two servers scan the whole database is a side effect 
of the incorrect contextCSN; I wouldn't worry about it, since it will go 
away as soon as the corruption is tracked down and fixed.

> If I reverse the roles of A and B the corruption occurs on A (always on the
> stand by mirror).
> 
> I have already encountered the contextCSN corruption problem in OpenLDAP 2.3 and
> this was one of my reasons to migrate to 2.4.11.

p.


Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office:  +39 02 23998309
Mobile:  +39 333 4963172
Fax:     +39 0382 476497
Email:   ando@sys-net.it
-----------------------------------