OpenLDAP
Viewing Documentation/5661

From: ali.pouya@free.fr
Subject: contextCSN gets corrupted on the stand by mirror
Date: Tue, 19 Aug 2008 09:48:05 GMT
From: ali.pouya@free.fr
To: openldap-its@OpenLDAP.org
Subject: contextCSN gets corrupted on the stand by mirror
Full_Name: Ali Pouya
Version: 2.4.11
OS: Linux 2.6
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (145.242.11.4)


I think there is a documentation issue for OpenLDAP 2.4.11:
Chapter 17.4.4 of the Admin Guide recommends configuring TWO syncrepl
directives on each mirror side. If I do so, the contextCSN of the standby
mirror gets corrupted very easily. But if I configure the mirrors with only ONE
syncrepl directive, it's OK.

The test environment:
I have a test directory with two mirrors, A (sid=1) and B (sid=2), configured as
recommended in the Admin Guide, and a replica C connected to A.
The directory contains 10 million objects, and I use server A to write
500,000 new ones.

Very often, and without any apparent reason, the contextCSN in the memory of B
suddenly gets corrupted while those of A and C are OK.
In this situation the contextCSN of B gets stuck, but B continues to receive data
from A.

The contextCSN values on B are (the corrupted one appears in base64):

contextCSN: 20080727021429.070493Z#000000#000#000000
contextCSN:: +HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA==

I note that only the part indicating the year (2008) is garbled. Maybe this
part is handled differently?

At service shutdown, B writes the corrupt contextCSN to disk.
At service startup, B reads the corrupt contextCSN from disk and begins to
scan the ENTIRE database.

It also sends a sync request to A (a persistent search containing the corrupt
contextCSN in the control field), causing A to scan the WHOLE database.
The replica C remains safe.

If I reverse the roles of A and B, the corruption occurs on A (always on the
standby mirror).

I have already encountered the contextCSN corruption problem in OpenLDAP 2.3, and
this was one of my reasons for migrating to 2.4.11.

Thanks for your HELP
Best Regards
Ali Pouya


Followup 1

Date: Tue, 19 Aug 2008 11:48:51 +0100 (BST)
From: ghenry@OpenLDAP.org
To: ali pouya <ali.pouya@free.fr>
Cc: openldap-its@OpenLDAP.org
Subject: Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror
> I think there is a documentation issue for OpenLdap 2.4.11 :
> The chapter 17.4.4 of the Admin Guide recommends configuring TWO syncrepl
> directives for each mirror side. If I do so, the contextCSN of the stand by
> mirror gets corrupted very easily. But if I configure the mirrors with only
> ONE syncrepl directive it's OK.

The documentation is correct.
 
> The test environment :
> I have a test directory with two mirrors A (sid=1) and B (sid=2) configured
> as recommended in the Admin's Guide, and a replica C connected to A.
> The directory contains 10 million objects, and I use the server A for writing
> 500 000 new ones.
>
> Very often and without any apparent reason the contextCSN in the memory of B
> gets suddenly corrupted while those of A and C are OK.
> In this situation the contextCSN of B gets stuck but B continues to receive
> data from A.
>
> The value of contextCSN in base 64 is  :
> 
> contextCSN: 20080727021429.070493Z#000000#000#000000
> contextCSN:: +HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA==

perl -MMIME::Base64 -e 'print
decode_base64("+HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA=="),
"\n";'

does look very funny :-(

Can we get your bdb version, your config and the logs of an empty mirrormode
node B pulling in the data loaded in mirrormode A (posted/hosted online
somewhere)?

Also, has this always happened on the same machine? What are the specs of the
servers?

Is this a fresh install?

-- 
Kind Regards,

Gavin Henry.

T +44 (0) 1224 279484
M +44 (0) 7930 323266
F +44 (0) 1224 824887
E ghenry@suretecsystems.com

Open Source. Open Solutions(tm).

http://www.suretecsystems.com/



Followup 2

Date: Tue, 19 Aug 2008 14:44:50 +0200
From: ali.pouya@free.fr
To: openldap-its@openldap.org
Subject: Trans.: Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror

----- Forwarded message from ali.pouya@free.fr -----
    Date: Tue, 19 Aug 2008 14:48:53 +0200
    From: ali.pouya@free.fr
Reply-To: ali.pouya@free.fr
 Subject: Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror
      To: ghenry@OpenLDAP.org

Hi Gavin,

Below are the answers to your questions:

> Can we get your bdb version, your config and the logs of an empty mirrormode
> node B pulling in the data loaded in mirrormode A (posted/hosted online
> somewhere).

The BDB version is 4.6.21.
Attached is the file conf.tar.gz, which contains the configuration of B.
The file syncrepl.conf.simple works well, but the file syncrepl.conf.double
garbles the contextCSN (I write more than 1000 entries per minute).
Do you want a log for the 10 million entries? At which loglevel?
The problem only happens if there are write operations on A, not if server A
is idle.

> Also, has this always happened on the same machine? What are the specs of the
> servers?

The problem happens on the standby server: if I write to B, the contextCSN of
A gets corrupted (I have already tested this).

My servers are quad-processor Xeon 2.2 GHz machines.
I think this is not related to the hardware; rather, the "year" part of the
contextCSN is not well protected against concurrent operations (?).

> Is this a fresh install?

Yes for 2.4.11, but I have been using OpenLDAP for 5 years on various projects.

Best Regards
Ali

----- End of forwarded message -----




Followup 3

Date: Tue, 19 Aug 2008 13:54:37 +0100 (BST)
From: Gavin Henry <ghenry@OpenLDAP.org>
To: openldap-its@OpenLDAP.org
Cc: ali.pouya@free.fr
Subject: Fwd: (ITS#5661) contextCSN gets corrupted on the stand by mirror
------=_Part_5_5686568.1219150476998
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

For Ticket records. Please keep to openldap-its

----- Forwarded Message -----
From: "ali pouya" <ali.pouya@free.fr>
To: ghenry@OpenLDAP.org
Sent: Tuesday, 19 August, 2008 1:48:53 PM GMT +00:00 GMT Britain, Ireland,
Portugal
Subject: Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror


-- 
Kind Regards,

Gavin Henry.
OpenLDAP Engineering Team.

E ghenry@OpenLDAP.org

Community developed LDAP software.

http://www.openldap.org/project/

------=_Part_5_5686568.1219150476998
Content-Type: application/x-gzip; name=conf.tar.gz
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename=conf.tar.gz

H4sIAJrAqkgAA+2Z3VMbNxDAeT39FRp4SELJfWI7YeYeaNym6aTUKSXN9E0+ybbSO+miD4Lb6f/e
vTsbjI1rHLAZEv2YwWd97Z6kXa3WfrCzcUKg02rVn8D8Z/0cRWErbh1CwwTKO52wvYNbm1dtZ8dq
QxTGO0pK83/tVtU/UvxAj0WmWJn7mRQDX/OizNn9yqgWuH14uHT9D8PW9fWPkk6Y7ODwftW4mW98
/ffQdP2x4jQNwxjhUslzTplKc0rKoyCIQj9qtf3EjzrJUfLiJcJmXLJUsYFienQsaI8pzbVBWDGj
xuluFOPvdhHWjKhs1CeapbtZOlBQNOC5gYF3n8r+R5aZVznROt1/VjXOJIypbb96HLGCwL/sLy6G
qRwMEO5zQQtmRpKmzRZtiqiAoUVavcOBtKCS1tKqjOnndMirkoIN4DMdSnt+MNEB3pYyYTjJdd0R
oYdehAcE7D8nJa2Nf1MyVth/EiVT+0/gmIjB/qN2J3b2vw327gbawwv0lPzIjMZECEu4YpgynBPc
ff1moSn0v7P8H3k24kxVUqpNzIdWEcOlqMU+uVTi15KJt93jHi7BQczIn21DmcYlUYZnNoch9eK7
bUD/95XzBHWPcOh3VkpckI+7xFSaKwEaM1xIygc8a2YAw6DRyyBsB3EYvljS/9gaZtWSEY7wcc5x
T9ox2cz7c5HlljLPC6xWQS4zkgfMZAEcBqI6foLmMAgyqZjfPK/RR3Oxdi/oYqQalrAsUqzbuSxl
zrPxut3gsMpkUdj15V3t1mlXtJfLYc7OWe49jxAqOYVD9/pI50QFyoqJ64cWiKihrpstbVW1QKv1
Ihkcvro+ThCixJDq+MdeIQWHWb0q8vq0jzRT50y96Xox0nYw4BeeNwkUKm9LhefB2U5owcXtDve6
W/nZw94/p6c/Hf8bvbMn3T+S8tO7bu9lYn7Jxh9k+1X+5/c/x+97rQ+vEQWrz0CvsRdA0EMtNzoA
xYJKy9XvCvEHu5i8ah2tlJIL40XNsYbbIdq7oTgKbzGNOS9Al8nYqzfDbAiPCq6UVGDGMO9SICRh
jnMyxlWrKrRD04fnV9rhWr1Kt8tKDdMNLgA2U1N5VSNkCWsBIZT3+29nP1yVK5ZLQkd8WvEtx1WP
hfn7H5W2v+X7XwR/c/Ff0m63XPy3DRbuf9F93P/Cx3X/W+sK3HJXYMdXw5z/34iM2+f/Kv8fgf+P
D2N3/98KLv/3bTs/f/aqsSEZK+2/fThv/2Encva/DZrVx5gKn12QzKxrTtjwgqVW1OMwijX/e+Yr
QpPxvWvjw14TYP3Mqk1KATd2Ma6GI2WZTxJauvo+m+L7EnEPvWb3iT+TxdiUjFX3v6QdLeb/3f1v
K9w9f73AlvP/V7yFGKRJ4td7GlvDc67hKyVC45zhjNBGndn893We4vzJ8dLfA55tQP/Z/H/7xiz/
ch5//h/VS+XNhIMHEJup8avTk+bh7OxN1/M89gntNU2pINkpUxC+ndiiz5TX1DaVllPPu1aSiWnB
QRVcflXO+x7wZzP2G5Kxyv/HnXjO/8dhEjr/vw1Qs/zYSEyMUTq1mqkemOFnqSiqbLw/Bk8uxbiQ
Fny6NaNpKQRbk7vdehGjYoTeNMZlvLayCxhyuvtFgd0y2etHpJ8VxILTofYxzBBDM5O5f5OUNX7K
mhNwj7o+9mWbzvVDW47D4XA4HA6Hw+FwOBwOh8PhcDgcDofD4XA4HA6Hw+FwOBwPx39608HTAFAA
AA==
------=_Part_5_5686568.1219150476998--



Followup 4

Date: Tue, 19 Aug 2008 14:21:16 +0100 (BST)
From: ghenry@OpenLDAP.org
To: ali pouya <ali.pouya@free.fr>
Cc: openldap-its@OpenLDAP.org
Subject: Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror
----- "ali pouya" <ali.pouya@free.fr> wrote:

> Hi Gavin;
> 
> Below you find the answers to your questions :
> 
> > Can we get your bdb version, your config and the logs of an empty
> mirrormode
> > node B pulling in the data loaded in mirrormode A (posted/hosted
> online
> > somewhere).
> 
> The BDB version is 4.6.21.
> You find here attached the file conf.tar.gz containing the
> configuration of B.

Thanks.

> The file syncrepl.conf.simple works well, but the file
> syncrepl.conf.double
> garbles the contextCSN (I write more than 1000 entries per minute).
> Do you want a log for the 10 million entries ? Which loglevel ?

Nope, not yet. loglevel sync

> The problem only happens if there are write operations on A, not if
> the server A
> is stationary.

Also note that serverID is a *global* directive not per database. Move 
that out of "database bdb".

> > Also, has this always happened on the same machine? What are the
> specs of the
> > servers?
> 
> The problem happens on the stand by server : If I write on B the
> contextCSN of
> A gets corrupted (I have already tested this).
> 
> My servers are quadri-processor Xeon 2.2 GHz.
> I think this is not related to the hardware but the "year" part of
> contextCSN is
> not well protected against concurrent operations (?).
> 
> >
> > Is this a fresh install?
> Yes for 2.4.11, but I use OpenLdap since 5 years for my different
> projects.

OK, well you should then know that 
"rootdn		cn=admin,ou=ressources-dgi,ou=mefi,o=gouv,c=fr"

bypasses all ACLs, so you don't need:

access to *
    by dn.base="cn=admin,ou=ressources-dgi,ou=mefi,o=gouv,c=fr" write

-- 
Kind Regards,

Gavin Henry.

T +44 (0) 1224 279484
M +44 (0) 7930 323266
F +44 (0) 1224 824887
E ghenry@suretecsystems.com

Open Source. Open Solutions(tm).

http://www.suretecsystems.com/



Followup 5

Date: Thu, 21 Aug 2008 23:08:30 +0200
From: Pierangelo Masarati <ando@sys-net.it>
To: ali.pouya@free.fr
CC: openldap-its@openldap.org
Subject: Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror
ali.pouya@free.fr wrote:
> Full_Name: Ali Pouya
> Version: 2.4.11
> OS: Linux 2.6
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (145.242.11.4)
> 
> 
> I think there is a documentation issue for OpenLdap 2.4.11 :
> The chapter 17.4.4 of the Admin Guide recommends configuring TWO syncrepl
> directives for each mirror side. If I do so, the contextCSN of the stand by
> mirror gets corrupted very easily. But if I configure the mirrors with only
> ONE syncrepl directive it's OK.
>
> The test environment :
> I have a test directory with two mirrors A (sid=1) and B (sid=2) configured
> as recommended in the Admin's Guide, and a replica C connected to A.
> The directory contains 10 million objects, and I use the server A for writing
> 500 000 new ones.
>
> Very often and without any apparent reason the contextCSN in the memory of B
> gets suddenly corrupted while those of A and C are OK.
> In this situation the contextCSN of B gets stuck but B continues to receive
> data from A.
> 
> The value of contextCSN in base 64 is  :
> 
> contextCSN: 20080727021429.070493Z#000000#000#000000
> contextCSN:: +HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA==

which looks like

4 bytes of garbage + "0802033718.300111Z#000000#001#000000"

I note that, according to the sid values you assigned to servers A and 
B, the first contextCSN should not appear, since it has sid == 0, while 
the second one, apart from the corruption, is plausible (as you're 
writing to server A, with sid == 1).
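
As a cross-check of the decoding above, here is a minimal Python sketch that
splits the base64 value from the report into its leading four bytes and the
remaining text. The function name split_csn and the four-byte prefix length
are assumptions made for illustration; they are not part of OpenLDAP.

```python
import base64

# Base64 value copied from the "contextCSN::" line in the report.
ENCODED = "+HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA=="


def split_csn(encoded, prefix_len=4):
    """Decode a base64 contextCSN and split off its first prefix_len bytes."""
    raw = base64.b64decode(encoded)
    prefix, tail = raw[:prefix_len], raw[prefix_len:]
    # The tail is expected to be plain ASCII; any other byte is replaced
    # with U+FFFD so that it stays visible in the output.
    return prefix, tail.decode("ascii", errors="replace")


prefix, tail = split_csn(ENCODED)
print("prefix (hex):", " ".join(f"{b:02X}" for b in prefix))
print("tail        :", tail)
```

Printing the prefix in hex avoids relying on how a terminal or editor chooses
to render the non-ASCII bytes.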

> I note that only the part indicating the year (2008) is garbled. May be this
> part is handled differently ?

No.

> At service shutdown B writes the corrupt contextCSN to the disk.
> At service startup B reads the corrupt contextCSN from the disk and begins to
> scan ALL of the data base.
>
> Also it sends a sync request to A (a persistent search containing the corrupt
> contextCSN in the control field) causing A to scan the WHOLE data base.
> The replica C remains safe.

The fact that the two servers scan the whole database is a side effect
of the incorrect contextCSN; I wouldn't worry about it once the
corruption gets tracked down and fixed.

> If I reverse the roles of A and B the corruption occurs on A (always on the
> stand by mirror).
>
> I have already encountered the contextCSN corruption problem in OpenLdap 2.3
> and this was one of my reasons to migrate to 2.4.11.

p.


Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office:  +39 02 23998309
Mobile:  +39 333 4963172
Fax:     +39 0382 476497
Email:   ando@sys-net.it
-----------------------------------



Followup 6

Date: Fri, 22 Aug 2008 00:25:50 +0200
From: Ali Pouya <ali.pouya@free.fr>
To: Pierangelo Masarati <ando@sys-net.it>
CC: openldap-its@openldap.org
Subject: Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror
Hi Pierangelo,
>> contextCSN: 20080727021429.070493Z#000000#000#000000
>> contextCSN:: +HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA==
>
> which looks like
>
> 4 bytes of garbage + "0802033718.300111Z#000000#001#000000"
>
Yes, but I would like to add a clarification:
in vi the 4 bytes are handled as only 2 characters. In fact, each time
the problem occurs I repair my database using a BDB C program which reads
the first key from id2entry.bdb and writes it to disk.
Then I use vi to fix the contextCSN before writing the key back to the
database.
Using vi I do not delete any characters. I only replace them with 20, then
I fix the rest of the fields.

Another clarification: when the first two characters get corrupted, the rest of
the contextCSN gets stuck and does not follow write operations.

> I note that, according to the sid values you assigned to servers A and 
> B, the first contextCSN should not appear, since it has sid == 0, 
> while the second one, apart from the corruption, is plausible (as 
> you're writing to server A, with sid == 1).
>
Yes.
The contextCSN with sid=0 is there because at the beginning I initialized
my directory without a SID (it defaults to 0), then I set two different SIDs
for A and B.


Best Regards
Ali




Followup 7

Date: Fri, 22 Aug 2008 11:57:48 +0100 (BST)
From: Gavin Henry <ghenry@OpenLDAP.org>
To: ando@sys-net.it
Cc: openldap-its@OpenLDAP.org
Subject: Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror
> The fact that the two servers scan the whole database is a side effect
> of the incorrect contextCSN; I wouldn't bother, as soon as the
> corruption gets tracked and fixed.

Is there anything that should be updated for the MirrorMode docs here?

-- 
Kind Regards,

Gavin Henry.
OpenLDAP Engineering Team.

E ghenry@OpenLDAP.org

Community developed LDAP software.

http://www.openldap.org/project/



Followup 8

Date: Fri, 29 Aug 2008 16:58:00 +0200
From: Pierangelo Masarati <ando@sys-net.it>
To: Ali Pouya <ali.pouya@free.fr>
CC: openldap-its@openldap.org
Subject: Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror
Ali Pouya wrote:
> Hi Pierangelo,
>>> contextCSN: 20080727021429.070493Z#000000#000#000000
>>> contextCSN:: +HYDCTA4MDIwMzM3MTguMzAwMTExWiMwMDAwMDAjMDAxIzAwMDAwMA==
>>
>> which looks like
>>
>> 4 bytes of garbage + "0802033718.300111Z#000000#001#000000"
>>
> Yes, but I would like to bring a precision :
> under VI the 4 bytes are handled as 2 characters only.

That's probably because vi incorrectly interprets that as a multi-byte
encoding, since it contains garbage.  That's supposed to be a string
restricted to those chars that are allowed by generalized time, so you
shouldn't rely on vi's guesses, which are based on the actual, erroneous content.

> In fact each time 
> the problem occurs I repair my database using a BDB C program wich reads 
> the first key from id2entry.bdb and writes it on disk.
> Then I use vi to fix the contextCSN, before writing the key back to the 
> database.
> Using vi I do not delete any characters. I only replace them by 20, then 
> I fix the rest of the fields.

Then you'd get year 20 AD!  The 08 you see in your broken contextCSN is
the month, not the last two digits of the year.
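
To make the field layout concrete, the sketch below parses a contextCSN into
timestamp, change count, SID and modification number. The layout (a
generalized-time stamp followed by three '#'-separated hexadecimal fields) is
inferred from the values quoted in this thread, and parse_csn is a
hypothetical helper, not an OpenLDAP API. The last call assumes that the four
overwritten bytes originally held the year 2008, which is only a guess based
on the dates of this report.

```python
import re
from datetime import datetime, timezone

# Assumed layout: YYYYmmddHHMMSS.uuuuuuZ#count#sid#mod (last three in hex).
CSN_RE = re.compile(
    r"^(?P<stamp>\d{14})\.(?P<usec>\d{6})Z"
    r"#(?P<count>[0-9a-fA-F]{6})"
    r"#(?P<sid>[0-9a-fA-F]{3})"
    r"#(?P<mod>[0-9a-fA-F]{6})$"
)


def parse_csn(csn):
    """Return the CSN fields as a dict, or None if the string is malformed."""
    match = CSN_RE.match(csn)
    if match is None:
        return None
    when = datetime.strptime(match.group("stamp"), "%Y%m%d%H%M%S")
    when = when.replace(microsecond=int(match.group("usec")),
                        tzinfo=timezone.utc)
    return {
        "time": when,
        "count": int(match.group("count"), 16),
        "sid": int(match.group("sid"), 16),
        "mod": int(match.group("mod"), 16),
    }


GOOD = "20080727021429.070493Z#000000#000#000000"
TAIL = "0802033718.300111Z#000000#001#000000"  # text after the 4 garbage bytes

print(parse_csn(GOOD))            # intact value with sid 0
print(parse_csn(TAIL))            # too short to be a CSN on its own
print(parse_csn("2008" + TAIL))   # hypothetical repair: year 2008 restored
```

Read this way, the leading "08" of the surviving text lands in the month
position, which matches the explanation above.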

> Another precision : when the first two chars take corrupted, the rest of 
> the contextCSN gets stuck and does not follow write operations.
> 
>> I note that, according to the sid values you assigned to servers A and 
>> B, the first contextCSN should not appear, since it has sid == 0, 
>> while the second one, apart from the corruption, is plausible (as 
>> you're writing to server A, with sid == 1).
>>
> Yes.
> The contextCSN with sid=0 is there because at the beginning I initiated 
> my directory without SID (defaults to 0), then I set two difrent SIDs 
> for A and B.

Can you try a fresh reload of the database(s) stripping out the entryCSN 
and letting slapadd generate them, using the -S <SID> switch (along with 
the -w switch), in order to enforce a SID of 001 (or 002, as you like)?
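
One way to strip the entryCSN values before such a reload is to filter the
LDIF export first. The sketch below is only an illustration: strip_attribute
is a hypothetical helper, the sample entry is made up, and it assumes
standard LDIF folding (continuation lines start with a single space). The
filtered file would then be loaded with slapadd using the -S <SID> and -w
switches mentioned above.

```python
def strip_attribute(ldif_lines, attr="entryCSN"):
    """Yield LDIF lines, dropping every value of attr and its folded lines."""
    prefix = attr.lower() + ":"   # also matches the base64 form "attr::"
    skipping = False
    for line in ldif_lines:
        if skipping and line.startswith(" "):
            continue              # continuation of a value being dropped
        skipping = line.lower().startswith(prefix)
        if not skipping:
            yield line


# Made-up sample entry; the entryCSN value is folded over two lines.
SAMPLE = [
    "dn: cn=example,dc=example,dc=com",
    "objectClass: device",
    "entryCSN: 20080727021429.070493Z#000000",
    " #000#000000",
    "cn: example",
    "",
]

for line in strip_attribute(SAMPLE):
    print(line)
```

Because the function is a generator, it can be fed an open file object line
by line (after stripping the trailing newlines), so a 10-million-entry export
does not have to fit in memory.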

p.


Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office:  +39 02 23998309
Mobile:  +39 333 4963172
Fax:     +39 0382 476497
Email:   ando@sys-net.it
-----------------------------------



Followup 9

Date: Tue, 02 Sep 2008 17:33:06 +0200
From: ali.pouya@free.fr
To: "openldap-its@OpenLDAP.org" <openldap-its@OpenLDAP.org>,
        Pierangelo Masarati <ando@sys-net.it>
Cc: "ali.pouya@free.fr" <ali.pouya@free.fr>
Subject: Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror
Pierangelo Masarati wrote:

> Can you try a fresh reload of the database(s) stripping out the entryCSN
> and letting slapadd generate them, using the -S <SID> switch (along with
> the -w switch), in order to enforce a SID of 001 (or 002, as you like)?


Hi Pierangelo,

I made a new directory with only one contextCSN (SID=002) as you recommended,
and reproduced the contextCSN corruption problem several times.

Example1 :
contextCSN:: 0L0NojA5MDIxMjU5NDkuNzMwMjg1WiMwMDAwMDAjMDAyIzAwMDAwMA==

The four corrupted bytes at the beginning are : D0 BD 0D A2 (hex)

Example2 :
contextCSN:: 4I54oTA5MDIxNTE5MTYuMjYzNDIxWiMwMDAwMDAjMDAyIzAwMDAwMA==

The four corrupted bytes at the beginning are : E0 8E 78 A1 (hex)
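
A check like the following could flag such values automatically. It is a
sketch only: it compares each byte of the decoded value with the character
class expected at that position in a 40-character CSN, a layout inferred from
the intact values in this thread. The helper names are hypothetical.

```python
import base64
import string

DIGIT = frozenset(string.digits.encode("ascii"))
HEXDIGIT = frozenset(string.hexdigits.encode("ascii"))


def lit(char):
    """Set containing the single byte value of an ASCII character."""
    return frozenset(char.encode("ascii"))


# Allowed byte values per position: YYYYmmddHHMMSS.uuuuuuZ#cccccc#sss#mmmmmm
LAYOUT = (
    [DIGIT] * 14 + [lit(".")] + [DIGIT] * 6 + [lit("Z"), lit("#")]
    + [HEXDIGIT] * 6 + [lit("#")] + [HEXDIGIT] * 3 + [lit("#")]
    + [HEXDIGIT] * 6
)


def bad_offsets(encoded):
    """Return the byte offsets of a base64 CSN that break the layout."""
    raw = base64.b64decode(encoded)
    if len(raw) != len(LAYOUT):
        raise ValueError("expected %d bytes, got %d" % (len(LAYOUT), len(raw)))
    return [i for i, (byte, allowed) in enumerate(zip(raw, LAYOUT))
            if byte not in allowed]


# The two values reported above.
EXAMPLE_1 = "0L0NojA5MDIxMjU5NDkuNzMwMjg1WiMwMDAwMDAjMDAyIzAwMDAwMA=="
EXAMPLE_2 = "4I54oTA5MDIxNTE5MTYuMjYzNDIxWiMwMDAwMDAjMDAyIzAwMDAwMA=="

for name, value in (("Example1", EXAMPLE_1), ("Example2", EXAMPLE_2)):
    raw = base64.b64decode(value)
    offsets = bad_offsets(value)
    print(name, offsets, " ".join("%02X" % raw[i] for i in offsets))
```

A positional check catches garbage even when the rest of the value still
looks like a plausible timestamp, as it does in both examples.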


I insist on the fact that the problem happens ONLY if I use TWO syncrepl
directives, as recommended in the Admin Guide.
If I use only ONE syncrepl directive, I cannot reproduce the problem and the
mirrors get synchronized correctly (whichever mirror side I use for writing).
Also, the problem happens on the standby mirror only when there are write
operations on the active mirror (> 1000 writes per minute).

I do not see the point of using TWO syncrepl directives for
mirrormode.

Thanks for your help
Best Regards
Ali




Followup 10

Date: Tue, 02 Sep 2008 21:26:37 +0200
From: Pierangelo Masarati <ando@sys-net.it>
To: ali.pouya@free.fr
CC: openldap-its@openldap.org
Subject: Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror
ali.pouya@free.fr wrote:

> I made a new directory with only one contextCSN (SID=002) as you recommended,
> and reproduced the contextCSN corruption problem several times.
> 
> Example1 :
> contextCSN:: 0L0NojA5MDIxMjU5NDkuNzMwMjg1WiMwMDAwMDAjMDAyIzAwMDAwMA==
> 
> The four corrupted bytes at the beginning are : D0 BD 0D A2 (hex)
> 
> Example2 :
> contextCSN:: 4I54oTA5MDIxNTE5MTYuMjYzNDIxWiMwMDAwMDAjMDAyIzAwMDAwMA==
> 
> The four corrupted bytes at the beginning are : E0 8E 78 A1 (hex)
> 
> 
> I insist on the fact that the problem happens ONLY if I use TWO syncrepl
> directives as recommended in the Admin Guide.
> If I use only ONE syncrepl directive, I don't reproduce the problem and the
> mirrors get synchronized correctly (whichever mirror side I use for writing).
> Also the problem happens on the stand by mirror only when there are write
> operations on the active mirror (> 1000 writes per minute).
>
> I do not understand the interest of using TWO syncrepl directives for
> mirrormode.

Well, going back to your initial posting, I think you are somewhat
correct.  Rather than not seeing the point of having two syncrepl
statements (of which only one is supposed to be active), I see it as an
inconsistent and potentially dangerous configuration.  In fact, the only
advantage of having two syncrepl statements is being able to
share the same configuration between two symmetric servers (mirror mode,
multimaster, ...), using the serverID directive to determine which is the
"right" one.  But in that case, you'd need to have multiple serverID
directives as well, with the URI field set.  I set up a test system with
your configuration and loaded it very heavily, while running the server
that's supposed to screw up under valgrind.  I haven't seen any issue
yet, though.

p.


Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office:  +39 02 23998309
Mobile:  +39 333 4963172
Fax:     +39 0382 476497
Email:   ando@sys-net.it
-----------------------------------



Followup 11

Date: Fri, 05 Sep 2008 14:22:59 +0200
From: Ali Pouya <ali.pouya@free.fr>
To: openldap-its@OpenLDAP.org, Pierangelo Masarati <ando@sys-net.it>
CC: ali.pouya@free.fr
Subject: Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror
Pierangelo Masarati wrote :

> Well, going back to your initial posting, I think you are somehow 
> correct. ...

So I will use the simple configuration (only one syncrepl directive) for my
production site. 

.....

> I set up a test system with 
> your configuration, and loaded it very heavily, while running the server 
> that's supposed to screw up under valgrind.  I haven't seen any issue 
> yet, though.

Me neither: when I run slapd under valgrind I cannot reproduce the problem!
Also, if I run slapd with detailed logging I cannot reproduce it.
Both cases slow down slapd!

Isn't this a problem of simultaneous (concurrent) access to the contextCSN
memory area?

I will be on vacation for two weeks.
Thanks for your help.
Best Regards
Ali




 







Followup 12

From: "Carl Johnstone" <carl.johnstone@gmgrd.co.uk>
To: <openldap-its@OpenLDAP.org>
Subject: Re: (ITS#5661) contextCSN gets corrupted on the stand by mirror
Date: Tue, 27 Jan 2009 15:58:29 -0000
I'm seeing the same thing on a 3-way multi-master setup here. Two servers
(#001 & #002) are currently sat next to each other. The third (#003) is at a
remote location. We're currently doing all amends through #001, although in
the long term we'll be doing amends through all the servers.

When I checked them yesterday I spotted that the remote server had a corrupt 
contextCSN for server #001. I dropped the DB and synced both the config and 
data from the main server again overnight. On checking again today the 
contextCSN is once again corrupt.

In my case it's the first 8 bytes rather than the first 4.

I'm running 2.4.13 with bdb 4.7.25 (first 3 patches applied).

build:

./configure --enable-dynamic --enable-crypt --enable-modules=yes \
    --enable-backends=mod --enable-overlays=mod --enable-sql=no --enable-ndb=no


Carl


______________
© Copyright 2013, OpenLDAP Foundation, info@OpenLDAP.org