[Date Prev][Date Next] [Chronological] [Thread] [Top]

(ITS#8897) Delta syncrepl + refreshOnly segfault if 2 servers receive same group modification operation within interval



Full_Name: Mark Cairney
Version: 2.4.46
OS: Centos 7
URL: 
Submission from: (NULL) (129.215.149.98)


In a MMR setup of >=3 servers (4 in this case) you can cause a segfault if you
issue the same modification to 2 separate servers within your replication
interval.
It seems like it's the last server to receive the change falls over when it
receives the change from one of the other servers during replication:

2018-08-10T16:01:39.461367+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1004 op=3 BIND authcid="mcairney@EASE.ED.AC.UK"
authzid="mcairney@EASE.ED.AC.UK"
2018-08-10T16:01:39.461843+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1004 op=3 BIND dn="uid=mcairney,ou=people,ou=central,dc=authorise-dev,dc=ed,dc=ac,dc=uk"
mech=GSSAPI sasl_ssf=256 ssf=256
2018-08-10T16:01:39.462205+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1004 op=3 RESULT tag=97 err=0 text=
2018-08-10T16:01:39.462858+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1004 op=4 MOD dn="cn=marksgroup,ou=ug,ou=iti,ou=is,dc=authorise-dev,dc=ed,dc=ac,dc=uk"
2018-08-10T16:01:39.463268+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1004 op=4 MOD attr=member
2018-08-10T16:01:39.463703+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
slap_queue_csn: queueing 0x46a1a00 20180810150139.461113Z#000000#00c#000000
2018-08-10T16:01:39.464026+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
slap_queue_csn: queueing 0x46a0540 20180810150139.461113Z#000000#00c#000000
2018-08-10T16:01:39.464554+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
slap_graduate_commit_csn: removing 0x46a0540
20180810150139.461113Z#000000#00c#000000
2018-08-10T16:01:39.464922+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1004 op=4 RESULT tag=103 err=0 text=
2018-08-10T16:01:39.465271+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
slap_graduate_commit_csn: removing 0x46a1a00
20180810150139.461113Z#000000#00c#000000
2018-08-10T16:01:39.465647+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1004 op=5 UNBIND
2018-08-10T16:01:39.465962+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1004 fd=12 closed
2018-08-10T16:01:59.042478+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1005 fd=12 ACCEPT from IP=129.215.17.49:59638 (IP=129.215.17.52:636)
2018-08-10T16:01:59.045948+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1005 fd=12 TLS established tls_ssf=256 ssf=256
2018-08-10T16:01:59.046828+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1005 op=0 BIND dn="cn=manager,dc=authorise-dev,dc=ed,dc=ac,dc=uk"
method=128
2018-08-10T16:01:59.052303+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1005 op=0 BIND dn="cn=manager,dc=authorise-dev,dc=ed,dc=ac,dc=uk"
mech=SIMPLE ssf=0
2018-08-10T16:01:59.052924+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1005 op=0 RESULT tag=97 err=0 text=
2018-08-10T16:01:59.053419+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1005 op=1 SRCH base="cn=accesslog" scope=2 deref=0
filter="(&(objectClass=auditWriteObject)(reqResult=0))"
2018-08-10T16:01:59.053912+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1005 op=1 SRCH attr=reqDN reqType reqMod reqNewRDN reqDeleteOldRDN
reqNewSuperior entryCSN
2018-08-10T16:01:59.054366+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
Entry reqStart=20180810144019.000000Z,cn=accesslog CSN
20180810144019.929847Z#000000#00c#000000 older or equal to ctx
20180810144019.929847Z#000000#00c#000000
2018-08-10T16:01:59.054902+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
Entry reqStart=20180810144922.000002Z,cn=accesslog CSN
20180810144822.605802Z#000000#009#000000 older or equal to ctx
20180810150125.942413Z#000000#009#000000
2018-08-10T16:01:59.055330+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
syncprov_search_response:
cookie=rid=004,sid=00c,csn=20120217162731.749366Z#000000#000#000000;20180701051000.119854Z#000000#003#000000;20180711011500.114848Z#000000#004#000000;20180710075708.402194Z#000000#005#000000;20180703112408.905121Z#000000#006#000000;20180810144822.605802Z#000000#009#000000;20180711161057.835548Z#000000#00b#000000;20180810150139.461113Z#000000#00c#000000
2018-08-10T16:01:59.055753+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1005 op=1 SEARCH RESULT tag=101 err=0 nentries=1 text=
2018-08-10T16:01:59.056308+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1005 op=2 ABANDON msg=2
2018-08-10T16:01:59.056735+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1005 op=3 SRCH base="dc=authorise-dev,dc=ed,dc=ac,dc=uk" scope=2 deref=0
filter="(objectClass=*)"
2018-08-10T16:01:59.057148+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
conn=1005 op=3 SRCH attr=* +
2018-08-10T16:02:06.066075+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
do_syncrep2: rid=002 LDAP_RES_SEARCH_RESULT
2018-08-10T16:02:06.075530+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
do_syncrep2: rid=001
cookie=rid=001,sid=009,csn=20180810150125.942413Z#000000#009#000000
2018-08-10T16:02:06.076055+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
slap_queue_csn: queueing 0x46a0380 20180810150125.942413Z#000000#009#000000
2018-08-10T16:02:06.076275+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11286]:
syncrepl_message_to_op: rid=001 tid f8504700

The following gets logged in /var/log/messages:

2018-08-10T15:22:25.582679+01:00 spruce.authorise-dev.is.ed.ac.uk su: (to ldap)
root on pts/0
2018-08-10T15:59:50.381730+01:00 spruce.authorise-dev.is.ed.ac.uk su: (to ldap)
root on pts/0
2018-08-10T16:02:06.076563+01:00 spruce.authorise-dev.is.ed.ac.uk kernel:
slapd[11289]: segfault at 1c ip 000000000051c6d4 sp 00007fcbf85022b0 error 4 in
slapd[400000+21e000]

Even worse is that because the change is stuck in it's accesslog if you try and
restart slapd it just falls over again:

2018-08-10T16:12:06.020868+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11451]:
@(#) $OpenLDAP: slapd 2.4.46 (Mar 27 2018 09:23:46)
$#012#011mockbuild@harmon.is.ed.ac.uk:/builddir/build/BUILD/openldap-2.4.46/servers/slapd
2018-08-10T16:12:06.052521+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11453]:
slapd starting
2018-08-10T16:12:06.067909+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11453]:
do_syncrep2: rid=002
cookie=rid=002,sid=00b,csn=20180810150125.942413Z#000000#009#000000
2018-08-10T16:12:06.068382+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11453]:
slap_queue_csn: queueing 0x4252a40 20180810150125.942413Z#000000#009#000000
2018-08-10T16:12:06.068652+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11453]:
syncrepl_message_to_op: rid=002 tid ce43f700
2018-08-10T16:12:06.068871+01:00 spruce.authorise-dev.is.ed.ac.uk slapd[11453]:
do_syncrep2: rid=001
cookie=rid=001,sid=009,csn=20180810150125.942413Z#000000#009#000000

Finally you can bring the server back up if you clear out the accesslog and
restart again.

The ldif to trigger it was simply:
dn: cn=marksgroup,ou=ug,ou=iti,ou=is,dc=authorise-dev,dc=ed,dc=ac,dc=uk
changetype: modify
add: member
member: uid=mcairney,ou=people,ou=central,dc=authorise-dev,dc=ed,dc=ac,dc=uk
-

Applied using ldapmodify -H ldaps://<server>.authorise-dev.is.ed.ac.uk:636 -f
file.ldif
Deleting a member also triggers the same behaviour

Changing user attributes e.g. mail in the same manner also triggers the issue.

Adding/deleting an object doesn't cause a segfault but simply triggers a resync
on each server.