[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: 2.4.44 + ITS 8432 patch segfault in modify_add_values



On Wed, Feb 15, 2017 at 12:22:29PM -0800, Quanah Gibson-Mount wrote:

> I would suggest filing an ITS with the full backtrace info, so I can track 
> it.

Ok, will do.

> It could be useful to have the entry data from the accesslog as 
> well for the failed replication op, as we can see the failed entry DN in 
> the output of your backtrace.

That would be in the accesslog on the server that crashed? Hmm, the
server that crashed is the master, and all updates were going to it. Am
I confused, or did the update that caused the crash come in via syncrepl
though, and hence originate from a different server? So the accesslog
entry you want would be from that server, not the server that crashed?
But given no other servers should have been receiving updates, how would an
update have been received via replication? Or is this another issue like
the memberOf problem where updates are being improperly replicated?

> Does the operation complete successfully after slapd is restarted?

As far as I can tell; at least the server doesn't crash. I don't get any
errors at the application level in my logs with that specific message
from the backtrace. 

Hmm, looking at the logs that correspond with one of the crashes:

Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=805 MOD dn="uid=vntruong,ou=user,dc=csupomona,dc=edu"
Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=805 MOD attr=eduPersonAffiliation eduPersonPrimaryAffiliation csupomonaEduPersonAffiliation
Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=805 RESULT tag=103 err=0 text=

This operation appears to succeed? Then there's this:

Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=806 MOD dn="uid=vntruong,ou=user,dc=csupomona,dc=edu"
Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=806 MOD attr=csupomonaEduPersonExpiration

Then nothing, the server crashed. In my application, it's the
csupomonaEduPersonExpiration modification that fails.

Here's the entry from the accesslog for the eduPersonAffiliation:

dn: reqStart=20170214120013.000001Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20170214120013.000001Z
reqEnd: 20170214120013.000002Z
reqType: modify
reqSession: 37859
reqAuthzID: cn=idmgmt,ou=user,ou=service,dc=csupomona,dc=edu
reqDN: uid=vntruong,ou=user,dc=csupomona,dc=edu
reqResult: 0
reqMod: eduPersonAffiliation:=
reqMod: eduPersonPrimaryAffiliation:=
reqMod: csupomonaEduPersonAffiliation:=
reqMod: entryCSN:= 20170214120013.628665Z#000000#000#000000
reqMod: modifiersName:= cn=idmgmt,ou=user,ou=service,dc=csupomona,dc=edu
reqMod: modifyTimestamp:= 20170214120013Z
reqEntryUUID: bd5ba51c-7a1b-4bdb-8bf3-fe90552f5909
entryCSN: 20170214120013.628665Z#000000#000#000000
entryUUID: 4737c48c-e46e-45a4-ba4b-2eb61540b27b
creatorsName: cn=accesslog
createTimestamp: 20170214120013Z
modifiersName: cn=accesslog
modifyTimestamp: 20170214120013Z

Then the next entry isn't until:

dn: reqStart=20170214184643.000008Z,cn=accesslog

when I restarted the server. I guess I am confused; the entryCSN has
serverID 0, the ID of this server, so this isn't a replicated op, it's
an op from this server. So why does the backtrace show the change coming
in via syncrepl? It seems like it's getting applied twice. The change is
deleting the attribute, so the second time it's getting applied you
would get a no such attribute error...