
Re: 2.4.44 + ITS 8432 patch segfault in modify_add_values



--On Wednesday, February 15, 2017 6:36 PM -0800 "Paul B. Henson" <henson@acm.org> wrote:

> On Wed, Feb 15, 2017 at 12:22:29PM -0800, Quanah Gibson-Mount wrote:

>> I would suggest filing an ITS with the full backtrace info, so I can
>> track it.

> Ok, will do.

>> It could be useful to have the entry data from the accesslog as
>> well for the failed replication op, as we can see the failed entry DN in
>> the output of your backtrace.

> That would be in the accesslog on the server that crashed? Hmm, the
> server that crashed is the master, and all updates were going to it. Am
> I confused, or did the update that caused the crash come in via syncrepl
> though, and hence originate from a different server? So the accesslog
> entry you want would be from that server, not the server that crashed?
> But given no other servers should have been receiving updates, how would
> an update have been received via replication? Or is this another issue
> like the memberOf problem where updates are being improperly replicated?

It appears to be crashing while writing the change to the accesslog database. It's odd that the value for the attribute is NULL. Do we know for sure what the client doing the write op to the server is sending?


> Hmm, looking at the logs that correspond with one of the crashes:


> This operation appears to succeed? Then there's this:

> Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=806 MOD dn="uid=vntruong,ou=user,dc=csupomona,dc=edu"
> Feb 14 04:00:13 fosse slapd[12524]: conn=37859 op=806 MOD attr=csupomonaEduPersonExpiration

Yeah, so this is the operation that actually failed. It'd be interesting to know whether it succeeded in the primary DB but failed when writing to the accesslog DB (i.e., the master and its consumers are now out of sync for that entry), or whether the entire write op failed (master and consumers are in sync for the entry).

> when I restarted the server. I guess I am confused; the entryCSN has
> serverID 0, the ID of this server, so this isn't a replicated op, it's
> an op from this server. So why does the backtrace show the change coming
> in via syncrepl? It seems like it's getting applied twice. The change is
> deleting the attribute, so the second time it's getting applied you
> would get a no such attribute error...
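The serverID reasoning above can be checked directly from an entryCSN value. What follows is a minimal Python sketch, assuming the OpenLDAP 2.4 CSN layout `YYYYmmddHHMMSS.ffffffZ#count#sid#mod` with the last three fields in hex. `parse_csn`, `originated_locally` and the sample CSN are hypothetical illustrations, not values taken from these logs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CSN:
    timestamp: datetime  # UTC time of the change
    count: int           # change counter within the same timestamp
    server_id: int       # serverID of the server that originated the change
    mod: int             # modification counter


def parse_csn(csn: str) -> CSN:
    """Split an entryCSN of the (assumed) form YYYYmmddHHMMSS.ffffffZ#cccccc#sss#mmmmmm."""
    ts, count, sid, mod = csn.split("#")
    stamp = datetime.strptime(ts, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    # The three trailing fields are hexadecimal.
    return CSN(stamp, int(count, 16), int(sid, 16), int(mod, 16))


def originated_locally(csn: str, local_server_id: int) -> bool:
    """True if the CSN's serverID matches the given (hypothetical) local serverID."""
    return parse_csn(csn).server_id == local_server_id


# Hypothetical CSN value, for illustration only.
sample = "20170214120013.123456Z#000000#000#000000"
print(parse_csn(sample).server_id)    # 0
print(originated_locally(sample, 0))  # True
```

A serverID of 0 on a server configured as 0 only says where the change was first written. It does not by itself explain why the backtrace shows the change arriving via syncrepl.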

Hm, so I guess my question is whether it's doing the op like this:

dn: ...
changetype: modify
replace: csupomonaEduPersonExpiration
csupomonaEduPersonExpiration:

Or is it doing it like this:

dn: ...
changetype: modify
delete: csupomonaEduPersonExpiration

Because the NULL value seems to imply the former.
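One way to tell the two shapes apart is to capture the LDIF the client actually sends and look at the values attached to each modification. The rough Python sketch below does that. `modify_ops` and `explicit_empty_value` are hypothetical helpers, not part of OpenLDAP or any LDAP library, and the parser deliberately ignores base64 (`::`) values, URL (`:<`) values and folded lines. In LDIF, `attr:` with nothing after it is one zero-length value, while a bare `delete:` carries no values at all.

```python
def modify_ops(ldif: str) -> list[tuple[str, str, list[str]]]:
    """Return (operation, attribute, values) for each change in one LDIF modify record."""
    ops: list[tuple[str, str, list[str]]] = []
    current = None
    for line in ldif.splitlines():
        if line == "-":  # separator between modifications
            current = None
            continue
        if ":" not in line:
            continue
        name, _, value = line.partition(":")
        value = value.lstrip(" ")
        if name in ("add", "delete", "replace"):
            current = (name, value, [])
            ops.append(current)
        elif current is not None and name.lower() == current[1].lower():
            # "attr:" with nothing after it yields a zero-length value ("").
            current[2].append(value)
    return ops


def explicit_empty_value(ops: list[tuple[str, str, list[str]]]) -> bool:
    """True if any modification carries a zero-length value."""
    return any("" in values for _, _, values in ops)


REPLACE_EMPTY = """dn: ...
changetype: modify
replace: csupomonaEduPersonExpiration
csupomonaEduPersonExpiration:
"""

DELETE_ATTR = """dn: ...
changetype: modify
delete: csupomonaEduPersonExpiration
"""
```

Applied to the two records above, the replace form yields `[('replace', 'csupomonaEduPersonExpiration', [''])]` and the delete form yields `[('delete', 'csupomonaEduPersonExpiration', [])]`.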

--Quanah

--

Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
<http://www.symas.com>