
Re: commit: ldap/servers/slapd/back-bdb proto-bdb.h search.c (ITS#3172)



Let's move this thread back to ITS#3172, although it seems to me that the
segfault caught in Quanah's current setup is similar to, but not the same as,
the ones described in ITS#3172. In addition, the segfault does not happen
only with two active replicas; it can happen with just one. So it looks like
a matter of timing that the fault was observed only with two replicas.

The message I got right before the segfault is:
sb_sasl_write: failed to encode packet: can't request info until later in exchange

I looked into the Cyrus SASL library code, although I'm not that familiar
with it. sasl_encode() is returning SASL_NOTDONE, which it does when the
context is not in the authenticated state.

Isn't the maximum buffer size for SASL 64K in OpenLDAP?
The syncrepl master returns a vector message containing the IDs of the
deleted/present entries.
Currently, the size of this message is slightly larger than 4KB (256 UUIDs,
each 16 octets).
sasl_encode() returned the error when slapd tried to send a normal entry
right after three such vector messages had been sent.
After this error, slapd was observed to segfault in a locale routine used
for error logging.
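
For reference, the size arithmetic behind the "slightly larger than 4KB"
figure can be sketched as below. This is only a rough estimate, not taken
from the slapd code: it assumes each UUID travels as a BER OCTET STRING with
a 2-byte header (one tag byte, one short-form length byte), it ignores the
enclosing SET and LDAP message headers, and SASL_MAX_BUF is just the
unverified 64K figure from the question above.

```python
# Rough size estimate for one syncrepl UUID vector message.
# Assumption: each 16-octet UUID is a BER OCTET STRING with a 2-byte header.
# Outer SET / LDAP message headers are deliberately left out.

UUID_OCTETS = 16              # size of one UUID, as stated in the message
UUIDS_PER_MESSAGE = 256       # UUIDs batched into one vector message
BER_OCTET_STRING_HEADER = 2   # assumption: tag byte + short-form length byte
SASL_MAX_BUF = 64 * 1024      # the 64K figure asked about above (unverified)


def raw_payload_size(count=UUIDS_PER_MESSAGE, uuid_octets=UUID_OCTETS):
    """Bytes taken by the UUID values alone."""
    return count * uuid_octets


def ber_encoded_uuids_size(count=UUIDS_PER_MESSAGE, uuid_octets=UUID_OCTETS):
    """Bytes taken by the UUIDs once each carries its OCTET STRING header."""
    return count * (BER_OCTET_STRING_HEADER + uuid_octets)


if __name__ == "__main__":
    raw = raw_payload_size()
    encoded = ber_encoded_uuids_size()
    print("raw UUID octets:       ", raw)
    print("with per-UUID headers: ", encoded)
    print("three such messages:   ", 3 * encoded)
    print("fits in assumed 64K:   ", 3 * encoded < SASL_MAX_BUF)
```

Under these assumptions one vector message carries 4096 octets of UUID data
and 4608 octets once the per-UUID headers are counted, so three of them in a
row come to 13824 octets, well below 64K.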

- Jong-Hyuk

> >> I then stopped one replica, and restarted it.  The master immediately
> >> hit a segfault.
> >
> > Was this segfault the same as the one described in ITS#3172 ?
>
> There's been some offline discussion with Jong. ;) The answer here is yes,
> and Jong will have access to my servers for testing starting tomorrow.