Re: mmr pair stops replicating: "consumer state is newer than provider"



--On Thursday, June 29, 2017 1:41 PM -0400 btb <btb@bitrate.net> wrote:



On 6/29/17 11:15 AM, Quanah Gibson-Mount wrote:
--On Thursday, June 29, 2017 2:12 AM -0400 btb <btb@bitrate.net> wrote:

i see, thanks.  i tested this, and did a modify on each, but didn't see
replication resume.  emulating the syncrepl connection with a manual
search against each master, there do seem to be accesslog entries now,
on both masters:

You may have to restart the consumers (I did when I ran into this).

i did try a restart on both, but they returned to the same state

Also, there are 2 sets of CSNs per master that you need to examine --
The CSNs in your database root (i.e., dc=example,dc=org) and your
accesslog root.

that would be these, right?

dsa1 cn=accesslog:
20161019002438.652359Z#000000#000#000000
20170521175113.974560Z#000000#002#000000
20170530214415.204052Z#000000#001#000000

dsa1 dc=example,dc=org:
20170520031415.276678Z#000000#000#000000
20170530214231.171959Z#000000#002#000000
20170530214415.204052Z#000000#001#000000

dsa2 cn=accesslog:
20170520031415.276678Z#000000#000#000000
20170521175113.974560Z#000000#002#000000
20170628034119.327974Z#000000#001#000000

dsa2 dc=example,dc=org:
20170520031415.276678Z#000000#000#000000
20170619014933.531051Z#000000#002#000000
20170628034119.327974Z#000000#001#000000

why are there three per db, and which is supposed to match which?

wow, that's a mess.

So #000# is serverID 0, which would be for any entries prior to moving to MMR. The fact that you have different values for #000# on dsa1 accesslog vs the other 3 databases is disturbing.
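
For reading values like these, a minimal Python sketch of how a CSN breaks down may help. It assumes the usual timestamp#count#serverID#mod layout with the last three fields in hexadecimal; the names CSN and parse_csn are hypothetical and not part of any OpenLDAP tool.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    """One change sequence number split into its four fields (hypothetical helper type)."""
    timestamp: datetime  # UTC time of the change
    count: int           # change counter within the same timestamp
    sid: int             # serverID of the server that originated the change
    mod: int             # modification number within the operation


def parse_csn(text: str) -> CSN:
    """Split 'YYYYmmddHHMMSS.uuuuuuZ#count#sid#mod' into its fields.

    Assumes the count, sid, and mod fields are hexadecimal.
    """
    stamp, count, sid, mod = text.strip().split("#")
    ts = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    return CSN(ts, int(count, 16), int(sid, 16), int(mod, 16))


if __name__ == "__main__":
    csn = parse_csn("20170530214415.204052Z#000000#001#000000")
    print(csn.sid, csn.timestamp.isoformat())
```

The middle three-digit field is what identifies the originating server, so values are only comparable when that field matches.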

It would appear DSA1 is serverID 1, and its CSNs make sense:

20170530214415.204052Z#000000#001#000000
20170530214415.204052Z#000000#001#000000

However, there's something seriously wrong with dsa2 (assuming it is serverID 2):

20170521175113.974560Z#000000#002#000000
20170619014933.531051Z#000000#002#000000

This implies that the primary DB received a write on 2017/06/19 @ 01:49:33 but the accesslog never recorded that change: it says the last write op logged for #002# was on 2017/05/21 @ 17:51:13, nearly a month earlier. So the accesslog doesn't seem to think you've done a write op directly against serverID 002 since then.
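
The same per-serverID comparison can be sketched in Python. This is only an illustration: it assumes the CSN values have been pasted in by hand from the listings above, and the helper names (sid_of, by_sid, mismatches) are hypothetical.

```python
def sid_of(csn: str) -> int:
    """Return the serverID field of a CSN (third '#'-separated field, assumed hex)."""
    return int(csn.split("#")[2], 16)


def by_sid(csns):
    """Index a list of CSN strings by their serverID."""
    return {sid_of(c): c for c in csns}


def mismatches(accesslog, maindb):
    """Return {sid: (accesslog CSN, main DB CSN)} for every serverID whose values differ."""
    a, d = by_sid(accesslog), by_sid(maindb)
    return {
        sid: (a.get(sid), d.get(sid))
        for sid in sorted(set(a) | set(d))
        if a.get(sid) != d.get(sid)
    }


# Values copied from the thread above.
dsa1_accesslog = [
    "20161019002438.652359Z#000000#000#000000",
    "20170521175113.974560Z#000000#002#000000",
    "20170530214415.204052Z#000000#001#000000",
]
dsa1_maindb = [
    "20170520031415.276678Z#000000#000#000000",
    "20170530214231.171959Z#000000#002#000000",
    "20170530214415.204052Z#000000#001#000000",
]
dsa2_accesslog = [
    "20170520031415.276678Z#000000#000#000000",
    "20170521175113.974560Z#000000#002#000000",
    "20170628034119.327974Z#000000#001#000000",
]
dsa2_maindb = [
    "20170520031415.276678Z#000000#000#000000",
    "20170619014933.531051Z#000000#002#000000",
    "20170628034119.327974Z#000000#001#000000",
]

if __name__ == "__main__":
    for name, log, db in (("dsa1", dsa1_accesslog, dsa1_maindb),
                          ("dsa2", dsa2_accesslog, dsa2_maindb)):
        for sid, (log_csn, db_csn) in mismatches(log, db).items():
            print(f"{name} sid {sid:03x}: accesslog={log_csn} db={db_csn}")
```

Each line it prints is a serverID for which the accesslog and the main database on the same server disagree.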

--Quanah

--

Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
<http://www.symas.com>