Re: mmr pair stops replicating: "consumer state is newer than provider"
- To: openldap-technical@openldap.org
- Subject: Re: mmr pair stops replicating: "consumer state is newer than provider"
- From: btb <btb@bitrate.net>
- Date: Wed, 5 Jul 2017 00:39:56 -0400
- Content-language: en-US
- In-reply-to: <4EE5AA58F754C102C58D127F@[192.168.1.30]>
- References: <460a87bc-ccb6-9553-bb6a-b57de306058e@bitrate.net> <WM!721ba9b642972ca17483c621787c32b1e0b1f650e884b9d0653d75b7c6a4b403485f248df406b00e352a97047c1e5e1c!@mailstronghold-1.zmailcloud.com> <B3D6DB90F83F55DBF692C0B8@[192.168.1.30]> <ffa99d26-b81a-6409-6e8c-12ee91d5487e@bitrate.net> <WM!250a43491a3881f6c8d454396d5edcdbdff347676182c3cd95de6b3570ee09feafbcccefba03f9d48b03b9bb3f10deb0!@mailstronghold-1.zmailcloud.com> <4DA177A2CB98B18529699F27@[192.168.1.30]> <a59a985e-8c4c-9f58-131a-c51b78b8874f@bitrate.net> <WM!001a7eaf2d319db0d65d5f48486c7e4d9457a2a4db8dbd04f89cdd1d17dc8fdb2a0d9b3ca6d0898ed0828dd9956d7bf6!@mailstronghold-1.zmailcloud.com> <73EF314E2CECAE34C9C098F4@[192.168.1.30]> <5022df33-7fb9-cbf7-3199-cf5638b2980a@bitrate.net> <WM!0ca4cf98c7e38b6e1e42c0cb58b01a04b10e85c77eb5112cdab0fa6acfbec97c5eec939932a3c8048b3b23078e6be829!@mailstronghold-2.zmailcloud.com> <4EE5AA58F754C102C58D127F@[192.168.1.30]>
- User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:54.0) Gecko/20100101 Thunderbird/54.0
wow, that's a mess.
So #000# is serverID 0, which would be for any entries prior to moving
to MMR. The fact that you have different values for #000# on dsa1
accesslog vs the other 3 databases is disturbing.
It would appear DSA1 is serverID 1, and its CSNs make sense:
20170530214415.204052Z#000000#001#000000
20170530214415.204052Z#000000#001#000000
However, there's something seriously wrong with dsa2 (assuming it is
serverID 2):
20170521175113.974560Z#000000#002#000000
20170619014933.531051Z#000000#002#000000
This implies that the primary DB received a write on 2017/06/19 at
01:49:33, but the accesslog did not record that change: according to
it, the last write op logged for #002# was on 2017/05/21 at 17:51:13,
nearly a month earlier. So the accesslog does not seem to think you've
done a write op directly against serverID 002.
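The reasoning above compares, per serverID, the newest CSN in the main database against the newest CSN in the accesslog. A minimal Python sketch of that comparison follows. It assumes the usual CSN layout `timestamp#count#sid#mod` with the count, sid and mod fields in hexadecimal; `parse_csn` and `lagging_sids` are hypothetical names. The sample values are the dsa2 CSNs quoted above, taking the 2017/06/19 value as the main DB and the 2017/05/21 value as the accesslog, as the text describes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CSN:
    stamp: datetime
    count: int
    sid: int
    mod: int

    def key(self) -> tuple:
        # Ordering within one serverID: timestamp, then change count, then mod.
        return (self.stamp, self.count, self.mod)


def parse_csn(value: str) -> CSN:
    # Layout: YYYYmmddHHMMSS.ffffffZ#count#sid#mod (count, sid, mod assumed hex).
    ts, count, sid, mod = value.split("#")
    stamp = datetime.strptime(ts, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    return CSN(stamp, int(count, 16), int(sid, 16), int(mod, 16))


def lagging_sids(db_csns: list[str], log_csns: list[str]) -> list[int]:
    """Return serverIDs whose main-DB CSN is newer than the accesslog's."""
    db = {c.sid: c for c in map(parse_csn, db_csns)}
    log = {c.sid: c for c in map(parse_csn, log_csns)}
    return sorted(
        sid for sid, csn in db.items()
        if sid not in log or log[sid].key() < csn.key()
    )


# dsa2's #002# values from the message: main DB first, accesslog second.
print(lagging_sids(["20170619014933.531051Z#000000#002#000000"],
                   ["20170521175113.974560Z#000000#002#000000"]))  # prints [2]
```

Run against the dsa1 pair, where both values are identical, the same check returns an empty list.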
thanks. i think i've managed to clean up the mess, and replication is
flowing again. i've exorcised the old serverid 000 references, and
verified that each server's accesslog is updated as local
modifications occur.
contextcsns seem to be a bit more sane now, hopefully?
>ldapsearch -ZZxWLLLH 'ldap://dsa1.example.org/' \
    -D 'uid=dit_admin,ou=role_accounts,ou=accounts,dc=example,dc=org' \
    -b 'cn=config' -s base 'olcserverid'
Enter LDAP Password:
dn: cn=config
olcServerID: 1
>ldapsearch -ZZxWLLLH 'ldap://dsa2.example.org/' \
    -D 'uid=dit_admin,ou=role_accounts,ou=accounts,dc=example,dc=org' \
    -b 'cn=config' -s base 'olcserverid'
Enter LDAP Password:
dn: cn=config
olcServerID: 2
>ldapsearch -ZZxWLLLH 'ldap://dsa1.example.org/' \
    -D 'uid=dit_admin,ou=role_accounts,ou=accounts,dc=example,dc=org' \
    -b 'dc=example,dc=org' -s base 'contextcsn'
Enter LDAP Password:
dn: dc=example,dc=org
contextCSN: 20170705042207.590054Z#000000#001#000000
contextCSN: 20170704183515.872465Z#000000#002#000000
>ldapsearch -ZZxWLLLH 'ldap://dsa2.example.org/' \
    -D 'uid=dit_admin,ou=role_accounts,ou=accounts,dc=example,dc=org' \
    -b 'dc=example,dc=org' -s base 'contextcsn'
Enter LDAP Password:
dn: dc=example,dc=org
contextCSN: 20170705042207.590054Z#000000#001#000000
contextCSN: 20170704183515.872465Z#000000#002#000000
>ldapsearch -ZZxWLLLH 'ldap://dsa1.example.org/' \
    -D 'uid=dit_admin,ou=role_accounts,ou=accounts,dc=example,dc=org' \
    -b 'cn=accesslog' -s base 'contextcsn'
Enter LDAP Password:
dn: cn=accesslog
contextCSN: 20170705042145.957972Z#000000#001#000000
contextCSN: 20170704183515.872465Z#000000#002#000000
>ldapsearch -ZZxWLLLH 'ldap://dsa2.example.org/' \
    -D 'uid=dit_admin,ou=role_accounts,ou=accounts,dc=example,dc=org' \
    -b 'cn=accesslog' -s base 'contextcsn'
Enter LDAP Password:
dn: cn=accesslog
contextCSN: 20170705042145.957972Z#000000#001#000000
contextCSN: 20170704183515.872465Z#000000#002#000000
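To check "more sane" mechanically, here is a small sketch that pulls the contextCSN values out of `ldapsearch -LLL` output and reports any serverID whose value differs between two servers. It assumes unfolded LDIF lines (true for values this short) and a hexadecimal SID in the third `#` field; `context_csns` and `diverging_sids` are hypothetical helpers, not part of any OpenLDAP tool.

```python
def context_csns(ldif_text: str) -> dict[int, str]:
    """Map serverID -> contextCSN from `ldapsearch -LLL` output.

    Assumes unfolded lines and a hex SID in the third '#' field.
    """
    csns: dict[int, str] = {}
    for line in ldif_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "contextcsn":
            value = value.strip()
            csns[int(value.split("#")[2], 16)] = value
    return csns


def diverging_sids(a: dict[int, str], b: dict[int, str]) -> list[int]:
    """serverIDs whose contextCSN is missing or different on one side."""
    return sorted(sid for sid in a.keys() | b.keys() if a.get(sid) != b.get(sid))


dsa1 = """dn: dc=example,dc=org
contextCSN: 20170705042207.590054Z#000000#001#000000
contextCSN: 20170704183515.872465Z#000000#002#000000"""
dsa2 = dsa1  # both servers returned identical values for this base above
print(diverging_sids(context_csns(dsa1), context_csns(dsa2)))  # prints []
```

An empty list means both servers report the same contextCSN for every serverID; the same comparison applies to the cn=accesslog results.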
i've also increased accesslog data retention from 7 days to 14 days, as
a bit of compensation for the infrequent writes, and i'll implement a
"no-op" cron job as well, as a fail-safe. are there any pitfalls i may
not be considering with a 14-day accesslog retention period? is that
too long by "typical" consensus?
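The no-op cron job is meant to keep each serverID's newest change from ageing out of the accesslog. A hedged sketch of a monitoring check along those lines flags serverIDs whose newest contextCSN is older than some fraction of the retention window. The 50% `margin` and the name `stale_sids` are assumptions for illustration, not anything slapo-accesslog defines; the sample values are the cn=accesslog contextCSNs shown above.

```python
from datetime import datetime, timedelta, timezone


def stale_sids(context_csns: list[str], retention: timedelta,
               now: datetime, margin: float = 0.5) -> list[int]:
    """Flag serverIDs whose newest CSN is older than margin * retention.

    Hypothetical check: margin=0.5 means "alert once half the retention
    window has passed without a write from that serverID".
    """
    cutoff = now - retention * margin
    stale = []
    for value in context_csns:
        ts, _count, sid, _mod = value.split("#")
        stamp = datetime.strptime(ts, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
        if stamp < cutoff:
            stale.append(int(sid, 16))  # SID field assumed hexadecimal
    return sorted(stale)


csns = ["20170705042145.957972Z#000000#001#000000",
        "20170704183515.872465Z#000000#002#000000"]
now = datetime(2017, 7, 11, 19, 0, tzinfo=timezone.utc)
print(stale_sids(csns, timedelta(days=14), now))  # prints [2]
```

With a 14-day window and a 50% margin, the cutoff here is 2017-07-04 19:00 UTC, so only serverID 2 (last write 18:35 that day) is flagged.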
for posterity's sake, after the mess was cleaned up, once a proper write
occurred on each master, and the accesslog db was updated and csns
brought in line, replication began flowing again, without the need for a
restart on either side [at least in this particular case, anyway].
-ben