Re: mmr pair stops replicating: "consumer state is newer than provider"



wow, that's a mess.

So #000# is serverID 0, which would be for any entries prior to moving to MMR. The fact that you have different values for #000# on dsa1 accesslog vs the other 3 databases is disturbing.

It would appear DSA1 is serverID 1, and its CSNs make sense:

20170530214415.204052Z#000000#001#000000
20170530214415.204052Z#000000#001#000000

However, there's something seriously wrong with dsa2 (assuming it is serverID 2):

20170521175113.974560Z#000000#002#000000
20170619014933.531051Z#000000#002#000000

As this implies the primary DB received a write on 2017/06/19 @ 01:49:33, but the accesslog has not recorded this change, as it says the last time there was a write op to the accesslog DB on #002# was 2017/05/21 @ 17:51:13, nearly a month earlier. So it doesn't seem to think you've done a write op directly against serverID 002.
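To make the comparison above concrete, here is a minimal sketch that splits a CSN into its four '#'-separated fields and measures the gap between the two dsa2 values. The names `CSN` and `parse_csn` are made up for illustration, and it assumes the count, serverID and mod fields are hexadecimal.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    """Hypothetical container for the four fields of a CSN."""
    timestamp: datetime
    count: int
    server_id: int
    mod: int


def parse_csn(text: str) -> CSN:
    """Split 'YYYYmmddHHMMSS.uuuuuuZ#count#sid#mod' into its parts."""
    stamp, count, sid, mod = text.strip().split("#")
    ts = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    # Assumption: the three trailing fields are hex-encoded.
    return CSN(ts, int(count, 16), int(sid, 16), int(mod, 16))


# The two serverID 002 values quoted above: accesslog DB vs primary DB.
accesslog_csn = parse_csn("20170521175113.974560Z#000000#002#000000")
primary_csn = parse_csn("20170619014933.531051Z#000000#002#000000")

# How far the accesslog trails the primary DB for this serverID.
lag = primary_csn.timestamp - accesslog_csn.timestamp
print(f"serverID {primary_csn.server_id}: accesslog is {lag.days} days behind")
```

Both values carry serverID 2, so the only difference is the timestamp, which is what the "nearly a month" remark refers to.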

thanks. i think i've managed to clean up the mess, and replication is flowing again. i've exorcised the old serverid 000 references, and verified each server's accesslog is getting updated as local modifications occur.

contextcsns seem to be a bit more sane now, hopefully?

>ldapsearch -ZZxWLLLH 'ldap://dsa1.example.org/' -D 'uid=dit_admin,ou=role_accounts,ou=accounts,dc=example,dc=org' -b 'cn=config' -s base 'olcserverid'
Enter LDAP Password:
dn: cn=config
olcServerID: 1

>ldapsearch -ZZxWLLLH 'ldap://dsa2.example.org/' -D 'uid=dit_admin,ou=role_accounts,ou=accounts,dc=example,dc=org' -b 'cn=config' -s base 'olcserverid'
Enter LDAP Password:
dn: cn=config
olcServerID: 2

>ldapsearch -ZZxWLLLH 'ldap://dsa1.example.org/' -D 'uid=dit_admin,ou=role_accounts,ou=accounts,dc=example,dc=org' -b 'dc=example,dc=org' -s base 'contextcsn'
Enter LDAP Password:
dn: dc=example,dc=org
contextCSN: 20170705042207.590054Z#000000#001#000000
contextCSN: 20170704183515.872465Z#000000#002#000000

>ldapsearch -ZZxWLLLH 'ldap://dsa2.example.org/' -D 'uid=dit_admin,ou=role_accounts,ou=accounts,dc=example,dc=org' -b 'dc=example,dc=org' -s base 'contextcsn'
Enter LDAP Password:
dn: dc=example,dc=org
contextCSN: 20170705042207.590054Z#000000#001#000000
contextCSN: 20170704183515.872465Z#000000#002#000000

>ldapsearch -ZZxWLLLH 'ldap://dsa1.example.org/' -D 'uid=dit_admin,ou=role_accounts,ou=accounts,dc=example,dc=org' -b 'cn=accesslog' -s base 'contextcsn'
Enter LDAP Password:
dn: cn=accesslog
contextCSN: 20170705042145.957972Z#000000#001#000000
contextCSN: 20170704183515.872465Z#000000#002#000000

>ldapsearch -ZZxWLLLH 'ldap://dsa2.example.org/' -D 'uid=dit_admin,ou=role_accounts,ou=accounts,dc=example,dc=org' -b 'cn=accesslog' -s base 'contextcsn'
Enter LDAP Password:
dn: cn=accesslog
contextCSN: 20170705042145.957972Z#000000#001#000000
contextCSN: 20170704183515.872465Z#000000#002#000000
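A rough sketch of how the same check could be scripted, assuming the contextCSN values have already been collected from each server (for example from the ldapsearch output above). The helper names `csns_by_sid` and `diff_context_csns` are hypothetical, not part of any OpenLDAP tool.

```python
def csns_by_sid(values):
    """Map the serverID field (third '#'-separated field) to the full CSN string."""
    return {value.split("#")[2]: value for value in values}


def diff_context_csns(a, b):
    """Return the serverIDs whose contextCSN differs, or is missing, between two servers."""
    left, right = csns_by_sid(a), csns_by_sid(b)
    return sorted(sid for sid in left.keys() | right.keys()
                  if left.get(sid) != right.get(sid))


# contextCSN values for dc=example,dc=org as returned by dsa1 and dsa2 above.
dsa1 = [
    "20170705042207.590054Z#000000#001#000000",
    "20170704183515.872465Z#000000#002#000000",
]
dsa2 = [
    "20170705042207.590054Z#000000#001#000000",
    "20170704183515.872465Z#000000#002#000000",
]

# An empty list means both servers report the same CSN for every serverID.
print(diff_context_csns(dsa1, dsa2))
```

The same function can be pointed at the two cn=accesslog result sets; any serverID it returns is one where the two servers disagree.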

i've also increased accesslog data retention from 7 days to 14 days, as a bit of compensation for the infrequent writes, and i'll implement a "no-op" cron job as well, as a fail-safe. are there any pitfalls i may not be considering with a 14 day accesslog retention period? is that too long according to "typical" consensus?

for posterity's sake, after the mess was cleaned up, once a proper write occurred on each master, and the accesslog db was updated and csns brought in line, replication began flowing again, without the need for a restart on either side [at least in this particular case, anyway].

-ben