
Re: Recovering a multi-master node after a server failure




> On May 22, 2014, at 2:57 PM, Richard Marshall <richard.marshall@first-utility.com> wrote:
> 
> Hi,
> 
> We have a multi-master (2-node) cluster running 2.4.23 on CentOS 6. We're effectively using them as a failover active-standby pair.

As has been stated on the list a few thousand times: if you want a working OpenLDAP server, do not use the garbage shipped by Red Hat.

I would suggest you reload the failed master from the backup master and upgrade them both to 2.4.39. If you are not able to build software yourself, then use the builds from Symas or the LTB project.

--Quanah



> The 'Master' node failed last night and we failed over to the standby (they're behind a load balancer). I am now trying to bring the old 'Master' back online but it has become apparent there was a misconfiguration in the server id config.
> 
> We did have 'Master' = serverid 1 and 'slave' = serverid 2 - i.e. it was missing the server's URI. I have now fixed this, but we have around 500 objects on the old master reporting "changed by peer, ignored" in the sync log.
> 
> The old master will get up to the latest CSN number after I restart it, but then gets stuck with these "changed by peer, ignored" errors.
> 
> My question is, how do I get past this? Is it possible to remove the objects, and if so, how? (I don't want to delete them totally, just remove the conflict.)
> 
> Or, do I need to rebuild the 'old' master server database? If so, is the process to stop slapd, remove the contents of the database and accesslog directories, create an LDIF export on the live server, slapadd that file onto the 'old' master, start it, and then allow it to replicate any new changes from its partner?
> 
> If this is the only way to do it, is there anything I need to look out for? If not this, then what do I do? I've looked but can't find any guidelines on how to recover a failed node.
> 
> Any help appreciated!
> 
> Thanks,
> Rich
>
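
The reload described above (export from the live node, wipe the failed node, slapadd, restart) can be sketched as an ordered command plan. This is only a minimal sketch, not a tested recipe: the suffix, directory paths, LDIF path, the `ldap` user and the `service slapd` init script name are all assumptions for illustration and will differ with self-built, Symas or LTB packages. The helper `reload_plan` is hypothetical; it only builds the command strings and runs nothing.

```python
import shlex


def reload_plan(suffix, db_dir, accesslog_dir, ldif_path):
    """Return (host, command) pairs for reloading a failed node.

    'live' commands run on the surviving master, 'failed' commands on
    the node being rebuilt. All arguments are site-specific assumptions.
    """
    q = shlex.quote
    plan = [
        # Export the main database from the node that stayed in service.
        ("live", f"slapcat -b {q(suffix)} -l {q(ldif_path)}"),
        # (Copy the LDIF to the failed node by whatever means you use.)
        ("failed", "service slapd stop"),
    ]
    # Move the old data aside instead of deleting it, then recreate
    # empty directories. With back-bdb/back-hdb, copy DB_CONFIG back.
    for d in (db_dir, accesslog_dir):
        plan.append(("failed", f"mv {q(d)} {q(d + '.old')}"))
        plan.append(("failed", f"mkdir -p {q(d)}"))
    plan += [
        # Load the export; the accesslog database starts out empty.
        ("failed", f"slapadd -b {q(suffix)} -l {q(ldif_path)}"),
        # slapadd is normally run as root, so hand the files back.
        ("failed", f"chown -R ldap:ldap {q(db_dir)} {q(accesslog_dir)}"),
        ("failed", "service slapd start"),
    ]
    return plan


if __name__ == "__main__":
    for host, cmd in reload_plan(
        "dc=example,dc=com",           # hypothetical suffix
        "/var/lib/ldap/data",          # hypothetical main DB directory
        "/var/lib/ldap/accesslog",     # hypothetical accesslog directory
        "/tmp/export.ldif",            # hypothetical export file
    ):
        print(f"[{host}] {cmd}")
```

Printing the plan first and running each step by hand keeps the destructive steps under manual control. Once slapd is started, the reloaded node should pick up any later changes from its partner through normal replication.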