Hi Quanah or anyone with experience,
I have upgraded to 2.4.41 on a two-node cluster and still see slow replication. I inserted 300k user records into an LMDB database, and data.mdb ended up at 2 GB. The insertion took 3 hours to complete (likely mostly due to ldapadd). I then enabled 2-way replication to a second node with the following syncrepl statement (similar on both nodes):
retry="5 5 300 +"
I started this replication on Friday and by Monday it is only 28% complete (around 90k records have been transferred -- data.mdb = 571M). These servers have full speed network connections between them, so I don't understand the protocol slowness. Is replication not intended for this amount of transfer load? Are we expected to recover the node via a separate method (i.e., slapcat / slapadd) and then kick replication off only after it's been loaded?
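To put rough numbers on the slowdown, here is a back-of-the-envelope sketch in Python using only the figures above. The 72-hour duration is an assumption (Friday to Monday taken as about three days), so treat the results as order-of-magnitude estimates rather than measurements.

```python
# Rough throughput comparison from the figures quoted in this message.
TOTAL_RECORDS = 300_000
LOAD_HOURS = 3.0             # initial load via ldapadd
REPLICATED_RECORDS = 90_000  # records transferred so far
REPL_HOURS = 72.0            # ASSUMPTION: Friday to Monday is about 3 days


def rate_per_second(records, hours):
    """Average records per second over the given number of hours."""
    return records / (hours * 3600.0)


load_rate = rate_per_second(TOTAL_RECORDS, LOAD_HOURS)
repl_rate = rate_per_second(REPLICATED_RECORDS, REPL_HOURS)

# Time left if replication keeps its current average rate.
remaining_hours = (TOTAL_RECORDS - REPLICATED_RECORDS) / repl_rate / 3600.0

print(f"ldapadd load: {load_rate:.1f} records/s")
print(f"syncrepl:     {repl_rate:.2f} records/s")
print(f"slowdown:     {load_rate / repl_rate:.0f}x")
print(f"remaining:    {remaining_hours:.0f} hours at the current rate")
```

Under that assumption, replication is averaging well under one record per second, roughly 80 times slower than the original ldapadd load, and would need about another week to finish.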
Additionally, when I restarted a partially replicated node far into this replication process (around 90k records), the entire process stopped and did not resume after the restart. I do not have journaling enabled; because these are all new full records, I would not expect it to improve performance much, though it might help with replication recovery.
My questions include...
Is syncrepl configured optimally?
Will journaling help with replication recovery?
We're trying to work out how to recover or replace a failed node in a system containing a very large number of records and bring it back into the cluster as quickly as possible. We also need replication to resume reliably after a restart.
Please let me know.
On 7/21/15 1:53 PM, Brian Wright wrote: