
Re: 2.3.39 syncrepl lost connection

The syncrepl config on the replicas is for refreshAndPersist and does a retry every 30 seconds -- so, if the replica knew the connection had dropped, it should have restarted it.

I ran into this around 2.3.27. It should be OK now; there was a patch to libldap.

A 2.3.39 replica should know the connection was lost, via the underlying OS, because it requests SO_KEEPALIVE. I assume these are (pseudo-?)dedicated servers, given the size of your OpenLDAP installation. As such, you may want to investigate your kernel tunable parameters to make keepalives more aggressive.
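For anyone unsure what "requests SO_KEEPALIVE" means in practice, here is a minimal sketch in Python. It is not slapd's code, just an illustration of the socket option. The helper name `enable_keepalive` and its default values are made up for this example, and the per-socket `TCP_KEEP*` options are Linux-specific, so the code only sets them when the platform exposes them.

```python
import socket


def enable_keepalive(sock, idle=10, interval=5, probes=5):
    """Turn on SO_KEEPALIVE and, where available, per-socket keepalive timers.

    Hypothetical helper for illustration only; the defaults are arbitrary.
    """
    # This is the portable part: ask the kernel to probe an idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Linux-specific per-socket overrides. Skipped on platforms lacking them.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # unanswered probes before giving up
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

A program that only sets SO_KEEPALIVE, without the per-socket overrides, inherits the kernel-wide timers on Linux, which is why the kernel tunables matter here.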

You mention RHEL. If you have a test environment, using iptables to silently drop traffic on ports 389/636 is a really good way to make sure that this works in a sane fashion. Around here, we don't pay by the byte, so we set:

net.ipv4.tcp_keepalive_time = 10
net.ipv4.tcp_keepalive_intvl = 5
net.ipv4.tcp_keepalive_probes = 5

to keep syncrepl in sync. With the CentOS defaults, we saw a bit over 2 hours go by before the kernel noticed the connection was dead.
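As a rough check on those numbers, the worst-case time for the kernel to declare an idle connection dead is approximately the idle time plus one probe interval per unanswered probe. The sketch below works that out for the settings above and for the stock Linux defaults (7200 / 75 / 9). The function name is made up for illustration, and the formula is an approximation, not an exact kernel guarantee.

```python
def keepalive_detection_seconds(keepalive_time, keepalive_intvl, keepalive_probes):
    """Approximate seconds before an idle, dead TCP peer is detected.

    Hypothetical helper: idle time before the first probe, plus one
    interval for each probe that goes unanswered.
    """
    return keepalive_time + keepalive_intvl * keepalive_probes


# Settings from this post: 10 + 5 * 5
tuned = keepalive_detection_seconds(10, 5, 5)

# Stock Linux defaults: 7200 + 75 * 9
defaults = keepalive_detection_seconds(7200, 75, 9)

print(f"tuned:    {tuned} s")
print(f"defaults: {defaults} s (about {defaults / 3600:.1f} h)")
```

That gives about 35 seconds with the tuned values versus 7875 seconds (a little over 2 hours) with the defaults, which lines up with what we observed.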