Re: Syncrepl connections failing

On Thu, Apr 23, 2009 at 05:41:09PM -0500, John Kane wrote:

> I am having a problem with what appears (to me) to be 'stale' TCP
> connections for syncrepl between the master and a pair of slaves.  After
> restarting all, I see changes on the master replicated to both slaves.
> BUT, if I wait about 30 minutes or more, then make a change, the
> replication fails (most of the time).  netstat on the LDAP port shows the
> connections still established, but queued packets at the master server.
> After about 15 minutes, the master server drops the connection.  An
> overnight tcpdump on the master showed LDAP occasionally sending a
> keep-alive, with 2hrs between the keep-alive messages (these keep-alives
> are inconsistent, though; some nights I see none).

> Note:  The 2 slaves are running on blades in an IBM chassis, and the
> master is on a 1U Linux server, just 'one-hop' away.   Prior to this,
> when I had a master/slave pair running on the blades, syncRepl was
> working fine for several months.  It was not until I moved the master to
> another server that the failures started.

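Do you have a firewall or NAT configured on or between any of the
boxes? This sort of problem with long-lived connections is often caused
by an IP-level device (firewall, NAT box, stateful switch module)
expiring its state entry for a connection that has been idle too long.
Once that state is gone, later packets on the connection are silently
dropped even though both ends still show it as established.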
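
If that turns out to be the cause, one common workaround is to send TCP
keepalives more often than the device's idle timeout. The 2-hour spacing
you saw in tcpdump matches the Linux default keepalive idle time of 7200
seconds, which is longer than many firewalls keep idle state. Below is a
minimal Python sketch of the socket-level mechanism only; it is not how
slapd is configured. The helper name enable_keepalive and the values 300,
30 and 5 are assumptions for illustration, and the per-socket tuning
options are platform-specific (they exist on Linux), so the code skips
any that are missing.

```python
import socket


def enable_keepalive(sock, idle=300, interval=30, probes=5):
    """Enable TCP keepalive with a shorter idle time than the OS default.

    idle, interval and probes are illustrative values, not recommendations.
    """
    # Ask the kernel to send keepalive probes on this connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket tuning; these constants are not available on every platform.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idle time before first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),      # unanswered probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
sock.close()
```

On Linux the same effect can be had system-wide by lowering the
net.ipv4.tcp_keepalive_time sysctl, but that only helps connections that
have keepalive switched on in the first place.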

