
Re: (ITS#5133) Synchronous replication on slave doesn't notice lost network connection

bgmilne@staff.telkomsa.net wrote:
> On Thursday 13 September 2007 23:05:29 ando@sys-net.it wrote:
>> audrius.valunas@teo.lt wrote:
>>> There is synchronous replication between master and slave. When network
>>> connectivity problems occur, the master closes the TCP connection but the
>>> slave doesn't notice those problems; it still has the TCP connection open,
>>> but in reality it is not receiving updates any more.
>>> I think that can be solved by adding some ack from the slave, because
>>> sending on such a socket would fail and force the slave to retry the
>>> connection.
>> Well, this should already be taken into consideration by SO_KEEPALIVE,
>> which is always set when available on all connections.  I concur that it
>> usually requires quite a long time before a connection is actually
>> checked (usually more than 2 hours), so some better policy could be put
>> in place.

> I think I filed a previous ITS on this, but the servers exhibiting this 

Are you referring to ITS#4691, or some other one? (Since 4691 is refreshOnly, 
I guess it must be something else.)

> behaviour in a remote site were lost (power supplies died) so I couldn't test 
> Howard's fix at the time. We have recently installed some QA servers, which 
> now also need to traverse a PIX firewall to get to the production master 
> (from which they replicate one database), and I have seen the behaviour again 
> (they go out of sync on most of the rare changes to this database until I 
> restart them or the check kicks in).

> I note that a keepalive probably needs to be sent at least once an hour for a 
> PIX not to drop the connection. I haven't looked up any relevant RFCs on this 
> though ...

A router/firewall should not silently kill an established connection. It 
should send a TCP RST to notify the endpoints that the connection is gone.
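
For illustration only, here is a minimal sketch of how a client could turn on SO_KEEPALIVE and shorten the keepalive timers on a single socket, so that a dead peer (or a firewall that dropped the session) is noticed well within the one-hour window mentioned above. It is not taken from slapd. The helper name enable_keepalive and the values 600/60/5 are hypothetical, and TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are platform-specific options (present on Linux, possibly missing elsewhere), so the sketch skips any that the platform does not provide.

```python
import socket

def enable_keepalive(sock, idle=600, interval=60, count=5):
    """Enable TCP keepalive on sock and shorten its timers where possible.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is declared dead
    All three defaults are illustrative assumptions, not recommendations.
    Returns a dict of the options that were actually applied.
    """
    # SO_KEEPALIVE itself is portable; by default the OS decides the timing
    # (on Linux the first probe is sent after 7200 seconds of idle time).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}

    # The per-socket timers are not available everywhere, so look each one
    # up by name and skip it if the platform does not define or accept it.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        except OSError:
            continue
        applied[name] = value
    return applied

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(enable_keepalive(s))
    s.close()
```

With these example values an unresponsive peer would be detected after roughly idle + interval * count seconds (about 15 minutes), and the probes themselves count as traffic that keeps an idle session alive through a stateful firewall.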

> I can now test a fix a lot more easily (since I can upgrade one of these 
> servers at-will, as opposed to the previous slaves which were in production).
   -- Howard Chu
   Chief Architect, Symas Corp.  http://www.symas.com
   Director, Highland Sun        http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP     http://www.openldap.org/project/