Re: (ITS#5133) Synchronous replication on slave doesn't notice lost network connection

ando@sys-net.it wrote:
> audrius.valunas@teo.lt wrote:
>> There is synchronous replication between master and slave. When network
>> connectivity problems occur master closes tcp connection but slave doesn't
>> notice those problems; it still has the tcp connection open, but in reality
>> it is not receiving updates any more.
>> I think that can be solved adding some ack from slave because sending on such a
>> socket would fail and force slave to retry connection.

> Well, this should already be taken into consideration by SO_KEEPALIVE,
> which is always set when available on all connections.  I concur that it
> usually requires quite a long time before a connection is actually
> checked (usually more than 2 hours), so some better policy could be put
> in place.

On most systems the TCP keepalive timing is a system-wide parameter, so it 
wouldn't be practical to try to manipulate that here. I suppose if we were to 
implement our own retry timer we'd need to use a benign op that triggers a 
reply. A base-scope search of the rootDSE requesting only attribute "1.1" 
(i.e. no attributes) would suffice. The question is, do we really want to be 
generating a lot of keepalive traffic like this? The default of 2 hours that 
most systems use is pretty sane, really.
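
For illustration only, here is a minimal sketch of what such an
application-level timer might look like. It is written in Python rather than
slapd's C, and it is not OpenLDAP code: IdleProbe, note_traffic, check and the
probe callback are all hypothetical names. The probe is assumed to send one
benign request and return a true value once the reply arrives; in a real
consumer that would be the rootDSE search (base "", scope base, filter
(objectClass=*), requested attribute "1.1").

```python
import time


class IdleProbe:
    """Hypothetical application-level keepalive for one connection.

    After `idle_seconds` without traffic, check() calls `probe`, a
    caller-supplied function that sends one benign request and returns
    True once a reply arrives (assumption: it raises OSError if the
    connection turns out to be dead).
    """

    def __init__(self, probe, idle_seconds, clock=time.monotonic):
        self._probe = probe
        self._idle = idle_seconds
        self._clock = clock
        self._last_seen = clock()

    def note_traffic(self):
        """Call whenever anything is received from the peer."""
        self._last_seen = self._clock()

    def check(self):
        """Return True if the peer looks alive, False if we should reconnect."""
        if self._clock() - self._last_seen < self._idle:
            # Recent traffic already proves the link is up; send nothing.
            return True
        try:
            alive = bool(self._probe())
        except OSError:
            # Writing to or reading from a dead socket fails here.
            alive = False
        if alive:
            self._last_seen = self._clock()
        return alive
```

The idle threshold is the knob behind the traffic question: probes are only
sent on connections that have been silent for that long, so a busy
replication link generates no extra requests at all.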
   -- Howard Chu
   Chief Architect, Symas Corp.  http://www.symas.com
   Director, Highland Sun        http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP     http://www.openldap.org/project/