Re: (ITS#5133) Synchronous replication on slave doesn't notice lost network connection
> email@example.com wrote:
>> There is synchronous replication between master and slave. When network
>> connectivity problems occur, the master closes the TCP connection, but the
>> slave doesn't notice those problems. It still has the TCP connection open,
>> but in reality it is not receiving updates any more.
>> I think that can be solved by adding some ack from the slave, because
>> sending on such a socket would fail and force the slave to retry the
>> connection.
> Well, this should already be taken into consideration by SO_KEEPALIVE,
> which is always set when available on all connections. I concur that it
> usually requires quite a long time before a connection is actually
> checked (usually more than 2 hours), so some better policy could be put
> in place.
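
For illustration, here is a minimal sketch of what "setting SO_KEEPALIVE" on
a socket amounts to, written in Python rather than the server's own code. The
helper name enable_keepalive and the timing values are hypothetical. The
per-socket options TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are only applied
where the platform's socket module exposes them (Linux does; many systems do
not), so this is not a portable fix.

```python
import socket


def enable_keepalive(sock, idle=None, interval=None, count=None):
    """Turn on SO_KEEPALIVE and, where available, per-socket timing.

    Returns a dict of the per-socket options that were actually applied.
    """
    # SO_KEEPALIVE itself is widely available; timing is the problem.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    applied = {}
    # These option names are platform-specific assumptions: skip any
    # that this build of the socket module does not define.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # seconds idle before first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before giving up
    ):
        opt = getattr(socket, name, None)
        if opt is not None and value is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
            applied[name] = value
    return applied


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(enable_keepalive(s, idle=300, interval=30, count=4))
    s.close()
```

On a system without the per-socket options the function still enables
keepalive, but the probe timing stays at the system-wide default.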
On most systems the TCP keepalive timing is a system-wide parameter, so it
wouldn't be practical to try to manipulate that here. I suppose if we were to
implement our own retry timer we'd need to use a benign op that triggers a
reply. Searching the rootDSE for attr 1.1 (which requests no attributes) would
suffice. The question is, do we really want to be generating a lot of keepalive
traffic like this? The default of 2 hours that most systems use is pretty sane,
really.
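
To make the retry-timer idea concrete, here is a rough sketch in Python of the
bookkeeping such a timer would need. It is not taken from any existing
implementation: the class name IdleProbe, its methods, and the "probe" /
"wait" / "reconnect" actions are all hypothetical. The caller is assumed to
send the benign request (for example the rootDSE search) when told to probe,
and to call on_receive() whenever anything arrives on the connection.

```python
import time


class IdleProbe:
    """Decide when an idle connection should be probed or abandoned."""

    def __init__(self, idle_secs, reply_timeout, clock=time.monotonic):
        self.idle_secs = idle_secs          # quiet time before probing
        self.reply_timeout = reply_timeout  # how long to wait for a reply
        self.clock = clock                  # injectable for testing
        self.last_rx = clock()
        self.probe_sent_at = None

    def on_receive(self):
        """Any inbound traffic proves the peer is alive; reset the timer."""
        self.last_rx = self.clock()
        self.probe_sent_at = None

    def next_action(self):
        """Return 'probe', 'reconnect', or 'wait' for the current moment."""
        now = self.clock()
        if self.probe_sent_at is not None:
            # A probe is outstanding: give up once the reply is overdue.
            if now - self.probe_sent_at >= self.reply_timeout:
                return "reconnect"
            return "wait"
        if now - self.last_rx >= self.idle_secs:
            # Caller should now send the benign request.
            self.probe_sent_at = now
            return "probe"
        return "wait"
```

The traffic cost is one small request and reply per idle_secs on each
otherwise quiet connection, which is the trade-off raised above. A long
idle_secs keeps that cost low at the price of slower detection.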
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/