
Re: syncrepl consumer is slow



--On Monday, May 11, 2015 8:15 PM +0200 Emmanuel Lécharny <elecharny@gmail.com> wrote:

Quanah said that on some heavily loaded servers, the only way for the consumer
to catch up is to slapcat/slapadd/restart the consumer. I wonder whether that
could be a way to deal with servers that are too far behind the
running server, but as a mechanism built into the refresh phase
(i.e., the restarted server would detect that it has to grab the set of
entries and load them, as if a human being were doing a
slapcat/slapadd/restart).

A specific example we had in the past was quarterly updates for students @ Stanford, which could push out tens of thousands of updates to the single-node master. Generally, of the 6 slaves, 2-3 would remain current, and the other 3 would fall hours or days behind. Since serving out significantly out-of-date data was not an option, we'd generally have to resort to reloading the ones that got stuck behind to get them sync'd up in a timely fashion.


Another point: as soon as the server is restarted, it can receive
incoming requests, to which it will send back outdated responses until the
refresh is completed (and I'm not talking about updates that could also
be applied to an outdated base, with the consequences if there are some
missing parents). In many cases, that would be a real problem, typically
if the LDAP servers are considered part of a shared pool of servers,
with a load-balancing mechanism to spread the load. Wouldn't it be more
realistic to simply consider the server as not available until the
refresh phase is completed?

There's already an option for this, new for OpenLDAP 2.5 IIRC, that makes it return LDAP_BUSY or some such until it is "caught up". However, if you enable that option, it returns this response whenever the server is not fully caught up, which is problematic, because a server may routinely flip between "caught up" and not "caught up". I.e., it is not unusual for a system to be a second or so behind other masters. Here's real-world data from a check I just ran on a client's systems:

[zimbra@zm-mmr01 ~]$ ./libexec/zmreplchk
Master: ldap://zm-mmr01.client.net:389 ServerID: 1 Code: 6 Status: 0y 0M 0w 0d 0h 0m 1s behind CSNs:
20150504222317.897445Z#000000#001#000000
20150511174531.424005Z#000000#002#000000
20150501181032.360324Z#000000#00a#000000
20150511174535.964334Z#000000#00b#000000
Master: ldap://zm-mmr00.client.net:389 ServerID: 2 Code: 0 Status: In Sync CSNs:
20150504222317.897445Z#000000#001#000000
20150511174531.424005Z#000000#002#000000
20150501181032.360324Z#000000#00a#000000
20150511174535.964334Z#000000#00b#000000
Master: ldap://nvl-mmr10.client.net:389 ServerID: 10 Code: 6 Status: 0y 0M 0w 0d 0h 0m 1s behind CSNs:
20150504222317.897445Z#000000#001#000000
20150511174531.424005Z#000000#002#000000
20150501181032.360324Z#000000#00a#000000
20150511174536.315403Z#000000#00b#000000
Master: ldap://nvl-mmr11.client.net:389 ServerID: 11 Code: 6 Status: 0y 0M 0w 0d 0h 0m 1s behind CSNs:
20150504222317.897445Z#000000#001#000000
20150511174531.424005Z#000000#002#000000
20150501181032.360324Z#000000#00a#000000
20150511174536.315403Z#000000#00b#000000
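
To illustrate what "a second or so behind" looks like in terms of the CSNs above, here is a rough Python sketch that compares two sets of contextCSN values per server ID and reports the largest gap. It is not how zmreplchk works internally (I'm not reproducing that here); the function names and the 5-second tolerance are made up for the example, and the tolerance is only one possible way a load balancer check could avoid flipping on sub-second lag.

```python
from datetime import datetime, timezone


def parse_csn(csn):
    """Split a CSN of the form time#count#sid#mod into (timestamp, server ID)."""
    stamp, _count, sid, _mod = csn.split("#")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    return when, int(sid, 16)  # the SID field is hexadecimal


def lag_seconds(local_csns, reference_csns):
    """Largest per-SID gap, in seconds, by which local trails reference.

    SIDs missing on the local side are skipped; that is a simplification
    made for this sketch.
    """
    local = dict((sid, when) for when, sid in map(parse_csn, local_csns))
    worst = 0.0
    for when, sid in map(parse_csn, reference_csns):
        if sid in local:
            worst = max(worst, (when - local[sid]).total_seconds())
    return worst


def is_caught_up(local_csns, reference_csns, tolerance=5.0):
    """Hypothetical health check: tolerate small, routine lag."""
    return lag_seconds(local_csns, reference_csns) <= tolerance


# CSNs copied from the zm-mmr00 and nvl-mmr10 entries above.
mmr00 = [
    "20150504222317.897445Z#000000#001#000000",
    "20150511174531.424005Z#000000#002#000000",
    "20150501181032.360324Z#000000#00a#000000",
    "20150511174535.964334Z#000000#00b#000000",
]
mmr10 = [
    "20150504222317.897445Z#000000#001#000000",
    "20150511174531.424005Z#000000#002#000000",
    "20150501181032.360324Z#000000#00a#000000",
    "20150511174536.315403Z#000000#00b#000000",
]

print(lag_seconds(mmr00, mmr10))
print(is_caught_up(mmr00, mmr10))
```

With the values above, the only difference is in the CSN for SID 00b, and it is well under a second. A strict "in sync or not" test would flag that, while a check with some tolerance would not.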


--Quanah


--

Quanah Gibson-Mount
Platform Architect
Zimbra, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration