Re: syncrepl consumer is slow

To: Emmanuel Lécharny <elecharny@gmail.com>, "OpenLDAP-devel@openldap.org >> OpenLDAP Devel" <openldap-devel@openldap.org>
Subject: Re: syncrepl consumer is slow
From: Howard Chu <hyc@symas.com>
Date: Mon, 11 May 2015 21:17:59 +0100
In-reply-to: <5550E3C5.2070501@gmail.com>
References: <54C9A511.8000800@symas.com> <5550E3C5.2070501@gmail.com>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0 SeaMonkey/2.37a1

Emmanuel Lécharny wrote:

Restarting this thread...

We had some interesting discussions today that I wanted to share.

Hypothesis : 1 server has been down for a long time, and the contextCSN
is older than the one of the other servers, forcing a refresh mode with
more than the content of the AccessLog.

Quanah said that on some heavily loaded servers, the only way for the
consumer to catch up is to slapcat/slapadd/restart the consumer. I wonder
if this could be a way to deal with servers that are too far behind the
running server, but as a mechanism that is included in the refresh phase
(i.e., the restarted server would detect that it has to grab the set of
entries and load them, as if a human being were doing a
slapcat/slapadd/restart).

More specifically, is there a way to know how many entries we will have
to update, and is there a way to know when it will be faster to be
brutal (the Quanah way) compared to letting the refresh mechanism do its
job?

Not a worthwhile direction to pursue. Doing the equivalent of a full
slapcat/slapadd across the network will use even more bandwidth than the
current syncrepl. None of this addresses the underlying causes of why the
consumer is slow, so the original problem will remain.


There are two main problems:

1) the AVL tree used for presentlist is still extremely inefficient in both
CPU and memory use.

2) the consumer does twice as much work for a single modification as the
provider. I.e., the consumer does a write op to the backend for the
modification, and then a second write op to update its contextCSN. The
provider only does the original modification, and caches the contextCSN
update.

If we fix both of these issues, consumer speed should be much faster.
Nothing else is worth investigating until these two areas are reworked.
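
To make the cost in (2) concrete, here is a toy model that only counts
backend writes. It is not slapd code: CountingBackend, apply_eager,
apply_cached and the flush_every parameter are all made up for
illustration, and the flush interval in particular is an assumption, since
nothing above says when the cached contextCSN actually gets written.

```python
class CountingBackend:
    """Stand-in for a database backend; it only counts write operations."""

    def __init__(self):
        self.writes = 0
        self.entries = {}
        self.context_csn = None

    def write_entry(self, dn, attrs):
        self.entries[dn] = attrs
        self.writes += 1

    def write_context_csn(self, csn):
        self.context_csn = csn
        self.writes += 1


def apply_eager(backend, mods):
    """Consumer-style: each modification is followed by a contextCSN write."""
    for dn, attrs, csn in mods:
        backend.write_entry(dn, attrs)
        backend.write_context_csn(csn)


def apply_cached(backend, mods, flush_every=100):
    """Provider-style: keep the newest contextCSN in memory, write it rarely.

    flush_every is a hypothetical knob chosen for this sketch.
    """
    pending = None
    for i, (dn, attrs, csn) in enumerate(mods, start=1):
        backend.write_entry(dn, attrs)
        pending = csn
        if i % flush_every == 0:
            backend.write_context_csn(pending)
            pending = None
    if pending is not None:
        # Final flush so the stored contextCSN matches the last modification.
        backend.write_context_csn(pending)


mods = [
    (f"uid=user{i},dc=example,dc=com", {"sn": str(i)}, f"csn-{i:06d}")
    for i in range(1000)
]

eager = CountingBackend()
apply_eager(eager, mods)

cached = CountingBackend()
apply_cached(cached, mods, flush_every=100)

print(eager.writes, cached.writes)
```

Both runs end with the same entries and the same stored contextCSN; only
the number of write ops differs. The trade-off in a real server is that a
cached contextCSN can be lost on a crash before it is flushed, which this
model does not attempt to cover.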

For (1) I've been considering a stripped down memory-only version of LMDB.
There are plenty of existing memory-only Btree implementations out there
already though, if anyone has a favorite it would probably save us some
time to use an existing library. The Linux kernel has one (lib/btree.c) but
it's under GPL so we can't use it directly.
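
As a rough illustration of why the data structure matters for (1), the
sketch below keeps a present list as one contiguous buffer of fixed-width
keys instead of one heap node per key. This is neither a Btree nor LMDB,
just a simpler pointer-free layout swapped in to show the idea. It assumes
the present list holds 16-byte entryUUIDs; PackedPresentList is a
hypothetical name, not an OpenLDAP API.

```python
import uuid


class PackedPresentList:
    """Set of 16-byte UUIDs stored as fixed-width records in one bytearray."""

    WIDTH = 16

    def __init__(self):
        self._buf = bytearray()
        self._sorted = True

    def add(self, u):
        # Appending is cheap; sorting is deferred until the first lookup.
        self._buf += u.bytes
        self._sorted = False

    def _sort(self):
        if self._sorted:
            return
        w = self.WIDTH
        records = {bytes(self._buf[i:i + w]) for i in range(0, len(self._buf), w)}
        self._buf = bytearray(b"".join(sorted(records)))  # sorted, deduplicated
        self._sorted = True

    def __len__(self):
        self._sort()
        return len(self._buf) // self.WIDTH

    def __contains__(self, u):
        # Binary search over the packed records.
        self._sort()
        w = self.WIDTH
        key = u.bytes
        lo, hi = 0, len(self._buf) // w
        while lo < hi:
            mid = (lo + hi) // 2
            if self._buf[mid * w:(mid + 1) * w] < key:
                lo = mid + 1
            else:
                hi = mid
        return lo < len(self._buf) // w and self._buf[lo * w:(lo + 1) * w] == key


# Usage: UUIDs reported as present by the provider go into the list;
# local entries whose UUID is missing from it are candidates for deletion.
present = PackedPresentList()
for n in (5, 2, 9, 2):
    present.add(uuid.UUID(int=n))

local = [uuid.UUID(int=n) for n in (2, 3, 9)]
stale = [u for u in local if u not in present]
print(len(present), [u.int for u in stale])
```

Each key here costs exactly 16 bytes in the buffer, with no per-node
pointers or balance fields. A Btree gets a similar density by packing many
keys per node while still allowing cheap inserts after lookups have
started, which this sketch does not.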

Another point: as soon as the server is restarted, it can receive
incoming requests, to which it will send back outdated responses until the
refresh is completed (and I'm not talking about updates that could also
be applied on an outdated base, with the consequences if there are some
missing parents). In many cases, that would be a real problem, typically
if the LDAP servers are considered part of a shared pool of servers,
with a load-balancing mechanism to spread the load. Wouldn't it be more
realistic to simply consider the server as not available until the
refresh phase is completed?

This was ITS#7616. We tried it and it caused a lot of problems. It has been
reverted.


--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
