Re: (ITS#9015) Replication goes haywire querying promoted master



quanah@openldap.org wrote:
> Full_Name: Quanah Gibson-Mount
> Version: 2.4.47
> OS: N/A
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (47.208.144.40)
> 
> 
> In testing a particular use case/setup scenario, I found that it's possible to
> cause a replica to slam a provider with unending requests.  In this specific
> case, I was setting up delta-syncrepl MMR, but I believe the issue applies to
> standard syncrepl, and is not MMR specific.  The scenario looks like this:
> 
> Initially we have a standalone server, with no overlays in place.  The
> configuration is done via cn=config, which allows us to update the
> configuration without a server restart.
> 
> The configuration is modified to load the syncprov and accesslog overlays,
> create a new accesslog database, and to send all change data to the accesslog
> db.
> 
> After that is done, a secondary server is brought online with the same
> configuration other than the serverID being different and the syncrepl statement
> adjusted.
> 
> When the secondary server is started, it pummels the initial provider with
> queries like:
> 
> Apr 23 06:39:06 anvil4 slapd[28967]: conn=1003 op=361868131 SRCH
> base="dc=example,dc=com" scope=2 deref=0 filter="(objectClass=*)"
> Apr 23 06:39:06 anvil4 slapd[28967]: conn=1003 op=361868131 SRCH attr=* +
> Apr 23 06:39:06 anvil4 slapd[28967]: conn=1003 op=361868131 SEARCH RESULT
> tag=101 err=0 nentries=0 text=
> 
> (Averaging around 2000 queries/second on my server per syncrepl client).
> 
> I believe the problem is that the root entry for the database contains no
> contextCSN.  This is likely due to the fact that:
> 
> a) There was never a syncprov overlay present until I loaded this one in
> b) The serverID was set prior to the syncprov overlay being loaded (So it went
> from "0" to "1", with no changes ever recorded for "1").
> 
> Now there is a trivial way to handle this, by making a change on the provider
> prior to starting up the other servers.
> 
> However, I think the overall behavior is undesirable.  If there is no contextCSN
> present, it should not lead to replication clients executing a potential DoS on
> the provider.  It also generated ~60GB of logs at loglevel stats in 1 day.
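
To put a number on the rate quoted above, the provider's log can be tallied
per second and per connection. The following is only a rough sketch: it
assumes the syslog layout shown in the report with each entry on a single
line (the mail wrapped them), and the function name is made up for
illustration.

```python
import re
from collections import Counter

# Matches only the line that starts a search ("SRCH base=..."), so the
# companion "SRCH attr=" and "SEARCH RESULT" lines are not counted twice.
# Assumption: classic syslog prefix "Mon DD HH:MM:SS host slapd[pid]:".
SRCH_RE = re.compile(
    r'^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+slapd\[\d+\]:\s+'
    r'conn=(?P<conn>\d+)\s+op=\d+\s+SRCH\s+base='
)

def searches_per_second(lines):
    """Count search operations keyed by (timestamp, connection id)."""
    counts = Counter()
    for line in lines:
        m = SRCH_RE.match(line)
        if m:
            counts[(m.group("ts"), int(m.group("conn")))] += 1
    return counts
```

Feeding it an open log file (for example `searches_per_second(open(path))`,
where `path` is wherever syslog writes slapd output on the provider) gives one
counter entry per second and connection, which makes a runaway consumer easy
to spot.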

The consumer should not be reconnecting more frequently than its retry config.
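
For reference, the retry setting is a list of <interval> <count> pairs, with
"+" as the count meaning retry indefinitely. A small sketch of the schedule
that implies is below. The value "60 10 300 +" is only an illustrative
assumption, since the report does not show the consumer's actual syncrepl
statement, and the function name is hypothetical.

```python
import itertools

def retry_schedule(spec):
    """Yield the wait (in seconds) before each retry for a spec like "60 10 300 +"."""
    tokens = spec.split()
    if not tokens or len(tokens) % 2:
        raise ValueError("retry spec must be <interval> <count> pairs")
    for interval, count in zip(tokens[0::2], tokens[1::2]):
        seconds = int(interval)
        if count == "+":
            # "+" means retry at this interval indefinitely.
            yield from itertools.repeat(seconds)
            return
        for _ in range(int(count)):
            yield seconds

# Hypothetical setting: every 60s for 10 tries, then every 300s forever.
first_waits = list(itertools.islice(retry_schedule("60 10 300 +"), 12))
```

With a setting like that, the consumer would wait at least 60 seconds between
attempts, which is several orders of magnitude slower than the roughly 2000
searches per second in the report.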

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/