
Re: (ITS#5985) replication lockout with syncrepl



quanah@OpenLDAP.org wrote:
> Full_Name: Quanah Gibson-Mount
> Version: 2.3/2.4/HEAD
> OS: Linux 2.6
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (75.111.29.239)
>
>
> I noticed back in testing with OpenLDAP 2.3 that if a master gets a high rate of
> changes, and you have 3+ replicas, usually 2 replicas will end up getting all of
> the changes while the 3rd+ replicas have to wait until those 2 finish before
> getting changes.  If the high rate of changes goes on for a long enough period
> of time, this can cause the other replicas to get so far out of sync that it is
> more efficient to reload them than to wait on them to re-sync.  I discussed this
> with Howard, and in reviewing the code, he sees there's an underlying design
> issue with updates that is causing this.  His comments:
>
> Once a thread for a psearch wakes up, it sends all the changes that were queued,
> so it may hog an entire thread for a long time before the next psearch comes off
> the queue.
>
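To make the starvation pattern in the report concrete, here is a small Python simulation. It is a toy model built on stated assumptions, not slapd's syncprov code. Every psearch gets its own copy of each change. A fixed pool of worker threads (here `workers=2`) services psearches in arrival order. A worker is released only once its psearch's queue is empty. The function name `per_psearch_delivery` and the one-change-per-tick timing are hypothetical.

```python
from collections import deque

def per_psearch_delivery(num_replicas, changes, workers=2):
    """Toy model of per-psearch queues serviced by a small worker pool.

    Each psearch has a private queue holding every change. A worker that
    picks up a psearch sends one change per tick and is released only when
    that psearch's queue is empty. Returns (tick, replica, change) tuples.
    """
    if not changes:
        return []
    queues = [deque(changes) for _ in range(num_replicas)]
    waiting = deque(range(num_replicas))  # psearches not yet on a worker
    active = []                           # psearches currently holding a worker
    log = []
    tick = 0
    while waiting or active:
        # Idle workers pick up the next waiting psearch, in arrival order.
        while len(active) < workers and waiting:
            active.append(waiting.popleft())
        # Every busy worker sends one queued change for its psearch.
        for replica in list(active):
            log.append((tick, replica, queues[replica].popleft()))
            if not queues[replica]:
                active.remove(replica)  # worker freed only once the queue drains
        tick += 1
    return log

log = per_psearch_delivery(num_replicas=3, changes=["c1", "c2", "c3"], workers=2)
for tick, replica, change in log:
    print(tick, replica, change)
```

With three replicas and two workers, the third replica (index 2) receives its first change only after replicas 0 and 1 have received all of theirs, mirroring the "2 replicas get everything, the 3rd waits" pattern described in the report.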
Fixing this issue would require a complete redesign of the psearch queue 
handling. Instead of queuing up a separate response per psearch, there should 
be a single queue of responses, and the qplayer should iterate through it, 
matching each response against every active psearch. That would guarantee that 
all replicas receive a given change before any of them receives the next one. 
This would also help with the ordering issues discussed recently on the 
-technical and -devel lists.
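The proposed redesign can be sketched the same way. Again, this is a hypothetical model, not an implementation. There is one shared queue of responses, and a loop standing in for the qplayer takes each response off the queue and offers it to every active psearch before touching the next one. The psearch "match" is reduced to a naive DN suffix check, a simplification of real LDAP scope and filter evaluation. `shared_queue_delivery`, `under`, and the example DNs are made up for illustration.

```python
from collections import deque

def shared_queue_delivery(psearches, changes):
    """Deliver changes from a single shared queue to every matching psearch.

    psearches maps a replica name to a predicate standing in for that
    psearch's scope/filter test (hypothetical; real slapd evaluates LDAP
    scope and filters). Returns (replica, dn) pairs in delivery order.
    """
    queue = deque(changes)  # one queue of responses shared by all psearches
    log = []
    while queue:
        dn = queue.popleft()
        # Offer this response to every active psearch before dequeuing the
        # next one, so no replica can get ahead of the others.
        for replica, matches in psearches.items():
            if matches(dn):
                log.append((replica, dn))
    return log

def under(base):
    # Naive subtree check by DN suffix; ignores case and DN normalization.
    return lambda dn: dn == base or dn.endswith("," + base)

psearches = {
    "replica1": under("dc=example,dc=com"),
    "replica2": under("dc=example,dc=com"),
    "replica3": under("ou=people,dc=example,dc=com"),
}
changes = [
    "cn=a,ou=people,dc=example,dc=com",
    "cn=b,ou=groups,dc=example,dc=com",
    "cn=c,ou=people,dc=example,dc=com",
]
for replica, dn in shared_queue_delivery(psearches, changes):
    print(replica, dn)
```

Because the outer loop advances only after the inner loop has offered a change to every psearch, no replica can be handed change N+1 while another is still waiting for change N. That is the ordering guarantee the redesign aims for. A psearch whose scope excludes a change (replica3 and the `ou=groups` entry here) simply skips it without delaying anyone else.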

I suspect this is too big a change to target the next (.16) release, since 
we're focusing on re-stabilizing the code right now.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/