
Re: [Openldap 2.4.16] Is it possible to force synchronization: files log.xxxx not treated after a crash

--On Tuesday, August 25, 2009 1:38 PM -0400 Francis Swasey <Frank.Swasey@uvm.edu> wrote:

On 8/25/09 12:36 PM, Quanah Gibson-Mount wrote:
--On Tuesday, August 25, 2009 12:11 PM -0400 Francis Swasey
<Frank.Swasey@uvm.edu> wrote:

On 8/25/09 11:45 AM, Aaron Richton wrote:
On Tue, 25 Aug 2009, Lepoutre Lionel wrote:

My problem is that some data is not synchronised on one of my servers,
and I have some "log.xxxx" files in my var/openldap-data/ directory.

When I had an issue with my replicas getting out of sync, I developed a
process to slapcat each of the replicas, generate what was different
from the master, and cause the master to make the changes again (i.e.,
reverse the value on the master and then revert to what the master knew
was correct), which caused the information to get pushed to the replicas
again. In my case, the problem turned out to be that one of my replicas
had too little memory and was triggering a bug in v2.3 that caused the
changes for delta-syncrepl to not get logged in the accesslog DB on the
provider.
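
[Editor's sketch] The comparison step described above (dump each replica, then work out what differs from the master) could look roughly like the following. This is a minimal illustration, not the script actually used in this thread: the function names `parse_ldif` and `diff_dumps` and the `ignore` parameter are hypothetical, and the parser assumes plain slapcat-style LDIF with blank-line-separated entries and single-space continuation lines. Base64-encoded DNs ("dn::") are compared as raw strings.

```python
def parse_ldif(text):
    """Map each DN (lowercased) to the set of its 'attr: value' lines."""
    entries = {}
    for block in text.strip().split("\n\n"):
        # Unfold LDIF continuation lines (newline followed by one space).
        lines = block.replace("\n ", "").splitlines()
        # Drop empty lines and comments.
        lines = [ln for ln in lines if ln and not ln.startswith("#")]
        if not lines or not lines[0].lower().startswith("dn:"):
            continue
        dn = lines[0].split(":", 1)[1].strip().lower()
        entries[dn] = set(lines[1:])
    return entries


def _strip_ignored(attr_lines, ignore):
    """Remove lines whose attribute name is in `ignore` (case-insensitive)."""
    skip = {name.lower() for name in ignore}
    return {ln for ln in attr_lines if ln.split(":", 1)[0].lower() not in skip}


def diff_dumps(master_text, replica_text, ignore=()):
    """Compare two LDIF dumps and report which DNs differ.

    `ignore` is a hypothetical knob for attributes that should not count
    as a difference in a given deployment.
    """
    master = parse_ldif(master_text)
    replica = parse_ldif(replica_text)
    missing = sorted(set(master) - set(replica))   # on master only
    extra = sorted(set(replica) - set(master))     # on replica only
    changed = sorted(
        dn for dn in set(master) & set(replica)
        if _strip_ignored(master[dn], ignore) != _strip_ignored(replica[dn], ignore)
    )
    return {"missing": missing, "extra": extra, "changed": changed}
```

To use it, read the master's dump and one replica's dump into strings and pass both to `diff_dumps`. The DNs listed under "missing" and "changed" are the candidates for being re-applied on the master so that they replicate again.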

Was this ever fixed in 2.3?  Do you have an ITS#?  It's interesting that
a replica running out of memory would cause the provider not to log data
into the accesslog.  I'm curious because I'm seeing an issue right now
where a ton of deletes are executed, and all the replicas of the master
periodically go into refresh mode on the same entry during the deletes,
which makes me think the accesslog may be failing to write out some of
the changes.

I never filed an ITS for it.  I discussed it on this list, and you and
Howard gave me pointers.  My theory is that the root cause was the design
problem that allowed a consumer to cause the provider to hold a thread on
the accesslog (ITS#5985: replication lockout with syncrepl).  Its
interaction with the replica that needed more memory caused changes
(during high-volume change periods) to get backed up so far on the
provider that they fell off the end of the queue and were never written
to the accesslog.

To find the thread about my issue -- search for the subject
"delta-syncrepl missing changes" starting on January 30 and ending around
March 20 of this year in the openldap-software list.

Since upgrading the memory on that replica (from 1GB to 5GB), I have not
had the problem again.

Yeah, I remembered bits and pieces of the thread, but it's been a while. I'm not sure this is the same issue, because the replicas all have a ton of memory, but it could be similar, just because there are 6 replicas causing lockout (same people that got me to file ITS#5985 in the first place). Still, I think there should never be a case where the provider fails to write updates to the accesslog db, regardless of the load replicas are putting on it. Hopefully the ITS#5985 fix takes care of that.




Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
Zimbra ::  the leader in open source messaging and collaboration