
Re: syncrepl replication taking too long (not in sync)



On Tuesday, 18 August 2009 21:30:31 Rodrigo Costa wrote:
> OpenLDAP software community,
>
> I'm having difficulty keeping the databases synchronized with
> syncrepl. I'm running the latest OpenLDAP version, 2.4.17, which I
> recompiled for debugging with gdb after these issues appeared.
>
> I have a DB (actually divided into 2 DBs), each with around 4 million
> entries. Because of memory limitations I have dncachesize set to
> around 3000000, i.e. smaller than the number of entries in each
> DB.
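
A quick bit of arithmetic shows what that setting implies. With about 4 million entries per DB and dncachesize around 3000000, roughly a quarter of each DB's DNs cannot be held in the DN cache at the same time. The Python sketch below does only this arithmetic. The helper name `dn_cache_coverage` is hypothetical, and the sketch says nothing about how slapd actually evicts DN cache entries.

```python
def dn_cache_coverage(entries_per_db: int, dncachesize: int) -> tuple[float, int]:
    """Return (fraction of DNs that fit in the cache, number of DNs that do not).

    Hypothetical helper: plain arithmetic, not a model of slapd's cache.
    """
    cached = min(entries_per_db, dncachesize)
    return cached / entries_per_db, entries_per_db - cached


# Figures from the message: ~4 million entries per DB, dncachesize ~3000000.
fraction, uncached = dn_cache_coverage(4_000_000, 3_000_000)
print(f"{fraction:.0%} of DNs fit; {uncached:,} do not")
```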
>
> I loaded both servers with the same data and all indexes. When both
> start, there is no need for syncrepl (the slapd thread) to run any search,
> so both mirrors are in sync and consuming from each other. If a new
> entry is created on one, the other picks it up immediately, since both
> are listening when it happens.
>
> If I stop one mirror and create even a small number of entries on the
> other, say 10, then when I start the stopped provider again, syncrepl
> falls back to conventional syncrepl replication, which searches the DB
> to synchronize.
>
> This never finishes, so the mirrors never get back in sync. What I see is:
>
> 1) Stop the second mirror, e.g. to run slapcat (I call them "first" and
> "second" for reference);
> 2) Add a few entries on the first mirror (kept online);
> 3) Start the second mirror again after the first mirror has had some new
> entries added through normal operation;
> 4) Syncrepl on the second mirror enters conventional syncrepl
> replication, since it detects that something differs between the mirrors;
> 5) Until the dncache is full, slapd CPU usage on the first mirror stays
> below 100% (around 50%) and the search progresses well, as the monitor
> shows;
> 6) Once the dncache is full (oscillating above 3 million), CPU usage on
> the first mirror goes to 100%, oscillating between 98% and 102%;
> 7) The search never ends, so the systems never get in sync. CPU usage
> stays permanently high, almost always at 100%.
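
Step 4's "something is different between mirrors" can be made concrete. In an OpenLDAP 2.4 mirror-mode pair, each database keeps its replication state in the contextCSN attribute of the suffix entry, with one value per server ID. The sketch below is a hypothetical Python helper that compares two sets of such values and reports which server IDs the restarted consumer is behind on. It assumes the 2.4 CSN layout `timestamp#count#sid#mod`, and the sample values are made up. It does not talk to slapd, and it does not model how the provider carries out the refresh search.

```python
def parse_csn(csn: str) -> tuple[int, str, int]:
    """Split a CSN of the assumed form 'YYYYmmddHHMMSS.ffffffZ#count#sid#mod'."""
    timestamp, count, sid, _mod = csn.split("#")
    return int(sid, 16), timestamp, int(count, 16)


def latest_by_sid(context_csns: list[str]) -> dict[int, tuple[str, int]]:
    """Map each server ID to its newest (timestamp, count) pair.

    The timestamp is fixed-width, so plain string comparison orders it correctly.
    """
    latest: dict[int, tuple[str, int]] = {}
    for csn in context_csns:
        sid, timestamp, count = parse_csn(csn)
        if sid not in latest or (timestamp, count) > latest[sid]:
            latest[sid] = (timestamp, count)
    return latest


def sids_behind(provider: list[str], consumer: list[str]) -> list[int]:
    """Server IDs whose newest provider change the consumer has not yet seen."""
    prov = latest_by_sid(provider)
    cons = latest_by_sid(consumer)
    return sorted(sid for sid, state in prov.items()
                  if sid not in cons or cons[sid] < state)


# Made-up values: SID 001 (first mirror) took new entries while SID 002 was down.
provider = ["20090818213031.000000Z#000000#001#000000",
            "20090817120000.000000Z#000000#002#000000"]
consumer = ["20090818100000.000000Z#000000#001#000000",
            "20090817120000.000000Z#000000#002#000000"]
print(sids_behind(provider, consumer))  # [1]
```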
>
> I left this process running for days and saw only one or two entries
> get synchronized. Judging by the CPU usage, something is stalling the
> search, with some loop keeping the thread busy on one full CPU.
>
> I collected some gdb output, which I've attached. I'm not sure how to
> interpret the overlay_walk in it.
>
> The idea is to stop one mirror for backups, relieving the primary server
> of this task. For that to work, replication has to catch up afterwards.
>
> Your comments are very welcome.

You have provided absolutely no configuration information. There may well be 
other explanations for this behaviour than the dncachesize. I can think of at 
least two.

You also haven't provided information on the systems you are using. For
example, you may be running on systems with too little memory (e.g., <1GB),
which might be totally inadequate for the amount of data you have.
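
The <1GB example can be put in numbers. Two DBs of about 4 million entries each, sharing 1 GiB, leave roughly 134 bytes of RAM per entry. That is before the OS, slapd itself and the database environment take their share, and it is very little once entry data, DNs and index data all compete for it. The Python sketch below does only this division. The helper name is hypothetical, and the sketch ignores real entry sizes.

```python
GIB = 1024 ** 3


def bytes_per_entry(ram_bytes: int, total_entries: int) -> float:
    """Average RAM per entry if all of it went to caching (hypothetical helper)."""
    return ram_bytes / total_entries


# Figures from the thread: two DBs of ~4 million entries each, 1 GB of RAM.
print(f"{bytes_per_entry(1 * GIB, 2 * 4_000_000):.0f} bytes per entry")
```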

Regards,
Buchan