
Re: syncrepl replication taking too long(not sync)

To: bgmilne@staff.telkomsa.net
Subject: Re: syncrepl replication taking too long(not sync)
From: Rodrigo Costa <rlvcosta@yahoo.com>
Date: Wed, 19 Aug 2009 05:14:44 -0700 (PDT)
Cc: openldap-software@openldap.org

Buchan,

I am running 32-bit systems with CentOS 5.3. Each machine has 12GB of RAM, but a single process on a 32-bit system can only use about 3GB, so that is the most I can allocate to slapd.

The tuning is based on these memory constraints, and I think it should be more than enough since the traffic I have is low; only the DB is a little large (4 million entries).

At the end of the day, it looks like the dncachesize cannot be smaller than the number of records in the DB. In other words, I cannot run a DB whose DN cache does not fit entirely in memory. I was wondering whether this is really the case, because if it is, I will not be able to use search or replication: with these restrictions the DB can only run on a single system.
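
To put numbers on this, here is a minimal Python sketch using only the figures quoted in this thread (about 4 million entries per DB, 2 DBs, a dncachesize of about 3000000, and a 3GB per-process limit). The function names are made up for illustration, and the bytes-per-entry figure is only an upper bound that assumes the whole 3GB went to the DN cache and nothing else, which is not realistic; it is not a measurement of what slapd actually uses per DN.

```python
# Figures quoted in this thread; the helper names are illustrative assumptions.
ENTRIES_PER_DB = 4_000_000           # "around 4 million" entries in each DB
DATABASES = 2                        # the data is split across 2 DBs
DNCACHESIZE = 3_000_000              # configured dncachesize (approximate)
PROCESS_LIMIT_BYTES = 3 * 1024**3    # about 3GB usable by one 32-bit process


def dn_cache_coverage(dncachesize, entries):
    """Fraction of a DB's entries whose DNs can sit in the DN cache at once."""
    return min(dncachesize, entries) / entries


def dn_cache_shortfall(dncachesize, entries):
    """Number of entries in one DB that cannot be cached at the same time."""
    return max(entries - dncachesize, 0)


def bytes_per_entry_ceiling(limit_bytes, entries_per_db, databases):
    """Upper bound on memory per entry if every entry of every DB had to fit
    under the process limit (ignores all other memory slapd needs)."""
    return limit_bytes / (entries_per_db * databases)


coverage = dn_cache_coverage(DNCACHESIZE, ENTRIES_PER_DB)
shortfall = dn_cache_shortfall(DNCACHESIZE, ENTRIES_PER_DB)
ceiling = bytes_per_entry_ceiling(PROCESS_LIMIT_BYTES, ENTRIES_PER_DB, DATABASES)

print(f"DN cache covers {coverage:.0%} of one DB")
print(f"{shortfall} entries per DB do not fit in the DN cache")
print(f"at most {ceiling:.0f} bytes per entry under the 3GB limit")
```

With these figures a full scan of either DB, such as the one syncrepl triggers, has to touch about a million more entries than the DN cache can hold.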

Thanks,

Rodrigo.

Buchan Milne wrote:

On Tuesday, 18 August 2009 21:30:31 Rodrigo Costa wrote:

openldap software community,

I'm having difficulty keeping databases synchronized with
syncrepl. I'm running the latest OpenLDAP version, 2.4.17, which I
recompiled for use with gdb after running into these issues.

I have a DB (really divided into 2 DBs) where each one has around 4
million entries. Because of memory limitations I have dncachesize
configured at around 3000000, which is smaller than the maximum number
of entries in the DBs.

I loaded both servers with all indexes and the same data. When both
are started, the syncrepl thread in slapd does not need to perform any
search, so both mirrors are in sync and consuming from each other. If a
new entry is created on one, the other consumes it right away, since
both are listening when it happens.

If I stop one mirror and create even a small number of entries on the
other, say 10, then when I start the stopped provider again syncrepl
falls back to conventional syncrepl replication, which searches the DB
in order to synchronize.

This search never ends, so the mirrors stay out of sync. What I can see is:

1) Stop the second mirror, e.g. for slapcat (calling them "first" and
"second" for reference);
2) Add a few entries to the first mirror (kept online);
3) Start the second mirror again after the first mirror has had some
new entries added by normal operation;
4) Syncrepl on the second mirror enters conventional syncrepl
replication, since it detects that something differs between the mirrors;
5) Until the dncache is filled, slapd CPU consumption on the first mirror
is below 100% (around 50%) and the search progresses well, as the monitor
shows;
6) Once the dncache is filled (it oscillates above 3 million), CPU
consumption on the first mirror goes to 100%, oscillating between 98%
and 102%;
7) The search never ends, so the systems are never in sync. CPU
consumption stays high permanently, almost always at 100%.

I let this process run for days and saw only one or two entries get
synced. Judging by the CPU usage, something seems to be hanging the
search: some loop keeps the thread consuming one full CPU.

I could collect some GDB information which I'm sending attached. Not
sure how to interpret this overlay_walk.

The idea is to stop one mirror for backups, taking that task off the
primary server. For this to work, replication has to catch up afterwards.

Your comments are very welcome.


You have provided absolutely no configuration information. There may well be 
other explanations for this behaviour than the dncachesize. I can think of at 
least two.

You also haven't provided information on the systems you are using. E.g., you 
may be trying on systems with too little memory (e.g., <1GB), which might be 
totally inadequate for the amount of data you have.

Regards,
Buchan

Follow-Ups:
- Re: syncrepl replication taking too long(not sync)
  - From: Quanah Gibson-Mount <quanah@zimbra.com>
