
Re: multi / standby master: incomplete replication after downtime (?) [SOLVED]

To: openldap-technical <openldap-technical@openldap.org>
Subject: Re: multi / standby master: incomplete replication after downtime (?) [SOLVED]
From: Elmar Marschke <elmar.marschke@schenker.at>
Date: Thu, 26 Aug 2010 18:03:26 +0200
In-reply-to: <4C6BF962.6070604@OpenLDAP.org>
References: <4C6BA174.2000808@schenker.at> <4C6BAE7C.1070904@phillipoux.net> <4C6BC257.5020404@schenker.at> <4C6BDBBD.9020701@itc.li> <4C6BEDFD.5090206@schenker.at> <4C6BF962.6070604@OpenLDAP.org>
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.11) Gecko/20100713 Lightning/1.0b1 Thunderbird/3.0.6

On 18.08.2010 17:16, Rein Tollevik wrote:

On 08/18/2010 04:28 PM, Elmar Marschke wrote:

Here's the logfile of MASTER:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
===_BEGIN_CHANGES_WHILE_BOTH_UP_===
Aug 18 15:30:04 ldapmaster slapd[8017]: slap_queue_csn: queing 0x7f00f317b580 20100818133004.663851Z#000000#000#000000


Your ServerID setting is incorrect, and you are using the default
ServerID=0 on both systems.  The ServerID is included in the csn value,
the second to last number (#000# here).  Ensure that the ServerID URL is
the exact hostname of the systems it runs on, or that slapd is able to
select the correct sid based on its -h listener argument.

Start slapd with -d config on both systems and verify that the two
logs show lines with differing SID= values.

Rein


Hi all,

thanks to your hints I think I got it working now :)

Several details had to be changed. Perhaps someone has similar problems, so in the following I give a detailed description of what had to be done in my case. Everything was executed on two freshly installed, out-of-the-box openSuSE 11.3 x86_64 machines with the SuSE-shipped openldap 2.4.21.

First, regarding the inaccurate time on my test machines: I deleted the openSuSE ntp.conf and additionally now use other ntp servers as time sync source.

Complete ntp.conf on ldapmaster and ldapslave now is:
-----------------------------------------------------
server ptbtime1.ptb.de prefer
server ptbtime2.ptb.de
tinker panic 0
driftfile /var/lib/ntp/drift/ntp.drift

This results in offset values that differ less between the two machines:
---------------------------------------------
ldapmaster:/etc/openldap # ntpq -p; ssh ldapslave ntpq -p
     remote           refid       offset
========================================
*ptbtime1.ptb.de .PTB.            -1.218
+ptbtime2.ptb.de .PTB.            -1.305
     remote           refid       offset
========================================
*ptbtime1.ptb.de .PTB.            -0.381
+ptbtime2.ptb.de .PTB.             0.203
ldapmaster:/etc/openldap #
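
Since every csn starts with a timestamp, it can be handy to put a number on how far the two clocks are apart. The following is only a minimal sketch, not something from my setup: the function name selected_peer_offset is made up, it assumes the text comes from "ntpq -p" with a header line containing the word "offset", and it only looks at the peer marked with "*". The sample data is the trimmed listing from above.

```python
# Sample data: the trimmed `ntpq -p` listings from the post (offsets in ms).
MASTER_OUTPUT = """\
     remote           refid       offset
========================================
*ptbtime1.ptb.de .PTB.            -1.218
+ptbtime2.ptb.de .PTB.            -1.305
"""

SLAVE_OUTPUT = """\
     remote           refid       offset
========================================
*ptbtime1.ptb.de .PTB.            -0.381
+ptbtime2.ptb.de .PTB.             0.203
"""


def selected_peer_offset(ntpq_output: str) -> float:
    """Return the offset (ms) of the peer marked '*' in `ntpq -p` output.

    Hypothetical helper: the offset column is located via the header line,
    so it should also cope with the untrimmed ten-column listing.
    """
    offset_col = None
    for line in ntpq_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "offset" in fields:
            # Header line: remember which column holds the offset.
            offset_col = fields.index("offset")
        elif line.lstrip().startswith("*") and offset_col is not None:
            # '*' marks the peer ntpd currently synchronizes to.
            return float(fields[offset_col])
    raise ValueError("no selected (*) peer found")


gap_ms = abs(selected_peer_offset(MASTER_OUTPUT) - selected_peer_offset(SLAVE_OUTPUT))
print(f"clock gap between the two hosts: {gap_ms:.3f} ms")
```

With the values above the two machines are less than a millisecond apart.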

Second problem: the missing or wrong serverID in the csn values. To make clear what apparently helped, I will describe how I create(d) my slapd.d/ online configuration. My original slapd.conf was taken from the "openldap 2.4" book by Oliver Liebel & John Martin Ungar; thanks Oliver :)!

------------------------------------------------------------

ldapmaster:/etc/openldap # slaptest -f /etc/openldap/slapd.conf -F /etc/openldap/slapd.d/
hdb_db_open: database "dc=local,dc=site": db_open(/var/lib/ldap//id2entry.bdb) failed: No such file or directory (2).
backend_startup_one (type=hdb, suffix="dc=local,dc=site"): bi_db_open failed! (2)

slap_startup failed (test would succeed using the -u switch)
------------------------------------------------------------

Every subsequent "rcldap start" then resulted in:
--------------------------------------------------
ldapmaster:/etc/openldap # rcldap start

Starting ldap-server startproc: exit status of parent of /usr/lib/openldap/slapd: 1
                                           failed
-------------------------------------

and the following was written to /var/log/messages:
--------------------------------------

Aug 26 15:47:22 ldapmaster slapd[7805]: @(#) $OpenLDAP: slapd 2.4.21 (Jul 5 2010 13:35:22)$#012#011abuild@build16:/usr/src/packages/BUILD/openldap-2.4.21/servers/slapd
Aug 26 15:47:22 ldapmaster slapd[7805]: olcSyncrepl: value #0: <olcSyncrepl> invalid URL
Aug 26 15:47:22 ldapmaster slapd[7805]: config error processing olcDatabase={0}config,cn=config: <olcSyncrepl> invalid URL
Aug 26 15:47:22 ldapmaster slapd[7805]: slapd stopped.
Aug 26 15:47:22 ldapmaster slapd[7805]: connections_destroy: nothing to destroy.


I grepped for "olcSyncrepl" in slapd.d:
---------------------------------------
ldapmaster:/etc/openldap # grep -r olcSyncrepl slapd.d/

slapd.d/cn=config/cn=schema.ldif:olcAttributeTypes: ( OLcfgDbAt:0.11 NAME 'olcSyncrepl' EQUALITY caseIgnoreMatc
slapd.d/cn=config/cn=schema.ldif: olcSizeLimit $ olcSyncUseSubentry $ olcSyncrepl $ olcTimeLimit $ olcUpdateDN
slapd.d/cn=config/olcDatabase={0}config.ldif:olcSyncrepl: rid=003 provider=ldap://ldapmaster.local.site uri="" bindmethod=s
slapd.d/cn=config/olcDatabase={0}config.ldif:olcSyncrepl: rid=004 provider=ldap://ldapslave.local.site uri="" bindmethod=si
slapd.d/cn=config/olcDatabase={1}hdb.ldif:olcSyncrepl: rid=001 provider=ldap://ldapmaster.local.site uri="" bindmethod=s
slapd.d/cn=config/olcDatabase={1}hdb.ldif:olcSyncrepl: rid=002 provider=ldap://ldapslave.local.site uri="" bindmethod=si

ldapmaster:/etc/openldap #

Some internet research told me that the empty uri="" should be the problem, and that it would help to remove it. BEFORE, I always removed it; and right, openldap started fine then. BUT apparently this also led to the missing serverID in my csn values (or rather, to all of them carrying the default "000"). NOW I changed those config.ldif and hdb.ldif files in slapd.d (the ones in which grep found the string "olcSyncrepl") to give uri an appropriate value.
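
To check which serverID a csn carries, the value can be split at the "#" characters; as Rein wrote, the ServerID is the second to last field. The following is only a minimal sketch: the names Csn, parse_csn and distinct_sids are made up for illustration, the field labels count and mod are my own, and the fields after the timestamp are assumed to be hexadecimal. The first sample is the csn from the MASTER log above; the other two are hypothetical values of the kind one would hope to see once both servers use different IDs.

```python
from typing import Iterable, NamedTuple, Set


class Csn(NamedTuple):
    """The four '#'-separated parts of a csn (field names are illustrative)."""
    timestamp: str
    count: int
    sid: int
    mod: int


def parse_csn(csn: str) -> Csn:
    """Split a csn such as 20100818133004.663851Z#000000#000#000000."""
    timestamp, count, sid, mod = csn.strip().split("#")
    # Assumption: the three numeric fields are written in hexadecimal.
    return Csn(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def distinct_sids(csns: Iterable[str]) -> Set[int]:
    """Collect the serverIDs seen in a list of csn strings."""
    return {parse_csn(c).sid for c in csns}


# csn taken from the MASTER log in this thread: serverID is the default 0.
broken = "20100818133004.663851Z#000000#000#000000"
print(parse_csn(broken).sid)

# Hypothetical csns from a setup where each server has its own ID.
fixed = [
    "20100826140000.000000Z#000000#001#000000",
    "20100826140001.000000Z#000000#002#000000",
]
print(sorted(distinct_sids(fixed)))
```

If both servers only ever produce csns with sid 0, the serverID setting is not taking effect.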

Open each file and search for "uri". It is found twice in every file. Before each occurrence there is a parameter "provider", which is set to some value. For example, in slapd.d/cn=config/olcDatabase={0}config.ldif:

provider=ldap://ldapmaster.local.site uri=""
and
provider=ldap://ldapslave.local.site uri=""

The value of provider ALSO has to be put into uri, so that afterwards it looks like:

provider=ldap://ldapmaster.local.site uri="ldap://ldapmaster.local.site"
and
provider=ldap://ldapslave.local.site uri="ldap://ldapslave.local.site"

I did it on every machine (no file copy from one to another).
After that, "rcldap start" (also) works without problems.
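
The manual edit described above can be expressed as a small string rewrite. This is only a sketch of the idea, not a tool I used: the function name fill_empty_uri is made up, and it assumes the whole olcSyncrepl value sits on one (unfolded) line and that provider is written as in the grep output above. It works on strings only and does not touch any file.

```python
import re

# provider=<url>, optionally quoted; the URL ends at whitespace or a quote.
_PROVIDER_RE = re.compile(r'provider="?([^"\s]+)"?')


def fill_empty_uri(syncrepl_value: str) -> str:
    """Copy the provider URL into an empty uri="" of an olcSyncrepl value.

    Hypothetical helper: values without a provider, or with a uri that is
    already set, are returned unchanged.
    """
    match = _PROVIDER_RE.search(syncrepl_value)
    if match is None:
        return syncrepl_value
    provider_url = match.group(1)
    # Only the first empty uri="" is filled in.
    return syncrepl_value.replace('uri=""', f'uri="{provider_url}"', 1)


before = 'rid=003 provider=ldap://ldapmaster.local.site uri=""'
after = fill_empty_uri(before)
print(after)
```

Running it a second time on the result changes nothing, because no empty uri="" is left.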

But that still was not enough...

Third thing to do: additionally, on each machine make slapd start with the "-h" parameter set correctly, like Rein and Jonathan wrote. Following the SuSE way of configuration (like it or not, but in this case I don't have a choice ;)) this can be done in /etc/sysconfig/openldap:

Set OPENLDAP_SLAPD_PARAMS correctly on every machine, e.g. on the master:

OPENLDAP_SLAPD_PARAMS="-h ldap://ldapmaster.local.site"
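
Why the "-h" value matters can be illustrated with a toy model of what Rein described: slapd selects its sid by finding the ServerID URL that matches one of its listener URLs, and otherwise stays at the default 0. The sketch below is a simplified model, not slapd's actual code; the name pick_server_id and the numeric IDs 1 and 2 are assumptions, since the ServerID lines of my configuration are not shown in this mail.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlsplit


def _url_key(url: str) -> Tuple[str, str, Optional[int]]:
    """Reduce a URL to (scheme, host, port) for a simple comparison."""
    parts = urlsplit(url)
    return (parts.scheme.lower(), (parts.hostname or "").lower(), parts.port)


def pick_server_id(server_ids: Dict[int, str], listeners: List[str]) -> int:
    """Return the ID whose URL matches a listener URL, else the default 0.

    Simplified illustration only; the real matching in slapd may differ.
    """
    listening = {_url_key(url) for url in listeners}
    for sid, url in server_ids.items():
        if _url_key(url) in listening:
            return sid
    return 0


# Hypothetical ServerID assignment for the two machines.
SERVER_IDS = {
    1: "ldap://ldapmaster.local.site",
    2: "ldap://ldapslave.local.site",
}

# Started with -h ldap://ldapmaster.local.site: the first entry matches.
print(pick_server_id(SERVER_IDS, ["ldap://ldapmaster.local.site"]))

# Started with a listener that names no host: nothing matches in this model.
print(pick_server_id(SERVER_IDS, ["ldap:///"]))
```

In this model a listener URL without the exact hostname leaves both servers at sid 0, which is the symptom seen in the csn values.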

And, by the way, to make sure that ONLY the online configuration style (slapd.d) is used, one can change OPENLDAP_CONFIG_BACKEND from "" to:

OPENLDAP_CONFIG_BACKEND="ldap"

All of this together seems to solve the problem. Now LDAP objects can be altered, removed and added on either machine while the other one is down, and all changes are replicated as soon as the "downed" machine comes back up again. (At least in my tests so far ;))


Thanks for your time, and best regards..
elmar

