
Re: multi / standby master: incomplete replication after downtime (?) [SOLVED]



On 18.08.2010 17:16, Rein Tollevik wrote:
On 08/18/2010 04:28 PM, Elmar Marschke wrote:

Here's the logfile of MASTER:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
===_BEGIN_CHANGES_WHILE_BOTH_UP_===
Aug 18 15:30:04 ldapmaster slapd[8017]: slap_queue_csn: queing
0x7f00f317b580 20100818133004.663851Z#000000#000#000000

Your ServerID setting is incorrect, and you are using the default
ServerID=0 on both systems.  The ServerID is included in the csn value,
the second to last number (#000# here).  Ensure that the ServerID URL is
the exact hostname of the systems it runs on, or that slapd is able to
select the correct sid based on its -h listener argument.

Start slapd with -d config and verify that both servers log lines with
differing SID= values.

Rein


Hi all,

thanks to your hints I think I got it working now :)

Several details had to be changed. In case someone has similar problems, here is a detailed description of what had to be done in my case. Everything was done on two freshly installed, out-of-the-box openSuSE 11.3 x86_64 systems with the SuSE-shipped openldap 2.4.21.

First, regarding the inaccurate time on my test machines: I deleted the openSuSE ntp.conf and now use other NTP servers as the time sync source.
The complete ntp.conf on ldapmaster and ldapslave is now:
-----------------------------------------------------
server ptbtime1.ptb.de prefer
server ptbtime2.ptb.de
tinker panic 0
driftfile /var/lib/ntp/drift/ntp.drift

This results in offset values that differ less between the two machines:
---------------------------------------------
ldapmaster:/etc/openldap # ntpq -p; ssh ldapslave ntpq -p
     remote           refid       offset
========================================
*ptbtime1.ptb.de .PTB.            -1.218
+ptbtime2.ptb.de .PTB.            -1.305
     remote           refid       offset
========================================
*ptbtime1.ptb.de .PTB.            -0.381
+ptbtime2.ptb.de .PTB.             0.203
ldapmaster:/etc/openldap #
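
Since every CSN starts with a timestamp (see the log line quoted at the top), it can be handy to compare only the offset of the peer each machine is actually synced to. The following is a minimal sketch, not part of my original setup: the function name selected_offset is made up for illustration, and it assumes the full, untrimmed "ntpq -p" output, where the offset is the 9th column (the tables above are shortened).

```shell
#!/bin/sh
# Print the offset (in ms) of the peer that ntpd currently syncs to,
# i.e. the line that "ntpq -p" marks with a leading "*".
# Assumption: untrimmed ntpq -p output, where offset is column 9.
selected_offset() {
    awk '/^\*/ { print $9; exit }'
}

# Usage on a live system (hostnames as in this thread):
#   ntpq -p | selected_offset
#   ssh ldapslave ntpq -p | selected_offset
```

If the two numbers are far apart, the clocks should be fixed before looking at replication itself.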

Second problem: the missing or wrong serverID in the csn values. To make clear what apparently helped, I will describe how I create(d) my slapd.d/ online configuration. (My original slapd.conf was taken from the "openldap 2.4" book by Oliver Liebel & John Martin Ungar. Thanks, Oliver :) )
------------------------------------------------------------
ldapmaster:/etc/openldap # slaptest -f /etc/openldap/slapd.conf -F /etc/openldap/slapd.d/
hdb_db_open: database "dc=local,dc=site": db_open(/var/lib/ldap//id2entry.bdb) failed: No such file or directory (2).
backend_startup_one (type=hdb, suffix="dc=local,dc=site"): bi_db_open failed! (2)
slap_startup failed (test would succeed using the -u switch)
------------------------------------------------------------

A subsequent rcldap start then failed every time:
--------------------------------------------------
ldapmaster:/etc/openldap # rcldap start
Starting ldap-serverstartproc: exit status of parent of /usr/lib/openldap/slapd: 1

                                           failed
-------------------------------------

and the following was written to /var/log/messages:
--------------------------------------
Aug 26 15:47:22 ldapmaster slapd[7805]: @(#) $OpenLDAP: slapd 2.4.21 (Jul 5 2010 13:35:22) $#012#011abuild@build16:/usr/src/packages/BUILD/openldap-2.4.21/servers/slapd
Aug 26 15:47:22 ldapmaster slapd[7805]: olcSyncrepl: value #0: <olcSyncrepl> invalid URL
Aug 26 15:47:22 ldapmaster slapd[7805]: config error processing olcDatabase={0}config,cn=config: <olcSyncrepl> invalid URL
Aug 26 15:47:22 ldapmaster slapd[7805]: slapd stopped.
Aug 26 15:47:22 ldapmaster slapd[7805]: connections_destroy: nothing to destroy.

I grepped for "olcSyncrepl" in slapd.d:
---------------------------------------
ldapmaster:/etc/openldap # grep -r olcSyncrepl slapd.d/
slapd.d/cn=config/cn=schema.ldif:olcAttributeTypes: ( OLcfgDbAt:0.11 NAME 'olcSyncrepl' EQUALITY caseIgnoreMatc
slapd.d/cn=config/cn=schema.ldif: olcSizeLimit $ olcSyncUseSubentry $ olcSyncrepl $ olcTimeLimit $ olcUpdateDN
slapd.d/cn=config/olcDatabase={0}config.ldif:olcSyncrepl: rid=003 provider=ldap://ldapmaster.local.site uri="" bindmethod=s
slapd.d/cn=config/olcDatabase={0}config.ldif:olcSyncrepl: rid=004 provider=ldap://ldapslave.local.site uri="" bindmethod=si
slapd.d/cn=config/olcDatabase={1}hdb.ldif:olcSyncrepl: rid=001 provider=ldap://ldapmaster.local.site uri="" bindmethod=s
slapd.d/cn=config/olcDatabase={1}hdb.ldif:olcSyncrepl: rid=002 provider=ldap://ldapslave.local.site uri="" bindmethod=si
ldapmaster:/etc/openldap #

Some internet research suggested that the empty uri="" was the problem and that removing it would help. PREVIOUSLY I always removed it, and indeed slapd started fine afterwards. BUT apparently this also led to the missing serverID in my csn values (that is, they all had the default "000"). NOW I instead edited the config.ldif and hdb.ldif files in slapd.d (the ones in which grep found the string "olcSyncrepl") to give uri an appropriate value.

Open each file and search for "uri". It appears twice in every file. Each occurrence is preceded by a "provider" parameter that is set to some value. For example, in slapd.d/cn=config/olcDatabase={0}config.ldif:
provider=ldap://ldapmaster.local.site uri=""
and
provider=ldap://ldapslave.local.site uri=""

The value of provider ALSO has to be put into uri, so that afterwards it looks like:
provider=ldap://ldapmaster.local.site uri="ldap://ldapmaster.local.site"
and
provider=ldap://ldapslave.local.site uri="ldap://ldapslave.local.site"

I did this by hand on each machine (no copying of files from one to the other).
After that, "rcldap start" works without problems.
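
The manual edit described above could also be scripted. This is only a sketch under a few assumptions: the function name fill_empty_uri is made up, it only handles the case where provider=... and uri="" sit on the same physical line (as in the grep output above; LDIF wraps long lines, so with longer hostnames the two might be split), and slapd should be stopped and slapd.d/ backed up before touching anything.

```shell
#!/bin/sh
# Copy the value of provider= into an empty uri="" that directly follows it.
# Reads LDIF text on stdin and writes the result to stdout;
# lines without the pattern pass through unchanged.
fill_empty_uri() {
    sed 's|provider=\([^ ]*\) uri=""|provider=\1 uri="\1"|g'
}

# Hypothetical usage: write to a new file and review it before replacing
# the original (the /tmp path is just an example):
#   fill_empty_uri < 'slapd.d/cn=config/olcDatabase={1}hdb.ldif' > /tmp/hdb.ldif.new
```

Writing to a separate file first keeps the original untouched until the result has been checked by eye.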

But that still was not enough...
Third thing to do: additionally, make slapd on each machine start with the "-h" parameter set correctly, as Rein and Jonathan wrote. With the SuSE way of configuration (like it or not, in this case I don't have a choice ;)) this is done in /etc/sysconfig/openldap:
set OPENLDAP_SLAPD_PARAMS correctly on every machine, e.g. on the master:

OPENLDAP_SLAPD_PARAMS="-h ldap://ldapmaster.local.site"
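
To illustrate why the "-h" value matters: as Rein wrote, slapd selects its SID by matching its listener against the ServerID URLs. The sketch below only imitates that with a plain string comparison; it is not how slapd does it internally. The function name sid_for_listener and the SID numbers 1 and 2 in the usage comment are assumptions for illustration, since my actual ServerID lines are not shown in this mail.

```shell
#!/bin/sh
# Given a listener URL (as passed to slapd -h) and slapd.conf-style
# "ServerID <number> <URL>" lines on stdin, print the number whose URL
# is exactly equal to the listener URL. Prints nothing if none matches.
# Simplification: plain string comparison, no hostname resolution.
sid_for_listener() {
    awk -v url="$1" 'tolower($1) == "serverid" && $3 == url { print $2; exit }'
}

# Hypothetical usage (SID numbers are examples):
#   printf '%s\n' 'ServerID 1 ldap://ldapmaster.local.site' \
#                 'ServerID 2 ldap://ldapslave.local.site' \
#     | sid_for_listener ldap://ldapslave.local.site
```

An empty result for a host's own listener URL is a hint that it may end up with the default SID again.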

And, by the way, to make sure that ONLY the online configuration style (slapd.d) is used, one can change OPENLDAP_CONFIG_BACKEND from "" to:
OPENLDAP_CONFIG_BACKEND="ldap"

All of this together seems to solve the problem. LDAP objects can now be modified, removed and added on either machine while the other one is down, and all changes are replicated as soon as the "downed" machine comes back up. (At least in my tests so far ;))
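
For anyone who wants to check their own logs: the SID is the second to last of the '#'-separated numbers in a csn, as Rein pointed out. A small sketch to pull it out of a csn string (the function name csn_sid is made up for illustration):

```shell
#!/bin/sh
# Print the SID field of a CSN, i.e. the third of its four
# '#'-separated parts (the second to last number).
csn_sid() {
    printf '%s\n' "$1" | cut -d'#' -f3
}

# The csn from the log line quoted at the top of this mail:
csn_sid '20100818133004.663851Z#000000#000#000000'   # prints 000
```

With the setup fixed, csn values generated on the two machines should show two different values here instead of "000" on both.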

Thanks for your time, and best regards..
elmar