syncrepl broke, connection loss

To: openldap-technical@openldap.org
Subject: syncrepl broke, connection loss
From: Peter Mogensen <apm@mutex.dk>
Date: Tue, 08 Dec 2009 17:32:45 +0100
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

Hi,

I've loaded my mirror mode setup with data and let it run for a few days.
Both cn=config and the application database are mirrored.
Only server1 is receiving writes from the application.

OpenLDAP 2.4.20, BDB 4.8

After about 6 hours the mirror partly broke, and I see 3 symptoms:

1)

The syncrepl connection from server1->server2 for the application database is missing, and data only flows from server1 to server2, not the other way. The cn=config connections exist.


$ netstat -tna # shows
tcp    0  0 192.168.0.102:636    0.0.0.0:*            LISTEN
tcp 8125  0 192.168.0.102:45535  192.168.0.101:636    ESTABLISHED
tcp    0  0 192.168.0.102:636    192.168.0.101:34954  ESTABLISHED
tcp    0  0 192.168.0.102:45537  192.168.0.101:636    ESTABLISHED

Whereas it should show something like:
tcp    0  0 192.168.0.101:636    0.0.0.0:*            LISTEN
tcp    0  0 192.168.0.101:34954  192.168.0.102:636    ESTABLISHED
tcp  261  0 192.168.0.101:33409  192.168.0.102:636    ESTABLISHED
tcp    0  0 192.168.0.101:636    192.168.0.102:45537  ESTABLISHED
tcp    0  0 192.168.0.101:636    192.168.0.102:33226  ESTABLISHED
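
A minimal sketch for picking the relevant connections out of such output, assuming the usual `netstat -tna` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The function name is hypothetical. It lists only established connections whose remote port is 636, which are the ones a syncrepl consumer on this host would have opened towards its provider:

```shell
# Hypothetical helper: reads `netstat -tna` output on stdin and prints
# "Recv-Q local-address remote-address" for every ESTABLISHED IPv4 TCP
# connection whose remote (foreign) port is 636, i.e. outbound LDAPS.
outbound_ldaps_connections() {
    awk '$1 == "tcp" && $6 == "ESTABLISHED" && $5 ~ /:636$/ { print $2, $4, $5 }'
}
```

It could be used as `netstat -tna | outbound_ldaps_connections`. A missing line for the peer, or a Recv-Q value that stays non-zero, would then stand out without reading the whole table.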

2)
Meanwhile the log on server1 says:
Dec  8 02:04:03 server1 slapd[6863]: do_syncrepl: rid=004 rc -1 retrying
Dec  8 02:05:03 server1 slapd[6863]: do_syncrepl: rid=004 rc -2 retrying
Dec  8 02:06:03 server1 slapd[6863]: do_syncrepl: rid=004 rc -2 retrying
etc...

The first such entry appeared around 6 hours after the start of the mirror.
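
To see when the retries began and which return codes occur, the retry lines can be summarized per rid and rc. This is only a sketch: it assumes the syslog line layout shown above (month, day, time, host, `slapd[pid]:`, message), and the function name is hypothetical.

```shell
# Hypothetical helper: reads syslog lines on stdin and, for each
# (rid, rc) pair found in "do_syncrepl: ... retrying" messages, prints
# the number of occurrences plus the first and last timestamp seen.
summarize_syncrepl_retries() {
    awk '
        $6 == "do_syncrepl:" && $10 == "retrying" {
            key = $7 " rc=" $9            # e.g. "rid=004 rc=-2"
            ts  = $1 " " $2 " " $3        # e.g. "Dec 8 02:05:03"
            if (!(key in count)) { first[key] = ts; order[++n] = key }
            count[key]++
            last[key] = ts
        }
        END {
            # Report in order of first appearance.
            for (i = 1; i <= n; i++) {
                k = order[i]
                printf "%s count=%d first=\"%s\" last=\"%s\"\n", k, count[k], first[k], last[k]
            }
        }
    '
}
```

For example `summarize_syncrepl_retries < /var/log/syslog`, where the log path is an assumption and depends on the local syslog configuration.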

3)

If I try to change cn=config with ldapmodify on either server, server1 will hang, not answering queries until I restart it.

For instance, if I do:
----------
dn: cn=config
changetype: modify
replace: olcLogLevel
olcLogLevel: None sync
-----------
... it'll hang.

I was able to connect to and search the database on both servers, from both servers, using client certs, like this (on server1):

ldapsearch -H ldaps://server2/ -YEXTERNAL -b cn=data,dc=example,dc=com -s sub -D cn=config '(cn=*)' + \*


So it's not that the TCP connection can't be established,
which makes me suspect that this is related to this thread:
http://www.mail-archive.com/openldap-software@openldap.org/msg16028.html

Now, after 27 hours, the connection finally came back by itself, and replication works both ways.

The "rc -2 retrying" in the log on server1 stopped and was replaced by:

Dec  8 15:39:34 server1 slapd[11177]: do_syncrepl: rid=004 rc -2 retrying
Dec  8 15:40:34 server1 slapd[11177]: do_syncrepl: rid=004 rc -2 retrying

Dec  8 15:42:15 server1 slapd[11177]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Dec  8 15:47:05 server1 slapd[11177]: => bdb_idl_delete_key: c_del id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Dec  8 15:47:05 server1 slapd[11177]: conn=15694 op=16: attribute "entryCSN" index delete failure
Dec  8 15:47:06 server1 slapd[11177]: => bdb_idl_delete_key: c_del id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Dec  8 15:47:06 server1 slapd[11177]: conn=15569 op=36: attribute "entryCSN" index delete failure

... and a bit more of the same.

Trying to modify cn=config with ldapmodify still makes server1 (and ldapmodify) hang though.


/Peter
