
Can't contact LDAP server



I have a 2-way multi-master setup on srv1.foo.com (EDT) and srv2.foo.com (PDT).

For about 2hrs this morning srv2 was syslog'ing "Can't contact LDAP
server" while it was in a replication conversation with srv1:

Sep  9 05:29:45 srv2 slapd[9413]: do_syncrep2: rid=001 (-1) Can't
contact LDAP server
Sep  9 05:29:45 srv2 slapd[9413]: do_syncrepl: rid=001 rc -1 retrying
Sep  9 05:30:00 srv2 slapd[9413]: do_syncrep2: rid=001
LDAP_RES_INTERMEDIATE - SYNC_ID_SET
Sep  9 05:30:01 srv2 last message repeated 34 times
Sep  9 05:30:01 srv2 slapd[9413]: syncrepl_entry: rid=001
LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
Sep  9 05:30:01 srv2 slapd[9413]: syncrepl_entry: rid=001 be_search (0)
Sep  9 05:30:01 srv2 slapd[9413]: syncrepl_entry: rid=001 ...
Sep  9 05:30:01 srv2 slapd[9413]: syncprov_matchops: skipping original sid 003
Sep  9 05:30:01 srv2 slapd[9413]: syncprov_matchops: skipping original sid 003
Sep  9 05:30:01 srv2 slapd[9413]: syncrepl_entry: rid=001 be_modify ... (0)
Sep  9 05:30:01 srv2 slapd[9413]: syncrepl_entry: rid=001
LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
Sep  9 05:30:01 srv2 slapd[9413]: syncrepl_entry: rid=001 be_search (0)
Sep  9 05:30:01 srv2 slapd[9413]: syncrepl_entry: rid=001 ...
Sep  9 05:30:01 srv2 slapd[9413]: syncprov_matchops: skipping original sid 003
Sep  9 05:30:01 srv2 slapd[9413]: syncprov_matchops: skipping original sid 003
Sep  9 05:30:01 srv2 slapd[9413]: syncrepl_entry: rid=001 be_modify ... (0)
...
Sep  9 05:30:29 srv2 slapd[9413]: syncrepl_entry: rid=001
LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
Sep  9 05:30:29 srv2 slapd[9413]: syncrepl_entry: rid=001 be_search (0)
Sep  9 05:30:29 srv2 slapd[9413]: syncrepl_entry: rid=001 ...
Sep  9 05:30:29 srv2 slapd[9413]: syncprov_matchops: skipping original sid 003
Sep  9 05:30:29 srv2 slapd[9413]: syncprov_matchops: skipping original sid 003
Sep  9 05:30:29 srv2 slapd[9413]: syncrepl_entry: rid=001 be_modify ... (0)
Sep  9 05:30:29 srv2 slapd[9413]: do_syncrep2: rid=001 (-1) Can't
contact LDAP server
Sep  9 05:30:29 srv2 slapd[9413]: do_syncrepl: rid=001 rc -1 retrying
Sep  9 05:30:45 srv2 slapd[9413]: do_syncrep2: rid=001
LDAP_RES_INTERMEDIATE - SYNC_ID_SET
Sep  9 05:30:45 srv2 last message repeated 34 times
Sep  9 05:30:45 srv2 slapd[9413]: syncrepl_entry: rid=001
LDAP_RES_SEARCH_ENTRY(LDAP_SYNC_ADD)
Sep  9 05:30:45 srv2 slapd[9413]: syncrepl_entry: rid=001 be_search (0)
Sep  9 05:30:45 srv2 slapd[9413]: syncrepl_entry: rid=001 ...
Sep  9 05:30:45 srv2 slapd[9413]: syncprov_matchops: skipping original sid 003
Sep  9 05:30:45 srv2 slapd[9413]: syncprov_matchops: skipping original sid 003
Sep  9 05:30:45 srv2 slapd[9413]: syncrepl_entry: rid=001 be_modify ... (0)
...

During that time no errors or closes were logged in the srv1 syslog. I
tried bouncing each slapd, but the issue continued. After about 2hrs
it stopped occurring.
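
To put numbers on how often the failure recurred, something like the
following could be run over the srv2 syslog. It is only a sketch: the
regular expression is written against the line format shown above
(with each entry on one line, not wrapped as in this mail), the
function names are my own, and the `year` default is a placeholder
because syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Sep  9 05:29:45 srv2 slapd[9413]: do_syncrep2: rid=001 (-1) Can't contact LDAP server
FAILURE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+slapd\[\d+\]:\s+"
    r"do_syncrep2: rid=(?P<rid>\d+) \(-1\) Can't contact LDAP server"
)


def failure_times(lines, year=2000):
    """Return a list of (rid, datetime) for each failure line.

    `year` is a placeholder (assumption): syslog does not record it.
    """
    hits = []
    for line in lines:
        m = FAILURE.match(line)
        if m:
            # Collapse the double space syslog uses for single-digit days.
            stamp = " ".join(m.group("ts").split())
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
            hits.append((m.group("rid"), ts))
    return hits


def gaps_in_seconds(hits):
    """Seconds between consecutive failures, in log order."""
    times = [ts for _, ts in hits]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
```

Passing an open log file to failure_times() and the result to
gaps_in_seconds() should show whether the drops follow a regular
interval or are scattered.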

My question is: What would cause the "Can't contact LDAP server"
message? The srv1 side logs neither an error nor a connection close.
The text "Can't contact" would seem to imply that the error occurred
during a connection attempt, but these errors seemed to occur during
an active conversation. Does the srv2 side notice a read or write
error on the socket and abandon the connection? I've been looking
through the code trying to determine what causes that error message.
Does it happen on a single read/write error? Does it retry a few
times? I hate to just say "Must be a network error" without some due
diligence.
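
As one piece of that due diligence, a plain TCP probe from srv2 to
srv1 could be left running next to slapd. This is a minimal sketch,
not anything taken from OpenLDAP: can_connect() is a hypothetical
helper name, and it only tests whether a new connection to the
provider port can be opened. It would not catch a connection that is
dropped in mid-conversation, which is what the log above suggests.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Try to open a TCP connection to host:port.

    Returns (True, None) on success, or (False, error text) on failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers refused connections, timeouts and name resolution errors.
        return False, str(exc)
```

Calling can_connect("srv1.foo.com", 10806) in a loop and logging each
failure with a timestamp would show whether connection setup also
fails during the window in which slapd reports the error.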



My env:

RHEL 5.5
OpenLDAP 2.4.25
BerkeleyDB 4.8.40
OpenSSL 1.0.0d
Cyrus SASL 2.1.23
2-way Multi-master

====================
#srv1 slapd.conf -> slapd.d
include /appl/openldap/etc/schema/core.schema
include /appl/openldap/etc/schema/cosine.schema
include /appl/openldap/etc/schema/nis.schema
include /appl/openldap/etc/schema/inetorgperson.schema
include /appl/openldap/etc/schema/foo.com.schema

argsfile /appl/openldap/var/run/slapd.args
pidfile /appl/openldap/var/run/slapd.pid

threads 16
tool-threads 4

idletimeout 300
writetimeout 5

reverse-lookup off

timelimit time.soft=30 time.hard=300
sizelimit size.soft=500 size.hard=1000

password-hash {SSHA}

loglevel stats sync

serverid 1 ldap://srv1.foo.com:10806

modulepath /appl/openldap/libexec
moduleload back_monitor.la
moduleload back_hdb.la
moduleload syncprov.la

database config
rootdn cn=manager,dc=foo,dc=com

database monitor
rootdn cn=manager,dc=foo,dc=com

database hdb
rootdn cn=manager,dc=foo,dc=com
suffix dc=foo,dc=com
directory /appl/openldap/var/data/dc=foo,dc=com
cachesize 1000
idlcachesize 3000
cachefree 5
checkpoint 128 15

index objectClass eq
index entryCSN eq
index entryUUID eq

syncrepl rid=001
  provider=ldap://srv1.foo.com:10806
  type=refreshAndPersist
  retry="15 +"
  bindmethod=simple
  binddn="cn=replicator,dc=foo,dc=com"
  credentials="secret"
  searchbase="dc=foo,dc=com"
  starttls=no
  schemachecking=off

syncrepl rid=002
  provider=ldap://srv2.foo.com:10806
  type=refreshAndPersist
  retry="15 +"
  bindmethod=simple
  binddn="cn=replicator,dc=foo,dc=com"
  credentials="secret"
  searchbase="dc=foo,dc=com"
  starttls=no
  schemachecking=off

mirrormode TRUE

overlay syncprov
syncprov-checkpoint 50 10
syncprov-sessionlog 100

limits dn.exact="cn=replicator,dc=foo,dc=com"
  time.soft=unlimited time.hard=unlimited
  size.soft=unlimited size.hard=unlimited

====================
#srv2 slapd.conf -> slapd.d
include /appl/openldap/etc/schema/core.schema
include /appl/openldap/etc/schema/cosine.schema
include /appl/openldap/etc/schema/nis.schema
include /appl/openldap/etc/schema/inetorgperson.schema
include /appl/openldap/etc/schema/foo.com.schema

argsfile /appl/openldap/var/run/slapd.args
pidfile /appl/openldap/var/run/slapd.pid

threads 16
tool-threads 4

idletimeout 300
writetimeout 5

reverse-lookup off

timelimit time.soft=30 time.hard=300
sizelimit size.soft=500 size.hard=1000

password-hash {SSHA}

loglevel stats sync

serverid 2 ldap://srv2.foo.com:10806

modulepath /appl/openldap/libexec
moduleload back_monitor.la
moduleload back_hdb.la
moduleload syncprov.la

database config
rootdn cn=manager,dc=foo,dc=com

database monitor
rootdn cn=manager,dc=foo,dc=com

database hdb
rootdn cn=manager,dc=foo,dc=com
suffix dc=foo,dc=com
directory /appl/openldap/var/data/dc=foo,dc=com
cachesize 1000
idlcachesize 3000
cachefree 5
checkpoint 128 15

index objectClass eq
index entryCSN eq
index entryUUID eq

syncrepl rid=001
  provider=ldap://srv1.foo.com:10806
  type=refreshAndPersist
  retry="15 +"
  bindmethod=simple
  binddn="cn=replicator,dc=foo,dc=com"
  credentials="secret"
  searchbase="dc=foo,dc=com"
  starttls=no
  schemachecking=off

syncrepl rid=002
  provider=ldap://srv2.foo.com:10806
  type=refreshAndPersist
  retry="15 +"
  bindmethod=simple
  binddn="cn=replicator,dc=foo,dc=com"
  credentials="secret"
  searchbase="dc=foo,dc=com"
  starttls=no
  schemachecking=off

mirrormode TRUE

overlay syncprov
syncprov-checkpoint 50 10
syncprov-sessionlog 100

limits dn.exact="cn=replicator,dc=foo,dc=com"
  time.soft=unlimited time.hard=unlimited
  size.soft=unlimited size.hard=unlimited