[Date Prev][Date Next] [Chronological] [Thread] [Top]

OpenLDAP unresponsible after modify/replication operation



Hi there,

I have an active-active LDAP cluster with N-way multi-master
configuration. The slapd version is: 2.4.23-7.2 (Debian Squeeze)

After a modify operation in the cn=config tree, we get and
unresponsible server (all nodes in the cluster)


This is the log (256 loglevel) for the node who gets the original MOD operation.
Note the time cut between 12:23:55 and 12:42:54
The deferring operation message was repeated before that cut, and
today It's the first time I see it.
[...]
Apr 26 12:23:55 ldap2 slapd[19556]: conn=7352 op=0 RESULT tag=97 err=0 text=
Apr 26 12:23:55 ldap2 slapd[19556]: conn=7352 op=2 UNBIND
Apr 26 12:23:55 ldap2 slapd[19556]: conn=7352 fd=204 closed
Apr 26 12:23:55 ldap2 slapd[19556]: conn=1399 op=27 MOD
dn="olcDatabase={1}hdb,cn=config"
Apr 26 12:23:55 ldap2 slapd[19556]: conn=1399 op=27 MOD attr=olcAccess
***
Apr 26 12:42:54 ldap2 slapd[19556]: do_syncrepl: rid=001 rc -5
retrying (4 retries left)
Apr 26 12:42:54 ldap2 slapd[19556]: do_syncrepl: rid=006 rc -5
retrying (4 retries left)
Apr 26 12:42:54 ldap2 slapd[19556]: do_syncrepl: rid=003 rc -5
retrying (4 retries left)
Apr 26 12:42:54 ldap2 slapd[19556]: conn=1399 op=27 RESULT tag=103 err=0 text=
Apr 26 12:42:54 ldap2 slapd[19556]: conn=7353 fd=135 ACCEPT from
IP=[xxx:xx:xx:xx::xx]:58394 (IP=[::]:636)
Apr 26 12:42:54 ldap2 slapd[19556]: conn=7354 fd=136 ACCEPT from
IP=xx.xx.x.xx:58617 (IP=0.0.0.0:636)
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=5551
deferring operation: too many executing
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876
deferring operation: too many executing
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876
deferring operation: too many executing
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876
deferring operation: pending operations
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876
deferring operation: too many executing
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876
deferring operation: pending operations
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876
deferring operation: too many executing
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876
deferring operation: pending operations
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876
deferring operation: too many executing
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876
deferring operation: pending operations
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876
deferring operation: too many executing
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876
deferring operation: pending operations
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=3876
deferring operation: too many executing
Apr 26 12:42:54 ldap2 slapd[19556]: conn=1399 fd=48 closed (connection lost)
Apr 26 12:42:54 ldap2 slapd[19556]: conn=1451 fd=50 closed (connection lost)
[...]
Apr 26 12:42:54 ldap2 slapd[19556]: conn=7224 op=2 UNBIND
Apr 26 12:42:54 ldap2 slapd[19556]: conn=7224 fd=134 closed
Apr 26 12:42:54 ldap2 slapd[19556]: conn=6456 op=6 RESULT tag=97 err=0 text=
Apr 26 12:42:54 ldap2 slapd[19556]: conn=5551 op=8 ABANDON msg=8
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=2035
deferring operation: too many executing
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=2035
deferring operation: pending operations
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=2035
deferring operation: too many executing
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=2035
deferring operation: pending operations
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=2035
deferring operation: too many executing
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=2035
deferring operation: pending operations
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=2035
deferring operation: too many executing
Apr 26 12:42:54 ldap2 slapd[19556]: conn=5551 op=10 ABANDON msg=10
Apr 26 12:42:54 ldap2 slapd[19556]: conn=3876 op=13 ABANDON msg=13
Apr 26 12:42:54 ldap2 slapd[19556]: conn=1242 op=24 ABANDON msg=24
Apr 26 12:42:54 ldap2 slapd[19556]: conn=1242 op=26 ABANDON msg=26
Apr 26 12:42:54 ldap2 slapd[19556]: conn=1242 op=28 ABANDON msg=28
Apr 26 12:42:54 ldap2 slapd[19556]: conn=1242 op=30 ABANDON msg=30
Apr 26 12:42:54 ldap2 slapd[19556]: conn=1242 op=31 UNBIND
Apr 26 12:42:54 ldap2 slapd[19556]: conn=1242 fd=38 closed
Apr 26 12:42:54 ldap2 slapd[19556]: conn=5967 op=7 ABANDON msg=7
Apr 26 12:42:54 ldap2 slapd[19556]: conn=5967 op=8 UNBIND
Apr 26 12:42:54 ldap2 slapd[19556]: conn=5967 fd=118 closed
Apr 26 12:42:54 ldap2 slapd[19556]: conn=2405 op=12 ABANDON msg=12
Apr 26 12:42:54 ldap2 slapd[19556]: conn=2405 op=13 UNBIND
Apr 26 12:42:54 ldap2 slapd[19556]: conn=2405 fd=44 closed
Apr 26 12:42:54 ldap2 slapd[19556]: conn=2181 op=24 ABANDON msg=24
Apr 26 12:42:54 ldap2 slapd[19556]: conn=2181 op=26 ABANDON msg=26
Apr 26 12:42:54 ldap2 slapd[19556]: conn=2181 op=28 ABANDON msg=28
Apr 26 12:42:54 ldap2 slapd[19556]: conn=2181 op=30 ABANDON msg=30
Apr 26 12:42:54 ldap2 slapd[19556]: conn=5551 op=6 ABANDON msg=6
Apr 26 12:42:54 ldap2 slapd[19556]: conn=2181 op=31 UNBIND
Apr 26 12:42:54 ldap2 slapd[19556]: conn=5924 op=5 ABANDON msg=5
Apr 26 12:42:54 ldap2 slapd[19556]: conn=5924 op=7 ABANDON msg=7
Apr 26 12:42:54 ldap2 slapd[19556]: conn=5924 op=9 ABANDON msg=9
Apr 26 12:42:54 ldap2 slapd[19556]: conn=5924 op=10 UNBIND
Apr 26 12:42:54 ldap2 slapd[19556]: conn=5924 fd=115 closed
Apr 26 12:42:54 ldap2 slapd[19556]: conn=1389 op=9 ABANDON msg=9
Apr 26 12:42:54 ldap2 slapd[19556]: conn=1167 op=15 UNBIND
Apr 26 12:42:54 ldap2 slapd[19556]: conn=1167 fd=37 closed
Apr 26 12:42:54 ldap2 slapd[19556]: conn=1903 op=10 ABANDON msg=10
Apr 26 12:42:54 ldap2 slapd[19556]: conn=1903 op=11 UNBIND
Apr 26 12:42:54 ldap2 slapd[19556]: conn=1903 fd=58 closed
Apr 26 12:42:54 ldap2 slapd[19556]: conn=4400 op=3 ABANDON msg=3
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=6631
deferring operation: too many executing
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=6631
deferring operation: pending operations
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=6631
deferring operation: too many executing
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=6631
deferring operation: pending operations
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=6631
deferring operation: too many executing
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=6631
deferring operation: pending operations
Apr 26 12:42:54 ldap2 slapd[19556]: connection_input: conn=6631
deferring operation: too many executing
Apr 26 12:42:54 ldap2 slapd[19556]: conn=4400 op=5 UNBIND
[...]

In other node, also a cut in the log

[...]
Apr 26 12:22:40 ldap3 slapd[7650]: conn=8292 op=1 SRCH
base="dc=xxx,dc=es" scope=2 deref=0 filter="(objectClass=*)"
Apr 26 12:22:40 ldap3 slapd[7650]: conn=8292 op=1 SRCH attr=* +
Apr 26 12:22:40 ldap3 slapd[7650]: conn=8292 op=1 SEARCH RESULT
tag=101 err=0 nentries=0 text=
Apr 26 12:22:40 ldap3 slapd[7650]: conn=8292 op=2 UNBIND
Apr 26 12:22:40 ldap3 slapd[7650]: conn=8292 fd=158 closed
Apr 26 12:22:41 ldap3 slapd[7650]: conn=8293 fd=158 ACCEPT from
IP=192.168.21.21:44703 (IP=0.0.0.0:636)
Apr 26 12:22:41 ldap3 slapd[7650]: conn=8294 fd=163 ACCEPT from
IP=192.168.21.22:58503 (IP=0.0.0.0:636)
Apr 26 12:22:41 ldap3 slapd[7650]: conn=8295 fd=164 ACCEPT from
IP=192.168.21.22:58506 (IP=0.0.0.0:636)
Apr 26 12:22:41 ldap3 slapd[7650]: conn=8293 fd=158 TLS established
tls_ssf=128 ssf=128
Apr 26 12:22:41 ldap3 slapd[7650]: conn=8294 fd=163 TLS established
tls_ssf=128 ssf=128
Apr 26 12:22:41 ldap3 slapd[7650]: conn=8295 fd=164 TLS established
tls_ssf=128 ssf=128
Apr 26 12:22:42 ldap3 slapd[7650]: slap_client_connect:
URI=ldaps://ldap1.xxx.xx/ DN="cn=admin,cn=config" ldap_sasl_bind_s
failed (-5)
***
Apr 26 12:42:54 ldap3 slapd[7650]: do_syncrepl: rid=001 rc -5 retrying
(4 retries left)
Apr 26 12:42:54 ldap3 slapd[7650]: null_callback : error code 0x35
Apr 26 12:42:54 ldap3 slapd[7650]: null_callback : error code 0x35
Apr 26 12:42:54 ldap3 slapd[7650]: null_callback : error code 0x35
Apr 26 12:42:54 ldap3 slapd[7650]: conn=8297 fd=165 ACCEPT from
IP=[xxx:xxx:xxx:xx::5]:42973 (IP=[::]:636)
Apr 26 12:42:54 ldap3 slapd[7650]: conn=8296 fd=160 ACCEPT from
IP=192.168.21.22:58509 (IP=0.0.0.0:636)
Apr 26 12:42:54 ldap3 slapd[7650]: conn=8293 op=1 UNBIND
Apr 26 12:42:54 ldap3 slapd[7650]: conn=8293 op=0 BIND
dn="cn=admin,cn=config" method=128
[...]

In other node, also a cut after the replication operation, the my
service restart:

[...]
Apr 26 12:22:37 ldap1 slapd[5945]: conn=8377 fd=159 closed (connection lost)
Apr 26 12:22:37 ldap1 slapd[5945]: conn=8373 fd=156 closed (connection lost)
Apr 26 12:22:37 ldap1 slapd[5945]: conn=8374 fd=157 closed (connection lost)
Apr 26 12:22:37 ldap1 slapd[5945]: conn=8375 fd=158 closed (connection lost)
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8378 fd=158 ACCEPT from
IP=192.168.21.23:57377 (IP=0.0.0.0:636)
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8379 fd=159 ACCEPT from
IP=192.168.21.23:57379 (IP=0.0.0.0:636)
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8380 fd=160 ACCEPT from
IP=192.168.21.22:33628 (IP=0.0.0.0:636)
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8381 fd=161 ACCEPT from
IP=192.168.21.22:33629 (IP=0.0.0.0:636)
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8379 fd=159 TLS established
tls_ssf=128 ssf=128
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8380 fd=160 TLS established
tls_ssf=128 ssf=128
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8379 op=0 BIND
dn="cn=admin,dc=xxx,dc=es" method=128
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8379 op=0 BIND
dn="cn=admin,dc=xxx,dc=es" mech=SIMPLE ssf=0
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8379 op=0 RESULT tag=97 err=0 text=
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8379 op=1 SRCH
base="dc=xxx,dc=es" scope=2 deref=0 filter="(objectClass=*)"
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8379 op=1 SRCH attr=* +
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8381 fd=161 TLS established
tls_ssf=128 ssf=128
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8381 op=0 BIND
dn="cn=admin,dc=xxx,dc=es" method=128
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8381 op=0 BIND
dn="cn=admin,dc=xxx,dc=es" mech=SIMPLE ssf=0
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8381 op=0 RESULT tag=97 err=0 text=
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8381 op=1 SRCH
base="dc=xxx,dc=es" scope=2 deref=0 filter="(objectClass=*)"
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8381 op=1 SRCH attr=* +
Apr 26 12:22:41 ldap1 slapd[5945]: conn=8378 fd=158 TLS established
tls_ssf=128 ssf=128
Apr 26 12:22:42 ldap1 slapd[5945]: conn=8379 op=1 SEARCH RESULT
tag=101 err=0 nentries=1 text=
Apr 26 12:22:42 ldap1 slapd[5945]: conn=8381 op=1 SEARCH RESULT
tag=101 err=0 nentries=1 text=
Apr 26 12:22:42 ldap1 slapd[5945]: slap_client_connect:
URI=ldaps://ldap3.xxx.xx/ DN="cn=admin,cn=config" ldap_sasl_bind_s
failed (-5)
***
Apr 26 12:42:54 ldap1 slapd[5945]: daemon: shutdown requested and initiated.
Apr 26 12:42:54 ldap1 slapd[5945]: conn=1009 fd=13 closed (slapd shutdown)
Apr 26 12:42:54 ldap1 slapd[5945]: conn=1013 fd=20 closed (slapd shutdown)
Apr 26 12:42:54 ldap1 slapd[5945]: conn=1014 fd=21 closed (slapd shutdown)
[...]

Any idea?

Best regards.


-- 
/* Arturo Borrero Gonzalez || cer.inet@linuxmail.org */
/* Use debian gnu/linux! Best OS ever! */