
(ITS#6861) slapd and syncrepl performance problem



Full_Name: Jeff Wheeler
Version: 2.4.20
OS: RHEL4.8
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (70.103.136.179)


Under an ldapsearch/ldapmodify load of approximately 160 simultaneous sessions
directed at the first master only, replication falls behind after several hours,
and searches and modifies that previously took <1 second now take 10+ seconds.
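
One way to put a number on "falls behind" is to read the contextCSN values of
the suffix entry on both masters and compare their timestamps. The sketch below
is not part of the original report; it assumes the 2.4-style CSN layout
(YYYYmmddHHMMSS.uuuuuuZ#count#sid#mod), and the helper names csn_time and
lag_seconds are hypothetical. In a multi-master setup contextCSN has one value
per server ID, so the two values compared should carry the same SID.

```python
from datetime import datetime, timezone


def csn_time(csn: str) -> datetime:
    """Extract the timestamp from a CSN string (assumed 2.4-style layout)."""
    stamp = csn.split("#", 1)[0]  # e.g. "20110311032057.000000Z"
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def lag_seconds(provider_csn: str, consumer_csn: str) -> float:
    """Seconds by which the consumer's CSN trails the provider's CSN."""
    return (csn_time(provider_csn) - csn_time(consumer_csn)).total_seconds()


# Illustrative values only, not taken from the affected servers.
provider = "20110311032057.000000Z#000000#001#000000"
consumer = "20110311012057.000000Z#000000#001#000000"
lag = lag_seconds(provider, consumer)
```

The two input strings would typically come from a base-scope search for the
contextCSN attribute on "o=h3gau" against each server.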

We also see these messages continually, every 5 seconds or so:
Mar 10 04:02:05 auvhen2be01 slapd[29178]: => bdb_idl_insert_key: c_put id
failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Mar 10 04:02:05 auvhen2be01 slapd[29178]: conn=2249 op=13995: attribute
"entryCSN" index add failure

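To check whether the deadlock rate tracks the slowdown, the messages above can
be tallied per minute from the syslog file. This is a minimal sketch and not
part of the original report; it assumes each syslog record is on a single line
(the wrapping shown above comes from the mail) and the function name
deadlocks_per_minute is hypothetical.

```python
import re
from collections import Counter

# Captures "Mon DD HH:MM" from a slapd syslog line that reports a BDB deadlock.
DEADLOCK_LINE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+slapd\[\d+\]:.*DB_LOCK_DEADLOCK"
)


def deadlocks_per_minute(lines):
    """Count DB_LOCK_DEADLOCK messages, keyed by the minute they were logged."""
    counts = Counter()
    for line in lines:
        match = DEADLOCK_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Mar 10 04:02:05 auvhen2be01 slapd[29178]: => bdb_idl_insert_key: c_put id "
    "failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)",
    'Mar 10 04:02:05 auvhen2be01 slapd[29178]: conn=2249 op=13995: attribute '
    '"entryCSN" index add failure',
    "Mar 10 04:02:10 auvhen2be01 slapd[29178]: => bdb_idl_insert_key: c_put id "
    "failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)",
]
counts = deadlocks_per_minute(sample)
```

In practice the lines would be read from the syslog file instead of the
in-memory sample list.
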
LDAP is configured in 2-way multi-master mirrormode on 2 quad-core HP blades
with 72GB RAM each.  Cache size is set to 10GB; id2entry is 18GB.
The replication agreements are as follows:
olcSyncrepl: {0}rid=2 provider=ldap://auvhen4be05-traffic.vm.vodafone.net.au b
 indmethod=simple timeout=0 network-timeout=0 binddn="cn=Directory Manager,o=h
 3gau" credentials="admin123" starttls=no filter="(objectclass=*)" searchbase=
 "o=h3gau" scope=sub schemachecking=off type=refreshAndPersist retry="60 +"
..
olcOverlay: {0}syncprov
olcSpCheckpoint: 100 600
..
Eventually, the syncrepl connection goes into the TIME_WAIT state and
replication stops.
Restarting slapd does not fix this, and we have to reload from backup.

After we tuned the following, the slapd replication connection no longer goes
to TIME_WAIT; however, replication still falls hours behind on updates and
slapd performs as slowly as before:
Cache size set to 20GB, olcDbIDLcacheSize: 20000000000, olcThreads: 32
After slapd is restarted, the replication connection binds and searches, but
then unbinds immediately:
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 fd=19 ACCEPT from
IP=10.176.77.23:50798 (IP=10.176.77.47:389)
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 op=0 BIND dn="cn=directory
manager,o=h3gau" method=128
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 op=0 BIND dn="cn=directory
manager,o=h3gau" mech=SIMPLE ssf=0
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 op=0 RESULT tag=97 err=0
text=
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 op=1 SRCH base="o=h3gau"
scope=2 deref=0 filter="(objectClass=*)"
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 op=1 SRCH attr=* +
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 op=1 SEARCH RESULT tag=101
err=0 nentries=0 text=
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 op=2 UNBIND
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 fd=19 closed