LDBM's "laggard reader" flaw still present, in continue of ITS#7904
- To: openldap-its@OpenLDAP.org
- Subject: LDBM's "laggard reader" flaw still present, in continue of ITS#7904
- From: firstname.lastname@example.org
- Date: Thu, 23 Oct 2014 04:24:12 +0000
- Auto-submitted: auto-generated (OpenLDAP-ITS)
Full_Name: Leonid Yuriev
Submission from: (NULL) (188.8.131.52)
Currently there is a flaw that prevents using OpenLDAP with LMDB in projects
with a high rate of updates (add/modify/delete). The root of the problem is
that LMDB cannot reclaim freed pages in the presence of a "laggard reader", in
other words while they are still referenced by an active read transaction.
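
To illustrate why one slow reader blocks reclamation, here is a minimal
sketch of the rule. It is a simplified model, not LMDB's actual code: the
reader_slot struct and both function names are hypothetical, and LMDB's real
bookkeeping is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified reader-table slot; not LMDB's real layout. */
typedef struct {
    int      active;  /* nonzero while a read transaction is open */
    uint64_t txnid;   /* snapshot (transaction ID) the reader is pinned to */
} reader_slot;

/* Oldest snapshot still in use, or `latest` when no reader is active. */
static uint64_t oldest_snapshot(const reader_slot *r, size_t n, uint64_t latest)
{
    assert(r != NULL || n == 0);
    uint64_t oldest = latest;
    for (size_t i = 0; i < n; i++)
        if (r[i].active && r[i].txnid < oldest)
            oldest = r[i].txnid;
    return oldest;
}

/* In this model, pages freed by transaction `freed_by` may be reused only
 * when that transaction is older than every active reader's snapshot.
 * A single laggard reader therefore pins everything freed after it began. */
static int can_reclaim(uint64_t freed_by, const reader_slot *r, size_t n,
                       uint64_t latest)
{
    return freed_by < oldest_snapshot(r, n, latest);
}
```

With readers pinned at snapshots 5 and 90 and the latest transaction at 100,
pages freed by transaction 50 stay unavailable until the reader at 5 ends.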
Note that when reclamation is withheld under a high update rate, free pages
are burned through very quickly. The fix for ITS#7904 significantly improves
the situation, but does not solve the problem completely.
First, a seemingly innocuous command such as "mdb_stat -efff | less" can
lead to MDB_MAP_FULL and paralyze updates.
Second, the ITS#7904 fix covers syncrepl only partially. Approximately half of
the "long read" operations occur without sending data to the network.
Therefore, in many cases MDB_MAP_FULL is still reached easily enough. This
leads to a chain of problems and in some cases makes replication impossible.
To solve these problems, I made two simple improvements.
1) OOMKiller feature: just a fuse, like the Linux kernel's OOM killer.
In general, on MDB_MAP_FULL it sends SIGKILL to the "laggard reader", but
never to itself. On success it retries the reclaim and continues. Enabled
by "envflags oomkill".
2) Dreamcatcher feature: it really has caught and banished our nightmares
with syncrepl & MDB_MAP_FULL ;)
It is based on the ITS#7904 fix. In general, it renews the read txn when its
lag behind the last txn is greater than a configured threshold and the
percentage of pages allocated is greater than the configured value. Enabled by
"dreamcatcher lag percentage".
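
The decision logic of the two features can be sketched roughly as follows.
This is an illustration under assumptions, not the code from the patchsets:
the reader_info struct, pick_laggard() and should_renew() are hypothetical
names, and the sketch only chooses a victim without sending any signal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical view of one reader-table entry. */
typedef struct {
    pid_t    pid;     /* process that owns the read transaction */
    uint64_t txnid;   /* snapshot the reader is pinned to */
    int      active;  /* nonzero while the read transaction is open */
} reader_info;

/* OOMKiller idea: on MDB_MAP_FULL, choose the active reader with the oldest
 * snapshot that belongs to another process. Returns its index, or -1 if
 * there is no candidate. The caller would then send SIGKILL to that pid
 * (for example with kill(2)) and retry the reclaim. */
static int pick_laggard(const reader_info *r, size_t n, pid_t self)
{
    assert(r != NULL || n == 0);
    int victim = -1;
    for (size_t i = 0; i < n; i++) {
        if (!r[i].active || r[i].pid == self)
            continue;                      /* never target ourselves */
        if (victim < 0 || r[i].txnid < r[victim].txnid)
            victim = (int)i;
    }
    return victim;
}

/* Dreamcatcher idea: renew the read txn only when BOTH conditions hold:
 * the lag behind the last txn exceeds lag_threshold, and the percentage of
 * allocated pages exceeds pct_threshold. */
static int should_renew(uint64_t reader_txnid, uint64_t last_txnid,
                        uint64_t used_pages, uint64_t total_pages,
                        uint64_t lag_threshold, unsigned pct_threshold)
{
    if (total_pages == 0 || last_txnid < reader_txnid)
        return 0;
    uint64_t lag = last_txnid - reader_txnid;
    uint64_t pct = used_pages * 100 / total_pages;
    return lag > lag_threshold && pct > pct_threshold;
}
```

Requiring both conditions in should_renew() means a long read is left alone
while the map still has plenty of free pages, and is interrupted only when it
lags far behind and space is actually running short.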
Two patchsets will be attached soon.