Full_Name: Leonid Yuriev
Version: 2.4.40
OS: RHEL7
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (31.130.36.33)

Currently there is a flaw that prevents using OpenLDAP + LMDB in projects with a high rate of updates (add/modify/delete).

The root of the problem is that LMDB cannot reclaim freed pages in the presence of a "laggard reader", in other words while those pages are still referenced by an active read transaction. Note that when reclaiming is withheld under a high update rate, free pages are burned through very quickly.

The fix for ITS#7904 significantly improves the situation, but does not solve the problem completely.

First, a seemingly innocuous use of something like "mdb_stat -efff | less" can lead to MDB_MAP_FULL and paralyze updates.

Second, ITS#7904 affects syncrepl only partially. Approximately half of the "long read" operations occur without sending data to the network. Therefore, in many cases it is easy enough to hit MDB_MAP_FULL. This leads to a chain of problems and in some cases makes replication impossible.

To solve these problems, I made two simple improvements.

1) OOMKiller feature: just a fuse, like the Linux kernel oomkiller. In general, in case of MDB_MAP_FULL it sends SIGKILL to the "laggard reader", but never to itself. On success it retries the reclaim and continues. Enabled by "envflags oomkill".

2) Dreamcatcher feature: really, it has caught our nightmares with syncrepl & MDB_MAP_FULL and made them vanish ;) Based on the ITS#7904 fix. In general, it renews the read txn when its lag behind the last txn is greater than a configured threshold and the percentage of pages allocated is greater than the configured value. Enabled by "dreamcatcher lag percentage".

Two patchsets will be attached soon.
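The dreamcatcher rule described above (renew the read txn only when both the lag and the page usage exceed their configured limits) can be sketched in C as follows. This is a minimal illustration, not code from the patch: the struct, the function name and its parameters are all hypothetical, and the real patch may measure lag and page usage differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical settings mirroring "dreamcatcher lag percentage". */
struct dreamcatcher_cfg {
    size_t   lag_threshold; /* tolerated lag, counted in transactions */
    unsigned percent;       /* page usage limit, 0..100 */
};

/*
 * Returns 1 when a long-lived read txn should be renewed, 0 otherwise.
 * All names here are assumptions made for illustration only.
 */
int dreamcatcher_should_renew(const struct dreamcatcher_cfg *cfg,
                              size_t last_txnid, size_t reader_txnid,
                              size_t used_pages, size_t max_pages)
{
    size_t lag;
    int over_percent;

    assert(cfg != NULL);
    assert(max_pages > 0);
    assert(reader_txnid <= last_txnid);

    /* How far the reader's snapshot is behind the newest txn. */
    lag = last_txnid - reader_txnid;

    /* used/max > percent/100, rearranged to avoid a division. */
    over_percent = (double)used_pages * 100.0 >
                   (double)cfg->percent * (double)max_pages;

    /* Both conditions must hold, as in the description above. */
    return lag > cfg->lag_threshold && over_percent;
}
```

Requiring both conditions means a slow reader is left alone while the map still has plenty of free pages, and a nearly full map does not disturb readers that are already close to the latest txn.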
The attached files are derived from OpenLDAP Software. All of the modifications to OpenLDAP Software represented in the following patch(es) were developed by Peter-Service LLC, Moscow, Russia. Peter-Service LLC has not assigned rights and/or interest in this work to any party. I, Leonid Yuriev, am authorized by Peter-Service LLC, my employer, to release this work under the following terms. Peter-Service LLC hereby places the following modifications to OpenLDAP Software (and only these modifications) into the public domain. Hence, these modifications may be freely used and/or redistributed for any purpose with or without attribution and/or other notice.
I assume ITS#7830 is the same issue.
On 10/23/2014 07:13 AM, leo@yuriev.ru wrote:
> Subject: [PATCH 1/2] lmdb: ITS#7974 oomkiller feature.
> (...)
> +typedef int (MDB_oomkiller_func)(MDB_env *env, int pid, void* thread_id, size_t txn);

Some thoughts about this:

Instead of trusting the return value, it seems safer to re-check with mdb_reader_pid(). Like mdb_reader_check0() does. Maybe except on Windows, where file locks from dead processes may linger for a while until the OS reclaims them.

Don't call it OOMkiller just because that's how you use it. Others might do something else, like sending a reader a signal which it interprets as "please wake up and finish your txn". Or it might decide this process is the one which should give up.

This feature could make it interesting to let readers and writers tell each other things: Reserve some unused space in the reader table slots for stuff the reader's caller could put there, and some space for an impatient writer to leave a note. Could go in an independent commit if there is any demand for it though.

-- 
Hallvard
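The first suggestion above (call the handler, then verify the reader's state independently instead of trusting the handler's return value) might look roughly like this. It is a self-contained sketch over a simplified reader table: the slot layout, the callback types and the demo_* helpers are hypothetical, and the alive() probe only stands in for a liveness check such as the one mdb_reader_pid() performs inside LMDB.

```c
#include <assert.h>
#include <stddef.h>

/* One entry of a simplified reader table; pid == 0 means "slot free". */
struct reader_slot {
    int    pid;
    size_t txnid;
};

/* Hypothetical callback types, not the signatures from the patch. */
typedef int (*laggard_func)(int pid, size_t txnid, void *ctx);
typedef int (*alive_func)(int pid, void *ctx);

/*
 * Notify the owner of the oldest read txn, then re-check it.
 * Returns 1 if the slot was released (caller may retry the reclaim),
 * 0 if nothing changed.
 */
int handle_laggard(struct reader_slot *slots, size_t n,
                   laggard_func notify, alive_func alive, void *ctx)
{
    struct reader_slot *oldest = NULL;
    size_t i;

    assert(notify != NULL && alive != NULL);

    /* Pick the in-use slot with the smallest txn id. */
    for (i = 0; i < n; i++) {
        if (slots[i].pid == 0)
            continue;
        if (oldest == NULL || slots[i].txnid < oldest->txnid)
            oldest = &slots[i];
    }
    if (oldest == NULL)
        return 0;

    /* The handler's return value is deliberately ignored ... */
    (void)notify(oldest->pid, oldest->txnid, ctx);

    /* ... and the reader's state is verified independently. */
    if (alive(oldest->pid, ctx))
        return 0;

    oldest->pid = 0; /* release the stale slot */
    return 1;
}

/* Demo stand-ins: a fake process table that remembers one dead pid. */
struct fake_procs { int dead_pid; };

int demo_kill(int pid, size_t txnid, void *ctx)
{
    (void)txnid;
    ((struct fake_procs *)ctx)->dead_pid = pid;
    return 0;
}

/* Reports success without doing anything. */
int demo_claims_success(int pid, size_t txnid, void *ctx)
{
    (void)pid; (void)txnid; (void)ctx;
    return 0;
}

int demo_alive(int pid, void *ctx)
{
    return pid != ((struct fake_procs *)ctx)->dead_pid;
}
```

With demo_claims_success as the handler, handle_laggard() still returns 0 because the reader is found alive afterwards, which is the case the re-check is meant to catch. The same structure also fits handlers that do something other than kill, as suggested in the second paragraph above.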
Hallvard, thanks for your comments.

2014-12-02 14:20 GMT+03:00 Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>:
> On 10/23/2014 07:13 AM, leo@yuriev.ru wrote:
>>
>> Subject: [PATCH 1/2] lmdb: ITS#7974 oomkiller feature.
>> (...)
>> +typedef int (MDB_oomkiller_func)(MDB_env *env, int pid, void* thread_id, size_t txn);
>
> Some thoughts about this:
>
> Instead of trusting the return value, it seems safer to re-check
> with mdb_reader_pid(). Like mdb_reader_check0() does. Maybe
> except on Windows, where file locks from dead processes may
> linger for a while until the OS reclaims them.

I agree that using mdb_reader_pid() is a better way.

> Don't call it OOMkiller just because that's how you use it.
> Others might do something else, like sending a reader a signal
> which it interprets as "please wake up and finish your txn".
> Or it might decide this process is the one which should give up.

Could you suggest something other instead of "oomkiller"?

Note that the "dreamcatcher" feature has a critical bug, which I found and fixed while working on ITS#7968 & ITS#7987. We are currently testing the new code hard, so I plan to update both patches in about a week.

> This feature could make it interesting to let readers and writers
> tell each other things: Reserve some unused space in the reader
> table slots for stuff the reader's caller could put there, and
> some space for an impatient writer to leave a note. Could go
> in an independent commit if there is any demand for it though.

Communications between readers and writers may be interesting, but I think it is over-engineering in the LMDB context.

IMHO LMDB's code has a lot of technical debt, so it would be more useful to re-implement all of it from scratch, under the rules of a perfectly clean code style. Maybe I will do this, but on the basis of 1Hippeus and after its release. 1Hippeus is an extreme-performance engine for zero-copy messaging in shared memory, partially like Intel DPDK.

Leonid.
On 12/04/2014 11:31 AM, leo@yuriev.ru wrote:
> Could you suggest something other instead of "oomkiller"?

Don't have a particularly good idea. oom_func, maybe.

>> This feature could make it interesting to let readers and writers
>> tell each other things: Reserve some unused space in the reader
>> table slots for stuff the reader's caller could put there, and
>> some space for an impatient writer to leave a note. Could go
>> in an independent commit if there is any demand for it though.
>
> Communications between readers and writers may be interesting, but I
> think it is over-engineering in the LMDB context.

Yes... I guess I was thinking mostly of the prototype, in case we want to add something like it later. Might be useful to add a void* argument which would be NULL now but could be used later, if needed.

-- 
Hallvard
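As an illustration of the prototype idea above, the callback type could carry a trailing void* that callers pass as NULL for now. This is only a sketch: the name MDB_oom_func, the parameter name "reserved", the example handler and its return convention (0 for handled, -1 for declined) are assumptions, not part of the submitted patch or of lmdb.h.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque environment type, declared here so the sketch stands alone. */
typedef struct MDB_env MDB_env;

/*
 * Hypothetical revision of the callback type from the patch:
 * same arguments plus a trailing pointer reserved for future use.
 */
typedef int (MDB_oom_func)(MDB_env *env, int pid, void *thread_id,
                           size_t txn, void *reserved);

/*
 * Example handler written against that prototype. The return
 * convention (0 = handled, -1 = declined) is an assumption.
 */
int example_oom_handler(MDB_env *env, int pid, void *thread_id,
                        size_t txn, void *reserved)
{
    (void)env; (void)thread_id; (void)txn;

    /* A handler written today does not understand any extension,
     * so it declines when the reserved argument is in use. */
    if (reserved != NULL)
        return -1;

    /* Placeholder for the real action (signal, kill, give up, ...). */
    return pid > 0 ? 0 : -1;
}
```

Handlers that check the reserved argument for NULL keep working unchanged if a later version starts passing something through it.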
*** Issue 7830 has been marked as a duplicate of this issue. ***