OpenLDAP
Viewing Incoming/7974

From: leo@yuriev.ru
Subject: LDBM's "laggard reader" flaw still present, in continue of ITS#7904

Date: Thu, 23 Oct 2014 04:24:12 +0000
From: leo@yuriev.ru
To: openldap-its@OpenLDAP.org
Subject: LDBM's "laggard reader" flaw still present, in continue of ITS#7904
Full_Name: Leonid Yuriev
Version: 2.4.40
OS: RHEL7
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (31.130.36.33)


Currently there is a flaw that prevents using OpenLDAP + LMDB in projects
with a high rate of updates (add/modify/delete). The root of the problem is
that LMDB cannot reclaim freed pages in the presence of a "laggard reader", in
other words while they are still referenced by an active read transaction.

It should be noted that when reclaiming is withheld under a high update rate,
free pages are burned through very quickly. The fix for ITS#7904 significantly
improves the situation, but does not solve the problem completely.

First, a seemingly innocuous command such as "mdb_stat -efff | less" can
lead to MDB_MAP_FULL and paralyze updates.

Second, the ITS#7904 fix helps syncrepl only partially. Approximately half of
the "long read" operations occur without sending data to the network. Therefore,
in many cases MDB_MAP_FULL is still reached easily. This leads to a chain of
problems and in some cases makes replication impossible.

To solve these problems, I made two simple improvements.

1) OOMKiller feature: just a fuse, like the Linux kernel's OOM killer.

In general, in case of MDB_MAP_FULL it sends SIGKILL to the "laggard
reader", but never to itself. On success it retries the reclaim and continues.
Enabled by "envflags oomkill".

2) Dreamcatcher feature: it really has caught and banished our nightmares
with syncrepl & MDB_MAP_FULL ;)

Based on the ITS#7904 fix. In general, it renews the read txn when its lag
behind the last txn is greater than a configured threshold and the percentage of
pages allocated is greater than the configured value. Enabled by
"dreamcatcher lag percentage".

Two patchsets will be attached soon.
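
As a rough illustration of the OOMKiller idea described above, a callback along the following lines could be supplied by the application. This is only a sketch: the typedef mirrors the signature proposed in the attached patch (it is not part of stock LMDB), `env` is reduced to an opaque pointer so the fragment builds without LMDB, and `kill_laggard` is a hypothetical name.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the MDB_oomkiller_func signature proposed in the
 * patch; `env` is an opaque pointer here so the sketch builds without LMDB. */
typedef int (oomkiller_func)(void *env, int pid, void *thread_id, size_t txn);

/* Returns -1 if the reader was not killed, 0 if it no longer exists,
 * and 1 if SIGKILL was delivered. */
static int kill_laggard(void *env, int pid, void *thread_id, size_t txn)
{
	(void)env; (void)thread_id; (void)txn;

	if (pid <= 0 || pid == (int)getpid())
		return -1;                    /* never signal ourselves */
	if (kill((pid_t)pid, SIGKILL) == 0)
		return 1;                     /* signal delivered */
	return (errno == ESRCH) ? 0 : -1; /* already gone, or not permitted */
}
```

The return values follow the convention documented in the patch's header comment. Refusing to signal the calling process matches the "but never to itself" rule; a reader thread inside the same process would have to be handled some other way.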

Followup 1

Date: Thu, 23 Oct 2014 09:13:21 +0400
From: Leonid Yuriev <leo@yuriev.ru>
To: openldap-its@OpenLDAP.org
Subject: Re: (ITS#7974) LDBM's "laggard reader" flaw still present, in continue
 of ITS#7904
This is a multi-part message in MIME format.
--------------000000040604020102000407
Content-Type: text/plain; charset=windows-1251; format=flowed
Content-Transfer-Encoding: 7bit

The attached files is derived from OpenLDAP Software. All of the modifications
to OpenLDAP Software represented in the following patch(es) were developed by
Peter-Service LLC, Moscow, Russia. Peter-Service LLC has not assigned rights
and/or interest in this work to any party. I, Leonid Yuriev am authorized by
Peter-Service LLC, my employer, to release this work under the following terms.

Peter-Service LLC hereby places the following modifications to OpenLDAP Software
(and only these modifications) into the public domain. Hence, these
modifications may be freely used and/or redistributed for any purpose with or
without attribution and/or other notice.


--------------000000040604020102000407
Content-Type: text/x-patch;
 name="0001-lmdb-ITS-7974-oomkiller-feature.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="0001-lmdb-ITS-7974-oomkiller-feature.patch"

From 85fce95eaa0e71ee43625ccc202c173f7d4acb4a Mon Sep 17 00:00:00 2001
From: Leo Yuriev <leo@yuriev.ru>
Date: Tue, 21 Oct 2014 19:25:32 +0400
Subject: [PATCH 1/2] lmdb: ITS#7974 oomkiller feature.

---
 libraries/liblmdb/lmdb.h | 34 +++++++++++++++++
 libraries/liblmdb/mdb.c  | 95 ++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 126 insertions(+), 3 deletions(-)

diff --git a/libraries/liblmdb/lmdb.h b/libraries/liblmdb/lmdb.h
index bdbb0b9..a3ca62e 100644
--- a/libraries/liblmdb/lmdb.h
+++ b/libraries/liblmdb/lmdb.h
@@ -1537,6 +1537,40 @@ int	mdb_reader_list(MDB_env *env, MDB_msg_func *func, void *ctx);
 	 * @return 0 on success, non-zero on failure.
 	 */
 int	mdb_reader_check(MDB_env *env, int *dead);
+
+	/** @brief A callback function for killing a laggard readers,
+	 * called in case of MDB_MAP_FULL error.
+	 *
+	 * @param[in] env An environment handle returned by #mdb_env_create().
+	 * @param[in] pid pid of the reader process.
+	 * @param[in] thread_id thread_id of the reader thread.
+	 * @param[in] txn Transaction number on which stalled.
+	 * @return -1 on failure (reader is not killed),
+	 *         0 on a race condition (no such reader),
+	 *		   1 on success (reader was killed),
+	 *		   >1 on success (reader was SURE killed).
+	 */
+typedef int (MDB_oomkiller_func)(MDB_env *env, int pid, void* thread_id, size_t txn);
+
+	/** @brief Set the oomkiller callback.
+	 *
+	 * Callback will be called only on out-of-pages case for killing
+	 * a laggard readers to allowing reclaiming of freeDB.
+	 *
+	 * @param[in] env An environment handle returned by #mdb_env_create().
+	 * @param[in] oomkiller A #MDB_oomkiller_func function or NULL to disable.
+	 */
+void mdb_env_set_oomkiller(MDB_env *env, MDB_oomkiller_func *oomkiller);
+
+	/** @brief Get the current oomkiller callback.
+	 *
+	 * Callback will be called only on out-of-pages case for killing
+	 * a laggard readers to allowing reclaiming of freeDB.
+	 *
+	 * @param[in] env An environment handle returned by #mdb_env_create().
+	 * @return A #MDB_oomkiller_func function or NULL if disabled.
+	 */
+MDB_oomkiller_func* mdb_env_get_oomkiller(MDB_env *env);
 /**	@} */
 
 #ifdef __cplusplus
diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c
index 6cc3433..e60d83d 100644
--- a/libraries/liblmdb/mdb.c
+++ b/libraries/liblmdb/mdb.c
@@ -1145,6 +1145,7 @@ struct MDB_env {
 #endif
 	void		*me_userctx;	 /**< User-settable context */
 	MDB_assert_func *me_assert_func; /**< Callback for assertion failures */
+	MDB_oomkiller_func *me_oomkiller; /**< Callback for killing laggard readers */
 };
 
 	/** Nested transaction */
@@ -1900,6 +1901,77 @@ mdb_find_oldest(MDB_txn *txn)
 	return oldest;
 }
 
+static txnid_t
+mdb_laggard_reader(MDB_env *env, int *laggard)
+{
+	txnid_t tail = 0;
+	if (laggard)
+		*laggard = -1;
+	if (env->me_txns->mti_txnid > 1) {
+		int i;
+		MDB_reader *r = env->me_txns->mti_readers;
+
+		tail = env->me_txns->mti_txnid - 1;
+		for (i = env->me_txns->mti_numreaders; --i >= 0; ) {
+			if (r[i].mr_pid) {
+				txnid_t mr = r[i].mr_txnid;
+				if (tail > mr) {
+					tail = mr;
+					if (laggard)
+						*laggard = i;
+				}
+			}
+		}
+	}
+
+	return tail;
+}
+
+static int
+mdb_oomkill_laggard(MDB_env *env)
+{
+	int dead, idx;
+	txnid_t tail = mdb_laggard_reader(env, &idx);
+	if (idx < 0)
+		return 0;
+
+	for(;;) {
+		MDB_reader *r;
+		MDB_THR_T tid;
+		pid_t pid;
+		int rc;
+
+		if (mdb_reader_check(env, &dead))
+			break;
+
+		if (dead && tail < mdb_laggard_reader(env, NULL))
+			return 1;
+
+		if (!env->me_oomkiller)
+			break;
+
+		r = &env->me_txns->mti_readers[ idx ];
+		pid = r->mr_pid;
+		tid = r->mr_tid;
+		if (r->mr_txnid != tail || pid <= 0)
+			continue;
+
+		rc = env->me_oomkiller(env, pid, (void*) tid, tail);
+		if (rc < 0)
+			break;
+
+		if (

Message of length 10619 truncated
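
Since the patch is cut off here, the scan performed by mdb_laggard_reader() can be restated as a standalone sketch. The `reader_slot` structure and the `find_laggard` name are simplified, hypothetical stand-ins (LMDB's real reader table has more fields and lives in shared memory); only the selection logic follows the visible part of the patch.

```c
#include <stddef.h>

typedef size_t txnid_t;

/* Simplified, hypothetical reader slot; LMDB's real MDB_reader has more fields. */
struct reader_slot {
	int     pid;    /* 0 means the slot is unused */
	txnid_t txnid;  /* snapshot the reader is pinned to */
};

/* Returns the oldest snapshot still in use, bounded above by last_txnid - 1,
 * and stores the index of the reader holding it in *laggard (-1 if none). */
static txnid_t find_laggard(const struct reader_slot *r, int n,
                            txnid_t last_txnid, int *laggard)
{
	txnid_t tail = 0;
	int i;

	*laggard = -1;
	if (last_txnid > 1) {
		tail = last_txnid - 1;
		for (i = n; --i >= 0; ) {
			if (r[i].pid && r[i].txnid < tail) {
				tail = r[i].txnid;
				*laggard = i;
			}
		}
	}
	return tail;
}
```

A reader pinned to the snapshot just before the newest one is not reported as a laggard, because it holds back nothing that could be reclaimed yet; only readers older than that are candidates.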


Followup 2

Date: Thu, 23 Oct 2014 09:26:05 +0400
From: Leonid Yuriev <leo@yuriev.ru>
To: openldap-its@OpenLDAP.org
Subject: Re: (ITS#7974) LDBM's "laggard reader" flaw still present, in continue
 of ITS#7904
This is a multi-part message in MIME format.
--------------060806060106060808080102
Content-Type: text/plain; charset=windows-1251; format=flowed
Content-Transfer-Encoding: 7bit

The attached files is derived from OpenLDAP Software. All of the modifications
to OpenLDAP Software represented in the following patch(es) were developed by
Peter-Service LLC, Moscow, Russia. Peter-Service LLC has not assigned rights
and/or interest in this work to any party. I, Leonid Yuriev am authorized by
Peter-Service LLC, my employer, to release this work under the following terms.

Peter-Service LLC hereby places the following modifications to OpenLDAP Software
(and only these modifications) into the public domain. Hence, these
modifications may be freely used and/or redistributed for any purpose with or
without attribution and/or other notice.



--------------060806060106060808080102
Content-Type: text/x-patch;
 name="0001-lmdb-ITS-7974-a-reading-lag-for-dreamcatcher.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename*0="0001-lmdb-ITS-7974-a-reading-lag-for-dreamcatcher.patch"

From a30ece9b236b7217481c086fd27b133bd3404317 Mon Sep 17 00:00:00 2001
From: Leo Yuriev <leo@yuriev.ru>
Date: Tue, 21 Oct 2014 15:34:22 +0400
Subject: [PATCH 1/2] lmdb: ITS#7974 a reading lag for dreamcatcher.

---
 libraries/liblmdb/lmdb.h | 11 +++++++++++
 libraries/liblmdb/mdb.c  | 20 ++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/libraries/liblmdb/lmdb.h b/libraries/liblmdb/lmdb.h
index a3ca62e..82eff14 100644
--- a/libraries/liblmdb/lmdb.h
+++ b/libraries/liblmdb/lmdb.h
@@ -1571,6 +1571,17 @@ void mdb_env_set_oomkiller(MDB_env *env, MDB_oomkiller_func *oomkiller);
 	 * @return A #MDB_oomkiller_func function or NULL if disabled.
 	 */
 MDB_oomkiller_func* mdb_env_get_oomkiller(MDB_env *env);
+
+	/** @brief Returns a reading lag.
+	 *
+	 * Returns an information for estimate how much given read-only
+	 * transaction is lagging relative the to actual head.
+	 *
+	 * @param[in] txn A transaction handle returned by #mdb_txn_begin()
+	 * @param[out] percent Percentage of page allocation in the database.
+	 * @return Number of transactions committed after the given was started for read, or -1 on failure.
+	 */
+int  mdb_txn_straggler(MDB_txn *txnm, int *percent);
 /**	@} */
 
 #ifdef __cplusplus
diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c
index e60d83d..a417c9b 100644
--- a/libraries/liblmdb/mdb.c
+++ b/libraries/liblmdb/mdb.c
@@ -2823,6 +2823,26 @@ mdb_dbis_update(MDB_txn *txn, int keep)
 		env->me_numdbs = n;
 }
 
+int
+mdb_txn_straggler(MDB_txn *txn, int *percent)
+{
+	MDB_env	*env;
+	MDB_meta *meta;
+	txnid_t lag;
+
+	if (! txn || ! txn->mt_u.reader)
+		return -1;
+
+	env = txn->mt_env;
+	meta = env->me_metas[ mdb_env_pick_meta(env) ];
+	if (percent) {
+		long cent = env->me_maxpg / 100;
+		*percent = (meta->mm_last_pg + cent / 2 + 1) / (cent ? cent : 1);
+	}
+	lag = meta->mm_txnid - txn->mt_u.reader->mr_txnid;
+	return (0 > (int) lag) ? ~0u >> 1: lag;
+}
+
 /** Common code for #mdb_txn_reset() and #mdb_txn_abort().
  * May be called twice for readonly txns: First reset it, then abort.
  * @param[in] txn the transaction handle to reset
-- 
2.1.0


--------------060806060106060808080102
Content-Type: text/x-patch;
 name="0002-slapd-ITS-7974-dreamcatcher-feature.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="0002-slapd-ITS-7974-dreamcatcher-feature.patch"

From 133cec8eadc93fe3083d110e58259ba2a067908c Mon Sep 17 00:00:00 2001
From: Leo Yuriev <leo@yuriev.ru>
Date: Tue, 21 Oct 2014 15:38:28 +0400
Subject: [PATCH 2/2] slapd: ITS#7974 dreamcatcher feature.

---
 servers/slapd/back-mdb/config.c | 46 +++++++++++++++++++++++++++++-
 servers/slapd/back-mdb/search.c | 62 ++++++++++++++++++++++-------------------
 2 files changed, 79 insertions(+), 29 deletions(-)

diff --git a/servers/slapd/back-mdb/config.c b/servers/slapd/back-mdb/config.c
index b54da49..65034b1 100644
--- a/servers/slapd/back-mdb/config.c
+++ b/servers/slapd/back-mdb/config.c
@@ -39,7 +39,8 @@ enum {
 	MDB_MAXREADERS,
 	MDB_MAXSIZE,
 	MDB_MODE,
-	MDB_SSTACK
+	MDB_SSTACK,
+	MDB_DREAMCATCHER
 };
 
 static ConfigTable mdbcfg[] = {
@@ -74,6 +75,10 @@ static ConfigTable mdbcfg[] = {
 		mdb_cf_gen, "( OLcfgDbAt:12.2 NAME 'olcDbMaxSize' "
 		"DESC 'Maximum size of DB in bytes' "
 		"SYNTAX OMsInteger SINGLE-VALUE )", NULL, NULL },
+	{ "dreamcatcher", "lag> <percentage", 3, 3, 0, ARG_MAGIC|MDB_DREAMCATCHER,
+		mdb_cf_gen, "( OLcfgDbAt:12.4 NAME 'olcDbDreamcatcher' "
+			"DESC 'Dreamcatcher to avoids withhold of reclaiming' "
+			"SYNTAX OMsDirectoryString SINGLE-VALUE )",NULL, NULL },
 	{ "mode", "mode", 2, 2, 0, ARG_MAGIC|MDB_MODE,
 		mdb_cf_gen, "( OLcfgDbAt:0.3 NAME 'olcDbMode' "
 		"DESC 'Unix permissions of database files' "
@@ -319,6 +324,23 @@ mdb_cf_gen( ConfigArgs *c )
 			}
 			break;
 
+		case MDB_DREAMCATCHER:
+			if 

Message of length 9400 truncated
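
The slapd side of the patch is truncated, so the following is only a guess at the decision it makes, based on the description in the original report: renew the read transaction when both the lag and the fill percentage exceed their configured limits. `used_percent` repeats the rounding used by mdb_txn_straggler() in the first patch of this message; `should_renew` and its parameter names are hypothetical.

```c
/* Percentage of the map in use, rounded the same way as the patch's
 * mdb_txn_straggler(): one "cent" is 1% of the maximum page count. */
static int used_percent(unsigned long last_pg, unsigned long max_pg)
{
	unsigned long cent = max_pg / 100;
	return (int)((last_pg + cent / 2 + 1) / (cent ? cent : 1));
}

/* Hypothetical policy check: renew only when both thresholds are exceeded. */
static int should_renew(long lag, int percent, long lag_limit, int percent_limit)
{
	return lag > lag_limit && percent > percent_limit;
}
```

For example, with a map of 1000 pages of which page 499 is the last one used, `used_percent(499, 1000)` gives 50. In stock LMDB a read transaction can be released and re-acquired with mdb_txn_reset() followed by mdb_txn_renew(), which is presumably what the search code does once this check fires.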


Followup 3

Date: Thu, 23 Oct 2014 09:42:26 +0400
Subject: Re: (ITS#7974) LDBM's "laggard reader" flaw still present, in
 continue of ITS#7904
From: Leonid Yuriev <leo@yuriev.ru>
To: openldap-its@openldap.org
I assume ITS#7830 is the same issue.



Followup 4

Date: Tue, 02 Dec 2014 12:20:04 +0100
From: Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>
To: leo@yuriev.ru
CC: openldap-its@OpenLDAP.org
Subject: Re: (ITS#7974) LDBM's "laggard reader" flaw still present, in continue
 of ITS#7904
On 10/23/2014 07:13 AM, leo@yuriev.ru wrote:
> Subject: [PATCH 1/2] lmdb: ITS#7974 oomkiller feature.
> (...)
> +typedef int (MDB_oomkiller_func)(MDB_env *env, int pid, void* thread_id, size_t txn);

Some thoughts about this:

Instead of trusting the return value, it seems safer to re-check
with mdb_reader_pid().  Like mdb_reader_check0() does.  Maybe
except on Windows, where file locks from dead processes may
linger for a while until the OS reclaims them.

Don't call it OOMkiller just because that's how you use it.
Others might do something else, like sending a reader a signal
which it interprets as "please wake up and finish your txn".
Or it might decide this process is the one which should give up.

This feature could make it interesting to let readers and writers
tell each other things: Reserve some unused space in the reader
table slots for stuff the reader's caller could put there, and
some space for an impatient writer to leave a note.  Could go
in an independent commit if there is any demand for it though.

-- 
Hallvard
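
The first suggestion above, re-checking the reader instead of trusting the callback's return value, could look roughly like this. Note the substitution: mdb_reader_pid() is internal to LMDB and, as far as I understand, probes a per-process file lock, whereas this sketch uses the simpler POSIX `kill(pid, 0)` existence test. `reader_is_gone` is a hypothetical name.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Returns 1 if no process with this pid exists any more, 0 otherwise.
 * kill(pid, 0) delivers no signal; it only performs the existence and
 * permission checks. */
static int reader_is_gone(int pid)
{
	if (pid <= 0)
		return 1;
	if (kill((pid_t)pid, 0) == 0)
		return 0;              /* process still exists */
	return errno == ESRCH;     /* EPERM means it exists but is not ours */
}
```

This test is weaker than a lock-based check: a killed process that has not yet been reaped still counts as existing, and a recycled pid would be mistaken for the old reader.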



Followup 5

Date: Thu, 4 Dec 2014 14:31:06 +0400
Subject: Re: (ITS#7974) LDBM's "laggard reader" flaw still present, in
 continue of ITS#7904
From: Leonid Yuriev <leo@yuriev.ru>
To: Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>
Cc: openldap-its@openldap.org
Hallvard, thank you for your comments.

2014-12-02 14:20 GMT+03:00 Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>:
> On 10/23/2014 07:13 AM, leo@yuriev.ru wrote:
>>
>> Subject: [PATCH 1/2] lmdb: ITS#7974 oomkiller feature.
>> (...)
>> +typedef int (MDB_oomkiller_func)(MDB_env *env, int pid, void* thread_id, size_t txn);
>
>
> Some thoughts about this:
>
> Instead of trusting the return value, it seems safer to re-check
> with mdb_reader_pid().  Like mdb_reader_check0() does.  Maybe
> except on Windows, where file locks from dead processes may
> linger for a while until the OS reclaims them.

I agree that using mdb_reader_pid() is a better way.

> Don't call it OOMkiller just because that's how you use it.
> Others might do something else, like sending a reader a signal
> which it interprets as "please wake up and finish your txn".
> Or it might decide this process is the one which should give up.

Could you suggest another name instead of "oomkiller"?
Please note that the "dreamcatcher" feature has a critical bug, which I
found and fixed while working on ITS#7968 & ITS#7987.
We are currently testing the new code heavily,
so I plan to update both patches within a week.

> This feature could make it interesting to let readers and writers
> tell each other things: Reserve some unused space in the reader
> table slots for stuff the reader's caller could put there, and
> some space for an impatient writer to leave a note.  Could go
> in an independent commit if there is any demand for it though.

Communication between readers and writers may be interesting, but I
think it is over-engineering in the LMDB context.
IMHO LMDB's code carries a lot of technical debt, so it would be more
useful to re-implement all of it from scratch, under the rules of a
perfectly clean code style.
Maybe I will do this, but on the basis of, and after the release of,
1Hippeus: an extreme-performance engine for zero-copy messaging in
shared memory, partly similar to Intel DPDK.

Leonid.



Followup 6

Date: Thu, 04 Dec 2014 17:03:32 +0100
From: Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>
To: leo@yuriev.ru
CC: openldap-its@OpenLDAP.org
Subject: Re: (ITS#7974) LDBM's "laggard reader" flaw still present, in continue
 of ITS#7904
On 12/04/2014 11:31 AM, leo@yuriev.ru wrote:
> Could you suggest something other instead of "oomkiller"?

Don't have a particularly good idea.  oom_func, maybe.

>> This feature could make it interesting to let readers and writers
>> tell each other things: Reserve some unused space in the reader
>> table slots for stuff the reader's caller could put there, and
>> some space for an impatient writer to leave a note.  Could go
>> in an independent commit if there is any demand for it though.
>
> Communications between readers and writers may be interesting, but I
> think it is over-engineering in the LMDB context.

Yes... I guess I was thinking mostly of the prototype, in case we
want to add something like it later.  Might be useful to add a void*
argument which would be NULL now but could be used later, if needed.

-- 
Hallvard
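
To make the prototype discussion concrete, here is one possible shape for a more neutral callback with a reserved argument. Everything in it is hypothetical: the `oom_func` name comes from the suggestion above, the extra `reserved` pointer is the proposed void* that would be NULL for now, and the rule that a negative result means "stop retrying" is an assumption modeled on the visible part of the patch.

```c
#include <stddef.h>

/* Hypothetical prototype: same arguments as the patch's callback plus a
 * reserved pointer that callers pass as NULL for now. */
typedef int (oom_func)(void *env, int pid, void *thread_id, size_t txn,
                       void *reserved);

/* A handler that kills nothing: it reports that the writer should give up. */
static int oom_give_up(void *env, int pid, void *thread_id, size_t txn,
                       void *reserved)
{
	(void)env; (void)pid; (void)thread_id; (void)txn; (void)reserved;
	return -1;
}

/* Caller side: a negative result, or no handler at all, means "stop retrying". */
static int writer_should_retry(oom_func *handler, int pid, size_t txn)
{
	if (!handler)
		return 0;
	return handler(NULL, pid, NULL, txn, NULL) >= 0;
}
```

A handler with this shape is free to signal the reader politely, kill it, or decline, which is the flexibility argued for earlier in the thread.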



The OpenLDAP Issue Tracking System uses a hacked version of JitterBug

______________
© Copyright 2013, OpenLDAP Foundation, info@OpenLDAP.org