8179 – LMDB: Feature Request: mdb_env_check_mapsize(env)

Summary: LMDB: Feature Request: mdb_env_check_mapsize(env)

Status:	UNCONFIRMED

Alias:	None

Product:	LMDB
Classification:	Unclassified
Component:	liblmdb
Version:	unspecified
Hardware:	All All

Importance:	--- normal
Target Milestone:	---
Assignee:	OpenLDAP project

URL:
Keywords:

Depends on:
Blocks:

Reported:	2015-06-28 15:55 UTC by scott@gameranger.com
Modified:	2021-04-14 01:34 UTC
CC List:	0 users

See Also:

Description scott@gameranger.com 2015-06-28 15:55:48 UTC

Full_Name: Scott Kevill
Version: LMDB 0.9.14
OS: CentOS 5-7
URL: ftp://ftp.openldap.org/incoming/scott-kevill-150628.patch
Submission from: (NULL) (60.224.160.178)


mdb_txn_begin / mdb_txn_renew will return MDB_MAP_RESIZED for the current
process if another process has changed the mapsize AND the data has grown past
the current process mapsize. However, by that point, in a multithreaded
environment it may be difficult to wind back to a "safe" point of no active txns
in order to call mdb_env_set_mapsize(env, 0) to update the mapsize.

Requesting:
    int  mdb_env_check_mapsize(MDB_env *env);
to return either MDB_SUCCESS or MDB_MAP_RESIZED. This should be lightweight
enough to call frequently enough to decide whether to initiate a "safe" point to
enable calling mdb_env_set_mapsize(env, 0). Return MDB_SUCCESS if the MDB has
not been opened yet.
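
To make the difference concrete, here is a minimal self-contained model of the two checks. It does not use liblmdb; every toy_* name and both structs are hypothetical stand-ins, and only the numeric value of MDB_MAP_RESIZED is taken from lmdb.h. It assumes, as described above, that beginning a txn fails only once the data has passed the local mapsize, while the requested check would report as soon as another process has set a larger mapsize.

```c
/* Return codes mirror lmdb.h; every toy_* name is hypothetical. */
#define TOY_SUCCESS      0
#define TOY_MAP_RESIZED  (-30785)
#define GB               (1ULL << 30)

/* Stand-in for what the writer publishes in the shared environment. */
typedef struct {
    unsigned long long mapsize;   /* mapsize last set by any process */
    unsigned long long data_size; /* how far the data currently extends */
} toy_shared;

/* Stand-in for one process's view of the environment. */
typedef struct {
    int opened;                   /* non-zero once the env has been opened */
    unsigned long long mapsize;   /* size this process has mapped */
    const toy_shared *shared;
} toy_env;

/* Models the behaviour described above: beginning a txn fails only once
 * the data has grown past this process's mapping. */
int toy_txn_begin(const toy_env *env)
{
    return env->shared->data_size > env->mapsize ? TOY_MAP_RESIZED
                                                 : TOY_SUCCESS;
}

/* Models the requested check: reports as soon as a larger mapsize has
 * been set elsewhere, and succeeds if the env is not opened yet. */
int toy_env_check_mapsize(const toy_env *env)
{
    if (!env->opened)
        return TOY_SUCCESS;
    return env->shared->mapsize > env->mapsize ? TOY_MAP_RESIZED
                                               : TOY_SUCCESS;
}
```

With a local mapsize of 2GB, a shared mapsize of 3GB and 1GB of data, toy_txn_begin() still succeeds while toy_env_check_mapsize() already returns the resized code, which is the early warning being requested.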

Scenario:
- Process W is a writer
- Process R is a reader with multiple reader threads (read-only)
- Both processes are running and using a 2GB mapsize MDB
- Data is currently 1GB
- Ops decide W should increase mapsize to 3GB allowing for more expansion
- W calls mdb_env_set_mapsize(env, 3GB)
- R's mapsize remains at 2GB
- Data grows
- R:1 thread begins a moderately long read-only txn
- Data grows past 2GB
- R:2 thread tries to begin a read-only txn, but fails with MDB_MAP_RESIZED
- R:2 thread cannot call mdb_env_set_mapsize(env, 0), because of R:1's txn
- A major control flow issue exists now, as R:2 may be deep into work, but can't
continue nor back out

With mdb_env_check_mapsize(), R could periodically test it and if needed,
initiate a "safe" point where none of R's threads have active txns. This could
be done long before the data size actually reaches R's original mapsize (ie.
when data is ~1GB rather than >2GB).

- Enforcing synchronisation to a "safe" point might be relatively expensive, so
it would be important to know if the mapsize has changed first. ie. Making
mdb_env_set_mapsize(env, 0) lightweight for the unchanged mapsize case isn't
enough.
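
One possible shape for such a "safe" point is sketched below as a single-threaded model of the protocol. All toy_* names are hypothetical, the two mapsize fields stand in for the real environment, and a real multithreaded reader would need a mutex or condition variable around this state. The point it illustrates is that the unchanged case costs one comparison, and the expensive drain only starts once a change has been seen.

```c
/* Every toy_* name is hypothetical; this models the protocol only. */
#define TOY_OK    0
#define TOY_BUSY  1

typedef struct {
    unsigned long long local_mapsize;         /* size this process has mapped */
    const unsigned long long *shared_mapsize; /* size the writer last set */
    int active_txns;                          /* txns currently open in R */
    int resize_pending;                       /* set while draining to a safe point */
} toy_reader;

/* New txns are refused while a resize is pending, so the count can drain. */
int toy_txn_begin(toy_reader *r)
{
    if (r->resize_pending)
        return TOY_BUSY;
    r->active_txns++;
    return TOY_OK;
}

void toy_txn_end(toy_reader *r)
{
    r->active_txns--;
}

/* Called periodically. Returns 1 if the new mapsize was adopted on this
 * call, otherwise 0. */
int toy_maintenance_tick(toy_reader *r)
{
    if (!r->resize_pending) {
        if (*r->shared_mapsize <= r->local_mapsize)
            return 0;               /* cheap path: nothing changed */
        r->resize_pending = 1;      /* start draining towards a safe point */
    }
    if (r->active_txns > 0)
        return 0;                   /* not safe yet, try again next tick */
    /* Safe point reached. Real code would call mdb_env_set_mapsize(env, 0). */
    r->local_mapsize = *r->shared_mapsize;
    r->resize_pending = 0;
    return 1;
}
```

In this model a thread that gets TOY_BUSY has not started any work yet, so it can simply retry after the next tick instead of being stuck deep into an operation.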

I've attached a sample patch against master (scott-kevill-150628.patch). I've
tested it against 0.9.14. The code is trivial and is based on the behaviour of
mdb_env_set_mapsize(env, 0). I didn't use git for the patch, so not too
difficult to get working. Less than 10 new lines, but for the record, I release
it to public domain. Feel free to massage it as needed.

Comment 1 Leonardo Lopes 2021-04-13 23:39:15 UTC

Hello.

I can't say if this is really a bug, but this is one of the (very) few results I could find while searching the internet for the MDB_MAP_RESIZED error in OpenLDAP.

I'll describe my setup and the circumstances of the error.

I have OpenLDAP/LMDB installed from Debian 10 default packages on an amd64 box. The package versions are:

- liblmdb: 0.9.22-1
- slapd: 2.4.47

Along with the usual settings, I chose the mdb backend with maxsize = 7516192768 (7GB). The on-disk base size (the data.mdb file) is 134MB. I also have loglevel = stats.

Everything seems to work flawlessly and fast, then all of a sudden syslog prints this message:

Apr  9 15:06:09 vm-ldap-01 slapd[12826]: mdb_opinfo_get: err MDB_MAP_RESIZED: Database contents grew beyond environment mapsize(-30785)

and seconds later, the slapd daemon stops answering all requests.

For the record, my workload is essentially reads with ~1000k SRCH ops, ~750k BIND ops and only ~100 ADD/DEL/MOD ops in a typical day.

I tried to relate the error to anything possible, but with no success. All I have is that when the error occurs, there was always an ADD operation logged right before.

As I said, searching the internet was of almost no help, except for this bug report and for the source code. After reading the sources and the context in which the MDB_MAP_RESIZED error occurs, I thought it may really be a bug.


Thanks for your consideration.

Comment 2 Quanah Gibson-Mount 2021-04-14 01:34:13 UTC

(In reply to Leonardo Lopes from comment #1)
> Hello.
> 
> I can't say if this is really a bug, but this is one of the (very) few
> results I could find while searching the internet for the MDB_MAP_RESIZED
> error in OpenLDAP.

Don't hijack stuff.  Direct your question to openldap-technical@openldap.org