8813 – MDB_VL32 causes MDB_TXN_FULL

Summary: MDB_VL32 causes MDB_TXN_FULL

Status:	UNCONFIRMED

Alias:	None

Product:	LMDB
Classification:	Unclassified
Component:	liblmdb
Version:	unspecified
Hardware:	All All

Importance:	--- normal
Target Milestone:	---
Assignee:	OpenLDAP project

URL:
Keywords:

Depends on:
Blocks:

Reported:	2018-03-04 13:19 UTC by markus@greenrobot.de
Modified:	2020-03-23 17:38 UTC
CC List:	0 users

See Also:

Description markus@greenrobot.de 2018-03-04 13:19:33 UTC

Full_Name: Markus Junginger
Version: 
OS: 
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (77.189.91.168)


We have a 1GB LMDB file with 7M K/V entries. With MDB_VL32, we always get a
MDB_TXN_FULL error for a transaction that is removing 2M entries (probably fails
at a lower count, we haven't measured that). Without MDB_VL32 it works fine.

Another observation:
Once the transaction fails with MDB_TXN_FULL, the data file has grown to 1.5GB.
Without MDB_VL32, the data file stays consistent at 1 GB even if all 7M entries
are deleted in a single transaction.

Expected behavior:
No MDB_TXN_FULL error, no data file growth.

Comment 1 markus@greenrobot.de 2018-03-14 21:30:33 UTC

Some more details:

The transaction runs out of read-only pages in mdb_rpage_get():
MDB_txn.mt_rpages[0] is at 4096, so MDB_TXN_FULL is returned. This happens
after around 100K mdb_cursor_del() calls.

How can we get millions of deletes to work in a single transaction?



It's an essential feature to us and I am happy to contribute under some
guidance.



So, here's my hope: given that the cleanup involves only consecutive keys,
we could consider a bulk delete function (delete an entire range from
position 1 to position 2). This should be much more efficient. Basically,
for most of the data, there's no need to operate on the node level.
Deleting entire leaf pages should be relatively simple (rebalance on the
parent branch page)? Not sure if rebalancing could also cope with cutting
off entire branches (superefficient if that would work).



Thanks,

Markus

Comment 2 Howard Chu 2019-10-21 15:14:46 UTC

markus@greenrobot.de wrote:
> Full_Name: Markus Junginger
> Version: 
> OS: 
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (77.189.91.168)
> 
> 
> We have a 1GB LMDB file with 7M K/V entries. With MDB_VL32, we always get a
> MDB_TXN_FULL error for a transaction that is removing 2M entries (probably fails
> at a lower count, we haven't measured that). Without MDB_VL32 it works fine.
> 
> Another observation:
> Once the transaction fails with MDB_TXN_FULL, the data file has grown to 1.5GB.
> Without MDB_VL32, the data file stays consistent at 1 GB even if all 7M entries
> are deleted in a single transaction.
> 
> Expected behavior:
> No MDB_TXN_FULL error, no data file growth.
> 
> 
> 
I believe this was fixed by commit 7edf504106c61639a89b9a4e5987242598196932 in mdb.master.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 3 markus@greenrobot.de 2019-10-23 18:56:02 UTC

On 21.10.19 17:14, Howard Chu wrote:
> I believe this was fixed by commit 
> 7edf504106c61639a89b9a4e5987242598196932 in mdb.master. 

I cannot confirm that this works.

This is the stack trace where MDB_TXN_FULL is still returned with latest 
mdb.master (note: line numbers shown here are off by 60 compared to 
mdb.master):

mdb_rpage_get mdb.c:6196
mdb_page_get mdb.c:6378
mdb_page_search_lowest mdb.c:6492
mdb_node_move mdb.c:8842
mdb_rebalance mdb.c:9366
mdb_page_merge mdb.c:9166
mdb_rebalance mdb.c:9373
mdb_cursor_del0 mdb.c:9426
mdb_cursor_del mdb.c:811


Code segment:

if (tl[0].mid >= MDB_TRPAGE_MAX)
     return MDB_TXN_FULL;

Debugger shows tl[0].mid to be 4095.


Hope that helps.

Markus

Comment 4 Howard Chu 2019-10-24 01:49:39 UTC

markus@greenrobot.de wrote:
> On 21.10.19 17:14, Howard Chu wrote:
>> I believe this was fixed by commit 
>> 7edf504106c61639a89b9a4e5987242598196932 in mdb.master. 
> 
> I can not confirm that this works.
> 
> This is the stack trace where MDB_TXN_FULL is still returned with latest 
> mdb.master (note: line numbers shown here are off by 60 compared to 
> mdb.master):
> 
> mdb_rpage_get mdb.c:6196
> mdb_page_get mdb.c:6378
> mdb_page_search_lowest mdb.c:6492
> mdb_node_move mdb.c:8842
> mdb_rebalance mdb.c:9366
> mdb_page_merge mdb.c:9166
> mdb_rebalance mdb.c:9373
> mdb_cursor_del0 mdb.c:9426
> mdb_cursor_del mdb.c:811
> 
> 
> Code segment:
> 
> if (tl[0].mid >= MDB_TRPAGE_MAX)
>      return MDB_TXN_FULL;
> 
> Debugger shows tl[0].mid to be 4095.

You're free to define MDB_TRPAGE_MAX to a larger value. It just means
you increase the chance of overrunning the 2GB available address space.
There's no magic, you can't fit every 64bit database workload into only
a 32bit address space. When your transactions are too large, the normal
thing to do is commit more often so they don't grow so large.
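
The "commit more often" advice can be sketched roughly as below. This is only an illustration under stated assumptions: `batch_ops`, `delete_in_batches`, and the in-memory `fake_store` are hypothetical names, not part of LMDB. In a real program the callbacks would presumably wrap `mdb_txn_begin()` plus `mdb_cursor_open()`, `mdb_cursor_del()`, `mdb_txn_commit()`, and `mdb_txn_abort()`.

```c
#include <stddef.h>

/* Hypothetical callbacks; a real program would wrap LMDB calls here. */
typedef struct {
    int  (*begin)(void *ctx);       /* start a write txn; 0 on success */
    int  (*delete_next)(void *ctx); /* 1 = deleted one entry, 0 = range exhausted, <0 = error */
    int  (*commit)(void *ctx);      /* commit the txn; 0 on success */
    void (*abort)(void *ctx);       /* discard the open txn */
} batch_ops;

/* Delete a range using transactions of at most batch_size entries each.
 * Returns the number of committed deletions, or a negative error code. */
long delete_in_batches(const batch_ops *ops, void *ctx, long batch_size)
{
    long total = 0;
    int more = 1;

    if (batch_size <= 0)
        return -1;

    while (more) {
        long in_txn = 0;
        int rc = ops->begin(ctx);
        if (rc != 0)
            return rc < 0 ? rc : -1;

        while (in_txn < batch_size) {
            rc = ops->delete_next(ctx);
            if (rc < 0) {
                ops->abort(ctx);   /* drop the partial batch */
                return rc;
            }
            if (rc == 0) {         /* nothing left in the range */
                more = 0;
                break;
            }
            in_txn++;
        }

        rc = ops->commit(ctx);
        if (rc != 0)
            return rc < 0 ? rc : -1;
        total += in_txn;
    }
    return total;
}

/* In-memory stand-in used only to exercise the loop above. */
typedef struct { long remaining; int begins; int commits; } fake_store;

static int fake_begin(void *ctx) { ((fake_store *)ctx)->begins++; return 0; }
static int fake_delete_next(void *ctx)
{
    fake_store *s = ctx;
    if (s->remaining == 0)
        return 0;
    s->remaining--;
    return 1;
}
static int fake_commit(void *ctx) { ((fake_store *)ctx)->commits++; return 0; }
static void fake_abort(void *ctx) { (void)ctx; }

const batch_ops fake_ops = { fake_begin, fake_delete_next, fake_commit, fake_abort };
```

Note the trade-off: the delete is no longer atomic as a whole, since every batch becomes visible as soon as it commits. With LMDB, a cursor opened in a write transaction ends with that transaction, so each batch would need to open a new cursor and seek back to the start of the remaining range (for example with `MDB_SET_RANGE`).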

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 5 markus@greenrobot.de 2019-10-24 13:03:00 UTC

On 24.10.19 03:49, Howard Chu wrote:
> You're free to define MDB_TRPAGE_MAX to a larger value. It just means
> you increase the chance of overrunning the 2GB available address space.
> There's no magic, you can't fit every 64bit database workload into only
> a 32bit address space. When your transactions are too large, the normal
> thing to do is commit more often so they don't grow so large.

I think you meant MDB_TRPAGE_SIZE? At least raising that seemed to work,
while raising MDB_TRPAGE_MAX alone ended up in another MDB_TXN_FULL (the
tl[0].mid < MDB_TRPAGE_SIZE check failed).

Doubling MDB_TRPAGE_SIZE also doubled the threshold of the object count 
where it starts failing: using 8192 it was able to remove 4M entries 
while the limit with 4096 was 2M entries.
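
As a rough check of that scaling, the two data points above (4096 slots for about 2M deletes, 8192 slots for about 4M) can be extrapolated linearly. This is a sketch under the assumption that the relationship stays linear for this data set; `estimated_delete_limit` and the macro names are made up for illustration.

```c
/* Observed in this thread: 4096 slots allowed about 2M deletes. */
#define OBSERVED_SLOTS   4096LL
#define OBSERVED_DELETES 2000000LL

/* Linear extrapolation only; the real limit depends on how the
 * deleted entries are spread across pages. */
long long estimated_delete_limit(long long slots)
{
    return slots * OBSERVED_DELETES / OBSERVED_SLOTS;
}
```

Under this assumption, 16384 slots would give roughly 8M deletes per transaction, which is more than the 7M entries in the database from the original report.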

MDB_TRPAGE_SIZE is only used to malloc txn->mt_rpages (and some checks), 
as far as I can tell.

To make a reasonable decision here, could you please confirm:

1. txn->mt_rpages is RAM only and has no impact on mmaped sections 
(and/or file format).

2. RAM consumption is defined by number of transactions only: per 
transaction MDB_TRPAGE_SIZE bytes is allocated. Other than that it does 
not increase memory consumption.

3. There is no disadvantage other than the higher memory consumption 
per transaction described in the previous point.

In that case, I think I'll go with 16K (or even 32K). That does not seem 
too much for a transaction, given that 16K quadruples the amount of data 
that can be processed. 32-bit devices are not servers, so I do not 
expect a high number of concurrent transactions anyway.

Thanks!

Markus

Comment 6 markus@greenrobot.de 2019-10-24 14:00:48 UTC

On 24.10.19 15:03, Markus Junginger wrote:
> per transaction MDB_TRPAGE_SIZE bytes is allocated

I guess, I was too quick here. It's MDB_TRPAGE_SIZE * 16.

Also, I guess, txn->mt_rpages are just pointers to in-memory cached 
pages (or chunks?), so the memory consumption of those is much higher.

The docs mention "chunks of 16 pages"; does that refer to mt_rpages? If 
so, a single mt_rpages pointer can actually hold on to 16 * 4K = 64 KB? 
Thus a MDB_TRPAGE_SIZE 4096 can go up to 256 MB memory? That changes the 
picture of course...
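
The arithmetic behind that estimate, as a small sketch. It assumes, as guessed above, that every mt_rpages slot can pin one full chunk of 16 pages and that pages are 4K; `max_mapped_bytes` and the macro names are hypothetical.

```c
#define PAGE_SIZE_BYTES 4096ULL /* 4K pages, as assumed above */
#define PAGES_PER_CHUNK 16ULL   /* "chunks of 16 pages" from the docs */

/* Upper bound on memory held by one transaction if every slot
 * references a full chunk. */
unsigned long long max_mapped_bytes(unsigned long long slots)
{
    return slots * PAGES_PER_CHUNK * PAGE_SIZE_BYTES;
}
```

With these assumptions, 4096 slots come to 256 MB, 16384 slots to 1 GB, and 32768 slots to 2 GB, which is the address-space figure mentioned in comment 4.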

PS.: let me express the irony once more that "removing data" is the most 
costly operation here ;-)