Full_Name: Markus Junginger
Version:
OS:
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (77.189.91.168)


We have a 1 GB LMDB file with 7M K/V entries. With MDB_VL32, we always get an
MDB_TXN_FULL error for a transaction that removes 2M entries (it probably fails
at a lower count; we haven't measured that). Without MDB_VL32 it works fine.

Another observation:
Once the transaction fails with MDB_TXN_FULL, the data file has grown to 1.5 GB.
Without MDB_VL32, the data file stays consistent at 1 GB even if all 7M entries
are deleted in a single transaction.

Expected behavior:
No MDB_TXN_FULL error, no data file growth.
Some more details: the transaction's read-only pages run out in
mdb_rpage_get(). MDB_txn.mt_rpages[0] reaches 4096 and thus MDB_TXN_FULL is
returned. This happens after around 100K mdb_cursor_del() calls.

How can we get millions of deletes to work in a single transaction? It's an
essential feature for us, and I am happy to contribute under some guidance.

So, here's my hope: given that the cleanup involves only consecutive keys, we
could consider a bulk delete function (delete an entire range from position 1
to position 2). This should be much more efficient; for most of the data,
there's no need to operate at the node level. Deleting entire leaf pages
should be relatively simple (rebalance on the parent branch page)? Not sure
if rebalancing could also cope with cutting off entire branches
(super-efficient if that would work).

Thanks,
Markus
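The bulk delete function proposed above might have a signature along these
lines (purely hypothetical; no such function exists in LMDB, and the name and
parameters are only an illustration of the idea):

```c
/* HYPOTHETICAL sketch, not part of the LMDB API: delete every entry whose
 * key falls in [start, end] in one pass, dropping whole leaf pages (and
 * ideally whole subtrees) instead of deleting node by node. */
int mdb_del_range(MDB_txn *txn, MDB_dbi dbi,
                  const MDB_val *start, const MDB_val *end);
```

The appeal is exactly what the paragraph above describes: for runs of
consecutive keys, most pages could be released wholesale, with rebalancing
only needed at the boundaries of the range.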
markus@greenrobot.de wrote:
> Full_Name: Markus Junginger
> Version:
> OS:
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (77.189.91.168)
>
>
> We have a 1GB LMDB file with 7M K/V entries. With MDB_VL32, we always get a
> MDB_TXN_FULL error for a transaction that is removing 2M entries (probably fails
> at a lower count, we haven't measured that). Without MDB_VL32 it works fine.
>
> Another observation:
> Once the transaction fails with MDB_TXN_FULL, the data file has grown to 1.5GB.
> Without MDB_VL32, the data file stays consistent at 1 GB even if all 7M entries
> are deleted in a single transaction.
>
> Expected behavior:
> No MDB_TXN_FULL error, no data file growth.

I believe this was fixed by commit 7edf504106c61639a89b9a4e5987242598196932
in mdb.master.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
On 21.10.19 17:14, Howard Chu wrote:
> I believe this was fixed by commit
> 7edf504106c61639a89b9a4e5987242598196932 in mdb.master.

I cannot confirm that this works.

This is the stack trace where MDB_TXN_FULL is still returned with the latest
mdb.master (note: line numbers shown here are off by 60 compared to
mdb.master):

mdb_rpage_get           mdb.c:6196
mdb_page_get            mdb.c:6378
mdb_page_search_lowest  mdb.c:6492
mdb_node_move           mdb.c:8842
mdb_rebalance           mdb.c:9366
mdb_page_merge          mdb.c:9166
mdb_rebalance           mdb.c:9373
mdb_cursor_del0         mdb.c:9426
mdb_cursor_del          mdb.c:811

Code segment:

if (tl[0].mid >= MDB_TRPAGE_MAX)
    return MDB_TXN_FULL;

Debugger shows tl[0].mid to be 4095.

Hope that helps,
Markus
markus@greenrobot.de wrote:
> On 21.10.19 17:14, Howard Chu wrote:
>> I believe this was fixed by commit
>> 7edf504106c61639a89b9a4e5987242598196932 in mdb.master.
>
> I cannot confirm that this works.
>
> This is the stack trace where MDB_TXN_FULL is still returned with the latest
> mdb.master (note: line numbers shown here are off by 60 compared to
> mdb.master):
>
> mdb_rpage_get           mdb.c:6196
> mdb_page_get            mdb.c:6378
> mdb_page_search_lowest  mdb.c:6492
> mdb_node_move           mdb.c:8842
> mdb_rebalance           mdb.c:9366
> mdb_page_merge          mdb.c:9166
> mdb_rebalance           mdb.c:9373
> mdb_cursor_del0         mdb.c:9426
> mdb_cursor_del          mdb.c:811
>
> Code segment:
>
> if (tl[0].mid >= MDB_TRPAGE_MAX)
>     return MDB_TXN_FULL;
>
> Debugger shows tl[0].mid to be 4095.

You're free to define MDB_TRPAGE_MAX to a larger value. It just means you
increase the chance of overrunning the 2 GB available address space. There's
no magic; you can't fit every 64-bit database workload into only a 32-bit
address space. When your transactions are too large, the normal thing to do
is commit more often so they don't grow so large.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
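For reference, the "commit more often" approach can be sketched with the
standard cursor API roughly as follows (a sketch only, not a vetted
implementation: error handling is minimal, `delete_all_batched` and the batch
size are my own placeholders, and it assumes a liblmdb build to compile
against):

```c
#include <lmdb.h>

/* Delete everything in a DB, committing every BATCH deletions so that a
 * single write transaction never touches enough pages to exhaust the
 * MDB_VL32 read-page map. The cursor is reopened after each commit, since
 * cursors do not outlive the write transaction they were opened in. */
enum { BATCH = 50000 };  /* well under the ~100K calls observed to fail */

static int delete_all_batched(MDB_env *env, MDB_dbi dbi)
{
    int rc;
    for (;;) {
        MDB_txn *txn;
        MDB_cursor *cur;
        MDB_val key, data;
        long n = 0;

        rc = mdb_txn_begin(env, NULL, 0, &txn);
        if (rc) return rc;
        rc = mdb_cursor_open(txn, dbi, &cur);
        if (rc) { mdb_txn_abort(txn); return rc; }

        /* MDB_NEXT on a fresh cursor positions on the first entry; after a
         * delete, the cursor already points at the following entry and
         * MDB_NEXT returns it rather than skipping one. */
        while (n < BATCH &&
               (rc = mdb_cursor_get(cur, &key, &data, MDB_NEXT)) == 0) {
            rc = mdb_cursor_del(cur, 0);
            if (rc) break;
            n++;
        }
        mdb_cursor_close(cur);
        if (rc && rc != MDB_NOTFOUND) { mdb_txn_abort(txn); return rc; }
        rc = mdb_txn_commit(txn);
        if (rc) return rc;
        if (n < BATCH) return 0;  /* database exhausted */
    }
}
```

The trade-off is losing single-transaction atomicity for the whole cleanup:
a crash between commits leaves the range partially deleted, which the caller
has to tolerate or track.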
On 24.10.19 03:49, Howard Chu wrote:
> You're free to define MDB_TRPAGE_MAX to a larger value. It just means
> you increase the chance of overrunning the 2GB available address space.
> There's no magic, you can't fit every 64bit database workload into only
> a 32bit address space. When your transactions are too large, the normal
> thing to do is commit more often so they don't grow so large.

I think you meant MDB_TRPAGE_SIZE? At least that seemed to work, while
redefining MDB_TRPAGE_MAX ended up in another MDB_TXN_FULL (the
tl[0].mid < MDB_TRPAGE_SIZE check failed). Doubling MDB_TRPAGE_SIZE also
doubled the object-count threshold at which it starts failing: with 8192 it
was able to remove 4M entries, while the limit with 4096 was 2M entries.

MDB_TRPAGE_SIZE is only used to malloc txn->mt_rpages (and in some checks),
as far as I can tell. To make a reasonable decision here, could you please
confirm:

1. txn->mt_rpages is RAM only and has no impact on mmapped sections (and/or
   the file format).
2. RAM consumption is defined by the number of transactions only: per
   transaction, MDB_TRPAGE_SIZE bytes are allocated. Other than that it does
   not increase memory consumption.
3. There is no disadvantage other than the higher memory consumption per
   transaction described in the previous point.

In that case, I think I'll go with 16K (or even 32K). That does not seem like
too much for a transaction, given that it quadruples the amount of data
allowed to be processed. 32-bit devices are not servers, so I do not expect a
high number of concurrent transactions anyway.

Thanks!
Markus
On 24.10.19 15:03, Markus Junginger wrote:
> per transaction MDB_TRPAGE_SIZE bytes is allocated

I guess I was too quick here; it's MDB_TRPAGE_SIZE * 16. Also, I guess the
txn->mt_rpages entries are just pointers to in-memory cached pages (or
chunks?), so the memory consumption of those is much higher. The docs mention
"chunks of 16 pages"; does that refer to mt_rpages? If so, a single
mt_rpages pointer can actually hold on to 16 * 4 KB = 64 KB? Thus an
MDB_TRPAGE_SIZE of 4096 can go up to 256 MB of memory? That changes the
picture, of course...

PS: let me express the irony once more that "removing data" is the most
costly operation here ;-)