Full_Name: Markus Junginger Version: mdb.master OS: Ubuntu URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (77.190.230.93) Read cursors may outlive a transaction to be renewed with another. Thus, cursors should be correctly closed even after their last transaction was aborted. But they access the (already freed) transaction during closing when MDB_VL32 is used (mdb.master branch). ASan captures this like this: ==19722==ERROR: AddressSanitizer: heap-use-after-free on address 0x61100001b140 at pc 0x000000ea2387 bp 0x7ffd36e3cd20 sp 0x7ffd36e3cd18 READ of size 8 at 0x61100001b140 thread T0 #0 0xea2386 in mdb_cursor_unref mdb.c:2082:18 #1 0xec577b in mdb_cursor_close mdb.c:8538:3 0x61100001b140 is located 128 bytes inside of 250-byte region [0x61100001b0c0,0x61100001b1ba) freed by thread T0 here: #0 0x74f840 in __interceptor_cfree.localalias.0 (...) #1 0xe8ea98 in mdb_txn_end mdb.c:3380:3 #2 0xe8ecc5 in mdb_txn_abort mdb.c:3405:2 Possible fix for line mdb.c:2082 (applying what I found in mdb_cursor_close): - if (mc->mc_txn->mt_rpages[0].mid) { + if ((mc->mc_flags & C_UNTRACK) && mc->mc_txn->mt_rpages[0].mid) { Code to reproduce using MDB_VL32: err = mdb_txn_begin(env, nullptr, MDB_RDONLY, &tx); err = mdb_cursor_open(tx, dbi, &cursor); // We must have data so get will set C_INITIALIZED err = mdb_cursor_get(cursor, &key, &value, MDB_SET_KEY); mdb_txn_abort(tx); mdb_cursor_close(cursor); // <- still uses tx
markus@greenrobot.de wrote: > Full_Name: Markus Junginger > Version: mdb.master > OS: Ubuntu > URL: ftp://ftp.openldap.org/incoming/ > Submission from: (NULL) (77.190.230.93) > > > Read cursors may outlive a transaction to be renewed with another. Thus, cursors > should be correctly closed even after their last transaction was aborted. But > they access the (already freed) transaction during closing when MDB_VL32 is used > (mdb.master branch). > > ASan captures this like this: > ==19722==ERROR: AddressSanitizer: heap-use-after-free on address 0x61100001b140 > at pc 0x000000ea2387 bp 0x7ffd36e3cd20 sp 0x7ffd36e3cd18 > READ of size 8 at 0x61100001b140 thread T0 > #0 0xea2386 in mdb_cursor_unref mdb.c:2082:18 > #1 0xec577b in mdb_cursor_close mdb.c:8538:3 > > 0x61100001b140 is located 128 bytes inside of 250-byte region > [0x61100001b0c0,0x61100001b1ba) > freed by thread T0 here: > #0 0x74f840 in __interceptor_cfree.localalias.0 (...) > #1 0xe8ea98 in mdb_txn_end mdb.c:3380:3 > #2 0xe8ecc5 in mdb_txn_abort mdb.c:3405:2 > > > Possible fix for line mdb.c:2082 (applying what I found in mdb_cursor_close): > - if (mc->mc_txn->mt_rpages[0].mid) { > + if ((mc->mc_flags & C_UNTRACK) && mc->mc_txn->mt_rpages[0].mid) { > > > Code to reproduce using MDB_VL32: > err = mdb_txn_begin(env, nullptr, MDB_RDONLY, &tx); > err = mdb_cursor_open(tx, dbi, &cursor); > // We must have data so get will set C_INITIALIZED > err = mdb_cursor_get(cursor, &key, &value, MDB_SET_KEY); > mdb_txn_abort(tx); > mdb_cursor_close(cursor); // <- still uses tx Thanks for the report. Unfortunately your suggested fix means the rpage test will always be skipped on read-only cursors, which will break the rpage tracking if the txn is still alive. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Thanks for your fast response. > Thanks for the report. Unfortunately your suggested fix means > the rpage test will always be skipped on read-only cursors, which > will break the rpage tracking if the txn is still alive. Does that mean it's not trivial to fix and will take some time? It's a last outstanding issue for a release, so I guess we have to find a work around until it's properly fixed? Thanks, Markus
> > Thanks for the report. Unfortunately your suggested fix means the rpage > test will always be skipped on read-only cursors, which will break the > rpage tracking if the txn is still alive. Could you maybe explain a bit about rpages? Why does mdb_cursor_close need to unref pages, but not mdb_cursor_renew? If closing a transaction does not clean up rpages, I would have expected mdb_cursor_renew would have to do. And why does setting MDB_VL32 affect rpages in the first place? To my understanding, MDB_VL32 is about using 64 bits also on 32 bit CPUs, so a differences on page handling seem surprising at first sight. Thanks, Markus
Markus Junginger wrote: > Thanks for your fast response. > >> Thanks for the report. Unfortunately your suggested fix means >> the rpage test will always be skipped on read-only cursors, which >> will break the rpage tracking if the txn is still alive. > > Does that mean it's not trivial to fix and will take some time? Correct. And I'm still on holiday this week, won't be looking deeper into this for a while. > It's a last outstanding issue for a release, so I guess we have to find a > work around until it's properly fixed? Yes. Best is to use mdb_txn_reset/renew instead of txn_abort. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Markus Junginger wrote: > Thanks for the report. Unfortunately your suggested fix means the rpage > test will always be skipped on read-only cursors, which will break the > rpage tracking if the txn is still alive. > > > Could you maybe explain a bit about rpages? Why does mdb_cursor_close need to > unref pages, but not mdb_cursor_renew? If closing a transaction does not clean > up rpages, I would have expected mdb_cursor_renew would have to do. And why > does setting MDB_VL32 affect rpages in the first place? To my understanding, > MDB_VL32 is about using 64 bits also on 32 bit CPUs, so a differences on page > handling seem surprising at first sight. Think about how LMDB works. LMDB was originally written for use on 64bit CPUs, with 64bit virtual address spaces. It expects to map an entire DB into virtual memory using a single mmap call. Obviously this cannot be done when trying to use a DB larger than 2GB on a machine with only a 32bit virtual address space. When using MDB_VL32 to access a 64bit DB on a 32bit CPU, the DB cannot fit into a single mmap. It must be mapped in chunks, and the chunks must be mapped, unmapped, and remapped on demand. The rpages thus need to be tracked, to know which are currently mapped. None of this rpage tracking exists in the native non-VL32 build because it's completely unnecessary. Also note that the MDB_VL32 feature is not officially released. You're free to use it if you want, but there is no expectation that it is suitable for general use. When you use unreleased features, you're expected to know how to support yourself. When you use LMDB at all, you're expected to know how virtual addresses work. Ideally, you should quit using obsolete 32bit CPUs. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
> When using MDB_VL32 to access a 64bit DB on a 32bit CPU, the DB cannot fit > into a single mmap. > [...] Long story short: I did not understand MDB_VL32 in the first place. I thought it would just enable 64 bit sizes. > Ideally, you should quit using obsolete 32bit CPUs. Of course. Think about mobile and IoT however: Here, there are still 32 bit CPUs and it's not uncommon to even make 64 bit ARM CPUs operate in 32 bit mode to reduce binary size etc. I'd like to propose a different approach to 64 bits on 32 bit CPUs; something like this: #ifdef MDB_SIZE64 typedef uint64_t mdb_size_t; ... Why? MDB_VL32 adds complexity and it would take additional coding and testing efforts to get it production ready. I think MDB_SIZE64 might be a better trade-off. Of course this would only work with smaller DBs files (< 2GB/4GB). But I don't see a huge scenario for 32 bit CPUs being able to work with huge files anyway. The main reason I see to use 64 bits consistently across CPU architecture is binary compatibility. This is a big thing for Android (shipping apps with prefilled DBs). Also it enables getting DB files from 32 bit devices and open those on the desktop, etc. What do you think? Thanks, Markus PS.: Here's the full patch (adding MDB_SIZE64), which runs fine at least with our test suite: lmdb.h @@ -187,7 +187,15 @@ # define MDB_FMT_Z "z" /**< printf/scanf format modifier for size_t */ #endif -#ifndef MDB_VL32 +#if defined(MDB_VL32) && defined(MDB_SIZE64) +#error either define MDB_VL32 OR MDB_SIZE64 +#endif +#ifdef MDB_SIZE64 +typedef uint64_t mdb_size_t; +# define MDB_SIZE_MAX SIZE_MAX /**< max #mdb_size_t */ +# define MDB_PRIy(t) PRI##t##64 +# define MDB_SCNy(t) SCN##t##64 +#elif !defined(MDB_VL32)) /** Unsigned type used for mapsize, entry counts and page/transaction IDs. * * It is normally size_t, hence the name. Defining MDB_VL32 makes it
markus@greenrobot.de wrote: >> When using MDB_VL32 to access a 64bit DB on a 32bit CPU, the DB cannot fit >> into a single mmap. >> [...] > > Long story short: I did not understand MDB_VL32 in the first place. I > thought it would just enable 64 bit sizes. Pointless, if you're still limited to 2GB. It would only eat up 4 extra bytes overhead in every internal integer. >> Ideally, you should quit using obsolete 32bit CPUs. > > Of course. Think about mobile and IoT however: Here, there are still 32 bit > CPUs and it's not uncommon to even make 64 bit ARM CPUs operate in 32 bit > mode to reduce binary size etc. > > I'd like to propose a different approach to 64 bits on 32 bit CPUs; > something like this: > #ifdef MDB_SIZE64 > typedef uint64_t mdb_size_t; > ... > > Why? MDB_VL32 adds complexity and it would take additional coding and > testing efforts to get it production ready. If you had used the API as intended, with txn_reset/renew as I originally suggested, there would be no problem. > > I think MDB_SIZE64 might be a better trade-off. Of course this would only > work with smaller DBs files (< 2GB/4GB). But I don't see a huge scenario for > 32 bit CPUs being able to work with huge files anyway. The reason MDB_VL32 exists at all is because of exactly this - people wanted to use 32 bit CPUs with 64 bit databases. > The main reason I see > to use 64 bits consistently across CPU architecture is binary compatibility. > This is a big thing for Android (shipping apps with prefilled DBs). Also it > enables getting DB files from 32 bit devices and open those on the desktop, > etc. > > What do you think? Bad idea. MDB_VL32 provides *full* compatibility between 32bit and 64bit CPUs *regardless of filesize*. Your suggestion will only cause confusion, as people wonder why their DB dies after reaching 2GB in size. It offers none of the benefits of 64bit capability, while still paying all of the cost in terms of storage overhead. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
changed state Open to Closed
I perfectly see that MDB_VL32 is the superior as a feature. On the other hand I'm a bit worried that it may hold up LMDB development for use cases you refer to as obsolete yourself. While there are workarounds for this particular bug (which we applied), I wouldn't be surprised if MDB_VL32 still needs some stabilization after it will be officially released to get real exposure in the real world. Also, for future changes, the MDB_VL32 "fork" will have to be adjusted as well.
Markus Junginger wrote: > I perfectly see that MDB_VL32 is the superior as a feature. On the other > hand I'm a bit worried that it may hold up LMDB development for use cases > you refer to as obsolete yourself. > > While there are workarounds for this particular bug (which we applied), There was never a bug here. The doc is quite explicit http://www.lmdb.tech/doc/group__mdb.html#ga73a5938ae4c3239ee11efa07eb22b882 After txn_abort the only valid call on a still-open cursor is cursor_renew, not cursor_close. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
> There was never a bug here. The doc is quite explicit > http://www.lmdb.tech/doc/group__mdb.html#ga73a5938ae4c3239ee11efa07eb22b882 > > After txn_abort the only valid call on a still-open cursor is > cursor_renew, not cursor_close. However you want to call it, are you open to improve this workflow? API-wise, it would be so much simpler just to free a cursor after it became obsolete. It's counter-intuitive to open a new txn, renew the cursor, close the cursor, and abort the txn - just to close a cursor. And without MDB_VL32, plain cursor closing after txn abort seems to work just fine (I did not consider "closing" as "using" when I read the docs). One could say that the inconsistent behavior is a result of the complexity MDB_VL32 introduces. If you don't want to change the current behavior, it would still be nice for future developers to null the obsolete txn ptr in the cursor on txn abort to make the unwanted close call crash always, not just occasionally.
Markus Junginger wrote: >> There was never a bug here. The doc is quite explicit >> http://www.lmdb.tech/doc/group__mdb.html#ga73a5938ae4c3239ee11efa07eb22b882 >> >> After txn_abort the only valid call on a still-open cursor is >> cursor_renew, not cursor_close. > > However you want to call it, are you open to improve this workflow? > > API-wise, it would be so much simpler just to free a cursor after it became > obsolete. It's counter-intuitive to open a new txn, renew the cursor, close > the cursor, and abort the txn - just to close a cursor. API-wise, it is much simpler to just close the cursor before closing its transaction, if you have no need to reuse the cursor. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Howard Chu wrote: > Markus Junginger wrote: >>> There was never a bug here. The doc is quite explicit >>> http://www.lmdb.tech/doc/group__mdb.html#ga73a5938ae4c3239ee11efa07eb22b882 >>> >>> After txn_abort the only valid call on a still-open cursor is >>> cursor_renew, not cursor_close. >> >> However you want to call it, are you open to improve this workflow? >> >> API-wise, it would be so much simpler just to free a cursor after it became >> obsolete. It's counter-intuitive to open a new txn, renew the cursor, close >> the cursor, and abort the txn - just to close a cursor. > > API-wise, it is much simpler to just close the cursor before closing its > transaction, if you have no need to reuse the cursor. To be explicit - a typical workflow, as outlined in the Getting Started guide: Open an env Open a txn inside the env Open a cursor inside the txn When finished: Close cursor inside txn Close txn inside env Close env The only reason the API allows a read cursor to exist outside of its creating read txn is to allow it to be reused in a later read txn. If you're not reusing it then you should just close it in the proper order. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
It's not that trivial. We recycle cursors because they *might* be reused later. At the point when the txn is done it is unknown whether the cursor will be used in the future.
Because of another MDB_VL32 issue ( http://www.openldap.org/its/index.cgi?findid=8813), I would like to come back the MDB_SIZE64 patch suggested here. I understand that you have no intention to merge this in. For us, applying the patch makes perfectly sense nevertheless: It fixes a severe problem and the 2 GB limit on 32 bits is fine for our use case. Do you see any side effects the proposed patch might have? I'm aware you cannot give guarantees, but a simple "should work" from your side would be appreciated! Thanks! Markus I'd like to propose a different approach to 64 bits on 32 bit CPUs; > something like this: > [...] > lmdb.h > @@ -187,7 +187,15 @@ > # define MDB_FMT_Z "z" /**< printf/scanf format > modifier for size_t */ > #endif > > -#ifndef MDB_VL32 > +#if defined(MDB_VL32) && defined(MDB_SIZE64) > +#error either define MDB_VL32 OR MDB_SIZE64 > +#endif > +#ifdef MDB_SIZE64 > +typedef uint64_t mdb_size_t; > +# define MDB_SIZE_MAX SIZE_MAX /**< max #mdb_size_t */ > +# define MDB_PRIy(t) PRI##t##64 > +# define MDB_SCNy(t) SCN##t##64 > +#elif !defined(MDB_VL32)) > /** Unsigned type used for mapsize, entry counts and page/transaction IDs. > * > * It is normally size_t, hence the name. Defining MDB_VL32 makes it >