Issue 8777 - [LMDB] Closing read cursor uses already freed transaction (MDB_VL32)
Summary: [LMDB] Closing read cursor uses already freed transaction (MDB_VL32)
Status: VERIFIED FIXED
Alias: None
Product: LMDB
Classification: Unclassified
Component: liblmdb (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-11-26 12:01 UTC by markus@greenrobot.de
Modified: 2020-03-12 15:56 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description markus@greenrobot.de 2017-11-26 12:01:02 UTC
Full_Name: Markus Junginger
Version: mdb.master
OS: Ubuntu
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (77.190.230.93)


Read cursors may outlive a transaction to be renewed with another. Thus, cursors
should be correctly closed even after their last transaction was aborted. But
they access the (already freed) transaction during closing when MDB_VL32 is used
(mdb.master branch).

ASan captures this like this:
==19722==ERROR: AddressSanitizer: heap-use-after-free on address 0x61100001b140
at pc 0x000000ea2387 bp 0x7ffd36e3cd20 sp 0x7ffd36e3cd18
READ of size 8 at 0x61100001b140 thread T0
    #0 0xea2386 in mdb_cursor_unref mdb.c:2082:18
    #1 0xec577b in mdb_cursor_close mdb.c:8538:3

0x61100001b140 is located 128 bytes inside of 250-byte region
[0x61100001b0c0,0x61100001b1ba)
freed by thread T0 here:
    #0 0x74f840 in __interceptor_cfree.localalias.0 (...)
    #1 0xe8ea98 in mdb_txn_end mdb.c:3380:3
    #2 0xe8ecc5 in mdb_txn_abort mdb.c:3405:2


Possible fix for line mdb.c:2082 (applying what I found in mdb_cursor_close):
-	if (mc->mc_txn->mt_rpages[0].mid) {
+	if ((mc->mc_flags & C_UNTRACK) && mc->mc_txn->mt_rpages[0].mid) {


Code to reproduce using MDB_VL32:
    err = mdb_txn_begin(env, nullptr, MDB_RDONLY, &tx);
    err = mdb_cursor_open(tx, dbi, &cursor);
    // We must have data so get will set C_INITIALIZED
    err = mdb_cursor_get(cursor, &key, &value, MDB_SET_KEY);
    mdb_txn_abort(tx);
    mdb_cursor_close(cursor); // <- still uses tx
Comment 1 Howard Chu 2017-11-26 13:50:40 UTC
markus@greenrobot.de wrote:
> Full_Name: Markus Junginger
> Version: mdb.master
> OS: Ubuntu
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (77.190.230.93)
> 
> 
> Read cursors may outlive a transaction to be renewed with another. Thus, cursors
> should be correctly closed even after their last transaction was aborted. But
> they access the (already freed) transaction during closing when MDB_VL32 is used
> (mdb.master branch).
> 
> ASan captures this like this:
> ==19722==ERROR: AddressSanitizer: heap-use-after-free on address 0x61100001b140
> at pc 0x000000ea2387 bp 0x7ffd36e3cd20 sp 0x7ffd36e3cd18
> READ of size 8 at 0x61100001b140 thread T0
>      #0 0xea2386 in mdb_cursor_unref mdb.c:2082:18
>      #1 0xec577b in mdb_cursor_close mdb.c:8538:3
> 
> 0x61100001b140 is located 128 bytes inside of 250-byte region
> [0x61100001b0c0,0x61100001b1ba)
> freed by thread T0 here:
>      #0 0x74f840 in __interceptor_cfree.localalias.0 (...)
>      #1 0xe8ea98 in mdb_txn_end mdb.c:3380:3
>      #2 0xe8ecc5 in mdb_txn_abort mdb.c:3405:2
> 
> 
> Possible fix for line mdb.c:2082 (applying what I found in mdb_cursor_close):
> -	if (mc->mc_txn->mt_rpages[0].mid) {
> +	if ((mc->mc_flags & C_UNTRACK) && mc->mc_txn->mt_rpages[0].mid) {
> 
> 
> Code to reproduce using MDB_VL32:
>      err = mdb_txn_begin(env, nullptr, MDB_RDONLY, &tx);
>      err = mdb_cursor_open(tx, dbi, &cursor);
>      // We must have data so get will set C_INITIALIZED
>      err = mdb_cursor_get(cursor, &key, &value, MDB_SET_KEY);
>      mdb_txn_abort(tx);
>      mdb_cursor_close(cursor); // <- still uses tx

Thanks for the report. Unfortunately your suggested fix means the rpage test 
will always be skipped on read-only cursors, which will break the rpage 
tracking if the txn is still alive.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 markus@greenrobot.de 2017-11-26 14:29:49 UTC
Thanks for your fast response.

> Thanks for the report. Unfortunately your suggested fix means
> the rpage test will always be skipped on read-only cursors, which
> will break the rpage tracking if the txn is still alive.

Does that mean it's not trivial to fix and will take some time?

It's a last outstanding issue for a release, so I guess we have to find a
work around until it's properly fixed?

Thanks,
Markus

Comment 3 markus@greenrobot.de 2017-11-26 19:11:39 UTC
>
> Thanks for the report. Unfortunately your suggested fix means the rpage
> test will always be skipped on read-only cursors, which will break the
> rpage tracking if the txn is still alive.


Could you maybe explain a bit about rpages? Why does mdb_cursor_close need
to unref pages, but not mdb_cursor_renew? If closing a transaction does not
clean up rpages, I would have expected mdb_cursor_renew would have to do.
And why does setting MDB_VL32 affect rpages in the first place? To my
understanding, MDB_VL32  is about using 64 bits also on 32 bit CPUs, so a
differences on page handling seem surprising at first sight.

Thanks,
Markus
Comment 4 Howard Chu 2017-11-27 04:27:48 UTC
Markus Junginger wrote:
> Thanks for your fast response.
> 
>> Thanks for the report. Unfortunately your suggested fix means
>> the rpage test will always be skipped on read-only cursors, which
>> will break the rpage tracking if the txn is still alive.
> 
> Does that mean it's not trivial to fix and will take some time?

Correct. And I'm still on holiday this week, won't be looking deeper into this 
for a while.

> It's a last outstanding issue for a release, so I guess we have to find a
> work around until it's properly fixed?

Yes. Best is to use mdb_txn_reset/renew instead of txn_abort.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 5 Howard Chu 2017-11-27 04:37:46 UTC
Markus Junginger wrote:
>     Thanks for the report. Unfortunately your suggested fix means the rpage
>     test will always be skipped on read-only cursors, which will break the
>     rpage tracking if the txn is still alive.
> 
> 
> Could you maybe explain a bit about rpages? Why does mdb_cursor_close need to 
> unref pages, but not mdb_cursor_renew? If closing a transaction does not clean 
> up rpages, I would have expected mdb_cursor_renew would have to do. And why 
> does setting MDB_VL32 affect rpages in the first place? To my understanding, 
> MDB_VL32  is about using 64 bits also on 32 bit CPUs, so a differences on page 
> handling seem surprising at first sight.

Think about how LMDB works.

LMDB was originally written for use on 64bit CPUs, with 64bit virtual address 
spaces. It expects to map an entire DB into virtual memory using a single mmap 
call.

Obviously this cannot be done when trying to use a DB larger than 2GB on a 
machine with only a 32bit virtual address space.

When using MDB_VL32 to access a 64bit DB on a 32bit CPU, the DB cannot fit 
into a single mmap. It must be mapped in chunks, and the chunks must be 
mapped, unmapped, and remapped on demand. The rpages thus need to be tracked, 
to know which are currently mapped. None of this rpage tracking exists in the 
native non-VL32 build because it's completely unnecessary.

Also note that the MDB_VL32 feature is not officially released. You're free to 
use it if you want, but there is no expectation that it is suitable for 
general use. When you use unreleased features, you're expected to know how to 
support yourself. When you use LMDB at all, you're expected to know how 
virtual addresses work.

Ideally, you should quit using obsolete 32bit CPUs.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 6 markus@greenrobot.de 2018-02-09 13:57:53 UTC
> When using MDB_VL32 to access a 64bit DB on a 32bit CPU, the DB cannot fit
> into a single mmap.
> [...]

Long story short: I did not understand MDB_VL32 in the first place. I
thought it would just enable 64 bit sizes.

> Ideally, you should quit using obsolete 32bit CPUs.

Of course. Think about mobile and IoT however: Here, there are still 32 bit
CPUs and it's not uncommon to even make 64 bit ARM CPUs operate in 32 bit
mode to reduce binary size etc.

I'd like to propose a different approach to 64 bits on 32 bit CPUs;
something like this:
#ifdef MDB_SIZE64
typedef uint64_t	mdb_size_t;
...

Why? MDB_VL32 adds complexity and it would take additional coding and
testing efforts to get it production ready.

I think MDB_SIZE64 might be a better trade-off. Of course this would only
work with smaller DBs files (< 2GB/4GB). But I don't see a huge scenario for
32 bit CPUs being able to work with huge files anyway. The main reason I see
to use 64 bits consistently across CPU architecture is binary compatibility.
This is a big thing for Android (shipping apps with prefilled DBs). Also it
enables getting DB files from 32 bit devices and open those on the desktop,
etc.

What do you think?

Thanks,
Markus

PS.: Here's the full patch (adding MDB_SIZE64), which runs fine at least
with our test suite:

lmdb.h
@@ -187,7 +187,15 @@
 # define MDB_FMT_Z	"z"			/**< printf/scanf format modifier for size_t */
 #endif

-#ifndef MDB_VL32
+#if defined(MDB_VL32) && defined(MDB_SIZE64)
+#error either define MDB_VL32 OR MDB_SIZE64
+#endif
+#ifdef MDB_SIZE64
+typedef uint64_t	mdb_size_t;
+# define MDB_SIZE_MAX	SIZE_MAX	/**< max #mdb_size_t */
+# define MDB_PRIy(t)	PRI##t##64
+# define MDB_SCNy(t)	SCN##t##64
+#elif !defined(MDB_VL32))
 /** Unsigned type used for mapsize, entry counts and page/transaction IDs.
  *
  *	It is normally size_t, hence the name. Defining MDB_VL32 makes it

Comment 7 Howard Chu 2018-02-09 14:20:41 UTC
markus@greenrobot.de wrote:
>> When using MDB_VL32 to access a 64bit DB on a 32bit CPU, the DB cannot fit
>> into a single mmap.
>> [...]
> 
> Long story short: I did not understand MDB_VL32 in the first place. I
> thought it would just enable 64 bit sizes.

Pointless, if you're still limited to 2GB. It would only eat up 4 extra bytes 
overhead in every internal integer.

>> Ideally, you should quit using obsolete 32bit CPUs.
> 
> Of course. Think about mobile and IoT however: Here, there are still 32 bit
> CPUs and it's not uncommon to even make 64 bit ARM CPUs operate in 32 bit
> mode to reduce binary size etc.
> 
> I'd like to propose a different approach to 64 bits on 32 bit CPUs;
> something like this:
> #ifdef MDB_SIZE64
> typedef uint64_t	mdb_size_t;
> ...
> 
> Why? MDB_VL32 adds complexity and it would take additional coding and
> testing efforts to get it production ready.

If you had used the API as intended, with txn_reset/renew as I originally 
suggested, there would be no problem.
> 
> I think MDB_SIZE64 might be a better trade-off. Of course this would only
> work with smaller DBs files (< 2GB/4GB). But I don't see a huge scenario for
> 32 bit CPUs being able to work with huge files anyway.

The reason MDB_VL32 exists at all is because of exactly this - people wanted 
to use 32 bit CPUs with 64 bit databases.

> The main reason I see
> to use 64 bits consistently across CPU architecture is binary compatibility.
> This is a big thing for Android (shipping apps with prefilled DBs). Also it
> enables getting DB files from 32 bit devices and open those on the desktop,
> etc.
> 
> What do you think?

Bad idea. MDB_VL32 provides *full* compatibility between 32bit and 64bit CPUs 
*regardless of filesize*. Your suggestion will only cause confusion, as people 
wonder why their DB dies after reaching 2GB in size. It offers none of the 
benefits of 64bit capability, while still paying all of the cost in terms of 
storage overhead.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 8 Howard Chu 2018-02-09 14:21:00 UTC
changed state Open to Closed
Comment 9 markus@greenrobot.de 2018-02-09 15:17:15 UTC
I perfectly see that MDB_VL32 is the superior as a feature. On the other
hand I'm a bit worried that it may hold up LMDB development for use cases
you refer to as obsolete yourself.

While there are workarounds for this particular bug (which we applied), I
wouldn't be surprised if MDB_VL32 still needs some stabilization after it
will be officially released to get real exposure in the real world. Also,
for future changes, the MDB_VL32 "fork" will have to be adjusted as well.

Comment 10 Howard Chu 2018-02-09 15:55:48 UTC
Markus Junginger wrote:
> I perfectly see that MDB_VL32 is the superior as a feature. On the other
> hand I'm a bit worried that it may hold up LMDB development for use cases
> you refer to as obsolete yourself.
> 
> While there are workarounds for this particular bug (which we applied),

There was never a bug here. The doc is quite explicit
http://www.lmdb.tech/doc/group__mdb.html#ga73a5938ae4c3239ee11efa07eb22b882

After txn_abort the only valid call on a still-open cursor is cursor_renew, 
not cursor_close.
-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 11 markus@greenrobot.de 2018-02-09 16:49:07 UTC
> There was never a bug here. The doc is quite explicit
> http://www.lmdb.tech/doc/group__mdb.html#ga73a5938ae4c3239ee11efa07eb22b882
>
> After txn_abort the only valid call on a still-open cursor is
> cursor_renew, not cursor_close.

However you want to call it, are you open to improve this workflow?

API-wise, it would be so much simpler just to free a cursor after it became
obsolete. It's counter-intuitive to open a new txn, renew the cursor, close
the cursor, and abort the txn - just to close a cursor.

And without MDB_VL32, plain cursor closing after txn abort seems to work
just fine (I did not consider "closing" as "using" when I read the docs).
One could say that the inconsistent behavior is a result of the complexity
MDB_VL32 introduces.

If you don't want to change the current behavior, it would still be nice for
future developers to null the obsolete txn ptr in the cursor on txn abort to
make the unwanted close call crash always, not just occasionally.

Comment 12 Howard Chu 2018-02-09 17:10:14 UTC
Markus Junginger wrote:
>> There was never a bug here. The doc is quite explicit
>> http://www.lmdb.tech/doc/group__mdb.html#ga73a5938ae4c3239ee11efa07eb22b882
>>
>> After txn_abort the only valid call on a still-open cursor is
>> cursor_renew, not cursor_close.
> 
> However you want to call it, are you open to improve this workflow?
> 
> API-wise, it would be so much simpler just to free a cursor after it became
> obsolete. It's counter-intuitive to open a new txn, renew the cursor, close
> the cursor, and abort the txn - just to close a cursor.

API-wise, it is much simpler to just close the cursor before closing its 
transaction, if you have no need to reuse the cursor.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 13 Howard Chu 2018-02-09 17:14:10 UTC
Howard Chu wrote:
> Markus Junginger wrote:
>>> There was never a bug here. The doc is quite explicit
>>> http://www.lmdb.tech/doc/group__mdb.html#ga73a5938ae4c3239ee11efa07eb22b882
>>>
>>> After txn_abort the only valid call on a still-open cursor is
>>> cursor_renew, not cursor_close.
>>
>> However you want to call it, are you open to improve this workflow?
>>
>> API-wise, it would be so much simpler just to free a cursor after it became
>> obsolete. It's counter-intuitive to open a new txn, renew the cursor, close
>> the cursor, and abort the txn - just to close a cursor.
> 
> API-wise, it is much simpler to just close the cursor before closing its 
> transaction, if you have no need to reuse the cursor.

To be explicit - a typical workflow, as outlined in the Getting Started guide:

Open an env
Open a txn inside the env
Open a cursor inside the txn

When finished:

Close cursor inside txn
Close txn inside env
Close env

The only reason the API allows a read cursor to exist outside of its creating 
read txn is to allow it to be reused in a later read txn. If you're not 
reusing it then you should just close it in the proper order.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 14 markus@greenrobot.de 2018-02-09 17:28:15 UTC
It's not that trivial. We recycle cursors because they *might* be reused
later. At the point when the txn is done it is unknown whether the cursor
will be used in the future.

Comment 15 markus@greenrobot.de 2018-03-04 13:56:06 UTC
Because of another MDB_VL32 issue (
http://www.openldap.org/its/index.cgi?findid=8813), I would like to come
back the MDB_SIZE64 patch suggested here.

I understand that you have no intention to merge this in. For us, applying
the patch makes perfectly sense nevertheless: It fixes a severe problem and
the 2 GB limit on 32 bits is fine for our use case.

Do you see any side effects the proposed patch might have? I'm aware you
cannot give guarantees, but a simple "should work" from your side would be
appreciated!

Thanks!
Markus


I'd like to propose a different approach to 64 bits on 32 bit CPUs;
> something like this:
> [...]
> lmdb.h
> @@ -187,7 +187,15 @@
>  # define MDB_FMT_Z     "z"                     /**< printf/scanf format
> modifier for size_t */
>  #endif
>
> -#ifndef MDB_VL32
> +#if defined(MDB_VL32) && defined(MDB_SIZE64)
> +#error either define MDB_VL32 OR MDB_SIZE64
> +#endif
> +#ifdef MDB_SIZE64
> +typedef uint64_t       mdb_size_t;
> +# define MDB_SIZE_MAX  SIZE_MAX        /**< max #mdb_size_t */
> +# define MDB_PRIy(t)   PRI##t##64
> +# define MDB_SCNy(t)   SCN##t##64
> +#elif !defined(MDB_VL32))
>  /** Unsigned type used for mapsize, entry counts and page/transaction IDs.
>   *
>   *     It is normally size_t, hence the name. Defining MDB_VL32 makes it
>