Issue 8916 - MDB_VL32 in readonly causes MDB_CORRUPTED
Summary: MDB_VL32 in readonly causes MDB_CORRUPTED
Status: UNCONFIRMED
Alias: None
Product: LMDB
Classification: Unclassified
Component: liblmdb (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-19 14:38 UTC by pierre.chambart@ocamlpro.com
Modified: 2020-03-23 20:45 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description pierre.chambart@ocamlpro.com 2018-09-19 14:38:38 UTC
Full_Name: Pierre Chambart
Version: 
OS: 
URL: 
Submission from: (NULL) (185.194.234.133)


We have been trying LMDB compiled with the MDB_VL32 flag and encountered some
MDB_CORRUPTED error that didn't occur without MDB_VL32.
We could produce a minimal reproduction case:
https://raw.githubusercontent.com/chambart/lmdb/vl32_readonly_fix/libraries/liblmdb/vl32_failure.c
This is basically writing a key, reading it, writing something else then reading
it again.

After some investigation, this is how I understand what is happening:

* The chunk being mapped for the first read contains 3 pages instead of the 16
pages that would be mapped without readonly.
* The second write extend the size of the base to 5 pages
* Finally the last read retrieve the mapped chunk from the first read, which has
size 3, but assumes that it has size 5, hence reading some random value in
memory past the end of the pages.

It can be verified by asserting that id3.mcnt is larger than rem at the end of
mdb_rpage_get, around this line:

ok:
    p = (MDB_page *)((char *)id3.mptr + rem * env->me_psize);

This does not seem to be the case.

I made a dirty fix that seems to work. It's simply to remove those lines:

        /* don't map past last written page in read-only envs */
        if ((env->me_flags & MDB_RDONLY) && pgno + MDB_RPAGE_CHUNK-1 >
txn->mt_last_pgno)
            id3.mcnt = txn->mt_last_pgno + 1 - pgno;
        else

This seems to be some optimization that avoids mapping the 16 pages of the
chunk. But given that those pages are
initialized, there doesn't seem to be much harm in mapping a few more pages in
that relatively rare situation.

The patch is provided in
https://github.com/chambart/lmdb/commit/52e57a392e91935efe6d57337da09f23e4076924

If you want to reproduce the bug you can use the
https://github.com/chambart/lmdb/tree/vl32_readonly_fix which also contains the
patch to the makefile to build in 32bits with MDB_VL32.

Thanks a lot, hoping that this can be of some help.
Comment 1 Howard Chu 2018-09-19 15:35:39 UTC
pierre.chambart@ocamlpro.com wrote:
> Full_Name: Pierre Chambart
> Version: 
> OS: 
> URL: 
> Submission from: (NULL) (185.194.234.133)
> 
> 
> We have been trying LMDB compiled with the MDB_VL32 flag and encountered some
> MDB_CORRUPTED error that didn't occur without MDB_VL32.
> We could produce a minimal reproduction case:
> https://raw.githubusercontent.com/chambart/lmdb/vl32_readonly_fix/libraries/liblmdb/vl32_failure.c
> This is basically writing a key, reading it, writing something else then reading
> it again.
> 
> After some investigation, this is how I understand what is happening:
> 
> * The chunk being mapped for the first read contains 3 pages instead of the 16
> pages that would be mapped without readonly.
> * The second write extend the size of the base to 5 pages
> * Finally the last read retrieve the mapped chunk from the first read, which has
> size 3, but assumes that it has size 5, hence reading some random value in
> memory past the end of the pages.
> 
> It can be verified by asserting that id3.mcnt is larger than rem at the end of
> mdb_rpage_get, around this line:
> 
> ok:
>     p = (MDB_page *)((char *)id3.mptr + rem * env->me_psize);
> 
> This does not seem to be the case.
> 
> I made a dirty fix that seems to work. It's simply to remove those lines:
> 
>         /* don't map past last written page in read-only envs */
>         if ((env->me_flags & MDB_RDONLY) && pgno + MDB_RPAGE_CHUNK-1 >
> txn->mt_last_pgno)
>             id3.mcnt = txn->mt_last_pgno + 1 - pgno;
>         else
> 
> This seems to be some optimization that avoids mapping the 16 pages of the
> chunk. But given that those pages are
> initialized, there doesn't seem to be much harm in mapping a few more pages in
> that relatively rare situation.

Please check git history before assuming the code is an optimization.

These lines prevent a crash on Windows.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 pierre.chambart@ocamlpro.com 2018-09-19 21:36:53 UTC
On 19/09/2018 17:35, Howard Chu wrote:
> pierre.chambart@ocamlpro.com wrote:
>> Full_Name: Pierre Chambart
>> Version: 
>> OS: 
>> URL: 
>> Submission from: (NULL) (185.194.234.133)
>>
>>
>> We have been trying LMDB compiled with the MDB_VL32 flag and encountered some
>> MDB_CORRUPTED error that didn't occur without MDB_VL32.
>> We could produce a minimal reproduction case:
>> https://raw.githubusercontent.com/chambart/lmdb/vl32_readonly_fix/libraries/liblmdb/vl32_failure.c
>> This is basically writing a key, reading it, writing something else then reading
>> it again.
>>
>> After some investigation, this is how I understand what is happening:
>>
>> * The chunk being mapped for the first read contains 3 pages instead of the 16
>> pages that would be mapped without readonly.
>> * The second write extend the size of the base to 5 pages
>> * Finally the last read retrieve the mapped chunk from the first read, which has
>> size 3, but assumes that it has size 5, hence reading some random value in
>> memory past the end of the pages.
>>
>> It can be verified by asserting that id3.mcnt is larger than rem at the end of
>> mdb_rpage_get, around this line:
>>
>> ok:
>>     p = (MDB_page *)((char *)id3.mptr + rem * env->me_psize);
>>
>> This does not seem to be the case.
>>
>> I made a dirty fix that seems to work. It's simply to remove those lines:
>>
>>         /* don't map past last written page in read-only envs */
>>         if ((env->me_flags & MDB_RDONLY) && pgno + MDB_RPAGE_CHUNK-1 >
>> txn->mt_last_pgno)
>>             id3.mcnt = txn->mt_last_pgno + 1 - pgno;
>>         else
>>
>> This seems to be some optimization that avoids mapping the 16 pages of the
>> chunk. But given that those pages are
>> initialized, there doesn't seem to be much harm in mapping a few more pages in
>> that relatively rare situation.
> Please check git history before assuming the code is an optimization.
>
> These lines prevent a crash on Windows.
>
Sorry, about that.

Also I'm not suggesting that this is a proper fix, rather that this is some
clue to understand the problem. I'm happy producing a patch if you
indicate what kind of fix you would like.
-- 
Pierre