Issue 7789 - Unreliable mdb_env_set_mapsize()
Summary: Unreliable mdb_env_set_mapsize()
Status: VERIFIED FIXED
Alias: None
Product: OpenLDAP
Classification: Unclassified
Component: slapd (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-22 20:21 UTC by Hallvard Furuseth
Modified: 2014-12-11 01:07 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description Hallvard Furuseth 2014-01-22 20:21:28 UTC
Full_Name: Hallvard B Furuseth
Version: LMDB 0.9.11
OS: Linux x86_64
URL: 
Submission from: (NULL) (129.240.6.254)
Submitted by: hallvard


Mapsize changes do not work as described, do not reliably store the
mapsize in the map, and it's hard to see how it is supposed to work.
E.g.:
- Open an environment twice, in processes X and Y.
- X grows the map and writes (commits) something to the DB.  That
  MDB_meta gets the new mapsize.
- Y writes to the DB.  It does not get MDB_MAP_RESIZED like the doc
  says, nor does it carry forward X's MDB_meta.mm_mapsize change.
- Process Z opens the environment without doing set_mapsize(),
  and gets the original mapsize from the MDB_meta written by Y.

For that matter, from reading the doc I'd expect a mapsize change to
commit a txn with the new mapsize.  There's no mention that the change
(and the MDB_MAP_RESIZED) will wait for something to be committed.

mdb_txn_commit() writes nothing if the txn didn't change anything.
It needs to notice that there is a mapsize change to write.

The doc talks about shrinking the map, but reduced mapsizes are not
written to the datafile.  Only increases are written.

All in all, it looks to me like _changing_ the mapsize should be an
operation on a write transaction or invoke a write transaction, while
setting the size or catching up with a mapsize change can be an
environment operation.  That way it would be possible to make sense of
it.  A txn can do it when it has no cursors and no dirty WRITEMAP
pages (or WRITEMAP could spill all pages first).

BTW, I don't see the point of conditionally avoiding to write the
mapsize in mdb_env_write_meta() when full page gets written to disk
anyway - as long as txn_begin() stashes the mapsize from the original
meta so it knows what to write.  (It need not obey the mapsize at that
point, but it must carry a change forward.)
Comment 1 Howard Chu 2014-01-23 23:07:11 UTC
h.b.furuseth@usit.uio.no wrote:
> Full_Name: Hallvard B Furuseth
> Version: LMDB 0.9.11
> OS: Linux x86_64
> URL:
> Submission from: (NULL) (129.240.6.254)
> Submitted by: hallvard
>
>
> Mapsize changes do not work as described, do not reliably store the
> mapsize in the map, and it's hard to see how it is supposed to work.
> E.g.:
> - Open an environment twice, in processes X and Y.
> - X grows the map and writes (commits) something to the DB.  That
>    MDB_meta gets the new mapsize.
> - Y writes to the DB.  It does not get MDB_MAP_RESIZED like the doc
>    says, nor does it carry forward X's MDB_meta.mm_mapsize change.

The doc says the caller of set_mapsize is required to make sure there are no 
active transactions when it is called. As such, X failed this requirement, and 
this sequence of events is explicitly unsupported.

If Y doesn't start its write txn until after X finishes, then Y will see the 
new size.

> - Process Z opens the environment without doing set_mapsize(),
>    and gets the original mapsize from the MDB_meta written by Y.
>
> For that matter, from reading the doc I'd expect a mapsize change to
> commit a txn with the new mapsize.  There's no mention that the change
> (and the MDB_MAP_RESIZED) will wait for something to be committed.
>
> mdb_txn_commit() writes nothing if the txn didn't change anything.
> It needs to notice that there is a mapsize change to write.
>
> The doc talks about shrinking the map, but reduced mapsizes are not
> written to the datafile.  Only increases are written.
>
> All in all, it looks to me like _changing_ the mapsize should be an
> operation on a write transaction or invoke a write transaction, while
> setting the size or catching up with a mapsize change can be an
> environment operation.  That way it would be possible to make sense of
> it.  A txn can do it when it has no cursors and no dirty WRITEMAP
> pages (or WRITEMAP could spill all pages first).
>
> BTW, I don't see the point of conditionally avoiding to write the
> mapsize in mdb_env_write_meta() when full page gets written to disk
> anyway - as long as txn_begin() stashes the mapsize from the original
> meta so it knows what to write.  (It need not obey the mapsize at that
> point, but it must carry a change forward.)
>
>


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 Hallvard Furuseth 2014-01-24 04:39:27 UTC
Should have enclosed a test program.  Here:
   http://folk.uio.no/hbf/testsize.c
Output:
   process #0: open env
   process #1: open env
   process #0: write txn, mapsize 204800
   process #1: write txn, mapsize 40960
   process #2: open env, write txn, mapsize 40960
   process #3: open env, write txn, mapsize 204800


Summarizing a reply which got sent privately instead of to ITS:

On 2014-01-24 00:07, Howard Chu wrote:
> The doc says the caller of set_mapsize is required to make sure there
> are no active transactions when it is called. As such, X failed this
> requirement, and this sequence of events is explicitly unsupported.

No, I'm talking about X changing the mapsize when there is no txn,
then afterwards doing a write txn so the mapsize gets written.

> If Y doesn't start its write txn until after X finishes, then Y will
> see the new size.

It doesn't, that's the point.  Only txn_commit pays attention,
and it checks the size which existed in the wrong metapage.

Come to think of it, set_mapsize(); env_open(); txn begin/end
should do the same.  Y will either way not pay attention to
the new mapsize.

-- 
Hallvard

Comment 3 Howard Chu 2014-07-08 19:19:32 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Comment 4 Howard Chu 2014-07-09 02:19:13 UTC
Hallvard Breien Furuseth wrote:
> Should have enclosed a test program.  Here:
>     http://folk.uio.no/hbf/testsize.c
> Output:
>     process #0: open env
>     process #1: open env
>     process #0: write txn, mapsize 204800
>     process #1: write txn, mapsize 40960
>     process #2: open env, write txn, mapsize 40960
>     process #3: open env, write txn, mapsize 204800
>
>
> Summarizing a reply which got sent privately instead of to ITS:
>
> On 2014-01-24 00:07, Howard Chu wrote:
>> The doc says the caller of set_mapsize is required to make sure there
>> are no active transactions when it is called. As such, X failed this
>> requirement, and this sequence of events is explicitly unsupported.
>
> No, I'm talking about X changing the mapsize when there is no txn,
> then afterwards doing a write txn so the mapsize gets written.
>
>> If Y doesn't start its write txn until after X finishes, then Y will
>> see the new size.
>
> It doesn't, that's the point.  Only txn_commit pays attention,
> and it checks the size which existed in the wrong metapage.
>
> Come to think of it, set_mapsize(); env_open(); txn begin/end
> should do the same.  Y will either way not pay attention to
> the new mapsize.
>
Fixed now in mdb.master

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 5 Hallvard Furuseth 2014-07-23 23:26:56 UTC
On 09/07/14 04:19, hyc@symas.com wrote:
> Fixed now in mdb.master

I'm afraid it's broken worse now.


Splitting this ITS up in several component problems, copied in
part from IRC.

The recent fixes seem broken.  Mapsize is no longer checked against
the highest written page number.  And if one MDB_env shrinks the
mapsize, another co-existing MDB_env will not have MDB_RESIZING set,
so txn_renew0 will not return MDB_MAP_RESIZED nor will write_meta
resize the mapsize up again.  Thus it will run with a bigger mapsize
than the MDB_env which shrunk mapsize, and it can write pages beyond
what that other MDB_env has mapped.


I think LMDB 0.9.13 tried to implement this: mdb_env_set_mapsize() can
grow but not shrink the mapsize written to metapages.  lmdb writes the
biggest mapsize it ever saw, to the metapages.  But an MDB_env can run
with a smaller mapsize than the metapage says, as long as the
snapshots do not use pages beyond it.  So a metapage mapsize is just a
default.

Seems a rule we can live with, but if so it should be documented.

However it did not and does not always write a new mapsize to both
metapages.  In LDMB 0.9.13, a fix would be for mdb_env_write_meta()
to always write mapsize = max(other metapage:mapsize, env:mapsize):
     http://folk.uio.no/hbf/mapsize.0.9.13.patch

In current LMDB, it would be
     http://folk.uio.no/hbf/mapsize.patch


I have not tested this against the test program in this ITS.
Getting late.


Comment 6 Hallvard Furuseth 2014-07-25 09:47:48 UTC
Another problem:

With an MDB_RDONLY environment on Windows, env->me_mapsize
does not seem to affect the actual map size or vice versa.
mdb_env_map() does
		/* Don't set explicit map size, use whatever exists */

If the user sets a mapsize bigger than the current file,
and another process resizes the map and grows the DB into
the new size, then the RDONLY process will not discover
in mdb_txn_renew0() that its map is too small.

If there is a technical reason for the current behavior,
maybe it could create the full map (for the address space)
but record the smaller mapped size in MDB_env, and then
mdb_txn_renew0 could do some mutex-protected magic to
map in the new file segment if the file grew.  That
should avoid surprising the user.

Comment 7 Howard Chu 2014-07-25 10:54:37 UTC
h.b.furuseth@usit.uio.no wrote:
> Another problem:
>
> With an MDB_RDONLY environment on Windows, env->me_mapsize
> does not seem to affect the actual map size or vice versa.
> mdb_env_map() does
> 		/* Don't set explicit map size, use whatever exists */
>
> If the user sets a mapsize bigger than the current file,

That is impossible on Windows. The OS will not allow a mapsize bigger than the 
filesize.

> and another process resizes the map and grows the DB into
> the new size, then the RDONLY process will not discover
> in mdb_txn_renew0() that its map is too small.
>
> If there is a technical reason for the current behavior,

That change was made to accomodate opening of an mdb_copy'd DB file in 
read-only mode. Since mdb_copy only copies the used pages, the resulting file 
will be smaller than the configured mapsize. If you attempt to map this file 
as-is with the configured mapsize, it will fail. You would have to resize the 
file, but since the env is opened read-only, you should not make any such 
change to the file.

We could alternatively change mdb_copy to explicitly set the filesize to match 
the mapsize at the end of the copy, and then this particular hack wouldn't be 
needed.

> maybe it could create the full map (for the address space)
> but record the smaller mapped size in MDB_env, and then
> mdb_txn_renew0 could do some mutex-protected magic to
> map in the new file segment if the file grew.  That
> should avoid surprising the user.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 8 Hallvard Furuseth 2014-07-26 10:42:21 UTC
On 25/07/14 12:54, Howard Chu wrote:
> h.b.furuseth@usit.uio.no wrote:
>> Another problem:
>>
>> With an MDB_RDONLY environment on Windows, env->me_mapsize
>> does not seem to affect the actual map size or vice versa.
>> mdb_env_map() does
>>         /* Don't set explicit map size, use whatever exists */
>>
>> If the user sets a mapsize bigger than the current file,
>
> That is impossible on Windows. The OS will not allow a mapsize bigger
> than the filesize.

Clarification: If the Windows user sets MDB_env.me_mapsize
bigger than the file size, then as far as I can tell the read-
only MDB_env will think the map is bigger than it is.  Then:

>> and another process resizes the map and grows the DB into
>> the new size, then the RDONLY process will not discover
>> in mdb_txn_renew0() that its map is too small. (...)
>
> That change was made to accomodate opening of an mdb_copy'd DB file in
> read-only mode. Since mdb_copy only copies the used pages, the resulting
> file will be smaller than the configured mapsize. If you attempt to map
> this file as-is with the configured mapsize, it will fail. You would
> have to resize the file, but since the env is opened read-only, you
> should not make any such change to the file.
>
> We could alternatively change mdb_copy to explicitly set the filesize to
> match the mapsize at the end of the copy, and then this particular hack
> wouldn't be needed.

With the several ways to play games with mapsize, the hack will
be needed anyway.  So I expect lmdb should implement and document:

   On Windows, read-only environments ignore the configured mapsize
   and instead use the file size as of the last mdb_env_set_mapsize()
   or mdb_env_open().  MDB_envinfo.me_mapsize will show that size.
   [So does MDB_env.me_mapsize.]
   For a bigger map, grow the file first - e.g. by opening the
   environment with write access and a bigger mapsize.

Setting filesize=mapsize: MDB_env.me_mapsize or MDB_meta.mm_mapsize?
(I don't really care though, since the "set mapsize=filesize"
suggestion addresses my original concern.)

OTOH mdb_copy seems the natural place for a "shrink mm_mapsize"
option, since it can only grow in an existing file.  E.g. set it
to the minimum possible or to MDB_env.me_mapsize and let the user
grow it to a bigger size later if needed.

Comment 9 OpenLDAP project 2014-08-01 21:04:51 UTC
fixed in mdb.master
Comment 10 Quanah Gibson-Mount 2014-12-11 01:07:36 UTC
changed state Test to Closed