Re: (ITS#7789) Unreliable mdb_env_set_mapsize()
h.b.furuseth@usit.uio.no wrote:
> Full_Name: Hallvard B Furuseth
> Version: LMDB 0.9.11
> OS: Linux x86_64
> URL:
> Submission from: (NULL) (129.240.6.254)
> Submitted by: hallvard
>
>
> Mapsize changes do not work as described, do not reliably store the
> mapsize in the map, and it's hard to see how it is supposed to work.
> E.g.:
> - Open an environment twice, in processes X and Y.
> - X grows the map and writes (commits) something to the DB. That
> MDB_meta gets the new mapsize.
> - Y writes to the DB. It does not get MDB_MAP_RESIZED like the doc
> says, nor does it carry forward X's MDB_meta.mm_mapsize change.
The doc says the caller of mdb_env_set_mapsize() must make sure there are no
active transactions when it is called. X failed this requirement, so this
sequence of events is explicitly unsupported.
If Y doesn't start its write txn until after X finishes, then Y will see the
new size.
> - Process Z opens the environment without doing set_mapsize(),
> and gets the original mapsize from the MDB_meta written by Y.
>
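For illustration only, the X/Y/Z sequence reported above can be traced with a
small toy model. This is a sketch under stated assumptions, not LMDB code:
every toy_* name and TOY_* constant is hypothetical, the two alternating meta
pages and the "only write a larger mapsize" rule are taken from the
description in the report, and toy_commit_carry() is just one possible
reading of the "carry a change forward" suggestion.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not LMDB's real structures. */
#define TOY_OLD_SIZE ((size_t)1 << 20)
#define TOY_NEW_SIZE ((size_t)1 << 21)

typedef struct {
    size_t mapsize;       /* stands in for MDB_meta.mm_mapsize */
    unsigned long txnid;  /* txn that last wrote this meta page */
} toy_meta;

typedef struct { toy_meta meta[2]; } toy_file;  /* two alternating metas */
typedef struct { size_t mapsize; } toy_env;     /* per-process view */

/* Index of the most recently written meta page. */
int toy_newest(const toy_file *f)
{
    return f->meta[1].txnid > f->meta[0].txnid;
}

/* Open without set_mapsize(): adopt the size in the newest meta page. */
void toy_open(toy_env *e, const toy_file *f)
{
    e->mapsize = f->meta[toy_newest(f)].mapsize;
}

/* Commit as described in the report: overwrite the older meta page, but
 * touch its mapsize only when this env's mapsize is larger. */
void toy_commit(const toy_env *e, toy_file *f)
{
    int newest = toy_newest(f);
    toy_meta *m = &f->meta[!newest];

    assert(e->mapsize > 0);
    if (e->mapsize > m->mapsize)
        m->mapsize = e->mapsize;
    m->txnid = f->meta[newest].txnid + 1;
}

/* Assumed variant: also carry forward the size from the newest meta. */
void toy_commit_carry(const toy_env *e, toy_file *f)
{
    int newest = toy_newest(f);
    toy_meta *m = &f->meta[!newest];
    size_t prev = f->meta[newest].mapsize;

    m->mapsize = e->mapsize > prev ? e->mapsize : prev;
    m->txnid = f->meta[newest].txnid + 1;
}

/* Runs the X/Y/Z sequence and returns the mapsize Z ends up with. */
size_t toy_scenario(int carry)
{
    toy_file f = { { { TOY_OLD_SIZE, 1 }, { TOY_OLD_SIZE, 0 } } };
    toy_env x, y, z;

    toy_open(&x, &f);
    toy_open(&y, &f);
    x.mapsize = TOY_NEW_SIZE;  /* X grows the map ... */
    toy_commit(&x, &f);        /* ... and commits; its meta gets the size */
    if (carry)
        toy_commit_carry(&y, &f);
    else
        toy_commit(&y, &f);    /* Y commits with its stale size */
    toy_open(&z, &f);          /* Z opens without set_mapsize() */
    return z.mapsize;
}
```

In this model toy_scenario(0) leaves Z with TOY_OLD_SIZE, which is the
outcome the report describes, while toy_scenario(1) leaves Z with
TOY_NEW_SIZE.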
> For that matter, from reading the doc I'd expect a mapsize change to
> commit a txn with the new mapsize. There's no mention that the change
> (and the MDB_MAP_RESIZED) will wait for something to be committed.
>
> mdb_txn_commit() writes nothing if the txn didn't change anything.
> It needs to notice that there is a mapsize change to write.
>
> The doc talks about shrinking the map, but reduced mapsizes are not
> written to the datafile. Only increases are written.
>
> All in all, it looks to me like _changing_ the mapsize should be an
> operation on a write transaction or invoke a write transaction, while
> setting the size or catching up with a mapsize change can be an
> environment operation. That way it would be possible to make sense of
> it. A txn can do it when it has no cursors and no dirty WRITEMAP
> pages (or WRITEMAP could spill all pages first).
>
> BTW, I don't see the point of conditionally skipping the mapsize
> write in mdb_env_write_meta() when the full page gets written to disk
> anyway - as long as txn_begin() stashes the mapsize from the original
> meta so it knows what to write. (It need not obey the mapsize at that
> point, but it must carry a change forward.)
>
>
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/