Re: (ITS#7789) Unreliable mdb_env_set_mapsize()



h.b.furuseth@usit.uio.no wrote:
> Full_Name: Hallvard B Furuseth
> Version: LMDB 0.9.11
> OS: Linux x86_64
> URL:
> Submission from: (NULL) (129.240.6.254)
> Submitted by: hallvard
>
>
> Mapsize changes do not work as described, do not reliably store the
> mapsize in the map, and it's hard to see how it is supposed to work.
> E.g.:
> - Open an environment twice, in processes X and Y.
> - X grows the map and writes (commits) something to the DB.  That
>    MDB_meta gets the new mapsize.
> - Y writes to the DB.  It does not get MDB_MAP_RESIZED like the doc
>    says, nor does it carry forward X's MDB_meta.mm_mapsize change.

The doc says the caller of mdb_env_set_mapsize() must make sure there are no
active transactions when it is called. X failed this requirement, so this
sequence of events is explicitly unsupported.

If Y doesn't start its write txn until after X finishes, then Y will see the 
new size.
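
As a rough illustration of that precondition, here is a minimal sketch of an
application-side guard. Everything in it (struct app_env, guarded_set_mapsize(),
guarded_txn_begin(), guarded_txn_end()) is hypothetical and not part of LMDB;
it only tracks transactions opened by the calling process and says nothing
about other processes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical application-side bookkeeping; NOT an LMDB structure. */
struct app_env {
    size_t mapsize;      /* map size this process currently uses */
    int    active_txns;  /* transactions this process has open */
};

/* Count a transaction as open. */
void guarded_txn_begin(struct app_env *env)
{
    env->active_txns++;
}

/* Count a transaction as finished (committed or aborted). */
void guarded_txn_end(struct app_env *env)
{
    assert(env->active_txns > 0);
    env->active_txns--;
}

/*
 * Change the map size only when no transaction is open in this process.
 * Returns 0 on success, -1 if the precondition is violated; in the
 * failure case the size is left untouched.
 */
int guarded_set_mapsize(struct app_env *env, size_t size)
{
    if (env->active_txns != 0)
        return -1;
    env->mapsize = size;
    return 0;
}
```

A real application would call mdb_env_set_mapsize() at the point where this
sketch assigns env->mapsize, and would begin its next transaction only after
that call returns.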

> - Process Z opens the environment without doing set_mapsize(),
>    and gets the original mapsize from the MDB_meta written by Y.
>
> For that matter, from reading the doc I'd expect a mapsize change to
> commit a txn with the new mapsize.  There's no mention that the change
> (and the MDB_MAP_RESIZED) will wait for something to be committed.
>
> mdb_txn_commit() writes nothing if the txn didn't change anything.
> It needs to notice that there is a mapsize change to write.
>
> The doc talks about shrinking the map, but reduced mapsizes are not
> written to the datafile.  Only increases are written.
>
> All in all, it looks to me like _changing_ the mapsize should be an
> operation on a write transaction or invoke a write transaction, while
> setting the size or catching up with a mapsize change can be an
> environment operation.  That way it would be possible to make sense of
> it.  A txn can do it when it has no cursors and no dirty WRITEMAP
> pages (or WRITEMAP could spill all pages first).
>
> BTW, I don't see the point of conditionally avoiding writing the
> mapsize in mdb_env_write_meta() when the full page gets written to disk
> anyway - as long as txn_begin() stashes the mapsize from the original
> meta so it knows what to write.  (It need not obey the mapsize at that
> point, but it must carry a change forward.)
>
>
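
To make the carry-forward suggestion in the last quoted paragraph concrete,
here is a toy model of it. This is not LMDB's code: struct toy_meta, struct
toy_proc and the three functions are invented for illustration, a single
"latest meta" stands in for the real meta pages, and taking the larger of the
two sizes is my own simplification that only covers growth, not shrinking.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; the real MDB_meta and MDB_env carry many more fields. */
struct toy_meta {
    size_t        mapsize;  /* map size recorded in the newest meta */
    unsigned long txnid;    /* id of the txn that wrote it */
};

struct toy_proc {
    size_t env_mapsize;      /* size this process has configured */
    size_t stashed_mapsize;  /* size seen in the meta at txn begin */
};

/* Begin a write txn: stash the mapsize recorded in the newest meta. */
void toy_write_txn_begin(struct toy_proc *p, const struct toy_meta *latest)
{
    p->stashed_mapsize = latest->mapsize;
}

/* Commit as the report describes it: write this process's own size only. */
void toy_commit_own_size(const struct toy_proc *p, struct toy_meta *latest)
{
    latest->mapsize = p->env_mapsize;
    latest->txnid++;
}

/* Commit with carry-forward: never write less than the stashed size. */
void toy_commit_carry_forward(const struct toy_proc *p, struct toy_meta *latest)
{
    latest->mapsize = (p->env_mapsize > p->stashed_mapsize)
                          ? p->env_mapsize
                          : p->stashed_mapsize;
    latest->txnid++;
}
```

In this model, if X (env_mapsize 4096) commits and then Y (env_mapsize 1024)
commits with toy_commit_own_size(), the meta ends up at 1024 again, which is
the size a later process Z would pick up. With toy_commit_carry_forward(), Y's
commit keeps 4096.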


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/