Full_Name: Hallvard B Furuseth Version: LMDB 0.9.11 OS: Linux x86_64 URL: Submission from: (NULL) (129.240.6.254) Submitted by: hallvard Mapsize changes do not work as described, do not reliably store the mapsize in the map, and it's hard to see how it is supposed to work. E.g.: - Open an environment twice, in processes X and Y. - X grows the map and writes (commits) something to the DB. That MDB_meta gets the new mapsize. - Y writes to the DB. It does not get MDB_MAP_RESIZED like the doc says, nor does it carry forward X's MDB_meta.mm_mapsize change. - Process Z opens the environment without doing set_mapsize(), and gets the original mapsize from the MDB_meta written by Y. For that matter, from reading the doc I'd expect a mapsize change to commit a txn with the new mapsize. There's no mention that the change (and the MDB_MAP_RESIZED) will wait for something to be committed. mdb_txn_commit() writes nothing if the txn didn't change anything. It needs to notice that there is a mapsize change to write. The doc talks about shrinking the map, but reduced mapsizes are not written to the datafile. Only increases are written. All in all, it looks to me like _changing_ the mapsize should be an operation on a write transaction or invoke a write transaction, while setting the size or catching up with a mapsize change can be an environment operation. That way it would be possible to make sense of it. A txn can do it when it has no cursors and no dirty WRITEMAP pages (or WRITEMAP could spill all pages first). BTW, I don't see the point of conditionally avoiding to write the mapsize in mdb_env_write_meta() when full page gets written to disk anyway - as long as txn_begin() stashes the mapsize from the original meta so it knows what to write. (It need not obey the mapsize at that point, but it must carry a change forward.)
h.b.furuseth@usit.uio.no wrote: > Full_Name: Hallvard B Furuseth > Version: LMDB 0.9.11 > OS: Linux x86_64 > URL: > Submission from: (NULL) (129.240.6.254) > Submitted by: hallvard > > > Mapsize changes do not work as described, do not reliably store the > mapsize in the map, and it's hard to see how it is supposed to work. > E.g.: > - Open an environment twice, in processes X and Y. > - X grows the map and writes (commits) something to the DB. That > MDB_meta gets the new mapsize. > - Y writes to the DB. It does not get MDB_MAP_RESIZED like the doc > says, nor does it carry forward X's MDB_meta.mm_mapsize change. The doc says the caller of set_mapsize is required to make sure there are no active transactions when it is called. As such, X failed this requirement, and this sequence of events is explicitly unsupported. If Y doesn't start its write txn until after X finishes, then Y will see the new size. > - Process Z opens the environment without doing set_mapsize(), > and gets the original mapsize from the MDB_meta written by Y. > > For that matter, from reading the doc I'd expect a mapsize change to > commit a txn with the new mapsize. There's no mention that the change > (and the MDB_MAP_RESIZED) will wait for something to be committed. > > mdb_txn_commit() writes nothing if the txn didn't change anything. > It needs to notice that there is a mapsize change to write. > > The doc talks about shrinking the map, but reduced mapsizes are not > written to the datafile. Only increases are written. > > All in all, it looks to me like _changing_ the mapsize should be an > operation on a write transaction or invoke a write transaction, while > setting the size or catching up with a mapsize change can be an > environment operation. That way it would be possible to make sense of > it. A txn can do it when it has no cursors and no dirty WRITEMAP > pages (or WRITEMAP could spill all pages first). > > BTW, I don't see the point of conditionally avoiding to write the > mapsize in mdb_env_write_meta() when full page gets written to disk > anyway - as long as txn_begin() stashes the mapsize from the original > meta so it knows what to write. (It need not obey the mapsize at that > point, but it must carry a change forward.) > > -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Should have enclosed a test program. Here: http://folk.uio.no/hbf/testsize.c Output: process #0: open env process #1: open env process #0: write txn, mapsize 204800 process #1: write txn, mapsize 40960 process #2: open env, write txn, mapsize 40960 process #3: open env, write txn, mapsize 204800 Summarizing a reply which got sent privately instead of to ITS: On 2014-01-24 00:07, Howard Chu wrote: > The doc says the caller of set_mapsize is required to make sure there > are no active transactions when it is called. As such, X failed this > requirement, and this sequence of events is explicitly unsupported. No, I'm talking about X changing the mapsize when there is no txn, then afterwards doing a write txn so the mapsize gets written. > If Y doesn't start its write txn until after X finishes, then Y will > see the new size. It doesn't, that's the point. Only txn_commit pays attention, and it checks the size which existed in the wrong metapage. Come to think of it, set_mapsize(); env_open(); txn begin/end should do the same. Y will either way not pay attention to the new mapsize. -- Hallvard
changed notes changed state Open to Test moved from Incoming to Software Bugs
Hallvard Breien Furuseth wrote: > Should have enclosed a test program. Here: > http://folk.uio.no/hbf/testsize.c > Output: > process #0: open env > process #1: open env > process #0: write txn, mapsize 204800 > process #1: write txn, mapsize 40960 > process #2: open env, write txn, mapsize 40960 > process #3: open env, write txn, mapsize 204800 > > > Summarizing a reply which got sent privately instead of to ITS: > > On 2014-01-24 00:07, Howard Chu wrote: >> The doc says the caller of set_mapsize is required to make sure there >> are no active transactions when it is called. As such, X failed this >> requirement, and this sequence of events is explicitly unsupported. > > No, I'm talking about X changing the mapsize when there is no txn, > then afterwards doing a write txn so the mapsize gets written. > >> If Y doesn't start its write txn until after X finishes, then Y will >> see the new size. > > It doesn't, that's the point. Only txn_commit pays attention, > and it checks the size which existed in the wrong metapage. > > Come to think of it, set_mapsize(); env_open(); txn begin/end > should do the same. Y will either way not pay attention to > the new mapsize. > Fixed now in mdb.master -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
On 09/07/14 04:19, hyc@symas.com wrote: > Fixed now in mdb.master I'm afraid it's broken worse now. Splitting this ITS up in several component problems, copied in part from IRC. The recent fixes seem broken. Mapsize is no longer checked against the highest written page number. And if one MDB_env shrinks the mapsize, another co-existing MDB_env will not have MDB_RESIZING set, so txn_renew0 will not return MDB_MAP_RESIZED nor will write_meta resize the mapsize up again. Thus it will run with a bigger mapsize than the MDB_env which shrunk mapsize, and it can write pages beyond what that other MDB_env has mapped. I think LMDB 0.9.13 tried to implement this: mdb_env_set_mapsize() can grow but not shrink the mapsize written to metapages. lmdb writes the biggest mapsize it ever saw, to the metapages. But an MDB_env can run with a smaller mapsize than the metapage says, as long as the snapshots do not use pages beyond it. So a metapage mapsize is just a default. Seems a rule we can live with, but if so it should be documented. However it did not and does not always write a new mapsize to both metapages. In LDMB 0.9.13, a fix would be for mdb_env_write_meta() to always write mapsize = max(other metapage:mapsize, env:mapsize): http://folk.uio.no/hbf/mapsize.0.9.13.patch In current LMDB, it would be http://folk.uio.no/hbf/mapsize.patch I have not tested this against the test program in this ITS. Getting late.
Another problem: With an MDB_RDONLY environment on Windows, env->me_mapsize does not seem to affect the actual map size or vice versa. mdb_env_map() does /* Don't set explicit map size, use whatever exists */ If the user sets a mapsize bigger than the current file, and another process resizes the map and grows the DB into the new size, then the RDONLY process will not discover in mdb_txn_renew0() that its map is too small. If there is a technical reason for the current behavior, maybe it could create the full map (for the address space) but record the smaller mapped size in MDB_env, and then mdb_txn_renew0 could do some mutex-protected magic to map in the new file segment if the file grew. That should avoid surprising the user.
h.b.furuseth@usit.uio.no wrote: > Another problem: > > With an MDB_RDONLY environment on Windows, env->me_mapsize > does not seem to affect the actual map size or vice versa. > mdb_env_map() does > /* Don't set explicit map size, use whatever exists */ > > If the user sets a mapsize bigger than the current file, That is impossible on Windows. The OS will not allow a mapsize bigger than the filesize. > and another process resizes the map and grows the DB into > the new size, then the RDONLY process will not discover > in mdb_txn_renew0() that its map is too small. > > If there is a technical reason for the current behavior, That change was made to accomodate opening of an mdb_copy'd DB file in read-only mode. Since mdb_copy only copies the used pages, the resulting file will be smaller than the configured mapsize. If you attempt to map this file as-is with the configured mapsize, it will fail. You would have to resize the file, but since the env is opened read-only, you should not make any such change to the file. We could alternatively change mdb_copy to explicitly set the filesize to match the mapsize at the end of the copy, and then this particular hack wouldn't be needed. > maybe it could create the full map (for the address space) > but record the smaller mapped size in MDB_env, and then > mdb_txn_renew0 could do some mutex-protected magic to > map in the new file segment if the file grew. That > should avoid surprising the user. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
On 25/07/14 12:54, Howard Chu wrote: > h.b.furuseth@usit.uio.no wrote: >> Another problem: >> >> With an MDB_RDONLY environment on Windows, env->me_mapsize >> does not seem to affect the actual map size or vice versa. >> mdb_env_map() does >> /* Don't set explicit map size, use whatever exists */ >> >> If the user sets a mapsize bigger than the current file, > > That is impossible on Windows. The OS will not allow a mapsize bigger > than the filesize. Clarification: If the Windows user sets MDB_env.me_mapsize bigger than the file size, then as far as I can tell the read- only MDB_env will think the map is bigger than it is. Then: >> and another process resizes the map and grows the DB into >> the new size, then the RDONLY process will not discover >> in mdb_txn_renew0() that its map is too small. (...) > > That change was made to accomodate opening of an mdb_copy'd DB file in > read-only mode. Since mdb_copy only copies the used pages, the resulting > file will be smaller than the configured mapsize. If you attempt to map > this file as-is with the configured mapsize, it will fail. You would > have to resize the file, but since the env is opened read-only, you > should not make any such change to the file. > > We could alternatively change mdb_copy to explicitly set the filesize to > match the mapsize at the end of the copy, and then this particular hack > wouldn't be needed. With the several ways to play games with mapsize, the hack will be needed anyway. So I expect lmdb should implement and document: On Windows, read-only environments ignore the configured mapsize and instead use the file size as of the last mdb_env_set_mapsize() or mdb_env_open(). MDB_envinfo.me_mapsize will show that size. [So does MDB_env.me_mapsize.] For a bigger map, grow the file first - e.g. by opening the environment with write access and a bigger mapsize. Setting filesize=mapsize: MDB_env.me_mapsize or MDB_meta.mm_mapsize? (I don't really care though, since the "set mapsize=filesize" suggestion addresses my original concern.) OTOH mdb_copy seems the natural place for a "shrink mm_mapsize" option, since it can only grow in an existing file. E.g. set it to the minimum possible or to MDB_env.me_mapsize and let the user grow it to a bigger size later if needed.
fixed in mdb.master
changed state Test to Closed