Full_Name: Timur Krist�f Version: LMDB latest master OS: Fedora 20 URL: Submission from: (NULL) (94.248.245.60) Hi, I'm the author of the Node.js LMDB binding and I've found an interesting issue that might be useful to share with other people who develop bindings or applications on LMDB. When you operate on a dbi using a cursor and accidentally close the dbi before committing the transaction of the cursor, mdb_txn_commit will return MDB_BAD_VALSIZE. This is quite misleading because the problem doesn't actually have anything to do with what the error message suggests. ("Too big key/data, key is empty, or wrong DUPFIXED size") I suggest to either add a new kind of error code for this situation or to extend the description of MDB_BAD_VALSIZE. Here's a small test case which will demonstrate it: http://pastebin.com/hHwD1srf Best regards, Timur Krist�f
On Sat, 2014-03-22 at 22:49 +0000, timur.kristof@gmail.com wrote: > When you operate on a dbi using a cursor and accidentally close the dbi before > committing the transaction of the cursor, mdb_txn_commit will return > MDB_BAD_VALSIZE. This is quite misleading because the problem doesn't actually > have anything to do with what the error message suggests. ("Too big key/data, > key is empty, or wrong DUPFIXED size") > > I suggest to either add a new kind of error code for this situation or to extend > the description of MDB_BAD_VALSIZE. Let's see -- here are the options I can see (after an IRC chat): 1) Extending the MDB_BAD_VALSIZE description. Simple. However: This is documented as a user error, and it's one which lmdb won't always catch. If you also called mdb_dbi_open() and the DBI got reused, lmdb silently updates the wrong named database. So I'm a bit wary about both of these fixes: Documenting this case can be misleading if it makes users expect it to be caught. Unless people know better - it corresponds to EBADF for using a closed file descriptor, not caught if open() reused it. 2) Catch this case and instead return a generic "something is wrong" - MDB_BAD_TXN or MDB_INCOMPATIBLE. And maybe assert(did not happen) to teach users to avoid this. Needs to be done where md_name is used: Commit/drop(named DB), page_search(stale named DB), cursor_touch(clean named DB). Or pass F_SUBDATA inwards and tweak the "return MDB_BAD_VALSIZE" statements to detect this. After playing around a bit, I guess this is my favorite. 3) Leave the misleading message alone, which can send the user off to a wild goose chase (as described on IRC). Or we can add code to also catch close followed by open: 4) Put hash(DB name) in MDB_db.md_pad and verify it before overwriting a named DB. Costs execution time even when there's no error. Must be done in commit, drop and maybe cursor_touch. (I've pushed a demo patch for just commit.) 5) Add field MDB_dbx.md_dbiseq = DBI usage sequence number, incremented when dbi_open creates (not reuses) a DBI and in dbi_close. Copy it to a new malloced array txn->mt_dbiseqs[] in write txns, verify it in at least commit and drop. This means close()-open(same DB) will also be caught, which seems a good thing since with (4) it will work unreliably: Only if open() reuses the same DBI for the same DB.
I wrote: > Let's see -- here are the options I can see (after an IRC chat): > (...) > 2) Catch this case and instead return a generic "something is > wrong" - MDB_BAD_TXN or MDB_INCOMPATIBLE. > (...) Duh, I forgot (6) Like (2) but create a new error code after all. Then we can document in its description that it's not caught reliably. That wasn't hard:-)
Hi, For further information, see the discussion here: https://github.com/Venemo/node-lmdb/pull/20 We genuinely thought that there had been something wrong with our value sizes somewhere, so I wasted several hours debugging in the wrong place. Then, by accident I discovered that it's in fact an issue with V8's GC. I definitely suggest to choose one of the solutions that don't cost any execution time when there is no error. Perhaps mdb_dbi_close could return an error (or crash) when the dbi is still used by a cursor. That'd eliminate the issue but it'd also mean an API/ABI break. Best regards, Timur On Tue, Mar 25, 2014 at 10:37 AM, Hallvard Breien Furuseth < h.b.furuseth@usit.uio.no> wrote: > On Sat, 2014-03-22 at 22:49 +0000, timur.kristof@gmail.com wrote: > > When you operate on a dbi using a cursor and accidentally close the dbi > before > > committing the transaction of the cursor, mdb_txn_commit will return > > MDB_BAD_VALSIZE. This is quite misleading because the problem doesn't > actually > > have anything to do with what the error message suggests. ("Too big > key/data, > > key is empty, or wrong DUPFIXED size") > > > > I suggest to either add a new kind of error code for this situation or > to extend > > the description of MDB_BAD_VALSIZE. > > Let's see -- here are the options I can see (after an IRC chat): > > 1) Extending the MDB_BAD_VALSIZE description. Simple. However: > > This is documented as a user error, and it's one which lmdb > won't always catch. If you also called mdb_dbi_open() and the > DBI got reused, lmdb silently updates the wrong named database. > > So I'm a bit wary about both of these fixes: Documenting this > case can be misleading if it makes users expect it to be caught. > > Unless people know better - it corresponds to EBADF for using a > closed file descriptor, not caught if open() reused it. > > > 2) Catch this case and instead return a generic "something is > wrong" - MDB_BAD_TXN or MDB_INCOMPATIBLE. > And maybe assert(did not happen) to teach users to avoid this. > > Needs to be done where md_name is used: Commit/drop(named DB), > page_search(stale named DB), cursor_touch(clean named DB). Or > pass F_SUBDATA inwards and tweak the "return MDB_BAD_VALSIZE" > statements to detect this. > > After playing around a bit, I guess this is my favorite. > > > 3) Leave the misleading message alone, which can send the user > off to a wild goose chase (as described on IRC). > > > Or we can add code to also catch close followed by open: > > 4) Put hash(DB name) in MDB_db.md_pad and verify it before > overwriting a named DB. Costs execution time even when there's > no error. Must be done in commit, drop and maybe cursor_touch. > (I've pushed a demo patch for just commit.) > > 5) Add field MDB_dbx.md_dbiseq = DBI usage sequence number, > incremented when dbi_open creates (not reuses) a DBI and in > dbi_close. Copy it to a new malloced array txn->mt_dbiseqs[] > in write txns, verify it in at least commit and drop. > This means close()-open(same DB) will also be caught, which > seems a good thing since with (4) it will work unreliably: > Only if open() reuses the same DBI for the same DB. >
timur.kristof@gmail.com wrote: > Here's a small test case which will demonstrate it: > http://pastebin.com/hHwD1srf As an aside, this is some pretty horrible code. Why are you using MDB_GET_CURRENT after all of your cursor_get calls? -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
On Fri Mar 28 2014 18:36:11 GMT+0100 (CET), Howard Chu wrote: > timur.kristof@gmail.com wrote: > > > Here's a small test case which will demonstrate it: > > http://pastebin.com/hHwD1srf > > As an aside, this is some pretty horrible code. Why are you using > MDB_GET_CURRENT after all of your cursor_get calls? It truly is horrendous. I wrote it in an attempt to (roughly) reproduce what one of my examples did in javascript. Then when I discovered what the issue really was, I forgot to clean it up before pastebinning. I hope it's still okay as a testcase for this bugreport :) > -- Howard Chu > CTO, Symas Corp. http://www.symas.com > Director, Highland Sun http://highlandsun.com/hyc/ > Chief Architect, OpenLDAP http://www.openldap.org/project/ > -- Timur Sent from my Jolla
changed notes changed state Open to Test moved from Incoming to Software Bugs
h.b.furuseth@usit.uio.no wrote: > 5) Add field MDB_dbx.md_dbiseq = DBI usage sequence number, > incremented when dbi_open creates (not reuses) a DBI and in > dbi_close. Copy it to a new malloced array txn->mt_dbiseqs[] > in write txns, verify it in at least commit and drop. > This means close()-open(same DB) will also be caught, which > seems a good thing since with (4) it will work unreliably: > Only if open() reuses the same DBI for the same DB. This is now fixed in mdb.master, using basically this approach. There is a master array of DBI sequence numbers in the environment and another array in each write transaction. It is verified anywhere the dbx->md_name would get used. The sequence number is incremented in dbi_open and when a handle is closed. Note - it is only checked in write txns, and only a few places (drop, commit, cursor_touch, page_search) ever check. There is a new error code MDB_BAD_DBI as well. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
fixed in mdb.master
changed state Test to Closed