Issue 7825 - LMDB: misleading error message
Summary: LMDB: misleading error message
Status: VERIFIED FIXED
Alias: None
Product: LMDB
Classification: Unclassified
Component: liblmdb (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-22 22:49 UTC by timur.kristof@gmail.com
Modified: 2020-03-12 15:54 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description timur.kristof@gmail.com 2014-03-22 22:49:25 UTC
Full_Name: Timur Krist�f
Version: LMDB latest master
OS: Fedora 20
URL: 
Submission from: (NULL) (94.248.245.60)


Hi,

I'm the author of the Node.js LMDB binding and I've found an interesting issue
that might be useful to share with other people who develop bindings or
applications on LMDB.

When you operate on a dbi using a cursor and accidentally close the dbi before
committing the transaction of the cursor, mdb_txn_commit will return
MDB_BAD_VALSIZE. This is quite misleading because the problem doesn't actually
have anything to do with what the error message suggests. ("Too big key/data,
key is empty, or wrong DUPFIXED size")

I suggest to either add a new kind of error code for this situation or to extend
the description of MDB_BAD_VALSIZE.

Here's a small test case which will demonstrate it:
http://pastebin.com/hHwD1srf

Best regards,
Timur Krist�f
Comment 1 Hallvard Furuseth 2014-03-25 09:37:11 UTC
On Sat, 2014-03-22 at 22:49 +0000, timur.kristof@gmail.com wrote:
> When you operate on a dbi using a cursor and accidentally close the dbi before
> committing the transaction of the cursor, mdb_txn_commit will return
> MDB_BAD_VALSIZE. This is quite misleading because the problem doesn't actually
> have anything to do with what the error message suggests. ("Too big key/data,
> key is empty, or wrong DUPFIXED size")
> 
> I suggest to either add a new kind of error code for this situation or to extend
> the description of MDB_BAD_VALSIZE.

Let's see -- here are the options I can see (after an IRC chat):

1) Extending the MDB_BAD_VALSIZE description. Simple. However:

This is documented as a user error, and it's one which lmdb
won't always catch.  If you also called mdb_dbi_open() and the
DBI got reused, lmdb silently updates the wrong named database.

So I'm a bit wary about both of these fixes: Documenting this
case can be misleading if it makes users expect it to be caught.

Unless people know better - it corresponds to EBADF for using a
closed file descriptor, not caught if open() reused it.


2) Catch this case and instead return a generic "something is
wrong" - MDB_BAD_TXN or MDB_INCOMPATIBLE.
And maybe assert(did not happen) to teach users to avoid this.

Needs to be done where md_name is used: Commit/drop(named DB),
page_search(stale named DB), cursor_touch(clean named DB).  Or
pass F_SUBDATA inwards and tweak the "return MDB_BAD_VALSIZE"
statements to detect this.

After playing around a bit, I guess this is my favorite.


3) Leave the misleading message alone, which can send the user
off to a wild goose chase (as described on IRC).


Or we can add code to also catch close followed by open:

4) Put hash(DB name) in MDB_db.md_pad and verify it before
overwriting a named DB.  Costs execution time even when there's
no error.  Must be done in commit, drop and maybe cursor_touch.
(I've pushed a demo patch for just commit.)

5) Add field MDB_dbx.md_dbiseq = DBI usage sequence number,
incremented when dbi_open creates (not reuses) a DBI and in
dbi_close.  Copy it to a new malloced array txn->mt_dbiseqs[]
in write txns, verify it in at least commit and drop.
This means close()-open(same DB) will also be caught, which
seems a good thing since with (4) it will work unreliably:
Only if open() reuses the same DBI for the same DB.

Comment 2 Hallvard Furuseth 2014-03-25 10:08:57 UTC
I wrote:
> Let's see -- here are the options I can see (after an IRC chat):
> (...)
> 2) Catch this case and instead return a generic "something is
> wrong" - MDB_BAD_TXN or MDB_INCOMPATIBLE.
> (...)

Duh, I forgot

(6) Like (2) but create a new error code after all.  Then we can
document in its description that it's not caught reliably.  That
wasn't hard:-)


Comment 3 timur.kristof@gmail.com 2014-03-26 08:22:04 UTC
Hi,

For further information, see the discussion here:
https://github.com/Venemo/node-lmdb/pull/20
We genuinely thought that there had been something wrong with our value
sizes somewhere, so I wasted several hours debugging in the wrong place.
Then, by accident I discovered that it's in fact an issue with V8's GC.

I definitely suggest to choose one of the solutions that don't cost any
execution time when there is no error.

Perhaps mdb_dbi_close could return an error (or crash) when the dbi is
still used by a cursor. That'd eliminate the issue but it'd also mean an
API/ABI break.

Best regards,
Timur




On Tue, Mar 25, 2014 at 10:37 AM, Hallvard Breien Furuseth <
h.b.furuseth@usit.uio.no> wrote:

> On Sat, 2014-03-22 at 22:49 +0000, timur.kristof@gmail.com wrote:
> > When you operate on a dbi using a cursor and accidentally close the dbi
> before
> > committing the transaction of the cursor, mdb_txn_commit will return
> > MDB_BAD_VALSIZE. This is quite misleading because the problem doesn't
> actually
> > have anything to do with what the error message suggests. ("Too big
> key/data,
> > key is empty, or wrong DUPFIXED size")
> >
> > I suggest to either add a new kind of error code for this situation or
> to extend
> > the description of MDB_BAD_VALSIZE.
>
> Let's see -- here are the options I can see (after an IRC chat):
>
> 1) Extending the MDB_BAD_VALSIZE description. Simple. However:
>
> This is documented as a user error, and it's one which lmdb
> won't always catch.  If you also called mdb_dbi_open() and the
> DBI got reused, lmdb silently updates the wrong named database.
>
> So I'm a bit wary about both of these fixes: Documenting this
> case can be misleading if it makes users expect it to be caught.
>
> Unless people know better - it corresponds to EBADF for using a
> closed file descriptor, not caught if open() reused it.
>
>
> 2) Catch this case and instead return a generic "something is
> wrong" - MDB_BAD_TXN or MDB_INCOMPATIBLE.
> And maybe assert(did not happen) to teach users to avoid this.
>
> Needs to be done where md_name is used: Commit/drop(named DB),
> page_search(stale named DB), cursor_touch(clean named DB).  Or
> pass F_SUBDATA inwards and tweak the "return MDB_BAD_VALSIZE"
> statements to detect this.
>
> After playing around a bit, I guess this is my favorite.
>
>
> 3) Leave the misleading message alone, which can send the user
> off to a wild goose chase (as described on IRC).
>
>
> Or we can add code to also catch close followed by open:
>
> 4) Put hash(DB name) in MDB_db.md_pad and verify it before
> overwriting a named DB.  Costs execution time even when there's
> no error.  Must be done in commit, drop and maybe cursor_touch.
> (I've pushed a demo patch for just commit.)
>
> 5) Add field MDB_dbx.md_dbiseq = DBI usage sequence number,
> incremented when dbi_open creates (not reuses) a DBI and in
> dbi_close.  Copy it to a new malloced array txn->mt_dbiseqs[]
> in write txns, verify it in at least commit and drop.
> This means close()-open(same DB) will also be caught, which
> seems a good thing since with (4) it will work unreliably:
> Only if open() reuses the same DBI for the same DB.
>
Comment 4 Howard Chu 2014-03-28 17:36:11 UTC
timur.kristof@gmail.com wrote:

> Here's a small test case which will demonstrate it:
> http://pastebin.com/hHwD1srf

As an aside, this is some pretty horrible code. Why are you using 
MDB_GET_CURRENT after all of your cursor_get calls?

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 5 timur.kristof@gmail.com 2014-03-28 19:18:54 UTC
On Fri Mar 28 2014 18:36:11 GMT+0100 (CET), Howard Chu wrote:
> timur.kristof@gmail.com wrote:
> 
> > Here's a small test case which will demonstrate it:
> > http://pastebin.com/hHwD1srf
> 
> As an aside, this is some pretty horrible code. Why are you using 
> MDB_GET_CURRENT after all of your cursor_get calls?

It truly is horrendous. I wrote it in an attempt to (roughly) reproduce what one of my examples did in javascript. Then when I discovered what the issue really was, I forgot to clean it up before pastebinning. I hope it's still okay as a testcase for this bugreport :) 

 
>    -- Howard Chu
>    CTO, Symas Corp.           http://www.symas.com
>    Director, Highland Sun     http://highlandsun.com/hyc/
>    Chief Architect, OpenLDAP  http://www.openldap.org/project/
>

-- 
Timur

Sent from my Jolla

Comment 6 Howard Chu 2014-07-08 21:36:34 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Comment 7 Howard Chu 2014-07-09 04:36:05 UTC
h.b.furuseth@usit.uio.no wrote:
> 5) Add field MDB_dbx.md_dbiseq = DBI usage sequence number,
> incremented when dbi_open creates (not reuses) a DBI and in
> dbi_close.  Copy it to a new malloced array txn->mt_dbiseqs[]
> in write txns, verify it in at least commit and drop.
> This means close()-open(same DB) will also be caught, which
> seems a good thing since with (4) it will work unreliably:
> Only if open() reuses the same DBI for the same DB.

This is now fixed in mdb.master, using basically this approach. There is a 
master array of DBI sequence numbers in the environment and another array in 
each write transaction. It is verified anywhere the dbx->md_name would get 
used. The sequence number is incremented in dbi_open and when a handle is closed.

Note - it is only checked in write txns, and only a few places (drop, commit, 
cursor_touch, page_search) ever check. There is a new error code MDB_BAD_DBI 
as well.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 8 OpenLDAP project 2014-08-01 21:04:51 UTC
fixed in mdb.master
Comment 9 Quanah Gibson-Mount 2014-12-11 01:06:58 UTC
changed state Test to Closed