
(ITS#7589) MDB nodesize issues



Full_Name: Hallvard B Furuseth
Version: 2.4.35
OS: Linux x86_64
URL: 
Submission from: (NULL) (2001:700:100:556::233)
Submitted by: hallvard


Nodes of size MDB_env.me_nodemax-1 do not go in overflow pages,
but are too big for two of them to fit in the same page. Because:
- LEAFSIZE() does not round up to a multiple of 2.
- MDB_env.me_nodemax is described as /** Max size of a node on a page */
  (i.e. non-overflow page), but actually it's the minimum node size
  which mdb_node_add() *will* put in an overflow page.

I don't see a reason to treat nodes of size 2k-1 and 2k differently
when seen "from the outside", so I think LEAFSIZE should have been

#define LEAFSIZE(k, d) ((1 + NODESIZE + (k)->mv_size + (d)->mv_size) & -2)

Also mdb_leaf_size():
    sz = LEAFSIZE(key, data);
    if (sz >= env->me_nodemax)
        sz = (1 + NODESIZE + sizeof(pgno_t) + key->mv_size) & -2;
    return sz + sizeof(indx_t);
and a corresponding fix in mdb_node_add().
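
To illustrate the mismatch, here is a minimal standalone sketch.  It
does not use the real mdb.c definitions: PAGE_SPACE, SLOT_SIZE and
NODEMAX are made-up numbers, and the helper names are hypothetical.
It only assumes that nodes are stored at even sizes and that each
node also costs one index slot in the page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical numbers for illustration; not copied from mdb.c. */
enum {
    PAGE_SPACE = 4080,           /* assumed usable bytes in a page */
    SLOT_SIZE  = 2,              /* assumed per-node index slot, like sizeof(indx_t) */
    NODEMAX    = PAGE_SPACE / 2  /* assumed overflow threshold: 2040 */
};

/* Round up to an even size, the same idiom as (1 + x) & -2. */
size_t round_even(size_t n) { return (n + 1) & ~(size_t)1; }

/* Overflow decision made on the unrounded node size. */
int overflows_raw(size_t node) { return node >= (size_t)NODEMAX; }

/* Overflow decision made after rounding, as the report suggests. */
int overflows_rounded(size_t node) { return round_even(node) >= (size_t)NODEMAX; }

/* Two equal nodes share a page only if their even-aligned sizes
 * plus their index slots fit in the usable space. */
int two_fit(size_t node)
{
    return 2 * (round_even(node) + (size_t)SLOT_SIZE) <= (size_t)PAGE_SPACE;
}

void nodesize_demo(void)
{
    size_t odd = NODEMAX - 1;                 /* 2039 */

    assert(!overflows_raw(odd));              /* stays in the page...        */
    assert(!two_fit(odd));                    /* ...but two of them do not fit */
    assert(overflows_rounded(odd));           /* rounding first closes the gap */

    assert(!overflows_rounded(NODEMAX - 2));  /* 2038 is still a normal node */
    assert(two_fit(NODEMAX - 2));             /* and two of those do fit     */
}
```

With these assumed numbers, only the odd size just below the threshold
falls into the gap; rounding before the comparison treats sizes 2k-1
and 2k the same way.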

Those changes crash test001.  I don't know what else would need
fixing.  Subpage sizes for one thing.

OTOH if you add a bunch of slightly smaller nodes, mdb will put
most of them in separate pages anyway without MDB_APPEND.  To keep
down rebalancing when adding nodes in the middle?  Or is this
another bug?  So I don't know if more than the me_nodemax doc needs
a fix.  And maybe a rename to something which fits what it does.

...and also the doc of mdb_branch_size().  "Sizes are always
rounded up to an even number of bytes, to guarantee 2-byte
alignment" happens somewhere, but not in mdb_branch_size() which
happily returns odd sizes.  But it works fine since it's only
used as SIZELEFT()<mdb_branch_size() where SIZELEFT() does return
an even size. Rounding branch_size up would make no difference.
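
The "no difference" claim can be checked with a small standalone
sketch.  The function names are hypothetical and nothing here comes
from mdb.c; it only assumes the comparison has the form left < need,
with left always even.

```c
#include <stddef.h>

/* Round up to an even number of bytes. */
size_t round_up_even(size_t n) { return (n + 1) & ~(size_t)1; }

/* Returns 1 if, for every even `left` and every `need` up to `limit`,
 * comparing against the raw size and against the rounded size gives
 * the same answer. */
int rounding_is_irrelevant(size_t limit)
{
    for (size_t left = 0; left <= limit; left += 2)
        for (size_t need = 0; need <= limit; need++)
            if ((left < need) != (left < round_up_even(need)))
                return 0;
    return 1;
}
```

The two comparisons only disagree when left is odd (for example
left = 3, need = 3), which is why the evenness of SIZELEFT() matters.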