
Re: openldap.git branch mdb.master updated. 0ce6bb4be0034120c850917bc4f59b4d4efc1432

openldap-commit2devel@OpenLDAP.org writes:
> commit aff2693fc0721df4ccb6ceb357f80501c413ed38
> Author: Howard Chu <hyc@symas.com>
> Date:   Mon Dec 10 12:16:50 2012 -0800
>     ITS#7455 simplify
>     Don't try to reclaim overflow pages while operating on
>     the freelist (for now). The circular dependencies are much like
>     the single-page case, but worse. Maybe look into this in the
>     future, but it's not absolutely necessary now.

Suggestions to reduce freelist changes during commit:

Let a freelist entry steal page numbers listed in the next entries.
Then mdb_page_alloc can grab more old pages without deleting/updating
their entries and producing new dirty pages. Next txn does the updates.

Preallocate the final MDB_oldpages with MDB_RESERVE in mdb_txn_commit()
and leave some room to spare.  Then use page numbers from it and/or
steal new ones at need.

BTW, could MDB offer an MDB_RESERVE2 which says "give me data->mv_size
bytes plus as much more as will fit without growing the page"?
And MDB_RESERVE2_SHRINK which shrinks the size to the final size.

Stolen pages -- one way would be to search for particular pages to steal,
and list the stolen ones at the end of the freelist entry.
Or: Stealing only from the end of the previous entry/entries should be
simpler, but doesn't let us choose some specific pages to steal in order
to gain a big enough contiguous page range:
  typedef struct MDB_freelist_entry { /* freelist entry in the DB */
      short mf_len;               /* saved length */
      short mf_stolen_entries;    /* #fully stolen entries  */
      short mf_nextlen;           /* 0 or remaining length of next entry */
      MDB_ID mf_pages[];          /* length mf_len. */
  } MDB_freelist_entry;
Thus, if the free DB contains
    (txnid_t)123 => { .mf_stolen_entries = 1, .mf_nextlen = 7 }
    (txnid_t)124 => { ... }
    (txnid_t)125 => { .mf_len = 20 }
then mdb should henceforth skip entry#124 and entry#125.mf_pages[7..19].
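
To illustrate how a reader of the free DB might apply those fields, here is a
minimal sketch. It assumes the entries are visited in key (txnid) order, that
the steal fields of a fully stolen entry are ignored, and that a partially
stolen entry may still steal from its own successors. The MDB_ID typedef is a
stand-in, and freelist_usable_lengths() is a hypothetical helper, not an
existing mdb function. It only looks at the header fields, not at mf_pages[].

```c
#include <assert.h>
#include <stddef.h>

typedef size_t MDB_ID;              /* stand-in for the real typedef */

typedef struct MDB_freelist_entry { /* freelist entry in the DB */
    short mf_len;               /* saved length */
    short mf_stolen_entries;    /* #fully stolen entries */
    short mf_nextlen;           /* 0 or remaining length of next entry */
    MDB_ID mf_pages[];          /* length mf_len */
} MDB_freelist_entry;

/*
 * Hypothetical helper: for n entries in key order, set usable[i] to the
 * number of leading page numbers in ents[i]->mf_pages[] that are still
 * free.  Fully stolen entries get 0.
 */
static void
freelist_usable_lengths(const MDB_freelist_entry *const *ents, size_t n,
                        short *usable)
{
    size_t i = 0;
    short pending = -1;     /* length imposed by an earlier entry, or -1 */

    while (i < n) {
        const MDB_freelist_entry *e = ents[i];
        size_t j = i + 1;
        short k;

        /* A partially stolen entry keeps only its first `pending` pages. */
        usable[i] = (pending >= 0) ? pending : e->mf_len;

        /* The next mf_stolen_entries entries are skipped entirely. */
        for (k = 0; k < e->mf_stolen_entries && j < n; k++, j++)
            usable[j] = 0;

        /* The entry after those keeps mf_nextlen pages, if nonzero. */
        pending = (e->mf_nextlen != 0) ? e->mf_nextlen : -1;
        i = j;
    }
    assert(i == n);
}
```

With the three entries from the example above (123, 124, 125), this yields
0 usable pages for entry#124 and 7 for entry#125, i.e. mf_pages[0..6].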

A simple variant of page ranges, to save space and simplify range handling:
  /* Page range: (pagecount << MDB_PGNO_BITS) | (pageno + pagecount) */
  typedef pgno_t mdb_pages_t;

Lone pages get pagecount=1.  With MDB_PGCOUNT_BITS = (64bit ? 19 : 12)
and page size 4096, that limits MDB to a 128 petabyte DB and 2G entry
size.  Or 4G database and 16M entry size on 32-bit machines.  (I'd call
limiting the entry size a bonus compared to today's mdb: The current
freelist doesn't exactly handle 2 billion freed pages gracefully.)
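
A rough sketch of that encoding, to make the bit layout concrete. It assumes
pgno_t is a word-sized unsigned integer (size_t here) and that MDB_PGNO_BITS
is simply the word size minus MDB_PGCOUNT_BITS. The mdb_pages_*() helper names
are made up for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef size_t pgno_t;          /* assumption: word-sized page number */
typedef pgno_t mdb_pages_t;

/* 19 count bits on 64-bit machines, 12 on 32-bit ones */
#define MDB_PGCOUNT_BITS (sizeof(pgno_t) * CHAR_BIT == 64 ? 19 : 12)
#define MDB_PGNO_BITS    (sizeof(pgno_t) * CHAR_BIT - MDB_PGCOUNT_BITS)
#define MDB_PGNO_MASK    (((pgno_t)1 << MDB_PGNO_BITS) - 1)

/* Page range: (pagecount << MDB_PGNO_BITS) | (pageno + pagecount) */
static mdb_pages_t
mdb_pages_make(pgno_t pageno, pgno_t pagecount)
{
    /* The count must fit in the high bits, the end page in the low bits. */
    assert(pagecount >= 1 && pagecount < ((pgno_t)1 << MDB_PGCOUNT_BITS));
    assert(pageno <= MDB_PGNO_MASK - pagecount);
    return (pagecount << MDB_PGNO_BITS) | (pageno + pagecount);
}

/* Number of pages in the range (high bits). */
static pgno_t
mdb_pages_count(mdb_pages_t r)
{
    return r >> MDB_PGNO_BITS;
}

/* One past the last page of the range (low bits). */
static pgno_t
mdb_pages_end(mdb_pages_t r)
{
    return r & MDB_PGNO_MASK;
}

/* First page of the range. */
static pgno_t
mdb_pages_first(mdb_pages_t r)
{
    return mdb_pages_end(r) - mdb_pages_count(r);
}
```

The limits follow from the field widths: on 64-bit, 45 page number bits plus
12 bits of page offset give 2^57 bytes = 128 petabytes, and 19 count bits
times 4096-byte pages give about 2^31 bytes = 2G per entry. On 32-bit, 20 + 12
bits give 4G and 12 + 12 bits give 16M.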