libmdb freelist and overflow pages
Quanah Gibson-Mount wrote:
--On Monday, November 05, 2012 10:10 AM -0800 Howard Chu <email@example.com> wrote:
So the issue is how to find a contiguous run of pages large enough to
satisfy the overflow page, in the current freelist. This takes us into
the realm of malloc algorithms, first-fit/best-fit/..., etc.
I think first we scan whatever freelist we have in memory, to see if a
suitable run of pages is already present.
If not, and there are additional freelists still available:
1) we could just merge all of them, and then search again
2) merge one at a time, and search again
Leaning toward #2, I suspect we don't need to coalesce all freelists all
I like the sound of #2 as well. If you come up with a patch, I can test. ;)
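To make option #2 concrete, here is a minimal C sketch of "search, then merge one more freelist and search again". It is not taken from libmdb: pgno_t, struct freelist, MAX_IDS, find_run, merge_ids and alloc_run are all hypothetical names, and the lists are assumed to be plain ascending arrays with no duplicates, which may not match how libmdb actually stores its ID lists.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long pgno_t;          /* hypothetical page-number type */

#define MAX_IDS 64                     /* arbitrary capacity for the sketch */

struct freelist {                      /* hypothetical: ascending, no duplicates */
    pgno_t ids[MAX_IDS];
    size_t len;
};

/* First-fit: index of the first run of `want` consecutive page numbers,
 * or -1 if the list holds no such run. */
static long find_run(const pgno_t *ids, size_t len, size_t want)
{
    size_t start = 0;
    assert(want > 0);
    for (size_t i = 0; i < len; i++) {
        if (i > 0 && ids[i] != ids[i - 1] + 1)
            start = i;                 /* run broken, restart here */
        if (i - start + 1 >= want)
            return (long)start;
    }
    return -1;
}

/* Merge two ascending lists into `out`; returns the merged length. */
static size_t merge_ids(const pgno_t *a, size_t alen,
                        const pgno_t *b, size_t blen, pgno_t *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < alen && j < blen)
        out[k++] = (a[i] < b[j]) ? a[i++] : b[j++];
    while (i < alen) out[k++] = a[i++];
    while (j < blen) out[k++] = b[j++];
    return k;
}

/* Option #2: search the in-memory list first; if that fails, fold in the
 * remaining freelists one at a time, searching again after each merge.
 * `*merged` reports how many extra lists had to be pulled in. */
static long alloc_run(struct freelist *mem, const struct freelist *older,
                      size_t nolder, size_t want, size_t *merged)
{
    size_t used = 0;
    long at = find_run(mem->ids, mem->len, want);
    while (at < 0 && used < nolder) {
        struct freelist tmp;
        assert(mem->len + older[used].len <= MAX_IDS);
        tmp.len = merge_ids(mem->ids, mem->len,
                            older[used].ids, older[used].len, tmp.ids);
        *mem = tmp;
        used++;
        at = find_run(mem->ids, mem->len, want);
    }
    if (merged)
        *merged = used;
    return at;
}
```

With an in-memory list of {3, 7, 8} and two older lists {4, 10} and {5, 20}, a two-page request is satisfied immediately by 7-8, while a three-page request only succeeds after both older lists have been merged in, yielding 3-4-5.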
Well, just like last LinuxCon, we've had some new input from this LinuxCon.
Theodore Ts'o (ext4 lead developer) raised the topic of Erase Blocks on
flash-based storage devices. If we can ensure that our page allocations are
aligned with the Erase Block size of the data store, we'll get higher write
throughput on SSDs, MMC cards, etc. Erase Blocks are commonly 32KB or 64KB
today, with 128KB coming soon.
So, we may want to think about chunking up our page allocations into
power-of-two chunks. Perhaps as a separate environment flag setting.
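For illustration, the arithmetic behind power-of-two chunking could look something like the following C sketch. The names (pgno_t, pages_per_chunk, chunk_align_up, chunks_needed) are hypothetical, not libmdb API, and the sketch assumes both the page size and the Erase Block size are powers of two and that the data file itself starts on an Erase Block boundary.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long pgno_t;          /* hypothetical page-number type */

static int is_pow2(size_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Number of database pages that fit in one Erase Block. */
static size_t pages_per_chunk(size_t page_size, size_t erase_block)
{
    assert(is_pow2(page_size) && is_pow2(erase_block));
    assert(erase_block >= page_size);
    return erase_block / page_size;
}

/* Round a page number up to the next chunk boundary. */
static pgno_t chunk_align_up(pgno_t pgno, size_t ppc)
{
    assert(is_pow2(ppc));
    return (pgno + ppc - 1) & ~((pgno_t)ppc - 1);
}

/* Whole chunks required to hold `npages` pages. */
static size_t chunks_needed(size_t npages, size_t ppc)
{
    assert(ppc > 0);
    return (npages + ppc - 1) / ppc;
}
```

Assuming 4KB pages and a 64KB Erase Block, each chunk holds 16 pages, so an allocation that would otherwise begin at page 17 gets pushed up to page 32, and a 17-page overflow value occupies two chunks.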
Even if we don't explicitly try to form 64K chunks all the time, it may be
best for us to fully coalesce all available free lists whenever a request for
an overflow page arrives. So our default case of single-page requests will
continue as before, overflow pages will have some chance of reusing old pages,
and otherwise they'll just use new pages, as they currently do.
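The policy described above (single pages as before, overflow requests get a chance at a reused run, otherwise new pages) can be sketched in C roughly as follows. Everything here is hypothetical: struct env, coalesce and alloc_pages are illustrative names rather than libmdb internals, and the free pages are assumed to live in one ascending array with spare capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned long pgno_t;          /* hypothetical page-number type */

struct env {                           /* hypothetical environment state */
    pgno_t *free_ids;                  /* free pages, sorted ascending */
    size_t  free_len;
    pgno_t  next_pgno;                 /* first never-used page in the file */
};

static int cmp_pgno(const void *a, const void *b)
{
    pgno_t x = *(const pgno_t *)a, y = *(const pgno_t *)b;
    return (x > y) - (x < y);
}

/* Fold another freelist into the in-memory one and restore the ordering.
 * The caller guarantees free_ids has room for the extra entries. */
static void coalesce(struct env *e, const pgno_t *more, size_t n)
{
    for (size_t i = 0; i < n; i++)
        e->free_ids[e->free_len++] = more[i];
    qsort(e->free_ids, e->free_len, sizeof(pgno_t), cmp_pgno);
}

/* Allocate `want` contiguous pages and return the first page number.
 * A run found in the freelist is reused and removed from it; otherwise
 * brand-new pages are taken from the end of the file. */
static pgno_t alloc_pages(struct env *e, size_t want)
{
    size_t start = 0;
    pgno_t first;
    assert(want > 0);
    for (size_t i = 0; i < e->free_len; i++) {
        if (i > 0 && e->free_ids[i] != e->free_ids[i - 1] + 1)
            start = i;                 /* run broken, restart here */
        if (i - start + 1 == want) {
            first = e->free_ids[start];
            for (size_t j = i + 1; j < e->free_len; j++)
                e->free_ids[j - want] = e->free_ids[j];
            e->free_len -= want;
            return first;
        }
    }
    first = e->next_pgno;              /* no suitable run: use new pages */
    e->next_pgno += want;
    return first;
}
```

In this sketch a caller would invoke coalesce() on every remaining freelist only when want is greater than one, so the single-page path never pays for the merge. A single-page request simply takes the lowest free page.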
Interestingly, the Red Hat folks expressed a desire to adopt MDB in RPM, which
currently uses BerkeleyDB. Ironically, they'd like an option to run in pure
append-only mode, to allow rolling back to previous state if one package of a
large upgrade fails, and the user decides to abandon the entire upgrade.
It would be simple enough to add an environment flag for append-only mode,
which would skip all of the freelist management entirely. Not interested in
looking at that yet, can address it later if/when movement happens on the RPM
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/