
Re: Dynamic Re: commit: ldap/servers/slapd entry.c proto-slap.h slap.h zn_malloc.c

David Boreham wrote:

Since the kernel knows who has what memory, it seems like
it'd be easier to just have it tell the applications when to
grow/shrink their usage.

I agree. But this goes somewhat against the philosophy of a VM-based OS, and I've seldom seen an easy way to get that advice from the kernel.

The experiment shows that the BDB cache performs significantly better than
the entry cache once swapping occurs. It is not easy to pinpoint the exact
cause because I'm not a BDB expert. I attribute the performance gap mainly to
the difference in locality of page references between the BDB and entry
caches: the actual working set of the BDB cache seems a lot smaller than that
of the entry cache.

BDB hashes the pages in the mpool. You could try hashing rather than
AVL trees in the entry cache. Entries in the entry cache aren't optimized
for storage efficiency, so it's reasonable to expect the per-entry memory
footprint to always be larger.

Actually, entries in the back-bdb entry cache are rather more efficient than most. It uses a single malloc for the entire entry when reading from the database, as opposed to individual mallocs per data field. Of course it's not page-aligned, but user-level valloc and friends tend to be pretty wasteful, so there's not much recourse there.

The other difference is that paging would
happen, but to the region backing file (in the case that you're not using
a private region). Perhaps that paging doesn't show up in the stats you
measured ?

back-bdb always uses a shared region. back-ldbm uses a private region, but nobody cares about back-ldbm.
I wonder about these measurements as well.

Also, what about using a pure shared memory region without any backing file? Obviously that would just page to the swap partition if it needs to be paged out. (Assuming the OS will do so; older Solaris versions never paged shared memory as I recall.)

The new entry cache design in the adaptive caching also reduces the working
set size of the entry cache since the zone memory heap only contains Entry
and DBT structures separating them from the EntryInfo AVL tree in the normal
heap. Since it becomes efficient to resize the entry cache with the
zone-based adaptive cache, one can rely more on the entry cache by shifting
memory from the BDB cache to the entry cache in configurations where
swapping is hard to avoid.

Reminds me a bit of the old object databases and pointer swizzling.
Not sure those things ever performed very well, though.

Would it be useful to be able to resize the BDB mpool via the CKRM
policy? (To avoid the shutdown and restart of the database: although
your paper says that the overhead is low, if the region has to be shared
the underlying file will need to be created, and on some OSes that can
be quite expensive.) BDB has the capability to have multiple mpools
in use, and I suspect it wouldn't be too hard to have it create new ones
on the fly. Shrinking its cache presumably can be done by simply
avoiding touching a subset of the already committed pages (unless the goal
is to reduce page file usage too).

Again, using a pure shared memory region would avoid most of this overhead.

I wonder if it really needs to create new regions on the fly though. More likely you would begin execution with a maximal set, and de-commit a region as memory pressure increased.

This is starting to sound like Multics and multiple segment linking.

 -- Howard Chu
 Chief Architect, Symas Corp.       Director, Highland Sun
 http://www.symas.com               http://highlandsun.com/hyc
 Symas: Premier OpenSource Development and Support