Re: back-mdb notes



--On Saturday, March 05, 2011 5:05 AM -0800 Howard Chu <hyc@symas.com> wrote:

I've been working on a new "in-memory" B-tree library that operates on an
mmap'd file. It is a copy-on-write design; it supports MVCC and is immune
to corruption and requires no recovery procedure. It is not an
append-only design, since that requires explicit compaction, and also is
not amenable to mmap usage. Also the append-only approach requires total
serialization of write operations, which would be quite poor for
throughput.

My experience with back-(bdb/hdb) and syncrepl was that the only reliable way to ensure consistent replication was to use delta-syncrepl, which... serializes write operations. In fact, not forcing serialized writes for back-(bdb/hdb) was slower than serializing them, because of all the contention in the database. I understand this may not hold true for back-mdb, but thought I would note that our best write performance is currently already achieved by serialization.

re: configuring the size of the DB file - this is most likely not a value
that can be changed on an existing DB. I.e., if you configure a DB and
find that you need to grow it later, you will probably need to
slapcat/slapadd it again. At DB creation time the file is mmap'd with
address NULL so that the OS picks the address, and the address is
recorded in the DB. On subsequent opens the file is mmap'd at the
recorded address. If the size is changed, and the process' address space
is already full of other mappings, it may not be possible to simply grow
the mapping at its current address. Since the DB records contain actual
memory pointers based on the region address, any change in the mapping
address would render the DB unusable.
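
To make the constraint above concrete, here is a minimal sketch in C of the create/reopen sequence as described. It is not the back-mdb code: the `demo_header` layout and the `demo_create`/`demo_open` names are hypothetical, and it assumes a POSIX system. It also passes the recorded address as a plain hint rather than using MAP_FIXED, and treats any other placement as a failure, since absolute pointers stored in the file would no longer be valid.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical header at offset 0 of the file; NOT the real back-mdb layout. */
struct demo_header {
    void  *map_addr;  /* address the OS picked when the file was created */
    size_t map_size;  /* size of the mapping, fixed at creation time */
};

/* Create the file, let the OS choose the address (addr == NULL), and
 * record that address inside the file itself. Returns 0 on success. */
static int demo_create(const char *path, size_t size)
{
    assert(size >= sizeof(struct demo_header));

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    struct demo_header *h = p;
    h->map_addr = p;
    h->map_size = size;
    return munmap(p, size);
}

/* Reopen the file and request the mapping at the recorded address.
 * Returns the mapping, or NULL if the file could not be opened or the
 * kernel could not place the mapping at the recorded address. */
static void *demo_open(const char *path)
{
    struct demo_header h;

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;
    if (read(fd, &h, sizeof h) != (ssize_t)sizeof h) {
        close(fd);
        return NULL;
    }

    /* Without MAP_FIXED the address is only a hint, so check the result. */
    void *p = mmap(h.map_addr, h.map_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return NULL;
    if (p != h.map_addr) {
        /* Stored pointers would be wrong at this address; give up. */
        munmap(p, h.map_size);
        return NULL;
    }
    return p;
}
```

Calling `demo_create(path, size)` once and `demo_open(path)` on later runs mirrors the sequence above. If something else already occupies the recorded range, `demo_open` returns NULL, which is the failure mode that makes growing the map in place problematic.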

How exactly does the DB file size for back-mdb relate to the existing size of the database? Do they have to match? I.e., is this more like the DB_CONFIG cachesize, which can be more or less than the database size, or are they supposed to be an exact match? We have plenty of customers whose databases are certainly not static in size, particularly if they are using an accesslog database for delta-syncrepl or other operations.

--Quanah

--

Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration