Re: mdb fragmentation
- To: Quanah Gibson-Mount <firstname.lastname@example.org>, Geert Hendrickx <email@example.com>
- Subject: Re: mdb fragmentation
- From: Klaus Malorny <Klaus.Malorny@knipp.de>
- Date: Mon, 15 Jan 2018 11:33:15 +0100
- Cc: firstname.lastname@example.org, openldap-technical <email@example.com>
- Content-language: en-US
- In-reply-to: <CA13EA4DE42F0DDC722631FA@[192.168.1.30]>
- References: <20170824115332.GA24591@vera.ghen.be> <WMfirstname.lastname@example.org> <1556375279.52119069.1503621017507.JavaMail.email@example.com> <WMfirstname.lastname@example.org> <CA13EA4DE42F0DDC722631FA@[192.168.1.30]>
- User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:59.0) Gecko/20100101 Thunderbird/59.0a1
On 03.01.18 00:06, Quanah Gibson-Mount wrote:
> I wanted to follow up on this, based on an examination of Geert's
> database and other affected databases. Geert already has this answer, but it's
> useful for the general OpenLDAP community.
> This fragmentation problem is not common. It depends entirely on the size of the
> entries in the database. The issue arises when entries in the LDAP DB are
> larger than the LMDB page size (usually 4 KB) and are then updated frequently.
> This most often occurs in one of two ways:
> a) multi-valued attributes with a large number of values
> b) a very large single-valued attribute (e.g., binary data)
> For the first problem (a), there is code in the 2.5 release to address this
> problem, called multival. This feature puts multi-valued attributes with more
> than a (configurable) number of values into their own sub-database. For (b),
> there's not really a solution, but it's pretty rare.
> Those whose entries are smaller than 4 KB will never see this
> problem. Note that this is the size of the binary entry on disk, not the size
> of the entry when exported to LDIF. The binary size is generally significantly
> smaller than the LDIF version.
I did some research of my own on this issue in the meantime and learned more
about overflow/bigdata handling: a constant in the LMDB code requires that each
tree page be able to hold at least two tree nodes. A node may therefore not be
larger than half of the page size (minus the page header size). As the node also
contains the key data, the key contributes to the size of the node. With 4 KB
pages and a maximum key size of 511 bytes, only data below roughly 1500 bytes is
guaranteed to be stored within the tree and not in overflow pages.
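To make the arithmetic above concrete, here is a minimal Python sketch of the inline-versus-overflow threshold. The constants (16-byte page header, 8-byte node header, 2-byte index slot) are assumptions based on my reading of mdb.c, not values the public API exposes, so treat the result as an estimate only.

```python
# Sketch of the inline-vs-overflow threshold for an LMDB leaf node.
# All constants below are ASSUMPTIONS taken from reading mdb.c; they are
# not exposed by the public API and may differ between builds.
PAGE_HEADER = 16   # assumed page header size in bytes
NODE_HEADER = 8    # assumed per-node header size in bytes
SLOT_SIZE = 2      # assumed size of a node's slot in the page index array
MIN_KEYS = 2       # each tree page must hold at least this many nodes


def max_node_size(page_size: int = 4096) -> int:
    """Largest node (header + key + data) that still fits in a tree page."""
    half = (page_size - PAGE_HEADER) // MIN_KEYS
    return (half & ~1) - SLOT_SIZE  # rounded down to an even size


def max_inline_data(key_size: int, page_size: int = 4096) -> int:
    """Largest value kept inside the tree for a key of the given size."""
    return max_node_size(page_size) - NODE_HEADER - key_size


if __name__ == "__main__":
    print(max_inline_data(511))  # worst case: maximum key size
    print(max_inline_data(16))   # a short key leaves more room for data
```

Under these assumptions a 511-byte key leaves 1519 bytes for inline data, which agrees with the "roughly 1500 bytes" figure above.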
With respect to overflow pages, note that a run of overflow pages also
carries a single page header. Choosing a data size that is exactly a multiple of
the page size will therefore waste nearly a full page.
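The effect of that header can be sketched as follows. The 16-byte header size is again an assumption from the LMDB source, and the function names are purely illustrative.

```python
# Sketch: how many overflow pages a value occupies and how much is wasted.
PAGE_HEADER = 16  # ASSUMED size of the single header at the start of the run


def overflow_pages(data_size: int, page_size: int = 4096) -> int:
    """Number of contiguous pages needed for header + data (rounded up)."""
    return (data_size + PAGE_HEADER + page_size - 1) // page_size


def wasted_bytes(data_size: int, page_size: int = 4096) -> int:
    """Unused bytes at the end of the overflow run."""
    pages = overflow_pages(data_size, page_size)
    return pages * page_size - PAGE_HEADER - data_size


if __name__ == "__main__":
    for size in (4080, 4096, 8192):
        print(size, overflow_pages(size), wasted_bytes(size))
```

With these assumptions, a value of exactly 4096 bytes needs two pages and leaves 4080 bytes unused, whereas a value of 4080 bytes fits into a single page with nothing wasted.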
Unfortunately, the various constants and calculated values cannot be retrieved
via the regular API, so there is no safe way to deal with this from a user's
perspective.
I have not yet investigated how LMDB stores released runs of pages and what
strategies are used for allocation, specifically whether only exactly matching
sizes are taken or whether larger runs are broken up. In any case, I do not
expect a fragmentation problem as long as the data requires only a few pages.
For the project where I am using LMDB, there is a certain likelihood that the
data may be megabytes in size. I currently plan to revise the way the data is
stored and to split it up into multiple chunks, each represented by an
individual database entry. The chunks will be sized so that the number of
overflow pages is always a power of two, e.g. 8, 16 or 32 pages, even if
this creates unused space within the chunk. This will of course not stop the
fragmentation, but it should keep the problem at a much lower level.
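A rough sketch of such a chunking plan might look like this. It is only an illustration of the idea, not the code I will actually use. The 16-byte header is an assumption, plan_chunks is a hypothetical helper, and unlike the 8/16/32 example above it also allows smaller powers of two for small remainders.

```python
# Sketch: split a large value into chunks that each occupy a power-of-two
# number of overflow pages. The value would be padded up to padded_bytes
# before being stored, so that the page count is what we planned.
PAGE_HEADER = 16  # ASSUMED overflow header size, not available via the API


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def plan_chunks(total_size: int, page_size: int = 4096, max_pages: int = 32):
    """Return a list of (payload_bytes, padded_bytes, pages) per chunk."""
    full_capacity = max_pages * page_size - PAGE_HEADER
    chunks = []
    remaining = total_size
    while remaining > 0:
        payload = min(remaining, full_capacity)
        needed = (payload + PAGE_HEADER + page_size - 1) // page_size
        pages = next_power_of_two(needed)
        chunks.append((payload, pages * page_size - PAGE_HEADER, pages))
        remaining -= payload
    return chunks


if __name__ == "__main__":
    for chunk in plan_chunks(200_000):
        print(chunk)
```

Each chunk would then be stored under its own key, for example the original key followed by a chunk index. The padding is the price paid for having only a few distinct run lengths in the free list.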