Re: large write amplification
Леонид Юрьев wrote:
I will try to answer briefly, without going into details:
- So that readers are never blocked by a writer, LMDB provides a
snapshot of the data, indexes and directory for each completed
transaction.
- Most of the db pages (those not changed by a particular
transaction) are "shared" between such snapshots. But any change to the
data itself, and its reflection in the B-tree indexes (including the
particular table, free-db, main-db and so forth), requires new pages to
be used and written to disk.
- In a large db, a small "one-byte" change may dirty a lot of
db pages (usually 4K each). For example, one add/del/mod operation in
an LDAP db a few GB in size requires about 50-100 page-level IOPS.
Correct, up to this last point. The degree of amplification is greatly
exaggerated.
The number of pages touched depends on the height of the B+tree, which
is O(log N) in the number of records. Even a tree of multiple terabytes
is unlikely to reach beyond a height of 5.
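
To make the O(log N) point concrete, here is a rough sketch of how slowly the height grows with the record count. The fanout of 100 entries per branch page is an assumed round number for illustration, not a measured LMDB value, and btree_height is a hypothetical helper, not part of any LMDB API.

```python
def btree_height(num_records: int, fanout: int) -> int:
    """Smallest height h such that fanout**h >= num_records.

    Assumes every page holds exactly `fanout` entries, which is an
    idealization; real trees have partially filled pages.
    """
    if num_records <= 1:
        return 1
    height = 1
    capacity = fanout
    while capacity < num_records:
        capacity *= fanout
        height += 1
    return height


if __name__ == "__main__":
    # Assumed fanout of 100 entries per page (illustrative only).
    for n in (10**6, 10**8, 10**10):
        print(n, btree_height(n, fanout=100))
```

Under that assumption, going from a million records to ten billion only takes the tree from height 3 to height 5.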
The minimum write amplification may be on the order of 8 pages for a
trivial write. But it also tends to be the maximum write amplification.
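
The reason the cost is bounded by the tree height can be shown with a toy copy-on-write tree. This is only a sketch of the general path-copying technique, not LMDB's actual code: Page, build and cow_update are hypothetical names, and the toy ignores the extra pages a real commit writes for the free-db, main-db and meta page.

```python
class Page:
    """A tree page: a leaf holds a value, a branch holds child pages."""

    def __init__(self, children=None, value=None):
        self.children = children
        self.value = value


def build(height, fanout):
    """Build a full tree of the given height with all leaf values set to 0."""
    if height == 1:
        return Page(value=0)
    return Page(children=[build(height - 1, fanout) for _ in range(fanout)])


def cow_update(page, path, value):
    """Copy-on-write update of one leaf.

    `path` lists the child index to follow at each branch level.
    Returns (new_root, pages_written). Only pages on the root-to-leaf
    path are copied; every other page is shared with the old tree.
    """
    if not path:
        return Page(value=value), 1
    new_child, written = cow_update(page.children[path[0]], path[1:], value)
    children = list(page.children)  # copy the branch page, never mutate it
    children[path[0]] = new_child
    return Page(children=children), written + 1


if __name__ == "__main__":
    old_root = build(height=4, fanout=3)  # 40 pages in total
    new_root, written = cow_update(old_root, [0, 1, 2], 42)
    print(written)  # pages written equals the tree height
    # The old snapshot is untouched, so a reader holding it is not blocked.
    print(old_root.children[0].children[1].children[2].value)
    print(new_root.children[0].children[1].children[2].value)
    # Subtrees off the update path are shared, not copied.
    print(new_root.children[1] is old_root.children[1])
```

In this toy, one leaf change in a 40-page tree writes 4 new pages, one per level, and the old root still gives a consistent view of the previous state.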
For high-load use cases I made a few changes in our fork of OpenLDAP/LMDB.
One of these features we called "LIFO reclaiming".
It gives us a 10-50 times performance boost, especially by taking
advantage of the write-back cache of the storage subsystem.
We now use it in our production (telco) environment.
But currently it is not safe for all cases, see
The LIFO approach inherently breaks the safety guarantees of the LMDB
concurrency design, as I have already explained.
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/