
Re: LMDB: Delete data after MDB_MAP_FULL



opensource@gmx-topmail.de wrote:
In an app with quite a few installs, the LMDB data file grew to our maximum size of 1.5 GB because of a programming error in the app. Because this affects a significant number of users, the general question is how to resolve the issue and delete the superfluous data. The obvious problem is that in LMDB you cannot delete data without growing the data file.

That's not entirely true; if you already have free pages available, it is possible to delete data without using any new pages at all.

The straightforward approach is to increase the maximum map size (mdb_env_set_mapsize) to, say, 2 GB and then delete the superfluous data. However, because this is a mobile app, there is no guarantee that the additional disk space is available on every device.

That's true regardless of whether you're on a mobile device.

Three questions:
1) Would an additional 0.5 GB be reasonably safe headroom for deleting around 1.2 GB of data (around 10M K/V pairs)? As far as I understand, the extra space is needed to build a new tree, creating new nodes on the fly. Because most of the K/V pairs will be deleted in that process, I would expect the tree to build a completely new set of branch pages while still using the existing leaf pages (data values). Is that correct?

Not quite. LMDB is copy-on-write, so if you're deleting one key at a time, a new copy of the affected leaf page still needs to be created, even if it will eventually be emptied. However, once the page is emptied it can be reused for whatever page is needed next.

2) Would it make sense to delete the superfluous data in multiple transactions (let's say 10)? Smaller change sets should grow the file less, but it would only be worth the extra effort if the size savings are significant.

Yes, multiple smaller transactions means the free pages will be recycled and reusable more quickly, so the overall file growth should be minimal.

3) Any additional thoughts or tricks that come to your mind in this scenario?

If you're deleting a lot of keys in sequence, it will be slightly faster to delete them from the tail forward.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/