Re: Accessing random rows from LMDB

On 09/02/15 20:16, Sravan Kumar Reddy Javaji wrote:
1) Is there any way that I can find the total number of records in LMDB?

mdb_stat -a <database>.  The record count is the "Entries" line
reported for each database.

2) Can I access all the rows from LMDB randomly instead of sequentially?


I know that it is better to read sequentially from LMDB and then later
randomize the records. But I have around 1 million records in LMDB, and
I can't load the entire data set into memory at once. I am planning to
read the data batch-wise into memory and perform some operation on it.
So I am wondering, is there any way that I can read the data randomly
from LMDB?

Make a random permutation of the integers [1..number of records].
Walk the DB with mdb_cursor_get() using MDB_FIRST and then MDB_NEXT,
and associate each record with an ID from the permutation.  Or
something like that.
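
A minimal sketch of the permutation step, assuming the IDs fit in
32 bits.  The names random_permutation() and next_random() are made up
for illustration, and the small xorshift generator is only a stand-in
for whatever random source you prefer (it is not cryptographic):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tiny xorshift64 generator (hypothetical helper).  State must be nonzero. */
static uint64_t rng_state = 0x9E3779B97F4A7C15ULL;

uint64_t next_random(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

/* Fill ids[0..n-1] with a random permutation of 1..n (Fisher-Yates). */
void random_permutation(uint32_t *ids, size_t n)
{
    assert(ids != NULL || n == 0);

    /* Start from the identity sequence 1, 2, ..., n. */
    for (size_t i = 0; i < n; i++)
        ids[i] = (uint32_t)(i + 1);

    /* Swap each position with a randomly chosen earlier-or-same slot. */
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)(next_random() % i);   /* 0 <= j < i */
        uint32_t tmp = ids[i - 1];
        ids[i - 1] = ids[j];
        ids[j] = tmp;
    }
}
```

During the cursor walk, the i-th record visited would then get ids[i]
as its ID.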

To avoid massacring your cache, avoid following the data.mv_data
pointer at this stage.  (Only relevant when nodes are > 1/2 OS page
so the data items are stored in overflow pages rather than next to
the keys.)  The exception is if you preprocess your entries and write
them to a file at this stage; in that case just record
(file position, size) for each one.

Now process your records ordered by ID, that'll be your random walk.
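
One way to get the "ordered by ID" part without sorting, as a sketch:
since the IDs are a permutation of 1..n, each record's
(file position, size) can be stored directly in slot ID-1 of a table,
and reading the table front to back is then the random walk.  The
struct and function names below are hypothetical, and the file position
is assumed to be what ftell() returned when the record was written:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Where one preprocessed record lives in the side file (hypothetical). */
struct record_loc {
    long   file_pos;   /* offset in the file, e.g. from ftell() */
    size_t size;       /* length of the record in bytes */
};

/* Remember the location of the record that was assigned `id` (1-based).
 * `table` must have room for n entries. */
void place_record(struct record_loc *table, size_t n,
                  uint32_t id, long file_pos, size_t size)
{
    assert(table != NULL);
    assert(id >= 1 && id <= n);
    table[id - 1].file_pos = file_pos;
    table[id - 1].size = size;
}
```

After the walk, table[0], table[1], ... lists the records in ID order,
which is a random order relative to the key order in the DB.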

Don't know what "associate a record with an ID" will be for you.
If you have a read-only copy of your database, maybe just build
a 32 Mbyte array of (offset of key, size, offset of data, size)
for each record, save that to a file, and bypass LMDB.  Offsets
relative to MDB_envinfo.me_mapaddr. Otherwise, maybe build a
named database with {key = record ID, data = original key}.
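
A sketch of what one entry of that array might look like.  The layout
is an assumption: four 64-bit fields give 32 bytes per record, which
for about 1 million records comes to the 32 Mbyte mentioned above.
The names index_entry and make_index_entry() are made up, and map_base
stands for the address the offsets are taken relative to:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One index entry per record: where the key and data sit in the map. */
struct index_entry {
    uint64_t key_off;    /* offset of key from the map base */
    uint64_t key_size;
    uint64_t data_off;   /* offset of data from the map base */
    uint64_t data_size;
};

_Static_assert(sizeof(struct index_entry) == 32,
               "4 x 8 bytes per record");

/* Build an entry from the pointers and sizes a cursor walk returned
 * (key.mv_data, key.mv_size, data.mv_data, data.mv_size).  Only the
 * pointer values are used; the data itself is not touched. */
struct index_entry make_index_entry(const void *map_base,
                                    const void *key, size_t key_size,
                                    const void *data, size_t data_size)
{
    struct index_entry e;
    e.key_off   = (uint64_t)((uintptr_t)key  - (uintptr_t)map_base);
    e.key_size  = (uint64_t)key_size;
    e.data_off  = (uint64_t)((uintptr_t)data - (uintptr_t)map_base);
    e.data_size = (uint64_t)data_size;
    return e;
}
```

Saved to a file, such an array only stays valid as long as the database
file itself does not change, hence the read-only copy.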