[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: LMDB usage with long or variable length keys



Alex V. wrote:
I am new to key-value stores. I would like to use it like big persistent
associative array and I want to be able to use long length keys. For
example file path or URL. What technique should I use to implement such
dictionary on top of LMDB? Probably some kind of hashing and collision
resolution? Or just recompile with MDB_MAXKEYSIZE=2047 or something like
this? Or LMDB is a wrong tool for this job?

Probably some combination of the above. Recompile with larger MAXKEYSIZE will make everything else easier.

Mozilla/Firefox/Seamonkey/whatever stores complete URLs in a history database, and uses the entire URL (including any URL parameters) as the key. This is an extremely wasteful design; first of all the URL params aren't meaningful when you just want to look back and see what sites you visited. Also, when storing keys that are inherently a hierarchical structure, you should store them hierarchically. You should look at the back-mdb code and the back-hdb design paper for how to do this efficiently.

http://www.openldap.org/conf/odd-sfo-2003/howard-dev.html

e.g., when storing a long pathname /some/very/long/path/name you should use a sequence of keys instead of a single key:

01some00
02very01
03long02
04path03
05name04

Each key is composed of its own unique ID, the path component, and the ID of its parent. (There's more to it, read the design paper.)

This saves space, particularly when you have a lot of items within a related hierarchy. e.g.
  /some/very/long/other/path
  /some/very/long/path/foo
  /some/very/long/other/file

The common components of each path only need to be saved once. It takes multiple DB lookups to retrieve the record for a given pathname, but individual lookups are extremely fast because the keys are short.

In SQLightning I used the hashing approach - save the first 56 bytes of a key as-is, for anything beyond that take a hash and append it to the first 56 bytes. This is reasonably fast but loses ordering properties of the keys. My next version will probably decompose keys into components just like the above, for space and speed reasons.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/