[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: [LMDB] Append To Value



Yucheng Low wrote:
I see, correct me if I misunderstood: you are suggesting to use duplicate keys
to store the vector?
In other words, if I have an neighbor list for vertex 5, comprising of values
[1,2,3], I will store it as 3 duplicate values for key 5.

Also, does the LMDB store the key just once for each duplicate value? Or
multiple times once for each value.

The key is only stored once. Duplicating the key would be stupid. Also DUPFIXED values are stored contiguously, with no headers or other overhead. It will be effectively the same amount of storage as storing your single record.

Thanks,
Yucheng

On Tue, Mar 26, 2013 at 11:35 AM, Howard Chu <hyc@symas.com
<mailto:hyc@symas.com>> wrote:

    Yucheng Low wrote:

        Hi,

        I am considering using LMDB to store a dynamic graph structure, and it
        will be
        very helpful to support "append to value" having the same semantics as
        "append" in Kyoto Cabinet.
        http://fallabs.com/__kyotocabinet/api/__classkyotocabinet_1_1BasicDB.__html#__a23e776e5bd1e3c5caa0f62edffb87__a54
        <http://fallabs.com/kyotocabinet/api/classkyotocabinet_1_1BasicDB.html#a23e776e5bd1e3c5caa0f62edffb87a54>

        For instance, if I were to store an adjacency list representation of a
        graph
        in LMDB, I would use vertex ID as the key, and the value is a vector
        of 64-bit
        IDs. To insert an edge into the graph will simply require appending 8
        bytes to
        the end of the vector. Re-reading and re-writing the entire value will be
        somewhat excessive (and will actually change the asymptotic runtime of the
        insertion).

        This is also one of those situations where a batch append might be
        useful. For
        instance, if I have an edge list on file which is too large to retain in
        memory. To convert it to an adjacency list in LMDB will require
        batch-append.
        Batch-writes are insufficient since that will require me to cache at
        least,
        large parts of the adjacency list in memory, and the original edge
        list could
        be ordered arbitrarily.

        The actual graph representation I am thinking of is somewhat more
        intelligent
        than a straight adjacency list, but similar concepts hold.


    Based solely on the description here, I would use MDB_DUPFIXED and
    MDB_APPENDDUP or MDB_MULTIPLE. You could later use MDB_GET_MULTIPLE to
    read the values and reconstruct the vector.

    --
       -- Howard Chu
       CTO, Symas Corp. http://www.symas.com
       Director, Highland Sun http://highlandsun.com/hyc/
       Chief Architect, OpenLDAP http://www.openldap.org/__project/
    <http://www.openldap.org/project/>




--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/