[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: wasteful data structures: AVL tree



Emmanuel Lécharny wrote:
Le 29/01/15 04:20, Howard Chu a écrit :
ITS#8038 (syncrepl hanging onto its presentlist) only came to my
attention due to the amount of memory involved. On a refresh of a DB
with 2.8M entries I saw the consumer using about 320MB just for the
presentlist. This list consists solely of 16 byte entryUUIDs; 2.8M
items should have used no more than 48MB. An AVL node itself is 28
bytes on 64-bit platform, plus 16 bytes for the struct berval wrapped
around the UUID.

I'm looking into adding an in-memory B+tree library to liblutil. For
the type of fixed-size records we're usually storing in AVL trees, a
Btree will be much more compact and higher performance since it will
need rebalancing far less frequently.

Why using a B+tree ? A hash map wouldn't be a more appropriate data
structure ? EntryUUID ordering seems overkilling...

I'm not fond of hashes, they're always cache-unfriendly and most of them have very poor dynamic growth behavior. Since we don't know in advance how many IDs are being stored, growth/resizing is a major concern. Tree structures are generally preferred because they have very good incremental growth performance, and B+trees have the best CPU cache behavior.

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/