
RE: back-bdb performance

To: "Ganesan R" <rganesan-ldap@myrealbox.com>, <openldap-devel@OpenLDAP.org>
Subject: RE: back-bdb performance
From: "Howard Chu" <hyc@highlandsun.com>
Date: Thu, 6 Dec 2001 02:52:09 -0800
Importance: Normal
In-reply-to: <ueaher4mydz.fsf@andlx-anamika.cisco.com>

> -----Original Message-----
> From: owner-openldap-devel@OpenLDAP.org
> [mailto:owner-openldap-devel@OpenLDAP.org]On Behalf Of Ganesan R

> Thanks, these results are quite promising!
>
> > back-bdb, a couple days ago
> > ldadd 6.400u 1.230s 6:34.97 1.9%      0+0k 0+0io 1561pf+0w
> > slapd 136.410u 223.560s 6:38.83 90.2% 0+0k 0+0io 3193pf+0w
>
> > back-bdb, with the newer entry_encode/decode routines
> > ldadd 6.600u 1.040s 7:02.39 1.8%      0+0k 0+0io 2354pf+0w
> > slapd 153.930u 245.990s 7:06.63 93.7% 0+0k 0+0io 3172pf+0w

> > .... (snipped)

> Why is there an increase in slapd user time between the above two runs? I
> thought your newer entry_encode/decode routines should reduce the time.

The current entry_encode/decode uses half the disk space of my previous
version, at a slightly higher CPU cost because it always takes two mallocs
per entry. (Previous version used twice as much disk, but only used 0 or 1
malloc per entry.) I went looking for a more disk-efficient approach when
I started investigating why the transaction logs were so huge, thinking that
overflow pages in the id2entry database were a major part of the problem. It
turns out they were a problem, but not the main one.
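
For anyone reading the timing lines above: they are in csh `time` format, i.e. user CPU seconds, system CPU seconds, elapsed time, and CPU share. The following is only a small sketch for pulling those fields apart and quantifying the user-time increase between the two slapd runs; the helper name `parse_time_line` is made up for this illustration.

```python
import re

# csh/tcsh `time` layout: <user>u <system>s <elapsed> <cpu>% ...
TIME_RE = re.compile(
    r"(?P<user>[\d.]+)u\s+(?P<sys>[\d.]+)s\s+(?P<elapsed>[\d:.]+)\s+(?P<pct>[\d.]+)%"
)

def parse_time_line(line):
    """Return user/sys/elapsed seconds and the reported CPU percentage."""
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError("not a csh time line: %r" % line)
    # Elapsed is written as [[h:]m:]s, so fold the parts left to right.
    elapsed = 0.0
    for part in m.group("elapsed").split(":"):
        elapsed = elapsed * 60 + float(part)
    return {
        "user": float(m.group("user")),
        "sys": float(m.group("sys")),
        "elapsed": elapsed,
        "pct": float(m.group("pct")),
    }

old = parse_time_line("slapd 136.410u 223.560s 6:38.83 90.2% 0+0k 0+0io 3193pf+0w")
new = parse_time_line("slapd 153.930u 245.990s 7:06.63 93.7% 0+0k 0+0io 3172pf+0w")

# Extra user CPU spent by the newer entry_encode/decode run.
user_delta = new["user"] - old["user"]
user_increase_pct = 100.0 * user_delta / old["user"]

# Sanity check: CPU share should be (user + sys) / elapsed.
old_cpu_pct = 100.0 * (old["user"] + old["sys"]) / old["elapsed"]
new_cpu_pct = 100.0 * (new["user"] + new["sys"]) / new["elapsed"]

print("user delta: %.2fs (%.1f%%)" % (user_delta, user_increase_pct))
```

By this reading the newer routines cost about 17.5 more seconds of user time over the whole load, roughly 13%, which is the "slightly higher CPU cost" of the extra malloc.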

> > For write operations, it's quite obvious that all of those index updates
> > are costing a lot in terms of CPU time and I/O operations. The cost arises
> > from the transaction logging. The actual volume of data that slapd is
> > managing is reasonably small; with my first entry_encode/decode routines
> > the id2entry database was around 20MB for the 10000 entries. The
> > transaction logs generated from loading those entries were over 1.5GB.
> > With my current entry_encode/decode routines, the id2entry database is
> > now down to about 10MB, but the transaction logs were still over 1.2GB.
> > After I eliminated the DN_SUBTREE index for the backend's suffix, as you
> > can see there was a dramatic savings in time, and the transaction logs
> > only totalled 520MB.
>
> Wow, I am surprised that the transaction log grows to 520MB with attribute
> indexing disabled. Is most of this due to just the dn2entry indexing?

Yes, there is no other indexing being done besides that.
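
To put the log sizes quoted above in per-entry terms, here is a back-of-the-envelope calculation. It assumes decimal units (1MB = 1,000,000 bytes), and it treats the "over 1.5GB" and "over 1.2GB" figures as lower bounds, so the first two results are floors rather than measurements.

```python
ENTRIES = 10_000
MB = 1_000_000  # assumption: decimal megabytes

# Transaction log volume for loading the 10000 entries, per configuration.
log_sizes = {
    "first entry_encode/decode": 1500 * MB,       # "over 1.5GB" (lower bound)
    "current entry_encode/decode": 1200 * MB,     # "over 1.2GB" (lower bound)
    "no DN_SUBTREE index on suffix": 520 * MB,
}

# Log bytes written per entry added, in KB.
per_entry_kb = {
    name: size / ENTRIES / 1000 for name, size in log_sizes.items()
}

for name, kb in per_entry_kb.items():
    print("%-32s %6.1f KB of log per entry" % (name, kb))
```

Even the best of these works out to about 52KB of log for each entry added, with nothing but the DN indexing active.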

> > I'm currently investigating an alternate indexing layout in the hopes
> > that I can reduce the transaction cost even further. Ultimately I would
> > like to find a way to make the transaction cost effectively disappear. I
> > have another variant of back-bdb that uses a hierarchical data structure,
> > thus completely eliminating the dn2id database. I have just gotten it
> > into working order today, and run it successfully through the test suite.
> > For the same 10000 entries, this backend loads them in only 27 seconds.
>
> > back-hdb
> > ldadd 5.960u 0.800s 0:27.44 24.6%     0+0k 0+0io 265pf+0w
> > slapd 16.030u 2.270s 0:29.94 61.1%    0+0k 0+0io 2995pf+0w
>
> > The transaction logs for loading these 10000 entries amount to only
> > 20.8MB. The id2entry database and tree structure consume only 9.8MB.
> > Of course, this backend still uses the existing indexing layout, so as
> > soon as you turn on attribute indices the transaction overhead skyrockets
> > again. So I'm returning my attention to the index management for now...
>
> This is really good! I'll watch out for your index management changes.
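
As a rough illustration of why a hierarchical layout can do without per-DN index records, the toy model below contrasts the two approaches. It is a simplified sketch and not the actual back-bdb or back-hdb on-disk format: `flat_index_keys` and `HierTree` are hypothetical names, the flat model assumes one base key, one one-level key, and one subtree key per ancestor up to the suffix, and DNs are split naively on commas (no escaping).

```python
def flat_index_keys(dn, suffix):
    """Index keys touched when adding `dn` in the simplified flat model."""
    rdns = [r.strip() for r in dn.split(",")]
    suffix_len = len(suffix.split(","))
    # Ancestors from the immediate parent up to and including the suffix.
    ancestors = [",".join(rdns[i:]) for i in range(1, len(rdns) - suffix_len + 1)]
    keys = [("base", ",".join(rdns))]
    if ancestors:
        keys.append(("one", ancestors[0]))
    keys.extend(("subtree", a) for a in ancestors)
    return keys


class HierTree:
    """Toy hierarchical store: one (parent id, RDN) record per entry."""

    def __init__(self):
        self.parent = {}    # entry id -> (parent id, rdn)
        self.children = {}  # (parent id, rdn) -> entry id
        self.next_id = 1

    def add(self, parent_id, rdn):
        entry_id = self.next_id
        self.next_id += 1
        self.parent[entry_id] = (parent_id, rdn)
        self.children[(parent_id, rdn)] = entry_id
        return entry_id

    def dn_of(self, entry_id):
        # Rebuild the DN by walking parent links; id 0 means "above the suffix".
        parts = []
        while entry_id != 0:
            parent_id, rdn = self.parent[entry_id]
            parts.append(rdn)
            entry_id = parent_id
        return ",".join(parts)


keys = flat_index_keys("cn=a,ou=people,dc=example,dc=com", "dc=example,dc=com")

tree = HierTree()
root = tree.add(0, "dc=example,dc=com")
ou = tree.add(root, "ou=people")
cn = tree.add(ou, "cn=a")
print(len(keys), "flat keys vs 1 tree record;", tree.dn_of(cn))
```

In the flat model the number of keys touched grows with the depth of the entry, and the suffix's subtree key is touched by every add. In the tree model each add writes a single parent link, and subtree membership is derived by walking the links instead of being stored.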

bdb 10000, new indexing
ldadd 6.170u 0.750s 0:33.51 20.6%     0+0k 0+0io 644pf+0w
slapd 20.760u 2.420s 0:35.93 64.5%    0+0k 0+0io 3084pf+0w

Pretty good, finally faster than back-ldbm. The transaction log size is
27MB. The database size is about 12MB (id2entry + dn2id). Using this in
back-hdb will be just about ideal.
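
For comparison across all four runs in this message, the elapsed ldadd times can be turned into load rates. This is just arithmetic on the numbers already posted; the dictionary labels are shorthand, and the log-to-database ratios use the approximate sizes given in the text.

```python
ENTRIES = 10_000

# Elapsed ldadd wall-clock time in seconds, from the timing lines.
elapsed = {
    "back-bdb, a couple days ago": 6 * 60 + 34.97,
    "back-bdb, newer entry_encode/decode": 7 * 60 + 2.39,
    "back-hdb": 27.44,
    "back-bdb, new indexing": 33.51,
}

# Entries loaded per second.
rate = {name: ENTRIES / secs for name, secs in elapsed.items()}

# Speedup relative to back-bdb with the newer encode/decode but old indexing.
baseline = elapsed["back-bdb, newer entry_encode/decode"]
speedup = {name: baseline / secs for name, secs in elapsed.items()}

# Transaction log size relative to database size (both in MB, approximate).
log_to_db = {
    "back-hdb": 20.8 / 9.8,
    "back-bdb, new indexing": 27 / 12,
}

for name in elapsed:
    print("%-38s %7.1f entries/s  %5.1fx" % (name, rate[name], speedup[name]))
```

Both of the newer variants load a few hundred entries per second, against roughly 25 per second before, and their logs are only about twice the size of the data they describe.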

As for search/read speed, I don't have a good metric yet. My current runs of
test008-concurrency all execute in about 52 seconds, whether the backend is
back-ldbm, back-hdb, or back-bdb with the new index code. All of the
search/read iterations finish quickly, and the bulk of the time is spent in
the add/del task.

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support

Follow-Ups:
- Re: back-bdb performance
  - From: Markus Storm <Markus.Storm@mediaWays.net>
- RE: back-bdb performance
  - From: Pedro Jose Marron <pjmarron@informatik.uni-freiburg.de>

References:
- Re: back-bdb performance
  - From: Ganesan R <rganesan-ldap@myrealbox.com>
