
RE: dissection of search latency



Yes, your assumptions are correct.
Both backends use the same Sleepycat BDB 3.3 with the Btree access method.
The test system is a Pentium III 1GHz (1 CPU) with 512MB of RAM,
and there was no swapping during the tests.

Once back-bdb has an entry cache, its search performance will be of the
same order as back-ldbm's, or even better if the caches are made more
efficient, because search operations do not have to be
transaction-protected.
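
To illustrate the idea (a rough sketch; the names entry_cache_find,
entry_cache_insert, and bdb_id2entry are illustrative stand-ins, not the
actual back-bdb code): a cache hit never touches Berkeley DB at all, and a
miss can read through with a NULL DB_TXN, since a search needs no
transaction protection.

    /* Hypothetical read path; names are illustrative, not OpenLDAP's. */
    #include <db.h>
    #include <stddef.h>

    typedef unsigned long ID;           /* entry ID */
    typedef struct Entry Entry;         /* decoded in-memory entry */
    typedef struct Cache Cache;         /* hypothetical ID -> Entry map */

    Entry *entry_cache_find(Cache *c, ID id);
    void   entry_cache_insert(Cache *c, ID id, Entry *e);
    Entry *bdb_id2entry(DB *db, DB_TXN *txn, ID id);

    Entry *fetch_entry(Cache *cache, DB *db, ID id)
    {
        Entry *e = entry_cache_find(cache, id); /* pure memory lookup */
        if (e != NULL)
            return e;                           /* hit: no BDB access */

        /* miss: read through with a NULL txn -- a search does not
         * need transaction protection */
        e = bdb_id2entry(db, NULL, id);
        if (e != NULL)
            entry_cache_insert(cache, id, e);
        return e;
    }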

- Jong

------------------------
Jong Hyuk Choi
jongchoi@us.ibm.com
IBM Thomas J. Watson Research Center
Enterprise Linux Group


"Howard Chu" <hyc@highlandsun.com>@OpenLDAP.org on 2001-11-14 06:35:55 PM

Sent by:  owner-openldap-devel@OpenLDAP.org


To:   "Lawrence Greenfield" <leg+@andrew.cmu.edu>,
      <openldap-devel@OpenLDAP.org>
cc:
Subject:  RE: dissection of search latency




> -----Original Message-----
> From: owner-openldap-devel@OpenLDAP.org
> [mailto:owner-openldap-devel@OpenLDAP.org]On Behalf Of Lawrence
> Greenfield

> A search with no indexing isn't a very interesting benchmark; this
> isn't what we should be optimizing for.

Unfortunately, back-bdb's indexing support is still incomplete, so doing it
any other way wouldn't make a very interesting benchmark either. Besides,
the operations that are slow here will still be relatively slow in the
indexed case. But yes, it will also be necessary to analyze the timing for
indexed searches using some large number of non-sequential queries.
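
One way to run that comparison (a rough harness sketch using the libldap
synchronous API; the base DN, filter pattern, and counts are placeholders):

    /* Rough timing harness for non-sequential indexed lookups.
     * Base DN, filter pattern, and counts are placeholders. */
    #include <ldap.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    int main(void)
    {
        LDAP *ld = ldap_init("localhost", LDAP_PORT);
        struct timeval t0, t1;
        int i;

        ldap_simple_bind_s(ld, NULL, NULL);     /* anonymous bind */

        gettimeofday(&t0, NULL);
        for (i = 0; i < 1000; i++) {
            LDAPMessage *res;
            char filter[64];
            /* non-sequential: pick target entries at random */
            sprintf(filter, "(uid=user%d)", rand() % 10000);
            ldap_search_s(ld, "dc=example,dc=com", LDAP_SCOPE_SUBTREE,
                          filter, NULL, 0, &res);
            ldap_msgfree(res);
        }
        gettimeofday(&t1, NULL);

        printf("%.3f s for 1000 random searches\n",
               (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6);
        ldap_unbind(ld);
        return 0;
    }
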
>
> Berkeley db with txns is fairly slow at iterations, though 0.5 seconds
> to iterate across 10000 entries seems pretty bad.  Is there any
> locking going on besides the internal locks in Berkeley db?  (On the
> plus side, the Berkeley db code will allow multiple iterators
> simultaneously.)

No, back-bdb does no locking of its own; everything is left up to Berkeley
DB. It makes me wonder about the approach, though; I don't see back-bdb
being viable as a general-purpose backend with such a high performance
cost. Yes, some people will sleep better at night knowing they can recover
from a catastrophic disk failure, but those data-paranoid people already
run redundant hardware, and don't need to be coddled by overprotective
software.
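
For reference, the whole iteration boils down to a cursor walk in the
Berkeley DB 3.3 C API (a sketch, not back-bdb's actual code); the only
locking is whatever DB takes internally for the cursor, and passing a real
DB_TXN handle instead of NULL is where the transactional overhead comes in:

    /* Full-database walk with the BDB 3.3 cursor API; a sketch only. */
    #include <db.h>
    #include <string.h>

    int walk(DB *db, DB_TXN *txn)    /* txn == NULL: no txn locking */
    {
        DBC *cursor;
        DBT key, data;
        int rc;

        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));

        /* with a non-NULL txn, pages the cursor touches are locked
         * under that transaction; with NULL they are not */
        rc = db->cursor(db, txn, &cursor, 0);
        if (rc != 0)
            return rc;

        while ((rc = cursor->c_get(cursor, &key, &data, DB_NEXT)) == 0)
            ;                        /* decode/process the entry here */

        cursor->c_close(cursor);
        return (rc == DB_NOTFOUND) ? 0 : rc;
    }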

> Also, Berkeley db does internal caching; no I/O goes on the second
> time around anyway.

True... However, it's probably still better to keep data in the slapd
internal format than in the disk format, especially for frequently
re-accessed entries.
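
Concretely (hypothetical helper names; decode_disk_entry stands in for the
disk-to-Entry conversion): even when Berkeley DB satisfies a read from its
own cache, the bytes come back in disk format and must be parsed again,
while a slapd-side cache of decoded entries pays that cost only once.

    /* Hypothetical contrast; helper names are illustrative. */
    #include <db.h>
    #include <string.h>

    typedef unsigned long ID;
    typedef struct Entry Entry;
    typedef struct Cache Cache;

    Entry *slapd_cache_find(Cache *c, ID id);
    void   slapd_cache_insert(Cache *c, ID id, Entry *e);
    Entry *decode_disk_entry(DBT *data);

    Entry *get_entry(Cache *c, DB *db, ID id)
    {
        DBT key, data;
        Entry *e = slapd_cache_find(c, id);
        if (e != NULL)
            return e;              /* parsed struct, zero decode cost */

        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));
        key.data = &id;
        key.size = sizeof(id);

        /* BDB may serve this from its page cache (no I/O), but the
         * data is still disk-format and must be decoded every time */
        if (db->get(db, NULL, &key, &data, 0) != 0)
            return NULL;

        e = decode_disk_entry(&data);
        slapd_cache_insert(c, id, e);
        return e;
    }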

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support

---------------------------------------------------------------------------

Fascinating. Assuming that both backends were built using the same version
of Berkeley DB, I'd guess the slowdown in back-bdb is due to transaction
management. Except that the search code is not transaction-protected, so
it's hard to say what the real issue is. I believe Berkeley DB is used with
the Btree access method in both cases, so the difference in size and access
times is rather odd.
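
(For what it's worth, with the 3.3 API both backends end up opening their
databases along these lines; a sketch, with illustrative file name and
flags:)

    /* Opening a database with the Btree access method, BDB 3.3 API.
     * A sketch; the file name and flags are illustrative. */
    #include <db.h>
    #include <stddef.h>

    DB *open_btree(DB_ENV *env, const char *file)
    {
        DB *dbp;
        if (db_create(&dbp, env, 0) != 0)
            return NULL;
        /* same DB_BTREE access method whichever backend is in use */
        if (dbp->open(dbp, file, NULL, DB_BTREE, DB_CREATE, 0600) != 0) {
            dbp->close(dbp, 0);
            return NULL;
        }
        return dbp;
    }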

Out of curiosity, how much RAM was available on the system during these
tests? And am I right to assume that swapping activity was zero at all
times?

When the back-bdb database was built, were all of the log files committed
and then removed? I'm not sure whether it has any relevance to runtime
performance, but leftover logs obviously consume disk space. (I don't
recall whether ext2fs cares, but Berkeley Fast Filesystem performance
always degraded once disk usage went above 90%.)
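
If they weren't removed, Berkeley DB ships a db_archive utility that can
list the log files no longer needed by any active transaction and, if
memory serves, delete them with its -d flag:

    $ db_archive -h /path/to/env       # list removable log files
    $ db_archive -d -h /path/to/env    # remove them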

A couple of the times are actually slower in the warm case, and it may not
be just a measurement anomaly, since the effect appears for both backends.
I wonder why that is.

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support