
RE: Sleepycat and hash functions

To: "'Kurt D. Zeilenga'" <Kurt@OpenLDAP.org>, "'Hallvard B Furuseth'" <h.b.furuseth@usit.uio.no>
Subject: RE: Sleepycat and hash functions
From: "Howard Chu" <hyc@highlandsun.com>
Date: Sat, 14 Dec 2002 19:49:11 -0800
Cc: <openldap-software@OpenLDAP.org>
Importance: Normal
In-reply-to: <5.2.0.9.0.20021214124826.04a8aa38@127.0.0.1>

> -----Original Message-----
> From: owner-openldap-software@OpenLDAP.org
> [mailto:owner-openldap-software@OpenLDAP.org]On Behalf Of Kurt D. Zeilenga

> At 11:56 AM 12/14/2002, Hallvard B Furuseth wrote:
> >http://www.openldap.org/faq/index.cgi?file=756 says one can use BDB's
> >db_dump utility for on-line backups while slapd is running.

> >The manpage continues:
> >
> >  Dumping and reloading Btree databases that use user-defined prefix or
> >  comparison functions will result in new databases that use the default
> >  prefix and comparison functions.  In this case, it is quite likely
> >  that the database will be damaged beyond repair permitting neither
> >  record storage or retrieval.
> >
> >Since db_dump is recommended, I take it back-bdb does not use Btree
> >databases?
>
> It does... but not with a user-defined comparison function.
> User-defined comparison function is only defined for index
> databases (which use DB_HASH).  So, if they are slower, one
> can always recreate them using slapindex(8).

Actually, there are several issues here that affect Little-Endian machines.
BerkeleyDB's default comparison functions are all byte-oriented, like strcmp
and memcmp. When integer data is stored in Little-Endian order, a byte-wise
comparison does not sort the items into proper numerical order. We used a
non-default comparison function to preserve the proper sort order without
changing the stored byte order. Note that on a Big-Endian machine there are
no problems whatsoever.
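
To illustrate the ordering problem, here is a small sketch in Python (not
OpenLDAP code). It assumes 4-byte unsigned IDs and uses Python's byte-wise
comparison of bytes objects as a stand-in for a memcmp-style default
comparison; the sample IDs are made up.

```python
import struct

ids = [1, 2, 255, 256, 257, 65536]

# Pack each ID as a 4-byte unsigned integer (assumed width) in both byte orders.
little = [struct.pack("<I", n) for n in ids]
big = [struct.pack(">I", n) for n in ids]

# bytes objects compare lexicographically, one unsigned byte at a time,
# which is the same ordering memcmp gives for equal-length buffers.
little_sorted = [struct.unpack("<I", b)[0] for b in sorted(little)]
big_sorted = [struct.unpack(">I", b)[0] for b in sorted(big)]

print(little_sorted)  # [65536, 256, 1, 257, 2, 255]
print(big_sorted)     # [1, 2, 255, 256, 257, 65536]
```

Only the Big-Endian encoding comes out in numerical order, because its most
significant byte is compared first.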

The non-default comparison function affects both the id2entry database (which
is a Btree) and the index databases. Even though the index databases are
keyed with Hashes, their data are numeric (lists of entry IDs) using the
Sorted Duplicates feature. I believe, on a Little-Endian machine, using
db_dump on an index database will fail because the data items "are out of
order."

The id2entry database is keyed on entry ID, and it does indeed use a
non-default comparison function. db_dump on a Little-Endian machine should
fail there too.

A similar problem used to exist in back-ldbm, but it only affected the
id2entry database, and it was fixed by byteswapping the entry IDs before
reading/writing entries.
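
The byteswapping approach can be sketched roughly as follows. This is an
illustration in Python, not the back-ldbm code: the helper names id_to_key
and key_to_id, the 4-byte ID width, and the dict used as a stand-in for the
database are all assumptions.

```python
ID_SIZE = 4  # assumed ID width in bytes, for illustration only


def id_to_key(entry_id: int) -> bytes:
    """Convert an entry ID to a Big-Endian key before writing."""
    return entry_id.to_bytes(ID_SIZE, "big")


def key_to_id(key: bytes) -> int:
    """Convert a stored Big-Endian key back to an integer after reading."""
    return int.from_bytes(key, "big")


# A plain dict stands in for the id2entry database here.
store = {}
for entry_id in (300, 2, 256, 1):
    store[id_to_key(entry_id)] = f"entry {entry_id}"

# A default byte-wise sort of the keys now matches numerical order.
ordered_ids = [key_to_id(k) for k in sorted(store)]
print(ordered_ids)  # [1, 2, 256, 300]
```

The cost is one conversion per read and per write, which is cheap for a
single key per entry but adds up when every ID in every index list needs it.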

Given the large volume of byteswapping that would be needed to correct this
sorting issue for back-bdb's index databases, I chose to instead use an
alternate compare function. I think we should update the documentation and
note that db_dump/db_load must not be used on Intel and other Little-Endian
machines.
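
As a rough sketch of the alternate-compare idea (again in Python, not the
actual back-bdb function): the hypothetical compare_ids below decodes both
items as integers and compares the values, leaving the stored bytes in
Little-Endian order. The 4-byte width and the sample IDs are assumptions.

```python
import struct
from functools import cmp_to_key


def compare_ids(a: bytes, b: bytes) -> int:
    """Compare two stored items by integer value, not byte by byte."""
    x = struct.unpack("<I", a)[0]
    y = struct.unpack("<I", b)[0]
    return (x > y) - (x < y)  # negative, zero, or positive, like memcmp


# Items stay in Little-Endian order on disk; only the comparison changes.
stored = [struct.pack("<I", n) for n in (256, 1, 65536, 2)]

in_order = sorted(stored, key=cmp_to_key(compare_ids))
decoded = [struct.unpack("<I", b)[0] for b in in_order]
print(decoded)  # [1, 2, 256, 65536]

# A tool that only knows the default byte-wise comparison disagrees
# with this ordering.
print(sorted(stored) == in_order)  # False
```

The last line shows why a utility unaware of the custom function would see
the same data as being out of order.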

I'm pretty sure all of this was discussed on the -devel list 'way back when
we were implementing...

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support

Follow-Ups:
- RE: Sleepycat and hash functions
  - From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>

References:
- Re: Sleepycat and hash functions
  - From: "Kurt D. Zeilenga" <Kurt@OpenLDAP.org>
