
RE: Sleepycat and hash functions



> -----Original Message-----
> From: owner-openldap-software@OpenLDAP.org
> [mailto:owner-openldap-software@OpenLDAP.org]On Behalf Of Kurt D. Zeilenga

> At 11:56 AM 12/14/2002, Hallvard B Furuseth wrote:
> >http://www.openldap.org/faq/index.cgi?file=756 says one can use BDB's
> >db_dump utility for on-line backups while slapd is running.

> >The manpage continues:
> >
> >  Dumping and reloading Btree databases that use user-defined prefix or
> >  comparison functions will result in new databases that use the default
> >  prefix and comparison functions.  In this case, it is quite likely
> >  that the database will be damaged beyond repair permitting neither
> >  record storage or retrieval.
> >
> >Since db_dump is recommended, I take it back-bdb does not use Btree
> >databases?
>
> It does... but not with a user-defined comparison function.
> User-defined comparison function is only defined for index
> databases (which use DB_HASH).  So, if they are slower, one
> can always recreate them using slapindex(8).

Actually, there are a number of issues here that will impact a Little-Endian
machine. BerkeleyDB's default comparison functions are all byte-oriented,
like strcmp and memcmp. When comparing integer data stored in Little-Endian
order, the data items will not sort into proper numerical order. The reason
we used a non-default comparison function was to preserve the proper sort
order without changing the stored byte-order. Note that on a Big-Endian
machine there are no problems whatsoever.
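
To make the ordering problem concrete, here is a minimal sketch (not code
from back-bdb). The helper names store_le32, store_be32 and bytewise_cmp are
made up for illustration, and a 4-byte integer is assumed. The byte layouts
are written out explicitly, so the result does not depend on the host's own
byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write v into out[] least-significant byte first (Little-Endian layout). */
static void store_le32(uint32_t v, unsigned char out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Write v into out[] most-significant byte first (Big-Endian layout). */
static void store_be32(uint32_t v, unsigned char out[4])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(v >> (8 * (3 - i)));
}

/* Byte-oriented comparison, the kind of ordering memcmp gives. */
static int bytewise_cmp(const unsigned char a[4], const unsigned char b[4])
{
    return memcmp(a, b, 4);
}
```

With these helpers, 1 stored Little-Endian is 01 00 00 00 and 256 is
00 01 00 00, so bytewise_cmp puts 256 before 1. Stored Big-Endian, the same
two values compare in numerical order.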

The non-default comparison function affects both the id2entry database (which
is a Btree) and the index databases. Even though the index databases are
keyed with Hashes, their data are numeric (lists of entry IDs) using the
Sorted Duplicates feature. I believe that on a Little-Endian machine, running
db_dump on an index database will fail because the data items "are out of
order."

The id2entry database is keyed on entry ID, and it does indeed use a
non-default comparison function. db_dump on a Little-Endian machine should
fail there too.
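
For readers who have not seen one, a numeric comparison function of this kind
might look roughly like the sketch below. This is an illustration, not the
actual back-bdb code: blob_t is a hypothetical stand-in for Berkeley DB's DBT
(only its data and size fields), entry_id is an assumed ID type, and
entry_id_cmp is a made-up name. In Berkeley DB itself such a callback would
be registered through DB->set_bt_compare (Btree keys) or DB->set_dup_compare
(sorted duplicates), and the exact callback signature depends on the
Berkeley DB version.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for Berkeley DB's DBT (only the fields used here). */
typedef struct {
    void *data;
    unsigned int size;
} blob_t;

typedef unsigned long entry_id;   /* assumed ID type for this sketch */

/* Compare two stored entry IDs numerically, in host byte order. */
static int entry_id_cmp(const blob_t *a, const blob_t *b)
{
    entry_id x, y;

    assert(a->size == sizeof x && b->size == sizeof y);
    memcpy(&x, a->data, sizeof x);   /* memcpy avoids alignment problems */
    memcpy(&y, b->data, sizeof y);
    return (x > y) - (x < y);        /* negative, zero or positive */
}
```

Because the IDs are compared as integers rather than as byte strings, the
stored byte order never has to change. The cost is that any tool opening the
database without this function sees the items as out of order.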

A similar problem used to exist in back-ldbm, but it only affected the
id2entry database, and it was fixed by byteswapping the entry IDs before
reading/writing entries.
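
The byteswapping approach can be sketched as follows. Again this is only an
illustration under stated assumptions, not the back-ldbm code: the names
id_to_key, key_to_id and key_cmp are invented, and a 4-byte entry ID is
assumed. The ID is converted to Big-Endian bytes before it is used as a key,
so that a plain byte-oriented comparison already yields numerical order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode an entry ID as a 4-byte Big-Endian key before writing. */
static void id_to_key(uint32_t id, unsigned char key[4])
{
    assert(key != NULL);
    key[0] = (unsigned char)(id >> 24);
    key[1] = (unsigned char)(id >> 16);
    key[2] = (unsigned char)(id >> 8);
    key[3] = (unsigned char)id;
}

/* Decode a 4-byte Big-Endian key back into an entry ID after reading. */
static uint32_t key_to_id(const unsigned char key[4])
{
    assert(key != NULL);
    return ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
           ((uint32_t)key[2] << 8)  |  (uint32_t)key[3];
}

/* Default-style byte comparison; no custom compare function required. */
static int key_cmp(const unsigned char a[4], const unsigned char b[4])
{
    return memcmp(a, b, 4);
}
```

The trade-off is one conversion on every read and write of an ID, which is
cheap for a single key per entry but adds up for long lists of IDs.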

Given the large volume of byteswapping that would be needed to correct this
sorting issue for back-bdb's index databases, I chose to instead use an
alternate compare function. I think we should update the documentation and
note that db_dump/db_load must not be used on Intel and other Little-Endian
machines.

I'm pretty sure all of this was discussed on the -devel list 'way back when
we were implementing...

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support