
RE: BDB, endian



> -----Original Message-----
> From: Gertjan van Wingerde [mailto:gwingerde@home.nl]

> Howard,
>
> I would be very much in favor of such a change. The main problem I'm having
> at the moment in maintaining LDAP servers on multiple platforms is that
> the databases are not portable across platforms. The issues are:

Database portability has never been a priority.

> 1. The IDs do not have a fixed, platform-independent data type associated
>    with them, i.e. on 32-bit platforms the IDs are 32-bit integer values and
>    on 64-bit platforms the IDs are 64-bit integer values.

This implies standardizing on either 32 or 64 bits, and neither choice is
appropriate for the other class of platform. Using 64-bit IDs on a 32-bit
machine will hurt performance. Using 32-bit IDs on a 64-bit platform will hurt
scalability.
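
To make the tradeoff concrete, here is a rough sketch (hypothetical code,
not the actual slapd headers):

    #include <stdint.h>

    /* Today the entry ID roughly follows the platform word size,
     * along the lines of: */
    typedef unsigned long ID;   /* 32 bits on ILP32 hosts, 64 on LP64 */

    /* A portable on-disk format would have to pin one width for everyone: */
    typedef uint64_t ID64;      /* doubles key size and slows integer
                                 * operations on 32-bit hosts */
    typedef uint32_t ID32;      /* caps a 64-bit server at roughly
                                 * 4 billion entries */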

> 2. The IDs do not have a fixed, platform-independent representation format
>    associated with them when they are stored in the database, i.e. the
>    byte order differs between little-endian and big-endian machines.

This issue is of little concern in itself. My point was simply to enable
using the standard db_dump/db_load tools. Of course, that now reminds me that
we use a custom hash function in the Hash databases, so we're already stuck
in that area.
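
To make the proposed byteswap concrete, the conversion would look roughly
like the following (hypothetical helper names, and real entry IDs are not
necessarily 32 bits):

    #include <stdint.h>

    /* Sketch of the proposed change: store entry IDs big-endian on disk
     * so that BDB's default memcmp-style key comparison sorts them
     * numerically on any host, with no custom sort function needed. */
    static void id_to_disk( uint32_t id, unsigned char key[4] )
    {
        key[0] = (unsigned char)(id >> 24);
        key[1] = (unsigned char)(id >> 16);
        key[2] = (unsigned char)(id >>  8);
        key[3] = (unsigned char)(id);
    }

    static uint32_t id_from_disk( const unsigned char key[4] )
    {
        return ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
               ((uint32_t)key[2] <<  8) |  (uint32_t)key[3];
    }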

> These 2 issues prevent me from just copying the databases across to the
> different platforms whenever a database corruption occurs, or when I need to
> resync the database contents with another database. The method I have to use
> now (running a slapcat on the source machine and a slapadd on the destination
> machine) takes way too long for the multi-million-entry LDAP directory I'm
> maintaining.

No. You cannot use raw cp, dd, tar, etc. to move these files around on a live
system. You must use slapcat or db_dump in order to get a consistent image.
On a live system, db_dump is not a proper solution anyway, because it only
dumps a single BDB file at a time. You will get a consistent copy of that one
file, but the other files may be changing during that time. With back-bdb,
you only need to back up the id2entry file, and the rest can be regenerated
using slapindex. But then you're still stuck regenerating the files. If you
do raw copies of the individual files, they are not guaranteed to be in sync
with each other. The only safe way to get a consistent hot backup is to use
slapcat.

> I would be very pleased if these issues could be resolved. I guess your
> suggestion already takes care of issue 2, but I would be very happy if we
> could fix issue 1 while we are at it.

> By the way, we should do this for both the bdb backend and the ldbm backend.

back-ldbm was fixed a long time ago to use big-endian IDs on-disk for the
id2entry database, which is the only place that matters there. Due to
back-bdb's design, this issue affects a lot more of the underlying databases.

> Just my $0.02.
>
>     Best regards,
>
>         Gertjan.
>
> On Tuesday 18 November 2003 21:19, Howard Chu wrote:
> > back-bdb currently uses custom sort functions with its Btree databases
> > because otherwise the integer entry IDs don't sort correctly on
> > little-endian (e.g. Intel) systems. The use of custom sort functions means
> > it's unsafe to use SleepyCat's db_dump/db_load to backup/restore databases
> > on little-endian machines.
> >
> > An alternative is to byteswap all of the entry IDs when they go to/from the
> > database. I think this may impose a slight performance cost, but it would
> > make the generic tools safe for use. Any thoughts on whether to make this
> > change?

I guess the point of making db_dump/db_load "safe" is moot since the index
databases still use a custom hash function.
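
For reference, the custom comparator at issue looks something like this
(a sketch of the approach, not the exact back-bdb source; the function
name is invented):

    #include <string.h>
    #include <db.h>

    /* Registered via DB->set_bt_compare(). Keys are entry IDs stored in
     * native byte order, so they must be compared as integers rather
     * than as byte strings. db_load knows nothing about this comparator,
     * which is why dump/reload of such a database is unsafe on
     * little-endian hosts. */
    static int
    idcmp( DB *db, const DBT *a, const DBT *b )
    {
        unsigned long x, y;

        memcpy( &x, a->data, sizeof(x) );
        memcpy( &y, b->data, sizeof(y) );
        return x < y ? -1 : x > y;
    }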

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support