What are the different backends? What are their differences?
OpenLDAP Software includes a number of backends which may be used with slapd(8). It may not be obvious which ones to use. Most of the backends offer very diverse sets of features, so it's not meaningful to directly compare them to each other.
  • back-bdb, back-hdb, and back-ldbm are the "primary" storage database backends. These backends manage directory objects in an embedded database and are more fully featured than other backends. back-hdb is generally superior to back-bdb (especially as back-hdb supports subtree renames) but tends to require larger caches than back-bdb. back-ldbm is obsolete and should not be used.
  • back-ldap and back-meta are special purpose backends designed to forward (proxy) requests to other remote servers.
  • back-monitor is a status monitoring backend that gives operating statistics on slapd(8) itself.
  • back-null does nothing. It is the LDAP equivalent of /dev/null.
  • back-passwd is a piece of demonstration code whose main purpose is to illustrate the backend interface. It happens to do this by mapping queries onto /etc/passwd, somewhat like an LDAP version of finger.
  • back-perl and back-shell are interfaces to external scripts written in their respective languages. Since each of these languages offers the ability to spawn external programs, these backends are essentially interfaces to code written in any language of your choice. back-shell is generally viewed as deprecated in favor of back-perl.
  • back-dnssrv is a special purpose backend that maps search queries with DNs of the form dc=foo,dc=com into DNS queries to return a URL for the LDAP server that handles the specified DNS domain. It is essentially an LDAP server locator. It is experimental in nature. See (Xref) OpenLDAP LDAP Root Service for more information.
  • back-sql is an RDBMS backend that maps LDAP queries into SQL queries. It is still experimental in nature.
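To make the back-dnssrv mapping above more concrete, here is a minimal Python sketch of how a DN such as dc=foo,dc=com corresponds to a DNS domain. It is not the backend's code. The function names are hypothetical, and the `_ldap._tcp` prefix is the conventional SRV record name for LDAP over TCP, assumed here for illustration. The parser is deliberately naive and does not handle escaped characters in DNs.

```python
def dn_to_domain(dn):
    """Convert a DN made only of dc= components into a DNS domain name.

    Returns None when any component is not a dc= component, since such a
    DN has no obvious DNS equivalent. Naive: escaped commas are not handled.
    """
    labels = []
    for rdn in dn.split(","):
        attr, sep, value = rdn.strip().partition("=")
        if sep != "=" or attr.strip().lower() != "dc" or not value.strip():
            return None
        labels.append(value.strip())
    return ".".join(labels) if labels else None


def srv_query_name(dn):
    """Build the SRV record name one would look up for the DN's domain."""
    domain = dn_to_domain(dn)
    return None if domain is None else "_ldap._tcp." + domain


print(dn_to_domain("dc=foo,dc=com"))
print(srv_query_name("dc=foo,dc=com"))
```

A DN containing anything other than dc= components (for example cn=admin,dc=foo,dc=com) yields None in this sketch, because it does not name a DNS domain.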
back-perl and back-shell are directly comparable in purpose and function. However, as back-shell suffers from a number of limitations (doesn't support threads, is not extensible, etc.), back-perl is generally recommended over back-shell.

back-ldap and back-meta are directly comparable as back-meta is a proper superset of back-ldap and back-ldap code is shared with back-meta.

back-bdb, back-hdb and back-ldbm are comparable in purpose. back-bdb evolved from experience gained from back-ldbm, but the two are quite distinct today. back-hdb is a further refinement of back-bdb and most considerations for back-bdb apply equally to back-hdb. back-bdb and back-ldbm both store entries based on a 32-bit entry ID key, and they use a dn2id table to map from DNs to entry IDs. They both perform attribute indexing using the same code, and store index data as lists of entry IDs. As such, the LDAP-specific features they offer are nearly identical. The differences are in the APIs used to implement the databases. back-ldbm uses a generic database API that can plug into GDBM, MDBM, BerkeleyDB (BDB), or any other database package that supports the (key,data) pair style of access. While BerkeleyDB supports this generic interface, it also offers a much richer API that has a lot more power and a lot more complexity. back-bdb is written specifically for the Berkeley DB Transactional Data Store API. That is, back-bdb uses BDB's most advanced features to offer transactional consistency, durability, fine-grained locking, and other features that offer improved concurrency, reliability, and usability.
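The shared layout described above (entries keyed by a 32-bit entry ID, a dn2id table, and indexes stored as lists of entry IDs) can be pictured with a small Python model. This is only a toy under stated assumptions: the class and method names are hypothetical, plain dictionaries stand in for the on-disk tables, and DN normalization is reduced to lowercasing.

```python
class ToyDirectory:
    """Toy model of the dn2id / id2entry / index layout; not OpenLDAP code."""

    MAX_ID = 2**32 - 1  # entry IDs are 32-bit keys

    def __init__(self):
        self.next_id = 1
        self.dn2id = {}     # normalized DN -> entry ID
        self.id2entry = {}  # entry ID -> entry
        self.index = {}     # (attribute, value) -> sorted list of entry IDs

    def add(self, dn, attrs):
        key = dn.lower()
        if key in self.dn2id:
            raise ValueError("entry already exists: " + dn)
        if self.next_id > self.MAX_ID:
            raise OverflowError("out of 32-bit entry IDs")
        entry_id = self.next_id
        self.next_id += 1
        self.dn2id[key] = entry_id
        self.id2entry[entry_id] = {"dn": dn, "attrs": attrs}
        for attr, values in attrs.items():
            # De-duplicate so an entry ID appears once per index key.
            for value in {v.lower() for v in values}:
                # IDs are handed out in increasing order, so appending
                # keeps every ID list sorted.
                self.index.setdefault((attr.lower(), value), []).append(entry_id)
        return entry_id

    def lookup(self, dn):
        """Resolve a DN through dn2id, then fetch the entry by its ID."""
        entry_id = self.dn2id.get(dn.lower())
        return None if entry_id is None else self.id2entry[entry_id]

    def search_eq(self, attr, value):
        """Equality search: read the ID list from the index, then fetch DNs."""
        ids = self.index.get((attr.lower(), value.lower()), [])
        return [self.id2entry[i]["dn"] for i in ids]
```

In this model an equality search never scans id2entry; it reads one ID list from the index and fetches only the matching entries, which is the reason for storing index data as lists of entry IDs.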

With back-ldbm, there is no fine-grained database locking. This means write operations are serialized. And while multiple read operations may be performed concurrently, they cannot be performed concurrently with any write operation. Additionally, LDBM databases can be accessed by only one program at a time (generally at the file level). (While you may be able to bypass the locking mechanism, you will likely corrupt the database and/or obtain bogus information.)
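The coarse locking rules described above (many readers or one writer, never both) can be sketched as follows. This is an illustration only, assuming a single lock over the whole database; the class is hypothetical and uses non-blocking "try" calls so the behavior is easy to follow, whereas a real server would block and wait.

```python
class DatabaseLock:
    """Toy whole-database lock: many readers or one writer, never both."""

    def __init__(self):
        self.readers = 0
        self.writer = False

    def try_read(self):
        # Reads may overlap each other, but not an active write.
        if self.writer:
            return False
        self.readers += 1
        return True

    def end_read(self):
        self.readers -= 1

    def try_write(self):
        # A write needs the database to itself: no readers, no other writer.
        if self.writer or self.readers:
            return False
        self.writer = True
        return True

    def end_write(self):
        self.writer = False
```

Under this model a single long-running write holds off every search, which matches the behavior the paragraph above describes for back-ldbm.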

With back-bdb, databases are locked on a page level, which means that multiple threads (and processes) can operate on the databases concurrently. In OpenLDAP 2.1.4 we lifted the restriction against using the slap tools while slapd is running on back-bdb. You can perform online backups using slapcat or BDB's db_dump utility without interrupting your LDAP service. You still must not use slapadd or slapindex while slapd is running (due to application-level caching in slapd(8)). Note that the alock feature added in OpenLDAP 2.3 automatically prevents slapadd or slapindex from being used while slapd is running.

Using BDB's transaction logging means that every modification request is logged in a separate log file before any database files are modified. If the server crashes in the middle of an update, you can recover easily with no data loss or corruption. Barring catastrophic disk hardware failures, when the database returns "success" for an update operation, you know that the update was completed cleanly on disk.
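The log-before-modify idea can be shown with a toy Python model. It is a sketch of write-ahead logging in general, not of BDB's implementation: the class, the record layout, and the crash_after parameter are all hypothetical, and in-memory structures stand in for the log file and the database files.

```python
class ToyStore:
    """Toy write-ahead log: changes are logged before data is touched."""

    def __init__(self):
        self.log = []   # stands in for the on-disk transaction log
        self.data = {}  # stands in for the database files

    def update(self, txn_id, changes, crash_after=None):
        # 1. Record every intended change in the log.
        for key, value in changes.items():
            self.log.append(("write", txn_id, key, value))
        if crash_after == "log":
            raise RuntimeError("simulated crash before commit")
        # 2. Record the commit; from here on the update is durable.
        self.log.append(("commit", txn_id))
        if crash_after == "commit":
            raise RuntimeError("simulated crash before data was modified")
        # 3. Only now modify the database itself.
        self.data.update(changes)

    def recover(self):
        """Replay committed transactions; ignore uncommitted ones."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in self.log:
            if rec[0] == "write" and rec[1] in committed:
                _, _, key, value = rec
                self.data[key] = value
```

A crash after the commit record is repaired by replaying the log, while a crash before it leaves no trace in the data, so the database never ends up half-updated.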

There are many other differences between the two that are really only visible in the code itself. For example, back-ldbm stores entries in LDIF format, and back-bdb stores them in a binary format that is 3-4 times faster to read and write. back-ldbm's index management is reminiscent of filesystem inodes, with direct blocks and indirect blocks, and individual index blocks are malloc'd and free'd on demand. back-bdb's index management is much simpler, and blocks are malloc'd and free'd much less frequently, which again yields better performance.

As a historical note, the back-ldbm code is a direct descendant of the original University of Michigan code. Because of its age and its byzantine data structures, the code was becoming unmaintainable, and since back-bdb has proven itself to be more reliable, the decision was made to delete back-ldbm from the code base.
hyc@openldap.org, Kurt@OpenLDAP.org, quanah@openldap.org

Perhaps some anecdotal information may help people see the difference between ldbm and bdb. We were happily using ldbm as a backend at Columbia, and getting search responses in about 0.03 seconds. However, at various points in the day a program would make a large number (300+) of adds/modifies to our OpenLDAP server and search times would suddenly jump to 3 or 4 seconds. After switching to bdb things improved drastically. Even with the same large number of adds/modifies occurring, search times only increase to about 0.04 seconds.
As further testimony to the limitations of ldbm/gdbm, the combination also has a 2 GB file size limit that can leave one with a corrupted directory! If gdbm attempts to grow any file past 2 GB (2*1024*1024*1024 = 2147483648 bytes), it will abort and die, which will then cause slapd to die. The symptoms are that a file is 2147483647 (2 GB - 1) bytes and slapd runs for a little while but dies when a write is attempted. There is nothing in the log file, and nothing prints out unless you run slapd with the -d option so that it doesn't fork. Then you see the gdbm error saying that it was unable to write to the file (but it doesn't tell you which one). If you look at the end of the 2 GB file, for example with 'tail id2entry.gdbm', you'll see that the last write was probably interrupted midway, so the file is now corrupted as well.
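Since the failure gives so little warning, it may help to watch file sizes before they reach the limit. The following Python sketch lists database files that are close to 2 GB. The function name, the 90% threshold, and the limit parameter are assumptions made for illustration; only the 2147483648-byte figure and the .gdbm suffix come from the description above.

```python
import os

GDBM_LIMIT = 2 * 1024 * 1024 * 1024  # 2147483648 bytes, as computed above


def files_near_limit(directory, suffix=".gdbm", threshold=0.9, limit=GDBM_LIMIT):
    """Return (name, size) pairs for files at or above threshold * limit."""
    hits = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.endswith(suffix) and os.path.isfile(path):
            size = os.path.getsize(path)
            if size >= threshold * limit:
                hits.append((name, size))
    return hits
```

Calling files_near_limit() on the database directory from a periodic job would flag a growing id2entry.gdbm while there is still time to dump and reload it.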

Any directory with reads and writes will have gdbm files typically much larger than the amount of data they contain due to the "sparse files" design of gdbm. In my specific case, a 2 GB id2entry.gdbm shrank to 32 MB when it was restored. The corruption was at its most severe because the damaged file was id2entry.gdbm, which is the main data store; all of the other *.gdbm files are indexes. Since that main file was corrupted, I could not recover my data with 100% certainty.

How to restore these files: In my specific case, I had slave LDAP servers, so I had a copy of the directory from before the corruption occurred. This is because slapd dying on the master also prevented it from writing the change to the slurpd replog, so the corrupted data never replicated out to the slaves. I performed the following steps:

1) stop slurpd on the master (slapd had already died),
2) stop slapd on one of the slaves,
3) slapcat to an ldif file,
4) rsync the ldif over to the corrupted master,
5) save a copy of the corrupted directory db files,
6) delete the corrupted directory db files,
7) slapadd the ldif file (which creates new directory db files),
8) change ownership to user slapd runs as (ldap:ldap in my case),
9) delete the replog (replication) files,
10) stop slapd on all slaves,
11) rsync the new files out to the slaves,
12) restart the slapd daemon on master,
13) restart slapd daemons on slaves,
14) restart slurpd daemon on master.
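Steps 5 through 8, the part that runs on the master, can be written down as a command plan. The Python sketch below only builds and prints the commands; it executes nothing. The paths are hypothetical placeholders, the function name is invented for illustration, and the plan assumes that slapd's configuration already points at the same database directory.

```python
def master_rebuild_plan(ldif, db_dir, owner="ldap:ldap"):
    """Return the commands for steps 5-8 as argument lists.

    Nothing is executed; review the commands and run them by hand.
    """
    db_dir = db_dir.rstrip("/")
    saved = db_dir + ".corrupt"
    return [
        # Steps 5 and 6: moving the directory aside both saves the
        # corrupted files and clears the way for new ones.
        ["mv", db_dir, saved],
        ["mkdir", db_dir],
        # Step 7: slapadd -l reads the LDIF and builds new database files.
        ["slapadd", "-l", ldif],
        # Step 8: hand the new files to the account slapd runs as.
        ["chown", "-R", owner, db_dir],
    ]


# Hypothetical paths, for illustration only.
for argv in master_rebuild_plan("/tmp/directory.ldif", "/var/lib/ldap"):
    print(" ".join(argv))
```

Keeping the plan as data makes it easy to review before touching a production master, which matters here because step 6 is destructive.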

Unfortunately, this is only a quick fix. The root problem is that my directory db files could grow again to the point where the above steps are needed once more. The correct fix is to convert to a db that doesn't have this limitation: Berkeley DB, preferably 4.2.52 with patches (see the mailing list archives). But the above steps will get you out of a tight spot, get the OpenLDAP server running again, and give you time to plan a migration.
This document is: http://www.openldap.org/faq/index.cgi?file=756
© Copyright 1998-2013, OpenLDAP Foundation, info@OpenLDAP.org