
Re: (ITS#6482) slapcat doesn't continue with -c after the first corrupted database entry



> The main point wasn't to tell you how important the data was, but that
> tools like ldapsearch or other tools connecting to slapd using port 389
> still saw the good data.

Because it was cached?  Was this true after restarting the server?

> And slapd never complained about a corrupted
> database. As a result of that the data corruption was hidden (all other
> programs worked, only three non-important entries were not accessible)
> until I tried to use slapcat to export the data for an upgrade.
>
> Regarding "If the underlying database fails, there's little slapcat can
> do.": As long as slapd sees the data, because it sends it in responses
> to ldap-queries, the database hasn't failed so much, that slapcat
> couldn't retrieve those data.

Well, as I said in my previous message, slapcat is asking the underlying
database to provide entries.  If the underlying database fails, there's
little slapcat can do.  When the underlying database is back-bdb/hdb,
slapd accesses data using indexes (dn/filter indexes mapped to IDs; the
entry is then accessed using BDB in DB_SET mode), while slapcat accesses
it using BDB cursors in DB_NEXT mode.  Your approach of guessing IDs in
the hope it works sounds a bit naive; I think in your case it worked by
chance.
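
To illustrate the difference between the two access paths, here is a toy
sketch in Python. It is not back-bdb code and makes no BDB calls: a dict
stands in for the entry database, decode() for entry decoding, and the
names STORE, lookup_by_id and scan_all are made up for the example. It
assumes the sequential walk gives up at the first record it cannot decode.

```python
# Toy model only; nothing here is the BDB or OpenLDAP API.

class DecodeError(Exception):
    """Raised when a stored record cannot be turned into an entry."""


# Hypothetical database contents: entry ID -> raw record.
STORE = {
    1: b"dn: dc=example",
    2: b"dn: ou=people,dc=example",
    3: b"\xff\xfe garbage",  # simulated corrupted record
    4: b"dn: cn=alice,ou=people,dc=example",
}


def decode(raw):
    """Stand-in for entry decoding: reject anything not starting with 'dn: '."""
    if not raw.startswith(b"dn: "):
        raise DecodeError("cannot decode entry")
    return raw.decode("ascii")


def lookup_by_id(entry_id):
    """Keyed access: fetch exactly one record by ID (the DB_SET-like path)."""
    return decode(STORE[entry_id])


def scan_all():
    """Sequential access: walk the records in ID order (the DB_NEXT-like path).

    Assumption of this sketch: the walk ends at the first record that
    fails to decode, so later records are never reached.
    """
    results = []
    for entry_id in sorted(STORE):
        try:
            results.append(decode(STORE[entry_id]))
        except DecodeError:
            break
    return results
```

In this model a fetch of entry 4 by ID still works, while the sequential
walk only returns entries 1 and 2. Real failures inside BDB may of course
behave differently.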

Perhaps a more "robust" approach would be to have the back-bdb/hdb
bi_tool_entry_next() itself fall back to the DB_SET approach when
DB_NEXT fails, with slapcat only being told about the failure (so it can
abort if -c is not passed).

> Regarding "any garbage coming out of a corrupted database": slapd never
> complained about the database being corrupted, it started without any
> problems (except that three entries were unreadable), the entries after
> those damaged entries made sense and weren't garbage. Berkeley's
> db_tools claimed that there were some errors in the db, but slapd worked
> without problems, only slapcat (and ldapdelete when trying to delete
> those entries) complained that it couldn't decode them (which seems to
> be rather a problem of the application layer than of the database layer,
> but might have been caused by some little db corruption).

When things go wrong, and specifically when this happens outside the code
we are responsible for (in this case BDB), usually the best we can do is
to rely on whatever diagnostics/recovery capabilities that code provides.
Again, what worked for you may not work for another kind of corruption.
Actually, carrying on might yield inconsistent data, which is as bad as
data loss.

> I tried tools like "db_recover", but they didn't help,

What about catastrophic recovery, using archived log files?

> so I wanted to
> extract the good ldap entries (using slapcat ...) to drop the database
> and recreate it with the exported ldif, which failed because slapcat
> stopped after the first failure (even with -c). Using the submitted
> patch I was able to export all readable data into a ldif-file and to
> recreate the database.
>
> summary:
> * slapd never complained, ldapsearch listed the data, only few entries
> were missing (because for them following was true:
>
> <= entry_decode: slap_str2undef_ad(object&#65533;..!p):
> AttributeDescription
> contains inappropriate characters)
>
> -->  it seems that some attribute-values were damaged, but not the dn?
>
> * if ldapsearch is able to retrieve the data, slapcat should also do

In general, I concur.  However, slapd and tools access data in a slightly
different manner, so this may not be trivial, and might require a
redesign.

> * I don't know how you want to recover a corrupted database. I tried to
> recover all good entries and then to recreate it. And for that I wanted
> to use slapcat (or is there any other tool for exporting data from the
> ldap database than slapcat with option -c)?

None I know about.

> * maybe the way -c works shouldn't be changed, but there should be an
> option for trying data recovery harder without having to patch openldap
> (which is complicated for users of other distros not working directly
> with the sources, but with precompiled packages)

I'd take this as a feature request for redesigning the tools' iteration
procedure in order to try to overcome minor db corruption.

p.