
[LMDB] getting MDB_CORRUPTED when deleting within a DUPSORT database



Hi,

I am using version 0.9.20 on Linux (Ubuntu derivatives; see [1] and [2] for the uname output). One of the databases serves as an index into another database and was therefore created with the MDB_DUPSORT flag. Running my software in a test environment generated about 33 million entries in this database. To rule out a suspicion that my software does not perform housekeeping correctly, I copied the LMDB file to my workstation and forced my software to delete all "legal" entries, to see whether any entries would remain. Unfortunately, I got an MDB_CORRUPTED error during the delete operation on that database.

Some more details: the deletion takes place in multiple steps, by calling a function that deletes a range of keys in a database several times. The code is as follows (some boilerplate omitted):


    unsigned int dbFlags;

    int error = mdb_dbi_flags (txn, dbi, &dbFlags);

    // [...]

    bool isDupSort = dbFlags & MDB_DUPSORT;

    error = mdb_cursor_open (txn, dbi, &cursor);

    // [...]

    error = mdb_cursor_get (cursor, &ckey, &cdata, MDB_SET_RANGE);

    while (error != MDB_NOTFOUND)
    {
      // [...]

      int compResult = mdb_cmp (txn, dbi, &ckey, &ekey);

      // Stop once past the end key (or at it, if the end is exclusive).
      if (compResult > 0 || (compResult == 0 && !endIsInclusive))
        break;

      error = mdb_cursor_del (cursor, isDupSort ? MDB_NODUPDATA : 0);

      // [...]

      error = mdb_cursor_get (cursor, &ckey, &cdata, MDB_NEXT);
    }

    mdb_cursor_close (cursor);


Is this the correct way to delete the data? The MDB_CORRUPTED error occurs in the mdb_cursor_del call. Other operations on that specific database are mdb_put (with no flags) and mdb_del, supplying both key and data.

A side observation: in a similar test with fewer entries, the database was emptied completely, yet mdb_stat still reported a large number of entries for it (5- to 6-digit figures). I also use the stat data to estimate the size of the database by adding up all page counts and multiplying the sum by the page size. The result puzzles me, as it is lower than I expected: roughly the net size of only the data part of the entries.

I suspect the information provided may not be sufficient to find the problem. What additional information would be helpful? How can I test whether the database is already corrupt when the deletion starts, or whether it becomes corrupt during the deletion (I suspect the latter)? Should I attempt to write a specific test case? I could reproduce the error by running my software from scratch again, but I don't know to what extent the data pattern affects the problem, or whether I can reproduce this pattern artificially.

Regards,

Klaus


[1] Linux aaa 4.4.0-65-generic #86-Ubuntu SMP Thu Feb 23 17:49:58 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[2] Linux bbb 4.8.0-42-generic #45-Ubuntu SMP Wed Mar 8 20:06:06 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux