
(ITS#5171) hdb txn_checkpoint failures



Full_Name: Aaron Richton
Version: 2.3.38
OS: Solaris 9
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (68.196.250.105)


I just noticed that my syslog files were growing faster than usual. On further
inspection, two slaves each have multiple corrupt hdb databases. Both slave4 and
slave6 have been running slapd continuously since September 4. Both run a
patched BDB 4.2.52 (the same binaries I've used throughout the whole 2.3
series), and every DB_CONFIG has DB_LOG_AUTOREMOVE set. Messages like the ones
below are logged at every checkpoint interval, which is why the logs are growing
so quickly. I'm inclined to just zap all the databases and start again (they're
only slaves), but I figured I'd post for tracking and to ask whether there's
anything worth grabbing from the running process before I do so. Curiously,
base4 is corrupt only on slave4, not on slave6. Other databases hosted on each
slave appear unaffected.


The first indication of trouble:

Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base1): DB_ENV->log_flush: LSN of 1/8730339 past current end-of-log of
1/188113
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base1): Database environment corrupt; the wrong log files may have been
removed or incompatible database files imported from another environment
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base1): entryCSN.bdb: unable to flush page: 0
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base1): txn_checkpoint: failed to flush the buffer cache Invalid argument
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base2): DB_ENV->log_flush: LSN of 54/1636114 past current end-of-log of
4/2981780
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base2): Database environment corrupt; the wrong log files may have been
removed or incompatible database files imported from another environment
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base2): entryUUID.bdb: unable to flush page: 0
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base2): txn_checkpoint: failed to flush the buffer cache Invalid argument
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base3): DB_ENV->log_flush: LSN of 1/600564 past current end-of-log of 1/662
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base3): Database environment corrupt; the wrong log files may have been
removed or incompatible database files imported from another environment
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base3): cn.bdb: unable to flush page: 0
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base3): txn_checkpoint: failed to flush the buffer cache Invalid argument
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base4): DB_ENV->log_flush: LSN of 3/2765493 past current end-of-log of
1/539
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base4): Database environment corrupt; the wrong log files may have been
removed or incompatible database files imported from another environment
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base4): uid.bdb: unable to flush page: 0
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base4): txn_checkpoint: failed to flush the buffer cache Invalid argument
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug]
bdb(base1): DB_ENV->log_flush: LSN of 1/8730401 past current end-of-log of
1/188113
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug]
bdb(base1): Database environment corrupt; the wrong log files may have been
removed or incompatible database files imported from another environment
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug]
bdb(base1): entryCSN.bdb: unable to flush page: 0
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug]
bdb(base1): txn_checkpoint: failed to flush the buffer cache Invalid argument
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug]
bdb(base2): DB_ENV->log_flush: LSN of 54/1634334 past current end-of-log of
4/1649467
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug]
bdb(base2): Database environment corrupt; the wrong log files may have been
removed or incompatible database files imported from another environment
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug]
bdb(base2): entryUUID.bdb: unable to flush page: 0
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug]
bdb(base2): txn_checkpoint: failed to flush the buffer cache Invalid argument
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug]
bdb(base3): DB_ENV->log_flush: LSN of 1/600564 past current end-of-log of 1/538
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug]
bdb(base3): Database environment corrupt; the wrong log files may have been
removed or incompatible database files imported from another environment
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug]
bdb(base3): cn.bdb: unable to flush page: 0
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug]
bdb(base3): txn_checkpoint: failed to flush the buffer cache Invalid argument
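
For anyone triaging similar output, here is a rough sketch (not part of slapd or
BDB) that pulls the log_flush complaints out of wrapped syslog text and shows how
far the requested LSN lies past the end of the log. The names LsnGap and
find_lsn_gaps are made up for illustration, and the regex assumes the Solaris
syslog layout shown above. An LSN is treated as a (log file number, offset) pair
and compared in that order.

```python
import re
from typing import List, NamedTuple, Tuple


class LsnGap(NamedTuple):
    """One 'LSN ... past current end-of-log' complaint (hypothetical helper type)."""
    host: str
    database: str
    requested: Tuple[int, int]   # (log file number, offset) BDB tried to flush to
    end_of_log: Tuple[int, int]  # (log file number, offset) where the log really ends


# Assumes the syslog layout seen in this report; other platforms may differ.
PATTERN = re.compile(
    r"(\S+) slapd\[\d+\]: \[ID \d+ \S+\] "
    r"bdb\((\w+)\): DB_ENV->log_flush: LSN of (\d+)/(\d+) "
    r"past current end-of-log of (\d+)/(\d+)"
)


def find_lsn_gaps(log_text: str) -> List[LsnGap]:
    # Mail clients wrap long syslog lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    gaps = []
    for m in PATTERN.finditer(flat):
        host, db, req_file, req_off, end_file, end_off = m.groups()
        gaps.append(
            LsnGap(
                host,
                db,
                (int(req_file), int(req_off)),
                (int(end_file), int(end_off)),
            )
        )
    return gaps


# Two entries copied from the excerpt above, including the mail line wrapping.
SAMPLE = """
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base1): DB_ENV->log_flush: LSN of 1/8730339 past current end-of-log of
1/188113
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug]
bdb(base2): DB_ENV->log_flush: LSN of 54/1636114 past current end-of-log of
4/2981780
"""

gaps = find_lsn_gaps(SAMPLE)
for gap in gaps:
    # Tuple comparison orders by log file number first, then by offset.
    status = "past end-of-log" if gap.requested > gap.end_of_log else "ok"
    print(gap.host, gap.database, gap.requested, gap.end_of_log, status)
```

Run over the full excerpt, this makes it easier to see that base2 references log
file 54 while the environment's log ends in file 4, whereas base1 and base3 point
beyond the end of log file 1 itself.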