
Re: delta-syncrepl missing changes



On 2/18/09 1:55 PM, Francis Swasey wrote:
On 2/18/09 12:07 PM, Quanah Gibson-Mount wrote:
--On Wednesday, February 18, 2009 9:55 AM -0500 Francis Swasey <Frank.Swasey@uvm.edu> wrote:

Is there anything that should be logged that would help identify the
failure? (I'm currently using a loglevel of "stats sync" on the master and
all the replicas.)

After some further digging, I see that the changes made this morning to
at least one of these entries are not present in the accesslog database.
No wonder the change didn't make it to the replicas: it didn't even make
it into the accesslog on the master (although the auditlog sees the change
and the dc=uvm,dc=edu database on the master has it).


Any suggestions on where to look in the accesslog overlay to see why
these modify operations are not being recorded?

Well, to start, I'd generally check db_stat -c to make sure you haven't run out of locks, lockers, or lock objects in the accesslog DB. I assume you have reasonable logging on the master, so you can see whether any errors were thrown when the MODs that weren't written out occurred, etc.

The limits for locks, lockers, and lock objects are all still at the default of 1000. The maximums reached are 54, 124, and 40, respectively. So I think I'm safe there.
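The "safe there" judgment is a ratio of peak usage to configured limit. A minimal sketch using the figures quoted above; the dictionary keys are illustrative labels rather than the exact wording db_stat prints, and the 80% warning threshold is an arbitrary assumption.

```python
# Configured limits and observed peaks, taken from the figures in this message.
LIMITS = {"locks": 1000, "lockers": 1000, "lock objects": 1000}
PEAKS = {"locks": 54, "lockers": 124, "lock objects": 40}


def headroom(limits, peaks, warn_fraction=0.8):
    """Map each resource to (fraction of limit used, whether it crosses warn_fraction)."""
    report = {}
    for name, limit in limits.items():
        used = peaks[name] / limit
        report[name] = (used, used >= warn_fraction)
    return report


for name, (used, warn) in headroom(LIMITS, PEAKS).items():
    print(f"{name}: {used:.1%} of limit{' (WARN)' if warn else ''}")
```

With these numbers the busiest resource, lockers, peaks at 12.4% of its limit, so lock exhaustion in the accesslog DB looks unlikely.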


As I said, I'm logging stats and sync. The modify that didn't make it into the accesslog happened at 1234942530 (2:35:30 EST this morning). The only "interesting" thing logged around that time was:

connection_input: conn=139425 deferring operation: too many executing

which was logged 11 times: at 2:34:58, 2:35:01, 2:35:04, 2:35:09, 2:35:11, 2:35:12, 2:35:16, 2:35:21, 2:35:27, 2:35:31, and 2:35:35.

conn=139425 was the ldapmodify command performing the 1321 changes (559 adds, 1 delete, and 761 modifications); it was connected from 2:34:57 until 2:35:38.
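Lining up the epoch timestamp with the connection window and the deferral messages is simple arithmetic. A small sketch using only the times quoted above, assuming EST is UTC-5 (daylight saving time was not in effect on 2009-02-18); the helper name `est` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# EST is a fixed UTC-5 offset; no DST in February.
EST = timezone(timedelta(hours=-5), "EST")


def est(hms: str) -> datetime:
    """Build an EST datetime on 2009-02-18 from an 'H:MM:SS' string."""
    h, m, s = (int(x) for x in hms.split(":"))
    return datetime(2009, 2, 18, h, m, s, tzinfo=EST)


missing = datetime.fromtimestamp(1234942530, tz=EST)  # the unrecorded modify
conn_start, conn_end = est("2:34:57"), est("2:35:38")  # conn=139425 lifetime
deferrals = [est(t) for t in ("2:34:58", "2:35:01", "2:35:04", "2:35:09",
                              "2:35:11", "2:35:12", "2:35:16", "2:35:21",
                              "2:35:27", "2:35:31", "2:35:35")]

in_window = conn_start <= missing <= conn_end
before = max(d for d in deferrals if d <= missing)  # nearest deferral at or before
after = min(d for d in deferrals if d > missing)    # nearest deferral after

print(missing.isoformat())
print(in_window, before.time(), after.time())
print(len(deferrals), 559 + 1 + 761)
```

This prints `2009-02-18T02:35:30-05:00`, then `True 02:35:27 02:35:31`, then `11 1321`: the missing modify falls inside the connection and between two deferrals, and the operation counts add up. Temporal overlap alone does not show that the deferrals caused the loss.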

Assuming my loglevel is high enough to catch the problem, that looks like the culprit.


The "deferring operation" messages do not seem to be related. There were several of the same messages this morning, but my audit of all the replicas this morning shows that none of them are missing any information.


Any suggestions about where to look for more information or a different loglevel to use on the master to catch this?

--
Frank Swasey                    | http://www.uvm.edu/~fcs
Sr Systems Administrator        | Always remember: You are UNIQUE,
University of Vermont           |    just like everyone else.
  "I am not young enough to know everything." - Oscar Wilde (1854-1900)