Re: deferring operation: awaiting write





--On Tuesday, August 24, 2004 1:17 PM -0400 John Borwick <borwicjh@wfu.edu> wrote:

Quanah Gibson-Mount wrote:


--On Tuesday, August 24, 2004 8:53 AM -0400 John Borwick
<borwicjh@wfu.edu> wrote:

Hello.  Has anyone seen a syslog message like "deferring operation:
awaiting write", especially right before the server crashes?

I see this message on our *secondary*, which should send referrals to
people who try to write.  Is it reasonable to assume the only process
that could be writing is slurpd on our primary?

We're running 2.2.13 w/ bdb 4.2.52 (patches applied).  You can see our
spec files at http://www.wfu.edu/~borwicjh/spec/ .

Thanks!


I would be more interested to know what your DB_CONFIG file has in it.
And your slapd.conf on the replica. :)

True. :) OK, here's the DB_CONFIG and slapd.conf. I'm pretty sure the
stuff in the "include" files for slapd.conf doesn't affect how BDB
operates.

We didn't have any problems for a few weeks.  Does it sound like the
culprit is BDB, threading, ... something else?  Would 2.2.15 have any
patches that might help?

Looking at the slapd code, specifically in "connection.c":

   /* Don't process requests when the conn is in the middle of a
    * Bind, or if it's closing. Also, don't let any single conn
    * use up all the available threads, and don't execute if we're
    * currently blocked on output. And don't execute if there are
    * already pending ops, let them go first.  Abandon operations
    * get exceptions to some, but not all, cases.
    */
   if (tag != LDAP_REQ_ABANDON && conn->c_conn_state == SLAP_C_CLOSING) {
       defer = "closing";
   } else if (tag != LDAP_REQ_ABANDON && conn->c_writewaiter) {
       defer = "awaiting write";

It looks like something is waiting to write. You note that your slave
has been crashing. Have you been running db_recover after each crash,
before you restart slapd? I've had problems in the past where slapd
crashed and left write locks behind in the database. You have to run
db_recover to clear those out. You can also run:

db_stat -C A

in your database environment on the slave, and see if there are any WRITE locks being held.
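
A rough sketch of that check-and-recover sequence as shell helpers.
The path in DB_HOME is hypothetical (use the "directory" value from
your slapd.conf), and the helper names are made up for illustration.
The count assumes that db_stat prints the word WRITE on each line
describing a write lock, so treat it as a hint and read the full
output before acting on it.

```shell
#!/bin/sh
# Sketch only. DB_HOME is a hypothetical path; substitute the
# "directory" setting from slapd.conf on the replica.
DB_HOME="${DB_HOME:-/var/lib/ldap}"

# Read lock-table output on stdin and print how many lines mention
# WRITE. grep exits non-zero when nothing matches, so "|| true" keeps
# the helper from failing when the count is 0.
count_write_locks() {
    grep -c 'WRITE' || true
}

# Dump all lock information for the environment and count WRITE lines.
check_write_locks() {
    db_stat -h "$DB_HOME" -C A | count_write_locks
}

# Run recovery on the environment. Only do this while slapd is stopped.
recover_env() {
    db_recover -v -h "$DB_HOME"
}
```

After a crash, and with slapd still stopped, you would call
check_write_locks to see whether anything is left over, then
recover_env, and only then start slapd again.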

--Quanah

--
Quanah Gibson-Mount
Principal Software Developer
ITSS/Shared Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html