
Re: Lock is no longer valid / deferring operation



Hi there,

Firstly, many thanks for the replies...

Hi Toby.

For largely historical reasons we run slapd servers on most clients
(this will probably change in the future - I'm just giving this
information as background).

Why?

Why will this change or why did we do it in the first place? I wasn't party to these decisions at the time, so I can't really comment on the reasons for them. I could speculate wildly, but I'd prefer not to.

 We're seeing problems when some of these
machines are busy, particularly, it seems, with memory intensive
activity, although it's hard to substantiate as I generally only see
the machines after they've broken.  It's annoying as I can't reproduce
these problems.

It's going to be hard to pinpoint then ;-) How much memory/CPU etc. do these clients have and what other services do they provide?

They're typically desktop or lab machines for academics, students, etc. Hardware-wise they're Dell desktop boxes a few years old - a 2.4GHz processor with 512MB of memory is typical. Something I should have mentioned is that they're running Fedora Core 5, with a few running FC6.

As for what services they provide: general desktop services, but they
could also be running long-running or intensive jobs.  Many of the machines
are also in a Condor pool and this does seem to cause more problems.

Do you know if slapd gets unhappy if other processes use up lots of
memory?  This is my current line of investigation - I'll try to make
it unhappy by using increasing amounts of memory.
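
A rough sketch of that kind of test, for what it's worth. It is not something that has been run against these machines; the function name, step size, ceiling and pause are all made up for illustration and would need tuning for a 512MB box. It simply grabs memory in steps and holds on to it, so slapd can be watched while the pressure builds.

```python
import time

MIB = 1024 * 1024
PAGE = 4096  # assumed page size, only used to touch each page once


def grow_memory(step_mib=64, max_mib=384, pause_s=5.0, sleep=time.sleep):
    """Allocate memory in steps, keep every block, and return the blocks.

    step_mib, max_mib and pause_s are illustrative defaults, not
    recommendations.
    """
    held = []
    total = 0
    while total + step_mib <= max_mib:
        block = bytearray(step_mib * MIB)
        # Write one byte per page so the memory is really resident,
        # not just reserved.
        for offset in range(0, len(block), PAGE):
            block[offset] = 1
        held.append(block)
        total += step_mib
        print(f"holding {total} MiB")
        sleep(pause_s)  # leave time to check slapd between steps
    return held
```

Calling `grow_memory()` in one terminal while watching syslog in another is the idea; the memory is released when the returned list goes away or the process exits.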

I suppose what I'm trying to determine is - is it the client activity
that's causing problems (i.e. a misbehaving client or similar) or is
it slapd itself getting unhappy for other reasons (possibly due to
resources being used by other programs)?  Or a combination of both?

We see quite a few problems with slapd getting into a state where it's
deferring operations, for whatever reason - I think I understand these
- these are when slapd basically says sorry, I'm too busy doing X, so
I'll defer Y until I have time.  Is this accurate?

Yes. What kind of clients are searching/binding to them? Local?

All local. As for what kind of clients - typical Linux desktop activity I suppose. Hard to be specific about this really, as it will change from host to host.

The second case I'm also seeing is bdb complaining about locks being
no longer valid, e.g.

slapd[3780]: bdb(dc=inf,dc=ed,dc=ac,dc=uk): DB_LOCK->lock_put: Lock is no longer valid

slapd seems to keep going for the time being until getting into a
state where it defers all binding operations and goes into some kind
of spin where it sits at 99% cpu and has to be killed with a -9.
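
Since the machines are usually only seen after they've broken, one way to get a timeline is to count both kinds of message per slapd process from the saved syslog. The sketch below is only that, a sketch: the match strings are the "Lock is no longer valid" text from the line above and the phrase "deferring operation" from the subject, and the exact wording slapd uses around them is an assumption, as are the function and key names.

```python
import re
from collections import Counter

# Substrings taken from the messages discussed in this thread; the rest
# of each log line is not assumed.
PATTERNS = {
    "lock_invalid": re.compile(r"Lock is no\s+longer valid"),
    "deferred": re.compile(r"deferring operation"),
}
PID_RE = re.compile(r"slapd\[(\d+)\]")


def count_events(lines):
    """Return a Counter keyed by (slapd pid, event name)."""
    counts = Counter()
    for line in lines:
        pid_match = PID_RE.search(line)
        if not pid_match:
            continue  # not a slapd line
        pid = int(pid_match.group(1))
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[(pid, name)] += 1
    return counts
```

`count_events` takes any iterable of lines, so an open file handle on whichever file syslog writes slapd's messages to can be passed in directly.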

Is everything local? Nothing mounted locally, like NFS for the directory data?

Machines will have both NFS and AFS for home directory data.

I suppose I have a couple of questions about the "Lock is no longer
valid" error....

- What causes it?
- Is it something I can prevent by configuration changes (for
   instance, would increasing the numbers of locks, lockers and objects
   help?)

One for the dev team. I do know this is an error message from Berkeley DB by grepping the source.

Yes, I saw it in the source, but don't know it well enough to be sure of what's causing it.

We're running OpenLDAP 2.3.35 with ITS#4924 and ITS#4925 patches with
a bdb backend running Berkeley DB 4.2.52 with all 6 recommended patches.

I hope you mean 5, as there are only 5 listed on the Oracle site.

As Quanah said, there are 6.

The only DB_CONFIG settings we currently have are:

dbconfig      set_cachesize 0 67108864 1
dbconfig      set_lg_regionmax 262144
dbconfig      set_lg_bsize 2097152
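
For anyone reading the raw byte counts, here is a quick bit of arithmetic on the three values above. It assumes the usual Berkeley DB reading of set_cachesize as gigabytes, then bytes, then number of cache regions; the helper names are made up for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024


def cachesize_bytes(gbytes, nbytes, ncache):
    """Total cache size for 'set_cachesize <gbytes> <bytes> <ncache>'.

    ncache only says how many regions the cache is split into, so it
    does not change the total.
    """
    return gbytes * GIB + nbytes


def human(n):
    """Render a byte count in the largest unit that divides it exactly."""
    for unit, size in (("GiB", GIB), ("MiB", MIB), ("KiB", KIB)):
        if n >= size and n % size == 0:
            return f"{n // size} {unit}"
    return f"{n} bytes"


print("set_cachesize   ", human(cachesize_bytes(0, 67108864, 1)))  # 64 MiB
print("set_lg_regionmax", human(262144))                           # 256 KiB
print("set_lg_bsize    ", human(2097152))                          # 2 MiB
```

So the cache is 64 MiB in a single region, with a 256 KiB log region and a 2 MiB log buffer.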

I take it dbconfig is a keyword you've added for this example, as it's not valid.

Sorry, I should have been more specific - this is in slapd.conf - look in the man page for slapd-bdb - this is just a way of getting directives into DB_CONFIG.

Cheers
Toby