[Date Prev][Date Next] [Chronological] [Thread] [Top]

RE: slapd hangs at 100% cpu in sched_yield (ITS#2030)



I've added some debug msgs to see what locker IDs we're getting and freeing.
We dutifully free all the locker IDs we allocate, there are no mismatches.
The locker IDs go onto a free list, but don't ever get re-used. I think this
is a simple consequence of the fact that locker IDs and transaction IDs must
be kept unique over the life of the database. We may need to re-work our
approach, to allocate one per thread and keep it for the life of the process
so we don't run out so quickly.

There are a couple of known bugs in BDB 4.0's lock management that are
relevant here. You may have to wait for Sleepycat to release BDB 4.1 to get
complete relief from these problems.

  -- Howard Chu
  Chief Architect, Symas Corp.       Director, Highland Sun
  http://www.symas.com               http://highlandsun.com/hyc
  Symas: Premier OpenSource Development and Support

> -----Original Message-----
> From: Steven Wilton [mailto:steven.wilton@team.eftel.com]
> Sent: Thursday, August 22, 2002 11:04 PM
> To: Howard Chu
> Subject: Re: slapd hangs at 100% cpu in sched_yield (ITS#2030)
>
>
>
> I would have thought that the number of lock requests/releases is related
> to database operations.  The number of locks and the number of lockers are
> related to objects used to kep track of locks in the software using the
> bdb library (ie slapd) (I am just guessing here though.
>
> The number of locks in use increses constantly for each request,
> regardless of what the filter is, except for a search on a non-indexed
> field. The number of lockers increses only when objectclass is included in
> the filter.
>
> Steven
> On Fri, 23 Aug 2002 13:28:40 Howard Chu wrote:
> > Still not sure how to interpret this info. The number of lock requests
> > and
> > lock releases
> > are equal, so we're not leaking anything there. So I think perhaps BDB 4
> > has
> > a bug if the
> > current number of locks is still non-zero.
> >
> >   -- Howard Chu
> >   Chief Architect, Symas Corp.       Director, Highland Sun
> >   http://www.symas.com               http://highlandsun.com/hyc
> >   Symas: Premier OpenSource Development and Support
> >
> > > -----Original Message-----
> > > From: Steven Wilton [mailto:steven.wilton@team.eftel.com]
> > > Sent: Thursday, August 22, 2002 7:41 PM
> > > To: Howard Chu
> > > Subject: Re: slapd hangs at 100% cpu in sched_yield (ITS#2030)
> > >
> > >
> > > > No, this is not an unbounded problem. The maximum number of locks
> > needed
> > > > is
> > > > tied to the maximum number of slapd threads. It will also depend on
> > the
> > > > number of attribute indices you have configured. Each index is stored
> > in
> > > > its
> > > > own database, and some number of locks are needed to navigate those
> > > > databases
> > > > as well. You can use db_stat to help arrive at the ideal number of
> > locks
> > > > for
> > > > your configuration. Eventually you will reach a high-water mark and
> > no
> > > > more
> > > > locks will be needed.
> > > >
> > >
> > > I have upgraded to openldap 2.1.4m, and I have looked at the actual
> > search
> > > queries being sent to the ldap server (using ethereal on the loopback
> > > interface).  It looks like different types of ldap queries will result
> > in
> > > a different number of bdb resources being lost in the process.  Look at
> > > the following sequence of commands, and the number of current locks,
> > and
> > > current lockers used at the end of each:
> > >
> > > sv1:~# db4.0_stat -c -h /var/lib/ldap/
> > >
> > > 101 Last allocated locker ID.
> > > 9       Number of lock modes.
> > > 1000    Maximum number of locks possible.
> > > 1000    Maximum number of lockers possible.
> > > 1000    Maximum number of objects possible.
> > > 24      Current locks.
> > > 27      Maximum number of locks so far.
> > > 61      Current number of lockers.
> > > 62      Maximum number  lockers so far.
> > > 0       Current number lock objects.
> > > 6       Maximum number of lock objects so far.
> > > 306     Number of lock requests.
> > > 306     Number of lock releases.
> > > 0       Number of lock requests that would have waited.
> > > 0       Number of lock conflicts.
> > > 0       Number of deadlocks.
> > > 0       Number of transaction timeouts.
> > > 0       Number of lock timeouts.
> > > 352KB   Lock region size (360448 bytes).
> > > 0       The number of region locks granted after waiting.
> > > 945     The number of region locks granted without waiting.
> > >
> > > sv1:~# ldapsearch -b o=EFTEL
> > > '(&(objectclass=posixAccount)(uidNumber=20678))'
> > >
> > > # extended LDIF
> > > #
> > > # LDAPv3
> > > # filter: (&(objectclass=posixAccount)(uidNumber=20678))
> > > # requesting: ALL
> > > #
> > >
> > > # prolfe, People, q-net, net, au, EFTEL
> > > dn: uid=prolfe,ou=People,dc=q-net,dc=net,dc=au,o=EFTEL
> > > objectClass: posixAccount
> > > objectClass: shadowAccount
> > > objectClass: top
> > > objectClass: eftelradiusSimpleProfile
> > > cn: prolfe
> > > uid: prolfe
> > > homeDirectory: /home/au/net/q-net/p/prolfe
> > > shadowWarning: 7
> > > shadowLastChange: 10000
> > > gecos: Name Withheld
> > > mail: prolfe@q-net.net.au
> > > uidNumber: 20678
> > > gidNumber: 200
> > > loginShell: /bin/nologin
> > >
> > > # search result
> > > search: 2
> > > result: 0 Success
> > >
> > > # numResponses: 2
> > > # numEntries: 1
> > >
> > > sv1:~# db4.0_stat -c -h /var/lib/ldap/
> > >
> > > 107 Last allocated locker ID.
> > > 9       Number of lock modes.
> > > 1000    Maximum number of locks possible.
> > > 1000    Maximum number of lockers possible.
> > > 1000    Maximum number of objects possible.
> > > 25      Current locks.
> > > 28      Maximum number of locks so far.
> > > 66      Current number of lockers.
> > > 67      Maximum number  lockers so far.
> > > 0       Current number lock objects.
> > > 6       Maximum number of lock objects so far.
> > > 318     Number of lock requests.
> > > 318     Number of lock releases.
> > > 0       Number of lock requests that would have waited.
> > > 0       Number of lock conflicts.
> > > 0       Number of deadlocks.
> > > 0       Number of transaction timeouts.
> > > 0       Number of lock timeouts.
> > > 352KB   Lock region size (360448 bytes).
> > > 0       The number of region locks granted after waiting.
> > > 979     The number of region locks granted without waiting.
> > >
> > > sv1:~# ldapsearch -b o=EFTEL uid=swilton
> > >
> > > # extended LDIF
> > > #
> > > # LDAPv3
> > > # filter: uid=swilton
> > > # requesting: ALL
> > > #
> > >
> > > # swilton, People, vision, net, au, EFTEL
> > > dn: uid=swilton,ou=People,dc=vision,dc=net,dc=au,o=EFTEL
> > > objectClass: posixAccount
> > > objectClass: top
> > > objectClass: shadowAccount
> > > objectClass: eftelradiusSimpleProfile
> > > uid: swilton
> > > cn: swilton
> > > mail: swilton@vision.net.au
> > > shadowLastChange: 11529
> > > shadowWarning: 7
> > > uidNumber: 43056
> > > gidNumber: 118
> > > homeDirectory: /home/au/net/vision/s/swilton
> > > gecos: Internal User
> > > loginShell: /bin/nologin
> > >
> > > # swilton, People, q-net, net, au, EFTEL
> > > dn: uid=swilton,ou=People,dc=q-net,dc=net,dc=au,o=EFTEL
> > > objectClass: posixAccount
> > > objectClass: shadowAccount
> > > objectClass: top
> > > objectClass: eftelradiusSimpleProfile
> > > cn: swilton
> > > uid: swilton
> > > homeDirectory: /home/au/net/q-net/s/swilton
> > > shadowWarning: 7
> > > shadowLastChange: 10000
> > > gecos: Steven Wilton
> > > mail: swilton@q-net.net.au
> > > uidNumber: 16273
> > > gidNumber: 200
> > > loginShell: /bin/nologin
> > >
> > > # swilton, People, eftel, net, au, EFTEL
> > > dn: uid=swilton,ou=People,dc=eftel,dc=net,dc=au,o=EFTEL
> > > objectClass: posixAccount
> > > objectClass: shadowAccount
> > > objectClass: top
> > > objectClass: eftelradiusSimpleProfile
> > > cn: swilton
> > > uid: swilton
> > > homeDirectory: /home/au/net/eftel/s/swilton
> > > shadowWarning: 7
> > > shadowLastChange: 10000
> > > gecos: Steven Wilton
> > > mail: swilton@eftel.net.au
> > > uidNumber: 1908
> > > gidNumber: 100
> > > loginShell: /bin/nologin
> > > eftelradiusCheckItem: Simultaneous-Use=0
> > >
> > > # search result
> > > search: 2
> > > result: 0 Success
> > >
> > > # numResponses: 4
> > > # numEntries: 3
> > >
> > > sv1:~# db4.0_stat -c -h /var/lib/ldap/
> > >
> > > 108 Last allocated locker ID.
> > > 9       Number of lock modes.
> > > 1000    Maximum number of locks possible.
> > > 1000    Maximum number of lockers possible.
> > > 1000    Maximum number of objects possible.
> > > 26      Current locks.
> > > 29      Maximum number of locks so far.
> > > 66      Current number of lockers.
> > > 67      Maximum number  lockers so far.
> > > 0       Current number lock objects.
> > > 6       Maximum number of lock objects so far.
> > > 327     Number of lock requests.
> > > 327     Number of lock releases.
> > > 0       Number of lock requests that would have waited.
> > > 0       Number of lock conflicts.
> > > 0       Number of deadlocks.
> > > 0       Number of transaction timeouts.
> > > 0       Number of lock timeouts.
> > > 352KB   Lock region size (360448 bytes).
> > > 0       The number of region locks granted after waiting.
> > > 1002    The number of region locks granted without waiting.
> > >
> > >
> > >
> > >  From what I can tell on a variety of other queries, using a filter of
> > > uid=x or uidnumber=x result in an increase of 1x lock being used but
> > not
> > > freed.  Using the (&(objectclass=x)(uidnumber=y)) filer, 5x lockers and
> > 1x
> > > lock are used but not freed.  Interestingly enough, searching on an
> > > unindexed field uses no extra resources (but takes over 1 minute to
> > > complete) as follows:
> > >
> > > sv1:~# db4.0_stat -c -h /var/lib/ldap/
> > > 109 Last allocated locker ID.
> > > 9       Number of lock modes.
> > > 1000    Maximum number of locks possible.
> > > 1000    Maximum number of lockers possible.
> > > 1000    Maximum number of objects possible.
> > > 27      Current locks.
> > > 30      Maximum number of locks so far.
> > > 66      Current number of lockers.
> > > 67      Maximum number  lockers so far.
> > > 0       Current number lock objects.
> > > 6       Maximum number of lock objects so far.
> > > 334     Number of lock requests.
> > > 334     Number of lock releases.
> > > 0       Number of lock requests that would have waited.
> > > 0       Number of lock conflicts.
> > > 0       Number of deadlocks.
> > > 0       Number of transaction timeouts.
> > > 0       Number of lock timeouts.
> > > 352KB   Lock region size (360448 bytes).
> > > 0       The number of region locks granted after waiting.
> > > 1021    The number of region locks granted without waiting.
> > >
> > > sv1:~# ldapsearch -b o=EFTEL mail=prolfe@q-net.net.au
> > >
> > > # extended LDIF
> > > #
> > > # LDAPv3
> > > # filter: mail=prolfe@q-net.net.au
> > > # requesting: ALL
> > > #
> > >
> > > # prolfe, People, q-net, net, au, EFTEL
> > > dn: uid=prolfe,ou=People,dc=q-net,dc=net,dc=au,o=EFTEL
> > > objectClass: posixAccount
> > > objectClass: shadowAccount
> > > objectClass: top
> > > objectClass: eftelradiusSimpleProfile
> > > cn: prolfe
> > > uid: prolfe
> > > homeDirectory: /home/au/net/q-net/p/prolfe
> > > shadowWarning: 7
> > > shadowLastChange: 10000
> > > gecos: Name Withheld
> > > mail: prolfe@q-net.net.au
> > > uidNumber: 20678
> > > gidNumber: 200
> > > loginShell: /bin/nologin
> > >
> > > # search result
> > > search: 2
> > > result: 0 Success
> > >
> > > # numResponses: 2
> > > # numEntries: 1
> > >
> > > sv1:~# db4.0_stat -c -h /var/lib/ldap/
> > >
> > > 110 Last allocated locker ID.
> > > 9       Number of lock modes.
> > > 1000    Maximum number of locks possible.
> > > 1000    Maximum number of lockers possible.
> > > 1000    Maximum number of objects possible.
> > > 27      Current locks.
> > > 30      Maximum number of locks so far.
> > > 66      Current number of lockers.
> > > 67      Maximum number  lockers so far.
> > > 0       Current number lock objects.
> > > 6       Maximum number of lock objects so far.
> > > 68181   Number of lock requests.
> > > 68181   Number of lock releases.
> > > 0       Number of lock requests that would have waited.
> > > 0       Number of lock conflicts.
> > > 0       Number of deadlocks.
> > > 0       Number of transaction timeouts.
> > > 0       Number of lock timeouts.
> > > 352KB   Lock region size (360448 bytes).
> > > 0       The number of region locks granted after waiting.
> > > 114101  The number of region locks granted without waiting.
> > >
> > >
> > >
> > > Indexes are as follows:
> > >
> > > index   objectClass             pres,eq
> > > index   cn,sn,uid               eq
> > > index   uidNumber,gidNumber,memberUid   eq
> > >
> > >
> > > during the above test queries, no other queries were sent to the ldap
> > > server from any other program.
> > >
> > > Should I open a new its with the above info?
> > >
> > > thanks
> > >
> > > Steven
> > >
> > > ps.once we run out of lock_id's, the new LOCK_ID result checking code
> > > causes failed ldap lookups as I expected, but we do not go into an
> > endless
> > > loop (which is good).  If we fix the above resource leak, we should
> > never
> > > reach a point where we run out of lock_id's which is the result we are
> > > really after.
> > >
> > >
> >
>
>