
Re: (ITS#5926) slapd proxying AD with back-meta locks up



mhardin@symas.com wrote:
> Full_Name: Matthew Hardin
> Version: 2.4.12
> OS: Red Hat Enterprise Linux 4 i686
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (74.38.114.185)
>
>
> Hi All,
>
> We are using a pair of OpenLDAP 2.4.12 servers with back-meta to proxy an Active
> Directory domain. The clients are all current versions of PADL's nss_ldap
> libraries.
>
> Every once in a while (sometimes twice a day, sometimes once every two weeks)
> one of the slapd servers will peg CPU use at 100% and stop answering requests.
> The only way to stop slapd is with a kill -9.
>
> There doesn't seem to be anything to explain the lockup or allow us to reproduce
> it. We are using redundant AD servers and they are not going offline. A third
> slapd server running as a test server using the same AD servers and configured
> identically but serving a much lighter nss_ldap load does not fail at all. We
> have ruled out hardware, OS, and connectivity as possible causes.
>
> We are unfortunately unable to attach gdb to the running processes, as these are
> production servers and need to be restarted immediately. Our smaller test system
> does not exhibit the same behavior, and there is nothing unusual in the
> server logs. We do have core files generated from kill -6 commands, and
> they are all eerily similar to the back-trace below in that they have one or
> more threads waiting for a search or a bind response from AD.
>
> I am also enclosing relevant portions of slapd.conf for these systems. Please
> let me know if any additional information would be useful.
>
> Thanks,
>
> -Matt
>
> -----
>
>
> (gdb) thr apply all bt

> Thread 1 (process 29769):
> #0  0x005fa410 in __kernel_vsyscall ()
> #1  0x004ddd10 in raise () from /lib/libc.so.6
> #2  0x004df621 in abort () from /lib/libc.so.6
> #3  0x004d715b in __assert_fail () from /lib/libc.so.6
> #4  0x0806eec8 in slap_listener (sl=0x9583108)
>      at /home/build/sol-2_4_12-1-nonopt/sol24/ldap24/servers/slapd/daemon.c:1803
> #5  0x0806f643 in slap_listener_thread (ctx=0x4e92220, ptr=0x9583108)
>      at /home/build/sol-2_4_12-1-nonopt/sol24/ldap24/servers/slapd/daemon.c:1997
> #6  0x00a10783 in ldap_int_thread_pool_wrapper (xpool=0x959a010)
>      at /home/build/sol-2_4_12-1-nonopt/sol24/ldap24/libraries/libldap_r/tpool.c:663
> #7  0x0038a45b in start_thread () from /lib/libpthread.so.0
> #8  0x00585c4e in clone () from /lib/libc.so.6
> (gdb)

It seems you sent the wrong backtrace; this one doesn't show any signs of
looping or anything else that would indicate heavy CPU usage. It shows a
failed assert in slap_listener(), which would kill the process, leading to
0% CPU usage rather than 100%. This assert was most likely fixed in 2.4.14.

> slapd.conf

> #######################################################################
> # bdb database definitions
> #######################################################################
> database        bdb
> suffix          "ou=nisdata"

> #######################################################################
> # Definitions for proxy and cache to AD
> #######################################################################
> database        meta
> suffix          "dc=my-customer,dc=com"

> # The link to AD:
> uri             ldaps://ldap-prd-dc01.my-customer.com/dc=ad,dc=my-customer,dc=com ldaps://ldap-prd-dc02.my-customer.com/

> # The link to the NIS data directory (yes, we could chain/glue, that's
> # for later)
> uri             ldapi://%2fvar%2fsymas%2frun%2fldapi/dc=nis,dc=my-customer,dc=com

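Pointing back-meta at its own slapd will inevitably exhaust the thread pool,
since each incoming operation then needs two threads: one for the client's
request and a second for the proxied request that loops back into the same
server. Under load, every pool thread ends up waiting on a looped-back
request that has no thread left to run it.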

This ITS will be closed.
-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/