OpenLDAP
Viewing Incoming/6920

From: mslby@deshaw.com
Subject: OpenLDAP 2.4.25 / Berkeley DB 4.8.30 Solaris 10 x86 slapd hangs
0 replies:
4 followups: 1 2 3 4


Date: Thu, 28 Apr 2011 18:05:43 +0000
From: mslby@deshaw.com
To: openldap-its@OpenLDAP.org
Subject: OpenLDAP 2.4.25 / Berkeley DB 4.8.30 Solaris 10 x86 slapd hangs
Full_Name: Mark Selby
Version: 2.4.25
OS: Solaris 10 x86
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (149.77.104.214)


My company uses OpenLDAP 2.4.25 with Berkeley DB 4.8.30 compiled on Solaris
10 x86 using Sun Studio. OpenLDAP is used as the backend for generic naming
services (passwd, group, netgroup etc.) as well as holding mail routing and
some custom data. We have master and slave servers and are using syncrepl
refresh and persist.

Lately we have been experiencing hangs with slapd and I cannot figure out
what the cause is. Things will be humming along and then slapd will simply
stop accepting connections and answering any requests for reads/writes. We
have set the loglevel to 256 and there is nothing in the logs that
indicates what the issue is. At the time that the process goes catatonic, all
syslog output from slapd stops.

I have not found a way to reproduce this on demand.

Today we have had three hangs and the truss output is exactly the same for
all processes. Once the slapd process gets into this state, a simple kill does
not work. The syslog always says that slapd is waiting for tasks to
complete, but this never happens. I need to kill -9 the pid.

I am going to include all of the debug info that I have collected and
hopefully someone will have some idea what is going on. I also have a gcore
of the process if anyone wants any info from it.

All and any help is greatly appreciated.

###########
# version #
###########
@(#) $OpenLDAP: slapd 2.4.25 (Mar 28 2011 16:47:13) $
        
Included static backends:
    config
    ldif
    monitor
    bdb
    hdb
    ldap
    relay


#############
# DB_CONFIG # 
#############
set_cachesize 0 1073741824 1

###########################
# db parts of slapd.conf  #
###########################
cachesize       100000
checkpoint      512 720
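For readers less familiar with these settings: in DB_CONFIG, set_cachesize takes a gigabyte count, a byte count and a number of cache regions, so the line above asks Berkeley DB for a single 1 GiB region. That is separate from the slapd.conf cachesize directive, which counts entries in the bdb/hdb entry cache rather than bytes, and checkpoint 512 720 requests a log checkpoint after 512 KB of log data or 720 minutes, whichever comes first. A minimal sketch of the BDB cache arithmetic follows; the helper names are hypothetical (not part of BDB or OpenLDAP), and it computes only the nominal size, ignoring any adjustment BDB may apply internally.

```c
#include <assert.h>
#include <stdint.h>

/* Nominal cache size for "set_cachesize <gbytes> <bytes> <ncache>":
 * gbytes GiB plus bytes. Hypothetical helper; ignores any internal
 * sizing adjustments Berkeley DB itself may make. */
static uint64_t bdb_cache_total(uint32_t gbytes, uint32_t bytes)
{
    return (uint64_t)gbytes * (UINT64_C(1) << 30) + bytes;
}

/* Approximate size of each region when the total is split across
 * ncache regions (integer division; assumed even split). */
static uint64_t bdb_cache_per_region(uint32_t gbytes, uint32_t bytes,
                                     int ncache)
{
    assert(ncache > 0);
    return bdb_cache_total(gbytes, bytes) / (uint64_t)ncache;
}
```

For example, bdb_cache_total(0, 1073741824) and bdb_cache_total(1, 0) describe the same 1 GiB cache, which with ncache = 1 is a single region.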

################
# truss output #
################
12612/17:       lwp_cond_wait(0xFFFFFD7FFC8B2C38, 0xFFFFFD7FFC8B2C20, 0x00000000, 1) (sleeping...)
12612/10:       lwp_cond_wait(0xFFFFFD7FFC8B41C8, 0xFFFFFD7FFC8B41B0, 0x00000000, 1) (sleeping...)
12612/2:        pollsys(0xFFFFFD7FBB3FE510, 59, 0xFFFFFD7FBB3FF290, 0x00000000) (sleeping...)
12612/15:       lwp_cond_wait(0xFFFFFD7FFC8ABF78, 0xFFFFFD7FFC8ABF60, 0x00000000, 1) (sleeping...)
12612/14:       lwp_cond_wait(0xFFFFFD7FFC895610, 0xFFFFFD7FFC8955F8, 0x00000000, 1) (sleeping...)
12612/16:       lwp_cond_wait(0xFFFFFD7FFC86DA70, 0xFFFFFD7FFC86DA58, 0x00000000, 1) (sleeping...)
12612/18:       lwp_cond_wait(0xFFFFFD7FFC888B18, 0xFFFFFD7FFC888B00, 0x00000000, 1) (sleeping...)
12612/1:        lwp_wait(2, 0xFFFFFD7FFFDFFBE4) (sleeping...)
12612/8:        lwp_cond_wait(0xFFFFFD7FFC8C7BD8, 0xFFFFFD7FFC8C7BC0, 0x00000000, 1) (sleeping...)
12612/6:        lwp_cond_wait(0xFFFFFD7FFC8AFF38, 0xFFFFFD7FFC8AFF20, 0x00000000, 1) (sleeping...)
12612/3:        lwp_cond_wait(0xFFFFFD7FFC8B0208, 0xFFFFFD7FFC8B01F0, 0x00000000, 1) (sleeping...)
12612/12:       lwp_cond_wait(0xFFFFFD7FFC8B0118, 0xFFFFFD7FFC8B0100, 0x00000000, 1) (sleeping...)
12612/7:        lwp_cond_wait(0xFFFFFD7FFC918A10, 0xFFFFFD7FFC9189F8, 0x00000000, 1) (sleeping...)
12612/11:       lwp_cond_wait(0xFFFFFD7FFC7055C0, 0xFFFFFD7FFC7055A8, 0x00000000, 1) (sleeping...)
12612/13:       lwp_cond_wait(0xFFFFFD7FFC901F40, 0xFFFFFD7FFC901F28, 0x00000000, 1) (sleeping...)
12612/5:        lwp_cond_wait(0xFFFFFD7FFC8BCC10, 0xFFFFFD7FFC8BCBF8, 0x00000000, 1) (sleeping...)
12612/9:        lwp_cond_wait(0xFFFFFD7FFC8684A8, 0xFFFFFD7FFC868490, 0x00000000, 1) (sleeping...)
12612/4:        lwp_cond_wait(0xFFFFFD7FFC86C7B0, 0xFFFFFD7FFC86C798, 0x00000000, 1) (sleeping...)
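Each lwp_cond_wait() line above carries the addresses that LWP is sleeping on. To compare snapshots across hangs, or against an analysis of the gcore, a small helper can pull those addresses out and count the distinct ones. This is a sketch under two stated assumptions: each LWP's entry has been rejoined onto a single line, and the first argument is the condition variable and the second the mutex, as in the _lwp_cond_wait(cv, mp) interface. parse_cond_wait() and distinct_condvars() are hypothetical names, not Solaris or OpenLDAP functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the first two arguments of a truss line such as
 *   12612/17:  lwp_cond_wait(0xFFFFFD7FFC8B2C38, 0xFFFFFD7FFC8B2C20, 0x00000000, 1) (sleeping...)
 * Assumes arg 1 is the condition variable and arg 2 the mutex.
 * Returns 1 on success, 0 if the line is not an lwp_cond_wait() call. */
static int parse_cond_wait(const char *line, unsigned long long *cv,
                           unsigned long long *mp)
{
    const char *p = strstr(line, "lwp_cond_wait(");
    if (p == NULL)
        return 0;
    p += strlen("lwp_cond_wait(");
    /* %llx accepts an optional 0x prefix. */
    return sscanf(p, "%llx, %llx", cv, mp) == 2;
}

/* Count distinct condition-variable addresses among n truss lines.
 * Linear search is fine for a few dozen LWPs; at most 256 are tracked. */
static size_t distinct_condvars(const char *const *lines, size_t n)
{
    unsigned long long seen[256];
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long long cv, mp;
        size_t j;

        if (!parse_cond_wait(lines[i], &cv, &mp))
            continue;
        for (j = 0; j < count && seen[j] != cv; j++)
            ;
        if (j == count && count < sizeof seen / sizeof seen[0])
            seen[count++] = cv;
    }
    return count;
}
```

Lines for other calls (pollsys(), lwp_wait()) are skipped, so the whole truss dump can be fed in unchanged once the wrapping is undone.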

################
# db_stat -C A #
################
Default locking region information:
323     Last allocated locker ID
0x7fffffff      Current maximum unused locker ID
9       Number of lock modes
1000    Maximum number of locks possible
1000    Maximum number of lockers possible
1000    Maximum number of lock objects possible
80      Number of lock object partitions
60      Number of current locks
795     Maximum number of locks at any one time
17      Maximum number of locks in any one bucket
38      Maximum number of locks stolen by for an empty partition
12      Maximum number of locks stolen for any one partition
333     Number of current lockers
333     Maximum number of lockers at any one time
40      Number of current lock objects
457     Maximum number of lock objects at any one time
4       Maximum number of lock objects in any one bucket
0       Maximum number of objects stolen by for an empty partition
0       Maximum number of objects stolen for any one partition
20686M  Total number of locks requested (20686026845)
20686M  Total number of locks released (20686020297)
0       Total number of locks upgraded
34      Total number of locks downgraded
6904    Lock requests not available due to conflicts, for which we waited
6439    Lock requests not availa

Message of length 130352 truncated

Followup 1

Date: Sat, 30 Apr 2011 18:59:43 -0700
From: Quanah Gibson-Mount <quanah@zimbra.com>
To: openldap-its@openldap.org, mselby@deshaw.com
Subject: Re: (ITS#6920) OpenLDAP 2.4.25 / Berkeley DB 4.8.30 Solaris 10 x86
 slapd hangs
--On Thursday, April 28, 2011 6:05 PM +0000 mslby@deshaw.com wrote:

> Full_Name: Mark Selby
> Version: 2.4.25
> OS: Solaris 10 x86
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (149.77.104.214)
>
>
> My company uses OpenLDAP 2.4.25 with Berkeley DB 4.8.30 compiled on
> Solaris 10 x86 using Sun Studio. OpenLDAP is used as the backend for
> generic naming services (passwd, group, netgroup etc.) as well as holding
> mail routing and some custom data. We have master and slave servers and
> are using syncrepl refresh and persist.

Correcting the from address (mselby@deshaw.com is correct).

I will note that, IIRC, there have been numerous issues reported with
using Solaris 10 and OpenLDAP, due to a kernel bug in Solaris.

--Quanah


--

Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra ::  the leader in open source messaging and collaboration



Followup 2

Date: Sat, 30 Apr 2011 21:05:48 -0700
From: Howard Chu <hyc@symas.com>
To: mslby@deshaw.com
CC: openldap-its@openldap.org
Subject: Re: (ITS#6920) OpenLDAP 2.4.25 / Berkeley DB 4.8.30 Solaris 10 x86
 slapd hangs
mslby@deshaw.com wrote:
> Full_Name: Mark Selby
> Version: 2.4.25
> OS: Solaris 10 x86
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (149.77.104.214)
>
>
> My company uses OpenLDAP 2.4.25 with Berkeley DB 4.8.30 compiled on Solaris
> 10 x86 using Sun Studio. OpenLDAP is used as the backend for generic naming
> services (passwd, group, netgroup etc.) as well as holding mail routing and
> some custom data. We have master and slave servers and are using syncrepl
> refresh and persist.
>
> Lately we have been experiencing hangs with slapd and I cannot figure out
> what the cause is. Things will be humming along and then slapd will simply
> stop accepting connections and answering any requests for reads/writes. We
> have set the loglevel to 256 and there is nothing in the logs that
> indicates what the issue is. At the time that the process goes catatonic,
> all syslog output from slapd stops.
>
> I have not found a way to reproduce this on demand.
>
> Today we have had three hangs and the truss output is exactly the same for
> all processes. Once the slapd process gets into this state, a simple kill does
> not work. The syslog always says that slapd is waiting for tasks to
> complete, but this never happens. I need to kill -9 the pid.
>
> I am going to include all of the debug info that I have collected and
> hopefully someone will have some idea what is going on. I also have a gcore
> of the process if anyone wants any info from it.
>
> All and any help is greatly appreciated.

This appears to be a dup of ITS#6833. Your OS is broken. Closing this ITS.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/



Followup 3

From: "Selby, Mark" <Mark.Selby@deshaw.com>
To: "'openldap-its@OpenLDAP.org'" <openldap-its@OpenLDAP.org>
CC: "Selby, Mark" <Mark.Selby@deshaw.com>
Date: Mon, 2 May 2011 14:16:05 -0400
Subject: Re: (ITS#6920) OpenLDAP 2.4.25 / Berkeley DB 4.8.30 Solaris 10 x86
 slapd hangs
"FWIW, we don't think this is the same problem. All the threads except the
main one and the one blocking on select() are blocked on pthread_cond_wait(),
and, according to our analysis of the core file, while they all are using
different mutexes, they are all waiting on the same condition variable."

Is there anything that we can do to have you look at the issue more closely?
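For reference on the invariant this analysis is checking: the conventional POSIX pattern binds each condition variable to exactly one mutex and re-checks a predicate in a loop, and POSIX states that using more than one mutex for concurrent pthread_cond_wait() operations on the same condition variable is undefined. A minimal sketch of the correct pairing follows; struct work_queue, worker() and run_and_shutdown() are hypothetical and are not OpenLDAP's thread pool, and the sketch does not reproduce the hang.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* One condition variable, always waited on with the same mutex. */
struct work_queue {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            shutdown;  /* predicate, guarded by mutex */
    int             exited;    /* workers that saw shutdown, guarded by mutex */
};

static void *worker(void *arg)
{
    struct work_queue *q = arg;

    pthread_mutex_lock(&q->mutex);
    /* Re-check the predicate in a loop: wakeups can be spurious. */
    while (!q->shutdown)
        pthread_cond_wait(&q->cond, &q->mutex);
    q->exited++;
    pthread_mutex_unlock(&q->mutex);
    return NULL;
}

/* Start up to nthreads waiters, wake them all for shutdown, join them,
 * and return how many workers observed the shutdown. */
static int run_and_shutdown(int nthreads)
{
    enum { MAX_THREADS = 64 };
    struct work_queue q;
    pthread_t tids[MAX_THREADS];
    int created = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_mutex_init(&q.mutex, NULL);
    pthread_cond_init(&q.cond, NULL);
    q.shutdown = false;
    q.exited = 0;

    for (; created < nthreads; created++)
        if (pthread_create(&tids[created], NULL, worker, &q) != 0)
            break;

    /* Change the predicate and broadcast while holding the same mutex. */
    pthread_mutex_lock(&q.mutex);
    q.shutdown = true;
    pthread_cond_broadcast(&q.cond);
    pthread_mutex_unlock(&q.mutex);

    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);

    pthread_cond_destroy(&q.cond);
    pthread_mutex_destroy(&q.mutex);
    return q.exited;
}
```

Because the predicate is set under the same mutex the waiters use, a worker that starts after the broadcast still sees shutdown and never blocks.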





Followup 4

Date: Tue, 24 May 2011 22:06:54 +0900
From: SATOH Fumiyasu <fumiyas@osstech.jp>
To: mslby@deshaw.com
Cc: openldap-its@openldap.org
Subject: Re: (ITS#6920) OpenLDAP 2.4.25 / Berkeley DB 4.8.30 Solaris 10 x86 slapd hangs
At Thu, 28 Apr 2011 18:05:43 GMT,
mslby@deshaw.com wrote:
> My company uses OpenLDAP 2.4.25 with Berkeley DB 4.8.30 compiled on Solaris
> 10 x86 using Sun Studio. OpenLDAP is used as the backend for generic naming
> services (passwd, group, netgroup etc.) as well as holding mail routing and
> some custom data. We have master and slave servers and are using syncrepl
> refresh and persist.

Are you using Solaris nss_ldap and ldap_cachemgr(1M) on your
Solaris 10 with OpenLDAP slapd? If so, Solaris libldap.so breaks
your slapd process in the following scenario:

  (1) slapd calls some name service functions, e.g. getpwnam(3C).
  (2) Solaris nss_ldap (/usr/lib/nss_ldap.so.1) is loaded.
  (3) Solaris libldap.so (/usr/lib/libldap.so.5) is loaded.
  (4) Solaris libldap.so overrides ldap_*() and ber_*() functions
      in slapd (or OpenLDAP libldap_r.so and liblber.so).
  (5) OpenLDAP calls ldap_*() and ber_*() functions that
      belong to Solaris libldap.so, not OpenLDAP's own.
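One way to check whether step (4) has happened is to ask the dynamic linker which loaded object supplies a given ldap_*() or ber_*() symbol. From the shell, pldd(1) on the running slapd shows whether /usr/lib/libldap.so.5 is mapped at all. From inside a process, dlsym() with RTLD_DEFAULT plus dladdr() reports where the default global lookup of a symbol resolves. A minimal sketch follows; symbol_origin() is a hypothetical helper, and it only reports bindings as seen by the process that calls it, so it would have to run inside slapd (or in a test program linked and loaded the same way).

```c
#define _GNU_SOURCE   /* needed on glibc for RTLD_DEFAULT and dladdr() */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical diagnostic: return the path of the loaded object that
 * provides the default (global-scope) definition of `name`, or NULL if
 * the symbol is not currently loaded or cannot be attributed. */
static const char *symbol_origin(const char *name)
{
    Dl_info info;
    void *addr;

    assert(name != NULL);
    addr = dlsym(RTLD_DEFAULT, name);
    if (addr == NULL)
        return NULL;
    if (dladdr(addr, &info) == 0)
        return NULL;
    return info.dli_fname;
}
```

For example, if symbol_origin("ber_alloc_t") inside slapd returned /usr/lib/libldap.so.5 rather than OpenLDAP's liblber, that would be consistent with the interposition described above.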

See also:

  http://www.openldap.org/lists/openldap-technical/200902/msg00000.html
  https://bugzilla.mozilla.org/show_bug.cgi?id=292127
  http://bugzilla.padl.com/show_bug.cgi?id=203

-- 
-- Name: SATOH Fumiyasu (fumiyas @ osstech co jp)
-- Business Home: http://www.OSSTech.co.jp/
-- Personal Home: http://www.SFO.jp/blog/




The OpenLDAP Issue Tracking System uses a hacked version of JitterBug

______________
© Copyright 2013, OpenLDAP Foundation, info@OpenLDAP.org