[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: slapd hangs - subtree insert failed: -30995



Hello,

yesterday, the slapd hangs again, but this time the other master...

Here is the log from the time, the Server was hanging. But today, no
error was logged, not in Master 2 (ldap-02) nor Master 1 (not
included).

Jun 10 15:19:03 ldap-02 slapd[31642]: conn=1710241 fd=61 ACCEPT from
IP=19.168.1.10:45678 (IP=0.0.0.0:389)
Jun 10 15:19:03 ldap-02 slapd[31642]: conn=1710241 op=0 BIND
dn="cn=root,ou=admins,ou=root" method=128
Jun 10 15:19:03 ldap-02 slapd[31642]: conn=1710241 op=0 BIND
dn="cn=root,ou=admins,ou=root" mech=SIMPLE ssf=0
Jun 10 15:19:03 ldap-02 slapd[31642]: conn=1710241 op=0 RESULT tag=97
err=0 text=
Jun 10 15:19:03 ldap-02 slapd[31642]: conn=1710241 op=1 ADD
dn="cn=object1,cn=node10,cn=Room,cn=2,cn=1329915568-54148,cn=BBB,cn=AAA,cn=companies,ou=root"
Jun 10 15:29:28 ldap-02 slapd[31642]: daemon: shutdown requested and initiated.
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=206330 fd=9 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1000 fd=12 closed
(connection lost on write)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1709749 fd=31 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1709789 fd=34 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1709851 fd=36 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1709870 fd=38 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1709888 fd=40 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1709896 fd=41 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1709914 fd=43 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1709899 fd=44 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1710145 fd=47 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1709933 fd=48 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1710075 fd=50 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1709981 fd=52 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1710016 fd=54 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1710150 fd=56 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1710240 fd=59 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: conn=1710238 fd=60 closed (slapd shutdown)
Jun 10 15:29:28 ldap-02 slapd[31642]: slapd shutdown: waiting for 20
operations/tasks to finish
Jun 10 15:34:20 ldap-02 slapd[5030]: @(#) $OpenLDAP: slapd 2.4.30 $
  opensuse-buildservice@opensuse.org
Jun 10 15:34:20 ldap-02 slapd[5030]: daemon: IPv6 socket() failed
errno=97 (Address family not supported by protocol)
Jun 10 15:34:20 ldap-02 slapd[5037]: bdb_db_open: database "ou=root":
unclean shutdown detected; attempting recovery.
Jun 10 15:34:21 ldap-02 slapd[5037]: slapd starting
Jun 10 15:34:21 ldap-02 slapd[5037]: conn=1000 fd=13 ACCEPT from
IP=127.0.0.1:56444 (IP=0.0.0.0:389)
Jun 10 15:34:21 ldap-02 slapd[5037]: conn=1000 op=0 BIND dn="" method=128
Jun 10 15:34:21 ldap-02 slapd[5037]: conn=1000 op=0 RESULT tag=97
err=48 text=anonymous bind disallowed
Jun 10 15:34:21 ldap-02 slapd[5037]: conn=1000 op=1 UNBIND
Jun 10 15:34:21 ldap-02 slapd[5037]: conn=1000 fd=13 closed
Jun 10 15:34:22 ldap-02 slapd[5037]: conn=1001 fd=13 ACCEPT from
IP=192.168.1.10:48743 (IP=0.0.0.0:389)


As you can see, the ADD operation was the last operation on
(15:19:03). On 15:29:28, the responsible operation staff did reboot
the slapd. He took care, that no slapd process was running before he
started the slapd again.

We set "gentlehup on" on the config, so the server tried to answer all
pending request and close the connections..(Does the server answer
pending requests?).
But after starting, bdb_db_open says "unclean shutdown detected", so
it seems to me, that the server quit after time without fulfill all
requirements for a clean shutdown?


Additionally, I have running sar to collect informations about the
system activity (see below). Here we can see, that in the time range
(15:20-15:30) while the slapd was hanging, the slapd accepts new
connections (tcpsck / connection count is growing), but does not serve
answers (tcp-tw / no "closed" connection).


15:00:01       totsck    tcpsck    udpsck    rawsck   ip-frag    tcp-tw
15:05:01          165         8         5              0         0        11
15:10:01          164         7         5              0         0        17
15:15:01          166         9         5              0         0        14
15:20:01          196        79        5              0         0         0
15:25:01          194       166       5              0         0         0
15:30:01          159         4         5             0         0        16
15:35:01          162         6         5             0         0        14
15:40:01          167        11        5              0         0        27
15:45:01          167        11        5              0         0        28

During the whole time, the CPU was idling between 97% ... 100%

The Systems are running in production, so I can't make any tests. In
our test environment are no problems seen till now.
But there the load (ldap operations) is very low ..

The configuration for  larger IDL (see first posting), we running
since 2,5 years without problems by mostly the
same size of the directory.

Thanks Meike


2012/6/6 Meike Stone <meike.stone@googlemail.com>:
> Hello dear list,
>
> does anyone can help me?
>
> Kindly regards and thanks
>
> Meike
>
> 2012/6/1 Meike Stone <meike.stone@googlemail.com>:
>> Hello,
>>
>> after inserting (ADD) one object, I get following messages in the
>> logfile and the sapld hangs:
>>
>> Jun  1 09:02:24 ldap-01 slapd[8836]: conn=633789 op=1 ADD
>> dn="cn=3,cn=2,cn=node,cn=1,cn=BBB,cn=AAA,cn=companies,ou=root"
>> Jun  1 09:02:24 ldap-01 slapd[8836]: => bdb_idl_insert_key: c_get
>> failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30995)
>> Jun  1 09:02:24 ldap-01 slapd[8836]: => bdb_dn2id_add 0x205e7e:
>> subtree (cn=BBB,cn=AAA,cn=companies,ou=root) insert failed: -30995
>>
>> After this, I don't see any messages in the log till the staff was
>> initiating a stop/start:
>>
>> Jun  1 09:09:29 ldap-01 slapd[8836]: daemon: shutdown requested and initiated.
>> Jun  1 09:09:29 ldap-01 slapd[8836]: conn=633113 fd=9 closed (slapd shutdown)
>> Jun  1 09:09:29 ldap-01 slapd[8836]: conn=405426 fd=12 closed (slapd shutdown)
>> Jun  1 09:09:29 ldap-01 slapd[8836]: conn=633787 fd=13 closed (slapd shutdown)
>> Jun  1 09:09:29 ldap-01 slapd[8836]: conn=1011 fd=14 closed (slapd shutdown)
>> Jun  1 09:09:29 ldap-01 slapd[8836]: conn=1013 fd=18 closed (slapd shutdown)
>> Jun  1 09:09:29 ldap-01 slapd[8836]: conn=632703 fd=33 closed (slapd shutdown)
>> Jun  1 09:09:29 ldap-01 slapd[8836]: conn=632710 fd=37 closed (slapd shutdown)
>> Jun  1 09:09:29 ldap-01 slapd[8836]: conn=632883 fd=39 closed (slapd shutdown)
>> Jun  1 09:09:29 ldap-01 slapd[8836]: conn=632762 fd=40 closed (slapd shutdown)
>> Jun  1 09:09:29 ldap-01 slapd[8836]: conn=633211 fd=41 closed (slapd shutdown)
>> Jun  1 09:09:29 ldap-01 slapd[8836]: conn=633735 fd=45 closed (slapd shutdown)
>> Jun  1 09:09:29 ldap-01 slapd[8836]: conn=632829 fd=47 closed (slapd shutdown)
>> Jun  1 09:09:29 ldap-01 slapd[8836]: conn=633170 fd=48 closed (slapd shutdown)
>> Jun  1 09:09:29 ldap-01 slapd[8836]: conn=633200 fd=50 closed (slapd shutdown)
>> Jun  1 09:09:29 ldap-01 slapd[8836]: conn=633788 fd=55 closed (slapd shutdown)
>> Jun  1 09:09:29 ldap-01 slapd[8836]: slapd shutdown: waiting for 22
>> operations/tasks to finish
>> Jun  1 09:09:42 ldap-01 slapd[20945]: @(#) $OpenLDAP: slapd 2.4.30 $
>>   opensuse-buildservice@opensuse.org
>> Jun  1 09:09:43 ldap-01 slapd[20945]: slapd starting
>>
>> After the error message at 09:02:24 the slapd did not answer any request.
>> I cannot recover that problem in a test environment.
>>
>> The server is running in a MM environment (two masters), and the
>> server gets 200-1200 search request/s
>> Because of this high rate, we set "loglevel 0". Since we updated the
>> slapd to 2.4.30 (from 2.4.28) the server crashes/hangs about on times
>> a week.
>> Because of this we set loglevel 256 back again.
>>
>>
>> Would be very nice, if I can fix the problem, please help.
>>
>> Thanks in advance
>>
>> Meike
>>
>>
>> PS:
>> The slapd is a modified, self compiled version because of larger IDL
>> with following changes:
>>
>> openldap-2.4.30/servers/slapd/back-bdb/idl.h:
>> -#define        BDB_IDL_LOGN    16      /* DB_SIZE is 2^16, UM_SIZE is 2^17 */
>> +#define        BDB_IDL_LOGN    17      /* DB_SIZE is 2^17, UM_SIZE is 2^18 */
>>
>> openldap-2.4.30/include/ldap_pvt_thread.h:
>> -#      define LDAP_PVT_THREAD_STACK_SIZE ( 1 * 1024 * 1024 * sizeof(void *) )
>> +#      define LDAP_PVT_THREAD_STACK_SIZE ( 2 * 1024 * 1024 * sizeof(void *) )
>>
>> All tests where running well!