
(ITS#5380) master slapd hangs when writing



Full_Name: Ali Pouya
Version: 2.3.36
OS: Linux 2.6
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (145.242.11.4)


My directory consists of 4 servers synchronized through syncrepl
(refreshAndPersist).
The directory contains about 10 million entries in the BDB back-end.
My slapd servers are configured with the default 16 threads.
I use a Java injector for write operations. The injector opens 28
connections to the master and writes about 1400 entries per minute.

After several hours of activity the master hangs. It still accepts TCP
connections but no longer processes LDAP operations: requests hang until the
client aborts them. The replicas are unaffected.

Is this a known problem?
I checked the CHANGES file for the 2.3.40 release and found no mention of a
matching fix.

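I used strace to see what slapd does while it is hung. It shows one thread
(pid 16625) looping endlessly on epoll_wait(), which keeps returning many
descriptors flagged EPOLLERR|EPOLLHUP, while all the other threads are blocked
in futex waits.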
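The trace shows epoll_wait() returning immediately, every time, with a long list of EPOLLERR|EPOLLHUP events. One possible reading, offered only as an assumption and not as a diagnosis of slapd: with level-triggered epoll, a descriptor that has hung up is reported on every call until it is removed from the epoll set (or fully closed), so an event loop that never removes it spins without ever blocking. The minimal Linux-only C sketch below demonstrates that kernel behaviour on a socketpair. `count_hup_reports` is a hypothetical helper written for illustration and is not OpenLDAP code.

```c
/* Sketch (Linux only): level-triggered epoll keeps re-reporting a hung-up
 * descriptor until it is removed from the interest list. */
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not slapd code: registers one end of a socketpair
 * with no requested events, closes the other end, then polls `rounds` times
 * with a zero timeout. If `deregister` is non-zero, the descriptor is removed
 * with EPOLL_CTL_DEL after its first report, as an event loop should do.
 * Returns how many polls reported EPOLLHUP, or -1 on a setup error. */
int count_hup_reports(int rounds, int deregister)
{
    assert(rounds >= 1);

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    int epfd = epoll_create1(0);
    if (epfd < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* events = 0: EPOLLERR and EPOLLHUP are reported even when not requested. */
    struct epoll_event ev = { .events = 0, .data.fd = sv[0] };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) != 0) {
        close(epfd);
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    close(sv[1]); /* the peer goes away, like a client that dropped */

    int reports = 0;
    for (int i = 0; i < rounds; i++) {
        struct epoll_event out;
        int n = epoll_wait(epfd, &out, 1, 0); /* timeout 0: never blocks */
        if (n == 1 && (out.events & EPOLLHUP)) {
            reports++;
            if (deregister) /* drop the dead descriptor so it stops firing */
                epoll_ctl(epfd, EPOLL_CTL_DEL, out.data.fd, NULL);
        }
    }

    close(epfd);
    close(sv[0]);
    return reports;
}
```

Under these assumptions, count_hup_reports(5, 0) reports the hang-up on all 5 polls, while count_hup_reports(5, 1) reports it only once. The trace alone does not show whether slapd 2.3.36 actually fails to drop these descriptors.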

The strace output is included at the end of this ITS.

Thanks for your help
Best regards
Ali

====================

The strace output:


Process 16970 attached with 18 threads - interrupt to quit
[pid 16601] futex(0xa341abf8, FUTEX_WAIT, 16625, NULL <unfinished ...>
[pid 16625] time( <unfinished ...>
[pid 16626] futex(0xb7f4d3d0, FUTEX_WAIT, 9, NULL <unfinished ...>
[pid 16627] futex(0xb7f4c110, FUTEX_WAIT, 21, NULL <unfinished ...>
[pid 20161] futex(0xb7f53708, FUTEX_WAIT, 3, NULL <unfinished ...>
[pid 20162] futex(0xb7f4c9a8, FUTEX_WAIT, 19, NULL <unfinished ...>
[pid 31353] futex(0xb7f4b5bc, FUTEX_WAIT, 11, NULL <unfinished ...>
[pid 31354] futex(0xb7f4980c, FUTEX_WAIT, 1, NULL <unfinished ...>
[pid 31355] futex(0xb7f4d178, FUTEX_WAIT, 11, NULL <unfinished ...>
[pid  2229] futex(0xb7f4fd9c, FUTEX_WAIT, 1, NULL <unfinished ...>
[pid  2241] futex(0xb7f48cb8, FUTEX_WAIT, 5, NULL <unfinished ...>
[pid  2242] futex(0xb7f50440, FUTEX_WAIT, 3, NULL <unfinished ...>
[pid  2243] futex(0xb7f49d20, FUTEX_WAIT, 17, NULL <unfinished ...>
[pid 16228] futex(0xb7f523e4, FUTEX_WAIT, 17, NULL <unfinished ...>
[pid 16899] futex(0xb7f500bc, FUTEX_WAIT, 19, NULL <unfinished ...>
[pid 16916] futex(0xb7f4aa04, FUTEX_WAIT, 29, NULL <unfinished ...>
[pid 16969] futex(0xb7f52060, FUTEX_WAIT, 23, NULL <unfinished ...>
[pid 16970] futex(0xb7f4c87c, FUTEX_WAIT, 27, NULL <unfinished ...>
[pid 16625] <... time resumed> NULL)    = 1203063506
[pid 16625] epoll_wait(6, {{EPOLLERR|EPOLLHUP, {u32=153710248, u64=153710248}},
  {EPOLLERR|EPOLLHUP, {u32=153710244, u64=153710244}},
  {EPOLLERR|EPOLLHUP, {u32=153710232, u64=153710232}},
  {EPOLLERR|EPOLLHUP, {u32=153710220, u64=153710220}},
  {EPOLLERR|EPOLLHUP, {u32=153710204, u64=153710204}},
  {EPOLLERR|EPOLLHUP, {u32=153710196, u64=153710196}},
  {EPOLLERR|EPOLLHUP, {u32=153710224, u64=153710224}},
  {EPOLLERR|EPOLLHUP, {u32=153710192, u64=153710192}},
  {EPOLLERR|EPOLLHUP, {u32=153710180, u64=153710180}},
  {EPOLLERR|EPOLLHUP, {u32=153710176, u64=153710176}},
  {EPOLLERR|EPOLLHUP, {u32=153710164, u64=153710164}},
  {EPOLLERR|EPOLLHUP, {u32=153710160, u64=153710160}},
  {EPOLLERR|EPOLLHUP, {u32=153710144, u64=153710144}},
  {EPOLLERR|EPOLLHUP, {u32=153710140, u64=153710140}},
  {EPOLLERR|EPOLLHUP, {u32=153710128, u64=153710128}},
  {EPOLLERR|EPOLLHUP, {u32=153710116, u64=153710116}},
  {EPOLLERR|EPOLLHUP, {u32=153710108, u64=153710108}},
  {EPOLLERR|EPOLLHUP, {u32=153710104, u64=153710104}},
  {EPOLLERR|EPOLLHUP, {u32=153710100, u64=153710100}},
  {EPOLLERR|EPOLLHUP, {u32=153710096, u64=153710096}},
  {EPOLLERR|EPOLLHUP, {u32=153710092, u64=153710092}},
  {EPOLLERR|EPOLLHUP, {u32=153710088, u64=153710088}},
  {EPOLLERR|EPOLLHUP, {u32=153710080, u64=153710080}},
  {EPOLLERR|EPOLLHUP, {u32=153710068, u64=153710068}},
  {EPOLLERR|EPOLLHUP, {u32=153710056, u64=153710056}},
  {EPOLLERR|EPOLLHUP, {u32=153710052, u64=153710052}},
  {EPOLLERR|EPOLLHUP, {u32=153710048, u64=153710048}},
  {EPOLLERR|EPOLLHUP, {u32=153710044, u64=153710044}},
  {EPOLLERR|EPOLLHUP, {u32= ..........