
slapd hangs on startup



Hi,

This issue has been bugging me for a while, but I can't find anything about it when googling.

I have a slapd 2.3.x server which has been taking longer and longer to start. Lately a restart has taken 45 minutes.
With strace you can just see it waiting for a futex:

Process 22740 attached - interrupt to quit
futex(0x56274bd8, FUTEX_WAIT, 22742, NULL

... then suddenly it starts to listen and answer queries.
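Note that strace attached to a single PID only shows that one thread. The futex value (22742) is close to the PID, so it is presumably the ID of another slapd thread that the main thread is waiting for. A rough sketch for listing what every thread of a process is doing, assuming a Linux /proc filesystem (the function name is my own, and the wchan file may not be readable on every kernel):

```python
import os


def thread_states(pid):
    """Return {tid: (state, wchan)} for every thread of pid, read from /proc."""
    result = {}
    task_dir = "/proc/%d/task" % pid
    for tid in sorted(os.listdir(task_dir), key=int):
        state = wchan = "?"
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                # The state letter is the first field after the
                # parenthesised command name.
                state = f.read().rsplit(")", 1)[1].split()[0]
            with open(os.path.join(task_dir, tid, "wchan")) as f:
                # Kernel symbol the thread sleeps in, if the kernel exposes it.
                wchan = f.read().strip() or "-"
        except OSError:
            # Thread exited or the file is not accessible; keep what we have.
            pass
        result[int(tid)] = (state, wchan)
    return result
```

Calling thread_states(22740) while slapd hangs should show whether thread 22742 is sleeping (state S or D) or actually running (state R).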

Now, I hoped my planned upgrade to 2.4.x would solve that, but alas!

I now have a running setup with 2.4.17, compiled with OpenSSL, in mirrormode. It mirrors cn=config and the primary database over TLS (with client certs and SASL EXTERNAL), and runs on Linux 2.6.18 in a "vserver".

And it still hangs on startup.

I suspect that it has something to do with the vserver. One explanation would be that slapd tries to connect to itself via TCP, since the kernel just DROPs packets to 127.0.0.1.
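The loopback theory can be checked independently of slapd. A minimal sketch, assuming Python is available on the vserver (the function name and the timeout are my own choices): a dropped packet shows up as "timeout", while a working loopback gives either "open" or an immediate "refused".

```python
import socket


def loopback_connect(port, timeout=3.0):
    """Try a TCP connect to 127.0.0.1:port and classify the outcome."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(("127.0.0.1", port))
        return "open"
    except socket.timeout:
        # No answer at all: consistent with packets being silently dropped.
        return "timeout"
    except ConnectionRefusedError:
        # An immediate RST: loopback works, nothing is listening on the port.
        return "refused"
    except OSError as e:
        return "error: %s" % e
    finally:
        s.close()
```

For example, loopback_connect(636) could be run while slapd is still hanging and again once it is listening.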

Another explanation would be that it can't gather enough entropy, but my 2.3.x setup didn't use TLS, and I have checked that /dev/random and /dev/urandom are world-readable.
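Being readable does not rule out an empty entropy pool. A small sketch for checking that as well, assuming a Linux kernel that exposes /proc/sys/kernel/random/entropy_avail (the function names are mine, and I don't know which device this build actually reads from):

```python
import os


def entropy_available(path="/proc/sys/kernel/random/entropy_avail"):
    """Return the kernel's entropy estimate in bits, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def random_read_would_block(device="/dev/random", nbytes=16):
    """Return True if a non-blocking read cannot deliver nbytes right now."""
    fd = os.open(device, os.O_RDONLY | os.O_NONBLOCK)
    try:
        data = os.read(fd, nbytes)
        # A short read also means the pool could not satisfy the request.
        return len(data) < nbytes
    except BlockingIOError:
        return True
    finally:
        os.close(fd)
```

If random_read_would_block() returns True during the hang, entropy starvation becomes a more likely candidate.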

Looking at debug output at server-1 I see things like:

ber_flush2 failed errno=11 reason="Resource temporarily unavailable"

and:

connection_write(17): waking output for id=2
connection_get(17): got connid=2
connection_write(17): waking output for id=2
connection_get(17): got connid=2
connection_write(17): waking output for id=2
connection_get(17): got connid=2
connection_write(17): waking output for id=2
ber_flush2: 933 bytes to sd 17
send_search_entry: conn 2  ber write failed.
connection_close: conn=2 sd=17
connection_read(17): no connection!
connection_read(17): no connection!
connection_read(17): no connection!
connection_read(17): no connection!

Where sd 17 seems to be the last of 4 syncrepl connections.
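For reference, errno=11 on Linux is EAGAIN, which is what a non-blocking socket returns when its send buffer is full, i.e. when the peer is not reading fast enough. The sketch below is only a generic illustration of that condition using a local socket pair. It is not slapd's code, and the function name is made up:

```python
import errno
import socket


def fill_until_eagain(chunk=b"x" * 65536):
    """Write to a non-blocking socket nobody reads from until it reports EAGAIN.

    Returns (bytes_sent, errno_value).
    """
    a, b = socket.socketpair()
    a.setblocking(False)
    sent = 0
    try:
        while True:
            # b never reads, so the buffer fills up and send() eventually fails.
            sent += a.send(chunk)
    except BlockingIOError as e:
        return sent, e.errno
    finally:
        a.close()
        b.close()
```

If the same thing happens on sd 17, the consumer on the other end of that connection would be the one to look at.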

lsof tells me:
slapd 25704 root 14u IPv4 852048 TCP s01:40400->s02:ldaps (ESTABLISHED)
slapd 25704 root 15u IPv4 852057 TCP s01:40401->s02:ldaps (ESTABLISHED)
slapd 25704 root 16u IPv4 852169 TCP s01:ldaps->s02:48705 (ESTABLISHED)

But the last (sd 17) seems to be closed again.

It seems syncrepl tried to get started (the other server is empty, since the database has just been loaded with slapadd -w on server-1),
but it only manages to syncrepl the first 5 entries or so.

slapd -d 16384 output is:
slapd starting
do_syncrep2: rid=004 LDAP_RES_INTERMEDIATE - REFRESH_DELETE
do_syncrep2: rid=002 LDAP_RES_INTERMEDIATE - REFRESH_DELETE
send_search_entry: conn 2  ber write failed.
connection_read(17): no connection!
connection_read(17): no connection!
connection_read(17): no connection!
send_search_entry: conn 3  ber write failed.
connection_write(17): no connection!
send_search_entry: conn 4  ber write failed.
connection_write(17): no connection!
send_search_entry: conn 5  ber write failed.
connection_write(17): no connection!
....


What can slapd be waiting for?

/Peter