
Unexpected server freeze



I'm suffering from repeated, unexpected freezes of slapd. The process is still running, but every query receives no answer.

On the client side, I've been unable to make nss, pam or even ldapsearch survive such a service outage, despite setting strict timeouts in their respective configuration files :(

Here is the output of ldapsearch -x -d 1, executed on the host running the server:

-- ldap_create
ldap_bind_s
ldap_simple_bind_s
ldap_sasl_bind_s
ldap_sasl_bind
ldap_send_initial_request
ldap_new_connection
ldap_int_open_connection
ldap_connect_to_host: TCP localhost:389
ldap_new_socket: -1
ldap_new_socket: 3
ldap_prepare_socket: 3
ldap_connect_to_host: Trying 127.0.0.1:389
ldap_connect_timeout: fd: 3 tm: -1 async: 0
ldap_ndelay_on: 3
ldap_is_sock_ready: 3
ldap_ndelay_off: 3
ldap_int_sasl_open: host=yquem.inria.fr
ldap_open_defconn: successful
ldap_send_server_request
ber_flush: 14 bytes to sd 3
ldap_result msgid 1
ldap_chkResponseList for msgid=1, all=1
ldap_chkResponseList returns NULL
wait4msg (infinite timeout), msgid 1
wait4msg continue, msgid 1, all 1
** Connections:
* host: localhost  port: 389  (default)
  refcnt: 2  status: Connected
  last used: Tue Jan 18 10:21:11 2005

** Outstanding Requests:
 * msgid 1,  origid 1, status InProgress
   outstanding referrals 0, parent count 0
** Response Queue:
   Empty
ldap_chkResponseList for msgid=1, all=1
ldap_chkResponseList returns NULL
ldap_int_select
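The debug output ends with "wait4msg (infinite timeout)" followed by ldap_int_select, so libldap is waiting for the reply with no deadline at all. As a rough way to tell a frozen slapd from a dead one, independently of the nss/pam/ldap.conf settings, here is a minimal Python sketch. It assumes a raw TCP probe is acceptable instead of going through libldap; the function name probe_ldap and its return labels are hypothetical. It sends the same anonymous LDAPv3 bind (messageID 1) that the strace further down shows ldapsearch writing, and gives up after a fixed timeout:

```python
import socket

# 14-byte anonymous LDAPv3 simple bind, messageID 1 (same bytes as the write() in the strace).
ANON_BIND = b"0\f\2\1\1`\7\2\1\3\4\0\200\0"


def probe_ldap(host: str = "127.0.0.1", port: int = 389, timeout: float = 5.0) -> str:
    """Hypothetical probe: returns 'ok', 'no-response', 'closed' or 'connect-failed'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or TCP connect itself timed out.
        return "connect-failed"
    with sock:
        # Bound every send/recv, unlike the infinite wait seen in wait4msg.
        sock.settimeout(timeout)
        try:
            sock.sendall(ANON_BIND)
            reply = sock.recv(4096)
        except socket.timeout:
            # TCP connection accepted, but no LDAP answer within the deadline.
            return "no-response"
        except OSError:
            return "connect-failed"
    if not reply:
        # Server closed the connection without answering.
        return "closed"
    # A BindResponse is wrapped in an LDAPMessage SEQUENCE, whose tag byte is 0x30.
    return "ok" if reply[:1] == b"\x30" else "no-response"
```

For example, probe_ldap("127.0.0.1", 389, timeout=5.0) returning "no-response" would match the symptom described here: the connect() succeeds (as in the strace), but no reply ever arrives.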


And here is the result of stracing the same call, showing that data are actually read but nothing is printed:
read(3, " # miriad # Edmonde.Dute"..., 4096) = 4096
read(3, " # Edmonde.Duteurtre\n193.51.1"..., 4096) = 4096
read(3, "emo44.inria.fr expo-demo44 \t# mi"..., 4096) = 1296
read(3, "", 4096) = 0
close(3) = 0
munmap(0x4001d000, 4096) = 0
socket(PF_INET6, SOCK_STREAM, 0) = -1 EAFNOSUPPORT (Address family not supported by protocol)
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
connect(3, {sin_family=AF_INET, sin_port=htons(389), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
select(1024, NULL, [3], NULL, NULL) = 1 (out [3])
getpeername(3, {sin_family=AF_INET, sin_port=htons(389), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
fcntl64(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl64(3, F_SETFL, O_RDWR) = 0
getpeername(3, {sin_family=AF_INET, sin_port=htons(389), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
uname({sys="Linux", node="yquem", ...}) = 0
time(NULL) = 1106040113
write(3, "0\f\2\1\1`\7\2\1\3\4\0\200\0", 14) = 14
select(1024, [3], [], NULL, NULL <unfinished ...>
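The 14 bytes passed to write() above form a complete LDAPv3 request: an anonymous simple bind with messageID 1. So the client gets as far as sending the bind; it is the reply that never comes, and the final select() has no timeout argument. To make the structure of that PDU visible, here is a small Python sketch that rebuilds it byte for byte. The helper names are hypothetical, and it only handles short-form BER lengths and single-byte integers, which is enough for this tiny message:

```python
def ber_length(n: int) -> bytes:
    # Short-form BER length only (0..127); long-form lengths are out of scope here.
    if not 0 <= n < 0x80:
        raise ValueError("long-form BER lengths not handled in this sketch")
    return bytes([n])


def tlv(tag: int, value: bytes) -> bytes:
    # Encode one BER tag-length-value element.
    return bytes([tag]) + ber_length(len(value)) + value


def anonymous_bind_request(message_id: int = 1, version: int = 3) -> bytes:
    # Single-byte INTEGER encoding: valid only for values 0..127.
    bind = tlv(0x60,                              # [APPLICATION 0] BindRequest
               tlv(0x02, bytes([version]))        # INTEGER version
               + tlv(0x04, b"")                   # LDAPDN name, empty = anonymous
               + tlv(0x80, b""))                  # [0] simple credentials, empty
    # LDAPMessage ::= SEQUENCE { messageID INTEGER, protocolOp }
    return tlv(0x30, tlv(0x02, bytes([message_id])) + bind)


# Same byte string as the write(3, ...) line in the strace.
TRACE_BYTES = b"0\f\2\1\1`\7\2\1\3\4\0\200\0"
assert anonymous_bind_request() == TRACE_BYTES
```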


I'm running a backported slapd 2.1.30 on stable Debian.
--
At the source of every error that is blamed on the computer, you will find at least two human errors, including the error of blaming it on the computer
-- SNAFU Equations (JB's Scholastic Laws) n°1