2.3.11 syncrepl failing - send_ldap_response: ber write failed



I have a SuSE 10.0 master running OpenLDAP 2.3.11 (from the RPMs at ftp://ftp.suse.com/pub/projects/OpenLDAP/2.3/i386/10.0) and a syncrepl slave on a busy mail server (also SuSE 10.0 with 2.3.11). Postfix, Courier, saslauthd and Amavis on the mail server are configured to query the replica on localhost.

This worked great for about two weeks, but recently the syncrepl slave has started failing (maybe load related, since things are getting busy again after the New Year). With all the normal traffic filtered out, the messages in the log file look like this:

Jan 4 12:28:17 coffee slapd[5203]: send_ldap_response: ber write failed
Jan 4 12:28:19 coffee slapd[5203]: send_ldap_response: ber write failed
Jan 4 12:28:18 coffee slapd[5203]: conn=1 op=40468 ABANDON msg=40468
Jan 4 12:28:19 coffee slapd[5203]: send_ldap_response: ber write failed
Jan 4 12:28:18 coffee slapd[5203]: send_ldap_response: ber write failed
Jan 4 12:28:19 coffee slapd[5203]: connection_read(43): no connection!
Jan 4 12:28:19 coffee slapd[5203]: connection_read(83): no connection!
Jan 4 12:28:19 coffee slapd[5203]: connection_input: conn=6036 deferring operation: binding
Jan 4 12:28:19 coffee slapd[5203]: connection_input: conn=6039 deferring operation: binding


But it isn't until a few minutes later that I see actual errors:

Jan 4 12:33:55 coffee saslauthd[30308]: user ldap_search_st() failed: Can't contact LDAP server
Jan 4 12:33:55 coffee saslauthd[30308]: Retrying authentication
Jan 4 12:33:55 coffee saslauthd[30308]: ldap_simple_bind() failed -1 (Can't contact LDAP server).


and in the maillog:

Jan 4 12:32:52 coffee postfix/smtpd[2940]: warning: dict_ldap_connect: Unable to bind to server ldap://ldaprepl:389 as cn=Coffee: -1 (Can't contact LDAP server)
Jan 4 12:32:59 coffee authdaemond: ldap_simple_bind_s failed: Can't contact LDAP server


etc. etc.

So slapd seems to run into a problem (a write error?) that leaves it in a state where it has to exit.

The load average at the time is not extreme: 1.15 on a dual Xeon. There is a tiny bit of swapping going on around then, but I can't tell whether it starts before the event or afterwards, when I restart LDAP. Incidentally, restarting OpenLDAP fixes everything.

Q: Is it possible that logging itself is killing it? My log file is about 600MB a day at just the default loglevel, so I don't want to turn on any extra logging to get more descriptive error messages; in fact, I would rather turn logging DOWN. When it says "ber write failed", what is it trying to write: responses to a socket, or updates to the database?
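
To line up the timing of the slapd messages with the later client failures, the log lines can be tallied per minute. The following is only a minimal sketch in Python: the function name tally_per_minute and the regular expression are illustrative assumptions, and it assumes the syslog line layout shown in the excerpts above ("Mon D HH:MM:SS host program[pid]: message").

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpts above:
#   "Jan 4 12:28:17 coffee slapd[5203]: send_ldap_response: ber write failed"
# The "[pid]" part is optional (the authdaemond line has none).
LINE_RE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hh>\d{2}):(?P<mm>\d{2}):\d{2}\s+"
    r"\S+\s+(?P<prog>[^\[:\s]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)


def tally_per_minute(lines, needle):
    """Count lines whose message contains `needle`, keyed by (month, day, 'HH:MM')."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and needle in m.group("msg"):
            minute = "%s:%s" % (m.group("hh"), m.group("mm"))
            counts[(m.group("mon"), int(m.group("day")), minute)] += 1
    return counts


# A few lines copied from the excerpts above, used as sample input.
SAMPLE = [
    "Jan 4 12:28:17 coffee slapd[5203]: send_ldap_response: ber write failed",
    "Jan 4 12:28:19 coffee slapd[5203]: send_ldap_response: ber write failed",
    "Jan 4 12:28:18 coffee slapd[5203]: conn=1 op=40468 ABANDON msg=40468",
    "Jan 4 12:28:19 coffee slapd[5203]: connection_read(43): no connection!",
    "Jan 4 12:32:59 coffee authdaemond: ldap_simple_bind_s failed: Can't contact LDAP server",
    "Jan 4 12:33:55 coffee saslauthd[30308]: ldap_simple_bind() failed -1 (Can't contact LDAP server).",
]

for needle in ("ber write failed", "Can't contact LDAP server"):
    for key, n in sorted(tally_per_minute(SAMPLE, needle).items()):
        print(needle, key, n)
```

Run against the full log file instead of SAMPLE (by passing an open file object as `lines`), this would show whether the "ber write failed" messages build up steadily before the clients lose contact or only appear in a burst just beforehand.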