
(sometimes) slow ldap_bind



Hi!

We are using OpenLDAP 1.2.11 for authentication for, among other
things, our mail system (POP3/SMTP-AUTH).

Currently two mail servers act as frontends to shared storage and
authenticate via LDAP. Both are heavily loaded (about 500 POP3
requests per minute each). The two servers use a single OpenLDAP
server (a slave) with about 100,000 entries in the directory.

All systems run Red Hat Linux (the POP servers use 6.0, the LDAP
server runs 6.2).

We are encountering a strange problem:

Sometimes authentication takes 90 seconds or more. My co-worker, who
is adding the LDAP authentication code to QPopper, says that the
problem is not with the ldap_search (which is given a timeout) but
with the ldap_bind.

We bind with the manager DN, issue an ldap_search_st for the UID
provided, and retrieve the userpassword attribute (this is a bit of
legacy we carry around: on other systems we retrieve additional
values, and we simply reuse that code here).

We set ld->ld_options.ldo_tm_net to put a timeout on the ldap_bind
and fall back to a second slave if the first one doesn't answer. The
relevant piece of code looks like this:

--- SNIP
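if ((ld = ldap_init(ldapserver[i], 389)) != NULL) {
	/* try to limit the bind to 1 second */
	mytv.tv_sec = 1;
	mytv.tv_usec = 0;
	ld->ld_options.ldo_tm_net = &mytv;
	/* synchronous bind as manager; lasttry holds the result code */
	lasttry = ldap_simple_bind_s(ld, MGR, PW);
}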
--- SNAP

Note that the OpenLDAP library on the mail servers is version 2.0.7.

What happens is that the LDAP server sometimes builds up a sizeable
backlog of connections in state 'SYN_RECV'. From time to time these
requests are processed very quickly and everything returns to normal,
but seconds later the backlog builds up again. In the POP statistics
we see that most requests complete within a second or two (which is
fine), but then suddenly lots of authentication requests take 90
seconds or longer.

Some profiling shows that the hang does indeed occur in
ldap_simple_bind_s.

The timeout does not seem to be honoured. More precisely, it is
honoured if we block connections to the LDAP port via ipchains, but
not if port 389 is reachable.

Does anyone have any suggestions for improving this situation? Is it an
OS problem? Should we use the same library version on the clients as on
the server?

Thanks,
Heinz