
Troubleshooting assistance - long-ish



I need some help, folks.

We have a corporate LDAP infrastructure consisting of one master, which we 
replicate to a dozen or so LDAP servers.

One server suddenly became swamped: 100% CPU utilization, and memory was 
pegged as well.

One of the roads I traveled was to see which machines were connecting to our 
server. netstat -na revealed about 14 servers connected to the LDAP server 
that we had no idea were dependent on our LDAP.
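
For anyone who wants to repeat that check, here is a rough sketch of how the 
netstat output could be tallied per client. It assumes the Linux-style column 
layout of netstat -na (proto, recv-q, send-q, local, foreign, state) and the 
default LDAP port 389; the function name and the sample addresses are made up 
for illustration, and other platforms print addresses differently.

```python
from collections import Counter

# Hypothetical sample in the Linux "netstat -na" layout; addresses are invented.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:389             0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:389            10.0.1.20:51234         ESTABLISHED
tcp        0      0 10.0.0.5:389            10.0.1.20:51240         ESTABLISHED
tcp        0      0 10.0.0.5:389            10.0.2.7:40022          ESTABLISHED
tcp        0      0 10.0.0.5:22             10.0.3.9:50111          ESTABLISHED
"""


def count_ldap_clients(netstat_output, ldap_port=389):
    """Tally ESTABLISHED TCP connections to the local LDAP port by remote IP."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP row with a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state != "ESTABLISHED":
            continue
        # Split on the last ':' so the port is separated from the address.
        if local.rsplit(":", 1)[-1] != str(ldap_port):
            continue
        counts[remote.rsplit(":", 1)[0]] += 1
    # Busiest clients first.
    return counts.most_common()


if __name__ == "__main__":
    for ip, n in count_ldap_clients(SAMPLE):
        print(ip, n)
```

Keep in mind that a connection count is not the same as load: a client holding 
a single connection can still be the one issuing the expensive requests.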

Since the server problems had serious consequences for the corporate user 
community, I decided to IP-filter all of these unknown servers out and 
contact the owners to let them know that I had done so.

Lo and behold, blocking these servers stabilized the machine.

I'm currently crying "foul", because these pinheads who rolled out 
applications that were dependent on our LDAP did so without our knowledge, 
which is why the problem blindsided us.

My question:

Now that the owners of the apps that I have blocked have all cried in unison, 
"we're not doing anything that would cause... blah, blah, blah", I need to 
pinpoint the exact cause: whether it is the aggregate load of all these 
"surprise" servers, or a single one of them that is causing the problems.

Limiting the number of connections to the LDAP server didn't help much. I 
began adding each blocked server back, one by one. At that point, we noticed 
another "new" server that was pounding the LDAP server.

What can I do to definitively drill down to the root cause of this problem?

Any help would be appreciated.