
99.9% CPU usage still, production system



Dear team,

Thank you for all the good suggestions in response to my previous
questions.  I have added ulimit -n 16000 to /etc/init.d/ldap, and have
rebuilt the indexes after stopping the system late at night.

However, I still have 99.9% CPU usage occasionally, and often students
cannot log in and get their home directories.  We are using LDAP as a
NIS replacement.  I am desperate, and have expended much energy on
this.  With linux-2.4.10-ac12 the slapd process no longer dies, but the
system is not available to our students at the start of each laboratory
session.

Hardware: Acer Altos 2 x PIII 800MHz, with two 72GB, one 32GB scsi hard
disks on aic7xxx
Software: RH 7.1 with all updates, kernel 2.4.10-ac12, recent mount from
rawhide, openldap-2.0.15-2 from rawhide (all rawhide software rebuilt on
this machine), LVM, ext3, replacing reiserfs to get nfs
(apparently) working properly (at last!)

**************************************************
Relevant Question:
**************************************************
We are using student numbers as user IDs, but I read chkname.c in
shadow-utils, which says that user names must begin with a letter.
Could this be having any impact on the server?  What are the
implications of using all-digit login user IDs?
**************************************************

I don't know what log data to send to the list.  I set the loglevel to
33, which produced nearly two gigabytes of data in a couple of days.
This also caused syslog CPU usage to go above 70%.  What part of this
2GB should I send?  What other info should I send?  I can tell more, but
what do you need to know?
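
One way to shrink the log into something small enough to post might be
to count how often each kind of search filter appears.  The following is
only a minimal sketch in Python, assuming the SRCH lines look like the
ones in the sample further down; filter_shape and count_shapes are
hypothetical helper names, not part of OpenLDAP or of my existing
scripts.

```python
import re
from collections import Counter

# Matches the filter="..." part of a slapd SRCH log line (format assumed
# from the sample log lines in this message).
FILTER_RE = re.compile(r'SRCH .*filter="(.*)"')
# Matches one simple assertion such as (uid=ict01).
ASSERTION_RE = re.compile(r"\((\w+)=([^()]*)\)")


def filter_shape(flt):
    """Replace assertion values with '?', keeping objectClass values."""
    def blank(match):
        attr = match.group(1)
        if attr.lower() == "objectclass":
            return match.group(0)
        return f"({attr}=?)"
    return ASSERTION_RE.sub(blank, flt)


def count_shapes(lines):
    """Count how many searches were logged for each filter shape."""
    counts = Counter()
    for line in lines:
        match = FILTER_RE.search(line)
        if match:
            counts[filter_shape(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python shapes.py < ldap.log
    for shape, n in count_shapes(sys.stdin).most_common(20):
        print(f"{n:8d}  {shape}")
```

The resulting table of the twenty most common filter shapes would be a
few lines long rather than two gigabytes.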

Here are the indexes from slapd.conf:
# Indices to maintain
index   objectClass,uid,uidNumber,gidNumber,memberUid   eq
index   cn,mail,surname,givenname                       eq,subinitial

Here is a relevant part of the output of the perl program I wrote that
parses the output of the top -b -n1 cron job log:
Sat Oct 20 12:25:00 HKT 2001: slapd has 47.4% CPU usage
Sat Oct 20 12:47:22 HKT 2001: slapd has 99.9% CPU usage
Sat Oct 20 12:50:01 HKT 2001: slapd has 99.9% CPU usage
Sat Oct 20 12:51:00 HKT 2001: slapd has 99.7% CPU usage
Sat Oct 20 12:52:00 HKT 2001: slapd has 99.9% CPU usage
Sat Oct 20 12:52:59 HKT 2001: slapd has 99.3% CPU usage

Here is a sample from our log after commenting out the loglevel in our
slapd.conf, at the start of a period of 99.9% CPU usage:

> Oct 20 12:46:21 ictlab slapd[10515]: daemon: conn=7804 fd=124 connection from IP=172.19.125.134:1037 (IP=0.0.0.0:34049) accepted.
> Oct 20 12:46:21 ictlab slapd[10515]: conn=7804 op=0 BIND dn="" method=128
> Oct 20 12:46:21 ictlab slapd[10515]: conn=7804 op=0 RESULT tag=97 err=0 text=
> Oct 20 12:46:21 ictlab slapd[10515]: conn=7804 op=1 SRCH base="dc=tyict,dc=vtc,dc=edu,dc=hk" scope=2 filter="(objectClass=posixAccount)"
> Oct 20 12:47:22 ictlab slapd[10515]: send_ldap_response: ber write failed
> Oct 20 12:47:22 ictlab slapd[10515]: conn=-1 fd=124 closed
> Oct 20 12:47:22 ictlab slapd[10515]: daemon: conn=7805 fd=124 connection from IP=172.19.126.132:1039 (IP=0.0.0.0:34049) accepted.
> Oct 20 12:47:22 ictlab slapd[10515]: conn=-1 fd=21 closed
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7659 op=14 SRCH base="dc=tyict,dc=vtc,dc=edu,dc=hk" scope=2 filter="(&(objectClass=posixAccount)(uidNumber=13412))"
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7659 op=14 SEARCH RESULT tag=101 err=0 text=
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7784 op=7 SRCH base="dc=tyict,dc=vtc,dc=edu,dc=hk" scope=2 filter="(&(objectClass=posixAccount)(uid=ict01))"
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7784 op=7 SEARCH RESULT tag=101 err=0 text=
> Oct 20 12:47:22 ictlab slapd[10515]: conn=-1 fd=120 closed
> Oct 20 12:47:22 ictlab slapd[10515]: conn=-1 fd=122 closed
> Oct 20 12:47:22 ictlab slapd[10515]: daemon: conn=7806 fd=21 connection from IP=172.19.125.134:1038 (IP=0.0.0.0:34049) accepted.
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7805 op=0 BIND dn="OU=AUTO.PRACTICAL,DC=TYICT,DC=VTC,DC=EDU,DC=HK" method=128
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7805 op=0 RESULT tag=97 err=0 text=
> Oct 20 12:47:22 ictlab slapd[10515]: conn=-1 fd=124 closed
> Oct 20 12:47:22 ictlab slapd[10515]: daemon: conn=7807 fd=120 connection from IP=172.19.125.213:1095 (IP=0.0.0.0:34049) accepted.
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7806 op=0 BIND dn="" method=128
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7806 op=0 RESULT tag=97 err=0 text=
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7806 op=2 UNBIND
> Oct 20 12:47:22 ictlab slapd[10515]: conn=-1 fd=21 closed
> Oct 20 12:47:22 ictlab slapd[10515]: daemon: conn=7808 fd=21 connection from IP=172.19.125.134:1039 (IP=0.0.0.0:34049) accepted.
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7807 op=0 BIND dn="" method=128
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7807 op=0 RESULT tag=97 err=0 text=
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7807 op=2 UNBIND
> Oct 20 12:47:22 ictlab slapd[10515]: conn=-1 fd=120 closed
> Oct 20 12:47:22 ictlab slapd[10515]: daemon: conn=7809 fd=120 connection from IP=172.19.64.52:49323 (IP=0.0.0.0:34049) accepted.
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7808 op=0 BIND dn="" method=128
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7808 op=0 RESULT tag=97 err=0 text=
> Oct 20 12:47:22 ictlab slapd[10515]: daemon: conn=7810 fd=122 connection from IP=172.19.125.134:1040 (IP=0.0.0.0:34049) accepted.
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7809 op=0 BIND dn="" method=128
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7809 op=0 RESULT tag=97 err=0 text=
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7809 op=1 SRCH base="dc=tyict,dc=vtc,dc=edu,dc=hk" scope=2 filter="(uid=root)"
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7809 op=1 SEARCH RESULT tag=101 err=0 text=
> Oct 20 12:47:22 ictlab slapd[10515]: daemon: conn=7811 fd=124 connection from IP=172.19.125.213:1096 (IP=0.0.0.0:34049) accepted.
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7808 op=1 SRCH base="dc=tyict,dc=vtc,dc=edu,dc=hk" scope=2 filter="(&(objectClass=posixGroup)(memberUid=ict01))"
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7808 op=1 SEARCH RESULT tag=101 err=0 text=
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7659 op=15 SRCH base="dc=tyict,dc=vtc,dc=edu,dc=hk" scope=2 filter="(&(objectClass=posixAccount)(uidNumber=13412))"
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7659 op=15 SEARCH RESULT tag=101 err=0 text=
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7810 op=0 BIND dn="" method=128
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7810 op=0 RESULT tag=97 err=0 text=
> Oct 20 12:47:22 ictlab slapd[10515]: daemon: conn=7812 fd=125 connection from IP=172.19.125.142:1024 (IP=0.0.0.0:34049) accepted.
> Oct 20 12:47:22 ictlab slapd[10515]: conn=7809 op=2 SRCH base="dc=tyict,dc=vtc,dc=edu,dc=hk" scope=2 filter="(&(objectClass=posixGroup)(memberUid=root))"
> [...]

I also have a cron job running top -b -n1 every minute to a log file.  I
have written a simple perl script to determine when the usage went above
a given threshold in the top -b -n1 log file; part of the output is
above.  I would be very grateful for any suggestions.  My colleagues are
looking for solutions other than OpenLDAP.  I would really like our use
of OpenLDAP to be a success.
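
For anyone who wants to reproduce the threshold check, here is a minimal
sketch of the same idea in Python (this is not the perl script itself).
It assumes that the cron job writes a line of `date` output before each
top -b -n1 snapshot, and that top prints a header line containing %CPU
and COMMAND; busy_samples is a hypothetical name.

```python
import re

# A line as printed by date(1), e.g. "Sat Oct 20 12:47:22 HKT 2001".
# Assumption: the cron job writes one such line before each top snapshot.
DATE_RE = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d \S+ \d{4}$"
)


def busy_samples(lines, command="slapd", threshold=90.0):
    """Yield (timestamp, cpu) for snapshots where command is at or above threshold."""
    stamp = None
    cpu_col = None
    for line in lines:
        line = line.rstrip("\n")
        if DATE_RE.match(line):
            # Start of a new snapshot: forget the previous header position.
            stamp, cpu_col = line, None
            continue
        fields = line.split()
        if "%CPU" in fields and "COMMAND" in fields:
            # Locate the %CPU column from the header rather than hard-coding it.
            cpu_col = fields.index("%CPU")
            continue
        if cpu_col is None or len(fields) <= cpu_col:
            continue
        if fields[-1] != command:
            continue
        try:
            cpu = float(fields[cpu_col])
        except ValueError:
            continue
        if cpu >= threshold:
            yield stamp, cpu


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python busy.py < top.log
    for stamp, cpu in busy_samples(sys.stdin):
        print(f"{stamp}: slapd has {cpu}% CPU usage")
```

Finding the %CPU column from the header only works if every process
line has the same number of whitespace-separated fields as the header,
which may not hold for every version of top.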

I am showing the technicians how to turn on nscd on the client machines,
and I will also ask them to change the host name on our hundreds of
clients to one I can round-robin in DNS in combination with replicas.
These will help, but I am sure that OpenLDAP is capable of much more
than this; something is configured wrongly, but I need pointers to help
identify the wrong configuration.  Should I send my slapd.conf file?
Should I search for specific strings in my gigabytes of log files?
Note: our clients are RH 7.1 with updates up to about 3 weeks ago.
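
On the question of which strings to search for, one possibility is to
pair each SRCH line with its SEARCH RESULT line by conn and op, and to
report searches that took a long time or never finished.  This is only
a minimal sketch in Python under assumptions: the line format is taken
from the sample above, the year is supplied by hand because syslog does
not record it, and slow_searches is a hypothetical name.

```python
import re
from datetime import datetime

# Syslog line layout assumed from the sample in this message.
LINE_RE = re.compile(
    r"(?P<stamp>[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d) \S+ slapd\[\d+\]: "
    r"conn=(?P<conn>\d+) op=(?P<op>\d+) (?P<rest>.*)"
)
FILTER_RE = re.compile(r'filter="(?P<filter>.*)"')


def slow_searches(lines, min_seconds=5, year=2001):
    """Return (finished, unfinished) searches.

    finished:   [((conn, op), filter, seconds)] with seconds >= min_seconds
    unfinished: [((conn, op), filter)] with no SEARCH RESULT line seen
    """
    started = {}
    finished = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        key = (m["conn"], m["op"])
        rest = m["rest"]
        if rest.startswith("SRCH "):
            f = FILTER_RE.search(rest)
            started[key] = (when, f["filter"] if f else "?")
        elif rest.startswith("SEARCH RESULT") and key in started:
            begun, flt = started.pop(key)
            elapsed = (when - begun).total_seconds()
            if elapsed >= min_seconds:
                finished.append((key, flt, elapsed))
    unfinished = [(key, flt) for key, (_, flt) in started.items()]
    return finished, unfinished


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python slow.py < ldap.log
    done, pending = slow_searches(sys.stdin)
    for key, flt, secs in done:
        print(f"conn={key[0]} op={key[1]} took {secs:.0f}s: {flt}")
    for key, flt in pending:
        print(f"conn={key[0]} op={key[1]} never finished: {flt}")
```

Run over the excerpt above, this would list conn=7804 op=1, the
(objectClass=posixAccount) search, as never finished, since the excerpt
contains no SEARCH RESULT line for it.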

--
Nick Urbanik   RHCE                                  nicku@vtc.edu.hk
Dept. of Information & Communications Technology
Hong Kong Institute of Vocational Education (Tsing Yi)
Tel:   (852) 2436 8576, (852) 2436 8579          Fax: (852) 2436 8526
PGP: 53 B6 6D 73 52 EE 1F EE EC F8 21 98 45 1C 23 7B     ID: 7529555D