[Date Prev][Date Next] [Chronological] [Thread] [Top]

2.3.14 problem?



Has anyone seen a looping condition on 2.3.14 (and maybe lesser with 2.3.13) ?

I don't know how to form this problem into something that can be debugged.

Anyway, I had trouble with 2.3.14 on RHEL4, BerkelyDB.4.2 (patched)

Going back to a RedHat Linux 7 system, again with 4.2.52, which I typically use
to check out new releases, I ran some tests:

On 2.3.13, I ran this update of a prototype database:

# ./daily_load -u
5393 modified; 30 added; 36382 scanned; 59 deceased; 3:58 elapsed

Essentially, it reads the whole database, but only updates changed entries.

It used some small CPU time across the slapd processes:

# ps auxww|grep ldap
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root     25639  0.0  9.9 431208 206056 ?     S    12:38   0:00 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     25640  0.0  9.9 431208 206056 ?     S    12:38   0:00 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     25641  0.4  9.9 431208 206056 ?     S    12:38   0:06 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     25642  2.7  9.9 431208 206056 ?     S    12:38   0:39 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     25644  1.8  9.9 431208 206056 ?     S    12:38   0:26 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     25645  1.9  9.9 431208 206056 ?     S    12:38   0:28 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     25646  2.0  9.9 431208 206056 ?     S    12:38   0:29 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     25647  1.9  9.9 431208 206056 ?     S    12:38   0:28 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     25648  1.7  9.9 431208 206056 ?     S    12:38   0:24 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     25649  1.9  9.9 431208 206056 ?     S    12:38   0:28 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf

Nothing else was hitting the server in this test environment.

On 2.3.14, the update from the same starting base seemed to have stalled, so I
control-C'd it:

# ./daily_load -u
SIGINT
218 modified; 0 added; 1476 scanned; 3 deceased; 3:32 elapsed

One of the server processes had run up lots of CPU time:

# ps auxww|grep ldap
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root     11029  0.0  6.8 408704 141300 ?     S    12:19   0:00 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     11030  0.0  6.8 408704 141300 ?     S    12:19   0:00 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     11031 87.3  6.8 408704 141300 ?     S    12:19   3:19 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     11032  0.5  6.8 408704 141300 ?     S    12:19   0:01 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     11116  0.4  6.8 408704 141300 ?     S    12:20   0:00 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     11117  0.5  6.8 408704 141300 ?     S    12:20   0:01 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     11118  0.9  6.8 408704 141300 ?     S    12:20   0:01 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     11119  0.3  6.8 408704 141300 ?     S    12:20   0:00 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     11120  0.7  6.8 408704 141300 ?     S    12:20   0:01 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     11121  0.3  6.8 408704 141300 ?     S    12:20   0:00 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     11122  0.9  6.8 408704 141300 ?     S    12:20   0:02 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     11123  0.7  6.8 408704 141300 ?     S    12:20   0:01 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf

Running it again from the same initial database (slapad'd with the
2.3.14 routine) with the same changes:

# ./daily_load -u
SIGINT
245 modified; 0 added; 1674 scanned; 3 deceased; 4:03 elapsed

I thought that it stalled at a different point, but now I realize that it's
ticking along, but at a much slower rate than a healthy 2.3.13 had...

The CPU of that process does not continue to run up time after the client
termination.  Starting the update client again does resume the compute behavior
of that single process.  The other processes appear to gain only small amounts
of time.  Maybe they're on a path to run up as much time as the 2.3.13 setup,
but it's going to take a long time for them to get there.

The 2.3.13 update scanned 36,382 entries in 3:58 elapsed, or 152 per
second, let alone that it did 5,393 updates, which should be the more
time-consuming activity.  2.3.14 runs at 1 or 2 per second after it bogs down.

Running the update again, and letting it continue with some debugging turned on
in the client, I see that updates are occuring, and they're not returning
errors, it's just running slowly and that single server process is really
running up compute time.

Again, there are no other clients accessing the data, thus no contention.

I had seen some looping incidents on the 2.3.13 version, but only on the slave,
and had hoped for a fix in 2.3.14, but 2.3.14 has this problem worse than .13
had.  The situation was rare in .13, but it did happen.

Here's the condition much farther on.  The other server processes had gained a
little time, but it's really that single one that crunches time:

# ps auxww|grep ldap
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root     10122  0.0  6.9 408228 142788 ?     S    14:21   0:00 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     10123  0.0  6.9 408228 142788 ?     S    14:21   0:00 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     10124 96.5  6.9 408228 142788 ?     R    14:21  11:11 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     10125  0.2  6.9 408228 142788 ?     S    14:21   0:01 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     10131  0.2  6.9 408228 142788 ?     S    14:21   0:01 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     10132  0.3  6.9 408228 142788 ?     S    14:21   0:02 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     10134  0.1  6.9 408228 142788 ?     S    14:21   0:01 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     10135  0.1  6.9 408228 142788 ?     S    14:21   0:01 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     10136  0.1  6.9 408228 142788 ?     S    14:21   0:01 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     10137  0.1  6.9 408228 142788 ?     S    14:21   0:01 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     10138  0.2  6.9 408228 142788 ?     S    14:21   0:01 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf
root     10139  0.1  6.9 408228 142788 ?     S    14:21   0:01 servers/slapd/slapd -f /usr/local/etc/openldap/slapd.utility5.alum.conf

I tried using -d-1 after restarting the server after the "slowness" problem had
started to be observed, but nothing unusual appeared in the debugging (to my
untrained eye, anyway).

Does anyone have a clue about how to set up a debugging situation?  Has anyone
else seen this?