Re: (ITS#6322) slapd suddenly stops working, and starts using 100% CPU

I have now installed openldap 2.4.19 from source (default configuration except for --enable-crypt=yes).

# slapd -V
@(#) $OpenLDAP: slapd 2.4.19 (Dec  9 2009 22:46:15) $

But unfortunately, the bug is still there. Nothing has changed at all.
It has crashed 10 times since I did the migration 10 hours ago. It usually crashes 1-2 times at night, and 10-20 times during work hours (when the servers have more load).

When I migrated, I recreated the database by typing:
# oldbin/slapcat > db.ldif
# newbin/slapadd < db.ldif # (under the new openldap/slapd user)
So I can't imagine the database is corrupt in any way, which is what I initially suspected when I first sent this bug report.

It still dies on the same queries as before, in the middle of iterating through ou=users,.. or ou=groups,.., which both have ~1.8k entries. These queries come from proftpd, which does a "getent passwd" and "getent group" every time a customer logs in.
I tried to reproduce this manually again by launching 8 processes that constantly query ou=users and ou=groups, 2 using a unix socket locally and 6 using ldaps://hostname/ remotely, but it still won't break down when I do that. This causes *a lot* more load than proftpd does, but the crashes seem to only happen "randomly".
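
For anyone who wants to try a similar load test, here is a minimal sketch of that kind of run. It is not the exact script used. The base DN (dc=example,dc=com), the ldapi:/// socket URI, the search filter, and the ROUNDS, RUN and LDAPSEARCH variables are all assumptions made for illustration; only the 2 local + 6 remote split and the ldaps://hostname/ URI come from the description above.

```shell
#!/bin/sh
# Sketch of a load test: 8 workers repeatedly searching ou=users and ou=groups.
# Assumed values (not from the report): BASE, LOCAL_URI, the filter, ROUNDS.
BASE="${BASE:-dc=example,dc=com}"             # hypothetical suffix
LOCAL_URI="${LOCAL_URI:-ldapi:///}"           # default local unix socket
REMOTE_URI="${REMOTE_URI:-ldaps://hostname/}"
LDAPSEARCH="${LDAPSEARCH:-ldapsearch}"
ROUNDS="${ROUNDS:-1000}"                      # searches per OU per worker

# One worker: search both subtrees ROUNDS times and discard the results.
worker() {
    uri=$1
    i=0
    while [ "$i" -lt "$ROUNDS" ]; do
        for ou in users groups; do
            "$LDAPSEARCH" -x -LLL -H "$uri" -b "ou=$ou,$BASE" \
                '(objectClass=*)' >/dev/null 2>&1
        done
        i=$((i + 1))
    done
}

# 2 workers over the local socket, 6 over ldaps, all in parallel.
run_load() {
    for n in 1 2; do worker "$LOCAL_URI" & done
    for n in 1 2 3 4 5 6; do worker "$REMOTE_URI" & done
    wait
}

# Only start hammering the server when explicitly asked: RUN=1 sh loadtest.sh
if [ "${RUN:-0}" = 1 ]; then
    run_load
fi
```

Note that this uses anonymous simple binds (-x) and plain subtree searches, which may not match what proftpd's "getent" lookups actually send, so it may well miss the trigger in the same way the manual test did.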

Any suggestions on what I can do to figure out this problem? The LDAP server is live and in use on two servers with 2-3k users, so I can't mess with it *too* much.
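
For reference, the gdb thread trace suggested in the earlier reply quoted below could probably be captured along these lines the next time slapd is spinning at 100% CPU. This is only a sketch: the output file name and the GDB/OUT variables are assumptions, and a useful trace most likely needs a slapd built with debugging symbols.

```shell
#!/bin/sh
# Sketch: dump backtraces of all slapd threads while the process is spinning.
# GDB and OUT are illustrative overrides; the file name is an assumption.
GDB="${GDB:-gdb}"
OUT="${OUT:-slapd-threads.txt}"

dump_threads() {
    pid=$1
    # -batch makes gdb exit after running the -ex command; on exit it
    # detaches from the attached process, so slapd keeps running.
    "$GDB" -p "$pid" -batch -ex "thread apply all bt" > "$OUT" 2>&1
}

# Example: dump_threads "$(pidof slapd)"
```

slapd is paused while gdb is attached, but in batch mode that should only last as long as it takes to print the backtraces.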

On Fri, Oct 02, 2009 at 09:28:25PM +0000, quanah@zimbra.com wrote:
>--On Friday, October 02, 2009 8:54 PM +0000 hyc@symas.com wrote:
>> strace is useless. Use gdb and get a trace of all running threads when
>> this  occurs.
>> http://www.openldap.org/faq/data/cache/59.html
>You also need to test against a current release of OpenLDAP, like OpenLDAP
