Re: slapd 2.3.19 freezes up

Krishna Sivaramapuram wrote:
It looks to me like this fix is already in 2.3.19. I do see some references to BDB 4.3 having some issues... but the details are very sketchy. I'm not sure whether people are seeing similar kinds of issues...

There are additional related fixes that are only in 2.3.20.

The other possibility I'm thinking of is a bug in CentOS itself.

If I have to move from CentOS to Solaris, I'm not sure whether the DB files are completely portable... Sleepycat does say that the DB files are all portable. If anyone has tried this kind of migration, I'd like to hear about their experience.

As usual, the advice for migrating is to use slapcat/slapadd.
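A minimal sketch of that migration path, assuming a single database and a config file at `/etc/openldap/slapd.conf` on both hosts (an assumption; adjust for your install). The helper names `migration_commands` and `run` are hypothetical. Only the `-f` (config file) and `-l` (LDIF file) options of slapcat/slapadd are used, and the default is a dry run that just prints the commands. The slapadd step should be run with slapd stopped on the target host.

```python
import shlex
import subprocess

# Assumed config path; source builds often use /usr/local/etc/openldap/slapd.conf.
DEFAULT_CONF = "/etc/openldap/slapd.conf"


def migration_commands(ldif_path, src_conf=DEFAULT_CONF, dst_conf=DEFAULT_CONF):
    """Return the (export, import) argv lists for a slapcat/slapadd migration."""
    # On the old host: dump the database to a plain-text LDIF file.
    export_cmd = ["slapcat", "-f", src_conf, "-l", ldif_path]
    # On the new host (slapd stopped): load that LDIF into a fresh database.
    import_cmd = ["slapadd", "-f", dst_conf, "-l", ldif_path]
    return export_cmd, import_cmd


def run(cmd, dry_run=True):
    """Print the command in dry-run mode; otherwise execute it and fail loudly."""
    if dry_run:
        print(shlex.join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    export_cmd, import_cmd = migration_commands("/tmp/ldap-dump.ldif")
    run(export_cmd)  # run on the CentOS host
    run(import_cmd)  # run on the Solaris host after copying the LDIF over
```

Because LDIF is plain text, this route avoids any question of on-disk format compatibility between the two machines, at the cost of a full reload of the directory.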

As of release 2.3 there are no byte-order dependencies in back-bdb/hdb's database files, and they can be moved freely among different machines, provided the word size is the same (i.e., all 32-bit or all 64-bit). There is no such portability for back-ldbm or anything in 2.2 and earlier.
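The portability rule above can be written down as a small check. This is a sketch, not an OpenLDAP tool: `can_copy_db_files` and `local_platform` are hypothetical helpers, and `struct.calcsize("P")` reports the pointer size of the Python interpreter running the check. That only matches slapd's word size if both were built the same way; a 32-bit Python on a 64-bit OS would report 32.

```python
import struct
import sys


def local_platform():
    """Word size in bits and byte order of the running Python interpreter."""
    return struct.calcsize("P") * 8, sys.byteorder


def can_copy_db_files(backend, openldap_version, src_bits, dst_bits):
    """Apply the rule from this thread to a proposed raw copy of DB files.

    backend          -- "bdb", "hdb", or "ldbm"
    openldap_version -- (major, minor) tuple, e.g. (2, 3)
    src_bits/dst_bits -- word size of the old and new machines
    """
    if backend not in ("bdb", "hdb"):
        return False  # back-ldbm files are never portable
    if openldap_version < (2, 3):
        return False  # 2.2 and earlier: no portability at all
    # Byte order no longer matters for bdb/hdb in 2.3; word size still does.
    return src_bits == dst_bits


if __name__ == "__main__":
    bits, order = local_platform()
    print(f"this host: {bits}-bit, {order}-endian")
    print(can_copy_db_files("hdb", (2, 3), 32, 64))  # False: use slapcat/slapadd
```

For example, copying hdb files from a 32-bit CentOS box to a 64-bit Solaris box fails the check, so slapcat/slapadd remains the route in that case.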

Currently I'm not sure if any of these solutions will fix the problem.

It may be worthwhile to test your deployment on Solaris. I've found SMP support on the 2.6 kernel series to be pretty unstable; I'm now running 2.6.14 with additional hand-picked patches to keep my dual-core AMD64 running. Even then, I can reliably lock it up hard with a simple combination of test clients. Watching all the traffic on the linux-kernel mailing lists has not given me any sense that stability has improved in 2.6.15 or .16.


David Hawes wrote:
On Sunday 26 February 2006 00:47, Howard Chu wrote:
Krishna Sivaramapuram wrote:
I have a pretty serious issue in my environment... I have around 6
million nodes in my LDAP tree, and this server is running on a box
with the following Linux configuration:

CentOS 4.0 (basically Red Hat Enterprise Linux 4.0)
upgraded to an SMP kernel

We are using OpenLDAP 2.3.19 with BDB 4.3.29.
Try 2.3.20. As noted many times on this list, BDB 4.3 is not recommended.

I have seen similar things in my production environment (4.2.52 + patches). The first time I noticed this all three nodes of a load balanced pool reached this state within 30 minutes after all the servers had been up for approximately 24 hours. It has happened much more sporadically since that time. The server would accept new connections, but nothing would be logged by slapd and no results would be returned.

I have been trying to collect more data on this problem to post it, but was wondering whether this is a known issue that was fixed in 2.3.20 (maybe ITS#4385?).
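One way to record the "accepts connections but returns nothing" state when it happens, as a minimal sketch: the hypothetical `probe_ldap` helper below opens a TCP connection, sends a hand-encoded anonymous LDAPv3 simple bind (BER, message ID 1), and reports `hung` if the connection is accepted but no reply arrives within the timeout. It uses only the Python standard library, not any OpenLDAP tool; port 389 and the 5-second timeout are assumptions.

```python
import socket

# Anonymous LDAPv3 simple bind request, BER-encoded:
# SEQUENCE { messageID 1, [APPLICATION 0] { version 3, name "", simple "" } }
ANON_BIND = bytes.fromhex("300c020101600702010304008000")


def probe_ldap(host, port=389, timeout=5.0):
    """Return 'down', 'hung', 'ok', or 'error' for an LDAP server.

    'hung' means the TCP connection was accepted but no reply to the
    anonymous bind arrived within `timeout` seconds.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "down"  # refused, unreachable, or connect timed out
    with sock:
        try:
            sock.sendall(ANON_BIND)
            reply = sock.recv(64)
        except socket.timeout:
            return "hung"  # connected, but slapd never answered
        except OSError:
            return "down"  # connection reset mid-exchange
    # A well-formed LDAPMessage starts with 0x30 (SEQUENCE).
    return "ok" if reply[:1] == b"\x30" else "error"


if __name__ == "__main__":
    print(probe_ldap("localhost"))
```

Run periodically (e.g. from cron) against each node of the pool, a timestamped `hung` result would show when the freeze began, which could then be lined up with slapd and system logs.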



