Re: openldap hangs on bdb / database corruption



Hi Ingo,

as you may already have seen, other people on the list are also having problems with
corrupted databases, across different versions of OpenLDAP and BDB. I'm currently loading
my directory with 500,000 entries (OpenLDAP 2.2.13, BDB 4.2.52.2 on Redhat 3.0).
Yesterday I discovered a new problem after starting OpenLDAP: an attribute defined in
a schema file suddenly could no longer be found, and this after two weeks in which
everything ran fine. This is very frustrating, since it is very hard to debug problems
that appear only after days or weeks.
Since these problems all have data corruption in common, my assumption at the moment is
that it may be a threading problem, especially with Redhat 3 and NPTL. But this is only
an assumption. So for the next test I will start OpenLDAP with the following environment
variable set:
LD_ASSUME_KERNEL=2.4.19
I think it is also possible to disable threads in OpenLDAP and use processes instead,
but I don't know whether this helps or makes sense. I'm also in the dark...
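
A minimal sketch of how that environment setting could be applied when launching the server from a wrapper script. On Red Hat systems of that era the value 2.4.19 is generally understood to make the dynamic loader pick the older LinuxThreads library instead of NPTL; treat that as background, not as something confirmed in this thread. The function names are hypothetical, and the slapd command line is left to the caller because it is site-specific.

```python
import os
import subprocess


def env_with_assumed_kernel(kernel="2.4.19", base=None):
    """Return a copy of the environment with LD_ASSUME_KERNEL set.

    `base` defaults to the current process environment; it is copied,
    so the caller's environment is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["LD_ASSUME_KERNEL"] = kernel
    return env


def start_slapd(slapd_cmd):
    """Start slapd with the modified environment.

    `slapd_cmd` is the site-specific command line as a list (an assumption:
    use whatever the local init script runs). Not called in this sketch.
    """
    return subprocess.Popen(slapd_cmd, env=env_with_assumed_kernel())
```

Only the child process sees the variable, so the rest of the system keeps using the default thread library.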


Cheers,
Robert

Ingo Steuwer wrote:

On Fri, 11.06.2004 at 12:38, Ingo Steuwer wrote:
[..]


| From time to time the database seems to hang: ldapsearch gets no answers
| and even db4.2_stat hangs at some point (it then needs kill -9). In these
| cases I need to stop slapd and run db4.2_recover.
|

Looks like you might be exceeding 2GB of transaction logs which are in use.


[..]


| relevant configuration parts:
|
| slapd.conf:
| ---------------------------------------------------
| sizelimit               unlimited
| modulepath      /usr/lib/ldap
| moduleload      back_bdb.so
|
| database        bdb
|
| cachesize       500000
| index           objectClass,uidNumber,gidNumber,memberUid,ou,uniqueMember pres,eq
| index           uid,cn,sn,givenName,mail,description,displayName pres,eq,sub
| index           sambaSID,sambaPrimaryGroupSID,sambaDomainName eq
| index           default sub
| ---------------------------------------------------
|

You don't appear to have a checkpoint setting, which would mean that all
your transaction logs are open (or something to that effect, based on
what I've seen).


That's true; I have now added "checkpoint 1024 1" to slapd.conf. It may
take some days to see whether that was the cause.
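
For reference, slapd-bdb(5) describes the two arguments of "checkpoint" as a size threshold in kilobytes and a time threshold in minutes, with a checkpoint taken when either one is reached. The following is a simplified Python model of that rule, not slapd's actual code: the function name is hypothetical, the exact boundary handling is an assumption, and the question of when slapd evaluates the rule is not modelled.

```python
def checkpoint_due(kbytes_since_last, minutes_since_last, kbyte=1024, minutes=1):
    """Simplified model of "checkpoint <kbyte> <min>".

    Either threshold alone is enough to trigger a checkpoint.
    A threshold of 0 is treated as disabled.
    """
    size_hit = kbyte > 0 and kbytes_since_last >= kbyte
    time_hit = minutes > 0 and minutes_since_last >= minutes
    return size_hit or time_hit
```

With "checkpoint 1024 1", a checkpoint would be due after roughly 1 MB of log data or one minute, whichever comes first.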

But is it true that all the log files, which hold only 10MB each, are kept
open? I thought this would be done only for the last (current) one. I
mean, for what reason should they all be open?

Well, as I have 211 log files now, together they are bigger than 2GB.
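
The arithmetic behind that statement can be checked with a short Python sketch. It assumes each log file is exactly 10 MiB (10 * 1024 * 1024 bytes) and that the limit in question is 2 GiB; both are assumptions, since the thread only says "10MB" and "2GB".

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def total_log_bytes(file_count, bytes_per_file=10 * MIB):
    """Combined size of all transaction log files still on disk."""
    return file_count * bytes_per_file


total = total_log_bytes(211)              # 211 log files, as reported above
print(total // MIB, "MiB in total")
print("exceeds 2 GiB:", total > 2 * GIB)
```

Under these assumptions the total is 2110 MiB, about 62 MiB over the 2048 MiB mark. The result is sensitive to the unit: at 10,000,000 bytes per file the total would be 2,110,000,000 bytes, just under 2 GiB (2,147,483,648 bytes).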




It made it worse. As long as I had the checkpoint option set (I also tried
"checkpoint 2048 2"), I got a CPU-eating slapd for _each_ ldapmodify. The
modifications were done, but the corresponding slapd process went mad.
strace shows me the already mentioned sched_yield() calls, but the more
mysterious thing is that such a process finishes after I run "ltrace -p"
on it.

Well, and after all that it corrupted my database again. I went back to the
defaults, and after making some changes there is one CPU-eating slapd which
does not finish under ltrace (it gives me countless
"ldap_pvt_thread_yield(0xbfffdf48, 0x40009e90, 0, 0x6044a510, 0x40117f60) = 0")
and also resists an "/etc/init.d/slapd stop", so I have to do a "kill -9".
After a new start I get "Implementation Specific Error"s on an ldapdelete.

So db_recover is needed again.

I'd be glad for any other hint or correction.

Thanks,
Ingo Steuwer