
Re: both 8 hour 200 vu tests end in a server abort



Hi!

Maybe it would help a little if I add my configs to this discussion, since
I have recently had the same problems as Trevor. Here is an extract of the
relevant settings in the DB_CONFIG I'm experimenting with:


set_cachesize     0       524288000       0
set_lg_regionmax  1048576
set_lg_max        104857600
set_lg_bsize      2097152
set_tx_max        100

And the slapd.conf:
cachesize         10000
checkpoint        512     1
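
For anyone comparing numbers, here is a small sketch that converts the byte values above into MiB. It assumes the three set_cachesize arguments are gbytes, bytes and ncache, as in the Berkeley DB documentation. The helper name summarize_db_config is made up for this mail and is not part of OpenLDAP or BDB.

```python
# Illustrative helper only; not part of OpenLDAP or Berkeley DB.
MIB = 1024 * 1024

# The single-argument, size-valued directives from the extract above.
LOG_SIZE_DIRECTIVES = ("set_lg_regionmax", "set_lg_max", "set_lg_bsize")


def summarize_db_config(text):
    """Return {directive: size in MiB} for the size-related DB_CONFIG lines."""
    sizes = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        name, args = parts[0], parts[1:]
        if name == "set_cachesize" and len(args) == 3:
            # Assumed argument order: gbytes, bytes, ncache.
            gbytes, nbytes, _ncache = (int(a) for a in args)
            sizes[name] = (gbytes * 1024 ** 3 + nbytes) / MIB
        elif name in LOG_SIZE_DIRECTIVES and len(args) == 1:
            sizes[name] = int(args[0]) / MIB
    return sizes


DB_CONFIG = """\
set_cachesize     0       524288000       0
set_lg_regionmax  1048576
set_lg_max        104857600
set_lg_bsize      2097152
set_tx_max        100
"""

if __name__ == "__main__":
    for name, mib in summarize_db_config(DB_CONFIG).items():
        print(f"{name:18s} {mib:8.1f} MiB")
```

With the values above this works out to a 500 MiB cache, 100 MiB log files, a 2 MiB log buffer and a 1 MiB log region.
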

Let me note that I have already tried different settings. I chose such a big
lg_max because my tests do a lot of updates and modifies, which makes the logs
grow quickly. Sometimes during my tests, right after a log switch, the new log
file was suddenly owned by user root and OpenLDAP (running as openldap) quit.
After increasing the log size this problem disappeared. Oracle advises that a log
switch should occur only about once every 30 minutes, and I have applied this
advice to OpenLDAP. Maybe lg_regionmax is now too small. I will try to increase
this next.
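
To catch the ownership problem early, something like the following could be run from cron. This is only a sketch: it assumes the BDB transaction logs follow the usual log.NNNNNNNNNN naming, and the directory and user name are supplied by whoever runs it. The function name wrongly_owned_logs is made up for illustration.

```python
import glob
import os


def wrongly_owned_logs(db_dir, expected_uid):
    """Return the BDB log files in db_dir that are not owned by expected_uid."""
    bad = []
    # Assumption: transaction logs are named log.NNNNNNNNNN inside db_dir.
    for path in sorted(glob.glob(os.path.join(db_dir, "log.*"))):
        if os.stat(path).st_uid != expected_uid:
            bad.append(path)
    return bad


if __name__ == "__main__":
    import pwd
    import sys

    # Both values depend on the local setup; "openldap" is only a default guess.
    db_dir = sys.argv[1]
    user = sys.argv[2] if len(sys.argv) > 2 else "openldap"
    uid = pwd.getpwnam(user).pw_uid
    for path in wrongly_owned_logs(db_dir, uid):
        print("not owned by " + user + ": " + path)
```

An empty result means every log file in the directory belongs to the expected user.
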
But from everything I have seen so far, db_stat shows that these parameters are
ideal for my environment. The buffer hit ratio for every BDB is nearly 100%, there
are no problems with too few lockers, the transaction count is fine, ... So I'm
pretty sure that these database corruptions do not come from too few resources.
As I wrote in an email earlier today, my assumption at the moment is that
it may be a problem with threads. Maybe an environment setting like
LD_ASSUME_KERNEL=2.4.19
would help. But that's just an assumption. I really don't know how to debug these
problems, since they happen days or weeks after starting OpenLDAP.
Could it be that we are using I/O subsystems which are simply "too fast"? I know
from our appserver that most problems only occur under heavy load, and we have often
had problems with threads which caused memory leaks or crashes. Our developers
had a lot of work debugging these bugs.
Well... I'll keep testing different settings. But if somebody could give me some
help on how to debug problems which occur days or weeks after starting the application,
maybe I could help the developers find these bugs (if these problems are really
caused by bugs...).
Ah... My ldapadd with 500,000 entries has just finished. It crashed... Here is
the output of the error log (OpenLDAP 2.2.13, BDB 4.2.52.2, Red Hat 3 ES):
....
bdb(l=root): PANIC: fatal region error detected; run recovery
conn=2 op=440516 RESULT tag=105 err=80 text=internal error
conn=2 op=440516 RESULT tag=105 err=80 text=internal error
conn=2 op=440517 ADD dn="uid=440517,ou=icpuser,l=root"
bdb(l=root): PANIC: fatal region error detected; run recovery
....
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb(l=root): PANIC: fatal region error detected; run recovery
ch_malloc of 8388608 bytes failed
slapd: ch_malloc.c:62: ch_malloc: Assertion `0' failed.
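
Since the log is mostly the same PANIC line repeated, a rough sketch for boiling such output down to the first failure may help when comparing runs. The regular expression only covers the RESULT line format shown above, and the function name summarize_errors is made up for illustration.

```python
import re

PANIC_MARKER = "PANIC: fatal region error detected"
# Matches lines such as: conn=2 op=440516 RESULT tag=105 err=80 text=internal error
RESULT_RE = re.compile(r"conn=(\d+) op=(\d+) RESULT tag=\d+ err=(\d+)")


def summarize_errors(lines):
    """Count PANIC lines and report where the first PANIC and first failed op appear."""
    panics = 0
    first_panic_line = None
    first_failed_op = None
    for lineno, line in enumerate(lines, start=1):
        if PANIC_MARKER in line:
            panics += 1
            if first_panic_line is None:
                first_panic_line = lineno
        match = RESULT_RE.search(line)
        if match and match.group(3) != "0" and first_failed_op is None:
            # Store (conn, op, err) of the first non-zero result code.
            first_failed_op = tuple(int(g) for g in match.groups())
    return {
        "panics": panics,
        "first_panic_line": first_panic_line,
        "first_failed_op": first_failed_op,
    }


SAMPLE = """\
bdb(l=root): PANIC: fatal region error detected; run recovery
conn=2 op=440516 RESULT tag=105 err=80 text=internal error
conn=2 op=440516 RESULT tag=105 err=80 text=internal error
conn=2 op=440517 ADD dn="uid=440517,ou=icpuser,l=root"
bdb(l=root): PANIC: fatal region error detected; run recovery
""".splitlines()

if __name__ == "__main__":
    print(summarize_errors(SAMPLE))
```
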


Hmmm... Any ideas?

Cheers,
Robert


Trevor Warren wrote:

--- Howard Chu <hyc@symas.com> wrote:


You still have neglected to mention what resource
limits are in effect for the actual slapd process. As such, there is no
evidence that you have set up your machine correctly. Swearing/cursing
isn't going to improve the situation.
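
A minimal sketch of how such limits could be collected, assuming Python is available on the machine. Resource limits are inherited from the parent, so this only reflects slapd's limits if it is started from the same shell or init environment that starts slapd. The selection of limits and the function names are illustrative choices.

```python
import resource

# Limits that seem most relevant to a large slapd/BDB process (an illustrative pick).
LIMITS = {
    "address space (ulimit -v)": resource.RLIMIT_AS,
    "data segment (ulimit -d)": resource.RLIMIT_DATA,
    "open files (ulimit -n)": resource.RLIMIT_NOFILE,
    "stack size (ulimit -s)": resource.RLIMIT_STACK,
    "core file size (ulimit -c)": resource.RLIMIT_CORE,
}


def fmt(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for the calling process."""
    out = {}
    for label, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        out[label] = (fmt(soft), fmt(hard))
    return out


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label:28s} soft={soft:>12s} hard={hard:>12s}")
```

Note that getrlimit reports the size limits in bytes, while the shell's ulimit typically prints them in kilobytes.
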


[snip]



Hmm... M/c setup. Logs are being written to another
machine over a mounted partition.

################################
# DB_Config is as follows:
################################

#
# Set the database in memory cache size.
#
set_cachesize 0 300428800 0
#
# Set database flags.
#
#set_flags DB_TXN_NOSYNC
#
# Set log values.
#
set_lg_regionmax 10048576
set_lg_max 10485760
set_lg_bsize 2097152
set_lg_dir /home/perf/openLDAP/install/ldap2/var/openldap-logs


#######################
# Slapd.conf
#######################


Caching about 10K records.

SCSI - RAID with RAID 5. Slapd started with logging
totally disabled. Any other information you need,
Howard? Let me know please.

Trevor



--
-- Howard Chu




=====
( >-                                           -< )
/~\    ______________________________________   /~\
|  \) /    Scaling FLOSS in the Enterprise   \ (/ |
|_|_  \        trevorwarren@yahoo.com        / _|_|
      \____________________________________/



