Re: both 8 hour 200 vu tests end in a server abort

Hi Buchan,

Buchan Milne wrote:

RW wrote:
| Hi!
| Maybe it would help a little bit if I include my config's in this
| discussion since
| I have had the same problems as Trevor shortly. So here an extract of the
| relevant settings in DB_CONFIG I'm experimenting with:
| set_cachesize 0 524288000 0
| set_lg_regionmax 1048576
| set_lg_max 104857600
| set_lg_bsize 2097152
| set_tx_max 100
| And the slapd.conf:
| cachesize 10000
| checkpoint 512 1

> You are checkpointing way too often. You really only want to ensure that
> database recovery (when necessary) will not take too long, and that you
> don't keep excessive archives of log files.
> - you have checkpointed at some time interval smaller than your backup
> - you have checkpointed by size at a time that allows you to run database
>   recovery in a short enough time.
> - never reach 2GB of active transaction logs

Thanks for your recommendations, but I can't see the point here. If I'm right, the
syntax for checkpoint is:
checkpoint <kbytes> <min>
So in my case I'm checkpointing every 512 KB or after 1 minute. With
Oracle we checkpoint much more often, so 1 minute for a checkpoint
isn't too low. But I've also run the same tests with the following parameter:
checkpoint 4096 5
I've also used the same parameter settings in production, but finally ended up
with a corrupted database.
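
For illustration, here is a rough Python sketch of what the two size thresholds mean relative to the set_lg_max from the DB_CONFIG above. It assumes that the <kbytes> argument counts units of 1024 bytes and it ignores the time-based trigger entirely; the function name is made up for this example and is not part of OpenLDAP or Berkeley DB.

```python
# set_lg_max from the DB_CONFIG quoted above (bytes per transaction log file)
LG_MAX_BYTES = 104857600


def checkpoints_per_log_file(checkpoint_kbytes, lg_max_bytes=LG_MAX_BYTES):
    """Size-triggered checkpoints that fit into one full log file.

    Assumption: <kbytes> in "checkpoint <kbytes> <min>" means 1024-byte units.
    """
    return lg_max_bytes // (checkpoint_kbytes * 1024)


print(checkpoints_per_log_file(512))   # checkpoint 512 1  -> 200
print(checkpoints_per_log_file(4096))  # checkpoint 4096 5 -> 25
```

So under that assumption, even the larger setting still checkpoints many times before a single 100 MB log file fills up.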
What I can't figure out is what you mean by "never reach 2GB of active
transaction logs". I'm running db_archive every night and it always keeps
only one active transaction log. The ldapadd I mentioned in my last mail, which
should add 500.000 entries, created 5 GB of transaction logs
before OpenLDAP quit. What I don't know is whether adding one entry
= 1 transaction or adding all 500.000 entries = 1 transaction. In the case
of 500.000 entries = 1 transaction it would of course create a lot of active
logs. I would guess one entry = 1 transaction, but I can't verify this at the
moment. I can't recover the BDB files (not even with db_recover -c).
db_recover runs fine, saying that recovery has completed. But when I start
OpenLDAP I get:
bdb(l=root): PANIC: fatal region error detected; run recovery
bdb_db_open: dbenv_open failed: DB_RUNRECOVERY: Fatal error, run database recovery (-30978)
backend_startup: bi_db_open(0) failed! (-30978)
bdb(l=root): txn_checkpoint interface requires an environment configured for the transaction subsystem
bdb_db_destroy: txn_checkpoint failed: Invalid argument (22)
slapd stopped.
connections_destroy: nothing to destroy.
> My reasoning (for the setup I posted before) was that I wanted at most 2
> transaction logs in use, so I ensured that checkpointing occurred at <
> set_lg_max (I went for set_lg_max/2) and I was running hot database
> backups (for the accelerated testing) every 2 minutes, so I set the time
> interval less than that (but not substantially).
> If you're checkpointing every second, you're going to be killing your IO
> subsystem, and probably increasing mutex contention. You don't need to
> over-checkpoint, since you have the transaction logs to recover from.

Of course you're right that too much checkpointing would kill I/O, but I
watched it during ldapadd and it was no problem. The average was 2500 writes/s
at 10.000 kByte/s, with a utilization of about 70% and
an average wait time of only about 3 ms. So no problem here. Nothing
I wouldn't expect during such a massive import.
But I see now that set_lg_max is definitely too high. I will decrease it.
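
The rule of thumb quoted above (checkpoint by size at about set_lg_max/2, and by time at somewhat less than the backup interval) could be sketched roughly like this in Python. The function name and the "one minute less than the backup interval" margin are assumptions made for this example only; the quoted text just says "less than that (but not substantially)".

```python
def suggest_checkpoint(lg_max_bytes, backup_interval_min):
    """Build a slapd.conf checkpoint line from the quoted rule of thumb.

    Size threshold: half of set_lg_max, converted to kbytes, so that at most
    two transaction logs should be in use between checkpoints.
    Time threshold: one minute below the backup interval, but at least one
    minute (this margin is an assumption, not taken from the thread).
    """
    kbytes = lg_max_bytes // 2 // 1024
    minutes = max(1, backup_interval_min - 1)
    return f"checkpoint {kbytes} {minutes}"


# set_lg_max from the DB_CONFIG above, hot backups every 2 minutes
print(suggest_checkpoint(104857600, 2))   # checkpoint 51200 1

# a hypothetical smaller set_lg_max of 10 MB with the same backup interval
print(suggest_checkpoint(10485760, 2))    # checkpoint 5120 1
```

The numbers only show how the size threshold scales with set_lg_max; they are not tuning advice for any particular workload.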



--
Buchan Milne                      Senior Support Technician
Obsidian Systems                  http://www.obsidian.co.za
B.Eng                                RHCE (803004789010797)