Re: (ITS#7841) high disk utilization

To: openldap-its@OpenLDAP.org
Subject: Re: (ITS#7841) high disk utilization
From: leo@yuriev.ru
Date: Fri, 03 Oct 2014 00:55:25 +0000
Auto-submitted: auto-generated (OpenLDAP-ITS)
2014-10-03 3:13 GMT+04:00 Howard Chu <hyc@symas.com>:
>> commit 841059330fd44769e93eb4b937c3ce42654fad6f
>> Author: Leo Yuriev <leo@yuriev.ru>
>> Date:   2014-09-20 07:16:15 +0400
>>
>>       BUGFIX - lmdb: lock meta-pages in writemap-mode to avoid unexpected
>> write,
>>                 before the data pages would be synchronized.
>>
>>       Without locking the meta-pages may be written by OS before other
>> data,
>>       in this case database would be inconsistent.
>
>
> Seems unnecessary. Won't happen by default; could happen with MDB_NOSYNC but
> that risk is already documented.

We are using the combination:
  envflags writemap nosync lifo
  checkpoint 0 1

With the checkpoint interval set in seconds, this assures us of a
consistent database state on disk.
However, without this patch the kernel can write the meta-pages
before the data pages.

In fact, for a full guarantee in case the slapd process dies,
the meta-page should be written explicitly.
But that requires a lot of changes, so I have not done it.
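
To illustrate what "written explicitly" could mean, here is a minimal sketch, not code from the patch or from LMDB. It assumes a hypothetical layout in which page 0 of the file holds a small meta record, the data pages follow it, and nothing stores to page 0 through the map. The names sketch_meta and sketch_commit_durable are invented for this example; only msync(), pwrite() and fsync() are real POSIX calls.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical meta record stored at the start of page 0. */
typedef struct sketch_meta {
    uint64_t txnid;      /* id of the last durable transaction */
    uint64_t last_pgno;  /* highest data page in use */
} sketch_meta;

/*
 * Flush the data pages of a shared file mapping first, and only then
 * publish the new meta record through the file descriptor.
 * page_size should be the system page size so that the msync() start
 * address is page-aligned.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 */
static int sketch_commit_durable(int fd, void *map, size_t map_len,
                                 size_t page_size, const sketch_meta *meta)
{
    assert(page_size > 0 && map_len > page_size);
    assert(sizeof *meta <= page_size);

    /* Step 1: everything after page 0 reaches the disk. */
    char *data = (char *)map + page_size;
    if (msync(data, map_len - page_size, MS_SYNC) != 0)
        return -1;

    /* Step 2: the meta record is written with pwrite(), so page 0 only
     * becomes dirty after the data pages are already on disk. */
    if (pwrite(fd, meta, sizeof *meta, 0) != (ssize_t)sizeof *meta)
        return -1;

    /* Step 3: make the meta record itself durable. */
    return fsync(fd);
}
```

A caller would pass sysconf(_SC_PAGESIZE) as page_size. The point of the ordering is that the meta record never describes data that has not been flushed yet.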

>> commit 0c168d0e63ed78d13df3fc8a42f3667335678639
>> Author: Leo Yuriev <leo@yuriev.ru>
>> Date:   2014-09-20 10:13:28 +0400
>>
>>       FEATURE - lmdb: MDB_LIFORECLAIM & MDB_COALESCE modes.
>>
>>       Reclaim FreeDB in LIFO order - this is a main feature.
>>       Also aim to coalesce small FreeDB records.
>
> Will spend more time looking at this closer.

I would suggest, but do not insist, that you review this patch on GitHub.

>> commit 8ddd63161aeb2689822d1a8d27385d62e4e341ae
>> Author: Leo Yuriev <leo@yuriev.ru>
>> Date:   2014-09-19 22:47:19 +0400
>>
>>       BUGFIX - lmdb: properly sync meta-pages in mdb_sync_env().
>>
>>       Meta-pages may be updated during data-syncing in mdb_sync_env(),
>>       in this case database would be inconsistent.
>>
>>       Check-and-retry if lead txn-id changed during flushing data in
>> mdb_sync_env().
>
> Probably could simplify this, just obtain the write mutex unconditionally,
> then there's no need to loop or retry. But also, this depends on MDB_NOLOCK
> - if that's set, then do no locking at all.

I did it this way for performance reasons and to shorten the time the lock is held.

Retries happen only when there is an intensive flow of changes.
In that case there are many updated pages, and writing them out
takes some time.

However, on subsequent iterations (if transactions were committed
while the write was in progress),
there are far fewer modified pages, and the sync is quick.

Thus (and this was seen in tests), even with a substantial number of
transactions,
the loop usually takes only two iterations,
with no locking, and the flow of changes is not suspended.
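
The check-and-retry idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code of the patch: sketch_env, sketch_flush_data and sketch_sync_retry are hypothetical names, the flush function is a stand-in that only simulates a writer committing while data is being written, and the sketch ignores the window between the final check and the meta update.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical environment holding only what this sketch needs. */
typedef struct sketch_env {
    _Atomic uint64_t last_txnid;  /* bumped by every committing writer */
    uint64_t synced_txnid;        /* txn id recorded by the last full sync */
    int commits_during_flush;     /* simulation knob: flushes that get
                                     interrupted by a concurrent commit */
} sketch_env;

/* Stand-in for flushing dirty data pages. While it "runs", a simulated
 * writer may commit one more transaction. */
static int sketch_flush_data(sketch_env *env)
{
    if (env->commits_during_flush > 0) {
        env->commits_during_flush--;
        atomic_fetch_add(&env->last_txnid, 1);
    }
    return 0;
}

/*
 * Flush data without taking the writer lock, and retry if a commit
 * landed while the flush was running.
 * Returns the number of flush passes used, 0 if max_passes was exhausted
 * (a caller could then fall back to taking the write lock), -1 on error.
 */
static int sketch_sync_retry(sketch_env *env, int max_passes)
{
    for (int pass = 1; pass <= max_passes; pass++) {
        uint64_t before = atomic_load(&env->last_txnid);

        if (sketch_flush_data(env) != 0)
            return -1;

        /* No commit during the flush: the data on disk matches 'before'. */
        if (atomic_load(&env->last_txnid) == before) {
            env->synced_txnid = before;
            return pass;
        }
        /* Otherwise loop: the next pass has fewer dirty pages to write. */
    }
    return 0;
}
```

With one commit arriving during the first flush, the function returns after the second pass, which matches the "usually only two iterations" observation above.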

>> commit 147f41a8110f28456bc32123bde86d47183f9c0a
>> Author: Leo Yuriev <leo@yuriev.ru>
>> Date:   2014-09-04 16:01:15 +0400
>>
>>       FEATURE - lmdb: implementation of "checkpoint kbytes".
>>
>>       Force flush when volume of the changes reached a configurable
>> threshold.
>
>
> Probably OK. Needs some typographical cleanup. Not sure "syncbytes" is a
> good name.

Agreed.
I just took the first option that came to mind and tried to retain the existing style.
Ideas?

>> commit fb82a0b688f4c31313d0790415feda8aaa18651c
>> Author: Leo Yuriev <leo@yuriev.ru>
>> Date:   2014-09-04 15:18:16 +0400
>>
>>       CHANGE - lmdb-backend: checkpoint-interval in seconds instead of
>> minutes.
>
>
> Gratuitous change. We used minutes since the BDB backend uses minutes, and
> the intention was to maintain parallel functionality. What's the
> justification for this change?

As I wrote above, we are using the combination:
  envflags writemap nosync lifo
  checkpoint 0 1

If the interval is specified in minutes, it cannot be set to less
than one minute.
But that is too long a window in which updates can be lost.

By setting the synchronization interval to one second instead,
we reduce the amount of data lost in a crash to an
acceptable level,
while the load on the storage system remains acceptable even under a heavy
flow of updates.

As a result, I have not found a better solution than simply replacing
minutes with seconds.
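
As a rough illustration of how the two checkpoint thresholds interact, here is a small sketch. It is not taken from back-mdb: sketch_checkpoint and sketch_checkpoint_due are hypothetical names, and the sketch assumes that a value of 0 disables the corresponding threshold, so that "checkpoint 0 1" means "no size trigger, sync every second".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical checkpoint settings, mirroring "checkpoint <kbytes> <seconds>". */
typedef struct sketch_checkpoint {
    uint64_t kbytes;   /* flush after this many KiB of changes; 0 disables */
    uint64_t seconds;  /* flush after this many seconds; 0 disables */
} sketch_checkpoint;

/*
 * Decide whether a flush is due. Either threshold alone is enough.
 * Assumes now_sec >= last_sync_sec.
 */
static bool sketch_checkpoint_due(const sketch_checkpoint *cp,
                                  uint64_t dirty_bytes,
                                  uint64_t now_sec, uint64_t last_sync_sec)
{
    /* Size trigger: volume of changes since the last sync. */
    if (cp->kbytes != 0 && dirty_bytes / 1024 >= cp->kbytes)
        return true;

    /* Time trigger: seconds elapsed since the last sync. */
    if (cp->seconds != 0 && now_sec - last_sync_sec >= cp->seconds)
        return true;

    return false;
}
```

With settings {0, 1}, the size of the pending changes never triggers a flush, but one is due as soon as a full second has passed, which bounds the loss window to about one second.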

>> commit fc409d89e0d9dde20f612e34c2a463c8a81ea000
>> Author: Leo Yuriev <leo@yuriev.ru>
>> Date:   2014-09-20 06:51:04 +0400
>>
>>       EXTENSION - lmdb: more useful info from mdb_stat tool.
>
>
> A bit ambiguous. me_tail_txnid is actually the ID of the oldest reader, not
> the "last" reader. I'm not convinced of the value of this patch, since you
> can already view the readers list.

I agree that "tail" is the better choice.
But the main value of this patch is not showing the txn of the oldest
reader, but showing information about page usage,
especially the number of pages that are "blocked" by the oldest (laggard)
reader and how many pages are actually available.

> --
>   -- Howard Chu
>   CTO, Symas Corp.           http://www.symas.com
>   Director, Highland Sun     http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Thank you in advance.
BR.
Leonid Yuriev.