Full_Name: Dmitrii Fonariuk
Version: 2.4.38
OS: rhEL6.x86_64
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (91.210.4.1)

iotop reports a very high DISK WRITE value when the MDB database contains a large number of free pages (Freelist Status). We believe this is caused by the page-allocation algorithm. Blocks of pages are allocated from the free-page pool in FIFO order, so every modifying transaction touches and dirties different pages, which the system flush process then writes to disk. Would it be better to use LIFO order, so that successive transactions dirty the same pages and the load on the disk is reduced?

We use MDB with the envflags writemap and mapasync.
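The effect described in this report can be illustrated with a toy model. The sketch below is not LMDB code: the pool size, the pages-per-transaction figure and the function name are all made up, and it ignores the rule that pages still visible to readers cannot be reused. It only counts how many distinct pages a sequence of copy-on-write transactions dirties when free pages are handed out in FIFO versus LIFO order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_PAGES  100                       /* free pages at start (made-up figure) */
#define TXN_PAGES   2                         /* pages rewritten per transaction (made-up) */
#define TOTAL_PAGES (POOL_PAGES + TXN_PAGES)

/* Toy model, not LMDB: returns how many distinct pages are dirtied
 * by `txns` copy-on-write transactions under FIFO or LIFO reclaim. */
size_t distinct_dirty_pages(bool lifo, size_t txns)
{
    size_t pool[POOL_PAGES];              /* page numbers on the free list */
    size_t live[TXN_PAGES];               /* page numbers holding current data */
    bool dirty[TOTAL_PAGES] = { false };  /* pages written at least once */
    size_t head = 0;                      /* FIFO: next free-list slot to use */
    size_t count = 0;

    for (size_t i = 0; i < POOL_PAGES; i++)
        pool[i] = i;
    for (size_t j = 0; j < TXN_PAGES; j++)
        live[j] = POOL_PAGES + j;

    for (size_t t = 0; t < txns; t++) {
        for (size_t j = 0; j < TXN_PAGES; j++) {
            /* LIFO always reuses the most recently freed slots;
             * FIFO walks through the whole pool. */
            size_t slot = lifo ? POOL_PAGES - 1 - j : (head + j) % POOL_PAGES;
            size_t fresh = pool[slot];

            assert(fresh < TOTAL_PAGES);
            pool[slot] = live[j];         /* old copy goes back to the free list */
            live[j] = fresh;              /* new copy is written to the fresh page */
            dirty[fresh] = true;
        }
        head = (head + TXN_PAGES) % POOL_PAGES;
    }

    for (size_t p = 0; p < TOTAL_PAGES; p++)
        count += dirty[p] ? 1 : 0;
    return count;
}
```

With these made-up figures, 50 transactions dirty 100 distinct pages in FIFO order but only 4 in LIFO order, which is the kind of difference the report attributes to the reclaim order.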
The attached patch file is derived from OpenLDAP Software. All of the modifications to OpenLDAP Software represented in the following patch(es) were developed by Leonid Yuriev <leo@yuriev.ru>. I have not assigned rights and/or interest in this work to any party.

The attached modifications to OpenLDAP Software are subject to the following notice:

Copyright 2014 Leonid Yuriev.
Copyright 2014 Peter-Service LLC, Moscow, Russia.
Redistribution and use in source and binary forms, with or without modification, are permitted only as authorized by the OpenLDAP Public License.

https://github.com/leo-yuriev/openldap-lmdb-challenge/pull/1
or
https://github.com/leo-yuriev/openldap-lmdb-challenge/ branch master-devel

commit 841059330fd44769e93eb4b937c3ce42654fad6f
Author: Leo Yuriev <leo@yuriev.ru>
Date: 2014-09-20 07:16:15 +0400

    BUGFIX - lmdb: lock meta-pages in writemap-mode to avoid unexpected write,
    before the data pages would be synchronized.

    Without locking the meta-pages may be written by OS before other data,
    in this case database would be inconsistent.

commit 6240c3350e8bd86337c7e41722cf6a38881f15e7
Author: Leo Yuriev <leo@yuriev.ru>
Date: 2014-09-12 01:32:13 +0400

    BUGFIX - lmdb: reordering of instructions which update the txn in
    a meta-page.

    Without "volatile" or memory-barrier compiler may reorder instructions
    for update the "mm_txnid" field in meta-page in "writemap" mode.

    From the reader's point of view this cause a short
    time interval when the transaction is corrupted.

commit accef62de7fe5660f870f4c5da319a2a8098b2fb
Author: Leo Yuriev <leo@yuriev.ru>
Date: 2014-09-21 02:29:50 +0400

    BUGFIX - lmdb: 'volatile' to important fields, which
    may be updated by readers asynchronously.

    Without 'volatile' compiler may eliminate a mdb_find_oldest() calls.

commit bb83e03cf1b8bceee64550229c3becbdd5400680
Author: Leo Yuriev <leo@yuriev.ru>
Date: 2014-09-19 20:18:17 +0400

    FEATURE - lmdb-backend: support config for 'lifo' and 'coalesce' envflags.

commit 0c168d0e63ed78d13df3fc8a42f3667335678639
Author: Leo Yuriev <leo@yuriev.ru>
Date: 2014-09-20 10:13:28 +0400

    FEATURE - lmdb: MDB_LIFORECLAIM & MDB_COALESCE modes.

    Reclaim FreeDB in LIFO order - this is a main feature.
    Also aim to coalesce small FreeDB records.

commit 8ddd63161aeb2689822d1a8d27385d62e4e341ae
Author: Leo Yuriev <leo@yuriev.ru>
Date: 2014-09-19 22:47:19 +0400

    BUGFIX - lmdb: properly sync meta-pages in mdb_sync_env().

    Meta-pages may be updated during data-syncing in mdb_sync_env(),
    in this case database would be inconsistent.

    Check-and-retry if lead txn-id changed during flushing data in
    mdb_sync_env().

commit 908677f989588d06b9f00620576dea3c5c8675d7
Author: Leo Yuriev <leo@yuriev.ru>
Date: 2014-09-04 16:10:05 +0400

    FEATURE - lmdb-backend: support for "checkpoint kbytes" config-option.

commit 147f41a8110f28456bc32123bde86d47183f9c0a
Author: Leo Yuriev <leo@yuriev.ru>
Date: 2014-09-04 16:01:15 +0400

    FEATURE - lmdb: implementation of "checkpoint kbytes".

    Force flush when volume of the changes reached a configurable threshold.

commit fb82a0b688f4c31313d0790415feda8aaa18651c
Author: Leo Yuriev <leo@yuriev.ru>
Date: 2014-09-04 15:18:16 +0400

    CHANGE - lmdb-backend: checkpoint-interval in seconds instead of minutes.

commit fc409d89e0d9dde20f612e34c2a463c8a81ea000
Author: Leo Yuriev <leo@yuriev.ru>
Date: 2014-09-20 06:51:04 +0400

    EXTENSION - lmdb: more useful info from mdb_stat tool.

commit ccc7da690ffbff440643295b945fdf7886f48c97
Author: Leo Yuriev <leo@yuriev.ru>
Date: 2014-09-05 00:19:16 +0400

    TRIVIA - lmdb: clean testdb-dir while "make test".
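The reordering problem named in commit 6240c33 (the transaction id in a meta-page becoming visible before the data it describes) can be sketched with C11 atomics. This is only an illustration of the ordering requirement: the commit message speaks of "volatile" or a memory barrier, and nothing here is taken from the patch. The struct and all function names are hypothetical stand-ins, not LMDB's MDB_meta layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a meta page; not LMDB's real structure. */
typedef struct {
    size_t root_page;        /* plain fields describing the snapshot */
    size_t last_page;
    _Atomic size_t txnid;    /* stored last: publishes the snapshot */
} demo_meta;

void demo_meta_init(demo_meta *m)
{
    m->root_page = 0;
    m->last_page = 0;
    atomic_init(&m->txnid, 0);
}

/* Writer side: fill in the snapshot first, then publish the txn id.
 * The release store keeps the compiler and CPU from moving the plain
 * stores after it. */
void demo_publish(demo_meta *m, size_t root, size_t last, size_t txnid)
{
    assert(txnid > atomic_load_explicit(&m->txnid, memory_order_relaxed));
    m->root_page = root;
    m->last_page = last;
    atomic_store_explicit(&m->txnid, txnid, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store, so a
 * reader that observes a txn id also observes the fields written
 * before it was published. */
size_t demo_read_root(demo_meta *m, size_t *txnid_out)
{
    *txnid_out = atomic_load_explicit(&m->txnid, memory_order_acquire);
    return m->root_page;
}
```

The sketch covers only the ordering of one publication. It does not protect a reader from a writer that later overwrites the same structure, which a real design has to handle separately.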
leo@yuriev.ru wrote:
> The attached patch file is derived from OpenLDAP Software. All of the
> modifications to OpenLDAP Software represented in the following
> patch(es) were developed by Leonid Yuriev <leo@yuriev.ru>. I have not
> assigned rights and/or interest in this work to any party.
>
> The attached modifications to OpenLDAP Software are subject to the
> following notice:
>
> Copyright 2014 Leonid Yuriev.
> Copyright 2014 Peter-Service LLC, Moscow, Russia.
> Redistribution and use in source and binary forms, with or without
> modification, are permitted only as authorized by the OpenLDAP Public
> License.
>
> https://github.com/leo-yuriev/openldap-lmdb-challenge/pull/1
> or
> https://github.com/leo-yuriev/openldap-lmdb-challenge/ branch master-devel
>
> commit 841059330fd44769e93eb4b937c3ce42654fad6f
> Author: Leo Yuriev <leo@yuriev.ru>
> Date: 2014-09-20 07:16:15 +0400
>
> BUGFIX - lmdb: lock meta-pages in writemap-mode to avoid unexpected write,
> before the data pages would be synchronized.
>
> Without locking the meta-pages may be written by OS before other data,
> in this case database would be inconsistent.

Seems unnecessary. Won't happen by default; could happen with MDB_NOSYNC but that risk is already documented.

>
> commit 6240c3350e8bd86337c7e41722cf6a38881f15e7
> Author: Leo Yuriev <leo@yuriev.ru>
> Date: 2014-09-12 01:32:13 +0400
>
> BUGFIX - lmdb: reordering of instructions which update the txn in
> a meta-page.
>
> Without "volatile" or memory-barrier compiler may reorder instructions
> for update the "mm_txnid" field in meta-page in "writemap" mode.
>
> From the reader's point of view this cause a short
> time interval when the transaction is corrupted.

OK.

>
> commit accef62de7fe5660f870f4c5da319a2a8098b2fb
> Author: Leo Yuriev <leo@yuriev.ru>
> Date: 2014-09-21 02:29:50 +0400
>
> BUGFIX - lmdb: 'volatile' to important fields, which
> may be updated by readers asynchronously.
>
> Without 'volatile' compiler may eliminate a mdb_find_oldest() calls.

OK.

>
> commit bb83e03cf1b8bceee64550229c3becbdd5400680
> Author: Leo Yuriev <leo@yuriev.ru>
> Date: 2014-09-19 20:18:17 +0400
>
> FEATURE - lmdb-backend: support config for 'lifo' and 'coalesce' envflags.
>
> commit 0c168d0e63ed78d13df3fc8a42f3667335678639
> Author: Leo Yuriev <leo@yuriev.ru>
> Date: 2014-09-20 10:13:28 +0400
>
> FEATURE - lmdb: MDB_LIFORECLAIM & MDB_COALESCE modes.
>
> Reclaim FreeDB in LIFO order - this is a main feature.
> Also aim to coalesce small FreeDB records.

Will spend more time looking at this closer.

>
> commit 8ddd63161aeb2689822d1a8d27385d62e4e341ae
> Author: Leo Yuriev <leo@yuriev.ru>
> Date: 2014-09-19 22:47:19 +0400
>
> BUGFIX - lmdb: properly sync meta-pages in mdb_sync_env().
>
> Meta-pages may be updated during data-syncing in mdb_sync_env(),
> in this case database would be inconsistent.
>
> Check-and-retry if lead txn-id changed during flushing data in
> mdb_sync_env().

Probably could simplify this, just obtain the write mutex unconditionally, then there's no need to loop or retry. But also, this depends on MDB_NOLOCK - if that's set, then do no locking at all.

> commit 908677f989588d06b9f00620576dea3c5c8675d7
> Author: Leo Yuriev <leo@yuriev.ru>
> Date: 2014-09-04 16:10:05 +0400
>
> FEATURE - lmdb-backend: support for "checkpoint kbytes" config-option.

OK if the lmdb implementation is OK.

>
> commit 147f41a8110f28456bc32123bde86d47183f9c0a
> Author: Leo Yuriev <leo@yuriev.ru>
> Date: 2014-09-04 16:01:15 +0400
>
> FEATURE - lmdb: implementation of "checkpoint kbytes".
>
> Force flush when volume of the changes reached a configurable threshold.

Probably OK. Needs some typographical cleanup. Not sure "syncbytes" is a good name.

>
> commit fb82a0b688f4c31313d0790415feda8aaa18651c
> Author: Leo Yuriev <leo@yuriev.ru>
> Date: 2014-09-04 15:18:16 +0400
>
> CHANGE - lmdb-backend: checkpoint-interval in seconds instead of minutes.

Gratuitous change. We used minutes since the BDB backend uses minutes, and the intention was to maintain parallel functionality. What's the justification for this change?

>
> commit fc409d89e0d9dde20f612e34c2a463c8a81ea000
> Author: Leo Yuriev <leo@yuriev.ru>
> Date: 2014-09-20 06:51:04 +0400
>
> EXTENSION - lmdb: more useful info from mdb_stat tool.

A bit ambiguous. me_tail_txnid is actually the ID of the oldest reader, not the "last" reader. I'm not convinced of the value of this patch, since you can already view the readers list.

> commit ccc7da690ffbff440643295b945fdf7886f48c97
> Author: Leo Yuriev <leo@yuriev.ru>
> Date: 2014-09-05 00:19:16 +0400
>
> TRIVIA - lmdb: clean testdb-dir while "make test".

OK.

--
  -- Howard Chu
  CTO, Symas Corp. http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP http://www.openldap.org/project/
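For readers following the "checkpoint kbytes" discussion, the behaviour under review (force a flush once the volume of changes reaches a configurable threshold) might look roughly like the sketch below. It is not the patch's code: the struct, the field names and the function are hypothetical, and the actual flush is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for a byte-based checkpoint. */
typedef struct {
    size_t threshold_bytes;   /* 0 disables the byte-based checkpoint */
    size_t unsynced_bytes;    /* changes committed since the last flush */
} demo_checkpoint;

/* Record a commit that changed `bytes` bytes.
 * Returns true when the caller should flush to disk now. */
bool demo_note_commit(demo_checkpoint *cp, size_t bytes)
{
    assert(cp != NULL);
    cp->unsynced_bytes += bytes;
    if (cp->threshold_bytes != 0 && cp->unsynced_bytes >= cp->threshold_bytes) {
        cp->unsynced_bytes = 0;   /* the caller is expected to sync right away */
        return true;
    }
    return false;
}
```

A caller would invoke this after every committed write transaction and sync the environment whenever it returns true. A time-based checkpoint can run alongside it independently.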
2014-10-03 3:13 GMT+04:00 Howard Chu <hyc@symas.com>:
>> commit 841059330fd44769e93eb4b937c3ce42654fad6f
>> Author: Leo Yuriev <leo@yuriev.ru>
>> Date: 2014-09-20 07:16:15 +0400
>>
>> BUGFIX - lmdb: lock meta-pages in writemap-mode to avoid unexpected write,
>> before the data pages would be synchronized.
>>
>> Without locking the meta-pages may be written by OS before other data,
>> in this case database would be inconsistent.
>
> Seems unnecessary. Won't happen by default; could happen with MDB_NOSYNC but
> that risk is already documented.

We are using this combination:

envflags writemap nosync lifo
checkpoint 0 1

With the checkpoint interval given in seconds, this gives us assurance of a consistent database state on disk. Without this patch, however, the kernel can write the meta-pages before the data.
In fact, for a full guarantee in case the slapd process dies, the meta-page should be written explicitly. But that requires a lot of changes, and I have not done it.

>> commit 0c168d0e63ed78d13df3fc8a42f3667335678639
>> Author: Leo Yuriev <leo@yuriev.ru>
>> Date: 2014-09-20 10:13:28 +0400
>>
>> FEATURE - lmdb: MDB_LIFORECLAIM & MDB_COALESCE modes.
>>
>> Reclaim FreeDB in LIFO order - this is a main feature.
>> Also aim to coalesce small FreeDB records.
>
> Will spend more time looking at this closer.

I would suggest, though I do not insist, reviewing this patch on GitHub.

>> commit 8ddd63161aeb2689822d1a8d27385d62e4e341ae
>> Author: Leo Yuriev <leo@yuriev.ru>
>> Date: 2014-09-19 22:47:19 +0400
>>
>> BUGFIX - lmdb: properly sync meta-pages in mdb_sync_env().
>>
>> Meta-pages may be updated during data-syncing in mdb_sync_env(),
>> in this case database would be inconsistent.
>>
>> Check-and-retry if lead txn-id changed during flushing data in
>> mdb_sync_env().
>
> Probably could simplify this, just obtain the write mutex unconditionally,
> then there's no need to loop or retry. But also, this depends on MDB_NOLOCK
> - if that's set, then do no locking at all.

I did it this way for performance and to shorten the time the lock is held.
Retries happen only when there is an intensive flow of changes. In that case there will be many updated pages, and writing them out will take some time. In subsequent iterations, however (if transactions were committed while the write was in progress), there will be far fewer modified pages and the sync will be quick.
As a result (and this was seen in tests), even under a substantial transaction load the loop usually runs only two iterations, no lock is taken, and the flow of changes is not suspended.

>> commit 147f41a8110f28456bc32123bde86d47183f9c0a
>> Author: Leo Yuriev <leo@yuriev.ru>
>> Date: 2014-09-04 16:01:15 +0400
>>
>> FEATURE - lmdb: implementation of "checkpoint kbytes".
>>
>> Force flush when volume of the changes reached a configurable threshold.
>
> Probably OK. Needs some typographical cleanup. Not sure "syncbytes" is a
> good name.

Agreed. I just took the first name that came to mind and tried to retain the existing style. Ideas?

>> commit fb82a0b688f4c31313d0790415feda8aaa18651c
>> Author: Leo Yuriev <leo@yuriev.ru>
>> Date: 2014-09-04 15:18:16 +0400
>>
>> CHANGE - lmdb-backend: checkpoint-interval in seconds instead of minutes.
>
> Gratuitous change. We used minutes since the BDB backend uses minutes, and
> the intention was to maintain parallel functionality. What's the
> justification for this change?

As I wrote above, we are using this combination:

envflags writemap nosync lifo
checkpoint 0 1

If the interval is specified in minutes, it cannot be set to less than one minute, and that is too long a window in which updates could be lost.
With a synchronization interval of one second, we reduce the losses in the event of a crash to an acceptable level, while the load on the storage system remains acceptable even for a large flow of updates.
I have not found a better solution than simply replacing minutes with seconds.

>> commit fc409d89e0d9dde20f612e34c2a463c8a81ea000
>> Author: Leo Yuriev <leo@yuriev.ru>
>> Date: 2014-09-20 06:51:04 +0400
>>
>> EXTENSION - lmdb: more useful info from mdb_stat tool.
>
> A bit ambiguous. me_tail_txnid is actually the ID of the oldest reader, not
> the "last" reader. I'm not convinced of the value of this patch, since you
> can already view the readers list.

I agree; "tail" is then the better choice.
But the main value of this patch is not showing the txn of the oldest reader, but showing information about page usage, especially the number of pages that are "blocked" by the oldest (laggard) reader and how many pages are actually available.

> --
> -- Howard Chu
> CTO, Symas Corp. http://www.symas.com
> Director, Highland Sun http://highlandsun.com/hyc/
> Chief Architect, OpenLDAP http://www.openldap.org/project/

Thank you in advance.

BR.
Leonid Yuriev.
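The two approaches to syncing debated above (an optimistic check-and-retry loop versus always taking the write mutex) can be contrasted in a small single-threaded simulation. This is a sketch under stated assumptions, not the code from the patch: every name is hypothetical, the "writer" is faked by a counter of commits that land during a flush, and the fallback to locking after a bounded number of passes is an addition for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical environment; all names are illustrative, not LMDB's. */
typedef struct demo_env {
    size_t txnid;              /* id of the latest committed transaction */
    size_t synced_txnid;       /* id recorded by the last meta flush */
    unsigned pending_commits;  /* fake writer: commits waiting to land */
    unsigned writer_locked;    /* nonzero while the write lock is held */
} demo_env;

/* Fake data flush: one queued commit lands while it runs,
 * unless writers are locked out. */
static void demo_flush_data(demo_env *env)
{
    if (!env->writer_locked && env->pending_commits > 0) {
        env->pending_commits--;
        env->txnid++;
    }
}

/* Flush data without blocking writers and retry if a commit slipped in.
 * After max_passes optimistic attempts, fall back to holding the lock.
 * Returns the number of data-flush passes performed. */
unsigned demo_sync_with_retry(demo_env *env, unsigned max_passes)
{
    unsigned passes = 0;

    assert(env != NULL && max_passes > 0);
    while (passes < max_passes) {
        size_t before = env->txnid;

        demo_flush_data(env);
        passes++;
        if (env->txnid == before) {      /* nothing committed meanwhile */
            env->synced_txnid = before;  /* meta for this txn is safe to persist */
            return passes;
        }
    }

    env->writer_locked = 1;              /* pessimistic fallback */
    demo_flush_data(env);
    passes++;
    env->synced_txnid = env->txnid;
    env->writer_locked = 0;
    return passes;
}
```

In this model a quiet database needs one pass and a single concurrent commit costs a second pass, matching the "usually only two iterations" observation. A sustained stream of commits exhausts the optimistic passes and ends up taking the lock anyway.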
As directed by Kurt Zeilenga (Executive Director, Kurt@openldap.org), I have resubmitted this as the new ITS#7958 with an updated IPR statement.

http://www.openldap.org/its/index.cgi/Incoming?id=7958;selectid=7958

Best regards,
Leonid.
See ITS#7958
changed notes
changed state Open to Closed