Full_Name: Leonid Yuriev Version: 2.4.40 OS: RHEL7 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (31.130.36.33) Several fields from the meta-data that shared globally across active writer and all readers are not 'volatile'. This allow compiler to cache its values in registers and reorder writes. More over, in some cases compiler is free to eliminate the calls of function which seems to be 'const' or 'pure', pls see http://lwn.net/Articles/285332/. Thereby a series of Heisenbugs may be induced, for instance: 1) The mdb_find_oldest() calls may be potentially eliminated up to public (not static) functions due to interprocedural optimization 'unit at once'. 2) Reordering of CPU instructions which updates the txn in a meta-page. Without "volatile" or memory-barrier compiler may reorder instructions for update the "mm_txnid" field in meta-page in "writemap" mode. Thereby, from the reader's point of view this cause a short time interval when the transaction is corrupted. On some architectures few actions are required for cache coherence, for instance - GCC's __synchronize() must be called. 3) A really critical bugs will be described later in separate ITS.
The attached files is derived from OpenLDAP Software. All of the modifications to OpenLDAP Software represented in the following patch(es) were developed by Peter-Service LLC, Moscow, Russia. Peter-Service LLC has not assigned rights and/or interest in this work to any party. I, Leonid Yuriev am authorized by Peter-Service LLC, my employer, to release this work under the following terms. Peter-Service LLC hereby places the following modifications to OpenLDAP Software (and only these modifications) into the public domain. Hence, these modifications may be freely used and/or redistributed for any purpose with or without attribution and/or other notice. commit 5e83ea8a76f81dac8d5a9dcd74d5b10a01330102 Author: Leo Yuriev <leo@yuriev.ru> Date: 2014-09-12 01:32:13 +0400 ITS#7969 for LMDB: volatile & __synchronize(). diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c index 6cc3433..3286ffb 100644 --- a/libraries/liblmdb/mdb.c +++ b/libraries/liblmdb/mdb.c @@ -580,11 +580,11 @@ typedef struct MDB_rxbody { * started from so we can avoid overwriting any data used in that * particular version. */ - txnid_t mrb_txnid; + volatile txnid_t mrb_txnid; /** The process ID of the process owning this reader txn. */ - MDB_PID_T mrb_pid; + volatile MDB_PID_T mrb_pid; /** The thread ID of the thread owning this txn. */ - MDB_THR_T mrb_tid; + volatile MDB_THR_T mrb_tid; } MDB_rxbody; /** The actual reader record, with cacheline padding. */ @@ -632,12 +632,12 @@ typedef struct MDB_txbody { * This is recorded here only for convenience; the value can always * be determined by reading the main database meta pages. */ - txnid_t mtb_txnid; + volatile txnid_t mtb_txnid; /** The number of slots that have been used in the reader table. * This always records the maximum count, it is not decremented * when readers release their slots. */ - unsigned mtb_numreaders; + volatile unsigned mtb_numreaders; } MDB_txbody; /** The actual reader table definition. */ @@ -908,7 +908,7 @@ typedef struct MDB_meta { /** Any persistent environment flags. @ref mdb_env */ #define mm_flags mm_dbs[0].md_flags pgno_t mm_last_pg; /**< last used page in file */ - txnid_t mm_txnid; /**< txnid that committed this page */ + volatile txnid_t mm_txnid; /**< txnid that committed this page */ } MDB_meta; /** Buffer for a stack-allocated meta page. @@ -3559,6 +3559,10 @@ mdb_env_write_meta(MDB_txn *txn) mp->mm_dbs[0] = txn->mt_dbs[0]; mp->mm_dbs[1] = txn->mt_dbs[1]; mp->mm_last_pg = txn->mt_next_pgno - 1; +#if !(defined(_MSC_VER) || defined(__i386__) || defined(__x86_64__)) + /* LY: issue a memory barrier, if not x86. */ + __sync_synchronize(); +#endif mp->mm_txnid = txn->mt_txnid; if (!(env->me_flags & (MDB_NOMETASYNC|MDB_NOSYNC))) { unsigned meta_size = env->me_psize;
The attached files is derived from OpenLDAP Software. All of the modifications to OpenLDAP Software represented in the following patch(es) were developed by Peter-Service LLC, Moscow, Russia. Peter-Service LLC has not assigned rights and/or interest in this work to any party. I, Leonid Yuriev am authorized by Peter-Service LLC, my employer, to release this work under the following terms. Peter-Service LLC hereby places the following modifications to OpenLDAP Software (and only these modifications) into the public domain. Hence, these modifications may be freely used and/or redistributed for any purpose with or without attribution and/or other notice. https://gist.github.com/leo-yuriev/dcedd5e5b2a09c605108
leo@yuriev.ru wrote: > The attached files is derived from OpenLDAP Software. All of the modifications > to OpenLDAP Software represented in the following patch(es) were developed by > Peter-Service LLC, Moscow, Russia. Peter-Service LLC has not assigned rights > and/or interest in this work to any party. I, Leonid Yuriev am authorized by > Peter-Service LLC, my employer, to release this work under the following terms. > > Peter-Service LLC hereby places the following modifications to OpenLDAP Software > (and only these modifications) into the public domain. Hence, these > modifications may be freely used and/or redistributed for any purpose with or > without attribution and/or other notice. Committed to mdb.master. The whitespace is wrong in all of the patches you submitted, had to apply them manually. > > commit 5e83ea8a76f81dac8d5a9dcd74d5b10a01330102 > Author: Leo Yuriev <leo@yuriev.ru> > Date: 2014-09-12 01:32:13 +0400 > > ITS#7969 for LMDB: volatile & __synchronize(). > > diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c > index 6cc3433..3286ffb 100644 > --- a/libraries/liblmdb/mdb.c > +++ b/libraries/liblmdb/mdb.c > @@ -580,11 +580,11 @@ typedef struct MDB_rxbody { > * started from so we can avoid overwriting any data used in that > * particular version. > */ > - txnid_t mrb_txnid; > + volatile txnid_t mrb_txnid; > /** The process ID of the process owning this reader txn. */ > - MDB_PID_T mrb_pid; > + volatile MDB_PID_T mrb_pid; > /** The thread ID of the thread owning this txn. */ > - MDB_THR_T mrb_tid; > + volatile MDB_THR_T mrb_tid; > } MDB_rxbody; > > /** The actual reader record, with cacheline padding. */ > @@ -632,12 +632,12 @@ typedef struct MDB_txbody { > * This is recorded here only for convenience; the value can always > * be determined by reading the main database meta pages. > */ > - txnid_t mtb_txnid; > + volatile txnid_t mtb_txnid; > /** The number of slots that have been used in the reader table. > * This always records the maximum count, it is not decremented > * when readers release their slots. > */ > - unsigned mtb_numreaders; > + volatile unsigned mtb_numreaders; > } MDB_txbody; > > /** The actual reader table definition. */ > @@ -908,7 +908,7 @@ typedef struct MDB_meta { > /** Any persistent environment flags. @ref mdb_env */ > #define mm_flags mm_dbs[0].md_flags > pgno_t mm_last_pg; /**< last used page in file */ > - txnid_t mm_txnid; /**< txnid that committed this page */ > + volatile txnid_t mm_txnid; /**< txnid that committed this page */ > } MDB_meta; > > /** Buffer for a stack-allocated meta page. > @@ -3559,6 +3559,10 @@ mdb_env_write_meta(MDB_txn *txn) > mp->mm_dbs[0] = txn->mt_dbs[0]; > mp->mm_dbs[1] = txn->mt_dbs[1]; > mp->mm_last_pg = txn->mt_next_pgno - 1; > +#if !(defined(_MSC_VER) || defined(__i386__) || defined(__x86_64__)) > + /* LY: issue a memory barrier, if not x86. */ > + __sync_synchronize(); > +#endif > mp->mm_txnid = txn->mt_txnid; > if (!(env->me_flags & (MDB_NOMETASYNC|MDB_NOSYNC))) { > unsigned meta_size = env->me_psize; > > > > -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
changed notes changed state Open to Test moved from Incoming to Software Bugs
Hi, Howard. I also noticed a problem with the whilespace, so that is also posted patches on github's gist and gave references in each of three ITS. https://gist.github.com/leo-yuriev/dcedd5e5b2a09c605108 https://gist.github.com/leo-yuriev/fc00313b8035cddba6a7 https://gist.github.com/leo-yuriev/6f1e1444c9e29bd94885 > > Committed to mdb.master. The whitespace is wrong in all of the patches you > submitted, had to apply them manually. > > -- > -- Howard Chu > CTO, Symas Corp. http://www.symas.com > Director, Highland Sun http://highlandsun.com/hyc/ > Chief Architect, OpenLDAP http://www.openldap.org/project/
__sync_synchronize() needs an #ifdef around it. #ifdef __GNUC__, or something more general? Not sure what to do for the #else. We could mostly avoid using mm_txnid when the lockfile is initialized and use mti_txnid instead (e.g. in pick_meta). And penalize MDB_NOLOCK with some sync call in the #else. -- Hallvard
in mdb.master, reopened (unportable)
changed notes changed state Test to Open
2014-11-16 0:04 GMT+03:00 Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>: > __sync_synchronize() needs an #ifdef around it. > #ifdef __GNUC__, or something more general? > > Not sure what to do for the #else. We could mostly > avoid using mm_txnid when the lockfile is initialized > and use mti_txnid instead (e.g. in pick_meta). And > penalize MDB_NOLOCK with some sync call in the #else. > > -- > Hallvard > Unfortunately IMDB programmed without accounting possibility of memory reordering and cache incoherency (as a proof - the 'volatile' was missed before ITS#7969). So, currently seems LMDB is durable only on x86 and may be on MIPS. I think memory-ordering (e.g. barriers) functions should be defined in separate h-file, which will select implementation depends on cpu-arch & compiler. see http://en.wikipedia.org/wiki/Memory_ordering The minimum seems to be required a memory_barrier_coherence() - to make cache coherence and suppress dangerously load/store reordering, e.g. flush/invalidate cache on some cpu-arch, but do nothing on x86 and so on. Then __sync_synchronize() and MIPS's cache invalidate should be replaced with memory_barrier_coherence(). But instead the original version recreate cleanly on the basis of 1Hippeus engine (manager of shared memory and queues for zero-copy IPC, google it). Leonid
2014-11-16 0:04 GMT+03:00 Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>: > __sync_synchronize() needs an #ifdef around it. > #ifdef __GNUC__, or something more general? I have implemented 'portable' memory barriers for a set of compilers. https://github.com/ReOpen/ReOpenLDAP/commit/a1ec3a4a8290c84846b4e98ff3abf38839381866 Feedback welcomed. Leonid.
On 01/12/2015 07:15 PM, leo@yuriev.ru wrote: > Unfortunately IMDB programmed without accounting possibility of memory > reordering and cache incoherency (as a proof - the 'volatile' was > missed before ITS#7969). > So, currently seems LMDB is durable only on x86 and may be on MIPS. I gather "volatile" in thread code can be read as "warning: dubious code" or at best "please access this sooner rather than later". comp.programming.threads's advise is that volatile is useless for thread sync: You need sync primitives, and then you don't need volatile since the compiler/hardware may not move the memory access past the primitive. Unless the primitive's spec says so, I guess. E.g. with an "asm" statement where the compiler does not know what the assembly code does and must be told not to mess with ordering. The best fix is likely a format change which reduces the need for sync primitives, for LDMBv2. Checksum the MDB_meta, and/or replace the variable part with a single word which refers to the meta. Don't know if we can get rid of unportable sync code though. Leaving the rest to Howard, I'm too busy just now to do more than blow hot air:-) -- Hallvard
2015-01-13 13:50 GMT+03:00 Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>: > On 01/12/2015 07:15 PM, leo@yuriev.ru wrote: >> >> Unfortunately IMDB programmed without accounting possibility of memory >> reordering and cache incoherency (as a proof - the 'volatile' was >> missed before ITS#7969). >> So, currently seems LMDB is durable only on x86 and may be on MIPS. > > > I gather "volatile" in thread code can be read as "warning: dubious > code" or at best "please access this sooner rather than later". > > comp.programming.threads's advise is that volatile is useless for thread > sync: You need sync primitives, and then you don't need volatile since > the compiler/hardware may not move the memory access past the primitive. Excuse me, but seems you are misunderstand comp.programming.threads in case of blocking/locking multithreading and non-blocking multithreading (aka lock-free/wait-free). Yes, `volatile` don't needed, when a locking (e.g. mutex) is used for access to any variables that shared between threads. But LMDB don't uses locking primitives for interaction between writers and readers. LMDB properties declared as: - readers don't block writers and writers don't block readers; - read transactions are extremely cheap, and can be performed using no mallocs or any other _blocking_ calls; THEREFORE 'volatile' and memory-ordering ISSUES MUST BE HANDLED VERY CAREFULLY AND ANXIOUSLY. > Unless the primitive's spec says so, I guess. E.g. with an "asm" > statement where the compiler does not know what the assembly code does > and must be told not to mess with ordering. Please study GCC's manual about __asm __volalile (:::"memory"), this is just a "full compiler fence". > The best fix is likely a format change which reduces the need for sync > primitives, for LDMBv2. Checksum the MDB_meta, and/or replace the > variable part with a single word which refers to the meta. > Don't know if we can get rid of unportable sync code though. Yes it is possible, but requires C++, see http://en.cppreference.com/w/cpp/atomic/memory_order By using just C, include C99 - not possible, see http://stackoverflow.com/questions/27301371/memory-order-consistency-model-and-c99 > Leaving the rest to Howard, I'm too busy just now to do more than blow > hot air:-) Ok ;) I created gist for public review, please share it. https://gist.github.com/leo-yuriev/ba186a6bf5cf3a27bae7 Leonid.
On 13/01/15 17:54, leo@yuriev.ru wrote: > 2015-01-13 13:50 GMT+03:00 Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>: >> comp.programming.threads's advise is that volatile is useless for thread >> sync: You need sync primitives, and then you don't need volatile since >> the compiler/hardware may not move the memory access past the primitive. > > Excuse me, but seems you are misunderstand comp.programming.threads in > case of blocking/locking multithreading and non-blocking > multithreading (aka lock-free/wait-free). Yes, that didn't come out right. I don't mind inserting the volatile, but I don't know that it helps either. As far as I understand if it was broken without volatile, then it's still broken with it - just hopefully less likely to break. And LMDB couldn't be releaseed with your original unportable sync code along with the volatile. > (...) >> Unless the primitive's spec says so, I guess. E.g. with an "asm" >> statement where the compiler does not know what the assembly code does >> and must be told not to mess with ordering. > > Please study GCC's manual about __asm __volalile (:::"memory"), this > is just a "full compiler fence". Thanks, looks good. Will read the rest later. -- Hallvard
On 13/01/15 19:23, Hallvard Breien Furuseth wrote: > Yes, that didn't come out right. I don't mind inserting the > volatile, but I don't know that it helps either. As far as I > understand if it was broken without volatile, then it's still > broken with it - just hopefully less likely to break. And LMDB > couldn't be releaseed with your original unportable sync code > along with the volatile. Sorry, nevermind. Of course when the writer does sync well enough, volatile + the txn_renew loop will have to do for a sync primitive in the reader. I suppose this requires that sync in the writer thread will shake other threads as well, it won't be private to the writer. -- Hallvard
2015-01-14 8:49 GMT+03:00 Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>: > On 13/01/15 19:23, Hallvard Breien Furuseth wrote: >> >> Yes, that didn't come out right. I don't mind inserting the >> volatile, but I don't know that it helps either. As far as I >> understand if it was broken without volatile, then it's still >> broken with it - just hopefully less likely to break. And LMDB >> couldn't be releaseed with your original unportable sync code >> along with the volatile. > > > Sorry, nevermind. Of course when the writer does sync well enough, > volatile + the txn_renew loop will have to do for a sync primitive in > the reader. > I suppose this requires that sync in the writer thread > will shake other threads as well, it won't be private to the writer. In general this is wrong, more precisely depend on 'volatile' on shared variables and usage of barriers/fences by the readers. Sync-ops due lock/release a mutex by writer issue a memory-barrier for its own thread. With this compiler must update all modified variables, which shaded in the CPU registers. Next a hardware write-barrier (aka release) in the mutex-release code enforce all changes to be visible for other threads (e.g. flush the cache). But 'be visible' here mean 'publish' and other threads can access these changes, but if want this. In general, to see changes made by the writer, all other threads should issue a read-barrier (aka acquire). On most arches such barrier just inform compiler that memory was changed and variables which cached in the registers must be reloaded. But in some cases (like Itanium) this barrier will be taken in account for instruction scheduling. For 'volatile' compiler should generate barriers each time on read or write such variables. More general, memory-barriers are very important to HPC, distributed computing and for super-computers. For example read-barriers may pull changes from internode-bus or other nodes, and write-barriers - publish the local changes. So, the one way to avoid a race bugs - thinking in terms of publish/pull changes.
This discussion should be moved to the openldap-devel list as there's more being discussed now than the patch itself. Леонид Юрьев wrote: > 2015-01-14 8:49 GMT+03:00 Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>: >> On 13/01/15 19:23, Hallvard Breien Furuseth wrote: >>> >>> Yes, that didn't come out right. I don't mind inserting the >>> volatile, but I don't know that it helps either. As far as I >>> understand if it was broken without volatile, then it's still >>> broken with it - just hopefully less likely to break. And LMDB >>> couldn't be releaseed with your original unportable sync code >>> along with the volatile. >> >> >> Sorry, nevermind. Of course when the writer does sync well enough, >> volatile + the txn_renew loop will have to do for a sync primitive in >> the reader. > >> I suppose this requires that sync in the writer thread >> will shake other threads as well, it won't be private to the writer. > > In general this is wrong, more precisely depend on 'volatile' on > shared variables and usage of barriers/fences by the readers. Actually 'volatile' is meaningless at the hardware level. It only serves to prevent the compiler from reordering or eliminating accesses. In the case of LMDB the compiler cannot eliminate accesses because they are to global memory, and the compiler cannot determine anything about the liveness or side-effects of global memory accesses. > Sync-ops due lock/release a mutex by writer issue a memory-barrier for > its own thread. > With this compiler must update all modified variables, which shaded in > the CPU registers. > Next a hardware write-barrier (aka release) in the mutex-release code > enforce all changes to be visible for other threads (e.g. flush the > cache). > But 'be visible' here mean 'publish' and other threads can access > these changes, but if want this. > In general, to see changes made by the writer, all other threads > should issue a read-barrier (aka acquire). > On most arches such barrier just inform compiler that memory was > changed and variables which cached in the registers must be reloaded. > But in some cases (like Itanium) this barrier will be taken in account > for instruction scheduling. > For 'volatile' compiler should generate barriers each time on read or > write such variables. No. volatile doesn't generate hardware barriers. > More general, memory-barriers are very important to HPC, distributed > computing and for super-computers. > For example read-barriers may pull changes from internode-bus or other > nodes, and write-barriers - publish the local changes. > So, the one way to avoid a race bugs - thinking in terms of > publish/pull changes. "race bug" by definition occurs when multiple writers may modify a particular memory object, causing its value to be indeterminate to an observer. By definition, no such bugs can occur in LMDB because only single writers can ever modify any memory object. In the case of arbitrary readers viewing writes, these are the possible cases: 1) reader is on same CPU as writer, writes cached there is no issue, the reader sees what the writer wrote. 2) reader is on same CPU as writer, writes not cached there is no issue, the reader must fetch data from global memory 3) reader is on different CPU, writes not cached there is no issue, the reader's CPU must fetch the data from global memory - same as (2) 4) reader is on different CPU, writes are cached the reader may see the cached/stale data, or the CPU may fetch the new data from global memory Only case (4) has any ambiguity, and LMDB's reader table is specifically designed not to care about the ambiguity. I.e., whether fresh or stale data is seen is irrelevant, LMDB will operate correctly because it does not fresh data. Correct processing of the reader table only depends on the oldest data in the table, so staleness is an asset here. Correct processing of changes to the meta page requires exclusion from other writers, so the write mutex is used. This also guarantees that all changes to the meta page are flushed to global memory before the next writer begins. In ITS#7970 you discuss a "heisenbug" which can theoretically occur if multiple write txns complete in the span of 2 memory accesses by a reader. In practice such a bug cannot occur because the act of completing a write txn involves multiple blocking calls (I/O system calls, mutex acquisition/release, etc.) which will force writers to pause relative to any readers in question. In contrast the reader performs no blocking calls at all, and in the window of vulnerability it is performing only 2 single-word memory accesses. Demonstrating the bug by manually inserting yield()s only proves the point - without those manually inserted yields you cannot actually trigger any situation where the reader thread will be descheduled between those two instructions. You discuss the ramifications of such a bug as the writer potentially overwriting pages that the reader needs. None of this can actually occur in current LMDB code, again by design. The fact that you encountered these issues while debugging your LIFO patch only reflects on the problems I already pointed out with the LIFO approach. Hallvard and I had this exact same discussion a few years ago; his example used the debugger to pause the reader at that point. Certainly, if you go out of your way to manually halt the reader at a specific instruction you can break the reader. Without outside intervention it won't happen. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
Patch committed, fix released in LMDB 0.9.15
*** Issue 7842 has been marked as a duplicate of this issue. ***