
Re: Re: LMDB test assertion failures on Linux/MIPS



Hi!

I think one problem with your test program is that you don't wait for the write() thread to finish before you read through the mmap(). Have a look at how locking in a producer-consumer (or reader-writer) relationship is usually implemented (if you don't have that at hand, I could send you the algorithms).

Regards,
Ulrich

>>> Martin Lucina <martin@lucina.net> wrote on 10.03.2014 at 22:10 in message
<20140310211032.GA22062@nodbug.moloch.sk>:
> hyc@symas.com said:
>> Martin Lucina wrote:
>> >That still doesn't explain the MIPS issues, any suggestions on how to
>> >proceed there? I can give someone access to a MIPS host if that would help.
>> 
>> Copying back to the list:
>> 
>> Martin Lucina wrote:
>> > hyc@symas.com said:
>> >> It appears that this system also lacks a coherent FS cache, like
>> >> some BSDs. I changed mtest.c to use MDB_WRITEMAP and it now runs
>> >> fine.
>> >>
>> >> The unmodified mtest.c also worked when single-stepping thru gdb,
>> >> which apparently gives time for the cache to sort itself out between
>> >> mdb function calls.
>> >
>> > Interesting. What you're saying is that without MDB_WRITEMAP pages are
>> > written out separately and it is up to the FS cache to ensure that reading
>> > back via the memory map is consistent, correct?
>> 
>> That's the general idea. As the LMDB design paper states, LMDB
>> requires the OS to use a unified buffer cache - so that mmap pages
>> and FS cache pages are the same.
>> 
>> > I'll try and dig through the OpenWRT kernel configuration, they must have
>> > changed something that triggers this behaviour.
>> 
>> Frankly it seems unlikely that they could have changed something so
>> fundamental to the VM subsystem of the kernel. It's also possible that we're
>> seeing *CPU* cache inconsistencies, and that adding a few
>> MIPS-specific memory barrier instructions here and there may fix
>> things up.
> 
> I did some more investigating:
> 
> 1) Tried adding calls to sync_file_range() (Linux-specific syscall) and
> in desperation even sync(2) to mdb_txn_commit() just after mdb_page_flush()
> et al. No change.
> 
> 2) Compiled the below test program on various platforms. This tries (rather
> unscientifically) to test how "long" it takes for a mmap to become
> consistent after writing to the underlying file through a different fd
> opened with O_DSYNC (what mdb does).
> 
> The results are interesting:
> 
> x86_64 core i5m (2 cores, 4 threads), gcc -O2: consistently less than 1k iterations
> x86_64 core i5m (2 cores, 4 threads), gcc -O2 -DNOBARRIER: consistently around 10k iterations
> x86_64 dual 4-core xeon, gcc -O2: around 2k iterations
> x86_64 dual 4-core xeon, gcc -O2 -DNOBARRIER: 10-15k iterations
> MIPS target, musl gcc -O2 -mips32r2: varies, mostly 1; in every 10 runs at least one run completes in the high 100k's of iterations
> MIPS target, musl gcc -O2 -mips32r2 -DNOBARRIER: about the same as previous, but when not 1 the result is subjectively higher (around 1m iterations)
> single CPU SPARCv9 solaris 10, Sun cc -fast -mt: always[*] 1
> single CPU SPARCv9 solaris 10, CSW gcc -O2, with or without -DNOBARRIER: always[*] 1
> ia64 dual Itanium 2, Linux gcc -O2: around 2k iterations
> ia64 dual Itanium 2, Linux gcc -O2 -DNOBARRIER: anywhere between 3-8k iterations
> 
> [*] very rarely several million iterations
> 
> Does this help in any way?  It certainly seems to suggest that the MIPS
> target's fs cache is (eventually) consistent.
> 
> Any pointers on how to proceed or what else to try/who else to ask will be
> much appreciated.
> 
> Martin
> 
> ----test program----
> #include <fcntl.h>
> #include <sys/types.h>
> #include <sys/mman.h>
> #include <assert.h>
> #include <stdio.h>
> #include <pthread.h>
> #include <unistd.h>
> 
> pthread_barrier_t b;
> 
> static void *thread (void *arg)
> {
>     int fd;
> 
>     pthread_barrier_wait (&b);
>     fd = open ("/tmp/testfile", O_RDWR | O_CREAT | O_DSYNC, 0600);
>     unsigned long v = 1;
>     assert (write (fd, &v, sizeof v) == sizeof v);
>     close (fd);
>     return NULL;
> }
> 
> int main (int argc, char *argv[])
> {
>     int fd;
>     pthread_barrier_init (&b, NULL, 2);
> 
>     unlink ("/tmp/testfile");
>     fd = open ("/tmp/testfile", O_RDWR | O_CREAT, 0600);
>     unsigned long v = 0;
>     assert (write (fd, &v, sizeof v) == sizeof v);
>     volatile unsigned long *p = mmap (NULL, getpagesize (), PROT_READ,
>             MAP_SHARED, fd, 0);
>     assert (p != MAP_FAILED);
> 
>     int i = 0;
>     pthread_t thread_id = 0;
>     pthread_create (&thread_id, NULL, thread, NULL);
> 
>     while (*p != 1) {
>         if (!i)
>             pthread_barrier_wait (&b);
>         i++;
> #if defined (__GNUC__) && !defined (NOBARRIER)
>         __sync_synchronize ();
> #endif
>     }
>     pthread_join (thread_id, NULL);   /* reap the writer thread before exiting */
>     printf ("%d\n", i);
> 
>     munmap ((void *)p, getpagesize ());
>     close (fd);
>     return 0;
> }