
Re: LMDB test assertion failures on Linux/MIPS



hyc@symas.com said:
> Martin Lucina wrote:
> >That still doesn't explain the MIPS issues, any suggestions on how to
> >proceed there? I can give someone access to a MIPS host if that would help.
> 
> Copying back to the list:
> 
> Martin Lucina wrote:
> > hyc@symas.com said:
> >> It appears that this system also lacks a coherent FS cache, like
> >> some BSDs. I changed mtest.c to use MDB_WRITEMAP and it now runs
> >> fine.
> >>
> >> The unmodified mtest.c also worked when single-stepping thru gdb,
> >> which apparently gives time for the cache to sort itself out between
> >> mdb function calls.
> >
> > Interesting. What you're saying is that without MDB_WRITEMAP pages are
> > written out separately and it is up to the FS cache to ensure that reading
> > back via the memory map is consistent, correct?
> 
> That's the general idea. As the LMDB design paper states, LMDB
> requires the OS to use a unified buffer cache - so that mmap pages
> and FS cache pages are the same.
> 
> > I'll try and dig through the OpenWRT kernel configuration, they must have
> > changed something that triggers this behaviour.
> 
> Frankly it seems unlikely that they could have changed something so
> fundamental to the VM subsystem of the kernel. It's also possible that we're
> seeing *CPU* cache inconsistencies, and that adding a few
> MIPS-specific memory barrier instructions here and there may fix
> things up.

I did some more investigating:

1) Tried adding calls to sync_file_range() (Linux-specific syscall) and
in desperation even sync(2) to mdb_txn_commit() just after mdb_page_flush()
et al. No change.

2) Compiled the test program below on various platforms. This tries (rather
unscientifically) to test how "long" it takes for a mmap to become
consistent after writing to the underlying file through a different fd
opened with O_DSYNC (what mdb does).

The results are interesting:

x86_64 core i5m (2 cores, 4 threads): gcc -O2: consistently less than 1k iterations
x86_64 core i5m (2 cores, 4 threads): gcc -O2 -DNOBARRIER: consistently around 10k iterations
x86_64 dual 4-core xeon, gcc -O2: around 2k iterations
x86_64 dual 4-core xeon, gcc -O2 -DNOBARRIER: 10-15k iterations
MIPS target, musl gcc -O2 -mips32r2: varies, mostly 1, but in every 10 runs at least one run completes in the high hundreds of thousands of iterations
MIPS target, musl gcc -O2 -mips32r2 -DNOBARRIER: about the same as previous, but
when not 1 the result is subjectively higher (around 1M iterations)
single CPU SPARCv9 solaris 10, Sun cc -fast -mt: always[*] 1
single CPU SPARCv9 solaris 10, CSW gcc -O2, with or without -DNOBARRIER: always[*] 1
ia64 dual Itanium 2, Linux gcc -O2: around 2k iterations
ia64 dual Itanium 2, Linux gcc -O2 -DNOBARRIER: anywhere between 3-8k iterations

[*] very rarely several million iterations

Does this help in any way?  It certainly seems to suggest that the MIPS
target's fs cache is (eventually) consistent.
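
To tell "eventually consistent" apart from "immediately consistent", the
threading can be taken out of the picture. The following is only a sketch
under assumptions: the function name mmap_sees_write_immediately and the
path passed to it are made up for illustration, and it is not part of the
test program below or of LMDB. It maps the file read-only, writes through a
second O_DSYNC descriptor from the same thread, and reads the mapping right
after pwrite() returns.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not part of LMDB.
 * Returns 1 if a write made through a second O_DSYNC descriptor is visible
 * through an existing read-only MAP_SHARED mapping as soon as pwrite()
 * returns, 0 if the mapping still shows the old value, -1 on setup errors. */
static int mmap_sees_write_immediately (const char *path)
{
    unsigned long v = 0;
    int result = -1;
    long pagesize = sysconf (_SC_PAGESIZE);

    assert (path != NULL);
    unlink (path);
    int fd = open (path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (write (fd, &v, sizeof v) != (ssize_t) sizeof v) {
        close (fd);
        return -1;
    }

    void *map = mmap (NULL, (size_t) pagesize, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close (fd);
        return -1;
    }
    volatile unsigned long *p = map;

    /* Read the page once so it is faulted in before the second write. */
    if (*p == 0) {
        int wfd = open (path, O_WRONLY | O_DSYNC);
        if (wfd >= 0) {
            v = 1;
            if (pwrite (wfd, &v, sizeof v, 0) == (ssize_t) sizeof v)
                result = (*p == 1);   /* no delay, no retry loop */
            close (wfd);
        }
    }

    munmap (map, (size_t) pagesize);
    close (fd);
    unlink (path);
    return result;
}
```

Called with a path on the filesystem under test, this should return 1 on a
kernel with a unified buffer cache. A 0 would mean the mapping still showed
stale data after the write had already returned.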

Any pointers on how to proceed or what else to try/who else to ask will be
much appreciated.

Martin

----test program----
#include <fcntl.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <assert.h>
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

pthread_barrier_t b;

static void *thread (void *arg)
{
    int fd;

    pthread_barrier_wait (&b);
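    /* Write through a separate O_DSYNC descriptor, as mdb does. */
    fd = open ("/tmp/testfile", O_RDWR | O_CREAT | O_DSYNC, 0600);
    assert (fd != -1);
    unsigned long v = 1;
    /* Keep the write outside assert() so it survives -DNDEBUG. */
    ssize_t n = write (fd, &v, sizeof v);
    assert (n == (ssize_t) sizeof v);
    close (fd);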
    return NULL;
}

int main (int argc, char *argv[])
{
    int fd;
    pthread_barrier_init (&b, NULL, 2);

    unlink ("/tmp/testfile");
    fd = open ("/tmp/testfile", O_RDWR | O_CREAT, 0600);
    unsigned long v = 0;
    /* Keep the write outside assert() so it survives -DNDEBUG. */
    ssize_t n = write (fd, &v, sizeof v);
    assert (n == (ssize_t) sizeof v);
    volatile unsigned long *p = mmap (NULL, getpagesize (), PROT_READ,
            MAP_SHARED, fd, 0);
    assert (p != MAP_FAILED);

    int i = 0;
    pthread_t thread_id = 0;
    pthread_create (&thread_id, NULL, thread, NULL);

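    /* Poll the mapping until the other thread's write becomes visible;
     * i counts the polls. The barrier releases the writer on the first pass. */
    while (*p != 1) {
        if (!i)
            pthread_barrier_wait (&b);
        i++;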
#if defined (__GNUC__) && !defined (NOBARRIER)
        __sync_synchronize ();
#endif
    }
    pthread_join (thread_id, NULL);
    printf ("%d\n", i);

    munmap ((void *)p, getpagesize ());
    close (fd);
    return 0;
}
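
For reference, the kind of call tried in (1) can be sketched as follows.
This is not the actual change made to mdb_txn_commit(); the helper name
write_and_sync_range is hypothetical and the flag combination is an
assumption (it is the "write and wait" combination described in the
sync_file_range(2) man page). Note that sync_file_range() only controls
writeback of dirty page-cache pages to disk, so it would not necessarily
be expected to change what an existing mapping sees.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not LMDB code: write len bytes at offset off, then
 * ask the kernel to write back exactly that byte range and wait until the
 * writeback has completed. Linux-specific. Returns 0 on success, -1 on
 * failure. */
static int write_and_sync_range (int fd, const void *buf, size_t len, off_t off)
{
    assert (buf != NULL);
    if (pwrite (fd, buf, len, off) != (ssize_t) len)
        return -1;
    return sync_file_range (fd, off, (off_t) len,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER);
}
```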