Full_Name: Hallvard B Furuseth
Version: LMDB_0.9.15
OS:
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (81.191.45.5)
Submitted by: hallvard


mdb_env_copyfd1() and mdb_env_copythr() synchronize via my->mc_new,
but incorrectly.

copythr() starts before there is any data to consume, and copyfd1()
does not tell it (by setting mc_new and signalling) when there is
data. Instead it sets mc_new before it has finished producing the
data. I suppose copythr() just waits for a spurious wakeup; I haven't
looked closely. The code got this right originally, when copyfd1()
did not start the new thread until the data (metapages) was ready.
Alternatively, copyfd1() could start with mc_new = 0, then set it and
signal when data is ready.

copythr() holds the mutex while writing. That is not how condition
variables are supposed to be used: threads are supposed to grab the
mutex briefly, around code which operates on mc_new.

mc_status can be lost: copyfd1() does not always use it, or it
discards it. And copythr() should set mc_new = 0 before exiting also
when setting mc_status, so that copyfd1() won't just sit and wait.
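For reference, a minimal sketch of the handoff described above, in a
simplified stand-alone context (field names mirror mdb.c's copy
structure, but this is not the real code): the producer fills the
buffer completely before publishing it, both threads hold the mutex
only briefly around mc_new, and the write happens outside the lock.

```c
#include <pthread.h>
#include <string.h>

typedef struct {
    pthread_mutex_t mc_mutex;
    pthread_cond_t  mc_cond;
    char mc_wbuf[64];
    int  mc_wlen;
    int  mc_new;   /* 1 = buffer ready, -1 = finish, 0 = consumed */
    int  mc_done;  /* set by the consumer when it exits */
} copy_ctx;

static void *consumer(void *arg)
{
    copy_ctx *my = arg;
    for (;;) {
        pthread_mutex_lock(&my->mc_mutex);
        while (my->mc_new == 0)                  /* guards against spurious wakeups */
            pthread_cond_wait(&my->mc_cond, &my->mc_mutex);
        int st = my->mc_new;
        my->mc_new = 0;                          /* hand the buffer back */
        pthread_cond_signal(&my->mc_cond);
        pthread_mutex_unlock(&my->mc_mutex);     /* mutex NOT held while writing */
        if (st < 0)
            break;
        /* ... write mc_wbuf[0..mc_wlen) here, outside the lock ... */
    }
    my->mc_done = 1;
    return NULL;
}

/* Producer: fill the buffer completely, THEN publish it under a brief
 * mutex hold and wait until the consumer has taken it. */
static void produce(copy_ctx *my, const char *data, int len, int st)
{
    if (len > 0)
        memcpy(my->mc_wbuf, data, len);
    pthread_mutex_lock(&my->mc_mutex);
    my->mc_wlen = len;
    my->mc_new = st;
    pthread_cond_signal(&my->mc_cond);
    while (my->mc_new)
        pthread_cond_wait(&my->mc_cond, &my->mc_mutex);
    pthread_mutex_unlock(&my->mc_mutex);
}

int demo(void)
{
    copy_ctx my = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER };
    pthread_t thr;
    if (pthread_create(&thr, NULL, consumer, &my))
        return 1;
    produce(&my, "metapages", 9, 1);    /* data is complete before mc_new is set */
    produce(&my, "data pages", 10, 1);
    produce(&my, NULL, 0, -1);          /* tell the consumer to finish */
    pthread_join(thr, NULL);
    return my.mc_done ? 0 : 1;
}
```

Note that the consumer always clears mc_new (and signals) on its way
out, including on the finish marker, which is the "set mc_new=0 before
exiting" point above.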
On 29/07/15 21:25, h.b.furuseth@usit.uio.no wrote:
> (...) And copythr should set mc_new=0 before exiting also when setting
> mc_status. so copyfd1 won't just sit and wait.

I forgot: it does clear mc_new before setting mc_status. The point is
that if it didn't keep the mutex, it would have to set mc_new
afterwards (after locking the mutex).
moved from Incoming to Software Bugs
Okay, that was one confused ITS in response to equally confused code.
Trying again. Draft fixes are in my updated branch "mdb/copyfd1"
(non-fast-forward).

The code is broken if mdb_env_copythr() has a write error:

* copyfd1() can (A) overrun the buffer at mc_wlen, and (B) hang, if
  cthr_toggle() is in cond_wait() when the error occurs: it sets
  mc_new and returns success. cwalk() then proceeds to write at
  mc_wbuf + mc_wlen, but copythr() did not reset mc_wlen. Then
  copyfd1() waits for the now-dead copythr() to clear mc_new.

  Fixes: (A) join the if(mc_status) and the while(mc_new==1) in
  cthr_toggle(). (B) Drop the wait before THREAD_FINISH(); it seems
  pointless, since both the Posix and the Windows doc say that
  THREAD_FINISH waits. Or, if Windows needs it for some reason, then
  copythr() must live until it receives mc_new = -1, so it can set
  mc_new = 0.

* copyfd1() loses any error from the final cthr_toggle().
  Fix: return rc ? rc : my.mc_status.

Tested on Linux with fake write errors, with and without the fixes.

While not a bug, I also see no point in both threads' initial waits:
copyfd1() can own the buffer initially (set mc_new = 0). I'm guessing
the waits are there because the code tries to guide the threads from
step to step, but it broke because you didn't think of every
possibility. It's easier to define and maintain one cond_wait state
per thread and keep things consistent there.

More on error checking: copyfd1() omits most error checking and the
cleanup after errors. E.g. it does not tell copythr() to finish if
txn_begin fails. Also, each thread is slow to react to errors in the
other thread. I/O will cease more quickly if both threads set and
check mc_status. If so, mc_status can no longer be part of the
cond_wait condition, since it won't be mutex-protected, so the
threads might chat a bit more before finishing. I'd rename
mc_status -> mc_error: "while (!my->mc_error && ...)" is quite
self-explanatory, unlike "while (!my->mc_status && ...)".

About lock/unlock outside vs. inside the loop: when outside,
copythr() must block for the threads to exchange buffers. When
inside, the slowest thread does not block, but there are more
lock/unlock/wait calls. Testing: locks inside are clearly faster on
tmpfs. On a regular FS I don't know - it may have been a bit slower,
but times varied more than the difference.

Finally, cwalk() could re-use the page buffers it mallocs.

-- Hallvard
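The error-path fixes above can be sketched in a simplified stand-in
for mdb.c (the struct and the deliberate "fail on buffer 2" are
illustrative, not the real code): the producer's toggle folds the
error flag into its wait condition, the writer always clears mc_new
and signals before it dies, and the caller does not discard the final
status.

```c
#include <pthread.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int mc_new;    /* 1 = buffer ready, -1 = finish, 0 = idle */
    int mc_error;  /* first failure from either thread; 0 = none */
    int nbufs;     /* demo only: data buffers consumed so far */
} err_ctx;

/* Producer side: publish a buffer (or -1 = finish) and wait for the
 * writer to take it.  Folding the error flag into the wait condition
 * is the spirit of fixes (A)/(B): an error raised while we sleep
 * ends the wait instead of leaving us blocked on a dead thread. */
static int toggle(err_ctx *my, int st)
{
    int rc;
    pthread_mutex_lock(&my->mutex);
    my->mc_new = st;
    pthread_cond_signal(&my->cond);
    while (!my->mc_error && my->mc_new == st)
        pthread_cond_wait(&my->cond, &my->mutex);
    rc = my->mc_error;
    pthread_mutex_unlock(&my->mutex);
    return rc;
}

/* Writer thread: "fails" on its second buffer to exercise the error
 * path.  It always clears mc_new and signals before exiting, so the
 * producer cannot end up waiting on a dead thread. */
static void *writer(void *arg)
{
    err_ctx *my = arg;
    for (;;) {
        pthread_mutex_lock(&my->mutex);
        while (my->mc_new == 0)
            pthread_cond_wait(&my->cond, &my->mutex);
        int st = my->mc_new;
        if (st == 1 && ++my->nbufs == 2)
            my->mc_error = 5;         /* fake write error, e.g. EIO */
        int err = my->mc_error;
        my->mc_new = 0;
        pthread_cond_signal(&my->cond);
        pthread_mutex_unlock(&my->mutex);
        if (st < 0 || err)
            return NULL;
    }
}

int demo_error(void)
{
    err_ctx my = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, 0 };
    pthread_t thr;
    int rc = 0, i;
    if (pthread_create(&thr, NULL, writer, &my))
        return -1;
    for (i = 0; i < 3 && rc == 0; i++)
        rc = toggle(&my, 1);          /* a real copy fills the buffer first */
    toggle(&my, -1);                  /* tell the writer to finish */
    pthread_join(thr, NULL);
    return rc ? rc : my.mc_error;     /* don't lose the final status */
}
```

demo_error() returns 5 here: the producer stops as soon as toggle()
reports the writer's failure, and the final "rc ? rc : mc_error"
preserves the status that the unfixed copyfd1() would drop.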
More: MDB_CP_COMPACT would drop a new, empty main DB. (I.e. the main
DB would lose its flags.)

Compacting can verify that the written metapage claims enough DB
pages. But not the opposite - it should not fail if the metapage
claims too many pages, since that would mean we can't compact a DB
which has suffered page leaks. We could detect that case and re-write
the metas at the end, if the user requests it and the file is
seekable.
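The check proposed here is just an inequality; a hypothetical sketch
(the parameter names are illustrative, not mdb.c's):

```c
/* After the compacting walk: the new metapage references pages
 * 0..claimed_last_pg, so at least claimed_last_pg + 1 pages must
 * actually have been written. */
static int compact_check(unsigned long claimed_last_pg,
                         unsigned long written_pages)
{
    if (written_pages < claimed_last_pg + 1)
        return -1;   /* corrupt output: the meta points past EOF */
    return 0;        /* extra pages (source page leaks) tolerated here */
}
```

Per the message above, only the "too few pages written" direction is
treated as a hard failure at this point.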
I wrote:
> Compacting can verify that the written metapage claims
> enough DB pages. But not the opposite - it should not
> fail if it claims too many pages since that would mean
> we can't compact a DB which has suffered page leaks.

Duh, yes the opposite too: if there has been a page leak, the root
page number written in the new meta will be wrong.

> We could detect that and re-write the metas at the end,
> if the user requests it and the file is seekable.

Though there is no reliable "is the file seekable?" test: lseek() can
do nothing but still return success on some devices.

Another (user) option: write a file which requires recovery. We'll
need something like that for incremental backups anyway. Write dummy
metapages at the beginning, with meta.mm_version indicating that the
file must be recovered. Write a final metapage at the end, or try to
seek back and rewrite the beginning if the user says so. Recovery
must be a separate program, or happen in mdb_env_open() while we
still know the metapage is at the end - i.e. before mapping and maybe
expanding the file.

-- Hallvard
changed notes changed state Open to Test
Some fixes in mdb.master
changed notes