Issue 7682 - LMDB: [PATCH] mdb_env_copy should retry open() if O_DIRECT fails
Summary: LMDB: [PATCH] mdb_env_copy should retry open() if O_DIRECT fails
Status: VERIFIED FIXED
Alias: None
Product: LMDB
Classification: Unclassified
Component: liblmdb (show other issues)
Version: unspecified
Hardware: All All
: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-09-05 23:11 UTC by sog@msg.com.mx
Modified: 2020-03-12 15:54 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description sog@msg.com.mx 2013-09-05 23:11:23 UTC
Full_Name: Salvador Ortiz
Version: 24
OS: Linux
URL: ftp://ftp.msg.com.mx/pub/varios/0001-In-mdb_env_copy-retry-open-if-O_DIRECT-fails.patch
Submission from: (NULL) (187.162.45.111)


If the OS defines O_DIRECT mdb_env_copy unconditionally uses it, but according
to open(2): "Some  file  systems  may  not implement the flag and open() will
fail with EINVAL if it is used."

In this cases mdb_env_copy should retry without the flag.

The attached patch implements that.
Comment 1 Howard Chu 2013-09-05 23:33:19 UTC
sog@msg.com.mx wrote:
> Full_Name: Salvador Ortiz
> Version: 24
> OS: Linux
> URL: ftp://ftp.msg.com.mx/pub/varios/0001-In-mdb_env_copy-retry-open-if-O_DIRECT-fails.patch
> Submission from: (NULL) (187.162.45.111)
>
>
> If the OS defines O_DIRECT mdb_env_copy unconditionally uses it, but according
> to open(2): "Some  file  systems  may  not implement the flag and open() will
> fail with EINVAL if it is used."
>
> In this cases mdb_env_copy should retry without the flag.
>
> The attached patch implements that.
>
>
Thanks, applied.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 Howard Chu 2013-09-06 00:33:51 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Comment 3 sog@msg.com.mx 2013-09-13 21:09:43 UTC
On 09/05/2013 06:33 PM, Howard Chu wrote:
> sog@msg.com.mx wrote:
>> Full_Name: Salvador Ortiz
>> Version: 24
>> OS: Linux
>> URL: 
>> ftp://ftp.msg.com.mx/pub/varios/0001-In-mdb_env_copy-retry-open-if-O_DIRECT-fails.patch
>> Submission from: (NULL) (187.162.45.111)
>>
>>
>> If the OS defines O_DIRECT mdb_env_copy unconditionally uses it, but 
>> according
>> to open(2): "Some  file  systems  may  not implement the flag and 
>> open() will
>> fail with EINVAL if it is used."
>>
>> In this cases mdb_env_copy should retry without the flag.
>>
>> The attached patch implements that.
>>
>>
> Thanks, applied.
>

Thank you Howard, but I found an unexpected problem with the patch as 
commited.
If the call fails because O_DIRECT, in a "tmpfs" in my Fedora 19 '/tmp', 
the file is left created anyway! so the retry fails because O_EXCL|O_CREAT.
I consider this undocumented behavior for "open", buggy or at least ugly.
(A simple test program in ftp://ftp.msg.com.mx/pub/varios/o_direct.c)

I propose the following:
========
--- a/liblmdb/mdb.c    2013-09-05 17:31:01.498090281 -0500
+++ b/liblmdb/mdb.c    2013-09-13 14:45:00.918717292 -0500
@@ -4250,9 +4250,16 @@
      /* The OS supports O_DIRECT, try with it */
      newfd = open(lpath, O_WRONLY|O_CREAT|O_EXCL|O_DIRECT, 0666);
      /* But open can fail if O_DIRECT isn't supported by the file system
-     * so retry without the flag
+     * so retry without the flag.
+     *
+     * Sept 13 2013.
+     * On a tmpfs, at least, an open with O_DIRECT fails,
+     * BUT the file is created anyway!!
+     * Fortunately seems that other errors are reported before EINVAL
+     * So, we need to remove it before retry open.
       */
      if (newfd == INVALID_HANDLE_VALUE && ErrCode() == EINVAL)
+        (void)unlink(lpath)  ,
  #endif
      newfd = open(lpath, O_WRONLY|O_CREAT|O_EXCL, 0666);
  #endif
=========

What do you think?

Regards.

Comment 4 Hallvard Furuseth 2013-09-14 08:27:45 UTC
On 2013-09-13 23:09, sog@msg.com.mx wrote:
> If the call fails because O_DIRECT, in a "tmpfs" in my Fedora 19 
> '/tmp',
> the file is left created anyway! so the retry fails because 
> O_EXCL|O_CREAT.

Also on RHEL 6.4 (2.6.32-358.18.1.el6.x86_64) with
/dev/shm/nosuch.  I'll report it to RedHat and see what they say.

> I propose the following:
> (...)
> +     * Fortunately seems that other errors are reported before EINVAL
> +     * So, we need to remove it before retry open.
>        */
>       if (newfd == INVALID_HANDLE_VALUE && ErrCode() == EINVAL)
> +        (void)unlink(lpath)  ,

No unlink(). EINVAL does not say whether the file existed
before the call. Not on another OS, nor on another Linux
version or filesystem where Linux may work differently.

Somewhat better: #ifdef <buggy Linux>, replace O_EXCL with
O_TRUNC in the 2nd call. Or add paranoia like taking an
exclusive lock, but ignore "locking not supported" errors.

>   #endif
>       newfd = open(lpath, O_WRONLY|O_CREAT|O_EXCL, 0666);
>   #endif

-- 
Hallvard

Comment 5 Hallvard Furuseth 2013-09-14 08:29:58 UTC
changed notes
changed state Test to Open
Comment 6 Hallvard Furuseth 2013-09-14 12:03:24 UTC
I wrote:
> Also on RHEL 6.4 (2.6.32-358.18.1.el6.x86_64) with
> /dev/shm/nosuch.  I'll report it to RedHat and see what they say.

https://bugzilla.redhat.com/show_bug.cgi?id=1008073

Simpler fix, at least on Linux:

#ifdef O_DIRECT
	if ((rc = fcntl(fd, F_GETFL)) != -1)
		(void) fcntl(fd, F_SETFL, rc | O_DIRECT);
#endif

But that could fail elsewhere, if some OS only accepts
O_DIRECT to open() and not to fcntl().

BTW, does the nearby Apple code need to fail if
fcntl(newfd, F_NOCACHE, 1) fails? Could ignore errors.

-- 
Hallvard

Comment 7 Hallvard Furuseth 2013-12-21 15:32:01 UTC
changed notes
changed state Open to Partial
Comment 8 Hallvard Furuseth 2013-12-21 23:44:47 UTC
This was fixed in LMDB 0.9.9, except:  If both O_DIRECT and
F_NOCACHE are #defined, both are used.  Looks a bit strange.
Maybe just one should be used in that case.  (Which?)  Also I'm
still wondering if env_open needs to fail if F_NOCACHE fails.

Anyway, maybe it should be something like this.  The "else {}"
is in case O_DIRECT is an unsupported dummy definiton:

#ifdef O_DIRECT
    if ((rc = fcntl(newfd, F_GETFL)) != -1)
        (void) fcntl(newfd, F_SETFL, rc | O_DIRECT);
    else
#endif
    {
#ifdef F_NOCACHE    /* __APPLE__ */
        (void) fcntl(newfd, F_NOCACHE, 1);
#endif
    }

-- 
Hallvard

Comment 9 Hallvard Furuseth 2013-12-21 23:49:19 UTC
I wrote:
> still wondering if env_open needs to fail if F_NOCACHE fails.

Oops - env_copy, not env_open.

-- 
Hallvard

Comment 10 Howard Chu 2013-12-24 17:18:43 UTC
h.b.furuseth@usit.uio.no wrote:
> This was fixed in LMDB 0.9.9, except:  If both O_DIRECT and
> F_NOCACHE are #defined, both are used.  Looks a bit strange.
> Maybe just one should be used in that case.  (Which?)  Also I'm
> still wondering if env_open needs to fail if F_NOCACHE fails.
>
> Anyway, maybe it should be something like this.  The "else {}"
> is in case O_DIRECT is an unsupported dummy definiton:

Isn't that in the wrong order then? What you suggest here will use O_DIRECT 
even if it's a dummy.

I don't see any reason to fail the operation. They just want to make a 
copy/backup; even if the optimization of avoiding pollution of the buffer 
cache can't be done, they still want the copy to happen.

> #ifdef O_DIRECT
>      if ((rc = fcntl(newfd, F_GETFL)) != -1)
>          (void) fcntl(newfd, F_SETFL, rc | O_DIRECT);
>      else
> #endif
>      {
> #ifdef F_NOCACHE    /* __APPLE__ */
>          (void) fcntl(newfd, F_NOCACHE, 1);
> #endif
>      }
>


-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 11 Hallvard Furuseth 2016-08-31 08:55:17 UTC
Sorry, looks like I dropped this message.

Yes, #ifdef F_NOCACHE / #elif defined O_DIRECT / #endif seems better.
Included in branch "mdb/fopen" in my repo at UiO.

On 24. des. 2013 18:18, Howard Chu wrote:
> h.b.furuseth@usit.uio.no wrote:
>> This was fixed in LMDB 0.9.9, except:  If both O_DIRECT and
>> F_NOCACHE are #defined, both are used.  Looks a bit strange.
>> Maybe just one should be used in that case.  (Which?)  Also I'm
>> still wondering if env_open needs to fail if F_NOCACHE fails.
>>
>> Anyway, maybe it should be something like this.  The "else {}"
>> is in case O_DIRECT is an unsupported dummy definiton:
>
> Isn't that in the wrong order then? What you suggest here will use O_DIRECT
> even if it's a dummy.
>
> I don't see any reason to fail the operation. They just want to make a
> copy/backup; even if the optimization of avoiding pollution of the buffer
> cache can't be done, they still want the copy to happen.
>
>> #ifdef O_DIRECT
>>      if ((rc = fcntl(newfd, F_GETFL)) != -1)
>>          (void) fcntl(newfd, F_SETFL, rc | O_DIRECT);
>>      else
>> #endif
>>      {
>> #ifdef F_NOCACHE    /* __APPLE__ */
>>          (void) fcntl(newfd, F_NOCACHE, 1);
>> #endif
>>      }

-- 
Hallvard

Comment 12 Hallvard Furuseth 2016-09-27 05:20:34 UTC
changed notes
changed state Partial to Test
Comment 13 OpenLDAP project 2018-02-09 18:53:46 UTC
Fixed in mdb.master
Fixed in 0.9.19
Comment 14 Quanah Gibson-Mount 2018-02-09 18:53:46 UTC
changed notes
changed state Test to Closed