Issue 7992 - lmdb: Windows: assume file paths are UTF-8, encode to UTF-16 for WinAPI and enable compiling when UNICODE is defined
Status: VERIFIED FIXED
Alias: None
Product: LMDB
Classification: Unclassified
Component: liblmdb
Version: unspecified
Hardware: All All
Importance: --- normal
Target Milestone: ---
Assignee: OpenLDAP project
Reported: 2014-12-05 11:32 UTC by oskari.timperi@iki.fi
Modified: 2020-03-12 15:55 UTC (History)
Description oskari.timperi@iki.fi 2014-12-05 11:32:10 UTC
Full_Name: Oskari Timperi
Version: 
OS: Windows 7
URL: ftp://ftp.openldap.org/incoming/oskari-timperi-141205.zip
Submission from: (NULL) (109.204.204.106)


The patches make some changes so that compiling lmdb when UNICODE is defined is
possible. This is achieved by using the A-versions of WinApi functions where
they are needed.

Also use CreateFileW to open files so that one can use exotic characters in
paths. The library interface is not modified, but the code makes an assumption
that paths passed to lmdb functions are encoded as UTF-8. The UTF-8 encoded
paths are encoded to UTF-16 which is then passed to CreateFileW.
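
The approach described above might look roughly like the sketch below. It is
not the submitted patch: touch_utf8_path is a hypothetical helper, the error
handling is deliberately minimal, and the non-Windows branch exists only so
that the sketch builds on other systems, where path bytes go to the OS as-is.

```c
#include <stdio.h>
#include <stdlib.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper (not part of LMDB): create or open the file named by a
 * UTF-8 encoded path, then close it again. Returns 0 on success, -1 on
 * failure. */
static int touch_utf8_path(const char *utf8_path)
{
#ifdef _WIN32
    HANDLE h;
    wchar_t *wpath;
    /* First call: ask how many UTF-16 units are needed, including the NUL. */
    int need = MultiByteToWideChar(CP_UTF8, 0, utf8_path, -1, NULL, 0);
    if (need == 0)
        return -1;
    wpath = malloc(sizeof(wchar_t) * (size_t)need);
    if (wpath == NULL)
        return -1;
    /* Second call: perform the UTF-8 to UTF-16 conversion. */
    if (MultiByteToWideChar(CP_UTF8, 0, utf8_path, -1, wpath, need) == 0) {
        free(wpath);
        return -1;
    }
    /* The W variant takes UTF-16, so it is unaffected by the ANSI code page. */
    h = CreateFileW(wpath, GENERIC_READ | GENERIC_WRITE,
                    FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                    OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    free(wpath);
    if (h == INVALID_HANDLE_VALUE)
        return -1;
    CloseHandle(h);
    return 0;
#else
    /* Assumption for this sketch: elsewhere the bytes are passed unchanged. */
    FILE *f = fopen(utf8_path, "ab");
    if (f == NULL)
        return -1;
    fclose(f);
    return 0;
#endif
}
```

A caller would pass the path as UTF-8 bytes, for example
touch_utf8_path("caf\xc3\xa9.mdb"), regardless of the process code page.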
Comment 1 Howard Chu 2015-11-20 01:16:25 UTC
oskari.timperi@iki.fi wrote:
> Full_Name: Oskari Timperi
> Version:
> OS: Windows 7
> URL: ftp://ftp.openldap.org/incoming/oskari-timperi-141205.zip
> Submission from: (NULL) (109.204.204.106)
>
>
> The patches make some changes so that compiling lmdb when UNICODE is defined is
> possible. This is achieved by using the A-versions of WinApi functions where
> they are needed.

Thanks, this part of the patch was done in merging ITS#8069.
>
> Also use CreateFileW to open files so that one can use exotic characters in
> paths. The library interface is not modified, but the code makes an assumption
> that paths passed to lmdb functions are encoded as UTF-8. The UTF-8 encoded
> paths are encoded to UTF-16 which is then passed to CreateFileW.

This patch is now in git mdb.master.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/

Comment 2 Howard Chu 2015-11-20 01:16:45 UTC
changed notes
changed state Open to Test
moved from Incoming to Software Bugs
Comment 3 Hallvard Furuseth 2015-12-14 21:31:18 UTC
On 20/11/15 02:16, hyc@symas.com wrote:
>> Also use CreateFileW to open files so that one can use exotic characters in
>> paths. The library interface is not modified, but the code makes an assumption
>> that paths passed to lmdb functions are encoded as UTF-8. The UTF-8 encoded
>> paths are encoded to UTF-16 which is then passed to CreateFileW.
>
> This patch is now in git mdb.master.

Needs cleanup:

utf8_to_utf16() ignores errors in malloc/MultiByteToWideChar().
utf8_to_utf16()'s callers ignore error returns.

utf8_to_utf16() can return EILSEQ, an errno code not listed in
mdb_strerror(). It should return a Windows error code or a new
MDB_<SOMETHING> code.

(LMDB returns LMDB codes or system error codes. The latter are
errno codes on Unix and Windows codes on Windows - except
mdb_strerror's hardcoded errno codes which we plan to get rid of.)


Comment 4 Hallvard Furuseth 2016-02-03 00:39:25 UTC
This is still not fixed: The EILSEQ, and a new memleak.
I rewrote it in my branch "mdb/its7992".  Untested.

Comment 5 Hallvard Furuseth 2016-02-03 18:31:29 UTC
On 03/02/16 01:39, h.b.furuseth@usit.uio.no wrote:
> This is still not fixed: The EILSEQ, and a new memleak.
> I rewrote it in my branch "mdb/its7992".  Untested.

Oops, shouldn't have done some late-night code and forgotten
to comment it. The important changes are:
* "goto leave" - fixes a memleak on failure.
* New error code for mdb_strerror(), since EILSEQ is an errno.

The rest is just twiddling around:
* Drop the always-unused size params to utf8_to_utf16().
* goto fail -- just cleanup, fewer exit points from function.
* Looping adds an error check and saves a few bytes object code:-)
* Move function inside Doxygen group 'internal'
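
For readers unfamiliar with the pattern, the "goto leave" cleanup and the
size-then-fill loop listed above could be sketched as follows. This is an
illustration, not the code in the mdb/its7992 branch: utf8_to_wide, CONVERT
and the CONV_* codes are hypothetical names, the MB_ERR_INVALID_CHARS flag is
a choice made for this sketch, and the ASCII-only stand-in exists only so the
example builds off Windows.

```c
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#define CONVERT(src, dst, n) \
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, (src), -1, (dst), (n))
#else
/* Stand-in with the same calling contract, ASCII only: returns the unit
 * count including the NUL, or 0 on failure. */
static int CONVERT(const char *src, wchar_t *dst, int n)
{
    int i, len = 0;
    while (src[len] != '\0') {
        if ((unsigned char)src[len] >= 0x80)
            return 0;               /* treat non-ASCII as unconvertible */
        len++;
    }
    len++;                          /* count the terminating NUL */
    if (n == 0)
        return len;                 /* size query only */
    if (n < len)
        return 0;                   /* buffer too small */
    for (i = 0; i < len; i++)
        dst[i] = (wchar_t)(unsigned char)src[i];
    return len;
}
#endif

/* Hypothetical result codes for this sketch only. */
enum { CONV_OK = 0, CONV_BAD_INPUT = 1, CONV_NO_MEMORY = 2 };

/* On success stores a malloc'ed, NUL-terminated string in *out. */
static int utf8_to_wide(const char *src, wchar_t **out)
{
    wchar_t *buf = NULL;
    int need = 0, rc;

    for (;;) {
        /* Pass 1 (buf == NULL, need == 0) sizes the result; pass 2 fills it.
         * The same error check covers both passes. */
        need = CONVERT(src, buf, need);
        if (need == 0) {
            rc = CONV_BAD_INPUT;
            goto leave;
        }
        if (buf != NULL)
            break;
        buf = malloc(sizeof(wchar_t) * (size_t)need);
        if (buf == NULL) {
            rc = CONV_NO_MEMORY;
            goto leave;
        }
    }
    *out = buf;
    buf = NULL;        /* ownership passes to the caller */
    rc = CONV_OK;
leave:
    free(buf);         /* no-op on success, releases the buffer on failure */
    return rc;
}
```

With a single exit label, a failure on the second pass cannot skip the
free(), which is the kind of leak the first bullet refers to.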


Comment 6 Hallvard Furuseth 2016-08-27 10:05:13 UTC
Seems the test for MultiByteToWideChar() == 0xFFFD can be dropped.

The MultiByteToWideChar doc mentions U+FFFD under Return Value.
But when I test bad UTF-8 characters, it inserts 0xFFFD for the
bad character(s) in the output string.  It doesn't return 0xFFFD.
Or if we pass the MB_ERR_INVALID_CHARS flag, it returns 0 and sets
GetLastError() as usual.

(I wonder if I had a hand in this code - looks like my kind of
thing to write a draft from the Windows doc but leave the
Windows testing to someone else:-)
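
The two ways of noticing bad input mentioned above could be sketched like
this. Both function names are hypothetical and neither is LMDB code; the
strict check is compiled only on Windows because it relies on
MultiByteToWideChar.

```c
#include <stddef.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Count U+FFFD units in an already converted, NUL-terminated string. Without
 * MB_ERR_INVALID_CHARS, this is where bad input shows up: in the output
 * buffer, not in the return value. */
static size_t count_replacement_chars(const wchar_t *s)
{
    size_t n = 0;
    for (; *s != L'\0'; s++)
        if (*s == (wchar_t)0xFFFD)
            n++;
    return n;
}

#ifdef _WIN32
/* Returns 1 if the API accepts src as UTF-8, 0 otherwise. With the flag set,
 * a failed call returns 0 and GetLastError() gives the reason. */
static int is_valid_utf8_strict(const char *src)
{
    return MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                               src, -1, NULL, 0) != 0;
}
#endif
```

Scanning the output is ambiguous, since a path may legitimately contain
U+FFFD, so passing the flag is probably the more reliable of the two.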


Comment 7 OpenLDAP project 2018-02-09 18:54:08 UTC
in mdb.master
Fixed in 0.9.18
Comment 8 Quanah Gibson-Mount 2018-02-09 18:54:08 UTC
changed notes
changed state Test to Closed