LMDB and text encoding

To: openldap-devel@openldap.org
Subject: LMDB and text encoding
From: Timur Kristóf <timur.kristof@gmail.com>
Date: Tue, 27 Jan 2015 22:39:33 +0100

Hi Everyone,

I've been talking to Howard about this and he suggested that I post it
to this mailing list. I recently noticed two things about how LMDB
works with various encodings, and I think they are worth discussing.

1. Database names

mdb_dbi_open treats its name parameter as a C string. In practice this
means UTF-8 on Unix systems and the ANSI code page on Windows, which is
problematic for cross-platform applications.

My suggestion is to create a variant of this function that also
accepts a length parameter (or just use MDB_val) so that instead of
treating it as a C string, it would treat it like a series of bytes,
allowing the user to use the encoding of their choice.
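
A minimal sketch of the difference, not LMDB code: name_val is a
hypothetical stand-in with the same shape as MDB_val, and the helper
functions are made up for illustration. It shows two names encoded as
UTF-16LE that a C-string interface cannot tell apart, because it stops
at the first NUL byte, while a length-carrying interface can.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with the same shape as LMDB's MDB_val. */
typedef struct {
    size_t size;       /* length of the name in bytes */
    const void *data;  /* name bytes, in any encoding */
} name_val;

/* "db" and "dc" encoded as UTF-16LE: both contain embedded NUL bytes. */
static const unsigned char NAME_DB[] = { 0x64, 0x00, 0x62, 0x00 };
static const unsigned char NAME_DC[] = { 0x64, 0x00, 0x63, 0x00 };

/* Length that a C-string API sees: it stops at the first NUL byte. */
size_t cstring_name_len(const void *name)
{
    assert(name != NULL);
    return strlen((const char *)name);
}

/* Byte-wise comparison that honours the explicit length. */
int name_equal(name_val a, name_val b)
{
    if (a.size != b.size)
        return 0;
    return a.size == 0 || memcmp(a.data, b.data, a.size) == 0;
}

/* Returns 1 if the two example names look identical as C strings. */
int examples_collide_as_cstrings(void)
{
    size_t la = cstring_name_len(NAME_DB);
    size_t lb = cstring_name_len(NAME_DC);
    return la == lb && memcmp(NAME_DB, NAME_DC, la) == 0;
}

/* Returns 1 if the two example names are equal as length-prefixed bytes. */
int examples_equal_as_bytes(void)
{
    name_val a = { sizeof NAME_DB, NAME_DB };
    name_val b = { sizeof NAME_DC, NAME_DC };
    return name_equal(a, b);
}
```

Seen as C strings, both names are truncated to the single byte "d" and
collide; compared as four-byte values they stay distinct.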

2. Path names

Functions like mdb_env_open, mdb_env_get_path, mdb_env_copy and the
like accept a char* for path names. This is fine on most Unix systems,
where char* is a UTF-8 string. On Windows, however, these functions
call the ANSI variants of the Windows API functions, which makes it
impossible to use Unicode path names with them.

I think we should switch to the widechar APIs instead, but that would
also mean changing the LMDB API to accept a wchar_t* parameter on
Windows instead of char*.
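
A rough sketch of what such a platform-dependent signature could look
like. The names path_char, PATH_LIT and path_len are hypothetical and
not part of LMDB; on Windows, a real implementation would presumably
hand the wide string to a W-suffixed call such as CreateFileW.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical path character type: wide on Windows, narrow elsewhere. */
#ifdef _WIN32
typedef wchar_t path_char;
#define PATH_LIT(s) L##s   /* turns "x" into L"x" */
#else
typedef char path_char;
#define PATH_LIT(s) s
#endif

/* Hypothetical helper: length of a path in path_char units. */
size_t path_len(const path_char *path)
{
    assert(path != NULL);
#ifdef _WIN32
    return wcslen(path);
#else
    return strlen(path);
#endif
}
```

A caller would then write path_len(PATH_LIT("testdb")) on every
platform. The cost of this design is that the public signature differs
between Windows and everything else.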


What do you guys think about all this?


Best regards,
Timur Kristóf

Follow-Ups:
- Re: LMDB and text encoding
  - From: Timur Kristóf <timur.kristof@gmail.com>
- Re: LMDB and text encoding
  - From: Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no>
