Re: [ldapext] UTF-8 full support in LDIF / LDIF v2
On Jun 10, 2009, at 10:34 PM, Yves Dorfsman wrote:
Kurt Zeilenga wrote:
There are a number of problems with it. Personally, I think what
Steven already offered (and likely implemented) is better, though I
am
My problem with Steven's solution is that it is half LDIF, half XML.
As I have mentioned earlier, I think XML has its place, and maybe
DSML should be fixed or re-invented, but for other applications, I
find the simplicity of LDIF an advantage; unfortunately, having to
base64 encode anything that's not 7 bit ASCII takes away some of its
simplicity.
concerned about line separators. As Howard's comments kind of
suggest, when you have a value which is multi-lined,
I have never run into the situation where I needed a multi-line
value in an LDAP directory and was surprised by the need, but Steven
brought this up earlier in the thread and said that he has a real-
world need for it, and that the lack of a syntax for it in my
proposition for an updated LDIF format was an issue.
The problem is that which line separator to use is syntax specific (or
possibly attribute value convention specific). For instance, an LDAP
syntax could say multiple lines are to be separated by a particular
set of code points (such as '$'), or it could simply be a convention
that an attribute uses a particular set of characters.
To convert a LDIF specific line separator to an attribute value line
separator requires not only knowledge of the LDAP schema, but
knowledge of the attribute value conventions not expressed in the LDAP
schema.
LDIF however was designed to allow a mechanical conversion of LDIF to
LDAP PDUs without such knowledge. Requiring implementations to have
additional knowledge is quite problematic.
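To illustrate the point about syntax-specific separators, here is a minimal Python sketch. It assumes, as one example of my choosing, the Postal Address syntax of RFC 4517, where lines within a value are separated by '$' and literal '$' and '\' are escaped. The function names are hypothetical. The point is only that a converter has to know which syntax it is targeting before it can map file line breaks to value line breaks.

```python
def to_postal_address(lines):
    """Join text lines into one value, using '$' as the in-value line separator.

    Assumes the RFC 4517 Postal Address conventions; other syntaxes or
    attribute-specific conventions would need different rules.
    """
    def escape(line):
        # Escape the backslash first so the escape added for '$' is not re-escaped.
        return line.replace("\\", "\\5C").replace("$", "\\24")

    return "$".join(escape(line) for line in lines)


# Line breaks as they happen to appear in some input file (here CRLF).
file_text = "1234 Main St.\r\nAnytown, CA 12345\r\nUSA"

# The file's separator is discarded; the value gets the syntax's separator.
print(to_postal_address(file_text.splitlines()))
# prints: 1234 Main St.$Anytown, CA 12345$USA
```

A generic LDIF reader has no way to know that this mapping applies to one attribute and not to another without schema knowledge.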
The problem with your proposal, and Steven's, is that LDIF line
separators and value line separators are one and the same thing.
While that might be the case occasionally, it cannot be expected to be
generally the case.
On the contrary, both Steven's solution and mine separate the lines
but do not impose a line separator. Steven delimits his line with
the <item></item> syntax,
I think you confuse XML elements with line separators in XML data.
Two very different things.
Steven's proposal represents line separators in the XML data using the
<SEP> production.
while I let the user choose any line separator out of the half dozen
that have been used throughout the history of computing.
The problem here is how line separators in LDIF relate to line
separators in the value.
Your approach assumes that whatever line separator the user chooses to
use in the LDIF file is valid per the LDAP value syntax and any
attribute type specific restrictions.
Our syntaxes are clear enough to let the import process know that
those are separate lines, and the import process or the LDAP server
can choose whichever line separator it wants.
That requires LDAP schema and attribute type specific restrictions
knowledge.
Making the line separator part of the data will create cross-
platform issues.
Yes, but this is what your proposal seems to do. (See below).
The LDAP server or actually the LDAP client should choose which line
separator to use for its context/platform.
Today, LDIF line separators (<SEP>) are not part of the LDAP value.
That is,
foo: X
 Y
is equivalent to:
foo: XY
That is, the LDAP value is merely wrapped over multiple LDIF lines.
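The RFC 2849 folding rule can be sketched in a few lines of Python. This is only an illustration with a hypothetical function name, not a full LDIF parser: a physical line that begins with a single space continues the previous line, and the line separator plus that one space are dropped.

```python
import re


def unfold(ldif_text):
    """Return logical LDIF lines, joining folded continuation lines."""
    logical = []
    # RFC 2849 allows CR LF or LF as the line separator.
    for physical in re.split(r"\r?\n", ldif_text):
        if physical.startswith(" ") and logical:
            # Continuation: drop exactly one leading space and append.
            logical[-1] += physical[1:]
        else:
            logical.append(physical)
    return logical


print(unfold("foo: X\n Y"))
# prints: ['foo: XY']
```

No schema knowledge is needed here, which is the property of the current format being defended above.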
Now maybe you meant:
foo:<<EOT
X
Y
EOT
to also be equivalent to:
foo: XY
I don't see any value in offering yet another way to line wrap an LDAP
value.
I took your proposal as representing foo attribute value "X<SEP>Y"
where <SEP> was the sequence of characters used in the LDIF to
separate X and Y. This is problematic.
Adding UTF-8 support does not appear to be in support of improving LDIF
as a proper interchange format. It seems to be driven by other
goals, such as trying to make LDIF files displayable.
Yes and no. My main reason for pushing this is diffing.
Diffing requires knowledge of LDAP schema. One might store "foo" and
get back "FOO" (or any other equivalent value) [See LDAP's data
preservation requirements].
You run into a problem and you want to diff the original and the
problematic LDIF export of your directory. Having half of your LDIF
file base64 encoded makes it a lot more difficult to pin point the
problem.
As Michael noted, there exist LDIF diffing tools (most of which are
likely not schema aware, and hence show equivalent values as being
different).
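A small Python sketch of why a schema-unaware diff over-reports differences. The comparison function is a deliberately simplified stand-in for a case-ignore matching rule (an assumption for illustration; real LDAP matching also involves string preparation per RFC 4518), and both function names are hypothetical.

```python
def naive_equal(a, b):
    """What a plain text diff effectively does: exact comparison."""
    return a == b


def case_ignore_equal(a, b):
    """Simplified stand-in for a case-ignore matching rule.

    Folds case and collapses runs of whitespace; this is not a complete
    implementation of any LDAP matching rule.
    """
    def prepare(s):
        return " ".join(s.split()).casefold()

    return prepare(a) == prepare(b)


stored, returned = "foo", "FOO"
print(naive_equal(stored, returned))        # prints: False
print(case_ignore_equal(stored, returned))  # prints: True
```

Which comparison applies depends on the attribute type's equality matching rule, so the tool needs the schema to choose.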
If you are right, that LDIF is purely for exchanging information
between applications, never to be looked at by humans, then why is
the current version so human friendly ?
I never claimed LDIF would not be looked at by humans.
I have stated that LDIF, with ASCII restrictions, already suffers from
some display/editing issues, namely due to line separator issues. Lifting
the ASCII restriction will make these matters far worse. (Line
separators are the tip of the displayability/editability iceberg.)
I'm not convinced that removing the ASCII restrictions will be a
good thing. Not only do I doubt it will have a net positive on
displayability of LDIF for those who have a displayability goal (I
don't have this goal), I think it will have a net negative impact on
interoperability and user confusion, such as when the user creates
a file using one Unicode normalization algorithm, but is trying to
set values which require a different Unicode normalization value.
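The normalization concern can be made concrete with Python's standard unicodedata module. This is only a sketch: the same visible character yields different UTF-8 bytes depending on the normalization form the editor happened to use.

```python
import unicodedata

# "e with acute accent", written with an escape to avoid ambiguity in this text.
composed = unicodedata.normalize("NFC", "\u00e9")    # single code point U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")  # 'e' followed by U+0301

print(composed.encode("utf-8"))    # prints: b'\xc3\xa9'
print(decomposed.encode("utf-8"))  # prints: b'e\xcc\x81'
print(composed == decomposed)      # prints: False
```

Both forms usually render identically on screen, so a user editing raw UTF-8 in an LDIF file would not see which one is being sent to the server.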
How so ?
In the current version, you have to encode your Unicode to UTF-8,
and then encode it again to base64. With my proposal, you would get
the exact same UTF-8 strings as you do today, but they would not be
(or would not have to be) encoded in base64.
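For reference, the two-step encoding described here looks roughly like this in Python. It is a sketch with hypothetical function names, and the safety check is my reading of the RFC 2849 SAFE-STRING rules (first character not space, colon or '<'; no NUL, CR or LF; all bytes 7-bit; no trailing space).

```python
import base64


def is_safe_string(raw):
    """Approximate check that a value may appear unencoded in RFC 2849 LDIF."""
    if not raw:
        return True
    if raw[0] in b" :<":    # disallowed as the first character
        return False
    if raw.endswith(b" "):  # a trailing space calls for base64
        return False
    return all(b < 128 and b not in (0, 10, 13) for b in raw)


def ldif_line(attr, value):
    """Emit 'attr: value', or 'attr:: base64' when the value is not safe."""
    raw = value.encode("utf-8")  # step 1: Unicode to UTF-8
    if is_safe_string(raw):
        return f"{attr}: {value}"
    encoded = base64.b64encode(raw).decode("ascii")  # step 2: UTF-8 to base64
    return f"{attr}:: {encoded}"


print(ldif_line("cn", "Yves"))       # prints: cn: Yves
print(ldif_line("cn", "Ren\u00e9"))  # prints: cn:: UmVuw6k=
```

Under the proposal being discussed, the second line would instead carry the raw UTF-8 bytes of the value.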
I see two kinds of problems.
1) This would result in LDIF files which programs designed to display
UTF-8 encoded Unicode text will not be able to display. There is a
user expectation that LDIF files be displayable. With the current
LDIF format, we do have some display issues (e.g., line separators),
but they are limited. If we remove the ASCII restrictions, we'll run
into a wide range of display issues.
2) Today we have some separation between (non-ASCII) Unicode LDAP
attribute values and their LDIF representation. This separation, I
think, has some value in that it instills that LDIF syntactic
requirements and LDAP attribute value syntax requirements are
independent of each other. Removing this separation, I think, will
lead to user confusion.
This is not a rebuttal of your argument, I am truly interested in
understanding what you mean here (in the same way I was glad
somebody brought up the issue of Right To Left characters, as I had
not thought about it). Maybe it is a problem that we can address ?
Keep what separation we do have between LDIF representation of an
attribute value and the LDAP syntax for the attribute value.
if so, should we help Steven with the xmled RFC ?
What Steven and Andrew have done is define an extension for LDIF to
allow XML values to be represented in a human-readable format
instead of requiring the use of base64. Unfortunately his
proposal has interchange issues (see the I-D's security
considerations section). This, I think, is a fatal problem with
this extension.
So really this is the issue, should the value of the line separator
be part of the data, or should everybody (LDIF importers/exporters,
LDAP servers, LDAP clients) treat multi-line entries as just that,
several lines, and choose their own line separator ? (in case I
wasn't clear earlier, I am in favour of the latter).
In RFC 2849, <SEP> is never part of the LDAP attribute value.
In the ELDIF extension, certain <SEP>s are part of the LDAP attribute
value. This, I think, is problematic.
--
Yves.
http://www.sollers.ca/
_______________________________________________
Ldapext mailing list
Ldapext@ietf.org
https://www.ietf.org/mailman/listinfo/ldapext