Re: [ldapext] UTF-8 full support in LDIF / LDIF v2
On Jun 17, 2009, at 11:57 PM, Steven Legg wrote:
Kurt,
Kurt Zeilenga wrote:
On Jun 16, 2009, at 4:54 PM, Steven Legg wrote:
Also, X.500 directories already lose something in the translation when outputting as LDIF. For example, the choice in a DirectoryString is lost, and if that choice is teletexString then transcoding wipes out the exact octet encoding. Most of the ad-hoc LDAP string encodings are lossy in some respect. Such changes are tolerable because the resulting value in LDIF is the same as far as the matching rules are concerned. Unicode normalization of the extended LDIF output is a similar situation.
The loss you are talking about is inherent in LDAP, not LDIF. That is, LDIF does not lose anything (for the LDAP requests it is designed to represent) in translation to/from LDAP. I don't think it is tolerable for an LDAP intermediate format to "lose" LDAP information.
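To make the "does not lose anything" point concrete: LDIF (RFC 2849) writes a value literally after "attr: " only when it is a SAFE-STRING (ASCII with no NUL, CR or LF, not starting with SPACE, ":" or "<"), and RFC 2849 also says values ending in SPACE SHOULD be base64-encoded; anything else goes out as base64 after "attr:: ". The following is a minimal sketch of that rule; the helper names are hypothetical, and line folding, comments and the "attr:< URL" form are left out.

```python
import base64
from typing import Tuple

# RFC 2849 SAFE-CHAR: octets 0x01-0x7F except LF (0x0A) and CR (0x0D).
SAFE_CHARS = set(range(0x01, 0x80)) - {0x0A, 0x0D}
# RFC 2849 SAFE-INIT-CHAR: SAFE-CHAR minus SPACE, ':' and '<'.
SAFE_INIT_CHARS = SAFE_CHARS - {0x20, 0x3A, 0x3C}


def is_safe_string(value: bytes) -> bool:
    """True if the value may follow 'attr: ' literally."""
    if not value:
        return True
    if value[0] not in SAFE_INIT_CHARS:
        return False
    if value.endswith(b" "):  # RFC 2849: values ending in SPACE SHOULD be base64-encoded
        return False
    return all(b in SAFE_CHARS for b in value[1:])


def format_attrval(attr: str, value: bytes) -> str:
    """Hypothetical helper: one unfolded LDIF attrval-spec line."""
    if is_safe_string(value):
        return f"{attr}: {value.decode('ascii')}"
    return f"{attr}:: {base64.b64encode(value).decode('ascii')}"


def parse_attrval(line: str) -> Tuple[str, bytes]:
    """Hypothetical helper: inverse of format_attrval."""
    attr, _, rest = line.partition(":")
    if rest.startswith(":"):
        return attr, base64.b64decode(rest[1:].lstrip(" "))
    return attr, rest.lstrip(" ").encode("ascii")


# Any octet string survives the round trip unchanged, including values a
# "text" tool might alter (non-ASCII, combining marks, CRLF, trailing space).
for v in [b"Babs Jensen", "e\u0301".encode("utf-8"), b"a\r\nb", b"x "]:
    line = format_attrval("cn", v)
    assert parse_attrval(line) == ("cn", v)
    print(line)
```

The cost of this losslessness is that non-ASCII text is not human-readable in the file, which is the trade-off this thread is about.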
First, for loss, let's only talk about loss between the data format and LDAP requests. Loss between the data format and DAP or some other protocol ought to be beyond the scope of an LDAP data format. Loss of the DirectoryString CHOICE is an LDAP issue, not an LDAP data format issue.
So when I respond to your reply, I'm only going to consider loss of
information between the file and protocol representations of the LDAP
request.
A "here" document mechanism where the UTF-8 character sequence between the end of the introducer and the beginning of the terminator is the literal directory attribute value without any modification would satisfy the no-loss requirement.
That depends on the particulars of the value and of the "here" mechanism. For instance, line separators are problematic in the mechanism suggested by Yves.
We could always recommend base64-encoding the entire file to transfer it across the network, or at least treating it as a binary file, or suggest using LDIFv1 instead for such purposes.
For me, the convenience of being able to easily view, edit or compose the content in files
Have you tried to view/edit the UTF-8 file I sent out?
I'm uploading outweighs the need to be careful about
which text handling tools I use.
You will have to be quite careful in the tools you use. If they are "text" tools, you'll have problems, because the LDIF file is not restricted to text. And the tools' encoding of values as text might be problematic (wrong normalization on a per-value basis), etc.
For instance, any "text" tool which normalizes its output is generally
unsuitable because that normalization will be incorrect for some set
of values.
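A short illustration of why no single normalization is correct for every value, using two arbitrary example strings: "\u00e9" and "e\u0301" are canonically equivalent, but they are different code point sequences with different UTF-8 octets. Whichever form a tool normalizes to, it rewrites the values stored in the other form.

```python
import unicodedata

# Two canonically equivalent values that are distinct octet strings.
composed = "\u00e9"     # é as a single code point (the NFC form)
decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT (the NFD form)

print(composed.encode("utf-8"))    # b'\xc3\xa9'
print(decomposed.encode("utf-8"))  # b'e\xcc\x81'

# A tool that normalizes its output to NFC rewrites the second value, so
# the octets sent in the LDAP request no longer match the original.
rewritten = unicodedata.normalize("NFC", decomposed)
print(rewritten == decomposed)  # False

# A tool that normalizes to NFD damages the first value instead.
print(unicodedata.normalize("NFD", composed) == composed)  # False
```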
If others have similar or overlapping requirements, it would be better for us to use the same specification even if it never becomes a proposed standard RFC.
I wouldn't object to experimenting here, so long as it is not confused with standard LDIF.
But if one is going to try to solve a particular problem within the standards community, they should try to solve it generally. That is, we should dismiss solutions which only work well for limited subsets of Unicode.
Would you feel better if we called it the LDAP Data Composition Format (LDCF) instead of LDIFv2 or Extended LDIF?
Well, if one were to have an experiment to develop an LDAP data format in which the whole content is, say, Net-Unicode, calling it something other than LDIF would be good to avoid confusing it with LDIF. Things to tackle would include:
- how to represent text values needing to be in a different Unicode normalization (in the directory) than the required normalization,
- how to represent non-text Unicode values (arbitrary sequences of Unicode code points),
- if Net-Unicode line separators are to be considered part of the value, how to represent values requiring other line separators, and
- bidirectional values.
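As a rough illustration of the first and third items, a hypothetical helper can flag values that could not be written verbatim into a file whose entire content must be Net-Unicode. It assumes a simplified reading of RFC 5198 (text in NFC, lines ending in CRLF) and ignores that RFC's other rules; the second and fourth items (non-text values, bidirectional text) are not something a check like this can detect.

```python
import unicodedata
from typing import List


def literal_problems(value: str) -> List[str]:
    """Hypothetical helper: reasons `value` could not be written verbatim
    into a file that must be Net-Unicode (simplified: NFC text, lines
    ending in CRLF; RFC 5198's other rules are not checked)."""
    problems = []
    # Item 1: the directory value is not in the file's required normalization.
    if not unicodedata.is_normalized("NFC", value):
        problems.append("not in NFC")
    # Item 3: after removing proper CRLF pairs, any remaining CR or LF is a
    # line separator the file format could not carry literally.
    leftover = value.replace("\r\n", "")
    if "\r" in leftover or "\n" in leftover:
        problems.append("line separator other than CRLF")
    return problems


for v in ["caf\u00e9", "cafe\u0301", "line 1\nline 2", "line 1\r\nline 2"]:
    print(repr(v), literal_problems(v))
```

Detecting such values is the easy part; the open question raised above is how the format would then convey them without loss.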
Note: not requiring a particular normalization in the file is not a solution to the first item, as that merely leaves it up to tools as to which normalization (if any) to apply. So the first item just becomes: how to convey the normalization algorithm required for the value when it is represented in the LDAP request.
These are not simple problems and are unlikely to be solved well on the first publication, hence the experimental track does seem appropriate for this feature.
I would like LDIF standard extensions work to be limited to improving data interchange capabilities (such as representation of LDAP responses), which is a much more straightforward problem, something we are far more likely to get right on the first publication.
-- Kurt
_______________________________________________
Ldapext mailing list
Ldapext@ietf.org
https://www.ietf.org/mailman/listinfo/ldapext