Re: [ldapext] UTF-8 full support in LDIF / LDIF v2
On Jun 16, 2009, at 12:32 PM, Howard Chu wrote:
> Not to mention that to implement this properly will require complete
> schema knowledge at the time the LDIF is generated. (Otherwise, how
> do you distinguish a genuine octetString value, which cannot be
> safely represented in UTF-8, from a directoryString value...)
Well, one could scan the value to see whether its octets form a valid
UTF-8 sequence of valid Unicode code points, just as today most
implementations scan data to check that its octets fall within
SAFE-STRING. The significant difference is that the check is
straightforward in LDIFv1, as it's an octet-by-octet check. But if we
allow UTF-8 sequences of valid Unicode code points, each octet of the
value must be checked to see that it's part of a valid UTF-8 sequence,
and each UTF-8 sequence checked to see if it encodes a valid Unicode
code point. And then wrapping becomes more complicated, etc.
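
To make the contrast concrete, here is a minimal sketch of the two checks. The function names are made up for illustration, the SAFE-STRING test follows the RFC 2849 grammar only, and the UTF-8 test simply leans on Python's strict decoder rather than any agreed "LDIF v2" rule.

```python
def is_safe_string(value: bytes) -> bool:
    """LDIFv1-style check: every octet is judged on its own."""
    if not value:
        return True  # SAFE-STRING may be empty
    if value[0] in (0x20, 0x3A, 0x3C):  # SPACE, ':' and '<' may not lead
        return False
    # Any octet <= 127 except NUL, LF and CR is a SAFE-CHAR.
    return all(b < 0x80 and b not in (0x00, 0x0A, 0x0D) for b in value)


def is_valid_utf8(value: bytes) -> bool:
    """Hypothetical v2-style check: octets only make sense as sequences."""
    try:
        # The strict decoder rejects truncated and overlong sequences,
        # encoded surrogates, and anything above U+10FFFF.
        value.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Note that a binary value such as b"\xc3\xb6" passes is_valid_utf8() even if the attribute is really an octetString, so a scan like this cannot stand in for schema knowledge.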
And even with all of that, LDIF would still not be well-formed Unicode
text. And even if we solved that (by even more complex restrictions
on what Unicode code point sequences can be represented as UTF-8
instead of base64-encoded UTF-8), we'd have the problem of unintended
Unicode transformations in transporting LDIF. (We have this problem
with LDIFv1, but it's generally limited to end-of-line characters.
With UTF-8, data will be impacted. For instance, consider MUAs (or
the like) that might convert (on send or receive) text to Net-Unicode.)
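
As a small illustration of such a transformation, the sketch below assumes the transport normalizes text to NFC, as a conversion to Net-Unicode might. The function name transport_nfc is hypothetical; it only stands in for whatever the MUA does.

```python
import unicodedata


def transport_nfc(value: str) -> str:
    """Stand-in for a transport that normalizes text to NFC (an assumption)."""
    return unicodedata.normalize("NFC", value)


decomposed = "o\u0308"  # 'o' followed by COMBINING DIAERESIS
received = transport_nfc(decomposed)

print(decomposed.encode("utf-8").hex())  # 6fcc88
print(received.encode("utf-8").hex())    # c3b6
```

The value looks the same on screen before and after, but the octets differ, so the two o-diaeresis values in the sample LDIF would no longer be distinct on arrival.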
I've expanded my UTF-8 LDIF with some more goofiness.
version: 2
dn: cn=funky
bom:
smiley-face:☺
# only SPACE is special
no-break-space:
zero-width-space:â??
word-joiner:â?
ideographic-space:ã??
zero-width-no-break-space:
# line separators and other such things
nel:Â?
ls:â?¨
ps:â?©
ff:
# these hyphens differ but may look the same
hyphen-minus:-
hyphen:‐
non-breaking-hyphen:‑
figure-dash:‒
en-dash:–
minus-sign:−
roman-uncia-sign:𐆑
# these differ but may look the same
o-diaeresis:ö
o-diaeresis-decomposed:ö
# ignorables
ignore:â?
ignore:â?¡
ignore:Â
# inside-out rule
inside-out:aÌ?Ì?Ì£Ì
inside-out:�ึ�
# combining character
diaeresis:Ì?
# bidi
bidi:Ú?
bidi:Ù±ABÙ¹Ú?
bidi:Ù±37Ù¹Ú?
bidi:â?®ABCâ?¬
bidi:â?ٱٹÚ?â?¬
# bidi wrapped
bidi:
 Ú?
bidi:Ù±A
 BÙ¹Ú?
bidi:Ù±3
 7Ù¹Ú?
bidi:â?®A
 BCâ?¬
bidi:â?Ù±
 Ù¹Ú?â?¬
# private use
pu:î??î??î??
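
On the wrapping point, a rough sketch of what a folder has to do once values may carry multi-octet sequences. The name fold_utf8 and the default width are assumptions for illustration, and the input is assumed to be valid UTF-8 already.

```python
def fold_utf8(line: bytes, width: int = 76) -> list[bytes]:
    """Fold one LDIF line into physical lines of at most `width` octets,
    breaking only between UTF-8 sequences (input assumed valid UTF-8)."""
    pieces = []
    start = 0
    limit = width  # the first physical line has no leading SPACE
    while len(line) - start > limit:
        end = start + limit
        # Back up while `end` points at a continuation octet (10xxxxxx).
        while end > start and (line[end] & 0xC0) == 0x80:
            end -= 1
        if end == start:
            raise ValueError("width too small for a multi-octet sequence")
        pieces.append(line[start:end])
        start = end
        limit = width - 1  # continuation lines spend one octet on SPACE
    pieces.append(line[start:])
    return [pieces[0]] + [b" " + p for p in pieces[1:]]


line = ("cn: " + "\u00f6" * 10).encode("utf-8")
folded = fold_utf8(line, width=8)
```

This only keeps individual code points intact. A fold can still land between a base character and its combining marks, or in the middle of a bidirectional run.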
-- Kurt