Re: (ITS#7157) UTF-8 support for "mail" attribute



Kurt Zeilenga wrote:
>
> On Feb 10, 2012, at 8:44 AM, Michael Ströder wrote:
>
>> Kurt@OpenLDAP.org wrote:
>>> On Feb 7, 2012, at 1:03 PM, alfiej@opera.com wrote:
>>>
>>>> When searching for Chinese names in the "to:" field under Thunderbird, I see
>>>> asserted_value_validate_normalize() not returning LDAP_SUCCESS in filter.c. This
>>>> is because the "mail" attribute in core.schema is of type "IA5 String" but the
>>>> Chinese name falls outside the character set.
>>>
>>> No, it's because Thunderbird failed to convert the email address to the
>>> form expected.  It should have applied the "To Ascii" conversion as
>>> defined in the EAI and IDNA specs.
>>
>> Do you mean RFC 3490 to 3492?
>
> I mean whatever the specs are.  I don't follow either the DNS or email
> spaces closely, so I don't know offhand what the specs are...
> But I do know enough that internationalization of both relies on
> applications encoding UTF-8 into ASCII for use in DNS and email protocols,
> and by extension LDAP.

Please give a reference for this statement, especially for how the "To Ascii"
conversion should be done.

You know that I'm always eager to get things right in web2ldap. It was easy to
add the methods sanitizeInput() and formValue() to the relevant plugin class,
which encode to and decode from IDNA for domain attributes. The relevant specs
here seem to be RFCs 5890-5895. Strictly speaking, I simply assume that the
LDAP string format is the IDNA encoding, because the attributes (e.g.
'associatedDomain') are of LDAP syntax IA5String and so there's no other way
to do it. But AFAIK there's no formal spec saying so.
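
To illustrate the kind of conversion I mean for domain attributes, here is a
minimal sketch. The function names are made up for this mail and this is not
the actual web2ldap code. Note also that Python's built-in "idna" codec
implements the older IDNA2003 rules (RFC 3490), not RFCs 5890-5895, so take
it only as an approximation:

```python
# Hypothetical helpers, for illustration only (not web2ldap's real methods).
# Assumption: the LDAP string form of a domain attribute is its ASCII
# (A-label) encoding. Python's "idna" codec follows RFC 3490 (IDNA2003).

def domain_to_ldap(value: str) -> str:
    """Convert user input (Unicode domain name) to an IA5String-safe value."""
    return value.strip().encode('idna').decode('ascii')


def domain_from_ldap(value: str) -> str:
    """Convert the ASCII form stored in the directory back for display."""
    return value.encode('ascii').decode('idna')
```

With this, domain_to_ldap('bücher.example') gives 'xn--bcher-kva.example',
and domain_from_ldap() turns it back into the Unicode form for display.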

Dealing with the attribute 'mail' seems somewhat more complicated to me.
Yesterday I tried to find the relevant RFCs in rfc-index.txt which define the
"To Ascii" conversion you mentioned. So far I have only come up with the
following for e-mail headers (and in turn e-mail addresses), and these use
UTF-8:

5335 Internationalized Email Headers. A. Yang, Ed.. September 2008.
      (Format: TXT=27945 bytes) (Updates RFC2045, RFC2822) (Status:
      EXPERIMENTAL)

5336 SMTP Extension for Internationalized Email Addresses. J. Yao,
      Ed., W. Mao, Ed.. September 2008. (Format: TXT=48110 bytes) (Updates
      RFC2821, RFC2822, RFC4952) (Status: EXPERIMENTAL)

5337 Internationalized Delivery Status and Disposition Notifications.
      C. Newman, A. Melnikov, Ed.. September 2008. (Format: TXT=36324
      bytes) (Updates RFC3461, RFC3462, RFC3464, RFC3798) (Status:
      EXPERIMENTAL)

Maybe I'm misreading http://tools.ietf.org/html/rfc5335#section-4.1 but I 
think they use UTF-8.
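
To show why 'mail' is harder than the domain attributes: a "To Ascii"-style
conversion can only cover the domain part of an address. The sketch below is
purely my own assumption of how a client might handle this, not something
defined in any spec. The function name is hypothetical, and Python's "idna"
codec follows RFC 3490 (IDNA2003):

```python
# Hypothetical sketch: IDNA-encode only the domain part of an address.
# Assumption: a non-ASCII local part has no agreed ASCII form, so it is
# rejected here instead of being silently mangled.

def mail_to_ia5(address: str) -> str:
    """Return an IA5String-safe form of address, or raise ValueError."""
    local, sep, domain = address.rpartition('@')
    if not sep or not local or not domain:
        raise ValueError('not a local@domain address: %r' % address)
    if not local.isascii():
        raise ValueError('non-ASCII local part cannot be stored as IA5String')
    return local + '@' + domain.encode('idna').decode('ascii')
```

So 'michael@bücher.example' can be mapped to 'michael@xn--bcher-kva.example',
but an address with a Chinese local part cannot be mapped this way at all.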

Coming back on-topic for this ITS, I agree that the LDAP syntax for 'mail'
should not be changed, because that would open a can of worms and the OpenLDAP
ITS is definitely the wrong place to define the LDAP string format for
Internationalized Email Addresses.

Ciao, Michael.