Textual/non-textual passwords and SASLprep

To: ietf-ldapbis@OpenLDAP.org
Subject: Textual/non-textual passwords and SASLprep
From: Hallvard B Furuseth <h.b.furuseth@usit.uio.no>
Date: Sun, 9 Nov 2003 18:09:24 +0100
I wanted to research a bit, and then lost track of the threads about
passwords and SASLprep again when my workload suddenly quadrupled.  Sorry
about that.
(Michael, I've reviewed the 'Revisited: NON-ASCII chars in userPassword'
discussion but I'm still concerned.  Doesn't seem to be the same issue.)

Since [Protocol] has changed since then, I'll start backwards:

At 02 Oct 2003, Kurt D. Zeilenga wrote in message
<6.0.0.22.0.20031002024351.03ff9600@127.0.0.1> =
<http://www.openldap.org/lists/ietf-ldapbis/200310/msg00005.html>:

> Subject: Re: more SASLprep/protocol problems
> 
> RFC 2277 [IETF Policy on Character Sets and Languages] says:
>    All protocols MUST identify, for all character data,
>    which charset is in use.
> 
> That is, we cannot allow clients to transfer character data in
> the protocol without providing the server a protocol mechanism
> for reliably determining what charset is in use.  My previous
> suggestion was flawed in that it allowed character data to
> be transferred without identifying which charset is in use.
> 
> For this reason, I revise my suggestion:
>         The simple form of an AuthenticationChoice specifies a
>         simple password to be used for authentication.  Passwords
>         consisting of character data (text passwords) SHALL be
>         transferred as [UTF-8] encoded [Unicode].  Prior to transfer,
>         clients SHOULD prepare text passwords by applying the
>         [SASLprep] profile of the [Stringprep] algorithm.  Passwords
>         consisting of other data (such as random octets) MUST NOT
>         be altered.
>
> (...)

This text now appears in [Protocol] version 18.
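
To make the quoted rule concrete, here is a minimal Python sketch of
what a client following it might do.  It is an illustration under
stated assumptions, not the text of [SASLprep]: the function names are
hypothetical, and the bidirectional and unassigned-code-point checks of
[Stringprep] are left out.  Whether the password is "text" is decided
here by its Python type, i.e. outside the protocol.

```python
import stringprep
import unicodedata

# Prohibited-output tables of [Stringprep] used by the SASLprep profile.
_PROHIBITED = (
    stringprep.in_table_c12, stringprep.in_table_c21,
    stringprep.in_table_c22, stringprep.in_table_c3,
    stringprep.in_table_c4, stringprep.in_table_c5,
    stringprep.in_table_c6, stringprep.in_table_c7,
    stringprep.in_table_c8, stringprep.in_table_c9,
)


def saslprep_sketch(text: str) -> str:
    """Simplified SASLprep: map, normalize (NFKC), check prohibited output.

    Assumption: bidi and unassigned-code-point checks are omitted.
    """
    mapped = []
    for ch in text:
        if stringprep.in_table_c12(ch):
            mapped.append(" ")      # non-ASCII space -> U+0020
        elif not stringprep.in_table_b1(ch):
            mapped.append(ch)       # B.1 characters are mapped to nothing
    prepared = unicodedata.normalize("NFKC", "".join(mapped))
    for ch in prepared:
        if any(is_prohibited(ch) for is_prohibited in _PROHIBITED):
            raise ValueError("prohibited character U+%04X" % ord(ch))
    return prepared


def prepare_simple_password(password):
    """Return the octets a client would send in a simple bind (hypothetical)."""
    if isinstance(password, bytes):
        return password             # "other data": MUST NOT be altered
    return saslprep_sketch(password).encode("utf-8")
```

Note that nothing in the resulting octets tells the server which of the
two branches was taken.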

However, RFC 2277 also says:

   Where protocol elements look like text tokens, such as in many IETF
   application layer protocols, protocols MUST specify which parts are
   protocol and which are text. [WR 2.2.1.1]

yet the above text means that whether a password is text or not is
decided outside the protocol, which seems just as much in violation of
RFC 2277 as textual passwords without a specified charset.

For that matter, RFC 2277 goes on to say:

   Protocols that transfer text MUST provide for carrying information
   about the language of that text.

which LDAP doesn't do for passwords, and probably doesn't fit anyone's
idea of what one does with passwords in any case.

So if we go by RFC 2277, we must either specify that passwords are UTF-8
text, and probably also modify the userPassword syntax to only accept
UTF-8, or that passwords are not text, but MAY be translated to UTF-8
and prepared by [SASLprep] in some cases anyway.  Or we could ignore
RFC 2277 for this issue (or maybe that means asking for an exception to
be made), as at least the IRC protocol has done for backwards
compatibility.

> (...) RFC 2119 says that implementations supporting an OPTIONAL
> feature MUST be prepared to interoperate with implementations which
> don't.   If one client prepares text passwords as RECOMMENDED and
> another doesn't, the user will likely only be able to use one
> client or the other but not both.

He'll only be able to use one of them in any case if the server uses
non-ASCII passwords stored hashed by the OS, which was my original
concern in this thread.  (I'll summarize below, in case anyone has
forgotten.)  The client doing UTF-8 SASLprep won't be able to
interoperate with such a server, and there is nothing we can do about
it.

> Also, we need to ensure that interoperable implementations of
> features, even OPTIONAL features, can be independently developed.

We can't in situations like I mentioned, unless we never prepare
passwords and always treat them as binary entities, and leave the rest
to the user.  Even I don't want that:-)

> Two clients which both use "null" preparation may fail to
> interoperate with the server due to platform differences.
> For example, if one client sends the text as UTF-16 and
> another sends the same text as UTF-32, authentication will fail
> for one or both of these clients (assuming the server implements
> password matching as specified).
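
The quoted point can be checked with a few lines of Python.  This is
only a sketch: the sample password and the octet_match helper are
made up for illustration, and the server is assumed to compare the
stored and presented octet strings byte for byte.

```python
import hmac

text = "p\u00e4ssword"  # "pässword", a sample non-ASCII text password

as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16-le")
as_utf32 = text.encode("utf-32-le")


def octet_match(stored: bytes, presented: bytes) -> bool:
    """Octet-string comparison, as a server without charset knowledge does."""
    return hmac.compare_digest(stored, presented)


# The same text, three different octet strings: at most one can match
# whatever the server has stored.
print(octet_match(as_utf8, as_utf16))
print(octet_match(as_utf16, as_utf32))
print(octet_match(as_utf8, text.encode("utf-8")))
```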

Back to the [Protocol] text, I think the statement

        Passwords consisting of other data (such as random octets) MUST
        NOT be altered.

, if it is kept, must at least be accompanied by text which
describes when a password is 'other data' (than text) and when it is
not, in a way which client and server can agree on.  But I don't know
how, in particular when passwords were not stored in LDAP through the
protocol.

========

About textual passwords:

What's a textual password?  Michael Ströder referred to article
<http://www.openldap.org/lists/ietf-ldapbis/200305/msg00112.html>,
which starts with:

> I would define 'textual' passwords as passwords manually typed in by
> a user on the keyboard or a similar character-based input device.
 
I don't like this, not for deciding how to translate the password.  I
type in passwords sometimes and store them in files sometimes, but they
can be the same password for the same authentication ID - the difference
is whether the application runs automatically or not.  If it does, I
don't want to use a `password:xyzzy' command-line option because that
shows up in the output of the Unix 'ps' command for everyone to read.
So I store it in a file.  I could translate it to UTF-8 and then apply
SASLprep by hand before storing it in a file, but that feels to me like
an unintuitive thing to tell users to do.  So does translating back
from UTF-8 to the user's local character set if he wants to cut&paste a
password from a file into the command line.

Besides, this doesn't tell the _server_ whether or not the password is
textual.  That's all right if the password was stored through protocol,
or at least it defers the textual/non-textual decision to the user who
stored it, but it doesn't help when the password is taken from a
server-side password store.

> More formally one could define textual passwords as a character
> sequence with a known character set and encoding.
 
That's a circular definition in this case:  Whether to translate the
password from some local character set to UTF-8 is exactly what the
question of whether the password is textual or not is supposed to
decide.

========

Repeating my original concern, with various false starts cunningly
ignored - feel free to skip this:

  A number of operating systems, including Unix and Windows, store
  passwords in a hashed form and do not know users' plaintext passwords.
  So if LDAP clients translate passwords to UTF-8 and apply SASLprep to
  them, which [Protocol] sort of requires, servers that use the
  operating system's password store for authentication must reverse
  SASLprep (impossible) and then translate the password back to the OS's
  character set (cumbersome, and impossible if there is no specified
  OS's character set, i.e. on multi-charset sites).  The result of such
  translation and preparation will be that users with non-ASCII
  passwords cannot authenticate to such servers.
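
  The two obstacles can be sketched in Python.  The assumptions are
  loud ones: SHA-256 stands in for whatever one-way hash the OS really
  uses, ISO 8859-1 stands in for the unknown OS character set, and
  NFKC normalization alone stands in for full SASLprep.

```python
import hashlib
import unicodedata


def os_hash(octets: bytes) -> str:
    """Stand-in for the OS password hash (assumption: SHA-256, no salt)."""
    return hashlib.sha256(octets).hexdigest()


password = "bl\u00e5b\u00e6r"  # "blåbær"

# The OS stored a hash of the octets in its own character set.
stored = os_hash(password.encode("iso-8859-1"))

# A client following [Protocol] sends UTF-8 octets instead.
presented = password.encode("utf-8")
utf8_matches = os_hash(presented) == stored

# The server could match only by translating back to the OS charset,
# which requires knowing that charset.
translated_matches = (
    os_hash(presented.decode("utf-8").encode("iso-8859-1")) == stored
)

# Preparation is many-to-one, so it cannot be reversed: two different
# inputs end up as the same prepared string.
same_after_prep = (
    unicodedata.normalize("NFKC", "\u2168")     # ROMAN NUMERAL NINE
    == unicodedata.normalize("NFKC", "IX")
)
```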

  So I (still) think clients SHOULD have an option to turn off such
  translation and preparation, or maybe translate passwords in some
  OS-specific way (like ASCII<->EBCDIC).  This leaves the problem of
  which character set the password has mostly to the user - which is
  how this is done on such sites anyway.  It's ugly, but I think it's
  the least bad solution.  It doesn't give interoperability with other
  sites for such passwords, but these passwords won't give
  interoperability with other sites in any case.  With such an option,
  LDAP will at least work on-site just as well as other authentication
  methods.  Otherwise sites with such passwords may have to choose
  something other than LDAP for authentication.
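
  A client option of the kind suggested here could look roughly like
  the following Python sketch.  Everything in it is hypothetical: the
  function name, the `prepare' flag and the `local_charset' parameter
  are invented for illustration, and NFKC normalization again stands in
  for full SASLprep.

```python
import unicodedata


def bind_octets(password: str, prepare: bool = True,
                local_charset: str = "iso-8859-1") -> bytes:
    """Return the octets to send in a simple bind (hypothetical helper).

    prepare=True  : default behaviour, prepared text sent as UTF-8.
    prepare=False : site-specific mode, the password is sent in the
                    local character set, untouched.
    """
    if prepare:
        # Assumption: NFKC is only a stand-in for the SASLprep profile.
        return unicodedata.normalize("NFKC", password).encode("utf-8")
    return password.encode(local_charset)


default_octets = bind_octets("bl\u00e5b\u00e6r")
on_site_octets = bind_octets("bl\u00e5b\u00e6r", prepare=False)
```

  With the option turned off, the octets are the ones an OS password
  store on an ISO 8859-1 site would have hashed, at the price of not
  interoperating with other sites.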

  I'll drop my suggestion that UTF-8/SASLprep should not even be the
  default, since that's clearly a lost cause.  I'm not sure anymore
  that it was a good idea anyway.

Possibly none of this is important compared to the textual/non-textual
issue, though.  Messages on News suggest that people on e.g. EBCDIC
sites generally deal with the issue by only using passwords that are
ASCII subsets, and John McMeeking's message
<OFE48D5E6C.F392BD3F-ON86256DB2.006FAD27-86256DB2.006FBEBF@us.ibm.com> =
<http://www.openldap.org/lists/ietf-ldapbis/200310/msg00004.html> also
suggests that things work out in practice, though we still need several
modes for dealing with passwords.

========

A few other pot shots:

At 02 Oct 2003, Kurt D. Zeilenga wrote in message
<6.0.0.22.0.20031001234323.03ef3578@127.0.0.1> =
<http://www.openldap.org/lists/ietf-ldapbis/200310/msg00002.html>:

> Subject: Re: more SASLprep/protocol problems
>
>> If the server and therefore the sysadmin doesn't know whether to
>> prepare a bind password or not, how is the client supposed to know -
>> unless the user tells it?
> 
> Clients (whether used by end-user or sysadmin) have, in
> general, knowledge of character set and encodings they are
> using to interact with the user.  That knowledge is NOT
> communicated by the client to the server.
> 
> I note as well that existing external passwords stores do not
> maintain knowledge of the character set/encoding used by the
> admin, they just store whatever octet string the password-setting
> application provides.  The assumption is that the authenticating
> application will use the same character set, normalization,
> encoding algorithm of the password-setting application.
> 
> This assumption is flawed in that;
>         a) users may use different platforms and/or platform
>           settings than their administrator,
>         b) users may use different platforms and/or platform
>           settings from time to time.

The assumption is flawed, but that doesn't mean we can just ignore it.
The assumption is made, and we have to live with it.

> To deal with this flawed assumption, many deployments are forced
> not to internationalize.  Instead they use the lowest common
> denominator (commonly some subset of ASCII).
> 
> The internationalization of LDAP passwords takes this into
> account.  The specification preserves the encoding of
> printable ASCII passwords.

I don't understand this argument.  This whole discussion is about
non-ASCII passwords.  If printable ASCII passwords were all we cared
about, we could simply reject all other passwords.

>> Not that I disagree that client-side preparation is most flexible,
>> but... usually all passwords will be UTF-8 or none of them will,
>> depending on how the sysadmin put them there, so the server will know.
> 
> Clients I have used lately have encoded my password (internally)
> using:
>         ISO 8859-1, UTF-8, UTF-16, UTF-32.
> 
> (I haven't used an EBCDIC client lately).  Luckily the non-UTF-8
> clients prepared the password before transferring it.  The server
> has no knowledge of this.  It simply treats my password as an
> octet string.

We seem to have been talking past each other.  I think you are talking
about passwords stored in the server through the LDAP protocol.  I'm
talking about possibly hashed passwords fetched from the OS by the
server, and stored in the OS with who-knows-what charset.

>> OTOH, the server won't know if they should be prepared if the users
>> stored userPasswords as they pleased, both UTF-8 and non-UTF-8.
>> But then you _really_ need a client option to say whether or not to prepare
>> bind passwords, so I don't suppose that's what you are talking about...
> 
> OTOH, say each of my clients just sent passwords as they pleased
> with the expectation that the server deal with it.  The problem is,
> of course, there is nothing in the protocol which indicates that
> the password field is character data and, if so, which character
> set/encoding was used.  So, the server would have to guess.
> 
>> (Hey! That's another argument for recommending that client option!:-)
> 
> Actually, I think you just made a good argument for why we
> cannot allow clients to do as they please.

If you were talking about storing non-UTF-8 passwords in userPassword, I
might agree.  Are you suggesting that userPassword is restricted to
UTF-8?  Otherwise this statement does not make sense.  If the password
is stored as non-UTF-8 in the server, and the server can't translate it
to UTF-8 or doesn't know that it is not UTF-8, the bind password will
have to be non-UTF-8 in order to match.

-- 
Hallvard