
Stringprep Considered Harmful



draft-ietf-ldapbis-strprep-04.txt would define and require the use of a stringprep profile for many common LDAP attribute types. The stringprep algorithm may fail on certain input strings; if it fails, that input string becomes unmatchable.

If all such strings were obviously illegitimate, this would not be a problem, but many legitimate strings will fail, and this will create problems, some of them serious.

The clearest example of this is a string that fails the bidi check, which requires any string containing right-to-left characters to both start and end with a right-to-left character. Consequently, a string that contains an Arabic word but ends with a Latin character will fail.
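The rule as described above can be sketched in a few lines of Python. This is only an illustration, not the draft's normative text: the function name passes_bidi_rule and the sample strings are made up for this message, and it relies on the table D.1 lookup in Python's standard stringprep module to decide what counts as a right-to-left character. RFC 3454 section 6 additionally forbids mixing left-to-right (table D.2) characters with right-to-left ones, which this sketch does not model.

```python
import stringprep


def passes_bidi_rule(s: str) -> bool:
    """Approximate the bidi rule as described in this message."""
    # Table D.1 of RFC 3454 lists the right-to-left characters (classes R and AL).
    if not any(stringprep.in_table_d1(ch) for ch in s):
        return True  # no right-to-left characters, so the rule does not apply
    # Otherwise both the first and the last character must be right-to-left.
    return stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])


# Sample data only: an Arabic word written with escapes.
arabic_word = "\u0645\u0631\u062d\u0628\u0627"

print(passes_bidi_rule("plain latin text"))                                   # True
print(passes_bidi_rule(arabic_word))                                          # True
print(passes_bidi_rule("http://example.com/" + arabic_word + "/index.html"))  # False
```

The third string is an ordinary-looking URL, yet it is rejected simply because it begins and ends with Latin characters.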

There are a number of potential examples of such strings which might usefully be found in a directory. These include:

-- a URL starting with the Latin string 'http' and containing an Arabic domain name or path component.
-- an email address that ends with a standard TLD and contains Hebrew in the mailbox or in some domain component.
-- a full name in Latin with a Syriac nickname embedded in it.
-- a descriptive text field which contains some right-to-left words.


As I understand it, strprep's bidi rule was essentially designed for the nameprep profile, which is applied component by component to domain names and therefore will not fail a domain name which has both Arabic and Latin components (which, I believe, would be the normal case for a domain name including Arabic components). It is therefore suitable for the dc attribute, but would not be suitable for an attribute which could contain a sequence of domain components.
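To illustrate the difference between checking a whole string and checking it component by component, here is a small sketch. It assumes the simplified first-and-last-character rule described in this message; the helper names and the sample domain (an Arabic label followed by "com") are hypothetical.

```python
import stringprep


def passes_bidi_rule(s: str) -> bool:
    """Simplified bidi rule: if any RTL character is present, both ends must be RTL."""
    if not any(stringprep.in_table_d1(ch) for ch in s):
        return True
    return stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])


def passes_per_component(domain: str) -> bool:
    """Check each dot-separated component on its own, as nameprep would see them."""
    return all(passes_bidi_rule(label) for label in domain.split("."))


# Sample data only: an Arabic label followed by a Latin TLD.
domain = "\u0645\u062b\u0627\u0644.com"

print(passes_per_component(domain))  # True: each component is purely Arabic or purely Latin
print(passes_bidi_rule(domain))      # False: the whole string starts RTL but ends with 'm'
```

The same domain name is acceptable when stored as separate dc values but fails when it appears whole inside another attribute value.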

Aside from leading to the failure to return unique identifiers (such as the first three examples above), the use of the bidi prohibition may cause substring matches to fail, even if the substring assertions themselves pass the check. Consequently, a text description with a single Arabic word in it would effectively be unmatchable even if the assertion and the substring to be matched were identical Latin (or Arabic) words.

Furthermore, it appears to me that unmatchable attribute values will be harder to Modify. If the attribute type were single-valued, one could use replace rather than delete/add, but for a multi-valued attribute, where the delete/add sequence would probably be the expected procedure, the delete will fail because the current attribute value cannot match the one specified in the delete operation. Also, an attempt to add a duplicate value -- that is, a value which is octet-for-octet identical -- will silently and erroneously succeed.
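A toy model may make the Modify problem concrete. Everything here is an assumption made for illustration: prepare() stands in for string preparation (case folding plus the simplified bidi rule described in this message), the MultiValuedAttribute class is hypothetical, and no real server is claimed to be implemented this way. The only point is that a value whose preparation fails never compares equal to anything, including itself.

```python
import stringprep


def prepare(s: str):
    """Toy string preparation: return a folded string, or None if the bidi rule fails."""
    if any(stringprep.in_table_d1(ch) for ch in s):
        if not (stringprep.in_table_d1(s[0]) and stringprep.in_table_d1(s[-1])):
            return None  # preparation failed, so the value is unmatchable
    return s.casefold()


def matches(a: str, b: str) -> bool:
    """Equality match: true only if both values prepare successfully and are equal."""
    pa, pb = prepare(a), prepare(b)
    return pa is not None and pa == pb


class MultiValuedAttribute:
    """Hypothetical multi-valued attribute that uses matches() for add and delete."""

    def __init__(self):
        self.values = []

    def add(self, value: str) -> None:
        if any(matches(value, existing) for existing in self.values):
            raise ValueError("attributeOrValueExists")
        self.values.append(value)

    def delete(self, value: str) -> None:
        for i, existing in enumerate(self.values):
            if matches(value, existing):
                del self.values[i]
                return
        raise LookupError("noSuchAttribute")


# Sample data only: a Latin description with one Arabic word in the middle.
description = "office \u0645\u0643\u062a\u0628 room"

attr = MultiValuedAttribute()
attr.add(description)
attr.add(description)    # the duplicate is accepted because nothing matches it
print(len(attr.values))  # 2

try:
    attr.delete(description)
    delete_failed = False
except LookupError:
    delete_failed = True
print(delete_failed)     # True
```

In this model the identical value is stored twice and neither copy can be removed by a delete, which is the behaviour described above.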

While the bidi prohibition step seems to me to be the most likely source of problems, other prohibitions -- such as the prohibition of characters not in Unicode 3.2 -- may result in non-intuitive behaviour. The prohibition step would also prevent an enterprise from using private use area characters in an internal directory, although that would be low on my list of priorities.


The strprep document cites the importance to security of a predictable and consistent string comparison algorithm. This is certainly true, at least in the case of strings which are used in some security process (but it is worth noting that many strings entered in directories are not part of any security process). However, the strings in question are, in many cases, also being used and matched by systems outside of the directory, such as filesystems, email processors, web servers, and so on. It is necessary that all systems using a particular string for identification purposes perform the same matching algorithm on that string. If the string in question is a filepath on a Mac OS X HFS filesystem, for example, the canonicalization algorithm needs to be the one used (and extremely well documented) by HFS. The same applies to NTFS. I don't believe that LDAP is (yet) in a position to impose a single canonicalization on every string used in every system in the world; this may indicate the need for more MatchingRules which correspond to those systems' matching algorithms.

Rici