[Date Prev][Date Next] [Chronological] [Thread] [Top]
Re: GnuTLS considered harmful

To: Howard Chu <hyc@symas.com>
Subject: Re: GnuTLS considered harmful
From: Simon Josefsson <simon@josefsson.org>
Date: Wed, 27 Feb 2008 12:29:51 +0100
Cc: OpenLDAP Devel <openldap-devel@openldap.org>
In-reply-to: <47C51ECB.1000100@symas.com> (Howard Chu's message of "Wed, 27 Feb 2008 00:26:51 -0800")
Openpgp: id=B565716F; url=http://josefsson.org/key.txt
References: <47B751BF.8010103@symas.com> <87lk57zo2h.fsf@mocca.josefsson.org> <47C51ECB.1000100@symas.com>
User-agent: Gnus/5.110007 (No Gnus v0.7) Emacs/22.1 (gnu/linux)
Howard Chu <hyc@symas.com> writes:

> Simon Josefsson wrote:
>> Howard Chu<hyc@symas.com>  writes:
>>
>>> The recent trouble in ITS#5361 prompted me to look into the GnuTLS
>>> code a little deeper. It turns out that their corresponding
>>> set_subject_alt_name() API only takes a char * pointer as input,
>>> without a corresponding length. As such, this API will only work for
>>> string-form alternative names, and will typically break with IP
>>> addresses and other alternatives.
>>
>> I've assigned this a ticket:
>>
>> http://trac.gnutls.org/cgi-bin/trac.cgi/ticket/18
>>
>> It seems most SAN's in wide use are string-like, so nobody had
>> discovered this before.  It seems clear gnutls should have a new API to
>> allow setting arbitrary SAN values.
>
> Not really trying to be antagonistic here, but this kind of proves my
> point about inexperience - SANs are pretty well documented in
> X.509v3. (Heck, if there's anything you can count on in ITU specs,
> it's that there are mountains of them.) Anyone who has spent any time
> with ASN.1 and X.500 knows that your safest, most uniform coding
> approach is to use explicit (length,data) tuples
> everywhere. Uniformity in the API design prevents this sort of problem
> ever occurring in the first place.

Agreed.  As far as I can see, the rest of the GnuTLS API have that
uniformity.  This particular function,
gnutls_x509_crt_set_subject_alternative_name, is odd in other ways too
(the naming standard, elsewhere 'subject_alt_name' is used).

> Your bug report text is interesting because it's only concerned with
> truncated binary values. There's a flip side to this problem that you
> didn't mention, which is the potential to SEGV from accessing unmapped
> memory due to use of a non-terminated string. And again, an
> experienced C programmer would understand this issue without needing
> to be bludgeoned over the head with it (as I am certainly beating you
> up about it now).

Since the function lacks a string length parameter, I think it is
obvious that the function is intended to take a zero terminated string.
This is also how the command line tools (src/certtool.c etc) use it.

I don't think it is unreasonable for a SAN related API to work with
zero-terminated strings.  The typical SAN's like dNSName, rfc822Name,
and uniformResourceIdentifier are human readable strings.  Most
applications will work with the strings in zero-terminated form.

Given that there are more SAN's that aren't string-like, I agree that we
should have an API to support them.  But I think the current API that
takes a zero terminated string will continue to be useful.

So, yes, there is a SEGV if you pass the function non-zero-terminated
data, but strlen has the same problem...  So it is about documentation.
I have improved the documentation.

> There's another issue here, due to usability. Anyone who has spent
> time with X.509 certificates has run into the problem of needing to
> specify multiple SANs to get their certificates working. Your current
> API only allows a single SAN value to be stored. Again, the API
> appears to have been designed by somebody who has never used these
> concepts in the real world, and doesn't understand how these items
> actually need to work. That's hardly reassuring in the general case,
> doubly worrying for a piece of security software.

I've added this as a requirement on the new API to the ticket:
http://trac.gnutls.org/cgi-bin/trac.cgi/ticket/18#comment:1

>>> Looking across more of their APIs, I see that the code makes liberal
>>> use of strlen and strcat, when it needs to be using counted-length
>>> data blobs everywhere. In short, the code is fundamentally broken;
>>> most of its external and internal APIs are incapable of passing binary
>>> data without mangling it. The code is completely unsafe for handling
>>> binary data, and yet the nature of TLS processing is almost entirely
>>> dependent on secure handling of binary data.
>>>
>>> I strongly recommend that GnuTLS not be used. All of its APIs would
>>> need to be overhauled to correct its flaws and it's clear that the
>>> developers there are too naive and inexperienced to even understand
>>> that it's broken.
>>
>> I looked at the X.509 API's (x509.h) and I couldn't find any other that
>> didn't take buffer length arguments.  I didn't look carefully though.
>>
>> There is 1 (one) use of 'strcat' in the X.509 code, and it looks correct
>> to me.  There was 20 uses of 'strlen' in the X.509 code, and I went over
>> the first matches but when they looked correct I didn't look further.
>> (For reference, the X.509 code size is around 21000 lines of code.)
>
>> If you can give more details, that would be appreciated.
>
> Aside from the inherently unsafe nature of using strlen() on strings
> of unknown origin, the overall code quality is extremely poor. It
> looks like it was written by a Pascal programmer, someone who's
> accustomed to strings with embedded lengths and for which strlen() is
> essentially free. I hesitate to mention this aspect of it since you'll
> probably consider it a premature optimization.
>
> For example, minitasn1/coding.c:
> asn1_retCode
> _asn1_time_der (unsigned char *str, unsigned char *der, int *der_len)
> {
>   int len_len;
>   int max_len;
>
>   max_len = *der_len;
>
>   asn1_length_der (strlen (str), (max_len > 0) ? der : NULL, &len_len);
>
>   if ((len_len + (int) strlen (str)) <= max_len)
>     memcpy (der + len_len, str, strlen (str));
>   *der_len = len_len + strlen (str);
>
>   if ((*der_len) > max_len)
>     return ASN1_MEM_ERROR;
>
>   return ASN1_SUCCESS;
> }
>
> There are 4 calls to strlen(str) here, which ... I can't even find the
> words to express how gross this is.

That code is from libtasn1, and for libtasn1 I am inclined to agree with
you even generally -- libtasn1 is not written in good C style.  It is
very difficult to maintain, and I try to avoid changing the code if I
can.  However, the API is small and reasonable, and the code is not that
large, so it is possible to re-write it in a clean way.  There is even a
item in doc/TODO about this:

* Improve or rewrite libtasn1 to make it easier to maintain.

Right now we don't have resources to take on something like that.

Although I agree that libtasn1 code is ugly, I think my reasons are
somewhat different than yours -- I think the code is unreadable and
unmaintainable.  I regard unreadable code as a much more serious problem
than inefficient code.  It is not at all clear that libtasn1 is a
bottle-neck performance wise.  GnuTLS is used on some rather limited
devices and performance haven't been a significant concern.  So if the
code was very readable, but still inefficient, I wouldn't have any
serious problem with it.  (Best is of course readable and efficient
code, and sometimes those two go together, but not always.)  A good
compiler could even optimize away those excessive strlen (str) calls.

> You note that there's really a small number of instances of strcat()
> in the code. That's true, but that's because you've provided your own
> _gnutls_str_cat() function instead, which is also heavily
> used. Assuming that strlen() isn't going to SEGV on you (which depends
> on dumb luck) this becomes just a question of efficiency.

_gnutls_str_cat is used in 61 places in 80kloc (v2.3.2), and it does
protect against overflows.  Looking over the uses, many of them are to
construct the ASN.1 field names, which are fixed strings stored in local
fixed buffers, rather than working on variable-size inputs.  The code
looks ugly to me, but seems correct in the few places I looked carefully
at.

> E.g. in x509/dn.c you have in str_escape()
> 	str_length = MIN (strlen (str), buffer_size - 1);
> which is already bad because the MIN macro may evaluate that strlen() call twice.
>
> This function is called from _gnutls_x509_parse_dn() which makes heavy
> use of _gnutls_string_append_str() which guess what, calls strlen() on
> its arguments again. All of your string processing code has O(n^2)
> efficiency. It's incredibly sloppy and there's no justifiable reason
> for it.
>
> In x509/output.c your gnutls_x509_crt_print() function constructs a
> gnutls_string structure with a series of inefficient calls and then,
> the final capper:
>       _gnutls_string_append_data (&str, "\0", 1);
>       out->data = str.data;
>       out->size = strlen (str.data);
> It walks all across the length of the string to append a single
> terminating NUL byte, and then it calls strlen() on the result
> *again*, even though str.length is already calculated for you in
> _gnutls_string_append_data().
>
> If you actually had used gnutls_string structures everywhere, you
> *could* do all string construction operations in O(n) time (n ~ number
> of characters being added) by simply memcpy'ing to
> str.data+str.length. As it is, anything based on strcat is inherently
> O(n^2) and you do so many repeated traversals of the same data that I
> suspect the actual compute cost is even higher than n^2.

Yeah, there are large parts that could be optimized, and you have found
good examples.

I think the X.509 code in GnuTLS is of less quality than the rest, and
it would be interesting to see if we can switch to using something like
libksba.  The goal with GnuTLS is to be a SSL/TLS implementation, and if
there had been good X.509 libraries around I suspect we wouldn't have
incrementally added code to handle X.509 at all.  I wasn't involved in
the project at the time the X.509 code was written so I don't know what
the thoughts were, though.

Still, the X.509 part of a TLS session is small.  We take care to do the
expensive processing before the actual TLS session start.  (Such as
computing the RDN_SEQ to send to clients.)  My hope is that GnuTLS will
do minimal X.509 handling in the future.  We support PKCS#11-like
handling of private keys, and applications can off-load the X.509 chain
verification to other libraries as well.

In the end, this is about economics and trade-offs.  While the code is
technically sometimes both inefficient and inelegant, there are too few
people who work on it to make re-writing code a good use of our time.
If GnuTLS was a larger and funded project like OpenSSL, NSS, or
OpenLDAP, things may be different.

/Simon
Follow-Ups:
- Re: GnuTLS considered harmful
  - From: Howard Chu <hyc@symas.com>
- Re: GnuTLS considered harmful
  - From: Michael Ströder <michael@stroeder.com>
References:
- GnuTLS considered harmful
  - From: Howard Chu <hyc@symas.com>
- Re: GnuTLS considered harmful
  - From: Howard Chu <hyc@symas.com>
Prev by Date: Re: GnuTLS considered harmful
Next by Date: Re: GnuTLS considered harmful
Index(es):
- Chronological
- Thread