[Date Prev][Date Next] [Chronological] [Thread] [Top]

Re: DIGEST-MD5 and {nonce,cnonce}



On Mon, 25 Oct 1999, Kurt D. Zeilenga wrote:

> At 09:01 PM 10/25/99 +0300, Mihai Ibanescu wrote:
> >	Okay, I already have seen propositions on some options
> >(/dev/random, which seems the most reliable, and a PRNG started with a
> >seed derived from gettimeofday, all of them hashed with MD5 or something
> >like that). Any other alternatives?
> >	I'd like to implement it. 
> 
> 
> If you hash the value read from /dev/random, you will LOSE entropy.
> That is, you could read 128 bits of entropy from /dev/random.
> Hashing it to a 64-bit quanity will result in at least half these
> bits being lost.  Even where you read exactly 64-bits from /dev/random,
> hashing the value can only reduce the amount of entropy.

	Yes, but this is a quote from
/usr/src/linux/drivers/char/random.c, the source for /dev/random:

Copyright Theodore Ts'o, 1994, 1995, 1996, 1997, 1998
...

 * Computers are very predictable devices.  Hence it is extremely hard
 * to produce truly random numbers on a computer --- as opposed to
 * pseudo-random numbers, which can easily generated by using a
 * algorithm.  Unfortunately, it is very easy for attackers to guess
 * the sequence of pseudo-random number generators, and for some
 * applications this is not acceptable.  So instead, we must try to
 * gather "environmental noise" from the computer's environment, which
 * must be hard for outside attackers to observe, and use that to
 * generate random numbers.  In a Unix environment, this is best done
 * from inside the kernel.
 *
 * Sources of randomness from the environment include inter-keyboard
 * timings, inter-interrupt timings from some interrupts, and other
 * events which are both (a) non-deterministic and (b) hard for an
 * outside observer to measure.  Randomness from these sources are
 * added to an "entropy pool", which is mixed using a CRC-like function.
 * This is not cryptographically strong, but it is adequate assuming
 * the randomness is not chosen maliciously, and it is fast enough that
 * the overhead of doing it on every interrupt is very reasonable.
 * As random bytes are mixed into the entropy pool, the routines keep
 * an *estimate* of how many bits of randomness have been stored into
 * the random number generator's internal state.
 *
 * When random bytes are desired, they are obtained by taking the SHA
 * hash of the contents of the "entropy pool".  The SHA hash avoids
 * exposing the internal state of the entropy pool.  It is believed to
 * be computationally infeasible to derive any useful information
 * about the input of SHA from its output.  Even if it is possible to
 * analyze SHA in some clever way, as long as the amount of data
 * returned from the generator is less than the inherent entropy in
 * the pool, the output data is totally unpredictable.  For this
 * reason, the routine decreases its internal estimate of how many
 * bits of "true randomness" are contained in the entropy pool as it
 * outputs random numbers.

	My point here is that I should hash the content, not for the
entropy's sake, but to hide the real contents of /dev/random, just in
case... AFAIK SHA1 is a 128-bit hash, and after all I can diversify the
generation (have no idea about SHA1, but with MD5 I can do something like:

int lutil_entropy(char *buf, int nbytes)
{
    struct lutil_MD5Context ctx;
    int i, retval;
    int rndfd;
    struct timeval tval;
    char digest[16];
    char buffer[16];
    /* I won't initialize digest[], this may be a source of randomness */

    for (i = 0; i <= nbytes / 16; i++) {
        lutil_MD5Init(&ctx);
        lutil_MD5Update(&ctx, digest, 16);
        rndfd = open("/idev/urandom", O_RDONLY);
        if (rndfd < 0)
            rndfd = open("/idev/random", O_RDONLY);
        if (rndfd > 0) {
            retval = read(rndfd, buffer, sizeof(buffer));
            lutil_MD5Update(&ctx, buffer, retval);
            close(rndfd);
        }
        retval = gettimeofday(&tval, NULL);
        if (retval == 0)
            lutil_MD5Update(&ctx, (char *)&tval, sizeof(struct timeval));
        lutil_MD5Final(digest, &ctx);
        memcpy(buf + 16 * i, digest, (16 * i < nbytes) ? 16 : nbytes - 16
* i);
    }
    if (rndfd > 0)
        close(rndfd);
    return nbytes;
}

	The idea is that I first try to read random bytes from
/dev/urandom or /dev/random. I compute the digest and store it in the
first 16 bytes of the result. The 16-bytes digest is used to further
compute the next digest, with another call to a read from /dev/random.
Just in case there is no /dev/urandom or /dev/random available, I append
the result of gettimeofday(). While one call to gettimeofday() may be
predictable, I assume a series of calls to gettimeofday are not that
predictable anymore, since a context change may increase the difference
between two calls.
	I cannot be sure how much randomness is in the method described
above; however, I wrote 16383 bytes obtained via lutil_entropy and gzipped
(-9); the result was 16411 bytes, which make me think the method is pretty
reliable. The result was the same without reading bytes from /dev/random
(just hashing the results of gettimeofday()).
	Since there is a SHA1 implementation in liblutil, I may change MD5
to SHA1. This should be better, IMHO.
	If it's OK with you, I'll send the source.

> Hashing should only be applied to values that have some entropy
> but where you are unsure how the entropy is distributed within
> the value.  That is, a crytographic hash is used as a means to
> extract bits of entropy from the input.  You must, of course,
> be sure not to assume the hash has more bits of entropy than
> provided by the input.
> 
> I would suggest that your initial implementation provide:
> 	/dev/random 
> 	highly-portable fallback
> 
> Others can then contribute codes which extend the implementation
> to support other mechanisms.

Cheers,
Misa