2007-12-14 19:44:11

by John Reiser

Subject: /dev/urandom uses uninit bytes, leaks user data

xfer_secondary_pool() in drivers/char/random.c tells add_entropy_words()
to use uninitialized tmp[] whenever bytes is not a multiple of 4.
Besides being unfriendly to automated dynamic checkers, this is a
potential leak of user data into the output stream. When called from
extract_entropy_user, uninit tmp[] can capture leftover data
from a previous copy_from_user().

Signed-off-by: [email protected]

--- ./drivers/char/random.c.orig 2007-12-14 11:06:03.000000000 -0800
+++ ./drivers/char/random.c 2007-12-14 11:06:57.000000000 -0800
@@ -708,7 +708,19 @@

bytes=extract_entropy(r->pull, tmp, bytes,
random_read_wakeup_thresh / 8, rsvd);
- add_entropy_words(r, tmp, (bytes + 3) / 4);
+ /*
+ * 2007-12-13 (valgrind/memcheck) Do not use undefined bytes.
+ * Avoid info leak when called from extract_entropy_user:
+ * uninit tmp[] can have data from previous copy_from_user().
+ * Instead: fill last word using first bytes.
+ */
+ {
+ __u8 *src = (__u8 *)&tmp[0];
+ __u8 *dst = bytes + src;
+ for (; 0!=(3 & bytes); ++bytes)
+ *dst++ = *src++;
+ }
+ add_entropy_words(r, tmp, bytes>>2);
credit_entropy_store(r, bytes*8);
}

--
John Reiser, [email protected]


2007-12-14 20:14:06

by Matt Mackall

Subject: Re: /dev/urandom uses uninit bytes, leaks user data

On Fri, Dec 14, 2007 at 11:34:09AM -0800, John Reiser wrote:
> xfer_secondary_pool() in drivers/char/random.c tells add_entropy_words()
> to use uninitialized tmp[] whenever bytes is not a multiple of 4.
> Besides being unfriendly to automated dynamic checkers, this is a
> potential leak of user data into the output stream. When called from
> extract_entropy_user, then uninit tmp[] can capture leftover data
> from a previous copy_from_user().

Yes, we use uninitialized data. But it's not a leak in any useful
sense. To the extent the previous data is secret, this actually
improves our entropy.

It's getting folded into the random number pool, where it will be
impossible to recover it unless you already know what was in the
pool. And if you know what's in the pool, you've already broken into
the kernel.

But I'm sympathetic to making Valgrind happy.

> --- ./drivers/char/random.c.orig 2007-12-14 11:06:03.000000000 -0800
> +++ ./drivers/char/random.c 2007-12-14 11:06:57.000000000 -0800
> @@ -708,7 +708,19 @@
>
> bytes=extract_entropy(r->pull, tmp, bytes,
> random_read_wakeup_thresh / 8, rsvd);
> - add_entropy_words(r, tmp, (bytes + 3) / 4);
> + /*
> + * 2007-12-13 (valgrind/memcheck) Do not use undefined bytes.
> + * Avoid info leak when called from extract_entropy_user:
> + * uninit tmp[] can have data from previous copy_from_user().
> + * Instead: fill last word using first bytes.
> + */
> + {
> + __u8 *src = (__u8 *)&tmp[0];
> + __u8 *dst = bytes + src;
> + for (; 0!=(3 & bytes); ++bytes)
> + *dst++ = *src++;
> + }

That's hideous. How about a memset instead:

/* clear uninitialized bytes at the end to make valgrind happy */
memset((char *)tmp + bytes, 0, -bytes & 3);

Also, don't bother putting dates or the like in comments. We've got a
version control system for that.

> + add_entropy_words(r, tmp, bytes>>2);

And that change is broken..

> credit_entropy_store(r, bytes*8);

..because it makes this line wrong. We have to add precisely the
number of bytes returned by extract_entropy to keep the books
balanced.

--
Mathematics is the supreme nostalgia of our time.

2007-12-14 20:45:39

by John Reiser

Subject: Re: /dev/urandom uses uninit bytes, leaks user data

Matt Mackall wrote:
> Yes, we use uninitialized data. But it's not a leak in any useful
> sense. To the extent the previous data is secret, this actually
> improves our entropy.
>
> It's getting folded into the random number pool, where it will be
> impossible to recover it unless you already know what was in the
> pool. And if you know what's in the pool, you've already broken into
> the kernel.

The combination of capturing data from other users, plus seeding
the pool with your own data, just might be powerful enough to help
steal secrets, sometime in the next five years, from data that is
recorded today.

> But I'm sympathetic to making Valgrind happy. ...

> [The code in John's patch is] hideous. How about a memset instead ...

> And [John's] change is broken.. We have to add precisely the
> number of bytes returned by extract_entropy to keep the books
> balanced.

Matt is correct. The rolled-up result follows.

Signed-off-by: [email protected]

--- ./drivers/char/random.c.orig 2007-12-14 11:06:03.000000000 -0800
+++ ./drivers/char/random.c 2007-12-14 12:27:23.000000000 -0800
@@ -708,6 +708,8 @@

bytes=extract_entropy(r->pull, tmp, bytes,
random_read_wakeup_thresh / 8, rsvd);
+ /* clear uninitialized bytes at the end to make valgrind happy */
+ memset((char *)tmp + bytes, 0, -bytes & 3);
add_entropy_words(r, tmp, (bytes + 3) / 4);
credit_entropy_store(r, bytes*8);
}
--
John Reiser, [email protected]

2007-12-14 23:24:16

by Theodore Ts'o

Subject: Re: /dev/urandom uses uninit bytes, leaks user data

On Fri, Dec 14, 2007 at 12:45:23PM -0800, John Reiser wrote:
> > It's getting folded into the random number pool, where it will be
> > impossible to recover it unless you already know what was in the
> > pool. And if you know what's in the pool, you've already broken into
> > the kernel.
>
> The combination of capturing data from other users, plus seeding
> the pool with your own data, just might be powerful enough to help
> steal secrets, sometime in the next five years, from data that is
> recorded today.

Um, no. Just seeding the pool with your own data won't help, since
that still won't tell you the initial contents of the pool. And if
you know the initial contents of the pool, then you've broken root.
And being able to steal from the pool also assumes that you've broken
into the system; it is never, ever exported to userspace, even if
you're root (unless you use something like /dev/kmem). Furthermore,
if you don't know the previous contents of the pool, you'll never be
able to recover the information, either now or five years in the
future, since information is XOR'ed into the pool.

> > But I'm sympathetic to making Valgrind happy. ...

How about wrapping it in a #ifdef CONFIG_UML, which is the only way
you can use Valgrind? The memset will slow down things unnecessarily,
and mixing in the unknown previous contents of the stack (a) doesn't
hurt, and (b) could make life more complicated for an attacker.

- Ted

2007-12-15 00:30:35

by John Reiser

Subject: Re: /dev/urandom uses uninit bytes, leaks user data

Theodore Tso wrote:
> On Fri, Dec 14, 2007 at 12:45:23PM -0800, John Reiser wrote:
>
>>>It's getting folded into the random number pool, where it will be
>>>impossible to recover it unless you already know what was in the
>>>pool. And if you know what's in the pool, you've already broken into
>>>the kernel.
>>
>>The combination of capturing data from other users, plus seeding
>>the pool with your own data, just might be powerful enough to help
>>steal secrets, sometime in the next five years, from data that is
>>recorded today.
>
>
> Um, no. Just seeding the pool with your own data won't help, since
> that still won't tell you the initial contents of the pool. And if
> you know the initial contents of the pool, then you've broken root.
> And being able to steal from the pool also assumes that you've broken
> into the system; it is never, ever exported to userspace, even if
> you're root (unless you use something like /dev/kmem). Furthermore,
> if you don't know the previous contents of the pool, you'll never be
> able to recover the information, either now or five years in the
> future, since information is XOR'ed into the pool.

There is a path that goes from user data into the pool. This path
is subject to manipulation by an attacker, for both reading and
writing. Are you going to guarantee that in five years nobody
will discover a way to take advantage of it? Five years ago
there were no public attacks against MD5 except brute force;
now MD5 is on the "weak" list.

>>>But I'm sympathetic to making Valgrind happy. ...
>
>
> How about wrapping it in a #ifdef CONFIG_UML, which is the only way
> you can use Valgrind? The memset will slow down things unnecessarily,
> and mixing in the unknown previous contents of the stack (a) doesn't
> hurt, and (b) could make life more complicated for an attacker.

If speed matters that much, then please recoup 33 cycles on x86
by using shifts instead of three divides, such as (gcc 4.1.2):

add_entropy_words(r, tmp, (bytes + 3) / 4);

0x8140689 <xfer_secondary_pool+206>: lea 0x3(%esi),%eax
0x814068c <xfer_secondary_pool+209>: mov $0x4,%dl
0x814068e <xfer_secondary_pool+211>: mov %edx,%edi
0x8140690 <xfer_secondary_pool+213>: cltd
0x8140691 <xfer_secondary_pool+214>: idiv %edi

--
John Reiser, [email protected]

2007-12-15 04:32:51

by Theodore Ts'o

Subject: Re: /dev/urandom uses uninit bytes, leaks user data

On Fri, Dec 14, 2007 at 04:30:08PM -0800, John Reiser wrote:
> There is a path that goes from user data into the pool. This path
> is subject to manipulation by an attacker, for both reading and
> writing. Are you going to guarantee that in five years nobody
> will discover a way to take advantage of it? Five years ago
> there were no public attacks against MD5 except brute force;
> now MD5 is on the "weak" list.

Yep, I'm confident about making such a guarantee. Very confident.

First of all, keep in mind that the attacks on MD5 are about being
able to find hash collisions. The way the cryptographic hash is being
used in /dev/random, merely being able to find hash collision means
squat. You need to be able to carry out a preimage attack; which no
one has been able to do yet. And even if you could figure out how to
do a pre-image attack (which none of the "successful attacks" on MD4,
MD5, SHA-0, HAVAL-128, RIPEMD, et al. have been able to accomplish), we
don't give you the entire hash output; instead, what you get is the
hash folded in half via XOR, so you only get the two halves of the SHA
hash XOR'ed together to form 80 bits.

So given one of these folded 80 bits of hash, you need to figure out
which of the many possible combinations of 1024 bits that were in the
secondary entropy pool could have resulted in the folded hash image.
And using the pigeonhole principle and assuming that SHA approximates
a random function, you need to figure out which one of the 2**944
possible combinations of 1024 bits was the correct pool pre-image that
generated those 80 bits. That's a hard problem.

But secondly, even *that's* not enough. As I said earlier, the pool
is simply unavailable to the attacker; we never make it available,
except by revealing 80 bit hashes of the pool. So you can't read the
initial or current state of the pool without first breaking root ---
and after 3 bytes of kernel stack is mixed into the pool, via an XOR
operation, there is no way to read out the pool. And if you don't
know the initial contents of the pool --- funny thing, but UNKNOWN XOR
KNOWN == UNKNOWN. So here I'm not even relying on cryptographers of
the future not being able to find preimage attacks. I'm just relying
on simple logic.

- Ted

2007-12-15 07:13:41

by Herbert Xu

Subject: Re: /dev/urandom uses uninit bytes, leaks user data

John Reiser <[email protected]> wrote:
>
> If speed matters that much, then please recoup 33 cycles on x86
> by using shifts instead of three divides, such as (gcc 4.1.2):
>
> add_entropy_words(r, tmp, (bytes + 3) / 4);
>
> 0x8140689 <xfer_secondary_pool+206>: lea 0x3(%esi),%eax
> 0x814068c <xfer_secondary_pool+209>: mov $0x4,%dl
> 0x814068e <xfer_secondary_pool+211>: mov %edx,%edi
> 0x8140690 <xfer_secondary_pool+213>: cltd
> 0x8140691 <xfer_secondary_pool+214>: idiv %edi

There ought to be a warning about this sort of thing.

[CHAR] random: Avoid signed integer division

John Reiser pointed out that we use signed integer divisions
unnecessarily in random.c. This is bad because the C compiler
is obliged to consider the case of a negative dividend.

This patch changes all the relevant divisions and modulus operations
to use unsigned arithmetic.

Signed-off-by: Herbert Xu <[email protected]>

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 5fee056..6c70bfb 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -694,11 +694,11 @@ static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
if (r->pull && r->entropy_count < nbytes * 8 &&
r->entropy_count < r->poolinfo->POOLBITS) {
/* If we're limited, always leave two wakeup worth's BITS */
- int rsvd = r->limit ? 0 : random_read_wakeup_thresh/4;
+ int rsvd = r->limit ? 0 : random_read_wakeup_thresh / 4u;
int bytes = nbytes;

/* pull at least as many as BYTES as wakeup BITS */
- bytes = max_t(int, bytes, random_read_wakeup_thresh / 8);
+ bytes = max_t(int, bytes, random_read_wakeup_thresh / 8u);
/* but never more than the buffer size */
bytes = min_t(int, bytes, sizeof(tmp));

@@ -707,8 +707,8 @@ static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
r->name, bytes * 8, nbytes * 8, r->entropy_count);

bytes=extract_entropy(r->pull, tmp, bytes,
- random_read_wakeup_thresh / 8, rsvd);
- add_entropy_words(r, tmp, (bytes + 3) / 4);
+ random_read_wakeup_thresh / 8u, rsvd);
+ add_entropy_words(r, tmp, (bytes + 3) / 4u);
credit_entropy_store(r, bytes*8);
}
}
@@ -739,14 +739,14 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
nbytes * 8, r->name);

/* Can we pull enough? */
- if (r->entropy_count / 8 < min + reserved) {
+ if (r->entropy_count / 8u < min + reserved) {
nbytes = 0;
} else {
/* If limited, never pull more than available */
- if (r->limit && nbytes + reserved >= r->entropy_count / 8)
- nbytes = r->entropy_count/8 - reserved;
+ if (r->limit && nbytes + reserved >= r->entropy_count / 8u)
+ nbytes = r->entropy_count / 8u - reserved;

- if(r->entropy_count / 8 >= nbytes + reserved)
+ if(r->entropy_count / 8u >= nbytes + reserved)
r->entropy_count -= nbytes*8;
else
r->entropy_count = reserved;
@@ -781,7 +781,7 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
/* hash blocks of 16 words = 512 bits */
sha_transform(buf, (__u8 *)(r->pool + i), buf + 5);
/* feed back portion of the resulting hash */
- add_entropy_words(r, &buf[i % 5], 1);
+ add_entropy_words(r, &buf[i % 5u], 1);
}

/*
@@ -789,7 +789,7 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
* portion of the pool while mixing, and hash one
* final time.
*/
- __add_entropy_words(r, &buf[i % 5], 1, data);
+ __add_entropy_words(r, &buf[i % 5u], 1, data);
sha_transform(buf, (__u8 *)data, buf + 5);

/*
@@ -1040,7 +1040,7 @@ write_pool(struct entropy_store *r, const char __user *buffer, size_t count)
count -= bytes;
p += bytes;

- add_entropy_words(r, buf, (bytes + 3) / 4);
+ add_entropy_words(r, buf, (bytes + 3) / 4u);
}

return 0;

2007-12-15 16:31:45

by Matt Mackall

Subject: Re: /dev/urandom uses uninit bytes, leaks user data

On Sat, Dec 15, 2007 at 03:13:19PM +0800, Herbert Xu wrote:
> John Reiser <[email protected]> wrote:
> >
> > If speed matters that much, then please recoup 33 cycles on x86
> > by using shifts instead of three divides, such as (gcc 4.1.2):
> >
> > add_entropy_words(r, tmp, (bytes + 3) / 4);
> >
> > 0x8140689 <xfer_secondary_pool+206>: lea 0x3(%esi),%eax
> > 0x814068c <xfer_secondary_pool+209>: mov $0x4,%dl
> > 0x814068e <xfer_secondary_pool+211>: mov %edx,%edi
> > 0x8140690 <xfer_secondary_pool+213>: cltd
> > 0x8140691 <xfer_secondary_pool+214>: idiv %edi
>
> There ought to be a warning about this sort of thing.

Indeed. Seems it would be better to adjust the types appropriately.
Anyway, this is no longer relevant to security@.

--
Mathematics is the supreme nostalgia of our time.

2007-12-17 16:30:27

by John Reiser

Subject: Re: /dev/urandom uses uninit bytes, leaks user data

Theodore Tso wrote:
> On Fri, Dec 14, 2007 at 04:30:08PM -0800, John Reiser wrote:
>
>>There is a path that goes from user data into the pool.

Note particularly that the path includes data from other users.
Under the current implementation, anyone who accesses /dev/urandom
is subject to having some bytes from their address space being captured
and mixed into the pool.

>> This path
>>is subject to manipulation by an attacker, for both reading and
>>writing. Are you going to guarantee that in five years nobody
>>will discover a way to take advantage of it?

> Yep, I'm confident about making such a guarantee. Very confident.

A direct attack (determining the specific values or partial values
of some captured bytes) is not the only way to steal secrets.
An indirect attack, such as traffic analysis, also may be effective.

Here is one idea. Use output from /dev/urandom to generate a random
permutation group. Analyze the group: determine all its subgroups, etc.
If the structure of those groups has different properties depending
on the captured bytes, even after SHA1 and folding and twisting,
then that may be enough to help steal secrets.

Indirect attacks may be subject to "exponent doubling." The state
modulo 2**(2n) may correspond to a system of 2**n congruences
in 2**n variables. So a property modulo 2**n might be hoisted to a
related property modulo 2**(2n). This might make 2**1024 seem to be
not so big.

Also, "getting lucky" is allowed, both via initial conditions and
via other coincidences. Running on a newly-booted, newly-installed
system might be especially advantageous. A completely formal
Goedel-numbering proof often has a formal checker that is logarithmic
in the length of the proof. If such a logarithmic property applies
every once in a while to /dev/urandom, then that might be enough.


The bottom line: At a cost of at most three unpredictable branches
(whether to clear the bytes in the last word with indices congruent
to 1, 2, or 3 modulo 4), the code can reduce the risk from something
small but positive, to zero. This is very inexpensive insurance.

--
John Reiser, [email protected]

2007-12-17 17:30:48

by Linus Torvalds

Subject: Signed divides vs shifts (Re: [Security] /dev/urandom uses uninit bytes, leaks user data)



On Sat, 15 Dec 2007, Herbert Xu wrote:
>
> There ought to be a warning about this sort of thing.

We could add it to sparse. The appended (untested) patch seems to say
there's a lot of those signed divides-by-power-of-twos.

However, the problem with such warnings is that it encourages people to do
the simple fix that may be *wrong*. For example, you fixed it with patches
like

> - int rsvd = r->limit ? 0 : random_read_wakeup_thresh/4;
> + int rsvd = r->limit ? 0 : random_read_wakeup_thresh / 4u;

which is really quite dangerous for several reasons:

- it depends intimately on the type of the thing being divided (try it:
it will do nothing at all if the thing you divide is larger than
"unsigned int", since then the "4u" will be turned into a _signed_
larger type by the C type expansion).

So in general, the above doesn't even do what it's supposed to do on a
64-bit architecture if the thing to be divided is 64-bit!

- it changes behaviour. If that thing really is signed and can be
negative, that "trivial" patch just changed the divide to be
fundamentally something totally different.

so I think this patch is horribly wrong.

The *correct* way to fix signed divisions is by doing one of two things:

- really make the data we divide be unsigned. With all the thinking that
involves!

This is the good change, but it does involve making sure that there are
no tests against zero and that the value really cannot go negative.
Usually the unsigned types are (a) faster and (b) more robust, but if
somebody is depending on signs, unsigned types are obviously not
appropriate.

- change a divide-by-power-of-2 into a signed shift instead.

Yes, this also changes the end result for negative values, but it
changes it in a sane and generally good way (ie it will still be a
"valid" divide, it will just be a divide that rounds differently, and
is more likely than turning it into an unsigned divide to generally
result in working code).

Hmm?

Linus
---
simplify.c | 30 ++++++++++++++++++++++++++++++
1 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/simplify.c b/simplify.c
index 94e14d2..91f1120 100644
--- a/simplify.c
+++ b/simplify.c
@@ -286,6 +286,36 @@ static int simplify_constant_rightside(struct instruction *insn)
if (!value)
return replace_with_pseudo(insn, insn->src2);
return 0;
+
+ case OP_DIVU: case OP_DIVS:
+ if (!value)
+ break;
+ if (value == 1)
+ return replace_with_pseudo(insn, insn->src1);
+ /* Power-of-two? */
+ if (!(value & (value-1))) {
+ int log2 = -1;
+ do {
+ log2++;
+ value >>= 1;
+ } while (value);
+
+ /* Turn unsigned divides into shifts */
+ if (insn->opcode == OP_DIVU) {
+ insn->src2->value = log2;
+ insn->opcode = OP_LSR;
+ return 1;
+ }
+
+ /*
+ * This is incorrect, but we care more about
+ * the warning than the code generation
+ */
+ warning(insn->pos, "expensive signed divide");
+ insn->src2->value = log2;
+ insn->opcode = OP_ASR;
+ return 1;
+ }
}
return 0;
}

2007-12-17 17:36:43

by Theodore Ts'o

Subject: Re: /dev/urandom uses uninit bytes, leaks user data

On Mon, Dec 17, 2007 at 08:30:05AM -0800, John Reiser wrote:
> >>[You have yet to show that...]
> >>There is a path that goes from user data into the pool.
>
> Note particularly that the path includes data from other users.
> Under the current implementation, anyone who accesses /dev/urandom
> is subject to having some bytes from their address space being captured
> and mixed into the pool.

Again, you haven't *proven* this yet.

> A direct attack (determining the specific values or partial values
> of some captured bytes) is not the only way to steal secrets.
> An indirect attack, such as traffic analysis, also may be effective.
>
> Here is one idea. Use output from /dev/urandom to generate a random
> permutation group. Analyze the group: determine all its subgroups, etc.
> If the structure of those groups has different properties depending
> on the captured bytes, even after SHA1 and folding and twisting,
> then that may be enough to help steal secrets.

Generate a random permutation group *how*? This sounds suspiciously
like "reverse the polarity of the neutron flow" technobabble to me.
What are you going to compare the groups against? What subgroup? How
big of a group do you think it is going to be? So you really think
the output of SHA is going to produce a group with subgroups?

> Indirect attacks may be subject to "exponent doubling." The state
> modulo 2**(2n) may correspond to a system of 2**n congruences
> in 2**n variables. So a property modulo 2**n might be hoisted to a
> related property modulo 2**(2n). This might make 2**1024 seem to be
> not so big.

> A completely formal
> Goedel-numbering proof often has a formal checker that is logarithmic
> in the length of the proof. If such a logarithmic property applies
> every once in a while to /dev/urandom, then that might be enough.

References, please and exactly how this would apply to the situation
at hand? This sounds like math babble to me, and I have taken enough
abstract algebra and other math classes that this sounds suspiciously to
me like the sort of stuff that Star Trek actors make up when the
script says "insert technobabble here". Of course, I'm not a math
major, so I'm willing to accept the possibility that you know what
you're talking about --- but I do know enough that I should be able to
evaluate any proofs and math papers you want to throw at me.

- Ted

2007-12-17 17:48:34

by Al Viro

Subject: Re: Signed divides vs shifts (Re: [Security] /dev/urandom uses uninit bytes, leaks user data)

On Mon, Dec 17, 2007 at 09:28:57AM -0800, Linus Torvalds wrote:
>
>
> On Sat, 15 Dec 2007, Herbert Xu wrote:
> >
> > There ought to be a warning about this sort of thing.
>
> We could add it to sparse. The appended (untested) patch seems to say
> there's a lot of those signed divides-by-power-of-twos.

I'm not sure that you are warning about the right things. If you want
a real nightmare scenario in that area, consider this:

int x[20];

int *p = x + n;
int *q = x + m;

p - q
((char *)p - (char *)q)/4
((char *)p - (char *)q)/sizeof(int)

The first two are equivalent on all targets we care about. However, an
attempt to make the second one "more portable" silently creates the
code that'll do something entirely different as soon as we get m > n...

2007-12-17 17:56:16

by Eric Dumazet

Subject: Re: Signed divides vs shifts (Re: [Security] /dev/urandom uses uninit bytes, leaks user data)

On Mon, 17 Dec 2007 09:28:57 -0800 (PST)
Linus Torvalds <[email protected]> wrote:

>
>
> On Sat, 15 Dec 2007, Herbert Xu wrote:
> >
> > There ought to be a warning about this sort of thing.
>
> We could add it to sparse. The appended (untested) patch seems to say
> there's a lot of those signed divides-by-power-of-twos.
>
> However, the problem with such warnings is that it encourages people to do
> the simple fix that may be *wrong*. For example, you fixed it with patches
> like
>
> > - int rsvd = r->limit ? 0 : random_read_wakeup_thresh/4;
> > + int rsvd = r->limit ? 0 : random_read_wakeup_thresh / 4u;
>
> which is really quite dangerous for several reasons:
>
> - it depends intimately on the type of the thing being divided (try it:
> it will do nothing at all if the thing you divide is larger than
> "unsigned int", since then the "4u" will be turned into a _signed_
> larger type by the C type expansion).
>

I was looking at lib/extable.c which does emit a signed divide on i386 but not on x86_64:

mid = (last - first) / 2 + first;

So I tried to compiled this on x86_64 :

long *mid(long *a, long *b)
{
return ((a - b) / 2 + a);
}

It gave :
mid:
movq %rdi, %rdx
subq %rsi, %rdx
sarq $3, %rdx
movq %rdx, %rax
shrq $63, %rax
addq %rdx, %rax
sarq %rax
leaq (%rdi,%rax,8), %rax
ret

while

long *mid(long *a, long *b)
{
return ((a - b) / 2u + a);
}

gave :
mid:
movq %rdi, %rdx
subq %rsi, %rdx
sarq $3, %rdx
movq %rdx, %rax
shrq $63, %rax
addq %rdx, %rax
sarq %rax
leaq (%rdi,%rax,8), %rax
ret

and while :

long *mid(long *a, long *b)
{
return (((unsigned long)(a - b)) / 2 + a);
}

gave :
mid:
movq %rdi, %rax
subq %rsi, %rax
sarq %rax
andq $-8, %rax
addq %rdi, %rax
ret


But I found this cast ugly so I cooked this patch.

[PATCH] Avoid signed arithmetics in search_extable()

On i386 and gcc-4.2.{1|2}, search_extable() currently does integer divides (by 2 !!!), while
we can certainly use a right shift. The result also looks more like a typical bsearch() implementation.

Signed-off-by: Eric Dumazet <[email protected]>

diff --git a/lib/extable.c b/lib/extable.c
index 463f456..03a81bd 100644
--- a/lib/extable.c
+++ b/lib/extable.c
@@ -54,20 +54,20 @@ search_extable(const struct exception_table_entry *first,
const struct exception_table_entry *last,
unsigned long value)
{
- while (first <= last) {
- const struct exception_table_entry *mid;
+ unsigned long mid, low = 0, high = (last - first);

- mid = (last - first) / 2 + first;
+ while (low <= high) {
+ mid = (low + high) / 2;
/*
* careful, the distance between entries can be
- * larger than 2GB:
+ * larger than MAX_LONG:
*/
- if (mid->insn < value)
- first = mid + 1;
- else if (mid->insn > value)
- last = mid - 1;
+ if (first[mid].insn < value)
+ low = mid + 1;
+ else if (first[mid].insn > value)
+ high = mid - 1;
else
- return mid;
+ return first + mid;
}
return NULL;
}

2007-12-17 18:05:46

by Ray Lee

Subject: Re: Signed divides vs shifts (Re: [Security] /dev/urandom uses uninit bytes, leaks user data)

On Dec 17, 2007 9:55 AM, Eric Dumazet <[email protected]> wrote:
> - mid = (last - first) / 2 + first;
> + while (low <= high) {
> + mid = (low + high) / 2;

I think you just introduced a bug. Think about what happens if
low=high=MAX_LONG/2 + 1.

2007-12-17 18:10:34

by Eric Dumazet

Subject: Re: Signed divides vs shifts (Re: [Security] /dev/urandom uses uninit bytes, leaks user data)

On Mon, 17 Dec 2007 10:05:35 -0800
"Ray Lee" <[email protected]> wrote:

> On Dec 17, 2007 9:55 AM, Eric Dumazet <[email protected]> wrote:
> > - mid = (last - first) / 2 + first;
> > + while (low <= high) {
> > + mid = (low + high) / 2;
>
> I think you just introduced a bug. Think about what happens if
> low=high=MAX_LONG/2 + 1.
>

Fortunately this is not possible :)

Hint: sizeof(struct exception_table_entry) is >= 8

so high is guaranteed to be <= LONG_MAX/8

2007-12-17 18:13:00

by Ray Lee

Subject: Re: Signed divides vs shifts (Re: [Security] /dev/urandom uses uninit bytes, leaks user data)

On Dec 17, 2007 10:10 AM, Eric Dumazet <[email protected]> wrote:
> On Mon, 17 Dec 2007 10:05:35 -0800
> "Ray Lee" <[email protected]> wrote:
>
> > On Dec 17, 2007 9:55 AM, Eric Dumazet <[email protected]> wrote:
> > > - mid = (last - first) / 2 + first;
> > > + while (low <= high) {
> > > + mid = (low + high) / 2;
> >
> > I think you just introduced a bug. Think about what happens if
> > low=high=MAX_LONG/2 + 1.
> >
>
> Fortunately this is not possible :)
>
> Hint: sizeof(struct exception_table_entry) is >= 8
>
> so high is guaranteed to be <= LONG_MAX/8

Ah, my bad. One of these days I'll learn to not post before coffee.

2007-12-17 18:24:19

by Al Viro

Subject: Re: Signed divides vs shifts (Re: [Security] /dev/urandom uses uninit bytes, leaks user data)

On Mon, Dec 17, 2007 at 06:55:57PM +0100, Eric Dumazet wrote:

> long *mid(long *a, long *b)
> {
> return ((a - b) / 2 + a);
> }

... is not actually a middle (you'd want b-a, not a-b there), but anyway

> It gave :
> mid:
> movq %rdi, %rdx
> subq %rsi, %rdx
> sarq $3, %rdx
> movq %rdx, %rax
> shrq $63, %rax
> addq %rdx, %rax
> sarq %rax
> leaq (%rdi,%rax,8), %rax
> ret
>
> while
>
> long *mid(long *a, long *b)
> {
> return ((a - b) / 2u + a);
> }

... undefined behaviour if a < b

> and while :
>
> long *mid(long *a, long *b)
> {
> return (((unsigned long)(a - b)) / 2 + a);
> }

undefined behaviour, again.

2007-12-17 18:32:00

by Linus Torvalds

Subject: Re: [Security] Signed divides vs shifts (Re: /dev/urandom uses uninit bytes, leaks user data)



On Mon, 17 Dec 2007, Eric Dumazet wrote:
>
> while
>
> long *mid(long *a, long *b)
> {
> return ((a - b) / 2u + a);
> }

This is exactly what I'm talking about. That "2u" is TOTALLY POINTLESS.
It's an "unsigned int", but since (a-b) will be of type ptrdiff_t, and is
*wider* on a 64-bit architecture (it's the same as "long" on x86-64), then
the 2u will just be converted to "long", and be signed again!

So you thought that you did an unsigned divide, but you did no such thing.

If you change the "2u" to a "2ul", it works again, and you get

mid:
movq %rdi, %rax
subq %rsi, %rax
sarq %rax
andq $-8, %rax
addq %rdi, %rax
ret

which is the code you wanted. But quite frankly, you could just have
written it with a shift to start with, and avoided the subtle type issue,
although gcc then generates

movq %rdi, %rax
subq %rsi, %rax
sarq $4, %rax
leaq (%rdi,%rax,8), %rax
ret

instead. Of course, this all *does* still have subtle sign issues, because
the "a-b" part implies a signed divide in itself, which is why you see
that "sarq" in the first place (rather than a "shrq").

Signed divides are hard. The "a-b" pointer subtraction is actually cheaper
than a general signed divide by sizeof, since the compiler can then assume
that the two pointers are mutually aligned, which is why gcc can generate
just a single "sarq" instead of having to do an extra "add negative bit"
thing to get the rounding right.

[ So Al, when you said that

(a-b)

is equivalent to

((char *)a-(char *)b)/4

for a "int *" a and b, you're right in the sense that the *result* is
the same, but the code generation likely isn't. The "a-b" thing can (and
does) allow the compiler to avoid the whole "align up for signed
numbers" thing, and the difference in code generation is clear:

subq %rsi, %rdi
sarq $2, %rdi

vs

subq %rsi, %rdi
leaq 3(%rdi), %rax
testq %rdi, %rdi
cmovs %rax, %rdi
sarq $2, %rdi

exactly because the first case *knows* that the low two bits have to be
zero, and thus there is no rounding issue. ]

Linus

2007-12-17 19:09:12

by Al Viro

[permalink] [raw]
Subject: Re: [Security] Signed divides vs shifts (Re: /dev/urandom uses uninit bytes, leaks user data)

On Mon, Dec 17, 2007 at 10:28:38AM -0800, Linus Torvalds wrote:
> [ So Al, when you said that
>
> (a-b)
>
> is equivalent to
>
> ((char *)a-(char *)b)/4
>
> for a "int *" a and b, you're right in the sense that the *result* is
> the same, but the code generation likely isn't. The "a-b" thing can (and

Sure. And yes, I very much do prefer code that uses C as it ought to be
used and doesn't play games with casts from hell, etc. For a lot of reasons,
both correctness- and efficiency-related.

We _do_ have such turds. In spades. And such places are potential timebombs,
since well-intentioned idiotic patch ("I've read in lecture notes that sizeof
is better than explicit constant, so replacement surely can only improve the
things and the best part is, I don't need to understand what I'm doing")
turns an ugly FPOS into equally ugly FPOS that silently doesn't work ;-/

[sorry about the rant, I'm about 3/4 through the drivers/net colonoscopy,
with >300Kb of patches and a pile of assorted bugs so far - and then there's
drivers/scsi to deal with. Endianness stuff, mostly...]

2007-12-17 20:59:26

by David Schwartz

[permalink] [raw]
Subject: RE: /dev/urandom uses uninit bytes, leaks user data


> The bottom line: At a cost of at most three unpredictable branches
> (whether to clear the bytes in the last word with indices congruent
> to 1, 2, or 3 modulo 4), then the code can reduce the risk from something
> small but positive, to zero. This is very inexpensive insurance.

> John Reiser, [email protected]

Even if you're right, the change isn't free. You've simply presented
evidence of one non-zero benefit of it. You've given no ability to assess
the size of this benefit and no way to figure if it exceeds the cost. There
is also a non-zero *security* cost to this change.

DS

2007-12-18 01:17:59

by Andy Lutomirski

[permalink] [raw]
Subject: Re: /dev/urandom uses uninit bytes, leaks user data

Theodore Tso wrote:
> On Mon, Dec 17, 2007 at 08:30:05AM -0800, John Reiser wrote:
>>>> [You have yet to show that...]
>>>> There is a path that goes from user data into the pool.
>> Note particularly that the path includes data from other users.
>> Under the current implementation, anyone who accesses /dev/urandom
>> is subject to having some bytes from their address space being captured
>> and mixed into the pool.
>
> Again, you haven't *proven* this yet.

Has anyone *proven* that using uninitialized data this way is safe? As
a *user* of this stuff, I'm *very* hesitant to trust Linux's RNG when I
hear things like this. (Hint: most protocols are considered insecure
until proven otherwise, not the other way around.)

Consider: add_entropy_words is reversible [1], presumably in the sense
that words could be removed, so, if the added words are independent of
the state, there is no loss of entropy. But it also appears reversible
in the sense that if the pool has a known state and entropy words are
added, most of which are known, the unknown ones can be found (if
nothing else, if there are only a few bits of entropy there, you can try
all possibilities).

Now imagine a security program. It runs some forward secret protocol
and it is very careful not to leak data that would break forward secrecy
(mlockall, memset when done with stuff, etc). It runs on a freshly
booted machine (no DSA involved, so we're not automatically hosed), so
an attacker knows the initial pool state. Conveniently, some *secret*
(say an ephemeral key, or, worse, a password) gets mixed in to the pool.
There are apparently at most three bytes of extra data mixed in, but
suppose the attacker knows all the words that were supposed to get mixed
in. Now the program clears all its state to "ensure" forward secrecy,
and *then* the machine gets hacked. Now the attacker can learn (with at
most 2^24 guesses worth of computation) 24 bits worth of a secret, which
could quite easily reduce the work involved in breaking whatever forward
secret protocol was involved from intractable to somewhat easy. Or it
could leak three bytes of password. Or whatever.

Sorry for the somewhat inflammatory email, but this is absurd.

--Andy


[1] http://www.mail-archive.com/[email protected]/msg238406.html

2007-12-18 03:05:59

by Theodore Ts'o

[permalink] [raw]
Subject: Re: /dev/urandom uses uninit bytes, leaks user data

On Mon, Dec 17, 2007 at 07:52:53PM -0500, Andy Lutomirski wrote:
> It runs on a freshly booted machine (no
> DSA involved, so we're not automatically hosed), so an attacker knows the
> initial pool state.

Not just a freshly booted system. The system has to be a freshly
booted, AND freshly installed system. Normally you mix in a random
seed at boot time. And during the boot sequence, the block I/O will
be mixing randomness into the entropy pool, and as the user logs in,
the keyboard and mouse will be mixing more entropy into the pool. So
you'll have to assume that all entropy inputs have somehow been
disabled as well.

BUT --- if the pool state is totally known, you're really, REALLY,
REALLY hosed, since normally /dev/random might get used to initialize
a CRNG to *generate* the ephemeral key. So the danger is not *3*
*bytes* of the ephemeral key accidentally getting mixed into the
entropy pool, followed by an attacker managing to crack the system so
badly that he or she has read access into kernel memory (without
managing to mix more entropy into the pool), and then doing
sophisticated cryptanalytic attacks with O(2**24) order time, to
try to leak the *3* *bytes* of ephemeral key. No, the problem is
that the attacker, with access to the known initial state of the pool,
will be able to find out *THE* *ENTIRE* *EPHEMERAL* *KEY*, since it was
probably generated via /dev/random --- and without needing to break
into the system with sufficient privs to be able to read kernel
memory.

So it is your argument which is absurd. If you're going to assume a
completely known pool state, and then assume some program is using
/dev/random, the danger is not in the chance that some previous kernel
stack state might contain something secret that could theoretically be
revealed after an attacker manages to break into the machine as root. No,
the real danger is in what this presumably cryptographically sensitive
program did with the predictable data from /dev/random.

So the real answer is that there needs to be sufficient randomness
mixed into /dev/random. For a typical Linux system, it's there.
There are some potential configurations (say a pure diskless system
with NFS remote filesystems and no keyboard or mouse) that might be used in
some kind of security-sensitive system. Now, someone who wanted to
say, run a remote certificate signing server or IPSEC key management
system on such a system would arguably be performing security
malpractice, and the lack of entropy for /dev/random is probably the
least of such a system's security problems.

But in any case, instead of trying to worry about these sorts of
hypothetical worries, the much better use of time and energy is the
userspace daemon that can sample entropy from a sound device, or some
other hardware entropy available to the system. The real issue is
being able to make sure the pool is *not* knowable to an attacker.

- Ted

2007-12-18 03:13:19

by David Newall

[permalink] [raw]
Subject: Re: /dev/urandom uses uninit bytes, leaks user data

Theodore Tso wrote:
> On Mon, Dec 17, 2007 at 07:52:53PM -0500, Andy Lutomirski wrote:
>
>> It runs on a freshly booted machine (no
>> DSA involved, so we're not automatically hosed), so an attacker knows the
>> initial pool state.
>>
>
> Not just a freshly booted system. The system has to be a freshly
> booted, AND freshly installed system. Normally you mix in a random
> seed at boot time. And during the boot sequence, the block I/O will
> be mixing randomness into the entropy pool, and as the user logs in,
> the keyboard and mouse will be mixing more entropy into the pool. So
> you'll have to assume that all entropy inputs have somehow been
> disabled as well.
>

On a server, keyboard and mouse are rarely used. As you've described
it, that leaves only the disk, and during the boot process, disk
accesses and timing are somewhat predictable. Whether this is
sufficient to break the RNG is (clearly) a matter of debate.

2007-12-18 03:47:20

by Theodore Ts'o

[permalink] [raw]
Subject: Re: /dev/urandom uses uninit bytes, leaks user data

On Tue, Dec 18, 2007 at 01:43:28PM +1030, David Newall wrote:
> On a server, keyboard and mouse are rarely used. As you've described it,
> that leaves only the disk, and during the boot process, disk accesses and
> timing are somewhat predictable. Whether this is sufficient to break the
> RNG is (clearly) a matter of debate.

In normal operation, entropy is accumulated on the system, extracted
via /dev/urandom at shutdown, and then loaded back into the system
when it boots up. So this will help in everything but the freshly
installed system case (and in that case things will get better as the
system has a chance to operate).

If you have a system with insufficient entropy inputs, you're in
trouble, of course. There are "catastrophic reseeds" that attempt to
mitigate some of the worse attacks, but at the end of the day,
/dev/random isn't magic.

In any case, the problem if you have insufficient entropy is not
esoteric attacks that presume an attacker can break into the system
and read out kernel memory. The problem will be /dev/random's entropy
output. And there are techniques that cryptographic applications can
use to help. For example, if an application is about to encrypt data,
it can SHA-1 hash the data that is about to be sent out encrypted, and
feed the SHA-1 hash into /dev/random. This won't increase the entropy
credit counter, but if the attacker doesn't have access to the
unencrypted plaintext, mixing the SHA-1 hash into /dev/random can only help.

If you have a server, the best thing you can do is use a hardware
random number generator, if it exists. Fortunately a number of
hardware platforms, such as IBM blades and Thinkpads, come with TPM
modules that include hardware RNG's. That's ultimately the best way
to solve these issues.

- Ted

2007-12-18 04:08:52

by David Newall

[permalink] [raw]
Subject: Re: /dev/urandom uses uninit bytes, leaks user data

Theodore Tso wrote:
> On Tue, Dec 18, 2007 at 01:43:28PM +1030, David Newall wrote:
>
>> On a server, keyboard and mouse are rarely used. As you've described it,
>> that leaves only the disk, and during the boot process, disk accesses and
>> timing are somewhat predictable. Whether this is sufficient to break the
>> RNG is (clearly) a matter of debate.
>>
>
> In normal operation, entropy is accumulated on the system, extracted
> via /dev/urandom at shutdown, and then loaded back into the system
> when it boots up.

Thus, the entropy saved at shutdown can be known at boot-time. (You can
examine the saved entropy on disk.)


> If you have a server, the best thing you can do is use a hardware
> random number generator, if it exists. Fortunately a number of
> hardware platforms, such as IBM blades and Thinkpads, come with TPM
> modules that include hardware RNG's. That's ultimately the best way
> to solve these issues.

Just how random are they? Do they turn out to be quite predictable if
you're IBM?

2007-12-18 04:24:00

by Theodore Ts'o

[permalink] [raw]
Subject: Re: /dev/urandom uses uninit bytes, leaks user data

On Tue, Dec 18, 2007 at 02:39:00PM +1030, David Newall wrote:
>
> Thus, the entropy saved at shutdown can be known at boot-time. (You can
> examine the saved entropy on disk.)
>

If you can examine the saved entropy on disk, you can also introduce a
trojan horse kernel that logs all keystrokes and all generated entropy
to the FBI carnivore server --- you can replace the gpg binary with
one which ships copies of the session keys to the CIA --- and you can
replace the freeswan server with one which generates ephemeral keys by
encrypting the current timestamp with a key known only by the NSA. So
if the attacker has access to your disk between shutdown and boot up,
you are *done*. /dev/random is the least of your worries.

Really, why is it that people are so enamored about proposing these
totally bogus scenarios? Yes, if you have direct physical access to
your machine, you can compromise it. But there are far easier ways
that such a vulnerability can be exploited, rather than making it easy
to carry out a cryptanalytic attack on /dev/random.

(And yes, after using the saved state to load the entropy at
boot-time, the saved state file is overwritten, and if you're
paranoid, you can scrub the disk after it is read and mixed into the
entropy pool. And yes, the saved state is *not* the entropy pool used
during the previous boot, but entropy extracted using SHA-1 based
CRNG.)

>> If you have a server, the best thing you can do is use a hardware
>> random number generator, if it exists. Fortunately a number of
>> hardware platforms, such as IBM blades and Thinkpads, come with TPM
>> modules that include hardware RNG's. That's ultimately the best way
>> to solve these issues.
>
> Just how random are they? Do they turn out to be quite predictable if
> you're IBM?

They use a noise diode, so they are as good as any other hardware
random number generator. Of course, you ultimately have to trust the
supplier of your hardware not to do something screwy, and Thinkpads
are now made by Lenovo, which has caused some US Government types to
get paranoid --- but that's why the best way to do things is to get
entropy from as many places as possible, and mix it all into
/dev/random. If any one of them is unknown to the attacker, he's stuck.

And if you are sufficiently paranoid, you obviously can't use
commodity hardware, and how do you know that sometime, a long time ago,
someone snuck some evil secret code into the C compiler that was
designed to survive a bootstrap process[1]? And how do you know that
your keyboard controller hasn't been gimmicked to record your key
strokes and then send them out by adding jitter that can be read by an
attacker looking at the timing of your ssh packets[2]?

But seriously, if you are that paranoid, better build your hardware
from scratch, and then hand assemble your compiler, and then do a
line-by-line audit of all of your source code. It's going to be the
only way to be sure....

- Ted

[1] Ken Thompson, Reflections on Trusting Trust.
http://cm.bell-labs.com/who/ken/trust.html

[2] Gaurav Shah, Andres Molina and Matt Blaze, Keyboards and Covert Channels.
http://www.usenix.org/events/sec06/tech/shah/shah_html/index.html

2007-12-18 05:13:44

by David Schwartz

[permalink] [raw]
Subject: RE: /dev/urandom uses uninit bytes, leaks user data


> Has anyone *proven* that using uninitialized data this way is safe?

You can probably find dozens of things in the Linux kernel that have not
been proven to be safe. That means nothing.

> As
> a *user* of this stuff, I'm *very* hesitant to trust Linux's RNG when I
> hear things like this. (Hint: most protocols are considered insecure
> until proven otherwise, not the other way around.)

There's no reason whatsoever to think this is unsafe. First, you can't
access the pool directly. Second, even if you could, it's mixed in securely.

> Now imagine a security program. It runs some forward secret protocol
> and it's very safe not to leak data that would break forward secrecy
> (mlockall, memset when done with stuff, etc). It runs on a freshly
> booted machine (no DSA involved, so we're not automatically hosed), so
> an attacker knows the initial pool state. Conveniently, some *secret*
> (say an ephemeral key, or, worse, a password) gets mixed in to the pool.
> There are apparently at most three bytes of extra data mixed in, but
> suppose the attacker knows all the words that were supposed to get mixed
> in. Now the program clears all its state to "ensure" forward secrecy,
> and *then* the machine gets hacked. Now the attacker can learn (with at
> most 2^24 guesses worth of computation) 24 bits worth of a secret, which
> could quite easily reduce the work involved in breaking whatever forward
> secret protocol was involved from intractable to somewhat easy. Or it
> could leak three bytes of password. Or whatever.

This is no more precise than "imagine there's some vulnerability in the
RNG". Yes, if there's a vulnerability, then we're vulnerable.

An attacker can always (at least in principle) get the pool out of the
kernel. The RNG's design is premised on the notion that it is
computationally infeasible to get the input entropy out of the pool. If an
attacker can watch data going into the pool, he needn't get it out of the
pool.

> Sorry for the somewhat inflammatory email, but this is absurd.

I agree.

DS

2007-12-19 22:23:18

by Bill Davidsen

[permalink] [raw]
Subject: Re: /dev/urandom uses uninit bytes, leaks user data

David Newall wrote:
> Theodore Tso wrote:
>> On Tue, Dec 18, 2007 at 01:43:28PM +1030, David Newall wrote:
>>
>>> On a server, keyboard and mouse are rarely used. As you've described
>>> it, that leaves only the disk, and during the boot process, disk
>>> accesses and timing are somewhat predictable. Whether this is
>>> sufficient to break the RNG is (clearly) a matter of debate.
>>>
>>
>> In normal operation, entropy is accumulated on the system, extracted
>> via /dev/urandom at shutdown, and then loaded back into the system
>> when it boots up.
>
> Thus, the entropy saved at shutdown can be known at boot-time. (You can
> examine the saved entropy on disk.)
>
>
>> If you have a server, the best thing you can do is use a hardware
>> random number generator, if it exists. Fortunately a number of
>> hardware platforms, such as IBM blades and Thinkpads, come with TPM
>> modules that include hardware RNG's. That's ultimately the best way
>> to solve these issues.
>
> Just how random are they? Do they turn out to be quite predictable if
> you're IBM?

The typical RNG is a noise diode or other similar hardware using thermal
noise, so it's unlikely that anyone but God could predict it. There are
some USB devices which supposedly use radioactive decay, I'm unsure if
that's better but I don't want to carry the dongle in my pants pocket.
The HotBits web site uses radioactive decay to generate its numbers.

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2007-12-19 22:25:23

by Bill Davidsen

[permalink] [raw]
Subject: Re: /dev/urandom uses uninit bytes, leaks user data

Theodore Tso wrote:
> On Tue, Dec 18, 2007 at 02:39:00PM +1030, David Newall wrote:
>> Thus, the entropy saved at shutdown can be known at boot-time. (You can
>> examine the saved entropy on disk.)
>>
>
> If you can examine the saved entropy on disk, you can also introduce a
> trojan horse kernel that logs all keystrokes and all generated entropy
> to the FBI carnivore server --- you can replace the gpg binary with
> one which ships copies of the session keys to the CIA --- and you can
> replace the freeswan server with one which generates ephemeral keys by
> encrypting the current timestamp with a key known only by the NSA. So
> if the attacker has access to your disk between shutdown and boot up,
> you are *done*. /dev/random is the least of your worries.
>
> Really, why is it that people are so enamored about proposing these
> totally bogus scenarios? Yes, if you have direct physical access to
> your machine, you can compromise it. But there are far easier ways
> that such a vulnerability can be exploited, rather than making it easy
> to carry out a cryptanalytic attack on /dev/random.
>
> (And yes, after using the saved state to load the entropy at
> boot-time, the saved state file is overwritten, and if you're
> paranoid, you can scrub the disk after it is read and mixed into the
> entropy pool. And yes, the saved state is *not* the entropy pool used
> during the previous boot, but entropy extracted using SHA-1 based
> CRNG.)
>
>>> If you have a server, the best thing you can do is use a hardware
>>> random number generator, if it exists. Fortunately a number of
>>> hardware platforms, such as IBM blades and Thinkpads, come with TPM
>>> modules that include hardware RNG's. That's ultimately the best way
>>> to solve these issues.
>> Just how random are they? Do they turn out to be quite predictable if
>> you're IBM?
>
> They use a noise diode, so they are as good as any other hardware
> random number generator. Of course, you ultimately have to trust the
> supplier of your hardware not to do something screwy, and Thinkpads
> are now made by Lenovo, which has caused some US Government types to
> get paranoid --- but that's why the best way to do things is to get
> entropy from as many places as possible, and mix it all into
> /dev/random. If any one of them is unknown to the attacker, he's stuck.
>
In another thread I believe I mentioned things an attacker can't know
(unless your system is already compromised) like fan speed, CPU
temperature, etc.

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2007-12-20 04:19:07

by Andy Lutomirski

[permalink] [raw]
Subject: Re: /dev/urandom uses uninit bytes, leaks user data

On Dec 17, 2007 10:46 PM, Theodore Tso <[email protected]> wrote:
> If you have a system with insufficient entropy inputs, you're in
> trouble, of course. There are "catastrophic reseeds" that attempt to
> mitigate some of the worse attacks, but at the end of the day,
> /dev/random isn't magic.
>

I understand that there's no way that /dev/random can provide good
output if there's insufficient entropy. But it still shouldn't leak
arbitrary bits of user data that were never meant to be put into the
pool at all.

(My hypothetical attack is a lot less hypothetical than I thought at first.
An attacker does not need to break into the kernel and steal the
state of the pool. It may be as easy as this to trigger:

Step 1: Boot a system without a usable entropy source.
Step 2: add some (predictable) "entropy" from userspace which isn't a
multiple of 4, so up to three extra bytes get added.
Step 3: Read a few bytes of /dev/random and send them over the network.

An attacker can now try all possibilities of the three extra bytes and
guess them pretty quickly. No compromise needed. This is, IMHO, bad.
(It's one thing for the "random" numbers to be weak. It's another
thing entirely for them to reveal data that never belonged in the pool
in the first place.)

Actually, perhaps there should be a policy that we try never to reseed
the pool at all until there is enough entropy around to prevent
attacks like these. (In theory the state of the pool might contain
2^(smallish number) bits of data interesting to the attacker even
without the uninitialized data issue.) This would make the situation
even worse for low-entropy systems, though.

--Andy

2007-12-20 20:17:21

by Phillip Susi

[permalink] [raw]
Subject: Re: /dev/urandom uses uninit bytes, leaks user data

Andrew Lutomirski wrote:
> I understand that there's no way that /dev/random can provide good
> output if there's insufficient entropy. But it still shouldn't leak
> arbitrary bits of user data that were never meant to be put into the
> pool at all.

It doesn't leak it though, it consumes it, and it then vanishes into the
entropy pool, never to be seen again.

> Step 1: Boot a system without a usable entropy source.
> Step 2: add some (predictable) "entropy" from userspace which isn't a
> multiple of 4, so up to three extra bytes get added.
> Step 3: Read a few bytes of /dev/random and send them over the network.

Only root can do 1 and 2, at which point, it's already game over.

2007-12-20 20:36:44

by Theodore Ts'o

[permalink] [raw]
Subject: Re: /dev/urandom uses uninit bytes, leaks user data

On Wed, Dec 19, 2007 at 11:18:54PM -0500, Andrew Lutomirski wrote:
> I understand that there's no way that /dev/random can provide good
> output if there's insufficient entropy. But it still shouldn't leak
> arbitrary bits of user data that were never meant to be put into the
> pool at all.

Let's be clear here. It's not arbitrary bits of user data. It's at
most three bytes of kernel stack garbage from the process which is
adding entropy to /dev/random from userspace. And the attacker
doesn't get to pick which 3 bytes they are, either! That means they
are not "arbitrary bits".

> > (My hypothetical attack is a lot less hypothetical than I thought at first.
> An attacker does not need to break into the kernel and steal the
> state of the pool. It may be as easy as this to trigger:
>
> Step 1: Boot a system without a usable entropy source.
> Step 2: add some (predictable) "entropy" from userspace which isn't a
> multiple of 4, so up to three extra bytes get added.
> Step 3: Read a few bytes of /dev/random and send them over the network.

So Step 1 assumes a system without a random seed file, or any usable
entropy source. As I've mentioned, this means that any cryptographic
users of /dev/random --- like, say, openssh daemon --- are in deep,
DEEP, trouble. This is a very serious problem, and in my opinion, far
more serious and far more REAL than your hypothetical attack. So we
need to address Step 1 anyway, and more urgently.

In step 2, what program are you expecting will be writing to
/dev/random? Very few programs do so today, and none that are
commonly installed on most Linux distributions, other than the dd
program run out of /etc/init.d/random to initialize /dev/random from
the random seed file. But, oh wait --- you're assuming that doesn't
exist, even though all major distributions have it.

Why would a legitimate program read a few bytes of /dev/random and
send them on the network in step 3? Or are you assuming your
hypothetical attacker has a shell account on the system? In which
case there is probably far more concerns about information leakage
than 3 bytes of kernel stack garbage, which might or might not contain
user data.

And note that this attack can only be done for 3 bytes. After that,
the work factors involved become intractable.

So yes, there is a theoretical hole given your assumptions, but the
problem started with the "system without any initial entropy", and there are
far worse problems that result in that case. So that is the problem
we should be addressing.

That being said, write_pool() is hardly a fastpath, and the memset
isn't going to make any difference. So if it makes people feel
better, I don't object to adding it. I'm just pointing out that the
assumption is either pretty silly (since all of the distributions do
use the random seed file, and in practice *someone* has logged into
the console to add some amount of entropy at some point in the
machine's lifetime, even it if was when the machine was initially
installed) and/or points out a more critical issue, which is we need
to make sure that we do have a reasonable entropy source on all
machines.

- Ted

2007-12-21 16:10:45

by Andy Lutomirski

[permalink] [raw]
Subject: Re: /dev/urandom uses uninit bytes, leaks user data

On Dec 20, 2007 3:17 PM, Phillip Susi <[email protected]> wrote:
> Andrew Lutomirski wrote:
> > I understand that there's no way that /dev/random can provide good
> > output if there's insufficient entropy. But it still shouldn't leak
> > arbitrary bits of user data that were never meant to be put into the
> > pool at all.
>
> It doesn't leak it though, it consumes it, and it then vanishes into the
> entropy pool, never to be seen again.

No, it's there, and if there's little enough entropy around it can be
recovered by brute force.

>
> > Step 1: Boot a system without a usable entropy source.
> > Step 2: add some (predictable) "entropy" from userspace which isn't a
> > multiple of 4, so up to three extra bytes get added.
> > Step 3: Read a few bytes of /dev/random and send them over the network.
>
> Only root can do 1 and 2, at which point, it's already game over.

Again, no. Root could do this accidentally. Step 1 happens all the
time (see the comments on non-unique UUIDs). Step 2 just requires a
program to put data which it thinks is OK to go into the pool next to
data that shouldn't be there in memory. I'm OK with the entropy pool
being insecure in cases where it cannot possibly be secure. But it
should not leak information that never belonged in it in the first
place.

(Remember, the entire justification for Linux's model of the entropy
pool seems to be that it should be as secure as possible even against
computationally unbounded attackers or attackers who can find SHA
preimages. A brute force attack that works sometimes and only
requires 2^24 brute force iterations certainly fits into this threat
model.)

--Andy

2007-12-22 01:14:37

by Theodore Ts'o

[permalink] [raw]
Subject: Re: /dev/urandom uses uninit bytes, leaks user data

On Fri, Dec 21, 2007 at 11:10:36AM -0500, Andrew Lutomirski wrote:
> > > Step 1: Boot a system without a usable entropy source.
> > > Step 2: add some (predictable) "entropy" from userspace which isn't a
> > > multiple of 4, so up to three extra bytes get added.
> > > Step 3: Read a few bytes of /dev/random and send them over the network.
> >
> > Only root can do 1 and 2, at which point, it's already game over.
>
> Again, no. Root could do this accidentally. Step 1 happens all the
> time (see the comments on non-unique UUIDs).

Actually, it doesn't happen *all* the time. That was a situation of
an rpm post-install script trying to use a random UUID generation
facility from a kick-start install where there was no keyboard
activity and apparently no other entropy added. (Note that in such an
environment, generation of host ssh keys at install time without any
entropy is an even bigger problem, and that *is* something that people
should be thinking about.)

> Step 2 just requires a
> program to put data which it thinks is OK to go into the pool next to
> data that shouldn't be there in memory.

Um, no, that's incorrect. Simply putting data which is OK to put into
the pool next to data that shouldn't isn't enough. First of all, the
issue is using uninitialized kernel stack garbage. Read that again,
slowly --- kernel stack. How you arrange your data in user memory
doesn't matter. Secondly, as I've said earlier, I'm not aware of any
program that actually tries to write into the entropy pool on most
modern Linux systems other than dd in /etc/init.d/random which
initializes the random pool from pre-seeded entropy on disk.

- Ted

2007-12-26 18:30:51

by Phillip Susi

[permalink] [raw]
Subject: Re: /dev/urandom uses uninit bytes, leaks user data

Andrew Lutomirski wrote:
> No, it's there, and if there's little enough entropy around it can be
> recovered by brute force.

A little entropy is enough to prevent a brute force attack. You would
have to have ZERO entropy after a cold boot so the attacker would know
exactly the contents of the pool, AND know that one and ONLY one other
task has read from /dev/urandom, AND exactly what time that task did so,
AND how many bytes it read. Only then could the attacker read from
urandom and based on those bytes and the previous known pool state,
brute force the 3 bytes that came from some unknown location in the
other task's memory.

>>> Step 1: Boot a system without a usable entropy source.
>>> Step 2: add some (predictable) "entropy" from userspace which isn't a
>>> multiple of 4, so up to three extra bytes get added.
>>> Step 3: Read a few bytes of /dev/random and send them over the network.
>> Only root can do 1 and 2, at which point, it's already game over.
>
> Again, no. Root could do this accidentally. Step 1 happens all the
> time (see the comments on non-unique UUIDs). Step 2 just requires a

It does not happen all the time. It happens on a system that has just
been cold booted from read-only distribution media with a broken CMOS
clock, no user keyboard interaction, and no hardware RNG, and that's it.
Every other system is going to have some entropy from the last boot at
least.

2007-12-27 10:44:36

by Pavel Machek

Subject: Re: /dev/urandom uses uninit bytes, leaks user data

On Thu 2007-12-20 15:36:01, Theodore Tso wrote:
> On Wed, Dec 19, 2007 at 11:18:54PM -0500, Andrew Lutomirski wrote:
> > I understand that there's no way that /dev/random can provide good
> > output if there's insufficient entropy. But it still shouldn't leak
> > arbitrary bits of user data that were never meant to be put into the
> > pool at all.
>
> Let's be clear here. It's not arbitrary bits of user data. It's at
> most three bytes of kernel stack garbage from the process which is
> adding entropy to /dev/random from userspace. And the attacker
> doesn't get to pick which 3 bytes they are, either! That means they
> are not "arbitrary bits".
>
> > (My hypothetical attack is a lot hypothetical than I thought at first.
> > An attacker does not need to break into the kernel and steal the
> > state of the pool. It may be as easy as this to trigger:
> >
> > Step 1: Boot a system without a usable entropy source.
> > Step 2: add some (predictable) "entropy" from userspace which isn't a
> > multiple of 4, so up to three extra bytes get added.
> > Step 3: Read a few bytes of /dev/random and send them over the network.
>
> So Step 1 assumes a system without a random seed file, or any usable
> entropy source. As I've mentioned, this means that any cryptographic
> users of /dev/random --- like, say, openssh daemon --- are in deep,
> DEEP, trouble. This is a very serious problem, and in my opinion, far
> more serious and far more REAL than your hypothetical attack. So we
> need to address Step 1 anyway, and more urgently.
>
> In step 2, what program are you expecting will be writing to
> /dev/random? Very few programs do so today, and none that are
> commonly installed on most Linux distributions, other than the dd
> program run out of /etc/init.d/random to initialize /dev/random from
> the random seed file. But, oh wait --- you're assuming that doesn't
> exist, even though all major distributions have it.
>
> Why would a legitimate program read a few bytes of /dev/random and
> send them on the network in step 3? Or are you assuming your
> hypothetical attacker has a shell account on the system? In which
> case there is probably far more concerns about information leakage
> than 3 bytes of kernel stack garbage, which might or might not contain
> user data.
>
> And note that this attack can only be done for 3 bytes. After that,
> the work factors involved become intractable.
>
> So yes, there is a theoretical hole given your assumptions, but the
> problem started with the "system without any initial entropy", and there are
> far worse problems that result in that case. So that is the problem
> we should be addressing.
>
> That being said, write_pool() is hardly a fastpath, and the memset
> isn't going to make any difference. So if it makes people feel
> better, I don't object to adding it. I'm just pointing out that the
> assumption is either pretty silly (since all of the distributions do
> use the random seed file, and in practice *someone* has logged into
> the console to add some amount of entropy at some point in the
> machine's lifetime, even it if was when the machine was initially
> installed) and/or points out a more critical issue, which is we need
> to make sure that we do have a reasonable entropy source on all
> machines.

Let's memset. It is not a fastpath, and the code will be more obvious
that way.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html