2022-03-28 19:57:40

by Sasha Levin

[permalink] [raw]
Subject: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction

From: "Jason A. Donenfeld" <[email protected]>

[ Upstream commit 6e8ec2552c7d13991148e551e3325a624d73fac6 ]

The current 4096-bit LFSR used for entropy collection had a few
desirable attributes for the context in which it was created. For
example, the state was huge, which meant that /dev/random would be able
to output quite a bit of accumulated entropy before blocking. It was
also, in its time, quite fast at accumulating entropy byte-by-byte,
which matters given the varying contexts in which mix_pool_bytes() is
called. And its diffusion was relatively high, which meant that changes
would ripple across several words of state rather quickly.

However, it also suffers from a few security vulnerabilities. In
particular, inputs learned by an attacker can be undone, but moreover,
if the state of the pool leaks, its contents can be controlled and
entirely zeroed out. I've demonstrated this attack with this SMT2
script, <https://xn--4db.cc/5o9xO8pb>, which Boolector/CaDiCal solves in
a matter of seconds on a single core of my laptop, resulting in little
proof of concept C demonstrators such as <https://xn--4db.cc/jCkvvIaH/c>.

For basically all recent formal models of RNGs, these attacks represent
a significant cryptographic flaw. But how does this manifest
practically? If an attacker has access to the system to such a degree
that he can learn the internal state of the RNG, arguably there are
other lower hanging vulnerabilities -- side-channel, infoleak, or
otherwise -- that might have higher priority. On the other hand, seed
files are frequently used on systems that have a hard time generating
much entropy on their own, and these seed files, being files, often leak
or are duplicated and distributed accidentally, or are even seeded over
the Internet intentionally, where their contents might be recorded or
tampered with. Seen this way, an otherwise quasi-implausible
vulnerability is a bit more practical than initially thought.

Another aspect of the current mix_pool_bytes() function is that, while
its performance was arguably competitive for the time in which it was
created, it's no longer considered so. This patch improves performance
significantly: on a high-end CPU, an i7-11850H, it improves performance
of mix_pool_bytes() by 225%, and on a low-end CPU, a Cortex-A7, it
improves performance by 103%.

This commit replaces the LFSR of mix_pool_bytes() with a straight-
forward cryptographic hash function, BLAKE2s, which is already in use
for pool extraction. Universal hashing with a secret seed was considered
too, something along the lines of <https://eprint.iacr.org/2013/338>,
but the requirement for a secret seed makes for a chicken & egg problem.
Instead we go with a formally proven scheme using a computational hash
function, described in sections 5.1, 6.4, and B.1.8 of
<https://eprint.iacr.org/2019/198>.

BLAKE2s outputs 256 bits, which should give us an appropriate amount of
min-entropy accumulation, and a wide enough margin of collision
resistance against active attacks. mix_pool_bytes() becomes a simple
call to blake2s_update(), for accumulation, while the extraction step
becomes a blake2s_final() to generate a seed, with which we can then do
a HKDF-like or BLAKE2X-like expansion, the first part of which we fold
back as an init key for subsequent blake2s_update()s, and the rest we
produce to the caller. This then is provided to our CRNG like usual. In
that expansion step, we make opportunistic use of 32 bytes of RDRAND
output, just as before. We also always reseed the crng with 32 bytes,
unconditionally, or not at all, rather than sometimes with 16 as before,
as we don't win anything by limiting beyond the 16 byte threshold.

Going for a hash function as an entropy collector is a conservative,
proven approach. The result of all this is a much simpler and much less
bespoke construction than what's there now, which not only plugs a
vulnerability but also improves performance considerably.

Cc: Theodore Ts'o <[email protected]>
Cc: Dominik Brodowski <[email protected]>
Reviewed-by: Eric Biggers <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Jean-Philippe Aumasson <[email protected]>
Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
drivers/char/random.c | 304 ++++++++----------------------------------
1 file changed, 55 insertions(+), 249 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 3404a91edf29..882f78829a24 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -42,61 +42,6 @@
*/

/*
- * (now, with legal B.S. out of the way.....)
- *
- * This routine gathers environmental noise from device drivers, etc.,
- * and returns good random numbers, suitable for cryptographic use.
- * Besides the obvious cryptographic uses, these numbers are also good
- * for seeding TCP sequence numbers, and other places where it is
- * desirable to have numbers which are not only random, but hard to
- * predict by an attacker.
- *
- * Theory of operation
- * ===================
- *
- * Computers are very predictable devices. Hence it is extremely hard
- * to produce truly random numbers on a computer --- as opposed to
- * pseudo-random numbers, which can easily generated by using a
- * algorithm. Unfortunately, it is very easy for attackers to guess
- * the sequence of pseudo-random number generators, and for some
- * applications this is not acceptable. So instead, we must try to
- * gather "environmental noise" from the computer's environment, which
- * must be hard for outside attackers to observe, and use that to
- * generate random numbers. In a Unix environment, this is best done
- * from inside the kernel.
- *
- * Sources of randomness from the environment include inter-keyboard
- * timings, inter-interrupt timings from some interrupts, and other
- * events which are both (a) non-deterministic and (b) hard for an
- * outside observer to measure. Randomness from these sources are
- * added to an "entropy pool", which is mixed using a CRC-like function.
- * This is not cryptographically strong, but it is adequate assuming
- * the randomness is not chosen maliciously, and it is fast enough that
- * the overhead of doing it on every interrupt is very reasonable.
- * As random bytes are mixed into the entropy pool, the routines keep
- * an *estimate* of how many bits of randomness have been stored into
- * the random number generator's internal state.
- *
- * When random bytes are desired, they are obtained by taking the BLAKE2s
- * hash of the contents of the "entropy pool". The BLAKE2s hash avoids
- * exposing the internal state of the entropy pool. It is believed to
- * be computationally infeasible to derive any useful information
- * about the input of BLAKE2s from its output. Even if it is possible to
- * analyze BLAKE2s in some clever way, as long as the amount of data
- * returned from the generator is less than the inherent entropy in
- * the pool, the output data is totally unpredictable. For this
- * reason, the routine decreases its internal estimate of how many
- * bits of "true randomness" are contained in the entropy pool as it
- * outputs random numbers.
- *
- * If this estimate goes to zero, the routine can still generate
- * random numbers; however, an attacker may (at least in theory) be
- * able to infer the future output of the generator from prior
- * outputs. This requires successful cryptanalysis of BLAKE2s, which is
- * not believed to be feasible, but there is a remote possibility.
- * Nonetheless, these numbers should be useful for the vast majority
- * of purposes.
- *
* Exported interfaces ---- output
* ===============================
*
@@ -298,23 +243,6 @@
*
* mknod /dev/random c 1 8
* mknod /dev/urandom c 1 9
- *
- * Acknowledgements:
- * =================
- *
- * Ideas for constructing this random number generator were derived
- * from Pretty Good Privacy's random number generator, and from private
- * discussions with Phil Karn. Colin Plumb provided a faster random
- * number generator, which speed up the mixing function of the entropy
- * pool, taken from PGPfone. Dale Worley has also contributed many
- * useful ideas and suggestions to improve this driver.
- *
- * Any flaws in the design are solely my responsibility, and should
- * not be attributed to the Phil, Colin, or any of authors of PGP.
- *
- * Further background information on this topic may be obtained from
- * RFC 1750, "Randomness Recommendations for Security", by Donald
- * Eastlake, Steve Crocker, and Jeff Schiller.
*/

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -358,79 +286,15 @@

/* #define ADD_INTERRUPT_BENCH */

-/*
- * If the entropy count falls under this number of bits, then we
- * should wake up processes which are selecting or polling on write
- * access to /dev/random.
- */
-static int random_write_wakeup_bits = 28 * (1 << 5);
-
-/*
- * Originally, we used a primitive polynomial of degree .poolwords
- * over GF(2). The taps for various sizes are defined below. They
- * were chosen to be evenly spaced except for the last tap, which is 1
- * to get the twisting happening as fast as possible.
- *
- * For the purposes of better mixing, we use the CRC-32 polynomial as
- * well to make a (modified) twisted Generalized Feedback Shift
- * Register. (See M. Matsumoto & Y. Kurita, 1992. Twisted GFSR
- * generators. ACM Transactions on Modeling and Computer Simulation
- * 2(3):179-194. Also see M. Matsumoto & Y. Kurita, 1994. Twisted
- * GFSR generators II. ACM Transactions on Modeling and Computer
- * Simulation 4:254-266)
- *
- * Thanks to Colin Plumb for suggesting this.
- *
- * The mixing operation is much less sensitive than the output hash,
- * where we use BLAKE2s. All that we want of mixing operation is that
- * it be a good non-cryptographic hash; i.e. it not produce collisions
- * when fed "random" data of the sort we expect to see. As long as
- * the pool state differs for different inputs, we have preserved the
- * input entropy and done a good job. The fact that an intelligent
- * attacker can construct inputs that will produce controlled
- * alterations to the pool's state is not important because we don't
- * consider such inputs to contribute any randomness. The only
- * property we need with respect to them is that the attacker can't
- * increase his/her knowledge of the pool's state. Since all
- * additions are reversible (knowing the final state and the input,
- * you can reconstruct the initial state), if an attacker has any
- * uncertainty about the initial state, he/she can only shuffle that
- * uncertainty about, but never cause any collisions (which would
- * decrease the uncertainty).
- *
- * Our mixing functions were analyzed by Lacharme, Roeck, Strubel, and
- * Videau in their paper, "The Linux Pseudorandom Number Generator
- * Revisited" (see: http://eprint.iacr.org/2012/251.pdf). In their
- * paper, they point out that we are not using a true Twisted GFSR,
- * since Matsumoto & Kurita used a trinomial feedback polynomial (that
- * is, with only three taps, instead of the six that we are using).
- * As a result, the resulting polynomial is neither primitive nor
- * irreducible, and hence does not have a maximal period over
- * GF(2**32). They suggest a slight change to the generator
- * polynomial which improves the resulting TGFSR polynomial to be
- * irreducible, which we have made here.
- */
enum poolinfo {
- POOL_WORDS = 128,
- POOL_WORDMASK = POOL_WORDS - 1,
- POOL_BYTES = POOL_WORDS * sizeof(u32),
- POOL_BITS = POOL_BYTES * 8,
+ POOL_BITS = BLAKE2S_HASH_SIZE * 8,
POOL_BITSHIFT = ilog2(POOL_BITS),

/* To allow fractional bits to be tracked, the entropy_count field is
* denominated in units of 1/8th bits. */
POOL_ENTROPY_SHIFT = 3,
#define POOL_ENTROPY_BITS() (input_pool.entropy_count >> POOL_ENTROPY_SHIFT)
- POOL_FRACBITS = POOL_BITS << POOL_ENTROPY_SHIFT,
-
- /* x^128 + x^104 + x^76 + x^51 +x^25 + x + 1 */
- POOL_TAP1 = 104,
- POOL_TAP2 = 76,
- POOL_TAP3 = 51,
- POOL_TAP4 = 25,
- POOL_TAP5 = 1,
-
- EXTRACT_SIZE = BLAKE2S_HASH_SIZE / 2
+ POOL_FRACBITS = POOL_BITS << POOL_ENTROPY_SHIFT
};

/*
@@ -438,6 +302,12 @@ enum poolinfo {
*/
static DECLARE_WAIT_QUEUE_HEAD(random_write_wait);
static struct fasync_struct *fasync;
+/*
+ * If the entropy count falls under this number of bits, then we
+ * should wake up processes which are selecting or polling on write
+ * access to /dev/random.
+ */
+static int random_write_wakeup_bits = POOL_BITS * 3 / 4;

static DEFINE_SPINLOCK(random_ready_list_lock);
static LIST_HEAD(random_ready_list);
@@ -493,73 +363,31 @@ MODULE_PARM_DESC(ratelimit_disable, "Disable random ratelimit suppression");
*
**********************************************************************/

-static u32 input_pool_data[POOL_WORDS] __latent_entropy;
-
static struct {
+ struct blake2s_state hash;
spinlock_t lock;
- u16 add_ptr;
- u16 input_rotate;
int entropy_count;
} input_pool = {
+ .hash.h = { BLAKE2S_IV0 ^ (0x01010000 | BLAKE2S_HASH_SIZE),
+ BLAKE2S_IV1, BLAKE2S_IV2, BLAKE2S_IV3, BLAKE2S_IV4,
+ BLAKE2S_IV5, BLAKE2S_IV6, BLAKE2S_IV7 },
+ .hash.outlen = BLAKE2S_HASH_SIZE,
.lock = __SPIN_LOCK_UNLOCKED(input_pool.lock),
};

-static ssize_t extract_entropy(void *buf, size_t nbytes, int min);
-static ssize_t _extract_entropy(void *buf, size_t nbytes);
+static bool extract_entropy(void *buf, size_t nbytes, int min);
+static void _extract_entropy(void *buf, size_t nbytes);

static void crng_reseed(struct crng_state *crng, bool use_input_pool);

-static const u32 twist_table[8] = {
- 0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
- 0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };
-
/*
* This function adds bytes into the entropy "pool". It does not
* update the entropy estimate. The caller should call
* credit_entropy_bits if this is appropriate.
- *
- * The pool is stirred with a primitive polynomial of the appropriate
- * degree, and then twisted. We twist by three bits at a time because
- * it's cheap to do so and helps slightly in the expected case where
- * the entropy is concentrated in the low-order bits.
*/
static void _mix_pool_bytes(const void *in, int nbytes)
{
- unsigned long i;
- int input_rotate;
- const u8 *bytes = in;
- u32 w;
-
- input_rotate = input_pool.input_rotate;
- i = input_pool.add_ptr;
-
- /* mix one byte at a time to simplify size handling and churn faster */
- while (nbytes--) {
- w = rol32(*bytes++, input_rotate);
- i = (i - 1) & POOL_WORDMASK;
-
- /* XOR in the various taps */
- w ^= input_pool_data[i];
- w ^= input_pool_data[(i + POOL_TAP1) & POOL_WORDMASK];
- w ^= input_pool_data[(i + POOL_TAP2) & POOL_WORDMASK];
- w ^= input_pool_data[(i + POOL_TAP3) & POOL_WORDMASK];
- w ^= input_pool_data[(i + POOL_TAP4) & POOL_WORDMASK];
- w ^= input_pool_data[(i + POOL_TAP5) & POOL_WORDMASK];
-
- /* Mix the result back in with a twist */
- input_pool_data[i] = (w >> 3) ^ twist_table[w & 7];
-
- /*
- * Normally, we add 7 bits of rotation to the pool.
- * At the beginning of the pool, add an extra 7 bits
- * rotation, so that successive passes spread the
- * input bits across the pool evenly.
- */
- input_rotate = (input_rotate + (i ? 7 : 14)) & 31;
- }
-
- input_pool.input_rotate = input_rotate;
- input_pool.add_ptr = i;
+ blake2s_update(&input_pool.hash, in, nbytes);
}

static void __mix_pool_bytes(const void *in, int nbytes)
@@ -953,15 +781,14 @@ static int crng_slow_load(const u8 *cp, size_t len)
static void crng_reseed(struct crng_state *crng, bool use_input_pool)
{
unsigned long flags;
- int i, num;
+ int i;
union {
u8 block[CHACHA_BLOCK_SIZE];
u32 key[8];
} buf;

if (use_input_pool) {
- num = extract_entropy(&buf, 32, 16);
- if (num == 0)
+ if (!extract_entropy(&buf, 32, 16))
return;
} else {
_extract_crng(&primary_crng, buf.block);
@@ -1329,74 +1156,48 @@ static size_t account(size_t nbytes, int min)
}

/*
- * This function does the actual extraction for extract_entropy.
- *
- * Note: we assume that .poolwords is a multiple of 16 words.
+ * This is an HKDF-like construction for using the hashed collected entropy
+ * as a PRF key, that's then expanded block-by-block.
*/
-static void extract_buf(u8 *out)
+static void _extract_entropy(void *buf, size_t nbytes)
{
- struct blake2s_state state __aligned(__alignof__(unsigned long));
- u8 hash[BLAKE2S_HASH_SIZE];
- unsigned long *salt;
unsigned long flags;
-
- blake2s_init(&state, sizeof(hash));
-
- /*
- * If we have an architectural hardware random number
- * generator, use it for BLAKE2's salt & personal fields.
- */
- for (salt = (unsigned long *)&state.h[4];
- salt < (unsigned long *)&state.h[8]; ++salt) {
- unsigned long v;
- if (!arch_get_random_long(&v))
- break;
- *salt ^= v;
+ u8 seed[BLAKE2S_HASH_SIZE], next_key[BLAKE2S_HASH_SIZE];
+ struct {
+ unsigned long rdrand[32 / sizeof(long)];
+ size_t counter;
+ } block;
+ size_t i;
+
+ for (i = 0; i < ARRAY_SIZE(block.rdrand); ++i) {
+ if (!arch_get_random_long(&block.rdrand[i]))
+ block.rdrand[i] = random_get_entropy();
}

- /* Generate a hash across the pool */
spin_lock_irqsave(&input_pool.lock, flags);
- blake2s_update(&state, (const u8 *)input_pool_data, POOL_BYTES);
- blake2s_final(&state, hash); /* final zeros out state */

- /*
- * We mix the hash back into the pool to prevent backtracking
- * attacks (where the attacker knows the state of the pool
- * plus the current outputs, and attempts to find previous
- * outputs), unless the hash function can be inverted. By
- * mixing at least a hash worth of hash data back, we make
- * brute-forcing the feedback as hard as brute-forcing the
- * hash.
- */
- __mix_pool_bytes(hash, sizeof(hash));
- spin_unlock_irqrestore(&input_pool.lock, flags);
+ /* seed = HASHPRF(last_key, entropy_input) */
+ blake2s_final(&input_pool.hash, seed);

- /* Note that EXTRACT_SIZE is half of hash size here, because above
- * we've dumped the full length back into mixer. By reducing the
- * amount that we emit, we retain a level of forward secrecy.
- */
- memcpy(out, hash, EXTRACT_SIZE);
- memzero_explicit(hash, sizeof(hash));
-}
+ /* next_key = HASHPRF(seed, RDRAND || 0) */
+ block.counter = 0;
+ blake2s(next_key, (u8 *)&block, seed, sizeof(next_key), sizeof(block), sizeof(seed));
+ blake2s_init_key(&input_pool.hash, BLAKE2S_HASH_SIZE, next_key, sizeof(next_key));

-static ssize_t _extract_entropy(void *buf, size_t nbytes)
-{
- ssize_t ret = 0, i;
- u8 tmp[EXTRACT_SIZE];
+ spin_unlock_irqrestore(&input_pool.lock, flags);
+ memzero_explicit(next_key, sizeof(next_key));

while (nbytes) {
- extract_buf(tmp);
- i = min_t(int, nbytes, EXTRACT_SIZE);
- memcpy(buf, tmp, i);
+ i = min_t(size_t, nbytes, BLAKE2S_HASH_SIZE);
+ /* output = HASHPRF(seed, RDRAND || ++counter) */
+ ++block.counter;
+ blake2s(buf, (u8 *)&block, seed, i, sizeof(block), sizeof(seed));
nbytes -= i;
buf += i;
- ret += i;
}

- /* Wipe data just returned from memory */
- memzero_explicit(tmp, sizeof(tmp));
-
- return ret;
+ memzero_explicit(seed, sizeof(seed));
+ memzero_explicit(&block, sizeof(block));
}

/*
@@ -1404,13 +1205,18 @@ static ssize_t _extract_entropy(void *buf, size_t nbytes)
* returns it in a buffer.
*
* The min parameter specifies the minimum amount we can pull before
- * failing to avoid races that defeat catastrophic reseeding.
+ * failing to avoid races that defeat catastrophic reseeding. If we
+ * have less than min entropy available, we return false and buf is
+ * not filled.
*/
-static ssize_t extract_entropy(void *buf, size_t nbytes, int min)
+static bool extract_entropy(void *buf, size_t nbytes, int min)
{
trace_extract_entropy(nbytes, POOL_ENTROPY_BITS(), _RET_IP_);
- nbytes = account(nbytes, min);
- return _extract_entropy(buf, nbytes);
+ if (account(nbytes, min)) {
+ _extract_entropy(buf, nbytes);
+ return true;
+ }
+ return false;
}

#define warn_unseeded_randomness(previous) \
@@ -1674,7 +1480,7 @@ static void __init init_std_data(void)
unsigned long rv;

mix_pool_bytes(&now, sizeof(now));
- for (i = POOL_BYTES; i > 0; i -= sizeof(rv)) {
+ for (i = BLAKE2S_BLOCK_SIZE; i > 0; i -= sizeof(rv)) {
if (!arch_get_random_seed_long(&rv) &&
!arch_get_random_long(&rv))
rv = random_get_entropy();
--
2.34.1


2022-03-28 21:45:42

by Michael Brooks

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction

I should have said 'predicate', not 'axiom'. Please review the
rigorous proof of correctness as listed on my github. Another issue
I'd like to bring up in 5.17 is the apparent lack of mathematical
proofs and overuse of the word "entropy" when we should be using the
word uniqueness - do we really want to go live with something that
isn't documented at all and just trust the word of any one individual?

My team put together mathematical proof of correctness, the onerious
of proving a defect in the implementation is up to individuals who run
the code and observe a failing - that is the point of code review.
The proposed changes to the pool means it no longer needs to be
"drained" or "accumulated", however we still may need a lock to make
sure the /dev/random device driver is ready for use prior to access.
Although I am actually not entirely convinced that this lock is needed
because it directly impacts boot time because of KSLR. We could lock
and wait for the slow long term storage to wake up and feed us the
seed - but is that really necessary? The alternate_rand() function in
the provided patch should be sufficient for the vast majority of
non-paranoid users, I don't see how a would-be adversary could
undermine this construction - sure it isn't ideally secure as a fully
loaded keypool - but it's good enough to thwart a memory corruption
exploit. A fast boot option is preferable IMHO.

Regards,
Mike

On Mon, Mar 28, 2022 at 9:16 AM Michael Brooks <[email protected]> wrote:
>
> Jason,
>
> I think this is an astute observation - the current design of the pool has this defect of being emptied leading to deadlocks, and in addition, access to it undermines the predictability of the pool. This needs to be fundamentally addressed.
>
> What if access to the pool could never undermine the predictability of the pool state? If this axiom holds, then there would never be a reason to keep a global counter for how much “entropy” or uniqueness is in the pool - and that would remove the lock that handle_irq_event_percpu() is fighting over... which creates a condition where one lock is being battled over by every irq event - which is made worse as the machine has additional CPUs. This one lock is so bad, it is basically a GIL, and we need to have a serious discussion on how to remove this one lock.
>
> This may sound far fetched, but there is actually some clever cryptography that can provide a good solution here. What if the pool was not a linear structure FILO structure, but actually a jump table? What if access to this jump table was randomized, so that no two callers could take the same path through the table? What if in addition, upon read - the reader "covers their tracks" and modifies their entrypoint into the table to prevent any other caller from being able to follow the same path? This is exactly what keypoolrandom does.
>
> Now, if the output of this jump table is used as an input to cryptographic primitives such as an ideal block cipher or hash function - then even a single bit flip would cause an avalanche which dramatically affects the output - and also obscures how it was generated. Therefore, access using this method could never undermine the pool state - therefore this speciture structure never "drains" but rather becomes less and less predictable over time.
>
> This kind of jump table has been written and is called https://github.com/TheRook/KeypoolRandom. This took three years to write, has been peer reviewed and has exquisite documentation, if you take the time to read over this code and the docs - I think you'll enjoy it. This project is very easy to compile and to run - it is the software equivalent of a breakout board, and doesn't require a full kernel compile to see how it functions.
>
> Regards,
> Michael Brooks
>
>
> On Mon, Mar 28, 2022 at 4:20 AM Sasha Levin <[email protected]> wrote:
>>
>> From: "Jason A. Donenfeld" <[email protected]>
>>
>> [ Upstream commit 6e8ec2552c7d13991148e551e3325a624d73fac6 ]
>>
>> The current 4096-bit LFSR used for entropy collection had a few
>> desirable attributes for the context in which it was created. For
>> example, the state was huge, which meant that /dev/random would be able
>> to output quite a bit of accumulated entropy before blocking. It was
>> also, in its time, quite fast at accumulating entropy byte-by-byte,
>> which matters given the varying contexts in which mix_pool_bytes() is
>> called. And its diffusion was relatively high, which meant that changes
>> would ripple across several words of state rather quickly.
>>
>> However, it also suffers from a few security vulnerabilities. In
>> particular, inputs learned by an attacker can be undone, but moreover,
>> if the state of the pool leaks, its contents can be controlled and
>> entirely zeroed out. I've demonstrated this attack with this SMT2
>> script, <https://xn--4db.cc/5o9xO8pb>, which Boolector/CaDiCal solves in
>> a matter of seconds on a single core of my laptop, resulting in little
>> proof of concept C demonstrators such as <https://xn--4db.cc/jCkvvIaH/c>.
>>
>> For basically all recent formal models of RNGs, these attacks represent
>> a significant cryptographic flaw. But how does this manifest
>> practically? If an attacker has access to the system to such a degree
>> that he can learn the internal state of the RNG, arguably there are
>> other lower hanging vulnerabilities -- side-channel, infoleak, or
>> otherwise -- that might have higher priority. On the other hand, seed
>> files are frequently used on systems that have a hard time generating
>> much entropy on their own, and these seed files, being files, often leak
>> or are duplicated and distributed accidentally, or are even seeded over
>> the Internet intentionally, where their contents might be recorded or
>> tampered with. Seen this way, an otherwise quasi-implausible
>> vulnerability is a bit more practical than initially thought.
>>
>> Another aspect of the current mix_pool_bytes() function is that, while
>> its performance was arguably competitive for the time in which it was
>> created, it's no longer considered so. This patch improves performance
>> significantly: on a high-end CPU, an i7-11850H, it improves performance
>> of mix_pool_bytes() by 225%, and on a low-end CPU, a Cortex-A7, it
>> improves performance by 103%.
>>
>> This commit replaces the LFSR of mix_pool_bytes() with a straight-
>> forward cryptographic hash function, BLAKE2s, which is already in use
>> for pool extraction. Universal hashing with a secret seed was considered
>> too, something along the lines of <https://eprint.iacr.org/2013/338>,
>> but the requirement for a secret seed makes for a chicken & egg problem.
>> Instead we go with a formally proven scheme using a computational hash
>> function, described in sections 5.1, 6.4, and B.1.8 of
>> <https://eprint.iacr.org/2019/198>.
>>
>> BLAKE2s outputs 256 bits, which should give us an appropriate amount of
>> min-entropy accumulation, and a wide enough margin of collision
>> resistance against active attacks. mix_pool_bytes() becomes a simple
>> call to blake2s_update(), for accumulation, while the extraction step
>> becomes a blake2s_final() to generate a seed, with which we can then do
>> a HKDF-like or BLAKE2X-like expansion, the first part of which we fold
>> back as an init key for subsequent blake2s_update()s, and the rest we
>> produce to the caller. This then is provided to our CRNG like usual. In
>> that expansion step, we make opportunistic use of 32 bytes of RDRAND
>> output, just as before. We also always reseed the crng with 32 bytes,
>> unconditionally, or not at all, rather than sometimes with 16 as before,
>> as we don't win anything by limiting beyond the 16 byte threshold.
>>
>> Going for a hash function as an entropy collector is a conservative,
>> proven approach. The result of all this is a much simpler and much less
>> bespoke construction than what's there now, which not only plugs a
>> vulnerability but also improves performance considerably.
>>
>> Cc: Theodore Ts'o <[email protected]>
>> Cc: Dominik Brodowski <[email protected]>
>> Reviewed-by: Eric Biggers <[email protected]>
>> Reviewed-by: Greg Kroah-Hartman <[email protected]>
>> Reviewed-by: Jean-Philippe Aumasson <[email protected]>
>> Signed-off-by: Jason A. Donenfeld <[email protected]>
>> Signed-off-by: Sasha Levin <[email protected]>
>> ---
>> drivers/char/random.c | 304 ++++++++----------------------------------
>> 1 file changed, 55 insertions(+), 249 deletions(-)
>>
>> diff --git a/drivers/char/random.c b/drivers/char/random.c
>> index 3404a91edf29..882f78829a24 100644
>> --- a/drivers/char/random.c
>> +++ b/drivers/char/random.c
>> @@ -42,61 +42,6 @@
>> */
>>
>> /*
>> - * (now, with legal B.S. out of the way.....)
>> - *
>> - * This routine gathers environmental noise from device drivers, etc.,
>> - * and returns good random numbers, suitable for cryptographic use.
>> - * Besides the obvious cryptographic uses, these numbers are also good
>> - * for seeding TCP sequence numbers, and other places where it is
>> - * desirable to have numbers which are not only random, but hard to
>> - * predict by an attacker.
>> - *
>> - * Theory of operation
>> - * ===================
>> - *
>> - * Computers are very predictable devices. Hence it is extremely hard
>> - * to produce truly random numbers on a computer --- as opposed to
>> - * pseudo-random numbers, which can easily generated by using a
>> - * algorithm. Unfortunately, it is very easy for attackers to guess
>> - * the sequence of pseudo-random number generators, and for some
>> - * applications this is not acceptable. So instead, we must try to
>> - * gather "environmental noise" from the computer's environment, which
>> - * must be hard for outside attackers to observe, and use that to
>> - * generate random numbers. In a Unix environment, this is best done
>> - * from inside the kernel.
>> - *
>> - * Sources of randomness from the environment include inter-keyboard
>> - * timings, inter-interrupt timings from some interrupts, and other
>> - * events which are both (a) non-deterministic and (b) hard for an
>> - * outside observer to measure. Randomness from these sources are
>> - * added to an "entropy pool", which is mixed using a CRC-like function.
>> - * This is not cryptographically strong, but it is adequate assuming
>> - * the randomness is not chosen maliciously, and it is fast enough that
>> - * the overhead of doing it on every interrupt is very reasonable.
>> - * As random bytes are mixed into the entropy pool, the routines keep
>> - * an *estimate* of how many bits of randomness have been stored into
>> - * the random number generator's internal state.
>> - *
>> - * When random bytes are desired, they are obtained by taking the BLAKE2s
>> - * hash of the contents of the "entropy pool". The BLAKE2s hash avoids
>> - * exposing the internal state of the entropy pool. It is believed to
>> - * be computationally infeasible to derive any useful information
>> - * about the input of BLAKE2s from its output. Even if it is possible to
>> - * analyze BLAKE2s in some clever way, as long as the amount of data
>> - * returned from the generator is less than the inherent entropy in
>> - * the pool, the output data is totally unpredictable. For this
>> - * reason, the routine decreases its internal estimate of how many
>> - * bits of "true randomness" are contained in the entropy pool as it
>> - * outputs random numbers.
>> - *
>> - * If this estimate goes to zero, the routine can still generate
>> - * random numbers; however, an attacker may (at least in theory) be
>> - * able to infer the future output of the generator from prior
>> - * outputs. This requires successful cryptanalysis of BLAKE2s, which is
>> - * not believed to be feasible, but there is a remote possibility.
>> - * Nonetheless, these numbers should be useful for the vast majority
>> - * of purposes.
>> - *
>> * Exported interfaces ---- output
>> * ===============================
>> *
>> @@ -298,23 +243,6 @@
>> *
>> * mknod /dev/random c 1 8
>> * mknod /dev/urandom c 1 9
>> - *
>> - * Acknowledgements:
>> - * =================
>> - *
>> - * Ideas for constructing this random number generator were derived
>> - * from Pretty Good Privacy's random number generator, and from private
>> - * discussions with Phil Karn. Colin Plumb provided a faster random
>> - * number generator, which speed up the mixing function of the entropy
>> - * pool, taken from PGPfone. Dale Worley has also contributed many
>> - * useful ideas and suggestions to improve this driver.
>> - *
>> - * Any flaws in the design are solely my responsibility, and should
>> - * not be attributed to the Phil, Colin, or any of authors of PGP.
>> - *
>> - * Further background information on this topic may be obtained from
>> - * RFC 1750, "Randomness Recommendations for Security", by Donald
>> - * Eastlake, Steve Crocker, and Jeff Schiller.
>> */
>>
>> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> @@ -358,79 +286,15 @@
>>
>> /* #define ADD_INTERRUPT_BENCH */
>>
>> -/*
>> - * If the entropy count falls under this number of bits, then we
>> - * should wake up processes which are selecting or polling on write
>> - * access to /dev/random.
>> - */
>> -static int random_write_wakeup_bits = 28 * (1 << 5);
>> -
>> -/*
>> - * Originally, we used a primitive polynomial of degree .poolwords
>> - * over GF(2). The taps for various sizes are defined below. They
>> - * were chosen to be evenly spaced except for the last tap, which is 1
>> - * to get the twisting happening as fast as possible.
>> - *
>> - * For the purposes of better mixing, we use the CRC-32 polynomial as
>> - * well to make a (modified) twisted Generalized Feedback Shift
>> - * Register. (See M. Matsumoto & Y. Kurita, 1992. Twisted GFSR
>> - * generators. ACM Transactions on Modeling and Computer Simulation
>> - * 2(3):179-194. Also see M. Matsumoto & Y. Kurita, 1994. Twisted
>> - * GFSR generators II. ACM Transactions on Modeling and Computer
>> - * Simulation 4:254-266)
>> - *
>> - * Thanks to Colin Plumb for suggesting this.
>> - *
>> - * The mixing operation is much less sensitive than the output hash,
>> - * where we use BLAKE2s. All that we want of mixing operation is that
>> - * it be a good non-cryptographic hash; i.e. it not produce collisions
>> - * when fed "random" data of the sort we expect to see. As long as
>> - * the pool state differs for different inputs, we have preserved the
>> - * input entropy and done a good job. The fact that an intelligent
>> - * attacker can construct inputs that will produce controlled
>> - * alterations to the pool's state is not important because we don't
>> - * consider such inputs to contribute any randomness. The only
>> - * property we need with respect to them is that the attacker can't
>> - * increase his/her knowledge of the pool's state. Since all
>> - * additions are reversible (knowing the final state and the input,
>> - * you can reconstruct the initial state), if an attacker has any
>> - * uncertainty about the initial state, he/she can only shuffle that
>> - * uncertainty about, but never cause any collisions (which would
>> - * decrease the uncertainty).
>> - *
>> - * Our mixing functions were analyzed by Lacharme, Roeck, Strubel, and
>> - * Videau in their paper, "The Linux Pseudorandom Number Generator
>> - * Revisited" (see: http://eprint.iacr.org/2012/251.pdf). In their
>> - * paper, they point out that we are not using a true Twisted GFSR,
>> - * since Matsumoto & Kurita used a trinomial feedback polynomial (that
>> - * is, with only three taps, instead of the six that we are using).
>> - * As a result, the resulting polynomial is neither primitive nor
>> - * irreducible, and hence does not have a maximal period over
>> - * GF(2**32). They suggest a slight change to the generator
>> - * polynomial which improves the resulting TGFSR polynomial to be
>> - * irreducible, which we have made here.
>> - */
>> enum poolinfo {
>> - POOL_WORDS = 128,
>> - POOL_WORDMASK = POOL_WORDS - 1,
>> - POOL_BYTES = POOL_WORDS * sizeof(u32),
>> - POOL_BITS = POOL_BYTES * 8,
>> + POOL_BITS = BLAKE2S_HASH_SIZE * 8,
>> POOL_BITSHIFT = ilog2(POOL_BITS),
>>
>> /* To allow fractional bits to be tracked, the entropy_count field is
>> * denominated in units of 1/8th bits. */
>> POOL_ENTROPY_SHIFT = 3,
>> #define POOL_ENTROPY_BITS() (input_pool.entropy_count >> POOL_ENTROPY_SHIFT)
>> - POOL_FRACBITS = POOL_BITS << POOL_ENTROPY_SHIFT,
>> -
>> - /* x^128 + x^104 + x^76 + x^51 +x^25 + x + 1 */
>> - POOL_TAP1 = 104,
>> - POOL_TAP2 = 76,
>> - POOL_TAP3 = 51,
>> - POOL_TAP4 = 25,
>> - POOL_TAP5 = 1,
>> -
>> - EXTRACT_SIZE = BLAKE2S_HASH_SIZE / 2
>> + POOL_FRACBITS = POOL_BITS << POOL_ENTROPY_SHIFT
>> };
>>
>> /*
>> @@ -438,6 +302,12 @@ enum poolinfo {
>> */
>> static DECLARE_WAIT_QUEUE_HEAD(random_write_wait);
>> static struct fasync_struct *fasync;
>> +/*
>> + * If the entropy count falls under this number of bits, then we
>> + * should wake up processes which are selecting or polling on write
>> + * access to /dev/random.
>> + */
>> +static int random_write_wakeup_bits = POOL_BITS * 3 / 4;
>>
>> static DEFINE_SPINLOCK(random_ready_list_lock);
>> static LIST_HEAD(random_ready_list);
>> @@ -493,73 +363,31 @@ MODULE_PARM_DESC(ratelimit_disable, "Disable random ratelimit suppression");
>> *
>> **********************************************************************/
>>
>> -static u32 input_pool_data[POOL_WORDS] __latent_entropy;
>> -
>> static struct {
>> + struct blake2s_state hash;
>> spinlock_t lock;
>> - u16 add_ptr;
>> - u16 input_rotate;
>> int entropy_count;
>> } input_pool = {
>> + .hash.h = { BLAKE2S_IV0 ^ (0x01010000 | BLAKE2S_HASH_SIZE),
>> + BLAKE2S_IV1, BLAKE2S_IV2, BLAKE2S_IV3, BLAKE2S_IV4,
>> + BLAKE2S_IV5, BLAKE2S_IV6, BLAKE2S_IV7 },
>> + .hash.outlen = BLAKE2S_HASH_SIZE,
>> .lock = __SPIN_LOCK_UNLOCKED(input_pool.lock),
>> };
>>
>> -static ssize_t extract_entropy(void *buf, size_t nbytes, int min);
>> -static ssize_t _extract_entropy(void *buf, size_t nbytes);
>> +static bool extract_entropy(void *buf, size_t nbytes, int min);
>> +static void _extract_entropy(void *buf, size_t nbytes);
>>
>> static void crng_reseed(struct crng_state *crng, bool use_input_pool);
>>
>> -static const u32 twist_table[8] = {
>> - 0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
>> - 0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };
>> -
>> /*
>> * This function adds bytes into the entropy "pool". It does not
>> * update the entropy estimate. The caller should call
>> * credit_entropy_bits if this is appropriate.
>> - *
>> - * The pool is stirred with a primitive polynomial of the appropriate
>> - * degree, and then twisted. We twist by three bits at a time because
>> - * it's cheap to do so and helps slightly in the expected case where
>> - * the entropy is concentrated in the low-order bits.
>> */
>> static void _mix_pool_bytes(const void *in, int nbytes)
>> {
>> - unsigned long i;
>> - int input_rotate;
>> - const u8 *bytes = in;
>> - u32 w;
>> -
>> - input_rotate = input_pool.input_rotate;
>> - i = input_pool.add_ptr;
>> -
>> - /* mix one byte at a time to simplify size handling and churn faster */
>> - while (nbytes--) {
>> - w = rol32(*bytes++, input_rotate);
>> - i = (i - 1) & POOL_WORDMASK;
>> -
>> - /* XOR in the various taps */
>> - w ^= input_pool_data[i];
>> - w ^= input_pool_data[(i + POOL_TAP1) & POOL_WORDMASK];
>> - w ^= input_pool_data[(i + POOL_TAP2) & POOL_WORDMASK];
>> - w ^= input_pool_data[(i + POOL_TAP3) & POOL_WORDMASK];
>> - w ^= input_pool_data[(i + POOL_TAP4) & POOL_WORDMASK];
>> - w ^= input_pool_data[(i + POOL_TAP5) & POOL_WORDMASK];
>> -
>> - /* Mix the result back in with a twist */
>> - input_pool_data[i] = (w >> 3) ^ twist_table[w & 7];
>> -
>> - /*
>> - * Normally, we add 7 bits of rotation to the pool.
>> - * At the beginning of the pool, add an extra 7 bits
>> - * rotation, so that successive passes spread the
>> - * input bits across the pool evenly.
>> - */
>> - input_rotate = (input_rotate + (i ? 7 : 14)) & 31;
>> - }
>> -
>> - input_pool.input_rotate = input_rotate;
>> - input_pool.add_ptr = i;
>> + blake2s_update(&input_pool.hash, in, nbytes);
>> }
>>
>> static void __mix_pool_bytes(const void *in, int nbytes)
>> @@ -953,15 +781,14 @@ static int crng_slow_load(const u8 *cp, size_t len)
>> static void crng_reseed(struct crng_state *crng, bool use_input_pool)
>> {
>> unsigned long flags;
>> - int i, num;
>> + int i;
>> union {
>> u8 block[CHACHA_BLOCK_SIZE];
>> u32 key[8];
>> } buf;
>>
>> if (use_input_pool) {
>> - num = extract_entropy(&buf, 32, 16);
>> - if (num == 0)
>> + if (!extract_entropy(&buf, 32, 16))
>> return;
>> } else {
>> _extract_crng(&primary_crng, buf.block);
>> @@ -1329,74 +1156,48 @@ static size_t account(size_t nbytes, int min)
>> }
>>
>> /*
>> - * This function does the actual extraction for extract_entropy.
>> - *
>> - * Note: we assume that .poolwords is a multiple of 16 words.
>> + * This is an HKDF-like construction for using the hashed collected entropy
>> + * as a PRF key, that's then expanded block-by-block.
>> */
>> -static void extract_buf(u8 *out)
>> +static void _extract_entropy(void *buf, size_t nbytes)
>> {
>> - struct blake2s_state state __aligned(__alignof__(unsigned long));
>> - u8 hash[BLAKE2S_HASH_SIZE];
>> - unsigned long *salt;
>> unsigned long flags;
>> -
>> - blake2s_init(&state, sizeof(hash));
>> -
>> - /*
>> - * If we have an architectural hardware random number
>> - * generator, use it for BLAKE2's salt & personal fields.
>> - */
>> - for (salt = (unsigned long *)&state.h[4];
>> - salt < (unsigned long *)&state.h[8]; ++salt) {
>> - unsigned long v;
>> - if (!arch_get_random_long(&v))
>> - break;
>> - *salt ^= v;
>> + u8 seed[BLAKE2S_HASH_SIZE], next_key[BLAKE2S_HASH_SIZE];
>> + struct {
>> + unsigned long rdrand[32 / sizeof(long)];
>> + size_t counter;
>> + } block;
>> + size_t i;
>> +
>> + for (i = 0; i < ARRAY_SIZE(block.rdrand); ++i) {
>> + if (!arch_get_random_long(&block.rdrand[i]))
>> + block.rdrand[i] = random_get_entropy();
>> }
>>
>> - /* Generate a hash across the pool */
>> spin_lock_irqsave(&input_pool.lock, flags);
>> - blake2s_update(&state, (const u8 *)input_pool_data, POOL_BYTES);
>> - blake2s_final(&state, hash); /* final zeros out state */
>>
>> - /*
>> - * We mix the hash back into the pool to prevent backtracking
>> - * attacks (where the attacker knows the state of the pool
>> - * plus the current outputs, and attempts to find previous
>> - * outputs), unless the hash function can be inverted. By
>> - * mixing at least a hash worth of hash data back, we make
>> - * brute-forcing the feedback as hard as brute-forcing the
>> - * hash.
>> - */
>> - __mix_pool_bytes(hash, sizeof(hash));
>> - spin_unlock_irqrestore(&input_pool.lock, flags);
>> + /* seed = HASHPRF(last_key, entropy_input) */
>> + blake2s_final(&input_pool.hash, seed);
>>
>> - /* Note that EXTRACT_SIZE is half of hash size here, because above
>> - * we've dumped the full length back into mixer. By reducing the
>> - * amount that we emit, we retain a level of forward secrecy.
>> - */
>> - memcpy(out, hash, EXTRACT_SIZE);
>> - memzero_explicit(hash, sizeof(hash));
>> -}
>> + /* next_key = HASHPRF(seed, RDRAND || 0) */
>> + block.counter = 0;
>> + blake2s(next_key, (u8 *)&block, seed, sizeof(next_key), sizeof(block), sizeof(seed));
>> + blake2s_init_key(&input_pool.hash, BLAKE2S_HASH_SIZE, next_key, sizeof(next_key));
>>
>> -static ssize_t _extract_entropy(void *buf, size_t nbytes)
>> -{
>> - ssize_t ret = 0, i;
>> - u8 tmp[EXTRACT_SIZE];
>> + spin_unlock_irqrestore(&input_pool.lock, flags);
>> + memzero_explicit(next_key, sizeof(next_key));
>>
>> while (nbytes) {
>> - extract_buf(tmp);
>> - i = min_t(int, nbytes, EXTRACT_SIZE);
>> - memcpy(buf, tmp, i);
>> + i = min_t(size_t, nbytes, BLAKE2S_HASH_SIZE);
>> + /* output = HASHPRF(seed, RDRAND || ++counter) */
>> + ++block.counter;
>> + blake2s(buf, (u8 *)&block, seed, i, sizeof(block), sizeof(seed));
>> nbytes -= i;
>> buf += i;
>> - ret += i;
>> }
>>
>> - /* Wipe data just returned from memory */
>> - memzero_explicit(tmp, sizeof(tmp));
>> -
>> - return ret;
>> + memzero_explicit(seed, sizeof(seed));
>> + memzero_explicit(&block, sizeof(block));
>> }
>>
>> /*
>> @@ -1404,13 +1205,18 @@ static ssize_t _extract_entropy(void *buf, size_t nbytes)
>> * returns it in a buffer.
>> *
>> * The min parameter specifies the minimum amount we can pull before
>> - * failing to avoid races that defeat catastrophic reseeding.
>> + * failing to avoid races that defeat catastrophic reseeding. If we
>> + * have less than min entropy available, we return false and buf is
>> + * not filled.
>> */
>> -static ssize_t extract_entropy(void *buf, size_t nbytes, int min)
>> +static bool extract_entropy(void *buf, size_t nbytes, int min)
>> {
>> trace_extract_entropy(nbytes, POOL_ENTROPY_BITS(), _RET_IP_);
>> - nbytes = account(nbytes, min);
>> - return _extract_entropy(buf, nbytes);
>> + if (account(nbytes, min)) {
>> + _extract_entropy(buf, nbytes);
>> + return true;
>> + }
>> + return false;
>> }
>>
>> #define warn_unseeded_randomness(previous) \
>> @@ -1674,7 +1480,7 @@ static void __init init_std_data(void)
>> unsigned long rv;
>>
>> mix_pool_bytes(&now, sizeof(now));
>> - for (i = POOL_BYTES; i > 0; i -= sizeof(rv)) {
>> + for (i = BLAKE2S_BLOCK_SIZE; i > 0; i -= sizeof(rv)) {
>> if (!arch_get_random_seed_long(&rv) &&
>> !arch_get_random_long(&rv))
>> rv = random_get_entropy();
>> --
>> 2.34.1
>>

2022-03-28 21:52:43

by Michael Brooks

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction

This patch is not ready, but I think it solves the issues Jason has brought up.

-Mike

On Mon, Mar 28, 2022 at 11:08 AM Eric Biggers <[email protected]> wrote:
>
> On Mon, Mar 28, 2022 at 07:18:00AM -0400, Sasha Levin wrote:
> > From: "Jason A. Donenfeld" <[email protected]>
> >
> > [ Upstream commit 6e8ec2552c7d13991148e551e3325a624d73fac6 ]
> >
>
> I don't think it's a good idea to start backporting random commits to random.c
> that weren't marked for stable. There were a lot of changes in v5.18, and
> sometimes they relate to each other in subtle ways, so the individual commits
> aren't necessarily safe to pick.
>
> IMO, you shouldn't backport any non-stable-Cc'ed commits to random.c unless
> Jason explicitly reviews the exact sequence of commits that you're backporting.
>
> - Eric


Attachments:
keypoolrandom.patch (58.46 kB)

2022-03-28 21:58:57

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction

On Mon, Mar 28, 2022 at 07:18:00AM -0400, Sasha Levin wrote:
> From: "Jason A. Donenfeld" <[email protected]>
>
> [ Upstream commit 6e8ec2552c7d13991148e551e3325a624d73fac6 ]
>

I don't think it's a good idea to start backporting random commits to random.c
that weren't marked for stable. There were a lot of changes in v5.18, and
sometimes they relate to each other in subtle ways, so the individual commits
aren't necessarily safe to pick.

IMO, you shouldn't backport any non-stable-Cc'ed commits to random.c unless
Jason explicitly reviews the exact sequence of commits that you're backporting.

- Eric

2022-03-29 12:51:51

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction

Hi Sasha,

On Mon, Mar 28, 2022 at 2:08 PM Eric Biggers <[email protected]> wrote:
>
> On Mon, Mar 28, 2022 at 07:18:00AM -0400, Sasha Levin wrote:
> > From: "Jason A. Donenfeld" <[email protected]>
> >
> > [ Upstream commit 6e8ec2552c7d13991148e551e3325a624d73fac6 ]
> >
>
> I don't think it's a good idea to start backporting random commits to random.c
> that weren't marked for stable. There were a lot of changes in v5.18, and
> sometimes they relate to each other in subtle ways, so the individual commits
> aren't necessarily safe to pick.
>
> IMO, you shouldn't backport any non-stable-Cc'ed commits to random.c unless
> Jason explicitly reviews the exact sequence of commits that you're backporting.

I'm inclined to agree with Eric here that you might be a bit careful
about autosel'ing 5.18, given how extensive the changes were. In
theory they should all be properly sequenced so that nothing breaks,
but I'd still be cautious. However, if you want, maybe we can work out
some plan for backporting. I'll take a look and maybe will ping you on
IRC about it.

Jason

2022-03-29 18:14:37

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction

On Mon, Mar 28, 2022 at 06:08:26PM +0000, Eric Biggers wrote:
> On Mon, Mar 28, 2022 at 07:18:00AM -0400, Sasha Levin wrote:
> > From: "Jason A. Donenfeld" <[email protected]>
> >
> > [ Upstream commit 6e8ec2552c7d13991148e551e3325a624d73fac6 ]
> >
>
> I don't think it's a good idea to start backporting random commits to random.c
> that weren't marked for stable. There were a lot of changes in v5.18, and
> sometimes they relate to each other in subtle ways, so the individual commits
> aren't necessarily safe to pick.
>
> IMO, you shouldn't backport any non-stable-Cc'ed commits to random.c unless
> Jason explicitly reviews the exact sequence of commits that you're backporting.

Especially this commit in general, which is making a fundamental
change in how we extract entropy. We should be very careful about
taking such changes into stable; a release or two of additonal "soak"
time would be a good idea before these go into the LTS releases in particular.

- Ted

2022-03-30 15:04:42

by Michael Brooks

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction

I agree with Ted, this patch is just to start the discussion on how
we can safely remove these locks for the improvement of safety and
security. Both boot and interrupt benchmarks stand to benefit from a
patch like this, so it is worth a deep dive.

Feedback welcome, I am always looking for ways I can be a better
engineer, and a better hacker and a better person. And we are all here
to make the very best kernel.

Regards,
Micahel

On Tue, Mar 29, 2022 at 8:39 AM Theodore Ts'o <[email protected]> wrote:
>
> On Mon, Mar 28, 2022 at 06:08:26PM +0000, Eric Biggers wrote:
> > On Mon, Mar 28, 2022 at 07:18:00AM -0400, Sasha Levin wrote:
> > > From: "Jason A. Donenfeld" <[email protected]>
> > >
> > > [ Upstream commit 6e8ec2552c7d13991148e551e3325a624d73fac6 ]
> > >
> >
> > I don't think it's a good idea to start backporting random commits to random.c
> > that weren't marked for stable. There were a lot of changes in v5.18, and
> > sometimes they relate to each other in subtle ways, so the individual commits
> > aren't necessarily safe to pick.
> >
> > IMO, you shouldn't backport any non-stable-Cc'ed commits to random.c unless
> > Jason explicitly reviews the exact sequence of commits that you're backporting.
>
> Especially this commit in general, which is making a fundamental
> change in how we extract entropy. We should be very careful about
> taking such changes into stable; a release or two of additonal "soak"
> time would be a good idea before these go into the LTS releases in particular.
>
> - Ted

2022-03-30 18:24:18

by Michael Brooks

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction

Of course I am assuming local user non-root access. One does not need
to reverse the mix operations in order to form a parallel construction
- a one way function is sufficient for such a construct as both sides
will operate on the data in the same manner.

This attack scenario is simply a non-issue in keypoolrandom.
https://github.com/TheRook/KeypoolRandom

On Wed, Mar 30, 2022 at 9:49 AM David Laight <[email protected]> wrote:
>
> From: Michael Brooks
> > Sent: 30 March 2022 17:08
> ...
> > I’d like to describe this bug using mathematics, because that is how I
> > work - I am the kind of person that appreciates rigor. In this case,
> > let's use inductive reasoning to illuminate this issue.
> >
> > Now, in this attack scenario let “p” be the overall pool state and let
> > “n” be the good unknown values added to the pool. Finally, let “k” be
> > the known values - such as jiffies. If we then describe the ratio of
> > unknown uniqueness with known uniqueness as p=n/k then as a k grows
> > the overall predictability of the pool approaches an infinite value as
> > k approaches zero. A parallel pool, let's call it p’ (that is
> > pronounced “p-prime” for those who don’t get the notation). let
> > p’=n’/k’. In this case the attacker has no hope of constructing n’,
> > but they can construct k’ - therefore the attacker’s parasol model of
> > the pool p’ will become more accurate as the attack persists leading
> > to p’ = p as time->∞.
> >
> > Q.E.D.
>
> That rather depends on how the (not) 'randmoness' is added to the pool.
> If there are 'r' bits of randomness in the pool and a 'stir in' a pile
> of known bits there can still be 'n' bits of randomness in the pool.
>
> The whole thing really relies on the non-reversability of the final prng.
> Otherwise if you have 'r' bits of randomness in the pool and 'p' bits
> in the prng you only need to request 'r + p' bits of output to be able
> to solve the 'p + r' simultaneous equations in 'p + r' unknowns
> (I think that is in the field {0, 1}).
>
> The old kernel random number generator that used xor to combine the
> outputs of several LFSR is trivially reversable.
> It will leak whatever it was seeded with.
>
> The non-reversability of the pool isn't as important since you need
> to reverse the prng as well.
>
> So while, in some sense, removing 'p' bits from the entropy pool
> to seed the prng only leaves 'r - p' bits left.
> That is only true if you think the prng is reversable.
> Provided 'r > p' (preferably 'r >> p') you can reseed the prng
> again (provided you take reasonably random bits) without
> really exposing any more state to an attacker.
>
> Someone doing cat /dev/urandom >/dev/null should just keep reading
> values out of the entropy pool.
> But if they are discarding the values that shouldn't help them
> recover the state of the entropy pool or the prng - even if only
> constant values are being added to the pool.
>
> Really what you mustn't do is delete the bits used to seed the prng
> from the entropy pool.
> About the only way to actually reduce the randomness of the entropy
> pool is if you've discovered what is actually in it, know the
> 'stirring' algorithm and feed in data that exactly cancels out
> bits that are present already.
> I suspect that anything with root access can manage that!
> (Although they can just overwrite the entropy pool itself,
> and the prng for that matter.)
>
> David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)

2022-03-31 02:27:20

by Michael Brooks

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction

Good point Ted, I agree we should have a defense-in-depth design that
plans on failure. I expect keypoolrandom to be resistant against this
attack as well.

In this threat model, p' is a Laplacian Demon. Any parallel
construction is aided by whatever source of information the attacker
can come by. Using jiffies as a known-perimage aids Laplace's Demon,
as does a memory disclosure vulnerability. Leplace's Demon can simply
"get lucky" and report both false-positives and false-negatives, but
in this model it should get more lucky over time. Now we get into the
realm of statistics and predictive mathematics, which gets into
Langevin Dynamics and Einstein's early work describing Brownian
Motion.

Looping in a fellow Cryptographer Nicholas, who has a passion for
Laplace's work.

Regards,
Michael

On Wed, Mar 30, 2022 at 12:01 PM Theodore Y. Ts'o <[email protected]> wrote:
>
> On Wed, Mar 30, 2022 at 11:33:21AM -0700, Michael Brooks wrote:
> > The /dev/random device driver need not concern itself with root
> > adversaries as this type of user has permissions to read and overwrite
> > memory - this user even possesses permission to replace the kernel elf
> > binary with a copy of /dev/random that always returns the number 0 -
> > that is their right.
>
> The design consideration that random number generators do concern
> themselves with is recovery after pool exposure. This could happen
> through any number of ways; maybe someone got a hold of the suspended
> image after a hiberation, or maybe a VM is getting hybernated, and
> then replicated, etc.
>
> One can argue whether or not it's "reasonable" that these sorts of
> attacks could happen, or whether they are equivalent to full root
> access whether you can overwrite the pool. The point remains that it
> is *possible* to have situations where the internal state of the RNG
> might have gotten exposed, and a design criteria is how quickly or
> reliably can you reocver from that situation over time.
>
> See the Yarrow paper and its discussion of iterative guessing attack
> for an explanation of why cryptographers like John Kelsey, Bruce
> Schneier, and Niels Ferguson think it is important. And please don't
> argue with me on this point while discussing which patches should be
> backported to stable kernels --- argue with them. :-)
>
> Cheers,
>
> - Ted

2022-03-31 02:49:02

by Michael Brooks

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction

The /dev/random device driver need not concern itself with root
adversaries as this type of user has permissions to read and overwrite
memory - this user even possesses permission to replace the kernel elf
binary with a copy of /dev/random that always returns the number 0 -
that is their right.

This whole issue of leaks is because we are relying upon fast but
insecure hash-function-like methods to extract data from a sensitive
pool. LFSR and CRC32 (which was used in an earlier version of
/dev/random) have similarities to secure hash functions - but "aren't
good enough for military work" - which is why we are even discussing
the topic of leakage.

If we use a secure primitive in the right way - preferably one that is
faster than SHA1 - such as chcha20 or even faster still AES-NI (both
in a feedback mode). Then the leakage is stopped. If the pool cannot
leak then we do not need an entropy counter which is the mutex that
handle_irq_event_percpu() fights over. This reduces interrupt
latency, reduces computational load of all interrupts, and helps
contribute to the pool being less predictable because now an outside
construct has to account for race conditions.

-Michael

On Wed, Mar 30, 2022 at 10:10 AM Michael Brooks <[email protected]> wrote:
>
> Of course I am assuming local user non-root access. One does not need
> to reverse the mix operations in order to form a parallel construction
> - a one way function is sufficient for such a construct as both sides
> will operate on the data in the same manner.
>
> This attack scenario is simply a non-issue in keypoolrandom.
> https://github.com/TheRook/KeypoolRandom
>
> On Wed, Mar 30, 2022 at 9:49 AM David Laight <[email protected]> wrote:
> >
> > From: Michael Brooks
> > > Sent: 30 March 2022 17:08
> > ...
> > > I’d like to describe this bug using mathematics, because that is how I
> > > work - I am the kind of person that appreciates rigor. In this case,
> > > let's use inductive reasoning to illuminate this issue.
> > >
> > > Now, in this attack scenario let “p” be the overall pool state and let
> > > “n” be the good unknown values added to the pool. Finally, let “k” be
> > > the known values - such as jiffies. If we then describe the ratio of
> > > unknown uniqueness with known uniqueness as p=n/k then as a k grows
> > > the overall predictability of the pool approaches an infinite value as
> > > k approaches zero. A parallel pool, let's call it p’ (that is
> > > pronounced “p-prime” for those who don’t get the notation). let
> > > p’=n’/k’. In this case the attacker has no hope of constructing n’,
> > > but they can construct k’ - therefore the attacker’s parasol model of
> > > the pool p’ will become more accurate as the attack persists leading
> > > to p’ = p as time->∞.
> > >
> > > Q.E.D.
> >
> > That rather depends on how the (not) 'randmoness' is added to the pool.
> > If there are 'r' bits of randomness in the pool and a 'stir in' a pile
> > of known bits there can still be 'n' bits of randomness in the pool.
> >
> > The whole thing really relies on the non-reversability of the final prng.
> > Otherwise if you have 'r' bits of randomness in the pool and 'p' bits
> > in the prng you only need to request 'r + p' bits of output to be able
> > to solve the 'p + r' simultaneous equations in 'p + r' unknowns
> > (I think that is in the field {0, 1}).
> >
> > The old kernel random number generator that used xor to combine the
> > outputs of several LFSR is trivially reversable.
> > It will leak whatever it was seeded with.
> >
> > The non-reversability of the pool isn't as important since you need
> > to reverse the prng as well.
> >
> > So while, in some sense, removing 'p' bits from the entropy pool
> > to seed the prng only leaves 'r - p' bits left.
> > That is only true if you think the prng is reversable.
> > Provided 'r > p' (preferably 'r >> p') you can reseed the prng
> > again (provided you take reasonably random bits) without
> > really exposing any more state to an attacker.
> >
> > Someone doing cat /dev/urandom >/dev/null should just keep reading
> > values out of the entropy pool.
> > But if they are discarding the values that shouldn't help them
> > recover the state of the entropy pool or the prng - even if only
> > constant values are being added to the pool.
> >
> > Really what you mustn't do is delete the bits used to seed the prng
> > from the entropy pool.
> > About the only way to actually reduce the randomness of the entropy
> > pool is if you've discovered what is actually in it, know the
> > 'stirring' algorithm and feed in data that exactly cancels out
> > bits that are present already.
> > I suspect that anything with root access can manage that!
> > (Although they can just overwrite the entropy pool itself,
> > and the prng for that matter.)
> >
> > David
> >
> > -
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > Registration No: 1397386 (Wales)

2022-03-31 02:50:41

by Michael Brooks

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction

I think I understand the bug that Jason is referring to, but really he
is referring to the tip of a larger issue - and correct me if I am
wrong.

Let’s say an attacker is a user resident on the machine and can
internally drain the pool via `cat /dev/urandom > /dev/null` - this
will take the high quality random sources and throw it away - forcing
the kernel to use the degraded try_to_generate_entropy() method.

In this state, the kernel performance will be affected, this is a
moderate DoS attack - it won’t bring the system down, but you are
going to have a bad time. What is serious is the state of the pool -
what happens is that the ratio of “good” to “bad” entropy sources
becomes more and more undesirable. Sure, mixing jiffies will fool
die-harder, but that’s easy to do. There is a deeper issue here that
die-harder cannot test for - and that is a more intentional parallel
construction. Die-Harder is attempting a parallel construction
without knowing how the RNG is designed.

It is safe to assume that a user who is resident on the machine not
only has a wrist watch, but also can query jiffies directly, this is
not a hidden value. Now, this codebase seems to use that word
“entropy” a whole lot, and entropy is a physics term. In physical
space - entropy has a lot of emptiness that expands over time, and
also contains known values - these attributes do not aid in the
creation of a secure /dev/random device driver. Sure Jiffies provides
entropy, but it is a bit like pointing at the moon because it is a
value that everyone can see, and is even something that a remote
attacker could determine using something similar to a vector clock.
The physics definition of “entropy” is simply not useful for our
needs. What we really need is to collect as many “unknown values” as
we can.

I’d like to describe this bug using mathematics, because that is how I
work - I am the kind of person that appreciates rigor. In this case,
let's use inductive reasoning to illuminate this issue.

Now, in this attack scenario let “p” be the overall pool state and let
“n” be the good unknown values added to the pool. Finally, let “k” be
the known values - such as jiffies. If we then describe the ratio of
unknown uniqueness with known uniqueness as p=n/k then as a k grows
the overall predictability of the pool approaches an infinite value as
k approaches zero. A parallel pool, let's call it p’ (that is
pronounced “p-prime” for those who don’t get the notation). let
p’=n’/k’. In this case the attacker has no hope of constructing n’,
but they can construct k’ - therefore the attacker’s parasol model of
the pool p’ will become more accurate as the attack persists leading
to p’ = p as time->∞.

Q.E.D.

In summation, it’s not entropy that we are after and mix_pool_bytes()
does not impact the ratio above. What we are really after is ‘hidden
uniqueness’. We need unique values that are not known to any would-be
attacker, and the more uncertainty we can add to our pool the harder
it will be for an outsider to reconstruct the internal state - which
is why find_unqinees_in_memeory() is valuable.

There is a way to solve the predictability of the internal pool
without melting the ice caps - and “jumptable” and “gatekey” does this
by allowing access to the pool without revealing ANY internal
structure. Furthermore if we are using an ideal block cipher or hash
function in our csprng then these primitives do a great job at
obscuring the pre-image because that is what they were built for.

TLDR; this ain't good - yes, Jason is right that the branch is
currently vulnerable. So should we roll out this code? Or should we do
it right?

-Michael Brooks


On Mon, Mar 28, 2022 at 4:20 AM Sasha Levin <[email protected]> wrote:
>
> From: "Jason A. Donenfeld" <[email protected]>
>
> [ Upstream commit 6e8ec2552c7d13991148e551e3325a624d73fac6 ]
>
> The current 4096-bit LFSR used for entropy collection had a few
> desirable attributes for the context in which it was created. For
> example, the state was huge, which meant that /dev/random would be able
> to output quite a bit of accumulated entropy before blocking. It was
> also, in its time, quite fast at accumulating entropy byte-by-byte,
> which matters given the varying contexts in which mix_pool_bytes() is
> called. And its diffusion was relatively high, which meant that changes
> would ripple across several words of state rather quickly.
>
> However, it also suffers from a few security vulnerabilities. In
> particular, inputs learned by an attacker can be undone, but moreover,
> if the state of the pool leaks, its contents can be controlled and
> entirely zeroed out. I've demonstrated this attack with this SMT2
> script, <https://xn--4db.cc/5o9xO8pb>, which Boolector/CaDiCal solves in
> a matter of seconds on a single core of my laptop, resulting in little
> proof of concept C demonstrators such as <https://xn--4db.cc/jCkvvIaH/c>.
>
> For basically all recent formal models of RNGs, these attacks represent
> a significant cryptographic flaw. But how does this manifest
> practically? If an attacker has access to the system to such a degree
> that he can learn the internal state of the RNG, arguably there are
> other lower hanging vulnerabilities -- side-channel, infoleak, or
> otherwise -- that might have higher priority. On the other hand, seed
> files are frequently used on systems that have a hard time generating
> much entropy on their own, and these seed files, being files, often leak
> or are duplicated and distributed accidentally, or are even seeded over
> the Internet intentionally, where their contents might be recorded or
> tampered with. Seen this way, an otherwise quasi-implausible
> vulnerability is a bit more practical than initially thought.
>
> Another aspect of the current mix_pool_bytes() function is that, while
> its performance was arguably competitive for the time in which it was
> created, it's no longer considered so. This patch improves performance
> significantly: on a high-end CPU, an i7-11850H, it improves performance
> of mix_pool_bytes() by 225%, and on a low-end CPU, a Cortex-A7, it
> improves performance by 103%.
>
> This commit replaces the LFSR of mix_pool_bytes() with a straight-
> forward cryptographic hash function, BLAKE2s, which is already in use
> for pool extraction. Universal hashing with a secret seed was considered
> too, something along the lines of <https://eprint.iacr.org/2013/338>,
> but the requirement for a secret seed makes for a chicken & egg problem.
> Instead we go with a formally proven scheme using a computational hash
> function, described in sections 5.1, 6.4, and B.1.8 of
> <https://eprint.iacr.org/2019/198>.
>
> BLAKE2s outputs 256 bits, which should give us an appropriate amount of
> min-entropy accumulation, and a wide enough margin of collision
> resistance against active attacks. mix_pool_bytes() becomes a simple
> call to blake2s_update(), for accumulation, while the extraction step
> becomes a blake2s_final() to generate a seed, with which we can then do
> a HKDF-like or BLAKE2X-like expansion, the first part of which we fold
> back as an init key for subsequent blake2s_update()s, and the rest we
> produce to the caller. This then is provided to our CRNG like usual. In
> that expansion step, we make opportunistic use of 32 bytes of RDRAND
> output, just as before. We also always reseed the crng with 32 bytes,
> unconditionally, or not at all, rather than sometimes with 16 as before,
> as we don't win anything by limiting beyond the 16 byte threshold.
>
> Going for a hash function as an entropy collector is a conservative,
> proven approach. The result of all this is a much simpler and much less
> bespoke construction than what's there now, which not only plugs a
> vulnerability but also improves performance considerably.
>
> Cc: Theodore Ts'o <[email protected]>
> Cc: Dominik Brodowski <[email protected]>
> Reviewed-by: Eric Biggers <[email protected]>
> Reviewed-by: Greg Kroah-Hartman <[email protected]>
> Reviewed-by: Jean-Philippe Aumasson <[email protected]>
> Signed-off-by: Jason A. Donenfeld <[email protected]>
> Signed-off-by: Sasha Levin <[email protected]>
> ---
> drivers/char/random.c | 304 ++++++++----------------------------------
> 1 file changed, 55 insertions(+), 249 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 3404a91edf29..882f78829a24 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -42,61 +42,6 @@
> */
>
> /*
> - * (now, with legal B.S. out of the way.....)
> - *
> - * This routine gathers environmental noise from device drivers, etc.,
> - * and returns good random numbers, suitable for cryptographic use.
> - * Besides the obvious cryptographic uses, these numbers are also good
> - * for seeding TCP sequence numbers, and other places where it is
> - * desirable to have numbers which are not only random, but hard to
> - * predict by an attacker.
> - *
> - * Theory of operation
> - * ===================
> - *
> - * Computers are very predictable devices. Hence it is extremely hard
> - * to produce truly random numbers on a computer --- as opposed to
> - * pseudo-random numbers, which can easily generated by using a
> - * algorithm. Unfortunately, it is very easy for attackers to guess
> - * the sequence of pseudo-random number generators, and for some
> - * applications this is not acceptable. So instead, we must try to
> - * gather "environmental noise" from the computer's environment, which
> - * must be hard for outside attackers to observe, and use that to
> - * generate random numbers. In a Unix environment, this is best done
> - * from inside the kernel.
> - *
> - * Sources of randomness from the environment include inter-keyboard
> - * timings, inter-interrupt timings from some interrupts, and other
> - * events which are both (a) non-deterministic and (b) hard for an
> - * outside observer to measure. Randomness from these sources are
> - * added to an "entropy pool", which is mixed using a CRC-like function.
> - * This is not cryptographically strong, but it is adequate assuming
> - * the randomness is not chosen maliciously, and it is fast enough that
> - * the overhead of doing it on every interrupt is very reasonable.
> - * As random bytes are mixed into the entropy pool, the routines keep
> - * an *estimate* of how many bits of randomness have been stored into
> - * the random number generator's internal state.
> - *
> - * When random bytes are desired, they are obtained by taking the BLAKE2s
> - * hash of the contents of the "entropy pool". The BLAKE2s hash avoids
> - * exposing the internal state of the entropy pool. It is believed to
> - * be computationally infeasible to derive any useful information
> - * about the input of BLAKE2s from its output. Even if it is possible to
> - * analyze BLAKE2s in some clever way, as long as the amount of data
> - * returned from the generator is less than the inherent entropy in
> - * the pool, the output data is totally unpredictable. For this
> - * reason, the routine decreases its internal estimate of how many
> - * bits of "true randomness" are contained in the entropy pool as it
> - * outputs random numbers.
> - *
> - * If this estimate goes to zero, the routine can still generate
> - * random numbers; however, an attacker may (at least in theory) be
> - * able to infer the future output of the generator from prior
> - * outputs. This requires successful cryptanalysis of BLAKE2s, which is
> - * not believed to be feasible, but there is a remote possibility.
> - * Nonetheless, these numbers should be useful for the vast majority
> - * of purposes.
> - *
> * Exported interfaces ---- output
> * ===============================
> *
> @@ -298,23 +243,6 @@
> *
> * mknod /dev/random c 1 8
> * mknod /dev/urandom c 1 9
> - *
> - * Acknowledgements:
> - * =================
> - *
> - * Ideas for constructing this random number generator were derived
> - * from Pretty Good Privacy's random number generator, and from private
> - * discussions with Phil Karn. Colin Plumb provided a faster random
> - * number generator, which speed up the mixing function of the entropy
> - * pool, taken from PGPfone. Dale Worley has also contributed many
> - * useful ideas and suggestions to improve this driver.
> - *
> - * Any flaws in the design are solely my responsibility, and should
> - * not be attributed to the Phil, Colin, or any of authors of PGP.
> - *
> - * Further background information on this topic may be obtained from
> - * RFC 1750, "Randomness Recommendations for Security", by Donald
> - * Eastlake, Steve Crocker, and Jeff Schiller.
> */
>
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> @@ -358,79 +286,15 @@
>
> /* #define ADD_INTERRUPT_BENCH */
>
> -/*
> - * If the entropy count falls under this number of bits, then we
> - * should wake up processes which are selecting or polling on write
> - * access to /dev/random.
> - */
> -static int random_write_wakeup_bits = 28 * (1 << 5);
> -
> -/*
> - * Originally, we used a primitive polynomial of degree .poolwords
> - * over GF(2). The taps for various sizes are defined below. They
> - * were chosen to be evenly spaced except for the last tap, which is 1
> - * to get the twisting happening as fast as possible.
> - *
> - * For the purposes of better mixing, we use the CRC-32 polynomial as
> - * well to make a (modified) twisted Generalized Feedback Shift
> - * Register. (See M. Matsumoto & Y. Kurita, 1992. Twisted GFSR
> - * generators. ACM Transactions on Modeling and Computer Simulation
> - * 2(3):179-194. Also see M. Matsumoto & Y. Kurita, 1994. Twisted
> - * GFSR generators II. ACM Transactions on Modeling and Computer
> - * Simulation 4:254-266)
> - *
> - * Thanks to Colin Plumb for suggesting this.
> - *
> - * The mixing operation is much less sensitive than the output hash,
> - * where we use BLAKE2s. All that we want of mixing operation is that
> - * it be a good non-cryptographic hash; i.e. it not produce collisions
> - * when fed "random" data of the sort we expect to see. As long as
> - * the pool state differs for different inputs, we have preserved the
> - * input entropy and done a good job. The fact that an intelligent
> - * attacker can construct inputs that will produce controlled
> - * alterations to the pool's state is not important because we don't
> - * consider such inputs to contribute any randomness. The only
> - * property we need with respect to them is that the attacker can't
> - * increase his/her knowledge of the pool's state. Since all
> - * additions are reversible (knowing the final state and the input,
> - * you can reconstruct the initial state), if an attacker has any
> - * uncertainty about the initial state, he/she can only shuffle that
> - * uncertainty about, but never cause any collisions (which would
> - * decrease the uncertainty).
> - *
> - * Our mixing functions were analyzed by Lacharme, Roeck, Strubel, and
> - * Videau in their paper, "The Linux Pseudorandom Number Generator
> - * Revisited" (see: http://eprint.iacr.org/2012/251.pdf). In their
> - * paper, they point out that we are not using a true Twisted GFSR,
> - * since Matsumoto & Kurita used a trinomial feedback polynomial (that
> - * is, with only three taps, instead of the six that we are using).
> - * As a result, the resulting polynomial is neither primitive nor
> - * irreducible, and hence does not have a maximal period over
> - * GF(2**32). They suggest a slight change to the generator
> - * polynomial which improves the resulting TGFSR polynomial to be
> - * irreducible, which we have made here.
> - */
> enum poolinfo {
> - POOL_WORDS = 128,
> - POOL_WORDMASK = POOL_WORDS - 1,
> - POOL_BYTES = POOL_WORDS * sizeof(u32),
> - POOL_BITS = POOL_BYTES * 8,
> + POOL_BITS = BLAKE2S_HASH_SIZE * 8,
> POOL_BITSHIFT = ilog2(POOL_BITS),
>
> /* To allow fractional bits to be tracked, the entropy_count field is
> * denominated in units of 1/8th bits. */
> POOL_ENTROPY_SHIFT = 3,
> #define POOL_ENTROPY_BITS() (input_pool.entropy_count >> POOL_ENTROPY_SHIFT)
> - POOL_FRACBITS = POOL_BITS << POOL_ENTROPY_SHIFT,
> -
> - /* x^128 + x^104 + x^76 + x^51 +x^25 + x + 1 */
> - POOL_TAP1 = 104,
> - POOL_TAP2 = 76,
> - POOL_TAP3 = 51,
> - POOL_TAP4 = 25,
> - POOL_TAP5 = 1,
> -
> - EXTRACT_SIZE = BLAKE2S_HASH_SIZE / 2
> + POOL_FRACBITS = POOL_BITS << POOL_ENTROPY_SHIFT
> };
>
> /*
> @@ -438,6 +302,12 @@ enum poolinfo {
> */
> static DECLARE_WAIT_QUEUE_HEAD(random_write_wait);
> static struct fasync_struct *fasync;
> +/*
> + * If the entropy count falls under this number of bits, then we
> + * should wake up processes which are selecting or polling on write
> + * access to /dev/random.
> + */
> +static int random_write_wakeup_bits = POOL_BITS * 3 / 4;
>
> static DEFINE_SPINLOCK(random_ready_list_lock);
> static LIST_HEAD(random_ready_list);
> @@ -493,73 +363,31 @@ MODULE_PARM_DESC(ratelimit_disable, "Disable random ratelimit suppression");
> *
> **********************************************************************/
>
> -static u32 input_pool_data[POOL_WORDS] __latent_entropy;
> -
> static struct {
> + struct blake2s_state hash;
> spinlock_t lock;
> - u16 add_ptr;
> - u16 input_rotate;
> int entropy_count;
> } input_pool = {
> + .hash.h = { BLAKE2S_IV0 ^ (0x01010000 | BLAKE2S_HASH_SIZE),
> + BLAKE2S_IV1, BLAKE2S_IV2, BLAKE2S_IV3, BLAKE2S_IV4,
> + BLAKE2S_IV5, BLAKE2S_IV6, BLAKE2S_IV7 },
> + .hash.outlen = BLAKE2S_HASH_SIZE,
> .lock = __SPIN_LOCK_UNLOCKED(input_pool.lock),
> };
>
> -static ssize_t extract_entropy(void *buf, size_t nbytes, int min);
> -static ssize_t _extract_entropy(void *buf, size_t nbytes);
> +static bool extract_entropy(void *buf, size_t nbytes, int min);
> +static void _extract_entropy(void *buf, size_t nbytes);
>
> static void crng_reseed(struct crng_state *crng, bool use_input_pool);
>
> -static const u32 twist_table[8] = {
> - 0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
> - 0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };
> -
> /*
> * This function adds bytes into the entropy "pool". It does not
> * update the entropy estimate. The caller should call
> * credit_entropy_bits if this is appropriate.
> - *
> - * The pool is stirred with a primitive polynomial of the appropriate
> - * degree, and then twisted. We twist by three bits at a time because
> - * it's cheap to do so and helps slightly in the expected case where
> - * the entropy is concentrated in the low-order bits.
> */
> static void _mix_pool_bytes(const void *in, int nbytes)
> {
> - unsigned long i;
> - int input_rotate;
> - const u8 *bytes = in;
> - u32 w;
> -
> - input_rotate = input_pool.input_rotate;
> - i = input_pool.add_ptr;
> -
> - /* mix one byte at a time to simplify size handling and churn faster */
> - while (nbytes--) {
> - w = rol32(*bytes++, input_rotate);
> - i = (i - 1) & POOL_WORDMASK;
> -
> - /* XOR in the various taps */
> - w ^= input_pool_data[i];
> - w ^= input_pool_data[(i + POOL_TAP1) & POOL_WORDMASK];
> - w ^= input_pool_data[(i + POOL_TAP2) & POOL_WORDMASK];
> - w ^= input_pool_data[(i + POOL_TAP3) & POOL_WORDMASK];
> - w ^= input_pool_data[(i + POOL_TAP4) & POOL_WORDMASK];
> - w ^= input_pool_data[(i + POOL_TAP5) & POOL_WORDMASK];
> -
> - /* Mix the result back in with a twist */
> - input_pool_data[i] = (w >> 3) ^ twist_table[w & 7];
> -
> - /*
> - * Normally, we add 7 bits of rotation to the pool.
> - * At the beginning of the pool, add an extra 7 bits
> - * rotation, so that successive passes spread the
> - * input bits across the pool evenly.
> - */
> - input_rotate = (input_rotate + (i ? 7 : 14)) & 31;
> - }
> -
> - input_pool.input_rotate = input_rotate;
> - input_pool.add_ptr = i;
> + blake2s_update(&input_pool.hash, in, nbytes);
> }
>
> static void __mix_pool_bytes(const void *in, int nbytes)
> @@ -953,15 +781,14 @@ static int crng_slow_load(const u8 *cp, size_t len)
> static void crng_reseed(struct crng_state *crng, bool use_input_pool)
> {
> unsigned long flags;
> - int i, num;
> + int i;
> union {
> u8 block[CHACHA_BLOCK_SIZE];
> u32 key[8];
> } buf;
>
> if (use_input_pool) {
> - num = extract_entropy(&buf, 32, 16);
> - if (num == 0)
> + if (!extract_entropy(&buf, 32, 16))
> return;
> } else {
> _extract_crng(&primary_crng, buf.block);
> @@ -1329,74 +1156,48 @@ static size_t account(size_t nbytes, int min)
> }
>
> /*
> - * This function does the actual extraction for extract_entropy.
> - *
> - * Note: we assume that .poolwords is a multiple of 16 words.
> + * This is an HKDF-like construction for using the hashed collected entropy
> + * as a PRF key, that's then expanded block-by-block.
> */
> -static void extract_buf(u8 *out)
> +static void _extract_entropy(void *buf, size_t nbytes)
> {
> - struct blake2s_state state __aligned(__alignof__(unsigned long));
> - u8 hash[BLAKE2S_HASH_SIZE];
> - unsigned long *salt;
> unsigned long flags;
> -
> - blake2s_init(&state, sizeof(hash));
> -
> - /*
> - * If we have an architectural hardware random number
> - * generator, use it for BLAKE2's salt & personal fields.
> - */
> - for (salt = (unsigned long *)&state.h[4];
> - salt < (unsigned long *)&state.h[8]; ++salt) {
> - unsigned long v;
> - if (!arch_get_random_long(&v))
> - break;
> - *salt ^= v;
> + u8 seed[BLAKE2S_HASH_SIZE], next_key[BLAKE2S_HASH_SIZE];
> + struct {
> + unsigned long rdrand[32 / sizeof(long)];
> + size_t counter;
> + } block;
> + size_t i;
> +
> + for (i = 0; i < ARRAY_SIZE(block.rdrand); ++i) {
> + if (!arch_get_random_long(&block.rdrand[i]))
> + block.rdrand[i] = random_get_entropy();
> }
>
> - /* Generate a hash across the pool */
> spin_lock_irqsave(&input_pool.lock, flags);
> - blake2s_update(&state, (const u8 *)input_pool_data, POOL_BYTES);
> - blake2s_final(&state, hash); /* final zeros out state */
>
> - /*
> - * We mix the hash back into the pool to prevent backtracking
> - * attacks (where the attacker knows the state of the pool
> - * plus the current outputs, and attempts to find previous
> - * outputs), unless the hash function can be inverted. By
> - * mixing at least a hash worth of hash data back, we make
> - * brute-forcing the feedback as hard as brute-forcing the
> - * hash.
> - */
> - __mix_pool_bytes(hash, sizeof(hash));
> - spin_unlock_irqrestore(&input_pool.lock, flags);
> + /* seed = HASHPRF(last_key, entropy_input) */
> + blake2s_final(&input_pool.hash, seed);
>
> - /* Note that EXTRACT_SIZE is half of hash size here, because above
> - * we've dumped the full length back into mixer. By reducing the
> - * amount that we emit, we retain a level of forward secrecy.
> - */
> - memcpy(out, hash, EXTRACT_SIZE);
> - memzero_explicit(hash, sizeof(hash));
> -}
> + /* next_key = HASHPRF(seed, RDRAND || 0) */
> + block.counter = 0;
> + blake2s(next_key, (u8 *)&block, seed, sizeof(next_key), sizeof(block), sizeof(seed));
> + blake2s_init_key(&input_pool.hash, BLAKE2S_HASH_SIZE, next_key, sizeof(next_key));
>
> -static ssize_t _extract_entropy(void *buf, size_t nbytes)
> -{
> - ssize_t ret = 0, i;
> - u8 tmp[EXTRACT_SIZE];
> + spin_unlock_irqrestore(&input_pool.lock, flags);
> + memzero_explicit(next_key, sizeof(next_key));
>
> while (nbytes) {
> - extract_buf(tmp);
> - i = min_t(int, nbytes, EXTRACT_SIZE);
> - memcpy(buf, tmp, i);
> + i = min_t(size_t, nbytes, BLAKE2S_HASH_SIZE);
> + /* output = HASHPRF(seed, RDRAND || ++counter) */
> + ++block.counter;
> + blake2s(buf, (u8 *)&block, seed, i, sizeof(block), sizeof(seed));
> nbytes -= i;
> buf += i;
> - ret += i;
> }
>
> - /* Wipe data just returned from memory */
> - memzero_explicit(tmp, sizeof(tmp));
> -
> - return ret;
> + memzero_explicit(seed, sizeof(seed));
> + memzero_explicit(&block, sizeof(block));
> }
>
> /*
> @@ -1404,13 +1205,18 @@ static ssize_t _extract_entropy(void *buf, size_t nbytes)
> * returns it in a buffer.
> *
> * The min parameter specifies the minimum amount we can pull before
> - * failing to avoid races that defeat catastrophic reseeding.
> + * failing to avoid races that defeat catastrophic reseeding. If we
> + * have less than min entropy available, we return false and buf is
> + * not filled.
> */
> -static ssize_t extract_entropy(void *buf, size_t nbytes, int min)
> +static bool extract_entropy(void *buf, size_t nbytes, int min)
> {
> trace_extract_entropy(nbytes, POOL_ENTROPY_BITS(), _RET_IP_);
> - nbytes = account(nbytes, min);
> - return _extract_entropy(buf, nbytes);
> + if (account(nbytes, min)) {
> + _extract_entropy(buf, nbytes);
> + return true;
> + }
> + return false;
> }
>
> #define warn_unseeded_randomness(previous) \
> @@ -1674,7 +1480,7 @@ static void __init init_std_data(void)
> unsigned long rv;
>
> mix_pool_bytes(&now, sizeof(now));
> - for (i = POOL_BYTES; i > 0; i -= sizeof(rv)) {
> + for (i = BLAKE2S_BLOCK_SIZE; i > 0; i -= sizeof(rv)) {
> if (!arch_get_random_seed_long(&rv) &&
> !arch_get_random_long(&rv))
> rv = random_get_entropy();
> --
> 2.34.1
>

2022-03-31 03:08:36

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction

On Tue, Mar 29, 2022 at 10:34:49AM -0700, Michael Brooks wrote:
> I agree with Ted, this patch is just to start the discussion on how
> we can safely remove these locks for the improvement of safety and
> security. Both boot and interrupt benchmarks stand to benefit from a
> patch like this, so it is worth a deep dive.
>
> Feedback welcome, I am always looking for ways I can be a better
> engineer, and a better hacker and a better person. And we are all here
> to make the very best kernel.

I think you're talking about a different patch than the one mentioned
in the subject line (which is upstream commit 6e8ec2552c7d, authored
by Jason)?

- Ted

2022-03-31 03:50:47

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction

On Wed, Mar 30, 2022 at 11:33:21AM -0700, Michael Brooks wrote:
> The /dev/random device driver need not concern itself with root
> adversaries as this type of user has permissions to read and overwrite
> memory - this user even possesses permission to replace the kernel elf
> binary with a copy of /dev/random that always returns the number 0 -
> that is their right.

The design consideration that random number generators do concern
themselves with is recovery after pool exposure. This could happen
through any number of ways; maybe someone got a hold of the suspended
image after a hiberation, or maybe a VM is getting hybernated, and
then replicated, etc.

One can argue whether or not it's "reasonable" that these sorts of
attacks could happen, or whether they are equivalent to full root
access whether you can overwrite the pool. The point remains that it
is *possible* to have situations where the internal state of the RNG
might have gotten exposed, and a design criteria is how quickly or
reliably can you reocver from that situation over time.

See the Yarrow paper and its discussion of iterative guessing attack
for an explanation of why cryptographers like John Kelsey, Bruce
Schneier, and Niels Ferguson think it is important. And please don't
argue with me on this point while discussing which patches should be
backported to stable kernels --- argue with them. :-)

Cheers,

- Ted

2022-03-31 04:37:07

by David Laight

[permalink] [raw]
Subject: RE: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction

From: Michael Brooks
> Sent: 30 March 2022 17:08
...
> I’d like to describe this bug using mathematics, because that is how I
> work - I am the kind of person that appreciates rigor. In this case,
> let's use inductive reasoning to illuminate this issue.
>
> Now, in this attack scenario let “p” be the overall pool state and let
> “n” be the good unknown values added to the pool. Finally, let “k” be
> the known values - such as jiffies. If we then describe the ratio of
> unknown uniqueness with known uniqueness as p=n/k then as a k grows
> the overall predictability of the pool approaches an infinite value as
> k approaches zero. A parallel pool, let's call it p’ (that is
> pronounced “p-prime” for those who don’t get the notation). let
> p’=n’/k’. In this case the attacker has no hope of constructing n’,
> but they can construct k’ - therefore the attacker’s parasol model of
> the pool p’ will become more accurate as the attack persists leading
> to p’ = p as time->∞.
>
> Q.E.D.

That rather depends on how the (not) 'randmoness' is added to the pool.
If there are 'r' bits of randomness in the pool and a 'stir in' a pile
of known bits there can still be 'n' bits of randomness in the pool.

The whole thing really relies on the non-reversability of the final prng.
Otherwise if you have 'r' bits of randomness in the pool and 'p' bits
in the prng you only need to request 'r + p' bits of output to be able
to solve the 'p + r' simultaneous equations in 'p + r' unknowns
(I think that is in the field {0, 1}).

The old kernel random number generator that used xor to combine the
outputs of several LFSR is trivially reversable.
It will leak whatever it was seeded with.

The non-reversability of the pool isn't as important since you need
to reverse the prng as well.

So while, in some sense, removing 'p' bits from the entropy pool
to seed the prng only leaves 'r - p' bits left.
That is only true if you think the prng is reversable.
Provided 'r > p' (preferably 'r >> p') you can reseed the prng
again (provided you take reasonably random bits) without
really exposing any more state to an attacker.

Someone doing cat /dev/urandom >/dev/null should just keep reading
values out of the entropy pool.
But if they are discarding the values that shouldn't help them
recover the state of the entropy pool or the prng - even if only
constant values are being added to the pool.

Really what you mustn't do is delete the bits used to seed the prng
from the entropy pool.
About the only way to actually reduce the randomness of the entropy
pool is if you've discovered what is actually in it, know the
'stirring' algorithm and feed in data that exactly cancels out
bits that are present already.
I suspect that anything with root access can manage that!
(Although they can just overwrite the entropy pool itself,
and the prng for that matter.)

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2022-04-06 11:27:12

by Jason A. Donenfeld

[permalink] [raw]
Subject: Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction

Hi Greg, Sasha,

On Tue, Mar 29, 2022 at 7:31 AM Jason A. Donenfeld <[email protected]> wrote:
> I'm inclined to agree with Eric here that you might be a bit careful
> about autosel'ing 5.18, given how extensive the changes were. In
> theory they should all be properly sequenced so that nothing breaks,
> but I'd still be cautious. However, if you want, maybe we can work out
> some plan for backporting.

It's still way too early to backport these, but I'll maintain these
two branches for a little while:

https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/log/?h=stable/linux-5.15.y
https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/log/?h=stable/linux-5.17.y

So that when or if it does ever make sense to do that, it's been
maintained incrementally while the knowledge is fresh. I'm omitting
from those branches new feature development, such as vmgenid.

Jason