2024-04-29 13:50:36

by Aaron Toponce

[permalink] [raw]
Subject: [PATCH] random: add chacha8_block and swtich the rng to it

According to Jean-Philippe Aumasson in his paper "Too Much Crypto" [1]:

> "The best result on ChaCha is a key recovery attack on the 7-round version
> with 2^237.7 time complexity using output data from 2^96 instances of ChaCha,
> that is, 2^105 bytes of data."

He then proposes that ChaCha use 8 rounds instead of 20, providing a 2.5x
speed-up. As such, this patch adds chacha8_block and chacha12_block and switches
the RNG from ChaCha20 to ChaCha8 to take advantage of that efficiency without
sacrificing security.

[1]: https://eprint.iacr.org/2019/1492

On my ThinkPad T480s with an Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz, the
speed-up is close to what would be expected.

Without the patch:

$ dd if=/dev/urandom of=/dev/null bs=32M count=300
300+0 records in
300+0 records out
10066329600 bytes (10 GB, 9.4 GiB) copied, 20.4806 s, 492 MB/s

With the patch:

$ dd if=/dev/urandom of=/dev/null bs=32M count=300
300+0 records in
300+0 records out
10066329600 bytes (10 GB, 9.4 GiB) copied, 11.5321 s, 873 MB/s

Signed-off-by: Aaron Toponce <[email protected]>
---
drivers/char/random.c | 8 ++++----
include/crypto/chacha.h | 14 ++++++++++++--
lib/crypto/chacha.c | 6 +++---
3 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 2597cb43f438..2e14a30b795f 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -302,7 +302,7 @@ static void crng_fast_key_erasure(u8 key[CHACHA_KEY_SIZE],
chacha_init_consts(chacha_state);
memcpy(&chacha_state[4], key, CHACHA_KEY_SIZE);
memset(&chacha_state[12], 0, sizeof(u32) * 4);
- chacha20_block(chacha_state, first_block);
+ chacha8_block(chacha_state, first_block);

memcpy(key, first_block, CHACHA_KEY_SIZE);
memcpy(random_data, first_block + CHACHA_KEY_SIZE, random_data_len);
@@ -388,13 +388,13 @@ static void _get_random_bytes(void *buf, size_t len)

while (len) {
if (len < CHACHA_BLOCK_SIZE) {
- chacha20_block(chacha_state, tmp);
+ chacha8_block(chacha_state, tmp);
memcpy(buf, tmp, len);
memzero_explicit(tmp, sizeof(tmp));
break;
}

- chacha20_block(chacha_state, buf);
+ chacha8_block(chacha_state, buf);
if (unlikely(chacha_state[12] == 0))
++chacha_state[13];
len -= CHACHA_BLOCK_SIZE;
@@ -444,7 +444,7 @@ static ssize_t get_random_bytes_user(struct iov_iter *iter)
}

for (;;) {
- chacha20_block(chacha_state, block);
+ chacha8_block(chacha_state, block);
if (unlikely(chacha_state[12] == 0))
++chacha_state[13];

diff --git a/include/crypto/chacha.h b/include/crypto/chacha.h
index b3ea73b81944..64c45121c69a 100644
--- a/include/crypto/chacha.h
+++ b/include/crypto/chacha.h
@@ -8,8 +8,7 @@
*
* The ChaCha paper specifies 20, 12, and 8-round variants. In general, it is
* recommended to use the 20-round variant ChaCha20. However, the other
- * variants can be needed in some performance-sensitive scenarios. The generic
- * ChaCha code currently allows only the 20 and 12-round variants.
+ * variants can be needed in some performance-sensitive scenarios.
*/

#ifndef _CRYPTO_CHACHA_H
@@ -31,11 +30,22 @@
#define XCHACHA_IV_SIZE 32

void chacha_block_generic(u32 *state, u8 *stream, int nrounds);
+
static inline void chacha20_block(u32 *state, u8 *stream)
{
chacha_block_generic(state, stream, 20);
}

+static inline void chacha12_block(u32 *state, u8 *stream)
+{
+ chacha_block_generic(state, stream, 12);
+}
+
+static inline void chacha8_block(u32 *state, u8 *stream)
+{
+ chacha_block_generic(state, stream, 8);
+}
+
void hchacha_block_arch(const u32 *state, u32 *out, int nrounds);
void hchacha_block_generic(const u32 *state, u32 *out, int nrounds);

diff --git a/lib/crypto/chacha.c b/lib/crypto/chacha.c
index b748fd3d256e..15e773629f1d 100644
--- a/lib/crypto/chacha.c
+++ b/lib/crypto/chacha.c
@@ -18,7 +18,7 @@ static void chacha_permute(u32 *x, int nrounds)
int i;

/* whitelist the allowed round counts */
- WARN_ON_ONCE(nrounds != 20 && nrounds != 12);
+ WARN_ON_ONCE(nrounds != 20 && nrounds != 12 && nrounds != 8);

for (i = 0; i < nrounds; i += 2) {
x[0] += x[4]; x[12] = rol32(x[12] ^ x[0], 16);
@@ -67,7 +67,7 @@ static void chacha_permute(u32 *x, int nrounds)
* chacha_block_generic - generate one keystream block and increment block counter
* @state: input state matrix (16 32-bit words)
* @stream: output keystream block (64 bytes)
- * @nrounds: number of rounds (20 or 12; 20 is recommended)
+ * @nrounds: number of rounds (20, 12, or 8; 20 is recommended)
*
* This is the ChaCha core, a function from 64-byte strings to 64-byte strings.
* The caller has already converted the endianness of the input. This function
@@ -93,7 +93,7 @@ EXPORT_SYMBOL(chacha_block_generic);
* hchacha_block_generic - abbreviated ChaCha core, for XChaCha
* @state: input state matrix (16 32-bit words)
* @stream: output (8 32-bit words)
- * @nrounds: number of rounds (20 or 12; 20 is recommended)
+ * @nrounds: number of rounds (20, 12, or 8; 20 is recommended)
*
* HChaCha is the ChaCha equivalent of HSalsa and is an intermediate step
* towards XChaCha (see https://cr.yp.to/snuffle/xsalsa-20081128.pdf). HChaCha
--
2.43.0



2024-04-30 04:41:25

by Aaron Toponce

[permalink] [raw]
Subject: Re: [PATCH] random: add chacha8_block and swtich the rng to it

On Mon, Apr 29, 2024 at 08:11:05PM -0700, Eric Biggers wrote:
> I don't think there is consensus on ChaCha8 being recommended. Adiantum uses
> ChaCha12, but even that received some pushback.
>
> The Linux RNG is also usually used only for small amounts of data, and its
> security (and the perception of its security) is extremely important.
>
> So just staying with ChaCha20 seems appropriate.

The 7-round attack does indeed fall short of the required 256-bits of security
per the stated goals of the ChaCha stream cipher, coming in at ~237 bits.
However, this attack is not catastrophic and of theoretical interest only. It's
well outside of practical reach. The 8-round version however reaches our
required security goal and is currently unbroken.

An interesting note in that paper is how we got ChaCha20 to begin with. ChaCha
is an evolution of Salsa20 which was included in the final eSTREAM portfolio [1]
The final eSTREAM portfolio recommends Salsa20/12, which is the 12 round
Salsa20. but with better diffusion [2]. In the "Too Much Crypto" paper, it
states [3]:

> "Regarding ChaCha, the eSTREAM actually recommended Salsa20/12, or ChaCha's
> predecessor with 12 rounds instead of 20, but ChaCha was de facto standardized
> with 20 rounds."

ChaCha20 is 13 additional rounds for extra security margin, more than have been
demonstrated for ChaCha to be secure.

[1]: https://www.ecrypt.eu.org/stream/
[2]: https://cr.yp.to/chacha/chacha-20080128.pdf
[3]: https://eprint.iacr.org/2019/1492

The reduced-round analysis of ChaCha is actually *better* than Salsa20.
Salsa20/8 has a known attack complexity with ~249 bits and Salsa20/7 has a known
attack complexity of ~153 bits. No known attacks exist against ChaCha8, and the
complexity against ChaCha7 is ~237 bits. This demonstrates to me that ChaCha's
security is very robust, and ChaCha8 is solid choice for a CSPRNG.

> Note also that currently the Linux RNG is using a portable C implementation of
> ChaCha20. If there is actually a desire to accelerate large reads (which
> again, aren't the main use case of the Linux RNG), it would be possible to use
> a SIMD implementation of ChaCha20, which already exists in the kernel. That
> would speed up ChaCha20 by roughly 2-5x depending on the CPU.

If ChaCha8 makes us uncomfortable, even though defensible, ChaCha12 is a good
compromise. As you mentioned, Google implemented ChaCha12 in Adiantum. It offers
a 1.67x speedup over ChaCha20 while still providing 5 additional rounds of
security over the best known attack.

--
o . o . o . . o o . . . o .
. o . o o o . o . o o . . o
o o o . o . . o o o o . o o o

2024-04-30 16:44:25

by Aaron Toponce

[permalink] [raw]
Subject: Re: [PATCH] random: add chacha8_block and swtich the rng to it

On Tue, Apr 30, 2024 at 12:26:32PM -0400, Theodore Ts'o wrote:
> I'm not sure I see the point of trying to accelerate the Linux RNG.
> Sure, doing "dd if=/dev/urandom" is *fun*, but what's the real world
> use case where this actually matters? The kernel RNG is meant for key
> generation, where a much larger safety margin is a good thing, and
> where absolute performance is generally not a big deal.

The goal is just to make the CSPRNG more efficient without sacrificing security.
Of course most reads will be small for cryptographic keys. ChaCha8 means even
those small reads will be 2.5x more efficient than ChaCha20. The dd(1) example
was just to demonstrate the efficiency, not to be "fun".

> I judge the risk that you are a shill sent by a nation-state security agency
> ala Jia Tan of xz infamy, trying to weaken Linux's RNG to be very low;

Unlike Jia Tan, my name is not anonymous. I've been very public and transparent
about who I am, the software I work on, the security research I've participated
in, and the communities I involve myself in. I don't work for a nation state nor
am I interested in compromising the kernel RNG.

In fact, I work for a local ISP out of Salt Lake City, Utah where we provide a
web hosting product with KVM. We are very interested in a secure Linux stack as
our business depends on it.

You and I have also had email communication about the kernel RNG in the paste.
I've also interacted with Jason Donenfeld about the RNG and putting together a
document on the evolution of the RNG from 1.3.30 to current.

I'll ignore the attempeted ad hominem. I understand the uneasy feeling due to
the xz(1) backdoor and the kneejerk reactions to not trust anyone with proposals
that might seem radical.

--
o . o . o . . o o . . . o .
. o . o o o . o . o o . . o
o o o . o . . o o o o . o o o

2024-05-01 12:39:20

by Jean-Philippe Aumasson

[permalink] [raw]
Subject: Re: [PATCH] random: add chacha8_block and swtich the rng to it

My 2 cents:

As a cryptanalyst, having discovered the 2008 attack on ChaCha that's
only been slightly improved in 16 years: the 20-round ChaCha20is a
clear waste of CPU cycles, but ChaCha8 is admittedly risky, though
more in terms of PR than pure crypto merits (plus, afaiu the threat
model of ChaCha in the Linux PRNG doesnt allow the kind of chosen-IV
"attack" known to work on reduced-round versions).

Switching from ChaCha20 to ChaCha12 might still raise eyebrows but I
dont think any respectable crypto/security expert will suspect a
JiaTan situation.

On Wed, May 1, 2024 at 2:28 PM Theodore Ts'o <[email protected]> wrote:
>
> So first of all, my apologies for giving you offense. I really didn't
> think you were a shill for the NSA or the MSS, but I have to admit
> that when I get large set of patches which removes "unnecessary" code,
> which is _technically_ safe, but which reduces the safety margin, I
> find myself wondering whether it's part of a binary payload. (This is
> especially when I get patches from someone that I don't normally
> receive patches from.) Unfortunately, in the wake of the xz hack,
> we're just all going to have to be a lot more careful.
>
> On Tue, Apr 30, 2024 at 10:44:09AM -0600, Aaron Toponce wrote:
> >
> > The goal is just to make the CSPRNG more efficient without sacrificing security.
> > Of course most reads will be small for cryptographic keys. ChaCha8 means even
> > those small reads will be 2.5x more efficient than ChaCha20. The dd(1) example
> > was just to demonstrate the efficiency, not to be "fun".
>
> This is a philosophical question; are we going for maximum efficiency,
> or maximum safety so long as it meets the performance requirements for
> the intended use case? From an academic perspective, or if a
> cryptographer is designing cipher for a NIST competition, there's a
> strong desire for maximum efficiency, since that's one of the metrics
> used in the competition. But for the Linux RNG, my bias is to go for
> safety, since we're not competing on who can do the fast bulk
> encryption, but "sufficiently fast for keygen".
>
> People of good will can disagree on what the approach should be. I
> tend to have much of a pragmatic engineer's perspective. It's been
> said that the Empire State Building is overbuilt by a factor of 10,
> but that doesn't bother me. People are now saying that perhaps the
> Francis Scott Key bridge, when it is rebuilt, should have more safety
> margin, since container ships have gotten so much bigger. (And
> apparently, cheap sh*t diesel fuel that is contaminated and the ship
> owners buy fuel from the lowest bidder.)
>
> Or we can talk about how Boeing has been trying to cheap-out on plane
> manufacturing to save $$$; but I think you get the point of where I'm
> coming from. I'm not a big fan of trimming safety margins and making
> things more efficient for it's own sake. (At least in the case of
> Boeing, the CEO at least got paid $22/million a year, so at least
> there's that. :-)
>
> Now, if this is actually impacting the TLS connection termination for
> a Facebook or Bing or Google's front end web server, then great, we
> can try to optimize it. But if it's not a bottleneck, what's the
> point? Making change for change's sake, especially when it's reducing
> safety margins, is just one of those things that I find really hard to
> get excited about.
>
> Cheers,
>
> - Ted