Now that we have it, this is an example of where it helps.
Signed-off-by: George Spelvin <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: [email protected]
---
mm/shuffle.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/shuffle.c b/mm/shuffle.c
index b3fe97fd66541..e0ed247f8d907 100644
--- a/mm/shuffle.c
+++ b/mm/shuffle.c
@@ -135,7 +135,7 @@ void __meminit __shuffle_zone(struct zone *z)
* in the zone.
*/
j = z->zone_start_pfn +
- ALIGN_DOWN(get_random_long() % z->spanned_pages,
+ ALIGN_DOWN(get_random_max(z->spanned_pages),
order_pages);
page_j = shuffle_valid_page(j, order);
if (page_j && page_j != page_i)
--
2.26.0
On Sat, Mar 28, 2020 at 9:43 AM George Spelvin <[email protected]> wrote:
>
> Now that we have it, this is an example of where it helps.
I didn't get copied on the cover and this series does not seem to be
threaded in a way that lore can find the cover either:
https://lore.kernel.org/r/[email protected]
Mind including a short blurb about what it is and why it helps in the changelog?
>
> Signed-off-by: George Spelvin <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Qian Cai <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: [email protected]
> ---
> mm/shuffle.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/shuffle.c b/mm/shuffle.c
> index b3fe97fd66541..e0ed247f8d907 100644
> --- a/mm/shuffle.c
> +++ b/mm/shuffle.c
> @@ -135,7 +135,7 @@ void __meminit __shuffle_zone(struct zone *z)
> * in the zone.
> */
> j = z->zone_start_pfn +
> - ALIGN_DOWN(get_random_long() % z->spanned_pages,
> + ALIGN_DOWN(get_random_max(z->spanned_pages),
> order_pages);
> page_j = shuffle_valid_page(j, order);
> if (page_j && page_j != page_i)
> --
> 2.26.0
>
Sorry for the long Cc list, but this is a treewide cleanup and
I wanted to send this cover letter to everyone. Individual patches
are Cc'ed to the relevant maintainers, and the whole series can be found
on linux-kernel@vger.
The original itch I wanted to scratch was fixing the numerous
instances of "get_random_u32() % range" and "prandom_u32() % range"
in the kernel. % is fast to type but slow to execute: even in the
optimized special case that the range is a compile-time constant,
the modulo requires two multiplies to evaluate.
prandom_u32_max() works better; it uses reciprocal_scale(), which
ends up doing "(u64)prandom_u32() * range >> 32".
Not only is this faster and less code, the random numbers are also
more uniform. When the range is not a power of 2, some outputs are
generated more often than others. With the modulo technique, the
overrepresented outputs are all those less than 0x100000000 % range.
With the multiply, the overrepresented outputs are at least spread
uniformly across the possible outputs.
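To make the comparison concrete, here is what the two reductions look
like in plain C. The function names are mine, invented for exposition;
the multiply is exactly what reciprocal_scale() and prandom_u32_max()
compute:

#include <linux/types.h>

/* Map a uniform 32-bit value r into [0, range); illustration only. */
static inline u32 range_by_modulo(u32 r, u32 range)
{
	/* A divide, or two multiplies when range is a compile-time constant. */
	return r % range;
}

static inline u32 range_by_multiply(u32 r, u32 range)
{
	/* What reciprocal_scale()/prandom_u32_max() boil down to. */
	return (u32)(((u64)r * range) >> 32);
}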
The multiplicative technique can also be easily extended to
generate perfectly uniform random numbers (by detecting an
overrepresented case and looping if it happens) if desired.
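For the curious, here is a rough sketch of that detect-and-retry
extension, written against the existing prandom_u32(). This is
illustrative only and is not the code in patch 10; the helper name
is made up:

#include <linux/random.h>	/* prandom_u32() */
#include <linux/types.h>

/* Perfectly uniform value in [0, range); range must be nonzero. */
static u32 uniform_u32_below(u32 range)
{
	u32 threshold = -range % range;	/* == 2^32 mod range */
	u64 prod;

	/*
	 * Reject the draws whose low 32 bits of prod fall below
	 * 2^32 mod range; what remains maps every output exactly
	 * floor(2^32 / range) times, so each is equally likely.
	 */
	do {
		prod = (u64)prandom_u32() * range;
	} while ((u32)prod < threshold);

	return (u32)(prod >> 32);
}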
But as I started in on the project, I kept finding more things
that needed fixing. This patch series is what I have ready at the moment.
I could use feedback on individual patches, but also on the series as
a whole. How should it be broken up? How should it be submitted?
Perhaps, with this series as motivation, merge the random.[ch]
changes, and then do the rest piecemeal?
The series (currently based on v5.5.10) consists of:
01..07: These are various bugs I came across while looking for code to
replace. They are independent of the rest of the series, but related.
08: (EXT4) This is a bit of code I'm not sure about. Sometimes a
random number is used, and sometimes a hash is used as input to the
range reduction. Does the modulo have to be preserved for backward
compatibility?
09: If the range is a compile-time constant power of 2, then gcc optimizes
the modulo better than the multiply in prandom_u32_max(). This patch
fixes that, so that you don't have to worry about the numerical
value of the range; prandom_u32_max() is always as good or better.
10: This is a very optional patch, for discussion. When the range is
a compile-time constant, making the output of prandom_u32_max()
perfectly uniform is quite cheap. This adds the extra test
and retry loop to that common special case. Worth bothering?
11: This is the big one (which probably needs to be broken up by
subsystem), doing a treewide replacement of prandom_u32() % range,
when the range is not obviously a power of 2.
12: The same, for prandom_u32_state()
13..19: These are patches that were pulled out of the previous two
because they involve some more complex code changes.
20..23: Changes to the prandom_u32() generator itself. Including
switching to a stronger & faster PRNG.
24..35: Places where crypto-strength get_random_* was used, but only
pseudorandom numbers are justified. Examples are:
- self-test code
- Seeding other PRNGs
- random backoffs and timeouts
36..38: Improvements to the batched entropy pool in random.c.
39: (ARM64) Avoid the need for a temporary buffer (and subsequent wiping)
when transferring random seed material across kexec.
40..44: Replace get_random_bytes() with get_random_{u32,u64,long},
where the slightly weaker security guarantees of the latter
suffice.
45: Add a get_random_max() function, using batched crypto-quality
entropy. This is more complex than prandom_u32_max()
because it has to support 64-bit results and is perfectly
uniform in all cases. (A rough sketch of the idea appears just
after this list.)
46..49: Use get_random_max in various applications
50: Is the first part of the next phase. This adds a function to
fill a byte buffer from the batched entropy buffer, a drop-in
replacement for get_random_bytes(). I'm particularly seeking
comment on what this function, with its different security
guarantees, should be called. One option I'm considering is
to name it get_random_bytes, and move the (relatively few)
call sites that really need unbacktrackable random numbers
to another name like get_secret_bytes().
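Since item 45 only describes get_random_max(), here is a rough sketch
of one way such a function could work: a simple mask-and-reject loop
over get_random_u64(). The multiply trick above also generalizes, but
needs a 64x64->128-bit product. This is illustrative only, with an
invented name, and is not the implementation in patch 45:

#include <linux/random.h>	/* get_random_u64() */
#include <linux/types.h>

/* Perfectly uniform value in [0, ceil); ceil must be nonzero. */
static u64 random_below_sketch(u64 ceil)
{
	u64 mask = ceil - 1;
	u64 r;

	/* Smear the top set bit down: smallest 2^k - 1 covering ceil - 1. */
	mask |= mask >> 1;  mask |= mask >> 2;  mask |= mask >> 4;
	mask |= mask >> 8;  mask |= mask >> 16; mask |= mask >> 32;

	/* Mask to the covering power of two, reject values >= ceil. */
	do {
		r = get_random_u64() & mask;
	} while (r >= ceil);

	return r;
}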
George Spelvin (50):
IB/qib: Delete struct qib_ivdev.qp_rnd
lib/fault-inject.c: Fix off-by-one error in probability
fault-inject: Shrink struct fault_attr
batman-adv: fix batadv_nc_random_weight_tq
net/rds/bind.c: Use prandom_u32_max()
ubi/debug: Optimize computation of odds
mm/slab: Use simpler Fisher-Yates shuffle
fs/ext4/ialloc.c: Replace % with reciprocal_scale() TO BE VERIFIED
<linux/random.h> prandom_u32_max() for power-of-2 ranges
<linux/random.h> Make prandom_u32_max() exact for constant ranges
Treewide: Extirpate "prandom_u32() % range"
Treewide: Extirpate prandom_u32_state(rnd) % range
Avoid some useless msecs/jiffies conversions
crypto/testmgr.c: use prandom_u32_max() & prandom_bytes()
drivers/block/drbd/drbd_main: fix _drbd_fault_random()
include/net/red.h: Simplify red_random() TO BE VERIFIED
net/802/{garp,mrp}.c: Use prandom_u32_max instead of manual equivalent
net/ipv6/addrconf.c: Use prandom_u32_max for rfc3315 backoff time computation
net/sched/sch_netem: Simplify get_crandom
lib/random32: Move prandom_warmup() inside prandom_seed_early()
lib/random32.c: Change to SFC32 PRNG
lib/random32: Use random_ready_callback for seeding.
prandom_seed_state(): Change to 32-bit seed type.
crypto4xx_core: Use more appropriate seed for PRNG
HID: hid-lg: We only need pseudorandom bytes for the address
drivers/nfc/nfcsim: use prandom_u32() for time delay
drivers/s390/scsi/zfcp_fc.c: Use prandom_u32_max() for backoff
drivers/target/iscsi: Replace O(n^2) randomization
fs/ocfs2/journal: Use prandom_u32() and not /dev/random for timeout
kernel/locking/test-ww_mutex.c: Use cheaper prandom_u32_max()
lib/nodemask.c: Use cheaper prandom_u32_max() in node_random()
lib/test*.c: Use prandom_u32_max()
lib/test_vmalloc.c: Use pseudorandom numbers
mm/slub.c: Use cheaper prandom source in shuffle_freelist
USB: serial: iuu_phoenix: Use pseudorandom for xmas mode
random: Merge batched entropy buffers
random: Simplify locking of batched entropy
random: use chacha_permute() directly
arm: kexec_file: Avoid temp buffer for RNG seed
arch/*/include/asm/stackprotector.h: Use get_random_canary() consistently
drivers/block/drbd/drbd_nl.c: Use get_random_u64()
drivers/infiniband: Use get_random_u32()
drivers/isdn: Use get_random_u32()
arm64: ptr auth: Use get_random_u64 instead of _bytes
random: add get_random_max() function
mm/shuffle.c: use get_random_max()
kernel/bpf/core.c: Use get_random_max32()
arch/arm/kernel/process.c: Use get_random_max32() for sigpage_addr()
arch/x86/entry/vdso/vma.c: Use get_random_max32() for vdso_addr
random: add get_random_nonce()
arch/arm/include/asm/stackprotector.h | 9 +-
arch/arm/kernel/process.c | 2 +-
arch/arm64/include/asm/pointer_auth.h | 20 +-
arch/arm64/include/asm/stackprotector.h | 8 +-
arch/arm64/kernel/machine_kexec_file.c | 8 +-
arch/arm64/kernel/pointer_auth.c | 62 +-
arch/mips/include/asm/stackprotector.h | 7 +-
arch/powerpc/include/asm/stackprotector.h | 6 +-
arch/sh/include/asm/stackprotector.h | 8 +-
arch/x86/entry/vdso/vma.c | 2 +-
arch/x86/include/asm/stackprotector.h | 4 +-
arch/x86/mm/pageattr-test.c | 4 +-
arch/xtensa/include/asm/stackprotector.h | 7 +-
crypto/testmgr.c | 21 +-
drivers/block/drbd/drbd_main.c | 37 +-
drivers/block/drbd/drbd_nl.c | 4 +-
drivers/char/random.c | 581 +++++++++++++-----
drivers/crypto/amcc/crypto4xx_core.c | 7 +-
drivers/crypto/chelsio/chtls/chtls_io.c | 4 +-
drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 2 +-
drivers/gpu/drm/i915/selftests/scatterlist.c | 4 +-
drivers/hid/hid-lg.c | 2 +-
drivers/infiniband/core/cm.c | 2 +-
drivers/infiniband/core/cma.c | 5 +-
drivers/infiniband/core/sa_query.c | 2 +-
drivers/infiniband/hw/cxgb4/id_table.c | 4 +-
drivers/infiniband/hw/i40iw/i40iw_verbs.c | 2 +-
drivers/infiniband/hw/qib/qib_verbs.c | 2 -
drivers/infiniband/hw/qib/qib_verbs.h | 1 -
drivers/infiniband/sw/siw/siw_mem.c | 9 +-
drivers/infiniband/sw/siw/siw_verbs.c | 2 +-
drivers/infiniband/ulp/srp/ib_srp.c | 3 +-
drivers/isdn/mISDN/tei.c | 5 +-
drivers/mmc/core/core.c | 4 +-
drivers/mtd/nand/raw/nandsim.c | 4 +-
drivers/mtd/tests/mtd_nandecctest.c | 10 +-
drivers/mtd/tests/stresstest.c | 15 +-
drivers/mtd/ubi/debug.c | 2 +-
drivers/mtd/ubi/debug.h | 6 +-
drivers/net/ethernet/broadcom/cnic.c | 3 +-
.../broadcom/brcm80211/brcmfmac/p2p.c | 2 +-
.../net/wireless/intel/iwlwifi/mvm/mac-ctxt.c | 2 +-
drivers/nfc/nfcsim.c | 3 +-
drivers/s390/scsi/zfcp_fc.c | 2 +-
drivers/scsi/fcoe/fcoe_ctlr.c | 10 +-
drivers/scsi/qedi/qedi_main.c | 2 +-
.../target/iscsi/iscsi_target_seq_pdu_list.c | 72 +--
drivers/usb/serial/iuu_phoenix.c | 7 +-
fs/ceph/inode.c | 2 +-
fs/ceph/mdsmap.c | 2 +-
fs/ext2/ialloc.c | 3 +-
fs/ext4/ialloc.c | 5 +-
fs/ext4/super.c | 8 +-
fs/ocfs2/journal.c | 7 +-
fs/ubifs/debug.c | 11 +-
fs/ubifs/lpt_commit.c | 12 +-
fs/xfs/xfs_error.c | 2 +-
include/crypto/chacha.h | 1 +
include/linux/fault-inject.h | 4 +-
include/linux/random.h | 279 ++++++++-
include/net/red.h | 22 +-
kernel/bpf/core.c | 4 +-
kernel/locking/test-ww_mutex.c | 23 +-
lib/crypto/chacha.c | 2 +-
lib/fault-inject.c | 12 +-
lib/find_bit_benchmark.c | 4 +-
lib/interval_tree_test.c | 9 +-
lib/nodemask.c | 2 +-
lib/random32.c | 558 +++++++++--------
lib/rbtree_test.c | 2 +-
lib/reed_solomon/test_rslib.c | 4 +-
lib/sbitmap.c | 7 +-
lib/test-string_helpers.c | 2 +-
lib/test_bpf.c | 2 +-
lib/test_hexdump.c | 10 +-
lib/test_list_sort.c | 2 +-
lib/test_parman.c | 2 +-
lib/test_vmalloc.c | 58 +-
mm/shuffle.c | 2 +-
mm/slab.c | 25 +-
mm/slab_common.c | 15 +-
mm/slub.c | 2 +-
mm/swapfile.c | 2 +-
net/802/garp.c | 2 +-
net/802/mrp.c | 2 +-
net/batman-adv/bat_iv_ogm.c | 4 +-
net/batman-adv/bat_v_elp.c | 2 +-
net/batman-adv/bat_v_ogm.c | 10 +-
net/batman-adv/network-coding.c | 9 +-
net/bluetooth/hci_request.c | 2 +-
net/ceph/mon_client.c | 2 +-
net/core/neighbour.c | 6 +-
net/core/pktgen.c | 20 +-
net/core/stream.c | 2 +-
net/ipv4/igmp.c | 6 +-
net/ipv4/inet_connection_sock.c | 2 +-
net/ipv6/addrconf.c | 17 +-
net/ipv6/mcast.c | 10 +-
net/packet/af_packet.c | 2 +-
net/rds/bind.c | 2 +-
net/sched/act_gact.c | 2 +-
net/sched/act_sample.c | 2 +-
net/sched/sch_netem.c | 23 +-
net/sctp/socket.c | 2 +-
net/sunrpc/xprtsock.c | 2 +-
net/tipc/socket.c | 5 +-
net/wireless/scan.c | 2 +-
net/xfrm/xfrm_state.c | 2 +-
sound/core/pcm_lib.c | 2 +-
sound/core/pcm_native.c | 2 +-
110 files changed, 1324 insertions(+), 917 deletions(-)
--
2.26.0
From: George Spelvin
> Sent: 28 March 2020 18:28
...
> 20..23: Changes to the prandom_u32() generator itself. Including
> switching to a stronger & faster PRNG.
Does this remove the code that used 'xor' to combine the output
of (about) 5 LFSRs?
Or is that somewhere else?
I didn't spot it in the patches - so it might already have gone.
Using xor was particularly stupid.
The whole generator was then linear and trivially reversible.
Just using addition would have made it much stronger.
David
On Sun, Mar 29, 2020 at 12:21:46PM +0000, David Laight wrote:
>From: George Spelvin
>> Sent: 28 March 2020 18:28
>...
>> 20..23: Changes to the prandom_u32() generator itself. Including
>> switching to a stronger & faster PRNG.
>
> Does this remove the code that used 'xor' to combine the output
> of (about) 5 LFSR?
> Or is that somewhere else?
> I didn't spot it in the patches - so it might already have gone.
Yes, Patch #21 ("lib/random32.c: Change to SFC32 PRNG") changes
out the generator. I kept the same 128-bit (per CPU) state size.
The previous degree-113 LFSR was okay, but not great.
(It was factored into degree-31, -29, -28 and -25 components,
so there were four subgenerators.)
(If people are willing to spend the additional state size on 64-bit
machines, there are lots of good 64-bit generators with 256 bits of state.
Just remember that we have one state per possible CPU, so that's
a jump from 2KB to 4KB with the default NR_CPUS = 64.)
> Using xor was particularly stupid.
> The whole generator was then linear and trivially reversable.
> Just using addition would have made it much stronger.
I considered changing it to addition (actually, add pairs and XOR the
sums), but that would break its self-test. And once I'd done that,
there are much better possibilities.
Actually, addition doesn't make it *much* stronger. To start
with, addition and xor are the same thing at the lsbit, so
observing 113 lsbits gives you a linear decoding problem.
On Sun, Mar 29, 2020 at 05:41:22PM +0000, George Spelvin wrote:
> > Using xor was particularly stupid.
> > The whole generator was then linear and trivially reversable.
> > Just using addition would have made it much stronger.
>
> I considered changing it to addition (actually, add pairs and XOR the
> sums), but that would break its self-test. And once I'd done that,
> there are much better possibilities.
>
> Actually, addition doesn't make it *much* stronger. To start
> with, addition and xor are the same thing at the lsbit, so
> observing 113 lsbits gives you a linear decoding problem.
David,
If anyone is trying to rely on prandom_u32() as being "strong" in any
sense of the word in terms of being reversible by an attacker --- they
shouldn't be using prandom_u32(). That's going to be true no matter
*what* algorithm we use.
Better distribution? Sure. Making prandom_u32() faster? Absolutely;
that's its primary raison d'être.
George,
Did you send the full set of patches to a single mailing list? Or can
you make it available on a git tree somewhere? I've only seen this
message plus the ext4-related change, and I can't find the full patch
series anywhere. If you can send the next version such that it's
fully cc'ed to linux-kernel, that would be really helpful.
Thanks!!
- Ted
Posting all those patches has depressurized my brain and let me think of
additional ways to speed up batched random number generation, taking
advantage of the fact that we don't have to anti-backtrack the key.
Rather than using the primary_crng and its lock, use a global 256-bit key,
and give each CPU a disjoint 64-bit sequence number space.
(for (seq = raw_smp_processor_id(); ; seq += NR_CPUS).)
Then, when a CPU needs to refill its batched pool, copy over the constant,
the global key, the per-cpu sequence number, do something TBD with the
nonce, and run ChaCha on the result.
And voila, no global locking ever, unless a reseed interval has elapsed.
(We could also consider using 12 <= r < 20 ChaCha rounds for the batch.
After all, the best attack is <8 rounds and eSTREAM recommends 12.
16 might be reasonable.)
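Here is a hedged sketch of what that lockless refill might look like.
The only existing interfaces used are chacha20_block(),
CHACHA_BLOCK_SIZE and the usual per-cpu helpers; the key/seq variables
and the function name are mine, and where the sequence number lands in
the counter/nonce words is exactly the "TBD" part:

#include <crypto/chacha.h>	/* chacha20_block(), CHACHA_BLOCK_SIZE */
#include <linux/kernel.h>	/* lower_32_bits(), upper_32_bits() */
#include <linux/percpu.h>
#include <linux/string.h>
#include <linux/types.h>

static u32 global_batch_key[8];		/* 256-bit key, reseeded periodically */
static DEFINE_PER_CPU(u64, batch_seq);	/* initialized to the CPU number */

/* Refill one CHACHA_BLOCK_SIZE batch without taking any global lock. */
static void refill_batch(u8 out[CHACHA_BLOCK_SIZE])
{
	u32 state[16];
	/* Disjoint per-CPU sequence space: cpu, cpu + NR_CPUS, ... */
	u64 seq = this_cpu_add_return(batch_seq, NR_CPUS) - NR_CPUS;

	/* "expand 32-byte k" constants */
	state[0] = 0x61707865; state[1] = 0x3320646e;
	state[2] = 0x79622d32; state[3] = 0x6b206574;

	memcpy(&state[4], global_batch_key, sizeof(global_batch_key));

	/* Counter/nonce words; putting seq here is the "TBD" above. */
	state[12] = lower_32_bits(seq);
	state[13] = upper_32_bits(seq);
	state[14] = 0;
	state[15] = 0;

	chacha20_block(state, out);
}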
extract_buf() currently allocates a temporary hash buffer on its own
stack, copies a prefix of it to the caller-provided buffer, then wipes
the temporary.
By just having the caller allocate the full size, extract_buf()
can work in-place, saving stack space, copying, and wiping.
The FIPS initialization in extract_entropy() required some
rejiggering, since that would have allocated a second (enlarged)
buffer on the stack in addition to the one in _extract_entropy().
I had hoped there would be a code size reduction, but it's +80
bytes. :-( I guess the extra indirections in extract_buf()
more than offset the savings elsewhere.
text data bss dec hex filename
17783 1405 864 20052 4e54 drivers/char/random.o.old
17863 1405 864 20132 4ea4 drivers/char/random.o.new
Signed-off-by: George Spelvin <[email protected]>
---
I started working on the idea in my previous e-mail, and spotted this
cleanup. Only compile-tested so far; I don't want to stop to reboot.
drivers/char/random.c | 105 +++++++++++++++++++++---------------------
1 file changed, 52 insertions(+), 53 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 02e80000310c..38bb80a9efa2 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -357,11 +357,26 @@
#define OUTPUT_POOL_SHIFT 10
#define OUTPUT_POOL_WORDS (1 << (OUTPUT_POOL_SHIFT-5))
#define SEC_XFER_SIZE 512
-#define EXTRACT_SIZE 10
+#define SHA_DIGEST_BYTES (SHA_DIGEST_WORDS * 4)
+#define EXTRACT_SIZE (SHA_DIGEST_BYTES / 2)
#define LONGS(x) (((x) + sizeof(unsigned long) - 1)/sizeof(unsigned long))
+/*
+ * This buffer is used by the extract_buf function, and includes
+ * additional working space it needs. By having the caller
+ * allocate it, we save stack space, copying, and wiping overhead.
+ */
+union extract_buf {
+ __u8 b[EXTRACT_SIZE];
+ struct {
+ __u32 w[SHA_DIGEST_WORDS];
+ __u32 workspace[SHA_WORKSPACE_WORDS];
+ };
+ unsigned long l[LONGS(SHA_DIGEST_BYTES)];
+};
+
/*
* To allow fractional bits to be tracked, the entropy_count field is
* denominated in units of 1/16 bits.
@@ -1548,34 +1563,29 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
* This function does the actual extraction for extract_entropy and
* extract_entropy_user.
*
- * Note: we assume that .poolwords is a multiple of 16 words.
+ * Note: we assume that .poolbytes is a multiple of SHA_MESSAGE_BYTES = 64.
*/
-static void extract_buf(struct entropy_store *r, __u8 *out)
+static void extract_buf(struct entropy_store *r, union extract_buf *out)
{
int i;
- union {
- __u32 w[5];
- unsigned long l[LONGS(20)];
- } hash;
- __u32 workspace[SHA_WORKSPACE_WORDS];
unsigned long flags;
/*
* If we have an architectural hardware random number
* generator, use it for SHA's initial vector
*/
- sha_init(hash.w);
- for (i = 0; i < LONGS(20); i++) {
+ sha_init(out->w);
+ for (i = 0; i < ARRAY_SIZE(out->l); i++) {
unsigned long v;
if (!arch_get_random_long(&v))
break;
- hash.l[i] = v;
+ out->l[i] = v;
}
- /* Generate a hash across the pool, 16 words (512 bits) at a time */
+ /* Generate a hash across the pool, 64 bytes (512 bits) at a time */
spin_lock_irqsave(&r->lock, flags);
- for (i = 0; i < r->poolinfo->poolwords; i += 16)
- sha_transform(hash.w, (__u8 *)(r->pool + i), workspace);
+ for (i = 0; i < r->poolinfo->poolbytes; i += SHA_MESSAGE_BYTES)
+ sha_transform(out->w, (__u8 *)r->pool + i, out->workspace);
/*
* We mix the hash back into the pool to prevent backtracking
@@ -1586,50 +1596,50 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
* brute-forcing the feedback as hard as brute-forcing the
* hash.
*/
- __mix_pool_bytes(r, hash.w, sizeof(hash.w));
+ __mix_pool_bytes(r, out->w, sizeof(out->w));
spin_unlock_irqrestore(&r->lock, flags);
- memzero_explicit(workspace, sizeof(workspace));
-
/*
* In case the hash function has some recognizable output
* pattern, we fold it in half. Thus, we always feed back
* twice as much data as we output.
*/
- hash.w[0] ^= hash.w[3];
- hash.w[1] ^= hash.w[4];
- hash.w[2] ^= rol32(hash.w[2], 16);
-
- memcpy(out, &hash, EXTRACT_SIZE);
- memzero_explicit(&hash, sizeof(hash));
+ out->w[0] ^= out->w[3];
+ out->w[1] ^= out->w[4];
+ out->w[2] ^= rol32(out->w[2], 16);
}
static ssize_t _extract_entropy(struct entropy_store *r, void *buf,
size_t nbytes, int fips)
{
- ssize_t ret = 0, i;
- __u8 tmp[EXTRACT_SIZE];
- unsigned long flags;
+ ssize_t ret = 0;
+ union extract_buf tmp;
while (nbytes) {
- extract_buf(r, tmp);
+ ssize_t i;
+
+ extract_buf(r, &tmp);
if (fips) {
+ unsigned long flags;
+
spin_lock_irqsave(&r->lock, flags);
- if (!memcmp(tmp, r->last_data, EXTRACT_SIZE))
+ if (unlikely(!r->last_data_init))
+ r->last_data_init = 1;
+ else if (!memcmp(tmp.b, r->last_data, EXTRACT_SIZE))
panic("Hardware RNG duplicated output!\n");
- memcpy(r->last_data, tmp, EXTRACT_SIZE);
+ memcpy(r->last_data, tmp.b, EXTRACT_SIZE);
spin_unlock_irqrestore(&r->lock, flags);
}
i = min_t(int, nbytes, EXTRACT_SIZE);
- memcpy(buf, tmp, i);
+ memcpy(buf, tmp.b, i);
nbytes -= i;
buf += i;
ret += i;
}
/* Wipe data just returned from memory */
- memzero_explicit(tmp, sizeof(tmp));
+ memzero_explicit(&tmp, sizeof(tmp));
return ret;
}
@@ -1646,24 +1656,13 @@ static ssize_t _extract_entropy(struct entropy_store *r, void *buf,
static ssize_t extract_entropy(struct entropy_store *r, void *buf,
size_t nbytes, int min, int reserved)
{
- __u8 tmp[EXTRACT_SIZE];
- unsigned long flags;
-
- /* if last_data isn't primed, we need EXTRACT_SIZE extra bytes */
- if (fips_enabled) {
- spin_lock_irqsave(&r->lock, flags);
- if (!r->last_data_init) {
- r->last_data_init = 1;
- spin_unlock_irqrestore(&r->lock, flags);
- trace_extract_entropy(r->name, EXTRACT_SIZE,
- ENTROPY_BITS(r), _RET_IP_);
- xfer_secondary_pool(r, EXTRACT_SIZE);
- extract_buf(r, tmp);
- spin_lock_irqsave(&r->lock, flags);
- memcpy(r->last_data, tmp, EXTRACT_SIZE);
- }
- spin_unlock_irqrestore(&r->lock, flags);
- }
+ /*
+ * If last_data isn't primed, prime it. We don't lock for the
+ * check, so there's a tiny chance two CPUs race and do this
+ * redundantly, but that's harmless.
+ */
+ if (fips_enabled && unlikely(!r->last_data_init))
+ _extract_entropy(r, buf, 1, fips_enabled);
trace_extract_entropy(r->name, nbytes, ENTROPY_BITS(r), _RET_IP_);
xfer_secondary_pool(r, nbytes);
@@ -1680,7 +1679,7 @@ static ssize_t extract_entropy_user(struct entropy_store *r, void __user *buf,
size_t nbytes)
{
ssize_t ret = 0, i;
- __u8 tmp[EXTRACT_SIZE];
+ union extract_buf tmp;
int large_request = (nbytes > 256);
trace_extract_entropy_user(r->name, nbytes, ENTROPY_BITS(r), _RET_IP_);
@@ -1702,9 +1701,9 @@ static ssize_t extract_entropy_user(struct entropy_store *r, void __user *buf,
schedule();
}
- extract_buf(r, tmp);
+ extract_buf(r, &tmp);
i = min_t(int, nbytes, EXTRACT_SIZE);
- if (copy_to_user(buf, tmp, i)) {
+ if (copy_to_user(buf, tmp.b, i)) {
ret = -EFAULT;
break;
}
@@ -1715,7 +1714,7 @@ static ssize_t extract_entropy_user(struct entropy_store *r, void __user *buf,
}
/* Wipe data just returned from memory */
- memzero_explicit(tmp, sizeof(tmp));
+ memzero_explicit(&tmp, sizeof(tmp));
return ret;
}
--
2.26.0
From: Theodore Y. Ts'o
> Sent: 29 March 2020 22:42
> On Sun, Mar 29, 2020 at 05:41:22PM +0000, George Spelvin wrote:
> > > Using xor was particularly stupid.
> > > The whole generator was then linear and trivially reversable.
> > > Just using addition would have made it much stronger.
> >
> > I considered changing it to addition (actually, add pairs and XOR the
> > sums), but that would break its self-test. And once I'd done that,
> > there are much better possibilities.
> >
> > Actually, addition doesn't make it *much* stronger. To start
> > with, addition and xor are the same thing at the lsbit, so
> > observing 113 lsbits gives you a linear decoding problem.
>
> David,
>
> If anyone is trying to rely on prandom_u32() as being "strong" in any
> sense of the word in terms of being reversable by attacker --- they
> shouldn't be using prandom_u32(). That's going to be true no matter
> *what* algorithm we use.
Indeed, but xor merging of 4 LFSRs gives the appearance of an
improvement (over a single LFSR) but gives none and just
increases the complexity.
David
On Sun, Mar 29, 2020 at 05:42:14PM -0400, Theodore Y. Ts'o wrote:
> If anyone is trying to rely on prandom_u32() as being "strong" in any
> sense of the word in terms of being reversable by attacker --- they
> shouldn't be using prandom_u32(). That's going to be true no matter
> *what* algorithm we use.
>
> Better distribution? Sure. Making prandom_u32() faster? Absolutely;
> that's its primary Raison d'Etre.
I'd like your comments on an idea I had to add a second PRNG state
for security-interesting applications.
There are some ASLR tasks, notably slab freelist shuffling and
per-syscall stack offset randomization, which need a Very Fast
source of random numbers. No crypto-grade generator qualifies,
and both currently use very bad ad-hoc generators.
The per-syscall stack offset code currently uses the lsbits of the
TSC, which is obviously bad as they're observable by the (presumed
malicious) userspace immediately before the call and thus highly
predictable.
Likewise, the slab shuffling currently precomputes a permutation and
just picks a random starting position for every slab.
Both are saved by the fact that their PRNG outputs are mostly
unobservable, so an attacker can't start to predict them.
I was thinking that a second PRNG, identical to the prandom_u32()
one but seeded separately, could be used for this purpose. The good
distribution would preclude trivial patterns in their output, which
is about all we can hope for.
The difference is that it would only be used for applications which
both require high speed and are (mostly) unobservable to an attacker.
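A minimal sketch of the idea, assuming a separately seeded per-CPU
rnd_state; the names are illustrative, and the real thing would
presumably live in lib/random32.c and reseed periodically:

#include <linux/cpumask.h>
#include <linux/init.h>
#include <linux/percpu.h>
#include <linux/random.h>	/* struct rnd_state, prandom_*_state() */

static DEFINE_PER_CPU(struct rnd_state, aslr_rand_state);

static int __init aslr_prandom_init(void)
{
	int cpu;

	/* Seed each CPU separately; the real code would also warm up. */
	for_each_possible_cpu(cpu)
		prandom_seed_state(per_cpu_ptr(&aslr_rand_state, cpu),
				   get_random_u64());
	return 0;
}
core_initcall(aslr_prandom_init);

/* Fast generator for unobservable consumers; same algorithm as prandom_u32(). */
static u32 aslr_prandom_u32(void)
{
	struct rnd_state *state = &get_cpu_var(aslr_rand_state);
	u32 r = prandom_u32_state(state);

	put_cpu_var(aslr_rand_state);
	return r;
}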
Any opinions, anyone?