2023-01-13 22:03:58

by David Keisar Schmidt

Subject: [PATCH v4 0/3] Secure prandom_u32 invocations

From: David Keisar Schmidt <[email protected]>

Hi,

The security improvements for prandom_u32 made in commits c51f8f88d705
(October 2020) and d4150779e60f (May 2022) didn't handle the cases where
prandom_bytes_state() and prandom_u32_state() are used.

Specifically, this weak randomization takes place in three cases:
1. mm/slab.c
2. mm/slab_common.c
3. arch/x86/mm/kaslr.c

The first two invocations (mm/slab.c, mm/slab_common.c) randomize the slab
allocator freelists, to make sure attackers can't obtain information on the
heap state.

The last invocation, inside arch/x86/mm/kaslr.c,
randomizes the virtual address space of kernel memory regions.
The use of prandom_bytes_state() there is justified, since it has a dedicated
state and draws only three pseudo-random values, but the seeding done with
prandom_seed_state() takes advantage of only 32 of the seed's 64 bits.
Hence, we have added the necessary changes to make those randomizations
stronger, switching the invocation of prandom_seed_state() to a more secure
version, which we implemented inside kaslr.c.
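
For reference, the stock seeding helper in include/linux/prandom.h looks
roughly like the following sketch (paraphrased here, not part of this
series). Note that the 64-bit seed is folded into a single 32-bit value
before all four state words are derived from it:

static inline void prandom_seed_state(struct rnd_state *state, u64 seed)
{
	/* the 64-bit seed collapses into one 32-bit word */
	u32 i = (seed >> 32) ^ (seed << 10) ^ seed;

	state->s1 = __seed(i,   2U);
	state->s2 = __seed(i,   8U);
	state->s3 = __seed(i,  16U);
	state->s4 = __seed(i, 128U);
}

So the effective seed space is at most 2^32, regardless of how much entropy
kaslr_get_random_long() supplies.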
---
Changes since v3:
* arch/x86/mm/kaslr.c: secure the way the region offsets are generated in the
seeding state - Adding a revised version of prandom_seed_state
* edited commit messages

Changes since v2:
* edited commit message.
* replaced instances of get_random_u32 with get_random_u32_below
in mm/slab.c, mm/slab_common.c

Regards,

David Keisar Schmidt (3):
Replace invocation of weak PRNG in mm/slab.c
Replace invocation of weak PRNG inside mm/slab_common.c
Add 64bits prandom_seed_state to arch/x86/mm/kaslr.c

 arch/x86/mm/kaslr.c | 26 +++++++++++++++++++++++++-
 mm/slab.c           | 25 ++++++++++---------------
 mm/slab_common.c    | 11 +++--------
 3 files changed, 38 insertions(+), 24 deletions(-)

--
2.38.0


2023-01-13 22:04:09

by David Keisar Schmidt

Subject: [PATCH v4 2/3] slab_allocator: mm/slab_common.c: Replace invocation of weak PRNG

From: David Keisar Schmidt <[email protected]>

The slab allocator randomization inside slab_common.c uses the prandom_u32 PRNG.
It was added to prevent attackers from obtaining information on the heap state.

However, this PRNG turned out to be weak, as noted in commit c51f8f88d705.
To fix it, we have changed the invocation of prandom_u32_state to get_random_u32
to ensure the PRNG is strong.

Since a modulo operation is applied right after that in the Fisher-Yates
shuffle, we used get_random_u32_below() to achieve uniformity.
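
To illustrate the uniformity point, the two ways of bounding the draw look
roughly like this (a sketch only; the actual hunk is below):

	/* slightly biased whenever (i + 1) does not divide 2^32 */
	rand = get_random_u32() % (i + 1);

	/* uniform in [0, i + 1) without the modulo bias */
	rand = get_random_u32_below(i + 1);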

Signed-off-by: David Keisar Schmidt <[email protected]>
---
Changes since v3:
* edited commit message.

Changes since v2:
* replaced instances of get_random_u32 with get_random_u32_below
in mm/slab_common.c.


mm/slab_common.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 0042fb273..e254b2f55 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1130,7 +1130,7 @@ EXPORT_SYMBOL(kmalloc_large_node);

#ifdef CONFIG_SLAB_FREELIST_RANDOM
/* Randomize a generic freelist */
-static void freelist_randomize(struct rnd_state *state, unsigned int *list,
+static void freelist_randomize(unsigned int *list,
unsigned int count)
{
unsigned int rand;
@@ -1141,8 +1141,7 @@ static void freelist_randomize(struct rnd_state *state, unsigned int *list,

/* Fisher-Yates shuffle */
for (i = count - 1; i > 0; i--) {
- rand = prandom_u32_state(state);
- rand %= (i + 1);
+ rand = get_random_u32_below(i+1);
swap(list[i], list[rand]);
}
}
@@ -1151,7 +1150,6 @@ static void freelist_randomize(struct rnd_state *state, unsigned int *list,
int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count,
gfp_t gfp)
{
- struct rnd_state state;

if (count < 2 || cachep->random_seq)
return 0;
@@ -1160,10 +1158,7 @@ int cache_random_seq_create(struct kmem_cache *cachep, unsigned int count,
if (!cachep->random_seq)
return -ENOMEM;

- /* Get best entropy at this stage of boot */
- prandom_seed_state(&state, get_random_long());
-
- freelist_randomize(&state, cachep->random_seq, count);
+ freelist_randomize(cachep->random_seq, count);
return 0;
}

--
2.38.0

2023-01-13 22:04:49

by David Keisar Schmidt

Subject: [PATCH v4 1/3] slab_allocator: mm/slab.c: Replace invocation of weak PRNG

From: David Keisar Schmidt <[email protected]>

The slab allocator randomization uses the prandom_u32 PRNG.
It was added to prevent attackers from obtaining information on the heap state,
by randomizing the freelist state.

However, this PRNG turned out to be weak, as noted in commit c51f8f88d705.
To fix it, we have changed the invocation of prandom_u32_state to get_random_u32
to ensure the PRNG is strong. Since a modulo operation is applied right after
that, we used get_random_u32_below() to achieve uniformity.

In addition, we changed the freelist_init_state union to a struct, since the
rnd_state member it contained, which stored the state of prandom_u32, is no
longer needed: get_random_u32 maintains its own state.

Signed-off-by: David Keisar Schmidt <[email protected]>
---
Changes since v3:
* edited commit message.

Changes since v2:
* replaced instances of get_random_u32 with get_random_u32_below
in mm/slab.c.


mm/slab.c | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 59c8e28f7..c259e0b09 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2360,20 +2360,17 @@ static void cache_init_objs_debug(struct kmem_cache *cachep, struct slab *slab)

#ifdef CONFIG_SLAB_FREELIST_RANDOM
/* Hold information during a freelist initialization */
-union freelist_init_state {
- struct {
- unsigned int pos;
- unsigned int *list;
- unsigned int count;
- };
- struct rnd_state rnd_state;
+struct freelist_init_state {
+ unsigned int pos;
+ unsigned int *list;
+ unsigned int count;
};

/*
* Initialize the state based on the randomization method available.
* return true if the pre-computed list is available, false otherwise.
*/
-static bool freelist_state_initialize(union freelist_init_state *state,
+static bool freelist_state_initialize(struct freelist_init_state *state,
struct kmem_cache *cachep,
unsigned int count)
{
@@ -2381,23 +2378,22 @@ static bool freelist_state_initialize(union freelist_init_state *state,
unsigned int rand;

/* Use best entropy available to define a random shift */
- rand = get_random_u32();
+ rand = get_random_u32_below(count);

/* Use a random state if the pre-computed list is not available */
if (!cachep->random_seq) {
- prandom_seed_state(&state->rnd_state, rand);
ret = false;
} else {
state->list = cachep->random_seq;
state->count = count;
- state->pos = rand % count;
+ state->pos = rand;
ret = true;
}
return ret;
}

/* Get the next entry on the list and randomize it using a random shift */
-static freelist_idx_t next_random_slot(union freelist_init_state *state)
+static freelist_idx_t next_random_slot(struct freelist_init_state *state)
{
if (state->pos >= state->count)
state->pos = 0;
@@ -2418,7 +2414,7 @@ static void swap_free_obj(struct slab *slab, unsigned int a, unsigned int b)
static bool shuffle_freelist(struct kmem_cache *cachep, struct slab *slab)
{
unsigned int objfreelist = 0, i, rand, count = cachep->num;
- union freelist_init_state state;
+ struct freelist_init_state state;
bool precomputed;

if (count < 2)
@@ -2447,8 +2443,7 @@ static bool shuffle_freelist(struct kmem_cache *cachep, struct slab *slab)

/* Fisher-Yates shuffle */
for (i = count - 1; i > 0; i--) {
- rand = prandom_u32_state(&state.rnd_state);
- rand %= (i + 1);
+ rand = get_random_u32_below(i+1);
swap_free_obj(slab, i, rand);
}
} else {
--
2.38.0

2023-01-13 22:06:17

by David Keisar Schmidt

Subject: [PATCH v4 3/3] x86 mm, x86 architecture (32-bit and 64-bit): arch/x86/mm/kaslr.c: Adds 64bits version of prandom_seed_state

From: David Keisar Schmidt <[email protected]>

The memory randomization of the virtual address space of kernel memory regions
(physical memory mapping, vmalloc & vmemmap) inside arch/x86/mm/kaslr.c
is based on the function prandom_bytes_state(), which uses the prandom_u32 PRNG.

However, the seeding here is done by calling prandom_seed_state(), which
effectively uses only 32 bits of the seed. This means that observing ONE
region's offset (say, 30 bits) can leave the attacker with only 2 possible
seeds, from which the remaining two region offsets can be calculated.

Hence, we implemented an adjusted version of prandom_seed_state() inside
kaslr.c, so that it takes advantage of all 64 bits of the seed. With this
implementation, enumerating over the seed becomes quite infeasible, and
attacking the linearity requires ~113 bits of output, which we don't get from
two exposed region offsets (up to 30 bits each).

Signed-off-by: David Keisar Schmidt <[email protected]>
---
Changes since v3:
* We took a different approach, and replaced the invocation of
prandom_seed_state with a revised version which is more secure.

Changes since v2:
* edited commit message.


arch/x86/mm/kaslr.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 557f0fe25..5fd73d5ad 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -59,6 +59,29 @@ static inline unsigned long get_padding(struct kaslr_memory_region *region)
{
return (region->size_tb << TB_SHIFT);
}
+static inline void _kaslr_prandom_seed_state(struct rnd_state *state, u64 seed)
+{
+ u32 i = ((seed >> 32) ^ (seed << 10) ^ seed) & 0xffffffffUL;
+ // To take advantage of all 64 bits of the seed
+ u32 j = ((seed>>32) ^ (seed<<10)) & 0xffffffffUL;
+ state->s1 = __seed(i, 2U);
+ state->s2 = __seed(j, 8U);
+ /* Ensure no obvious linear relation with the previous states */
+ state->s3 = __seed(next_pseudo_random32(i+j), 16U);
+ state->s4 = __seed(next_pseudo_random32(j-((i>>16)^(i<<16))), 128U);
+
+ /* Calling RNG ten times to satisfy recurrence condition */
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+}

/* Initialize base and padding for each memory region randomized with KASLR */
void __init kernel_randomize_memory(void)
@@ -113,7 +136,8 @@ void __init kernel_randomize_memory(void)
for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
remain_entropy -= get_padding(&kaslr_regions[i]);

- prandom_seed_state(&rand_state, kaslr_get_random_long("Memory"));
+ _kaslr_prandom_seed_state(&rand_state, kaslr_get_random_long
+ ("Memory"));

for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++) {
unsigned long entropy;
--
2.38.0

2023-01-14 03:09:49

by H. Peter Anvin

Subject: Re: [PATCH v4 3/3] x86 mm, x86 architecture (32-bit and 64-bit): arch/x86/mm/kaslr.c: Adds 64bits version of prandom_seed_state

On 1/13/23 13:39, [email protected] wrote:
> +{
> + u32 i = ((seed >> 32) ^ (seed << 10) ^ seed) & 0xffffffffUL;
> + // To take advantage of all 64 bits of the seed
> + u32 j = ((seed>>32) ^ (seed<<10)) & 0xffffffffUL;

The AND operation here is pointless. You are implicitly casting to u32.
Even if you had to mask explicitly, (u32) would be a lot clearer than a
hard-to-read constant.


> + state->s1 = __seed(i, 2U);
> + state->s2 = __seed(j, 8U);
> + /* Ensure no obvious linear relation with the previous states */
> + state->s3 = __seed(next_pseudo_random32(i+j), 16U);
> + state->s4 = __seed(next_pseudo_random32(j-((i>>16)^(i<<16))), 128U);
> +
> + /* Calling RNG ten times to satisfy recurrence condition */
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> +}

What recurrence relation is that? This looks like you are just trying to
re-invoke prandom_warmup(), but for heaven's sake, don't open-code it!

In fact, it seems you are just hacking an alternate version of
prandom_seed_full_state(). Why not just change prandom_seed_full_state()
if that is motivated? (You may need to factor out the iteration over all
CPUs.)
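
(For reference, the warmup helper being open-coded above lives in
lib/random32.c and looks roughly like this:

static void prandom_warmup(struct rnd_state *state)
{
	/* Calling RNG ten times to satisfy recurrence condition */
	prandom_u32_state(state);
	prandom_u32_state(state);
	prandom_u32_state(state);
	prandom_u32_state(state);
	prandom_u32_state(state);
	prandom_u32_state(state);
	prandom_u32_state(state);
	prandom_u32_state(state);
	prandom_u32_state(state);
	prandom_u32_state(state);
}

prandom_seed_full_state() then seeds the four state words per CPU from
get_random_bytes() and calls prandom_warmup() on each.)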

Honestly, I'm not a super expert in PRNGs, but this seems both slow (the
*only* motivation for prandom_u32() is to be fast) and pointless. If we
are relying on this for security, we should be using something that is
actually cryptographic; especially since this is only invoked on boot,
which is where entropy is maximally scarce and PRNGs maximally
vulnerable due to being close to their seed state.


Additionally, in terms of creating performant mixing functions, one
thing that comes to mind is the circular multiply:

static inline u32 circular_multiply(u32 a, u32 b)
{
	u64 m = (u64)a * b;
	return m + (m >> 32);
}

On x86, this is two instructions, even on 32 bits.

One can use ^ instead of + to make it a bit less algebraic, but I don't
know for sure that that is a net improvement.

-hpa

2023-01-18 01:07:28

by H. Peter Anvin

Subject: Re: [PATCH v4 3/3] x86 mm, x86 architecture (32-bit and 64-bit): arch/x86/mm/kaslr.c: Adds 64bits version of prandom_seed_state



On 1/17/23 13:16, David Keisar Schmidt wrote:
>
> Because (the way we understand this), Kees Cook prefers to keep the
> original API, so that a fixed seed can be injected at will (for
> debugging). See https://lkml.org/lkml/2023/1/6/772
>

This bothers me, because FG-KASLR is *exactly* when a bad PRNG will
shine through.

-hpa