When exploiting memory vulnerabilities, "heap spraying" is a common
technique for targeting bugs related to dynamic memory allocation (i.e.
the "heap"), and it plays an important role in successful exploitation.
Basically, it overwrites the memory area of a vulnerable object by
triggering allocations in other subsystems or modules, thereby
getting a reference to the targeted memory location. It is usable on
various types of vulnerability, including use-after-free (UAF), heap
out-of-bounds write, etc.
There are (at least) two reasons why the heap can be sprayed: 1) generic
slab caches are shared among different subsystems and modules, and
2) dedicated slab caches could be merged with the generic ones.
Currently these two factors cannot be prevented at a low cost: the first
one is a widely used memory allocation mechanism, and shutting down slab
merging completely via `slub_nomerge` would be overkill.
To efficiently prevent heap spraying, we propose the following approach:
create multiple copies of the generic slab caches that will never be
merged, and a random one of them will be used at allocation. The random
selection is based on the address of the code that calls `kmalloc()`,
which means it is static at runtime (rather than dynamically determined
at each allocation, which could be bypassed by repeatedly spraying
by brute force). In other words, the randomness of cache selection is
with respect to the code address rather than time, i.e. allocations
in different code paths would most likely pick different caches,
although kmalloc() at each place would use the same cache copy whenever
it is executed. In this way, the vulnerable object and memory allocated
in other subsystems and modules will (most probably) be on different
slab caches, which prevents the object from being sprayed.
Meanwhile, the static random selection is further enhanced with a
per-boot random seed, which prevents the attacker from finding a usable
kmalloc that happens to pick the same cache as the vulnerable
subsystem/module by analyzing the open source code.
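The selection described above can be sketched in user space as follows.
This is only an illustration under stated assumptions, not the patch
itself: hash64_sketch() and pick_cache_copy() are made-up names,
hash64_sketch() imitates the multiplicative scheme of the kernel's
hash_64(), and "caller" and "seed" are plain integers standing in for the
call-site address and the per-boot seed.

```c
#include <assert.h>
#include <stdint.h>

/* Number of hash bits; 4 bits gives 16 cache copies (the patch's default). */
#define CACHE_BITS 4

/*
 * User-space stand-in for the kernel's hash_64(): multiply by the 64-bit
 * golden-ratio constant and keep the top "bits" bits.
 */
static unsigned int hash64_sketch(uint64_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (unsigned int)((val * UINT64_C(0x61C8864680B583EB)) >> (64 - bits));
}

/*
 * Pick a cache copy for one call site. "caller" models the address of the
 * code calling kmalloc(); "seed" models the per-boot random seed.
 * (Both parameters are assumptions made for this sketch.)
 */
static unsigned int pick_cache_copy(uint64_t caller, uint64_t seed)
{
	return hash64_sketch(caller ^ seed, CACHE_BITS);
}
```

For a fixed seed, the same caller value always maps to the same copy,
which is the "static at runtime" property; changing the seed changes the
mapping, so it cannot be precomputed from the source code alone.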
The performance overhead has been tested on a 40-core x86 server by
comparing the results of `perf bench all` between kernels with and
without this patch, based on the latest linux-next kernel; the
difference is minor. A subset of the benchmarks is listed below:
                sched/     sched/  syscall/  mem/       mem/
                messaging  pipe    basic     memcpy     memset
                (sec)      (sec)   (sec)     (GB/sec)   (GB/sec)

control1        0.019      5.459   0.733     15.258789  51.398026
control2        0.019      5.439   0.730     16.009221  48.828125
control3        0.019      5.282   0.735     16.009221  48.828125
control_avg     0.019      5.393   0.733     15.759077  49.684759

experiment1     0.019      5.374   0.741     15.500992  46.502976
experiment2     0.019      5.440   0.746     16.276042  51.398026
experiment3     0.019      5.242   0.752     15.258789  51.398026
experiment_avg  0.019      5.352   0.746     15.678608  49.766343
The memory overhead was measured by executing `free` after boot on a
QEMU VM with 1GB total memory, and as expected, it's positively
correlated with the number of cache copies:
           control   4 copies  8 copies  16 copies

total      969.8M    968.2M    968.2M    968.2M
used       20.0M     21.9M     24.1M     26.7M
free       936.9M    933.6M    931.4M    928.6M
available  932.2M    928.8M    926.6M    923.9M
Signed-off-by: GONG, Ruiqi <[email protected]>
Co-developed-by: Xiu Jianfeng <[email protected]>
Signed-off-by: Xiu Jianfeng <[email protected]>
---
include/linux/percpu.h | 12 ++++++---
include/linux/slab.h | 20 ++++++++++++---
mm/Kconfig | 49 ++++++++++++++++++++++++++++++++++++
mm/kfence/kfence_test.c | 6 +++--
mm/slab.c | 2 +-
mm/slab.h | 2 +-
mm/slab_common.c | 55 +++++++++++++++++++++++++++++++++++++----
7 files changed, 130 insertions(+), 16 deletions(-)
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 42125cf9c506..bdcfc988e6db 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -34,6 +34,12 @@
#define PCPU_BITMAP_BLOCK_BITS (PCPU_BITMAP_BLOCK_SIZE >> \
PCPU_MIN_ALLOC_SHIFT)
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+#define PERCPU_DYNAMIC_SIZE_SHIFT 13
+#else
+#define PERCPU_DYNAMIC_SIZE_SHIFT 10
+#endif
+
/*
* Percpu allocator can serve percpu allocations before slab is
* initialized which allows slab to depend on the percpu allocator.
@@ -41,7 +47,7 @@
* for this. Keep PERCPU_DYNAMIC_RESERVE equal to or larger than
* PERCPU_DYNAMIC_EARLY_SIZE.
*/
-#define PERCPU_DYNAMIC_EARLY_SIZE (20 << 10)
+#define PERCPU_DYNAMIC_EARLY_SIZE (20 << PERCPU_DYNAMIC_SIZE_SHIFT)
/*
* PERCPU_DYNAMIC_RESERVE indicates the amount of free area to piggy
@@ -55,9 +61,9 @@
* intelligent way to determine this would be nice.
*/
#if BITS_PER_LONG > 32
-#define PERCPU_DYNAMIC_RESERVE (28 << 10)
+#define PERCPU_DYNAMIC_RESERVE (28 << PERCPU_DYNAMIC_SIZE_SHIFT)
#else
-#define PERCPU_DYNAMIC_RESERVE (20 << 10)
+#define PERCPU_DYNAMIC_RESERVE (20 << PERCPU_DYNAMIC_SIZE_SHIFT)
#endif
extern void *pcpu_base_addr;
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 791f7453a04f..b7a5387f0dad 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -19,6 +19,9 @@
#include <linux/workqueue.h>
#include <linux/percpu-refcount.h>
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+#include <linux/hash.h>
+#endif
/*
* Flags to pass to kmem_cache_create().
@@ -351,7 +354,9 @@ static inline unsigned int arch_slab_minalign(void)
* kmem caches can have both accounted and unaccounted objects.
*/
enum kmalloc_cache_type {
- KMALLOC_NORMAL = 0,
+ KMALLOC_RANDOM_START = 0,
+ KMALLOC_RANDOM_END = KMALLOC_RANDOM_START + CONFIG_RANDOM_KMALLOC_CACHES_NR - 1,
+ KMALLOC_NORMAL = KMALLOC_RANDOM_END,
#ifndef CONFIG_ZONE_DMA
KMALLOC_DMA = KMALLOC_NORMAL,
#endif
@@ -383,14 +388,21 @@ kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1];
(IS_ENABLED(CONFIG_ZONE_DMA) ? __GFP_DMA : 0) | \
(IS_ENABLED(CONFIG_MEMCG_KMEM) ? __GFP_ACCOUNT : 0))
-static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
+extern unsigned long random_kmalloc_seed;
+
+static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags, unsigned long caller)
{
/*
* The most common case is KMALLOC_NORMAL, so test for it
* with a single branch for all the relevant flags.
*/
if (likely((flags & KMALLOC_NOT_NORMAL_BITS) == 0))
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+ return KMALLOC_RANDOM_START + hash_64(caller ^ random_kmalloc_seed,
+ CONFIG_RANDOM_KMALLOC_CACHES_BITS);
+#else
return KMALLOC_NORMAL;
+#endif
/*
* At least one of the flags has to be set. Their priorities in
@@ -577,7 +589,7 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
index = kmalloc_index(size);
return kmalloc_trace(
- kmalloc_caches[kmalloc_type(flags)][index],
+ kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
flags, size);
}
return __kmalloc(size, flags);
@@ -593,7 +605,7 @@ static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla
index = kmalloc_index(size);
return kmalloc_node_trace(
- kmalloc_caches[kmalloc_type(flags)][index],
+ kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
flags, node, size);
}
return __kmalloc_node(size, flags, node);
diff --git a/mm/Kconfig b/mm/Kconfig
index a3c95338cd3a..6150e9a946a7 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -337,6 +337,55 @@ config SLUB_CPU_PARTIAL
which requires the taking of locks that may cause latency spikes.
Typically one would choose no for a realtime system.
+config RANDOM_KMALLOC_CACHES
+ default n
+ depends on SLUB
+ bool "Random slab caches for normal kmalloc"
+ help
+ A hardening feature that creates multiple copies of slab caches for
+ normal kmalloc allocation and makes kmalloc randomly pick one based
+ on code address, which makes the attackers unable to spray vulnerable
+ memory objects on the heap for exploiting memory vulnerabilities.
+
+choice
+ prompt "Number of random slab caches copies"
+ depends on RANDOM_KMALLOC_CACHES
+ default RANDOM_KMALLOC_CACHES_16
+ help
+ The number of copies of random slab caches. Bigger value makes the
+ potentially vulnerable memory object less likely to collide with
+ objects allocated from other subsystems or modules.
+
+config RANDOM_KMALLOC_CACHES_2
+ bool "2"
+
+config RANDOM_KMALLOC_CACHES_4
+ bool "4"
+
+config RANDOM_KMALLOC_CACHES_8
+ bool "8"
+
+config RANDOM_KMALLOC_CACHES_16
+ bool "16"
+
+endchoice
+
+config RANDOM_KMALLOC_CACHES_BITS
+ int
+ default 0 if !RANDOM_KMALLOC_CACHES
+ default 1 if RANDOM_KMALLOC_CACHES_2
+ default 2 if RANDOM_KMALLOC_CACHES_4
+ default 3 if RANDOM_KMALLOC_CACHES_8
+ default 4 if RANDOM_KMALLOC_CACHES_16
+
+config RANDOM_KMALLOC_CACHES_NR
+ int
+ default 1 if !RANDOM_KMALLOC_CACHES
+ default 2 if RANDOM_KMALLOC_CACHES_2
+ default 4 if RANDOM_KMALLOC_CACHES_4
+ default 8 if RANDOM_KMALLOC_CACHES_8
+ default 16 if RANDOM_KMALLOC_CACHES_16
+
endmenu # SLAB allocator options
config SHUFFLE_PAGE_ALLOCATOR
diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
index 9e008a336d9f..7f5ffb490328 100644
--- a/mm/kfence/kfence_test.c
+++ b/mm/kfence/kfence_test.c
@@ -212,7 +212,8 @@ static void test_cache_destroy(void)
static inline size_t kmalloc_cache_alignment(size_t size)
{
- return kmalloc_caches[kmalloc_type(GFP_KERNEL)][__kmalloc_index(size, false)]->align;
+ enum kmalloc_cache_type type = kmalloc_type(GFP_KERNEL, _RET_IP_);
+ return kmalloc_caches[type][__kmalloc_index(size, false)]->align;
}
/* Must always inline to match stack trace against caller. */
@@ -282,8 +283,9 @@ static void *test_alloc(struct kunit *test, size_t size, gfp_t gfp, enum allocat
if (is_kfence_address(alloc)) {
struct slab *slab = virt_to_slab(alloc);
+ enum kmalloc_cache_type type = kmalloc_type(GFP_KERNEL, _RET_IP_);
struct kmem_cache *s = test_cache ?:
- kmalloc_caches[kmalloc_type(GFP_KERNEL)][__kmalloc_index(size, false)];
+ kmalloc_caches[type][__kmalloc_index(size, false)];
/*
* Verify that various helpers return the right values
diff --git a/mm/slab.c b/mm/slab.c
index 88194391d553..9ad3d0f2d1a5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1670,7 +1670,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
if (freelist_size > KMALLOC_MAX_CACHE_SIZE) {
freelist_cache_size = PAGE_SIZE << get_order(freelist_size);
} else {
- freelist_cache = kmalloc_slab(freelist_size, 0u);
+ freelist_cache = kmalloc_slab(freelist_size, 0u, _RET_IP_);
if (!freelist_cache)
continue;
freelist_cache_size = freelist_cache->size;
diff --git a/mm/slab.h b/mm/slab.h
index 6a5633b25eb5..4ebe3bdfc17c 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -282,7 +282,7 @@ void setup_kmalloc_cache_index_table(void);
void create_kmalloc_caches(slab_flags_t);
/* Find the kmalloc slab corresponding for a certain size */
-struct kmem_cache *kmalloc_slab(size_t, gfp_t);
+struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags, unsigned long caller);
void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
int node, size_t orig_size,
diff --git a/mm/slab_common.c b/mm/slab_common.c
index ca8b9e587a55..dc1ecf19afd3 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -678,6 +678,11 @@ kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1] __ro_after_init =
{ /* initialization for https://bugs.llvm.org/show_bug.cgi?id=42570 */ };
EXPORT_SYMBOL(kmalloc_caches);
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+unsigned long random_kmalloc_seed __ro_after_init;
+EXPORT_SYMBOL(random_kmalloc_seed);
+#endif
+
/*
* Conversion table for small slabs sizes / 8 to the index in the
* kmalloc array. This is necessary for slabs < 192 since we have non power
@@ -720,7 +725,7 @@ static inline unsigned int size_index_elem(unsigned int bytes)
* Find the kmem_cache structure that serves a given size of
* allocation
*/
-struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
+struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags, unsigned long caller)
{
unsigned int index;
@@ -735,7 +740,7 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
index = fls(size - 1);
}
- return kmalloc_caches[kmalloc_type(flags)][index];
+ return kmalloc_caches[kmalloc_type(flags, caller)][index];
}
size_t kmalloc_size_roundup(size_t size)
@@ -753,7 +758,7 @@ size_t kmalloc_size_roundup(size_t size)
return PAGE_SIZE << get_order(size);
/* The flags don't matter since size_index is common to all. */
- c = kmalloc_slab(size, GFP_KERNEL);
+ c = kmalloc_slab(size, GFP_KERNEL, _RET_IP_);
return c ? c->object_size : 0;
}
EXPORT_SYMBOL(kmalloc_size_roundup);
@@ -776,12 +781,44 @@ EXPORT_SYMBOL(kmalloc_size_roundup);
#define KMALLOC_RCL_NAME(sz)
#endif
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+#define __KMALLOC_RANDOM_CONCAT(a, b) a ## b
+#define KMALLOC_RANDOM_NAME(N, sz) __KMALLOC_RANDOM_CONCAT(KMA_RAND_, N)(sz)
+#if CONFIG_RANDOM_KMALLOC_CACHES_BITS >= 1
+#define KMA_RAND_1(sz) .name[KMALLOC_RANDOM_START + 0] = "kmalloc-random-01-" #sz,
+#define KMA_RAND_2(sz) KMA_RAND_1(sz) .name[KMALLOC_RANDOM_START + 1] = "kmalloc-random-02-" #sz,
+#endif
+#if CONFIG_RANDOM_KMALLOC_CACHES_BITS >= 2
+#define KMA_RAND_3(sz) KMA_RAND_2(sz) .name[KMALLOC_RANDOM_START + 2] = "kmalloc-random-03-" #sz,
+#define KMA_RAND_4(sz) KMA_RAND_3(sz) .name[KMALLOC_RANDOM_START + 3] = "kmalloc-random-04-" #sz,
+#endif
+#if CONFIG_RANDOM_KMALLOC_CACHES_BITS >= 3
+#define KMA_RAND_5(sz) KMA_RAND_4(sz) .name[KMALLOC_RANDOM_START + 4] = "kmalloc-random-05-" #sz,
+#define KMA_RAND_6(sz) KMA_RAND_5(sz) .name[KMALLOC_RANDOM_START + 5] = "kmalloc-random-06-" #sz,
+#define KMA_RAND_7(sz) KMA_RAND_6(sz) .name[KMALLOC_RANDOM_START + 6] = "kmalloc-random-07-" #sz,
+#define KMA_RAND_8(sz) KMA_RAND_7(sz) .name[KMALLOC_RANDOM_START + 7] = "kmalloc-random-08-" #sz,
+#endif
+#if CONFIG_RANDOM_KMALLOC_CACHES_BITS >= 4
+#define KMA_RAND_9(sz) KMA_RAND_8(sz) .name[KMALLOC_RANDOM_START + 8] = "kmalloc-random-09-" #sz,
+#define KMA_RAND_10(sz) KMA_RAND_9(sz) .name[KMALLOC_RANDOM_START + 9] = "kmalloc-random-10-" #sz,
+#define KMA_RAND_11(sz) KMA_RAND_10(sz) .name[KMALLOC_RANDOM_START + 10] = "kmalloc-random-11-" #sz,
+#define KMA_RAND_12(sz) KMA_RAND_11(sz) .name[KMALLOC_RANDOM_START + 11] = "kmalloc-random-12-" #sz,
+#define KMA_RAND_13(sz) KMA_RAND_12(sz) .name[KMALLOC_RANDOM_START + 12] = "kmalloc-random-13-" #sz,
+#define KMA_RAND_14(sz) KMA_RAND_13(sz) .name[KMALLOC_RANDOM_START + 13] = "kmalloc-random-14-" #sz,
+#define KMA_RAND_15(sz) KMA_RAND_14(sz) .name[KMALLOC_RANDOM_START + 14] = "kmalloc-random-15-" #sz,
+#define KMA_RAND_16(sz) KMA_RAND_15(sz) .name[KMALLOC_RANDOM_START + 15] = "kmalloc-random-16-" #sz,
+#endif
+#else // CONFIG_RANDOM_KMALLOC_CACHES
+#define KMALLOC_RANDOM_NAME(N, sz)
+#endif
+
#define INIT_KMALLOC_INFO(__size, __short_size) \
{ \
.name[KMALLOC_NORMAL] = "kmalloc-" #__short_size, \
KMALLOC_RCL_NAME(__short_size) \
KMALLOC_CGROUP_NAME(__short_size) \
KMALLOC_DMA_NAME(__short_size) \
+ KMALLOC_RANDOM_NAME(CONFIG_RANDOM_KMALLOC_CACHES_NR, __short_size) \
.size = __size, \
}
@@ -890,6 +927,11 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags)
flags |= SLAB_CACHE_DMA;
}
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+ if (type >= KMALLOC_RANDOM_START && type <= KMALLOC_RANDOM_END)
+ flags |= SLAB_NO_MERGE;
+#endif
+
if (minalign > ARCH_KMALLOC_MINALIGN) {
aligned_size = ALIGN(aligned_size, minalign);
aligned_idx = __kmalloc_index(aligned_size, false);
@@ -923,7 +965,7 @@ void __init create_kmalloc_caches(slab_flags_t flags)
/*
* Including KMALLOC_CGROUP if CONFIG_MEMCG_KMEM defined
*/
- for (type = KMALLOC_NORMAL; type < NR_KMALLOC_TYPES; type++) {
+ for (type = KMALLOC_RANDOM_START; type < NR_KMALLOC_TYPES; type++) {
for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
if (!kmalloc_caches[type][i])
new_kmalloc_cache(i, type, flags);
@@ -941,6 +983,9 @@ void __init create_kmalloc_caches(slab_flags_t flags)
new_kmalloc_cache(2, type, flags);
}
}
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+ random_kmalloc_seed = get_random_u64();
+#endif
/* Kmalloc array is now usable */
slab_state = UP;
@@ -976,7 +1021,7 @@ void *__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller
return ret;
}
- s = kmalloc_slab(size, flags);
+ s = kmalloc_slab(size, flags, caller);
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;
--
2.25.1
On Fri, Jun 16, 2023 at 07:18:43PM +0800, GONG, Ruiqi wrote:
> When exploiting memory vulnerabilities, "heap spraying" is a common
> technique targeting those related to dynamic memory allocation (i.e. the
> "heap"), and it plays an important role in a successful exploitation.
> Basically, it is to overwrite the memory area of vulnerable object by
> triggering allocation in other subsystems or modules and therefore
> getting a reference to the targeted memory location. It's usable on
> various types of vulnerablity including use after free (UAF), heap out-
> of-bound write and etc.
>
> There are (at least) two reasons why the heap can be sprayed: 1) generic
> slab caches are shared among different subsystems and modules, and
> 2) dedicated slab caches could be merged with the generic ones.
> Currently these two factors cannot be prevented at a low cost: the first
> one is a widely used memory allocation mechanism, and shutting down slab
> merging completely via `slub_nomerge` would be overkill.
>
> To efficiently prevent heap spraying, we propose the following approach:
> to create multiple copies of generic slab caches that will never be
> merged, and random one of them will be used at allocation. The random
> selection is based on the address of code that calls `kmalloc()`, which
> means it is static at runtime (rather than dynamically determined at
> each time of allocation, which could be bypassed by repeatedly spraying
> in brute force). In other words, the randomness of cache selection will
> be with respect to the code address rather than time, i.e. allocations
> in different code paths would most likely pick different caches,
> although kmalloc() at each place would use the same cache copy whenever
> it is executed. In this way, the vulnerable object and memory allocated
> in other subsystems and modules will (most probably) be on different
> slab caches, which prevents the object from being sprayed.
>
> Meanwhile, the static random selection is further enhanced with a
> per-boot random seed, which prevents the attacker from finding a usable
> kmalloc that happens to pick the same cache with the vulnerable
> subsystem/module by analyzing the open source code.
>
> The overhead of performance has been tested on a 40-core x86 server by
> comparing the results of `perf bench all` between the kernels with and
> without this patch based on the latest linux-next kernel, which shows
> minor difference. A subset of benchmarks are listed below:
>
>                 sched/     sched/  syscall/  mem/       mem/
>                 messaging  pipe    basic     memcpy     memset
>                 (sec)      (sec)   (sec)     (GB/sec)   (GB/sec)
>
> control1        0.019      5.459   0.733     15.258789  51.398026
> control2        0.019      5.439   0.730     16.009221  48.828125
> control3        0.019      5.282   0.735     16.009221  48.828125
> control_avg     0.019      5.393   0.733     15.759077  49.684759
>
> experiment1     0.019      5.374   0.741     15.500992  46.502976
> experiment2     0.019      5.440   0.746     16.276042  51.398026
> experiment3     0.019      5.242   0.752     15.258789  51.398026
> experiment_avg  0.019      5.352   0.746     15.678608  49.766343
>
> The overhead of memory usage was measured by executing `free` after boot
> on a QEMU VM with 1GB total memory, and as expected, it's positively
> correlated with # of cache copies:
>
>            control   4 copies  8 copies  16 copies
>
> total      969.8M    968.2M    968.2M    968.2M
> used       20.0M     21.9M     24.1M     26.7M
> free       936.9M    933.6M    931.4M    928.6M
> available  932.2M    928.8M    926.6M    923.9M
>
> Signed-off-by: GONG, Ruiqi <[email protected]>
> Co-developed-by: Xiu Jianfeng <[email protected]>
> Signed-off-by: Xiu Jianfeng <[email protected]>
I think this looks really good. Thanks for the respin! Some
nits/comments/questions below, but I think this can land and get
incrementally improved. Please consider it:
Reviewed-by: Kees Cook <[email protected]>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 791f7453a04f..b7a5387f0dad 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -19,6 +19,9 @@
> #include <linux/workqueue.h>
> #include <linux/percpu-refcount.h>
>
> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
> +#include <linux/hash.h>
> +#endif
I think this can just be included unconditionally, yes?
> [...]
> +extern unsigned long random_kmalloc_seed;
> +
> +static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags, unsigned long caller)
> {
> /*
> * The most common case is KMALLOC_NORMAL, so test for it
> * with a single branch for all the relevant flags.
> */
> if (likely((flags & KMALLOC_NOT_NORMAL_BITS) == 0))
> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
> + return KMALLOC_RANDOM_START + hash_64(caller ^ random_kmalloc_seed,
> + CONFIG_RANDOM_KMALLOC_CACHES_BITS);
> +#else
> return KMALLOC_NORMAL;
> +#endif
The commit log talks about having no runtime lookup, but that's not
entirely true, given this routine. And xor and a hash_64... I wonder how
expensive this is compared to some kind of constant expression that
could be computed at build time... (the xor should stay, but that's
"cheap").
>
> /*
> * At least one of the flags has to be set. Their priorities in
> @@ -577,7 +589,7 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
>
> index = kmalloc_index(size);
> return kmalloc_trace(
> - kmalloc_caches[kmalloc_type(flags)][index],
> + kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
> flags, size);
> }
> return __kmalloc(size, flags);
> @@ -593,7 +605,7 @@ static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla
>
> index = kmalloc_index(size);
> return kmalloc_node_trace(
> - kmalloc_caches[kmalloc_type(flags)][index],
> + kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
> flags, node, size);
> }
> return __kmalloc_node(size, flags, node);
The use of _RET_IP_ is generally fine here, but I wonder about some of
the allocation wrappers (like devm_kmalloc(), etc). I think those aren't
being bucketed correctly? Have you checked that?
> [...]
> @@ -776,12 +781,44 @@ EXPORT_SYMBOL(kmalloc_size_roundup);
> #define KMALLOC_RCL_NAME(sz)
> #endif
>
> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
> +#define __KMALLOC_RANDOM_CONCAT(a, b) a ## b
> +#define KMALLOC_RANDOM_NAME(N, sz) __KMALLOC_RANDOM_CONCAT(KMA_RAND_, N)(sz)
> +#if CONFIG_RANDOM_KMALLOC_CACHES_BITS >= 1
> +#define KMA_RAND_1(sz) .name[KMALLOC_RANDOM_START + 0] = "kmalloc-random-01-" #sz,
I wonder if this name is getting too long? Should "random" be "rnd" ?
*shrug*
> [...]
> +#define KMA_RAND_16(sz) KMA_RAND_15(sz) .name[KMALLOC_RANDOM_START + 15] = "kmalloc-random-16-" #sz,
And if we wanted to save another character, this could be numbered 0-f,
but I defer these aesthetics to Vlastimil. :)
-Kees
--
Kees Cook
On 6/16/23 13:18, GONG, Ruiqi wrote:
> index a3c95338cd3a..6150e9a946a7 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -337,6 +337,55 @@ config SLUB_CPU_PARTIAL
> which requires the taking of locks that may cause latency spikes.
> Typically one would choose no for a realtime system.
>
> +config RANDOM_KMALLOC_CACHES
> + default n
> + depends on SLUB
> + bool "Random slab caches for normal kmalloc"
> + help
> + A hardening feature that creates multiple copies of slab caches for
> + normal kmalloc allocation and makes kmalloc randomly pick one based
> + on code address, which makes the attackers unable to spray vulnerable
> + memory objects on the heap for exploiting memory vulnerabilities.
> +
> +choice
> + prompt "Number of random slab caches copies"
> + depends on RANDOM_KMALLOC_CACHES
> + default RANDOM_KMALLOC_CACHES_16
> + help
> + The number of copies of random slab caches. Bigger value makes the
> + potentially vulnerable memory object less likely to collide with
> + objects allocated from other subsystems or modules.
When I read this, without further knowledge, why would I select anything
else than the largest value? It should mention memory overhead maybe?
Also would anyone really select only "2" and thus limit the collision
probability to 50% and not less? "4" also seems quite low for the given
purpose? Could we just pick and hardcode 8 or 16 and avoid the selection, at
least until there's some more experience with the whole approach?
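As a rough back-of-the-envelope check of these numbers, the sketch below
computes the collision chance under the assumption (not verified in this
thread) that the hash spreads call sites uniformly and independently over
the copies. collision_probability() and its parameters are hypothetical
names used only for illustration.

```c
/*
 * Chance that at least one of "sites" independent allocation sites lands
 * in the same cache copy as the victim object, assuming a uniform hash
 * over "copies" cache copies (an assumption of this sketch).
 */
static double collision_probability(unsigned int copies, unsigned int sites)
{
	double miss_all = 1.0;
	unsigned int i;

	/* Each site misses the victim's copy with probability 1 - 1/copies. */
	for (i = 0; i < sites; i++)
		miss_all *= 1.0 - 1.0 / copies;
	return 1.0 - miss_all;
}
```

With a single candidate site this gives 0.5 for 2 copies and 0.0625 for
16 copies; under the same assumption, the chance grows as more candidate
sites become available to the attacker.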
On Thu, Jun 22, 2023 at 03:56:04PM +0200, Vlastimil Babka wrote:
> On 6/16/23 13:18, GONG, Ruiqi wrote:
> > index a3c95338cd3a..6150e9a946a7 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -337,6 +337,55 @@ config SLUB_CPU_PARTIAL
> > which requires the taking of locks that may cause latency spikes.
> > Typically one would choose no for a realtime system.
> >
> > +config RANDOM_KMALLOC_CACHES
> > + default n
> > + depends on SLUB
> > + bool "Random slab caches for normal kmalloc"
> > + help
> > + A hardening feature that creates multiple copies of slab caches for
> > + normal kmalloc allocation and makes kmalloc randomly pick one based
> > + on code address, which makes the attackers unable to spray vulnerable
> > + memory objects on the heap for exploiting memory vulnerabilities.
> > +
> > +choice
> > + prompt "Number of random slab caches copies"
> > + depends on RANDOM_KMALLOC_CACHES
> > + default RANDOM_KMALLOC_CACHES_16
> > + help
> > + The number of copies of random slab caches. Bigger value makes the
> > + potentially vulnerable memory object less likely to collide with
> > + objects allocated from other subsystems or modules.
>
> When I read this, without further knowledge, why would I select anything
> else than the largest value? It should mention memory overhead maybe?
Yeah, good idea.
> Also would anyone really select only "2" and thus limit the collision
> probability to 50% and not less? "4" also seems quite low for the given
> purpose? Could we just pick and hardcode 8 or 16 and avoid the selection, at
> least until there's some more experience with the whole approach?
I assume it was for doing performance (speed or space) analysis for
people interested in tuning it. The default is 16, which is what most
folks will end up with. i.e. I'm not sure I see a benefit to dropping 2
and 4, since I imagine people will either want the highest value (16),
or the ability to do a full comparison of each setting.
Regardless, I would be fine if we dropped 2 and 4, since I am focused on
the maximum number (16) of hash buckets. :)
-Kees
--
Kees Cook
On 2023/06/22 2:21, Kees Cook wrote:
> On Fri, Jun 16, 2023 at 07:18:43PM +0800, GONG, Ruiqi wrote:
>> [...]
>>
>> Signed-off-by: GONG, Ruiqi <[email protected]>
>> Co-developed-by: Xiu Jianfeng <[email protected]>
>> Signed-off-by: Xiu Jianfeng <[email protected]>
>
> I think this looks really good. Thanks for the respin! Some
> nits/comments/questions below, but I think this can land and get
> incrementally improved. Please consider it:
>
> Reviewed-by: Kees Cook <[email protected]>
>
Thanks, Kees!
>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>> index 791f7453a04f..b7a5387f0dad 100644
>> --- a/include/linux/slab.h
>> +++ b/include/linux/slab.h
>> @@ -19,6 +19,9 @@
>> #include <linux/workqueue.h>
>> #include <linux/percpu-refcount.h>
>>
>> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
>> +#include <linux/hash.h>
>> +#endif
>
> I think this can just be included unconditionally, yes?
>
True. Will change it.
>> [...]
>> +extern unsigned long random_kmalloc_seed;
>> +
>> +static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags, unsigned long caller)
>> {
>> /*
>> * The most common case is KMALLOC_NORMAL, so test for it
>> * with a single branch for all the relevant flags.
>> */
>> if (likely((flags & KMALLOC_NOT_NORMAL_BITS) == 0))
>> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
>> + return KMALLOC_RANDOM_START + hash_64(caller ^ random_kmalloc_seed,
>> + CONFIG_RANDOM_KMALLOC_CACHES_BITS);
>> +#else
>> return KMALLOC_NORMAL;
>> +#endif
>
> The commit log talks about having no runtime lookup, but that's not
> entirely true, given this routine. And xor and a hash_64... I wonder how
> expensive this is compared to some kind of constant expression that
> could be computed at build time... (the xor should stay, but that's
> "cheap").
>
To be precise, the random selection is currently static within a single
boot (from system startup until shutdown), but differs across boots. In
the commit log, I've added the following paragraph to explain this
feature, and will expand it a bit in the next version:
"Meanwhile, the static random selection is further enhanced with a
per-boot random seed, which prevents the attacker from finding a usable
kmalloc that happens to pick the same cache with the vulnerable
subsystem/module by analyzing the open source code."
As for the build-time hashing, I think theoretically it could be
achieved, as long as we can have a compile-time random number generator.
However, afaik compilers have no support for this at the moment, and I
can only find a few discussions about it (in the C++ community).
>>
>> /*
>> * At least one of the flags has to be set. Their priorities in
>> @@ -577,7 +589,7 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags)
>>
>> index = kmalloc_index(size);
>> return kmalloc_trace(
>> - kmalloc_caches[kmalloc_type(flags)][index],
>> + kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
>> flags, size);
>> }
>> return __kmalloc(size, flags);
>> @@ -593,7 +605,7 @@ static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla
>>
>> index = kmalloc_index(size);
>> return kmalloc_node_trace(
>> - kmalloc_caches[kmalloc_type(flags)][index],
>> + kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
>> flags, node, size);
>> }
>> return __kmalloc_node(size, flags, node);
>
> The use of _RET_IP_ is generally fine here, but I wonder about some of
> the allocation wrappers (like devm_kmalloc(), etc). I think those aren't
> being bucketed correctly? Have you checked that?
>
Yes, I checked the distribution of used slab caches by booting the
kernel in QEMU, and /proc/slabinfo shows that they are in general evenly
spread among the copies.
I think in most cases, hashing on _RET_IP_ can effectively divert
allocations in different subsystems/modules into different caches. For
example, calling devm_kmalloc() from different places will get slab
objects from different cache copies:
xxx_func () {
    devm_kmalloc() {
        ------------ always inlined alloc_dr() ---------------
        __kmalloc_node_track_caller(..., _RET_IP_)
        --------------------------------------------------------
    }
    next inst. of devm_kmalloc() // where _RET_IP_ takes
}
There are cases like sk_alloc(), where the wrapping is deep and all
struct sock objects would gather into a few caches:
sk_alloc() {
    sk_prot_alloc() {
        ------------ always inlined kmalloc() -----------------
        kmalloc_trace(... kmalloc_type(flags, _RET_IP_) ...) // A
        __kmalloc(...) {
            __do_kmalloc_node(..., _RET_IP_) // B
        }
        --------------------------------------------------------
        next inst. of kmalloc() // where B takes
    }
    next inst. of sk_prot_alloc() // where A takes
}
But it's still better than nothing. Currently _RET_IP_ is the best
option I can think of, and in general it works.
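The wrapper effect described above can be reproduced in user space. The
following is only a sketch, not kernel code: RET_IP(), alloc_entry(),
deep_wrapper() and nr_allocs are made-up names. RET_IP() mirrors what the
kernel's _RET_IP_ expands to (__builtin_return_address(0), a GCC/Clang
builtin), and the noinline attribute stands in for an out-of-line wrapper
such as sk_prot_alloc().

```c
#include <stdint.h>

/* Stand-in for the kernel's _RET_IP_ (GCC/Clang builtin). */
#define RET_IP() ((uintptr_t)__builtin_return_address(0))

/* Side effect so the compiler cannot merge two calls into one. */
static volatile unsigned long nr_allocs;

/*
 * Models an out-of-line allocator entry point: it reports the address
 * of the instruction following the call that reached it.
 */
__attribute__((noinline)) static uintptr_t alloc_entry(void)
{
	nr_allocs++;
	return RET_IP();
}

/*
 * Models a wrapper that is NOT inlined into its users. Every user goes
 * through the single call site below, so alloc_entry() always sees the
 * same return address no matter who called the wrapper.
 */
__attribute__((noinline)) static uintptr_t deep_wrapper(void)
{
	/* volatile keeps the call from being turned into a tail call */
	volatile uintptr_t ip = alloc_entry();

	return ip;
}
```

Two direct calls to alloc_entry() from different places report different
return addresses, while any number of callers going through
deep_wrapper() report the one call site inside the wrapper, so they would
all hash to the same cache copy.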
>> [...]
>> @@ -776,12 +781,44 @@ EXPORT_SYMBOL(kmalloc_size_roundup);
>> #define KMALLOC_RCL_NAME(sz)
>> #endif
>>
>> +#ifdef CONFIG_RANDOM_KMALLOC_CACHES
>> +#define __KMALLOC_RANDOM_CONCAT(a, b) a ## b
>> +#define KMALLOC_RANDOM_NAME(N, sz) __KMALLOC_RANDOM_CONCAT(KMA_RAND_, N)(sz)
>> +#if CONFIG_RANDOM_KMALLOC_CACHES_BITS >= 1
>> +#define KMA_RAND_1(sz) .name[KMALLOC_RANDOM_START + 0] = "kmalloc-random-01-" #sz,
>
> I wonder if this name is getting too long? Should "random" be "rnd" ?
> *shrug*
>
Okay. Will do.
>> [...]
>> +#define KMA_RAND_16(sz) KMA_RAND_15(sz) .name[KMALLOC_RANDOM_START + 15] = "kmalloc-random-16-" #sz,
>
> And if we wanted to save another character, this could be numbered 0-f,
> but I defer these aesthetics to Vlastimil. :)
Same with me ;)
>
> -Kees
>
On 2023/06/23 4:10, Kees Cook wrote:
> On Thu, Jun 22, 2023 at 03:56:04PM +0200, Vlastimil Babka wrote:
>> On 6/16/23 13:18, GONG, Ruiqi wrote:
>>> index a3c95338cd3a..6150e9a946a7 100644
>>> --- a/mm/Kconfig
>>> +++ b/mm/Kconfig
>>> @@ -337,6 +337,55 @@ config SLUB_CPU_PARTIAL
>>> which requires the taking of locks that may cause latency spikes.
>>> Typically one would choose no for a realtime system.
>>>
>>> +config RANDOM_KMALLOC_CACHES
>>> + default n
>>> + depends on SLUB
>>> + bool "Random slab caches for normal kmalloc"
>>> + help
>>> + A hardening feature that creates multiple copies of slab caches for
>>> + normal kmalloc allocation and makes kmalloc randomly pick one based
>>> + on code address, which makes the attackers unable to spray vulnerable
>>> + memory objects on the heap for exploiting memory vulnerabilities.
>>> +
>>> +choice
>>> + prompt "Number of random slab caches copies"
>>> + depends on RANDOM_KMALLOC_CACHES
>>> + default RANDOM_KMALLOC_CACHES_16
>>> + help
>>> + The number of copies of random slab caches. Bigger value makes the
>>> + potentially vulnerable memory object less likely to collide with
>>> + objects allocated from other subsystems or modules.
>>
>> When I read this, without further knowledge, why would I select anything
>> else than the largest value? It should mention memory overhead maybe?
>
> Yeah, good idea.
>
No problem. Will add some text about memory overhead into the help
paragraph of RANDOM_KMALLOC_CACHES.
>> Also would anyone really select only "2" and thus limit the collision
>> probability to 50% and not less? "4" also seems quite low for the given
>> purpose? Could we just pick and hardcode 8 or 16 and avoid the selection, at
>> least until there's some more experience with the whole approach?
>
> I assume it was for doing performance (speed or space) analysis for
> people interested in tuning it. The default is 16, which is what most
> folks will end up with. i.e. I'm not sure I see a benefit to dropping 2
> and 4, since I imagine people will either want the highest value (16),
> or the ability to do a full comparison of each setting.
>
> Regardless, I would be fine if we dropped 2 and 4, since I am focused on
> the maximum number (16) of hash buckets. :)
>
It's true that 2 and 4 don't make much sense from the hardening
perspective, and I added them only to cover all possible choices. And
since the overhead difference between 8 and 16 is small, I will hardcode
16 and drop all other options in the next version.
> -Kees
>