2020-09-21 13:28:20

by Marco Elver

[permalink] [raw]
Subject: [PATCH v3 00/10] KFENCE: A low-overhead sampling-based memory safety error detector

This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
low-overhead sampling-based memory safety error detector of heap
use-after-free, invalid-free, and out-of-bounds access errors. This
series enables KFENCE for the x86 and arm64 architectures, and adds
KFENCE hooks to the SLAB and SLUB allocators.

KFENCE is designed to be enabled in production kernels, and has near
zero performance overhead. Compared to KASAN, KFENCE trades performance
for precision. The main motivation behind KFENCE's design, is that with
enough total uptime KFENCE will detect bugs in code paths not typically
exercised by non-production test workloads. One way to quickly achieve a
large enough total uptime is when the tool is deployed across a large
fleet of machines.

KFENCE objects each reside on a dedicated page, at either the left or
right page boundaries. The pages to the left and right of the object
page are "guard pages", whose attributes are changed to a protected
state, and cause page faults on any attempted access to them. Such page
faults are then intercepted by KFENCE, which handles the fault
gracefully by reporting a memory access error.

Guarded allocations are set up based on a sample interval (can be set
via kfence.sample_interval). After expiration of the sample interval,
the next allocation through the main allocator (SLAB or SLUB) returns a
guarded allocation from the KFENCE object pool. At this point, the timer
is reset, and the next allocation is set up after the expiration of the
interval.

To enable/disable a KFENCE allocation through the main allocator's
fast-path without overhead, KFENCE relies on static branches via the
static keys infrastructure. The static branch is toggled to redirect the
allocation to KFENCE.

The KFENCE memory pool is of fixed size, and if the pool is exhausted no
further KFENCE allocations occur. The default config is conservative
with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB
pages).

We have verified by running synthetic benchmarks (sysbench I/O,
hackbench) that a kernel with KFENCE is performance-neutral compared to
a non-KFENCE baseline kernel.

KFENCE is inspired by GWP-ASan [1], a userspace tool with similar
properties. The name "KFENCE" is a homage to the Electric Fence Malloc
Debugger [2].

For more details, see Documentation/dev-tools/kfence.rst added in the
series -- also viewable here:

https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst

[1] http://llvm.org/docs/GwpAsan.html
[2] https://linux.die.net/man/3/efence

v3:
* Rewrite SLAB/SLUB patch descriptions to clarify need for 'orig_size'.
* Various smaller fixes (see details in patches).

v2: https://lkml.kernel.org/r/[email protected]
* Various comment/documentation changes (see details in patches).
* Various smaller fixes (see details in patches).
* Change all reports to reference the kfence object, "kfence-#nn".
* Skip allocation/free internals stack trace.
* Rework KMEMLEAK compatibility patch.

RFC/v1: https://lkml.kernel.org/r/[email protected]

Alexander Potapenko (6):
mm: add Kernel Electric-Fence infrastructure
x86, kfence: enable KFENCE for x86
mm, kfence: insert KFENCE hooks for SLAB
mm, kfence: insert KFENCE hooks for SLUB
kfence, kasan: make KFENCE compatible with KASAN
kfence, kmemleak: make KFENCE compatible with KMEMLEAK

Marco Elver (4):
arm64, kfence: enable KFENCE for ARM64
kfence, lockdep: make KFENCE compatible with lockdep
kfence, Documentation: add KFENCE documentation
kfence: add test suite

Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/kfence.rst | 291 +++++++++++
MAINTAINERS | 11 +
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/kfence.h | 39 ++
arch/arm64/mm/fault.c | 4 +
arch/x86/Kconfig | 2 +
arch/x86/include/asm/kfence.h | 60 +++
arch/x86/mm/fault.c | 4 +
include/linux/kfence.h | 174 +++++++
init/main.c | 2 +
kernel/locking/lockdep.c | 8 +
lib/Kconfig.debug | 1 +
lib/Kconfig.kfence | 78 +++
mm/Makefile | 1 +
mm/kasan/common.c | 7 +
mm/kfence/Makefile | 6 +
mm/kfence/core.c | 733 +++++++++++++++++++++++++++
mm/kfence/kfence.h | 102 ++++
mm/kfence/kfence_test.c | 777 +++++++++++++++++++++++++++++
mm/kfence/report.c | 219 ++++++++
mm/kmemleak.c | 6 +
mm/slab.c | 46 +-
mm/slab_common.c | 6 +-
mm/slub.c | 72 ++-
25 files changed, 2619 insertions(+), 32 deletions(-)
create mode 100644 Documentation/dev-tools/kfence.rst
create mode 100644 arch/arm64/include/asm/kfence.h
create mode 100644 arch/x86/include/asm/kfence.h
create mode 100644 include/linux/kfence.h
create mode 100644 lib/Kconfig.kfence
create mode 100644 mm/kfence/Makefile
create mode 100644 mm/kfence/core.c
create mode 100644 mm/kfence/kfence.h
create mode 100644 mm/kfence/kfence_test.c
create mode 100644 mm/kfence/report.c

--
2.28.0.681.g6f77f65b4e-goog


2020-09-21 13:28:28

by Marco Elver

[permalink] [raw]
Subject: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

Add architecture specific implementation details for KFENCE and enable
KFENCE for the arm64 architecture. In particular, this implements the
required interface in <asm/kfence.h>. Currently, the arm64 version does
not yet use a statically allocated memory pool, at the cost of a pointer
load for each is_kfence_address().

Reviewed-by: Dmitry Vyukov <[email protected]>
Co-developed-by: Alexander Potapenko <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>
Signed-off-by: Marco Elver <[email protected]>
---
For ARM64, we would like to solicit feedback on what the best option is
to obtain a constant address for __kfence_pool. One option is to declare
a memory range in the memory layout to be dedicated to KFENCE (like is
done for KASAN), however, it is unclear if this is the best available
option. We would like to avoid touching the memory layout.
---
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/kfence.h | 39 +++++++++++++++++++++++++++++++++
arch/arm64/mm/fault.c | 4 ++++
3 files changed, 44 insertions(+)
create mode 100644 arch/arm64/include/asm/kfence.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6d232837cbee..1acc6b2877c3 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -132,6 +132,7 @@ config ARM64
select HAVE_ARCH_JUMP_LABEL_RELATIVE
select HAVE_ARCH_KASAN if !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
select HAVE_ARCH_KASAN_SW_TAGS if HAVE_ARCH_KASAN
+ select HAVE_ARCH_KFENCE if (!ARM64_16K_PAGES && !ARM64_64K_PAGES)
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
diff --git a/arch/arm64/include/asm/kfence.h b/arch/arm64/include/asm/kfence.h
new file mode 100644
index 000000000000..608dde80e5ca
--- /dev/null
+++ b/arch/arm64/include/asm/kfence.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_KFENCE_H
+#define __ASM_KFENCE_H
+
+#include <linux/kfence.h>
+#include <linux/log2.h>
+#include <linux/mm.h>
+
+#include <asm/cacheflush.h>
+
+#define KFENCE_SKIP_ARCH_FAULT_HANDLER "el1_sync"
+
+/*
+ * FIXME: Support HAVE_ARCH_KFENCE_STATIC_POOL: Use the statically allocated
+ * __kfence_pool, to avoid the extra pointer load for is_kfence_address(). By
+ * default, however, we do not have struct pages for static allocations.
+ */
+
+static inline bool arch_kfence_initialize_pool(void)
+{
+ const unsigned int num_pages = ilog2(roundup_pow_of_two(KFENCE_POOL_SIZE / PAGE_SIZE));
+ struct page *pages = alloc_pages(GFP_KERNEL, num_pages);
+
+ if (!pages)
+ return false;
+
+ __kfence_pool = page_address(pages);
+ return true;
+}
+
+static inline bool kfence_protect_page(unsigned long addr, bool protect)
+{
+ set_memory_valid(addr, 1, !protect);
+
+ return true;
+}
+
+#endif /* __ASM_KFENCE_H */
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index f07333e86c2f..d5b72ecbeeea 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -10,6 +10,7 @@
#include <linux/acpi.h>
#include <linux/bitfield.h>
#include <linux/extable.h>
+#include <linux/kfence.h>
#include <linux/signal.h>
#include <linux/mm.h>
#include <linux/hardirq.h>
@@ -310,6 +311,9 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr,
"Ignoring spurious kernel translation fault at virtual address %016lx\n", addr))
return;

+ if (kfence_handle_page_fault(addr))
+ return;
+
if (is_el1_permission_fault(addr, esr, regs)) {
if (esr & ESR_ELx_WNR)
msg = "write to read-only memory";
--
2.28.0.681.g6f77f65b4e-goog

2020-09-21 13:28:39

by Marco Elver

[permalink] [raw]
Subject: [PATCH v3 05/10] mm, kfence: insert KFENCE hooks for SLUB

From: Alexander Potapenko <[email protected]>

Inserts KFENCE hooks into the SLUB allocator.

To pass the originally requested size to KFENCE, add an argument
'orig_size' to slab_alloc*(). The additional argument is required to
preserve the requested original size for kmalloc() allocations, which
uses size classes (e.g. an allocation of 272 bytes will return an object
of size 512). Therefore, kmem_cache::size does not represent the
kmalloc-caller's requested size, and we must introduce the argument
'orig_size' to propagate the originally requested size to KFENCE.

Without the originally requested size, we would not be able to detect
out-of-bounds accesses for objects placed at the end of a KFENCE object
page if that object is not equal to the kmalloc-size class it was
bucketed into.

When KFENCE is disabled, there is no additional overhead, since
slab_alloc*() functions are __always_inline.

Reviewed-by: Dmitry Vyukov <[email protected]>
Co-developed-by: Marco Elver <[email protected]>
Signed-off-by: Marco Elver <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>
---
v3:
* Rewrite patch description to clarify need for 'orig_size'
[reported by Christopher Lameter].
---
mm/slub.c | 72 ++++++++++++++++++++++++++++++++++++++++---------------
1 file changed, 53 insertions(+), 19 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index d4177aecedf6..5c5a13a7857c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -27,6 +27,7 @@
#include <linux/ctype.h>
#include <linux/debugobjects.h>
#include <linux/kallsyms.h>
+#include <linux/kfence.h>
#include <linux/memory.h>
#include <linux/math64.h>
#include <linux/fault-inject.h>
@@ -1557,6 +1558,11 @@ static inline bool slab_free_freelist_hook(struct kmem_cache *s,
void *old_tail = *tail ? *tail : *head;
int rsize;

+ if (is_kfence_address(next)) {
+ slab_free_hook(s, next);
+ return true;
+ }
+
/* Head and tail of the reconstructed freelist */
*head = NULL;
*tail = NULL;
@@ -2660,7 +2666,8 @@ static inline void *get_freelist(struct kmem_cache *s, struct page *page)
* already disabled (which is the case for bulk allocation).
*/
static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
- unsigned long addr, struct kmem_cache_cpu *c)
+ unsigned long addr, struct kmem_cache_cpu *c,
+ size_t orig_size)
{
void *freelist;
struct page *page;
@@ -2763,7 +2770,8 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
* cpu changes by refetching the per cpu area pointer.
*/
static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
- unsigned long addr, struct kmem_cache_cpu *c)
+ unsigned long addr, struct kmem_cache_cpu *c,
+ size_t orig_size)
{
void *p;
unsigned long flags;
@@ -2778,7 +2786,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
c = this_cpu_ptr(s->cpu_slab);
#endif

- p = ___slab_alloc(s, gfpflags, node, addr, c);
+ p = ___slab_alloc(s, gfpflags, node, addr, c, orig_size);
local_irq_restore(flags);
return p;
}
@@ -2805,7 +2813,7 @@ static __always_inline void maybe_wipe_obj_freeptr(struct kmem_cache *s,
* Otherwise we can simply pick the next object from the lockless free list.
*/
static __always_inline void *slab_alloc_node(struct kmem_cache *s,
- gfp_t gfpflags, int node, unsigned long addr)
+ gfp_t gfpflags, int node, unsigned long addr, size_t orig_size)
{
void *object;
struct kmem_cache_cpu *c;
@@ -2816,6 +2824,11 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
s = slab_pre_alloc_hook(s, &objcg, 1, gfpflags);
if (!s)
return NULL;
+
+ object = kfence_alloc(s, orig_size, gfpflags);
+ if (unlikely(object))
+ goto out;
+
redo:
/*
* Must read kmem_cache cpu data via this cpu ptr. Preemption is
@@ -2853,7 +2866,7 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
object = c->freelist;
page = c->page;
if (unlikely(!object || !node_match(page, node))) {
- object = __slab_alloc(s, gfpflags, node, addr, c);
+ object = __slab_alloc(s, gfpflags, node, addr, c, orig_size);
stat(s, ALLOC_SLOWPATH);
} else {
void *next_object = get_freepointer_safe(s, object);
@@ -2889,20 +2902,21 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
if (unlikely(slab_want_init_on_alloc(gfpflags, s)) && object)
memset(object, 0, s->object_size);

+out:
slab_post_alloc_hook(s, objcg, gfpflags, 1, &object);

return object;
}

static __always_inline void *slab_alloc(struct kmem_cache *s,
- gfp_t gfpflags, unsigned long addr)
+ gfp_t gfpflags, unsigned long addr, size_t orig_size)
{
- return slab_alloc_node(s, gfpflags, NUMA_NO_NODE, addr);
+ return slab_alloc_node(s, gfpflags, NUMA_NO_NODE, addr, orig_size);
}

void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
{
- void *ret = slab_alloc(s, gfpflags, _RET_IP_);
+ void *ret = slab_alloc(s, gfpflags, _RET_IP_, s->object_size);

trace_kmem_cache_alloc(_RET_IP_, ret, s->object_size,
s->size, gfpflags);
@@ -2914,7 +2928,7 @@ EXPORT_SYMBOL(kmem_cache_alloc);
#ifdef CONFIG_TRACING
void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
- void *ret = slab_alloc(s, gfpflags, _RET_IP_);
+ void *ret = slab_alloc(s, gfpflags, _RET_IP_, size);
trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags);
ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
@@ -2925,7 +2939,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_trace);
#ifdef CONFIG_NUMA
void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
{
- void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_);
+ void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_, s->object_size);

trace_kmem_cache_alloc_node(_RET_IP_, ret,
s->object_size, s->size, gfpflags, node);
@@ -2939,7 +2953,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *s,
gfp_t gfpflags,
int node, size_t size)
{
- void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_);
+ void *ret = slab_alloc_node(s, gfpflags, node, _RET_IP_, size);

trace_kmalloc_node(_RET_IP_, ret,
size, s->size, gfpflags, node);
@@ -2973,6 +2987,9 @@ static void __slab_free(struct kmem_cache *s, struct page *page,

stat(s, FREE_SLOWPATH);

+ if (kfence_free(head))
+ return;
+
if (kmem_cache_debug(s) &&
!free_debug_processing(s, page, head, tail, cnt, addr))
return;
@@ -3216,6 +3233,13 @@ int build_detached_freelist(struct kmem_cache *s, size_t size,
df->s = cache_from_obj(s, object); /* Support for memcg */
}

+ if (is_kfence_address(object)) {
+ slab_free_hook(df->s, object);
+ WARN_ON(!kfence_free(object));
+ p[size] = NULL; /* mark object processed */
+ return size;
+ }
+
/* Start new detached freelist */
df->page = page;
set_freepointer(df->s, object, NULL);
@@ -3290,8 +3314,14 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
c = this_cpu_ptr(s->cpu_slab);

for (i = 0; i < size; i++) {
- void *object = c->freelist;
+ void *object = kfence_alloc(s, s->object_size, flags);

+ if (unlikely(object)) {
+ p[i] = object;
+ continue;
+ }
+
+ object = c->freelist;
if (unlikely(!object)) {
/*
* We may have removed an object from c->freelist using
@@ -3307,7 +3337,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
* of re-populating per CPU c->freelist
*/
p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE,
- _RET_IP_, c);
+ _RET_IP_, c, size);
if (unlikely(!p[i]))
goto error;

@@ -3962,7 +3992,7 @@ void *__kmalloc(size_t size, gfp_t flags)
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;

- ret = slab_alloc(s, flags, _RET_IP_);
+ ret = slab_alloc(s, flags, _RET_IP_, size);

trace_kmalloc(_RET_IP_, ret, size, s->size, flags);

@@ -4010,7 +4040,7 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;

- ret = slab_alloc_node(s, flags, node, _RET_IP_);
+ ret = slab_alloc_node(s, flags, node, _RET_IP_, size);

trace_kmalloc_node(_RET_IP_, ret, size, s->size, flags, node);

@@ -4036,6 +4066,7 @@ void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
struct kmem_cache *s;
unsigned int offset;
size_t object_size;
+ bool is_kfence = is_kfence_address(ptr);

ptr = kasan_reset_tag(ptr);

@@ -4048,10 +4079,13 @@ void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
to_user, 0, n);

/* Find offset within object. */
- offset = (ptr - page_address(page)) % s->size;
+ if (is_kfence)
+ offset = ptr - kfence_object_start(ptr);
+ else
+ offset = (ptr - page_address(page)) % s->size;

/* Adjust for redzone and reject if within the redzone. */
- if (kmem_cache_debug_flags(s, SLAB_RED_ZONE)) {
+ if (!is_kfence && kmem_cache_debug_flags(s, SLAB_RED_ZONE)) {
if (offset < s->red_left_pad)
usercopy_abort("SLUB object in left red zone",
s->name, to_user, offset, n);
@@ -4460,7 +4494,7 @@ void *__kmalloc_track_caller(size_t size, gfp_t gfpflags, unsigned long caller)
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;

- ret = slab_alloc(s, gfpflags, caller);
+ ret = slab_alloc(s, gfpflags, caller, size);

/* Honor the call site pointer we received. */
trace_kmalloc(caller, ret, size, s->size, gfpflags);
@@ -4491,7 +4525,7 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
if (unlikely(ZERO_OR_NULL_PTR(s)))
return s;

- ret = slab_alloc_node(s, gfpflags, node, caller);
+ ret = slab_alloc_node(s, gfpflags, node, caller, size);

/* Honor the call site pointer we received. */
trace_kmalloc_node(caller, ret, size, s->size, gfpflags, node);
--
2.28.0.681.g6f77f65b4e-goog

2020-09-21 13:28:44

by Marco Elver

[permalink] [raw]
Subject: [PATCH v3 07/10] kfence, kmemleak: make KFENCE compatible with KMEMLEAK

From: Alexander Potapenko <[email protected]>

Add compatibility with KMEMLEAK, by making KMEMLEAK aware of the KFENCE
memory pool. This allows building debug kernels with both enabled, which
also helped in debugging KFENCE.

Reviewed-by: Dmitry Vyukov <[email protected]>
Co-developed-by: Marco Elver <[email protected]>
Signed-off-by: Marco Elver <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
* Rework using delete_object_part() [suggested by Catalin Marinas].
---
mm/kmemleak.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 5e252d91eb14..feff16068e8e 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -97,6 +97,7 @@
#include <linux/atomic.h>

#include <linux/kasan.h>
+#include <linux/kfence.h>
#include <linux/kmemleak.h>
#include <linux/memory_hotplug.h>

@@ -1948,6 +1949,11 @@ void __init kmemleak_init(void)
KMEMLEAK_GREY, GFP_ATOMIC);
create_object((unsigned long)__bss_start, __bss_stop - __bss_start,
KMEMLEAK_GREY, GFP_ATOMIC);
+#if defined(CONFIG_KFENCE) && defined(CONFIG_HAVE_ARCH_KFENCE_STATIC_POOL)
+ /* KFENCE objects are located in .bss, which may confuse kmemleak. Skip them. */
+ delete_object_part((unsigned long)__kfence_pool, KFENCE_POOL_SIZE);
+#endif
+
/* only register .data..ro_after_init if not within .data */
if (&__start_ro_after_init < &_sdata || &__end_ro_after_init > &_edata)
create_object((unsigned long)__start_ro_after_init,
--
2.28.0.681.g6f77f65b4e-goog

2020-09-21 13:28:49

by Marco Elver

[permalink] [raw]
Subject: [PATCH v3 10/10] kfence: add test suite

Add KFENCE test suite, testing various error detection scenarios. Makes
use of KUnit for test organization. Since KFENCE's interface to obtain
error reports is via the console, the test verifies that KFENCE outputs
expected reports to the console.

Reviewed-by: Dmitry Vyukov <[email protected]>
Co-developed-by: Alexander Potapenko <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>
Signed-off-by: Marco Elver <[email protected]>
---
v3:
* Lower line buffer size to avoid warnings of using more than 1024 bytes
stack usage [reported by kernel test robot <[email protected]>].

v2:
* Update for shortened memory corruption report.
---
lib/Kconfig.kfence | 13 +
mm/kfence/Makefile | 3 +
mm/kfence/kfence_test.c | 777 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 793 insertions(+)
create mode 100644 mm/kfence/kfence_test.c

diff --git a/lib/Kconfig.kfence b/lib/Kconfig.kfence
index 872bcbdd8cc4..46d9b6693abb 100644
--- a/lib/Kconfig.kfence
+++ b/lib/Kconfig.kfence
@@ -62,4 +62,17 @@ config KFENCE_STRESS_TEST_FAULTS

The option is only to test KFENCE; set to 0 if you are unsure.

+config KFENCE_KUNIT_TEST
+ tristate "KFENCE integration test suite" if !KUNIT_ALL_TESTS
+ default KUNIT_ALL_TESTS
+ depends on TRACEPOINTS && KUNIT
+ help
+ Test suite for KFENCE, testing various error detection scenarios with
+ various allocation types, and checking that reports are correctly
+ output to console.
+
+ Say Y here if you want the test to be built into the kernel and run
+ during boot; say M if you want the test to build as a module; say N
+ if you are unsure.
+
endif # KFENCE
diff --git a/mm/kfence/Makefile b/mm/kfence/Makefile
index d991e9a349f0..6872cd5e5390 100644
--- a/mm/kfence/Makefile
+++ b/mm/kfence/Makefile
@@ -1,3 +1,6 @@
# SPDX-License-Identifier: GPL-2.0

obj-$(CONFIG_KFENCE) := core.o report.o
+
+CFLAGS_kfence_test.o := -g -fno-omit-frame-pointer -fno-optimize-sibling-calls
+obj-$(CONFIG_KFENCE_KUNIT_TEST) += kfence_test.o
diff --git a/mm/kfence/kfence_test.c b/mm/kfence/kfence_test.c
new file mode 100644
index 000000000000..2ecd87668a74
--- /dev/null
+++ b/mm/kfence/kfence_test.c
@@ -0,0 +1,777 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test cases for KFENCE memory safety error detector. Since the interface with
+ * which KFENCE's reports are obtained is via the console, this is the output we
+ * should verify. For each test case checks the presence (or absence) of
+ * generated reports. Relies on 'console' tracepoint to capture reports as they
+ * appear in the kernel log.
+ *
+ * Copyright (C) 2020, Google LLC.
+ * Author: Alexander Potapenko <[email protected]>
+ * Marco Elver <[email protected]>
+ */
+
+#include <kunit/test.h>
+#include <linux/jiffies.h>
+#include <linux/kernel.h>
+#include <linux/kfence.h>
+#include <linux/mm.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/tracepoint.h>
+#include <trace/events/printk.h>
+
+#include "kfence.h"
+
+/* Report as observed from console. */
+static struct {
+ spinlock_t lock;
+ int nlines;
+ char lines[2][256];
+} observed = {
+ .lock = __SPIN_LOCK_UNLOCKED(observed.lock),
+};
+
+/* Probe for console output: obtains observed lines of interest. */
+static void probe_console(void *ignore, const char *buf, size_t len)
+{
+ unsigned long flags;
+ int nlines;
+
+ spin_lock_irqsave(&observed.lock, flags);
+ nlines = observed.nlines;
+
+ if (strnstr(buf, "BUG: KFENCE: ", len) && strnstr(buf, "test_", len)) {
+ /*
+ * KFENCE report and related to the test.
+ *
+ * The provided @buf is not NUL-terminated; copy no more than
+ * @len bytes and let strscpy() add the missing NUL-terminator.
+ */
+ strscpy(observed.lines[0], buf, min(len + 1, sizeof(observed.lines[0])));
+ nlines = 1;
+ } else if (nlines == 1 && (strnstr(buf, "at 0x", len) || strnstr(buf, "of 0x", len))) {
+ strscpy(observed.lines[nlines++], buf, min(len + 1, sizeof(observed.lines[0])));
+ }
+
+ WRITE_ONCE(observed.nlines, nlines); /* Publish new nlines. */
+ spin_unlock_irqrestore(&observed.lock, flags);
+}
+
+/* Check if a report related to the test exists. */
+static bool report_available(void)
+{
+ return READ_ONCE(observed.nlines) == ARRAY_SIZE(observed.lines);
+}
+
+/* Information we expect in a report. */
+struct expect_report {
+ enum kfence_error_type type; /* The type or error. */
+ void *fn; /* Function pointer to expected function where access occurred. */
+ char *addr; /* Address at which the bad access occurred. */
+};
+
+/* Check observed report matches information in @r. */
+static bool report_matches(const struct expect_report *r)
+{
+ bool ret = false;
+ unsigned long flags;
+ typeof(observed.lines) expect;
+ const char *end;
+ char *cur;
+
+ /* Doubled-checked locking. */
+ if (!report_available())
+ return false;
+
+ /* Generate expected report contents. */
+
+ /* Title */
+ cur = expect[0];
+ end = &expect[0][sizeof(expect[0]) - 1];
+ switch (r->type) {
+ case KFENCE_ERROR_OOB:
+ cur += scnprintf(cur, end - cur, "BUG: KFENCE: out-of-bounds");
+ break;
+ case KFENCE_ERROR_UAF:
+ cur += scnprintf(cur, end - cur, "BUG: KFENCE: use-after-free");
+ break;
+ case KFENCE_ERROR_CORRUPTION:
+ cur += scnprintf(cur, end - cur, "BUG: KFENCE: memory corruption");
+ break;
+ case KFENCE_ERROR_INVALID:
+ cur += scnprintf(cur, end - cur, "BUG: KFENCE: invalid access");
+ break;
+ case KFENCE_ERROR_INVALID_FREE:
+ cur += scnprintf(cur, end - cur, "BUG: KFENCE: invalid free");
+ break;
+ }
+
+ scnprintf(cur, end - cur, " in %pS", r->fn);
+ /* The exact offset won't match, remove it; also strip module name. */
+ cur = strchr(expect[0], '+');
+ if (cur)
+ *cur = '\0';
+
+ /* Access information */
+ cur = expect[1];
+ end = &expect[1][sizeof(expect[1]) - 1];
+
+ switch (r->type) {
+ case KFENCE_ERROR_OOB:
+ cur += scnprintf(cur, end - cur, "Out-of-bounds access at");
+ break;
+ case KFENCE_ERROR_UAF:
+ cur += scnprintf(cur, end - cur, "Use-after-free access at");
+ break;
+ case KFENCE_ERROR_CORRUPTION:
+ cur += scnprintf(cur, end - cur, "Corrupted memory at");
+ break;
+ case KFENCE_ERROR_INVALID:
+ cur += scnprintf(cur, end - cur, "Invalid access at");
+ break;
+ case KFENCE_ERROR_INVALID_FREE:
+ cur += scnprintf(cur, end - cur, "Invalid free of");
+ break;
+ }
+
+ cur += scnprintf(cur, end - cur, " 0x" PTR_FMT, (void *)r->addr);
+
+ spin_lock_irqsave(&observed.lock, flags);
+ if (!report_available())
+ goto out; /* A new report is being captured. */
+
+ /* Finally match expected output to what we actually observed. */
+ ret = strstr(observed.lines[0], expect[0]) && strstr(observed.lines[1], expect[1]);
+out:
+ spin_unlock_irqrestore(&observed.lock, flags);
+ return ret;
+}
+
+/* ===== Test cases ===== */
+
+#define TEST_PRIV_WANT_MEMCACHE ((void *)1)
+
+/* Cache used by tests; if NULL, allocate from kmalloc instead. */
+static struct kmem_cache *test_cache;
+
+static size_t setup_test_cache(struct kunit *test, size_t size, slab_flags_t flags,
+ void (*ctor)(void *))
+{
+ if (test->priv != TEST_PRIV_WANT_MEMCACHE)
+ return size;
+
+ kunit_info(test, "%s: size=%zu, ctor=%ps\n", __func__, size, ctor);
+
+ /*
+ * Use SLAB_NOLEAKTRACE to prevent merging with existing caches. Any
+ * other flag in SLAB_NEVER_MERGE also works. Use SLAB_ACCOUNT to
+ * allocate via memcg, if enabled.
+ */
+ flags |= SLAB_NOLEAKTRACE | SLAB_ACCOUNT;
+ test_cache = kmem_cache_create("test", size, 1, flags, ctor);
+ KUNIT_ASSERT_TRUE_MSG(test, test_cache, "could not create cache");
+
+ return size;
+}
+
+static void test_cache_destroy(void)
+{
+ if (!test_cache)
+ return;
+
+ kmem_cache_destroy(test_cache);
+ test_cache = NULL;
+}
+
+static inline size_t kmalloc_cache_alignment(size_t size)
+{
+ return kmalloc_caches[kmalloc_type(GFP_KERNEL)][kmalloc_index(size)]->align;
+}
+
+/* Must always inline to match stack trace against caller. */
+static __always_inline void test_free(void *ptr)
+{
+ if (test_cache)
+ kmem_cache_free(test_cache, ptr);
+ else
+ kfree(ptr);
+}
+
+/*
+ * If this should be a KFENCE allocation, and on which side the allocation and
+ * the closest guard page should be.
+ */
+enum allocation_policy {
+ ALLOCATE_ANY, /* KFENCE, any side. */
+ ALLOCATE_LEFT, /* KFENCE, left side of page. */
+ ALLOCATE_RIGHT, /* KFENCE, right side of page. */
+ ALLOCATE_NONE, /* No KFENCE allocation. */
+};
+
+/*
+ * Try to get a guarded allocation from KFENCE. Uses either kmalloc() or the
+ * current test_cache if set up.
+ */
+static void *test_alloc(struct kunit *test, size_t size, gfp_t gfp, enum allocation_policy policy)
+{
+ void *alloc;
+ unsigned long timeout, resched_after;
+ const char *policy_name;
+
+ switch (policy) {
+ case ALLOCATE_ANY:
+ policy_name = "any";
+ break;
+ case ALLOCATE_LEFT:
+ policy_name = "left";
+ break;
+ case ALLOCATE_RIGHT:
+ policy_name = "right";
+ break;
+ case ALLOCATE_NONE:
+ policy_name = "none";
+ break;
+ }
+
+ kunit_info(test, "%s: size=%zu, gfp=%x, policy=%s, cache=%i\n", __func__, size, gfp,
+ policy_name, !!test_cache);
+
+ /*
+ * 100x the sample interval should be more than enough to ensure we get
+ * a KFENCE allocation eventually.
+ */
+ timeout = jiffies + msecs_to_jiffies(100 * CONFIG_KFENCE_SAMPLE_INTERVAL);
+ /*
+ * Especially for non-preemption kernels, ensure the allocation-gate
+ * timer has time to catch up.
+ */
+ resched_after = jiffies + msecs_to_jiffies(CONFIG_KFENCE_SAMPLE_INTERVAL);
+ do {
+ if (test_cache)
+ alloc = kmem_cache_alloc(test_cache, gfp);
+ else
+ alloc = kmalloc(size, gfp);
+
+ if (is_kfence_address(alloc)) {
+ if (policy == ALLOCATE_ANY)
+ return alloc;
+ if (policy == ALLOCATE_LEFT && IS_ALIGNED((unsigned long)alloc, PAGE_SIZE))
+ return alloc;
+ if (policy == ALLOCATE_RIGHT &&
+ !IS_ALIGNED((unsigned long)alloc, PAGE_SIZE))
+ return alloc;
+ } else if (policy == ALLOCATE_NONE)
+ return alloc;
+
+ test_free(alloc);
+
+ if (time_after(jiffies, resched_after))
+ cond_resched();
+ } while (time_before(jiffies, timeout));
+
+ KUNIT_ASSERT_TRUE_MSG(test, false, "failed to allocate from KFENCE");
+ return NULL; /* Unreachable. */
+}
+
+static void test_out_of_bounds_read(struct kunit *test)
+{
+ size_t size = 32;
+ struct expect_report expect = {
+ .type = KFENCE_ERROR_OOB,
+ .fn = test_out_of_bounds_read,
+ };
+ char *buf;
+
+ setup_test_cache(test, size, 0, NULL);
+
+ /*
+ * If we don't have our own cache, adjust based on alignment, so that we
+ * actually access guard pages on either side.
+ */
+ if (!test_cache)
+ size = kmalloc_cache_alignment(size);
+
+ /* Test both sides. */
+
+ buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_LEFT);
+ expect.addr = buf - 1;
+ READ_ONCE(*expect.addr);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+ test_free(buf);
+
+ buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT);
+ expect.addr = buf + size;
+ READ_ONCE(*expect.addr);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+ test_free(buf);
+}
+
+static void test_use_after_free_read(struct kunit *test)
+{
+ const size_t size = 32;
+ struct expect_report expect = {
+ .type = KFENCE_ERROR_UAF,
+ .fn = test_use_after_free_read,
+ };
+
+ setup_test_cache(test, size, 0, NULL);
+ expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+ test_free(expect.addr);
+ READ_ONCE(*expect.addr);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static void test_double_free(struct kunit *test)
+{
+ const size_t size = 32;
+ struct expect_report expect = {
+ .type = KFENCE_ERROR_INVALID_FREE,
+ .fn = test_double_free,
+ };
+
+ setup_test_cache(test, size, 0, NULL);
+ expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+ test_free(expect.addr);
+ test_free(expect.addr); /* Double-free. */
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static void test_invalid_addr_free(struct kunit *test)
+{
+ const size_t size = 32;
+ struct expect_report expect = {
+ .type = KFENCE_ERROR_INVALID_FREE,
+ .fn = test_invalid_addr_free,
+ };
+ char *buf;
+
+ setup_test_cache(test, size, 0, NULL);
+ buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+ expect.addr = buf + 1; /* Free on invalid address. */
+ test_free(expect.addr); /* Invalid address free. */
+ test_free(buf); /* No error. */
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * KFENCE is unable to detect an OOB if the allocation's alignment requirements
+ * leave a gap between the object and the guard page. Specifically, an
+ * allocation of e.g. 73 bytes is aligned on 8 and 128 bytes for SLUB or SLAB
+ * respectively. Therefore it is impossible for the allocated object to adhere
+ * to either of the page boundaries.
+ *
+ * However, we test that an access to memory beyond the gap result in KFENCE
+ * detecting an OOB access.
+ */
+static void test_kmalloc_aligned_oob_read(struct kunit *test)
+{
+ const size_t size = 73;
+ const size_t align = kmalloc_cache_alignment(size);
+ struct expect_report expect = {
+ .type = KFENCE_ERROR_OOB,
+ .fn = test_kmalloc_aligned_oob_read,
+ };
+ char *buf;
+
+ buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT);
+
+ /*
+ * The object is offset to the right, so there won't be an OOB to the
+ * left of it.
+ */
+ READ_ONCE(*(buf - 1));
+ KUNIT_EXPECT_FALSE(test, report_available());
+
+ /*
+ * @buf must be aligned on @align, therefore buf + size belongs to the
+ * same page -> no OOB.
+ */
+ READ_ONCE(*(buf + size));
+ KUNIT_EXPECT_FALSE(test, report_available());
+
+ /* Overflowing by @align bytes will result in an OOB. */
+ expect.addr = buf + size + align;
+ READ_ONCE(*expect.addr);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+
+ test_free(buf);
+}
+
+static void test_kmalloc_aligned_oob_write(struct kunit *test)
+{
+ const size_t size = 73;
+ struct expect_report expect = {
+ .type = KFENCE_ERROR_CORRUPTION,
+ .fn = test_kmalloc_aligned_oob_write,
+ };
+ char *buf;
+
+ buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT);
+ /*
+ * The object is offset to the right, so we won't get a page
+ * fault immediately after it.
+ */
+ expect.addr = buf + size;
+ WRITE_ONCE(*expect.addr, READ_ONCE(*expect.addr) + 1);
+ KUNIT_EXPECT_FALSE(test, report_available());
+ test_free(buf);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test cache shrinking and destroying with KFENCE. */
+static void test_shrink_memcache(struct kunit *test)
+{
+ const size_t size = 32;
+ void *buf;
+
+ setup_test_cache(test, size, 0, NULL);
+ KUNIT_EXPECT_TRUE(test, test_cache);
+ buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+ kmem_cache_shrink(test_cache);
+ test_free(buf);
+
+ KUNIT_EXPECT_FALSE(test, report_available());
+}
+
+static void ctor_set_x(void *obj)
+{
+ /* Every object has at least 8 bytes. */
+ memset(obj, 'x', 8);
+}
+
+/* Ensure that SL*B does not modify KFENCE objects on bulk free. */
+static void test_free_bulk(struct kunit *test)
+{
+ int iter;
+
+ for (iter = 0; iter < 5; iter++) {
+ const size_t size = setup_test_cache(test, 8 + prandom_u32_max(300), 0,
+ (iter & 1) ? ctor_set_x : NULL);
+ void *objects[] = {
+ test_alloc(test, size, GFP_KERNEL, ALLOCATE_RIGHT),
+ test_alloc(test, size, GFP_KERNEL, ALLOCATE_NONE),
+ test_alloc(test, size, GFP_KERNEL, ALLOCATE_LEFT),
+ test_alloc(test, size, GFP_KERNEL, ALLOCATE_NONE),
+ test_alloc(test, size, GFP_KERNEL, ALLOCATE_NONE),
+ };
+
+ kmem_cache_free_bulk(test_cache, ARRAY_SIZE(objects), objects);
+ KUNIT_ASSERT_FALSE(test, report_available());
+ test_cache_destroy();
+ }
+}
+
+/* Test init-on-free works. */
+static void test_init_on_free(struct kunit *test)
+{
+ const size_t size = 32;
+ struct expect_report expect = {
+ .type = KFENCE_ERROR_UAF,
+ .fn = test_init_on_free,
+ };
+ int i;
+
+ if (!IS_ENABLED(CONFIG_INIT_ON_FREE_DEFAULT_ON))
+ return;
+ /* Assume it hasn't been disabled on command line. */
+
+ setup_test_cache(test, size, 0, NULL);
+ expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+ for (i = 0; i < size; i++)
+ expect.addr[i] = i + 1;
+ test_free(expect.addr);
+
+ for (i = 0; i < size; i++) {
+ /*
+ * This may fail if the page was recycled by KFENCE and then
+ * written to again -- this however, is near impossible with a
+ * default config.
+ */
+ KUNIT_EXPECT_EQ(test, expect.addr[i], (char)0);
+
+ if (!i) /* Only check first access to not fail test if page is ever re-protected. */
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+ }
+}
+
+/* Ensure that constructors work properly. */
+static void test_memcache_ctor(struct kunit *test)
+{
+ const size_t size = 32;
+ char *buf;
+ int i;
+
+ setup_test_cache(test, size, 0, ctor_set_x);
+ buf = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+
+ for (i = 0; i < 8; i++)
+ KUNIT_EXPECT_EQ(test, buf[i], (char)'x');
+
+ test_free(buf);
+
+ KUNIT_EXPECT_FALSE(test, report_available());
+}
+
+/* Test that memory is zeroed if requested. */
+static void test_gfpzero(struct kunit *test)
+{
+ const size_t size = PAGE_SIZE; /* PAGE_SIZE so we can use ALLOCATE_ANY. */
+ char *buf1, *buf2;
+ int i;
+
+ if (CONFIG_KFENCE_SAMPLE_INTERVAL > 100) {
+ kunit_warn(test, "skipping ... would take too long\n");
+ return;
+ }
+
+ setup_test_cache(test, size, 0, NULL);
+ buf1 = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+ for (i = 0; i < size; i++)
+ buf1[i] = i + 1;
+ test_free(buf1);
+
+ /* Try to get same address again -- this can take a while. */
+ for (i = 0;; i++) {
+ buf2 = test_alloc(test, size, GFP_KERNEL | __GFP_ZERO, ALLOCATE_ANY);
+ if (buf1 == buf2)
+ break;
+ test_free(buf2);
+
+ if (i == CONFIG_KFENCE_NUM_OBJECTS) {
+ kunit_warn(test, "giving up ... cannot get same object back\n");
+ return;
+ }
+ }
+
+ for (i = 0; i < size; i++)
+ KUNIT_EXPECT_EQ(test, buf2[i], (char)0);
+
+ test_free(buf2);
+
+ KUNIT_EXPECT_FALSE(test, report_available());
+}
+
+static void test_invalid_access(struct kunit *test)
+{
+ const struct expect_report expect = {
+ .type = KFENCE_ERROR_INVALID,
+ .fn = test_invalid_access,
+ .addr = &__kfence_pool[10],
+ };
+
+ READ_ONCE(__kfence_pool[10]);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test SLAB_TYPESAFE_BY_RCU works. */
+static void test_memcache_typesafe_by_rcu(struct kunit *test)
+{
+ const size_t size = 32;
+ struct expect_report expect = {
+ .type = KFENCE_ERROR_UAF,
+ .fn = test_memcache_typesafe_by_rcu,
+ };
+
+ setup_test_cache(test, size, SLAB_TYPESAFE_BY_RCU, NULL);
+ KUNIT_EXPECT_TRUE(test, test_cache); /* Want memcache. */
+
+ expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
+ *expect.addr = 42;
+
+ rcu_read_lock();
+ test_free(expect.addr);
+ KUNIT_EXPECT_EQ(test, *expect.addr, (char)42);
+ rcu_read_unlock();
+
+ /* No reports yet, memory should not have been freed on access. */
+ KUNIT_EXPECT_FALSE(test, report_available());
+ rcu_barrier(); /* Wait for free to happen. */
+
+ /* Expect use-after-free. */
+ KUNIT_EXPECT_EQ(test, *expect.addr, (char)42);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test krealloc(). */
+static void test_krealloc(struct kunit *test)
+{
+ const size_t size = 32;
+ const struct expect_report expect = {
+ .type = KFENCE_ERROR_UAF,
+ .fn = test_krealloc,
+ .addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY),
+ };
+ char *buf = expect.addr;
+ int i;
+
+ KUNIT_EXPECT_FALSE(test, test_cache);
+ KUNIT_EXPECT_EQ(test, ksize(buf), size); /* Precise size match after KFENCE alloc. */
+ for (i = 0; i < size; i++)
+ buf[i] = i + 1;
+
+ /* Check that we successfully change the size. */
+ buf = krealloc(buf, size * 3, GFP_KERNEL); /* Grow. */
+ /* Note: Might no longer be a KFENCE alloc. */
+ KUNIT_EXPECT_GE(test, ksize(buf), size * 3);
+ for (i = 0; i < size; i++)
+ KUNIT_EXPECT_EQ(test, buf[i], (char)(i + 1));
+ for (; i < size * 3; i++) /* Fill to extra bytes. */
+ buf[i] = i + 1;
+
+ buf = krealloc(buf, size * 2, GFP_KERNEL * 2); /* Shrink. */
+ KUNIT_EXPECT_GE(test, ksize(buf), size * 2);
+ for (i = 0; i < size * 2; i++)
+ KUNIT_EXPECT_EQ(test, buf[i], (char)(i + 1));
+
+ buf = krealloc(buf, 0, GFP_KERNEL); /* Free. */
+ KUNIT_EXPECT_EQ(test, (unsigned long)buf, (unsigned long)ZERO_SIZE_PTR);
+ KUNIT_ASSERT_FALSE(test, report_available()); /* No reports yet! */
+
+ READ_ONCE(*expect.addr); /* Ensure krealloc() actually freed earlier KFENCE object. */
+ KUNIT_ASSERT_TRUE(test, report_matches(&expect));
+}
+
+/* Test that some objects from a bulk allocation belong to KFENCE pool. */
+static void test_memcache_alloc_bulk(struct kunit *test)
+{
+ const size_t size = 32;
+ bool pass = false;
+ unsigned long timeout;
+
+ setup_test_cache(test, size, 0, NULL);
+ KUNIT_EXPECT_TRUE(test, test_cache); /* Want memcache. */
+ /*
+ * 100x the sample interval should be more than enough to ensure we get
+ * a KFENCE allocation eventually.
+ */
+ timeout = jiffies + msecs_to_jiffies(100 * CONFIG_KFENCE_SAMPLE_INTERVAL);
+ do {
+ void *objects[100];
+ int i, num = kmem_cache_alloc_bulk(test_cache, GFP_ATOMIC, ARRAY_SIZE(objects),
+ objects);
+ if (!num)
+ continue;
+ for (i = 0; i < ARRAY_SIZE(objects); i++) {
+ if (is_kfence_address(objects[i])) {
+ pass = true;
+ break;
+ }
+ }
+ kmem_cache_free_bulk(test_cache, num, objects);
+ /*
+ * kmem_cache_alloc_bulk() disables interrupts, and calling it
+ * in a tight loop may not give KFENCE a chance to switch the
+ * static branch. Call cond_resched() to let KFENCE chime in.
+ */
+ cond_resched();
+ } while (!pass && time_before(jiffies, timeout));
+
+ KUNIT_EXPECT_TRUE(test, pass);
+ KUNIT_EXPECT_FALSE(test, report_available());
+}
+
+/*
+ * KUnit does not provide a way to provide arguments to tests, and we encode
+ * additional info in the name. Set up 2 tests per test case, one using the
+ * default allocator, and another using a custom memcache (suffix '-memcache').
+ */
+#define KFENCE_KUNIT_CASE(test_name) \
+ { .run_case = test_name, .name = #test_name }, \
+ { .run_case = test_name, .name = #test_name "-memcache" }
+
+static struct kunit_case kfence_test_cases[] = {
+ KFENCE_KUNIT_CASE(test_out_of_bounds_read),
+ KFENCE_KUNIT_CASE(test_use_after_free_read),
+ KFENCE_KUNIT_CASE(test_double_free),
+ KFENCE_KUNIT_CASE(test_invalid_addr_free),
+ KFENCE_KUNIT_CASE(test_free_bulk),
+ KFENCE_KUNIT_CASE(test_init_on_free),
+ KUNIT_CASE(test_kmalloc_aligned_oob_read),
+ KUNIT_CASE(test_kmalloc_aligned_oob_write),
+ KUNIT_CASE(test_shrink_memcache),
+ KUNIT_CASE(test_memcache_ctor),
+ KUNIT_CASE(test_invalid_access),
+ KUNIT_CASE(test_gfpzero),
+ KUNIT_CASE(test_memcache_typesafe_by_rcu),
+ KUNIT_CASE(test_krealloc),
+ KUNIT_CASE(test_memcache_alloc_bulk),
+ {},
+};
+
+/* ===== End test cases ===== */
+
+static int test_init(struct kunit *test)
+{
+ unsigned long flags;
+ int i;
+
+ spin_lock_irqsave(&observed.lock, flags);
+ for (i = 0; i < ARRAY_SIZE(observed.lines); i++)
+ observed.lines[i][0] = '\0';
+ observed.nlines = 0;
+ spin_unlock_irqrestore(&observed.lock, flags);
+
+ /* Any test with 'memcache' in its name will want a memcache. */
+ if (strstr(test->name, "memcache"))
+ test->priv = TEST_PRIV_WANT_MEMCACHE;
+ else
+ test->priv = NULL;
+
+ return 0;
+}
+
+static void test_exit(struct kunit *test)
+{
+ test_cache_destroy();
+}
+
+static struct kunit_suite kfence_test_suite = {
+ .name = "kfence",
+ .test_cases = kfence_test_cases,
+ .init = test_init,
+ .exit = test_exit,
+};
+static struct kunit_suite *kfence_test_suites[] = { &kfence_test_suite, NULL };
+
+static void register_tracepoints(struct tracepoint *tp, void *ignore)
+{
+ check_trace_callback_type_console(probe_console);
+ if (!strcmp(tp->name, "console"))
+ WARN_ON(tracepoint_probe_register(tp, probe_console, NULL));
+}
+
+static void unregister_tracepoints(struct tracepoint *tp, void *ignore)
+{
+ if (!strcmp(tp->name, "console"))
+ tracepoint_probe_unregister(tp, probe_console, NULL);
+}
+
+/*
+ * We only want to do tracepoints setup and teardown once, therefore we have to
+ * customize the init and exit functions and cannot rely on kunit_test_suite().
+ */
+static int __init kfence_test_init(void)
+{
+ /*
+ * Because we want to be able to build the test as a module, we need to
+ * iterate through all known tracepoints, since the static registration
+ * won't work here.
+ */
+ for_each_kernel_tracepoint(register_tracepoints, NULL);
+ return __kunit_test_suites_init(kfence_test_suites);
+}
+
+static void kfence_test_exit(void)
+{
+ __kunit_test_suites_exit(kfence_test_suites);
+ for_each_kernel_tracepoint(unregister_tracepoints, NULL);
+ tracepoint_synchronize_unregister();
+}
+
+late_initcall(kfence_test_init);
+module_exit(kfence_test_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Alexander Potapenko <[email protected]>, Marco Elver <[email protected]>");
--
2.28.0.681.g6f77f65b4e-goog

2020-09-21 13:29:00

by Marco Elver

[permalink] [raw]
Subject: [PATCH v3 09/10] kfence, Documentation: add KFENCE documentation

Add KFENCE documentation in dev-tools/kfence.rst, and add to index.

Reviewed-by: Dmitry Vyukov <[email protected]>
Co-developed-by: Alexander Potapenko <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>
Signed-off-by: Marco Elver <[email protected]>
---
v3:
* Re-introduce reference to Documentation/dev-tools/kfence.rst.

v2:
* Many clarifications based on comments from Andrey Konovalov.
* Document CONFIG_KFENCE_SAMPLE_INTERVAL=0 usage.
* Make use-cases between KASAN and KFENCE clearer.
* Be clearer about the fact the pool is fixed size.
* Update based on reporting changes.
* Explicitly mention max supported allocation size is PAGE_SIZE.
---
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/kfence.rst | 291 +++++++++++++++++++++++++++++
lib/Kconfig.kfence | 2 +
3 files changed, 294 insertions(+)
create mode 100644 Documentation/dev-tools/kfence.rst

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index f7809c7b1ba9..1b1cf4f5c9d9 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -22,6 +22,7 @@ whole; patches welcome!
ubsan
kmemleak
kcsan
+ kfence
gdb-kernel-debugging
kgdb
kselftest
diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst
new file mode 100644
index 000000000000..efe86b1b1074
--- /dev/null
+++ b/Documentation/dev-tools/kfence.rst
@@ -0,0 +1,291 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Kernel Electric-Fence (KFENCE)
+==============================
+
+Kernel Electric-Fence (KFENCE) is a low-overhead sampling-based memory safety
+error detector. KFENCE detects heap out-of-bounds access, use-after-free, and
+invalid-free errors.
+
+KFENCE is designed to be enabled in production kernels, and has near zero
+performance overhead. Compared to KASAN, KFENCE trades performance for
+precision. The main motivation behind KFENCE's design, is that with enough
+total uptime KFENCE will detect bugs in code paths not typically exercised by
+non-production test workloads. One way to quickly achieve a large enough total
+uptime is when the tool is deployed across a large fleet of machines.
+
+Usage
+-----
+
+To enable KFENCE, configure the kernel with::
+
+ CONFIG_KFENCE=y
+
+To build a kernel with KFENCE support, but disabled by default (to enable, set
+``kfence.sample_interval`` to non-zero value), configure the kernel with::
+
+ CONFIG_KFENCE=y
+ CONFIG_KFENCE_SAMPLE_INTERVAL=0
+
+KFENCE provides several other configuration options to customize behaviour (see
+the respective help text in ``lib/Kconfig.kfence`` for more info).
+
+Tuning performance
+~~~~~~~~~~~~~~~~~~
+
+The most important parameter is KFENCE's sample interval, which can be set via
+the kernel boot parameter ``kfence.sample_interval`` in milliseconds. The
+sample interval determines the frequency with which heap allocations will be
+guarded by KFENCE. The default is configurable via the Kconfig option
+``CONFIG_KFENCE_SAMPLE_INTERVAL``. Setting ``kfence.sample_interval=0``
+disables KFENCE.
+
+The KFENCE memory pool is of fixed size, and if the pool is exhausted, no
+further KFENCE allocations occur. With ``CONFIG_KFENCE_NUM_OBJECTS`` (default
+255), the number of available guarded objects can be controlled. Each object
+requires 2 pages, one for the object itself and the other one used as a guard
+page; object pages are interleaved with guard pages, and every object page is
+therefore surrounded by two guard pages.
+
+The total memory dedicated to the KFENCE memory pool can be computed as::
+
+ ( #objects + 1 ) * 2 * PAGE_SIZE
+
+Using the default config, and assuming a page size of 4 KiB, results in
+dedicating 2 MiB to the KFENCE memory pool.
+
+Error reports
+~~~~~~~~~~~~~
+
+A typical out-of-bounds access looks like this::
+
+ ==================================================================
+ BUG: KFENCE: out-of-bounds in test_out_of_bounds_read+0xa3/0x22b
+
+ Out-of-bounds access at 0xffffffffb672efff (left of kfence-#17):
+ test_out_of_bounds_read+0xa3/0x22b
+ kunit_try_run_case+0x51/0x85
+ kunit_generic_run_threadfn_adapter+0x16/0x30
+ kthread+0x137/0x160
+ ret_from_fork+0x22/0x30
+
+ kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, size=32, cache=kmalloc-32] allocated in:
+ test_alloc+0xf3/0x25b
+ test_out_of_bounds_read+0x98/0x22b
+ kunit_try_run_case+0x51/0x85
+ kunit_generic_run_threadfn_adapter+0x16/0x30
+ kthread+0x137/0x160
+ ret_from_fork+0x22/0x30
+
+ CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7
+ Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
+ ==================================================================
+
+The header of the report provides a short summary of the function involved in
+the access. It is followed by more detailed information about the access and
+its origin. Note that, real kernel addresses are only shown for
+``CONFIG_DEBUG_KERNEL=y`` builds.
+
+Use-after-free accesses are reported as::
+
+ ==================================================================
+ BUG: KFENCE: use-after-free in test_use_after_free_read+0xb3/0x143
+
+ Use-after-free access at 0xffffffffb673dfe0 (in kfence-#24):
+ test_use_after_free_read+0xb3/0x143
+ kunit_try_run_case+0x51/0x85
+ kunit_generic_run_threadfn_adapter+0x16/0x30
+ kthread+0x137/0x160
+ ret_from_fork+0x22/0x30
+
+ kfence-#24 [0xffffffffb673dfe0-0xffffffffb673dfff, size=32, cache=kmalloc-32] allocated in:
+ test_alloc+0xf3/0x25b
+ test_use_after_free_read+0x76/0x143
+ kunit_try_run_case+0x51/0x85
+ kunit_generic_run_threadfn_adapter+0x16/0x30
+ kthread+0x137/0x160
+ ret_from_fork+0x22/0x30
+
+ freed in:
+ test_use_after_free_read+0xa8/0x143
+ kunit_try_run_case+0x51/0x85
+ kunit_generic_run_threadfn_adapter+0x16/0x30
+ kthread+0x137/0x160
+ ret_from_fork+0x22/0x30
+
+ CPU: 4 PID: 109 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
+ Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
+ ==================================================================
+
+KFENCE also reports on invalid frees, such as double-frees::
+
+ ==================================================================
+ BUG: KFENCE: invalid free in test_double_free+0xdc/0x171
+
+ Invalid free of 0xffffffffb6741000:
+ test_double_free+0xdc/0x171
+ kunit_try_run_case+0x51/0x85
+ kunit_generic_run_threadfn_adapter+0x16/0x30
+ kthread+0x137/0x160
+ ret_from_fork+0x22/0x30
+
+ kfence-#26 [0xffffffffb6741000-0xffffffffb674101f, size=32, cache=kmalloc-32] allocated in:
+ test_alloc+0xf3/0x25b
+ test_double_free+0x76/0x171
+ kunit_try_run_case+0x51/0x85
+ kunit_generic_run_threadfn_adapter+0x16/0x30
+ kthread+0x137/0x160
+ ret_from_fork+0x22/0x30
+
+ freed in:
+ test_double_free+0xa8/0x171
+ kunit_try_run_case+0x51/0x85
+ kunit_generic_run_threadfn_adapter+0x16/0x30
+ kthread+0x137/0x160
+ ret_from_fork+0x22/0x30
+
+ CPU: 4 PID: 111 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
+ Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
+ ==================================================================
+
+KFENCE also uses pattern-based redzones on the other side of an object's guard
+page, to detect out-of-bounds writes on the unprotected side of the object.
+These are reported on frees::
+
+ ==================================================================
+ BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184
+
+ Corrupted memory at 0xffffffffb6797ff9 [ 0xac . . . . . . ] (in kfence-#69):
+ test_kmalloc_aligned_oob_write+0xef/0x184
+ kunit_try_run_case+0x51/0x85
+ kunit_generic_run_threadfn_adapter+0x16/0x30
+ kthread+0x137/0x160
+ ret_from_fork+0x22/0x30
+
+ kfence-#69 [0xffffffffb6797fb0-0xffffffffb6797ff8, size=73, cache=kmalloc-96] allocated in:
+ test_alloc+0xf3/0x25b
+ test_kmalloc_aligned_oob_write+0x57/0x184
+ kunit_try_run_case+0x51/0x85
+ kunit_generic_run_threadfn_adapter+0x16/0x30
+ kthread+0x137/0x160
+ ret_from_fork+0x22/0x30
+
+ CPU: 4 PID: 120 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
+ Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
+ ==================================================================
+
+For such errors, the address where the corruption as well as the invalidly
+written bytes (offset from the address) are shown; in this representation, '.'
+denote untouched bytes. In the example above ``0xac`` is the value written to
+the invalid address at offset 0, and the remaining '.' denote that no following
+bytes have been touched. Note that, real values are only shown for
+``CONFIG_DEBUG_KERNEL=y`` builds; to avoid information disclosure for non-debug
+builds, '!' is used instead to denote invalidly written bytes.
+
+And finally, KFENCE may also report on invalid accesses to any protected page
+where it was not possible to determine an associated object, e.g. if adjacent
+object pages had not yet been allocated::
+
+ ==================================================================
+ BUG: KFENCE: invalid access in test_invalid_access+0x26/0xe0
+
+ Invalid access at 0xffffffffb670b00a:
+ test_invalid_access+0x26/0xe0
+ kunit_try_run_case+0x51/0x85
+ kunit_generic_run_threadfn_adapter+0x16/0x30
+ kthread+0x137/0x160
+ ret_from_fork+0x22/0x30
+
+ CPU: 4 PID: 124 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7
+ Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
+ ==================================================================
+
+DebugFS interface
+~~~~~~~~~~~~~~~~~
+
+Some debugging information is exposed via debugfs:
+
+* The file ``/sys/kernel/debug/kfence/stats`` provides runtime statistics.
+
+* The file ``/sys/kernel/debug/kfence/objects`` provides a list of objects
+ allocated via KFENCE, including those already freed but protected.
+
+Implementation Details
+----------------------
+
+Guarded allocations are set up based on the sample interval. After expiration
+of the sample interval, the next allocation through the main allocator (SLAB or
+SLUB) returns a guarded allocation from the KFENCE object pool (allocation
+sizes up to PAGE_SIZE are supported). At this point, the timer is reset, and
+the next allocation is set up after the expiration of the interval. To "gate" a
+KFENCE allocation through the main allocator's fast-path without overhead,
+KFENCE relies on static branches via the static keys infrastructure. The static
+branch is toggled to redirect the allocation to KFENCE.
+
+KFENCE objects each reside on a dedicated page, at either the left or right
+page boundaries selected at random. The pages to the left and right of the
+object page are "guard pages", whose attributes are changed to a protected
+state, and cause page faults on any attempted access. Such page faults are then
+intercepted by KFENCE, which handles the fault gracefully by reporting an
+out-of-bounds access.
+
+To detect out-of-bounds writes to memory within the object's page itself,
+KFENCE also uses pattern-based redzones. For each object page, a redzone is set
+up for all non-object memory. For typical alignments, the redzone is only
+required on the unguarded side of an object. Because KFENCE must honor the
+cache's requested alignment, special alignments may result in unprotected gaps
+on either side of an object, all of which are redzoned.
+
+The following figure illustrates the page layout::
+
+ ---+-----------+-----------+-----------+-----------+-----------+---
+ | xxxxxxxxx | O : | xxxxxxxxx | : O | xxxxxxxxx |
+ | xxxxxxxxx | B : | xxxxxxxxx | : B | xxxxxxxxx |
+ | x GUARD x | J : RED- | x GUARD x | RED- : J | x GUARD x |
+ | xxxxxxxxx | E : ZONE | xxxxxxxxx | ZONE : E | xxxxxxxxx |
+ | xxxxxxxxx | C : | xxxxxxxxx | : C | xxxxxxxxx |
+ | xxxxxxxxx | T : | xxxxxxxxx | : T | xxxxxxxxx |
+ ---+-----------+-----------+-----------+-----------+-----------+---
+
+Upon deallocation of a KFENCE object, the object's page is again protected and
+the object is marked as freed. Any further access to the object causes a fault
+and KFENCE reports a use-after-free access. Freed objects are inserted at the
+tail of KFENCE's freelist, so that the least recently freed objects are reused
+first, and the chances of detecting use-after-frees of recently freed objects
+is increased.
+
+Interface
+---------
+
+The following describes the functions which are used by allocators as well page
+handling code to set up and deal with KFENCE allocations.
+
+.. kernel-doc:: include/linux/kfence.h
+ :functions: is_kfence_address
+ kfence_shutdown_cache
+ kfence_alloc kfence_free
+ kfence_ksize kfence_object_start
+ kfence_handle_page_fault
+
+Related Tools
+-------------
+
+In userspace, a similar approach is taken by `GWP-ASan
+<http://llvm.org/docs/GwpAsan.html>`_. GWP-ASan also relies on guard pages and
+a sampling strategy to detect memory unsafety bugs at scale. KFENCE's design is
+directly influenced by GWP-ASan, and can be seen as its kernel sibling. Another
+similar but non-sampling approach, that also inspired the name "KFENCE", can be
+found in the userspace `Electric Fence Malloc Debugger
+<https://linux.die.net/man/3/efence>`_.
+
+In the kernel, several tools exist to debug memory access errors, and in
+particular KASAN can detect all bug classes that KFENCE can detect. While KASAN
+is more precise, relying on compiler instrumentation, this comes at a
+performance cost.
+
+It is worth highlighting that KASAN and KFENCE are complementary, with
+different target environments. For instance, KASAN is the better debugging-aid,
+where test cases or reproducers exists: due to the lower chance to detect the
+error, it would require more effort using KFENCE to debug. Deployments at scale
+that cannot afford to enable KASAN, however, would benefit from using KFENCE to
+discover bugs due to code paths not exercised by test cases or fuzzers.
diff --git a/lib/Kconfig.kfence b/lib/Kconfig.kfence
index 6825c1c07a10..872bcbdd8cc4 100644
--- a/lib/Kconfig.kfence
+++ b/lib/Kconfig.kfence
@@ -19,6 +19,8 @@ menuconfig KFENCE
to have negligible cost to permit enabling it in production
environments.

+ See <file:Documentation/dev-tools/kfence.rst> for more details.
+
Note that, KFENCE is not a substitute for explicit testing with tools
such as KASAN. KFENCE can detect a subset of bugs that KASAN can
detect, albeit at very different performance profiles. If you can
--
2.28.0.681.g6f77f65b4e-goog

2020-09-21 13:30:10

by Marco Elver

[permalink] [raw]
Subject: [PATCH v3 04/10] mm, kfence: insert KFENCE hooks for SLAB

From: Alexander Potapenko <[email protected]>

Inserts KFENCE hooks into the SLAB allocator.

To pass the originally requested size to KFENCE, add an argument
'orig_size' to slab_alloc*(). The additional argument is required to
preserve the requested original size for kmalloc() allocations, which
uses size classes (e.g. an allocation of 272 bytes will return an object
of size 512). Therefore, kmem_cache::size does not represent the
kmalloc-caller's requested size, and we must introduce the argument
'orig_size' to propagate the originally requested size to KFENCE.

Without the originally requested size, we would not be able to detect
out-of-bounds accesses for objects placed at the end of a KFENCE object
page if that object is not equal to the kmalloc-size class it was
bucketed into.

When KFENCE is disabled, there is no additional overhead, since
slab_alloc*() functions are __always_inline.

Reviewed-by: Dmitry Vyukov <[email protected]>
Co-developed-by: Marco Elver <[email protected]>
Signed-off-by: Marco Elver <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>
---
v3:
* Rewrite patch description to clarify need for 'orig_size'
[reported by Christopher Lameter].
---
mm/slab.c | 46 ++++++++++++++++++++++++++++++++++------------
mm/slab_common.c | 6 +++++-
2 files changed, 39 insertions(+), 13 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 3160dff6fd76..30aba06ae02b 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -100,6 +100,7 @@
#include <linux/seq_file.h>
#include <linux/notifier.h>
#include <linux/kallsyms.h>
+#include <linux/kfence.h>
#include <linux/cpu.h>
#include <linux/sysctl.h>
#include <linux/module.h>
@@ -3206,7 +3207,7 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
}

static __always_inline void *
-slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
+slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid, size_t orig_size,
unsigned long caller)
{
unsigned long save_flags;
@@ -3219,6 +3220,10 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
if (unlikely(!cachep))
return NULL;

+ ptr = kfence_alloc(cachep, orig_size, flags);
+ if (unlikely(ptr))
+ goto out_hooks;
+
cache_alloc_debugcheck_before(cachep, flags);
local_irq_save(save_flags);

@@ -3251,6 +3256,7 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
if (unlikely(slab_want_init_on_alloc(flags, cachep)) && ptr)
memset(ptr, 0, cachep->object_size);

+out_hooks:
slab_post_alloc_hook(cachep, objcg, flags, 1, &ptr);
return ptr;
}
@@ -3288,7 +3294,7 @@ __do_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
#endif /* CONFIG_NUMA */

static __always_inline void *
-slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
+slab_alloc(struct kmem_cache *cachep, gfp_t flags, size_t orig_size, unsigned long caller)
{
unsigned long save_flags;
void *objp;
@@ -3299,6 +3305,10 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
if (unlikely(!cachep))
return NULL;

+ objp = kfence_alloc(cachep, orig_size, flags);
+ if (unlikely(objp))
+ goto leave;
+
cache_alloc_debugcheck_before(cachep, flags);
local_irq_save(save_flags);
objp = __do_cache_alloc(cachep, flags);
@@ -3309,6 +3319,7 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
if (unlikely(slab_want_init_on_alloc(flags, cachep)) && objp)
memset(objp, 0, cachep->object_size);

+leave:
slab_post_alloc_hook(cachep, objcg, flags, 1, &objp);
return objp;
}
@@ -3414,6 +3425,11 @@ static void cache_flusharray(struct kmem_cache *cachep, struct array_cache *ac)
static __always_inline void __cache_free(struct kmem_cache *cachep, void *objp,
unsigned long caller)
{
+ if (kfence_free(objp)) {
+ kmemleak_free_recursive(objp, cachep->flags);
+ return;
+ }
+
/* Put the object into the quarantine, don't touch it for now. */
if (kasan_slab_free(cachep, objp, _RET_IP_))
return;
@@ -3479,7 +3495,7 @@ void ___cache_free(struct kmem_cache *cachep, void *objp,
*/
void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
- void *ret = slab_alloc(cachep, flags, _RET_IP_);
+ void *ret = slab_alloc(cachep, flags, cachep->object_size, _RET_IP_);

trace_kmem_cache_alloc(_RET_IP_, ret,
cachep->object_size, cachep->size, flags);
@@ -3512,7 +3528,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,

local_irq_disable();
for (i = 0; i < size; i++) {
- void *objp = __do_cache_alloc(s, flags);
+ void *objp = kfence_alloc(s, s->object_size, flags) ?: __do_cache_alloc(s, flags);

if (unlikely(!objp))
goto error;
@@ -3545,7 +3561,7 @@ kmem_cache_alloc_trace(struct kmem_cache *cachep, gfp_t flags, size_t size)
{
void *ret;

- ret = slab_alloc(cachep, flags, _RET_IP_);
+ ret = slab_alloc(cachep, flags, size, _RET_IP_);

ret = kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc(_RET_IP_, ret,
@@ -3571,7 +3587,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_trace);
*/
void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
{
- void *ret = slab_alloc_node(cachep, flags, nodeid, _RET_IP_);
+ void *ret = slab_alloc_node(cachep, flags, nodeid, cachep->object_size, _RET_IP_);

trace_kmem_cache_alloc_node(_RET_IP_, ret,
cachep->object_size, cachep->size,
@@ -3589,7 +3605,7 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
{
void *ret;

- ret = slab_alloc_node(cachep, flags, nodeid, _RET_IP_);
+ ret = slab_alloc_node(cachep, flags, nodeid, size, _RET_IP_);

ret = kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc_node(_RET_IP_, ret,
@@ -3650,7 +3666,7 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
cachep = kmalloc_slab(size, flags);
if (unlikely(ZERO_OR_NULL_PTR(cachep)))
return cachep;
- ret = slab_alloc(cachep, flags, caller);
+ ret = slab_alloc(cachep, flags, size, caller);

ret = kasan_kmalloc(cachep, ret, size, flags);
trace_kmalloc(caller, ret,
@@ -4138,18 +4154,24 @@ void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
bool to_user)
{
struct kmem_cache *cachep;
- unsigned int objnr;
+ unsigned int objnr = 0;
unsigned long offset;
+ bool is_kfence = is_kfence_address(ptr);

ptr = kasan_reset_tag(ptr);

/* Find and validate object. */
cachep = page->slab_cache;
- objnr = obj_to_index(cachep, page, (void *)ptr);
- BUG_ON(objnr >= cachep->num);
+ if (!is_kfence) {
+ objnr = obj_to_index(cachep, page, (void *)ptr);
+ BUG_ON(objnr >= cachep->num);
+ }

/* Find offset within object. */
- offset = ptr - index_to_obj(cachep, page, objnr) - obj_offset(cachep);
+ if (is_kfence_address(ptr))
+ offset = ptr - kfence_object_start(ptr);
+ else
+ offset = ptr - index_to_obj(cachep, page, objnr) - obj_offset(cachep);

/* Allow address range falling entirely within usercopy region. */
if (offset >= cachep->useroffset &&
diff --git a/mm/slab_common.c b/mm/slab_common.c
index f9ccd5dc13f3..6e35e273681a 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -12,6 +12,7 @@
#include <linux/memory.h>
#include <linux/cache.h>
#include <linux/compiler.h>
+#include <linux/kfence.h>
#include <linux/module.h>
#include <linux/cpu.h>
#include <linux/uaccess.h>
@@ -448,6 +449,9 @@ static int shutdown_cache(struct kmem_cache *s)
/* free asan quarantined objects */
kasan_cache_shutdown(s);

+ if (!kfence_shutdown_cache(s))
+ return -EBUSY;
+
if (__kmem_cache_shutdown(s) != 0)
return -EBUSY;

@@ -1171,7 +1175,7 @@ size_t ksize(const void *objp)
if (unlikely(ZERO_OR_NULL_PTR(objp)) || !__kasan_check_read(objp, 1))
return 0;

- size = __ksize(objp);
+ size = kfence_ksize(objp) ?: __ksize(objp);
/*
* We assume that ksize callers could use whole allocated area,
* so we need to unpoison this area.
--
2.28.0.681.g6f77f65b4e-goog

2020-09-21 13:30:32

by Marco Elver

[permalink] [raw]
Subject: [PATCH v3 02/10] x86, kfence: enable KFENCE for x86

From: Alexander Potapenko <[email protected]>

Add architecture specific implementation details for KFENCE and enable
KFENCE for the x86 architecture. In particular, this implements the
required interface in <asm/kfence.h> for setting up the pool and
providing helper functions for protecting and unprotecting pages.

For x86, we need to ensure that the pool uses 4K pages, which is done
using the set_memory_4k() helper function.

Reviewed-by: Dmitry Vyukov <[email protected]>
Co-developed-by: Marco Elver <[email protected]>
Signed-off-by: Marco Elver <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>
---
arch/x86/Kconfig | 2 ++
arch/x86/include/asm/kfence.h | 60 +++++++++++++++++++++++++++++++++++
arch/x86/mm/fault.c | 4 +++
3 files changed, 66 insertions(+)
create mode 100644 arch/x86/include/asm/kfence.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7101ac64bb20..e22dc722698c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -144,6 +144,8 @@ config X86
select HAVE_ARCH_JUMP_LABEL_RELATIVE
select HAVE_ARCH_KASAN if X86_64
select HAVE_ARCH_KASAN_VMALLOC if X86_64
+ select HAVE_ARCH_KFENCE
+ select HAVE_ARCH_KFENCE_STATIC_POOL
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS if MMU
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT
diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h
new file mode 100644
index 000000000000..cf09e377faf9
--- /dev/null
+++ b/arch/x86/include/asm/kfence.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _ASM_X86_KFENCE_H
+#define _ASM_X86_KFENCE_H
+
+#include <linux/bug.h>
+#include <linux/kfence.h>
+
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+#include <asm/set_memory.h>
+#include <asm/tlbflush.h>
+
+/* The alignment should be at least a 4K page. */
+#define KFENCE_POOL_ALIGNMENT PAGE_SIZE
+
+/*
+ * The page fault handler entry function, up to which the stack trace is
+ * truncated in reports.
+ */
+#define KFENCE_SKIP_ARCH_FAULT_HANDLER "asm_exc_page_fault"
+
+/* Force 4K pages for __kfence_pool. */
+static inline bool arch_kfence_initialize_pool(void)
+{
+ unsigned long addr;
+
+ for (addr = (unsigned long)__kfence_pool; is_kfence_address((void *)addr);
+ addr += PAGE_SIZE) {
+ unsigned int level;
+
+ if (!lookup_address(addr, &level))
+ return false;
+
+ if (level != PG_LEVEL_4K)
+ set_memory_4k(addr, 1);
+ }
+
+ return true;
+}
+
+/* Protect the given page and flush TLBs. */
+static inline bool kfence_protect_page(unsigned long addr, bool protect)
+{
+ unsigned int level;
+ pte_t *pte = lookup_address(addr, &level);
+
+ if (!pte || level != PG_LEVEL_4K)
+ return false;
+
+ if (protect)
+ set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
+ else
+ set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
+
+ flush_tlb_one_kernel(addr);
+ return true;
+}
+
+#endif /* _ASM_X86_KFENCE_H */
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 6e3e8a124903..423e15ad5eb6 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -9,6 +9,7 @@
#include <linux/kdebug.h> /* oops_begin/end, ... */
#include <linux/extable.h> /* search_exception_tables */
#include <linux/memblock.h> /* max_low_pfn */
+#include <linux/kfence.h> /* kfence_handle_page_fault */
#include <linux/kprobes.h> /* NOKPROBE_SYMBOL, ... */
#include <linux/mmiotrace.h> /* kmmio_handler, ... */
#include <linux/perf_event.h> /* perf_sw_event */
@@ -701,6 +702,9 @@ no_context(struct pt_regs *regs, unsigned long error_code,
}
#endif

+ if (kfence_handle_page_fault(address))
+ return;
+
/*
* 32-bit:
*
--
2.28.0.681.g6f77f65b4e-goog

2020-09-21 13:31:20

by Marco Elver

[permalink] [raw]
Subject: [PATCH v3 08/10] kfence, lockdep: make KFENCE compatible with lockdep

Lockdep checks that dynamic key registration is only performed on keys
that are not static objects. With KFENCE, it is possible that such a
dynamically allocated key is a KFENCE object which may, however, be
allocated from a static memory pool (if HAVE_ARCH_KFENCE_STATIC_POOL).

Therefore, ignore KFENCE-allocated objects in static_obj().

Reviewed-by: Dmitry Vyukov <[email protected]>
Co-developed-by: Alexander Potapenko <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>
Signed-off-by: Marco Elver <[email protected]>
---
kernel/locking/lockdep.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 54b74fabf40c..0cf5d5ecbd31 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -38,6 +38,7 @@
#include <linux/seq_file.h>
#include <linux/spinlock.h>
#include <linux/kallsyms.h>
+#include <linux/kfence.h>
#include <linux/interrupt.h>
#include <linux/stacktrace.h>
#include <linux/debug_locks.h>
@@ -755,6 +756,13 @@ static int static_obj(const void *obj)
if (arch_is_kernel_initmem_freed(addr))
return 0;

+ /*
+ * KFENCE objects may be allocated from a static memory pool, but are
+ * not actually static objects.
+ */
+ if (is_kfence_address(obj))
+ return 0;
+
/*
* static variable?
*/
--
2.28.0.681.g6f77f65b4e-goog

2020-09-21 13:31:42

by Marco Elver

[permalink] [raw]
Subject: [PATCH v3 06/10] kfence, kasan: make KFENCE compatible with KASAN

From: Alexander Potapenko <[email protected]>

We make KFENCE compatible with KASAN for testing KFENCE itself. In
particular, KASAN helps to catch any potential corruptions to KFENCE
state, or other corruptions that may be a result of freepointer
corruptions in the main allocators.

To indicate that the combination of the two is generally discouraged,
CONFIG_EXPERT=y should be set. It also gives us the nice property that
KFENCE will be build-tested by allyesconfig builds.

Reviewed-by: Dmitry Vyukov <[email protected]>
Co-developed-by: Marco Elver <[email protected]>
Signed-off-by: Marco Elver <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>
---
lib/Kconfig.kfence | 2 +-
mm/kasan/common.c | 7 +++++++
2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/Kconfig.kfence b/lib/Kconfig.kfence
index 4c2ea1c722de..6825c1c07a10 100644
--- a/lib/Kconfig.kfence
+++ b/lib/Kconfig.kfence
@@ -10,7 +10,7 @@ config HAVE_ARCH_KFENCE_STATIC_POOL

menuconfig KFENCE
bool "KFENCE: low-overhead sampling-based memory safety error detector"
- depends on HAVE_ARCH_KFENCE && !KASAN && (SLAB || SLUB)
+ depends on HAVE_ARCH_KFENCE && (!KASAN || EXPERT) && (SLAB || SLUB)
depends on JUMP_LABEL # To ensure performance, require jump labels
select STACKTRACE
help
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 950fd372a07e..f5c49f0fdeff 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -18,6 +18,7 @@
#include <linux/init.h>
#include <linux/kasan.h>
#include <linux/kernel.h>
+#include <linux/kfence.h>
#include <linux/kmemleak.h>
#include <linux/linkage.h>
#include <linux/memblock.h>
@@ -396,6 +397,9 @@ static bool __kasan_slab_free(struct kmem_cache *cache, void *object,
tagged_object = object;
object = reset_tag(object);

+ if (is_kfence_address(object))
+ return false;
+
if (unlikely(nearest_obj(cache, virt_to_head_page(object), object) !=
object)) {
kasan_report_invalid_free(tagged_object, ip);
@@ -444,6 +448,9 @@ static void *__kasan_kmalloc(struct kmem_cache *cache, const void *object,
if (unlikely(object == NULL))
return NULL;

+ if (is_kfence_address(object))
+ return (void *)object;
+
redzone_start = round_up((unsigned long)(object + size),
KASAN_SHADOW_SCALE_SIZE);
redzone_end = round_up((unsigned long)object + cache->object_size,
--
2.28.0.681.g6f77f65b4e-goog

2020-09-21 13:42:20

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH v3 00/10] KFENCE: A low-overhead sampling-based memory safety error detector

On Mon, Sep 21, 2020 at 3:26 PM Marco Elver <[email protected]> wrote:
>
> This adds the Kernel Electric-Fence (KFENCE) infrastructure. KFENCE is a
> low-overhead sampling-based memory safety error detector of heap
> use-after-free, invalid-free, and out-of-bounds access errors. This
> series enables KFENCE for the x86 and arm64 architectures, and adds
> KFENCE hooks to the SLAB and SLUB allocators.

Hi Andrew,

I wanted to ask what we can expect with respect to the timeline of
merging this into mm/upstream? The series got few reviews/positive
feedback.

Thank you



> KFENCE is designed to be enabled in production kernels, and has near
> zero performance overhead. Compared to KASAN, KFENCE trades performance
> for precision. The main motivation behind KFENCE's design, is that with
> enough total uptime KFENCE will detect bugs in code paths not typically
> exercised by non-production test workloads. One way to quickly achieve a
> large enough total uptime is when the tool is deployed across a large
> fleet of machines.
>
> KFENCE objects each reside on a dedicated page, at either the left or
> right page boundaries. The pages to the left and right of the object
> page are "guard pages", whose attributes are changed to a protected
> state, and cause page faults on any attempted access to them. Such page
> faults are then intercepted by KFENCE, which handles the fault
> gracefully by reporting a memory access error.
>
> Guarded allocations are set up based on a sample interval (can be set
> via kfence.sample_interval). After expiration of the sample interval,
> the next allocation through the main allocator (SLAB or SLUB) returns a
> guarded allocation from the KFENCE object pool. At this point, the timer
> is reset, and the next allocation is set up after the expiration of the
> interval.
>
> To enable/disable a KFENCE allocation through the main allocator's
> fast-path without overhead, KFENCE relies on static branches via the
> static keys infrastructure. The static branch is toggled to redirect the
> allocation to KFENCE.
>
> The KFENCE memory pool is of fixed size, and if the pool is exhausted no
> further KFENCE allocations occur. The default config is conservative
> with only 255 objects, resulting in a pool size of 2 MiB (with 4 KiB
> pages).
>
> We have verified by running synthetic benchmarks (sysbench I/O,
> hackbench) that a kernel with KFENCE is performance-neutral compared to
> a non-KFENCE baseline kernel.
>
> KFENCE is inspired by GWP-ASan [1], a userspace tool with similar
> properties. The name "KFENCE" is a homage to the Electric Fence Malloc
> Debugger [2].
>
> For more details, see Documentation/dev-tools/kfence.rst added in the
> series -- also viewable here:
>
> https://raw.githubusercontent.com/google/kasan/kfence/Documentation/dev-tools/kfence.rst
>
> [1] http://llvm.org/docs/GwpAsan.html
> [2] https://linux.die.net/man/3/efence
>
> v3:
> * Rewrite SLAB/SLUB patch descriptions to clarify need for 'orig_size'.
> * Various smaller fixes (see details in patches).
>
> v2: https://lkml.kernel.org/r/[email protected]
> * Various comment/documentation changes (see details in patches).
> * Various smaller fixes (see details in patches).
> * Change all reports to reference the kfence object, "kfence-#nn".
> * Skip allocation/free internals stack trace.
> * Rework KMEMLEAK compatibility patch.
>
> RFC/v1: https://lkml.kernel.org/r/[email protected]
>
> Alexander Potapenko (6):
> mm: add Kernel Electric-Fence infrastructure
> x86, kfence: enable KFENCE for x86
> mm, kfence: insert KFENCE hooks for SLAB
> mm, kfence: insert KFENCE hooks for SLUB
> kfence, kasan: make KFENCE compatible with KASAN
> kfence, kmemleak: make KFENCE compatible with KMEMLEAK
>
> Marco Elver (4):
> arm64, kfence: enable KFENCE for ARM64
> kfence, lockdep: make KFENCE compatible with lockdep
> kfence, Documentation: add KFENCE documentation
> kfence: add test suite
>
> Documentation/dev-tools/index.rst | 1 +
> Documentation/dev-tools/kfence.rst | 291 +++++++++++
> MAINTAINERS | 11 +
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/kfence.h | 39 ++
> arch/arm64/mm/fault.c | 4 +
> arch/x86/Kconfig | 2 +
> arch/x86/include/asm/kfence.h | 60 +++
> arch/x86/mm/fault.c | 4 +
> include/linux/kfence.h | 174 +++++++
> init/main.c | 2 +
> kernel/locking/lockdep.c | 8 +
> lib/Kconfig.debug | 1 +
> lib/Kconfig.kfence | 78 +++
> mm/Makefile | 1 +
> mm/kasan/common.c | 7 +
> mm/kfence/Makefile | 6 +
> mm/kfence/core.c | 733 +++++++++++++++++++++++++++
> mm/kfence/kfence.h | 102 ++++
> mm/kfence/kfence_test.c | 777 +++++++++++++++++++++++++++++
> mm/kfence/report.c | 219 ++++++++
> mm/kmemleak.c | 6 +
> mm/slab.c | 46 +-
> mm/slab_common.c | 6 +-
> mm/slub.c | 72 ++-
> 25 files changed, 2619 insertions(+), 32 deletions(-)
> create mode 100644 Documentation/dev-tools/kfence.rst
> create mode 100644 arch/arm64/include/asm/kfence.h
> create mode 100644 arch/x86/include/asm/kfence.h
> create mode 100644 include/linux/kfence.h
> create mode 100644 lib/Kconfig.kfence
> create mode 100644 mm/kfence/Makefile
> create mode 100644 mm/kfence/core.c
> create mode 100644 mm/kfence/kfence.h
> create mode 100644 mm/kfence/kfence_test.c
> create mode 100644 mm/kfence/report.c
>
> --
> 2.28.0.681.g6f77f65b4e-goog
>

2020-09-21 14:33:01

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

On Mon, Sep 21, 2020 at 03:26:04PM +0200, Marco Elver wrote:
> Add architecture specific implementation details for KFENCE and enable
> KFENCE for the arm64 architecture. In particular, this implements the
> required interface in <asm/kfence.h>. Currently, the arm64 version does
> not yet use a statically allocated memory pool, at the cost of a pointer
> load for each is_kfence_address().
>
> Reviewed-by: Dmitry Vyukov <[email protected]>
> Co-developed-by: Alexander Potapenko <[email protected]>
> Signed-off-by: Alexander Potapenko <[email protected]>
> Signed-off-by: Marco Elver <[email protected]>
> ---
> For ARM64, we would like to solicit feedback on what the best option is
> to obtain a constant address for __kfence_pool. One option is to declare
> a memory range in the memory layout to be dedicated to KFENCE (like is
> done for KASAN), however, it is unclear if this is the best available
> option. We would like to avoid touching the memory layout.

Sorry for the delay on this.

Given that the pool is relatively small (i.e. when compared with our virtual
address space), dedicating an area of virtual space sounds like it makes
the most sense here. How early do you need it to be available?

An alternative approach would be to patch in the address at runtime, with
something like a static key to swizzle off the direct __kfence_pool load
once we're up and running.

Will

2020-09-21 15:00:19

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

On Mon, Sep 21, 2020 at 4:31 PM Will Deacon <[email protected]> wrote:
>
> On Mon, Sep 21, 2020 at 03:26:04PM +0200, Marco Elver wrote:
> > Add architecture specific implementation details for KFENCE and enable
> > KFENCE for the arm64 architecture. In particular, this implements the
> > required interface in <asm/kfence.h>. Currently, the arm64 version does
> > not yet use a statically allocated memory pool, at the cost of a pointer
> > load for each is_kfence_address().
> >
> > Reviewed-by: Dmitry Vyukov <[email protected]>
> > Co-developed-by: Alexander Potapenko <[email protected]>
> > Signed-off-by: Alexander Potapenko <[email protected]>
> > Signed-off-by: Marco Elver <[email protected]>
> > ---
> > For ARM64, we would like to solicit feedback on what the best option is
> > to obtain a constant address for __kfence_pool. One option is to declare
> > a memory range in the memory layout to be dedicated to KFENCE (like is
> > done for KASAN), however, it is unclear if this is the best available
> > option. We would like to avoid touching the memory layout.
>
> Sorry for the delay on this.

NP, thanks for looking!

> Given that the pool is relatively small (i.e. when compared with our virtual
> address space), dedicating an area of virtual space sounds like it makes
> the most sense here. How early do you need it to be available?

Yes, having a dedicated address sounds good.
We're inserting kfence_init() into start_kernel() after timekeeping_init().
So way after mm_init(), if that matters.

> An alternative approach would be to patch in the address at runtime, with
> something like a static key to swizzle off the direct __kfence_pool load
> once we're up and running.

IIUC there's no such thing as address patching in the kernel at the
moment, at least static keys work differently?
I am not sure how much we need to randomize this address range (we
don't on x86 anyway).

> Will
>
> --
> You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/20200921143059.GO2139%40willie-the-truck.



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

2020-09-21 15:40:55

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

On Mon, Sep 21, 2020 at 4:58 PM Alexander Potapenko <[email protected]> wrote:
>
> On Mon, Sep 21, 2020 at 4:31 PM Will Deacon <[email protected]> wrote:
> >
> > On Mon, Sep 21, 2020 at 03:26:04PM +0200, Marco Elver wrote:
> > > Add architecture specific implementation details for KFENCE and enable
> > > KFENCE for the arm64 architecture. In particular, this implements the
> > > required interface in <asm/kfence.h>. Currently, the arm64 version does
> > > not yet use a statically allocated memory pool, at the cost of a pointer
> > > load for each is_kfence_address().
> > >
> > > Reviewed-by: Dmitry Vyukov <[email protected]>
> > > Co-developed-by: Alexander Potapenko <[email protected]>
> > > Signed-off-by: Alexander Potapenko <[email protected]>
> > > Signed-off-by: Marco Elver <[email protected]>
> > > ---
> > > For ARM64, we would like to solicit feedback on what the best option is
> > > to obtain a constant address for __kfence_pool. One option is to declare
> > > a memory range in the memory layout to be dedicated to KFENCE (like is
> > > done for KASAN), however, it is unclear if this is the best available
> > > option. We would like to avoid touching the memory layout.
> >
> > Sorry for the delay on this.
>
> NP, thanks for looking!
>
> > Given that the pool is relatively small (i.e. when compared with our virtual
> > address space), dedicating an area of virtual space sounds like it makes
> > the most sense here. How early do you need it to be available?
>
> Yes, having a dedicated address sounds good.
> We're inserting kfence_init() into start_kernel() after timekeeping_init().
> So way after mm_init(), if that matters.

The question is though, how big should that dedicated area be?
Right now KFENCE_NUM_OBJECTS can be up to 16383 (which makes the pool
size 64MB), but this number actually comes from the limitation on
static objects, so we might want to increase that number on arm64.

2020-09-21 17:18:45

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v3 10/10] kfence: add test suite

On Mon, Sep 21, 2020 at 03:26:11PM +0200, Marco Elver wrote:
> Add KFENCE test suite, testing various error detection scenarios. Makes
> use of KUnit for test organization. Since KFENCE's interface to obtain
> error reports is via the console, the test verifies that KFENCE outputs
> expected reports to the console.
>
> Reviewed-by: Dmitry Vyukov <[email protected]>
> Co-developed-by: Alexander Potapenko <[email protected]>
> Signed-off-by: Alexander Potapenko <[email protected]>
> Signed-off-by: Marco Elver <[email protected]>

[ . . . ]

> +/* Test SLAB_TYPESAFE_BY_RCU works. */
> +static void test_memcache_typesafe_by_rcu(struct kunit *test)
> +{
> + const size_t size = 32;
> + struct expect_report expect = {
> + .type = KFENCE_ERROR_UAF,
> + .fn = test_memcache_typesafe_by_rcu,
> + };
> +
> + setup_test_cache(test, size, SLAB_TYPESAFE_BY_RCU, NULL);
> + KUNIT_EXPECT_TRUE(test, test_cache); /* Want memcache. */
> +
> + expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
> + *expect.addr = 42;
> +
> + rcu_read_lock();
> + test_free(expect.addr);
> + KUNIT_EXPECT_EQ(test, *expect.addr, (char)42);
> + rcu_read_unlock();

It won't happen very often, but memory really could be freed at this point,
especially in CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels ...

> + /* No reports yet, memory should not have been freed on access. */
> + KUNIT_EXPECT_FALSE(test, report_available());

... so the above statement needs to go before the rcu_read_unlock().

> + rcu_barrier(); /* Wait for free to happen. */

But you are quite right that the memory is not -guaranteed- to be freed
until we get here.

Thanx, Paul

2020-09-21 17:41:00

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v3 10/10] kfence: add test suite

On Mon, 21 Sep 2020 at 19:13, Paul E. McKenney <[email protected]> wrote:
>
> On Mon, Sep 21, 2020 at 03:26:11PM +0200, Marco Elver wrote:
> > Add KFENCE test suite, testing various error detection scenarios. Makes
> > use of KUnit for test organization. Since KFENCE's interface to obtain
> > error reports is via the console, the test verifies that KFENCE outputs
> > expected reports to the console.
> >
> > Reviewed-by: Dmitry Vyukov <[email protected]>
> > Co-developed-by: Alexander Potapenko <[email protected]>
> > Signed-off-by: Alexander Potapenko <[email protected]>
> > Signed-off-by: Marco Elver <[email protected]>
>
> [ . . . ]
>
> > +/* Test SLAB_TYPESAFE_BY_RCU works. */
> > +static void test_memcache_typesafe_by_rcu(struct kunit *test)
> > +{
> > + const size_t size = 32;
> > + struct expect_report expect = {
> > + .type = KFENCE_ERROR_UAF,
> > + .fn = test_memcache_typesafe_by_rcu,
> > + };
> > +
> > + setup_test_cache(test, size, SLAB_TYPESAFE_BY_RCU, NULL);
> > + KUNIT_EXPECT_TRUE(test, test_cache); /* Want memcache. */
> > +
> > + expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
> > + *expect.addr = 42;
> > +
> > + rcu_read_lock();
> > + test_free(expect.addr);
> > + KUNIT_EXPECT_EQ(test, *expect.addr, (char)42);
> > + rcu_read_unlock();
>
> It won't happen very often, but memory really could be freed at this point,
> especially in CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels ...

Ah, thanks for pointing it out.

> > + /* No reports yet, memory should not have been freed on access. */
> > + KUNIT_EXPECT_FALSE(test, report_available());
>
> ... so the above statement needs to go before the rcu_read_unlock().

You mean the comment (and not the KUNIT_EXPECT_FALSE that no reports
were generated), correct?

Admittedly, the whole comment is a bit imprecise, so I'll reword.

> > + rcu_barrier(); /* Wait for free to happen. */
>
> But you are quite right that the memory is not -guaranteed- to be freed
> until we get here.

Right, I'll update the comment.

Thanks,
-- Marco

2020-09-21 17:47:56

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

On Mon, Sep 21, 2020 at 05:37:10PM +0200, Alexander Potapenko wrote:
> On Mon, Sep 21, 2020 at 4:58 PM Alexander Potapenko <[email protected]> wrote:
> >
> > On Mon, Sep 21, 2020 at 4:31 PM Will Deacon <[email protected]> wrote:
> > >
> > > On Mon, Sep 21, 2020 at 03:26:04PM +0200, Marco Elver wrote:
> > > > Add architecture specific implementation details for KFENCE and enable
> > > > KFENCE for the arm64 architecture. In particular, this implements the
> > > > required interface in <asm/kfence.h>. Currently, the arm64 version does
> > > > not yet use a statically allocated memory pool, at the cost of a pointer
> > > > load for each is_kfence_address().
> > > >
> > > > Reviewed-by: Dmitry Vyukov <[email protected]>
> > > > Co-developed-by: Alexander Potapenko <[email protected]>
> > > > Signed-off-by: Alexander Potapenko <[email protected]>
> > > > Signed-off-by: Marco Elver <[email protected]>
> > > > ---
> > > > For ARM64, we would like to solicit feedback on what the best option is
> > > > to obtain a constant address for __kfence_pool. One option is to declare
> > > > a memory range in the memory layout to be dedicated to KFENCE (like is
> > > > done for KASAN), however, it is unclear if this is the best available
> > > > option. We would like to avoid touching the memory layout.
> > >
> > > Sorry for the delay on this.
> >
> > NP, thanks for looking!
> >
> > > Given that the pool is relatively small (i.e. when compared with our virtual
> > > address space), dedicating an area of virtual space sounds like it makes
> > > the most sense here. How early do you need it to be available?
> >
> > Yes, having a dedicated address sounds good.
> > We're inserting kfence_init() into start_kernel() after timekeeping_init().
> > So way after mm_init(), if that matters.
>
> The question is though, how big should that dedicated area be?
> Right now KFENCE_NUM_OBJECTS can be up to 16383 (which makes the pool
> size 64MB), but this number actually comes from the limitation on
> static objects, so we might want to increase that number on arm64.

What happens on x86 and why would we do something different?

Will

2020-09-21 17:49:53

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v3 10/10] kfence: add test suite

On Mon, Sep 21, 2020 at 07:37:13PM +0200, Marco Elver wrote:
> On Mon, 21 Sep 2020 at 19:13, Paul E. McKenney <[email protected]> wrote:
> >
> > On Mon, Sep 21, 2020 at 03:26:11PM +0200, Marco Elver wrote:
> > > Add KFENCE test suite, testing various error detection scenarios. Makes
> > > use of KUnit for test organization. Since KFENCE's interface to obtain
> > > error reports is via the console, the test verifies that KFENCE outputs
> > > expected reports to the console.
> > >
> > > Reviewed-by: Dmitry Vyukov <[email protected]>
> > > Co-developed-by: Alexander Potapenko <[email protected]>
> > > Signed-off-by: Alexander Potapenko <[email protected]>
> > > Signed-off-by: Marco Elver <[email protected]>
> >
> > [ . . . ]
> >
> > > +/* Test SLAB_TYPESAFE_BY_RCU works. */
> > > +static void test_memcache_typesafe_by_rcu(struct kunit *test)
> > > +{
> > > + const size_t size = 32;
> > > + struct expect_report expect = {
> > > + .type = KFENCE_ERROR_UAF,
> > > + .fn = test_memcache_typesafe_by_rcu,
> > > + };
> > > +
> > > + setup_test_cache(test, size, SLAB_TYPESAFE_BY_RCU, NULL);
> > > + KUNIT_EXPECT_TRUE(test, test_cache); /* Want memcache. */
> > > +
> > > + expect.addr = test_alloc(test, size, GFP_KERNEL, ALLOCATE_ANY);
> > > + *expect.addr = 42;
> > > +
> > > + rcu_read_lock();
> > > + test_free(expect.addr);
> > > + KUNIT_EXPECT_EQ(test, *expect.addr, (char)42);
> > > + rcu_read_unlock();
> >
> > It won't happen very often, but memory really could be freed at this point,
> > especially in CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels ...
>
> Ah, thanks for pointing it out.
>
> > > + /* No reports yet, memory should not have been freed on access. */
> > > + KUNIT_EXPECT_FALSE(test, report_available());
> >
> > ... so the above statement needs to go before the rcu_read_unlock().
>
> You mean the comment (and not the KUNIT_EXPECT_FALSE that no reports
> were generated), correct?
>
> Admittedly, the whole comment is a bit imprecise, so I'll reword.

I freely confess that I did not research exactly what might generate
a report. But if this KUNIT_EXPECT_FALSE() was just verifying that the
previous KUNIT_EXPECT_TRUE() did not trigger, then yes, the code is just
fine as it is.

Thanx, Paul

> > > + rcu_barrier(); /* Wait for free to happen. */
> >
> > But you are quite right that the memory is not -guaranteed- to be freed
> > until we get here.
>
> Right, I'll update the comment.
>
> Thanks,
> -- Marco

2020-09-22 09:58:18

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

On Mon, 21 Sep 2020 at 19:44, Will Deacon <[email protected]> wrote:
[...]
> > > > > For ARM64, we would like to solicit feedback on what the best option is
> > > > > to obtain a constant address for __kfence_pool. One option is to declare
> > > > > a memory range in the memory layout to be dedicated to KFENCE (like is
> > > > > done for KASAN), however, it is unclear if this is the best available
> > > > > option. We would like to avoid touching the memory layout.
> > > >
> > > > Sorry for the delay on this.
> > >
> > > NP, thanks for looking!
> > >
> > > > Given that the pool is relatively small (i.e. when compared with our virtual
> > > > address space), dedicating an area of virtual space sounds like it makes
> > > > the most sense here. How early do you need it to be available?
> > >
> > > Yes, having a dedicated address sounds good.
> > > We're inserting kfence_init() into start_kernel() after timekeeping_init().
> > > So way after mm_init(), if that matters.
> >
> > The question is though, how big should that dedicated area be?
> > Right now KFENCE_NUM_OBJECTS can be up to 16383 (which makes the pool
> > size 64MB), but this number actually comes from the limitation on
> > static objects, so we might want to increase that number on arm64.
>
> What happens on x86 and why would we do something different?

On x86 we just do `char __kfence_pool[KFENCE_POOL_SIZE] ...;` to
statically allocate the pool. On arm64 this doesn't seem to work
because static memory doesn't have struct pages?

Thanks,
-- Marco

2020-09-25 15:28:48

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

Will,

> Given that the pool is relatively small (i.e. when compared with our virtual
> address space), dedicating an area of virtual space sounds like it makes
> the most sense here. How early do you need it to be available?

How do we assign struct pages to a fixed virtual space area (I'm
currently experimenting with 0xffff7f0000000000-0xffff7f0000200000)?
Looks like filling page table entries (similarly to what's being done
in arch/arm64/mm/kasan_init.c) is not enough.
I thought maybe vmemmap_populate() would do the job, but it didn't
(virt_to_pfn() still returns invalid PFNs).

2020-09-28 11:57:50

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

On Mon, 21 Sep 2020 at 16:31, Will Deacon <[email protected]> wrote:
> On Mon, Sep 21, 2020 at 03:26:04PM +0200, Marco Elver wrote:
> > Add architecture specific implementation details for KFENCE and enable
> > KFENCE for the arm64 architecture. In particular, this implements the
> > required interface in <asm/kfence.h>. Currently, the arm64 version does
> > not yet use a statically allocated memory pool, at the cost of a pointer
> > load for each is_kfence_address().
[...]
> > For ARM64, we would like to solicit feedback on what the best option is
> > to obtain a constant address for __kfence_pool. One option is to declare
> > a memory range in the memory layout to be dedicated to KFENCE (like is
> > done for KASAN), however, it is unclear if this is the best available
> > option. We would like to avoid touching the memory layout.

> Given that the pool is relatively small (i.e. when compared with our virtual
> address space), dedicating an area of virtual space sounds like it makes
> the most sense here. How early do you need it to be available?

Note: we're going to send a v4 this or next week with a few other
minor fixes in it. But I think we just don't want to block the entire
series on figuring out what the static-pool arm64 version should do,
especially if we'll have a few iterations with only this patch here
changing.

So the plan will be:

1. Send v4, which could from our point-of-view be picked up for
merging. Unless of course there are more comments.

2. Work out the details for the static-pool arm64 version, since it
doesn't seem trivial to do the same thing as we do for x86. In
preparation for that, v4 will allow the __kfence_pool's attributes to
be defined entirely by <asm/kfence.h>, so that we can fiddle with
sections etc.

3. Send patch switching out the simpler arm64 version here for one
that places __kfence_pool at a static location.

Hopefully that plan is reasonable.

Thanks,
-- Marco

2020-09-29 12:24:42

by Andrey Konovalov

[permalink] [raw]
Subject: Re: [PATCH v3 06/10] kfence, kasan: make KFENCE compatible with KASAN

On Mon, Sep 21, 2020 at 3:26 PM Marco Elver <[email protected]> wrote:
>
> From: Alexander Potapenko <[email protected]>
>
> We make KFENCE compatible with KASAN for testing KFENCE itself. In
> particular, KASAN helps to catch any potential corruptions to KFENCE
> state, or other corruptions that may be a result of freepointer
> corruptions in the main allocators.
>
> To indicate that the combination of the two is generally discouraged,
> CONFIG_EXPERT=y should be set. It also gives us the nice property that
> KFENCE will be build-tested by allyesconfig builds.
>
> Reviewed-by: Dmitry Vyukov <[email protected]>
> Co-developed-by: Marco Elver <[email protected]>
> Signed-off-by: Marco Elver <[email protected]>
> Signed-off-by: Alexander Potapenko <[email protected]>
> ---
> lib/Kconfig.kfence | 2 +-
> mm/kasan/common.c | 7 +++++++
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/lib/Kconfig.kfence b/lib/Kconfig.kfence
> index 4c2ea1c722de..6825c1c07a10 100644
> --- a/lib/Kconfig.kfence
> +++ b/lib/Kconfig.kfence
> @@ -10,7 +10,7 @@ config HAVE_ARCH_KFENCE_STATIC_POOL
>
> menuconfig KFENCE
> bool "KFENCE: low-overhead sampling-based memory safety error detector"
> - depends on HAVE_ARCH_KFENCE && !KASAN && (SLAB || SLUB)
> + depends on HAVE_ARCH_KFENCE && (!KASAN || EXPERT) && (SLAB || SLUB)
> depends on JUMP_LABEL # To ensure performance, require jump labels
> select STACKTRACE
> help
> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> index 950fd372a07e..f5c49f0fdeff 100644
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -18,6 +18,7 @@
> #include <linux/init.h>
> #include <linux/kasan.h>
> #include <linux/kernel.h>
> +#include <linux/kfence.h>
> #include <linux/kmemleak.h>
> #include <linux/linkage.h>
> #include <linux/memblock.h>
> @@ -396,6 +397,9 @@ static bool __kasan_slab_free(struct kmem_cache *cache, void *object,
> tagged_object = object;
> object = reset_tag(object);
>
> + if (is_kfence_address(object))
> + return false;
> +
> if (unlikely(nearest_obj(cache, virt_to_head_page(object), object) !=
> object)) {
> kasan_report_invalid_free(tagged_object, ip);
> @@ -444,6 +448,9 @@ static void *__kasan_kmalloc(struct kmem_cache *cache, const void *object,
> if (unlikely(object == NULL))
> return NULL;
>
> + if (is_kfence_address(object))
> + return (void *)object;
> +
> redzone_start = round_up((unsigned long)(object + size),
> KASAN_SHADOW_SCALE_SIZE);
> redzone_end = round_up((unsigned long)object + cache->object_size,
> --
> 2.28.0.681.g6f77f65b4e-goog
>

With KFENCE + KASAN both enabled we need to bail out in all KASAN
hooks that get called from the allocator, right? Do I understand
correctly that these two are the only ones that are called for
KFENCE-allocated objects due to the way KFENCE is integrated into the
allocator?

2020-09-29 13:15:24

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 06/10] kfence, kasan: make KFENCE compatible with KASAN

On Tue, Sep 29, 2020 at 2:21 PM Andrey Konovalov <[email protected]> wrote:
>
> On Mon, Sep 21, 2020 at 3:26 PM Marco Elver <[email protected]> wrote:
> >
> > From: Alexander Potapenko <[email protected]>
> >
> > We make KFENCE compatible with KASAN for testing KFENCE itself. In
> > particular, KASAN helps to catch any potential corruptions to KFENCE
> > state, or other corruptions that may be a result of freepointer
> > corruptions in the main allocators.
> >
> > To indicate that the combination of the two is generally discouraged,
> > CONFIG_EXPERT=y should be set. It also gives us the nice property that
> > KFENCE will be build-tested by allyesconfig builds.
> >
> > Reviewed-by: Dmitry Vyukov <[email protected]>
> > Co-developed-by: Marco Elver <[email protected]>
> > Signed-off-by: Marco Elver <[email protected]>
> > Signed-off-by: Alexander Potapenko <[email protected]>
> > ---
> > lib/Kconfig.kfence | 2 +-
> > mm/kasan/common.c | 7 +++++++
> > 2 files changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/Kconfig.kfence b/lib/Kconfig.kfence
> > index 4c2ea1c722de..6825c1c07a10 100644
> > --- a/lib/Kconfig.kfence
> > +++ b/lib/Kconfig.kfence
> > @@ -10,7 +10,7 @@ config HAVE_ARCH_KFENCE_STATIC_POOL
> >
> > menuconfig KFENCE
> > bool "KFENCE: low-overhead sampling-based memory safety error detector"
> > - depends on HAVE_ARCH_KFENCE && !KASAN && (SLAB || SLUB)
> > + depends on HAVE_ARCH_KFENCE && (!KASAN || EXPERT) && (SLAB || SLUB)
> > depends on JUMP_LABEL # To ensure performance, require jump labels
> > select STACKTRACE
> > help
> > diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> > index 950fd372a07e..f5c49f0fdeff 100644
> > --- a/mm/kasan/common.c
> > +++ b/mm/kasan/common.c
> > @@ -18,6 +18,7 @@
> > #include <linux/init.h>
> > #include <linux/kasan.h>
> > #include <linux/kernel.h>
> > +#include <linux/kfence.h>
> > #include <linux/kmemleak.h>
> > #include <linux/linkage.h>
> > #include <linux/memblock.h>
> > @@ -396,6 +397,9 @@ static bool __kasan_slab_free(struct kmem_cache *cache, void *object,
> > tagged_object = object;
> > object = reset_tag(object);
> >
> > + if (is_kfence_address(object))
> > + return false;
> > +
> > if (unlikely(nearest_obj(cache, virt_to_head_page(object), object) !=
> > object)) {
> > kasan_report_invalid_free(tagged_object, ip);
> > @@ -444,6 +448,9 @@ static void *__kasan_kmalloc(struct kmem_cache *cache, const void *object,
> > if (unlikely(object == NULL))
> > return NULL;
> >
> > + if (is_kfence_address(object))
> > + return (void *)object;
> > +
> > redzone_start = round_up((unsigned long)(object + size),
> > KASAN_SHADOW_SCALE_SIZE);
> > redzone_end = round_up((unsigned long)object + cache->object_size,
> > --
> > 2.28.0.681.g6f77f65b4e-goog
> >
>
> With KFENCE + KASAN both enabled we need to bail out in all KASAN
> hooks that get called from the allocator, right? Do I understand
> correctly that these two are the only ones that are called for
> KFENCE-allocated objects due to the way KFENCE is integrated into the
> allocator?

Yes, these two places were sufficient; we've checked that KFENCE and
KASAN work together.

2020-09-29 13:55:46

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

On Tue, Sep 22, 2020 at 11:56:26AM +0200, Marco Elver wrote:
> On Mon, 21 Sep 2020 at 19:44, Will Deacon <[email protected]> wrote:
> [...]
> > > > > > For ARM64, we would like to solicit feedback on what the best option is
> > > > > > to obtain a constant address for __kfence_pool. One option is to declare
> > > > > > a memory range in the memory layout to be dedicated to KFENCE (like is
> > > > > > done for KASAN), however, it is unclear if this is the best available
> > > > > > option. We would like to avoid touching the memory layout.
> > > > >
> > > > > Sorry for the delay on this.
> > > >
> > > > NP, thanks for looking!
> > > >
> > > > > Given that the pool is relatively small (i.e. when compared with our virtual
> > > > > address space), dedicating an area of virtual space sounds like it makes
> > > > > the most sense here. How early do you need it to be available?
> > > >
> > > > Yes, having a dedicated address sounds good.
> > > > We're inserting kfence_init() into start_kernel() after timekeeping_init().
> > > > So way after mm_init(), if that matters.
> > >
> > > The question is though, how big should that dedicated area be?
> > > Right now KFENCE_NUM_OBJECTS can be up to 16383 (which makes the pool
> > > size 64MB), but this number actually comes from the limitation on
> > > static objects, so we might want to increase that number on arm64.
> >
> > What happens on x86 and why would we do something different?
>
> On x86 we just do `char __kfence_pool[KFENCE_POOL_SIZE] ...;` to
> statically allocate the pool. On arm64 this doesn't seem to work
> because static memory doesn't have struct pages?

Are you using virt_to_page() directly on that statically-allocated
__kfence_pool? If so you'll need to use lm_alias() if so, as is done in
mm/kasan/init.c.

Anything statically allocated is part of the kernel image address range
rather than the linear/direct map, and doesn't have a valid virt addr,
but its linear map alias does.

If you enable CONFIG_DEBUG_VIRTUAL you should get warnings if missing
lm_alias() calls.

Thanks,
Mark.

2020-09-29 14:04:18

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

On Fri, Sep 25, 2020 at 05:25:11PM +0200, Alexander Potapenko wrote:
> Will,
>
> > Given that the pool is relatively small (i.e. when compared with our virtual
> > address space), dedicating an area of virtual space sounds like it makes
> > the most sense here. How early do you need it to be available?
>
> How do we assign struct pages to a fixed virtual space area (I'm
> currently experimenting with 0xffff7f0000000000-0xffff7f0000200000)?

You don't.

There should be a struct page for each of the /physical/ pages, and
these can be found:

* via the physical address, using phyts_to_page() or pfn_to_page()
* via the linear/direct map, using virt_to_page()
* via the vmalloc page tables using vmalloc_to_page()

If you need virt_to_page() to work, the address has to be part of the
linear/direct map.

If you need to find the struct page for something that's part of the
kernel image you can use virt_to_page(lm_alias(x)).

> Looks like filling page table entries (similarly to what's being done
> in arch/arm64/mm/kasan_init.c) is not enough.
> I thought maybe vmemmap_populate() would do the job, but it didn't
> (virt_to_pfn() still returns invalid PFNs).

As above, I think lm_alias() will solve the problem here. Please see
that and CONFIG_DEBUG_VIRTUAL.

Thanks,
Mark.

2020-09-29 14:32:05

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

On Mon, Sep 21, 2020 at 03:26:04PM +0200, Marco Elver wrote:
> Add architecture specific implementation details for KFENCE and enable
> KFENCE for the arm64 architecture. In particular, this implements the
> required interface in <asm/kfence.h>. Currently, the arm64 version does
> not yet use a statically allocated memory pool, at the cost of a pointer
> load for each is_kfence_address().
>
> Reviewed-by: Dmitry Vyukov <[email protected]>
> Co-developed-by: Alexander Potapenko <[email protected]>
> Signed-off-by: Alexander Potapenko <[email protected]>
> Signed-off-by: Marco Elver <[email protected]>
> ---
> For ARM64, we would like to solicit feedback on what the best option is
> to obtain a constant address for __kfence_pool. One option is to declare
> a memory range in the memory layout to be dedicated to KFENCE (like is
> done for KASAN), however, it is unclear if this is the best available
> option. We would like to avoid touching the memory layout.
> ---
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/kfence.h | 39 +++++++++++++++++++++++++++++++++
> arch/arm64/mm/fault.c | 4 ++++
> 3 files changed, 44 insertions(+)
> create mode 100644 arch/arm64/include/asm/kfence.h
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 6d232837cbee..1acc6b2877c3 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -132,6 +132,7 @@ config ARM64
> select HAVE_ARCH_JUMP_LABEL_RELATIVE
> select HAVE_ARCH_KASAN if !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
> select HAVE_ARCH_KASAN_SW_TAGS if HAVE_ARCH_KASAN
> + select HAVE_ARCH_KFENCE if (!ARM64_16K_PAGES && !ARM64_64K_PAGES)
> select HAVE_ARCH_KGDB
> select HAVE_ARCH_MMAP_RND_BITS
> select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
> diff --git a/arch/arm64/include/asm/kfence.h b/arch/arm64/include/asm/kfence.h
> new file mode 100644
> index 000000000000..608dde80e5ca
> --- /dev/null
> +++ b/arch/arm64/include/asm/kfence.h
> @@ -0,0 +1,39 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef __ASM_KFENCE_H
> +#define __ASM_KFENCE_H
> +
> +#include <linux/kfence.h>
> +#include <linux/log2.h>
> +#include <linux/mm.h>
> +
> +#include <asm/cacheflush.h>
> +
> +#define KFENCE_SKIP_ARCH_FAULT_HANDLER "el1_sync"
> +
> +/*
> + * FIXME: Support HAVE_ARCH_KFENCE_STATIC_POOL: Use the statically allocated
> + * __kfence_pool, to avoid the extra pointer load for is_kfence_address(). By
> + * default, however, we do not have struct pages for static allocations.
> + */
> +
> +static inline bool arch_kfence_initialize_pool(void)
> +{
> + const unsigned int num_pages = ilog2(roundup_pow_of_two(KFENCE_POOL_SIZE / PAGE_SIZE));
> + struct page *pages = alloc_pages(GFP_KERNEL, num_pages);
> +
> + if (!pages)
> + return false;
> +
> + __kfence_pool = page_address(pages);
> + return true;
> +}
> +
> +static inline bool kfence_protect_page(unsigned long addr, bool protect)
> +{
> + set_memory_valid(addr, 1, !protect);
> +
> + return true;
> +}

This is only safe if the linear map is force ot page granularity. That's
the default with rodata=full, but this is not always the case, so this
will need some interaction with the MMU setup in arch/arm64/mm/mmu.c.

Thanks,
Mark.

2020-09-29 16:54:28

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

> > On x86 we just do `char __kfence_pool[KFENCE_POOL_SIZE] ...;` to
> > statically allocate the pool. On arm64 this doesn't seem to work
> > because static memory doesn't have struct pages?
>
> Are you using virt_to_page() directly on that statically-allocated
> __kfence_pool? If so you'll need to use lm_alias() if so, as is done in
> mm/kasan/init.c.
>
> Anything statically allocated is part of the kernel image address range
> rather than the linear/direct map, and doesn't have a valid virt addr,
> but its linear map alias does.
>
> If you enable CONFIG_DEBUG_VIRTUAL you should get warnings if missing
> lm_alias() calls.

I just checked that on x86 CONFIG_DEBUG_VIRTUAL prints no warnings on our tests.
virt_addr_valid() also returns true for addresses belonging to
__kfence_pool declared in BSS.
Could this be related to x86 mapping the kernel twice?

2020-09-29 17:06:23

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

On Tue, Sep 29, 2020 at 4:28 PM Mark Rutland <[email protected]> wrote:
>
> On Mon, Sep 21, 2020 at 03:26:04PM +0200, Marco Elver wrote:
> > Add architecture specific implementation details for KFENCE and enable
> > KFENCE for the arm64 architecture. In particular, this implements the
> > required interface in <asm/kfence.h>. Currently, the arm64 version does
> > not yet use a statically allocated memory pool, at the cost of a pointer
> > load for each is_kfence_address().
> >
> > Reviewed-by: Dmitry Vyukov <[email protected]>
> > Co-developed-by: Alexander Potapenko <[email protected]>
> > Signed-off-by: Alexander Potapenko <[email protected]>
> > Signed-off-by: Marco Elver <[email protected]>
> > ---
> > For ARM64, we would like to solicit feedback on what the best option is
> > to obtain a constant address for __kfence_pool. One option is to declare
> > a memory range in the memory layout to be dedicated to KFENCE (like is
> > done for KASAN), however, it is unclear if this is the best available
> > option. We would like to avoid touching the memory layout.
> > ---
> > arch/arm64/Kconfig | 1 +
> > arch/arm64/include/asm/kfence.h | 39 +++++++++++++++++++++++++++++++++
> > arch/arm64/mm/fault.c | 4 ++++
> > 3 files changed, 44 insertions(+)
> > create mode 100644 arch/arm64/include/asm/kfence.h
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 6d232837cbee..1acc6b2877c3 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -132,6 +132,7 @@ config ARM64
> > select HAVE_ARCH_JUMP_LABEL_RELATIVE
> > select HAVE_ARCH_KASAN if !(ARM64_16K_PAGES && ARM64_VA_BITS_48)
> > select HAVE_ARCH_KASAN_SW_TAGS if HAVE_ARCH_KASAN
> > + select HAVE_ARCH_KFENCE if (!ARM64_16K_PAGES && !ARM64_64K_PAGES)
> > select HAVE_ARCH_KGDB
> > select HAVE_ARCH_MMAP_RND_BITS
> > select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
> > diff --git a/arch/arm64/include/asm/kfence.h b/arch/arm64/include/asm/kfence.h
> > new file mode 100644
> > index 000000000000..608dde80e5ca
> > --- /dev/null
> > +++ b/arch/arm64/include/asm/kfence.h
> > @@ -0,0 +1,39 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef __ASM_KFENCE_H
> > +#define __ASM_KFENCE_H
> > +
> > +#include <linux/kfence.h>
> > +#include <linux/log2.h>
> > +#include <linux/mm.h>
> > +
> > +#include <asm/cacheflush.h>
> > +
> > +#define KFENCE_SKIP_ARCH_FAULT_HANDLER "el1_sync"
> > +
> > +/*
> > + * FIXME: Support HAVE_ARCH_KFENCE_STATIC_POOL: Use the statically allocated
> > + * __kfence_pool, to avoid the extra pointer load for is_kfence_address(). By
> > + * default, however, we do not have struct pages for static allocations.
> > + */
> > +
> > +static inline bool arch_kfence_initialize_pool(void)
> > +{
> > + const unsigned int num_pages = ilog2(roundup_pow_of_two(KFENCE_POOL_SIZE / PAGE_SIZE));
> > + struct page *pages = alloc_pages(GFP_KERNEL, num_pages);
> > +
> > + if (!pages)
> > + return false;
> > +
> > + __kfence_pool = page_address(pages);
> > + return true;
> > +}
> > +
> > +static inline bool kfence_protect_page(unsigned long addr, bool protect)
> > +{
> > + set_memory_valid(addr, 1, !protect);
> > +
> > + return true;
> > +}
>
> This is only safe if the linear map is force ot page granularity. That's
> the default with rodata=full, but this is not always the case, so this
> will need some interaction with the MMU setup in arch/arm64/mm/mmu.c.

On x86 we ensure this by reallocating the necessary page tables.

But looks like your suggestion is what we need for arm64 as long as we
also want virt_to_page() to work for our pool.
It's still unclear to me whether a carveout you suggest can be placed
at a fixed (known at link time) address, as the main point of this
dance is to remove memory accesses from is_kfence_addr().

> Thanks,
> Mark.



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

2020-10-01 11:26:39

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

Mark,

> If you need virt_to_page() to work, the address has to be part of the
> linear/direct map.
>
> If you need to find the struct page for something that's part of the
> kernel image you can use virt_to_page(lm_alias(x)).
>
> > Looks like filling page table entries (similarly to what's being done
> > in arch/arm64/mm/kasan_init.c) is not enough.
> > I thought maybe vmemmap_populate() would do the job, but it didn't
> > (virt_to_pfn() still returns invalid PFNs).
>
> As above, I think lm_alias() will solve the problem here. Please see
> that and CONFIG_DEBUG_VIRTUAL.

The approach you suggest works to some extent, but there are some caveats.

To reiterate, we are trying to allocate the pool (2Mb by default, but
users may want a bigger one, up to, say, 64 Mb) in a way that:
(1) The underlying page tables support 4K granularity.
(2) is_kfence_address() (checks that __kfence_pool <= addr <=
__kfence_pool + KFENCE_POOL_SIZE) does not reference memory
(3) For addresses belonging to that pool virt_addr_valid() is true
(SLAB/SLUB rely on that)

On x86 we achieve (2) by making our pool a .bss array, so that its
address is known statically. Aligning that array on 4K and calling
set_memory_4k() ensures that (1) is also fulfilled. (3) seems to just
happen automagically without any address translations.

Now, what we are seeing on arm64 is different.
My understanding (please correct me if I'm wrong) is that on arm64
only the memory range at 0xffff000000000000 has valid struct pages,
and the size of that range depends on the amount of memory on the
system.
This probably means we cannot just pick a fixed address for our pool
in that range, unless it is very close to 0xffff000000000000.

If we allocate the pool statically in the way x86 does (assuming we
somehow resolve (1)), we can apply lm_alias() to addresses returned by
the KFENCE allocator, making kmalloc() always return addresses from
the linear map and satisfying (3).
But in that case is_kfence_address() will also need to be updated to
compare the addresses to lm_alias(__kfence_pool), and this becomes
more heavyweight than just reading the address from memory.

So looks like it's still more preferable to allocate the pool
dynamically on ARM64, unless there's a clever trick to allocate a
fixed address in the linear map (DTS maybe?)

Thanks,
Alex

2020-10-01 17:59:36

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

On Thu, Oct 01, 2020 at 01:24:49PM +0200, Alexander Potapenko wrote:
> Mark,
>
> > If you need virt_to_page() to work, the address has to be part of the
> > linear/direct map.
> >
> > If you need to find the struct page for something that's part of the
> > kernel image you can use virt_to_page(lm_alias(x)).
> >
> > > Looks like filling page table entries (similarly to what's being done
> > > in arch/arm64/mm/kasan_init.c) is not enough.
> > > I thought maybe vmemmap_populate() would do the job, but it didn't
> > > (virt_to_pfn() still returns invalid PFNs).
> >
> > As above, I think lm_alias() will solve the problem here. Please see
> > that and CONFIG_DEBUG_VIRTUAL.
>
> The approach you suggest works to some extent, but there are some caveats.
>
> To reiterate, we are trying to allocate the pool (2Mb by default, but
> users may want a bigger one, up to, say, 64 Mb) in a way that:
> (1) The underlying page tables support 4K granularity.
> (2) is_kfence_address() (checks that __kfence_pool <= addr <=
> __kfence_pool + KFENCE_POOL_SIZE) does not reference memory

What's the underlying requirement here? Is this a performance concern,
codegen/codesize, or something else?

> (3) For addresses belonging to that pool virt_addr_valid() is true
> (SLAB/SLUB rely on that)

As I hinted at before, there's a reasonable amount of code which relies
on being able to round-trip convert (va->{pa,page}->va) allocations from
SLUB, e.g. phys = virt_to_page(addr); ... ; phys = page_to_virt(phys).
Usually this is because the phys addr is stored in some HW register, or
in-memory structure shared with HW.

I'm fairly certain KFENCE will need to support this in order to be
deployable in production, and arm64 is the canary in the coalmine.

I added tests for this back when tag-based KASAN broke this property.
See commit:

b92a953cb7f727c4 ("lib/test_kasan.c: add roundtrip tests")

... for which IIUC the kfree_via_phys() test would be broken by KFENCE,
even on x86:

| static noinline void __init kfree_via_phys(void)
| {
| char *ptr;
| size_t size = 8;
| phys_addr_t phys;
|
| pr_info("invalid-free false positive (via phys)\n");
| ptr = kmalloc(size, GFP_KERNEL);
| if (!ptr) {
| pr_err("Allocation failed\n");
| return;
| }
|
| phys = virt_to_phys(ptr);
| kfree(phys_to_virt(phys));
| }

... since the code will pass the linear map alias of the KFENCE VA into
kfree().

To avoid random breakage we either need to:

* Have KFENCE retain this property (which effectively requires
allocation VAs to fall within the linear/direct map)

* Decide that round-trips are forbidden, and go modify that code
somehow, which was deemed to be impractical in the past

... and I would strongly prefer the former as it's less liable to break any
existing code.

> On x86 we achieve (2) by making our pool a .bss array, so that its
> address is known statically. Aligning that array on 4K and calling
> set_memory_4k() ensures that (1) is also fulfilled. (3) seems to just
> happen automagically without any address translations.
>
> Now, what we are seeing on arm64 is different.
> My understanding (please correct me if I'm wrong) is that on arm64
> only the memory range at 0xffff000000000000 has valid struct pages,
> and the size of that range depends on the amount of memory on the
> system.

The way virt_to_page() works is based on there being a constant (at
runtime) offset between a linear map address and the corresponding
physical page. That makes it easy to get the PA with a subtraction, then
the PFN with a shift, then to index the vmemmap array with that to get
the page. The x86 version of virt_to_page() automatically fixes up an
image address to its linear map alias internally.

> This probably means we cannot just pick a fixed address for our pool
> in that range, unless it is very close to 0xffff000000000000.

It would have to be part of the linear map, or we'd have to apply the
same fixup as x86 does. But as above, I'm reluctant to do that as it
only encourages writing fragile code. The only sensible way to detect
that is to disallow virt_to_*() on image addresses, since that's the
only time we can distinguish the source.

> If we allocate the pool statically in the way x86 does (assuming we
> somehow resolve (1)), we can apply lm_alias() to addresses returned by
> the KFENCE allocator, making kmalloc() always return addresses from
> the linear map and satisfying (3).
> But in that case is_kfence_address() will also need to be updated to
> compare the addresses to lm_alias(__kfence_pool), and this becomes
> more heavyweight than just reading the address from memory.

We can calculate the lm_alias(__kfence_pool) at boot time, so it's only
a read from memory in the fast-path.

> So looks like it's still more preferable to allocate the pool
> dynamically on ARM64, unless there's a clever trick to allocate a
> fixed address in the linear map (DTS maybe?)

I'm not too worried about allocating this dynamically, but:

* The arch code needs to set up the translation tables for this, as we
cannot safely change the mapping granularity live.

* As above I'm fairly certain x86 needs to use a carevout from the
linear map to function correctly anyhow, so we should follow the same
approach for both arm64 and x86. That might be a static carevout that
we figure out the aliasing for, or something entirely dynamic.

Thanks,
Mark.

2020-10-08 09:42:57

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

On Thu, 1 Oct 2020 at 19:58, Mark Rutland <[email protected]> wrote:
[...]
> > > If you need virt_to_page() to work, the address has to be part of the
> > > linear/direct map.
[...]
>
> What's the underlying requirement here? Is this a performance concern,
> codegen/codesize, or something else?

It used to be performance, since is_kfence_address() is used in the
fast path. However, with some further tweaks we just did to
is_kfence_address(), our benchmarks show a pointer load can be
tolerated.

> > (3) For addresses belonging to that pool virt_addr_valid() is true
> > (SLAB/SLUB rely on that)
>
> As I hinted at before, there's a reasonable amount of code which relies
> on being able to round-trip convert (va->{pa,page}->va) allocations from
> SLUB, e.g. phys = virt_to_page(addr); ... ; phys = page_to_virt(phys).
> Usually this is because the phys addr is stored in some HW register, or
> in-memory structure shared with HW.
>
> I'm fairly certain KFENCE will need to support this in order to be
> deployable in production, and arm64 is the canary in the coalmine.
>
> I added tests for this back when tag-based KASAN broke this property.
> See commit:
>
> b92a953cb7f727c4 ("lib/test_kasan.c: add roundtrip tests")
>
> ... for which IIUC the kfree_via_phys() test would be broken by KFENCE,
> even on x86:

Yeah, we're fixing that by also making x86 use a dynamically allocated
pool now. The benefits we got from the static pool no longer apply, so
the whole dance to make the static pool work right is no longer worth
it.

> | static noinline void __init kfree_via_phys(void)
> | {
> | char *ptr;
> | size_t size = 8;
> | phys_addr_t phys;
> |
> | pr_info("invalid-free false positive (via phys)\n");
> | ptr = kmalloc(size, GFP_KERNEL);
> | if (!ptr) {
> | pr_err("Allocation failed\n");
> | return;
> | }
> |
> | phys = virt_to_phys(ptr);
> | kfree(phys_to_virt(phys));
> | }
>
> ... since the code will pass the linear map alias of the KFENCE VA into
> kfree().
>
> To avoid random breakage we either need to:
>
> * Have KFENCE retain this property (which effectively requires
> allocation VAs to fall within the linear/direct map)

^^ Yes, this is the only realistic option.

> * Decide that round-trips are forbidden, and go modify that code
> somehow, which was deemed to be impractical in the past
>
> ... and I would strongly prefer the former as it's less liable to break any
> existing code.
>
> > On x86 we achieve (2) by making our pool a .bss array, so that its
> > address is known statically. Aligning that array on 4K and calling
> > set_memory_4k() ensures that (1) is also fulfilled. (3) seems to just
> > happen automagically without any address translations.
> >
> > Now, what we are seeing on arm64 is different.
> > My understanding (please correct me if I'm wrong) is that on arm64
> > only the memory range at 0xffff000000000000 has valid struct pages,
> > and the size of that range depends on the amount of memory on the
> > system.
>
> The way virt_to_page() works is based on there being a constant (at
> runtime) offset between a linear map address and the corresponding
> physical page. That makes it easy to get the PA with a subtraction, then
> the PFN with a shift, then to index the vmemmap array with that to get
> the page. The x86 version of virt_to_page() automatically fixes up an
> image address to its linear map alias internally.
>
> > This probably means we cannot just pick a fixed address for our pool
> > in that range, unless it is very close to 0xffff000000000000.
>
> It would have to be part of the linear map, or we'd have to apply the
> same fixup as x86 does. But as above, I'm reluctant to do that as it
> only encourages writing fragile code. The only sensible way to detect
> that is to disallow virt_to_*() on image addresses, since that's the
> only time we can distinguish the source.
>
> > If we allocate the pool statically in the way x86 does (assuming we
> > somehow resolve (1)), we can apply lm_alias() to addresses returned by
> > the KFENCE allocator, making kmalloc() always return addresses from
> > the linear map and satisfying (3).
> > But in that case is_kfence_address() will also need to be updated to
> > compare the addresses to lm_alias(__kfence_pool), and this becomes
> > more heavyweight than just reading the address from memory.
>
> We can calculate the lm_alias(__kfence_pool) at boot time, so it's only
> a read from memory in the fast-path.
>
> > So looks like it's still more preferable to allocate the pool
> > dynamically on ARM64, unless there's a clever trick to allocate a
> > fixed address in the linear map (DTS maybe?)
>
> I'm not too worried about allocating this dynamically, but:
>
> * The arch code needs to set up the translation tables for this, as we
> cannot safely change the mapping granularity live.
>
> * As above I'm fairly certain x86 needs to use a carevout from the
> linear map to function correctly anyhow, so we should follow the same
> approach for both arm64 and x86. That might be a static carevout that
> we figure out the aliasing for, or something entirely dynamic.

We're going with dynamically allocating the pool (for both x86 and
arm64), since any benefits we used to measure from the static pool are
no longer measurable (after removing a branch from
is_kfence_address()). It should hopefully simplify a lot of things,
given all the caveats that you pointed out.

For arm64, the only thing left then is to fix up the case if the
linear map is not forced to page granularity.

Thanks,
-- Marco

2020-10-08 11:42:50

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

On Thu, Oct 08, 2020 at 11:40:52AM +0200, Marco Elver wrote:
> On Thu, 1 Oct 2020 at 19:58, Mark Rutland <[email protected]> wrote:
> [...]
> > > > If you need virt_to_page() to work, the address has to be part of the
> > > > linear/direct map.
> [...]
> >
> > What's the underlying requirement here? Is this a performance concern,
> > codegen/codesize, or something else?
>
> It used to be performance, since is_kfence_address() is used in the
> fast path. However, with some further tweaks we just did to
> is_kfence_address(), our benchmarks show a pointer load can be
> tolerated.

Great!

I reckon that this is something we can optimize in futue if necessary
(e.g. with some form of code-patching for immediate values), but it's
good to have a starting point that works everywhere!

[...]

> > I'm not too worried about allocating this dynamically, but:
> >
> > * The arch code needs to set up the translation tables for this, as we
> > cannot safely change the mapping granularity live.
> >
> > * As above I'm fairly certain x86 needs to use a carevout from the
> > linear map to function correctly anyhow, so we should follow the same
> > approach for both arm64 and x86. That might be a static carevout that
> > we figure out the aliasing for, or something entirely dynamic.
>
> We're going with dynamically allocating the pool (for both x86 and
> arm64), since any benefits we used to measure from the static pool are
> no longer measurable (after removing a branch from
> is_kfence_address()). It should hopefully simplify a lot of things,
> given all the caveats that you pointed out.
>
> For arm64, the only thing left then is to fix up the case if the
> linear map is not forced to page granularity.

The simplest way to do this is to modify arm64's arch_add_memory() to
force the entire linear map to be mapped at page granularity when KFENCE
is enabled, something like:

| diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
| index 936c4762dadff..f6eba0642a4a3 100644
| --- a/arch/arm64/mm/mmu.c
| +++ b/arch/arm64/mm/mmu.c
| @@ -1454,7 +1454,8 @@ int arch_add_memory(int nid, u64 start, u64 size,
| {
| int ret, flags = 0;
|
| - if (rodata_full || debug_pagealloc_enabled())
| + if (rodata_full || debug_pagealloc_enabled() ||
| + IS_ENABLED(CONFIG_KFENCE))
| flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
|
| __create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),

... and I given that RODATA_FULL_DEFAULT_ENABLED is the default, I
suspect it's not worth trying to only for that for the KFENCE region
unless someone complains.

Thanks,
Mark.

2020-10-14 19:17:40

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

On Thu, 8 Oct 2020 at 12:45, Mark Rutland <[email protected]> wrote:
> On Thu, Oct 08, 2020 at 11:40:52AM +0200, Marco Elver wrote:
> > On Thu, 1 Oct 2020 at 19:58, Mark Rutland <[email protected]> wrote:
> > [...]
> > > > > If you need virt_to_page() to work, the address has to be part of the
> > > > > linear/direct map.
> > [...]
> > >
> > > What's the underlying requirement here? Is this a performance concern,
> > > codegen/codesize, or something else?
> >
> > It used to be performance, since is_kfence_address() is used in the
> > fast path. However, with some further tweaks we just did to
> > is_kfence_address(), our benchmarks show a pointer load can be
> > tolerated.
>
> Great!
>
> I reckon that this is something we can optimize in futue if necessary
> (e.g. with some form of code-patching for immediate values), but it's
> good to have a starting point that works everywhere!
>
> [...]
>
> > > I'm not too worried about allocating this dynamically, but:
> > >
> > > * The arch code needs to set up the translation tables for this, as we
> > > cannot safely change the mapping granularity live.
> > >
> > > * As above I'm fairly certain x86 needs to use a carevout from the
> > > linear map to function correctly anyhow, so we should follow the same
> > > approach for both arm64 and x86. That might be a static carevout that
> > > we figure out the aliasing for, or something entirely dynamic.
> >
> > We're going with dynamically allocating the pool (for both x86 and
> > arm64), since any benefits we used to measure from the static pool are
> > no longer measurable (after removing a branch from
> > is_kfence_address()). It should hopefully simplify a lot of things,
> > given all the caveats that you pointed out.
> >
> > For arm64, the only thing left then is to fix up the case if the
> > linear map is not forced to page granularity.
>
> The simplest way to do this is to modify arm64's arch_add_memory() to
> force the entire linear map to be mapped at page granularity when KFENCE
> is enabled, something like:
>
[...]
>
> ... and I given that RODATA_FULL_DEFAULT_ENABLED is the default, I
> suspect it's not worth trying to only for that for the KFENCE region
> unless someone complains.

We've got most of this sorted now for v5 -- thank you!

The only thing we're wondering now, is if there are any corner cases
with using memblock_alloc'd memory for the KFENCE pool? (We'd like to
avoid page alloc's MAX_ORDER limit.) We have a version that passes
tests on x86 and arm64, but checking just in case. :-)

Thanks,
-- Marco

2020-10-15 13:43:07

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

On Wed, Oct 14, 2020 at 09:12:37PM +0200, Marco Elver wrote:
> On Thu, 8 Oct 2020 at 12:45, Mark Rutland <[email protected]> wrote:
> > On Thu, Oct 08, 2020 at 11:40:52AM +0200, Marco Elver wrote:
> > > On Thu, 1 Oct 2020 at 19:58, Mark Rutland <[email protected]> wrote:

> > > > > > If you need virt_to_page() to work, the address has to be part of the
> > > > > > linear/direct map.

> > > We're going with dynamically allocating the pool (for both x86 and
> > > arm64),

[...]

> We've got most of this sorted now for v5 -- thank you!
>
> The only thing we're wondering now, is if there are any corner cases
> with using memblock_alloc'd memory for the KFENCE pool? (We'd like to
> avoid page alloc's MAX_ORDER limit.) We have a version that passes
> tests on x86 and arm64, but checking just in case. :-)

AFAICT otherwise the only noticeable difference might be PageSlab(), if
that's clear for KFENCE allocated pages? A few helpers appear to check
that to determine how something was allocated (e.g. in the scatterlist
and hwpoison code), and I suspect that needs to behave the same.

Otherwise, I *think* using memblock_alloc should be fine on arm64; I'm
not entirely sure for x86 (but suspect it's similar). On arm64:

* All memory is given a struct page via memblocks_present() adding all
memory memblocks. This includes memory allocated by memblock_alloc().

* All memory is mapped into the linear map via arm64's map_mem() adding
all (non-nomap) memory memblocks. This includes memory allocated by
memblock_alloc().

Thanks,
Mark.

2020-10-15 14:17:18

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v3 03/10] arm64, kfence: enable KFENCE for ARM64

On Thu, 15 Oct 2020 at 15:39, Mark Rutland <[email protected]> wrote:
> On Wed, Oct 14, 2020 at 09:12:37PM +0200, Marco Elver wrote:
> > On Thu, 8 Oct 2020 at 12:45, Mark Rutland <[email protected]> wrote:
> > > On Thu, Oct 08, 2020 at 11:40:52AM +0200, Marco Elver wrote:
> > > > On Thu, 1 Oct 2020 at 19:58, Mark Rutland <[email protected]> wrote:
>
> > > > > > > If you need virt_to_page() to work, the address has to be part of the
> > > > > > > linear/direct map.
>
> > > > We're going with dynamically allocating the pool (for both x86 and
> > > > arm64),
>
> [...]
>
> > We've got most of this sorted now for v5 -- thank you!
> >
> > The only thing we're wondering now, is if there are any corner cases
> > with using memblock_alloc'd memory for the KFENCE pool? (We'd like to
> > avoid page alloc's MAX_ORDER limit.) We have a version that passes
> > tests on x86 and arm64, but checking just in case. :-)
>
> AFAICT otherwise the only noticeable difference might be PageSlab(), if
> that's clear for KFENCE allocated pages? A few helpers appear to check
> that to determine how something was allocated (e.g. in the scatterlist
> and hwpoison code), and I suspect that needs to behave the same.

We had to take care of setting PageSlab before, too. We do this during
kfence_init().

> Otherwise, I *think* using memblock_alloc should be fine on arm64; I'm
> not entirely sure for x86 (but suspect it's similar). On arm64:
>
> * All memory is given a struct page via memblocks_present() adding all
> memory memblocks. This includes memory allocated by memblock_alloc().
>
> * All memory is mapped into the linear map via arm64's map_mem() adding
> all (non-nomap) memory memblocks. This includes memory allocated by
> memblock_alloc().

Very good, thank you. We'll send v5 with these changes rebased on
5.10-rc1 (in ~2 weeks).

Thanks,
-- Marco