2022-03-29 16:42:18

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 00/48] Add KernelMemorySanitizer infrastructure

KernelMemorySanitizer (KMSAN) is a detector of errors related to uses of
uninitialized memory. It relies on compile-time Clang instrumentation
(similar to MSan in the userspace [1]) and tracks the state of every bit
of kernel memory, being able to report an error if uninitialized value is
used in a condition, dereferenced, or escapes to userspace, USB or DMA.

KMSAN has reported more than 300 bugs in the past few years (recently
fixed bugs: [2]), most of them with the help of syzkaller. Such bugs
keep getting introduced into the kernel despite new compiler warnings and
other analyses (the 5.16 cycle already resulted in several KMSAN-reported
bugs, e.g. [3]). Mitigations like total stack and heap initialization are
unfortunately very far from being deployable.

The proposed patchset contains KMSAN runtime implementation together with
small changes to other subsystems needed to make KMSAN work.

The latter changes fall into several categories:

1. Changes and refactorings of existing code required to add KMSAN:
- [1/48] x86: add missing include to sparsemem.h
- [2/48] stackdepot: reserve 5 extra bits in depot_stack_handle_t
- [3/48] kasan: common: adapt to the new prototype of __stack_depot_save()
- [4/48] instrumented.h: allow instrumenting both sides of copy_from_user()
- [5/48] x86: asm: instrument usercopy in get_user() and __put_user_size()
- [6/48] asm-generic: instrument usercopy in cacheflush.h
- [11/48] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
- [12/48] kcsan: clang: retire CONFIG_KCSAN_KCOV_BROKEN

2. KMSAN-related declarations in generic code, KMSAN runtime library,
docs and configs:
- [7/48] kmsan: add ReST documentation
- [8/48] kmsan: introduce __no_sanitize_memory and __no_kmsan_checks
- [10/48] x86: kmsan: pgtable: reduce vmalloc space
- [13/48] kmsan: add KMSAN runtime core
- [16/48] MAINTAINERS: add entry for KMSAN
- [30/48] kmsan: add tests for KMSAN
- [38/48] objtool: kmsan: list KMSAN API functions as uaccess-safe
- [43/48] x86: kmsan: use __msan_ string functions where possible.
- [48/48] x86: kmsan: enable KMSAN builds for x86

3. Adding hooks from different subsystems to notify KMSAN about memory
state changes:
- [17/48] kmsan: mm: maintain KMSAN metadata for page operations
- [18/48] kmsan: mm: call KMSAN hooks from SLUB code
- [19/48] kmsan: handle task creation and exiting
- [20/48] kmsan: init: call KMSAN initialization routines
- [21/48] instrumented.h: add KMSAN support
- [23/48] kmsan: add iomap support
- [24/48] Input: libps2: mark data received in __ps2_command() as initialized
- [25/48] kmsan: dma: unpoison DMA mappings
- [42/48] x86: kmsan: handle open-coded assembly in lib/iomem.c
- [44/48] x86: kmsan: sync metadata pages on page fault

4. Changes that prevent false reports by explicitly initializing memory,
disabling optimized code that may trick KMSAN, selectively skipping
instrumentation:
- [14/48] kmsan: implement kmsan_init(), initialize READ_ONCE_NOCHECK()
- [15/48] kmsan: disable instrumentation of unsupported common kernel code
- [22/48] kmsan: unpoison @tlb in arch_tlb_gather_mmu()
- [26/48] kmsan: virtio: check/unpoison scatterlist in vring_map_one_sg()
- [27/48] kmsan: handle memory sent to/from USB
- [31/48] kernel: kmsan: don't instrument stacktrace.c
- [32/48] kmsan: disable strscpy() optimization under KMSAN
- [33/48] crypto: kmsan: disable accelerated configs under KMSAN
- [34/48] kmsan: disable physical page merging in biovec
- [35/48] kmsan: block: skip bio block merging logic for KMSAN
- [36/48] kmsan: kcov: unpoison area->list in kcov_remote_area_put()
- [37/48] security: kmsan: fix interoperability with auto-initialization
- [39/48] x86: kmsan: make READ_ONCE_TASK_STACK() return initialized values
- [40/48] x86: kmsan: disable instrumentation of unsupported code
- [41/48] x86: kmsan: skip shadow checks in __switch_to()
- [45/48] x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN
- [46/48] x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS

5. Noinstr handling:
- [9/48] kmsan: mark noinstr as __no_sanitize_memory
- [28/48] kmsan: instrumentation.h: add instrumentation_begin_with_regs()
- [29/48] kmsan: entry: handle register passing from uninstrumented code
- [47/48] x86: kmsan: handle register passing from uninstrumented code

This patchset allows one to boot and run a defconfig+KMSAN kernel on a
QEMU without known false positives. It however doesn't guarantee there
are no false positives in drivers of certain devices or less tested
subsystems, although KMSAN is actively tested on syzbot with a large
config.

The patchset was generated relative to Linux v5.17. The most
up-to-date KMSAN tree currently resides at
https://github.com/google/kmsan/.
One may find it handy to review these patches in Gerrit:
https://linux-review.googlesource.com/c/linux/kernel/git/torvalds/linux/+/12604/25

A huge thanks goes to the reviewers of the RFC patch series sent to LKML
in 2020
(https://lore.kernel.org/all/[email protected]/).

[1] https://clang.llvm.org/docs/MemorySanitizer.html
[2] https://syzkaller.appspot.com/upstream/fixed?manager=ci-upstream-kmsan-gce
[3] https://lore.kernel.org/all/[email protected]/


Alexander Potapenko (47):
stackdepot: reserve 5 extra bits in depot_stack_handle_t
kasan: common: adapt to the new prototype of __stack_depot_save()
instrumented.h: allow instrumenting both sides of copy_from_user()
x86: asm: instrument usercopy in get_user() and __put_user_size()
asm-generic: instrument usercopy in cacheflush.h
kmsan: add ReST documentation
kmsan: introduce __no_sanitize_memory and __no_kmsan_checks
kmsan: mark noinstr as __no_sanitize_memory
x86: kmsan: pgtable: reduce vmalloc space
libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
kcsan: clang: retire CONFIG_KCSAN_KCOV_BROKEN
kmsan: add KMSAN runtime core
kmsan: implement kmsan_init(), initialize READ_ONCE_NOCHECK()
kmsan: disable instrumentation of unsupported common kernel code
MAINTAINERS: add entry for KMSAN
kmsan: mm: maintain KMSAN metadata for page operations
kmsan: mm: call KMSAN hooks from SLUB code
kmsan: handle task creation and exiting
kmsan: init: call KMSAN initialization routines
instrumented.h: add KMSAN support
kmsan: unpoison @tlb in arch_tlb_gather_mmu()
kmsan: add iomap support
Input: libps2: mark data received in __ps2_command() as initialized
kmsan: dma: unpoison DMA mappings
kmsan: virtio: check/unpoison scatterlist in vring_map_one_sg()
kmsan: handle memory sent to/from USB
kmsan: instrumentation.h: add instrumentation_begin_with_regs()
kmsan: entry: handle register passing from uninstrumented code
kmsan: add tests for KMSAN
kernel: kmsan: don't instrument stacktrace.c
kmsan: disable strscpy() optimization under KMSAN
crypto: kmsan: disable accelerated configs under KMSAN
kmsan: disable physical page merging in biovec
kmsan: block: skip bio block merging logic for KMSAN
kmsan: kcov: unpoison area->list in kcov_remote_area_put()
security: kmsan: fix interoperability with auto-initialization
objtool: kmsan: list KMSAN API functions as uaccess-safe
x86: kmsan: make READ_ONCE_TASK_STACK() return initialized values
x86: kmsan: disable instrumentation of unsupported code
x86: kmsan: skip shadow checks in __switch_to()
x86: kmsan: handle open-coded assembly in lib/iomem.c
x86: kmsan: use __msan_ string functions where possible.
x86: kmsan: sync metadata pages on page fault
x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for
KASAN/KMSAN
x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS
x86: kmsan: handle register passing from uninstrumented code
x86: kmsan: enable KMSAN builds for x86

Dmitry Vyukov (1):
x86: add missing include to sparsemem.h

Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/kmsan.rst | 414 ++++++++++++++++++
MAINTAINERS | 12 +
Makefile | 1 +
arch/x86/Kconfig | 9 +-
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/entry/common.c | 3 +-
arch/x86/entry/vdso/Makefile | 3 +
arch/x86/include/asm/checksum.h | 16 +-
arch/x86/include/asm/idtentry.h | 10 +-
arch/x86/include/asm/page_64.h | 13 +
arch/x86/include/asm/pgtable_64_types.h | 41 +-
arch/x86/include/asm/sparsemem.h | 2 +
arch/x86/include/asm/string_64.h | 23 +-
arch/x86/include/asm/uaccess.h | 7 +
arch/x86/include/asm/unwind.h | 23 +-
arch/x86/kernel/Makefile | 2 +
arch/x86/kernel/cpu/Makefile | 1 +
arch/x86/kernel/cpu/mce/core.c | 2 +-
arch/x86/kernel/kvm.c | 2 +-
arch/x86/kernel/nmi.c | 2 +-
arch/x86/kernel/process_64.c | 1 +
arch/x86/kernel/sev.c | 4 +-
arch/x86/kernel/traps.c | 14 +-
arch/x86/lib/Makefile | 2 +
arch/x86/lib/iomem.c | 5 +
arch/x86/mm/Makefile | 2 +
arch/x86/mm/fault.c | 25 +-
arch/x86/mm/init_64.c | 2 +-
arch/x86/mm/ioremap.c | 3 +
arch/x86/realmode/rm/Makefile | 1 +
block/bio.c | 2 +
block/blk.h | 7 +
crypto/Kconfig | 30 ++
drivers/firmware/efi/libstub/Makefile | 1 +
drivers/input/serio/libps2.c | 5 +-
drivers/net/Kconfig | 1 +
drivers/nvdimm/nd.h | 2 +-
drivers/nvdimm/pfn_devs.c | 2 +-
drivers/usb/core/urb.c | 2 +
drivers/virtio/virtio_ring.c | 10 +-
include/asm-generic/cacheflush.h | 9 +-
include/asm-generic/rwonce.h | 5 +-
include/linux/compiler-clang.h | 23 +
include/linux/compiler-gcc.h | 6 +
include/linux/compiler_types.h | 3 +-
include/linux/fortify-string.h | 2 +
include/linux/highmem.h | 3 +
include/linux/instrumentation.h | 6 +
include/linux/instrumented.h | 26 +-
include/linux/kmsan-checks.h | 123 ++++++
include/linux/kmsan.h | 359 ++++++++++++++++
include/linux/mm_types.h | 12 +
include/linux/sched.h | 5 +
include/linux/stackdepot.h | 8 +
include/linux/uaccess.h | 19 +-
init/main.c | 3 +
kernel/Makefile | 6 +
kernel/dma/mapping.c | 9 +-
kernel/entry/common.c | 22 +-
kernel/exit.c | 2 +
kernel/fork.c | 2 +
kernel/kcov.c | 7 +
kernel/locking/Makefile | 3 +-
lib/Kconfig.debug | 1 +
lib/Kconfig.kcsan | 11 -
lib/Kconfig.kmsan | 39 ++
lib/Makefile | 1 +
lib/iomap.c | 40 ++
lib/iov_iter.c | 9 +-
lib/stackdepot.c | 29 +-
lib/string.c | 8 +
lib/usercopy.c | 3 +-
mm/Makefile | 1 +
mm/internal.h | 6 +
mm/kasan/common.c | 2 +-
mm/kmsan/Makefile | 26 ++
mm/kmsan/annotations.c | 28 ++
mm/kmsan/core.c | 463 ++++++++++++++++++++
mm/kmsan/hooks.c | 384 +++++++++++++++++
mm/kmsan/init.c | 240 +++++++++++
mm/kmsan/instrumentation.c | 267 ++++++++++++
mm/kmsan/kmsan.h | 188 +++++++++
mm/kmsan/kmsan_test.c | 536 ++++++++++++++++++++++++
mm/kmsan/report.c | 211 ++++++++++
mm/kmsan/shadow.c | 336 +++++++++++++++
mm/memory.c | 2 +
mm/mmu_gather.c | 10 +
mm/page_alloc.c | 18 +
mm/slab.h | 1 +
mm/slub.c | 21 +-
mm/vmalloc.c | 20 +-
scripts/Makefile.kmsan | 1 +
scripts/Makefile.lib | 9 +
security/Kconfig.hardening | 4 +
tools/objtool/check.c | 19 +
97 files changed, 4209 insertions(+), 98 deletions(-)
create mode 100644 Documentation/dev-tools/kmsan.rst
create mode 100644 include/linux/kmsan-checks.h
create mode 100644 include/linux/kmsan.h
create mode 100644 lib/Kconfig.kmsan
create mode 100644 mm/kmsan/Makefile
create mode 100644 mm/kmsan/annotations.c
create mode 100644 mm/kmsan/core.c
create mode 100644 mm/kmsan/hooks.c
create mode 100644 mm/kmsan/init.c
create mode 100644 mm/kmsan/instrumentation.c
create mode 100644 mm/kmsan/kmsan.h
create mode 100644 mm/kmsan/kmsan_test.c
create mode 100644 mm/kmsan/report.c
create mode 100644 mm/kmsan/shadow.c
create mode 100644 scripts/Makefile.kmsan

--
2.35.1.1021.g381101b075-goog


2022-03-29 17:39:40

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 20/48] kmsan: init: call KMSAN initialization routines

kmsan_init_shadow() scans the mappings created at boot time and creates
metadata pages for those mappings.

When the memblock allocator returns pages to pagealloc, we reserve 2/3
of those pages and use them as metadata for the remaining 1/3. Once KMSAN
starts, every page allocated by pagealloc has its associated shadow and
origin pages.

kmsan_initialize() initializes the bookkeeping for init_task and enables
KMSAN.

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- move mm/kmsan/init.c and kmsan_memblock_free_pages() to this patch
-- print a warning that KMSAN is a debugging tool (per Greg K-H's
request)

Link: https://linux-review.googlesource.com/id/I7bc53706141275914326df2345881ffe0cdd16bd
---
include/linux/kmsan.h | 48 +++++++++
init/main.c | 3 +
mm/kmsan/Makefile | 3 +-
mm/kmsan/init.c | 240 ++++++++++++++++++++++++++++++++++++++++++
mm/kmsan/kmsan.h | 3 +
mm/kmsan/shadow.c | 36 +++++++
mm/page_alloc.c | 3 +
7 files changed, 335 insertions(+), 1 deletion(-)
create mode 100644 mm/kmsan/init.c

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index dca42e0e91991..a5767c728a46b 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -52,6 +52,40 @@ void kmsan_task_create(struct task_struct *task);
*/
void kmsan_task_exit(struct task_struct *task);

+/**
+ * kmsan_init_shadow() - Initialize KMSAN shadow at boot time.
+ *
+ * Allocate and initialize KMSAN metadata for early allocations.
+ */
+void __init kmsan_init_shadow(void);
+
+/**
+ * kmsan_init_runtime() - Initialize KMSAN state and enable KMSAN.
+ */
+void __init kmsan_init_runtime(void);
+
+/**
+ * kmsan_memblock_free_pages() - handle freeing of memblock pages.
+ * @page: struct page to free.
+ * @order: order of @page.
+ *
+ * Freed pages are either returned to buddy allocator or held back to be used
+ * as metadata pages.
+ */
+bool __init kmsan_memblock_free_pages(struct page *page, unsigned int order);
+
+/**
+ * kmsan_task_create() - Initialize KMSAN state for the task.
+ * @task: task to initialize.
+ */
+void kmsan_task_create(struct task_struct *task);
+
+/**
+ * kmsan_task_exit() - Notify KMSAN that a task has exited.
+ * @task: task about to finish.
+ */
+void kmsan_task_exit(struct task_struct *task);
+
/**
* kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
* @page: struct page pointer returned by alloc_pages().
@@ -173,6 +207,20 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);

#else

+static inline void kmsan_init_shadow(void)
+{
+}
+
+static inline void kmsan_init_runtime(void)
+{
+}
+
+static inline bool kmsan_memblock_free_pages(struct page *page,
+ unsigned int order)
+{
+ return true;
+}
+
static inline void kmsan_task_create(struct task_struct *task)
{
}
diff --git a/init/main.c b/init/main.c
index 65fa2e41a9c09..88be337b54298 100644
--- a/init/main.c
+++ b/init/main.c
@@ -34,6 +34,7 @@
#include <linux/percpu.h>
#include <linux/kmod.h>
#include <linux/kprobes.h>
+#include <linux/kmsan.h>
#include <linux/vmalloc.h>
#include <linux/kernel_stat.h>
#include <linux/start_kernel.h>
@@ -834,6 +835,7 @@ static void __init mm_init(void)
init_mem_debugging_and_hardening();
kfence_alloc_pool();
report_meminit();
+ kmsan_init_shadow();
stack_depot_early_init();
mem_init();
mem_init_print_info();
@@ -851,6 +853,7 @@ static void __init mm_init(void)
init_espfix_bsp();
/* Should be run after espfix64 is set up. */
pti_init();
+ kmsan_init_runtime();
}

#ifdef CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
index 73b705cbf75b9..f57a956cb1c8b 100644
--- a/mm/kmsan/Makefile
+++ b/mm/kmsan/Makefile
@@ -1,4 +1,4 @@
-obj-y := core.o instrumentation.o hooks.o report.o shadow.o annotations.o
+obj-y := core.o instrumentation.o init.o hooks.o report.o shadow.o annotations.o

KMSAN_SANITIZE := n
KCOV_INSTRUMENT := n
@@ -16,6 +16,7 @@ CFLAGS_REMOVE.o = $(CC_FLAGS_FTRACE)
CFLAGS_annotations.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_core.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_hooks.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_init.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c
new file mode 100644
index 0000000000000..45757d1390402
--- /dev/null
+++ b/mm/kmsan/init.c
@@ -0,0 +1,240 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN initialization routines.
+ *
+ * Copyright (C) 2017-2021 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#include "kmsan.h"
+
+#include <asm/sections.h>
+#include <linux/mm.h>
+#include <linux/memblock.h>
+
+#include "../internal.h"
+
+#define NUM_FUTURE_RANGES 128
+struct start_end_pair {
+ u64 start, end;
+};
+
+static struct start_end_pair start_end_pairs[NUM_FUTURE_RANGES] __initdata;
+static int future_index __initdata;
+
+/*
+ * Record a range of memory for which the metadata pages will be created once
+ * the page allocator becomes available.
+ */
+static void __init kmsan_record_future_shadow_range(void *start, void *end)
+{
+ u64 nstart = (u64)start, nend = (u64)end, cstart, cend;
+ bool merged = false;
+ int i;
+
+ KMSAN_WARN_ON(future_index == NUM_FUTURE_RANGES);
+ KMSAN_WARN_ON((nstart >= nend) || !nstart || !nend);
+ nstart = ALIGN_DOWN(nstart, PAGE_SIZE);
+ nend = ALIGN(nend, PAGE_SIZE);
+
+ /*
+ * Scan the existing ranges to see if any of them overlaps with
+ * [start, end). In that case, merge the two ranges instead of
+ * creating a new one.
+ * The number of ranges is less than 20, so there is no need to organize
+ * them into a more intelligent data structure.
+ */
+ for (i = 0; i < future_index; i++) {
+ cstart = start_end_pairs[i].start;
+ cend = start_end_pairs[i].end;
+ if ((cstart < nstart && cend < nstart) ||
+ (cstart > nend && cend > nend))
+ /* ranges are disjoint - do not merge */
+ continue;
+ start_end_pairs[i].start = min(nstart, cstart);
+ start_end_pairs[i].end = max(nend, cend);
+ merged = true;
+ break;
+ }
+ if (merged)
+ return;
+ start_end_pairs[future_index].start = nstart;
+ start_end_pairs[future_index].end = nend;
+ future_index++;
+}
+
+/*
+ * Initialize the shadow for existing mappings during kernel initialization.
+ * These include kernel text/data sections, NODE_DATA and future ranges
+ * registered while creating other data (e.g. percpu).
+ *
+ * Allocations via memblock can be only done before slab is initialized.
+ */
+void __init kmsan_init_shadow(void)
+{
+ const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
+ phys_addr_t p_start, p_end;
+ int nid;
+ u64 i;
+
+ for_each_reserved_mem_range(i, &p_start, &p_end)
+ kmsan_record_future_shadow_range(phys_to_virt(p_start),
+ phys_to_virt(p_end));
+ /* Allocate shadow for .data */
+ kmsan_record_future_shadow_range(_sdata, _edata);
+
+ for_each_online_node(nid)
+ kmsan_record_future_shadow_range(
+ NODE_DATA(nid), (char *)NODE_DATA(nid) + nd_size);
+
+ for (i = 0; i < future_index; i++)
+ kmsan_init_alloc_meta_for_range(
+ (void *)start_end_pairs[i].start,
+ (void *)start_end_pairs[i].end);
+}
+EXPORT_SYMBOL(kmsan_init_shadow);
+
+struct page_pair {
+ struct page *shadow, *origin;
+};
+static struct page_pair held_back[MAX_ORDER] __initdata;
+
+/*
+ * Eager metadata allocation. When the memblock allocator is freeing pages to
+ * pagealloc, we use 2/3 of them as metadata for the remaining 1/3.
+ * We store the pointers to the returned blocks of pages in held_back[] grouped
+ * by their order: when kmsan_memblock_free_pages() is called for the first
+ * time with a certain order, it is reserved as a shadow block, for the second
+ * time - as an origin block. On the third time the incoming block receives its
+ * shadow and origin ranges from the previously saved shadow and origin blocks,
+ * after which held_back[order] can be used again.
+ *
+ * At the very end there may be leftover blocks in held_back[]. They are
+ * collected later by kmsan_memblock_discard().
+ */
+bool kmsan_memblock_free_pages(struct page *page, unsigned int order)
+{
+ struct page *shadow, *origin;
+
+ if (!held_back[order].shadow) {
+ held_back[order].shadow = page;
+ return false;
+ }
+ if (!held_back[order].origin) {
+ held_back[order].origin = page;
+ return false;
+ }
+ shadow = held_back[order].shadow;
+ origin = held_back[order].origin;
+ kmsan_setup_meta(page, shadow, origin, order);
+
+ held_back[order].shadow = NULL;
+ held_back[order].origin = NULL;
+ return true;
+}
+
+#define MAX_BLOCKS 8
+struct smallstack {
+ struct page *items[MAX_BLOCKS];
+ int index;
+ int order;
+};
+
+struct smallstack collect = {
+ .index = 0,
+ .order = MAX_ORDER,
+};
+
+static void smallstack_push(struct smallstack *stack, struct page *pages)
+{
+ KMSAN_WARN_ON(stack->index == MAX_BLOCKS);
+ stack->items[stack->index] = pages;
+ stack->index++;
+}
+#undef MAX_BLOCKS
+
+static struct page *smallstack_pop(struct smallstack *stack)
+{
+ struct page *ret;
+
+ KMSAN_WARN_ON(stack->index == 0);
+ stack->index--;
+ ret = stack->items[stack->index];
+ stack->items[stack->index] = NULL;
+ return ret;
+}
+
+static void do_collection(void)
+{
+ struct page *page, *shadow, *origin;
+
+ while (collect.index >= 3) {
+ page = smallstack_pop(&collect);
+ shadow = smallstack_pop(&collect);
+ origin = smallstack_pop(&collect);
+ kmsan_setup_meta(page, shadow, origin, collect.order);
+ __free_pages_core(page, collect.order);
+ }
+}
+
+static void collect_split(void)
+{
+ struct smallstack tmp = {
+ .order = collect.order - 1,
+ .index = 0,
+ };
+ struct page *page;
+
+ if (!collect.order)
+ return;
+ while (collect.index) {
+ page = smallstack_pop(&collect);
+ smallstack_push(&tmp, &page[0]);
+ smallstack_push(&tmp, &page[1 << tmp.order]);
+ }
+ __memcpy(&collect, &tmp, sizeof(struct smallstack));
+}
+
+/*
+ * Memblock is about to go away. Split the page blocks left over in held_back[]
+ * and return 1/3 of that memory to the system.
+ */
+static void kmsan_memblock_discard(void)
+{
+ int i;
+
+ /*
+ * For each order=N:
+ * - push held_back[N].shadow and .origin to |collect|;
+ * - while there are >= 3 elements in |collect|, do garbage collection:
+ * - pop 3 ranges from |collect|;
+ * - use two of them as shadow and origin for the third one;
+ * - repeat;
+ * - split each remaining element from |collect| into 2 ranges of
+ * order=N-1,
+ * - repeat.
+ */
+ collect.order = MAX_ORDER - 1;
+ for (i = MAX_ORDER - 1; i >= 0; i--) {
+ if (held_back[i].shadow)
+ smallstack_push(&collect, held_back[i].shadow);
+ if (held_back[i].origin)
+ smallstack_push(&collect, held_back[i].origin);
+ held_back[i].shadow = NULL;
+ held_back[i].origin = NULL;
+ do_collection();
+ collect_split();
+ }
+}
+
+void __init kmsan_init_runtime(void)
+{
+ /* Assuming current is init_task */
+ kmsan_internal_task_create(current);
+ kmsan_memblock_discard();
+ pr_info("Starting KernelMemorySanitizer\n");
+ pr_info("ATTENTION: KMSAN is a debugging tool! Do not use it on production machines!\n");
+ kmsan_enabled = true;
+}
+EXPORT_SYMBOL(kmsan_init_runtime);
diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
index a1b5900ffd97b..059f21c39ec1b 100644
--- a/mm/kmsan/kmsan.h
+++ b/mm/kmsan/kmsan.h
@@ -66,6 +66,7 @@ struct shadow_origin_ptr {
struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *addr, u64 size,
bool store);
void *kmsan_get_metadata(void *addr, bool is_origin);
+void __init kmsan_init_alloc_meta_for_range(void *start, void *end);

enum kmsan_bug_reason {
REASON_ANY,
@@ -181,5 +182,7 @@ bool kmsan_internal_is_module_addr(void *vaddr);
bool kmsan_internal_is_vmalloc_addr(void *addr);

struct page *kmsan_vmalloc_to_page_or_null(void *vaddr);
+void kmsan_setup_meta(struct page *page, struct page *shadow,
+ struct page *origin, int order);

#endif /* __MM_KMSAN_KMSAN_H */
diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
index 8fe6a5ed05e67..99cb9436eddc6 100644
--- a/mm/kmsan/shadow.c
+++ b/mm/kmsan/shadow.c
@@ -298,3 +298,39 @@ void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
kfree(s_pages);
kfree(o_pages);
}
+
+/* Allocate metadata for pages allocated at boot time. */
+void __init kmsan_init_alloc_meta_for_range(void *start, void *end)
+{
+ struct page *shadow_p, *origin_p;
+ void *shadow, *origin;
+ struct page *page;
+ u64 addr, size;
+
+ start = (void *)ALIGN_DOWN((u64)start, PAGE_SIZE);
+ size = ALIGN((u64)end - (u64)start, PAGE_SIZE);
+ shadow = memblock_alloc(size, PAGE_SIZE);
+ origin = memblock_alloc(size, PAGE_SIZE);
+ for (addr = 0; addr < size; addr += PAGE_SIZE) {
+ page = virt_to_page_or_null((char *)start + addr);
+ shadow_p = virt_to_page_or_null((char *)shadow + addr);
+ set_no_shadow_origin_page(shadow_p);
+ shadow_page_for(page) = shadow_p;
+ origin_p = virt_to_page_or_null((char *)origin + addr);
+ set_no_shadow_origin_page(origin_p);
+ origin_page_for(page) = origin_p;
+ }
+}
+
+void kmsan_setup_meta(struct page *page, struct page *shadow,
+ struct page *origin, int order)
+{
+ int i;
+
+ for (i = 0; i < (1 << order); i++) {
+ set_no_shadow_origin_page(&shadow[i]);
+ set_no_shadow_origin_page(&origin[i]);
+ shadow_page_for(&page[i]) = &shadow[i];
+ origin_page_for(&page[i]) = &origin[i];
+ }
+}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 98a066c0a9f63..4237b7290e619 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1751,6 +1751,9 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn,
{
if (early_page_uninitialised(pfn))
return;
+ if (!kmsan_memblock_free_pages(page, order))
+ /* KMSAN will take care of these pages. */
+ return;
__free_pages_core(page, order);
}

--
2.35.1.1021.g381101b075-goog

2022-03-29 18:08:39

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 02/48] stackdepot: reserve 5 extra bits in depot_stack_handle_t

Some users (currently only KMSAN) may want to use spare bits in
depot_stack_handle_t. Let them do so by adding @extra_bits to
__stack_depot_save() to store arbitrary flags, and providing
stack_depot_get_extra_bits() to retrieve those flags.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I0587f6c777667864768daf07821d594bce6d8ff9
---
include/linux/stackdepot.h | 8 ++++++++
lib/stackdepot.c | 29 ++++++++++++++++++++++++-----
2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 17f992fe6355b..fd641d266bead 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -14,9 +14,15 @@
#include <linux/gfp.h>

typedef u32 depot_stack_handle_t;
+/*
+ * Number of bits in the handle that stack depot doesn't use. Users may store
+ * information in them.
+ */
+#define STACK_DEPOT_EXTRA_BITS 5

depot_stack_handle_t __stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
+ unsigned int extra_bits,
gfp_t gfp_flags, bool can_alloc);

/*
@@ -41,6 +47,8 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
unsigned int stack_depot_fetch(depot_stack_handle_t handle,
unsigned long **entries);

+unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle);
+
int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size,
int spaces);

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index bf5ba9af05009..6dc11a3b7b88e 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -42,7 +42,8 @@
#define STACK_ALLOC_OFFSET_BITS (STACK_ALLOC_ORDER + PAGE_SHIFT - \
STACK_ALLOC_ALIGN)
#define STACK_ALLOC_INDEX_BITS (DEPOT_STACK_BITS - \
- STACK_ALLOC_NULL_PROTECTION_BITS - STACK_ALLOC_OFFSET_BITS)
+ STACK_ALLOC_NULL_PROTECTION_BITS - \
+ STACK_ALLOC_OFFSET_BITS - STACK_DEPOT_EXTRA_BITS)
#define STACK_ALLOC_SLABS_CAP 8192
#define STACK_ALLOC_MAX_SLABS \
(((1LL << (STACK_ALLOC_INDEX_BITS)) < STACK_ALLOC_SLABS_CAP) ? \
@@ -55,6 +56,7 @@ union handle_parts {
u32 slabindex : STACK_ALLOC_INDEX_BITS;
u32 offset : STACK_ALLOC_OFFSET_BITS;
u32 valid : STACK_ALLOC_NULL_PROTECTION_BITS;
+ u32 extra : STACK_DEPOT_EXTRA_BITS;
};
};

@@ -73,6 +75,14 @@ static int next_slab_inited;
static size_t depot_offset;
static DEFINE_RAW_SPINLOCK(depot_lock);

+unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle)
+{
+ union handle_parts parts = { .handle = handle };
+
+ return parts.extra;
+}
+EXPORT_SYMBOL(stack_depot_get_extra_bits);
+
static bool init_stack_slab(void **prealloc)
{
if (!*prealloc)
@@ -136,6 +146,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
stack->handle.slabindex = depot_index;
stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
stack->handle.valid = 1;
+ stack->handle.extra = 0;
memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
depot_offset += required_size;

@@ -320,6 +331,7 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
*
* @entries: Pointer to storage array
* @nr_entries: Size of the storage array
+ * @extra_bits: Flags to store in unused bits of depot_stack_handle_t
* @alloc_flags: Allocation gfp flags
* @can_alloc: Allocate stack slabs (increased chance of failure if false)
*
@@ -331,6 +343,10 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
* If the stack trace in @entries is from an interrupt, only the portion up to
* interrupt entry is saved.
*
+ * Additional opaque flags can be passed in @extra_bits, stored in the unused
+ * bits of the stack handle, and retrieved using stack_depot_get_extra_bits()
+ * without calling stack_depot_fetch().
+ *
* Context: Any context, but setting @can_alloc to %false is required if
* alloc_pages() cannot be used from the current context. Currently
* this is the case from contexts where neither %GFP_ATOMIC nor
@@ -340,10 +356,11 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
*/
depot_stack_handle_t __stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
+ unsigned int extra_bits,
gfp_t alloc_flags, bool can_alloc)
{
struct stack_record *found = NULL, **bucket;
- depot_stack_handle_t retval = 0;
+ union handle_parts retval = { .handle = 0 };
struct page *page = NULL;
void *prealloc = NULL;
unsigned long flags;
@@ -427,9 +444,11 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
free_pages((unsigned long)prealloc, STACK_ALLOC_ORDER);
}
if (found)
- retval = found->handle.handle;
+ retval.handle = found->handle.handle;
fast_exit:
- return retval;
+ retval.extra = extra_bits;
+
+ return retval.handle;
}
EXPORT_SYMBOL_GPL(__stack_depot_save);

@@ -449,6 +468,6 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
gfp_t alloc_flags)
{
- return __stack_depot_save(entries, nr_entries, alloc_flags, true);
+ return __stack_depot_save(entries, nr_entries, 0, alloc_flags, true);
}
EXPORT_SYMBOL_GPL(stack_depot_save);
--
2.35.1.1021.g381101b075-goog

2022-03-29 20:11:07

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 37/48] security: kmsan: fix interoperability with auto-initialization

Heap and stack initialization is great, but not when we are trying
uses of uninitialized memory. When the kernel is built with KMSAN,
having kernel memory initialization enabled may introduce false
negatives.

We disable CONFIG_INIT_STACK_ALL_PATTERN and CONFIG_INIT_STACK_ALL_ZERO
under CONFIG_KMSAN, making it impossible to auto-initialize stack
variables in KMSAN builds. We also disable CONFIG_INIT_ON_ALLOC_DEFAULT_ON
and CONFIG_INIT_ON_FREE_DEFAULT_ON to prevent accidental use of heap
auto-initialization.

We however still let the users enable heap auto-initialization at
boot-time (by setting init_on_alloc=1 or init_on_free=1), in which case
a warning is printed.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I86608dd867018683a14ae1870f1928ad925f42e9
---
mm/page_alloc.c | 4 ++++
security/Kconfig.hardening | 4 ++++
2 files changed, 8 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4237b7290e619..ef0906296c57f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -868,6 +868,10 @@ void init_mem_debugging_and_hardening(void)
else
static_branch_disable(&init_on_free);

+ if (IS_ENABLED(CONFIG_KMSAN) &&
+ (_init_on_alloc_enabled_early || _init_on_free_enabled_early))
+ pr_info("mem auto-init: please make sure init_on_alloc and init_on_free are disabled when running KMSAN\n");
+
#ifdef CONFIG_DEBUG_PAGEALLOC
if (!debug_pagealloc_enabled())
return;
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index d051f8ceefddd..bd13a46024457 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -106,6 +106,7 @@ choice
config INIT_STACK_ALL_PATTERN
bool "pattern-init everything (strongest)"
depends on CC_HAS_AUTO_VAR_INIT_PATTERN
+ depends on !KMSAN
help
Initializes everything on the stack (including padding)
with a specific debug value. This is intended to eliminate
@@ -124,6 +125,7 @@ choice
config INIT_STACK_ALL_ZERO
bool "zero-init everything (strongest and safest)"
depends on CC_HAS_AUTO_VAR_INIT_ZERO
+ depends on !KMSAN
help
Initializes everything on the stack (including padding)
with a zero value. This is intended to eliminate all
@@ -208,6 +210,7 @@ config STACKLEAK_RUNTIME_DISABLE

config INIT_ON_ALLOC_DEFAULT_ON
bool "Enable heap memory zeroing on allocation by default"
+ depends on !KMSAN
help
This has the effect of setting "init_on_alloc=1" on the kernel
command line. This can be disabled with "init_on_alloc=0".
@@ -220,6 +223,7 @@ config INIT_ON_ALLOC_DEFAULT_ON

config INIT_ON_FREE_DEFAULT_ON
bool "Enable heap memory zeroing on free by default"
+ depends on !KMSAN
help
This has the effect of setting "init_on_free=1" on the kernel
command line. This can be disabled with "init_on_free=0".
--
2.35.1.1021.g381101b075-goog

2022-03-29 20:20:08

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 32/48] kmsan: disable strscpy() optimization under KMSAN

Disable the efficient 8-byte reading under KMSAN to avoid false positives.

Signed-off-by: Alexander Potapenko <[email protected]>

---

Link: https://linux-review.googlesource.com/id/Iffd8336965e88fce915db2e6a9d6524422975f69
---
lib/string.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/lib/string.c b/lib/string.c
index 485777c9da832..4ece4c7e7831b 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -197,6 +197,14 @@ ssize_t strscpy(char *dest, const char *src, size_t count)
max = 0;
#endif

+ /*
+ * read_word_at_a_time() below may read uninitialized bytes after the
+ * trailing zero and use them in comparisons. Disable this optimization
+ * under KMSAN to prevent false positive reports.
+ */
+ if (IS_ENABLED(CONFIG_KMSAN))
+ max = 0;
+
while (max >= sizeof(unsigned long)) {
unsigned long c, data;

--
2.35.1.1021.g381101b075-goog

2022-03-29 20:21:20

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 21/48] instrumented.h: add KMSAN support

To avoid false positives, KMSAN needs to unpoison the data copied from
the userspace. To detect infoleaks - check the memory buffer passed to
copy_to_user().

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- move implementation of kmsan_copy_to_user() here

Link: https://linux-review.googlesource.com/id/I43e93b9c02709e6be8d222342f1b044ac8bdbaaf
---
include/linux/instrumented.h | 5 ++++-
include/linux/kmsan-checks.h | 19 ++++++++++++++++++
mm/kmsan/hooks.c | 38 ++++++++++++++++++++++++++++++++++++
3 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
index ee8f7d17d34f5..c73c1b19e9227 100644
--- a/include/linux/instrumented.h
+++ b/include/linux/instrumented.h
@@ -2,7 +2,7 @@

/*
* This header provides generic wrappers for memory access instrumentation that
- * the compiler cannot emit for: KASAN, KCSAN.
+ * the compiler cannot emit for: KASAN, KCSAN, KMSAN.
*/
#ifndef _LINUX_INSTRUMENTED_H
#define _LINUX_INSTRUMENTED_H
@@ -10,6 +10,7 @@
#include <linux/compiler.h>
#include <linux/kasan-checks.h>
#include <linux/kcsan-checks.h>
+#include <linux/kmsan-checks.h>
#include <linux/types.h>

/**
@@ -117,6 +118,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
{
kasan_check_read(from, n);
kcsan_check_read(from, n);
+ kmsan_copy_to_user(to, from, n, 0);
}

/**
@@ -151,6 +153,7 @@ static __always_inline void
instrument_copy_from_user_after(const void *to, const void __user *from,
unsigned long n, unsigned long left)
{
+ kmsan_unpoison_memory(to, n - left);
}

#endif /* _LINUX_INSTRUMENTED_H */
diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h
index ecd8336190fc0..aabaf1ba7c251 100644
--- a/include/linux/kmsan-checks.h
+++ b/include/linux/kmsan-checks.h
@@ -84,6 +84,21 @@ void kmsan_unpoison_memory(const void *address, size_t size);
*/
void kmsan_check_memory(const void *address, size_t size);

+/**
+ * kmsan_copy_to_user() - Notify KMSAN about a data transfer to userspace.
+ * @to: destination address in the userspace.
+ * @from: source address in the kernel.
+ * @to_copy: number of bytes to copy.
+ * @left: number of bytes not copied.
+ *
+ * If this is a real userspace data transfer, KMSAN checks the bytes that were
+ * actually copied to ensure there was no information leak. If @to belongs to
+ * the kernel space (which is possible for compat syscalls), KMSAN just copies
+ * the metadata.
+ */
+void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy,
+ size_t left);
+
#else

#define kmsan_init(value) (value)
@@ -98,6 +113,10 @@ static inline void kmsan_unpoison_memory(const void *address, size_t size)
static inline void kmsan_check_memory(const void *address, size_t size)
{
}
+static inline void kmsan_copy_to_user(void __user *to, const void *from,
+ size_t to_copy, size_t left)
+{
+}

#endif

diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index a13e15ef2bfd5..365eedcb08953 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -212,6 +212,44 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end)
}
EXPORT_SYMBOL(kmsan_iounmap_page_range);

+void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy,
+ size_t left)
+{
+ unsigned long ua_flags;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ /*
+ * At this point we've copied the memory already. It's hard to check it
+ * before copying, as the size of actually copied buffer is unknown.
+ */
+
+ /* copy_to_user() may copy zero bytes. No need to check. */
+ if (!to_copy)
+ return;
+ /* Or maybe copy_to_user() failed to copy anything. */
+ if (to_copy <= left)
+ return;
+
+ ua_flags = user_access_save();
+ if ((u64)to < TASK_SIZE) {
+ /* This is a user memory access, check it. */
+ kmsan_internal_check_memory((void *)from, to_copy - left, to,
+ REASON_COPY_TO_USER);
+ user_access_restore(ua_flags);
+ return;
+ }
+ /* Otherwise this is a kernel memory access. This happens when a compat
+ * syscall passes an argument allocated on the kernel stack to a real
+ * syscall.
+ * Don't check anything, just copy the shadow of the copied bytes.
+ */
+ kmsan_internal_memmove_metadata((void *)to, (void *)from,
+ to_copy - left);
+ user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(kmsan_copy_to_user);
+
/* Functions from kmsan-checks.h follow. */
void kmsan_poison_memory(const void *address, size_t size, gfp_t flags)
{
--
2.35.1.1021.g381101b075-goog

2022-03-29 21:11:42

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 12/48] kcsan: clang: retire CONFIG_KCSAN_KCOV_BROKEN

kcov used to be broken prior to Clang 11, but right now that version is
already the minimum required to build with KCSAN, because no prior
compiler has "-tsan-distinguish-volatile=1".

Therefore KCSAN_KCOV_BROKEN is not needed anymore.

Suggested-by: Marco Elver <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ida287421577f37de337139b5b5b9e977e4a6fee2
---
lib/Kconfig.kcsan | 11 -----------
1 file changed, 11 deletions(-)

diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
index 63b70b8c55519..de022445fbba5 100644
--- a/lib/Kconfig.kcsan
+++ b/lib/Kconfig.kcsan
@@ -10,21 +10,10 @@ config HAVE_KCSAN_COMPILER
For the list of compilers that support KCSAN, please see
<file:Documentation/dev-tools/kcsan.rst>.

-config KCSAN_KCOV_BROKEN
- def_bool KCOV && CC_HAS_SANCOV_TRACE_PC
- depends on CC_IS_CLANG
- depends on !$(cc-option,-Werror=unused-command-line-argument -fsanitize=thread -fsanitize-coverage=trace-pc)
- help
- Some versions of clang support either KCSAN and KCOV but not the
- combination of the two.
- See https://bugs.llvm.org/show_bug.cgi?id=45831 for the status
- in newer releases.
-
menuconfig KCSAN
bool "KCSAN: dynamic data race detector"
depends on HAVE_ARCH_KCSAN && HAVE_KCSAN_COMPILER
depends on DEBUG_KERNEL && !KASAN
- depends on !KCSAN_KCOV_BROKEN
select STACKTRACE
help
The Kernel Concurrency Sanitizer (KCSAN) is a dynamic
--
2.35.1.1021.g381101b075-goog

2022-03-29 21:23:44

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 36/48] kmsan: kcov: unpoison area->list in kcov_remote_area_put()

KMSAN does not instrument kernel/kcov.c for performance reasons (with
CONFIG_KCOV=y virtually every place in the kernel invokes kcov
instrumentation). Therefore the tool may miss writes from kcov.c that
initialize memory.

When CONFIG_DEBUG_LIST is enabled, list pointers from kernel/kcov.c are
passed to instrumented helpers in lib/list_debug.c, resulting in false
positives.

To work around these reports, we unpoison the contents of area->list after
initializing it.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ie17f2ee47a7af58f5cdf716d585ebf0769348a5a
---
kernel/kcov.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/kernel/kcov.c b/kernel/kcov.c
index 36ca640c4f8e7..88ffdddc99ba1 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -11,6 +11,7 @@
#include <linux/fs.h>
#include <linux/hashtable.h>
#include <linux/init.h>
+#include <linux/kmsan-checks.h>
#include <linux/mm.h>
#include <linux/preempt.h>
#include <linux/printk.h>
@@ -152,6 +153,12 @@ static void kcov_remote_area_put(struct kcov_remote_area *area,
INIT_LIST_HEAD(&area->list);
area->size = size;
list_add(&area->list, &kcov_remote_areas);
+ /*
+ * KMSAN doesn't instrument this file, so it may not know area->list
+ * is initialized. Unpoison it explicitly to avoid reports in
+ * kcov_remote_area_get().
+ */
+ kmsan_unpoison_memory(&area->list, sizeof(struct list_head));
}

static notrace bool check_kcov_mode(enum kcov_mode needed_mode, struct task_struct *t)
--
2.35.1.1021.g381101b075-goog

2022-03-29 21:34:34

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 16/48] MAINTAINERS: add entry for KMSAN

Add entry for KMSAN maintainers/reviewers.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ic5836c2bceb6b63f71a60d3327d18af3aa3dab77
---
MAINTAINERS | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index cd0f68d4a34a6..4053523a1e890 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10721,6 +10721,18 @@ F: kernel/kmod.c
F: lib/test_kmod.c
F: tools/testing/selftests/kmod/

+KMSAN
+M: Alexander Potapenko <[email protected]>
+R: Marco Elver <[email protected]>
+R: Dmitry Vyukov <[email protected]>
+L: [email protected]
+S: Maintained
+F: Documentation/dev-tools/kmsan.rst
+F: include/linux/kmsan*.h
+F: lib/Kconfig.kmsan
+F: mm/kmsan/
+F: scripts/Makefile.kmsan
+
KPROBES
M: Naveen N. Rao <[email protected]>
M: Anil S Keshavamurthy <[email protected]>
--
2.35.1.1021.g381101b075-goog

2022-03-29 21:59:20

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 10/48] x86: kmsan: pgtable: reduce vmalloc space

KMSAN is going to use 3/4 of existing vmalloc space to hold the
metadata, therefore we lower VMALLOC_END to make sure vmalloc() doesn't
allocate past the first 1/4.

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- added x86: to the title

Link: https://linux-review.googlesource.com/id/I9d8b7f0a88a639f1263bc693cbd5c136626f7efd
---
arch/x86/include/asm/pgtable_64_types.h | 41 ++++++++++++++++++++++++-
arch/x86/mm/init_64.c | 2 +-
2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 91ac106545703..7f15d43754a34 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -139,7 +139,46 @@ extern unsigned int ptrs_per_p4d;
# define VMEMMAP_START __VMEMMAP_BASE_L4
#endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */

-#define VMALLOC_END (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
+#define VMEMORY_END (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
+
+#ifndef CONFIG_KMSAN
+#define VMALLOC_END VMEMORY_END
+#else
+/*
+ * In KMSAN builds vmalloc area is four times smaller, and the remaining 3/4
+ * are used to keep the metadata for virtual pages. The memory formerly
+ * belonging to vmalloc area is now laid out as follows:
+ *
+ * 1st quarter: VMALLOC_START to VMALLOC_END - new vmalloc area
+ * 2nd quarter: KMSAN_VMALLOC_SHADOW_START to
+ * VMALLOC_END+KMSAN_VMALLOC_SHADOW_OFFSET - vmalloc area shadow
+ * 3rd quarter: KMSAN_VMALLOC_ORIGIN_START to
+ * VMALLOC_END+KMSAN_VMALLOC_ORIGIN_OFFSET - vmalloc area origins
+ * 4th quarter: KMSAN_MODULES_SHADOW_START to KMSAN_MODULES_ORIGIN_START
+ * - shadow for modules,
+ * KMSAN_MODULES_ORIGIN_START to
+ * KMSAN_MODULES_ORIGIN_START + MODULES_LEN - origins for modules.
+ */
+#define VMALLOC_QUARTER_SIZE ((VMALLOC_SIZE_TB << 40) >> 2)
+#define VMALLOC_END (VMALLOC_START + VMALLOC_QUARTER_SIZE - 1)
+
+/*
+ * vmalloc metadata addresses are calculated by adding shadow/origin offsets
+ * to vmalloc address.
+ */
+#define KMSAN_VMALLOC_SHADOW_OFFSET VMALLOC_QUARTER_SIZE
+#define KMSAN_VMALLOC_ORIGIN_OFFSET (VMALLOC_QUARTER_SIZE << 1)
+
+#define KMSAN_VMALLOC_SHADOW_START (VMALLOC_START + KMSAN_VMALLOC_SHADOW_OFFSET)
+#define KMSAN_VMALLOC_ORIGIN_START (VMALLOC_START + KMSAN_VMALLOC_ORIGIN_OFFSET)
+
+/*
+ * The shadow/origin for modules are placed one by one in the last 1/4 of
+ * vmalloc space.
+ */
+#define KMSAN_MODULES_SHADOW_START (VMALLOC_END + KMSAN_VMALLOC_ORIGIN_OFFSET + 1)
+#define KMSAN_MODULES_ORIGIN_START (KMSAN_MODULES_SHADOW_START + MODULES_LEN)
+#endif /* CONFIG_KMSAN */

#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
/* The module sections ends with the start of the fixmap */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 96d34ebb20a9e..fcea37beb3911 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1287,7 +1287,7 @@ static void __init preallocate_vmalloc_pages(void)
unsigned long addr;
const char *lvl;

- for (addr = VMALLOC_START; addr <= VMALLOC_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
+ for (addr = VMALLOC_START; addr <= VMEMORY_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
pgd_t *pgd = pgd_offset_k(addr);
p4d_t *p4d;
pud_t *pud;
--
2.35.1.1021.g381101b075-goog

2022-03-29 22:19:49

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 09/48] kmsan: mark noinstr as __no_sanitize_memory

noinstr functions should never be instrumented, so make KMSAN skip them
by applying the __no_sanitize_memory attribute.

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- moved this patch earlier in the series per Mark Rutland's request

Link: https://linux-review.googlesource.com/id/I3c9abe860b97b49bc0c8026918b17a50448dec0d
---
include/linux/compiler_types.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 3c1795fdb5686..286675559cbba 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -221,7 +221,8 @@ struct ftrace_likely_data {
/* Section for code which can't be instrumented at all */
#define noinstr \
noinline notrace __attribute((__section__(".noinstr.text"))) \
- __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage
+ __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage \
+ __no_sanitize_memory

#endif /* __KERNEL__ */

--
2.35.1.1021.g381101b075-goog

2022-03-29 22:30:21

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 07/48] kmsan: add ReST documentation

Add Documentation/dev-tools/kmsan.rst and reference it in the dev-tools
index.

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- added a note that KMSAN is not intended for production use

Link: https://linux-review.googlesource.com/id/I751586f79418b95550a83c6035c650b5b01567cc
---
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/kmsan.rst | 414 ++++++++++++++++++++++++++++++
2 files changed, 415 insertions(+)
create mode 100644 Documentation/dev-tools/kmsan.rst

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 4621eac290f46..6b0663075dc04 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -24,6 +24,7 @@ Documentation/dev-tools/testing-overview.rst
kcov
gcov
kasan
+ kmsan
ubsan
kmemleak
kcsan
diff --git a/Documentation/dev-tools/kmsan.rst b/Documentation/dev-tools/kmsan.rst
new file mode 100644
index 0000000000000..e116889da79d5
--- /dev/null
+++ b/Documentation/dev-tools/kmsan.rst
@@ -0,0 +1,414 @@
+=============================
+KernelMemorySanitizer (KMSAN)
+=============================
+
+KMSAN is a dynamic error detector aimed at finding uses of uninitialized
+values. It is based on compiler instrumentation, and is quite similar to the
+userspace `MemorySanitizer tool`_.
+
+An important note is that KMSAN is not intended for production use, because it
+drastically increases kernel memory footprint and slows the whole system down.
+
+Example report
+==============
+
+Here is an example of a KMSAN report::
+
+ =====================================================
+ BUG: KMSAN: uninit-value in test_uninit_kmsan_check_memory+0x1be/0x380 [kmsan_test]
+ test_uninit_kmsan_check_memory+0x1be/0x380 mm/kmsan/kmsan_test.c:273
+ kunit_run_case_internal lib/kunit/test.c:333
+ kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
+ kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
+ kthread+0x721/0x850 kernel/kthread.c:327
+ ret_from_fork+0x1f/0x30 ??:?
+
+ Uninit was stored to memory at:
+ do_uninit_local_array+0xfa/0x110 mm/kmsan/kmsan_test.c:260
+ test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
+ kunit_run_case_internal lib/kunit/test.c:333
+ kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
+ kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
+ kthread+0x721/0x850 kernel/kthread.c:327
+ ret_from_fork+0x1f/0x30 ??:?
+
+ Local variable uninit created at:
+ do_uninit_local_array+0x4a/0x110 mm/kmsan/kmsan_test.c:256
+ test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
+
+ Bytes 4-7 of 8 are uninitialized
+ Memory access of size 8 starts at ffff888083fe3da0
+
+ CPU: 0 PID: 6731 Comm: kunit_try_catch Tainted: G B E 5.16.0-rc3+ #104
+ Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
+ =====================================================
+
+
+The report says that the local variable ``uninit`` was created uninitialized in
+``do_uninit_local_array()``. The lower stack trace corresponds to the place
+where this variable was created.
+
+The upper stack shows where the uninit value was used - in
+``test_uninit_kmsan_check_memory()``. The tool shows the bytes which were left
+uninitialized in the local variable, as well as the stack where the value was
+copied to another memory location before use.
+
+Please note that KMSAN only reports an error when an uninitialized value is
+actually used (e.g. in a condition or pointer dereference). A lot of
+uninitialized values in the kernel are never used, and reporting them would
+result in too many false positives.
+
+KMSAN and Clang
+===============
+
+In order for KMSAN to work the kernel must be built with Clang, which so far is
+the only compiler that has KMSAN support. The kernel instrumentation pass is
+based on the userspace `MemorySanitizer tool`_.
+
+How to build
+============
+
+In order to build a kernel with KMSAN you will need a fresh Clang (14.0.0+).
+Please refer to `LLVM documentation`_ for the instructions on how to build Clang.
+
+Now configure and build the kernel with CONFIG_KMSAN enabled.
+
+How KMSAN works
+===============
+
+KMSAN shadow memory
+-------------------
+
+KMSAN associates a metadata byte (also called shadow byte) with every byte of
+kernel memory. A bit in the shadow byte is set iff the corresponding bit of the
+kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
+setting its shadow bytes to ``0xff``) is called poisoning, marking it
+initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.
+
+When a new variable is allocated on the stack, it is poisoned by default by
+instrumentation code inserted by the compiler (unless it is a stack variable
+that is immediately initialized). Any new heap allocation done without
+``__GFP_ZERO`` is also poisoned.
+
+Compiler instrumentation also tracks the shadow values with the help from the
+runtime library in ``mm/kmsan/``.
+
+The shadow value of a basic or compound type is an array of bytes of the same
+length. When a constant value is written into memory, that memory is unpoisoned.
+When a value is read from memory, its shadow memory is also obtained and
+propagated into all the operations which use that value. For every instruction
+that takes one or more values the compiler generates code that calculates the
+shadow of the result depending on those values and their shadows.
+
+Example::
+
+ int a = 0xff; // i.e. 0x000000ff
+ int b;
+ int c = a | b;
+
+In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
+shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
+``c`` are uninitialized, while the lower byte is initialized.
+
+
+Origin tracking
+---------------
+
+Every four bytes of kernel memory also have a so-called origin assigned to
+them. This origin describes the point in program execution at which the
+uninitialized value was created. Every origin is associated with either the
+full allocation stack (for heap-allocated memory), or the function containing
+the uninitialized variable (for locals).
+
+When an uninitialized variable is allocated on stack or heap, a new origin
+value is created, and that variable's origin is filled with that value.
+When a value is read from memory, its origin is also read and kept together
+with the shadow. For every instruction that takes one or more values the origin
+of the result is one of the origins corresponding to any of the uninitialized
+inputs. If a poisoned value is written into memory, its origin is written to the
+corresponding storage as well.
+
+Example 1::
+
+ int a = 42;
+ int b;
+ int c = a + b;
+
+In this case the origin of ``b`` is generated upon function entry, and is
+stored to the origin of ``c`` right before the addition result is written into
+memory.
+
+Several variables may share the same origin address, if they are stored in the
+same four-byte chunk. In this case every write to either variable updates the
+origin for all of them. We have to sacrifice precision in this case, because
+storing origins for individual bits (and even bytes) would be too costly.
+
+Example 2::
+
+ int combine(short a, short b) {
+ union ret_t {
+ int i;
+ short s[2];
+ } ret;
+ ret.s[0] = a;
+ ret.s[1] = b;
+ return ret.i;
+ }
+
+If ``a`` is initialized and ``b`` is not, the shadow of the result would be
+0xffff0000, and the origin of the result would be the origin of ``b``.
+``ret.s[0]`` would have the same origin, but it will be never used, because
+that variable is initialized.
+
+If both function arguments are uninitialized, only the origin of the second
+argument is preserved.
+
+Origin chaining
+~~~~~~~~~~~~~~~
+
+To ease debugging, KMSAN creates a new origin for every store of an
+uninitialized value to memory. The new origin references both its creation stack
+and the previous origin the value had. This may cause increased memory
+consumption, so we limit the length of origin chains in the runtime.
+
+Clang instrumentation API
+-------------------------
+
+Clang instrumentation pass inserts calls to functions defined in
+``mm/kmsan/instrumentation.c`` into the kernel code.
+
+Shadow manipulation
+~~~~~~~~~~~~~~~~~~~
+
+For every memory access the compiler emits a call to a function that returns a
+pair of pointers to the shadow and origin addresses of the given memory::
+
+ typedef struct {
+ void *shadow, *origin;
+ } shadow_origin_ptr_t
+
+ shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr)
+ shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr)
+ shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, uintptr_t size)
+ shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, uintptr_t size)
+
+The function name depends on the memory access size.
+
+The compiler makes sure that for every loaded value its shadow and origin
+values are read from memory. When a value is stored to memory, its shadow and
+origin are also stored using the metadata pointers.
+
+Origin tracking
+~~~~~~~~~~~~~~~
+
+A special function is used to create a new origin value for a local variable and
+set the origin of that variable to that value::
+
+ void __msan_poison_alloca(void *addr, uintptr_t size, char *descr)
+
+Access to per-task data
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+At the beginning of every instrumented function KMSAN inserts a call to
+``__msan_get_context_state()``::
+
+ kmsan_context_state *__msan_get_context_state(void)
+
+``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::
+
+ struct kmsan_context_state {
+ char param_tls[KMSAN_PARAM_SIZE];
+ char retval_tls[KMSAN_RETVAL_SIZE];
+ char va_arg_tls[KMSAN_PARAM_SIZE];
+ char va_arg_origin_tls[KMSAN_PARAM_SIZE];
+ u64 va_arg_overflow_size_tls;
+ char param_origin_tls[KMSAN_PARAM_SIZE];
+ depot_stack_handle_t retval_origin_tls;
+ };
+
+This structure is used by KMSAN to pass parameter shadows and origins between
+instrumented functions.
+
+String functions
+~~~~~~~~~~~~~~~~
+
+The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
+following functions. These functions are also called when data structures are
+initialized or copied, making sure shadow and origin values are copied alongside
+with the data::
+
+ void *__msan_memcpy(void *dst, void *src, uintptr_t n)
+ void *__msan_memmove(void *dst, void *src, uintptr_t n)
+ void *__msan_memset(void *dst, int c, uintptr_t n)
+
+Error reporting
+~~~~~~~~~~~~~~~
+
+For each pointer dereference and each condition the compiler emits a shadow
+check that calls ``__msan_warning()`` in the case a poisoned value is being
+used::
+
+ void __msan_warning(u32 origin)
+
+``__msan_warning()`` causes KMSAN runtime to print an error report.
+
+Inline assembly instrumentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+KMSAN instruments every inline assembly output with a call to::
+
+ void __msan_instrument_asm_store(void *addr, uintptr_t size)
+
+, which unpoisons the memory region.
+
+This approach may mask certain errors, but it also helps to avoid a lot of
+false positives in bitwise operations, atomics etc.
+
+Sometimes the pointers passed into inline assembly do not point to valid memory.
+In such cases they are ignored at runtime.
+
+Disabling the instrumentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A function can be marked with ``__no_kmsan_checks``. Doing so makes KMSAN
+ignore uninitialized values in that function and mark its output as initialized.
+As a result, the user will not get KMSAN reports related to that function.
+
+Another function attribute supported by KMSAN is ``__no_sanitize_memory``.
+Applying this attribute to a function will result in KMSAN not instrumenting it,
+which can be helpful if we do not want the compiler to mess up some low-level
+code (e.g. that marked with ``noinstr``).
+
+This however comes at a cost: stack allocations from such functions will have
+incorrect shadow/origin values, likely leading to false positives. Functions
+called from non-instrumented code may also receive incorrect metadata for their
+parameters.
+
+As a rule of thumb, avoid using ``__no_sanitize_memory`` explicitly.
+
+It is also possible to disable KMSAN for a single file (e.g. main.o)::
+
+ KMSAN_SANITIZE_main.o := n
+
+or for the whole directory::
+
+ KMSAN_SANITIZE := n
+
+in the Makefile. Think of this as applying ``__no_sanitize_memory`` to every
+function in the file or directory. Most users won't need KMSAN_SANITIZE, unless
+their code gets broken by KMSAN (e.g. runs at early boot time).
+
+Runtime library
+---------------
+
+The code is located in ``mm/kmsan/``.
+
+Per-task KMSAN state
+~~~~~~~~~~~~~~~~~~~~
+
+Every task_struct has an associated KMSAN task state that holds the KMSAN
+context (see above) and a per-task flag disallowing KMSAN reports::
+
+ struct kmsan_context {
+ ...
+ bool allow_reporting;
+ struct kmsan_context_state cstate;
+ ...
+ }
+
+ struct task_struct {
+ ...
+ struct kmsan_context kmsan;
+ ...
+ }
+
+
+KMSAN contexts
+~~~~~~~~~~~~~~
+
+When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to
+hold the metadata for function parameters and return values.
+
+But in the case the kernel is running in the interrupt, softirq or NMI context,
+where ``current`` is unavailable, KMSAN switches to per-cpu interrupt state::
+
+ DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
+
+Metadata allocation
+~~~~~~~~~~~~~~~~~~~
+
+There are several places in the kernel for which the metadata is stored.
+
+1. Each ``struct page`` instance contains two pointers to its shadow and
+origin pages::
+
+ struct page {
+ ...
+ struct page *shadow, *origin;
+ ...
+ };
+
+At boot-time, the kernel allocates shadow and origin pages for every available
+kernel page. This is done quite late, when the kernel address space is already
+fragmented, so normal data pages may arbitrarily interleave with the metadata
+pages.
+
+This means that in general for two contiguous memory pages their shadow/origin
+pages may not be contiguous. So, if a memory access crosses the boundary
+of a memory block, accesses to shadow/origin memory may potentially corrupt
+other pages or read incorrect values from them.
+
+In practice, contiguous memory pages returned by the same ``alloc_pages()``
+call will have contiguous metadata, whereas if these pages belong to two
+different allocations their metadata pages can be fragmented.
+
+For the kernel data (``.data``, ``.bss`` etc.) and percpu memory regions
+there also are no guarantees on metadata contiguity.
+
+In the case ``__msan_metadata_ptr_for_XXX_YYY()`` hits the border between two
+pages with non-contiguous metadata, it returns pointers to fake shadow/origin regions::
+
+ char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+ char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+
+``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
+All stores to ``dummy_store_page`` are ignored.
+
+2. For vmalloc memory and modules, there is a direct mapping between the memory
+range, its shadow and origin. KMSAN reduces the vmalloc area by 3/4, making only
+the first quarter available to ``vmalloc()``. The second quarter of the vmalloc
+area contains shadow memory for the first quarter, the third one holds the
+origins. A small part of the fourth quarter contains shadow and origins for the
+kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for
+more details.
+
+When an array of pages is mapped into a contiguous virtual memory space, their
+shadow and origin pages are similarly mapped into contiguous regions.
+
+3. For CPU entry area there are separate per-CPU arrays that hold its
+metadata::
+
+ DEFINE_PER_CPU(char[CPU_ENTRY_AREA_SIZE], cpu_entry_area_shadow);
+ DEFINE_PER_CPU(char[CPU_ENTRY_AREA_SIZE], cpu_entry_area_origin);
+
+When calculating shadow and origin addresses for a given memory address, KMSAN
+checks whether the address belongs to the physical page range, the virtual page
+range or CPU entry area.
+
+Handling ``pt_regs``
+~~~~~~~~~~~~~~~~~~~~
+
+Many functions receive a ``struct pt_regs`` holding the register state at a
+certain point. Registers do not have (easily calculatable) shadow or origin
+associated with them, so we assume they are always initialized.
+
+References
+==========
+
+E. Stepanov, K. Serebryany. `MemorySanitizer: fast detector of uninitialized
+memory use in C++
+<https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf>`_.
+In Proceedings of CGO 2015.
+
+.. _MemorySanitizer tool: https://clang.llvm.org/docs/MemorySanitizer.html
+.. _LLVM documentation: https://llvm.org/docs/GettingStarted.html
--
2.35.1.1021.g381101b075-goog

2022-03-29 23:15:28

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 18/48] kmsan: mm: call KMSAN hooks from SLUB code

In order to report uninitialized memory coming from heap allocations
KMSAN has to poison them unless they're created with __GFP_ZERO.

It's handy that we need KMSAN hooks in the places where
init_on_alloc/init_on_free initialization is performed.

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- move the implementation of SLUB hooks here

Link: https://linux-review.googlesource.com/id/I6954b386c5c5d7f99f48bb6cbcc74b75136ce86e
---
include/linux/kmsan.h | 57 ++++++++++++++++++++++++++++++
mm/kmsan/hooks.c | 80 +++++++++++++++++++++++++++++++++++++++++++
mm/slab.h | 1 +
mm/slub.c | 21 ++++++++++--
4 files changed, 157 insertions(+), 2 deletions(-)

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index da41850b46cbd..ed3630068e2ef 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -16,6 +16,7 @@
#include <linux/vmalloc.h>

struct page;
+struct kmem_cache;

#ifdef CONFIG_KMSAN

@@ -73,6 +74,44 @@ void kmsan_free_page(struct page *page, unsigned int order);
*/
void kmsan_copy_page_meta(struct page *dst, struct page *src);

+/**
+ * kmsan_slab_alloc() - Notify KMSAN about a slab allocation.
+ * @s: slab cache the object belongs to.
+ * @object: object pointer.
+ * @flags: GFP flags passed to the allocator.
+ *
+ * Depending on cache flags and GFP flags, KMSAN sets up the metadata of the
+ * newly created object, marking it as initialized or uninitialized.
+ */
+void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags);
+
+/**
+ * kmsan_slab_free() - Notify KMSAN about a slab deallocation.
+ * @s: slab cache the object belongs to.
+ * @object: object pointer.
+ *
+ * KMSAN marks the freed object as uninitialized.
+ */
+void kmsan_slab_free(struct kmem_cache *s, void *object);
+
+/**
+ * kmsan_kmalloc_large() - Notify KMSAN about a large slab allocation.
+ * @ptr: object pointer.
+ * @size: object size.
+ * @flags: GFP flags passed to the allocator.
+ *
+ * Similar to kmsan_slab_alloc(), but for large allocations.
+ */
+void kmsan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
+
+/**
+ * kmsan_kfree_large() - Notify KMSAN about a large slab deallocation.
+ * @ptr: object pointer.
+ *
+ * Similar to kmsan_slab_free(), but for large allocations.
+ */
+void kmsan_kfree_large(const void *ptr);
+
/**
* kmsan_map_kernel_range_noflush() - Notify KMSAN about a vmap.
* @start: start of vmapped range.
@@ -139,6 +178,24 @@ static inline void kmsan_copy_page_meta(struct page *dst, struct page *src)
{
}

+static inline void kmsan_slab_alloc(struct kmem_cache *s, void *object,
+ gfp_t flags)
+{
+}
+
+static inline void kmsan_slab_free(struct kmem_cache *s, void *object)
+{
+}
+
+static inline void kmsan_kmalloc_large(const void *ptr, size_t size,
+ gfp_t flags)
+{
+}
+
+static inline void kmsan_kfree_large(const void *ptr)
+{
+}
+
static inline void kmsan_vmap_pages_range_noflush(unsigned long start,
unsigned long end,
pgprot_t prot,
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 5d886df57adca..e7c3ff48ed5cd 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -26,6 +26,86 @@
* skipping effects of functions like memset() inside instrumented code.
*/

+void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags)
+{
+ if (unlikely(object == NULL))
+ return;
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ /*
+ * There's a ctor or this is an RCU cache - do nothing. The memory
+ * status hasn't changed since last use.
+ */
+ if (s->ctor || (s->flags & SLAB_TYPESAFE_BY_RCU))
+ return;
+
+ kmsan_enter_runtime();
+ if (flags & __GFP_ZERO)
+ kmsan_internal_unpoison_memory(object, s->object_size,
+ KMSAN_POISON_CHECK);
+ else
+ kmsan_internal_poison_memory(object, s->object_size, flags,
+ KMSAN_POISON_CHECK);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_slab_alloc);
+
+void kmsan_slab_free(struct kmem_cache *s, void *object)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ /* RCU slabs could be legally used after free within the RCU period */
+ if (unlikely(s->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)))
+ return;
+ /*
+ * If there's a constructor, freed memory must remain in the same state
+ * until the next allocation. We cannot save its state to detect
+ * use-after-free bugs, instead we just keep it unpoisoned.
+ */
+ if (s->ctor)
+ return;
+ kmsan_enter_runtime();
+ kmsan_internal_poison_memory(object, s->object_size, GFP_KERNEL,
+ KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_slab_free);
+
+void kmsan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
+{
+ if (unlikely(ptr == NULL))
+ return;
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ if (flags & __GFP_ZERO)
+ kmsan_internal_unpoison_memory((void *)ptr, size,
+ /*checked*/ true);
+ else
+ kmsan_internal_poison_memory((void *)ptr, size, flags,
+ KMSAN_POISON_CHECK);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_kmalloc_large);
+
+void kmsan_kfree_large(const void *ptr)
+{
+ struct page *page;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ page = virt_to_head_page((void *)ptr);
+ KMSAN_WARN_ON(ptr != page_address(page));
+ kmsan_internal_poison_memory((void *)ptr,
+ PAGE_SIZE << compound_order(page),
+ GFP_KERNEL,
+ KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_kfree_large);
+
static unsigned long vmalloc_shadow(unsigned long addr)
{
return (unsigned long)kmsan_get_metadata((void *)addr,
diff --git a/mm/slab.h b/mm/slab.h
index c7f2abc2b154c..c2538d856ec45 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -734,6 +734,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
memset(p[i], 0, s->object_size);
kmemleak_alloc_recursive(p[i], s->object_size, 1,
s->flags, flags);
+ kmsan_slab_alloc(s, p[i], flags);
}

memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
diff --git a/mm/slub.c b/mm/slub.c
index 261474092e43e..9b266f6b384b9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -22,6 +22,7 @@
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/kasan.h>
+#include <linux/kmsan.h>
#include <linux/cpu.h>
#include <linux/cpuset.h>
#include <linux/mempolicy.h>
@@ -357,18 +358,28 @@ static void prefetch_freepointer(const struct kmem_cache *s, void *object)
prefetchw(object + s->offset);
}

+/*
+ * When running under KMSAN, get_freepointer_safe() may return an uninitialized
+ * pointer value in the case the current thread loses the race for the next
+ * memory chunk in the freelist. In that case this_cpu_cmpxchg_double() in
+ * slab_alloc_node() will fail, so the uninitialized value won't be used, but
+ * KMSAN will still check all arguments of cmpxchg because of imperfect
+ * handling of inline assembly.
+ * To work around this problem, use kmsan_init() to force initialize the
+ * return value of get_freepointer_safe().
+ */
static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
{
unsigned long freepointer_addr;
void *p;

if (!debug_pagealloc_enabled_static())
- return get_freepointer(s, object);
+ return kmsan_init(get_freepointer(s, object));

object = kasan_reset_tag(object);
freepointer_addr = (unsigned long)object + s->offset;
copy_from_kernel_nofault(&p, (void **)freepointer_addr, sizeof(p));
- return freelist_ptr(s, p, freepointer_addr);
+ return kmsan_init(freelist_ptr(s, p, freepointer_addr));
}

static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp)
@@ -1683,6 +1694,7 @@ static inline void *kmalloc_large_node_hook(void *ptr, size_t size, gfp_t flags)
ptr = kasan_kmalloc_large(ptr, size, flags);
/* As ptr might get tagged, call kmemleak hook after KASAN. */
kmemleak_alloc(ptr, size, 1, flags);
+ kmsan_kmalloc_large(ptr, size, flags);
return ptr;
}

@@ -1690,12 +1702,14 @@ static __always_inline void kfree_hook(void *x)
{
kmemleak_free(x);
kasan_kfree_large(x);
+ kmsan_kfree_large(x);
}

static __always_inline bool slab_free_hook(struct kmem_cache *s,
void *x, bool init)
{
kmemleak_free_recursive(x, s->flags);
+ kmsan_slab_free(s, x);

debug_check_no_locks_freed(x, s->object_size);

@@ -3729,6 +3743,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
*/
slab_post_alloc_hook(s, objcg, flags, size, p,
slab_want_init_on_alloc(flags, s));
+
return i;
error:
slub_put_cpu_ptr(s->cpu_slab);
@@ -5910,6 +5925,7 @@ static char *create_unique_id(struct kmem_cache *s)
p += sprintf(p, "%07u", s->size);

BUG_ON(p > name + ID_STR_LENGTH - 1);
+ kmsan_unpoison_memory(name, p - name);
return name;
}

@@ -6011,6 +6027,7 @@ static int sysfs_slab_alias(struct kmem_cache *s, const char *name)
al->name = name;
al->next = alias_list;
alias_list = al;
+ kmsan_unpoison_memory(al, sizeof(struct saved_alias));
return 0;
}

--
2.35.1.1021.g381101b075-goog

2022-03-30 00:22:50

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 22/48] kmsan: unpoison @tlb in arch_tlb_gather_mmu()

This is a hack to reduce stackdepot pressure.

struct mmu_gather contains 7 1-bit fields packed into a 32-bit unsigned
int value. The remaining 25 bits remain uninitialized and are never used,
but KMSAN updates the origin for them in zap_pXX_range() in mm/memory.c,
thus creating very long origin chains. This is technically correct, but
consumes too much memory.

Unpoisoning the whole structure will prevent creating such chains.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I76abee411b8323acfdbc29bc3a60dca8cff2de77
---
mm/mmu_gather.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index afb7185ffdc45..2f3821268b311 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -1,6 +1,7 @@
#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/kernel.h>
+#include <linux/kmsan-checks.h>
#include <linux/mmdebug.h>
#include <linux/mm_types.h>
#include <linux/mm_inline.h>
@@ -253,6 +254,15 @@ void tlb_flush_mmu(struct mmu_gather *tlb)
static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
bool fullmm)
{
+ /*
+ * struct mmu_gather contains 7 1-bit fields packed into a 32-bit
+ * unsigned int value. The remaining 25 bits remain uninitialized
+ * and are never used, but KMSAN updates the origin for them in
+ * zap_pXX_range() in mm/memory.c, thus creating very long origin
+ * chains. This is technically correct, but consumes too much memory.
+ * Unpoisoning the whole structure will prevent creating such chains.
+ */
+ kmsan_unpoison_memory(tlb, sizeof(*tlb));
tlb->mm = mm;
tlb->fullmm = fullmm;

--
2.35.1.1021.g381101b075-goog

2022-03-30 01:01:42

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 31/48] kernel: kmsan: don't instrument stacktrace.c

When unwinding stack traces, the kernel may pick uninitialized data from
the stack. To avoid false reports on that data, we do not instrument
stacktrace.c

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Iadb72036ff6868b1d7c9f1ed6630a66be6c57a42
---
kernel/Makefile | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/Makefile b/kernel/Makefile
index 80f6cfb60c020..1147f0bd6e022 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -40,6 +40,11 @@ KASAN_SANITIZE_kcov.o := n
KCSAN_SANITIZE_kcov.o := n
UBSAN_SANITIZE_kcov.o := n
KMSAN_SANITIZE_kcov.o := n
+
+# Code in stactrace.c may branch on random values taken from the stack.
+# Prevent KMSAN false positives by not instrumenting this file.
+KMSAN_SANITIZE_stacktrace.o := n
+
CFLAGS_kcov.o := $(call cc-option, -fno-conserve-stack) -fno-stack-protector

# Don't instrument error handlers
--
2.35.1.1021.g381101b075-goog

2022-03-30 02:37:56

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 47/48] x86: kmsan: handle register passing from uninstrumented code

Replace instrumentation_begin() with instrumentation_begin_with_regs()
to let KMSAN handle the non-instrumented code and unpoison pt_regs
passed from the instrumented part. This is done to reduce the number of
false positive reports.

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- this patch was previously called "x86: kmsan: handle register
passing from uninstrumented code". Instead of adding KMSAN-specific
code to every instrumentation_begin()/instrumentation_end() section,
we changed instrumentation_begin() to
instrumentation_begin_with_regs() where applicable.

Link: https://linux-review.googlesource.com/id/I435ec076cd21752c2f877f5da81f5eced62a2ea4
---
arch/x86/entry/common.c | 3 ++-
arch/x86/include/asm/idtentry.h | 10 +++++-----
arch/x86/kernel/cpu/mce/core.c | 2 +-
arch/x86/kernel/kvm.c | 2 +-
arch/x86/kernel/nmi.c | 2 +-
arch/x86/kernel/sev.c | 4 ++--
arch/x86/kernel/traps.c | 14 +++++++-------
arch/x86/mm/fault.c | 2 +-
8 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 6c2826417b337..047d157987859 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -14,6 +14,7 @@
#include <linux/mm.h>
#include <linux/smp.h>
#include <linux/errno.h>
+#include <linux/kmsan.h>
#include <linux/ptrace.h>
#include <linux/export.h>
#include <linux/nospec.h>
@@ -75,7 +76,7 @@ __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
add_random_kstack_offset();
nr = syscall_enter_from_user_mode(regs, nr);

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);

if (!do_syscall_x64(regs, nr) && !do_syscall_x32(regs, nr) && nr != -1) {
/* Invalid system call, but still a system call. */
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 1345088e99025..f24ff33fc3681 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -51,7 +51,7 @@ __visible noinstr void func(struct pt_regs *regs) \
{ \
irqentry_state_t state = irqentry_enter(regs); \
\
- instrumentation_begin(); \
+ instrumentation_begin_with_regs(regs); \
__##func (regs); \
instrumentation_end(); \
irqentry_exit(regs, state); \
@@ -98,7 +98,7 @@ __visible noinstr void func(struct pt_regs *regs, \
{ \
irqentry_state_t state = irqentry_enter(regs); \
\
- instrumentation_begin(); \
+ instrumentation_begin_with_regs(regs); \
__##func (regs, error_code); \
instrumentation_end(); \
irqentry_exit(regs, state); \
@@ -195,7 +195,7 @@ __visible noinstr void func(struct pt_regs *regs, \
irqentry_state_t state = irqentry_enter(regs); \
u32 vector = (u32)(u8)error_code; \
\
- instrumentation_begin(); \
+ instrumentation_begin_with_regs(regs); \
kvm_set_cpu_l1tf_flush_l1d(); \
run_irq_on_irqstack_cond(__##func, regs, vector); \
instrumentation_end(); \
@@ -235,7 +235,7 @@ __visible noinstr void func(struct pt_regs *regs) \
{ \
irqentry_state_t state = irqentry_enter(regs); \
\
- instrumentation_begin(); \
+ instrumentation_begin_with_regs(regs); \
kvm_set_cpu_l1tf_flush_l1d(); \
run_sysvec_on_irqstack_cond(__##func, regs); \
instrumentation_end(); \
@@ -262,7 +262,7 @@ __visible noinstr void func(struct pt_regs *regs) \
{ \
irqentry_state_t state = irqentry_enter(regs); \
\
- instrumentation_begin(); \
+ instrumentation_begin_with_regs(regs); \
__irq_enter_raw(); \
kvm_set_cpu_l1tf_flush_l1d(); \
__##func (regs); \
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 5818b837fd4d4..7b8c43d8727cc 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1355,7 +1355,7 @@ static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callba
/* Handle unconfigured int18 (should never happen) */
static noinstr void unexpected_machine_check(struct pt_regs *regs)
{
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
pr_err("CPU#%d: Unexpected int18 (Machine Check)\n",
smp_processor_id());
instrumentation_end();
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index d77481ecb0d5f..eaed9b412908c 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -249,7 +249,7 @@ noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
return false;

state = irqentry_enter(regs);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);

/*
* If the host managed to inject an async #PF into an interrupt
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 4bce802d25fb1..3f987a5dc38c7 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -329,7 +329,7 @@ static noinstr void default_do_nmi(struct pt_regs *regs)

__this_cpu_write(last_nmi_rip, regs->ip);

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);

handled = nmi_handle(NMI_LOCAL, regs);
__this_cpu_add(nmi_stats.normal, handled);
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index e6d316a01fdd4..9bfc29fc9c983 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1330,7 +1330,7 @@ DEFINE_IDTENTRY_VC_KERNEL(exc_vmm_communication)

irq_state = irqentry_nmi_enter(regs);

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);

if (!vc_raw_handle_exception(regs, error_code)) {
/* Show some debug info */
@@ -1362,7 +1362,7 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
}

irqentry_enter_from_user_mode(regs);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);

if (!vc_raw_handle_exception(regs, error_code)) {
/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 8143693a7ea6e..f08741abc0e5b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -229,7 +229,7 @@ static noinstr bool handle_bug(struct pt_regs *regs)
/*
* All lies, just get the WARN/BUG out.
*/
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
/*
* Since we're emulating a CALL with exceptions, restore the interrupt
* state to what it was at the exception site.
@@ -260,7 +260,7 @@ DEFINE_IDTENTRY_RAW(exc_invalid_op)
return;

state = irqentry_enter(regs);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
handle_invalid_op(regs);
instrumentation_end();
irqentry_exit(regs, state);
@@ -414,7 +414,7 @@ DEFINE_IDTENTRY_DF(exc_double_fault)
#endif

irqentry_nmi_enter(regs);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);

tsk->thread.error_code = error_code;
@@ -690,14 +690,14 @@ DEFINE_IDTENTRY_RAW(exc_int3)
*/
if (user_mode(regs)) {
irqentry_enter_from_user_mode(regs);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
do_int3_user(regs);
instrumentation_end();
irqentry_exit_to_user_mode(regs);
} else {
irqentry_state_t irq_state = irqentry_nmi_enter(regs);

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
if (!do_int3(regs))
die("int3", regs, 0);
instrumentation_end();
@@ -896,7 +896,7 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
*/
unsigned long dr7 = local_db_save();
irqentry_state_t irq_state = irqentry_nmi_enter(regs);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);

/*
* If something gets miswired and we end up here for a user mode
@@ -975,7 +975,7 @@ static __always_inline void exc_debug_user(struct pt_regs *regs,
*/

irqentry_enter_from_user_mode(regs);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);

/*
* Start the virtual/ptrace DR6 value with just the DR_STEP mask
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f2250a32a10ca..676e394f1af5b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1557,7 +1557,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
*/
state = irqentry_enter(regs);

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
handle_page_fault(regs, error_code, address);
instrumentation_end();

--
2.35.1.1021.g381101b075-goog

2022-03-30 05:24:08

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 24/48] Input: libps2: mark data received in __ps2_command() as initialized

KMSAN does not know that the device initializes certain bytes in
ps2dev->cmdbuf. Call kmsan_unpoison_memory() to explicitly mark them as
initialized.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I2d26f6baa45271d37320d3f4a528c39cb7e545f0
---
drivers/input/serio/libps2.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/input/serio/libps2.c b/drivers/input/serio/libps2.c
index 250e213cc80c6..3e19344eda93c 100644
--- a/drivers/input/serio/libps2.c
+++ b/drivers/input/serio/libps2.c
@@ -12,6 +12,7 @@
#include <linux/sched.h>
#include <linux/interrupt.h>
#include <linux/input.h>
+#include <linux/kmsan-checks.h>
#include <linux/serio.h>
#include <linux/i8042.h>
#include <linux/libps2.h>
@@ -294,9 +295,11 @@ int __ps2_command(struct ps2dev *ps2dev, u8 *param, unsigned int command)

serio_pause_rx(ps2dev->serio);

- if (param)
+ if (param) {
for (i = 0; i < receive; i++)
param[i] = ps2dev->cmdbuf[(receive - 1) - i];
+ kmsan_unpoison_memory(param, receive);
+ }

if (ps2dev->cmdcnt &&
(command != PS2_CMD_RESET_BAT || ps2dev->cmdcnt != 1)) {
--
2.35.1.1021.g381101b075-goog

2022-03-30 06:40:23

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 05/48] x86: asm: instrument usercopy in get_user() and __put_user_size()

Use hooks from instrumented.h to notify bug detection tools about
usercopy events in get_user() and put_user_size().

It's still unclear how to instrument put_user(), which assumes that
instrumentation code doesn't clobber RAX.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ia9f12bfe5832623250e20f1859fdf5cc485a2fce
---
arch/x86/include/asm/uaccess.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index ac96f9b2d64b3..e6abe6f27ae99 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -5,6 +5,7 @@
* User space memory access functions
*/
#include <linux/compiler.h>
+#include <linux/instrumented.h>
#include <linux/kasan-checks.h>
#include <linux/string.h>
#include <asm/asm.h>
@@ -126,11 +127,13 @@ extern int __get_user_bad(void);
int __ret_gu; \
register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
__chk_user_ptr(ptr); \
+ instrument_copy_from_user_before((void *)&(x), ptr, sizeof(*(ptr))); \
asm volatile("call __" #fn "_%P4" \
: "=a" (__ret_gu), "=r" (__val_gu), \
ASM_CALL_CONSTRAINT \
: "0" (ptr), "i" (sizeof(*(ptr)))); \
(x) = (__force __typeof__(*(ptr))) __val_gu; \
+ instrument_copy_from_user_after((void *)&(x), ptr, sizeof(*(ptr)), 0); \
__builtin_expect(__ret_gu, 0); \
})

@@ -275,7 +278,9 @@ extern void __put_user_nocheck_8(void);

#define __put_user_size(x, ptr, size, label) \
do { \
+ __typeof__(*(ptr)) __pus_val = x; \
__chk_user_ptr(ptr); \
+ instrument_copy_to_user(ptr, &(__pus_val), size); \
switch (size) { \
case 1: \
__put_user_goto(x, ptr, "b", "iq", label); \
@@ -313,6 +318,7 @@ do { \
#define __get_user_size(x, ptr, size, label) \
do { \
__chk_user_ptr(ptr); \
+ instrument_copy_from_user_before((void *)&(x), ptr, size); \
switch (size) { \
case 1: { \
unsigned char x_u8__; \
@@ -332,6 +338,7 @@ do { \
default: \
(x) = __get_user_bad(); \
} \
+ instrument_copy_from_user_after((void *)&(x), ptr, size, 0); \
} while (0)

#define __get_user_asm(x, addr, itype, ltype, label) \
--
2.35.1.1021.g381101b075-goog

2022-03-30 07:22:27

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 29/48] kmsan: entry: handle register passing from uninstrumented code

Replace instrumentation_begin() with instrumentation_begin_with_regs()
to let KMSAN handle the non-instrumented code and unpoison pt_regs
passed from the instrumented part.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I7f0a9809b66bd85faae43142971d0095771b7a42
---
kernel/entry/common.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index bad713684c2e3..dcf91ab14512a 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -21,7 +21,7 @@ static __always_inline void __enter_from_user_mode(struct pt_regs *regs)
CT_WARN_ON(ct_state() != CONTEXT_USER);
user_exit_irqoff();

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
trace_hardirqs_off_finish();
instrumentation_end();
}
@@ -103,7 +103,7 @@ noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)

__enter_from_user_mode(regs);

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
local_irq_enable();
ret = __syscall_enter_from_user_work(regs, syscall);
instrumentation_end();
@@ -114,7 +114,7 @@ noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs)
{
__enter_from_user_mode(regs);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
local_irq_enable();
instrumentation_end();
}
@@ -296,7 +296,7 @@ void syscall_exit_to_user_mode_work(struct pt_regs *regs)

__visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
{
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
__syscall_exit_to_user_mode_work(regs);
instrumentation_end();
__exit_to_user_mode();
@@ -309,7 +309,7 @@ noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs)

noinstr void irqentry_exit_to_user_mode(struct pt_regs *regs)
{
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
exit_to_user_mode_prepare(regs);
instrumentation_end();
__exit_to_user_mode();
@@ -357,7 +357,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
*/
lockdep_hardirqs_off(CALLER_ADDR0);
rcu_irq_enter();
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
trace_hardirqs_off_finish();
instrumentation_end();

@@ -372,7 +372,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
* in having another one here.
*/
lockdep_hardirqs_off(CALLER_ADDR0);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
rcu_irq_enter_check_tick();
trace_hardirqs_off_finish();
instrumentation_end();
@@ -409,7 +409,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
* and RCU as the return to user mode path.
*/
if (state.exit_rcu) {
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
/* Tell the tracer that IRET will enable interrupts */
trace_hardirqs_on_prepare();
lockdep_hardirqs_on_prepare(CALLER_ADDR0);
@@ -419,7 +419,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
return;
}

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
if (IS_ENABLED(CONFIG_PREEMPTION)) {
#ifdef CONFIG_PREEMPT_DYNAMIC
static_call(irqentry_exit_cond_resched)();
@@ -451,7 +451,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
lockdep_hardirq_enter();
rcu_nmi_enter();

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
trace_hardirqs_off_finish();
ftrace_nmi_enter();
instrumentation_end();
@@ -461,7 +461,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)

void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state)
{
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
ftrace_nmi_exit();
if (irq_state.lockdep) {
trace_hardirqs_on_prepare();
--
2.35.1.1021.g381101b075-goog

2022-03-30 07:25:54

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 42/48] x86: kmsan: handle open-coded assembly in lib/iomem.c

KMSAN cannot intercept memory accesses within asm() statements.
That's why we add kmsan_unpoison_memory() and kmsan_check_memory() to
hint it how to handle memory copied from/to I/O memory.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Icb16bf17269087e475debf07a7fe7d4bebc3df23
---
arch/x86/lib/iomem.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/lib/iomem.c b/arch/x86/lib/iomem.c
index df50451d94ef7..2307770f3f4c8 100644
--- a/arch/x86/lib/iomem.c
+++ b/arch/x86/lib/iomem.c
@@ -1,6 +1,7 @@
#include <linux/string.h>
#include <linux/module.h>
#include <linux/io.h>
+#include <linux/kmsan-checks.h>

#define movs(type,to,from) \
asm volatile("movs" type:"=&D" (to), "=&S" (from):"0" (to), "1" (from):"memory")
@@ -37,6 +38,8 @@ void memcpy_fromio(void *to, const volatile void __iomem *from, size_t n)
n-=2;
}
rep_movs(to, (const void *)from, n);
+ /* KMSAN must treat values read from devices as initialized. */
+ kmsan_unpoison_memory(to, n);
}
EXPORT_SYMBOL(memcpy_fromio);

@@ -45,6 +48,8 @@ void memcpy_toio(volatile void __iomem *to, const void *from, size_t n)
if (unlikely(!n))
return;

+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(from, n);
/* Align any unaligned destination IO */
if (unlikely(1 & (unsigned long)to)) {
movs("b", to, from);
--
2.35.1.1021.g381101b075-goog

2022-03-30 09:42:37

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 15/48] kmsan: disable instrumentation of unsupported common kernel code

EFI stub cannot be linked with KMSAN runtime, so we disable
instrumentation for it.

Instrumenting kcov, stackdepot or lockdep leads to infinite recursion
caused by instrumentation hooks calling instrumented code again.

This patch was previously part of "kmsan: disable KMSAN instrumentation
for certain kernel parts", but was split away per Mark Rutland's
request.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I41ae706bd3474f074f6a870bfc3f0f90e9c720f7
---
drivers/firmware/efi/libstub/Makefile | 1 +
kernel/Makefile | 1 +
kernel/locking/Makefile | 3 ++-
lib/Makefile | 1 +
4 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index d0537573501e9..81432d0c904b1 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -46,6 +46,7 @@ GCOV_PROFILE := n
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
UBSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y

diff --git a/kernel/Makefile b/kernel/Makefile
index 56f4ee97f3284..80f6cfb60c020 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -39,6 +39,7 @@ KCOV_INSTRUMENT_kcov.o := n
KASAN_SANITIZE_kcov.o := n
KCSAN_SANITIZE_kcov.o := n
UBSAN_SANITIZE_kcov.o := n
+KMSAN_SANITIZE_kcov.o := n
CFLAGS_kcov.o := $(call cc-option, -fno-conserve-stack) -fno-stack-protector

# Don't instrument error handlers
diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index d51cabf28f382..ea925731fa40f 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -5,8 +5,9 @@ KCOV_INSTRUMENT := n

obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o

-# Avoid recursion lockdep -> KCSAN -> ... -> lockdep.
+# Avoid recursion lockdep -> sanitizer -> ... -> lockdep.
KCSAN_SANITIZE_lockdep.o := n
+KMSAN_SANITIZE_lockdep.o := n

ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE)
diff --git a/lib/Makefile b/lib/Makefile
index 300f569c626b0..0ac9b38ec172e 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -269,6 +269,7 @@ obj-$(CONFIG_IRQ_POLL) += irq_poll.o
CFLAGS_stackdepot.o += -fno-builtin
obj-$(CONFIG_STACKDEPOT) += stackdepot.o
KASAN_SANITIZE_stackdepot.o := n
+KMSAN_SANITIZE_stackdepot.o := n
KCOV_INSTRUMENT_stackdepot.o := n

obj-$(CONFIG_REF_TRACKER) += ref_tracker.o
--
2.35.1.1021.g381101b075-goog

2022-03-30 10:01:43

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 26/48] kmsan: virtio: check/unpoison scatterlist in vring_map_one_sg()

If vring doesn't use the DMA API, KMSAN is unable to tell whether the
memory is initialized by hardware. Explicitly call kmsan_handle_dma()
from vring_map_one_sg() in this case to prevent false positives.

Signed-off-by: Alexander Potapenko <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>

---
Link: https://linux-review.googlesource.com/id/I211533ecb86a66624e151551f83ddd749536b3af
---
drivers/virtio/virtio_ring.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 962f1477b1fab..461e08f7f0a0f 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -11,6 +11,7 @@
#include <linux/module.h>
#include <linux/hrtimer.h>
#include <linux/dma-mapping.h>
+#include <linux/kmsan-checks.h>
#include <linux/spinlock.h>
#include <xen/xen.h>

@@ -331,8 +332,15 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
struct scatterlist *sg,
enum dma_data_direction direction)
{
- if (!vq->use_dma_api)
+ if (!vq->use_dma_api) {
+ /*
+ * If DMA is not used, KMSAN doesn't know that the scatterlist
+ * is initialized by the hardware. Explicitly check/unpoison it
+ * depending on the direction.
+ */
+ kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
return (dma_addr_t)sg_phys(sg);
+ }

/*
* We can't use dma_map_sg, because we don't use scatterlists in
--
2.35.1.1021.g381101b075-goog

2022-03-30 10:24:23

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 33/48] crypto: kmsan: disable accelerated configs under KMSAN

KMSAN is unable to understand when initialized values come from assembly.
Disable accelerated configs in KMSAN builds to prevent false positive
reports.

Signed-off-by: Alexander Potapenko <[email protected]>

---

Link: https://linux-review.googlesource.com/id/Idb2334bf3a1b68b31b399709baefaa763038cc50
---
crypto/Kconfig | 30 ++++++++++++++++++++++++++++++
drivers/net/Kconfig | 1 +
2 files changed, 31 insertions(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 442765219c375..c5e1c86043bbf 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -290,6 +290,7 @@ config CRYPTO_CURVE25519
config CRYPTO_CURVE25519_X86
tristate "x86_64 accelerated Curve25519 scalar multiplication library"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_LIB_CURVE25519_GENERIC
select CRYPTO_ARCH_HAVE_LIB_CURVE25519

@@ -338,11 +339,13 @@ config CRYPTO_AEGIS128
config CRYPTO_AEGIS128_SIMD
bool "Support SIMD acceleration for AEGIS-128"
depends on CRYPTO_AEGIS128 && ((ARM || ARM64) && KERNEL_MODE_NEON)
+ depends on !KMSAN # avoid false positives from assembly
default y

config CRYPTO_AEGIS128_AESNI_SSE2
tristate "AEGIS-128 AEAD algorithm (x86_64 AESNI+SSE2 implementation)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_AEAD
select CRYPTO_SIMD
help
@@ -478,6 +481,7 @@ config CRYPTO_NHPOLY1305
config CRYPTO_NHPOLY1305_SSE2
tristate "NHPoly1305 hash function (x86_64 SSE2 implementation)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_NHPOLY1305
help
SSE2 optimized implementation of the hash function used by the
@@ -486,6 +490,7 @@ config CRYPTO_NHPOLY1305_SSE2
config CRYPTO_NHPOLY1305_AVX2
tristate "NHPoly1305 hash function (x86_64 AVX2 implementation)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_NHPOLY1305
help
AVX2 optimized implementation of the hash function used by the
@@ -599,6 +604,7 @@ config CRYPTO_CRC32C
config CRYPTO_CRC32C_INTEL
tristate "CRC32c INTEL hardware acceleration"
depends on X86
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_HASH
help
In Intel processor with SSE4.2 supported, the processor will
@@ -639,6 +645,7 @@ config CRYPTO_CRC32
config CRYPTO_CRC32_PCLMUL
tristate "CRC32 PCLMULQDQ hardware acceleration"
depends on X86
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_HASH
select CRC32
help
@@ -704,6 +711,7 @@ config CRYPTO_BLAKE2S
config CRYPTO_BLAKE2S_X86
tristate "BLAKE2s digest algorithm (x86 accelerated version)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_LIB_BLAKE2S_GENERIC
select CRYPTO_ARCH_HAVE_LIB_BLAKE2S

@@ -718,6 +726,7 @@ config CRYPTO_CRCT10DIF
config CRYPTO_CRCT10DIF_PCLMUL
tristate "CRCT10DIF PCLMULQDQ hardware acceleration"
depends on X86 && 64BIT && CRC_T10DIF
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_HASH
help
For x86_64 processors with SSE4.2 and PCLMULQDQ supported,
@@ -765,6 +774,7 @@ config CRYPTO_POLY1305
config CRYPTO_POLY1305_X86_64
tristate "Poly1305 authenticator algorithm (x86_64/SSE2/AVX2)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_LIB_POLY1305_GENERIC
select CRYPTO_ARCH_HAVE_LIB_POLY1305
help
@@ -853,6 +863,7 @@ config CRYPTO_SHA1
config CRYPTO_SHA1_SSSE3
tristate "SHA1 digest algorithm (SSSE3/AVX/AVX2/SHA-NI)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SHA1
select CRYPTO_HASH
help
@@ -864,6 +875,7 @@ config CRYPTO_SHA1_SSSE3
config CRYPTO_SHA256_SSSE3
tristate "SHA256 digest algorithm (SSSE3/AVX/AVX2/SHA-NI)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SHA256
select CRYPTO_HASH
help
@@ -876,6 +888,7 @@ config CRYPTO_SHA256_SSSE3
config CRYPTO_SHA512_SSSE3
tristate "SHA512 digest algorithm (SSSE3/AVX/AVX2)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SHA512
select CRYPTO_HASH
help
@@ -1034,6 +1047,7 @@ config CRYPTO_WP512
config CRYPTO_GHASH_CLMUL_NI_INTEL
tristate "GHASH hash function (CLMUL-NI accelerated)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_CRYPTD
help
This is the x86_64 CLMUL-NI accelerated implementation of
@@ -1084,6 +1098,7 @@ config CRYPTO_AES_TI
config CRYPTO_AES_NI_INTEL
tristate "AES cipher algorithms (AES-NI)"
depends on X86
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_AEAD
select CRYPTO_LIB_AES
select CRYPTO_ALGAPI
@@ -1208,6 +1223,7 @@ config CRYPTO_BLOWFISH_COMMON
config CRYPTO_BLOWFISH_X86_64
tristate "Blowfish cipher algorithm (x86_64)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_BLOWFISH_COMMON
imply CRYPTO_CTR
@@ -1238,6 +1254,7 @@ config CRYPTO_CAMELLIA
config CRYPTO_CAMELLIA_X86_64
tristate "Camellia cipher algorithm (x86_64)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
imply CRYPTO_CTR
help
@@ -1254,6 +1271,7 @@ config CRYPTO_CAMELLIA_X86_64
config CRYPTO_CAMELLIA_AESNI_AVX_X86_64
tristate "Camellia cipher algorithm (x86_64/AES-NI/AVX)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_CAMELLIA_X86_64
select CRYPTO_SIMD
@@ -1272,6 +1290,7 @@ config CRYPTO_CAMELLIA_AESNI_AVX_X86_64
config CRYPTO_CAMELLIA_AESNI_AVX2_X86_64
tristate "Camellia cipher algorithm (x86_64/AES-NI/AVX2)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_CAMELLIA_AESNI_AVX_X86_64
help
Camellia cipher algorithm module (x86_64/AES-NI/AVX2).
@@ -1317,6 +1336,7 @@ config CRYPTO_CAST5
config CRYPTO_CAST5_AVX_X86_64
tristate "CAST5 (CAST-128) cipher algorithm (x86_64/AVX)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_CAST5
select CRYPTO_CAST_COMMON
@@ -1340,6 +1360,7 @@ config CRYPTO_CAST6
config CRYPTO_CAST6_AVX_X86_64
tristate "CAST6 (CAST-256) cipher algorithm (x86_64/AVX)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_CAST6
select CRYPTO_CAST_COMMON
@@ -1373,6 +1394,7 @@ config CRYPTO_DES_SPARC64
config CRYPTO_DES3_EDE_X86_64
tristate "Triple DES EDE cipher algorithm (x86-64)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_LIB_DES
imply CRYPTO_CTR
@@ -1430,6 +1452,7 @@ config CRYPTO_CHACHA20
config CRYPTO_CHACHA20_X86_64
tristate "ChaCha stream cipher algorithms (x86_64/SSSE3/AVX2/AVX-512VL)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_LIB_CHACHA_GENERIC
select CRYPTO_ARCH_HAVE_LIB_CHACHA
@@ -1473,6 +1496,7 @@ config CRYPTO_SERPENT
config CRYPTO_SERPENT_SSE2_X86_64
tristate "Serpent cipher algorithm (x86_64/SSE2)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_SERPENT
select CRYPTO_SIMD
@@ -1492,6 +1516,7 @@ config CRYPTO_SERPENT_SSE2_X86_64
config CRYPTO_SERPENT_SSE2_586
tristate "Serpent cipher algorithm (i586/SSE2)"
depends on X86 && !64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_SERPENT
select CRYPTO_SIMD
@@ -1511,6 +1536,7 @@ config CRYPTO_SERPENT_SSE2_586
config CRYPTO_SERPENT_AVX_X86_64
tristate "Serpent cipher algorithm (x86_64/AVX)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_SERPENT
select CRYPTO_SIMD
@@ -1531,6 +1557,7 @@ config CRYPTO_SERPENT_AVX_X86_64
config CRYPTO_SERPENT_AVX2_X86_64
tristate "Serpent cipher algorithm (x86_64/AVX2)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SERPENT_AVX_X86_64
help
Serpent cipher algorithm, by Anderson, Biham & Knudsen.
@@ -1672,6 +1699,7 @@ config CRYPTO_TWOFISH_586
config CRYPTO_TWOFISH_X86_64
tristate "Twofish cipher algorithm (x86_64)"
depends on (X86 || UML_X86) && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_ALGAPI
select CRYPTO_TWOFISH_COMMON
imply CRYPTO_CTR
@@ -1689,6 +1717,7 @@ config CRYPTO_TWOFISH_X86_64
config CRYPTO_TWOFISH_X86_64_3WAY
tristate "Twofish cipher algorithm (x86_64, 3-way parallel)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_TWOFISH_COMMON
select CRYPTO_TWOFISH_X86_64
@@ -1709,6 +1738,7 @@ config CRYPTO_TWOFISH_X86_64_3WAY
config CRYPTO_TWOFISH_AVX_X86_64
tristate "Twofish cipher algorithm (x86_64/AVX)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_SIMD
select CRYPTO_TWOFISH_COMMON
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index b2a4f998c180e..fed89b6981759 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -76,6 +76,7 @@ config WIREGUARD
tristate "WireGuard secure network tunnel"
depends on NET && INET
depends on IPV6 || !IPV6
+ depends on !KMSAN # KMSAN doesn't support the crypto configs below
select NET_UDP_TUNNEL
select DST_CACHE
select CRYPTO
--
2.35.1.1021.g381101b075-goog

2022-03-30 10:36:56

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 11/48] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE

KMSAN adds extra metadata fields to struct page, so it does not fit into
64 bytes anymore.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I353796acc6a850bfd7bb342aa1b63e616fc614f1
---
drivers/nvdimm/nd.h | 2 +-
drivers/nvdimm/pfn_devs.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 6f8ce114032d0..b50aecd1dd423 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -663,7 +663,7 @@ void devm_namespace_disable(struct device *dev,
struct nd_namespace_common *ndns);
#if IS_ENABLED(CONFIG_ND_CLAIM)
/* max struct page size independent of kernel config */
-#define MAX_STRUCT_PAGE_SIZE 64
+#define MAX_STRUCT_PAGE_SIZE 128
int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
#else
static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 58eda16f5c534..07a539195cc8b 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -785,7 +785,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
* when populating the vmemmap. This *should* be equal to
* PMD_SIZE for most architectures.
*
- * Also make sure size of struct page is less than 64. We
+ * Also make sure size of struct page is less than 128. We
* want to make sure we use large enough size here so that
* we don't have a dynamic reserve space depending on
* struct page size. But we also want to make sure we notice
--
2.35.1.1021.g381101b075-goog

2022-03-30 10:39:50

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 43/48] x86: kmsan: use __msan_ string functions where possible.

Unless stated otherwise (by explicitly calling __memcpy(), __memset() or
__memmove()) we want all string functions to call their __msan_ versions
(e.g. __msan_memcpy() instead of memcpy()), so that shadow and origin
values are updated accordingly.

Bootloader must still use the default string functions to avoid crashes.

Signed-off-by: Alexander Potapenko <[email protected]>
---

Link: https://linux-review.googlesource.com/id/I7ca9bd6b4f5c9b9816404862ae87ca7984395f33
---
arch/x86/include/asm/string_64.h | 23 +++++++++++++++++++++--
include/linux/fortify-string.h | 2 ++
2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 6e450827f677a..3b87d889b6e16 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -11,11 +11,23 @@
function. */

#define __HAVE_ARCH_MEMCPY 1
+#if defined(__SANITIZE_MEMORY__)
+#undef memcpy
+void *__msan_memcpy(void *dst, const void *src, size_t size);
+#define memcpy __msan_memcpy
+#else
extern void *memcpy(void *to, const void *from, size_t len);
+#endif
extern void *__memcpy(void *to, const void *from, size_t len);

#define __HAVE_ARCH_MEMSET
+#if defined(__SANITIZE_MEMORY__)
+extern void *__msan_memset(void *s, int c, size_t n);
+#undef memset
+#define memset __msan_memset
+#else
void *memset(void *s, int c, size_t n);
+#endif
void *__memset(void *s, int c, size_t n);

#define __HAVE_ARCH_MEMSET16
@@ -55,7 +67,13 @@ static inline void *memset64(uint64_t *s, uint64_t v, size_t n)
}

#define __HAVE_ARCH_MEMMOVE
+#if defined(__SANITIZE_MEMORY__)
+#undef memmove
+void *__msan_memmove(void *dest, const void *src, size_t len);
+#define memmove __msan_memmove
+#else
void *memmove(void *dest, const void *src, size_t count);
+#endif
void *__memmove(void *dest, const void *src, size_t count);

int memcmp(const void *cs, const void *ct, size_t count);
@@ -64,8 +82,7 @@ char *strcpy(char *dest, const char *src);
char *strcat(char *dest, const char *src);
int strcmp(const char *cs, const char *ct);

-#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
-
+#if (defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__))
/*
* For files that not instrumented (e.g. mm/slub.c) we
* should use not instrumented version of mem* functions.
@@ -73,7 +90,9 @@ int strcmp(const char *cs, const char *ct);

#undef memcpy
#define memcpy(dst, src, len) __memcpy(dst, src, len)
+#undef memmove
#define memmove(dst, src, len) __memmove(dst, src, len)
+#undef memset
#define memset(s, c, n) __memset(s, c, n)

#ifndef __NO_FORTIFY
diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h
index a6cd6815f2490..b2c74cb85e20e 100644
--- a/include/linux/fortify-string.h
+++ b/include/linux/fortify-string.h
@@ -198,6 +198,7 @@ __FORTIFY_INLINE char *strncat(char *p, const char *q, __kernel_size_t count)
return p;
}

+#ifndef CONFIG_KMSAN
__FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size)
{
size_t p_size = __builtin_object_size(p, 0);
@@ -240,6 +241,7 @@ __FORTIFY_INLINE void *memmove(void *p, const void *q, __kernel_size_t size)
fortify_panic(__func__);
return __underlying_memmove(p, q, size);
}
+#endif

extern void *__real_memscan(void *, int, __kernel_size_t) __RENAME(memscan);
__FORTIFY_INLINE void *memscan(void *p, int c, __kernel_size_t size)
--
2.35.1.1021.g381101b075-goog

2022-03-30 10:55:49

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 01/48] x86: add missing include to sparsemem.h

From: Dmitry Vyukov <[email protected]>

sparsemem.h:34:32: error: unknown type name 'phys_addr_t'
extern int phys_to_target_node(phys_addr_t start);
^
sparsemem.h:36:39: error: unknown type name 'u64'
extern int memory_add_physaddr_to_nid(u64 start);
^
Signed-off-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ifae221ce85d870d8f8d17173bd44d5cf9be2950f
---
arch/x86/include/asm/sparsemem.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h
index 6a9ccc1b2be5d..64df897c0ee30 100644
--- a/arch/x86/include/asm/sparsemem.h
+++ b/arch/x86/include/asm/sparsemem.h
@@ -2,6 +2,8 @@
#ifndef _ASM_X86_SPARSEMEM_H
#define _ASM_X86_SPARSEMEM_H

+#include <linux/types.h>
+
#ifdef CONFIG_SPARSEMEM
/*
* generic non-linear memory support:
--
2.35.1.1021.g381101b075-goog

2022-03-30 11:58:47

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 23/48] kmsan: add iomap support

Functions from lib/iomap.c interact with hardware, so KMSAN must ensure
that:
- every read function returns an initialized value
- every write function checks values before sending them to hardware.

Signed-off-by: Alexander Potapenko <[email protected]>

---
Link: https://linux-review.googlesource.com/id/I45527599f09090aca046dfe1a26df453adab100d
---
lib/iomap.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)

diff --git a/lib/iomap.c b/lib/iomap.c
index fbaa3e8f19d6c..bdda1a42771b2 100644
--- a/lib/iomap.c
+++ b/lib/iomap.c
@@ -6,6 +6,7 @@
*/
#include <linux/pci.h>
#include <linux/io.h>
+#include <linux/kmsan-checks.h>

#include <linux/export.h>

@@ -70,26 +71,31 @@ static void bad_io_access(unsigned long port, const char *access)
#define mmio_read64be(addr) swab64(readq(addr))
#endif

+__no_sanitize_memory
unsigned int ioread8(const void __iomem *addr)
{
IO_COND(addr, return inb(port), return readb(addr));
return 0xff;
}
+__no_sanitize_memory
unsigned int ioread16(const void __iomem *addr)
{
IO_COND(addr, return inw(port), return readw(addr));
return 0xffff;
}
+__no_sanitize_memory
unsigned int ioread16be(const void __iomem *addr)
{
IO_COND(addr, return pio_read16be(port), return mmio_read16be(addr));
return 0xffff;
}
+__no_sanitize_memory
unsigned int ioread32(const void __iomem *addr)
{
IO_COND(addr, return inl(port), return readl(addr));
return 0xffffffff;
}
+__no_sanitize_memory
unsigned int ioread32be(const void __iomem *addr)
{
IO_COND(addr, return pio_read32be(port), return mmio_read32be(addr));
@@ -142,18 +148,21 @@ static u64 pio_read64be_hi_lo(unsigned long port)
return lo | (hi << 32);
}

+__no_sanitize_memory
u64 ioread64_lo_hi(const void __iomem *addr)
{
IO_COND(addr, return pio_read64_lo_hi(port), return readq(addr));
return 0xffffffffffffffffULL;
}

+__no_sanitize_memory
u64 ioread64_hi_lo(const void __iomem *addr)
{
IO_COND(addr, return pio_read64_hi_lo(port), return readq(addr));
return 0xffffffffffffffffULL;
}

+__no_sanitize_memory
u64 ioread64be_lo_hi(const void __iomem *addr)
{
IO_COND(addr, return pio_read64be_lo_hi(port),
@@ -161,6 +170,7 @@ u64 ioread64be_lo_hi(const void __iomem *addr)
return 0xffffffffffffffffULL;
}

+__no_sanitize_memory
u64 ioread64be_hi_lo(const void __iomem *addr)
{
IO_COND(addr, return pio_read64be_hi_lo(port),
@@ -188,22 +198,32 @@ EXPORT_SYMBOL(ioread64be_hi_lo);

void iowrite8(u8 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, outb(val,port), writeb(val, addr));
}
void iowrite16(u16 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, outw(val,port), writew(val, addr));
}
void iowrite16be(u16 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write16be(val,port), mmio_write16be(val, addr));
}
void iowrite32(u32 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, outl(val,port), writel(val, addr));
}
void iowrite32be(u32 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write32be(val,port), mmio_write32be(val, addr));
}
EXPORT_SYMBOL(iowrite8);
@@ -239,24 +259,32 @@ static void pio_write64be_hi_lo(u64 val, unsigned long port)

void iowrite64_lo_hi(u64 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write64_lo_hi(val, port),
writeq(val, addr));
}

void iowrite64_hi_lo(u64 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write64_hi_lo(val, port),
writeq(val, addr));
}

void iowrite64be_lo_hi(u64 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write64be_lo_hi(val, port),
mmio_write64be(val, addr));
}

void iowrite64be_hi_lo(u64 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write64be_hi_lo(val, port),
mmio_write64be(val, addr));
}
@@ -328,14 +356,20 @@ static inline void mmio_outsl(void __iomem *addr, const u32 *src, int count)
void ioread8_rep(const void __iomem *addr, void *dst, unsigned long count)
{
IO_COND(addr, insb(port,dst,count), mmio_insb(addr, dst, count));
+ /* KMSAN must treat values read from devices as initialized. */
+ kmsan_unpoison_memory(dst, count);
}
void ioread16_rep(const void __iomem *addr, void *dst, unsigned long count)
{
IO_COND(addr, insw(port,dst,count), mmio_insw(addr, dst, count));
+ /* KMSAN must treat values read from devices as initialized. */
+ kmsan_unpoison_memory(dst, count * 2);
}
void ioread32_rep(const void __iomem *addr, void *dst, unsigned long count)
{
IO_COND(addr, insl(port,dst,count), mmio_insl(addr, dst, count));
+ /* KMSAN must treat values read from devices as initialized. */
+ kmsan_unpoison_memory(dst, count * 4);
}
EXPORT_SYMBOL(ioread8_rep);
EXPORT_SYMBOL(ioread16_rep);
@@ -343,14 +377,20 @@ EXPORT_SYMBOL(ioread32_rep);

void iowrite8_rep(void __iomem *addr, const void *src, unsigned long count)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(src, count);
IO_COND(addr, outsb(port, src, count), mmio_outsb(addr, src, count));
}
void iowrite16_rep(void __iomem *addr, const void *src, unsigned long count)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(src, count * 2);
IO_COND(addr, outsw(port, src, count), mmio_outsw(addr, src, count));
}
void iowrite32_rep(void __iomem *addr, const void *src, unsigned long count)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(src, count * 4);
IO_COND(addr, outsl(port, src,count), mmio_outsl(addr, src, count));
}
EXPORT_SYMBOL(iowrite8_rep);
--
2.35.1.1021.g381101b075-goog

2022-03-30 12:15:28

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 04/48] instrumented.h: allow instrumenting both sides of copy_from_user()

Introduce instrument_copy_from_user_before() and
instrument_copy_from_user_after() hooks to be invoked before and after
the call to copy_from_user().

KASAN and KCSAN will be only using instrument_copy_from_user_before(),
but for KMSAN we'll need to insert code after copy_from_user().

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I855034578f0b0f126734cbd734fb4ae1d3a6af99
---
include/linux/instrumented.h | 21 +++++++++++++++++++--
include/linux/uaccess.h | 19 ++++++++++++++-----
lib/iov_iter.c | 9 ++++++---
lib/usercopy.c | 3 ++-
4 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
index 42faebbaa202a..ee8f7d17d34f5 100644
--- a/include/linux/instrumented.h
+++ b/include/linux/instrumented.h
@@ -120,7 +120,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
}

/**
- * instrument_copy_from_user - instrument writes of copy_from_user
+ * instrument_copy_from_user_before - add instrumentation before copy_from_user
*
* Instrument writes to kernel memory, that are due to copy_from_user (and
* variants). The instrumentation should be inserted before the accesses.
@@ -130,10 +130,27 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
* @n number of bytes to copy
*/
static __always_inline void
-instrument_copy_from_user(const void *to, const void __user *from, unsigned long n)
+instrument_copy_from_user_before(const void *to, const void __user *from, unsigned long n)
{
kasan_check_write(to, n);
kcsan_check_write(to, n);
}

+/**
+ * instrument_copy_from_user_after - add instrumentation after copy_from_user
+ *
+ * Instrument writes to kernel memory, that are due to copy_from_user (and
+ * variants). The instrumentation should be inserted after the accesses.
+ *
+ * @to destination address
+ * @from source address
+ * @n number of bytes to copy
+ * @left number of bytes not copied (as returned by copy_from_user)
+ */
+static __always_inline void
+instrument_copy_from_user_after(const void *to, const void __user *from,
+ unsigned long n, unsigned long left)
+{
+}
+
#endif /* _LINUX_INSTRUMENTED_H */
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index ac0394087f7d4..8dadd8642afbb 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -98,20 +98,28 @@ static inline void force_uaccess_end(mm_segment_t oldfs)
static __always_inline __must_check unsigned long
__copy_from_user_inatomic(void *to, const void __user *from, unsigned long n)
{
- instrument_copy_from_user(to, from, n);
+ unsigned long res;
+
+ instrument_copy_from_user_before(to, from, n);
check_object_size(to, n, false);
- return raw_copy_from_user(to, from, n);
+ res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
+ return res;
}

static __always_inline __must_check unsigned long
__copy_from_user(void *to, const void __user *from, unsigned long n)
{
+ unsigned long res;
+
might_fault();
+ instrument_copy_from_user_before(to, from, n);
if (should_fail_usercopy())
return n;
- instrument_copy_from_user(to, from, n);
check_object_size(to, n, false);
- return raw_copy_from_user(to, from, n);
+ res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
+ return res;
}

/**
@@ -155,8 +163,9 @@ _copy_from_user(void *to, const void __user *from, unsigned long n)
unsigned long res = n;
might_fault();
if (!should_fail_usercopy() && likely(access_ok(from, n))) {
- instrument_copy_from_user(to, from, n);
+ instrument_copy_from_user_before(to, from, n);
res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
}
if (unlikely(res))
memset(to + (n - res), 0, res);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 6dd5330f7a995..fb19401c29c4f 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -159,13 +159,16 @@ static int copyout(void __user *to, const void *from, size_t n)

static int copyin(void *to, const void __user *from, size_t n)
{
+ size_t res = n;
+
if (should_fail_usercopy())
return n;
if (access_ok(from, n)) {
- instrument_copy_from_user(to, from, n);
- n = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_before(to, from, n);
+ res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
}
- return n;
+ return res;
}

static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
diff --git a/lib/usercopy.c b/lib/usercopy.c
index 7413dd300516e..1505a52f23a01 100644
--- a/lib/usercopy.c
+++ b/lib/usercopy.c
@@ -12,8 +12,9 @@ unsigned long _copy_from_user(void *to, const void __user *from, unsigned long n
unsigned long res = n;
might_fault();
if (!should_fail_usercopy() && likely(access_ok(from, n))) {
- instrument_copy_from_user(to, from, n);
+ instrument_copy_from_user_before(to, from, n);
res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
}
if (unlikely(res))
memset(to + (n - res), 0, res);
--
2.35.1.1021.g381101b075-goog

2022-03-30 12:15:30

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 03/48] kasan: common: adapt to the new prototype of __stack_depot_save()

Pass extra_bits=0, as KASAN does not intend to store additional
information in the stack handle. No functional change.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I932d8f4f11a41b7483e0d57078744cc94697607a
---
mm/kasan/common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 92196562687b6..1182388ed3e0e 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -36,7 +36,7 @@ depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)
unsigned int nr_entries;

nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
- return __stack_depot_save(entries, nr_entries, flags, can_alloc);
+ return __stack_depot_save(entries, nr_entries, 0, flags, can_alloc);
}

void kasan_set_track(struct kasan_track *track, gfp_t flags)
--
2.35.1.1021.g381101b075-goog

2022-03-30 12:16:19

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 28/48] kmsan: instrumentation.h: add instrumentation_begin_with_regs()

When calling KMSAN-instrumented functions from non-instrumented
functions, function parameters may not be initialized properly, leading
to false positive reports. In particular, this happens all the time when
calling interrupt handlers from `noinstr` IDT entries.

We introduce instrumentation_begin_with_regs(), which calls
instrumentation_begin() and notifies KMSAN about the beginning of the
potentially instrumented region by calling
kmsan_instrumentation_begin(), which:
- wipes the current KMSAN state at the beginning of the region, ensuring
that the first call of an instrumented function receives initialized
parameters (this is a pretty good approximation of having all other
instrumented functions receive initialized parameters);
- unpoisons the `struct pt_regs` set up by the non-instrumented assembly
code.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I0f5e3372e00bd5fe25ddbf286f7260aae9011858
---
include/linux/instrumentation.h | 6 ++++++
include/linux/kmsan.h | 11 +++++++++++
mm/kmsan/hooks.c | 16 ++++++++++++++++
3 files changed, 33 insertions(+)

diff --git a/include/linux/instrumentation.h b/include/linux/instrumentation.h
index 24359b4a96053..3bbce9d556381 100644
--- a/include/linux/instrumentation.h
+++ b/include/linux/instrumentation.h
@@ -15,6 +15,11 @@
})
#define instrumentation_begin() __instrumentation_begin(__COUNTER__)

+#define instrumentation_begin_with_regs(regs) do { \
+ __instrumentation_begin(__COUNTER__); \
+ kmsan_instrumentation_begin(regs); \
+} while (0)
+
/*
* Because instrumentation_{begin,end}() can nest, objtool validation considers
* _begin() a +1 and _end() a -1 and computes a sum over the instructions.
@@ -55,6 +60,7 @@
#define instrumentation_end() __instrumentation_end(__COUNTER__)
#else
# define instrumentation_begin() do { } while(0)
+# define instrumentation_begin_with_regs(regs) kmsan_instrumentation_begin(regs)
# define instrumentation_end() do { } while(0)
#endif

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 55f976b721566..209a5a2192e22 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -247,6 +247,13 @@ void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
*/
void kmsan_handle_urb(const struct urb *urb, bool is_out);

+/**
+ * kmsan_instrumentation_begin() - handle instrumentation_begin().
+ * @regs: pointer to struct pt_regs that non-instrumented code passes to
+ * instrumented code.
+ */
+void kmsan_instrumentation_begin(struct pt_regs *regs);
+
#else

static inline void kmsan_init_shadow(void)
@@ -343,6 +350,10 @@ static inline void kmsan_handle_urb(const struct urb *urb, bool is_out)
{
}

+static inline void kmsan_instrumentation_begin(struct pt_regs *regs)
+{
+}
+
#endif

#endif /* _LINUX_KMSAN_H */
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index d95fd16a4b1dc..6b133533ff7d9 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -366,3 +366,19 @@ void kmsan_check_memory(const void *addr, size_t size)
REASON_ANY);
}
EXPORT_SYMBOL(kmsan_check_memory);
+
+void kmsan_instrumentation_begin(struct pt_regs *regs)
+{
+ struct kmsan_context_state *state = &kmsan_get_context()->cstate;
+
+ if (state)
+ __memset(state, 0, sizeof(struct kmsan_context_state));
+ if (!kmsan_enabled || !regs)
+ return;
+ /*
+ * @regs may reside in cpu_entry_area, for which KMSAN does not allocate
+ * metadata. Do not force an error in that case.
+ */
+ kmsan_internal_unpoison_memory(regs, sizeof(*regs), /*checked*/ false);
+}
+EXPORT_SYMBOL(kmsan_instrumentation_begin);
--
2.35.1.1021.g381101b075-goog

2022-03-30 12:48:07

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 38/48] objtool: kmsan: list KMSAN API functions as uaccess-safe

KMSAN inserts API function calls in a lot of places (function entries
and exits, local variables, memory accesses), so they may get called
from the uaccess regions as well.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I242bc9816273fecad4ea3d977393784396bb3c35
---
tools/objtool/check.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 7c33ec67c4a95..8518eaf05bff0 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -943,6 +943,25 @@ static const char *uaccess_safe_builtin[] = {
"__sanitizer_cov_trace_cmp4",
"__sanitizer_cov_trace_cmp8",
"__sanitizer_cov_trace_switch",
+ /* KMSAN */
+ "kmsan_copy_to_user",
+ "kmsan_report",
+ "kmsan_unpoison_memory",
+ "__msan_chain_origin",
+ "__msan_get_context_state",
+ "__msan_instrument_asm_store",
+ "__msan_metadata_ptr_for_load_1",
+ "__msan_metadata_ptr_for_load_2",
+ "__msan_metadata_ptr_for_load_4",
+ "__msan_metadata_ptr_for_load_8",
+ "__msan_metadata_ptr_for_load_n",
+ "__msan_metadata_ptr_for_store_1",
+ "__msan_metadata_ptr_for_store_2",
+ "__msan_metadata_ptr_for_store_4",
+ "__msan_metadata_ptr_for_store_8",
+ "__msan_metadata_ptr_for_store_n",
+ "__msan_poison_alloca",
+ "__msan_warning",
/* UBSAN */
"ubsan_type_mismatch_common",
"__ubsan_handle_type_mismatch",
--
2.35.1.1021.g381101b075-goog

2022-03-30 13:30:39

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 17/48] kmsan: mm: maintain KMSAN metadata for page operations

Insert KMSAN hooks that make the necessary bookkeeping changes:
- poison page shadow and origins in alloc_pages()/free_page();
- clear page shadow and origins in clear_page(), copy_user_highpage();
- copy page metadata in copy_highpage(), wp_page_copy();
- handle vmap()/vunmap()/iounmap();

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- move page metadata hooks implementation here
-- remove call to kmsan_memblock_free_pages()

Link: https://linux-review.googlesource.com/id/I6d4f53a0e7eab46fa29f0348f3095d9f2e326850
---
arch/x86/include/asm/page_64.h | 13 ++++
arch/x86/mm/ioremap.c | 3 +
include/linux/highmem.h | 3 +
include/linux/kmsan.h | 123 +++++++++++++++++++++++++++++++++
mm/internal.h | 6 ++
mm/kmsan/hooks.c | 87 +++++++++++++++++++++++
mm/kmsan/shadow.c | 114 ++++++++++++++++++++++++++++++
mm/memory.c | 2 +
mm/page_alloc.c | 11 +++
mm/vmalloc.c | 20 +++++-
10 files changed, 380 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index e9c86299b8351..36e270a8ea9a4 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -45,14 +45,27 @@ void clear_page_orig(void *page);
void clear_page_rep(void *page);
void clear_page_erms(void *page);

+/* This is an assembly header, avoid including too much of kmsan.h */
+#ifdef CONFIG_KMSAN
+void kmsan_unpoison_memory(const void *addr, size_t size);
+#endif
+__no_sanitize_memory
static inline void clear_page(void *page)
{
+#ifdef CONFIG_KMSAN
+ /* alternative_call_2() changes @page. */
+ void *page_copy = page;
+#endif
alternative_call_2(clear_page_orig,
clear_page_rep, X86_FEATURE_REP_GOOD,
clear_page_erms, X86_FEATURE_ERMS,
"=D" (page),
"0" (page)
: "cc", "memory", "rax", "rcx");
+#ifdef CONFIG_KMSAN
+ /* Clear KMSAN shadow for the pages that have it. */
+ kmsan_unpoison_memory(page_copy, PAGE_SIZE);
+#endif
}

void copy_page(void *to, void *from);
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 17a492c273069..0da8608778221 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -17,6 +17,7 @@
#include <linux/cc_platform.h>
#include <linux/efi.h>
#include <linux/pgtable.h>
+#include <linux/kmsan.h>

#include <asm/set_memory.h>
#include <asm/e820/api.h>
@@ -474,6 +475,8 @@ void iounmap(volatile void __iomem *addr)
return;
}

+ kmsan_iounmap_page_range((unsigned long)addr,
+ (unsigned long)addr + get_vm_area_size(p));
memtype_free(p->phys_addr, p->phys_addr + get_vm_area_size(p));

/* Finally remove it */
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 39bb9b47fa9cd..3e1898a44d7e3 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -6,6 +6,7 @@
#include <linux/kernel.h>
#include <linux/bug.h>
#include <linux/cacheflush.h>
+#include <linux/kmsan.h>
#include <linux/mm.h>
#include <linux/uaccess.h>
#include <linux/hardirq.h>
@@ -277,6 +278,7 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
vfrom = kmap_local_page(from);
vto = kmap_local_page(to);
copy_user_page(vto, vfrom, vaddr, to);
+ kmsan_unpoison_memory(page_address(to), PAGE_SIZE);
kunmap_local(vto);
kunmap_local(vfrom);
}
@@ -292,6 +294,7 @@ static inline void copy_highpage(struct page *to, struct page *from)
vfrom = kmap_local_page(from);
vto = kmap_local_page(to);
copy_page(vto, vfrom);
+ kmsan_copy_page_meta(to, from);
kunmap_local(vto);
kunmap_local(vfrom);
}
diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 4e35f43eceaa9..da41850b46cbd 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -42,6 +42,129 @@ struct kmsan_ctx {
bool allow_reporting;
};

+/**
+ * kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
+ * @page: struct page pointer returned by alloc_pages().
+ * @order: order of allocated struct page.
+ * @flags: GFP flags used by alloc_pages()
+ *
+ * KMSAN marks 1<<@order pages starting at @page as uninitialized, unless
+ * @flags contain __GFP_ZERO.
+ */
+void kmsan_alloc_page(struct page *page, unsigned int order, gfp_t flags);
+
+/**
+ * kmsan_free_page() - Notify KMSAN about a free_pages() call.
+ * @page: struct page pointer passed to free_pages().
+ * @order: order of deallocated struct page.
+ *
+ * KMSAN marks freed memory as uninitialized.
+ */
+void kmsan_free_page(struct page *page, unsigned int order);
+
+/**
+ * kmsan_copy_page_meta() - Copy KMSAN metadata between two pages.
+ * @dst: destination page.
+ * @src: source page.
+ *
+ * KMSAN copies the contents of metadata pages for @src into the metadata pages
+ * for @dst. If @dst has no associated metadata pages, nothing happens.
+ * If @src has no associated metadata pages, @dst metadata pages are unpoisoned.
+ */
+void kmsan_copy_page_meta(struct page *dst, struct page *src);
+
+/**
+ * kmsan_map_kernel_range_noflush() - Notify KMSAN about a vmap.
+ * @start: start of vmapped range.
+ * @end: end of vmapped range.
+ * @prot: page protection flags used for vmap.
+ * @pages: array of pages.
+ * @page_shift: page_shift passed to vmap_range_noflush().
+ *
+ * KMSAN maps shadow and origin pages of @pages into contiguous ranges in
+ * vmalloc metadata address range.
+ */
+void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
+ pgprot_t prot, struct page **pages,
+ unsigned int page_shift);
+
+/**
+ * kmsan_vunmap_kernel_range_noflush() - Notify KMSAN about a vunmap.
+ * @start: start of vunmapped range.
+ * @end: end of vunmapped range.
+ *
+ * KMSAN unmaps the contiguous metadata ranges created by
+ * kmsan_map_kernel_range_noflush().
+ */
+void kmsan_vunmap_range_noflush(unsigned long start, unsigned long end);
+
+/**
+ * kmsan_ioremap_page_range() - Notify KMSAN about a ioremap_page_range() call.
+ * @addr: range start.
+ * @end: range end.
+ * @phys_addr: physical range start.
+ * @prot: page protection flags used for ioremap_page_range().
+ * @page_shift: page_shift argument passed to vmap_range_noflush().
+ *
+ * KMSAN creates new metadata pages for the physical pages mapped into the
+ * virtual memory.
+ */
+void kmsan_ioremap_page_range(unsigned long addr, unsigned long end,
+ phys_addr_t phys_addr, pgprot_t prot,
+ unsigned int page_shift);
+
+/**
+ * kmsan_iounmap_page_range() - Notify KMSAN about a iounmap_page_range() call.
+ * @start: range start.
+ * @end: range end.
+ *
+ * KMSAN unmaps the metadata pages for the given range and, unlike for
+ * vunmap_page_range(), also deallocates them.
+ */
+void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
+
+#else
+
+static inline int kmsan_alloc_page(struct page *page, unsigned int order,
+ gfp_t flags)
+{
+ return 0;
+}
+
+static inline void kmsan_free_page(struct page *page, unsigned int order)
+{
+}
+
+static inline void kmsan_copy_page_meta(struct page *dst, struct page *src)
+{
+}
+
+static inline void kmsan_vmap_pages_range_noflush(unsigned long start,
+ unsigned long end,
+ pgprot_t prot,
+ struct page **pages,
+ unsigned int page_shift)
+{
+}
+
+static inline void kmsan_vunmap_range_noflush(unsigned long start,
+ unsigned long end)
+{
+}
+
+static inline void kmsan_ioremap_page_range(unsigned long start,
+ unsigned long end,
+ phys_addr_t phys_addr,
+ pgprot_t prot,
+ unsigned int page_shift)
+{
+}
+
+static inline void kmsan_iounmap_page_range(unsigned long start,
+ unsigned long end)
+{
+}
+
#endif

#endif /* _LINUX_KMSAN_H */
diff --git a/mm/internal.h b/mm/internal.h
index d80300392a194..2d8ca0ae4774f 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -713,8 +713,14 @@ int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
}
#endif

+int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
+ pgprot_t prot, struct page **pages,
+ unsigned int page_shift);
+
void vunmap_range_noflush(unsigned long start, unsigned long end);

+void __vunmap_range_noflush(unsigned long start, unsigned long end);
+
int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
unsigned long addr, int page_nid, int *flags);

diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 4ac62fa67a02a..5d886df57adca 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -26,6 +26,93 @@
* skipping effects of functions like memset() inside instrumented code.
*/

+static unsigned long vmalloc_shadow(unsigned long addr)
+{
+ return (unsigned long)kmsan_get_metadata((void *)addr,
+ KMSAN_META_SHADOW);
+}
+
+static unsigned long vmalloc_origin(unsigned long addr)
+{
+ return (unsigned long)kmsan_get_metadata((void *)addr,
+ KMSAN_META_ORIGIN);
+}
+
+void kmsan_vunmap_range_noflush(unsigned long start, unsigned long end)
+{
+ __vunmap_range_noflush(vmalloc_shadow(start), vmalloc_shadow(end));
+ __vunmap_range_noflush(vmalloc_origin(start), vmalloc_origin(end));
+ flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
+ flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
+}
+EXPORT_SYMBOL(kmsan_vunmap_range_noflush);
+
+/*
+ * This function creates new shadow/origin pages for the physical pages mapped
+ * into the virtual memory. If those physical pages already had shadow/origin,
+ * those are ignored.
+ */
+void kmsan_ioremap_page_range(unsigned long start, unsigned long end,
+ phys_addr_t phys_addr, pgprot_t prot,
+ unsigned int page_shift)
+{
+ gfp_t gfp_mask = GFP_KERNEL | __GFP_ZERO;
+ struct page *shadow, *origin;
+ unsigned long off = 0;
+ int i, nr;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ nr = (end - start) / PAGE_SIZE;
+ kmsan_enter_runtime();
+ for (i = 0; i < nr; i++, off += PAGE_SIZE) {
+ shadow = alloc_pages(gfp_mask, 1);
+ origin = alloc_pages(gfp_mask, 1);
+ __vmap_pages_range_noflush(
+ vmalloc_shadow(start + off),
+ vmalloc_shadow(start + off + PAGE_SIZE), prot, &shadow,
+ page_shift);
+ __vmap_pages_range_noflush(
+ vmalloc_origin(start + off),
+ vmalloc_origin(start + off + PAGE_SIZE), prot, &origin,
+ page_shift);
+ }
+ flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
+ flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_ioremap_page_range);
+
+void kmsan_iounmap_page_range(unsigned long start, unsigned long end)
+{
+ unsigned long v_shadow, v_origin;
+ struct page *shadow, *origin;
+ int i, nr;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ nr = (end - start) / PAGE_SIZE;
+ kmsan_enter_runtime();
+ v_shadow = (unsigned long)vmalloc_shadow(start);
+ v_origin = (unsigned long)vmalloc_origin(start);
+ for (i = 0; i < nr; i++, v_shadow += PAGE_SIZE, v_origin += PAGE_SIZE) {
+ shadow = kmsan_vmalloc_to_page_or_null((void *)v_shadow);
+ origin = kmsan_vmalloc_to_page_or_null((void *)v_origin);
+ __vunmap_range_noflush(v_shadow, vmalloc_shadow(end));
+ __vunmap_range_noflush(v_origin, vmalloc_origin(end));
+ if (shadow)
+ __free_pages(shadow, 1);
+ if (origin)
+ __free_pages(origin, 1);
+ }
+ flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
+ flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_iounmap_page_range);
+
/* Functions from kmsan-checks.h follow. */
void kmsan_poison_memory(const void *address, size_t size, gfp_t flags)
{
diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
index de58cfbc55b9d..8fe6a5ed05e67 100644
--- a/mm/kmsan/shadow.c
+++ b/mm/kmsan/shadow.c
@@ -184,3 +184,117 @@ void *kmsan_get_metadata(void *address, bool is_origin)
ret = (is_origin ? origin_ptr_for(page) : shadow_ptr_for(page)) + off;
return ret;
}
+
+void kmsan_copy_page_meta(struct page *dst, struct page *src)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ if (!dst || !page_has_metadata(dst))
+ return;
+ if (!src || !page_has_metadata(src)) {
+ kmsan_internal_unpoison_memory(page_address(dst), PAGE_SIZE,
+ /*checked*/ false);
+ return;
+ }
+
+ kmsan_enter_runtime();
+ __memcpy(shadow_ptr_for(dst), shadow_ptr_for(src), PAGE_SIZE);
+ __memcpy(origin_ptr_for(dst), origin_ptr_for(src), PAGE_SIZE);
+ kmsan_leave_runtime();
+}
+
+void kmsan_alloc_page(struct page *page, unsigned int order, gfp_t flags)
+{
+ bool initialized = (flags & __GFP_ZERO) || !kmsan_enabled;
+ struct page *shadow, *origin;
+ depot_stack_handle_t handle;
+ int pages = 1 << order;
+ int i;
+
+ if (!page)
+ return;
+
+ shadow = shadow_page_for(page);
+ origin = origin_page_for(page);
+
+ if (initialized) {
+ __memset(page_address(shadow), 0, PAGE_SIZE * pages);
+ __memset(page_address(origin), 0, PAGE_SIZE * pages);
+ return;
+ }
+
+ /* Zero pages allocated by the runtime should also be initialized. */
+ if (kmsan_in_runtime())
+ return;
+
+ __memset(page_address(shadow), -1, PAGE_SIZE * pages);
+ kmsan_enter_runtime();
+ handle = kmsan_save_stack_with_flags(flags, /*extra_bits*/ 0);
+ kmsan_leave_runtime();
+ /*
+ * Addresses are page-aligned, pages are contiguous, so it's ok
+ * to just fill the origin pages with |handle|.
+ */
+ for (i = 0; i < PAGE_SIZE * pages / sizeof(handle); i++)
+ ((depot_stack_handle_t *)page_address(origin))[i] = handle;
+}
+
+void kmsan_free_page(struct page *page, unsigned int order)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ kmsan_internal_poison_memory(page_address(page),
+ PAGE_SIZE << compound_order(page),
+ GFP_KERNEL,
+ KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
+ kmsan_leave_runtime();
+}
+
+void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
+ pgprot_t prot, struct page **pages,
+ unsigned int page_shift)
+{
+ unsigned long shadow_start, origin_start, shadow_end, origin_end;
+ struct page **s_pages, **o_pages;
+ int nr, i, mapped;
+
+ if (!kmsan_enabled)
+ return;
+
+ shadow_start = vmalloc_meta((void *)start, KMSAN_META_SHADOW);
+ shadow_end = vmalloc_meta((void *)end, KMSAN_META_SHADOW);
+ if (!shadow_start)
+ return;
+
+ nr = (end - start) / PAGE_SIZE;
+ s_pages = kcalloc(nr, sizeof(struct page *), GFP_KERNEL);
+ o_pages = kcalloc(nr, sizeof(struct page *), GFP_KERNEL);
+ if (!s_pages || !o_pages)
+ goto ret;
+ for (i = 0; i < nr; i++) {
+ s_pages[i] = shadow_page_for(pages[i]);
+ o_pages[i] = origin_page_for(pages[i]);
+ }
+ prot = __pgprot(pgprot_val(prot) | _PAGE_NX);
+ prot = PAGE_KERNEL;
+
+ origin_start = vmalloc_meta((void *)start, KMSAN_META_ORIGIN);
+ origin_end = vmalloc_meta((void *)end, KMSAN_META_ORIGIN);
+ kmsan_enter_runtime();
+ mapped = __vmap_pages_range_noflush(shadow_start, shadow_end, prot,
+ s_pages, page_shift);
+ KMSAN_WARN_ON(mapped);
+ mapped = __vmap_pages_range_noflush(origin_start, origin_end, prot,
+ o_pages, page_shift);
+ KMSAN_WARN_ON(mapped);
+ kmsan_leave_runtime();
+ flush_tlb_kernel_range(shadow_start, shadow_end);
+ flush_tlb_kernel_range(origin_start, origin_end);
+ flush_cache_vmap(shadow_start, shadow_end);
+ flush_cache_vmap(origin_start, origin_end);
+
+ret:
+ kfree(s_pages);
+ kfree(o_pages);
+}
diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913a..7465eb43e6d3e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -52,6 +52,7 @@
#include <linux/highmem.h>
#include <linux/pagemap.h>
#include <linux/memremap.h>
+#include <linux/kmsan.h>
#include <linux/ksm.h>
#include <linux/rmap.h>
#include <linux/export.h>
@@ -3026,6 +3027,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
put_page(old_page);
return 0;
}
+ kmsan_copy_page_meta(new_page, old_page);
}

if (mem_cgroup_charge(page_folio(new_page), mm, GFP_KERNEL))
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3589febc6d319..98a066c0a9f63 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -27,6 +27,7 @@
#include <linux/compiler.h>
#include <linux/kernel.h>
#include <linux/kasan.h>
+#include <linux/kmsan.h>
#include <linux/module.h>
#include <linux/suspend.h>
#include <linux/pagevec.h>
@@ -1301,6 +1302,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
VM_BUG_ON_PAGE(PageTail(page), page);

trace_mm_page_free(page, order);
+ kmsan_free_page(page, order);

if (unlikely(PageHWPoison(page)) && !order) {
/*
@@ -3679,6 +3681,14 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
/*
* Allocate a page from the given zone. Use pcplists for order-0 allocations.
*/
+
+/*
+ * Do not instrument rmqueue() with KMSAN. This function may call
+ * __msan_poison_alloca() through a call to set_pfnblock_flags_mask().
+ * If __msan_poison_alloca() attempts to allocate pages for the stack depot, it
+ * may call rmqueue() again, which will result in a deadlock.
+ */
+__no_sanitize_memory
static inline
struct page *rmqueue(struct zone *preferred_zone,
struct zone *zone, unsigned int order,
@@ -5409,6 +5419,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
}

trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
+ kmsan_alloc_page(page, order, alloc_gfp);

return page;
}
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 4165304d35471..7bcbf7a08597a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -321,6 +321,9 @@ int ioremap_page_range(unsigned long addr, unsigned long end,
err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
ioremap_max_page_shift);
flush_cache_vmap(addr, end);
+ if (!err)
+ kmsan_ioremap_page_range(addr, end, phys_addr, prot,
+ ioremap_max_page_shift);
return err;
}

@@ -420,7 +423,7 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
*
* This is an internal function only. Do not use outside mm/.
*/
-void vunmap_range_noflush(unsigned long start, unsigned long end)
+void __vunmap_range_noflush(unsigned long start, unsigned long end)
{
unsigned long next;
pgd_t *pgd;
@@ -442,6 +445,12 @@ void vunmap_range_noflush(unsigned long start, unsigned long end)
arch_sync_kernel_mappings(start, end);
}

+void vunmap_range_noflush(unsigned long start, unsigned long end)
+{
+ kmsan_vunmap_range_noflush(start, end);
+ __vunmap_range_noflush(start, end);
+}
+
/**
* vunmap_range - unmap kernel virtual addresses
* @addr: start of the VM area to unmap
@@ -576,7 +585,7 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
*
* This is an internal function only. Do not use outside mm/.
*/
-int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
+int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
pgprot_t prot, struct page **pages, unsigned int page_shift)
{
unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
@@ -602,6 +611,13 @@ int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
return 0;
}

+int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
+ pgprot_t prot, struct page **pages, unsigned int page_shift)
+{
+ kmsan_vmap_pages_range_noflush(addr, end, prot, pages, page_shift);
+ return __vmap_pages_range_noflush(addr, end, prot, pages, page_shift);
+}
+
/**
* vmap_pages_range - map pages to a kernel virtual address
* @addr: start of the VM area to map
--
2.35.1.1021.g381101b075-goog

2022-03-30 14:51:00

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 06/48] asm-generic: instrument usercopy in cacheflush.h

Notify memory tools about usercopy events in copy_to_user_page() and
copy_from_user_page().

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ic1ee8da1886325f46ad67f52176f48c2c836c48f
---
include/asm-generic/cacheflush.h | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
index 4f07afacbc239..0f63eb325025f 100644
--- a/include/asm-generic/cacheflush.h
+++ b/include/asm-generic/cacheflush.h
@@ -2,6 +2,8 @@
#ifndef _ASM_GENERIC_CACHEFLUSH_H
#define _ASM_GENERIC_CACHEFLUSH_H

+#include <linux/instrumented.h>
+
struct mm_struct;
struct vm_area_struct;
struct page;
@@ -105,6 +107,7 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
#ifndef copy_to_user_page
#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
do { \
+ instrument_copy_to_user(dst, src, len); \
memcpy(dst, src, len); \
flush_icache_user_page(vma, page, vaddr, len); \
} while (0)
@@ -112,7 +115,11 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)

#ifndef copy_from_user_page
#define copy_from_user_page(vma, page, vaddr, dst, src, len) \
- memcpy(dst, src, len)
+ do { \
+ instrument_copy_from_user_before(dst, src, len); \
+ memcpy(dst, src, len); \
+ instrument_copy_from_user_after(dst, src, len, 0); \
+ } while (0)
#endif

#endif /* _ASM_GENERIC_CACHEFLUSH_H */
--
2.35.1.1021.g381101b075-goog

2022-03-30 15:23:26

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 46/48] x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS

dentry_string_cmp() calls read_word_at_a_time(), which might read
uninitialized bytes to optimize string comparisons.
Disabling CONFIG_DCACHE_WORD_ACCESS should prohibit this optimization,
as well as (probably) similar ones.

Suggested-by: Andrey Konovalov <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I4c0073224ac2897cafb8c037362c49dda9cfa133
---
arch/x86/Kconfig | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 86df15017f79d..646a7849be4cf 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -126,7 +126,9 @@ config X86
select CLKEVT_I8253
select CLOCKSOURCE_VALIDATE_LAST_CYCLE
select CLOCKSOURCE_WATCHDOG
- select DCACHE_WORD_ACCESS
+ # Word-size accesses may read uninitialized data past the trailing \0
+ # in strings and cause false KMSAN reports.
+ select DCACHE_WORD_ACCESS if !KMSAN
select DYNAMIC_SIGFRAME
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
--
2.35.1.1021.g381101b075-goog

2022-03-30 20:41:13

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 35/48] kmsan: block: skip bio block merging logic for KMSAN

KMSAN doesn't allow treating adjacent memory pages as such, if they were
allocated by different alloc_pages() calls.
The block layer however does so: adjacent pages end up being used
together. To prevent this, make page_is_mergeable() return false under
KMSAN.

Suggested-by: Eric Biggers <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>

---

Link: https://linux-review.googlesource.com/id/Ie29cc2464c70032347c32ab2a22e1e7a0b37b905
---
block/bio.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 4312a8085396b..3c8806bbe3a81 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -808,6 +808,8 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
return false;

*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
+ if (!*same_page && IS_ENABLED(CONFIG_KMSAN))
+ return false;
if (*same_page)
return true;
return (bv->bv_page + bv_end / PAGE_SIZE) == (page + off / PAGE_SIZE);
--
2.35.1.1021.g381101b075-goog

2022-03-31 01:22:10

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v2 12/48] kcsan: clang: retire CONFIG_KCSAN_KCOV_BROKEN

On Tue, 29 Mar 2022 at 14:41, Alexander Potapenko <[email protected]> wrote:
>
> kcov used to be broken prior to Clang 11, but right now that version is
> already the minimum required to build with KCSAN, because no prior
> compiler has "-tsan-distinguish-volatile=1".
>
> Therefore KCSAN_KCOV_BROKEN is not needed anymore.
>
> Suggested-by: Marco Elver <[email protected]>
> Signed-off-by: Alexander Potapenko <[email protected]>

FYI, this is superseded by
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b027471adaf955efde6153d67f391fe1604b7292

> ---
> Link: https://linux-review.googlesource.com/id/Ida287421577f37de337139b5b5b9e977e4a6fee2
> ---
> lib/Kconfig.kcsan | 11 -----------
> 1 file changed, 11 deletions(-)
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> index 63b70b8c55519..de022445fbba5 100644
> --- a/lib/Kconfig.kcsan
> +++ b/lib/Kconfig.kcsan
> @@ -10,21 +10,10 @@ config HAVE_KCSAN_COMPILER
> For the list of compilers that support KCSAN, please see
> <file:Documentation/dev-tools/kcsan.rst>.
>
> -config KCSAN_KCOV_BROKEN
> - def_bool KCOV && CC_HAS_SANCOV_TRACE_PC
> - depends on CC_IS_CLANG
> - depends on !$(cc-option,-Werror=unused-command-line-argument -fsanitize=thread -fsanitize-coverage=trace-pc)
> - help
> - Some versions of clang support either KCSAN and KCOV but not the
> - combination of the two.
> - See https://bugs.llvm.org/show_bug.cgi?id=45831 for the status
> - in newer releases.
> -
> menuconfig KCSAN
> bool "KCSAN: dynamic data race detector"
> depends on HAVE_ARCH_KCSAN && HAVE_KCSAN_COMPILER
> depends on DEBUG_KERNEL && !KASAN
> - depends on !KCSAN_KCOV_BROKEN
> select STACKTRACE
> help
> The Kernel Concurrency Sanitizer (KCSAN) is a dynamic
> --
> 2.35.1.1021.g381101b075-goog
>

2022-03-31 02:57:43

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 34/48] kmsan: disable physical page merging in biovec

KMSAN metadata for adjacent physical pages may not be adjacent,
therefore accessing such pages together may lead to metadata
corruption.
We disable merging pages in biovec to prevent such corruptions.

Signed-off-by: Alexander Potapenko <[email protected]>
---

Link: https://linux-review.googlesource.com/id/Iece16041be5ee47904fbc98121b105e5be5fea5c
---
block/blk.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/block/blk.h b/block/blk.h
index 8bd43b3ad33d5..eb349916ac116 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -93,6 +93,13 @@ static inline bool biovec_phys_mergeable(struct request_queue *q,
phys_addr_t addr1 = page_to_phys(vec1->bv_page) + vec1->bv_offset;
phys_addr_t addr2 = page_to_phys(vec2->bv_page) + vec2->bv_offset;

+ /*
+ * Merging adjacent physical pages may not work correctly under KMSAN
+ * if their metadata pages aren't adjacent. Just disable merging.
+ */
+ if (IS_ENABLED(CONFIG_KMSAN))
+ return false;
+
if (addr1 + vec1->bv_len != addr2)
return false;
if (xen_domain() && !xen_biovec_phys_mergeable(vec1, vec2->bv_page))
--
2.35.1.1021.g381101b075-goog

2022-03-31 03:09:34

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 40/48] x86: kmsan: disable instrumentation of unsupported code

Instrumenting some files with KMSAN will result in kernel being unable
to link, boot or crashing at runtime for various reasons (e.g. infinite
recursion caused by instrumentation hooks calling instrumented code again).

Completely omit KMSAN instrumentation in the following places:
- arch/x86/boot and arch/x86/realmode/rm, as KMSAN doesn't work for i386;
- arch/x86/entry/vdso, which isn't linked with KMSAN runtime;
- three files in arch/x86/kernel - boot problems;
- arch/x86/mm/cpu_entry_area.c - recursion.

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
- moved the patch earlier in the series so that KMSAN can compile
- split off the non-x86 part into a separate patch

Link: https://linux-review.googlesource.com/id/Id5e5c4a9f9d53c24a35ebb633b814c414628d81b
---
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/entry/vdso/Makefile | 3 +++
arch/x86/kernel/Makefile | 2 ++
arch/x86/kernel/cpu/Makefile | 1 +
arch/x86/mm/Makefile | 2 ++
arch/x86/realmode/rm/Makefile | 1 +
7 files changed, 11 insertions(+)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index b5aecb524a8aa..d5623232b763f 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -12,6 +12,7 @@
# Sanitizer runtimes are unavailable and cannot be linked for early boot code.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y

# Kernel does not boot with kcov instrumentation here.
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 6115274fe10fc..6e2e34d2655ce 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -20,6 +20,7 @@
# Sanitizer runtimes are unavailable and cannot be linked for early boot code.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y

# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index 693f8b9031fb8..4f835eaa03ec1 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -11,6 +11,9 @@ include $(srctree)/lib/vdso/Makefile

# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
+KMSAN_SANITIZE_vclock_gettime.o := n
+KMSAN_SANITIZE_vgetcpu.o := n
+
UBSAN_SANITIZE := n
KCSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 6aef9ee28a394..ad645fb8b02dd 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -35,6 +35,8 @@ KASAN_SANITIZE_cc_platform.o := n
# With some compiler versions the generated code results in boot hangs, caused
# by several compilation units. To be safe, disable all instrumentation.
KCSAN_SANITIZE := n
+KMSAN_SANITIZE_head$(BITS).o := n
+KMSAN_SANITIZE_nmi.o := n

OBJECT_FILES_NON_STANDARD_test_nx.o := y

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 9661e3e802be5..f10a921ee7565 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -12,6 +12,7 @@ endif
# If these files are instrumented, boot hangs during the first second.
KCOV_INSTRUMENT_common.o := n
KCOV_INSTRUMENT_perf_event.o := n
+KMSAN_SANITIZE_common.o := n

# As above, instrumenting secondary CPU boot code causes boot hangs.
KCSAN_SANITIZE_common.o := n
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index fe3d3061fc116..ada726784012f 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -12,6 +12,8 @@ KASAN_SANITIZE_mem_encrypt_identity.o := n
# Disable KCSAN entirely, because otherwise we get warnings that some functions
# reference __initdata sections.
KCSAN_SANITIZE := n
+# Avoid recursion by not calling KMSAN hooks for CEA code.
+KMSAN_SANITIZE_cpu_entry_area.o := n

ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_mem_encrypt.o = -pg
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449f..f614009d3e4e2 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -10,6 +10,7 @@
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y

# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
--
2.35.1.1021.g381101b075-goog

2022-03-31 03:14:25

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v2 19/48] kmsan: handle task creation and exiting

Tell KMSAN that a new task is created, so the tool creates a backing
metadata structure for that task.

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- move implementation of kmsan_task_create() and kmsan_task_exit() here

Link: https://linux-review.googlesource.com/id/I0f41c3a1c7d66f7e14aabcfdfc7c69addb945805
---
include/linux/kmsan.h | 17 +++++++++++++++++
kernel/exit.c | 2 ++
kernel/fork.c | 2 ++
mm/kmsan/core.c | 10 ++++++++++
mm/kmsan/hooks.c | 19 +++++++++++++++++++
mm/kmsan/kmsan.h | 2 ++
6 files changed, 52 insertions(+)

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index ed3630068e2ef..dca42e0e91991 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -17,6 +17,7 @@

struct page;
struct kmem_cache;
+struct task_struct;

#ifdef CONFIG_KMSAN

@@ -43,6 +44,14 @@ struct kmsan_ctx {
bool allow_reporting;
};

+void kmsan_task_create(struct task_struct *task);
+
+/**
+ * kmsan_task_exit() - Notify KMSAN that a task has exited.
+ * @task: task about to finish.
+ */
+void kmsan_task_exit(struct task_struct *task);
+
/**
* kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
* @page: struct page pointer returned by alloc_pages().
@@ -164,6 +173,14 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);

#else

+static inline void kmsan_task_create(struct task_struct *task)
+{
+}
+
+static inline void kmsan_task_exit(struct task_struct *task)
+{
+}
+
static inline int kmsan_alloc_page(struct page *page, unsigned int order,
gfp_t flags)
{
diff --git a/kernel/exit.c b/kernel/exit.c
index b00a25bb4ab93..15e1bf7fe1fa1 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -59,6 +59,7 @@
#include <linux/writeback.h>
#include <linux/shm.h>
#include <linux/kcov.h>
+#include <linux/kmsan.h>
#include <linux/random.h>
#include <linux/rcuwait.h>
#include <linux/compat.h>
@@ -752,6 +753,7 @@ void __noreturn do_exit(long code)
force_uaccess_begin();

kcov_task_exit(tsk);
+ kmsan_task_exit(tsk);

coredump_task_exit(tsk);
ptrace_event(PTRACE_EVENT_EXIT, code);
diff --git a/kernel/fork.c b/kernel/fork.c
index f1e89007f2288..f62c51d9cbfb1 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -37,6 +37,7 @@
#include <linux/fdtable.h>
#include <linux/iocontext.h>
#include <linux/key.h>
+#include <linux/kmsan.h>
#include <linux/binfmts.h>
#include <linux/mman.h>
#include <linux/mmu_notifier.h>
@@ -956,6 +957,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
account_kernel_stack(tsk, 1);

kcov_task_init(tsk);
+ kmsan_task_create(tsk);
kmap_local_fork(tsk);

#ifdef CONFIG_FAULT_INJECTION
diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
index f4196f274e754..8e594361332c6 100644
--- a/mm/kmsan/core.c
+++ b/mm/kmsan/core.c
@@ -44,6 +44,16 @@ bool kmsan_enabled __read_mostly;
*/
DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);

+void kmsan_internal_task_create(struct task_struct *task)
+{
+ struct kmsan_ctx *ctx = &task->kmsan_ctx;
+
+ __memset(ctx, 0, sizeof(struct kmsan_ctx));
+ ctx->allow_reporting = true;
+ kmsan_internal_unpoison_memory(current_thread_info(),
+ sizeof(struct thread_info), false);
+}
+
void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
unsigned int poison_flags)
{
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index e7c3ff48ed5cd..a13e15ef2bfd5 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -26,6 +26,25 @@
* skipping effects of functions like memset() inside instrumented code.
*/

+void kmsan_task_create(struct task_struct *task)
+{
+ kmsan_enter_runtime();
+ kmsan_internal_task_create(task);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_task_create);
+
+void kmsan_task_exit(struct task_struct *task)
+{
+ struct kmsan_ctx *ctx = &task->kmsan_ctx;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ ctx->allow_reporting = false;
+}
+EXPORT_SYMBOL(kmsan_task_exit);
+
void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags)
{
if (unlikely(object == NULL))
diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
index bfe38789950a6..a1b5900ffd97b 100644
--- a/mm/kmsan/kmsan.h
+++ b/mm/kmsan/kmsan.h
@@ -172,6 +172,8 @@ void kmsan_internal_set_shadow_origin(void *address, size_t size, int b,
u32 origin, bool checked);
depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id);

+void kmsan_internal_task_create(struct task_struct *task);
+
bool kmsan_metadata_is_contiguous(void *addr, size_t size);
void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
int reason);
--
2.35.1.1021.g381101b075-goog

2022-03-31 03:27:36

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 38/48] objtool: kmsan: list KMSAN API functions as uaccess-safe

On Tue, Mar 29, 2022 at 02:40:07PM +0200, Alexander Potapenko wrote:
> KMSAN inserts API function calls in a lot of places (function entries
> and exits, local variables, memory accesses), so they may get called
> from the uaccess regions as well.

That's insufficient. Explain how you did the right thing and made these
functions actually safe to be called in this context.

> Signed-off-by: Alexander Potapenko <[email protected]>
> ---
> Link: https://linux-review.googlesource.com/id/I242bc9816273fecad4ea3d977393784396bb3c35
> ---
> tools/objtool/check.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> index 7c33ec67c4a95..8518eaf05bff0 100644
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -943,6 +943,25 @@ static const char *uaccess_safe_builtin[] = {
> "__sanitizer_cov_trace_cmp4",
> "__sanitizer_cov_trace_cmp8",
> "__sanitizer_cov_trace_switch",
> + /* KMSAN */
> + "kmsan_copy_to_user",
> + "kmsan_report",
> + "kmsan_unpoison_memory",
> + "__msan_chain_origin",
> + "__msan_get_context_state",
> + "__msan_instrument_asm_store",
> + "__msan_metadata_ptr_for_load_1",
> + "__msan_metadata_ptr_for_load_2",
> + "__msan_metadata_ptr_for_load_4",
> + "__msan_metadata_ptr_for_load_8",
> + "__msan_metadata_ptr_for_load_n",
> + "__msan_metadata_ptr_for_store_1",
> + "__msan_metadata_ptr_for_store_2",
> + "__msan_metadata_ptr_for_store_4",
> + "__msan_metadata_ptr_for_store_8",
> + "__msan_metadata_ptr_for_store_n",
> + "__msan_poison_alloca",
> + "__msan_warning",
> /* UBSAN */
> "ubsan_type_mismatch_common",
> "__ubsan_handle_type_mismatch",
> --
> 2.35.1.1021.g381101b075-goog
>

2022-04-05 02:14:32

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v2 31/48] kernel: kmsan: don't instrument stacktrace.c

On Tue, Mar 29, 2022 at 2:41 PM Alexander Potapenko <[email protected]> wrote:
>
> When unwinding stack traces, the kernel may pick uninitialized data from
> the stack. To avoid false reports on that data, we do not instrument
> stacktrace.c

This patch is not needed anymore if we unpoison the stack traces
passed to __stack_depot_save() from KMSAN core.

2022-04-16 00:17:43

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 38/48] objtool: kmsan: list KMSAN API functions as uaccess-safe

On Thu, Apr 14, 2022 at 05:30:34PM +0200, Alexander Potapenko wrote:
> On Wed, Mar 30, 2022 at 10:46 AM Peter Zijlstra <[email protected]>
> wrote:
>
> > On Tue, Mar 29, 2022 at 02:40:07PM +0200, Alexander Potapenko wrote:
> > > KMSAN inserts API function calls in a lot of places (function entries
> > > and exits, local variables, memory accesses), so they may get called
> > > from the uaccess regions as well.
> >
> > That's insufficient. Explain how you did the right thing and made these
> > functions actually safe to be called in this context.
> >

> As a result, none of KMSAN API functions perform userspace accesses, but
> since they might be called from UACCESS regions they
> use user_access_save/restore().

^ That.. very good.

Thanks!