2021-12-14 16:22:00

by Alexander Potapenko

Subject: [PATCH 00/43] Add KernelMemorySanitizer infrastructure

KernelMemorySanitizer (KMSAN) is a detector of errors related to uses of
uninitialized memory. It relies on compile-time Clang instrumentation
(similar to MSan in userspace [1]) and tracks the state of every bit of
kernel memory, reporting an error whenever an uninitialized value is used
in a condition, dereferenced, or escapes to userspace, USB or DMA.
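
As a hedged illustration (hypothetical snippet, not taken from this
series), this is the class of bug KMSAN flags:

  int foo(int cond)
  {
          int flags;              /* never fully initialized */

          if (cond)
                  flags = 1;
          /*
           * KMSAN: uninit-value used in a condition whenever cond == 0;
           * the report also shows where 'flags' was created.
           */
          if (flags & 1)
                  return -1;
          return 0;
  }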

KMSAN has reported more than 300 bugs in the past few years (recently
fixed bugs: [2]), most of them with the help of syzkaller. Such bugs
keep getting introduced into the kernel despite new compiler warnings and
other analyses (the 5.16 cycle already resulted in several KMSAN-reported
bugs, e.g. [3]). Mitigations like total stack and heap initialization are
unfortunately very far from being deployable.

The proposed patchset contains the KMSAN runtime implementation together
with the small changes to other subsystems needed to make KMSAN work.

The latter changes fall into several categories:

1. Changes and refactorings of existing code required to add KMSAN:
- [1/43] arch/x86: add missing include to sparsemem.h
- [2/43] stackdepot: reserve 5 extra bits in depot_stack_handle_t
- [3/43] kasan: common: adapt to the new prototype of __stack_depot_save()
- [4/43] instrumented.h: allow instrumenting both sides of copy_from_user()
- [5/43] asm: x86: instrument usercopy in get_user() and __put_user_size()
- [6/43] asm-generic: instrument usercopy in cacheflush.h
- [7/43] compiler_attributes.h: add __disable_sanitizer_instrumentation
- [11/43] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
- [12/43] kcsan: clang: retire CONFIG_KCSAN_KCOV_BROKEN

2. KMSAN-related declarations in generic code, KMSAN runtime library,
docs and configs:
- [8/43] kmsan: add ReST documentation
- [9/43] kmsan: introduce __no_sanitize_memory and __no_kmsan_checks
- [10/43] kmsan: pgtable: reduce vmalloc space
- [13/43] kmsan: add KMSAN runtime core
- [14/43] MAINTAINERS: add entry for KMSAN
- [30/43] kmsan: add tests for KMSAN
- [35/43] x86: kmsan: use __msan_ string functions where possible.
- [42/43] objtool: kmsan: list KMSAN API functions as uaccess-safe
- [43/43] x86: kmsan: enable KMSAN builds for x86

3. Adding hooks from different subsystems to notify KMSAN about memory
state changes:
- [15/43] kmsan: mm: maintain KMSAN metadata for page operations
- [16/43] kmsan: mm: call KMSAN hooks from SLUB code
- [17/43] kmsan: handle task creation and exiting
- [19/43] kmsan: init: call KMSAN initialization routines
- [20/43] instrumented.h: add KMSAN support
- [26/43] kmsan: virtio: check/unpoison scatterlist in vring_map_one_sg()
- [27/43] x86: kmsan: add iomem support
- [28/43] kmsan: dma: unpoison DMA mappings
- [29/43] kmsan: handle memory sent to/from USB
- [36/43] x86: kmsan: sync metadata pages on page fault

4. Changes that prevent false reports by explicitly initializing memory,
disabling optimized code that may trick KMSAN, selectively skipping
instrumentation:
- [18/43] kmsan: unpoison @tlb in arch_tlb_gather_mmu()
- [22/43] kmsan: initialize the output of READ_ONCE_NOCHECK()
- [23/43] kmsan: make READ_ONCE_TASK_STACK() return initialized values
- [24/43] kmsan: disable KMSAN instrumentation for certain kernel parts
- [25/43] kmsan: skip shadow checks in files doing context switches
- [31/43] kmsan: disable strscpy() optimization under KMSAN
- [32/43] crypto: kmsan: disable accelerated configs under KMSAN
- [33/43] kmsan: disable physical page merging in biovec
- [34/43] kmsan: block: skip bio block merging logic for KMSAN
- [37/43] x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN
- [38/43] x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS
- [40/43] kmsan: kcov: unpoison area->list in kcov_remote_area_put()
- [41/43] security: kmsan: fix interoperability with auto-initialization

5. Noinstr handling:
- [21/43] kmsan: mark noinstr as __no_sanitize_memory
- [39/43] x86: kmsan: handle register passing from uninstrumented code

This patchset allows one to boot and run a defconfig+KMSAN kernel under
QEMU without known false positives. It does not, however, guarantee that
there are no false positives in drivers of certain devices or in less
tested subsystems, although KMSAN is actively tested on syzbot with a
large config.

The patchset was generated relative to Linux v5.16-rc5. The most
up-to-date KMSAN tree currently resides at
https://github.com/google/kmsan/.
One may find it handy to review these patches in Gerrit:
https://linux-review.googlesource.com/c/linux/kernel/git/torvalds/linux/+/1081

A huge thanks goes to the reviewers of the RFC patch series sent to LKML
last year
(https://lore.kernel.org/all/[email protected]/).

[1] https://clang.llvm.org/docs/MemorySanitizer.html
[2] https://syzkaller.appspot.com/upstream/fixed?manager=ci-upstream-kmsan-gce
[3] https://lore.kernel.org/all/[email protected]/


Alexander Potapenko (42):
stackdepot: reserve 5 extra bits in depot_stack_handle_t
kasan: common: adapt to the new prototype of __stack_depot_save()
instrumented.h: allow instrumenting both sides of copy_from_user()
asm: x86: instrument usercopy in get_user() and __put_user_size()
asm-generic: instrument usercopy in cacheflush.h
compiler_attributes.h: add __disable_sanitizer_instrumentation
kmsan: add ReST documentation
kmsan: introduce __no_sanitize_memory and __no_kmsan_checks
kmsan: pgtable: reduce vmalloc space
libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
kcsan: clang: retire CONFIG_KCSAN_KCOV_BROKEN
kmsan: add KMSAN runtime core
MAINTAINERS: add entry for KMSAN
kmsan: mm: maintain KMSAN metadata for page operations
kmsan: mm: call KMSAN hooks from SLUB code
kmsan: handle task creation and exiting
kmsan: unpoison @tlb in arch_tlb_gather_mmu()
kmsan: init: call KMSAN initialization routines
instrumented.h: add KMSAN support
kmsan: mark noinstr as __no_sanitize_memory
kmsan: initialize the output of READ_ONCE_NOCHECK()
kmsan: make READ_ONCE_TASK_STACK() return initialized values
kmsan: disable KMSAN instrumentation for certain kernel parts
kmsan: skip shadow checks in files doing context switches
kmsan: virtio: check/unpoison scatterlist in vring_map_one_sg()
x86: kmsan: add iomem support
kmsan: dma: unpoison DMA mappings
kmsan: handle memory sent to/from USB
kmsan: add tests for KMSAN
kmsan: disable strscpy() optimization under KMSAN
crypto: kmsan: disable accelerated configs under KMSAN
kmsan: disable physical page merging in biovec
kmsan: block: skip bio block merging logic for KMSAN
x86: kmsan: use __msan_ string functions where possible.
x86: kmsan: sync metadata pages on page fault
x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for
KASAN/KMSAN
x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS
x86: kmsan: handle register passing from uninstrumented code
kmsan: kcov: unpoison area->list in kcov_remote_area_put()
security: kmsan: fix interoperability with auto-initialization
objtool: kmsan: list KMSAN API functions as uaccess-safe
x86: kmsan: enable KMSAN builds for x86

Dmitry Vyukov (1):
arch/x86: add missing include to sparsemem.h

Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/kmsan.rst | 411 ++++++++++++++++++++++
MAINTAINERS | 12 +
Makefile | 1 +
arch/x86/Kconfig | 9 +-
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/entry/common.c | 2 +
arch/x86/entry/vdso/Makefile | 3 +
arch/x86/include/asm/checksum.h | 16 +-
arch/x86/include/asm/idtentry.h | 5 +
arch/x86/include/asm/page_64.h | 13 +
arch/x86/include/asm/pgtable_64_types.h | 41 ++-
arch/x86/include/asm/sparsemem.h | 2 +
arch/x86/include/asm/string_64.h | 23 +-
arch/x86/include/asm/uaccess.h | 7 +
arch/x86/include/asm/unwind.h | 23 +-
arch/x86/kernel/Makefile | 6 +
arch/x86/kernel/cpu/Makefile | 1 +
arch/x86/kernel/cpu/mce/core.c | 1 +
arch/x86/kernel/kvm.c | 1 +
arch/x86/kernel/nmi.c | 1 +
arch/x86/kernel/sev.c | 2 +
arch/x86/kernel/traps.c | 7 +
arch/x86/lib/Makefile | 2 +
arch/x86/lib/iomem.c | 5 +
arch/x86/mm/Makefile | 2 +
arch/x86/mm/fault.c | 23 +-
arch/x86/mm/init_64.c | 2 +-
arch/x86/mm/ioremap.c | 3 +
arch/x86/realmode/rm/Makefile | 1 +
block/bio.c | 2 +
block/blk.h | 7 +
crypto/Kconfig | 30 ++
drivers/firmware/efi/libstub/Makefile | 1 +
drivers/net/Kconfig | 1 +
drivers/nvdimm/nd.h | 2 +-
drivers/nvdimm/pfn_devs.c | 2 +-
drivers/usb/core/urb.c | 2 +
drivers/virtio/virtio_ring.c | 10 +-
include/asm-generic/cacheflush.h | 9 +-
include/asm-generic/rwonce.h | 5 +-
include/linux/compiler-clang.h | 23 ++
include/linux/compiler-gcc.h | 6 +
include/linux/compiler_attributes.h | 18 +
include/linux/compiler_types.h | 3 +-
include/linux/fortify-string.h | 2 +
include/linux/highmem.h | 3 +
include/linux/instrumented.h | 26 +-
include/linux/kmsan-checks.h | 123 +++++++
include/linux/kmsan.h | 365 +++++++++++++++++++
include/linux/mm_types.h | 12 +
include/linux/sched.h | 5 +
include/linux/stackdepot.h | 8 +
include/linux/uaccess.h | 19 +-
init/main.c | 3 +
kernel/Makefile | 1 +
kernel/dma/mapping.c | 9 +-
kernel/entry/common.c | 3 +
kernel/exit.c | 2 +
kernel/fork.c | 2 +
kernel/kcov.c | 7 +
kernel/locking/Makefile | 3 +-
kernel/sched/Makefile | 4 +
lib/Kconfig.debug | 1 +
lib/Kconfig.kcsan | 11 -
lib/Kconfig.kmsan | 34 ++
lib/Makefile | 1 +
lib/iomap.c | 40 +++
lib/iov_iter.c | 9 +-
lib/stackdepot.c | 29 +-
lib/string.c | 8 +
lib/usercopy.c | 3 +-
mm/Makefile | 1 +
mm/kasan/common.c | 2 +-
mm/kmsan/Makefile | 26 ++
mm/kmsan/annotations.c | 28 ++
mm/kmsan/core.c | 427 +++++++++++++++++++++++
mm/kmsan/hooks.c | 400 +++++++++++++++++++++
mm/kmsan/init.c | 238 +++++++++++++
mm/kmsan/instrumentation.c | 233 +++++++++++++
mm/kmsan/kmsan.h | 197 +++++++++++
mm/kmsan/kmsan_test.c | 444 ++++++++++++++++++++++++
mm/kmsan/report.c | 210 +++++++++++
mm/kmsan/shadow.c | 332 ++++++++++++++++++
mm/memory.c | 2 +
mm/mmu_gather.c | 10 +
mm/page_alloc.c | 18 +
mm/slab.h | 1 +
mm/slub.c | 26 +-
mm/vmalloc.c | 20 +-
scripts/Makefile.kmsan | 1 +
scripts/Makefile.lib | 9 +
security/Kconfig.hardening | 4 +
tools/objtool/check.c | 19 +
95 files changed, 4062 insertions(+), 68 deletions(-)
create mode 100644 Documentation/dev-tools/kmsan.rst
create mode 100644 include/linux/kmsan-checks.h
create mode 100644 include/linux/kmsan.h
create mode 100644 lib/Kconfig.kmsan
create mode 100644 mm/kmsan/Makefile
create mode 100644 mm/kmsan/annotations.c
create mode 100644 mm/kmsan/core.c
create mode 100644 mm/kmsan/hooks.c
create mode 100644 mm/kmsan/init.c
create mode 100644 mm/kmsan/instrumentation.c
create mode 100644 mm/kmsan/kmsan.h
create mode 100644 mm/kmsan/kmsan_test.c
create mode 100644 mm/kmsan/report.c
create mode 100644 mm/kmsan/shadow.c
create mode 100644 scripts/Makefile.kmsan

--
2.34.1.173.g76aa8bc2d0-goog



2021-12-14 16:22:02

by Alexander Potapenko

Subject: [PATCH 01/43] arch/x86: add missing include to sparsemem.h

From: Dmitry Vyukov <[email protected]>

All existing inclusions of sparsemem.h happen to be preceded by an
inclusion of <linux/types.h>, but KMSAN adds code that transitively
includes sparsemem.h without that header, resulting in a compilation
error:

sparsemem.h:34:32: error: unknown type name 'phys_addr_t'
extern int phys_to_target_node(phys_addr_t start);
^
sparsemem.h:36:39: error: unknown type name 'u64'
extern int memory_add_physaddr_to_nid(u64 start);
^

Because sparsemem.h does actually use phys_addr_t and u64, include
types.h explicitly.

Signed-off-by: Dmitry Vyukov <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ifae221ce85d870d8f8d17173bd44d5cf9be2950f
---
arch/x86/include/asm/sparsemem.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/sparsemem.h b/arch/x86/include/asm/sparsemem.h
index 6a9ccc1b2be5d..64df897c0ee30 100644
--- a/arch/x86/include/asm/sparsemem.h
+++ b/arch/x86/include/asm/sparsemem.h
@@ -2,6 +2,8 @@
#ifndef _ASM_X86_SPARSEMEM_H
#define _ASM_X86_SPARSEMEM_H

+#include <linux/types.h>
+
#ifdef CONFIG_SPARSEMEM
/*
* generic non-linear memory support:
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:22:11

by Alexander Potapenko

Subject: [PATCH 02/43] stackdepot: reserve 5 extra bits in depot_stack_handle_t

Some users (currently only KMSAN) may want to use spare bits in
depot_stack_handle_t. Let them do so by adding @extra_bits to
__stack_depot_save() to store arbitrary flags, and providing
stack_depot_get_extra_bits() to retrieve those flags.
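
A hedged usage sketch (the caller and MY_FLAG_IN_IRQ below are made up
for illustration; KMSAN, the only planned user, arrives later in the
series):

  #include <linux/stackdepot.h>
  #include <linux/stacktrace.h>

  /* Hypothetical flag kept in the 5 spare bits of the handle. */
  #define MY_FLAG_IN_IRQ 0x1

  static depot_stack_handle_t save_stack_with_flag(bool in_irq)
  {
          unsigned long entries[16];
          unsigned int nr;

          nr = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
          /* The extra bits travel inside the returned handle. */
          return __stack_depot_save(entries, nr, in_irq ? MY_FLAG_IN_IRQ : 0,
                                    GFP_ATOMIC, true);
  }

Later, stack_depot_get_extra_bits(handle) returns those flags without
having to fetch the stack trace itself.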

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I0587f6c777667864768daf07821d594bce6d8ff9
---
include/linux/stackdepot.h | 8 ++++++++
lib/stackdepot.c | 29 ++++++++++++++++++++++++-----
2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index c34b55a6e5540..b24f404ab03ac 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -14,9 +14,15 @@
#include <linux/gfp.h>

typedef u32 depot_stack_handle_t;
+/*
+ * Number of bits in the handle that stack depot doesn't use. Users may store
+ * information in them.
+ */
+#define STACK_DEPOT_EXTRA_BITS 5

depot_stack_handle_t __stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
+ unsigned int extra_bits,
gfp_t gfp_flags, bool can_alloc);

depot_stack_handle_t stack_depot_save(unsigned long *entries,
@@ -25,6 +31,8 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
unsigned int stack_depot_fetch(depot_stack_handle_t handle,
unsigned long **entries);

+unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle);
+
int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size,
int spaces);

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index b437ae79aca14..6ad7b8888ff19 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -41,7 +41,8 @@
#define STACK_ALLOC_OFFSET_BITS (STACK_ALLOC_ORDER + PAGE_SHIFT - \
STACK_ALLOC_ALIGN)
#define STACK_ALLOC_INDEX_BITS (DEPOT_STACK_BITS - \
- STACK_ALLOC_NULL_PROTECTION_BITS - STACK_ALLOC_OFFSET_BITS)
+ STACK_ALLOC_NULL_PROTECTION_BITS - \
+ STACK_ALLOC_OFFSET_BITS - STACK_DEPOT_EXTRA_BITS)
#define STACK_ALLOC_SLABS_CAP 8192
#define STACK_ALLOC_MAX_SLABS \
(((1LL << (STACK_ALLOC_INDEX_BITS)) < STACK_ALLOC_SLABS_CAP) ? \
@@ -54,6 +55,7 @@ union handle_parts {
u32 slabindex : STACK_ALLOC_INDEX_BITS;
u32 offset : STACK_ALLOC_OFFSET_BITS;
u32 valid : STACK_ALLOC_NULL_PROTECTION_BITS;
+ u32 extra : STACK_DEPOT_EXTRA_BITS;
};
};

@@ -72,6 +74,14 @@ static int next_slab_inited;
static size_t depot_offset;
static DEFINE_RAW_SPINLOCK(depot_lock);

+unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle)
+{
+ union handle_parts parts = { .handle = handle };
+
+ return parts.extra;
+}
+EXPORT_SYMBOL(stack_depot_get_extra_bits);
+
static bool init_stack_slab(void **prealloc)
{
if (!*prealloc)
@@ -135,6 +145,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
stack->handle.slabindex = depot_index;
stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
stack->handle.valid = 1;
+ stack->handle.extra = 0;
memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
depot_offset += required_size;

@@ -297,6 +308,7 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
*
* @entries: Pointer to storage array
* @nr_entries: Size of the storage array
+ * @extra_bits: Flags to store in unused bits of depot_stack_handle_t
* @alloc_flags: Allocation gfp flags
* @can_alloc: Allocate stack slabs (increased chance of failure if false)
*
@@ -305,6 +317,10 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
* (allocates using GFP flags of @alloc_flags). If @can_alloc is %false, avoids
* any allocations and will fail if no space is left to store the stack trace.
*
+ * Additional opaque flags can be passed in @extra_bits, stored in the unused
+ * bits of the stack handle, and retrieved using stack_depot_get_extra_bits()
+ * without calling stack_depot_fetch().
+ *
* Context: Any context, but setting @can_alloc to %false is required if
* alloc_pages() cannot be used from the current context. Currently
* this is the case from contexts where neither %GFP_ATOMIC nor
@@ -314,10 +330,11 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
*/
depot_stack_handle_t __stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
+ unsigned int extra_bits,
gfp_t alloc_flags, bool can_alloc)
{
struct stack_record *found = NULL, **bucket;
- depot_stack_handle_t retval = 0;
+ union handle_parts retval = { .handle = 0 };
struct page *page = NULL;
void *prealloc = NULL;
unsigned long flags;
@@ -391,9 +408,11 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
free_pages((unsigned long)prealloc, STACK_ALLOC_ORDER);
}
if (found)
- retval = found->handle.handle;
+ retval.handle = found->handle.handle;
fast_exit:
- return retval;
+ retval.extra = extra_bits;
+
+ return retval.handle;
}
EXPORT_SYMBOL_GPL(__stack_depot_save);

@@ -413,6 +432,6 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
gfp_t alloc_flags)
{
- return __stack_depot_save(entries, nr_entries, alloc_flags, true);
+ return __stack_depot_save(entries, nr_entries, 0, alloc_flags, true);
}
EXPORT_SYMBOL_GPL(stack_depot_save);
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:22:16

by Alexander Potapenko

Subject: [PATCH 03/43] kasan: common: adapt to the new prototype of __stack_depot_save()

Pass extra_bits=0, as KASAN does not intend to store additional
information in the stack handle. No functional change.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I932d8f4f11a41b7483e0d57078744cc94697607a
---
mm/kasan/common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 8428da2aaf173..6c690ca0ee41a 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -37,7 +37,7 @@ depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)

nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
nr_entries = filter_irq_stacks(entries, nr_entries);
- return __stack_depot_save(entries, nr_entries, flags, can_alloc);
+ return __stack_depot_save(entries, nr_entries, 0, flags, can_alloc);
}

void kasan_set_track(struct kasan_track *track, gfp_t flags)
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:22:19

by Alexander Potapenko

Subject: [PATCH 04/43] instrumented.h: allow instrumenting both sides of copy_from_user()

Introduce instrument_copy_from_user_before() and
instrument_copy_from_user_after() hooks to be invoked before and after
the call to copy_from_user().

KASAN and KCSAN will only use instrument_copy_from_user_before(), but
KMSAN will also need to insert code after the call to copy_from_user().
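
To make the intent concrete, here is a sketch of how a tool like KMSAN
could fill in the new "after" hook, unpoisoning only the bytes that were
actually copied (illustrative only: the hook added by this patch is an
empty stub, and the sketch assumes a kmsan_unpoison_memory(addr, size)
helper like the one used later in the series):

  static __always_inline void
  instrument_copy_from_user_after(const void *to, const void __user *from,
                                  unsigned long n, unsigned long left)
  {
          /* The trailing @left bytes were not copied and stay poisoned. */
          kmsan_unpoison_memory((void *)to, n - left);
  }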

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I855034578f0b0f126734cbd734fb4ae1d3a6af99
---
include/linux/instrumented.h | 21 +++++++++++++++++++--
include/linux/uaccess.h | 19 ++++++++++++++-----
lib/iov_iter.c | 9 ++++++---
lib/usercopy.c | 3 ++-
4 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
index 42faebbaa202a..ee8f7d17d34f5 100644
--- a/include/linux/instrumented.h
+++ b/include/linux/instrumented.h
@@ -120,7 +120,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
}

/**
- * instrument_copy_from_user - instrument writes of copy_from_user
+ * instrument_copy_from_user_before - add instrumentation before copy_from_user
*
* Instrument writes to kernel memory, that are due to copy_from_user (and
* variants). The instrumentation should be inserted before the accesses.
@@ -130,10 +130,27 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
* @n number of bytes to copy
*/
static __always_inline void
-instrument_copy_from_user(const void *to, const void __user *from, unsigned long n)
+instrument_copy_from_user_before(const void *to, const void __user *from, unsigned long n)
{
kasan_check_write(to, n);
kcsan_check_write(to, n);
}

+/**
+ * instrument_copy_from_user_after - add instrumentation after copy_from_user
+ *
+ * Instrument writes to kernel memory, that are due to copy_from_user (and
+ * variants). The instrumentation should be inserted after the accesses.
+ *
+ * @to destination address
+ * @from source address
+ * @n number of bytes to copy
+ * @left number of bytes not copied (as returned by copy_from_user)
+ */
+static __always_inline void
+instrument_copy_from_user_after(const void *to, const void __user *from,
+ unsigned long n, unsigned long left)
+{
+}
+
#endif /* _LINUX_INSTRUMENTED_H */
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index ac0394087f7d4..8dadd8642afbb 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -98,20 +98,28 @@ static inline void force_uaccess_end(mm_segment_t oldfs)
static __always_inline __must_check unsigned long
__copy_from_user_inatomic(void *to, const void __user *from, unsigned long n)
{
- instrument_copy_from_user(to, from, n);
+ unsigned long res;
+
+ instrument_copy_from_user_before(to, from, n);
check_object_size(to, n, false);
- return raw_copy_from_user(to, from, n);
+ res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
+ return res;
}

static __always_inline __must_check unsigned long
__copy_from_user(void *to, const void __user *from, unsigned long n)
{
+ unsigned long res;
+
might_fault();
+ instrument_copy_from_user_before(to, from, n);
if (should_fail_usercopy())
return n;
- instrument_copy_from_user(to, from, n);
check_object_size(to, n, false);
- return raw_copy_from_user(to, from, n);
+ res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
+ return res;
}

/**
@@ -155,8 +163,9 @@ _copy_from_user(void *to, const void __user *from, unsigned long n)
unsigned long res = n;
might_fault();
if (!should_fail_usercopy() && likely(access_ok(from, n))) {
- instrument_copy_from_user(to, from, n);
+ instrument_copy_from_user_before(to, from, n);
res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
}
if (unlikely(res))
memset(to + (n - res), 0, res);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 66a740e6e153c..28c033cb9e803 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -161,13 +161,16 @@ static int copyout(void __user *to, const void *from, size_t n)

static int copyin(void *to, const void __user *from, size_t n)
{
+ size_t res = n;
+
if (should_fail_usercopy())
return n;
if (access_ok(from, n)) {
- instrument_copy_from_user(to, from, n);
- n = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_before(to, from, n);
+ res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
}
- return n;
+ return res;
}

static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
diff --git a/lib/usercopy.c b/lib/usercopy.c
index 7413dd300516e..1505a52f23a01 100644
--- a/lib/usercopy.c
+++ b/lib/usercopy.c
@@ -12,8 +12,9 @@ unsigned long _copy_from_user(void *to, const void __user *from, unsigned long n
unsigned long res = n;
might_fault();
if (!should_fail_usercopy() && likely(access_ok(from, n))) {
- instrument_copy_from_user(to, from, n);
+ instrument_copy_from_user_before(to, from, n);
res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
}
if (unlikely(res))
memset(to + (n - res), 0, res);
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:22:20

by Alexander Potapenko

Subject: [PATCH 05/43] asm: x86: instrument usercopy in get_user() and __put_user_size()

Use hooks from instrumented.h to notify bug detection tools about
usercopy events in get_user() and __put_user_size().

It's still unclear how to instrument put_user(), which assumes that
instrumentation code doesn't clobber RAX.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ia9f12bfe5832623250e20f1859fdf5cc485a2fce
---
arch/x86/include/asm/uaccess.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 33a68407def3f..86ad5ab211e97 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -5,6 +5,7 @@
* User space memory access functions
*/
#include <linux/compiler.h>
+#include <linux/instrumented.h>
#include <linux/kasan-checks.h>
#include <linux/string.h>
#include <asm/asm.h>
@@ -126,11 +127,13 @@ extern int __get_user_bad(void);
int __ret_gu; \
register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
__chk_user_ptr(ptr); \
+ instrument_copy_from_user_before((void *)&(x), ptr, sizeof(*(ptr))); \
asm volatile("call __" #fn "_%P4" \
: "=a" (__ret_gu), "=r" (__val_gu), \
ASM_CALL_CONSTRAINT \
: "0" (ptr), "i" (sizeof(*(ptr)))); \
(x) = (__force __typeof__(*(ptr))) __val_gu; \
+ instrument_copy_from_user_after((void *)&(x), ptr, sizeof(*(ptr)), 0); \
__builtin_expect(__ret_gu, 0); \
})

@@ -275,7 +278,9 @@ extern void __put_user_nocheck_8(void);

#define __put_user_size(x, ptr, size, label) \
do { \
+ __typeof__(*(ptr)) __pus_val = x; \
__chk_user_ptr(ptr); \
+ instrument_copy_to_user(ptr, &(__pus_val), size); \
switch (size) { \
case 1: \
__put_user_goto(x, ptr, "b", "iq", label); \
@@ -313,6 +318,7 @@ do { \
#define __get_user_size(x, ptr, size, label) \
do { \
__chk_user_ptr(ptr); \
+ instrument_copy_from_user_before((void *)&(x), ptr, size); \
switch (size) { \
unsigned char x_u8__; \
case 1: \
@@ -331,6 +337,7 @@ do { \
default: \
(x) = __get_user_bad(); \
} \
+ instrument_copy_from_user_after((void *)&(x), ptr, size, 0); \
} while (0)

#define __get_user_asm(x, addr, itype, ltype, label) \
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:22:23

by Alexander Potapenko

Subject: [PATCH 06/43] asm-generic: instrument usercopy in cacheflush.h

Notify memory tools about usercopy events in copy_to_user_page() and
copy_from_user_page().

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ic1ee8da1886325f46ad67f52176f48c2c836c48f
---
include/asm-generic/cacheflush.h | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
index 4f07afacbc239..0f63eb325025f 100644
--- a/include/asm-generic/cacheflush.h
+++ b/include/asm-generic/cacheflush.h
@@ -2,6 +2,8 @@
#ifndef _ASM_GENERIC_CACHEFLUSH_H
#define _ASM_GENERIC_CACHEFLUSH_H

+#include <linux/instrumented.h>
+
struct mm_struct;
struct vm_area_struct;
struct page;
@@ -105,6 +107,7 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
#ifndef copy_to_user_page
#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
do { \
+ instrument_copy_to_user(dst, src, len); \
memcpy(dst, src, len); \
flush_icache_user_page(vma, page, vaddr, len); \
} while (0)
@@ -112,7 +115,11 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)

#ifndef copy_from_user_page
#define copy_from_user_page(vma, page, vaddr, dst, src, len) \
- memcpy(dst, src, len)
+ do { \
+ instrument_copy_from_user_before(dst, src, len); \
+ memcpy(dst, src, len); \
+ instrument_copy_from_user_after(dst, src, len, 0); \
+ } while (0)
#endif

#endif /* _ASM_GENERIC_CACHEFLUSH_H */
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:22:25

by Alexander Potapenko

Subject: [PATCH 07/43] compiler_attributes.h: add __disable_sanitizer_instrumentation

The new attribute maps to
__attribute__((disable_sanitizer_instrumentation)), which will be
supported by Clang >= 14.0. Future support in GCC is also possible.

This attribute disables compiler instrumentation for kernel sanitizer
tools, making it easier to implement noinstr. It is different from the
existing __no_sanitize* attributes, which may still allow certain types
of instrumentation to prevent false positives.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ic0123ce99b33ab7d5ed1ae90593425be8d3d774a
---
include/linux/compiler_attributes.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
index b9121afd87331..37e2600202216 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -308,6 +308,24 @@
# define __compiletime_warning(msg)
#endif

+/*
+ * Optional: only supported since clang >= 14.0
+ *
+ * clang: https://clang.llvm.org/docs/AttributeReference.html#disable-sanitizer-instrumentation
+ *
+ * disable_sanitizer_instrumentation is not always similar to
+ * no_sanitize((<sanitizer-name>)): the latter may still let specific sanitizers
+ * insert code into functions to prevent false positives. Unlike that,
+ * disable_sanitizer_instrumentation prevents all kinds of instrumentation to
+ * functions with the attribute.
+ */
+#if __has_attribute(disable_sanitizer_instrumentation)
+# define __disable_sanitizer_instrumentation \
+ __attribute__((disable_sanitizer_instrumentation))
+#else
+# define __disable_sanitizer_instrumentation
+#endif
+
/*
* gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-weak-function-attribute
* gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-weak-variable-attribute
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:22:28

by Alexander Potapenko

Subject: [PATCH 09/43] kmsan: introduce __no_sanitize_memory and __no_kmsan_checks

__no_sanitize_memory is a function attribute that instructs KMSAN to
skip a function during instrumentation. This is needed to e.g. implement
the noinstr functions.

__no_kmsan_checks is a function attribute that makes KMSAN
ignore the uninitialized values coming from the function's
inputs, and initialize the function's outputs.

Functions marked with this attribute can't be inlined into functions
not marked with it, and vice versa.

__SANITIZE_MEMORY__ is a macro that is defined iff a file is instrumented
with KMSAN. This is unlike CONFIG_KMSAN, which is defined for every file
in a KMSAN build, whether that file is instrumented or not.
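
An illustrative (hypothetical) use of __no_kmsan_checks, following the
semantics described above:

  /*
   * Hypothetical helper that deliberately consumes data KMSAN cannot see
   * being initialized (e.g. written by hardware). With __no_kmsan_checks
   * the load below is not checked, and the return value is treated as
   * initialized by the callers.
   */
  static __no_kmsan_checks unsigned int read_device_status(const volatile unsigned int *reg)
  {
          return *reg & 0x1;
  }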

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I004ff0360c918d3cd8b18767ddd1381c6d3281be
---
include/linux/compiler-clang.h | 23 +++++++++++++++++++++++
include/linux/compiler-gcc.h | 6 ++++++
2 files changed, 29 insertions(+)

diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index 3c4de9b6c6e3e..5f11a6f269e28 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -51,6 +51,29 @@
#define __no_sanitize_undefined
#endif

+#if __has_feature(memory_sanitizer)
+#define __SANITIZE_MEMORY__
+/*
+ * Unlike other sanitizers, KMSAN still inserts code into functions marked with
+ * no_sanitize("kernel-memory"). Using disable_sanitizer_instrumentation
+ * provides the behavior consistent with other __no_sanitize_ attributes,
+ * guaranteeing that __no_sanitize_memory functions remain uninstrumented.
+ */
+#define __no_sanitize_memory __disable_sanitizer_instrumentation
+
+/*
+ * The __no_kmsan_checks attribute ensures that a function does not produce
+ * false positive reports by:
+ * - initializing all local variables and memory stores in this function;
+ * - skipping all shadow checks;
+ * - passing initialized arguments to this function's callees.
+ */
+#define __no_kmsan_checks __attribute__((no_sanitize("kernel-memory")))
+#else
+#define __no_sanitize_memory
+#define __no_kmsan_checks
+#endif
+
/*
* Support for __has_feature(coverage_sanitizer) was added in Clang 13 together
* with no_sanitize("coverage"). Prior versions of Clang support coverage
diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index ccbbd31b3aae5..f6e69387aad05 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -129,6 +129,12 @@
#define __SANITIZE_ADDRESS__
#endif

+/*
+ * GCC does not support KMSAN.
+ */
+#define __no_sanitize_memory
+#define __no_kmsan_checks
+
/*
* Turn individual warnings and errors on and off locally, depending
* on version.
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:22:32

by Alexander Potapenko

Subject: [PATCH 08/43] kmsan: add ReST documentation

Add Documentation/dev-tools/kmsan.rst and reference it in the dev-tools
index.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I751586f79418b95550a83c6035c650b5b01567cc
---
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/kmsan.rst | 411 ++++++++++++++++++++++++++++++
2 files changed, 412 insertions(+)
create mode 100644 Documentation/dev-tools/kmsan.rst

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 010a2af1e7d9e..2fc71f769f481 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -24,6 +24,7 @@ Documentation/dev-tools/testing-overview.rst
kcov
gcov
kasan
+ kmsan
ubsan
kmemleak
kcsan
diff --git a/Documentation/dev-tools/kmsan.rst b/Documentation/dev-tools/kmsan.rst
new file mode 100644
index 0000000000000..121a1c46820a9
--- /dev/null
+++ b/Documentation/dev-tools/kmsan.rst
@@ -0,0 +1,411 @@
+=============================
+KernelMemorySanitizer (KMSAN)
+=============================
+
+KMSAN is a dynamic error detector aimed at finding uses of uninitialized
+values. It is based on compiler instrumentation, and is quite similar to the
+userspace `MemorySanitizer tool`_.
+
+Example report
+==============
+
+Here is an example of a KMSAN report::
+
+ =====================================================
+ BUG: KMSAN: uninit-value in test_uninit_kmsan_check_memory+0x1be/0x380 [kmsan_test]
+ test_uninit_kmsan_check_memory+0x1be/0x380 mm/kmsan/kmsan_test.c:273
+ kunit_run_case_internal lib/kunit/test.c:333
+ kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
+ kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
+ kthread+0x721/0x850 kernel/kthread.c:327
+ ret_from_fork+0x1f/0x30 ??:?
+
+ Uninit was stored to memory at:
+ do_uninit_local_array+0xfa/0x110 mm/kmsan/kmsan_test.c:260
+ test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
+ kunit_run_case_internal lib/kunit/test.c:333
+ kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
+ kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
+ kthread+0x721/0x850 kernel/kthread.c:327
+ ret_from_fork+0x1f/0x30 ??:?
+
+ Local variable uninit created at:
+ do_uninit_local_array+0x4a/0x110 mm/kmsan/kmsan_test.c:256
+ test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
+
+ Bytes 4-7 of 8 are uninitialized
+ Memory access of size 8 starts at ffff888083fe3da0
+
+ CPU: 0 PID: 6731 Comm: kunit_try_catch Tainted: G B E 5.16.0-rc3+ #104
+ Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
+ =====================================================
+
+
+The report says that the local variable ``uninit`` was created uninitialized in
+``do_uninit_local_array()``. The lower stack trace corresponds to the place
+where this variable was created.
+
+The upper stack shows where the uninit value was used - in
+``test_uninit_kmsan_check_memory()``. The tool shows the bytes which were left
+uninitialized in the local variable, as well as the stack where the value was
+copied to another memory location before use.
+
+Please note that KMSAN only reports an error when an uninitialized value is
+actually used (e.g. in a condition or pointer dereference). A lot of
+uninitialized values in the kernel are never used, and reporting them would
+result in too many false positives.
+
+KMSAN and Clang
+===============
+
+In order for KMSAN to work the kernel must be built with Clang, which so far is
+the only compiler that has KMSAN support. The kernel instrumentation pass is
+based on the userspace `MemorySanitizer tool`_.
+
+How to build
+============
+
+In order to build a kernel with KMSAN you will need a fresh Clang (14.0.0+).
+Please refer to `LLVM documentation`_ for the instructions on how to build Clang.
+
+Now configure and build the kernel with CONFIG_KMSAN enabled.
+
+How KMSAN works
+===============
+
+KMSAN shadow memory
+-------------------
+
+KMSAN associates a metadata byte (also called shadow byte) with every byte of
+kernel memory. A bit in the shadow byte is set iff the corresponding bit of the
+kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
+setting its shadow bytes to ``0xff``) is called poisoning, marking it
+initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.
+
+When a new variable is allocated on the stack, it is poisoned by default by
+instrumentation code inserted by the compiler (unless it is a stack variable
+that is immediately initialized). Any new heap allocation done without
+``__GFP_ZERO`` is also poisoned.
+
+Compiler instrumentation also tracks the shadow values with the help from the
+runtime library in ``mm/kmsan/``.
+
+The shadow value of a basic or compound type is an array of bytes of the same
+length. When a constant value is written into memory, that memory is unpoisoned.
+When a value is read from memory, its shadow memory is also obtained and
+propagated into all the operations which use that value. For every instruction
+that takes one or more values the compiler generates code that calculates the
+shadow of the result depending on those values and their shadows.
+
+Example::
+
+ int a = 0xff; // i.e. 0x000000ff
+ int b;
+ int c = a | b;
+
+In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
+shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
+``c`` are uninitialized, while the lower byte is initialized.
+
+
+Origin tracking
+---------------
+
+Every four bytes of kernel memory also have a so-called origin assigned to
+them. This origin describes the point in program execution at which the
+uninitialized value was created. Every origin is associated with either the
+full allocation stack (for heap-allocated memory), or the function containing
+the uninitialized variable (for locals).
+
+When an uninitialized variable is allocated on stack or heap, a new origin
+value is created, and that variable's origin is filled with that value.
+When a value is read from memory, its origin is also read and kept together
+with the shadow. For every instruction that takes one or more values the origin
+of the result is one of the origins corresponding to any of the uninitialized
+inputs. If a poisoned value is written into memory, its origin is written to the
+corresponding storage as well.
+
+Example 1::
+
+ int a = 42;
+ int b;
+ int c = a + b;
+
+In this case the origin of ``b`` is generated upon function entry, and is
+stored to the origin of ``c`` right before the addition result is written into
+memory.
+
+Several variables may share the same origin address, if they are stored in the
+same four-byte chunk. In this case every write to either variable updates the
+origin for all of them. We have to sacrifice precision in this case, because
+storing origins for individual bits (and even bytes) would be too costly.
+
+Example 2::
+
+ int combine(short a, short b) {
+ union ret_t {
+ int i;
+ short s[2];
+ } ret;
+ ret.s[0] = a;
+ ret.s[1] = b;
+ return ret.i;
+ }
+
+If ``a`` is initialized and ``b`` is not, the shadow of the result would be
+0xffff0000, and the origin of the result would be the origin of ``b``.
+``ret.s[0]`` would have the same origin, but it will never be used, because
+that variable is initialized.
+
+If both function arguments are uninitialized, only the origin of the second
+argument is preserved.
+
+Origin chaining
+~~~~~~~~~~~~~~~
+
+To ease debugging, KMSAN creates a new origin for every store of an
+uninitialized value to memory. The new origin references both its creation stack
+and the previous origin the value had. This may cause increased memory
+consumption, so we limit the length of origin chains in the runtime.
+
+Clang instrumentation API
+-------------------------
+
+Clang instrumentation pass inserts calls to functions defined in
+``mm/kmsan/instrumentation.c`` into the kernel code.
+
+Shadow manipulation
+~~~~~~~~~~~~~~~~~~~
+
+For every memory access the compiler emits a call to a function that returns a
+pair of pointers to the shadow and origin addresses of the given memory::
+
+ typedef struct {
+ void *shadow, *origin;
+ } shadow_origin_ptr_t
+
+ shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr)
+ shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr)
+ shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, uintptr_t size)
+ shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, uintptr_t size)
+
+The function name depends on the memory access size.
+
+The compiler makes sure that for every loaded value its shadow and origin
+values are read from memory. When a value is stored to memory, its shadow and
+origin are also stored using the metadata pointers.
+
+Origin tracking
+~~~~~~~~~~~~~~~
+
+A special function is used to create a new origin value for a local variable and
+set the origin of that variable to that value::
+
+ void __msan_poison_alloca(void *addr, uintptr_t size, char *descr)
+
+Access to per-task data
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+At the beginning of every instrumented function KMSAN inserts a call to
+``__msan_get_context_state()``::
+
+ kmsan_context_state *__msan_get_context_state(void)
+
+``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::
+
+ struct kmsan_context_state {
+ char param_tls[KMSAN_PARAM_SIZE];
+ char retval_tls[KMSAN_RETVAL_SIZE];
+ char va_arg_tls[KMSAN_PARAM_SIZE];
+ char va_arg_origin_tls[KMSAN_PARAM_SIZE];
+ u64 va_arg_overflow_size_tls;
+ char param_origin_tls[KMSAN_PARAM_SIZE];
+ depot_stack_handle_t retval_origin_tls;
+ };
+
+This structure is used by KMSAN to pass parameter shadows and origins between
+instrumented functions.
+
+String functions
+~~~~~~~~~~~~~~~~
+
+The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
+following functions. These functions are also called when data structures are
+initialized or copied, making sure shadow and origin values are copied alongside
+with the data::
+
+ void *__msan_memcpy(void *dst, void *src, uintptr_t n)
+ void *__msan_memmove(void *dst, void *src, uintptr_t n)
+ void *__msan_memset(void *dst, int c, uintptr_t n)
+
+Error reporting
+~~~~~~~~~~~~~~~
+
+For each pointer dereference and each condition the compiler emits a shadow
+check that calls ``__msan_warning()`` in the case a poisoned value is being
+used::
+
+ void __msan_warning(u32 origin)
+
+``__msan_warning()`` causes KMSAN runtime to print an error report.
+
+Inline assembly instrumentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+KMSAN instruments every inline assembly output with a call to::
+
+ void __msan_instrument_asm_store(void *addr, uintptr_t size)
+
+, which unpoisons the memory region.
+
+This approach may mask certain errors, but it also helps to avoid a lot of
+false positives in bitwise operations, atomics etc.
+
+Sometimes the pointers passed into inline assembly do not point to valid memory.
+In such cases they are ignored at runtime.
+
+Disabling the instrumentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A function can be marked with ``__no_kmsan_checks``. Doing so makes KMSAN
+ignore uninitialized values in that function and mark its output as initialized.
+As a result, the user will not get KMSAN reports related to that function.
+
+Another function attribute supported by KMSAN is ``__no_sanitize_memory``.
+Applying this attribute to a function will result in KMSAN not instrumenting it,
+which can be helpful if we do not want the compiler to mess up some low-level
+code (e.g. that marked with ``noinstr``).
+
+This however comes at a cost: stack allocations from such functions will have
+incorrect shadow/origin values, likely leading to false positives. Functions
+called from non-instrumented code may also receive incorrect metadata for their
+parameters.
+
+As a rule of thumb, avoid using ``__no_sanitize_memory`` explicitly.
+
+It is also possible to disable KMSAN for a single file (e.g. main.o)::
+
+ KMSAN_SANITIZE_main.o := n
+
+or for the whole directory::
+
+ KMSAN_SANITIZE := n
+
+in the Makefile. Think of this as applying ``__no_sanitize_memory`` to every
+function in the file or directory. Most users won't need KMSAN_SANITIZE, unless
+their code gets broken by KMSAN (e.g. runs at early boot time).
+
+Runtime library
+---------------
+
+The code is located in ``mm/kmsan/``.
+
+Per-task KMSAN state
+~~~~~~~~~~~~~~~~~~~~
+
+Every task_struct has an associated KMSAN task state that holds the KMSAN
+context (see above) and a per-task flag disallowing KMSAN reports::
+
+ struct kmsan_context {
+ ...
+ bool allow_reporting;
+ struct kmsan_context_state cstate;
+ ...
+ }
+
+ struct task_struct {
+ ...
+ struct kmsan_context kmsan;
+ ...
+ }
+
+
+KMSAN contexts
+~~~~~~~~~~~~~~
+
+When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to
+hold the metadata for function parameters and return values.
+
+But in the case the kernel is running in the interrupt, softirq or NMI context,
+where ``current`` is unavailable, KMSAN switches to per-cpu interrupt state::
+
+ DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
+
+Metadata allocation
+~~~~~~~~~~~~~~~~~~~
+
+There are several places in the kernel for which the metadata is stored.
+
+1. Each ``struct page`` instance contains two pointers to its shadow and
+origin pages::
+
+ struct page {
+ ...
+ struct page *shadow, *origin;
+ ...
+ };
+
+At boot-time, the kernel allocates shadow and origin pages for every available
+kernel page. This is done quite late, when the kernel address space is already
+fragmented, so normal data pages may arbitrarily interleave with the metadata
+pages.
+
+This means that in general for two contiguous memory pages their shadow/origin
+pages may not be contiguous. So, if a memory access crosses the boundary
+of a memory block, accesses to shadow/origin memory may potentially corrupt
+other pages or read incorrect values from them.
+
+In practice, contiguous memory pages returned by the same ``alloc_pages()``
+call will have contiguous metadata, whereas if these pages belong to two
+different allocations their metadata pages can be fragmented.
+
+For the kernel data (``.data``, ``.bss`` etc.) and percpu memory regions
+there also are no guarantees on metadata contiguity.
+
+In the case ``__msan_metadata_ptr_for_XXX_YYY()`` hits the border between two
+pages with non-contiguous metadata, it returns pointers to fake shadow/origin regions::
+
+ char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+ char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+
+``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
+All stores to ``dummy_store_page`` are ignored.
+
+2. For vmalloc memory and modules, there is a direct mapping between the memory
+range, its shadow and origin. KMSAN reduces the vmalloc area by 3/4, making only
+the first quarter available to ``vmalloc()``. The second quarter of the vmalloc
+area contains shadow memory for the first quarter, the third one holds the
+origins. A small part of the fourth quarter contains shadow and origins for the
+kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for
+more details.
+
+When an array of pages is mapped into a contiguous virtual memory space, their
+shadow and origin pages are similarly mapped into contiguous regions.
+
+3. For CPU entry area there are separate per-CPU arrays that hold its
+metadata::
+
+ DEFINE_PER_CPU(char[CPU_ENTRY_AREA_SIZE], cpu_entry_area_shadow);
+ DEFINE_PER_CPU(char[CPU_ENTRY_AREA_SIZE], cpu_entry_area_origin);
+
+When calculating shadow and origin addresses for a given memory address, KMSAN
+checks whether the address belongs to the physical page range, the virtual page
+range or CPU entry area.
+
+Handling ``pt_regs``
+~~~~~~~~~~~~~~~~~~~~
+
+Many functions receive a ``struct pt_regs`` holding the register state at a
+certain point. Registers do not have (easily calculatable) shadow or origin
+associated with them, so we assume they are always initialized.
+
+References
+==========
+
+E. Stepanov, K. Serebryany. `MemorySanitizer: fast detector of uninitialized
+memory use in C++
+<https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf>`_.
+In Proceedings of CGO 2015.
+
+.. _MemorySanitizer tool: https://clang.llvm.org/docs/MemorySanitizer.html
+.. _LLVM documentation: https://llvm.org/docs/GettingStarted.html
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:22:34

by Alexander Potapenko

Subject: [PATCH 10/43] kmsan: pgtable: reduce vmalloc space

KMSAN is going to use 3/4 of the existing vmalloc space to hold its
metadata, therefore we lower VMALLOC_END to make sure vmalloc() doesn't
allocate past the first 1/4.
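
A hedged sketch of how the runtime can translate a vmalloc address into
its metadata addresses using the offsets added below (the helper names
are hypothetical):

  static inline void *vmalloc_shadow(void *vmalloc_addr)
  {
          /* The shadow lives one quarter above the vmalloc address. */
          return (void *)((unsigned long)vmalloc_addr +
                          KMSAN_VMALLOC_SHADOW_OFFSET);
  }

  static inline void *vmalloc_origin(void *vmalloc_addr)
  {
          /* The origins live two quarters above the vmalloc address. */
          return (void *)((unsigned long)vmalloc_addr +
                          KMSAN_VMALLOC_ORIGIN_OFFSET);
  }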

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I9d8b7f0a88a639f1263bc693cbd5c136626f7efd
---
arch/x86/include/asm/pgtable_64_types.h | 41 ++++++++++++++++++++++++-
arch/x86/mm/init_64.c | 2 +-
2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 91ac106545703..7f15d43754a34 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -139,7 +139,46 @@ extern unsigned int ptrs_per_p4d;
# define VMEMMAP_START __VMEMMAP_BASE_L4
#endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */

-#define VMALLOC_END (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
+#define VMEMORY_END (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
+
+#ifndef CONFIG_KMSAN
+#define VMALLOC_END VMEMORY_END
+#else
+/*
+ * In KMSAN builds vmalloc area is four times smaller, and the remaining 3/4
+ * are used to keep the metadata for virtual pages. The memory formerly
+ * belonging to vmalloc area is now laid out as follows:
+ *
+ * 1st quarter: VMALLOC_START to VMALLOC_END - new vmalloc area
+ * 2nd quarter: KMSAN_VMALLOC_SHADOW_START to
+ * VMALLOC_END+KMSAN_VMALLOC_SHADOW_OFFSET - vmalloc area shadow
+ * 3rd quarter: KMSAN_VMALLOC_ORIGIN_START to
+ * VMALLOC_END+KMSAN_VMALLOC_ORIGIN_OFFSET - vmalloc area origins
+ * 4th quarter: KMSAN_MODULES_SHADOW_START to KMSAN_MODULES_ORIGIN_START
+ * - shadow for modules,
+ * KMSAN_MODULES_ORIGIN_START to
+ * KMSAN_MODULES_ORIGIN_START + MODULES_LEN - origins for modules.
+ */
+#define VMALLOC_QUARTER_SIZE ((VMALLOC_SIZE_TB << 40) >> 2)
+#define VMALLOC_END (VMALLOC_START + VMALLOC_QUARTER_SIZE - 1)
+
+/*
+ * vmalloc metadata addresses are calculated by adding shadow/origin offsets
+ * to vmalloc address.
+ */
+#define KMSAN_VMALLOC_SHADOW_OFFSET VMALLOC_QUARTER_SIZE
+#define KMSAN_VMALLOC_ORIGIN_OFFSET (VMALLOC_QUARTER_SIZE << 1)
+
+#define KMSAN_VMALLOC_SHADOW_START (VMALLOC_START + KMSAN_VMALLOC_SHADOW_OFFSET)
+#define KMSAN_VMALLOC_ORIGIN_START (VMALLOC_START + KMSAN_VMALLOC_ORIGIN_OFFSET)
+
+/*
+ * The shadow/origin for modules are placed one by one in the last 1/4 of
+ * vmalloc space.
+ */
+#define KMSAN_MODULES_SHADOW_START (VMALLOC_END + KMSAN_VMALLOC_ORIGIN_OFFSET + 1)
+#define KMSAN_MODULES_ORIGIN_START (KMSAN_MODULES_SHADOW_START + MODULES_LEN)
+#endif /* CONFIG_KMSAN */

#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
/* The module sections ends with the start of the fixmap */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 36098226a9573..8e884e44a8d1e 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1287,7 +1287,7 @@ static void __init preallocate_vmalloc_pages(void)
unsigned long addr;
const char *lvl;

- for (addr = VMALLOC_START; addr <= VMALLOC_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
+ for (addr = VMALLOC_START; addr <= VMEMORY_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
pgd_t *pgd = pgd_offset_k(addr);
p4d_t *p4d;
pud_t *pud;
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:22:38

by Alexander Potapenko

Subject: [PATCH 11/43] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE

KMSAN adds extra metadata fields to struct page, so it does not fit into
64 bytes anymore.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I353796acc6a850bfd7bb342aa1b63e616fc614f1
---
drivers/nvdimm/nd.h | 2 +-
drivers/nvdimm/pfn_devs.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 6f8ce114032d0..b50aecd1dd423 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -663,7 +663,7 @@ void devm_namespace_disable(struct device *dev,
struct nd_namespace_common *ndns);
#if IS_ENABLED(CONFIG_ND_CLAIM)
/* max struct page size independent of kernel config */
-#define MAX_STRUCT_PAGE_SIZE 64
+#define MAX_STRUCT_PAGE_SIZE 128
int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
#else
static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index 58eda16f5c534..07a539195cc8b 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -785,7 +785,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
* when populating the vmemmap. This *should* be equal to
* PMD_SIZE for most architectures.
*
- * Also make sure size of struct page is less than 64. We
+ * Also make sure size of struct page is less than 128. We
* want to make sure we use large enough size here so that
* we don't have a dynamic reserve space depending on
* struct page size. But we also want to make sure we notice
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:22:43

by Alexander Potapenko

Subject: [PATCH 12/43] kcsan: clang: retire CONFIG_KCSAN_KCOV_BROKEN

The KCOV+KCSAN combination used to be broken prior to Clang 11, but that
version is already the minimum required to build with KCSAN, so we don't
need KCSAN_KCOV_BROKEN anymore.

Suggested-by: Marco Elver <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ida287421577f37de337139b5b5b9e977e4a6fee2
---
lib/Kconfig.kcsan | 11 -----------
1 file changed, 11 deletions(-)

diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
index e0a93ffdef30e..b81454b2a0d09 100644
--- a/lib/Kconfig.kcsan
+++ b/lib/Kconfig.kcsan
@@ -10,21 +10,10 @@ config HAVE_KCSAN_COMPILER
For the list of compilers that support KCSAN, please see
<file:Documentation/dev-tools/kcsan.rst>.

-config KCSAN_KCOV_BROKEN
- def_bool KCOV && CC_HAS_SANCOV_TRACE_PC
- depends on CC_IS_CLANG
- depends on !$(cc-option,-Werror=unused-command-line-argument -fsanitize=thread -fsanitize-coverage=trace-pc)
- help
- Some versions of clang support either KCSAN and KCOV but not the
- combination of the two.
- See https://bugs.llvm.org/show_bug.cgi?id=45831 for the status
- in newer releases.
-
menuconfig KCSAN
bool "KCSAN: dynamic data race detector"
depends on HAVE_ARCH_KCSAN && HAVE_KCSAN_COMPILER
depends on DEBUG_KERNEL && !KASAN
- depends on !KCSAN_KCOV_BROKEN
select STACKTRACE
help
The Kernel Concurrency Sanitizer (KCSAN) is a dynamic
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:22:44

by Alexander Potapenko

Subject: [PATCH 16/43] kmsan: mm: call KMSAN hooks from SLUB code

In order to report uninitialized memory coming from heap allocations,
KMSAN has to poison them unless they're created with __GFP_ZERO.

Conveniently, the KMSAN hooks are needed in the same places where
init_on_alloc/init_on_free initialization is performed.
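
For reference, a hedged example of the bug class this enables KMSAN to
catch (hypothetical snippet):

  #include <linux/slab.h>

  struct conf {
          int mode;
          int timeout;
  };

  static int check_timeout(gfp_t gfp)
  {
          /* Allocated without __GFP_ZERO, so KMSAN poisons the object. */
          struct conf *c = kmalloc(sizeof(*c), gfp);
          int ret;

          if (!c)
                  return -ENOMEM;
          c->mode = 1;    /* ->timeout is never written */
          /* KMSAN: uninit-value used in a condition. */
          ret = c->timeout > 0 ? 0 : -EINVAL;
          kfree(c);
          return ret;
  }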

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I6954b386c5c5d7f99f48bb6cbcc74b75136ce86e
---
mm/slab.h | 1 +
mm/slub.c | 26 +++++++++++++++++++++++---
2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 56ad7eea3ddfb..6175a74047b47 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -521,6 +521,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
memset(p[i], 0, s->object_size);
kmemleak_alloc_recursive(p[i], s->object_size, 1,
s->flags, flags);
+ kmsan_slab_alloc(s, p[i], flags);
}

memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
diff --git a/mm/slub.c b/mm/slub.c
index abe7db581d686..5a63486e52531 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -22,6 +22,7 @@
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/kasan.h>
+#include <linux/kmsan.h>
#include <linux/cpu.h>
#include <linux/cpuset.h>
#include <linux/mempolicy.h>
@@ -346,10 +347,13 @@ static inline void *freelist_dereference(const struct kmem_cache *s,
(unsigned long)ptr_addr);
}

+/*
+ * See the comment to get_freepointer_safe().
+ */
static inline void *get_freepointer(struct kmem_cache *s, void *object)
{
object = kasan_reset_tag(object);
- return freelist_dereference(s, object + s->offset);
+ return kmsan_init(freelist_dereference(s, object + s->offset));
}

static void prefetch_freepointer(const struct kmem_cache *s, void *object)
@@ -357,18 +361,28 @@ static void prefetch_freepointer(const struct kmem_cache *s, void *object)
prefetchw(object + s->offset);
}

+/*
+ * When running under KMSAN, get_freepointer_safe() may return an uninitialized
+ * pointer value in the case the current thread loses the race for the next
+ * memory chunk in the freelist. In that case this_cpu_cmpxchg_double() in
+ * slab_alloc_node() will fail, so the uninitialized value won't be used, but
+ * KMSAN will still check all arguments of cmpxchg because of imperfect
+ * handling of inline assembly.
+ * To work around this problem, use kmsan_init() to force initialize the
+ * return value of get_freepointer_safe().
+ */
static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
{
unsigned long freepointer_addr;
void *p;

if (!debug_pagealloc_enabled_static())
- return get_freepointer(s, object);
+ return kmsan_init(get_freepointer(s, object));

object = kasan_reset_tag(object);
freepointer_addr = (unsigned long)object + s->offset;
copy_from_kernel_nofault(&p, (void **)freepointer_addr, sizeof(p));
- return freelist_ptr(s, p, freepointer_addr);
+ return kmsan_init(freelist_ptr(s, p, freepointer_addr));
}

static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp)
@@ -1678,6 +1692,7 @@ static inline void *kmalloc_large_node_hook(void *ptr, size_t size, gfp_t flags)
ptr = kasan_kmalloc_large(ptr, size, flags);
/* As ptr might get tagged, call kmemleak hook after KASAN. */
kmemleak_alloc(ptr, size, 1, flags);
+ kmsan_kmalloc_large(ptr, size, flags);
return ptr;
}

@@ -1685,12 +1700,14 @@ static __always_inline void kfree_hook(void *x)
{
kmemleak_free(x);
kasan_kfree_large(x);
+ kmsan_kfree_large(x);
}

static __always_inline bool slab_free_hook(struct kmem_cache *s,
void *x, bool init)
{
kmemleak_free_recursive(x, s->flags);
+ kmsan_slab_free(s, x);

debug_check_no_locks_freed(x, s->object_size);

@@ -3729,6 +3746,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
*/
slab_post_alloc_hook(s, objcg, flags, size, p,
slab_want_init_on_alloc(flags, s));
+
return i;
error:
slub_put_cpu_ptr(s->cpu_slab);
@@ -5905,6 +5923,7 @@ static char *create_unique_id(struct kmem_cache *s)
p += sprintf(p, "%07u", s->size);

BUG_ON(p > name + ID_STR_LENGTH - 1);
+ kmsan_unpoison_memory(name, p - name);
return name;
}

@@ -6006,6 +6025,7 @@ static int sysfs_slab_alias(struct kmem_cache *s, const char *name)
al->name = name;
al->next = alias_list;
alias_list = al;
+ kmsan_unpoison_memory(al, sizeof(struct saved_alias));
return 0;
}

--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:22:50

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 15/43] kmsan: mm: maintain KMSAN metadata for page operations

Insert KMSAN hooks that make the necessary bookkeeping changes:
- poison page shadow and origins in alloc_pages()/free_page();
- clear page shadow and origins in clear_page(), copy_user_highpage();
- copy page metadata in copy_highpage(), wp_page_copy();
- handle vmap()/vunmap()/iounmap();
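
To illustrate the effect of these hooks, consider the following
hypothetical caller code (not part of this patch; do_something() is a
made-up helper):

  struct page *page = alloc_pages(GFP_KERNEL, 0); /* kmsan_alloc_page() poisons it */
  u8 *buf = page_address(page);

  if (buf[0])       /* KMSAN reports: value originates in alloc_pages() */
  	do_something();

  clear_page(buf);  /* the clear_page() hook unpoisons the whole page */
  if (buf[0])       /* no report: the page was explicitly cleared */
  	do_something();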

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I6d4f53a0e7eab46fa29f0348f3095d9f2e326850
---
arch/x86/include/asm/page_64.h | 13 +++++++++++++
arch/x86/mm/ioremap.c | 3 +++
include/linux/highmem.h | 3 +++
mm/memory.c | 2 ++
mm/page_alloc.c | 14 ++++++++++++++
mm/vmalloc.c | 20 ++++++++++++++++++--
6 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index 4bde0dc66100c..c10547510f1f4 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -44,14 +44,27 @@ void clear_page_orig(void *page);
void clear_page_rep(void *page);
void clear_page_erms(void *page);

+/* This is an assembly header, avoid including too much of kmsan.h */
+#ifdef CONFIG_KMSAN
+void kmsan_unpoison_memory(const void *addr, size_t size);
+#endif
+__no_sanitize_memory
static inline void clear_page(void *page)
{
+#ifdef CONFIG_KMSAN
+ /* alternative_call_2() changes @page. */
+ void *page_copy = page;
+#endif
alternative_call_2(clear_page_orig,
clear_page_rep, X86_FEATURE_REP_GOOD,
clear_page_erms, X86_FEATURE_ERMS,
"=D" (page),
"0" (page)
: "cc", "memory", "rax", "rcx");
+#ifdef CONFIG_KMSAN
+ /* Clear KMSAN shadow for the pages that have it. */
+ kmsan_unpoison_memory(page_copy, PAGE_SIZE);
+#endif
}

void copy_page(void *to, void *from);
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 026031b3b7829..4d0349ecc7cd7 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -17,6 +17,7 @@
#include <linux/cc_platform.h>
#include <linux/efi.h>
#include <linux/pgtable.h>
+#include <linux/kmsan.h>

#include <asm/set_memory.h>
#include <asm/e820/api.h>
@@ -474,6 +475,8 @@ void iounmap(volatile void __iomem *addr)
return;
}

+ kmsan_iounmap_page_range((unsigned long)addr,
+ (unsigned long)addr + get_vm_area_size(p));
memtype_free(p->phys_addr, p->phys_addr + get_vm_area_size(p));

/* Finally remove it */
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 39bb9b47fa9cd..3e1898a44d7e3 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -6,6 +6,7 @@
#include <linux/kernel.h>
#include <linux/bug.h>
#include <linux/cacheflush.h>
+#include <linux/kmsan.h>
#include <linux/mm.h>
#include <linux/uaccess.h>
#include <linux/hardirq.h>
@@ -277,6 +278,7 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
vfrom = kmap_local_page(from);
vto = kmap_local_page(to);
copy_user_page(vto, vfrom, vaddr, to);
+ kmsan_unpoison_memory(page_address(to), PAGE_SIZE);
kunmap_local(vto);
kunmap_local(vfrom);
}
@@ -292,6 +294,7 @@ static inline void copy_highpage(struct page *to, struct page *from)
vfrom = kmap_local_page(from);
vto = kmap_local_page(to);
copy_page(vto, vfrom);
+ kmsan_copy_page_meta(to, from);
kunmap_local(vto);
kunmap_local(vfrom);
}
diff --git a/mm/memory.c b/mm/memory.c
index 8f1de811a1dcb..ea9e48daadb15 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -51,6 +51,7 @@
#include <linux/highmem.h>
#include <linux/pagemap.h>
#include <linux/memremap.h>
+#include <linux/kmsan.h>
#include <linux/ksm.h>
#include <linux/rmap.h>
#include <linux/export.h>
@@ -3003,6 +3004,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
put_page(old_page);
return 0;
}
+ kmsan_copy_page_meta(new_page, old_page);
}

if (mem_cgroup_charge(page_folio(new_page), mm, GFP_KERNEL))
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c5952749ad40b..fa8029b714a81 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -26,6 +26,7 @@
#include <linux/compiler.h>
#include <linux/kernel.h>
#include <linux/kasan.h>
+#include <linux/kmsan.h>
#include <linux/module.h>
#include <linux/suspend.h>
#include <linux/pagevec.h>
@@ -1288,6 +1289,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
VM_BUG_ON_PAGE(PageTail(page), page);

trace_mm_page_free(page, order);
+ kmsan_free_page(page, order);

if (unlikely(PageHWPoison(page)) && !order) {
/*
@@ -1734,6 +1736,9 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn,
{
if (early_page_uninitialised(pfn))
return;
+ if (!kmsan_memblock_free_pages(page, order))
+ /* KMSAN will take care of these pages. */
+ return;
__free_pages_core(page, order);
}

@@ -3663,6 +3668,14 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
/*
* Allocate a page from the given zone. Use pcplists for order-0 allocations.
*/
+
+/*
+ * Do not instrument rmqueue() with KMSAN. This function may call
+ * __msan_poison_alloca() through a call to set_pfnblock_flags_mask().
+ * If __msan_poison_alloca() attempts to allocate pages for the stack depot, it
+ * may call rmqueue() again, which will result in a deadlock.
+ */
+__no_sanitize_memory
static inline
struct page *rmqueue(struct zone *preferred_zone,
struct zone *zone, unsigned int order,
@@ -5389,6 +5402,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
}

trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
+ kmsan_alloc_page(page, order, alloc_gfp);

return page;
}
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d2a00ad4e1dd1..333de26b3c56e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -319,6 +319,9 @@ int ioremap_page_range(unsigned long addr, unsigned long end,
err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
ioremap_max_page_shift);
flush_cache_vmap(addr, end);
+ if (!err)
+ kmsan_ioremap_page_range(addr, end, phys_addr, prot,
+ ioremap_max_page_shift);
return err;
}

@@ -418,7 +421,7 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
*
* This is an internal function only. Do not use outside mm/.
*/
-void vunmap_range_noflush(unsigned long start, unsigned long end)
+void __vunmap_range_noflush(unsigned long start, unsigned long end)
{
unsigned long next;
pgd_t *pgd;
@@ -440,6 +443,12 @@ void vunmap_range_noflush(unsigned long start, unsigned long end)
arch_sync_kernel_mappings(start, end);
}

+void vunmap_range_noflush(unsigned long start, unsigned long end)
+{
+ kmsan_vunmap_range_noflush(start, end);
+ __vunmap_range_noflush(start, end);
+}
+
/**
* vunmap_range - unmap kernel virtual addresses
* @addr: start of the VM area to unmap
@@ -574,7 +583,7 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
*
* This is an internal function only. Do not use outside mm/.
*/
-int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
+int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
pgprot_t prot, struct page **pages, unsigned int page_shift)
{
unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
@@ -600,6 +609,13 @@ int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
return 0;
}

+int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
+ pgprot_t prot, struct page **pages, unsigned int page_shift)
+{
+ kmsan_vmap_pages_range_noflush(addr, end, prot, pages, page_shift);
+ return __vmap_pages_range_noflush(addr, end, prot, pages, page_shift);
+}
+
/**
* vmap_pages_range - map pages to a kernel virtual address
* @addr: start of the VM area to map
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:22:53

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 14/43] MAINTAINERS: add entry for KMSAN

Add entry for KMSAN maintainers/reviewers.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ic5836c2bceb6b63f71a60d3327d18af3aa3dab77
---
MAINTAINERS | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 13f9a84a617e3..94add5a5404e4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10615,6 +10615,18 @@ F: kernel/kmod.c
F: lib/test_kmod.c
F: tools/testing/selftests/kmod/

+KMSAN
+M: Alexander Potapenko <[email protected]>
+R: Marco Elver <[email protected]>
+R: Dmitry Vyukov <[email protected]>
+L: [email protected]
+S: Maintained
+F: Documentation/dev-tools/kmsan.rst
+F: include/linux/kmsan*.h
+F: lib/Kconfig.kmsan
+F: mm/kmsan/
+F: scripts/Makefile.kmsan
+
KPROBES
M: Naveen N. Rao <[email protected]>
M: Anil S Keshavamurthy <[email protected]>
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:22:59

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 17/43] kmsan: handle task creation and exiting

Tell KMSAN when a new task is created, so the tool creates a backing
metadata structure for that task, and when a task exits, so that no
further reports are generated for it.
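
A simplified sketch of what these hooks are expected to do with the
per-task state (the actual implementation is part of the KMSAN runtime
patch and additionally unpoisons the thread_info of the new task):

  void kmsan_task_create(struct task_struct *task)
  {
  	/* Start the new task with a clean parameter/retval shadow. */
  	memset(&task->kmsan_ctx, 0, sizeof(task->kmsan_ctx));
  	task->kmsan_ctx.allow_reporting = true;
  }

  void kmsan_task_exit(struct task_struct *task)
  {
  	/* The task is going away: suppress further reports for it. */
  	task->kmsan_ctx.allow_reporting = false;
  }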

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I0f41c3a1c7d66f7e14aabcfdfc7c69addb945805
---
kernel/exit.c | 2 ++
kernel/fork.c | 2 ++
2 files changed, 4 insertions(+)

diff --git a/kernel/exit.c b/kernel/exit.c
index f702a6a63686e..a276f6716dcd5 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -59,6 +59,7 @@
#include <linux/writeback.h>
#include <linux/shm.h>
#include <linux/kcov.h>
+#include <linux/kmsan.h>
#include <linux/random.h>
#include <linux/rcuwait.h>
#include <linux/compat.h>
@@ -767,6 +768,7 @@ void __noreturn do_exit(long code)

profile_task_exit(tsk);
kcov_task_exit(tsk);
+ kmsan_task_exit(tsk);

coredump_task_exit(tsk);
ptrace_event(PTRACE_EVENT_EXIT, code);
diff --git a/kernel/fork.c b/kernel/fork.c
index 3244cc56b697d..5d53ffab2cda7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -37,6 +37,7 @@
#include <linux/fdtable.h>
#include <linux/iocontext.h>
#include <linux/key.h>
+#include <linux/kmsan.h>
#include <linux/binfmts.h>
#include <linux/mman.h>
#include <linux/mmu_notifier.h>
@@ -955,6 +956,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
account_kernel_stack(tsk, 1);

kcov_task_init(tsk);
+ kmsan_task_create(tsk);
kmap_local_fork(tsk);

#ifdef CONFIG_FAULT_INJECTION
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:22:57

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 13/43] kmsan: add KMSAN runtime core

This patch adds the core parts of KMSAN runtime and associated files:

- include/linux/kmsan-checks.h: user API to poison/unpoison/check
the kernel memory;
- include/linux/kmsan.h: declarations of KMSAN hooks to be referenced
outside of KMSAN runtime;
- lib/Kconfig.kmsan: CONFIG_KMSAN and related declarations;
- Makefile, mm/Makefile, mm/kmsan/Makefile: boilerplate Makefile code;
- mm/kmsan/annotations.c: non-inlineable implementation of kmsan_init();
- mm/kmsan/core.c: core functions that operate with shadow and origin
memory and perform checks, utility functions;
- mm/kmsan/hooks.c: KMSAN hooks for kernel subsystems;
- mm/kmsan/init.c: KMSAN initialization routines;
- mm/kmsan/instrumentation.c: functions called by KMSAN instrumentation;
- mm/kmsan/kmsan.h: internal KMSAN declarations;
- mm/kmsan/shadow.c: routines that encapsulate metadata creation and
addressing;
- scripts/Makefile.kmsan: CFLAGS_KMSAN
- scripts/Makefile.lib: KMSAN_SANITIZE and KMSAN_ENABLE_CHECKS macros

The patch also adds the necessary bookkeeping bits to struct page and
struct task_struct:
- each struct page now contains pointers to two struct pages holding
KMSAN metadata (shadow and origins) for the original struct page;
- each task_struct contains a struct kmsan_task_state used to track
the metadata of function parameters and return values for that task.
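
As a usage reference, here is a minimal (hypothetical) driver snippet
using the one-off annotations from include/linux/kmsan-checks.h; the
function and buffer names are made up for illustration:

  #include <linux/kmsan-checks.h>

  static void example_rx_complete(void *buf, size_t len)
  {
  	/* The device has just written into buf: tell KMSAN it is initialized. */
  	kmsan_unpoison_memory(buf, len);
  }

  static void example_tx_prepare(const void *buf, size_t len)
  {
  	/* Report an error if any byte about to reach the device is uninitialized. */
  	kmsan_check_memory(buf, len);
  }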

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I9b71bfe3425466c97159f9de0062e5e8e4fec866
---
Makefile | 1 +
include/linux/kmsan-checks.h | 123 ++++++++++
include/linux/kmsan.h | 365 ++++++++++++++++++++++++++++++
include/linux/mm_types.h | 12 +
include/linux/sched.h | 5 +
lib/Kconfig.debug | 1 +
lib/Kconfig.kmsan | 18 ++
mm/Makefile | 1 +
mm/kmsan/Makefile | 22 ++
mm/kmsan/annotations.c | 28 +++
mm/kmsan/core.c | 427 +++++++++++++++++++++++++++++++++++
mm/kmsan/hooks.c | 400 ++++++++++++++++++++++++++++++++
mm/kmsan/init.c | 238 +++++++++++++++++++
mm/kmsan/instrumentation.c | 233 +++++++++++++++++++
mm/kmsan/kmsan.h | 197 ++++++++++++++++
mm/kmsan/report.c | 210 +++++++++++++++++
mm/kmsan/shadow.c | 332 +++++++++++++++++++++++++++
scripts/Makefile.kmsan | 1 +
scripts/Makefile.lib | 9 +
19 files changed, 2623 insertions(+)
create mode 100644 include/linux/kmsan-checks.h
create mode 100644 include/linux/kmsan.h
create mode 100644 lib/Kconfig.kmsan
create mode 100644 mm/kmsan/Makefile
create mode 100644 mm/kmsan/annotations.c
create mode 100644 mm/kmsan/core.c
create mode 100644 mm/kmsan/hooks.c
create mode 100644 mm/kmsan/init.c
create mode 100644 mm/kmsan/instrumentation.c
create mode 100644 mm/kmsan/kmsan.h
create mode 100644 mm/kmsan/report.c
create mode 100644 mm/kmsan/shadow.c
create mode 100644 scripts/Makefile.kmsan

diff --git a/Makefile b/Makefile
index 765115c99655f..7af3edfb2d0de 100644
--- a/Makefile
+++ b/Makefile
@@ -1012,6 +1012,7 @@ include-y := scripts/Makefile.extrawarn
include-$(CONFIG_DEBUG_INFO) += scripts/Makefile.debug
include-$(CONFIG_KASAN) += scripts/Makefile.kasan
include-$(CONFIG_KCSAN) += scripts/Makefile.kcsan
+include-$(CONFIG_KMSAN) += scripts/Makefile.kmsan
include-$(CONFIG_UBSAN) += scripts/Makefile.ubsan
include-$(CONFIG_KCOV) += scripts/Makefile.kcov
include-$(CONFIG_GCC_PLUGINS) += scripts/Makefile.gcc-plugins
diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h
new file mode 100644
index 0000000000000..d41868c723d1e
--- /dev/null
+++ b/include/linux/kmsan-checks.h
@@ -0,0 +1,123 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KMSAN checks to be used for one-off annotations in subsystems.
+ *
+ * Copyright (C) 2017-2021 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#ifndef _LINUX_KMSAN_CHECKS_H
+#define _LINUX_KMSAN_CHECKS_H
+
+#include <linux/types.h>
+
+#ifdef CONFIG_KMSAN
+
+/*
+ * Helper functions that mark the return value initialized.
+ * See mm/kmsan/annotations.c.
+ */
+u8 kmsan_init_1(u8 value);
+u16 kmsan_init_2(u16 value);
+u32 kmsan_init_4(u32 value);
+u64 kmsan_init_8(u64 value);
+
+static inline void *kmsan_init_ptr(void *ptr)
+{
+ return (void *)kmsan_init_8((u64)ptr);
+}
+
+static inline char kmsan_init_char(char value)
+{
+ return (u8)kmsan_init_1((u8)value);
+}
+
+#define __decl_kmsan_init_type(type, fn) unsigned type : fn, signed type : fn
+
+/**
+ * kmsan_init - Make the value initialized.
+ * @val: 1-, 2-, 4- or 8-byte integer that may be treated as uninitialized by
+ * KMSAN.
+ *
+ * Return: value of @val that KMSAN treats as initialized.
+ */
+#define kmsan_init(val) \
+ ( \
+ (typeof(val))(_Generic((val), \
+ __decl_kmsan_init_type(char, kmsan_init_1), \
+ __decl_kmsan_init_type(short, kmsan_init_2), \
+ __decl_kmsan_init_type(int, kmsan_init_4), \
+ __decl_kmsan_init_type(long, kmsan_init_8), \
+ char : kmsan_init_char, \
+ void * : kmsan_init_ptr)(val)))
+
+/**
+ * kmsan_poison_memory() - Mark the memory range as uninitialized.
+ * @address: address to start with.
+ * @size: size of buffer to poison.
+ * @flags: GFP flags for allocations done by this function.
+ *
+ * Until other data is written to this range, KMSAN will treat it as
+ * uninitialized. Error reports for this memory will reference the call site of
+ * kmsan_poison_memory() as origin.
+ */
+void kmsan_poison_memory(const void *address, size_t size, gfp_t flags);
+
+/**
+ * kmsan_unpoison_memory() - Mark the memory range as initialized.
+ * @address: address to start with.
+ * @size: size of buffer to unpoison.
+ *
+ * Until other data is written to this range, KMSAN will treat it as
+ * initialized.
+ */
+void kmsan_unpoison_memory(const void *address, size_t size);
+
+/**
+ * kmsan_check_memory() - Check the memory range for being initialized.
+ * @address: address to start with.
+ * @size: size of buffer to check.
+ *
+ * If any piece of the given range is marked as uninitialized, KMSAN will report
+ * an error.
+ */
+void kmsan_check_memory(const void *address, size_t size);
+
+/**
+ * kmsan_copy_to_user() - Notify KMSAN about a data transfer to userspace.
+ * @to: destination address in the userspace.
+ * @from: source address in the kernel.
+ * @to_copy: number of bytes to copy.
+ * @left: number of bytes not copied.
+ *
+ * If this is a real userspace data transfer, KMSAN checks the bytes that were
+ * actually copied to ensure there was no information leak. If @to belongs to
+ * the kernel space (which is possible for compat syscalls), KMSAN just copies
+ * the metadata.
+ */
+void kmsan_copy_to_user(const void *to, const void *from, size_t to_copy,
+ size_t left);
+
+#else
+
+#define kmsan_init(value) (value)
+
+static inline void kmsan_poison_memory(const void *address, size_t size,
+ gfp_t flags)
+{
+}
+static inline void kmsan_unpoison_memory(const void *address, size_t size)
+{
+}
+static inline void kmsan_check_memory(const void *address, size_t size)
+{
+}
+static inline void kmsan_copy_to_user(const void *to, const void *from,
+ size_t to_copy, size_t left)
+{
+}
+
+#endif
+
+#endif /* _LINUX_KMSAN_CHECKS_H */
diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
new file mode 100644
index 0000000000000..f17bc9ded7b97
--- /dev/null
+++ b/include/linux/kmsan.h
@@ -0,0 +1,365 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KMSAN API for subsystems.
+ *
+ * Copyright (C) 2017-2021 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+#ifndef _LINUX_KMSAN_H
+#define _LINUX_KMSAN_H
+
+#include <linux/dma-direction.h>
+#include <linux/gfp.h>
+#include <linux/kmsan-checks.h>
+#include <linux/stackdepot.h>
+#include <linux/types.h>
+#include <linux/vmalloc.h>
+
+struct page;
+struct kmem_cache;
+struct task_struct;
+struct scatterlist;
+struct urb;
+
+#ifdef CONFIG_KMSAN
+
+/* These constants are defined in the MSan LLVM instrumentation pass. */
+#define KMSAN_RETVAL_SIZE 800
+#define KMSAN_PARAM_SIZE 800
+
+struct kmsan_context_state {
+ char param_tls[KMSAN_PARAM_SIZE];
+ char retval_tls[KMSAN_RETVAL_SIZE];
+ char va_arg_tls[KMSAN_PARAM_SIZE];
+ char va_arg_origin_tls[KMSAN_PARAM_SIZE];
+ u64 va_arg_overflow_size_tls;
+ char param_origin_tls[KMSAN_PARAM_SIZE];
+ depot_stack_handle_t retval_origin_tls;
+};
+
+#undef KMSAN_PARAM_SIZE
+#undef KMSAN_RETVAL_SIZE
+
+struct kmsan_ctx {
+ struct kmsan_context_state cstate;
+ int kmsan_in_runtime;
+ bool allow_reporting;
+};
+
+/**
+ * kmsan_init_shadow() - Initialize KMSAN shadow at boot time.
+ *
+ * Allocate and initialize KMSAN metadata for early allocations.
+ */
+void __init kmsan_init_shadow(void);
+
+/**
+ * kmsan_init_runtime() - Initialize KMSAN state and enable KMSAN.
+ */
+void __init kmsan_init_runtime(void);
+
+/**
+ * kmsan_memblock_free_pages() - handle freeing of memblock pages.
+ * @page: struct page to free.
+ * @order: order of @page.
+ *
+ * Freed pages are either returned to buddy allocator or held back to be used
+ * as metadata pages.
+ */
+bool __init kmsan_memblock_free_pages(struct page *page, unsigned int order);
+
+/**
+ * kmsan_task_create() - Initialize KMSAN state for the task.
+ * @task: task to initialize.
+ */
+void kmsan_task_create(struct task_struct *task);
+
+/**
+ * kmsan_task_exit() - Notify KMSAN that a task has exited.
+ * @task: task about to finish.
+ */
+void kmsan_task_exit(struct task_struct *task);
+
+/**
+ * kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
+ * @page: struct page pointer returned by alloc_pages().
+ * @order: order of allocated struct page.
+ * @flags: GFP flags used by alloc_pages()
+ *
+ * KMSAN marks 1<<@order pages starting at @page as uninitialized, unless
+ * @flags contain __GFP_ZERO.
+ */
+void kmsan_alloc_page(struct page *page, unsigned int order, gfp_t flags);
+
+/**
+ * kmsan_free_page() - Notify KMSAN about a free_pages() call.
+ * @page: struct page pointer passed to free_pages().
+ * @order: order of deallocated struct page.
+ *
+ * KMSAN marks freed memory as uninitialized.
+ */
+void kmsan_free_page(struct page *page, unsigned int order);
+
+/**
+ * kmsan_copy_page_meta() - Copy KMSAN metadata between two pages.
+ * @dst: destination page.
+ * @src: source page.
+ *
+ * KMSAN copies the contents of metadata pages for @src into the metadata pages
+ * for @dst. If @dst has no associated metadata pages, nothing happens.
+ * If @src has no associated metadata pages, @dst metadata pages are unpoisoned.
+ */
+void kmsan_copy_page_meta(struct page *dst, struct page *src);
+
+/**
+ * kmsan_gup_pgd_range() - Notify KMSAN about a gup_pgd_range() call.
+ * @pages: array of struct page pointers.
+ * @nr: array size.
+ *
+ * gup_pgd_range() creates new pages, some of which may belong to the userspace
+ * memory. In that case KMSAN marks them as initialized.
+ */
+void kmsan_gup_pgd_range(struct page **pages, int nr);
+
+/**
+ * kmsan_slab_alloc() - Notify KMSAN about a slab allocation.
+ * @s: slab cache the object belongs to.
+ * @object: object pointer.
+ * @flags: GFP flags passed to the allocator.
+ *
+ * Depending on cache flags and GFP flags, KMSAN sets up the metadata of the
+ * newly created object, marking it as initialized or uninitialized.
+ */
+void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags);
+
+/**
+ * kmsan_slab_free() - Notify KMSAN about a slab deallocation.
+ * @s: slab cache the object belongs to.
+ * @object: object pointer.
+ *
+ * KMSAN marks the freed object as uninitialized.
+ */
+void kmsan_slab_free(struct kmem_cache *s, void *object);
+
+/**
+ * kmsan_kmalloc_large() - Notify KMSAN about a large slab allocation.
+ * @ptr: object pointer.
+ * @size: object size.
+ * @flags: GFP flags passed to the allocator.
+ *
+ * Similar to kmsan_slab_alloc(), but for large allocations.
+ */
+void kmsan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
+
+/**
+ * kmsan_kfree_large() - Notify KMSAN about a large slab deallocation.
+ * @ptr: object pointer.
+ *
+ * Similar to kmsan_slab_free(), but for large allocations.
+ */
+void kmsan_kfree_large(const void *ptr);
+
+/**
+ * kmsan_vmap_pages_range_noflush() - Notify KMSAN about a vmap.
+ * @start: start of vmapped range.
+ * @end: end of vmapped range.
+ * @prot: page protection flags used for vmap.
+ * @pages: array of pages.
+ * @page_shift: page_shift passed to vmap_pages_range_noflush().
+ *
+ * KMSAN maps shadow and origin pages of @pages into contiguous ranges in
+ * vmalloc metadata address range.
+ */
+void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
+ pgprot_t prot, struct page **pages,
+ unsigned int page_shift);
+
+/**
+ * kmsan_vunmap_range_noflush() - Notify KMSAN about a vunmap.
+ * @start: start of vunmapped range.
+ * @end: end of vunmapped range.
+ *
+ * KMSAN unmaps the contiguous metadata ranges created by
+ * kmsan_vmap_pages_range_noflush().
+ */
+void kmsan_vunmap_range_noflush(unsigned long start, unsigned long end);
+
+/**
+ * kmsan_ioremap_page_range() - Notify KMSAN about a ioremap_page_range() call.
+ * @addr: range start.
+ * @end: range end.
+ * @phys_addr: physical range start.
+ * @prot: page protection flags used for ioremap_page_range().
+ * @page_shift: page_shift argument passed to vmap_range_noflush().
+ *
+ * KMSAN creates new metadata pages for the physical pages mapped into the
+ * virtual memory.
+ */
+void kmsan_ioremap_page_range(unsigned long addr, unsigned long end,
+ phys_addr_t phys_addr, pgprot_t prot,
+ unsigned int page_shift);
+
+/**
+ * kmsan_iounmap_page_range() - Notify KMSAN about a iounmap_page_range() call.
+ * @start: range start.
+ * @end: range end.
+ *
+ * KMSAN unmaps the metadata pages for the given range and, unlike for
+ * vunmap_page_range(), also deallocates them.
+ */
+void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
+
+/**
+ * kmsan_handle_dma() - Handle a DMA data transfer.
+ * @page: first page of the buffer.
+ * @offset: offset of the buffer within the first page.
+ * @size: buffer size.
+ * @dir: one of possible dma_data_direction values.
+ *
+ * Depending on @dir, KMSAN:
+ * * checks the buffer, if it is copied to device;
+ * * initializes the buffer, if it is copied from device;
+ * * does both, if this is a DMA_BIDIRECTIONAL transfer.
+ */
+void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+ enum dma_data_direction dir);
+
+/**
+ * kmsan_handle_dma_sg() - Handle a DMA transfer using scatterlist.
+ * @sg: scatterlist holding DMA buffers.
+ * @nents: number of scatterlist entries.
+ * @dir: one of possible dma_data_direction values.
+ *
+ * Depending on @dir, KMSAN:
+ * * checks the buffers in the scatterlist, if they are copied to device;
+ * * initializes the buffers, if they are copied from device;
+ * * does both, if this is a DMA_BIDIRECTIONAL transfer.
+ */
+void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
+ enum dma_data_direction dir);
+
+/**
+ * kmsan_handle_urb() - Handle a USB data transfer.
+ * @urb: struct urb pointer.
+ * @is_out: data transfer direction (true means output to hardware).
+ *
+ * If @is_out is true, KMSAN checks the transfer buffer of @urb. Otherwise,
+ * KMSAN initializes the transfer buffer.
+ */
+void kmsan_handle_urb(const struct urb *urb, bool is_out);
+
+/**
+ * kmsan_instrumentation_begin() - handle instrumentation_begin().
+ * @regs: pointer to struct pt_regs that non-instrumented code passes to
+ * instrumented code.
+ */
+void kmsan_instrumentation_begin(struct pt_regs *regs);
+
+#else
+
+static inline void kmsan_init_shadow(void)
+{
+}
+
+static inline void kmsan_init_runtime(void)
+{
+}
+
+static inline bool kmsan_memblock_free_pages(struct page *page,
+ unsigned int order)
+{
+ return true;
+}
+
+static inline void kmsan_task_create(struct task_struct *task)
+{
+}
+
+static inline void kmsan_task_exit(struct task_struct *task)
+{
+}
+
+static inline void kmsan_alloc_page(struct page *page, unsigned int order,
+ gfp_t flags)
+{
+ /* no-op when KMSAN is disabled */
+}
+
+static inline void kmsan_free_page(struct page *page, unsigned int order)
+{
+}
+
+static inline void kmsan_copy_page_meta(struct page *dst, struct page *src)
+{
+}
+
+static inline void kmsan_gup_pgd_range(struct page **pages, int nr)
+{
+}
+
+static inline void kmsan_slab_alloc(struct kmem_cache *s, void *object,
+ gfp_t flags)
+{
+}
+
+static inline void kmsan_slab_free(struct kmem_cache *s, void *object)
+{
+}
+
+static inline void kmsan_kmalloc_large(const void *ptr, size_t size,
+ gfp_t flags)
+{
+}
+
+static inline void kmsan_kfree_large(const void *ptr)
+{
+}
+
+static inline void kmsan_vmap_pages_range_noflush(unsigned long start,
+ unsigned long end,
+ pgprot_t prot,
+ struct page **pages,
+ unsigned int page_shift)
+{
+}
+
+static inline void kmsan_vunmap_range_noflush(unsigned long start,
+ unsigned long end)
+{
+}
+
+static inline void kmsan_ioremap_page_range(unsigned long start,
+ unsigned long end,
+ phys_addr_t phys_addr,
+ pgprot_t prot,
+ unsigned int page_shift)
+{
+}
+
+static inline void kmsan_iounmap_page_range(unsigned long start,
+ unsigned long end)
+{
+}
+
+static inline void kmsan_handle_dma(struct page *page, size_t offset,
+ size_t size, enum dma_data_direction dir)
+{
+}
+
+static inline void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
+ enum dma_data_direction dir)
+{
+}
+
+static inline void kmsan_handle_urb(const struct urb *urb, bool is_out)
+{
+}
+
+static inline void kmsan_instrumentation_begin(struct pt_regs *regs)
+{
+}
+
+#endif
+
+#endif /* _LINUX_KMSAN_H */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index c3a6e62096006..bdbe4b39b826d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -233,6 +233,18 @@ struct page {
not kmapped, ie. highmem) */
#endif /* WANT_PAGE_VIRTUAL */

+#ifdef CONFIG_KMSAN
+ /*
+ * KMSAN metadata for this page:
+ * - shadow page: every bit indicates whether the corresponding
+ * bit of the original page is initialized (0) or not (1);
+ * - origin page: every 4 bytes contain an id of the stack trace
+ * where the uninitialized value was created.
+ */
+ struct page *kmsan_shadow;
+ struct page *kmsan_origin;
+#endif
+
#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
int _last_cpupid;
#endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 78c351e35fec6..8d076f82d5072 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -14,6 +14,7 @@
#include <linux/pid.h>
#include <linux/sem.h>
#include <linux/shm.h>
+#include <linux/kmsan.h>
#include <linux/mutex.h>
#include <linux/plist.h>
#include <linux/hrtimer.h>
@@ -1341,6 +1342,10 @@ struct task_struct {
#endif
#endif

+#ifdef CONFIG_KMSAN
+ struct kmsan_ctx kmsan_ctx;
+#endif
+
#if IS_ENABLED(CONFIG_KUNIT)
struct kunit *kunit_test;
#endif
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 5e14e32056add..304374f2c300a 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -963,6 +963,7 @@ config DEBUG_STACKOVERFLOW

source "lib/Kconfig.kasan"
source "lib/Kconfig.kfence"
+source "lib/Kconfig.kmsan"

endmenu # "Memory Debugging"

diff --git a/lib/Kconfig.kmsan b/lib/Kconfig.kmsan
new file mode 100644
index 0000000000000..02fd6db792b1f
--- /dev/null
+++ b/lib/Kconfig.kmsan
@@ -0,0 +1,18 @@
+config HAVE_ARCH_KMSAN
+ bool
+
+config HAVE_KMSAN_COMPILER
+ def_bool (CC_IS_CLANG && $(cc-option,-fsanitize=kernel-memory -mllvm -msan-disable-checks=1))
+
+config KMSAN
+ bool "KMSAN: detector of uninitialized values use"
+ depends on HAVE_ARCH_KMSAN && HAVE_KMSAN_COMPILER
+ depends on SLUB && !KASAN && !KCSAN
+ depends on CC_IS_CLANG && CLANG_VERSION >= 140000
+ select STACKDEPOT
+ help
+ KernelMemorySanitizer (KMSAN) is a dynamic detector of uses of
+ uninitialized values in the kernel. It is based on compiler
+ instrumentation provided by Clang and thus requires Clang to build.
+
+ See <file:Documentation/dev-tools/kmsan.rst> for more details.
diff --git a/mm/Makefile b/mm/Makefile
index d6c0042e3aa0d..8e9319a9affea 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -87,6 +87,7 @@ obj-$(CONFIG_SLAB) += slab.o
obj-$(CONFIG_SLUB) += slub.o
obj-$(CONFIG_KASAN) += kasan/
obj-$(CONFIG_KFENCE) += kfence/
+obj-$(CONFIG_KMSAN) += kmsan/
obj-$(CONFIG_FAILSLAB) += failslab.o
obj-$(CONFIG_MEMTEST) += memtest.o
obj-$(CONFIG_MIGRATION) += migrate.o
diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
new file mode 100644
index 0000000000000..f57a956cb1c8b
--- /dev/null
+++ b/mm/kmsan/Makefile
@@ -0,0 +1,22 @@
+obj-y := core.o instrumentation.o init.o hooks.o report.o shadow.o annotations.o
+
+KMSAN_SANITIZE := n
+KCOV_INSTRUMENT := n
+UBSAN_SANITIZE := n
+
+KMSAN_SANITIZE_kmsan_annotations.o := y
+
+# Disable instrumentation of KMSAN runtime with other tools.
+CC_FLAGS_KMSAN_RUNTIME := -fno-stack-protector
+CC_FLAGS_KMSAN_RUNTIME += $(call cc-option,-fno-conserve-stack)
+CC_FLAGS_KMSAN_RUNTIME += -DDISABLE_BRANCH_PROFILING
+
+CFLAGS_REMOVE.o = $(CC_FLAGS_FTRACE)
+
+CFLAGS_annotations.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_core.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_hooks.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_init.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
diff --git a/mm/kmsan/annotations.c b/mm/kmsan/annotations.c
new file mode 100644
index 0000000000000..037468d1840f2
--- /dev/null
+++ b/mm/kmsan/annotations.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN annotations.
+ *
+ * The kmsan_init_SIZE functions reside in a separate translation unit to
+ * prevent inlining them. Clang may inline functions marked with
+ * __no_sanitize_memory attribute into functions without it, which effectively
+ * results in ignoring the attribute.
+ *
+ * Copyright (C) 2017-2021 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#include <linux/export.h>
+#include <linux/kmsan-checks.h>
+
+#define DECLARE_KMSAN_INIT(size, t) \
+ __no_sanitize_memory t kmsan_init_##size(t value) \
+ { \
+ return value; \
+ } \
+ EXPORT_SYMBOL(kmsan_init_##size)
+
+DECLARE_KMSAN_INIT(1, u8);
+DECLARE_KMSAN_INIT(2, u16);
+DECLARE_KMSAN_INIT(4, u32);
+DECLARE_KMSAN_INIT(8, u64);
diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
new file mode 100644
index 0000000000000..b2bb25a8013e4
--- /dev/null
+++ b/mm/kmsan/core.c
@@ -0,0 +1,427 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN runtime library.
+ *
+ * Copyright (C) 2017-2021 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#include <asm/page.h>
+#include <linux/compiler.h>
+#include <linux/export.h>
+#include <linux/highmem.h>
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
+#include <linux/kmsan.h>
+#include <linux/memory.h>
+#include <linux/mm.h>
+#include <linux/mm_types.h>
+#include <linux/mmzone.h>
+#include <linux/percpu-defs.h>
+#include <linux/preempt.h>
+#include <linux/slab.h>
+#include <linux/stackdepot.h>
+#include <linux/stacktrace.h>
+#include <linux/types.h>
+#include <linux/vmalloc.h>
+
+#include "../slab.h"
+#include "kmsan.h"
+
+/*
+ * Avoid creating too long origin chains, these are unlikely to participate in
+ * real reports.
+ */
+#define MAX_CHAIN_DEPTH 7
+#define NUM_SKIPPED_TO_WARN 10000
+
+bool kmsan_enabled __read_mostly;
+
+/*
+ * Per-CPU KMSAN context to be used in interrupts, where current->kmsan_ctx is
+ * unavailable.
+ */
+DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
+
+void kmsan_internal_task_create(struct task_struct *task)
+{
+ struct kmsan_ctx *ctx = &task->kmsan_ctx;
+
+ __memset(ctx, 0, sizeof(struct kmsan_ctx));
+ ctx->allow_reporting = true;
+ kmsan_internal_unpoison_memory(current_thread_info(),
+ sizeof(struct thread_info), false);
+}
+
+void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
+ unsigned int poison_flags)
+{
+ u32 extra_bits =
+ kmsan_extra_bits(/*depth*/ 0, poison_flags & KMSAN_POISON_FREE);
+ bool checked = poison_flags & KMSAN_POISON_CHECK;
+ depot_stack_handle_t handle;
+
+ handle = kmsan_save_stack_with_flags(flags, extra_bits);
+ kmsan_internal_set_shadow_origin(address, size, -1, handle, checked);
+}
+
+void kmsan_internal_unpoison_memory(void *address, size_t size, bool checked)
+{
+ kmsan_internal_set_shadow_origin(address, size, 0, 0, checked);
+}
+
+depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,
+ unsigned int extra)
+{
+ unsigned long entries[KMSAN_STACK_DEPTH];
+ unsigned int nr_entries;
+
+ nr_entries = stack_trace_save(entries, KMSAN_STACK_DEPTH, 0);
+ nr_entries = filter_irq_stacks(entries, nr_entries);
+
+ /* Don't sleep (see might_sleep_if() in __alloc_pages_nodemask()). */
+ flags &= ~__GFP_DIRECT_RECLAIM;
+
+ return __stack_depot_save(entries, nr_entries, extra, flags, true);
+}
+
+/* Copy the metadata following the memmove() behavior. */
+void kmsan_internal_memmove_metadata(void *dst, void *src, size_t n)
+{
+ depot_stack_handle_t old_origin = 0, chain_origin, new_origin = 0;
+ int src_slots, dst_slots, i, iter, step, skip_bits;
+ depot_stack_handle_t *origin_src, *origin_dst;
+ void *shadow_src, *shadow_dst;
+ u32 *align_shadow_src, shadow;
+ bool backwards;
+
+ shadow_dst = kmsan_get_metadata(dst, KMSAN_META_SHADOW);
+ if (!shadow_dst)
+ return;
+ KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(dst, n));
+
+ shadow_src = kmsan_get_metadata(src, KMSAN_META_SHADOW);
+ if (!shadow_src) {
+ /*
+ * |src| is untracked: zero out destination shadow, ignore the
+ * origins, we're done.
+ */
+ __memset(shadow_dst, 0, n);
+ return;
+ }
+ KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(src, n));
+
+ __memmove(shadow_dst, shadow_src, n);
+
+ origin_dst = kmsan_get_metadata(dst, KMSAN_META_ORIGIN);
+ origin_src = kmsan_get_metadata(src, KMSAN_META_ORIGIN);
+ KMSAN_WARN_ON(!origin_dst || !origin_src);
+ src_slots = (ALIGN((u64)src + n, KMSAN_ORIGIN_SIZE) -
+ ALIGN_DOWN((u64)src, KMSAN_ORIGIN_SIZE)) /
+ KMSAN_ORIGIN_SIZE;
+ dst_slots = (ALIGN((u64)dst + n, KMSAN_ORIGIN_SIZE) -
+ ALIGN_DOWN((u64)dst, KMSAN_ORIGIN_SIZE)) /
+ KMSAN_ORIGIN_SIZE;
+ KMSAN_WARN_ON(!src_slots || !dst_slots);
+ KMSAN_WARN_ON((src_slots < 1) || (dst_slots < 1));
+ KMSAN_WARN_ON((src_slots - dst_slots > 1) ||
+ (dst_slots - src_slots < -1));
+
+ backwards = dst > src;
+ i = backwards ? min(src_slots, dst_slots) - 1 : 0;
+ iter = backwards ? -1 : 1;
+
+ align_shadow_src =
+ (u32 *)ALIGN_DOWN((u64)shadow_src, KMSAN_ORIGIN_SIZE);
+ for (step = 0; step < min(src_slots, dst_slots); step++, i += iter) {
+ KMSAN_WARN_ON(i < 0);
+ shadow = align_shadow_src[i];
+ if (i == 0) {
+ /*
+ * If |src| isn't aligned on KMSAN_ORIGIN_SIZE, don't
+ * look at the first |src % KMSAN_ORIGIN_SIZE| bytes
+ * of the first shadow slot.
+ */
+ skip_bits = ((u64)src % KMSAN_ORIGIN_SIZE) * 8;
+ shadow = (shadow << skip_bits) >> skip_bits;
+ }
+ if (i == src_slots - 1) {
+ /*
+ * If |src + n| isn't aligned on
+ * KMSAN_ORIGIN_SIZE, don't look at the last
+ * |(src + n) % KMSAN_ORIGIN_SIZE| bytes of the
+ * last shadow slot.
+ */
+ skip_bits = (((u64)src + n) % KMSAN_ORIGIN_SIZE) * 8;
+ shadow = (shadow >> skip_bits) << skip_bits;
+ }
+ /*
+ * Overwrite the origin only if the corresponding
+ * shadow is nonempty.
+ */
+ if (origin_src[i] && (origin_src[i] != old_origin) && shadow) {
+ old_origin = origin_src[i];
+ chain_origin = kmsan_internal_chain_origin(old_origin);
+ /*
+ * kmsan_internal_chain_origin() may return
+ * NULL, but we don't want to lose the previous
+ * origin value.
+ */
+ if (chain_origin)
+ new_origin = chain_origin;
+ else
+ new_origin = old_origin;
+ }
+ if (shadow)
+ origin_dst[i] = new_origin;
+ else
+ origin_dst[i] = 0;
+ }
+}
+
+depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id)
+{
+ unsigned long entries[3];
+ u32 extra_bits;
+ int depth;
+ bool uaf;
+
+ if (!id)
+ return id;
+ /*
+ * Make sure we have enough spare bits in |id| to hold the UAF bit and
+ * the chain depth.
+ */
+ BUILD_BUG_ON((1 << STACK_DEPOT_EXTRA_BITS) <= (MAX_CHAIN_DEPTH << 1));
+
+ extra_bits = stack_depot_get_extra_bits(id);
+ depth = kmsan_depth_from_eb(extra_bits);
+ uaf = kmsan_uaf_from_eb(extra_bits);
+
+ if (depth >= MAX_CHAIN_DEPTH) {
+ static atomic_long_t kmsan_skipped_origins;
+ long skipped = atomic_long_inc_return(&kmsan_skipped_origins);
+
+ if (skipped % NUM_SKIPPED_TO_WARN == 0) {
+ pr_warn("not chained %d origins\n", skipped);
+ dump_stack();
+ kmsan_print_origin(id);
+ }
+ return id;
+ }
+ depth++;
+ extra_bits = kmsan_extra_bits(depth, uaf);
+
+ entries[0] = KMSAN_CHAIN_MAGIC_ORIGIN;
+ entries[1] = kmsan_save_stack_with_flags(GFP_ATOMIC, 0);
+ entries[2] = id;
+ return __stack_depot_save(entries, ARRAY_SIZE(entries), extra_bits,
+ GFP_ATOMIC, true);
+}
+
+void kmsan_internal_set_shadow_origin(void *addr, size_t size, int b,
+ u32 origin, bool checked)
+{
+ u64 address = (u64)addr;
+ void *shadow_start;
+ u32 *origin_start;
+ size_t pad = 0;
+ int i;
+
+ KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
+ shadow_start = kmsan_get_metadata(addr, KMSAN_META_SHADOW);
+ if (!shadow_start) {
+ /*
+ * kmsan_metadata_is_contiguous() is true, so either all shadow
+ * and origin pages are NULL, or all are non-NULL.
+ */
+ if (checked) {
+ pr_err("%s: not memsetting %d bytes starting at %px, because the shadow is NULL\n",
+ __func__, size, addr);
+ BUG();
+ }
+ return;
+ }
+ __memset(shadow_start, b, size);
+
+ if (!IS_ALIGNED(address, KMSAN_ORIGIN_SIZE)) {
+ pad = address % KMSAN_ORIGIN_SIZE;
+ address -= pad;
+ size += pad;
+ }
+ size = ALIGN(size, KMSAN_ORIGIN_SIZE);
+ origin_start =
+ (u32 *)kmsan_get_metadata((void *)address, KMSAN_META_ORIGIN);
+
+ for (i = 0; i < size / KMSAN_ORIGIN_SIZE; i++)
+ origin_start[i] = origin;
+}
+
+struct page *kmsan_vmalloc_to_page_or_null(void *vaddr)
+{
+ struct page *page;
+
+ if (!kmsan_internal_is_vmalloc_addr(vaddr) &&
+ !kmsan_internal_is_module_addr(vaddr))
+ return NULL;
+ page = vmalloc_to_page(vaddr);
+ if (pfn_valid(page_to_pfn(page)))
+ return page;
+ else
+ return NULL;
+}
+
+void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
+ int reason)
+{
+ depot_stack_handle_t cur_origin = 0, new_origin = 0;
+ unsigned long addr64 = (unsigned long)addr;
+ depot_stack_handle_t *origin = NULL;
+ unsigned char *shadow = NULL;
+ int cur_off_start = -1;
+ int i, chunk_size;
+ size_t pos = 0;
+
+ if (!size)
+ return;
+ KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
+ while (pos < size) {
+ chunk_size = min(size - pos,
+ PAGE_SIZE - ((addr64 + pos) % PAGE_SIZE));
+ shadow = kmsan_get_metadata((void *)(addr64 + pos),
+ KMSAN_META_SHADOW);
+ if (!shadow) {
+ /*
+ * This page is untracked. If there were uninitialized
+ * bytes before, report them.
+ */
+ if (cur_origin) {
+ kmsan_enter_runtime();
+ kmsan_report(cur_origin, addr, size,
+ cur_off_start, pos - 1, user_addr,
+ reason);
+ kmsan_leave_runtime();
+ }
+ cur_origin = 0;
+ cur_off_start = -1;
+ pos += chunk_size;
+ continue;
+ }
+ for (i = 0; i < chunk_size; i++) {
+ if (!shadow[i]) {
+ /*
+ * This byte is unpoisoned. If there were
+ * poisoned bytes before, report them.
+ */
+ if (cur_origin) {
+ kmsan_enter_runtime();
+ kmsan_report(cur_origin, addr, size,
+ cur_off_start, pos + i - 1,
+ user_addr, reason);
+ kmsan_leave_runtime();
+ }
+ cur_origin = 0;
+ cur_off_start = -1;
+ continue;
+ }
+ origin = kmsan_get_metadata((void *)(addr64 + pos + i),
+ KMSAN_META_ORIGIN);
+ KMSAN_WARN_ON(!origin);
+ new_origin = *origin;
+ /*
+ * Encountered new origin - report the previous
+ * uninitialized range.
+ */
+ if (cur_origin != new_origin) {
+ if (cur_origin) {
+ kmsan_enter_runtime();
+ kmsan_report(cur_origin, addr, size,
+ cur_off_start, pos + i - 1,
+ user_addr, reason);
+ kmsan_leave_runtime();
+ }
+ cur_origin = new_origin;
+ cur_off_start = pos + i;
+ }
+ }
+ pos += chunk_size;
+ }
+ KMSAN_WARN_ON(pos != size);
+ if (cur_origin) {
+ kmsan_enter_runtime();
+ kmsan_report(cur_origin, addr, size, cur_off_start, pos - 1,
+ user_addr, reason);
+ kmsan_leave_runtime();
+ }
+}
+
+bool kmsan_metadata_is_contiguous(void *addr, size_t size)
+{
+ char *cur_shadow = NULL, *next_shadow = NULL, *cur_origin = NULL,
+ *next_origin = NULL;
+ u64 cur_addr = (u64)addr, next_addr = cur_addr + PAGE_SIZE;
+ depot_stack_handle_t *origin_p;
+ bool all_untracked = false;
+
+ if (!size)
+ return true;
+
+ /* The whole range belongs to the same page. */
+ if (ALIGN_DOWN(cur_addr + size - 1, PAGE_SIZE) ==
+ ALIGN_DOWN(cur_addr, PAGE_SIZE))
+ return true;
+
+ cur_shadow = kmsan_get_metadata((void *)cur_addr, /*is_origin*/ false);
+ if (!cur_shadow)
+ all_untracked = true;
+ cur_origin = kmsan_get_metadata((void *)cur_addr, /*is_origin*/ true);
+ if (all_untracked && cur_origin)
+ goto report;
+
+ for (; next_addr < (u64)addr + size;
+ cur_addr = next_addr, cur_shadow = next_shadow,
+ cur_origin = next_origin, next_addr += PAGE_SIZE) {
+ next_shadow = kmsan_get_metadata((void *)next_addr, false);
+ next_origin = kmsan_get_metadata((void *)next_addr, true);
+ if (all_untracked) {
+ if (next_shadow || next_origin)
+ goto report;
+ if (!next_shadow && !next_origin)
+ continue;
+ }
+ if (((u64)cur_shadow == ((u64)next_shadow - PAGE_SIZE)) &&
+ ((u64)cur_origin == ((u64)next_origin - PAGE_SIZE)))
+ continue;
+ goto report;
+ }
+ return true;
+
+report:
+ pr_err("%s: attempting to access two shadow page ranges.\n", __func__);
+ pr_err("Access of size %d at %px.\n", size, addr);
+ pr_err("Addresses belonging to different ranges: %px and %px\n",
+ cur_addr, next_addr);
+ pr_err("page[0].shadow: %px, page[1].shadow: %px\n", cur_shadow,
+ next_shadow);
+ pr_err("page[0].origin: %px, page[1].origin: %px\n", cur_origin,
+ next_origin);
+ origin_p = kmsan_get_metadata(addr, KMSAN_META_ORIGIN);
+ if (origin_p) {
+ pr_err("Origin: %08x\n", *origin_p);
+ kmsan_print_origin(*origin_p);
+ } else {
+ pr_err("Origin: unavailable\n");
+ }
+ return false;
+}
+
+bool kmsan_internal_is_module_addr(void *vaddr)
+{
+ return ((u64)vaddr >= MODULES_VADDR) && ((u64)vaddr < MODULES_END);
+}
+
+bool kmsan_internal_is_vmalloc_addr(void *addr)
+{
+ return ((u64)addr >= VMALLOC_START) && ((u64)addr < VMALLOC_END);
+}
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
new file mode 100644
index 0000000000000..4012d7a4adb53
--- /dev/null
+++ b/mm/kmsan/hooks.c
@@ -0,0 +1,400 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN hooks for kernel subsystems.
+ *
+ * These functions handle creation of KMSAN metadata for memory allocations.
+ *
+ * Copyright (C) 2018-2021 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#include <linux/cacheflush.h>
+#include <linux/dma-direction.h>
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/mm_types.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/usb.h>
+
+#include "../slab.h"
+#include "kmsan.h"
+
+/*
+ * Instrumented functions shouldn't be called under
+ * kmsan_enter_runtime()/kmsan_leave_runtime(), because this will lead to
+ * skipping effects of functions like memset() inside instrumented code.
+ */
+
+void kmsan_task_create(struct task_struct *task)
+{
+ kmsan_enter_runtime();
+ kmsan_internal_task_create(task);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_task_create);
+
+void kmsan_task_exit(struct task_struct *task)
+{
+ struct kmsan_ctx *ctx = &task->kmsan_ctx;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ ctx->allow_reporting = false;
+}
+EXPORT_SYMBOL(kmsan_task_exit);
+
+void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags)
+{
+ if (unlikely(object == NULL))
+ return;
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ /*
+ * There's a ctor or this is an RCU cache - do nothing. The memory
+ * status hasn't changed since last use.
+ */
+ if (s->ctor || (s->flags & SLAB_TYPESAFE_BY_RCU))
+ return;
+
+ kmsan_enter_runtime();
+ if (flags & __GFP_ZERO)
+ kmsan_internal_unpoison_memory(object, s->object_size,
+ KMSAN_POISON_CHECK);
+ else
+ kmsan_internal_poison_memory(object, s->object_size, flags,
+ KMSAN_POISON_CHECK);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_slab_alloc);
+
+void kmsan_slab_free(struct kmem_cache *s, void *object)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ /* RCU slabs could be legally used after free within the RCU period */
+ if (unlikely(s->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)))
+ return;
+ /*
+ * If there's a constructor, freed memory must remain in the same state
+ * until the next allocation. We cannot save its state to detect
+ * use-after-free bugs, instead we just keep it unpoisoned.
+ */
+ if (s->ctor)
+ return;
+ kmsan_enter_runtime();
+ kmsan_internal_poison_memory(object, s->object_size, GFP_KERNEL,
+ KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_slab_free);
+
+void kmsan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
+{
+ if (unlikely(ptr == NULL))
+ return;
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ if (flags & __GFP_ZERO)
+ kmsan_internal_unpoison_memory((void *)ptr, size,
+ /*checked*/ true);
+ else
+ kmsan_internal_poison_memory((void *)ptr, size, flags,
+ KMSAN_POISON_CHECK);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_kmalloc_large);
+
+void kmsan_kfree_large(const void *ptr)
+{
+ struct page *page;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ page = virt_to_head_page((void *)ptr);
+ KMSAN_WARN_ON(ptr != page_address(page));
+ kmsan_internal_poison_memory((void *)ptr,
+ PAGE_SIZE << compound_order(page),
+ GFP_KERNEL,
+ KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_kfree_large);
+
+static unsigned long vmalloc_shadow(unsigned long addr)
+{
+ return (unsigned long)kmsan_get_metadata((void *)addr,
+ KMSAN_META_SHADOW);
+}
+
+static unsigned long vmalloc_origin(unsigned long addr)
+{
+ return (unsigned long)kmsan_get_metadata((void *)addr,
+ KMSAN_META_ORIGIN);
+}
+
+void kmsan_vunmap_range_noflush(unsigned long start, unsigned long end)
+{
+ __vunmap_range_noflush(vmalloc_shadow(start), vmalloc_shadow(end));
+ __vunmap_range_noflush(vmalloc_origin(start), vmalloc_origin(end));
+ flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
+ flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
+}
+EXPORT_SYMBOL(kmsan_vunmap_range_noflush);
+
+/*
+ * This function creates new shadow/origin pages for the physical pages mapped
+ * into the virtual memory. If those physical pages already had shadow/origin,
+ * those are ignored.
+ */
+void kmsan_ioremap_page_range(unsigned long start, unsigned long end,
+ phys_addr_t phys_addr, pgprot_t prot,
+ unsigned int page_shift)
+{
+ gfp_t gfp_mask = GFP_KERNEL | __GFP_ZERO;
+ struct page *shadow, *origin;
+ unsigned long off = 0;
+ int i, nr;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ nr = (end - start) / PAGE_SIZE;
+ kmsan_enter_runtime();
+ for (i = 0; i < nr; i++, off += PAGE_SIZE) {
+ shadow = alloc_pages(gfp_mask, 1);
+ origin = alloc_pages(gfp_mask, 1);
+ __vmap_pages_range_noflush(
+ vmalloc_shadow(start + off),
+ vmalloc_shadow(start + off + PAGE_SIZE), prot, &shadow,
+ page_shift);
+ __vmap_pages_range_noflush(
+ vmalloc_origin(start + off),
+ vmalloc_origin(start + off + PAGE_SIZE), prot, &origin,
+ page_shift);
+ }
+ flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
+ flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_ioremap_page_range);
+
+void kmsan_iounmap_page_range(unsigned long start, unsigned long end)
+{
+ unsigned long v_shadow, v_origin;
+ struct page *shadow, *origin;
+ int i, nr;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ nr = (end - start) / PAGE_SIZE;
+ kmsan_enter_runtime();
+ v_shadow = (unsigned long)vmalloc_shadow(start);
+ v_origin = (unsigned long)vmalloc_origin(start);
+ for (i = 0; i < nr; i++, v_shadow += PAGE_SIZE, v_origin += PAGE_SIZE) {
+ shadow = kmsan_vmalloc_to_page_or_null((void *)v_shadow);
+ origin = kmsan_vmalloc_to_page_or_null((void *)v_origin);
+ __vunmap_range_noflush(v_shadow, vmalloc_shadow(end));
+ __vunmap_range_noflush(v_origin, vmalloc_origin(end));
+ if (shadow)
+ __free_pages(shadow, 1);
+ if (origin)
+ __free_pages(origin, 1);
+ }
+ flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
+ flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_iounmap_page_range);
+
+void kmsan_copy_to_user(const void *to, const void *from, size_t to_copy,
+ size_t left)
+{
+ unsigned long ua_flags;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ /*
+ * At this point we've copied the memory already. It's hard to check it
+ * before copying, as the size of actually copied buffer is unknown.
+ */
+
+ /* copy_to_user() may copy zero bytes. No need to check. */
+ if (!to_copy)
+ return;
+ /* Or maybe copy_to_user() failed to copy anything. */
+ if (to_copy <= left)
+ return;
+
+ ua_flags = user_access_save();
+ if ((u64)to < TASK_SIZE) {
+ /* This is a user memory access, check it. */
+ kmsan_internal_check_memory((void *)from, to_copy - left, to,
+ REASON_COPY_TO_USER);
+ user_access_restore(ua_flags);
+ return;
+ }
+ /* Otherwise this is a kernel memory access. This happens when a compat
+ * syscall passes an argument allocated on the kernel stack to a real
+ * syscall.
+ * Don't check anything, just copy the shadow of the copied bytes.
+ */
+ kmsan_internal_memmove_metadata((void *)to, (void *)from,
+ to_copy - left);
+ user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(kmsan_copy_to_user);
+
+/* Helper function to check an URB. */
+void kmsan_handle_urb(const struct urb *urb, bool is_out)
+{
+ if (!urb)
+ return;
+ if (is_out)
+ kmsan_internal_check_memory(urb->transfer_buffer,
+ urb->transfer_buffer_length,
+ /*user_addr*/ 0, REASON_SUBMIT_URB);
+ else
+ kmsan_internal_unpoison_memory(urb->transfer_buffer,
+ urb->transfer_buffer_length,
+ /*checked*/ false);
+}
+EXPORT_SYMBOL(kmsan_handle_urb);
+
+static void kmsan_handle_dma_page(const void *addr, size_t size,
+ enum dma_data_direction dir)
+{
+ switch (dir) {
+ case DMA_BIDIRECTIONAL:
+ kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
+ REASON_ANY);
+ kmsan_internal_unpoison_memory((void *)addr, size,
+ /*checked*/ false);
+ break;
+ case DMA_TO_DEVICE:
+ kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
+ REASON_ANY);
+ break;
+ case DMA_FROM_DEVICE:
+ kmsan_internal_unpoison_memory((void *)addr, size,
+ /*checked*/ false);
+ break;
+ case DMA_NONE:
+ break;
+ }
+}
+
+/* Helper function to handle DMA data transfers. */
+void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+ enum dma_data_direction dir)
+{
+ u64 page_offset, to_go, addr;
+
+ if (PageHighMem(page))
+ return;
+ addr = (u64)page_address(page) + offset;
+ /*
+ * The kernel may occasionally give us adjacent DMA pages not belonging
+ * to the same allocation. Process them separately to avoid triggering
+ * internal KMSAN checks.
+ */
+ while (size > 0) {
+ page_offset = addr % PAGE_SIZE;
+ to_go = min(PAGE_SIZE - page_offset, (u64)size);
+ kmsan_handle_dma_page((void *)addr, to_go, dir);
+ addr += to_go;
+ size -= to_go;
+ }
+}
+EXPORT_SYMBOL(kmsan_handle_dma);
+
+void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
+ enum dma_data_direction dir)
+{
+ struct scatterlist *item;
+ int i;
+
+ for_each_sg(sg, item, nents, i)
+ kmsan_handle_dma(sg_page(item), item->offset, item->length,
+ dir);
+}
+EXPORT_SYMBOL(kmsan_handle_dma_sg);
+
+/* Functions from kmsan-checks.h follow. */
+void kmsan_poison_memory(const void *address, size_t size, gfp_t flags)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ /* Callers may want to poison/unpoison arbitrary memory. */
+ kmsan_internal_poison_memory((void *)address, size, flags,
+ KMSAN_POISON_NOCHECK);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_poison_memory);
+
+void kmsan_unpoison_memory(const void *address, size_t size)
+{
+ unsigned long ua_flags;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ ua_flags = user_access_save();
+ kmsan_enter_runtime();
+ /* Callers may want to poison/unpoison arbitrary memory. */
+ kmsan_internal_unpoison_memory((void *)address, size,
+ KMSAN_POISON_NOCHECK);
+ kmsan_leave_runtime();
+ user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(kmsan_unpoison_memory);
+
+void kmsan_gup_pgd_range(struct page **pages, int nr)
+{
+ void *page_addr;
+ int i;
+
+ /*
+ * gup_pgd_range() has just created a number of new pages that KMSAN
+ * treats as uninitialized. If they belong to userspace memory, unpoison
+ * the corresponding kernel pages.
+ */
+ for (i = 0; i < nr; i++) {
+ if (PageHighMem(pages[i]))
+ continue;
+ page_addr = page_address(pages[i]);
+ if (((u64)page_addr < TASK_SIZE) &&
+ ((u64)page_addr + PAGE_SIZE < TASK_SIZE))
+ kmsan_unpoison_memory(page_addr, PAGE_SIZE);
+ }
+}
+EXPORT_SYMBOL(kmsan_gup_pgd_range);
+
+void kmsan_check_memory(const void *addr, size_t size)
+{
+ if (!kmsan_enabled)
+ return;
+ kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
+ REASON_ANY);
+}
+EXPORT_SYMBOL(kmsan_check_memory);
+
+void kmsan_instrumentation_begin(struct pt_regs *regs)
+{
+ struct kmsan_context_state *state = &kmsan_get_context()->cstate;
+
+ __memset(state, 0, sizeof(struct kmsan_context_state));
+ if (!kmsan_enabled || !regs)
+ return;
+ kmsan_internal_unpoison_memory(regs, sizeof(*regs), /*checked*/ true);
+}
+EXPORT_SYMBOL(kmsan_instrumentation_begin);
diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c
new file mode 100644
index 0000000000000..49ab06cde082a
--- /dev/null
+++ b/mm/kmsan/init.c
@@ -0,0 +1,238 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN initialization routines.
+ *
+ * Copyright (C) 2017-2021 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#include "kmsan.h"
+
+#include <asm/sections.h>
+#include <linux/mm.h>
+#include <linux/memblock.h>
+
+#define NUM_FUTURE_RANGES 128
+struct start_end_pair {
+ u64 start, end;
+};
+
+static struct start_end_pair start_end_pairs[NUM_FUTURE_RANGES] __initdata;
+static int future_index __initdata;
+
+/*
+ * Record a range of memory for which the metadata pages will be created once
+ * the page allocator becomes available.
+ */
+static void __init kmsan_record_future_shadow_range(void *start, void *end)
+{
+ u64 nstart = (u64)start, nend = (u64)end, cstart, cend;
+ bool merged = false;
+ int i;
+
+ KMSAN_WARN_ON(future_index == NUM_FUTURE_RANGES);
+ KMSAN_WARN_ON((nstart >= nend) || !nstart || !nend);
+ nstart = ALIGN_DOWN(nstart, PAGE_SIZE);
+ nend = ALIGN(nend, PAGE_SIZE);
+
+ /*
+ * Scan the existing ranges to see if any of them overlaps with
+ * [start, end). In that case, merge the two ranges instead of
+ * creating a new one.
+ * The number of ranges is less than 20, so there is no need to organize
+ * them into a more intelligent data structure.
+ */
+ for (i = 0; i < future_index; i++) {
+ cstart = start_end_pairs[i].start;
+ cend = start_end_pairs[i].end;
+ if ((cstart < nstart && cend < nstart) ||
+ (cstart > nend && cend > nend))
+ /* ranges are disjoint - do not merge */
+ continue;
+ start_end_pairs[i].start = min(nstart, cstart);
+ start_end_pairs[i].end = max(nend, cend);
+ merged = true;
+ break;
+ }
+ if (merged)
+ return;
+ start_end_pairs[future_index].start = nstart;
+ start_end_pairs[future_index].end = nend;
+ future_index++;
+}
+
+/*
+ * Initialize the shadow for existing mappings during kernel initialization.
+ * These include kernel text/data sections, NODE_DATA and future ranges
+ * registered while creating other data (e.g. percpu).
+ *
+ * Allocations via memblock can only be done before slab is initialized.
+ */
+void __init kmsan_init_shadow(void)
+{
+ const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
+ phys_addr_t p_start, p_end;
+ int nid;
+ u64 i;
+
+ for_each_reserved_mem_range(i, &p_start, &p_end)
+ kmsan_record_future_shadow_range(phys_to_virt(p_start),
+ phys_to_virt(p_end));
+ /* Allocate shadow for .data */
+ kmsan_record_future_shadow_range(_sdata, _edata);
+
+ for_each_online_node(nid)
+ kmsan_record_future_shadow_range(
+ NODE_DATA(nid), (char *)NODE_DATA(nid) + nd_size);
+
+ for (i = 0; i < future_index; i++)
+ kmsan_init_alloc_meta_for_range(
+ (void *)start_end_pairs[i].start,
+ (void *)start_end_pairs[i].end);
+}
+EXPORT_SYMBOL(kmsan_init_shadow);
+
+struct page_pair {
+ struct page *shadow, *origin;
+};
+static struct page_pair held_back[MAX_ORDER] __initdata;
+
+/*
+ * Eager metadata allocation. When the memblock allocator is freeing pages to
+ * the page allocator, we use 2/3 of them as metadata for the remaining 1/3.
+ * We store pointers to the returned blocks of pages in held_back[], grouped
+ * by their order: the first time kmsan_memblock_free_pages() is called for a
+ * given order, the incoming block is reserved as a shadow block; the second
+ * time, as an origin block. The third time, the incoming block receives its
+ * shadow and origin ranges from the previously saved shadow and origin blocks,
+ * after which held_back[order] can be used again.
+ *
+ * At the very end there may be leftover blocks in held_back[]. They are
+ * collected later by kmsan_memblock_discard().
+ */
+bool kmsan_memblock_free_pages(struct page *page, unsigned int order)
+{
+ struct page *shadow, *origin;
+
+ if (!held_back[order].shadow) {
+ held_back[order].shadow = page;
+ return false;
+ }
+ if (!held_back[order].origin) {
+ held_back[order].origin = page;
+ return false;
+ }
+ shadow = held_back[order].shadow;
+ origin = held_back[order].origin;
+ kmsan_setup_meta(page, shadow, origin, order);
+
+ held_back[order].shadow = NULL;
+ held_back[order].origin = NULL;
+ return true;
+}
+
+#define MAX_BLOCKS 8
+struct smallstack {
+ struct page *items[MAX_BLOCKS];
+ int index;
+ int order;
+};
+
+static struct smallstack collect = {
+ .index = 0,
+ .order = MAX_ORDER,
+};
+
+static void smallstack_push(struct smallstack *stack, struct page *pages)
+{
+ KMSAN_WARN_ON(stack->index == MAX_BLOCKS);
+ stack->items[stack->index] = pages;
+ stack->index++;
+}
+#undef MAX_BLOCKS
+
+static struct page *smallstack_pop(struct smallstack *stack)
+{
+ struct page *ret;
+
+ KMSAN_WARN_ON(stack->index == 0);
+ stack->index--;
+ ret = stack->items[stack->index];
+ stack->items[stack->index] = NULL;
+ return ret;
+}
+
+static void do_collection(void)
+{
+ struct page *page, *shadow, *origin;
+
+ while (collect.index >= 3) {
+ page = smallstack_pop(&collect);
+ shadow = smallstack_pop(&collect);
+ origin = smallstack_pop(&collect);
+ kmsan_setup_meta(page, shadow, origin, collect.order);
+ __free_pages_core(page, collect.order);
+ }
+}
+
+static void collect_split(void)
+{
+ struct smallstack tmp = {
+ .order = collect.order - 1,
+ .index = 0,
+ };
+ struct page *page;
+
+ if (!collect.order)
+ return;
+ while (collect.index) {
+ page = smallstack_pop(&collect);
+ smallstack_push(&tmp, &page[0]);
+ smallstack_push(&tmp, &page[1 << tmp.order]);
+ }
+ __memcpy(&collect, &tmp, sizeof(struct smallstack));
+}
+
+/*
+ * Memblock is about to go away. Split the page blocks left over in held_back[]
+ * and return 1/3 of that memory to the system.
+ */
+static void kmsan_memblock_discard(void)
+{
+ int i;
+
+ /*
+ * For each order=N:
+ * - push held_back[N].shadow and .origin to |collect|;
+ * - while there are >= 3 elements in |collect|, do garbage collection:
+ * - pop 3 ranges from |collect|;
+ * - use two of them as shadow and origin for the third one;
+ * - repeat;
+ * - split each remaining element from |collect| into 2 ranges of
+ * order=N-1,
+ * - repeat.
+ */
+ collect.order = MAX_ORDER - 1;
+ for (i = MAX_ORDER - 1; i >= 0; i--) {
+ if (held_back[i].shadow)
+ smallstack_push(&collect, held_back[i].shadow);
+ if (held_back[i].origin)
+ smallstack_push(&collect, held_back[i].origin);
+ held_back[i].shadow = NULL;
+ held_back[i].origin = NULL;
+ do_collection();
+ collect_split();
+ }
+}
+
+void __init kmsan_init_runtime(void)
+{
+ /* Assuming current is init_task */
+ kmsan_internal_task_create(current);
+ kmsan_memblock_discard();
+ pr_info("vmalloc area at: %px\n", VMALLOC_START);
+ pr_info("Starting KernelMemorySanitizer\n");
+ kmsan_enabled = true;
+}
+EXPORT_SYMBOL(kmsan_init_runtime);
diff --git a/mm/kmsan/instrumentation.c b/mm/kmsan/instrumentation.c
new file mode 100644
index 0000000000000..1eb2d64aa39a6
--- /dev/null
+++ b/mm/kmsan/instrumentation.c
@@ -0,0 +1,233 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN compiler API.
+ *
+ * Copyright (C) 2017-2021 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#include "kmsan.h"
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+
+static inline bool is_bad_asm_addr(void *addr, uintptr_t size, bool is_store)
+{
+ if ((u64)addr < TASK_SIZE)
+ return true;
+ if (!kmsan_get_metadata(addr, KMSAN_META_SHADOW))
+ return true;
+ return false;
+}
+
+static inline struct shadow_origin_ptr
+get_shadow_origin_ptr(void *addr, u64 size, bool store)
+{
+ unsigned long ua_flags = user_access_save();
+ struct shadow_origin_ptr ret;
+
+ ret = kmsan_get_shadow_origin_ptr(addr, size, store);
+ user_access_restore(ua_flags);
+ return ret;
+}
+
+struct shadow_origin_ptr __msan_metadata_ptr_for_load_n(void *addr,
+ uintptr_t size)
+{
+ return get_shadow_origin_ptr(addr, size, /*store*/ false);
+}
+EXPORT_SYMBOL(__msan_metadata_ptr_for_load_n);
+
+struct shadow_origin_ptr __msan_metadata_ptr_for_store_n(void *addr,
+ uintptr_t size)
+{
+ return get_shadow_origin_ptr(addr, size, /*store*/ true);
+}
+EXPORT_SYMBOL(__msan_metadata_ptr_for_store_n);
+
+#define DECLARE_METADATA_PTR_GETTER(size) \
+ struct shadow_origin_ptr __msan_metadata_ptr_for_load_##size( \
+ void *addr) \
+ { \
+ return get_shadow_origin_ptr(addr, size, /*store*/ false); \
+ } \
+ EXPORT_SYMBOL(__msan_metadata_ptr_for_load_##size); \
+ struct shadow_origin_ptr __msan_metadata_ptr_for_store_##size( \
+ void *addr) \
+ { \
+ return get_shadow_origin_ptr(addr, size, /*store*/ true); \
+ } \
+ EXPORT_SYMBOL(__msan_metadata_ptr_for_store_##size)
+
+DECLARE_METADATA_PTR_GETTER(1);
+DECLARE_METADATA_PTR_GETTER(2);
+DECLARE_METADATA_PTR_GETTER(4);
+DECLARE_METADATA_PTR_GETTER(8);
+
+void __msan_instrument_asm_store(void *addr, uintptr_t size)
+{
+ unsigned long ua_flags;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ ua_flags = user_access_save();
+ /*
+ * Most of the accesses are below 32 bytes. The two exceptions so far
+ * are clwb() (64 bytes) and FPU state (512 bytes).
+ * It's unlikely that the assembly will touch more than 512 bytes.
+ */
+ if (size > 512) {
+ WARN_ONCE(1, "assembly store size too big: %d\n", size);
+ size = 8;
+ }
+ if (is_bad_asm_addr(addr, size, /*is_store*/ true)) {
+ user_access_restore(ua_flags);
+ return;
+ }
+ kmsan_enter_runtime();
+ /* Unpoison the memory on a best-effort basis. */
+ kmsan_internal_unpoison_memory(addr, size, /*checked*/ false);
+ kmsan_leave_runtime();
+ user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(__msan_instrument_asm_store);
+
+void *__msan_memmove(void *dst, const void *src, uintptr_t n)
+{
+ void *result;
+
+ result = __memmove(dst, src, n);
+ if (!n)
+ /* Some people call memmove() with zero length. */
+ return result;
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return result;
+
+ kmsan_internal_memmove_metadata(dst, (void *)src, n);
+
+ return result;
+}
+EXPORT_SYMBOL(__msan_memmove);
+
+void *__msan_memcpy(void *dst, const void *src, uintptr_t n)
+{
+ void *result;
+
+ result = __memcpy(dst, src, n);
+ if (!n)
+ /* Some people call memcpy() with zero length. */
+ return result;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return result;
+
+ /* Using memmove instead of memcpy doesn't affect correctness. */
+ kmsan_internal_memmove_metadata(dst, (void *)src, n);
+
+ return result;
+}
+EXPORT_SYMBOL(__msan_memcpy);
+
+void *__msan_memset(void *dst, int c, uintptr_t n)
+{
+ void *result;
+
+ result = __memset(dst, c, n);
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return result;
+
+ kmsan_enter_runtime();
+ /*
+ * Clang doesn't pass parameter metadata here, so it is impossible to
+ * use the shadow of @c to set up the shadow for @dst.
+ */
+ kmsan_internal_unpoison_memory(dst, n, /*checked*/ false);
+ kmsan_leave_runtime();
+
+ return result;
+}
+EXPORT_SYMBOL(__msan_memset);
+
+depot_stack_handle_t __msan_chain_origin(depot_stack_handle_t origin)
+{
+ depot_stack_handle_t ret = 0;
+ unsigned long ua_flags;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return ret;
+
+ ua_flags = user_access_save();
+
+ /* Creating new origins may allocate memory. */
+ kmsan_enter_runtime();
+ ret = kmsan_internal_chain_origin(origin);
+ kmsan_leave_runtime();
+ user_access_restore(ua_flags);
+ return ret;
+}
+EXPORT_SYMBOL(__msan_chain_origin);
+
+void __msan_poison_alloca(void *address, uintptr_t size, char *descr)
+{
+ depot_stack_handle_t handle;
+ unsigned long entries[4];
+ unsigned long ua_flags;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ ua_flags = user_access_save();
+ entries[0] = KMSAN_ALLOCA_MAGIC_ORIGIN;
+ entries[1] = (u64)descr;
+ entries[2] = (u64)__builtin_return_address(0);
+ /*
+ * With frame pointers enabled, it is possible to quickly fetch the
+ * second frame of the caller stack without calling the unwinder.
+ * Without them, simply do not bother.
+ */
+ if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER))
+ entries[3] = (u64)__builtin_return_address(1);
+ else
+ entries[3] = 0;
+
+ /* stack_depot_save() may allocate memory. */
+ kmsan_enter_runtime();
+ handle = stack_depot_save(entries, ARRAY_SIZE(entries), GFP_ATOMIC);
+ kmsan_leave_runtime();
+
+ kmsan_internal_set_shadow_origin(address, size, -1, handle,
+ /*checked*/ true);
+ user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(__msan_poison_alloca);
+
+void __msan_unpoison_alloca(void *address, uintptr_t size)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ kmsan_enter_runtime();
+ kmsan_internal_unpoison_memory(address, size, /*checked*/ true);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(__msan_unpoison_alloca);
+
+void __msan_warning(u32 origin)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ kmsan_report(origin, /*address*/ 0, /*size*/ 0,
+ /*off_first*/ 0, /*off_last*/ 0, /*user_addr*/ 0,
+ REASON_ANY);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(__msan_warning);
+
+struct kmsan_context_state *__msan_get_context_state(void)
+{
+ return &kmsan_get_context()->cstate;
+}
+EXPORT_SYMBOL(__msan_get_context_state);
diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
new file mode 100644
index 0000000000000..29c91b6e28799
--- /dev/null
+++ b/mm/kmsan/kmsan.h
@@ -0,0 +1,197 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Functions used by the KMSAN runtime.
+ *
+ * Copyright (C) 2017-2021 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#ifndef __MM_KMSAN_KMSAN_H
+#define __MM_KMSAN_KMSAN_H
+
+#include <asm/pgtable_64_types.h>
+#include <linux/irqflags.h>
+#include <linux/sched.h>
+#include <linux/stackdepot.h>
+#include <linux/stacktrace.h>
+#include <linux/nmi.h>
+#include <linux/mm.h>
+#include <linux/printk.h>
+
+#define KMSAN_ALLOCA_MAGIC_ORIGIN 0xabcd0100
+#define KMSAN_CHAIN_MAGIC_ORIGIN 0xabcd0200
+
+#define KMSAN_POISON_NOCHECK 0x0
+#define KMSAN_POISON_CHECK 0x1
+#define KMSAN_POISON_FREE 0x2
+
+#define KMSAN_ORIGIN_SIZE 4
+
+#define KMSAN_STACK_DEPTH 64
+
+#define KMSAN_META_SHADOW (false)
+#define KMSAN_META_ORIGIN (true)
+
+extern bool kmsan_enabled;
+extern int panic_on_kmsan;
+
+/*
+ * KMSAN performs a lot of consistency checks that are currently enabled by
+ * default. BUG_ON is normally discouraged in the kernel, unless used for
+ * debugging, but KMSAN itself is a debugging tool, so it makes little sense to
+ * recover if something goes wrong.
+ */
+#define KMSAN_WARN_ON(cond) \
+ ({ \
+ const bool __cond = WARN_ON(cond); \
+ if (unlikely(__cond)) { \
+ WRITE_ONCE(kmsan_enabled, false); \
+ if (panic_on_kmsan) { \
+ /* Can't call panic() here because */ \
+ /* of uaccess checks. */ \
+ BUG(); \
+ } \
+ } \
+ __cond; \
+ })
+
+/*
+ * A pair of metadata pointers to be returned by the instrumentation functions.
+ */
+struct shadow_origin_ptr {
+ void *shadow, *origin;
+};
+
+struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *addr, u64 size,
+ bool store);
+void *kmsan_get_metadata(void *addr, bool is_origin);
+void __init kmsan_init_alloc_meta_for_range(void *start, void *end);
+
+enum kmsan_bug_reason {
+ REASON_ANY,
+ REASON_COPY_TO_USER,
+ REASON_SUBMIT_URB,
+};
+
+void kmsan_print_origin(depot_stack_handle_t origin);
+
+/**
+ * kmsan_report() - Report a use of uninitialized value.
+ * @origin: Stack ID of the uninitialized value.
+ * @address: Address at which the memory access happens.
+ * @size: Memory access size.
+ * @off_first: Offset (from @address) of the first byte to be reported.
+ * @off_last: Offset (from @address) of the last byte to be reported.
+ * @user_addr: When non-NULL, denotes the userspace address to which the kernel
+ * is leaking data.
+ * @reason: Error type from enum kmsan_bug_reason.
+ *
+ * kmsan_report() prints an error message for a contiguous group of bytes
+ * sharing the same origin. If an uninitialized value is used in a comparison,
+ * this function is called once without specifying the addresses. When checking
+ * a memory range, KMSAN may call kmsan_report() multiple times with the same
+ * @address, @size, @user_addr and @reason, but different @off_first and
+ * @off_last corresponding to different @origin values.
+ */
+void kmsan_report(depot_stack_handle_t origin, void *address, int size,
+ int off_first, int off_last, const void *user_addr,
+ enum kmsan_bug_reason reason);
+
+DECLARE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
+
+static __always_inline struct kmsan_ctx *kmsan_get_context(void)
+{
+ return in_task() ? &current->kmsan_ctx : raw_cpu_ptr(&kmsan_percpu_ctx);
+}
+
+/*
+ * When a compiler hook is invoked, it may make a call to instrumented code
+ * and eventually call itself recursively. To avoid that, we protect the
+ * runtime entry points with kmsan_enter_runtime()/kmsan_leave_runtime() and
+ * exit the hook if kmsan_in_runtime() is true.
+ */
+
+static __always_inline bool kmsan_in_runtime(void)
+{
+ if ((hardirq_count() >> HARDIRQ_SHIFT) > 1)
+ return true;
+ return kmsan_get_context()->kmsan_in_runtime;
+}
+
+static __always_inline void kmsan_enter_runtime(void)
+{
+ struct kmsan_ctx *ctx;
+
+ ctx = kmsan_get_context();
+ KMSAN_WARN_ON(ctx->kmsan_in_runtime++);
+}
+
+static __always_inline void kmsan_leave_runtime(void)
+{
+ struct kmsan_ctx *ctx = kmsan_get_context();
+
+ KMSAN_WARN_ON(--ctx->kmsan_in_runtime);
+}
+
+depot_stack_handle_t kmsan_save_stack(void);
+depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,
+ unsigned int extra_bits);
+
+/*
+ * Pack and unpack the origin chain depth and UAF flag to/from the extra bits
+ * provided by the stack depot.
+ * The UAF flag is stored in the lowest bit, followed by the depth in the upper
+ * bits.
+ * set_dsh_extra_bits() is responsible for clamping the value.
+ */
+static __always_inline unsigned int kmsan_extra_bits(unsigned int depth,
+ bool uaf)
+{
+ return (depth << 1) | uaf;
+}
+
+static __always_inline bool kmsan_uaf_from_eb(unsigned int extra_bits)
+{
+ return extra_bits & 1;
+}
+
+static __always_inline unsigned int kmsan_depth_from_eb(unsigned int extra_bits)
+{
+ return extra_bits >> 1;
+}
+
+/*
+ * kmsan_internal_ functions are supposed to be very simple and not require the
+ * kmsan_in_runtime() checks.
+ */
+void kmsan_internal_memmove_metadata(void *dst, void *src, size_t n);
+void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
+ unsigned int poison_flags);
+void kmsan_internal_unpoison_memory(void *address, size_t size, bool checked);
+void kmsan_internal_set_shadow_origin(void *address, size_t size, int b,
+ u32 origin, bool checked);
+depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id);
+
+void kmsan_internal_task_create(struct task_struct *task);
+
+bool kmsan_metadata_is_contiguous(void *addr, size_t size);
+void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
+ int reason);
+bool kmsan_internal_is_module_addr(void *vaddr);
+bool kmsan_internal_is_vmalloc_addr(void *addr);
+
+struct page *kmsan_vmalloc_to_page_or_null(void *vaddr);
+void kmsan_setup_meta(struct page *page, struct page *shadow,
+ struct page *origin, int order);
+
+/* Declared in mm/vmalloc.c */
+void __vunmap_range_noflush(unsigned long start, unsigned long end);
+int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
+ pgprot_t prot, struct page **pages,
+ unsigned int page_shift);
+
+/* Declared in mm/internal.h */
+void __free_pages_core(struct page *page, unsigned int order);
+
+#endif /* __MM_KMSAN_KMSAN_H */
diff --git a/mm/kmsan/report.c b/mm/kmsan/report.c
new file mode 100644
index 0000000000000..d539fe1129fb9
--- /dev/null
+++ b/mm/kmsan/report.c
@@ -0,0 +1,210 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN error reporting routines.
+ *
+ * Copyright (C) 2019-2021 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#include <linux/console.h>
+#include <linux/moduleparam.h>
+#include <linux/stackdepot.h>
+#include <linux/stacktrace.h>
+#include <linux/uaccess.h>
+
+#include "kmsan.h"
+
+static DEFINE_SPINLOCK(kmsan_report_lock);
+#define DESCR_SIZE 128
+/* Protected by kmsan_report_lock */
+static char report_local_descr[DESCR_SIZE];
+int panic_on_kmsan __read_mostly;
+
+#ifdef MODULE_PARAM_PREFIX
+#undef MODULE_PARAM_PREFIX
+#endif
+#define MODULE_PARAM_PREFIX "kmsan."
+module_param_named(panic, panic_on_kmsan, int, 0);
+
+/*
+ * Skip internal KMSAN frames.
+ */
+static int get_stack_skipnr(const unsigned long stack_entries[],
+ int num_entries)
+{
+ int len, skip;
+ char buf[64];
+
+ for (skip = 0; skip < num_entries; ++skip) {
+ len = scnprintf(buf, sizeof(buf), "%ps",
+ (void *)stack_entries[skip]);
+
+ /* Never show __msan_* or kmsan_* functions. */
+ if ((strnstr(buf, "__msan_", len) == buf) ||
+ (strnstr(buf, "kmsan_", len) == buf))
+ continue;
+
+ /*
+ * No match for runtime functions -- @skip entries to skip to
+ * get to first frame of interest.
+ */
+ break;
+ }
+
+ return skip;
+}
+
+/*
+ * Currently the descriptions of locals generated by Clang look as follows:
+ * ----local_name@function_name
+ * We want to print only the name of the local, as other information in that
+ * description can be confusing.
+ * The meaningful part of the description is copied to a global buffer to avoid
+ * allocating memory.
+ */
+static char *pretty_descr(char *descr)
+{
+ int i, pos = 0, len = strlen(descr);
+
+ for (i = 0; i < len; i++) {
+ if (descr[i] == '@')
+ break;
+ if (descr[i] == '-')
+ continue;
+ report_local_descr[pos] = descr[i];
+ if (pos + 1 == DESCR_SIZE)
+ break;
+ pos++;
+ }
+ report_local_descr[pos] = 0;
+ return report_local_descr;
+}
+
+void kmsan_print_origin(depot_stack_handle_t origin)
+{
+ unsigned long *entries = NULL, *chained_entries = NULL;
+ unsigned int nr_entries, chained_nr_entries, skipnr;
+ void *pc1 = NULL, *pc2 = NULL;
+ depot_stack_handle_t head;
+ unsigned long magic;
+ char *descr = NULL;
+
+ if (!origin)
+ return;
+
+ while (true) {
+ nr_entries = stack_depot_fetch(origin, &entries);
+ magic = nr_entries ? entries[0] : 0;
+ if ((nr_entries == 4) && (magic == KMSAN_ALLOCA_MAGIC_ORIGIN)) {
+ descr = (char *)entries[1];
+ pc1 = (void *)entries[2];
+ pc2 = (void *)entries[3];
+ pr_err("Local variable %s created at:\n",
+ pretty_descr(descr));
+ if (pc1)
+ pr_err(" %pS\n", pc1);
+ if (pc2)
+ pr_err(" %pS\n", pc2);
+ break;
+ }
+ if ((nr_entries == 3) && (magic == KMSAN_CHAIN_MAGIC_ORIGIN)) {
+ head = entries[1];
+ origin = entries[2];
+ pr_err("Uninit was stored to memory at:\n");
+ chained_nr_entries =
+ stack_depot_fetch(head, &chained_entries);
+ kmsan_internal_unpoison_memory(
+ chained_entries,
+ chained_nr_entries * sizeof(*chained_entries),
+ /*checked*/ false);
+ skipnr = get_stack_skipnr(chained_entries,
+ chained_nr_entries);
+ stack_trace_print(chained_entries + skipnr,
+ chained_nr_entries - skipnr, 0);
+ pr_err("\n");
+ continue;
+ }
+ pr_err("Uninit was created at:\n");
+ if (nr_entries) {
+ skipnr = get_stack_skipnr(entries, nr_entries);
+ stack_trace_print(entries + skipnr, nr_entries - skipnr,
+ 0);
+ } else {
+ pr_err("(stack is not available)\n");
+ }
+ break;
+ }
+}
+
+void kmsan_report(depot_stack_handle_t origin, void *address, int size,
+ int off_first, int off_last, const void *user_addr,
+ enum kmsan_bug_reason reason)
+{
+ unsigned long stack_entries[KMSAN_STACK_DEPTH];
+ int num_stack_entries, skipnr;
+ char *bug_type = NULL;
+ unsigned long flags, ua_flags;
+ bool is_uaf;
+
+ if (!kmsan_enabled)
+ return;
+ if (!current->kmsan_ctx.allow_reporting)
+ return;
+ if (!origin)
+ return;
+
+ current->kmsan_ctx.allow_reporting = false;
+ ua_flags = user_access_save();
+ spin_lock_irqsave(&kmsan_report_lock, flags);
+ pr_err("=====================================================\n");
+ is_uaf = kmsan_uaf_from_eb(stack_depot_get_extra_bits(origin));
+ switch (reason) {
+ case REASON_ANY:
+ bug_type = is_uaf ? "use-after-free" : "uninit-value";
+ break;
+ case REASON_COPY_TO_USER:
+ bug_type = is_uaf ? "kernel-infoleak-after-free" :
+ "kernel-infoleak";
+ break;
+ case REASON_SUBMIT_URB:
+ bug_type = is_uaf ? "kernel-usb-infoleak-after-free" :
+ "kernel-usb-infoleak";
+ break;
+ }
+
+ num_stack_entries =
+ stack_trace_save(stack_entries, KMSAN_STACK_DEPTH, 1);
+ skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
+
+ pr_err("BUG: KMSAN: %s in %pS\n", bug_type, stack_entries[skipnr]);
+ stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
+ 0);
+ pr_err("\n");
+
+ kmsan_print_origin(origin);
+
+ if (size) {
+ pr_err("\n");
+ if (off_first == off_last)
+ pr_err("Byte %d of %d is uninitialized\n", off_first,
+ size);
+ else
+ pr_err("Bytes %d-%d of %d are uninitialized\n",
+ off_first, off_last, size);
+ }
+ if (address)
+ pr_err("Memory access of size %d starts at %px\n", size,
+ address);
+ if (user_addr && reason == REASON_COPY_TO_USER)
+ pr_err("Data copied to user address %px\n", user_addr);
+ pr_err("\n");
+ dump_stack_print_info(KERN_ERR);
+ pr_err("=====================================================\n");
+ add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
+ spin_unlock_irqrestore(&kmsan_report_lock, flags);
+ if (panic_on_kmsan)
+ panic("kmsan.panic set ...\n");
+ user_access_restore(ua_flags);
+ current->kmsan_ctx.allow_reporting = true;
+}
diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
new file mode 100644
index 0000000000000..c71b0ce19ea6d
--- /dev/null
+++ b/mm/kmsan/shadow.c
@@ -0,0 +1,332 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN shadow implementation.
+ *
+ * Copyright (C) 2017-2021 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#include <asm/page.h>
+#include <asm/pgtable_64_types.h>
+#include <asm/tlbflush.h>
+#include <linux/cacheflush.h>
+#include <linux/memblock.h>
+#include <linux/mm_types.h>
+#include <linux/percpu-defs.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+#include <linux/stddef.h>
+
+#include "kmsan.h"
+
+#define shadow_page_for(page) ((page)->kmsan_shadow)
+
+#define origin_page_for(page) ((page)->kmsan_origin)
+
+static void *shadow_ptr_for(struct page *page)
+{
+ return page_address(shadow_page_for(page));
+}
+
+static void *origin_ptr_for(struct page *page)
+{
+ return page_address(origin_page_for(page));
+}
+
+static bool page_has_metadata(struct page *page)
+{
+ return shadow_page_for(page) && origin_page_for(page);
+}
+
+static void set_no_shadow_origin_page(struct page *page)
+{
+ shadow_page_for(page) = NULL;
+ origin_page_for(page) = NULL;
+}
+
+/*
+ * Dummy load and store pages to be used when the real metadata is unavailable.
+ * There are separate pages for loads and stores, so that every load returns a
+ * zero, and every store doesn't affect other loads.
+ */
+static char dummy_load_page[PAGE_SIZE] __aligned(PAGE_SIZE);
+static char dummy_store_page[PAGE_SIZE] __aligned(PAGE_SIZE);
+
+/*
+ * Taken from arch/x86/mm/physaddr.h to avoid using an instrumented version.
+ */
+static int kmsan_phys_addr_valid(unsigned long addr)
+{
+ if (IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
+ return !(addr >> boot_cpu_data.x86_phys_bits);
+ else
+ return 1;
+}
+
+/*
+ * Taken from arch/x86/mm/physaddr.c to avoid using an instrumented version.
+ */
+static bool kmsan_virt_addr_valid(void *addr)
+{
+ unsigned long x = (unsigned long)addr;
+ unsigned long y = x - __START_KERNEL_map;
+
+ /* use the carry flag to determine if x was < __START_KERNEL_map */
+ if (unlikely(x > y)) {
+ x = y + phys_base;
+
+ if (y >= KERNEL_IMAGE_SIZE)
+ return false;
+ } else {
+ x = y + (__START_KERNEL_map - PAGE_OFFSET);
+
+ /* carry flag will be set if starting x was >= PAGE_OFFSET */
+ if ((x > y) || !kmsan_phys_addr_valid(x))
+ return false;
+ }
+
+ return pfn_valid(x >> PAGE_SHIFT);
+}
+
+static unsigned long vmalloc_meta(void *addr, bool is_origin)
+{
+ unsigned long addr64 = (unsigned long)addr, off;
+
+ KMSAN_WARN_ON(is_origin && !IS_ALIGNED(addr64, KMSAN_ORIGIN_SIZE));
+ if (kmsan_internal_is_vmalloc_addr(addr)) {
+ off = addr64 - VMALLOC_START;
+ return off + (is_origin ? KMSAN_VMALLOC_ORIGIN_START :
+ KMSAN_VMALLOC_SHADOW_START);
+ }
+ if (kmsan_internal_is_module_addr(addr)) {
+ off = addr64 - MODULES_VADDR;
+ return off + (is_origin ? KMSAN_MODULES_ORIGIN_START :
+ KMSAN_MODULES_SHADOW_START);
+ }
+ return 0;
+}
+
+static struct page *virt_to_page_or_null(void *vaddr)
+{
+ if (kmsan_virt_addr_valid(vaddr))
+ return virt_to_page(vaddr);
+ else
+ return NULL;
+}
+
+struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *address, u64 size,
+ bool store)
+{
+ struct shadow_origin_ptr ret;
+ void *shadow;
+
+ /*
+ * Even if we redirect this memory access to the dummy page, it will
+ * go out of bounds.
+ */
+ KMSAN_WARN_ON(size > PAGE_SIZE);
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ goto return_dummy;
+
+ KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(address, size));
+ shadow = kmsan_get_metadata(address, KMSAN_META_SHADOW);
+ if (!shadow)
+ goto return_dummy;
+
+ ret.shadow = shadow;
+ ret.origin = kmsan_get_metadata(address, KMSAN_META_ORIGIN);
+ return ret;
+
+return_dummy:
+ if (store) {
+ /* Ignore this store. */
+ ret.shadow = dummy_store_page;
+ ret.origin = dummy_store_page;
+ } else {
+ /* This load will return zero. */
+ ret.shadow = dummy_load_page;
+ ret.origin = dummy_load_page;
+ }
+ return ret;
+}
+
+/*
+ * Obtain the shadow or origin pointer for the given address, or NULL if there's
+ * none. The caller must check the return value for being non-NULL if needed.
+ * The return value of this function should not depend on whether we're in the
+ * runtime or not.
+ */
+void *kmsan_get_metadata(void *address, bool is_origin)
+{
+ u64 addr = (u64)address, pad, off;
+ struct page *page;
+ void *ret;
+
+ if (is_origin && !IS_ALIGNED(addr, KMSAN_ORIGIN_SIZE)) {
+ pad = addr % KMSAN_ORIGIN_SIZE;
+ addr -= pad;
+ }
+ address = (void *)addr;
+ if (kmsan_internal_is_vmalloc_addr(address) ||
+ kmsan_internal_is_module_addr(address))
+ return (void *)vmalloc_meta(address, is_origin);
+
+ page = virt_to_page_or_null(address);
+ if (!page)
+ return NULL;
+ if (!page_has_metadata(page))
+ return NULL;
+ off = addr % PAGE_SIZE;
+
+ ret = (is_origin ? origin_ptr_for(page) : shadow_ptr_for(page)) + off;
+ return ret;
+}
+
+/* Allocate metadata for pages allocated at boot time. */
+void __init kmsan_init_alloc_meta_for_range(void *start, void *end)
+{
+ struct page *shadow_p, *origin_p;
+ void *shadow, *origin;
+ struct page *page;
+ u64 addr, size;
+
+ start = (void *)ALIGN_DOWN((u64)start, PAGE_SIZE);
+ size = ALIGN((u64)end - (u64)start, PAGE_SIZE);
+ shadow = memblock_alloc(size, PAGE_SIZE);
+ origin = memblock_alloc(size, PAGE_SIZE);
+ for (addr = 0; addr < size; addr += PAGE_SIZE) {
+ page = virt_to_page_or_null((char *)start + addr);
+ shadow_p = virt_to_page_or_null((char *)shadow + addr);
+ set_no_shadow_origin_page(shadow_p);
+ shadow_page_for(page) = shadow_p;
+ origin_p = virt_to_page_or_null((char *)origin + addr);
+ set_no_shadow_origin_page(origin_p);
+ origin_page_for(page) = origin_p;
+ }
+}
+
+/* Called from mm/memory.c */
+void kmsan_copy_page_meta(struct page *dst, struct page *src)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ if (!dst || !page_has_metadata(dst))
+ return;
+ if (!src || !page_has_metadata(src)) {
+ kmsan_internal_unpoison_memory(page_address(dst), PAGE_SIZE,
+ /*checked*/ false);
+ return;
+ }
+
+ kmsan_enter_runtime();
+ __memcpy(shadow_ptr_for(dst), shadow_ptr_for(src), PAGE_SIZE);
+ __memcpy(origin_ptr_for(dst), origin_ptr_for(src), PAGE_SIZE);
+ kmsan_leave_runtime();
+}
+
+/* Called from mm/page_alloc.c */
+void kmsan_alloc_page(struct page *page, unsigned int order, gfp_t flags)
+{
+ bool initialized = (flags & __GFP_ZERO) || !kmsan_enabled;
+ struct page *shadow, *origin;
+ depot_stack_handle_t handle;
+ int pages = 1 << order;
+ int i;
+
+ if (!page)
+ return;
+
+ shadow = shadow_page_for(page);
+ origin = origin_page_for(page);
+
+ if (initialized) {
+ __memset(page_address(shadow), 0, PAGE_SIZE * pages);
+ __memset(page_address(origin), 0, PAGE_SIZE * pages);
+ return;
+ }
+
+ /* Zero pages allocated by the runtime should also be initialized. */
+ if (kmsan_in_runtime())
+ return;
+
+ __memset(page_address(shadow), -1, PAGE_SIZE * pages);
+ kmsan_enter_runtime();
+ handle = kmsan_save_stack_with_flags(flags, /*extra_bits*/ 0);
+ kmsan_leave_runtime();
+ /*
+ * Addresses are page-aligned, pages are contiguous, so it's ok
+ * to just fill the origin pages with |handle|.
+ */
+ for (i = 0; i < PAGE_SIZE * pages / sizeof(handle); i++)
+ ((depot_stack_handle_t *)page_address(origin))[i] = handle;
+}
+
+/* Called from mm/page_alloc.c */
+void kmsan_free_page(struct page *page, unsigned int order)
+{
+ /* Nothing to do here; we could rewrite the shadow instead. */
+}
+
+/* Called from mm/vmalloc.c */
+void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
+ pgprot_t prot, struct page **pages,
+ unsigned int page_shift)
+{
+ unsigned long shadow_start, origin_start, shadow_end, origin_end;
+ struct page **s_pages, **o_pages;
+ int nr, i, mapped;
+
+ if (!kmsan_enabled)
+ return;
+
+ shadow_start = vmalloc_meta((void *)start, KMSAN_META_SHADOW);
+ shadow_end = vmalloc_meta((void *)end, KMSAN_META_SHADOW);
+ if (!shadow_start)
+ return;
+
+ nr = (end - start) / PAGE_SIZE;
+ s_pages = kcalloc(nr, sizeof(struct page *), GFP_KERNEL);
+ o_pages = kcalloc(nr, sizeof(struct page *), GFP_KERNEL);
+ if (!s_pages || !o_pages)
+ goto ret;
+ for (i = 0; i < nr; i++) {
+ s_pages[i] = shadow_page_for(pages[i]);
+ o_pages[i] = origin_page_for(pages[i]);
+ }
+ prot = PAGE_KERNEL;
+
+ origin_start = vmalloc_meta((void *)start, KMSAN_META_ORIGIN);
+ origin_end = vmalloc_meta((void *)end, KMSAN_META_ORIGIN);
+ kmsan_enter_runtime();
+ mapped = __vmap_pages_range_noflush(shadow_start, shadow_end, prot,
+ s_pages, page_shift);
+ KMSAN_WARN_ON(mapped);
+ mapped = __vmap_pages_range_noflush(origin_start, origin_end, prot,
+ o_pages, page_shift);
+ KMSAN_WARN_ON(mapped);
+ kmsan_leave_runtime();
+ flush_tlb_kernel_range(shadow_start, shadow_end);
+ flush_tlb_kernel_range(origin_start, origin_end);
+ flush_cache_vmap(shadow_start, shadow_end);
+ flush_cache_vmap(origin_start, origin_end);
+
+ret:
+ kfree(s_pages);
+ kfree(o_pages);
+}
+
+void kmsan_setup_meta(struct page *page, struct page *shadow,
+ struct page *origin, int order)
+{
+ int i;
+
+ for (i = 0; i < (1 << order); i++) {
+ set_no_shadow_origin_page(&shadow[i]);
+ set_no_shadow_origin_page(&origin[i]);
+ shadow_page_for(&page[i]) = &shadow[i];
+ origin_page_for(&page[i]) = &origin[i];
+ }
+}
diff --git a/scripts/Makefile.kmsan b/scripts/Makefile.kmsan
new file mode 100644
index 0000000000000..9793591f9855c
--- /dev/null
+++ b/scripts/Makefile.kmsan
@@ -0,0 +1 @@
+export CFLAGS_KMSAN := -fsanitize=kernel-memory
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index d1f865b8c0cba..3a0dbcea51d01 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -162,6 +162,15 @@ _c_flags += $(if $(patsubst n%,, \
endif
endif

+ifeq ($(CONFIG_KMSAN),y)
+_c_flags += $(if $(patsubst n%,, \
+ $(KMSAN_SANITIZE_$(basetarget).o)$(KMSAN_SANITIZE)y), \
+ $(CFLAGS_KMSAN))
+_c_flags += $(if $(patsubst n%,, \
+ $(KMSAN_ENABLE_CHECKS_$(basetarget).o)$(KMSAN_ENABLE_CHECKS)y), \
+ , -mllvm -msan-disable-checks=1)
+endif
+
ifeq ($(CONFIG_UBSAN),y)
_c_flags += $(if $(patsubst n%,, \
$(UBSAN_SANITIZE_$(basetarget).o)$(UBSAN_SANITIZE)$(CONFIG_UBSAN_SANITIZE_ALL)), \
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:23:01

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 18/43] kmsan: unpoison @tlb in arch_tlb_gather_mmu()

This is a hack to reduce stackdepot pressure.

struct mmu_gather contains 7 1-bit fields packed into a 32-bit unsigned
int value. The remaining 25 bits remain uninitialized and are never used,
but KMSAN updates the origin for them in zap_pXX_range() in mm/memory.c,
thus creating very long origin chains. This is technically correct, but
consumes too much memory.

Unpoisoning the whole structure will prevent creating such chains.
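
For illustration only (a simplified sketch, not the actual definition of
struct mmu_gather), the layout in question boils down to:

    struct gather_like {
        unsigned int fullmm : 1;
        unsigned int need_flush : 1;
        /* ...five more 1-bit flags... */
        /*
         * The rest of this unsigned int is padding that is never written,
         * yet every partial store to the word makes KMSAN chain a new
         * origin for the padding bits.
         */
    };

With the whole structure unpoisoned up front, KMSAN stops tracking origins
for those padding bits.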

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I76abee411b8323acfdbc29bc3a60dca8cff2de77
---
mm/mmu_gather.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 1b9837419bf9c..72e4c4ca01d27 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -1,6 +1,7 @@
#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/kernel.h>
+#include <linux/kmsan-checks.h>
#include <linux/mmdebug.h>
#include <linux/mm_types.h>
#include <linux/pagemap.h>
@@ -252,6 +253,15 @@ void tlb_flush_mmu(struct mmu_gather *tlb)
static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
bool fullmm)
{
+ /*
+ * struct mmu_gather contains 7 1-bit fields packed into a 32-bit
+ * unsigned int value. The remaining 25 bits remain uninitialized
+ * and are never used, but KMSAN updates the origin for them in
+ * zap_pXX_range() in mm/memory.c, thus creating very long origin
+ * chains. This is technically correct, but consumes too much memory.
+ * Unpoisoning the whole structure will prevent creating such chains.
+ */
+ kmsan_unpoison_memory(tlb, sizeof(*tlb));
tlb->mm = mm;
tlb->fullmm = fullmm;

--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:23:05

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 20/43] instrumented.h: add KMSAN support

To avoid false positives, KMSAN needs to unpoison the data copied from
userspace. To detect infoleaks, it also checks the memory buffer passed to
copy_to_user().
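
As a hypothetical illustration of the infoleak case (this snippet is not
part of the patch, and struct reply is made up):

    static int example_fill_reply(void __user *ubuf)
    {
        struct reply {
            u32 id;
            u8 flag;
            /* 3 bytes of implicit padding stay uninitialized */
        } r;

        r.id = 0;
        r.flag = 1;
        /* KMSAN reports a kernel-infoleak for the padding bytes. */
        return copy_to_user(ubuf, &r, sizeof(r)) ? -EFAULT : 0;
    }

In the other direction, instrument_copy_from_user_after() unpoisons the
n - left bytes that were actually copied in, so reading them afterwards
does not produce false positives.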

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I43e93b9c02709e6be8d222342f1b044ac8bdbaaf
---
include/linux/instrumented.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
index ee8f7d17d34f5..c73c1b19e9227 100644
--- a/include/linux/instrumented.h
+++ b/include/linux/instrumented.h
@@ -2,7 +2,7 @@

/*
* This header provides generic wrappers for memory access instrumentation that
- * the compiler cannot emit for: KASAN, KCSAN.
+ * the compiler cannot emit for: KASAN, KCSAN, KMSAN.
*/
#ifndef _LINUX_INSTRUMENTED_H
#define _LINUX_INSTRUMENTED_H
@@ -10,6 +10,7 @@
#include <linux/compiler.h>
#include <linux/kasan-checks.h>
#include <linux/kcsan-checks.h>
+#include <linux/kmsan-checks.h>
#include <linux/types.h>

/**
@@ -117,6 +118,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
{
kasan_check_read(from, n);
kcsan_check_read(from, n);
+ kmsan_copy_to_user(to, from, n, 0);
}

/**
@@ -151,6 +153,7 @@ static __always_inline void
instrument_copy_from_user_after(const void *to, const void __user *from,
unsigned long n, unsigned long left)
{
+ kmsan_unpoison_memory(to, n - left);
}

#endif /* _LINUX_INSTRUMENTED_H */
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:23:14

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 21/43] kmsan: mark noinstr as __no_sanitize_memory

noinstr functions should never be instrumented, so make KMSAN skip them
by applying the __no_sanitize_memory attribute.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I3c9abe860b97b49bc0c8026918b17a50448dec0d
---
include/linux/compiler_types.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 1d32f4c03c9ef..37b82564e93e5 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -210,7 +210,8 @@ struct ftrace_likely_data {
/* Section for code which can't be instrumented at all */
#define noinstr \
noinline notrace __attribute((__section__(".noinstr.text"))) \
- __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage
+ __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage \
+ __no_sanitize_memory

#endif /* __KERNEL__ */

--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:23:17

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 22/43] kmsan: initialize the output of READ_ONCE_NOCHECK()

READ_ONCE_NOCHECK() is already used by KASAN to ignore memory accesses
from e.g. stack unwinders.
Make READ_ONCE_NOCHECK() return initialized values under KMSAN as well.
This helps avoid false positive reports caused by leftover stack contents.
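
For example, a frame-pointer unwinder may read a stack slot that was never
fully written (an illustrative sketch, not taken from the actual unwinder):

    static unsigned long example_next_frame(unsigned long bp)
    {
        /* *bp may contain leftover, never-initialized stack data. */
        return READ_ONCE_NOCHECK(*(unsigned long *)bp);
    }

Wrapping the result in kmsan_init() marks it as initialized, so the
uninitialized shadow of that stack slot does not propagate any further.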

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I07499eb3e8e59c0ad2fd486cedc932d958b37afd
---
include/asm-generic/rwonce.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/rwonce.h b/include/asm-generic/rwonce.h
index 8d0a6280e9824..7cf993af8e1ea 100644
--- a/include/asm-generic/rwonce.h
+++ b/include/asm-generic/rwonce.h
@@ -25,6 +25,7 @@
#include <linux/compiler_types.h>
#include <linux/kasan-checks.h>
#include <linux/kcsan-checks.h>
+#include <linux/kmsan-checks.h>

/*
* Yes, this permits 64-bit accesses on 32-bit architectures. These will
@@ -69,14 +70,14 @@ unsigned long __read_once_word_nocheck(const void *addr)

/*
* Use READ_ONCE_NOCHECK() instead of READ_ONCE() if you need to load a
- * word from memory atomically but without telling KASAN/KCSAN. This is
+ * word from memory atomically but without telling KASAN/KCSAN/KMSAN. This is
* usually used by unwinding code when walking the stack of a running process.
*/
#define READ_ONCE_NOCHECK(x) \
({ \
compiletime_assert(sizeof(x) == sizeof(unsigned long), \
"Unsupported access size for READ_ONCE_NOCHECK()."); \
- (typeof(x))__read_once_word_nocheck(&(x)); \
+ kmsan_init((typeof(x))__read_once_word_nocheck(&(x))); \
})

static __no_kasan_or_inline
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:23:20

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 19/43] kmsan: init: call KMSAN initialization routines

kmsan_initialize_shadow() creates metadata pages for mappings created
at boot time.

kmsan_initialize() initializes the bookkeeping for init_task and enables
KMSAN.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I7bc53706141275914326df2345881ffe0cdd16bd
---
init/main.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/init/main.c b/init/main.c
index bb984ed79de0e..2fc5025db0810 100644
--- a/init/main.c
+++ b/init/main.c
@@ -34,6 +34,7 @@
#include <linux/percpu.h>
#include <linux/kmod.h>
#include <linux/kprobes.h>
+#include <linux/kmsan.h>
#include <linux/vmalloc.h>
#include <linux/kernel_stat.h>
#include <linux/start_kernel.h>
@@ -834,6 +835,7 @@ static void __init mm_init(void)
init_mem_debugging_and_hardening();
kfence_alloc_pool();
report_meminit();
+ kmsan_init_shadow();
stack_depot_init();
mem_init();
mem_init_print_info();
@@ -848,6 +850,7 @@ static void __init mm_init(void)
init_espfix_bsp();
/* Should be run after espfix64 is set up. */
pti_init();
+ kmsan_init_runtime();
}

#ifdef CONFIG_HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:23:23

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 23/43] kmsan: make READ_ONCE_TASK_STACK() return initialized values

To avoid false positives, assume that reading from the task stack
always produces initialized values.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I9e2350bf3e88688dd83537e12a23456480141997
---
arch/x86/include/asm/unwind.h | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index 2a1f8734416dc..51173b19ac4d5 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -129,18 +129,19 @@ unsigned long unwind_recover_ret_addr(struct unwind_state *state,
}

/*
- * This disables KASAN checking when reading a value from another task's stack,
- * since the other task could be running on another CPU and could have poisoned
- * the stack in the meantime.
+ * This disables KASAN/KMSAN checking when reading a value from another task's
+ * stack, since the other task could be running on another CPU and could have
+ * poisoned the stack in the meantime. Frame pointers are uninitialized by
+ * default, so for KMSAN we mark the return value initialized unconditionally.
*/
-#define READ_ONCE_TASK_STACK(task, x) \
-({ \
- unsigned long val; \
- if (task == current) \
- val = READ_ONCE(x); \
- else \
- val = READ_ONCE_NOCHECK(x); \
- val; \
+#define READ_ONCE_TASK_STACK(task, x) \
+({ \
+ unsigned long val; \
+ if (task == current && !IS_ENABLED(CONFIG_KMSAN)) \
+ val = READ_ONCE(x); \
+ else \
+ val = READ_ONCE_NOCHECK(x); \
+ val; \
})

static inline bool task_on_another_cpu(struct task_struct *task)
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:23:25

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 24/43] kmsan: disable KMSAN instrumentation for certain kernel parts

Instrumenting some files with KMSAN will result in the kernel failing to
link or boot, or crashing at runtime, for various reasons (e.g. infinite
recursion caused by instrumentation hooks calling instrumented code again).

Completely omit KMSAN instrumentation in the following places:
- arch/x86/boot and arch/x86/realmode/rm, as KMSAN doesn't work for i386;
- arch/x86/entry/vdso, which isn't linked with KMSAN runtime;
- three files in arch/x86/kernel - boot problems;
- arch/x86/mm/cpu_entry_area.c - recursion;
- EFI stub - build failures;
- kcov, stackdepot, lockdep - recursion.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Id5e5c4a9f9d53c24a35ebb633b814c414628d81b
---
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/entry/vdso/Makefile | 3 +++
arch/x86/kernel/Makefile | 2 ++
arch/x86/kernel/cpu/Makefile | 1 +
arch/x86/mm/Makefile | 2 ++
arch/x86/realmode/rm/Makefile | 1 +
drivers/firmware/efi/libstub/Makefile | 1 +
kernel/Makefile | 1 +
kernel/locking/Makefile | 3 ++-
lib/Makefile | 1 +
11 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index b5aecb524a8aa..d5623232b763f 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -12,6 +12,7 @@
# Sanitizer runtimes are unavailable and cannot be linked for early boot code.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y

# Kernel does not boot with kcov instrumentation here.
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 431bf7f846c3c..c4a284b738e71 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -20,6 +20,7 @@
# Sanitizer runtimes are unavailable and cannot be linked for early boot code.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y

# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index a2dddcc189f69..f2a175d872b07 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -11,6 +11,9 @@ include $(srctree)/lib/vdso/Makefile

# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
+KMSAN_SANITIZE_vclock_gettime.o := n
+KMSAN_SANITIZE_vgetcpu.o := n
+
UBSAN_SANITIZE := n
KCSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 2ff3e600f4269..0b9fc3ecce2de 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -35,6 +35,8 @@ KASAN_SANITIZE_cc_platform.o := n
# With some compiler versions the generated code results in boot hangs, caused
# by several compilation units. To be safe, disable all instrumentation.
KCSAN_SANITIZE := n
+KMSAN_SANITIZE_head$(BITS).o := n
+KMSAN_SANITIZE_nmi.o := n

OBJECT_FILES_NON_STANDARD_test_nx.o := y

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 9661e3e802be5..f10a921ee7565 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -12,6 +12,7 @@ endif
# If these files are instrumented, boot hangs during the first second.
KCOV_INSTRUMENT_common.o := n
KCOV_INSTRUMENT_perf_event.o := n
+KMSAN_SANITIZE_common.o := n

# As above, instrumenting secondary CPU boot code causes boot hangs.
KCSAN_SANITIZE_common.o := n
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 5864219221ca8..747d4630d52ce 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -10,6 +10,8 @@ KASAN_SANITIZE_mem_encrypt_identity.o := n
# Disable KCSAN entirely, because otherwise we get warnings that some functions
# reference __initdata sections.
KCSAN_SANITIZE := n
+# Avoid recursion by not calling KMSAN hooks for CEA code.
+KMSAN_SANITIZE_cpu_entry_area.o := n

ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_mem_encrypt.o = -pg
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 83f1b6a56449f..f614009d3e4e2 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -10,6 +10,7 @@
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y

# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index d0537573501e9..81432d0c904b1 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -46,6 +46,7 @@ GCOV_PROFILE := n
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
UBSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y

diff --git a/kernel/Makefile b/kernel/Makefile
index 186c49582f45b..e5dd600e63d8a 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -39,6 +39,7 @@ KCOV_INSTRUMENT_kcov.o := n
KASAN_SANITIZE_kcov.o := n
KCSAN_SANITIZE_kcov.o := n
UBSAN_SANITIZE_kcov.o := n
+KMSAN_SANITIZE_kcov.o := n
CFLAGS_kcov.o := $(call cc-option, -fno-conserve-stack) -fno-stack-protector

# Don't instrument error handlers
diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index d51cabf28f382..ea925731fa40f 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -5,8 +5,9 @@ KCOV_INSTRUMENT := n

obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o

-# Avoid recursion lockdep -> KCSAN -> ... -> lockdep.
+# Avoid recursion lockdep -> sanitizer -> ... -> lockdep.
KCSAN_SANITIZE_lockdep.o := n
+KMSAN_SANITIZE_lockdep.o := n

ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE)
diff --git a/lib/Makefile b/lib/Makefile
index 364c23f155781..8e5ae9d5966de 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -268,6 +268,7 @@ obj-$(CONFIG_IRQ_POLL) += irq_poll.o
CFLAGS_stackdepot.o += -fno-builtin
obj-$(CONFIG_STACKDEPOT) += stackdepot.o
KASAN_SANITIZE_stackdepot.o := n
+KMSAN_SANITIZE_stackdepot.o := n
KCOV_INSTRUMENT_stackdepot.o := n

libfdt_files = fdt.o fdt_ro.o fdt_wip.o fdt_rw.o fdt_sw.o fdt_strerror.o \
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:23:35

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 25/43] kmsan: skip shadow checks in files doing context switches

When instrumenting functions, KMSAN obtains the per-task state (mostly
pointers to metadata for function arguments and return values) once per
function at its beginning.

If a function performs a context switch, instrumented code won't notice
that, and will still refer to the old state, possibly corrupting it or
using stale data. This may result in false positive reports.

To deal with that, we need to apply __no_kmsan_checks to the functions
performing context switching - this will result in skipping all KMSAN
shadow checks and marking newly created values as initialized,
preventing all false positive reports in those functions. False negatives
are still possible, but we expect them to be rare and transient.

To improve maintainability, we choose to apply __no_kmsan_checks not
just to a handful of functions, but to whole files that may perform
context switching - this is done via KMSAN_ENABLE_CHECKS:=n.
This decision can be reconsidered in the future, when KMSAN won't need
so much attention.
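
For reference, the per-function alternative mentioned above would look
roughly as follows (the function below is hypothetical and only
illustrates the annotation):

    /* Skip KMSAN shadow checks and mark created values as initialized. */
    __no_kmsan_checks
    static void example_finish_switch(struct task_struct *prev)
    {
        /*
         * Loads in this function are not checked by KMSAN, and the values
         * it produces are treated as initialized.
         */
    }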

Suggested-by: Marco Elver <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Id40563d36792b4482534c9a0134965d77a5581fa
---
arch/x86/kernel/Makefile | 4 ++++
kernel/sched/Makefile | 4 ++++
2 files changed, 8 insertions(+)

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0b9fc3ecce2de..308d4d0323263 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -38,6 +38,10 @@ KCSAN_SANITIZE := n
KMSAN_SANITIZE_head$(BITS).o := n
KMSAN_SANITIZE_nmi.o := n

+# Some functions in process_64.c perform context switching.
+# Apply __no_kmsan_checks to the whole file to avoid false positives.
+KMSAN_ENABLE_CHECKS_process_64.o := n
+
OBJECT_FILES_NON_STANDARD_test_nx.o := y

ifdef CONFIG_FRAME_POINTER
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index c7421f2d05e15..d9bf8223a064a 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -17,6 +17,10 @@ KCOV_INSTRUMENT := n
# eventually.
KCSAN_SANITIZE := n

+# Some functions in core.c perform context switching. Apply __no_kmsan_checks
+# to the whole file to avoid false positives.
+KMSAN_ENABLE_CHECKS_core.o := n
+
ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
# According to Alan Modra <[email protected]>, the -fno-omit-frame-pointer is
# needed for x86 only. Why this used to be enabled for all architectures is beyond
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:23:39

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 26/43] kmsan: virtio: check/unpoison scatterlist in vring_map_one_sg()

If vring doesn't use the DMA API, KMSAN is unable to tell whether the
memory is initialized by hardware. Explicitly call kmsan_handle_dma()
from vring_map_one_sg() in this case to prevent false positives.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I211533ecb86a66624e151551f83ddd749536b3af
---
drivers/virtio/virtio_ring.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6d2614e34470f..bf4d5b331e99d 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -11,6 +11,7 @@
#include <linux/module.h>
#include <linux/hrtimer.h>
#include <linux/dma-mapping.h>
+#include <linux/kmsan-checks.h>
#include <linux/spinlock.h>
#include <xen/xen.h>

@@ -331,8 +332,15 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
struct scatterlist *sg,
enum dma_data_direction direction)
{
- if (!vq->use_dma_api)
+ if (!vq->use_dma_api) {
+ /*
+ * If DMA is not used, KMSAN doesn't know that the scatterlist
+ * is initialized by the hardware. Explicitly check/unpoison it
+ * depending on the direction.
+ */
+ kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
return (dma_addr_t)sg_phys(sg);
+ }

/*
* We can't use dma_map_sg, because we don't use scatterlists in
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:23:48

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 27/43] x86: kmsan: add iomem support

Functions from lib/iomap.c and arch/x86/lib/iomem.c interact with hardware,
so KMSAN must ensure that:
- every read function returns an initialized value
- every write function checks values before sending them to hardware.
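
From a driver's point of view nothing changes; for example (the register
offsets below are purely illustrative):

    u32 status = ioread32(base + REG_STATUS); /* returned value is initialized */
    iowrite32(cmd, base + REG_CMD);           /* 'cmd' is checked before the write */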

Signed-off-by: Alexander Potapenko <[email protected]>

---
Link: https://linux-review.googlesource.com/id/I45527599f09090aca046dfe1a26df453adab100d
---
arch/x86/lib/iomem.c | 5 +++++
lib/iomap.c | 40 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 45 insertions(+)

diff --git a/arch/x86/lib/iomem.c b/arch/x86/lib/iomem.c
index df50451d94ef7..2307770f3f4c8 100644
--- a/arch/x86/lib/iomem.c
+++ b/arch/x86/lib/iomem.c
@@ -1,6 +1,7 @@
#include <linux/string.h>
#include <linux/module.h>
#include <linux/io.h>
+#include <linux/kmsan-checks.h>

#define movs(type,to,from) \
asm volatile("movs" type:"=&D" (to), "=&S" (from):"0" (to), "1" (from):"memory")
@@ -37,6 +38,8 @@ void memcpy_fromio(void *to, const volatile void __iomem *from, size_t n)
n-=2;
}
rep_movs(to, (const void *)from, n);
+ /* KMSAN must treat values read from devices as initialized. */
+ kmsan_unpoison_memory(to, n);
}
EXPORT_SYMBOL(memcpy_fromio);

@@ -45,6 +48,8 @@ void memcpy_toio(volatile void __iomem *to, const void *from, size_t n)
if (unlikely(!n))
return;

+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(from, n);
/* Align any unaligned destination IO */
if (unlikely(1 & (unsigned long)to)) {
movs("b", to, from);
diff --git a/lib/iomap.c b/lib/iomap.c
index fbaa3e8f19d6c..bdda1a42771b2 100644
--- a/lib/iomap.c
+++ b/lib/iomap.c
@@ -6,6 +6,7 @@
*/
#include <linux/pci.h>
#include <linux/io.h>
+#include <linux/kmsan-checks.h>

#include <linux/export.h>

@@ -70,26 +71,31 @@ static void bad_io_access(unsigned long port, const char *access)
#define mmio_read64be(addr) swab64(readq(addr))
#endif

+__no_sanitize_memory
unsigned int ioread8(const void __iomem *addr)
{
IO_COND(addr, return inb(port), return readb(addr));
return 0xff;
}
+__no_sanitize_memory
unsigned int ioread16(const void __iomem *addr)
{
IO_COND(addr, return inw(port), return readw(addr));
return 0xffff;
}
+__no_sanitize_memory
unsigned int ioread16be(const void __iomem *addr)
{
IO_COND(addr, return pio_read16be(port), return mmio_read16be(addr));
return 0xffff;
}
+__no_sanitize_memory
unsigned int ioread32(const void __iomem *addr)
{
IO_COND(addr, return inl(port), return readl(addr));
return 0xffffffff;
}
+__no_sanitize_memory
unsigned int ioread32be(const void __iomem *addr)
{
IO_COND(addr, return pio_read32be(port), return mmio_read32be(addr));
@@ -142,18 +148,21 @@ static u64 pio_read64be_hi_lo(unsigned long port)
return lo | (hi << 32);
}

+__no_sanitize_memory
u64 ioread64_lo_hi(const void __iomem *addr)
{
IO_COND(addr, return pio_read64_lo_hi(port), return readq(addr));
return 0xffffffffffffffffULL;
}

+__no_sanitize_memory
u64 ioread64_hi_lo(const void __iomem *addr)
{
IO_COND(addr, return pio_read64_hi_lo(port), return readq(addr));
return 0xffffffffffffffffULL;
}

+__no_sanitize_memory
u64 ioread64be_lo_hi(const void __iomem *addr)
{
IO_COND(addr, return pio_read64be_lo_hi(port),
@@ -161,6 +170,7 @@ u64 ioread64be_lo_hi(const void __iomem *addr)
return 0xffffffffffffffffULL;
}

+__no_sanitize_memory
u64 ioread64be_hi_lo(const void __iomem *addr)
{
IO_COND(addr, return pio_read64be_hi_lo(port),
@@ -188,22 +198,32 @@ EXPORT_SYMBOL(ioread64be_hi_lo);

void iowrite8(u8 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, outb(val,port), writeb(val, addr));
}
void iowrite16(u16 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, outw(val,port), writew(val, addr));
}
void iowrite16be(u16 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write16be(val,port), mmio_write16be(val, addr));
}
void iowrite32(u32 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, outl(val,port), writel(val, addr));
}
void iowrite32be(u32 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write32be(val,port), mmio_write32be(val, addr));
}
EXPORT_SYMBOL(iowrite8);
@@ -239,24 +259,32 @@ static void pio_write64be_hi_lo(u64 val, unsigned long port)

void iowrite64_lo_hi(u64 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write64_lo_hi(val, port),
writeq(val, addr));
}

void iowrite64_hi_lo(u64 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write64_hi_lo(val, port),
writeq(val, addr));
}

void iowrite64be_lo_hi(u64 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write64be_lo_hi(val, port),
mmio_write64be(val, addr));
}

void iowrite64be_hi_lo(u64 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write64be_hi_lo(val, port),
mmio_write64be(val, addr));
}
@@ -328,14 +356,20 @@ static inline void mmio_outsl(void __iomem *addr, const u32 *src, int count)
void ioread8_rep(const void __iomem *addr, void *dst, unsigned long count)
{
IO_COND(addr, insb(port,dst,count), mmio_insb(addr, dst, count));
+ /* KMSAN must treat values read from devices as initialized. */
+ kmsan_unpoison_memory(dst, count);
}
void ioread16_rep(const void __iomem *addr, void *dst, unsigned long count)
{
IO_COND(addr, insw(port,dst,count), mmio_insw(addr, dst, count));
+ /* KMSAN must treat values read from devices as initialized. */
+ kmsan_unpoison_memory(dst, count * 2);
}
void ioread32_rep(const void __iomem *addr, void *dst, unsigned long count)
{
IO_COND(addr, insl(port,dst,count), mmio_insl(addr, dst, count));
+ /* KMSAN must treat values read from devices as initialized. */
+ kmsan_unpoison_memory(dst, count * 4);
}
EXPORT_SYMBOL(ioread8_rep);
EXPORT_SYMBOL(ioread16_rep);
@@ -343,14 +377,20 @@ EXPORT_SYMBOL(ioread32_rep);

void iowrite8_rep(void __iomem *addr, const void *src, unsigned long count)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(src, count);
IO_COND(addr, outsb(port, src, count), mmio_outsb(addr, src, count));
}
void iowrite16_rep(void __iomem *addr, const void *src, unsigned long count)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(src, count * 2);
IO_COND(addr, outsw(port, src, count), mmio_outsw(addr, src, count));
}
void iowrite32_rep(void __iomem *addr, const void *src, unsigned long count)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(src, count * 4);
IO_COND(addr, outsl(port, src,count), mmio_outsl(addr, src, count));
}
EXPORT_SYMBOL(iowrite8_rep);
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:23:53

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 28/43] kmsan: dma: unpoison DMA mappings

KMSAN doesn't know about DMA memory writes performed by devices.
We unpoison such memory when it's mapped to avoid false positive
reports.
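
For example, a typical streaming mapping now looks like this to KMSAN
(the buffer and device names are illustrative):

    dma_addr_t handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
    /*
     * dma_map_single() ends up in dma_map_page_attrs(), which now calls
     * kmsan_handle_dma(); for a DMA_FROM_DEVICE mapping the buffer is
     * unpoisoned, so reading it after the device has written to it does
     * not produce false reports.
     */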

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ia162dc4c5a92e74d4686c1be32a4dfeffc5c32cd
---
kernel/dma/mapping.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 9478eccd1c8e6..0560080813761 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -156,6 +156,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
+ kmsan_handle_dma(page, offset, size, dir);
debug_dma_map_page(dev, page, offset, size, dir, addr, attrs);

return addr;
@@ -194,11 +195,13 @@ static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
else
ents = ops->map_sg(dev, sg, nents, dir, attrs);

- if (ents > 0)
+ if (ents > 0) {
+ kmsan_handle_dma_sg(sg, nents, dir);
debug_dma_map_sg(dev, sg, nents, ents, dir, attrs);
- else if (WARN_ON_ONCE(ents != -EINVAL && ents != -ENOMEM &&
- ents != -EIO))
+ } else if (WARN_ON_ONCE(ents != -EINVAL && ents != -ENOMEM &&
+ ents != -EIO)) {
return -EIO;
+ }

return ents;
}
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:23:55

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 29/43] kmsan: handle memory sent to/from USB

Depending on the value of is_out, kmsan_handle_urb() either marks the data
copied to the kernel from a USB device as initialized, or checks the data
sent to the device for being initialized.
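
A rough sketch of the intended semantics (the actual hook lives in the KMSAN
runtime and may differ in details):

    void kmsan_handle_urb(const struct urb *urb, bool is_out)
    {
            if (!urb->transfer_buffer)
                    return;
            if (is_out)
                    /* Data sent to the device must be initialized. */
                    kmsan_check_memory(urb->transfer_buffer,
                                       urb->transfer_buffer_length);
            else
                    /* Data received from the device becomes initialized. */
                    kmsan_unpoison_memory(urb->transfer_buffer,
                                          urb->transfer_buffer_length);
    }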

Signed-off-by: Alexander Potapenko <[email protected]>

---

Link: https://linux-review.googlesource.com/id/Ifa67fb72015d4de14c30e971556f99fc8b2ee506
---
drivers/usb/core/urb.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c
index 30727729a44cc..0e84acc9aea53 100644
--- a/drivers/usb/core/urb.c
+++ b/drivers/usb/core/urb.c
@@ -8,6 +8,7 @@
#include <linux/bitops.h>
#include <linux/slab.h>
#include <linux/log2.h>
+#include <linux/kmsan-checks.h>
#include <linux/usb.h>
#include <linux/wait.h>
#include <linux/usb/hcd.h>
@@ -426,6 +427,7 @@ int usb_submit_urb(struct urb *urb, gfp_t mem_flags)
URB_SETUP_MAP_SINGLE | URB_SETUP_MAP_LOCAL |
URB_DMA_SG_COMBINED);
urb->transfer_flags |= (is_out ? URB_DIR_OUT : URB_DIR_IN);
+ kmsan_handle_urb(urb, is_out);

if (xfertype != USB_ENDPOINT_XFER_CONTROL &&
dev->state < USB_STATE_CONFIGURED)
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:24:00

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 31/43] kmsan: disable strscpy() optimization under KMSAN

Disable the efficient word-at-a-time (8-byte) reading in strscpy() under
KMSAN to avoid false positives caused by reads of uninitialized bytes past
the trailing NUL.

Signed-off-by: Alexander Potapenko <[email protected]>

---

Link: https://linux-review.googlesource.com/id/Iffd8336965e88fce915db2e6a9d6524422975f69
---
lib/string.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/lib/string.c b/lib/string.c
index 485777c9da832..4ece4c7e7831b 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -197,6 +197,14 @@ ssize_t strscpy(char *dest, const char *src, size_t count)
max = 0;
#endif

+ /*
+ * read_word_at_a_time() below may read uninitialized bytes after the
+ * trailing zero and use them in comparisons. Disable this optimization
+ * under KMSAN to prevent false positive reports.
+ */
+ if (IS_ENABLED(CONFIG_KMSAN))
+ max = 0;
+
while (max >= sizeof(unsigned long)) {
unsigned long c, data;

--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:24:03

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 32/43] crypto: kmsan: disable accelerated configs under KMSAN

KMSAN is unable to see that values initialized by assembly code are in fact
initialized. Disable accelerated configs in KMSAN builds to prevent false
positive reports.

Signed-off-by: Alexander Potapenko <[email protected]>

---

Link: https://linux-review.googlesource.com/id/Idb2334bf3a1b68b31b399709baefaa763038cc50
---
crypto/Kconfig | 30 ++++++++++++++++++++++++++++++
drivers/net/Kconfig | 1 +
2 files changed, 31 insertions(+)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 285f82647d2b7..c6c71acf7d56e 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -290,6 +290,7 @@ config CRYPTO_CURVE25519
config CRYPTO_CURVE25519_X86
tristate "x86_64 accelerated Curve25519 scalar multiplication library"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_LIB_CURVE25519_GENERIC
select CRYPTO_ARCH_HAVE_LIB_CURVE25519

@@ -338,11 +339,13 @@ config CRYPTO_AEGIS128
config CRYPTO_AEGIS128_SIMD
bool "Support SIMD acceleration for AEGIS-128"
depends on CRYPTO_AEGIS128 && ((ARM || ARM64) && KERNEL_MODE_NEON)
+ depends on !KMSAN # avoid false positives from assembly
default y

config CRYPTO_AEGIS128_AESNI_SSE2
tristate "AEGIS-128 AEAD algorithm (x86_64 AESNI+SSE2 implementation)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_AEAD
select CRYPTO_SIMD
help
@@ -478,6 +481,7 @@ config CRYPTO_NHPOLY1305
config CRYPTO_NHPOLY1305_SSE2
tristate "NHPoly1305 hash function (x86_64 SSE2 implementation)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_NHPOLY1305
help
SSE2 optimized implementation of the hash function used by the
@@ -486,6 +490,7 @@ config CRYPTO_NHPOLY1305_SSE2
config CRYPTO_NHPOLY1305_AVX2
tristate "NHPoly1305 hash function (x86_64 AVX2 implementation)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_NHPOLY1305
help
AVX2 optimized implementation of the hash function used by the
@@ -599,6 +604,7 @@ config CRYPTO_CRC32C
config CRYPTO_CRC32C_INTEL
tristate "CRC32c INTEL hardware acceleration"
depends on X86
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_HASH
help
In Intel processor with SSE4.2 supported, the processor will
@@ -639,6 +645,7 @@ config CRYPTO_CRC32
config CRYPTO_CRC32_PCLMUL
tristate "CRC32 PCLMULQDQ hardware acceleration"
depends on X86
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_HASH
select CRC32
help
@@ -704,6 +711,7 @@ config CRYPTO_BLAKE2S
config CRYPTO_BLAKE2S_X86
tristate "BLAKE2s digest algorithm (x86 accelerated version)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_LIB_BLAKE2S_GENERIC
select CRYPTO_ARCH_HAVE_LIB_BLAKE2S

@@ -718,6 +726,7 @@ config CRYPTO_CRCT10DIF
config CRYPTO_CRCT10DIF_PCLMUL
tristate "CRCT10DIF PCLMULQDQ hardware acceleration"
depends on X86 && 64BIT && CRC_T10DIF
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_HASH
help
For x86_64 processors with SSE4.2 and PCLMULQDQ supported,
@@ -765,6 +774,7 @@ config CRYPTO_POLY1305
config CRYPTO_POLY1305_X86_64
tristate "Poly1305 authenticator algorithm (x86_64/SSE2/AVX2)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_LIB_POLY1305_GENERIC
select CRYPTO_ARCH_HAVE_LIB_POLY1305
help
@@ -853,6 +863,7 @@ config CRYPTO_SHA1
config CRYPTO_SHA1_SSSE3
tristate "SHA1 digest algorithm (SSSE3/AVX/AVX2/SHA-NI)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SHA1
select CRYPTO_HASH
help
@@ -864,6 +875,7 @@ config CRYPTO_SHA1_SSSE3
config CRYPTO_SHA256_SSSE3
tristate "SHA256 digest algorithm (SSSE3/AVX/AVX2/SHA-NI)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SHA256
select CRYPTO_HASH
help
@@ -876,6 +888,7 @@ config CRYPTO_SHA256_SSSE3
config CRYPTO_SHA512_SSSE3
tristate "SHA512 digest algorithm (SSSE3/AVX/AVX2)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SHA512
select CRYPTO_HASH
help
@@ -1034,6 +1047,7 @@ config CRYPTO_WP512
config CRYPTO_GHASH_CLMUL_NI_INTEL
tristate "GHASH hash function (CLMUL-NI accelerated)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_CRYPTD
help
This is the x86_64 CLMUL-NI accelerated implementation of
@@ -1084,6 +1098,7 @@ config CRYPTO_AES_TI
config CRYPTO_AES_NI_INTEL
tristate "AES cipher algorithms (AES-NI)"
depends on X86
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_AEAD
select CRYPTO_LIB_AES
select CRYPTO_ALGAPI
@@ -1208,6 +1223,7 @@ config CRYPTO_BLOWFISH_COMMON
config CRYPTO_BLOWFISH_X86_64
tristate "Blowfish cipher algorithm (x86_64)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_BLOWFISH_COMMON
imply CRYPTO_CTR
@@ -1238,6 +1254,7 @@ config CRYPTO_CAMELLIA
config CRYPTO_CAMELLIA_X86_64
tristate "Camellia cipher algorithm (x86_64)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
imply CRYPTO_CTR
help
@@ -1254,6 +1271,7 @@ config CRYPTO_CAMELLIA_X86_64
config CRYPTO_CAMELLIA_AESNI_AVX_X86_64
tristate "Camellia cipher algorithm (x86_64/AES-NI/AVX)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_CAMELLIA_X86_64
select CRYPTO_SIMD
@@ -1272,6 +1290,7 @@ config CRYPTO_CAMELLIA_AESNI_AVX_X86_64
config CRYPTO_CAMELLIA_AESNI_AVX2_X86_64
tristate "Camellia cipher algorithm (x86_64/AES-NI/AVX2)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_CAMELLIA_AESNI_AVX_X86_64
help
Camellia cipher algorithm module (x86_64/AES-NI/AVX2).
@@ -1317,6 +1336,7 @@ config CRYPTO_CAST5
config CRYPTO_CAST5_AVX_X86_64
tristate "CAST5 (CAST-128) cipher algorithm (x86_64/AVX)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_CAST5
select CRYPTO_CAST_COMMON
@@ -1340,6 +1360,7 @@ config CRYPTO_CAST6
config CRYPTO_CAST6_AVX_X86_64
tristate "CAST6 (CAST-256) cipher algorithm (x86_64/AVX)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_CAST6
select CRYPTO_CAST_COMMON
@@ -1373,6 +1394,7 @@ config CRYPTO_DES_SPARC64
config CRYPTO_DES3_EDE_X86_64
tristate "Triple DES EDE cipher algorithm (x86-64)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_LIB_DES
imply CRYPTO_CTR
@@ -1430,6 +1452,7 @@ config CRYPTO_CHACHA20
config CRYPTO_CHACHA20_X86_64
tristate "ChaCha stream cipher algorithms (x86_64/SSSE3/AVX2/AVX-512VL)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_LIB_CHACHA_GENERIC
select CRYPTO_ARCH_HAVE_LIB_CHACHA
@@ -1473,6 +1496,7 @@ config CRYPTO_SERPENT
config CRYPTO_SERPENT_SSE2_X86_64
tristate "Serpent cipher algorithm (x86_64/SSE2)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_SERPENT
select CRYPTO_SIMD
@@ -1492,6 +1516,7 @@ config CRYPTO_SERPENT_SSE2_X86_64
config CRYPTO_SERPENT_SSE2_586
tristate "Serpent cipher algorithm (i586/SSE2)"
depends on X86 && !64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_SERPENT
select CRYPTO_SIMD
@@ -1511,6 +1536,7 @@ config CRYPTO_SERPENT_SSE2_586
config CRYPTO_SERPENT_AVX_X86_64
tristate "Serpent cipher algorithm (x86_64/AVX)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_SERPENT
select CRYPTO_SIMD
@@ -1531,6 +1557,7 @@ config CRYPTO_SERPENT_AVX_X86_64
config CRYPTO_SERPENT_AVX2_X86_64
tristate "Serpent cipher algorithm (x86_64/AVX2)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SERPENT_AVX_X86_64
help
Serpent cipher algorithm, by Anderson, Biham & Knudsen.
@@ -1672,6 +1699,7 @@ config CRYPTO_TWOFISH_586
config CRYPTO_TWOFISH_X86_64
tristate "Twofish cipher algorithm (x86_64)"
depends on (X86 || UML_X86) && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_ALGAPI
select CRYPTO_TWOFISH_COMMON
imply CRYPTO_CTR
@@ -1689,6 +1717,7 @@ config CRYPTO_TWOFISH_X86_64
config CRYPTO_TWOFISH_X86_64_3WAY
tristate "Twofish cipher algorithm (x86_64, 3-way parallel)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_TWOFISH_COMMON
select CRYPTO_TWOFISH_X86_64
@@ -1709,6 +1738,7 @@ config CRYPTO_TWOFISH_X86_64_3WAY
config CRYPTO_TWOFISH_AVX_X86_64
tristate "Twofish cipher algorithm (x86_64/AVX)"
depends on X86 && 64BIT
+ depends on !KMSAN # avoid false positives from assembly
select CRYPTO_SKCIPHER
select CRYPTO_SIMD
select CRYPTO_TWOFISH_COMMON
diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 6cccc3dc00bcf..d09dabc607a69 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -76,6 +76,7 @@ config WIREGUARD
tristate "WireGuard secure network tunnel"
depends on NET && INET
depends on IPV6 || !IPV6
+ depends on !KMSAN # KMSAN doesn't support the crypto configs below
select NET_UDP_TUNNEL
select DST_CACHE
select CRYPTO
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:24:07

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 33/43] kmsan: disable physical page merging in biovec

KMSAN metadata for consecutive physical pages may itself be non-consecutive,
therefore accessing such pages together may lead to metadata
corruption.
We disable merging pages in biovec to prevent such corruption.

Signed-off-by: Alexander Potapenko <[email protected]>
---

Link: https://linux-review.googlesource.com/id/Iece16041be5ee47904fbc98121b105e5be5fea5c
---
block/blk.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/block/blk.h b/block/blk.h
index ccde6e6f17360..e0c62a5d5639e 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -103,6 +103,13 @@ static inline bool biovec_phys_mergeable(struct request_queue *q,
phys_addr_t addr1 = page_to_phys(vec1->bv_page) + vec1->bv_offset;
phys_addr_t addr2 = page_to_phys(vec2->bv_page) + vec2->bv_offset;

+ /*
+ * Merging consecutive physical pages may not work correctly under KMSAN
+ * if their metadata pages aren't consecutive. Just disable merging.
+ */
+ if (IS_ENABLED(CONFIG_KMSAN))
+ return false;
+
if (addr1 + vec1->bv_len != addr2)
return false;
if (xen_domain() && !xen_biovec_phys_mergeable(vec1, vec2->bv_page))
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:24:10

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 34/43] kmsan: block: skip bio block merging logic for KMSAN

KMSAN doesn't allow treating adjacent memory pages as a single contiguous
region if they were allocated by different alloc_pages() calls.
The block layer, however, does exactly that: adjacent pages end up being
used together. To prevent this, make page_is_mergeable() return false under
KMSAN.

Suggested-by: Eric Biggers <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>

---

Link: https://linux-review.googlesource.com/id/Ie29cc2464c70032347c32ab2a22e1e7a0b37b905
---
block/bio.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 15ab0d6d1c06e..b94283463196d 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -805,6 +805,8 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
return false;

*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
+ if (!*same_page && IS_ENABLED(CONFIG_KMSAN))
+ return false;
if (*same_page)
return true;
return (bv->bv_page + bv_end / PAGE_SIZE) == (page + off / PAGE_SIZE);
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:24:19

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 30/43] kmsan: add tests for KMSAN

The testing module triggers KMSAN warnings in different cases and checks
that the errors are properly reported, using console probes to capture
the tool's output.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I49c3f59014cc37fd13541c80beb0b75a75244650
---
lib/Kconfig.kmsan | 16 ++
mm/kmsan/Makefile | 4 +
mm/kmsan/kmsan_test.c | 444 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 464 insertions(+)
create mode 100644 mm/kmsan/kmsan_test.c

diff --git a/lib/Kconfig.kmsan b/lib/Kconfig.kmsan
index 02fd6db792b1f..940598f60b3a6 100644
--- a/lib/Kconfig.kmsan
+++ b/lib/Kconfig.kmsan
@@ -16,3 +16,19 @@ config KMSAN
instrumentation provided by Clang and thus requires Clang to build.

See <file:Documentation/dev-tools/kmsan.rst> for more details.
+
+if KMSAN
+
+config KMSAN_KUNIT_TEST
+ tristate "KMSAN integration test suite" if !KUNIT_ALL_TESTS
+ default KUNIT_ALL_TESTS
+ depends on TRACEPOINTS && KUNIT
+ help
+ Test suite for KMSAN, testing various error detection scenarios,
+ and checking that reports are correctly output to console.
+
+ Say Y here if you want the test to be built into the kernel and run
+ during boot; say M if you want the test to build as a module; say N
+ if you are unsure.
+
+endif
diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
index f57a956cb1c8b..7be6a7e92394f 100644
--- a/mm/kmsan/Makefile
+++ b/mm/kmsan/Makefile
@@ -20,3 +20,7 @@ CFLAGS_init.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
+
+obj-$(CONFIG_KMSAN_KUNIT_TEST) += kmsan_test.o
+KMSAN_SANITIZE_kmsan_test.o := y
+CFLAGS_kmsan_test.o += $(call cc-disable-warning, uninitialized)
diff --git a/mm/kmsan/kmsan_test.c b/mm/kmsan/kmsan_test.c
new file mode 100644
index 0000000000000..caf1094411487
--- /dev/null
+++ b/mm/kmsan/kmsan_test.c
@@ -0,0 +1,444 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test cases for KMSAN.
+ * For each test case checks the presence (or absence) of generated reports.
+ * Relies on 'console' tracepoint to capture reports as they appear in the
+ * kernel log.
+ *
+ * Copyright (C) 2021, Google LLC.
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#include <kunit/test.h>
+#include "kmsan.h"
+
+#include <linux/jiffies.h>
+#include <linux/kernel.h>
+#include <linux/kmsan.h>
+#include <linux/mm.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include <linux/tracepoint.h>
+#include <trace/events/printk.h>
+
+static DEFINE_PER_CPU(int, per_cpu_var);
+
+/* Report as observed from console. */
+static struct {
+ spinlock_t lock;
+ bool available;
+ bool ignore; /* Stop console output collection. */
+ char header[256];
+} observed = {
+ .lock = __SPIN_LOCK_UNLOCKED(observed.lock),
+};
+
+/* Probe for console output: obtains observed lines of interest. */
+static void probe_console(void *ignore, const char *buf, size_t len)
+{
+ unsigned long flags;
+
+ if (observed.ignore)
+ return;
+ spin_lock_irqsave(&observed.lock, flags);
+
+ if (strnstr(buf, "BUG: KMSAN: ", len)) {
+ /*
+ * This is a KMSAN report related to the test.
+ *
+ * The provided @buf is not NUL-terminated; copy no more than
+ * @len bytes and let strscpy() add the missing NUL-terminator.
+ */
+ strscpy(observed.header, buf,
+ min(len + 1, sizeof(observed.header)));
+ WRITE_ONCE(observed.available, true);
+ observed.ignore = true;
+ }
+ spin_unlock_irqrestore(&observed.lock, flags);
+}
+
+/* Check if a report related to the test exists. */
+static bool report_available(void)
+{
+ return READ_ONCE(observed.available);
+}
+
+/* Information we expect in a report. */
+struct expect_report {
+ const char *error_type; /* Error type. */
+ /*
+ * Kernel symbol from the error header, or NULL if no report is
+ * expected.
+ */
+ const char *symbol;
+};
+
+/* Check observed report matches information in @r. */
+static bool report_matches(const struct expect_report *r)
+{
+ typeof(observed.header) expected_header;
+ unsigned long flags;
+ bool ret = false;
+ const char *end;
+ char *cur;
+
+ /* Double-checked locking. */
+ if (!report_available() || !r->symbol)
+ return (!report_available() && !r->symbol);
+
+ /* Generate expected report contents. */
+
+ /* Title */
+ cur = expected_header;
+ end = &expected_header[sizeof(expected_header) - 1];
+
+ cur += scnprintf(cur, end - cur, "BUG: KMSAN: %s", r->error_type);
+
+ scnprintf(cur, end - cur, " in %s", r->symbol);
+ /* The exact offset won't match, remove it; also strip module name. */
+ cur = strchr(expected_header, '+');
+ if (cur)
+ *cur = '\0';
+
+ spin_lock_irqsave(&observed.lock, flags);
+ if (!report_available())
+ goto out; /* A new report is being captured. */
+
+ /* Finally match expected output to what we actually observed. */
+ ret = strstr(observed.header, expected_header);
+out:
+ spin_unlock_irqrestore(&observed.lock, flags);
+
+ return ret;
+}
+
+/* ===== Test cases ===== */
+
+/* Prevent replacing branch with select in LLVM. */
+static noinline void check_true(char *arg)
+{
+ pr_info("%s is true\n", arg);
+}
+
+static noinline void check_false(char *arg)
+{
+ pr_info("%s is false\n", arg);
+}
+
+#define USE(x) \
+ do { \
+ if (x) \
+ check_true(#x); \
+ else \
+ check_false(#x); \
+ } while (0)
+
+#define EXPECTATION_ETYPE_FN(e, reason, fn) \
+ struct expect_report e = { \
+ .error_type = reason, \
+ .symbol = fn, \
+ }
+
+#define EXPECTATION_NO_REPORT(e) EXPECTATION_ETYPE_FN(e, NULL, NULL)
+#define EXPECTATION_UNINIT_VALUE_FN(e, fn) \
+ EXPECTATION_ETYPE_FN(e, "uninit-value", fn)
+#define EXPECTATION_UNINIT_VALUE(e) EXPECTATION_UNINIT_VALUE_FN(e, __func__)
+#define EXPECTATION_USE_AFTER_FREE(e) \
+ EXPECTATION_ETYPE_FN(e, "use-after-free", __func__)
+
+static int signed_sum3(int a, int b, int c)
+{
+ return a + b + c;
+}
+
+static void test_uninit_kmalloc(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE(expect);
+ int *ptr;
+
+ kunit_info(test, "uninitialized kmalloc test (UMR report)\n");
+ ptr = kmalloc(sizeof(int), GFP_KERNEL);
+ USE(*ptr);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static void test_init_kmalloc(struct kunit *test)
+{
+ EXPECTATION_NO_REPORT(expect);
+ int *ptr;
+
+ kunit_info(test, "initialized kmalloc test (no reports)\n");
+ ptr = kmalloc(sizeof(int), GFP_KERNEL);
+ memset(ptr, 0, sizeof(int));
+ USE(*ptr);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static void test_init_kzalloc(struct kunit *test)
+{
+ EXPECTATION_NO_REPORT(expect);
+ int *ptr;
+
+ kunit_info(test, "initialized kzalloc test (no reports)\n");
+ ptr = kzalloc(sizeof(int), GFP_KERNEL);
+ USE(*ptr);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static void test_uninit_multiple_args(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE(expect);
+ volatile char b = 3, c;
+ volatile int a;
+
+ kunit_info(test, "uninitialized local passed to fn (UMR report)\n");
+ USE(signed_sum3(a, b, c));
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static void test_uninit_stack_var(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE(expect);
+ volatile int cond;
+
+ kunit_info(test, "uninitialized stack variable (UMR report)\n");
+ USE(cond);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static void test_init_stack_var(struct kunit *test)
+{
+ EXPECTATION_NO_REPORT(expect);
+ volatile int cond = 1;
+
+ kunit_info(test, "initialized stack variable (no reports)\n");
+ USE(cond);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static noinline void two_param_fn_2(int arg1, int arg2)
+{
+ USE(arg1);
+ USE(arg2);
+}
+
+static noinline void one_param_fn(int arg)
+{
+ two_param_fn_2(arg, arg);
+ USE(arg);
+}
+
+static noinline void two_param_fn(int arg1, int arg2)
+{
+ int init = 0;
+
+ one_param_fn(init);
+ USE(arg1);
+ USE(arg2);
+}
+
+static void test_params(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE_FN(expect, "two_param_fn");
+ volatile int uninit, init = 1;
+
+ kunit_info(test,
+ "uninit passed through a function parameter (UMR report)\n");
+ two_param_fn(uninit, init);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static noinline void do_uninit_local_array(char *array, int start, int stop)
+{
+ volatile char uninit;
+ int i;
+
+ for (i = start; i < stop; i++)
+ array[i] = uninit;
+}
+
+static void test_uninit_kmsan_check_memory(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE_FN(expect, "test_uninit_kmsan_check_memory");
+ volatile char local_array[8];
+
+ kunit_info(
+ test,
+ "kmsan_check_memory() called on uninit local (UMR report)\n");
+ do_uninit_local_array((char *)local_array, 5, 7);
+
+ kmsan_check_memory((char *)local_array, 8);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static void test_init_kmsan_vmap_vunmap(struct kunit *test)
+{
+ EXPECTATION_NO_REPORT(expect);
+ const int npages = 2;
+ struct page **pages;
+ void *vbuf;
+ int i;
+
+ kunit_info(test, "pages initialized via vmap (no reports)\n");
+
+ pages = kmalloc_array(npages, sizeof(*pages), GFP_KERNEL);
+ for (i = 0; i < npages; i++)
+ pages[i] = alloc_page(GFP_KERNEL);
+ vbuf = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
+ memset(vbuf, 0xfe, npages * PAGE_SIZE);
+ for (i = 0; i < npages; i++)
+ kmsan_check_memory(page_address(pages[i]), PAGE_SIZE);
+
+ if (vbuf)
+ vunmap(vbuf);
+ for (i = 0; i < npages; i++)
+ if (pages[i])
+ __free_page(pages[i]);
+ kfree(pages);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static void test_init_vmalloc(struct kunit *test)
+{
+ EXPECTATION_NO_REPORT(expect);
+ int npages = 8, i;
+ char *buf;
+
+ kunit_info(test, "vmalloc buffer can be initialized (no reports)\n");
+ buf = vmalloc(PAGE_SIZE * npages);
+ buf[0] = 1;
+ memset(buf, 0xfe, PAGE_SIZE * npages);
+ USE(buf[0]);
+ for (i = 0; i < npages; i++)
+ kmsan_check_memory(&buf[PAGE_SIZE * i], PAGE_SIZE);
+ vfree(buf);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static void test_uaf(struct kunit *test)
+{
+ EXPECTATION_USE_AFTER_FREE(expect);
+ volatile int value;
+ volatile int *var;
+
+ kunit_info(test, "use-after-free in kmalloc-ed buffer (UMR report)\n");
+ var = kmalloc(80, GFP_KERNEL);
+ var[3] = 0xfeedface;
+ kfree((int *)var);
+ /* Copy the invalid value before checking it. */
+ value = var[3];
+ USE(value);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static void test_percpu_propagate(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE(expect);
+ volatile int uninit, check;
+
+ kunit_info(test,
+ "uninit local stored to per_cpu memory (UMR report)\n");
+
+ this_cpu_write(per_cpu_var, uninit);
+ check = this_cpu_read(per_cpu_var);
+ USE(check);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static void test_printk(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE_FN(expect, "number");
+ volatile int uninit;
+
+ kunit_info(test, "uninit local passed to pr_info() (UMR report)\n");
+ pr_info("%px contains %d\n", &uninit, uninit);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static struct kunit_case kmsan_test_cases[] = {
+ KUNIT_CASE(test_uninit_kmalloc),
+ KUNIT_CASE(test_init_kmalloc),
+ KUNIT_CASE(test_init_kzalloc),
+ KUNIT_CASE(test_uninit_multiple_args),
+ KUNIT_CASE(test_uninit_stack_var),
+ KUNIT_CASE(test_init_stack_var),
+ KUNIT_CASE(test_params),
+ KUNIT_CASE(test_uninit_kmsan_check_memory),
+ KUNIT_CASE(test_init_kmsan_vmap_vunmap),
+ KUNIT_CASE(test_init_vmalloc),
+ KUNIT_CASE(test_uaf),
+ KUNIT_CASE(test_percpu_propagate),
+ KUNIT_CASE(test_printk),
+ {},
+};
+
+/* ===== End test cases ===== */
+
+static int test_init(struct kunit *test)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&observed.lock, flags);
+ observed.header[0] = '\0';
+ observed.ignore = false;
+ observed.available = false;
+ spin_unlock_irqrestore(&observed.lock, flags);
+
+ return 0;
+}
+
+static void test_exit(struct kunit *test)
+{
+}
+
+static struct kunit_suite kmsan_test_suite = {
+ .name = "kmsan",
+ .test_cases = kmsan_test_cases,
+ .init = test_init,
+ .exit = test_exit,
+};
+static struct kunit_suite *kmsan_test_suites[] = { &kmsan_test_suite, NULL };
+
+static void register_tracepoints(struct tracepoint *tp, void *ignore)
+{
+ check_trace_callback_type_console(probe_console);
+ if (!strcmp(tp->name, "console"))
+ WARN_ON(tracepoint_probe_register(tp, probe_console, NULL));
+}
+
+static void unregister_tracepoints(struct tracepoint *tp, void *ignore)
+{
+ if (!strcmp(tp->name, "console"))
+ tracepoint_probe_unregister(tp, probe_console, NULL);
+}
+
+/*
+ * We only want to do tracepoints setup and teardown once, therefore we have to
+ * customize the init and exit functions and cannot rely on kunit_test_suite().
+ */
+static int __init kmsan_test_init(void)
+{
+ /*
+ * Because we want to be able to build the test as a module, we need to
+ * iterate through all known tracepoints, since the static registration
+ * won't work here.
+ */
+ for_each_kernel_tracepoint(register_tracepoints, NULL);
+ return __kunit_test_suites_init(kmsan_test_suites);
+}
+
+static void kmsan_test_exit(void)
+{
+ __kunit_test_suites_exit(kmsan_test_suites);
+ for_each_kernel_tracepoint(unregister_tracepoints, NULL);
+ tracepoint_synchronize_unregister();
+}
+
+late_initcall_sync(kmsan_test_init);
+module_exit(kmsan_test_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Alexander Potapenko <[email protected]>");
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:24:21

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 35/43] x86: kmsan: use __msan_ string functions where possible.

Unless stated otherwise (by explicitly calling __memcpy(), __memset() or
__memmove()), we want all string functions to call their __msan_ counterparts
(e.g. __msan_memcpy() instead of memcpy()), so that shadow and origin
values are updated accordingly.
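
For example, in instrumented code a plain call like

    memcpy(dst, src, len);

is compiled as __msan_memcpy(dst, src, len), which copies the shadow and
origin values together with the data, whereas an explicit
__memcpy(dst, src, len) keeps bypassing the runtime.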

The bootloader must still use the default string functions to avoid crashes.

Signed-off-by: Alexander Potapenko <[email protected]>
---

Link: https://linux-review.googlesource.com/id/I7ca9bd6b4f5c9b9816404862ae87ca7984395f33
---
arch/x86/include/asm/string_64.h | 23 +++++++++++++++++++++--
include/linux/fortify-string.h | 2 ++
2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 6e450827f677a..3b87d889b6e16 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -11,11 +11,23 @@
function. */

#define __HAVE_ARCH_MEMCPY 1
+#if defined(__SANITIZE_MEMORY__)
+#undef memcpy
+void *__msan_memcpy(void *dst, const void *src, size_t size);
+#define memcpy __msan_memcpy
+#else
extern void *memcpy(void *to, const void *from, size_t len);
+#endif
extern void *__memcpy(void *to, const void *from, size_t len);

#define __HAVE_ARCH_MEMSET
+#if defined(__SANITIZE_MEMORY__)
+extern void *__msan_memset(void *s, int c, size_t n);
+#undef memset
+#define memset __msan_memset
+#else
void *memset(void *s, int c, size_t n);
+#endif
void *__memset(void *s, int c, size_t n);

#define __HAVE_ARCH_MEMSET16
@@ -55,7 +67,13 @@ static inline void *memset64(uint64_t *s, uint64_t v, size_t n)
}

#define __HAVE_ARCH_MEMMOVE
+#if defined(__SANITIZE_MEMORY__)
+#undef memmove
+void *__msan_memmove(void *dest, const void *src, size_t len);
+#define memmove __msan_memmove
+#else
void *memmove(void *dest, const void *src, size_t count);
+#endif
void *__memmove(void *dest, const void *src, size_t count);

int memcmp(const void *cs, const void *ct, size_t count);
@@ -64,8 +82,7 @@ char *strcpy(char *dest, const char *src);
char *strcat(char *dest, const char *src);
int strcmp(const char *cs, const char *ct);

-#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
-
+#if (defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__))
/*
* For files that not instrumented (e.g. mm/slub.c) we
* should use not instrumented version of mem* functions.
@@ -73,7 +90,9 @@ int strcmp(const char *cs, const char *ct);

#undef memcpy
#define memcpy(dst, src, len) __memcpy(dst, src, len)
+#undef memmove
#define memmove(dst, src, len) __memmove(dst, src, len)
+#undef memset
#define memset(s, c, n) __memset(s, c, n)

#ifndef __NO_FORTIFY
diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h
index a6cd6815f2490..b2c74cb85e20e 100644
--- a/include/linux/fortify-string.h
+++ b/include/linux/fortify-string.h
@@ -198,6 +198,7 @@ __FORTIFY_INLINE char *strncat(char *p, const char *q, __kernel_size_t count)
return p;
}

+#ifndef CONFIG_KMSAN
__FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size)
{
size_t p_size = __builtin_object_size(p, 0);
@@ -240,6 +241,7 @@ __FORTIFY_INLINE void *memmove(void *p, const void *q, __kernel_size_t size)
fortify_panic(__func__);
return __underlying_memmove(p, q, size);
}
+#endif

extern void *__real_memscan(void *, int, __kernel_size_t) __RENAME(memscan);
__FORTIFY_INLINE void *memscan(void *p, int c, __kernel_size_t size)
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:24:26

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 37/43] x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN

This is needed to allow memory tools like KASAN and KMSAN to see the
memory accesses performed by the checksum code. Without CONFIG_GENERIC_CSUM
the tools can't see memory accesses originating from handwritten assembly
code.
For KASAN it's a question of detecting more bugs; for KMSAN, using the C
implementation also helps avoid false positives originating from
seemingly uninitialized checksum values.

Signed-off-by: Alexander Potapenko <[email protected]>

---

Link: https://linux-review.googlesource.com/id/I3e95247be55b1112af59dbba07e8cbf34e50a581
---
arch/x86/Kconfig | 4 ++++
arch/x86/include/asm/checksum.h | 16 ++++++++++------
arch/x86/lib/Makefile | 2 ++
3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5c2ccb85f2efb..760570ff3f3e4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -310,6 +310,10 @@ config GENERIC_ISA_DMA
def_bool y
depends on ISA_DMA_API

+config GENERIC_CSUM
+ bool
+ default y if KMSAN || KASAN
+
config GENERIC_BUG
def_bool y
depends on BUG
diff --git a/arch/x86/include/asm/checksum.h b/arch/x86/include/asm/checksum.h
index bca625a60186c..6df6ece8a28ec 100644
--- a/arch/x86/include/asm/checksum.h
+++ b/arch/x86/include/asm/checksum.h
@@ -1,9 +1,13 @@
/* SPDX-License-Identifier: GPL-2.0 */
-#define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER 1
-#define HAVE_CSUM_COPY_USER
-#define _HAVE_ARCH_CSUM_AND_COPY
-#ifdef CONFIG_X86_32
-# include <asm/checksum_32.h>
+#ifdef CONFIG_GENERIC_CSUM
+# include <asm-generic/checksum.h>
#else
-# include <asm/checksum_64.h>
+# define _HAVE_ARCH_COPY_AND_CSUM_FROM_USER 1
+# define HAVE_CSUM_COPY_USER
+# define _HAVE_ARCH_CSUM_AND_COPY
+# ifdef CONFIG_X86_32
+# include <asm/checksum_32.h>
+# else
+# include <asm/checksum_64.h>
+# endif
#endif
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index c6506c6a70922..81be8498353a6 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -66,7 +66,9 @@ endif
lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
else
obj-y += iomap_copy_64.o
+ifneq ($(CONFIG_GENERIC_CSUM),y)
lib-y += csum-partial_64.o csum-copy_64.o csum-wrappers_64.o
+endif
lib-y += clear_page_64.o copy_page_64.o
lib-y += memmove_64.o memset_64.o
lib-y += copy_user_64.o
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:24:28

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 36/43] x86: kmsan: sync metadata pages on page fault

KMSAN assumes shadow and origin pages for every allocated page are
accessible. For pages in the [VMALLOC_START, VMALLOC_END) range those
metadata pages start at KMSAN_VMALLOC_SHADOW_START and
KMSAN_VMALLOC_ORIGIN_START, therefore we must sync a bigger memory
region.

Signed-off-by: Alexander Potapenko <[email protected]>

---

Link: https://linux-review.googlesource.com/id/Ia5bd541e54f1ecc11b86666c3ec87c62ac0bdfb8
---
arch/x86/mm/fault.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 4bfed53e210ec..abed0aedf00d2 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -260,7 +260,7 @@ static noinline int vmalloc_fault(unsigned long address)
}
NOKPROBE_SYMBOL(vmalloc_fault);

-void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
+void __arch_sync_kernel_mappings(unsigned long start, unsigned long end)
{
unsigned long addr;

@@ -284,6 +284,26 @@ void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
}
}

+void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
+{
+ __arch_sync_kernel_mappings(start, end);
+ /*
+ * KMSAN maintains two additional metadata page mappings for the
+ * [VMALLOC_START, VMALLOC_END) range. These mappings start at
+ * KMSAN_VMALLOC_SHADOW_START and KMSAN_VMALLOC_ORIGIN_START and
+ * have to be synced together with the vmalloc memory mapping.
+ */
+ if (IS_ENABLED(CONFIG_KMSAN) &&
+ start >= VMALLOC_START && end < VMALLOC_END) {
+ __arch_sync_kernel_mappings(
+ start - VMALLOC_START + KMSAN_VMALLOC_SHADOW_START,
+ end - VMALLOC_START + KMSAN_VMALLOC_SHADOW_START);
+ __arch_sync_kernel_mappings(
+ start - VMALLOC_START + KMSAN_VMALLOC_ORIGIN_START,
+ end - VMALLOC_START + KMSAN_VMALLOC_ORIGIN_START);
+ }
+}
+
static bool low_pfn(unsigned long pfn)
{
return pfn < max_low_pfn;
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:24:31

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 38/43] x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS

dentry_string_cmp() calls read_word_at_a_time(), which might read
uninitialized bytes to optimize string comparisons.
Disabling CONFIG_DCACHE_WORD_ACCESS should prohibit this optimization,
as well as (probably) similar ones.
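
A minimal sketch of the problem (the buffer contents are illustrative):

    char *name = kmalloc(8, GFP_KERNEL);
    unsigned long w;

    memcpy(name, "foo", 4);         /* bytes 4..7 remain uninitialized */
    w = read_word_at_a_time(name);  /* loads all 8 bytes at once */
    /* Comparing 'w' consumes uninitialized bits -> uninit-value report. */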

Suggested-by: Andrey Konovalov <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I4c0073224ac2897cafb8c037362c49dda9cfa133
---
arch/x86/Kconfig | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 760570ff3f3e4..0dc77352bc3c9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -125,7 +125,9 @@ config X86
select CLKEVT_I8253
select CLOCKSOURCE_VALIDATE_LAST_CYCLE
select CLOCKSOURCE_WATCHDOG
- select DCACHE_WORD_ACCESS
+ # Word-size accesses may read uninitialized data past the trailing \0
+ # in strings and cause false KMSAN reports.
+ select DCACHE_WORD_ACCESS if !KMSAN
select DYNAMIC_SIGFRAME
select EDAC_ATOMIC_SCRUB
select EDAC_SUPPORT
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:24:39

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 39/43] x86: kmsan: handle register passing from uninstrumented code

When calling KMSAN-instrumented functions from non-instrumented
functions, function parameters may not be initialized properly, leading
to false positive reports. In particular, this happens all the time when
calling interrupt handlers from `noinstr` IDT entries.

Fortunately, x86 code has instrumentation_begin() and
instrumentation_end(), which denote the regions where `noinstr` code
calls potentially instrumented code. We add calls to
kmsan_instrumentation_begin() to those regions, which:
- wipe the current KMSAN state at the beginning of the region, ensuring
that the first call of an instrumented function receives initialized
parameters (this is a pretty good approximation of having all other
instrumented functions receive initialized parameters);
- unpoison the `struct pt_regs` set up by the non-instrumented assembly
code.
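
A rough sketch of the intended semantics (the real implementation lives in
the KMSAN runtime and also performs the state wiping; only the pt_regs part
is shown here):

    void kmsan_instrumentation_begin(struct pt_regs *regs)
    {
            /*
             * 1) Wipe the current task's KMSAN parameter/retval state so
             *    that the first instrumented callee sees initialized
             *    arguments (details omitted).
             * 2) Unpoison the register snapshot filled in by the noinstr
             *    assembly code.
             */
            if (regs)
                    kmsan_unpoison_memory(regs, sizeof(*regs));
    }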

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I435ec076cd21752c2f877f5da81f5eced62a2ea4
---
arch/x86/entry/common.c | 2 ++
arch/x86/include/asm/idtentry.h | 5 +++++
arch/x86/kernel/cpu/mce/core.c | 1 +
arch/x86/kernel/kvm.c | 1 +
arch/x86/kernel/nmi.c | 1 +
arch/x86/kernel/sev.c | 2 ++
arch/x86/kernel/traps.c | 7 +++++++
arch/x86/mm/fault.c | 1 +
kernel/entry/common.c | 3 +++
9 files changed, 23 insertions(+)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 6c2826417b337..a0f90588c514e 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -14,6 +14,7 @@
#include <linux/mm.h>
#include <linux/smp.h>
#include <linux/errno.h>
+#include <linux/kmsan.h>
#include <linux/ptrace.h>
#include <linux/export.h>
#include <linux/nospec.h>
@@ -76,6 +77,7 @@ __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
nr = syscall_enter_from_user_mode(regs, nr);

instrumentation_begin();
+ kmsan_instrumentation_begin(regs);

if (!do_syscall_x64(regs, nr) && !do_syscall_x32(regs, nr) && nr != -1) {
/* Invalid system call, but still a system call. */
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 1345088e99025..f025fdc0f25df 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -52,6 +52,7 @@ __visible noinstr void func(struct pt_regs *regs) \
irqentry_state_t state = irqentry_enter(regs); \
\
instrumentation_begin(); \
+ kmsan_instrumentation_begin(regs); \
__##func (regs); \
instrumentation_end(); \
irqentry_exit(regs, state); \
@@ -99,6 +100,7 @@ __visible noinstr void func(struct pt_regs *regs, \
irqentry_state_t state = irqentry_enter(regs); \
\
instrumentation_begin(); \
+ kmsan_instrumentation_begin(regs); \
__##func (regs, error_code); \
instrumentation_end(); \
irqentry_exit(regs, state); \
@@ -196,6 +198,7 @@ __visible noinstr void func(struct pt_regs *regs, \
u32 vector = (u32)(u8)error_code; \
\
instrumentation_begin(); \
+ kmsan_instrumentation_begin(regs); \
kvm_set_cpu_l1tf_flush_l1d(); \
run_irq_on_irqstack_cond(__##func, regs, vector); \
instrumentation_end(); \
@@ -236,6 +239,7 @@ __visible noinstr void func(struct pt_regs *regs) \
irqentry_state_t state = irqentry_enter(regs); \
\
instrumentation_begin(); \
+ kmsan_instrumentation_begin(regs); \
kvm_set_cpu_l1tf_flush_l1d(); \
run_sysvec_on_irqstack_cond(__##func, regs); \
instrumentation_end(); \
@@ -263,6 +267,7 @@ __visible noinstr void func(struct pt_regs *regs) \
irqentry_state_t state = irqentry_enter(regs); \
\
instrumentation_begin(); \
+ kmsan_instrumentation_begin(regs); \
__irq_enter_raw(); \
kvm_set_cpu_l1tf_flush_l1d(); \
__##func (regs); \
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 6ed365337a3b1..b49e2c6bb8ca2 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1314,6 +1314,7 @@ static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callba
static noinstr void unexpected_machine_check(struct pt_regs *regs)
{
instrumentation_begin();
+ kmsan_instrumentation_begin(regs);
pr_err("CPU#%d: Unexpected int18 (Machine Check)\n",
smp_processor_id());
instrumentation_end();
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 59abbdad7729c..55ffe1bc73b00 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -250,6 +250,7 @@ noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)

state = irqentry_enter(regs);
instrumentation_begin();
+ kmsan_instrumentation_begin(regs);

/*
* If the host managed to inject an async #PF into an interrupt
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 4bce802d25fb1..d91327d271359 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -330,6 +330,7 @@ static noinstr void default_do_nmi(struct pt_regs *regs)
__this_cpu_write(last_nmi_rip, regs->ip);

instrumentation_begin();
+ kmsan_instrumentation_begin(regs);

handled = nmi_handle(NMI_LOCAL, regs);
__this_cpu_add(nmi_stats.normal, handled);
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index a9fc2ac7a8bd5..421d59b982cae 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1426,6 +1426,7 @@ DEFINE_IDTENTRY_VC_KERNEL(exc_vmm_communication)
irq_state = irqentry_nmi_enter(regs);

instrumentation_begin();
+ kmsan_instrumentation_begin(regs);

if (!vc_raw_handle_exception(regs, error_code)) {
/* Show some debug info */
@@ -1458,6 +1459,7 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)

irqentry_enter_from_user_mode(regs);
instrumentation_begin();
+ kmsan_instrumentation_begin(regs);

if (!vc_raw_handle_exception(regs, error_code)) {
/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c9d566dcf89a0..3a821010def63 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -230,6 +230,7 @@ static noinstr bool handle_bug(struct pt_regs *regs)
* All lies, just get the WARN/BUG out.
*/
instrumentation_begin();
+ kmsan_instrumentation_begin(regs);
/*
* Since we're emulating a CALL with exceptions, restore the interrupt
* state to what it was at the exception site.
@@ -261,6 +262,7 @@ DEFINE_IDTENTRY_RAW(exc_invalid_op)

state = irqentry_enter(regs);
instrumentation_begin();
+ kmsan_instrumentation_begin(regs);
handle_invalid_op(regs);
instrumentation_end();
irqentry_exit(regs, state);
@@ -415,6 +417,7 @@ DEFINE_IDTENTRY_DF(exc_double_fault)

irqentry_nmi_enter(regs);
instrumentation_begin();
+ kmsan_instrumentation_begin(regs);
notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);

tsk->thread.error_code = error_code;
@@ -690,6 +693,7 @@ DEFINE_IDTENTRY_RAW(exc_int3)
if (user_mode(regs)) {
irqentry_enter_from_user_mode(regs);
instrumentation_begin();
+ kmsan_instrumentation_begin(regs);
do_int3_user(regs);
instrumentation_end();
irqentry_exit_to_user_mode(regs);
@@ -697,6 +701,7 @@ DEFINE_IDTENTRY_RAW(exc_int3)
irqentry_state_t irq_state = irqentry_nmi_enter(regs);

instrumentation_begin();
+ kmsan_instrumentation_begin(regs);
if (!do_int3(regs))
die("int3", regs, 0);
instrumentation_end();
@@ -896,6 +901,7 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
unsigned long dr7 = local_db_save();
irqentry_state_t irq_state = irqentry_nmi_enter(regs);
instrumentation_begin();
+ kmsan_instrumentation_begin(regs);

/*
* If something gets miswired and we end up here for a user mode
@@ -975,6 +981,7 @@ static __always_inline void exc_debug_user(struct pt_regs *regs,

irqentry_enter_from_user_mode(regs);
instrumentation_begin();
+ kmsan_instrumentation_begin(regs);

/*
* Start the virtual/ptrace DR6 value with just the DR_STEP mask
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index abed0aedf00d2..0437d2fe31ecb 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1558,6 +1558,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
state = irqentry_enter(regs);

instrumentation_begin();
+ kmsan_instrumentation_begin(regs);
handle_page_fault(regs, error_code, address);
instrumentation_end();

diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index d5a61d565ad5d..3a569ea5a78fb 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -104,6 +104,7 @@ noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
__enter_from_user_mode(regs);

instrumentation_begin();
+ kmsan_instrumentation_begin(regs);
local_irq_enable();
ret = __syscall_enter_from_user_work(regs, syscall);
instrumentation_end();
@@ -297,6 +298,7 @@ void syscall_exit_to_user_mode_work(struct pt_regs *regs)
__visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
{
instrumentation_begin();
+ kmsan_instrumentation_begin(regs);
__syscall_exit_to_user_mode_work(regs);
instrumentation_end();
__exit_to_user_mode();
@@ -310,6 +312,7 @@ noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs)
noinstr void irqentry_exit_to_user_mode(struct pt_regs *regs)
{
instrumentation_begin();
+ kmsan_instrumentation_begin(regs);
exit_to_user_mode_prepare(regs);
instrumentation_end();
__exit_to_user_mode();
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:24:42

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 40/43] kmsan: kcov: unpoison area->list in kcov_remote_area_put()

KMSAN does not instrument kernel/kcov.c for performance reasons (with
CONFIG_KCOV=y virtually every place in the kernel invokes kcov
instrumentation). Therefore the tool may miss writes from kcov.c that
initialize memory.

When CONFIG_DEBUG_LIST is enabled, list pointers from kernel/kcov.c are
passed to instrumented helpers in lib/list_debug.c, resulting in false
positives.

To work around these reports, we unpoison the contents of area->list after
initializing it.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ie17f2ee47a7af58f5cdf716d585ebf0769348a5a
---
kernel/kcov.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/kernel/kcov.c b/kernel/kcov.c
index 36ca640c4f8e7..88ffdddc99ba1 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -11,6 +11,7 @@
#include <linux/fs.h>
#include <linux/hashtable.h>
#include <linux/init.h>
+#include <linux/kmsan-checks.h>
#include <linux/mm.h>
#include <linux/preempt.h>
#include <linux/printk.h>
@@ -152,6 +153,12 @@ static void kcov_remote_area_put(struct kcov_remote_area *area,
INIT_LIST_HEAD(&area->list);
area->size = size;
list_add(&area->list, &kcov_remote_areas);
+ /*
+ * KMSAN doesn't instrument this file, so it may not know area->list
+ * is initialized. Unpoison it explicitly to avoid reports in
+ * kcov_remote_area_get().
+ */
+ kmsan_unpoison_memory(&area->list, sizeof(struct list_head));
}

static notrace bool check_kcov_mode(enum kcov_mode needed_mode, struct task_struct *t)
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:25:15

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 41/43] security: kmsan: fix interoperability with auto-initialization

Heap and stack initialization is great, but not when we are trying to catch
uses of uninitialized memory. When the kernel is built with KMSAN,
having kernel memory initialization enabled may introduce false
negatives.
We disable CONFIG_INIT_STACK_ALL_PATTERN and CONFIG_INIT_STACK_ALL_ZERO
under CONFIG_KMSAN, making it impossible to auto-initialize stack
variables in KMSAN builds. We also disable CONFIG_INIT_ON_ALLOC_DEFAULT_ON
and CONFIG_INIT_ON_FREE_DEFAULT_ON to prevent accidental use of heap
auto-initialization.

We however still let the users enable heap auto-initialization at
boot-time (by setting init_on_alloc=1 or init_on_free=1), in which case
a warning is printed.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I86608dd867018683a14ae1870f1928ad925f42e9
---
mm/page_alloc.c | 4 ++++
security/Kconfig.hardening | 4 ++++
2 files changed, 8 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fa8029b714a81..4218dea0c76a2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -855,6 +855,10 @@ void init_mem_debugging_and_hardening(void)
else
static_branch_disable(&init_on_free);

+ if (IS_ENABLED(CONFIG_KMSAN) &&
+ (_init_on_alloc_enabled_early || _init_on_free_enabled_early))
+ pr_info("mem auto-init: please make sure init_on_alloc and init_on_free are disabled when running KMSAN\n");
+
#ifdef CONFIG_DEBUG_PAGEALLOC
if (!debug_pagealloc_enabled())
return;
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index d051f8ceefddd..bd13a46024457 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -106,6 +106,7 @@ choice
config INIT_STACK_ALL_PATTERN
bool "pattern-init everything (strongest)"
depends on CC_HAS_AUTO_VAR_INIT_PATTERN
+ depends on !KMSAN
help
Initializes everything on the stack (including padding)
with a specific debug value. This is intended to eliminate
@@ -124,6 +125,7 @@ choice
config INIT_STACK_ALL_ZERO
bool "zero-init everything (strongest and safest)"
depends on CC_HAS_AUTO_VAR_INIT_ZERO
+ depends on !KMSAN
help
Initializes everything on the stack (including padding)
with a zero value. This is intended to eliminate all
@@ -208,6 +210,7 @@ config STACKLEAK_RUNTIME_DISABLE

config INIT_ON_ALLOC_DEFAULT_ON
bool "Enable heap memory zeroing on allocation by default"
+ depends on !KMSAN
help
This has the effect of setting "init_on_alloc=1" on the kernel
command line. This can be disabled with "init_on_alloc=0".
@@ -220,6 +223,7 @@ config INIT_ON_ALLOC_DEFAULT_ON

config INIT_ON_FREE_DEFAULT_ON
bool "Enable heap memory zeroing on free by default"
+ depends on !KMSAN
help
This has the effect of setting "init_on_free=1" on the kernel
command line. This can be disabled with "init_on_free=0".
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:25:26

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 42/43] objtool: kmsan: list KMSAN API functions as uaccess-safe

KMSAN inserts API function calls in a lot of places (function entries
and exits, local variables, memory accesses), so these calls may end up
inside uaccess regions as well.
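
For illustration, a store inside a user_access_begin()/user_access_end()
region roughly looks like this after instrumentation (a simplified sketch,
not actual compiler output; the exact helper depends on the access size):

  user_access_begin(ptr, sizeof(*ptr));
  meta = __msan_metadata_ptr_for_store_8(ptr); /* inserted by KMSAN */
  /* ... update shadow via meta, then perform the actual store ... */
  user_access_end();

Without listing the __msan_*/kmsan_* helpers as uaccess-safe, objtool
would warn about such calls happening with UACCESS enabled.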

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I242bc9816273fecad4ea3d977393784396bb3c35
---
tools/objtool/check.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 21735829b860c..9620b5224754e 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -937,6 +937,25 @@ static const char *uaccess_safe_builtin[] = {
"__sanitizer_cov_trace_cmp4",
"__sanitizer_cov_trace_cmp8",
"__sanitizer_cov_trace_switch",
+ /* KMSAN */
+ "kmsan_copy_to_user",
+ "kmsan_report",
+ "kmsan_unpoison_memory",
+ "__msan_chain_origin",
+ "__msan_get_context_state",
+ "__msan_instrument_asm_store",
+ "__msan_metadata_ptr_for_load_1",
+ "__msan_metadata_ptr_for_load_2",
+ "__msan_metadata_ptr_for_load_4",
+ "__msan_metadata_ptr_for_load_8",
+ "__msan_metadata_ptr_for_load_n",
+ "__msan_metadata_ptr_for_store_1",
+ "__msan_metadata_ptr_for_store_2",
+ "__msan_metadata_ptr_for_store_4",
+ "__msan_metadata_ptr_for_store_8",
+ "__msan_metadata_ptr_for_store_n",
+ "__msan_poison_alloca",
+ "__msan_warning",
/* UBSAN */
"ubsan_type_mismatch_common",
"__ubsan_handle_type_mismatch",
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:25:42

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH 43/43] x86: kmsan: enable KMSAN builds for x86

Make KMSAN usable by adding the necessary Kconfig bits.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I1d295ce8159ce15faa496d20089d953a919c125e
---
arch/x86/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0dc77352bc3c9..b5740d0ab0eb9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -165,6 +165,7 @@ config X86
select HAVE_ARCH_KASAN if X86_64
select HAVE_ARCH_KASAN_VMALLOC if X86_64
select HAVE_ARCH_KFENCE
+ select HAVE_ARCH_KMSAN if X86_64
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS if MMU
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT
--
2.34.1.173.g76aa8bc2d0-goog


2021-12-14 16:34:09

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 13/43] kmsan: add KMSAN runtime core

On Tue, Dec 14, 2021 at 05:20:20PM +0100, Alexander Potapenko wrote:
> This patch adds the core parts of KMSAN runtime and associated files:
>
> - include/linux/kmsan-checks.h: user API to poison/unpoison/check
> the kernel memory;
> - include/linux/kmsan.h: declarations of KMSAN hooks to be referenced
> outside of KMSAN runtime;
> - lib/Kconfig.kmsan: CONFIG_KMSAN and related declarations;
> - Makefile, mm/Makefile, mm/kmsan/Makefile: boilerplate Makefile code;
> - mm/kmsan/annotations.c: non-inlineable implementation of KMSAN_INIT();
> - mm/kmsan/core.c: core functions that operate with shadow and origin
> memory and perform checks, utility functions;
> - mm/kmsan/hooks.c: KMSAN hooks for kernel subsystems;
> - mm/kmsan/init.c: KMSAN initialization routines;
> - mm/kmsan/instrumentation.c: functions called by KMSAN instrumentation;
> - mm/kmsan/kmsan.h: internal KMSAN declarations;
> - mm/kmsan/shadow.c: routines that encapsulate metadata creation and
> addressing;
> - scripts/Makefile.kmsan: CFLAGS_KMSAN
> - scripts/Makefile.lib: KMSAN_SANITIZE and KMSAN_ENABLE_CHECKS macros


That's an odd way to write a changelog, don't you think?

You need to describe what you are doing here and why you are doing it.
Not a list of file names, we can see that in the diffstat.

Also, you don't mention you are doing USB stuff here at all. And why
are you doing it here? That should be added in a later patch.

Break this up into smaller, logical, pieces that add the infrastructure
and build on it. Don't just chop your patches up on a logical-file
boundary, as you are adding stuff in this patch that you do not need for
many more later on, which means it was not needed here.

thanks,

greg k-h

2021-12-14 16:36:51

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 00/43] Add KernelMemorySanitizer infrastructure

On Tue, Dec 14, 2021 at 05:20:07PM +0100, Alexander Potapenko wrote:
> KernelMemorySanitizer (KMSAN) is a detector of errors related to uses of
> uninitialized memory. It relies on compile-time Clang instrumentation
> (similar to MSan in the userspace [1]) and tracks the state of every bit
> of kernel memory, being able to report an error if uninitialized value is
> used in a condition, dereferenced, or escapes to userspace, USB or DMA.

Why is USB unique here? What about serial data? i2c? spi? w1? We
have a lot of different I/O bus types :)

And how is DMA checked given that the kernel shouldn't be seeing dma
memory?

thanks,

greg k-h

2021-12-14 16:38:11

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 41/43] security: kmsan: fix interoperability with auto-initialization

On Tue, Dec 14, 2021 at 05:20:48PM +0100, Alexander Potapenko wrote:
> Heap and stack initialization is great, but not when we are trying to
> catch uses of uninitialized memory. When the kernel is built with KMSAN,
> having kernel memory initialization enabled may introduce false
> negatives.
>
> We disable CONFIG_INIT_STACK_ALL_PATTERN and CONFIG_INIT_STACK_ALL_ZERO
> under CONFIG_KMSAN, making it impossible to auto-initialize stack
> variables in KMSAN builds. We also disable CONFIG_INIT_ON_ALLOC_DEFAULT_ON
> and CONFIG_INIT_ON_FREE_DEFAULT_ON to prevent accidental use of heap
> auto-initialization.
>
> We however still let the users enable heap auto-initialization at
> boot-time (by setting init_on_alloc=1 or init_on_free=1), in which case
> a warning is printed.
>
> Signed-off-by: Alexander Potapenko <[email protected]>
> ---
> Link: https://linux-review.googlesource.com/id/I86608dd867018683a14ae1870f1928ad925f42e9
> ---
> mm/page_alloc.c | 4 ++++
> security/Kconfig.hardening | 4 ++++
> 2 files changed, 8 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fa8029b714a81..4218dea0c76a2 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -855,6 +855,10 @@ void init_mem_debugging_and_hardening(void)
> else
> static_branch_disable(&init_on_free);
>
> + if (IS_ENABLED(CONFIG_KMSAN) &&
> + (_init_on_alloc_enabled_early || _init_on_free_enabled_early))
> + pr_info("mem auto-init: please make sure init_on_alloc and init_on_free are disabled when running KMSAN\n");
> +
> #ifdef CONFIG_DEBUG_PAGEALLOC
> if (!debug_pagealloc_enabled())
> return;
> diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
> index d051f8ceefddd..bd13a46024457 100644
> --- a/security/Kconfig.hardening
> +++ b/security/Kconfig.hardening
> @@ -106,6 +106,7 @@ choice
> config INIT_STACK_ALL_PATTERN
> bool "pattern-init everything (strongest)"
> depends on CC_HAS_AUTO_VAR_INIT_PATTERN
> + depends on !KMSAN
> help
> Initializes everything on the stack (including padding)
> with a specific debug value. This is intended to eliminate
> @@ -124,6 +125,7 @@ choice
> config INIT_STACK_ALL_ZERO
> bool "zero-init everything (strongest and safest)"
> depends on CC_HAS_AUTO_VAR_INIT_ZERO
> + depends on !KMSAN

So this means KMSAN is a developer debugging feature only and should
never be turned on on a real device/server that has users?

thanks,

greg k-h

2021-12-14 17:01:22

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH 41/43] security: kmsan: fix interoperability with auto-initialization

On Tue, Dec 14, 2021 at 5:38 PM Greg Kroah-Hartman
<[email protected]> wrote:
>
> > @@ -124,6 +125,7 @@ choice
> > config INIT_STACK_ALL_ZERO
> > bool "zero-init everything (strongest and safest)"
> > depends on CC_HAS_AUTO_VAR_INIT_ZERO
> > + depends on !KMSAN
>
> So this means KMSAN is a developer debugging feature only and should
> never be turned on on a real device/server that has users?

100% correct. KMSAN is way slower than KASAN, it also eats 2/3 of your
memory to store the metadata.
I thought it was sort of self-evident, but I can surely mention this
explicitly in the cover letter.

2021-12-14 17:34:04

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 41/43] security: kmsan: fix interoperability with auto-initialization

On Tue, Dec 14, 2021 at 06:00:41PM +0100, Alexander Potapenko wrote:
> On Tue, Dec 14, 2021 at 5:38 PM Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > > @@ -124,6 +125,7 @@ choice
> > > config INIT_STACK_ALL_ZERO
> > > bool "zero-init everything (strongest and safest)"
> > > depends on CC_HAS_AUTO_VAR_INIT_ZERO
> > > + depends on !KMSAN
> >
> > So this means KMSAN is a developer debugging feature only and should
> > never be turned on on a real device/server that has users?
>
> 100% correct. KMSAN is way slower than KASAN, it also eats 2/3 of your
> memory to store the metadata.
> I thought it was sort of self-evident, but I can surely mention this
> explicitly in the cover letter.

Please mention it here and in the Kconfig option for it as well (don't
know if it was there or not.)

Also you might want to print out a very large "DO NOT USE THIS ON A REAL
MACHINE" warning to the kernel log when booting, like other kernel options
that should not be enabled in production are starting to do.

thanks,

greg k-h

2021-12-15 13:24:16

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 07/43] compiler_attributes.h: add __disable_sanitizer_instrumentation

On Tue, Dec 14, 2021 at 05:20:14PM +0100, Alexander Potapenko wrote:
> The new attribute maps to
> __attribute__((disable_sanitizer_instrumentation)), which will be
> supported by Clang >= 14.0. Future support in GCC is also possible.
>
> This attribute disables compiler instrumentation for kernel sanitizer
> tools, making it easier to implement noinstr. It is different from the
> existing __no_sanitize* attributes, which may still allow certain types
> of instrumentation to prevent false positives.

When you say the __no_sanitize* attributes allow some instrumentation, does
that apply to any of the existing KASAN/KCSAN/KCOV support, or just for KMSAN?

The documentation just says the same as the commit message:

| This is not the same as __attribute__((no_sanitize(...))), which depending on
| the tool may still insert instrumentation to prevent false positive reports.

... which implies the other instrumentation might not be suppressed.

I ask because architectures which select ARCH_WANTS_NO_INSTR *need* to be able
to suppress all instrumentation. It's fine if that means they need a new
version of clang for KMSAN, but if there's latent instrumentation we have more
bugs to fix first...

Thanks,
Mark.

> Signed-off-by: Alexander Potapenko <[email protected]>
> ---
> Link: https://linux-review.googlesource.com/id/Ic0123ce99b33ab7d5ed1ae90593425be8d3d774a
> ---
> include/linux/compiler_attributes.h | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
> index b9121afd87331..37e2600202216 100644
> --- a/include/linux/compiler_attributes.h
> +++ b/include/linux/compiler_attributes.h
> @@ -308,6 +308,24 @@
> # define __compiletime_warning(msg)
> #endif
>
> +/*
> + * Optional: only supported since clang >= 14.0
> + *
> + * clang: https://clang.llvm.org/docs/AttributeReference.html#disable-sanitizer-instrumentation
> + *
> + * disable_sanitizer_instrumentation is not always similar to
> + * no_sanitize((<sanitizer-name>)): the latter may still let specific sanitizers
> + * insert code into functions to prevent false positives. Unlike that,
> + * disable_sanitizer_instrumentation prevents all kinds of instrumentation to
> + * functions with the attribute.
> + */
> +#if __has_attribute(disable_sanitizer_instrumentation)
> +# define __disable_sanitizer_instrumentation \
> + __attribute__((disable_sanitizer_instrumentation))
> +#else
> +# define __disable_sanitizer_instrumentation
> +#endif
> +
> /*
> * gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-weak-function-attribute
> * gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-weak-variable-attribute
> --
> 2.34.1.173.g76aa8bc2d0-goog
>

2021-12-15 13:27:52

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 09/43] kmsan: introduce __no_sanitize_memory and __no_kmsan_checks

On Tue, Dec 14, 2021 at 05:20:16PM +0100, Alexander Potapenko wrote:
> __no_sanitize_memory is a function attribute that instructs KMSAN to
> skip a function during instrumentation. This is needed to e.g. implement
> the noinstr functions.
>
> __no_kmsan_checks is a function attribute that makes KMSAN
> ignore the uninitialized values coming from the function's
> inputs, and initialize the function's outputs.
>
> Functions marked with this attribute can't be inlined into functions
> not marked with it, and vice versa.

Just to check, I assume an unmarked __always_inline() function can be inlined
into a marked function? Otherwise this is going to be really painful to manage
for low-level helper functions.

Thanks,
Mark.

>
> __SANITIZE_MEMORY__ is a macro that's defined iff the file is
> instrumented with KMSAN. This is not the same as CONFIG_KMSAN, which is
> defined for every file.
>
> Signed-off-by: Alexander Potapenko <[email protected]>
> ---
> Link: https://linux-review.googlesource.com/id/I004ff0360c918d3cd8b18767ddd1381c6d3281be
> ---
> include/linux/compiler-clang.h | 23 +++++++++++++++++++++++
> include/linux/compiler-gcc.h | 6 ++++++
> 2 files changed, 29 insertions(+)
>
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index 3c4de9b6c6e3e..5f11a6f269e28 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -51,6 +51,29 @@
> #define __no_sanitize_undefined
> #endif
>
> +#if __has_feature(memory_sanitizer)
> +#define __SANITIZE_MEMORY__
> +/*
> + * Unlike other sanitizers, KMSAN still inserts code into functions marked with
> + * no_sanitize("kernel-memory"). Using disable_sanitizer_instrumentation
> + * provides the behavior consistent with other __no_sanitize_ attributes,
> + * guaranteeing that __no_sanitize_memory functions remain uninstrumented.
> + */
> +#define __no_sanitize_memory __disable_sanitizer_instrumentation
> +
> +/*
> + * The __no_kmsan_checks attribute ensures that a function does not produce
> + * false positive reports by:
> + * - initializing all local variables and memory stores in this function;
> + * - skipping all shadow checks;
> + * - passing initialized arguments to this function's callees.
> + */
> +#define __no_kmsan_checks __attribute__((no_sanitize("kernel-memory")))
> +#else
> +#define __no_sanitize_memory
> +#define __no_kmsan_checks
> +#endif
> +
> /*
> * Support for __has_feature(coverage_sanitizer) was added in Clang 13 together
> * with no_sanitize("coverage"). Prior versions of Clang support coverage
> diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> index ccbbd31b3aae5..f6e69387aad05 100644
> --- a/include/linux/compiler-gcc.h
> +++ b/include/linux/compiler-gcc.h
> @@ -129,6 +129,12 @@
> #define __SANITIZE_ADDRESS__
> #endif
>
> +/*
> + * GCC does not support KMSAN.
> + */
> +#define __no_sanitize_memory
> +#define __no_kmsan_checks
> +
> /*
> * Turn individual warnings and errors on and off locally, depending
> * on version.
> --
> 2.34.1.173.g76aa8bc2d0-goog
>

2021-12-15 13:33:51

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH 07/43] compiler_attributes.h: add __disable_sanitizer_instrumentation

On Wed, 15 Dec 2021 at 14:24, Mark Rutland <[email protected]> wrote:
>
> On Tue, Dec 14, 2021 at 05:20:14PM +0100, Alexander Potapenko wrote:
> > The new attribute maps to
> > __attribute__((disable_sanitizer_instrumentation)), which will be
> > supported by Clang >= 14.0. Future support in GCC is also possible.
> >
> > This attribute disables compiler instrumentation for kernel sanitizer
> > tools, making it easier to implement noinstr. It is different from the
> > existing __no_sanitize* attributes, which may still allow certain types
> > of instrumentation to prevent false positives.
>
> When you say the __no_sanitize* attributes allow some instrumentation, does
> that apply to any of the existing KASAN/KCSAN/KCOV support, or just for KMSAN?
>
> The documentation just says the same as the commit message:
>
> | This is not the same as __attribute__((no_sanitize(...))), which depending on
> | the tool may still insert instrumentation to prevent false positive reports.
>
> ... which implies the other instrumentation might not be suprressed.
>
> I ask because architectures which select ARCH_WANTS_NO_INSTR *need* to be able
> to suppress all instrumentation. It's fine if that means they need a new
> version of clang for KMSAN, but if there's latent instrumentation we have more
> bugs to fix first...

Thus far, none of the existing K*SANs added other instrumentation.
Apart from KMSAN here, this will change with KCSAN's barrier
instrumentation, which is why this patch is also part of KCSAN's
upcoming changes -- recall I said I fixed barrier instrumentation for
arm64 as well, this is how :-)

See https://lore.kernel.org/all/[email protected]/
how I resolved it for KCSAN on architectures that don't have objtool.

I expect this patch will be dropped from the KMSAN series once it
reaches mainline through the KCSAN changes.

Also note, this applies only to bug-detection tools that may want to
avoid false positives. So by definition, it is irrelevant for KCOV
(which had its own attribute woes a while back though). Yeah, it's
been a long road to get the compilers to play along ... :-/

Thanks,
-- Marco

2021-12-15 13:33:58

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 12/43] kcsan: clang: retire CONFIG_KCSAN_KCOV_BROKEN

On Tue, Dec 14, 2021 at 05:20:19PM +0100, Alexander Potapenko wrote:
> kcov used to be broken prior to Clang 11, but right now that version is
> already the minimum required to build with KCSAN, that is why we don't
> need KCSAN_KCOV_BROKEN anymore.

Just to check, how is that requirement enforced?

I see the core Makefiles enforce 10.0.1+, but I couldn't spot an explicit
version dependency in Kconfig.kcsan.

Otherwise, this looks good to me!

Mark.

> Suggested-by: Marco Elver <[email protected]>
> Signed-off-by: Alexander Potapenko <[email protected]>
> ---
> Link: https://linux-review.googlesource.com/id/Ida287421577f37de337139b5b5b9e977e4a6fee2
> ---
> lib/Kconfig.kcsan | 11 -----------
> 1 file changed, 11 deletions(-)
>
> diff --git a/lib/Kconfig.kcsan b/lib/Kconfig.kcsan
> index e0a93ffdef30e..b81454b2a0d09 100644
> --- a/lib/Kconfig.kcsan
> +++ b/lib/Kconfig.kcsan
> @@ -10,21 +10,10 @@ config HAVE_KCSAN_COMPILER
> For the list of compilers that support KCSAN, please see
> <file:Documentation/dev-tools/kcsan.rst>.
>
> -config KCSAN_KCOV_BROKEN
> - def_bool KCOV && CC_HAS_SANCOV_TRACE_PC
> - depends on CC_IS_CLANG
> - depends on !$(cc-option,-Werror=unused-command-line-argument -fsanitize=thread -fsanitize-coverage=trace-pc)
> - help
> - Some versions of clang support either KCSAN and KCOV but not the
> - combination of the two.
> - See https://bugs.llvm.org/show_bug.cgi?id=45831 for the status
> - in newer releases.
> -
> menuconfig KCSAN
> bool "KCSAN: dynamic data race detector"
> depends on HAVE_ARCH_KCSAN && HAVE_KCSAN_COMPILER
> depends on DEBUG_KERNEL && !KASAN
> - depends on !KCSAN_KCOV_BROKEN
> select STACKTRACE
> help
> The Kernel Concurrency Sanitizer (KCSAN) is a dynamic
> --
> 2.34.1.173.g76aa8bc2d0-goog
>

2021-12-15 13:36:27

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 10/43] kmsan: pgtable: reduce vmalloc space

On Tue, Dec 14, 2021 at 05:20:17PM +0100, Alexander Potapenko wrote:
> KMSAN is going to use 3/4 of existing vmalloc space to hold the
> metadata, therefore we lower VMALLOC_END to make sure vmalloc() doesn't
> allocate past the first 1/4.
>
> Signed-off-by: Alexander Potapenko <[email protected]>

It might be worth adding an 'x86: ' prefix to the commit title, since this
specifically affects x86 headers.

Mark.

> ---
> Link: https://linux-review.googlesource.com/id/I9d8b7f0a88a639f1263bc693cbd5c136626f7efd
> ---
> arch/x86/include/asm/pgtable_64_types.h | 41 ++++++++++++++++++++++++-
> arch/x86/mm/init_64.c | 2 +-
> 2 files changed, 41 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
> index 91ac106545703..7f15d43754a34 100644
> --- a/arch/x86/include/asm/pgtable_64_types.h
> +++ b/arch/x86/include/asm/pgtable_64_types.h
> @@ -139,7 +139,46 @@ extern unsigned int ptrs_per_p4d;
> # define VMEMMAP_START __VMEMMAP_BASE_L4
> #endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */
>
> -#define VMALLOC_END (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
> +#define VMEMORY_END (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1)
> +
> +#ifndef CONFIG_KMSAN
> +#define VMALLOC_END VMEMORY_END
> +#else
> +/*
> + * In KMSAN builds vmalloc area is four times smaller, and the remaining 3/4
> + * are used to keep the metadata for virtual pages. The memory formerly
> + * belonging to vmalloc area is now laid out as follows:
> + *
> + * 1st quarter: VMALLOC_START to VMALLOC_END - new vmalloc area
> + * 2nd quarter: KMSAN_VMALLOC_SHADOW_START to
> + * VMALLOC_END+KMSAN_VMALLOC_SHADOW_OFFSET - vmalloc area shadow
> + * 3rd quarter: KMSAN_VMALLOC_ORIGIN_START to
> + * VMALLOC_END+KMSAN_VMALLOC_ORIGIN_OFFSET - vmalloc area origins
> + * 4th quarter: KMSAN_MODULES_SHADOW_START to KMSAN_MODULES_ORIGIN_START
> + * - shadow for modules,
> + * KMSAN_MODULES_ORIGIN_START to
> + * KMSAN_MODULES_ORIGIN_START + MODULES_LEN - origins for modules.
> + */
> +#define VMALLOC_QUARTER_SIZE ((VMALLOC_SIZE_TB << 40) >> 2)
> +#define VMALLOC_END (VMALLOC_START + VMALLOC_QUARTER_SIZE - 1)
> +
> +/*
> + * vmalloc metadata addresses are calculated by adding shadow/origin offsets
> + * to vmalloc address.
> + */
> +#define KMSAN_VMALLOC_SHADOW_OFFSET VMALLOC_QUARTER_SIZE
> +#define KMSAN_VMALLOC_ORIGIN_OFFSET (VMALLOC_QUARTER_SIZE << 1)
> +
> +#define KMSAN_VMALLOC_SHADOW_START (VMALLOC_START + KMSAN_VMALLOC_SHADOW_OFFSET)
> +#define KMSAN_VMALLOC_ORIGIN_START (VMALLOC_START + KMSAN_VMALLOC_ORIGIN_OFFSET)
> +
> +/*
> + * The shadow/origin for modules are placed one by one in the last 1/4 of
> + * vmalloc space.
> + */
> +#define KMSAN_MODULES_SHADOW_START (VMALLOC_END + KMSAN_VMALLOC_ORIGIN_OFFSET + 1)
> +#define KMSAN_MODULES_ORIGIN_START (KMSAN_MODULES_SHADOW_START + MODULES_LEN)
> +#endif /* CONFIG_KMSAN */
>
> #define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
> /* The module sections ends with the start of the fixmap */
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 36098226a9573..8e884e44a8d1e 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1287,7 +1287,7 @@ static void __init preallocate_vmalloc_pages(void)
> unsigned long addr;
> const char *lvl;
>
> - for (addr = VMALLOC_START; addr <= VMALLOC_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
> + for (addr = VMALLOC_START; addr <= VMEMORY_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) {
> pgd_t *pgd = pgd_offset_k(addr);
> p4d_t *p4d;
> pud_t *pud;
> --
> 2.34.1.173.g76aa8bc2d0-goog
>

2021-12-15 13:39:59

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH 12/43] kcsan: clang: retire CONFIG_KCSAN_KCOV_BROKEN

On Wed, 15 Dec 2021 at 14:33, Mark Rutland <[email protected]> wrote:
>
> On Tue, Dec 14, 2021 at 05:20:19PM +0100, Alexander Potapenko wrote:
> > kcov used to be broken prior to Clang 11, but right now that version is
> > already the minimum required to build with KCSAN, that is why we don't
> > need KCSAN_KCOV_BROKEN anymore.
>
> Just to check, how is that requirement enforced?

HAVE_KCSAN_COMPILER will only be true with Clang 11 or later, due to
no prior compiler having "-tsan-distinguish-volatile=1".

> I see the core Makefiles enforce 10.0.1+, but I couldn't spot an explicit
> version dependency in Kconfig.kcsan.
>
> Otherwise, this looks good to me!

I think 5.17 will be Clang 11 only, so we could actually revert
ea91a1d45d19469001a4955583187b0d75915759:
https://lkml.kernel.org/r/Yao86FeC2ybOobLO@archlinux-ax161

I should resend that to be added to the -kbuild tree.

2021-12-15 13:49:32

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 21/43] kmsan: mark noinstr as __no_sanitize_memory

On Tue, Dec 14, 2021 at 05:20:28PM +0100, Alexander Potapenko wrote:
> noinstr functions should never be instrumented, so make KMSAN skip them
> by applying the __no_sanitize_memory attribute.

To make this easier to review, it would be nice if this patch were moved
earlier, grouped with patches:

* 7: "compiler_attributes.h: add __disable_sanitizer_instrumentation"
* 9: "kmsan: introduce __no_sanitize_memory and __no_kmsan_checks"

... since that way a reviewer will spot them all in one go, rather than having
to jump around the series, and then any later patch in the series can rely on
all of these attributes, including `noinstr`.

Mark.

>
> Signed-off-by: Alexander Potapenko <[email protected]>
> ---
> Link: https://linux-review.googlesource.com/id/I3c9abe860b97b49bc0c8026918b17a50448dec0d
> ---
> include/linux/compiler_types.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
> index 1d32f4c03c9ef..37b82564e93e5 100644
> --- a/include/linux/compiler_types.h
> +++ b/include/linux/compiler_types.h
> @@ -210,7 +210,8 @@ struct ftrace_likely_data {
> /* Section for code which can't be instrumented at all */
> #define noinstr \
> noinline notrace __attribute((__section__(".noinstr.text"))) \
> - __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage
> + __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage \
> + __no_sanitize_memory
>
> #endif /* __KERNEL__ */
>
> --
> 2.34.1.173.g76aa8bc2d0-goog
>

2021-12-15 13:54:02

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 24/43] kmsan: disable KMSAN instrumentation for certain kernel parts

On Tue, Dec 14, 2021 at 05:20:31PM +0100, Alexander Potapenko wrote:
> Instrumenting some files with KMSAN will result in kernel being unable
> to link, boot or crashing at runtime for various reasons (e.g. infinite
> recursion caused by instrumentation hooks calling instrumented code again).
>
> Completely omit KMSAN instrumentation in the following places:
> - arch/x86/boot and arch/x86/realmode/rm, as KMSAN doesn't work for i386;
> - arch/x86/entry/vdso, which isn't linked with KMSAN runtime;
> - three files in arch/x86/kernel - boot problems;
> - arch/x86/mm/cpu_entry_area.c - recursion;
> - EFI stub - build failures;
> - kcov, stackdepot, lockdep - recursion.

It probably makes sense to split the arch/x86/ bits from everything else, and
to group the arch/x86 enablement patches together closer to the end of the
series.

The non-x86 changes all look fine to me.

Mark.

>
> Signed-off-by: Alexander Potapenko <[email protected]>
> ---
> Link: https://linux-review.googlesource.com/id/Id5e5c4a9f9d53c24a35ebb633b814c414628d81b
> ---
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/entry/vdso/Makefile | 3 +++
> arch/x86/kernel/Makefile | 2 ++
> arch/x86/kernel/cpu/Makefile | 1 +
> arch/x86/mm/Makefile | 2 ++
> arch/x86/realmode/rm/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 1 +
> kernel/Makefile | 1 +
> kernel/locking/Makefile | 3 ++-
> lib/Makefile | 1 +
> 11 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index b5aecb524a8aa..d5623232b763f 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -12,6 +12,7 @@
> # Sanitizer runtimes are unavailable and cannot be linked for early boot code.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
> +KMSAN_SANITIZE := n
> OBJECT_FILES_NON_STANDARD := y
>
> # Kernel does not boot with kcov instrumentation here.
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 431bf7f846c3c..c4a284b738e71 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -20,6 +20,7 @@
> # Sanitizer runtimes are unavailable and cannot be linked for early boot code.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
> +KMSAN_SANITIZE := n
> OBJECT_FILES_NON_STANDARD := y
>
> # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index a2dddcc189f69..f2a175d872b07 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -11,6 +11,9 @@ include $(srctree)/lib/vdso/Makefile
>
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> +KMSAN_SANITIZE_vclock_gettime.o := n
> +KMSAN_SANITIZE_vgetcpu.o := n
> +
> UBSAN_SANITIZE := n
> KCSAN_SANITIZE := n
> OBJECT_FILES_NON_STANDARD := y
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 2ff3e600f4269..0b9fc3ecce2de 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -35,6 +35,8 @@ KASAN_SANITIZE_cc_platform.o := n
> # With some compiler versions the generated code results in boot hangs, caused
> # by several compilation units. To be safe, disable all instrumentation.
> KCSAN_SANITIZE := n
> +KMSAN_SANITIZE_head$(BITS).o := n
> +KMSAN_SANITIZE_nmi.o := n
>
> OBJECT_FILES_NON_STANDARD_test_nx.o := y
>
> diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
> index 9661e3e802be5..f10a921ee7565 100644
> --- a/arch/x86/kernel/cpu/Makefile
> +++ b/arch/x86/kernel/cpu/Makefile
> @@ -12,6 +12,7 @@ endif
> # If these files are instrumented, boot hangs during the first second.
> KCOV_INSTRUMENT_common.o := n
> KCOV_INSTRUMENT_perf_event.o := n
> +KMSAN_SANITIZE_common.o := n
>
> # As above, instrumenting secondary CPU boot code causes boot hangs.
> KCSAN_SANITIZE_common.o := n
> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
> index 5864219221ca8..747d4630d52ce 100644
> --- a/arch/x86/mm/Makefile
> +++ b/arch/x86/mm/Makefile
> @@ -10,6 +10,8 @@ KASAN_SANITIZE_mem_encrypt_identity.o := n
> # Disable KCSAN entirely, because otherwise we get warnings that some functions
> # reference __initdata sections.
> KCSAN_SANITIZE := n
> +# Avoid recursion by not calling KMSAN hooks for CEA code.
> +KMSAN_SANITIZE_cpu_entry_area.o := n
>
> ifdef CONFIG_FUNCTION_TRACER
> CFLAGS_REMOVE_mem_encrypt.o = -pg
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 83f1b6a56449f..f614009d3e4e2 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -10,6 +10,7 @@
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
> +KMSAN_SANITIZE := n
> OBJECT_FILES_NON_STANDARD := y
>
> # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index d0537573501e9..81432d0c904b1 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -46,6 +46,7 @@ GCOV_PROFILE := n
> # Sanitizer runtimes are unavailable and cannot be linked here.
> KASAN_SANITIZE := n
> KCSAN_SANITIZE := n
> +KMSAN_SANITIZE := n
> UBSAN_SANITIZE := n
> OBJECT_FILES_NON_STANDARD := y
>
> diff --git a/kernel/Makefile b/kernel/Makefile
> index 186c49582f45b..e5dd600e63d8a 100644
> --- a/kernel/Makefile
> +++ b/kernel/Makefile
> @@ -39,6 +39,7 @@ KCOV_INSTRUMENT_kcov.o := n
> KASAN_SANITIZE_kcov.o := n
> KCSAN_SANITIZE_kcov.o := n
> UBSAN_SANITIZE_kcov.o := n
> +KMSAN_SANITIZE_kcov.o := n
> CFLAGS_kcov.o := $(call cc-option, -fno-conserve-stack) -fno-stack-protector
>
> # Don't instrument error handlers
> diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
> index d51cabf28f382..ea925731fa40f 100644
> --- a/kernel/locking/Makefile
> +++ b/kernel/locking/Makefile
> @@ -5,8 +5,9 @@ KCOV_INSTRUMENT := n
>
> obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o
>
> -# Avoid recursion lockdep -> KCSAN -> ... -> lockdep.
> +# Avoid recursion lockdep -> sanitizer -> ... -> lockdep.
> KCSAN_SANITIZE_lockdep.o := n
> +KMSAN_SANITIZE_lockdep.o := n
>
> ifdef CONFIG_FUNCTION_TRACER
> CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE)
> diff --git a/lib/Makefile b/lib/Makefile
> index 364c23f155781..8e5ae9d5966de 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -268,6 +268,7 @@ obj-$(CONFIG_IRQ_POLL) += irq_poll.o
> CFLAGS_stackdepot.o += -fno-builtin
> obj-$(CONFIG_STACKDEPOT) += stackdepot.o
> KASAN_SANITIZE_stackdepot.o := n
> +KMSAN_SANITIZE_stackdepot.o := n
> KCOV_INSTRUMENT_stackdepot.o := n
>
> libfdt_files = fdt.o fdt_ro.o fdt_wip.o fdt_rw.o fdt_sw.o fdt_strerror.o \
> --
> 2.34.1.173.g76aa8bc2d0-goog
>

2021-12-15 14:13:20

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 25/43] kmsan: skip shadow checks in files doing context switches

On Tue, Dec 14, 2021 at 05:20:32PM +0100, Alexander Potapenko wrote:
> When instrumenting functions, KMSAN obtains the per-task state (mostly
> pointers to metadata for function arguments and return values) once per
> function at its beginning.

How does KMSAN instrumentation acquire the per-task state? What's used as the
base for that?

> If a function performs a context switch, instrumented code won't notice
> that, and will still refer to the old state, possibly corrupting it or
> using stale data. This may result in false positive reports.
>
> To deal with that, we need to apply __no_kmsan_checks to the functions
> performing context switching - this will result in skipping all KMSAN
> shadow checks and marking newly created values as initialized,
> preventing all false positive reports in those functions. False negatives
> are still possible, but we expect them to be rare and impersistent.
>
> To improve maintainability, we choose to apply __no_kmsan_checks not
> just to a handful of functions, but to the whole files that may perform
> context switching - this is done via KMSAN_ENABLE_CHECKS:=n.
> This decision can be reconsidered in the future, when KMSAN won't need
> so much attention.

I worry this might be the wrong approach (and I've given some rationale below),
but it's not clear to me exactly how this goes wrong. Could you give an example
flow where stale data gets used?

>
> Suggested-by: Marco Elver <[email protected]>
> Signed-off-by: Alexander Potapenko <[email protected]>
> ---
> Link: https://linux-review.googlesource.com/id/Id40563d36792b4482534c9a0134965d77a5581fa
> ---
> arch/x86/kernel/Makefile | 4 ++++
> kernel/sched/Makefile | 4 ++++
> 2 files changed, 8 insertions(+)
>
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 0b9fc3ecce2de..308d4d0323263 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -38,6 +38,10 @@ KCSAN_SANITIZE := n
> KMSAN_SANITIZE_head$(BITS).o := n
> KMSAN_SANITIZE_nmi.o := n
>
> +# Some functions in process_64.c perform context switching.
> +# Apply __no_kmsan_checks to the whole file to avoid false positives.
> +KMSAN_ENABLE_CHECKS_process_64.o := n

Which state are you worried about here? The __switch_to() function is
tail-called from __switch_to_asm(), so the GPRs and SP should all be consistent
with the new task.

Are you concerned with the segment registers? Something else?

> +
> OBJECT_FILES_NON_STANDARD_test_nx.o := y
>
> ifdef CONFIG_FRAME_POINTER
> diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
> index c7421f2d05e15..d9bf8223a064a 100644
> --- a/kernel/sched/Makefile
> +++ b/kernel/sched/Makefile
> @@ -17,6 +17,10 @@ KCOV_INSTRUMENT := n
> # eventually.
> KCSAN_SANITIZE := n
>
> +# Some functions in core.c perform context switching. Apply __no_kmsan_checks
> +# to the whole file to avoid false positives.
> +KMSAN_ENABLE_CHECKS_core.o := n

As above, the actual context-switch occurs in arch code --I assume the
out-of-line call *must* act as a clobber from the instrumentation's PoV or we'd
have many more problems. I also didn't spot any *explicit* state switching
being added there that would seem to affect KMSAN.

... so I don't understand why checks need to be inhibited for the core sched code.

Thanks
Mark.

> +
> ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
> # According to Alan Modra <[email protected]>, the -fno-omit-frame-pointer is
> # needed for x86 only. Why this used to be enabled for all architectures is beyond
> --
> 2.34.1.173.g76aa8bc2d0-goog
>

2021-12-15 14:17:21

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 33/43] kmsan: disable physical page merging in biovec

On Tue, Dec 14, 2021 at 05:20:40PM +0100, Alexander Potapenko wrote:
> KMSAN metadata for consequent physical pages may be inconsequent,

I think you mean 'adjacent'/ rather than 'consequent' here, i.e.

| KMSAN metadata for adjacent physical pages may not be adjacent

> therefore accessing such pages together may lead to metadata
> corruption.
> We disable merging pages in biovec to prevent such corruptions.
>
> Signed-off-by: Alexander Potapenko <[email protected]>
> ---
>
> Link: https://linux-review.googlesource.com/id/Iece16041be5ee47904fbc98121b105e5be5fea5c
> ---
> block/blk.h | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/block/blk.h b/block/blk.h
> index ccde6e6f17360..e0c62a5d5639e 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -103,6 +103,13 @@ static inline bool biovec_phys_mergeable(struct request_queue *q,
> phys_addr_t addr1 = page_to_phys(vec1->bv_page) + vec1->bv_offset;
> phys_addr_t addr2 = page_to_phys(vec2->bv_page) + vec2->bv_offset;
>
> + /*
> + * Merging consequent physical pages may not work correctly under KMSAN
> + * if their metadata pages aren't consequent. Just disable merging.
> + */

Likewise here.

Mark.

> + if (IS_ENABLED(CONFIG_KMSAN))
> + return false;
> +
> if (addr1 + vec1->bv_len != addr2)
> return false;
> if (xen_domain() && !xen_biovec_phys_mergeable(vec1, vec2->bv_page))
> --
> 2.34.1.173.g76aa8bc2d0-goog
>

2021-12-15 14:43:55

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 12/43] kcsan: clang: retire CONFIG_KCSAN_KCOV_BROKEN

On Wed, Dec 15, 2021 at 02:39:43PM +0100, Marco Elver wrote:
> On Wed, 15 Dec 2021 at 14:33, Mark Rutland <[email protected]> wrote:
> >
> > On Tue, Dec 14, 2021 at 05:20:19PM +0100, Alexander Potapenko wrote:
> > > kcov used to be broken prior to Clang 11, but right now that version is
> > > already the minimum required to build with KCSAN, that is why we don't
> > > need KCSAN_KCOV_BROKEN anymore.
> >
> > Just to check, how is that requirement enforced?
>
> HAVE_KCSAN_COMPILER will only be true with Clang 11 or later, due to
> no prior compiler having "-tsan-distinguish-volatile=1".

I see -- could we add wording to that effect into the commit messge?

> > I see the core Makefiles enforce 10.0.1+, but I couldn't spot an explicit
> > version dependency in Kconfig.kcsan.
> >
> > Otherwise, this looks good to me!
>
> I think 5.17 will be Clang 11 only, so we could actually revert
> ea91a1d45d19469001a4955583187b0d75915759:
> https://lkml.kernel.org/r/Yao86FeC2ybOobLO@archlinux-ax161
>
> I should resend that to be added to the -kbuild tree.

FWIW, that also works for me.

Thanks,
Mark.

2021-12-15 16:29:02

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH 25/43] kmsan: skip shadow checks in files doing context switches

On Wed, Dec 15, 2021 at 3:13 PM Mark Rutland <[email protected]> wrote:
>
> On Tue, Dec 14, 2021 at 05:20:32PM +0100, Alexander Potapenko wrote:
> > When instrumenting functions, KMSAN obtains the per-task state (mostly
> > pointers to metadata for function arguments and return values) once per
> > function at its beginning.
>
> How does KMSAN instrumentation acquire the per-task state? What's used as the
> base for that?
>

To preserve kernel ABI (so that instrumented functions can call
non-instrumented ones and vice versa) KMSAN uses a per-task struct
that keeps shadow values of function call parameters and return
values:

struct kmsan_context_state {
char param_tls[...];
char retval_tls[...];
char va_arg_tls[...];
char va_arg_origin_tls[...];
u64 va_arg_overflow_size_tls;
depot_stack_handle_t param_origin_tls[...];
depot_stack_handle_t retval_origin_tls;
};

It is mostly dealt with by the compiler, so its layout isn't really important.
The compiler inserts a call to __msan_get_context_state() at the
beginning of every instrumented function to obtain a pointer to that
struct.
Then, every time a function pointer is used, a value is returned, or
another function is called, the compiler adds code that updates the
shadow values in this struct.

E.g. the following function:

int sum(int a, int b) {
...
result = a + b;
return result;
}

will now look as follows:

int sum(int a, int b) {
struct kmsan_context_state *kcs = __msan_get_context_state();
int s_a = ((int *)kcs->param_tls)[0]; // shadow of a
int s_b = ((int *)kcs->param_tls)[1]; // shadow of b
...
result = a + b;
s_result = s_a | s_b;
((int *)kcs->retval_tls)[0] = s_result; // returned shadow
return result;
}

> >
> > To deal with that, we need to apply __no_kmsan_checks to the functions
> > performing context switching - this will result in skipping all KMSAN
> > shadow checks and marking newly created values as initialized,
> > preventing all false positive reports in those functions. False negatives
> > are still possible, but we expect them to be rare and impersistent.
> >
> > To improve maintainability, we choose to apply __no_kmsan_checks not
> > just to a handful of functions, but to the whole files that may perform
> > context switching - this is done via KMSAN_ENABLE_CHECKS:=n.
> > This decision can be reconsidered in the future, when KMSAN won't need
> > so much attention.
>
> I worry this might be the wrong approach (and I've given some rationale below),
> but it's not clear to me exactly how this goes wrong. Could you give an example
> flow where stale data gets used?

The scheme I described above works well until a context switch occurs.
Then, IIUC, at some point `current` changes, so that the previously
fetched KMSAN context state becomes stale:

void foo(...) {
baz(...);
// context switch here changes `current`
baz(...);
}

In this case we'll have foo() setting up kmsan_context_state for the
old task when calling baz(), but baz() taking shadow for its arguments
from the new task's kmsan_context_state.

Does this make sense?

> As above, the actual context-switch occurs in arch code --I assume the
> out-of-line call *must* act as a clobber from the instrumentation's PoV or we'd
> have many more problems.

Treating a function call as a clobber of kmsan_context_state() is
actually an interesting idea.
Adding yet another call to __msan_get_context_state() after every
function call may sound harsh, but we already instrument every memory
access anyway.
What remains unclear is handling the return value of the innermost
function that performed the switch: it will be saved to the old task's
state, but taken from that of the new task.

> I also didn't spot any *explicit* state switching
> being added there that would seem to affect KMSAN.
>
> ... so I don't understand why checks need to be inhibited for the core sched code.

In fact for a long time there were only three functions annotated with
__no_kmsan_checks right in arch/x86/kernel/process_64.c and
kernel/sched/core.c
We decided to apply this attribute to every function in both files,
just to make sure nothing breaks too early while upstreaming KMSAN.

2021-12-15 16:30:56

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH 33/43] kmsan: disable physical page merging in biovec

On Wed, Dec 15, 2021 at 3:17 PM Mark Rutland <[email protected]> wrote:
>
> On Tue, Dec 14, 2021 at 05:20:40PM +0100, Alexander Potapenko wrote:
> > KMSAN metadata for consequent physical pages may be inconsequent,
>
> I think you mean 'adjacent'/ rather than 'consequent' here, i.e.
Correct, thank you!

> > + /*
> > + * Merging consequent physical pages may not work correctly under KMSAN
> > + * if their metadata pages aren't consequent. Just disable merging.
> > + */
>
> Likewise here.
Ack.

2021-12-15 17:23:40

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 25/43] kmsan: skip shadow checks in files doing context switches

On Wed, Dec 15, 2021 at 05:28:21PM +0100, Alexander Potapenko wrote:
> On Wed, Dec 15, 2021 at 3:13 PM Mark Rutland <[email protected]> wrote:
> >
> > On Tue, Dec 14, 2021 at 05:20:32PM +0100, Alexander Potapenko wrote:
> > > When instrumenting functions, KMSAN obtains the per-task state (mostly
> > > pointers to metadata for function arguments and return values) once per
> > > function at its beginning.
> >
> > How does KMSAN instrumentation acquire the per-task state? What's used as the
> > base for that?
> >
>
> To preserve kernel ABI (so that instrumented functions can call
> non-instrumented ones and vice versa) KMSAN uses a per-task struct
> that keeps shadow values of function call parameters and return
> values:
>
> struct kmsan_context_state {
> char param_tls[...];
> char retval_tls[...];
> char va_arg_tls[...];
> char va_arg_origin_tls[...];
> u64 va_arg_overflow_size_tls;
> depot_stack_handle_t param_origin_tls[...];
> depot_stack_handle_t retval_origin_tls;
> };
>
> It is mostly dealt with by the compiler, so its layout isn't really important.
> The compiler inserts a call to __msan_get_context_state() at the
> beginning of every instrumented function to obtain a pointer to that
> struct.
> Then, every time a function pointer is used, a value is returned, or
> another function is called, the compiler adds code that updates the
> shadow values in this struct.
>
> E.g. the following function:
>
> int sum(int a, int b) {
> ...
> result = a + b;
> return result;
> }
>
> will now look as follows:
>
> int sum(int a, int b) {
> struct kmsan_context_state *kcs = __msan_get_context_state();
> int s_a = ((int *)kcs->param_tls)[0]; // shadow of a
> int s_b = ((int *)kcs->param_tls)[1]; // shadow of b
> ...
> result = a + b;
> s_result = s_a | s_b;
> ((int *)kcs->retval_tls)[0] = s_result; // returned shadow
> return result;
> }

Ok; thanks for that description, that makes it much easier to understand where
there could be problems.

> > > To deal with that, we need to apply __no_kmsan_checks to the functions
> > > performing context switching - this will result in skipping all KMSAN
> > > shadow checks and marking newly created values as initialized,
> > > preventing all false positive reports in those functions. False negatives
> > > are still possible, but we expect them to be rare and impersistent.
> > >
> > > To improve maintainability, we choose to apply __no_kmsan_checks not
> > > just to a handful of functions, but to the whole files that may perform
> > > context switching - this is done via KMSAN_ENABLE_CHECKS:=n.
> > > This decision can be reconsidered in the future, when KMSAN won't need
> > > so much attention.
> >
> > I worry this might be the wrong approach (and I've given some rationale below),
> > but it's not clear to me exactly how this goes wrong. Could you give an example
> > flow where stale data gets used?
>
> The scheme I described above works well until a context switch occurs.
> Then, IIUC, at some point `current` changes, so that the previously
> fetched KMSAN context state becomes stale:
>
> void foo(...) {
> baz(...);
> // context switch here changes `current`
> baz(...);
> }
>
> In this case we'll have foo() setting up kmsan_context_state for the
> old task when calling baz(), but baz() taking shadow for its arguments
> from the new task's kmsan_context_state.
>
> Does this make sense?

I understand where you're coming from, but I think this affects less code than
you think it does, due to the way the switch works.

Importantly, the value of `current` only changes within low-level arch code.
From the PoV of the core scheduler code, `current` never changes. For example,
from the PoV of a single thread, thread_a, calling into the scheduler's
context-switch:

context_switch(rq, thread_a, thread_b, rf)
{
...
/* `current` is `thread_a` here */
// call blocks for an indefinite period while another thread runs
switch_to(thread_a, thread_b, thread_a);
/* `current` is `thread_a` here */
...
}

You're correct that on x86 `current` does change within the __switch_to()
function, since `current` is implemented as a per-cpu variable called
`current_task`, updated within __switch_to().

So not instrumenting arch/x86/kernel/process_64.c might be necessary, but I
don't see any reason to avoid instrumenting kernel/sched/core.c, since `current`
should never change from the PoV of code that lives there.

For contrast, on arm64 we place `current` within the SP_EL0 register, and
switch that in our cpu_switch_to() assembly along with the GPRs and stack, so
it never changes from the PoV of any C code.

It might make sense to have x86 do likewise and update `current_task` in asm,
or to split the raw context-switch code out into a separate file, since it
should probably all be noinstr anyway...

> > As above, the actual context-switch occurs in arch code --I assume the
> > out-of-line call *must* act as a clobber from the instrumentation's PoV or we'd
> > have many more problems.
>
> Treating a function call as a clobber of kmsan_context_state() is
> actually an interesting idea.
> Adding yet another call to __msan_get_context_state() after every
> function call may sound harsh, but we already instrument every memory
> access anyway.

As above, I don't think that clobbering is necessary after all; you only need
to ensure the function which performs the switch and whatever it calls are not
instrumented.

> What remains unclear is handling the return value of the innermost
> function that performed the switch: it will be saved to the old task's
> state, but taken from that of the new task.

As above, I think you just need to protect x86-64's __switch_to() and callees,
and perhaps wherever this is first initialized for a CPU.

> > I also didn't spot any *explicit* state switching
> > being added there that would seem to affect KMSAN.
> >
> > ... so I don't understand why checks need to be inhibited for the core sched code.
>
> In fact for a long time there were only three functions annotated with
> __no_kmsan_checks right in arch/x86/kernel/process_64.c and
> kernel/sched/core.c
> We decided to apply this attribute to every function in both files,
> just to make sure nothing breaks too early while upstreaming KMSAN.

I appreciate that rationale, but I think it would be better to be precise for
now; otherwise it'll be much harder to remove the limitation in future as we
won't know what we're actually protecting. Being precise also means the other
code in those files will benefit from KMSAN.

Thanks,
Mark.

2021-12-16 10:13:33

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH 00/43] Add KernelMemorySanitizer infrastructure

On Tue, Dec 14, 2021 at 5:36 PM Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Tue, Dec 14, 2021 at 05:20:07PM +0100, Alexander Potapenko wrote:
> > KernelMemorySanitizer (KMSAN) is a detector of errors related to uses of
> > uninitialized memory. It relies on compile-time Clang instrumentation
> > (similar to MSan in the userspace [1]) and tracks the state of every bit
> > of kernel memory, being able to report an error if uninitialized value is
> > used in a condition, dereferenced, or escapes to userspace, USB or DMA.
>
> Why is USB unique here?

syzkaller just happens to be good at fuzzing USB drivers, so it was
fairly easy to implement and test USB support for KMSAN.
This should give the maintainers of other buses an idea of how this
could be done :)

> What about serial data? i2c? spi? w1? We
> have a lot of different I/O bus types :)

We hope to cover those after KMSAN hits upstream.

>
> And how is DMA checked given that the kernel shouldn't be seeing dma
> memory?

Before writing a buffer to DMA, that buffer's contents are checked by
KMSAN. If there are uninitialized bytes, those will be reported.
After reading a buffer from DMA, it is marked as initialized to avoid
false positives.
We do not track DMA memory itself.
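
To give a rough idea, in terms of the public kmsan-checks.h API the DMA
handling boils down to something like the following (a simplified sketch,
not the actual hook implementation):

  /* Before a buffer is handed to the device for reading: report any
   * uninitialized bytes in it.
   */
  kmsan_check_memory(buf, len);

  /* After the device may have written into the buffer: mark it as
   * initialized so later reads don't produce false positives.
   */
  kmsan_unpoison_memory(buf, len);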

> thanks,
>
> greg k-h



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

2021-12-16 10:34:39

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH 13/43] kmsan: add KMSAN runtime core

On Tue, Dec 14, 2021 at 5:34 PM Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Tue, Dec 14, 2021 at 05:20:20PM +0100, Alexander Potapenko wrote:
> > This patch adds the core parts of KMSAN runtime and associated files:
> >
> > - include/linux/kmsan-checks.h: user API to poison/unpoison/check
> > the kernel memory;
> > - include/linux/kmsan.h: declarations of KMSAN hooks to be referenced
> > outside of KMSAN runtime;
> > - lib/Kconfig.kmsan: CONFIG_KMSAN and related declarations;
> > - Makefile, mm/Makefile, mm/kmsan/Makefile: boilerplate Makefile code;
> > - mm/kmsan/annotations.c: non-inlineable implementation of KMSAN_INIT();
> > - mm/kmsan/core.c: core functions that operate with shadow and origin
> > memory and perform checks, utility functions;
> > - mm/kmsan/hooks.c: KMSAN hooks for kernel subsystems;
> > - mm/kmsan/init.c: KMSAN initialization routines;
> > - mm/kmsan/instrumentation.c: functions called by KMSAN instrumentation;
> > - mm/kmsan/kmsan.h: internal KMSAN declarations;
> > - mm/kmsan/shadow.c: routines that encapsulate metadata creation and
> > addressing;
> > - scripts/Makefile.kmsan: CFLAGS_KMSAN
> > - scripts/Makefile.lib: KMSAN_SANITIZE and KMSAN_ENABLE_CHECKS macros
>
>
> That's an odd way to write a changelog, don't you think?

Agreed. I'll try to concentrate on the functionality instead. Sorry about that.

> You need to describe what you are doing here and why you are doing it.
> Not a list of file names, we can see that in the diffstat.
>
> Also, you don't mention you are doing USB stuff here at all. And why
> are you doing it here? That should be added in a later patch.

You are right, USB is a good example of a stand-alone feature that can
be moved to a separate patch.

> Break this up into smaller, logical, pieces that add the infrastructure
> and build on it. Don't just chop your patches up on a logical-file
> boundry, as you are adding stuff in this patch that you do not need for
> many more later on, which means it was not needed here.

Just to make sure I don't misunderstand - for example for "kmsan: mm:
call KMSAN hooks from SLUB code", would it be better to pull the code
in mm/kmsan/core.c implementing kmsan_slab_alloc() and
kmsan_slab_free() into that patch?
I thought maintainers would prefer to have patches to their code
separated from KMSAN code, but if it's not true, I can surely fix
that.

> thanks,
>
> greg k-h



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

2021-12-17 16:22:37

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 13/43] kmsan: add KMSAN runtime core

On Thu, Dec 16, 2021 at 11:33:56AM +0100, Alexander Potapenko wrote:
> On Tue, Dec 14, 2021 at 5:34 PM Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > On Tue, Dec 14, 2021 at 05:20:20PM +0100, Alexander Potapenko wrote:
> > > This patch adds the core parts of KMSAN runtime and associated files:
> > >
> > > - include/linux/kmsan-checks.h: user API to poison/unpoison/check
> > > the kernel memory;
> > > - include/linux/kmsan.h: declarations of KMSAN hooks to be referenced
> > > outside of KMSAN runtime;
> > > - lib/Kconfig.kmsan: CONFIG_KMSAN and related declarations;
> > > - Makefile, mm/Makefile, mm/kmsan/Makefile: boilerplate Makefile code;
> > > - mm/kmsan/annotations.c: non-inlineable implementation of KMSAN_INIT();
> > > - mm/kmsan/core.c: core functions that operate with shadow and origin
> > > memory and perform checks, utility functions;
> > > - mm/kmsan/hooks.c: KMSAN hooks for kernel subsystems;
> > > - mm/kmsan/init.c: KMSAN initialization routines;
> > > - mm/kmsan/instrumentation.c: functions called by KMSAN instrumentation;
> > > - mm/kmsan/kmsan.h: internal KMSAN declarations;
> > > - mm/kmsan/shadow.c: routines that encapsulate metadata creation and
> > > addressing;
> > > - scripts/Makefile.kmsan: CFLAGS_KMSAN
> > > - scripts/Makefile.lib: KMSAN_SANITIZE and KMSAN_ENABLE_CHECKS macros
> >
> >
> > That's an odd way to write a changelog, don't you think?
>
> Agreed. I'll try to concentrate on the functionality instead. Sorry about that.
>
> > You need to describe what you are doing here and why you are doing it.
> > Not a list of file names, we can see that in the diffstat.
> >
> > Also, you don't mention you are doing USB stuff here at all. And why
> > are you doing it here? That should be added in a later patch.
>
> You are right, USB is a good example of a stand-alone feature that can
> be moved to a separate patch.
>
> > Break this up into smaller, logical, pieces that add the infrastructure
> > and build on it. Don't just chop your patches up on a logical-file
> > boundry, as you are adding stuff in this patch that you do not need for
> > many more later on, which means it was not needed here.
>
> Just to make sure I don't misunderstand - for example for "kmsan: mm:
> call KMSAN hooks from SLUB code", would it be better to pull the code
> in mm/kmsan/core.c implementing kmsan_slab_alloc() and
> kmsan_slab_free() into that patch?

Yes.

> I thought maintainers would prefer to have patches to their code
> separated from KMSAN code, but if it's not true, I can surely fix
> that.

As a maintainer, I want to know what the function call that you just
added to my subsystem actually does. Wouldn't you? Put it all in the
same patch.

Think about submitting a patch series as telling a story. You need to
show the progression forward of the feature so that everyone can
understand what is going on. Just throwing tiny snippets at us makes it
impossible to follow along with what your goal is.

You want reviewers to be able to easily see if the things you describe
being done in the changelog actually are implemented in the diff.
Dividing stuff up by files does not show that at all.

thanks,

greg k-h

2021-12-17 21:52:05

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 39/43] x86: kmsan: handle register passing from uninstrumented code

Alexander,

On Tue, Dec 14 2021 at 17:20, Alexander Potapenko wrote:
> When calling KMSAN-instrumented functions from non-instrumented
> functions, function parameters may not be initialized properly, leading
> to false positive reports. In particular, this happens all the time when
> calling interrupt handlers from `noinstr` IDT entries.
>
> Fortunately, x86 code has instrumentation_begin() and

It's not only x86 code:
> kernel/entry/common.c | 3 +++

> @@ -76,6 +77,7 @@ __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
> nr = syscall_enter_from_user_mode(regs, nr);
>
> instrumentation_begin();
> + kmsan_instrumentation_begin(regs);

Can we please make this something like:

instrumentation_begin_at_entry(regs);

or some other sensible name which hides that kmsan gunk and avoids
touching all of this again when KFOOSAN comes around?

Thanks,

tglx




2021-12-20 14:35:53

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH 39/43] x86: kmsan: handle register passing from uninstrumented code

On Fri, Dec 17, 2021 at 10:51 PM Thomas Gleixner <[email protected]> wrote:
>
> Alexander,
>
> On Tue, Dec 14 2021 at 17:20, Alexander Potapenko wrote:
> > When calling KMSAN-instrumented functions from non-instrumented
> > functions, function parameters may not be initialized properly, leading
> > to false positive reports. In particular, this happens all the time when
> > calling interrupt handlers from `noinstr` IDT entries.
> >
> > Fortunately, x86 code has instrumentation_begin() and
>
> It's not only x86 code:
> > kernel/entry/common.c | 3 +++

Shall this bit go into a separate patch?

> > @@ -76,6 +77,7 @@ __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
> > nr = syscall_enter_from_user_mode(regs, nr);
> >
> > instrumentation_begin();
> > + kmsan_instrumentation_begin(regs);
>
> Can we please make this something like:
>
> instrumentation_begin_at_entry(regs);

Fine, will do.
Do you think it would make sense to hide it inside
instrumentation_begin(), or is it ok to have both macros follow each
other?
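
E.g. something along these lines (just a sketch, name and placement up to
you; instrumentation_begin() stays as-is):

/* Hypothetical wrapper; would live next to instrumentation_begin(). */
#define instrumentation_begin_at_entry(regs)		\
	do {						\
		instrumentation_begin();		\
		kmsan_instrumentation_begin(regs);	\
	} while (0)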

> or some other sensible name which hides that kmsan gunk and avoids
> touching all of this again when KFOOSAN comes around?
>
> Thanks,
>
> tglx
>
>
>


--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

2022-01-03 16:28:05

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH 13/43] kmsan: add KMSAN runtime core

On Tue, 14 Dec 2021 at 17:22, Alexander Potapenko <[email protected]> wrote:
>
> This patch adds the core parts of KMSAN runtime and associated files:
>
> - include/linux/kmsan-checks.h: user API to poison/unpoison/check
> the kernel memory;
> - include/linux/kmsan.h: declarations of KMSAN hooks to be referenced
> outside of KMSAN runtime;
> - lib/Kconfig.kmsan: CONFIG_KMSAN and related declarations;
> - Makefile, mm/Makefile, mm/kmsan/Makefile: boilerplate Makefile code;
> - mm/kmsan/annotations.c: non-inlineable implementation of KMSAN_INIT();
> - mm/kmsan/core.c: core functions that operate with shadow and origin
> memory and perform checks, utility functions;
> - mm/kmsan/hooks.c: KMSAN hooks for kernel subsystems;
> - mm/kmsan/init.c: KMSAN initialization routines;
> - mm/kmsan/instrumentation.c: functions called by KMSAN instrumentation;
> - mm/kmsan/kmsan.h: internal KMSAN declarations;
> - mm/kmsan/shadow.c: routines that encapsulate metadata creation and
> addressing;
> - scripts/Makefile.kmsan: CFLAGS_KMSAN
> - scripts/Makefile.lib: KMSAN_SANITIZE and KMSAN_ENABLE_CHECKS macros
>
> The patch also adds the necessary bookkeeping bits to struct page and
> struct task_struct:
> - each struct page now contains pointers to two struct pages holding
> KMSAN metadata (shadow and origins) for the original struct page;
> - each task_struct contains a struct kmsan_task_state used to track
> the metadata of function parameters and return values for that task.
>
> Signed-off-by: Alexander Potapenko <[email protected]>
> ---
> Link: https://linux-review.googlesource.com/id/I9b71bfe3425466c97159f9de0062e5e8e4fec866
> ---
> Makefile | 1 +
> include/linux/kmsan-checks.h | 123 ++++++++++
> include/linux/kmsan.h | 365 ++++++++++++++++++++++++++++++
> include/linux/mm_types.h | 12 +
> include/linux/sched.h | 5 +
> lib/Kconfig.debug | 1 +
> lib/Kconfig.kmsan | 18 ++
> mm/Makefile | 1 +
> mm/kmsan/Makefile | 22 ++
> mm/kmsan/annotations.c | 28 +++
> mm/kmsan/core.c | 427 +++++++++++++++++++++++++++++++++++
> mm/kmsan/hooks.c | 400 ++++++++++++++++++++++++++++++++
> mm/kmsan/init.c | 238 +++++++++++++++++++
> mm/kmsan/instrumentation.c | 233 +++++++++++++++++++
> mm/kmsan/kmsan.h | 197 ++++++++++++++++
> mm/kmsan/report.c | 210 +++++++++++++++++
> mm/kmsan/shadow.c | 332 +++++++++++++++++++++++++++
> scripts/Makefile.kmsan | 1 +
> scripts/Makefile.lib | 9 +
> 19 files changed, 2623 insertions(+)
> create mode 100644 include/linux/kmsan-checks.h
> create mode 100644 include/linux/kmsan.h
> create mode 100644 lib/Kconfig.kmsan
> create mode 100644 mm/kmsan/Makefile
> create mode 100644 mm/kmsan/annotations.c
> create mode 100644 mm/kmsan/core.c
> create mode 100644 mm/kmsan/hooks.c
> create mode 100644 mm/kmsan/init.c
> create mode 100644 mm/kmsan/instrumentation.c
> create mode 100644 mm/kmsan/kmsan.h
> create mode 100644 mm/kmsan/report.c
> create mode 100644 mm/kmsan/shadow.c
> create mode 100644 scripts/Makefile.kmsan
>
> diff --git a/Makefile b/Makefile
> index 765115c99655f..7af3edfb2d0de 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1012,6 +1012,7 @@ include-y := scripts/Makefile.extrawarn
> include-$(CONFIG_DEBUG_INFO) += scripts/Makefile.debug
> include-$(CONFIG_KASAN) += scripts/Makefile.kasan
> include-$(CONFIG_KCSAN) += scripts/Makefile.kcsan
> +include-$(CONFIG_KMSAN) += scripts/Makefile.kmsan
> include-$(CONFIG_UBSAN) += scripts/Makefile.ubsan
> include-$(CONFIG_KCOV) += scripts/Makefile.kcov
> include-$(CONFIG_GCC_PLUGINS) += scripts/Makefile.gcc-plugins
> diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h
> new file mode 100644
> index 0000000000000..d41868c723d1e
> --- /dev/null
> +++ b/include/linux/kmsan-checks.h
> @@ -0,0 +1,123 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * KMSAN checks to be used for one-off annotations in subsystems.
> + *
> + * Copyright (C) 2017-2021 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +
> +#ifndef _LINUX_KMSAN_CHECKS_H
> +#define _LINUX_KMSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_KMSAN
> +
> +/*
> + * Helper functions that mark the return value initialized.
> + * See mm/kmsan/annotations.c.
> + */
> +u8 kmsan_init_1(u8 value);
> +u16 kmsan_init_2(u16 value);
> +u32 kmsan_init_4(u32 value);
> +u64 kmsan_init_8(u64 value);
> +
> +static inline void *kmsan_init_ptr(void *ptr)
> +{
> + return (void *)kmsan_init_8((u64)ptr);
> +}
> +
> +static inline char kmsan_init_char(char value)
> +{
> + return (u8)kmsan_init_1((u8)value);
> +}
> +
> +#define __decl_kmsan_init_type(type, fn) unsigned type : fn, signed type : fn
> +
> +/**
> + * kmsan_init - Make the value initialized.
> + * @val: 1-, 2-, 4- or 8-byte integer that may be treated as uninitialized by
> + * KMSAN.
> + *
> + * Return: value of @val that KMSAN treats as initialized.
> + */
> +#define kmsan_init(val) \
> + ( \
> + (typeof(val))(_Generic((val), \
> + __decl_kmsan_init_type(char, kmsan_init_1), \
> + __decl_kmsan_init_type(short, kmsan_init_2), \
> + __decl_kmsan_init_type(int, kmsan_init_4), \
> + __decl_kmsan_init_type(long, kmsan_init_8), \
> + char : kmsan_init_char, \
> + void * : kmsan_init_ptr)(val)))
> +
> +/**
> + * kmsan_poison_memory() - Mark the memory range as uninitialized.
> + * @address: address to start with.
> + * @size: size of buffer to poison.
> + * @flags: GFP flags for allocations done by this function.
> + *
> + * Until other data is written to this range, KMSAN will treat it as
> + * uninitialized. Error reports for this memory will reference the call site of
> + * kmsan_poison_memory() as origin.
> + */
> +void kmsan_poison_memory(const void *address, size_t size, gfp_t flags);
> +
> +/**
> + * kmsan_unpoison_memory() - Mark the memory range as initialized.
> + * @address: address to start with.
> + * @size: size of buffer to unpoison.
> + *
> + * Until other data is written to this range, KMSAN will treat it as
> + * initialized.
> + */
> +void kmsan_unpoison_memory(const void *address, size_t size);
> +
> +/**
> + * kmsan_check_memory() - Check the memory range for being initialized.
> + * @address: address to start with.
> + * @size: size of buffer to check.
> + *
> + * If any piece of the given range is marked as uninitialized, KMSAN will report
> + * an error.
> + */
> +void kmsan_check_memory(const void *address, size_t size);
> +
> +/**
> + * kmsan_copy_to_user() - Notify KMSAN about a data transfer to userspace.
> + * @to: destination address in the userspace.
> + * @from: source address in the kernel.
> + * @to_copy: number of bytes to copy.
> + * @left: number of bytes not copied.
> + *
> + * If this is a real userspace data transfer, KMSAN checks the bytes that were
> + * actually copied to ensure there was no information leak. If @to belongs to
> + * the kernel space (which is possible for compat syscalls), KMSAN just copies
> + * the metadata.
> + */
> +void kmsan_copy_to_user(const void *to, const void *from, size_t to_copy,
> + size_t left);
> +
> +#else
> +
> +#define kmsan_init(value) (value)
> +
> +static inline void kmsan_poison_memory(const void *address, size_t size,
> + gfp_t flags)
> +{
> +}
> +static inline void kmsan_unpoison_memory(const void *address, size_t size)
> +{
> +}
> +static inline void kmsan_check_memory(const void *address, size_t size)
> +{
> +}
> +static inline void kmsan_copy_to_user(const void *to, const void *from,
> + size_t to_copy, size_t left)
> +{
> +}
> +
> +#endif
> +
> +#endif /* _LINUX_KMSAN_CHECKS_H */
> diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
> new file mode 100644
> index 0000000000000..f17bc9ded7b97
> --- /dev/null
> +++ b/include/linux/kmsan.h
> @@ -0,0 +1,365 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * KMSAN API for subsystems.
> + *
> + * Copyright (C) 2017-2021 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +#ifndef _LINUX_KMSAN_H
> +#define _LINUX_KMSAN_H
> +
> +#include <linux/dma-direction.h>
> +#include <linux/gfp.h>
> +#include <linux/kmsan-checks.h>
> +#include <linux/stackdepot.h>
> +#include <linux/types.h>
> +#include <linux/vmalloc.h>
> +
> +struct page;
> +struct kmem_cache;
> +struct task_struct;
> +struct scatterlist;
> +struct urb;
> +
> +#ifdef CONFIG_KMSAN
> +
> +/* These constants are defined in the MSan LLVM instrumentation pass. */
> +#define KMSAN_RETVAL_SIZE 800
> +#define KMSAN_PARAM_SIZE 800
> +
> +struct kmsan_context_state {
> + char param_tls[KMSAN_PARAM_SIZE];
> + char retval_tls[KMSAN_RETVAL_SIZE];
> + char va_arg_tls[KMSAN_PARAM_SIZE];
> + char va_arg_origin_tls[KMSAN_PARAM_SIZE];
> + u64 va_arg_overflow_size_tls;
> + char param_origin_tls[KMSAN_PARAM_SIZE];
> + depot_stack_handle_t retval_origin_tls;
> +};
> +
> +#undef KMSAN_PARAM_SIZE
> +#undef KMSAN_RETVAL_SIZE
> +
> +struct kmsan_ctx {
> + struct kmsan_context_state cstate;
> + int kmsan_in_runtime;
> + bool allow_reporting;
> +};
> +
> +/**
> + * kmsan_init_shadow() - Initialize KMSAN shadow at boot time.
> + *
> + * Allocate and initialize KMSAN metadata for early allocations.
> + */
> +void __init kmsan_init_shadow(void);
> +
> +/**
> + * kmsan_init_runtime() - Initialize KMSAN state and enable KMSAN.
> + */
> +void __init kmsan_init_runtime(void);
> +
> +/**
> + * kmsan_memblock_free_pages() - handle freeing of memblock pages.
> + * @page: struct page to free.
> + * @order: order of @page.
> + *
> + * Freed pages are either returned to buddy allocator or held back to be used
> + * as metadata pages.
> + */
> +bool __init kmsan_memblock_free_pages(struct page *page, unsigned int order);
> +
> +/**
> + * kmsan_task_create() - Initialize KMSAN state for the task.
> + * @task: task to initialize.
> + */
> +void kmsan_task_create(struct task_struct *task);
> +
> +/**
> + * kmsan_task_exit() - Notify KMSAN that a task has exited.
> + * @task: task about to finish.
> + */
> +void kmsan_task_exit(struct task_struct *task);
> +
> +/**
> + * kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
> + * @page: struct page pointer returned by alloc_pages().
> + * @order: order of allocated struct page.
> + * @flags: GFP flags used by alloc_pages()
> + *
> + * KMSAN marks 1<<@order pages starting at @page as uninitialized, unless
> + * @flags contain __GFP_ZERO.
> + */
> +void kmsan_alloc_page(struct page *page, unsigned int order, gfp_t flags);
> +
> +/**
> + * kmsan_free_page() - Notify KMSAN about a free_pages() call.
> + * @page: struct page pointer passed to free_pages().
> + * @order: order of deallocated struct page.
> + *
> + * KMSAN marks freed memory as uninitialized.
> + */
> +void kmsan_free_page(struct page *page, unsigned int order);
> +
> +/**
> + * kmsan_copy_page_meta() - Copy KMSAN metadata between two pages.
> + * @dst: destination page.
> + * @src: source page.
> + *
> + * KMSAN copies the contents of metadata pages for @src into the metadata pages
> + * for @dst. If @dst has no associated metadata pages, nothing happens.
> + * If @src has no associated metadata pages, @dst metadata pages are unpoisoned.
> + */
> +void kmsan_copy_page_meta(struct page *dst, struct page *src);
> +
> +/**
> + * kmsan_gup_pgd_range() - Notify KMSAN about a gup_pgd_range() call.
> + * @pages: array of struct page pointers.
> + * @nr: array size.
> + *
> + * gup_pgd_range() creates new pages, some of which may belong to the userspace
> + * memory. In that case KMSAN marks them as initialized.
> + */
> +void kmsan_gup_pgd_range(struct page **pages, int nr);
> +
> +/**
> + * kmsan_slab_alloc() - Notify KMSAN about a slab allocation.
> + * @s: slab cache the object belongs to.
> + * @object: object pointer.
> + * @flags: GFP flags passed to the allocator.
> + *
> + * Depending on cache flags and GFP flags, KMSAN sets up the metadata of the
> + * newly created object, marking it as initialized or uninitialized.
> + */
> +void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags);
> +
> +/**
> + * kmsan_slab_free() - Notify KMSAN about a slab deallocation.
> + * @s: slab cache the object belongs to.
> + * @object: object pointer.
> + *
> + * KMSAN marks the freed object as uninitialized.
> + */
> +void kmsan_slab_free(struct kmem_cache *s, void *object);
> +
> +/**
> + * kmsan_kmalloc_large() - Notify KMSAN about a large slab allocation.
> + * @ptr: object pointer.
> + * @size: object size.
> + * @flags: GFP flags passed to the allocator.
> + *
> + * Similar to kmsan_slab_alloc(), but for large allocations.
> + */
> +void kmsan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
> +
> +/**
> + * kmsan_kfree_large() - Notify KMSAN about a large slab deallocation.
> + * @ptr: object pointer.
> + *
> + * Similar to kmsan_slab_free(), but for large allocations.
> + */
> +void kmsan_kfree_large(const void *ptr);
> +
> +/**
> + * kmsan_map_kernel_range_noflush() - Notify KMSAN about a vmap.
> + * @start: start of vmapped range.
> + * @end: end of vmapped range.
> + * @prot: page protection flags used for vmap.
> + * @pages: array of pages.
> + * @page_shift: page_shift passed to vmap_range_noflush().
> + *
> + * KMSAN maps shadow and origin pages of @pages into contiguous ranges in
> + * vmalloc metadata address range.
> + */
> +void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
> + pgprot_t prot, struct page **pages,
> + unsigned int page_shift);
> +
> +/**
> + * kmsan_vunmap_kernel_range_noflush() - Notify KMSAN about a vunmap.
> + * @start: start of vunmapped range.
> + * @end: end of vunmapped range.
> + *
> + * KMSAN unmaps the contiguous metadata ranges created by
> + * kmsan_map_kernel_range_noflush().
> + */
> +void kmsan_vunmap_range_noflush(unsigned long start, unsigned long end);
> +
> +/**
> + * kmsan_ioremap_page_range() - Notify KMSAN about a ioremap_page_range() call.
> + * @addr: range start.
> + * @end: range end.
> + * @phys_addr: physical range start.
> + * @prot: page protection flags used for ioremap_page_range().
> + * @page_shift: page_shift argument passed to vmap_range_noflush().
> + *
> + * KMSAN creates new metadata pages for the physical pages mapped into the
> + * virtual memory.
> + */
> +void kmsan_ioremap_page_range(unsigned long addr, unsigned long end,
> + phys_addr_t phys_addr, pgprot_t prot,
> + unsigned int page_shift);
> +
> +/**
> + * kmsan_iounmap_page_range() - Notify KMSAN about a iounmap_page_range() call.
> + * @start: range start.
> + * @end: range end.
> + *
> + * KMSAN unmaps the metadata pages for the given range and, unlike for
> + * vunmap_page_range(), also deallocates them.
> + */
> +void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
> +
> +/**
> + * kmsan_handle_dma() - Handle a DMA data transfer.
> + * @page: first page of the buffer.
> + * @offset: offset of the buffer within the first page.
> + * @size: buffer size.
> + * @dir: one of possible dma_data_direction values.
> + *
> + * Depending on @direction, KMSAN:
> + * * checks the buffer, if it is copied to device;
> + * * initializes the buffer, if it is copied from device;
> + * * does both, if this is a DMA_BIDIRECTIONAL transfer.
> + */
> +void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
> + enum dma_data_direction dir);
> +
> +/**
> + * kmsan_handle_dma_sg() - Handle a DMA transfer using scatterlist.
> + * @sg: scatterlist holding DMA buffers.
> + * @nents: number of scatterlist entries.
> + * @dir: one of possible dma_data_direction values.
> + *
> + * Depending on @direction, KMSAN:
> + * * checks the buffers in the scatterlist, if they are copied to device;
> + * * initializes the buffers, if they are copied from device;
> + * * does both, if this is a DMA_BIDIRECTIONAL transfer.
> + */
> +void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
> + enum dma_data_direction dir);
> +
> +/**
> + * kmsan_handle_urb() - Handle a USB data transfer.
> + * @urb: struct urb pointer.
> + * @is_out: data transfer direction (true means output to hardware).
> + *
> + * If @is_out is true, KMSAN checks the transfer buffer of @urb. Otherwise,
> + * KMSAN initializes the transfer buffer.
> + */
> +void kmsan_handle_urb(const struct urb *urb, bool is_out);
> +
> +/**
> + * kmsan_instrumentation_begin() - handle instrumentation_begin().
> + * @regs: pointer to struct pt_regs that non-instrumented code passes to
> + * instrumented code.
> + */
> +void kmsan_instrumentation_begin(struct pt_regs *regs);
> +
> +#else
> +
> +static inline void kmsan_init_shadow(void)
> +{
> +}
> +
> +static inline void kmsan_init_runtime(void)
> +{
> +}
> +
> +static inline bool kmsan_memblock_free_pages(struct page *page,
> + unsigned int order)
> +{
> + return true;
> +}
> +
> +static inline void kmsan_task_create(struct task_struct *task)
> +{
> +}
> +
> +static inline void kmsan_task_exit(struct task_struct *task)
> +{
> +}
> +
> +static inline int kmsan_alloc_page(struct page *page, unsigned int order,
> + gfp_t flags)
> +{
> + return 0;
> +}
> +
> +static inline void kmsan_free_page(struct page *page, unsigned int order)
> +{
> +}
> +
> +static inline void kmsan_copy_page_meta(struct page *dst, struct page *src)
> +{
> +}
> +
> +static inline void kmsan_gup_pgd_range(struct page **pages, int nr)
> +{
> +}
> +
> +static inline void kmsan_slab_alloc(struct kmem_cache *s, void *object,
> + gfp_t flags)
> +{
> +}
> +
> +static inline void kmsan_slab_free(struct kmem_cache *s, void *object)
> +{
> +}
> +
> +static inline void kmsan_kmalloc_large(const void *ptr, size_t size,
> + gfp_t flags)
> +{
> +}
> +
> +static inline void kmsan_kfree_large(const void *ptr)
> +{
> +}
> +
> +static inline void kmsan_vmap_pages_range_noflush(unsigned long start,
> + unsigned long end,
> + pgprot_t prot,
> + struct page **pages,
> + unsigned int page_shift)
> +{
> +}
> +
> +static inline void kmsan_vunmap_range_noflush(unsigned long start,
> + unsigned long end)
> +{
> +}
> +
> +static inline void kmsan_ioremap_page_range(unsigned long start,
> + unsigned long end,
> + phys_addr_t phys_addr,
> + pgprot_t prot,
> + unsigned int page_shift)
> +{
> +}
> +
> +static inline void kmsan_iounmap_page_range(unsigned long start,
> + unsigned long end)
> +{
> +}
> +
> +static inline void kmsan_handle_dma(struct page *page, size_t offset,
> + size_t size, enum dma_data_direction dir)
> +{
> +}
> +
> +static inline void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
> + enum dma_data_direction dir)
> +{
> +}
> +
> +static inline void kmsan_handle_urb(const struct urb *urb, bool is_out)
> +{
> +}
> +
> +static inline void kmsan_instrumentation_begin(struct pt_regs *regs)
> +{
> +}
> +
> +#endif
> +
> +#endif /* _LINUX_KMSAN_H */
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index c3a6e62096006..bdbe4b39b826d 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -233,6 +233,18 @@ struct page {
> not kmapped, ie. highmem) */
> #endif /* WANT_PAGE_VIRTUAL */
>
> +#ifdef CONFIG_KMSAN
> + /*
> + * KMSAN metadata for this page:
> + * - shadow page: every bit indicates whether the corresponding
> + * bit of the original page is initialized (0) or not (1);
> + * - origin page: every 4 bytes contain an id of the stack trace
> + * where the uninitialized value was created.
> + */
> + struct page *kmsan_shadow;
> + struct page *kmsan_origin;
> +#endif
> +
> #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
> int _last_cpupid;
> #endif
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 78c351e35fec6..8d076f82d5072 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -14,6 +14,7 @@
> #include <linux/pid.h>
> #include <linux/sem.h>
> #include <linux/shm.h>
> +#include <linux/kmsan.h>
> #include <linux/mutex.h>
> #include <linux/plist.h>
> #include <linux/hrtimer.h>
> @@ -1341,6 +1342,10 @@ struct task_struct {
> #endif
> #endif
>
> +#ifdef CONFIG_KMSAN
> + struct kmsan_ctx kmsan_ctx;
> +#endif
> +
> #if IS_ENABLED(CONFIG_KUNIT)
> struct kunit *kunit_test;
> #endif
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 5e14e32056add..304374f2c300a 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -963,6 +963,7 @@ config DEBUG_STACKOVERFLOW
>
> source "lib/Kconfig.kasan"
> source "lib/Kconfig.kfence"
> +source "lib/Kconfig.kmsan"
>
> endmenu # "Memory Debugging"
>
> diff --git a/lib/Kconfig.kmsan b/lib/Kconfig.kmsan
> new file mode 100644
> index 0000000000000..02fd6db792b1f
> --- /dev/null
> +++ b/lib/Kconfig.kmsan
> @@ -0,0 +1,18 @@
> +config HAVE_ARCH_KMSAN
> + bool
> +
> +config HAVE_KMSAN_COMPILER
> + def_bool (CC_IS_CLANG && $(cc-option,-fsanitize=kernel-memory -mllvm -msan-disable-checks=1))
> +
> +config KMSAN
> + bool "KMSAN: detector of uninitialized values use"
> + depends on HAVE_ARCH_KMSAN && HAVE_KMSAN_COMPILER
> + depends on SLUB && !KASAN && !KCSAN
> + depends on CC_IS_CLANG && CLANG_VERSION >= 140000
> + select STACKDEPOT
> + help
> + KernelMemorySanitizer (KMSAN) is a dynamic detector of uses of
> + uninitialized values in the kernel. It is based on compiler
> + instrumentation provided by Clang and thus requires Clang to build.
> +
> + See <file:Documentation/dev-tools/kmsan.rst> for more details.
> diff --git a/mm/Makefile b/mm/Makefile
> index d6c0042e3aa0d..8e9319a9affea 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -87,6 +87,7 @@ obj-$(CONFIG_SLAB) += slab.o
> obj-$(CONFIG_SLUB) += slub.o
> obj-$(CONFIG_KASAN) += kasan/
> obj-$(CONFIG_KFENCE) += kfence/
> +obj-$(CONFIG_KMSAN) += kmsan/
> obj-$(CONFIG_FAILSLAB) += failslab.o
> obj-$(CONFIG_MEMTEST) += memtest.o
> obj-$(CONFIG_MIGRATION) += migrate.o
> diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
> new file mode 100644
> index 0000000000000..f57a956cb1c8b
> --- /dev/null
> +++ b/mm/kmsan/Makefile
> @@ -0,0 +1,22 @@
> +obj-y := core.o instrumentation.o init.o hooks.o report.o shadow.o annotations.o
> +
> +KMSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +UBSAN_SANITIZE := n
> +
> +KMSAN_SANITIZE_kmsan_annotations.o := y
> +
> +# Disable instrumentation of KMSAN runtime with other tools.
> +CC_FLAGS_KMSAN_RUNTIME := -fno-stack-protector
> +CC_FLAGS_KMSAN_RUNTIME += $(call cc-option,-fno-conserve-stack)
> +CC_FLAGS_KMSAN_RUNTIME += -DDISABLE_BRANCH_PROFILING
> +
> +CFLAGS_REMOVE.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_annotations.o := $(CC_FLAGS_KMSAN_RUNTIME)
> +CFLAGS_core.o := $(CC_FLAGS_KMSAN_RUNTIME)
> +CFLAGS_hooks.o := $(CC_FLAGS_KMSAN_RUNTIME)
> +CFLAGS_init.o := $(CC_FLAGS_KMSAN_RUNTIME)
> +CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
> +CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
> +CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
> diff --git a/mm/kmsan/annotations.c b/mm/kmsan/annotations.c
> new file mode 100644
> index 0000000000000..037468d1840f2
> --- /dev/null
> +++ b/mm/kmsan/annotations.c
> @@ -0,0 +1,28 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN annotations.
> + *
> + * The kmsan_init_SIZE functions reside in a separate translation unit to
> + * prevent inlining them. Clang may inline functions marked with
> + * __no_sanitize_memory attribute into functions without it, which effectively
> + * results in ignoring the attribute.
> + *
> + * Copyright (C) 2017-2021 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +
> +#include <linux/export.h>
> +#include <linux/kmsan-checks.h>
> +
> +#define DECLARE_KMSAN_INIT(size, t) \
> + __no_sanitize_memory t kmsan_init_##size(t value) \
> + { \
> + return value; \
> + } \
> + EXPORT_SYMBOL(kmsan_init_##size)
> +
> +DECLARE_KMSAN_INIT(1, u8);
> +DECLARE_KMSAN_INIT(2, u16);
> +DECLARE_KMSAN_INIT(4, u32);
> +DECLARE_KMSAN_INIT(8, u64);
> diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
> new file mode 100644
> index 0000000000000..b2bb25a8013e4
> --- /dev/null
> +++ b/mm/kmsan/core.c
> @@ -0,0 +1,427 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN runtime library.
> + *
> + * Copyright (C) 2017-2021 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +
> +#include <asm/page.h>
> +#include <linux/compiler.h>
> +#include <linux/export.h>
> +#include <linux/highmem.h>
> +#include <linux/interrupt.h>
> +#include <linux/kernel.h>
> +#include <linux/kmsan.h>
> +#include <linux/memory.h>
> +#include <linux/mm.h>
> +#include <linux/mm_types.h>
> +#include <linux/mmzone.h>
> +#include <linux/percpu-defs.h>
> +#include <linux/preempt.h>
> +#include <linux/slab.h>
> +#include <linux/stackdepot.h>
> +#include <linux/stacktrace.h>
> +#include <linux/types.h>
> +#include <linux/vmalloc.h>
> +
> +#include "../slab.h"
> +#include "kmsan.h"
> +
> +/*
> + * Avoid creating too long origin chains, these are unlikely to participate in
> + * real reports.
> + */
> +#define MAX_CHAIN_DEPTH 7
> +#define NUM_SKIPPED_TO_WARN 10000
> +
> +bool kmsan_enabled __read_mostly;
> +
> +/*
> + * Per-CPU KMSAN context to be used in interrupts, where current->kmsan is
> + * unavailable.
> + */
> +DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
> +
> +void kmsan_internal_task_create(struct task_struct *task)
> +{
> + struct kmsan_ctx *ctx = &task->kmsan_ctx;
> +
> + __memset(ctx, 0, sizeof(struct kmsan_ctx));
> + ctx->allow_reporting = true;
> + kmsan_internal_unpoison_memory(current_thread_info(),
> + sizeof(struct thread_info), false);
> +}
> +
> +void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
> + unsigned int poison_flags)
> +{
> + u32 extra_bits =
> + kmsan_extra_bits(/*depth*/ 0, poison_flags & KMSAN_POISON_FREE);
> + bool checked = poison_flags & KMSAN_POISON_CHECK;
> + depot_stack_handle_t handle;
> +
> + handle = kmsan_save_stack_with_flags(flags, extra_bits);
> + kmsan_internal_set_shadow_origin(address, size, -1, handle, checked);
> +}
> +
> +void kmsan_internal_unpoison_memory(void *address, size_t size, bool checked)
> +{
> + kmsan_internal_set_shadow_origin(address, size, 0, 0, checked);
> +}
> +
> +depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,
> + unsigned int extra)
> +{
> + unsigned long entries[KMSAN_STACK_DEPTH];
> + unsigned int nr_entries;
> +
> + nr_entries = stack_trace_save(entries, KMSAN_STACK_DEPTH, 0);
> + nr_entries = filter_irq_stacks(entries, nr_entries);
> +
> + /* Don't sleep (see might_sleep_if() in __alloc_pages_nodemask()). */
> + flags &= ~__GFP_DIRECT_RECLAIM;
> +
> + return __stack_depot_save(entries, nr_entries, extra, flags, true);
> +}
> +
> +/* Copy the metadata following the memmove() behavior. */
> +void kmsan_internal_memmove_metadata(void *dst, void *src, size_t n)
> +{
> + depot_stack_handle_t old_origin = 0, chain_origin, new_origin = 0;
> + int src_slots, dst_slots, i, iter, step, skip_bits;
> + depot_stack_handle_t *origin_src, *origin_dst;
> + void *shadow_src, *shadow_dst;
> + u32 *align_shadow_src, shadow;
> + bool backwards;
> +
> + shadow_dst = kmsan_get_metadata(dst, KMSAN_META_SHADOW);
> + if (!shadow_dst)
> + return;
> + KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(dst, n));
> +
> + shadow_src = kmsan_get_metadata(src, KMSAN_META_SHADOW);
> + if (!shadow_src) {
> + /*
> + * |src| is untracked: zero out destination shadow, ignore the
> + * origins, we're done.
> + */
> + __memset(shadow_dst, 0, n);
> + return;
> + }
> + KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(src, n));
> +
> + __memmove(shadow_dst, shadow_src, n);
> +
> + origin_dst = kmsan_get_metadata(dst, KMSAN_META_ORIGIN);
> + origin_src = kmsan_get_metadata(src, KMSAN_META_ORIGIN);
> + KMSAN_WARN_ON(!origin_dst || !origin_src);
> + src_slots = (ALIGN((u64)src + n, KMSAN_ORIGIN_SIZE) -
> + ALIGN_DOWN((u64)src, KMSAN_ORIGIN_SIZE)) /
> + KMSAN_ORIGIN_SIZE;
> + dst_slots = (ALIGN((u64)dst + n, KMSAN_ORIGIN_SIZE) -
> + ALIGN_DOWN((u64)dst, KMSAN_ORIGIN_SIZE)) /
> + KMSAN_ORIGIN_SIZE;
> + KMSAN_WARN_ON(!src_slots || !dst_slots);
> + KMSAN_WARN_ON((src_slots < 1) || (dst_slots < 1));

The above 2 checks look equivalent.

> + KMSAN_WARN_ON((src_slots - dst_slots > 1) ||
> + (dst_slots - src_slots < -1));
> + backwards = dst > src;
> + i = backwards ? min(src_slots, dst_slots) - 1 : 0;
> + iter = backwards ? -1 : 1;
> +
> + align_shadow_src =
> + (u32 *)ALIGN_DOWN((u64)shadow_src, KMSAN_ORIGIN_SIZE);
> + for (step = 0; step < min(src_slots, dst_slots); step++, i += iter) {
> + KMSAN_WARN_ON(i < 0);
> + shadow = align_shadow_src[i];
> + if (i == 0) {
> + /*
> + * If |src| isn't aligned on KMSAN_ORIGIN_SIZE, don't
> + * look at the first |src % KMSAN_ORIGIN_SIZE| bytes
> + * of the first shadow slot.
> + */
> + skip_bits = ((u64)src % KMSAN_ORIGIN_SIZE) * 8;
> + shadow = (shadow << skip_bits) >> skip_bits;

Is this correct?...
For the first slot we want to ignore some of the first (low) bits. To
ignore low bits we need to shift right and then left, no?
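E.g. with skip_bits == 8, (shadow << 8) >> 8 clears the high byte of the
u32, while (shadow >> 8) << 8 clears the low one.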

> + }
> + if (i == src_slots - 1) {
> + /*
> + * If |src + n| isn't aligned on
> + * KMSAN_ORIGIN_SIZE, don't look at the last
> + * |(src + n) % KMSAN_ORIGIN_SIZE| bytes of the
> + * last shadow slot.
> + */
> + skip_bits = (((u64)src + n) % KMSAN_ORIGIN_SIZE) * 8;
> + shadow = (shadow >> skip_bits) << skip_bits;

Same here.

> + }
> + /*
> + * Overwrite the origin only if the corresponding
> + * shadow is nonempty.
> + */
> + if (origin_src[i] && (origin_src[i] != old_origin) && shadow) {
> + old_origin = origin_src[i];
> + chain_origin = kmsan_internal_chain_origin(old_origin);
> + /*
> + * kmsan_internal_chain_origin() may return
> + * NULL, but we don't want to lose the previous
> + * origin value.
> + */
> + if (chain_origin)
> + new_origin = chain_origin;
> + else
> + new_origin = old_origin;

This can be a bit shorter and w/o the temp var as:

new_origin = kmsan_internal_chain_origin(old_origin);
/*
 * kmsan_internal_chain_origin() may return
 * NULL, but we don't want to lose the previous
 * origin value.
 */
if (!new_origin)
	new_origin = old_origin;


> + }
> + if (shadow)
> + origin_dst[i] = new_origin;

Are we sure that origin_dst is aligned here?
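(align_shadow_src is explicitly ALIGN_DOWN'ed above, but origin_dst comes
straight from kmsan_get_metadata(dst, KMSAN_META_ORIGIN).)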


> + else
> + origin_dst[i] = 0;
> + }
> +}
> +
> +depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id)
> +{
> + unsigned long entries[3];
> + u32 extra_bits;
> + int depth;
> + bool uaf;
> +
> + if (!id)
> + return id;
> + /*
> + * Make sure we have enough spare bits in |id| to hold the UAF bit and
> + * the chain depth.
> + */
> + BUILD_BUG_ON((1 << STACK_DEPOT_EXTRA_BITS) <= (MAX_CHAIN_DEPTH << 1));
> +
> + extra_bits = stack_depot_get_extra_bits(id);
> + depth = kmsan_depth_from_eb(extra_bits);
> + uaf = kmsan_uaf_from_eb(extra_bits);
> +
> + if (depth >= MAX_CHAIN_DEPTH) {
> + static atomic_long_t kmsan_skipped_origins;
> + long skipped = atomic_long_inc_return(&kmsan_skipped_origins);
> +
> + if (skipped % NUM_SKIPPED_TO_WARN == 0) {
> + pr_warn("not chained %d origins\n", skipped);
> + dump_stack();
> + kmsan_print_origin(id);
> + }
> + return id;
> + }
> + depth++;
> + extra_bits = kmsan_extra_bits(depth, uaf);
> +
> + entries[0] = KMSAN_CHAIN_MAGIC_ORIGIN;
> + entries[1] = kmsan_save_stack_with_flags(GFP_ATOMIC, 0);
> + entries[2] = id;
> + return __stack_depot_save(entries, ARRAY_SIZE(entries), extra_bits,
> + GFP_ATOMIC, true);
> +}
> +
> +void kmsan_internal_set_shadow_origin(void *addr, size_t size, int b,
> + u32 origin, bool checked)
> +{
> + u64 address = (u64)addr;
> + void *shadow_start;
> + u32 *origin_start;
> + size_t pad = 0;
> + int i;
> +
> + KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
> + shadow_start = kmsan_get_metadata(addr, KMSAN_META_SHADOW);
> + if (!shadow_start) {
> + /*
> + * kmsan_metadata_is_contiguous() is true, so either all shadow
> + * and origin pages are NULL, or all are non-NULL.
> + */
> + if (checked) {
> + pr_err("%s: not memsetting %d bytes starting at %px, because the shadow is NULL\n",
> + __func__, size, addr);
> + BUG();
> + }
> + return;
> + }
> + __memset(shadow_start, b, size);
> +
> + if (!IS_ALIGNED(address, KMSAN_ORIGIN_SIZE)) {
> + pad = address % KMSAN_ORIGIN_SIZE;
> + address -= pad;
> + size += pad;
> + }
> + size = ALIGN(size, KMSAN_ORIGIN_SIZE);
> + origin_start =
> + (u32 *)kmsan_get_metadata((void *)address, KMSAN_META_ORIGIN);
> +
> + for (i = 0; i < size / KMSAN_ORIGIN_SIZE; i++)
> + origin_start[i] = origin;
> +}
> +
> +struct page *kmsan_vmalloc_to_page_or_null(void *vaddr)
> +{
> + struct page *page;
> +
> + if (!kmsan_internal_is_vmalloc_addr(vaddr) &&
> + !kmsan_internal_is_module_addr(vaddr))
> + return NULL;
> + page = vmalloc_to_page(vaddr);
> + if (pfn_valid(page_to_pfn(page)))
> + return page;
> + else
> + return NULL;
> +}
> +
> +void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
> + int reason)
> +{
> + depot_stack_handle_t cur_origin = 0, new_origin = 0;
> + unsigned long addr64 = (unsigned long)addr;
> + depot_stack_handle_t *origin = NULL;
> + unsigned char *shadow = NULL;
> + int cur_off_start = -1;
> + int i, chunk_size;
> + size_t pos = 0;
> +
> + if (!size)
> + return;
> + KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
> + while (pos < size) {
> + chunk_size = min(size - pos,
> + PAGE_SIZE - ((addr64 + pos) % PAGE_SIZE));
> + shadow = kmsan_get_metadata((void *)(addr64 + pos),
> + KMSAN_META_SHADOW);
> + if (!shadow) {
> + /*
> + * This page is untracked. If there were uninitialized
> + * bytes before, report them.
> + */
> + if (cur_origin) {
> + kmsan_enter_runtime();
> + kmsan_report(cur_origin, addr, size,
> + cur_off_start, pos - 1, user_addr,
> + reason);
> + kmsan_leave_runtime();
> + }
> + cur_origin = 0;
> + cur_off_start = -1;
> + pos += chunk_size;
> + continue;
> + }
> + for (i = 0; i < chunk_size; i++) {
> + if (!shadow[i]) {
> + /*
> + * This byte is unpoisoned. If there were
> + * poisoned bytes before, report them.
> + */
> + if (cur_origin) {
> + kmsan_enter_runtime();
> + kmsan_report(cur_origin, addr, size,
> + cur_off_start, pos + i - 1,
> + user_addr, reason);
> + kmsan_leave_runtime();
> + }
> + cur_origin = 0;
> + cur_off_start = -1;
> + continue;
> + }
> + origin = kmsan_get_metadata((void *)(addr64 + pos + i),
> + KMSAN_META_ORIGIN);
> + KMSAN_WARN_ON(!origin);
> + new_origin = *origin;
> + /*
> + * Encountered new origin - report the previous
> + * uninitialized range.
> + */
> + if (cur_origin != new_origin) {
> + if (cur_origin) {
> + kmsan_enter_runtime();
> + kmsan_report(cur_origin, addr, size,
> + cur_off_start, pos + i - 1,
> + user_addr, reason);
> + kmsan_leave_runtime();
> + }
> + cur_origin = new_origin;
> + cur_off_start = pos + i;
> + }
> + }
> + pos += chunk_size;
> + }
> + KMSAN_WARN_ON(pos != size);
> + if (cur_origin) {
> + kmsan_enter_runtime();
> + kmsan_report(cur_origin, addr, size, cur_off_start, pos - 1,
> + user_addr, reason);
> + kmsan_leave_runtime();
> + }
> +}
> +
> +bool kmsan_metadata_is_contiguous(void *addr, size_t size)
> +{
> + char *cur_shadow = NULL, *next_shadow = NULL, *cur_origin = NULL,
> + *next_origin = NULL;
> + u64 cur_addr = (u64)addr, next_addr = cur_addr + PAGE_SIZE;
> + depot_stack_handle_t *origin_p;
> + bool all_untracked = false;
> +
> + if (!size)
> + return true;
> +
> + /* The whole range belongs to the same page. */
> + if (ALIGN_DOWN(cur_addr + size - 1, PAGE_SIZE) ==
> + ALIGN_DOWN(cur_addr, PAGE_SIZE))
> + return true;
> +
> + cur_shadow = kmsan_get_metadata((void *)cur_addr, /*is_origin*/ false);
> + if (!cur_shadow)
> + all_untracked = true;
> + cur_origin = kmsan_get_metadata((void *)cur_addr, /*is_origin*/ true);
> + if (all_untracked && cur_origin)
> + goto report;
> +
> + for (; next_addr < (u64)addr + size;
> + cur_addr = next_addr, cur_shadow = next_shadow,
> + cur_origin = next_origin, next_addr += PAGE_SIZE) {
> + next_shadow = kmsan_get_metadata((void *)next_addr, false);
> + next_origin = kmsan_get_metadata((void *)next_addr, true);
> + if (all_untracked) {
> + if (next_shadow || next_origin)
> + goto report;
> + if (!next_shadow && !next_origin)
> + continue;
> + }
> + if (((u64)cur_shadow == ((u64)next_shadow - PAGE_SIZE)) &&
> + ((u64)cur_origin == ((u64)next_origin - PAGE_SIZE)))
> + continue;
> + goto report;
> + }
> + return true;
> +
> +report:
> + pr_err("%s: attempting to access two shadow page ranges.\n", __func__);
> + pr_err("Access of size %d at %px.\n", size, addr);
> + pr_err("Addresses belonging to different ranges: %px and %px\n",
> + cur_addr, next_addr);
> + pr_err("page[0].shadow: %px, page[1].shadow: %px\n", cur_shadow,
> + next_shadow);
> + pr_err("page[0].origin: %px, page[1].origin: %px\n", cur_origin,
> + next_origin);
> + origin_p = kmsan_get_metadata(addr, KMSAN_META_ORIGIN);
> + if (origin_p) {
> + pr_err("Origin: %08x\n", *origin_p);
> + kmsan_print_origin(*origin_p);
> + } else {
> + pr_err("Origin: unavailable\n");
> + }
> + return false;
> +}
> +
> +bool kmsan_internal_is_module_addr(void *vaddr)
> +{
> + return ((u64)vaddr >= MODULES_VADDR) && ((u64)vaddr < MODULES_END);
> +}
> +
> +bool kmsan_internal_is_vmalloc_addr(void *addr)
> +{
> + return ((u64)addr >= VMALLOC_START) && ((u64)addr < VMALLOC_END);
> +}
> diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
> new file mode 100644
> index 0000000000000..4012d7a4adb53
> --- /dev/null
> +++ b/mm/kmsan/hooks.c
> @@ -0,0 +1,400 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN hooks for kernel subsystems.
> + *
> + * These functions handle creation of KMSAN metadata for memory allocations.
> + *
> + * Copyright (C) 2018-2021 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +
> +#include <linux/cacheflush.h>
> +#include <linux/dma-direction.h>
> +#include <linux/gfp.h>
> +#include <linux/mm.h>
> +#include <linux/mm_types.h>
> +#include <linux/scatterlist.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +#include <linux/usb.h>
> +
> +#include "../slab.h"
> +#include "kmsan.h"
> +
> +/*
> + * Instrumented functions shouldn't be called under
> + * kmsan_enter_runtime()/kmsan_leave_runtime(), because this will lead to
> + * skipping effects of functions like memset() inside instrumented code.
> + */
> +
> +void kmsan_task_create(struct task_struct *task)
> +{
> + kmsan_enter_runtime();
> + kmsan_internal_task_create(task);
> + kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(kmsan_task_create);
> +
> +void kmsan_task_exit(struct task_struct *task)
> +{
> + struct kmsan_ctx *ctx = &task->kmsan_ctx;
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> +
> + ctx->allow_reporting = false;
> +}
> +EXPORT_SYMBOL(kmsan_task_exit);
> +
> +void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags)
> +{
> + if (unlikely(object == NULL))
> + return;
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> + /*
> + * There's a ctor or this is an RCU cache - do nothing. The memory
> + * status hasn't changed since last use.
> + */
> + if (s->ctor || (s->flags & SLAB_TYPESAFE_BY_RCU))
> + return;
> +
> + kmsan_enter_runtime();
> + if (flags & __GFP_ZERO)
> + kmsan_internal_unpoison_memory(object, s->object_size,
> + KMSAN_POISON_CHECK);
> + else
> + kmsan_internal_poison_memory(object, s->object_size, flags,
> + KMSAN_POISON_CHECK);
> + kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(kmsan_slab_alloc);
> +
> +void kmsan_slab_free(struct kmem_cache *s, void *object)
> +{
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> +
> + /* RCU slabs could be legally used after free within the RCU period */
> + if (unlikely(s->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)))
> + return;
> + /*
> + * If there's a constructor, freed memory must remain in the same state
> + * until the next allocation. We cannot save its state to detect
> + * use-after-free bugs, instead we just keep it unpoisoned.
> + */
> + if (s->ctor)
> + return;
> + kmsan_enter_runtime();
> + kmsan_internal_poison_memory(object, s->object_size, GFP_KERNEL,
> + KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
> + kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(kmsan_slab_free);
> +
> +void kmsan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
> +{
> + if (unlikely(ptr == NULL))
> + return;
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> + kmsan_enter_runtime();
> + if (flags & __GFP_ZERO)
> + kmsan_internal_unpoison_memory((void *)ptr, size,
> + /*checked*/ true);
> + else
> + kmsan_internal_poison_memory((void *)ptr, size, flags,
> + KMSAN_POISON_CHECK);
> + kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(kmsan_kmalloc_large);
> +
> +void kmsan_kfree_large(const void *ptr)
> +{
> + struct page *page;
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> + kmsan_enter_runtime();
> + page = virt_to_head_page((void *)ptr);
> + KMSAN_WARN_ON(ptr != page_address(page));
> + kmsan_internal_poison_memory((void *)ptr,
> + PAGE_SIZE << compound_order(page),
> + GFP_KERNEL,
> + KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
> + kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(kmsan_kfree_large);
> +
> +static unsigned long vmalloc_shadow(unsigned long addr)
> +{
> + return (unsigned long)kmsan_get_metadata((void *)addr,
> + KMSAN_META_SHADOW);
> +}
> +
> +static unsigned long vmalloc_origin(unsigned long addr)
> +{
> + return (unsigned long)kmsan_get_metadata((void *)addr,
> + KMSAN_META_ORIGIN);
> +}
> +
> +void kmsan_vunmap_range_noflush(unsigned long start, unsigned long end)
> +{
> + __vunmap_range_noflush(vmalloc_shadow(start), vmalloc_shadow(end));
> + __vunmap_range_noflush(vmalloc_origin(start), vmalloc_origin(end));
> + flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
> + flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
> +}
> +EXPORT_SYMBOL(kmsan_vunmap_range_noflush);
> +
> +/*
> + * This function creates new shadow/origin pages for the physical pages mapped
> + * into the virtual memory. If those physical pages already had shadow/origin,
> + * those are ignored.
> + */
> +void kmsan_ioremap_page_range(unsigned long start, unsigned long end,
> + phys_addr_t phys_addr, pgprot_t prot,
> + unsigned int page_shift)
> +{
> + gfp_t gfp_mask = GFP_KERNEL | __GFP_ZERO;
> + struct page *shadow, *origin;
> + unsigned long off = 0;
> + int i, nr;
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> +
> + nr = (end - start) / PAGE_SIZE;
> + kmsan_enter_runtime();
> + for (i = 0; i < nr; i++, off += PAGE_SIZE) {
> + shadow = alloc_pages(gfp_mask, 1);
> + origin = alloc_pages(gfp_mask, 1);
> + __vmap_pages_range_noflush(
> + vmalloc_shadow(start + off),
> + vmalloc_shadow(start + off + PAGE_SIZE), prot, &shadow,
> + page_shift);
> + __vmap_pages_range_noflush(
> + vmalloc_origin(start + off),
> + vmalloc_origin(start + off + PAGE_SIZE), prot, &origin,
> + page_shift);
> + }
> + flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
> + flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
> + kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(kmsan_ioremap_page_range);
> +
> +void kmsan_iounmap_page_range(unsigned long start, unsigned long end)
> +{
> + unsigned long v_shadow, v_origin;
> + struct page *shadow, *origin;
> + int i, nr;
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> +
> + nr = (end - start) / PAGE_SIZE;
> + kmsan_enter_runtime();
> + v_shadow = (unsigned long)vmalloc_shadow(start);
> + v_origin = (unsigned long)vmalloc_origin(start);
> + for (i = 0; i < nr; i++, v_shadow += PAGE_SIZE, v_origin += PAGE_SIZE) {
> + shadow = kmsan_vmalloc_to_page_or_null((void *)v_shadow);
> + origin = kmsan_vmalloc_to_page_or_null((void *)v_origin);
> + __vunmap_range_noflush(v_shadow, vmalloc_shadow(end));
> + __vunmap_range_noflush(v_origin, vmalloc_origin(end));
> + if (shadow)
> + __free_pages(shadow, 1);
> + if (origin)
> + __free_pages(origin, 1);
> + }
> + flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
> + flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
> + kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(kmsan_iounmap_page_range);
> +
> +void kmsan_copy_to_user(const void *to, const void *from, size_t to_copy,
> + size_t left)
> +{
> + unsigned long ua_flags;
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> + /*
> + * At this point we've copied the memory already. It's hard to check it
> + * before copying, as the size of actually copied buffer is unknown.
> + */
> +
> + /* copy_to_user() may copy zero bytes. No need to check. */
> + if (!to_copy)
> + return;
> + /* Or maybe copy_to_user() failed to copy anything. */
> + if (to_copy <= left)
> + return;
> +
> + ua_flags = user_access_save();
> + if ((u64)to < TASK_SIZE) {
> + /* This is a user memory access, check it. */
> + kmsan_internal_check_memory((void *)from, to_copy - left, to,
> + REASON_COPY_TO_USER);
> + user_access_restore(ua_flags);
> + return;
> + }
> + /* Otherwise this is a kernel memory access. This happens when a compat
> + * syscall passes an argument allocated on the kernel stack to a real
> + * syscall.
> + * Don't check anything, just copy the shadow of the copied bytes.
> + */
> + kmsan_internal_memmove_metadata((void *)to, (void *)from,
> + to_copy - left);
> + user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(kmsan_copy_to_user);
> +
> +/* Helper function to check an URB. */
> +void kmsan_handle_urb(const struct urb *urb, bool is_out)
> +{
> + if (!urb)
> + return;
> + if (is_out)
> + kmsan_internal_check_memory(urb->transfer_buffer,
> + urb->transfer_buffer_length,
> + /*user_addr*/ 0, REASON_SUBMIT_URB);
> + else
> + kmsan_internal_unpoison_memory(urb->transfer_buffer,
> + urb->transfer_buffer_length,
> + /*checked*/ false);
> +}
> +EXPORT_SYMBOL(kmsan_handle_urb);
> +
> +static void kmsan_handle_dma_page(const void *addr, size_t size,
> + enum dma_data_direction dir)
> +{
> + switch (dir) {
> + case DMA_BIDIRECTIONAL:
> + kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
> + REASON_ANY);
> + kmsan_internal_unpoison_memory((void *)addr, size,
> + /*checked*/ false);
> + break;
> + case DMA_TO_DEVICE:
> + kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
> + REASON_ANY);
> + break;
> + case DMA_FROM_DEVICE:
> + kmsan_internal_unpoison_memory((void *)addr, size,
> + /*checked*/ false);
> + break;
> + case DMA_NONE:
> + break;
> + }
> +}
> +
> +/* Helper function to handle DMA data transfers. */
> +void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
> + enum dma_data_direction dir)
> +{
> + u64 page_offset, to_go, addr;
> +
> + if (PageHighMem(page))
> + return;
> + addr = (u64)page_address(page) + offset;
> + /*
> + * The kernel may occasionally give us adjacent DMA pages not belonging
> + * to the same allocation. Process them separately to avoid triggering
> + * internal KMSAN checks.
> + */
> + while (size > 0) {
> + page_offset = addr % PAGE_SIZE;
> + to_go = min(PAGE_SIZE - page_offset, (u64)size);
> + kmsan_handle_dma_page((void *)addr, to_go, dir);
> + addr += to_go;
> + size -= to_go;
> + }
> +}
> +EXPORT_SYMBOL(kmsan_handle_dma);
> +
> +void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
> + enum dma_data_direction dir)
> +{
> + struct scatterlist *item;
> + int i;
> +
> + for_each_sg(sg, item, nents, i)
> + kmsan_handle_dma(sg_page(item), item->offset, item->length,
> + dir);
> +}
> +EXPORT_SYMBOL(kmsan_handle_dma_sg);
> +
> +/* Functions from kmsan-checks.h follow. */
> +void kmsan_poison_memory(const void *address, size_t size, gfp_t flags)
> +{
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> + kmsan_enter_runtime();
> + /* The users may want to poison/unpoison random memory. */
> + kmsan_internal_poison_memory((void *)address, size, flags,
> + KMSAN_POISON_NOCHECK);
> + kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(kmsan_poison_memory);
> +
> +void kmsan_unpoison_memory(const void *address, size_t size)
> +{
> + unsigned long ua_flags;
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> +
> + ua_flags = user_access_save();
> + kmsan_enter_runtime();
> + /* The users may want to poison/unpoison random memory. */
> + kmsan_internal_unpoison_memory((void *)address, size,
> + KMSAN_POISON_NOCHECK);
> + kmsan_leave_runtime();
> + user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(kmsan_unpoison_memory);
> +
> +void kmsan_gup_pgd_range(struct page **pages, int nr)
> +{
> + void *page_addr;
> + int i;
> +
> + /*
> + * gup_pgd_range() has just created a number of new pages that KMSAN
> + * treats as uninitialized. In the case they belong to the userspace
> + * memory, unpoison the corresponding kernel pages.
> + */
> + for (i = 0; i < nr; i++) {
> + if (PageHighMem(pages[i]))
> + continue;
> + page_addr = page_address(pages[i]);
> + if (((u64)page_addr < TASK_SIZE) &&
> + ((u64)page_addr + PAGE_SIZE < TASK_SIZE))
> + kmsan_unpoison_memory(page_addr, PAGE_SIZE);
> + }
> +}
> +EXPORT_SYMBOL(kmsan_gup_pgd_range);
> +
> +void kmsan_check_memory(const void *addr, size_t size)
> +{
> + if (!kmsan_enabled)
> + return;
> + return kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
> + REASON_ANY);
> +}
> +EXPORT_SYMBOL(kmsan_check_memory);
> +
> +void kmsan_instrumentation_begin(struct pt_regs *regs)
> +{
> + struct kmsan_context_state *state = &kmsan_get_context()->cstate;
> +
> + if (state)
> + __memset(state, 0, sizeof(struct kmsan_context_state));
> + if (!kmsan_enabled || !regs)
> + return;
> + kmsan_internal_unpoison_memory(regs, sizeof(*regs), /*checked*/ true);
> +}
> +EXPORT_SYMBOL(kmsan_instrumentation_begin);
> diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c
> new file mode 100644
> index 0000000000000..49ab06cde082a
> --- /dev/null
> +++ b/mm/kmsan/init.c
> @@ -0,0 +1,238 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN initialization routines.
> + *
> + * Copyright (C) 2017-2021 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +
> +#include "kmsan.h"
> +
> +#include <asm/sections.h>
> +#include <linux/mm.h>
> +#include <linux/memblock.h>
> +
> +#define NUM_FUTURE_RANGES 128
> +struct start_end_pair {
> + u64 start, end;
> +};
> +
> +static struct start_end_pair start_end_pairs[NUM_FUTURE_RANGES] __initdata;
> +static int future_index __initdata;
> +
> +/*
> + * Record a range of memory for which the metadata pages will be created once
> + * the page allocator becomes available.
> + */
> +static void __init kmsan_record_future_shadow_range(void *start, void *end)
> +{
> + u64 nstart = (u64)start, nend = (u64)end, cstart, cend;
> + bool merged = false;
> + int i;
> +
> + KMSAN_WARN_ON(future_index == NUM_FUTURE_RANGES);
> + KMSAN_WARN_ON((nstart >= nend) || !nstart || !nend);
> + nstart = ALIGN_DOWN(nstart, PAGE_SIZE);
> + nend = ALIGN(nend, PAGE_SIZE);
> +
> + /*
> + * Scan the existing ranges to see if any of them overlaps with
> + * [start, end). In that case, merge the two ranges instead of
> + * creating a new one.
> + * The number of ranges is less than 20, so there is no need to organize
> + * them into a more intelligent data structure.
> + */
> + for (i = 0; i < future_index; i++) {
> + cstart = start_end_pairs[i].start;
> + cend = start_end_pairs[i].end;
> + if ((cstart < nstart && cend < nstart) ||
> + (cstart > nend && cend > nend))
> + /* ranges are disjoint - do not merge */
> + continue;
> + start_end_pairs[i].start = min(nstart, cstart);
> + start_end_pairs[i].end = max(nend, cend);
> + merged = true;
> + break;
> + }
> + if (merged)
> + return;
> + start_end_pairs[future_index].start = nstart;
> + start_end_pairs[future_index].end = nend;
> + future_index++;
> +}
> +
> +/*
> + * Initialize the shadow for existing mappings during kernel initialization.
> + * These include kernel text/data sections, NODE_DATA and future ranges
> + * registered while creating other data (e.g. percpu).
> + *
> + * Allocations via memblock can be only done before slab is initialized.
> + */
> +void __init kmsan_init_shadow(void)
> +{
> + const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
> + phys_addr_t p_start, p_end;
> + int nid;
> + u64 i;
> +
> + for_each_reserved_mem_range(i, &p_start, &p_end)
> + kmsan_record_future_shadow_range(phys_to_virt(p_start),
> + phys_to_virt(p_end));
> + /* Allocate shadow for .data */
> + kmsan_record_future_shadow_range(_sdata, _edata);
> +
> + for_each_online_node(nid)
> + kmsan_record_future_shadow_range(
> + NODE_DATA(nid), (char *)NODE_DATA(nid) + nd_size);
> +
> + for (i = 0; i < future_index; i++)
> + kmsan_init_alloc_meta_for_range(
> + (void *)start_end_pairs[i].start,
> + (void *)start_end_pairs[i].end);
> +}
> +EXPORT_SYMBOL(kmsan_init_shadow);
> +
> +struct page_pair {
> + struct page *shadow, *origin;
> +};
> +static struct page_pair held_back[MAX_ORDER] __initdata;
> +
> +/*
> + * Eager metadata allocation. When the memblock allocator is freeing pages to
> + * pagealloc, we use 2/3 of them as metadata for the remaining 1/3.
> + * We store the pointers to the returned blocks of pages in held_back[] grouped
> + * by their order: when kmsan_memblock_free_pages() is called for the first
> + * time with a certain order, it is reserved as a shadow block, for the second
> + * time - as an origin block. On the third time the incoming block receives its
> + * shadow and origin ranges from the previously saved shadow and origin blocks,
> + * after which held_back[order] can be used again.
> + *
> + * At the very end there may be leftover blocks in held_back[]. They are
> + * collected later by kmsan_memblock_discard().
> + */
> +bool kmsan_memblock_free_pages(struct page *page, unsigned int order)
> +{
> + struct page *shadow, *origin;
> +
> + if (!held_back[order].shadow) {
> + held_back[order].shadow = page;
> + return false;
> + }
> + if (!held_back[order].origin) {
> + held_back[order].origin = page;
> + return false;
> + }
> + shadow = held_back[order].shadow;
> + origin = held_back[order].origin;
> + kmsan_setup_meta(page, shadow, origin, order);
> +
> + held_back[order].shadow = NULL;
> + held_back[order].origin = NULL;
> + return true;
> +}
> +
> +#define MAX_BLOCKS 8
> +struct smallstack {
> + struct page *items[MAX_BLOCKS];
> + int index;
> + int order;
> +};
> +
> +static struct smallstack collect = {
> + .index = 0,
> + .order = MAX_ORDER,
> +};
> +
> +static void smallstack_push(struct smallstack *stack, struct page *pages)
> +{
> + KMSAN_WARN_ON(stack->index == MAX_BLOCKS);
> + stack->items[stack->index] = pages;
> + stack->index++;
> +}
> +#undef MAX_BLOCKS
> +
> +static struct page *smallstack_pop(struct smallstack *stack)
> +{
> + struct page *ret;
> +
> + KMSAN_WARN_ON(stack->index == 0);
> + stack->index--;
> + ret = stack->items[stack->index];
> + stack->items[stack->index] = NULL;
> + return ret;
> +}
> +
> +static void do_collection(void)
> +{
> + struct page *page, *shadow, *origin;
> +
> + while (collect.index >= 3) {
> + page = smallstack_pop(&collect);
> + shadow = smallstack_pop(&collect);
> + origin = smallstack_pop(&collect);
> + kmsan_setup_meta(page, shadow, origin, collect.order);
> + __free_pages_core(page, collect.order);
> + }
> +}
> +
> +static void collect_split(void)
> +{
> + struct smallstack tmp = {
> + .order = collect.order - 1,
> + .index = 0,
> + };
> + struct page *page;
> +
> + if (!collect.order)
> + return;
> + while (collect.index) {
> + page = smallstack_pop(&collect);
> + smallstack_push(&tmp, &page[0]);
> + smallstack_push(&tmp, &page[1 << tmp.order]);
> + }
> + __memcpy(&collect, &tmp, sizeof(struct smallstack));
> +}
> +
> +/*
> + * Memblock is about to go away. Split the page blocks left over in held_back[]
> + * and return 1/3 of that memory to the system.
> + */
> +static void __init kmsan_memblock_discard(void)
> +{
> + int i;
> +
> + /*
> + * For each order=N:
> + * - push held_back[N].shadow and .origin to |collect|;
> + * - while there are >= 3 elements in |collect|, do garbage collection:
> + * - pop 3 ranges from |collect|;
> + * - use two of them as shadow and origin for the third one;
> + * - repeat;
> + * - split each remaining element from |collect| into 2 ranges of
> + * order=N-1,
> + * - repeat.
> + */
> + collect.order = MAX_ORDER - 1;
> + for (i = MAX_ORDER - 1; i >= 0; i--) {
> + if (held_back[i].shadow)
> + smallstack_push(&collect, held_back[i].shadow);
> + if (held_back[i].origin)
> + smallstack_push(&collect, held_back[i].origin);
> + held_back[i].shadow = NULL;
> + held_back[i].origin = NULL;
> + do_collection();
> + collect_split();
> + }
> +}
> +
> +void __init kmsan_init_runtime(void)
> +{
> + /* Assuming current is init_task */
> + kmsan_internal_task_create(current);
> + kmsan_memblock_discard();
> + pr_info("vmalloc area at: %px\n", VMALLOC_START);
> + pr_info("Starting KernelMemorySanitizer\n");
> + kmsan_enabled = true;
> +}
> +EXPORT_SYMBOL(kmsan_init_runtime);
> diff --git a/mm/kmsan/instrumentation.c b/mm/kmsan/instrumentation.c
> new file mode 100644
> index 0000000000000..1eb2d64aa39a6
> --- /dev/null
> +++ b/mm/kmsan/instrumentation.c
> @@ -0,0 +1,233 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN compiler API.
> + *
> + * Copyright (C) 2017-2021 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +
> +#include "kmsan.h"
> +#include <linux/gfp.h>
> +#include <linux/mm.h>
> +#include <linux/uaccess.h>
> +
> +static inline bool is_bad_asm_addr(void *addr, uintptr_t size, bool is_store)
> +{
> + if ((u64)addr < TASK_SIZE)
> + return true;
> + if (!kmsan_get_metadata(addr, KMSAN_META_SHADOW))
> + return true;
> + return false;
> +}
> +
> +static inline struct shadow_origin_ptr
> +get_shadow_origin_ptr(void *addr, u64 size, bool store)
> +{
> + unsigned long ua_flags = user_access_save();
> + struct shadow_origin_ptr ret;
> +
> + ret = kmsan_get_shadow_origin_ptr(addr, size, store);
> + user_access_restore(ua_flags);
> + return ret;
> +}
> +
> +struct shadow_origin_ptr __msan_metadata_ptr_for_load_n(void *addr,
> + uintptr_t size)
> +{
> + return get_shadow_origin_ptr(addr, size, /*store*/ false);
> +}
> +EXPORT_SYMBOL(__msan_metadata_ptr_for_load_n);
> +
> +struct shadow_origin_ptr __msan_metadata_ptr_for_store_n(void *addr,
> + uintptr_t size)
> +{
> + return get_shadow_origin_ptr(addr, size, /*store*/ true);
> +}
> +EXPORT_SYMBOL(__msan_metadata_ptr_for_store_n);
> +
> +#define DECLARE_METADATA_PTR_GETTER(size) \
> + struct shadow_origin_ptr __msan_metadata_ptr_for_load_##size( \
> + void *addr) \
> + { \
> + return get_shadow_origin_ptr(addr, size, /*store*/ false); \
> + } \
> + EXPORT_SYMBOL(__msan_metadata_ptr_for_load_##size); \
> + struct shadow_origin_ptr __msan_metadata_ptr_for_store_##size( \
> + void *addr) \
> + { \
> + return get_shadow_origin_ptr(addr, size, /*store*/ true); \
> + } \
> + EXPORT_SYMBOL(__msan_metadata_ptr_for_store_##size)
> +
> +DECLARE_METADATA_PTR_GETTER(1);
> +DECLARE_METADATA_PTR_GETTER(2);
> +DECLARE_METADATA_PTR_GETTER(4);
> +DECLARE_METADATA_PTR_GETTER(8);
> +
> +void __msan_instrument_asm_store(void *addr, uintptr_t size)
> +{
> + unsigned long ua_flags;
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> +
> + ua_flags = user_access_save();
> + /*
> + * Most of the accesses are below 32 bytes. The two exceptions so far
> + * are clwb() (64 bytes) and FPU state (512 bytes).
> + * It's unlikely that the assembly will touch more than 512 bytes.
> + */
> + if (size > 512) {
> + WARN_ONCE(1, "assembly store size too big: %ld\n", (long)size);
> + size = 8;
> + }
> + if (is_bad_asm_addr(addr, size, /*is_store*/ true)) {
> + user_access_restore(ua_flags);
> + return;
> + }
> + kmsan_enter_runtime();
> + /* Unpoisoning the memory on best effort. */
> + kmsan_internal_unpoison_memory(addr, size, /*checked*/ false);
> + kmsan_leave_runtime();
> + user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__msan_instrument_asm_store);
> +
> +void *__msan_memmove(void *dst, const void *src, uintptr_t n)
> +{
> + void *result;
> +
> + result = __memmove(dst, src, n);
> + if (!n)
> + /* Some people call memmove() with zero length. */
> + return result;
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return result;
> +
> + kmsan_internal_memmove_metadata(dst, (void *)src, n);
> +
> + return result;
> +}
> +EXPORT_SYMBOL(__msan_memmove);
> +
> +void *__msan_memcpy(void *dst, const void *src, uintptr_t n)
> +{
> + void *result;
> +
> + result = __memcpy(dst, src, n);
> + if (!n)
> + /* Some people call memcpy() with zero length. */
> + return result;
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return result;
> +
> + /* Using memmove instead of memcpy doesn't affect correctness. */
> + kmsan_internal_memmove_metadata(dst, (void *)src, n);
> +
> + return result;
> +}
> +EXPORT_SYMBOL(__msan_memcpy);
> +
> +void *__msan_memset(void *dst, int c, uintptr_t n)
> +{
> + void *result;
> +
> + result = __memset(dst, c, n);
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return result;
> +
> + kmsan_enter_runtime();
> + /*
> + * Clang doesn't pass parameter metadata here, so it is impossible to
> + * use shadow of @c to set up the shadow for @dst.
> + */
> + kmsan_internal_unpoison_memory(dst, n, /*checked*/ false);
> + kmsan_leave_runtime();
> +
> + return result;
> +}
> +EXPORT_SYMBOL(__msan_memset);
> +
> +depot_stack_handle_t __msan_chain_origin(depot_stack_handle_t origin)
> +{
> + depot_stack_handle_t ret = 0;
> + unsigned long ua_flags;
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return ret;
> +
> + ua_flags = user_access_save();
> +
> + /* Creating new origins may allocate memory. */
> + kmsan_enter_runtime();
> + ret = kmsan_internal_chain_origin(origin);
> + kmsan_leave_runtime();
> + user_access_restore(ua_flags);
> + return ret;
> +}
> +EXPORT_SYMBOL(__msan_chain_origin);
> +
> +void __msan_poison_alloca(void *address, uintptr_t size, char *descr)
> +{
> + depot_stack_handle_t handle;
> + unsigned long entries[4];
> + unsigned long ua_flags;
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> +
> + ua_flags = user_access_save();
> + entries[0] = KMSAN_ALLOCA_MAGIC_ORIGIN;
> + entries[1] = (u64)descr;
> + entries[2] = (u64)__builtin_return_address(0);
> + /*
> + * With frame pointers enabled, it is possible to quickly fetch the
> + * second frame of the caller stack without calling the unwinder.
> + * Without them, simply do not bother.
> + */
> + if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER))
> + entries[3] = (u64)__builtin_return_address(1);
> + else
> + entries[3] = 0;
> +
> + /* stack_depot_save() may allocate memory. */
> + kmsan_enter_runtime();
> + handle = stack_depot_save(entries, ARRAY_SIZE(entries), GFP_ATOMIC);
> + kmsan_leave_runtime();
> +
> + kmsan_internal_set_shadow_origin(address, size, -1, handle,
> + /*checked*/ true);
> + user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__msan_poison_alloca);
> +
> +void __msan_unpoison_alloca(void *address, uintptr_t size)
> +{
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> +
> + kmsan_enter_runtime();
> + kmsan_internal_unpoison_memory(address, size, /*checked*/ true);
> + kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(__msan_unpoison_alloca);
> +
> +void __msan_warning(u32 origin)
> +{
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> + kmsan_enter_runtime();
> + kmsan_report(origin, /*address*/ 0, /*size*/ 0,
> + /*off_first*/ 0, /*off_last*/ 0, /*user_addr*/ 0,
> + REASON_ANY);
> + kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(__msan_warning);
> +
> +struct kmsan_context_state *__msan_get_context_state(void)
> +{
> + return &kmsan_get_context()->cstate;
> +}
> +EXPORT_SYMBOL(__msan_get_context_state);
> diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
> new file mode 100644
> index 0000000000000..29c91b6e28799
> --- /dev/null
> +++ b/mm/kmsan/kmsan.h
> @@ -0,0 +1,197 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Functions used by the KMSAN runtime.
> + *
> + * Copyright (C) 2017-2021 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +
> +#ifndef __MM_KMSAN_KMSAN_H
> +#define __MM_KMSAN_KMSAN_H
> +
> +#include <asm/pgtable_64_types.h>
> +#include <linux/irqflags.h>
> +#include <linux/sched.h>
> +#include <linux/stackdepot.h>
> +#include <linux/stacktrace.h>
> +#include <linux/nmi.h>
> +#include <linux/mm.h>
> +#include <linux/printk.h>
> +
> +#define KMSAN_ALLOCA_MAGIC_ORIGIN 0xabcd0100
> +#define KMSAN_CHAIN_MAGIC_ORIGIN 0xabcd0200
> +
> +#define KMSAN_POISON_NOCHECK 0x0
> +#define KMSAN_POISON_CHECK 0x1
> +#define KMSAN_POISON_FREE 0x2
> +
> +#define KMSAN_ORIGIN_SIZE 4
> +
> +#define KMSAN_STACK_DEPTH 64
> +
> +#define KMSAN_META_SHADOW (false)
> +#define KMSAN_META_ORIGIN (true)
> +
> +extern bool kmsan_enabled;
> +extern int panic_on_kmsan;
> +
> +/*
> + * KMSAN performs a lot of consistency checks that are currently enabled by
> + * default. BUG_ON is normally discouraged in the kernel, unless used for
> + * debugging, but KMSAN itself is a debugging tool, so it makes little sense to
> + * recover if something goes wrong.
> + */
> +#define KMSAN_WARN_ON(cond) \
> + ({ \
> + const bool __cond = WARN_ON(cond); \
> + if (unlikely(__cond)) { \
> + WRITE_ONCE(kmsan_enabled, false); \
> + if (panic_on_kmsan) { \
> + /* Can't call panic() here because */ \
> + /* of uaccess checks.*/ \
> + BUG(); \
> + } \
> + } \
> + __cond; \
> + })
> +
> +/*
> + * A pair of metadata pointers to be returned by the instrumentation functions.
> + */
> +struct shadow_origin_ptr {
> + void *shadow, *origin;
> +};
> +
> +struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *addr, u64 size,
> + bool store);
> +void *kmsan_get_metadata(void *addr, bool is_origin);
> +void __init kmsan_init_alloc_meta_for_range(void *start, void *end);
> +
> +enum kmsan_bug_reason {
> + REASON_ANY,
> + REASON_COPY_TO_USER,
> + REASON_SUBMIT_URB,
> +};
> +
> +void kmsan_print_origin(depot_stack_handle_t origin);
> +
> +/**
> + * kmsan_report() - Report a use of uninitialized value.
> + * @origin: Stack ID of the uninitialized value.
> + * @address: Address at which the memory access happens.
> + * @size: Memory access size.
> + * @off_first: Offset (from @address) of the first byte to be reported.
> + * @off_last: Offset (from @address) of the last byte to be reported.
> + * @user_addr: When non-NULL, denotes the userspace address to which the kernel
> + * is leaking data.
> + * @reason: Error type from enum kmsan_bug_reason.
> + *
> + * kmsan_report() prints an error message for a consequent group of bytes
> + * sharing the same origin. If an uninitialized value is used in a comparison,
> + * this function is called once without specifying the addresses. When checking
> + * a memory range, KMSAN may call kmsan_report() multiple times with the same
> + * @address, @size, @user_addr and @reason, but different @off_first and
> + * @off_last corresponding to different @origin values.
> + */
> +void kmsan_report(depot_stack_handle_t origin, void *address, int size,
> + int off_first, int off_last, const void *user_addr,
> + enum kmsan_bug_reason reason);
> +
> +DECLARE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
> +
> +static __always_inline struct kmsan_ctx *kmsan_get_context(void)
> +{
> + return in_task() ? &current->kmsan_ctx : raw_cpu_ptr(&kmsan_percpu_ctx);
> +}
> +
> +/*
> + * When a compiler hook is invoked, it may make a call to instrumented code
> + * and eventually call itself recursively. To avoid that, we protect the
> + * runtime entry points with kmsan_enter_runtime()/kmsan_leave_runtime() and
> + * exit the hook if kmsan_in_runtime() is true.
> + */
> +
> +static __always_inline bool kmsan_in_runtime(void)
> +{
> + if ((hardirq_count() >> HARDIRQ_SHIFT) > 1)
> + return true;
> + return kmsan_get_context()->kmsan_in_runtime;
> +}
> +
> +static __always_inline void kmsan_enter_runtime(void)
> +{
> + struct kmsan_ctx *ctx;
> +
> + ctx = kmsan_get_context();
> + KMSAN_WARN_ON(ctx->kmsan_in_runtime++);
> +}
> +
> +static __always_inline void kmsan_leave_runtime(void)
> +{
> + struct kmsan_ctx *ctx = kmsan_get_context();
> +
> + KMSAN_WARN_ON(--ctx->kmsan_in_runtime);
> +}
> +
> +depot_stack_handle_t kmsan_save_stack(void);
> +depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,
> + unsigned int extra_bits);
> +
> +/*
> + * Pack and unpack the origin chain depth and UAF flag to/from the extra bits
> + * provided by the stack depot.
> + * The UAF flag is stored in the lowest bit, followed by the depth in the upper
> + * bits.
> + * set_dsh_extra_bits() is responsible for clamping the value.
> + */
> +static __always_inline unsigned int kmsan_extra_bits(unsigned int depth,
> + bool uaf)
> +{
> + return (depth << 1) | uaf;
> +}
> +
> +static __always_inline bool kmsan_uaf_from_eb(unsigned int extra_bits)
> +{
> + return extra_bits & 1;
> +}
> +
> +static __always_inline unsigned int kmsan_depth_from_eb(unsigned int extra_bits)
> +{
> + return extra_bits >> 1;
> +}
> +
> +/*
> + * kmsan_internal_ functions are supposed to be very simple and not require the
> + * kmsan_in_runtime() checks.
> + */
> +void kmsan_internal_memmove_metadata(void *dst, void *src, size_t n);
> +void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
> + unsigned int poison_flags);
> +void kmsan_internal_unpoison_memory(void *address, size_t size, bool checked);
> +void kmsan_internal_set_shadow_origin(void *address, size_t size, int b,
> + u32 origin, bool checked);
> +depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id);
> +
> +void kmsan_internal_task_create(struct task_struct *task);
> +
> +bool kmsan_metadata_is_contiguous(void *addr, size_t size);
> +void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
> + int reason);
> +bool kmsan_internal_is_module_addr(void *vaddr);
> +bool kmsan_internal_is_vmalloc_addr(void *addr);
> +
> +struct page *kmsan_vmalloc_to_page_or_null(void *vaddr);
> +void kmsan_setup_meta(struct page *page, struct page *shadow,
> + struct page *origin, int order);
> +
> +/* Declared in mm/vmalloc.c */
> +void __vunmap_range_noflush(unsigned long start, unsigned long end);
> +int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
> + pgprot_t prot, struct page **pages,
> + unsigned int page_shift);
> +
> +/* Declared in mm/internal.h */
> +void __free_pages_core(struct page *page, unsigned int order);
> +
> +#endif /* __MM_KMSAN_KMSAN_H */
> diff --git a/mm/kmsan/report.c b/mm/kmsan/report.c
> new file mode 100644
> index 0000000000000..d539fe1129fb9
> --- /dev/null
> +++ b/mm/kmsan/report.c
> @@ -0,0 +1,210 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN error reporting routines.
> + *
> + * Copyright (C) 2019-2021 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +
> +#include <linux/console.h>
> +#include <linux/moduleparam.h>
> +#include <linux/stackdepot.h>
> +#include <linux/stacktrace.h>
> +#include <linux/uaccess.h>
> +
> +#include "kmsan.h"
> +
> +static DEFINE_SPINLOCK(kmsan_report_lock);
> +#define DESCR_SIZE 128
> +/* Protected by kmsan_report_lock */
> +static char report_local_descr[DESCR_SIZE];
> +int panic_on_kmsan __read_mostly;
> +
> +#ifdef MODULE_PARAM_PREFIX
> +#undef MODULE_PARAM_PREFIX
> +#endif
> +#define MODULE_PARAM_PREFIX "kmsan."
> +module_param_named(panic, panic_on_kmsan, int, 0);
> +
> +/*
> + * Skip internal KMSAN frames.
> + */
> +static int get_stack_skipnr(const unsigned long stack_entries[],
> + int num_entries)
> +{
> + int len, skip;
> + char buf[64];
> +
> + for (skip = 0; skip < num_entries; ++skip) {
> + len = scnprintf(buf, sizeof(buf), "%ps",
> + (void *)stack_entries[skip]);
> +
> + /* Never show __msan_* or kmsan_* functions. */
> + if ((strnstr(buf, "__msan_", len) == buf) ||
> + (strnstr(buf, "kmsan_", len) == buf))
> + continue;
> +
> + /*
> + * No match for runtime functions -- @skip entries to skip to
> + * get to first frame of interest.
> + */
> + break;
> + }
> +
> + return skip;
> +}
> +
> +/*
> + * Currently the descriptions of locals generated by Clang look as follows:
> + * ----local_name@function_name
> + * We want to print only the name of the local, as other information in that
> + * description can be confusing.
> + * The meaningful part of the description is copied to a global buffer to avoid
> + * allocating memory.
> + */
> +static char *pretty_descr(char *descr)
> +{
> + int i, pos = 0, len = strlen(descr);
> +
> + for (i = 0; i < len; i++) {
> + if (descr[i] == '@')
> + break;
> + if (descr[i] == '-')
> + continue;
> + report_local_descr[pos] = descr[i];
> + if (pos + 1 == DESCR_SIZE)
> + break;
> + pos++;
> + }
> + report_local_descr[pos] = 0;
> + return report_local_descr;
> +}
> +
> +void kmsan_print_origin(depot_stack_handle_t origin)
> +{
> + unsigned long *entries = NULL, *chained_entries = NULL;
> + unsigned int nr_entries, chained_nr_entries, skipnr;
> + void *pc1 = NULL, *pc2 = NULL;
> + depot_stack_handle_t head;
> + unsigned long magic;
> + char *descr = NULL;
> +
> + if (!origin)
> + return;
> +
> + while (true) {
> + nr_entries = stack_depot_fetch(origin, &entries);
> + magic = nr_entries ? entries[0] : 0;
> + if ((nr_entries == 4) && (magic == KMSAN_ALLOCA_MAGIC_ORIGIN)) {
> + descr = (char *)entries[1];
> + pc1 = (void *)entries[2];
> + pc2 = (void *)entries[3];
> + pr_err("Local variable %s created at:\n",
> + pretty_descr(descr));
> + if (pc1)
> + pr_err(" %pS\n", pc1);
> + if (pc2)
> + pr_err(" %pS\n", pc2);
> + break;
> + }
> + if ((nr_entries == 3) && (magic == KMSAN_CHAIN_MAGIC_ORIGIN)) {
> + head = entries[1];
> + origin = entries[2];
> + pr_err("Uninit was stored to memory at:\n");
> + chained_nr_entries =
> + stack_depot_fetch(head, &chained_entries);
> + kmsan_internal_unpoison_memory(
> + chained_entries,
> + chained_nr_entries * sizeof(*chained_entries),
> + /*checked*/ false);
> + skipnr = get_stack_skipnr(chained_entries,
> + chained_nr_entries);
> + stack_trace_print(chained_entries + skipnr,
> + chained_nr_entries - skipnr, 0);
> + pr_err("\n");
> + continue;
> + }
> + pr_err("Uninit was created at:\n");
> + if (nr_entries) {
> + skipnr = get_stack_skipnr(entries, nr_entries);
> + stack_trace_print(entries + skipnr, nr_entries - skipnr,
> + 0);
> + } else {
> + pr_err("(stack is not available)\n");
> + }
> + break;
> + }
> +}
> +
> +void kmsan_report(depot_stack_handle_t origin, void *address, int size,
> + int off_first, int off_last, const void *user_addr,
> + enum kmsan_bug_reason reason)
> +{
> + unsigned long stack_entries[KMSAN_STACK_DEPTH];
> + int num_stack_entries, skipnr;
> + char *bug_type = NULL;
> + unsigned long flags, ua_flags;
> + bool is_uaf;
> +
> + if (!kmsan_enabled)
> + return;
> + if (!current->kmsan_ctx.allow_reporting)
> + return;
> + if (!origin)
> + return;
> +
> + current->kmsan_ctx.allow_reporting = false;
> + ua_flags = user_access_save();
> + spin_lock_irqsave(&kmsan_report_lock, flags);
> + pr_err("=====================================================\n");
> + is_uaf = kmsan_uaf_from_eb(stack_depot_get_extra_bits(origin));
> + switch (reason) {
> + case REASON_ANY:
> + bug_type = is_uaf ? "use-after-free" : "uninit-value";
> + break;
> + case REASON_COPY_TO_USER:
> + bug_type = is_uaf ? "kernel-infoleak-after-free" :
> + "kernel-infoleak";
> + break;
> + case REASON_SUBMIT_URB:
> + bug_type = is_uaf ? "kernel-usb-infoleak-after-free" :
> + "kernel-usb-infoleak";
> + break;
> + }
> +
> + num_stack_entries =
> + stack_trace_save(stack_entries, KMSAN_STACK_DEPTH, 1);
> + skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +
> + pr_err("BUG: KMSAN: %s in %pS\n", bug_type, stack_entries[skipnr]);
> + stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> + 0);
> + pr_err("\n");
> +
> + kmsan_print_origin(origin);
> +
> + if (size) {
> + pr_err("\n");
> + if (off_first == off_last)
> + pr_err("Byte %d of %d is uninitialized\n", off_first,
> + size);
> + else
> + pr_err("Bytes %d-%d of %d are uninitialized\n",
> + off_first, off_last, size);
> + }
> + if (address)
> + pr_err("Memory access of size %d starts at %px\n", size,
> + address);
> + if (user_addr && reason == REASON_COPY_TO_USER)
> + pr_err("Data copied to user address %px\n", user_addr);
> + pr_err("\n");
> + dump_stack_print_info(KERN_ERR);
> + pr_err("=====================================================\n");
> + add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
> + spin_unlock_irqrestore(&kmsan_report_lock, flags);
> + if (panic_on_kmsan)
> + panic("kmsan.panic set ...\n");
> + user_access_restore(ua_flags);
> + current->kmsan_ctx.allow_reporting = true;
> +}
> diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
> new file mode 100644
> index 0000000000000..c71b0ce19ea6d
> --- /dev/null
> +++ b/mm/kmsan/shadow.c
> @@ -0,0 +1,332 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN shadow implementation.
> + *
> + * Copyright (C) 2017-2021 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +
> +#include <asm/page.h>
> +#include <asm/pgtable_64_types.h>
> +#include <asm/tlbflush.h>
> +#include <linux/cacheflush.h>
> +#include <linux/memblock.h>
> +#include <linux/mm_types.h>
> +#include <linux/percpu-defs.h>
> +#include <linux/slab.h>
> +#include <linux/smp.h>
> +#include <linux/stddef.h>
> +
> +#include "kmsan.h"
> +
> +#define shadow_page_for(page) ((page)->kmsan_shadow)
> +
> +#define origin_page_for(page) ((page)->kmsan_origin)
> +
> +static void *shadow_ptr_for(struct page *page)
> +{
> + return page_address(shadow_page_for(page));
> +}
> +
> +static void *origin_ptr_for(struct page *page)
> +{
> + return page_address(origin_page_for(page));
> +}
> +
> +static bool page_has_metadata(struct page *page)
> +{
> + return shadow_page_for(page) && origin_page_for(page);
> +}
> +
> +static void set_no_shadow_origin_page(struct page *page)
> +{
> + shadow_page_for(page) = NULL;
> + origin_page_for(page) = NULL;
> +}
> +
> +/*
> + * Dummy load and store pages to be used when the real metadata is unavailable.
> + * There are separate pages for loads and stores, so that every load returns a
> + * zero, and every store doesn't affect other loads.
> + */
> +static char dummy_load_page[PAGE_SIZE] __aligned(PAGE_SIZE);
> +static char dummy_store_page[PAGE_SIZE] __aligned(PAGE_SIZE);
> +
> +/*
> + * Taken from arch/x86/mm/physaddr.h to avoid using an instrumented version.
> + */
> +static int kmsan_phys_addr_valid(unsigned long addr)
> +{
> + if (IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
> + return !(addr >> boot_cpu_data.x86_phys_bits);
> + else
> + return 1;
> +}
> +
> +/*
> + * Taken from arch/x86/mm/physaddr.c to avoid using an instrumented version.
> + */
> +static bool kmsan_virt_addr_valid(void *addr)
> +{
> + unsigned long x = (unsigned long)addr;
> + unsigned long y = x - __START_KERNEL_map;
> +
> + /* use the carry flag to determine if x was < __START_KERNEL_map */
> + if (unlikely(x > y)) {
> + x = y + phys_base;
> +
> + if (y >= KERNEL_IMAGE_SIZE)
> + return false;
> + } else {
> + x = y + (__START_KERNEL_map - PAGE_OFFSET);
> +
> + /* carry flag will be set if starting x was >= PAGE_OFFSET */
> + if ((x > y) || !kmsan_phys_addr_valid(x))
> + return false;
> + }
> +
> + return pfn_valid(x >> PAGE_SHIFT);
> +}
> +
> +static unsigned long vmalloc_meta(void *addr, bool is_origin)
> +{
> + unsigned long addr64 = (unsigned long)addr, off;
> +
> + KMSAN_WARN_ON(is_origin && !IS_ALIGNED(addr64, KMSAN_ORIGIN_SIZE));
> + if (kmsan_internal_is_vmalloc_addr(addr)) {
> + off = addr64 - VMALLOC_START;
> + return off + (is_origin ? KMSAN_VMALLOC_ORIGIN_START :
> + KMSAN_VMALLOC_SHADOW_START);
> + }
> + if (kmsan_internal_is_module_addr(addr)) {
> + off = addr64 - MODULES_VADDR;
> + return off + (is_origin ? KMSAN_MODULES_ORIGIN_START :
> + KMSAN_MODULES_SHADOW_START);
> + }
> + return 0;
> +}
> +
> +static struct page *virt_to_page_or_null(void *vaddr)
> +{
> + if (kmsan_virt_addr_valid(vaddr))
> + return virt_to_page(vaddr);
> + else
> + return NULL;
> +}
> +
> +struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *address, u64 size,
> + bool store)
> +{
> + struct shadow_origin_ptr ret;
> + void *shadow;
> +
> + /*
> + * Even if we redirect this memory access to the dummy page, it will
> + * go out of bounds.
> + */
> + KMSAN_WARN_ON(size > PAGE_SIZE);
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + goto return_dummy;
> +
> + KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(address, size));
> + shadow = kmsan_get_metadata(address, KMSAN_META_SHADOW);
> + if (!shadow)
> + goto return_dummy;
> +
> + ret.shadow = shadow;
> + ret.origin = kmsan_get_metadata(address, KMSAN_META_ORIGIN);
> + return ret;
> +
> +return_dummy:
> + if (store) {
> + /* Ignore this store. */
> + ret.shadow = dummy_store_page;
> + ret.origin = dummy_store_page;
> + } else {
> + /* This load will return zero. */
> + ret.shadow = dummy_load_page;
> + ret.origin = dummy_load_page;
> + }
> + return ret;
> +}
> +
> +/*
> + * Obtain the shadow or origin pointer for the given address, or NULL if there's
> + * none. The caller must check the return value for being non-NULL if needed.
> + * The return value of this function should not depend on whether we're in the
> + * runtime or not.
> + */
> +void *kmsan_get_metadata(void *address, bool is_origin)
> +{
> + u64 addr = (u64)address, pad, off;
> + struct page *page;
> + void *ret;
> +
> + if (is_origin && !IS_ALIGNED(addr, KMSAN_ORIGIN_SIZE)) {
> + pad = addr % KMSAN_ORIGIN_SIZE;
> + addr -= pad;
> + }
> + address = (void *)addr;
> + if (kmsan_internal_is_vmalloc_addr(address) ||
> + kmsan_internal_is_module_addr(address))
> + return (void *)vmalloc_meta(address, is_origin);
> +
> + page = virt_to_page_or_null(address);
> + if (!page)
> + return NULL;
> + if (!page_has_metadata(page))
> + return NULL;
> + off = addr % PAGE_SIZE;
> +
> + ret = (is_origin ? origin_ptr_for(page) : shadow_ptr_for(page)) + off;
> + return ret;
> +}
> +
> +/* Allocate metadata for pages allocated at boot time. */
> +void __init kmsan_init_alloc_meta_for_range(void *start, void *end)
> +{
> + struct page *shadow_p, *origin_p;
> + void *shadow, *origin;
> + struct page *page;
> + u64 addr, size;
> +
> + start = (void *)ALIGN_DOWN((u64)start, PAGE_SIZE);
> + size = ALIGN((u64)end - (u64)start, PAGE_SIZE);
> + shadow = memblock_alloc(size, PAGE_SIZE);
> + origin = memblock_alloc(size, PAGE_SIZE);
> + for (addr = 0; addr < size; addr += PAGE_SIZE) {
> + page = virt_to_page_or_null((char *)start + addr);
> + shadow_p = virt_to_page_or_null((char *)shadow + addr);
> + set_no_shadow_origin_page(shadow_p);
> + shadow_page_for(page) = shadow_p;
> + origin_p = virt_to_page_or_null((char *)origin + addr);
> + set_no_shadow_origin_page(origin_p);
> + origin_page_for(page) = origin_p;
> + }
> +}
> +
> +/* Called from mm/memory.c */
> +void kmsan_copy_page_meta(struct page *dst, struct page *src)
> +{
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> + if (!dst || !page_has_metadata(dst))
> + return;
> + if (!src || !page_has_metadata(src)) {
> + kmsan_internal_unpoison_memory(page_address(dst), PAGE_SIZE,
> + /*checked*/ false);
> + return;
> + }
> +
> + kmsan_enter_runtime();
> + __memcpy(shadow_ptr_for(dst), shadow_ptr_for(src), PAGE_SIZE);
> + __memcpy(origin_ptr_for(dst), origin_ptr_for(src), PAGE_SIZE);
> + kmsan_leave_runtime();
> +}
> +
> +/* Called from mm/page_alloc.c */
> +void kmsan_alloc_page(struct page *page, unsigned int order, gfp_t flags)
> +{
> + bool initialized = (flags & __GFP_ZERO) || !kmsan_enabled;
> + struct page *shadow, *origin;
> + depot_stack_handle_t handle;
> + int pages = 1 << order;
> + int i;
> +
> + if (!page)
> + return;
> +
> + shadow = shadow_page_for(page);
> + origin = origin_page_for(page);
> +
> + if (initialized) {
> + __memset(page_address(shadow), 0, PAGE_SIZE * pages);
> + __memset(page_address(origin), 0, PAGE_SIZE * pages);
> + return;
> + }
> +
> + /* Zero pages allocated by the runtime should also be initialized. */
> + if (kmsan_in_runtime())
> + return;
> +
> + __memset(page_address(shadow), -1, PAGE_SIZE * pages);
> + kmsan_enter_runtime();
> + handle = kmsan_save_stack_with_flags(flags, /*extra_bits*/ 0);
> + kmsan_leave_runtime();
> + /*
> + * Addresses are page-aligned, pages are contiguous, so it's ok
> + * to just fill the origin pages with |handle|.
> + */
> + for (i = 0; i < PAGE_SIZE * pages / sizeof(handle); i++)
> + ((depot_stack_handle_t *)page_address(origin))[i] = handle;
> +}
> +
> +/* Called from mm/page_alloc.c */
> +void kmsan_free_page(struct page *page, unsigned int order)
> +{
> + /* Really nothing to do here; we could rewrite the shadow instead. */
> +}
> +
> +/* Called from mm/vmalloc.c */
> +void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
> + pgprot_t prot, struct page **pages,
> + unsigned int page_shift)
> +{
> + unsigned long shadow_start, origin_start, shadow_end, origin_end;
> + struct page **s_pages, **o_pages;
> + int nr, i, mapped;
> +
> + if (!kmsan_enabled)
> + return;
> +
> + shadow_start = vmalloc_meta((void *)start, KMSAN_META_SHADOW);
> + shadow_end = vmalloc_meta((void *)end, KMSAN_META_SHADOW);
> + if (!shadow_start)
> + return;
> +
> + nr = (end - start) / PAGE_SIZE;
> + s_pages = kcalloc(nr, sizeof(struct page *), GFP_KERNEL);
> + o_pages = kcalloc(nr, sizeof(struct page *), GFP_KERNEL);
> + if (!s_pages || !o_pages)
> + goto ret;
> + for (i = 0; i < nr; i++) {
> + s_pages[i] = shadow_page_for(pages[i]);
> + o_pages[i] = origin_page_for(pages[i]);
> + }
> + prot = PAGE_KERNEL;
> +
> + origin_start = vmalloc_meta((void *)start, KMSAN_META_ORIGIN);
> + origin_end = vmalloc_meta((void *)end, KMSAN_META_ORIGIN);
> + kmsan_enter_runtime();
> + mapped = __vmap_pages_range_noflush(shadow_start, shadow_end, prot,
> + s_pages, page_shift);
> + KMSAN_WARN_ON(mapped);
> + mapped = __vmap_pages_range_noflush(origin_start, origin_end, prot,
> + o_pages, page_shift);
> + KMSAN_WARN_ON(mapped);
> + kmsan_leave_runtime();
> + flush_tlb_kernel_range(shadow_start, shadow_end);
> + flush_tlb_kernel_range(origin_start, origin_end);
> + flush_cache_vmap(shadow_start, shadow_end);
> + flush_cache_vmap(origin_start, origin_end);
> +
> +ret:
> + kfree(s_pages);
> + kfree(o_pages);
> +}
> +
> +void kmsan_setup_meta(struct page *page, struct page *shadow,
> + struct page *origin, int order)
> +{
> + int i;
> +
> + for (i = 0; i < (1 << order); i++) {
> + set_no_shadow_origin_page(&shadow[i]);
> + set_no_shadow_origin_page(&origin[i]);
> + shadow_page_for(&page[i]) = &shadow[i];
> + origin_page_for(&page[i]) = &origin[i];
> + }
> +}
> diff --git a/scripts/Makefile.kmsan b/scripts/Makefile.kmsan
> new file mode 100644
> index 0000000000000..9793591f9855c
> --- /dev/null
> +++ b/scripts/Makefile.kmsan
> @@ -0,0 +1 @@
> +export CFLAGS_KMSAN := -fsanitize=kernel-memory
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index d1f865b8c0cba..3a0dbcea51d01 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -162,6 +162,15 @@ _c_flags += $(if $(patsubst n%,, \
> endif
> endif
>
> +ifeq ($(CONFIG_KMSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> + $(KMSAN_SANITIZE_$(basetarget).o)$(KMSAN_SANITIZE)y), \
> + $(CFLAGS_KMSAN))
> +_c_flags += $(if $(patsubst n%,, \
> + $(KMSAN_ENABLE_CHECKS_$(basetarget).o)$(KMSAN_ENABLE_CHECKS)y), \
> + , -mllvm -msan-disable-checks=1)
> +endif
> +
> ifeq ($(CONFIG_UBSAN),y)
> _c_flags += $(if $(patsubst n%,, \
> $(UBSAN_SANITIZE_$(basetarget).o)$(UBSAN_SANITIZE)$(CONFIG_UBSAN_SANITIZE_ALL)), \
> --
> 2.34.1.173.g76aa8bc2d0-goog
>
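
The __msan_metadata_ptr_for_{load,store}_SIZE() hooks in mm/kmsan/instrumentation.c
above are what Clang emits calls to around ordinary loads and stores. As a rough
sketch (not actual compiler output; the real transformation happens on LLVM IR, and
the shadow/origin of the stored value come from the kmsan_context_state, passed here
as plain parameters for illustration), a store like "*p = v;" conceptually becomes:

void example_store(int *p, int v, u32 shadow_of_v, u32 origin_of_v)
{
        struct shadow_origin_ptr meta;

        /* Ask the runtime where the metadata for *p lives. */
        meta = __msan_metadata_ptr_for_store_4(p);
        /* Propagate the metadata of @v to the metadata of *p... */
        *(u32 *)meta.shadow = shadow_of_v;
        if (shadow_of_v)
                *(u32 *)meta.origin = origin_of_v;
        /* ...then perform the actual store. */
        *p = v;
}

If the metadata is unavailable, kmsan_get_shadow_origin_ptr() falls back to the dummy
pages from mm/kmsan/shadow.c, so the metadata writes above land in dummy_store_page
and are simply discarded.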

2022-01-06 12:46:50

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH 26/43] kmsan: virtio: check/unpoison scatterlist in vring_map_one_sg()

On Tue, Dec 14, 2021 at 05:20:33PM +0100, Alexander Potapenko wrote:
> If vring doesn't use the DMA API, KMSAN is unable to tell whether the
> memory is initialized by hardware. Explicitly call kmsan_handle_dma()
> from vring_map_one_sg() in this case to prevent false positives.
>
> Signed-off-by: Alexander Potapenko <[email protected]>

OK I guess

Acked-by: Michael S. Tsirkin <[email protected]>

IIUC this depends on the rest of the patchset, so feel free to
merge.

> ---
> Link: https://linux-review.googlesource.com/id/I211533ecb86a66624e151551f83ddd749536b3af
> ---
> drivers/virtio/virtio_ring.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 6d2614e34470f..bf4d5b331e99d 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -11,6 +11,7 @@
> #include <linux/module.h>
> #include <linux/hrtimer.h>
> #include <linux/dma-mapping.h>
> +#include <linux/kmsan-checks.h>
> #include <linux/spinlock.h>
> #include <xen/xen.h>
>
> @@ -331,8 +332,15 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
> struct scatterlist *sg,
> enum dma_data_direction direction)
> {
> - if (!vq->use_dma_api)
> + if (!vq->use_dma_api) {
> + /*
> + * If DMA is not used, KMSAN doesn't know that the scatterlist
> + * is initialized by the hardware. Explicitly check/unpoison it
> + * depending on the direction.
> + */
> + kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
> return (dma_addr_t)sg_phys(sg);
> + }
>
> /*
> * We can't use dma_map_sg, because we don't use scatterlists in
> --
> 2.34.1.173.g76aa8bc2d0-goog
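
The kmsan_handle_dma() call added here splits the buffer into per-page chunks (see
the hook shown earlier in the thread) and hands them to kmsan_handle_dma_page(),
which is not quoted in this excerpt. Below is a plausible reconstruction of that
helper, based only on the "check/unpoison it depending on the direction" comment and
on the internal helpers from mm/kmsan/kmsan.h; it is a sketch, not the code from the
series:

static void kmsan_handle_dma_page(const void *addr, size_t size,
                                  enum dma_data_direction dir)
{
        switch (dir) {
        case DMA_BIDIRECTIONAL:
                /* The device may read the buffer: report uninitialized bytes... */
                kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
                                            REASON_ANY);
                /* ...and may also write to it: mark it initialized afterwards. */
                kmsan_internal_unpoison_memory((void *)addr, size,
                                               /*checked*/ false);
                break;
        case DMA_TO_DEVICE:
                kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
                                            REASON_ANY);
                break;
        case DMA_FROM_DEVICE:
                kmsan_internal_unpoison_memory((void *)addr, size,
                                               /*checked*/ false);
                break;
        default:
                break;
        }
}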


2022-01-07 17:22:14

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH 16/43] kmsan: mm: call KMSAN hooks from SLUB code

On 12/14/21 17:20, Alexander Potapenko wrote:
> In order to report uninitialized memory coming from heap allocations
> KMSAN has to poison them unless they're created with __GFP_ZERO.
>
> Conveniently, the places where init_on_alloc/init_on_free initialization
> is performed are exactly the places where KMSAN hooks are needed.
>
> Signed-off-by: Alexander Potapenko <[email protected]>
> ---
> Link: https://linux-review.googlesource.com/id/I6954b386c5c5d7f99f48bb6cbcc74b75136ce86e
> ---
> mm/slab.h | 1 +
> mm/slub.c | 26 +++++++++++++++++++++++---
> 2 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/mm/slab.h b/mm/slab.h
> index 56ad7eea3ddfb..6175a74047b47 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -521,6 +521,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
> memset(p[i], 0, s->object_size);
> kmemleak_alloc_recursive(p[i], s->object_size, 1,
> s->flags, flags);
> + kmsan_slab_alloc(s, p[i], flags);
> }
>
> memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
> diff --git a/mm/slub.c b/mm/slub.c
> index abe7db581d686..5a63486e52531 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -22,6 +22,7 @@
> #include <linux/proc_fs.h>
> #include <linux/seq_file.h>
> #include <linux/kasan.h>
> +#include <linux/kmsan.h>
> #include <linux/cpu.h>
> #include <linux/cpuset.h>
> #include <linux/mempolicy.h>
> @@ -346,10 +347,13 @@ static inline void *freelist_dereference(const struct kmem_cache *s,
> (unsigned long)ptr_addr);
> }
>
> +/*
> + * See the comment to get_freepointer_safe().
> + */

I did...

> static inline void *get_freepointer(struct kmem_cache *s, void *object)
> {
> object = kasan_reset_tag(object);
> - return freelist_dereference(s, object + s->offset);
> + return kmsan_init(freelist_dereference(s, object + s->offset));

... but I don't see why it applies to get_freepointer() too? What am I missing?

> }
>
> static void prefetch_freepointer(const struct kmem_cache *s, void *object)
> @@ -357,18 +361,28 @@ static void prefetch_freepointer(const struct kmem_cache *s, void *object)
> prefetchw(object + s->offset);
> }
>
> +/*
> + * When running under KMSAN, get_freepointer_safe() may return an uninitialized
> + * pointer value in the case the current thread loses the race for the next
> + * memory chunk in the freelist. In that case this_cpu_cmpxchg_double() in
> + * slab_alloc_node() will fail, so the uninitialized value won't be used, but
> + * KMSAN will still check all arguments of cmpxchg because of imperfect
> + * handling of inline assembly.
> + * To work around this problem, use kmsan_init() to force initialize the
> + * return value of get_freepointer_safe().
> + */
> static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
> {
> unsigned long freepointer_addr;
> void *p;
>
> if (!debug_pagealloc_enabled_static())
> - return get_freepointer(s, object);
> + return kmsan_init(get_freepointer(s, object));

So here kmsan_init() is done twice?

>
> object = kasan_reset_tag(object);
> freepointer_addr = (unsigned long)object + s->offset;
> copy_from_kernel_nofault(&p, (void **)freepointer_addr, sizeof(p));
> - return freelist_ptr(s, p, freepointer_addr);
> + return kmsan_init(freelist_ptr(s, p, freepointer_addr));
> }
>
> static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp)

2022-03-18 15:16:38

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH 12/43] kcsan: clang: retire CONFIG_KCSAN_KCOV_BROKEN

On Wed, Dec 15, 2021 at 3:43 PM Mark Rutland <[email protected]> wrote:
>
> On Wed, Dec 15, 2021 at 02:39:43PM +0100, Marco Elver wrote:
> > On Wed, 15 Dec 2021 at 14:33, Mark Rutland <[email protected]> wrote:
> > >
> > > On Tue, Dec 14, 2021 at 05:20:19PM +0100, Alexander Potapenko wrote:
> > > > kcov used to be broken prior to Clang 11, but right now that version is
> > > > already the minimum required to build with KCSAN, that is why we don't
> > > > need KCSAN_KCOV_BROKEN anymore.
> > >
> > > Just to check, how is that requirement enforced?
> >
> > HAVE_KCSAN_COMPILER will only be true with Clang 11 or later, due to
> > no prior compiler having "-tsan-distinguish-volatile=1".
>
> I see -- could we add wording to that effect into the commit messge?

Will be done.

> > > I see the core Makefiles enforce 10.0.1+, but I couldn't spot an explicit
> > > version dependency in Kconfig.kcsan.
> > >
> > > Otherwise, this looks good to me!
> >
> > I think 5.17 will be Clang 11 only, so we could actually revert
> > ea91a1d45d19469001a4955583187b0d75915759:
> > https://lkml.kernel.org/r/Yao86FeC2ybOobLO@archlinux-ax161
> >
> > I should resend that to be added to the -kbuild tree.
>
> FWIW, that also works for me.
>
> Thanks,
> Mark.



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Managing Directors: Paul Manicle, Liana Sebastian
Registration court and number: Hamburg, HRB 86891
Registered office: Hamburg

This e-mail is confidential. If you received this communication by
mistake, please don't forward it to anyone else, please erase all
copies and attachments, and please let me know that it has gone to the
wrong person.

2022-03-19 22:55:14

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH 10/43] kmsan: pgtable: reduce vmalloc space

On Wed, Dec 15, 2021 at 2:36 PM Mark Rutland <[email protected]> wrote:
>
> On Tue, Dec 14, 2021 at 05:20:17PM +0100, Alexander Potapenko wrote:
> > KMSAN is going to use 3/4 of existing vmalloc space to hold the
> > metadata, therefore we lower VMALLOC_END to make sure vmalloc() doesn't
> > allocate past the first 1/4.
> >
> > Signed-off-by: Alexander Potapenko <[email protected]>
>
> It might be worth adding an 'x86: ' prefix to the commit title, since this
> specifically affects x86 headers.

Makes sense, will do!
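
To make the fractions in the commit message concrete: with VMALLOC_END lowered, the
span that vmalloc() used to own ends up divided into four equal quarters. Ordinary
vmalloc() allocations stay in the first one, the next two hold the vmalloc shadow and
origins, and the last quarter holds metadata for the modules area (this reading
follows from the vmalloc_meta() helper in mm/kmsan/shadow.c). A sketch of that
layout, where every *_EXAMPLE name is made up and only meant as an illustration of
the arithmetic, not the actual header contents:

/* Four equal quarters of the original [VMALLOC_START, ORIGINAL_VMALLOC_END] span. */
#define VMALLOC_QUARTER_EXAMPLE \
        ((ORIGINAL_VMALLOC_END - VMALLOC_START + 1) / 4)
#define VMALLOC_END_EXAMPLE \
        (VMALLOC_START + VMALLOC_QUARTER_EXAMPLE - 1)
#define KMSAN_VMALLOC_SHADOW_START_EXAMPLE \
        (VMALLOC_END_EXAMPLE + 1)
#define KMSAN_VMALLOC_ORIGIN_START_EXAMPLE \
        (KMSAN_VMALLOC_SHADOW_START_EXAMPLE + VMALLOC_QUARTER_EXAMPLE)
#define KMSAN_MODULES_SHADOW_START_EXAMPLE \
        (KMSAN_VMALLOC_ORIGIN_START_EXAMPLE + VMALLOC_QUARTER_EXAMPLE)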

2022-03-21 22:16:46

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH 13/43] kmsan: add KMSAN runtime core

> > + KMSAN_WARN_ON(!src_slots || !dst_slots);
> > + KMSAN_WARN_ON((src_slots < 1) || (dst_slots < 1));
>
> The above 2 checks look equivalent.
Right, I'll drop the first one.

> > + KMSAN_WARN_ON((src_slots - dst_slots > 1) ||
> > + (dst_slots - src_slots < -1));
> > + backwards = dst > src;
> > + i = backwards ? min(src_slots, dst_slots) - 1 : 0;
> > + iter = backwards ? -1 : 1;
> > +
> > + align_shadow_src =
> > + (u32 *)ALIGN_DOWN((u64)shadow_src, KMSAN_ORIGIN_SIZE);
> > + for (step = 0; step < min(src_slots, dst_slots); step++, i += iter) {
> > + KMSAN_WARN_ON(i < 0);
> > + shadow = align_shadow_src[i];
> > + if (i == 0) {
> > + /*
> > + * If |src| isn't aligned on KMSAN_ORIGIN_SIZE, don't
> > + * look at the first |src % KMSAN_ORIGIN_SIZE| bytes
> > + * of the first shadow slot.
> > + */
> > + skip_bits = ((u64)src % KMSAN_ORIGIN_SIZE) * 8;
> > + shadow = (shadow << skip_bits) >> skip_bits;
>
> Is this correct?...
> For the first slot we want to ignore some of the first (low) bits. To
> ignore low bits we need to shift right and then left, no?

Yes, you are right, I forgot about the endianness. Will try to add
some tests for this case.

> > + }
> > + if (i == src_slots - 1) {
> > + /*
> > + * If |src + n| isn't aligned on
> > + * KMSAN_ORIGIN_SIZE, don't look at the last
> > + * |(src + n) % KMSAN_ORIGIN_SIZE| bytes of the
> > + * last shadow slot.
> > + */
> > + skip_bits = (((u64)src + n) % KMSAN_ORIGIN_SIZE) * 8;
> > + shadow = (shadow >> skip_bits) << skip_bits;
>
> Same here.
Done
>


> This can be a bit shorter and done without the temp var as:
>
> new_origin = kmsan_internal_chain_origin(old_origin);
> /*
> * kmsan_internal_chain_origin() may return
> * NULL, but we don't want to lose the previous
> * origin value.
> */
> if (!new_origin)
> new_origin = old_origin;

Done.

>
>
> > + }
> > + if (shadow)
> > + origin_dst[i] = new_origin;
>
> Are we sure that origin_dst is aligned here?
Yes, kmsan_get_metadata(..., KMSAN_META_ORIGIN) always returns aligned pointers.
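
Returning to the shift-direction point above: on a little-endian machine the bytes of
a 4-byte shadow slot that sit at lower addresses occupy the low-order bits of the
u32, which is why the corrected shift order works. A minimal sketch, assuming
little-endian layout and 0 < skip/keep < KMSAN_ORIGIN_SIZE so the shifts stay well
defined (both helper names are hypothetical, for illustration only):

/* Drop the first @skip bytes of a slot (the bytes before @src). */
static u32 mask_leading_bytes(u32 shadow, unsigned int skip)
{
        unsigned int skip_bits = skip * 8;

        return (shadow >> skip_bits) << skip_bits;
}

/* Keep only the first @keep bytes of a slot (the bytes before @src + @n). */
static u32 mask_trailing_bytes(u32 shadow, unsigned int keep)
{
        unsigned int skip_bits = (KMSAN_ORIGIN_SIZE - keep) * 8;

        return (shadow << skip_bits) >> skip_bits;
}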



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Managing Directors: Paul Manicle, Liana Sebastian
Registration court and number: Hamburg, HRB 86891
Registered office: Hamburg

This e-mail is confidential. If you received this communication by
mistake, please don't forward it to anyone else, please erase all
copies and attachments, and please let me know that it has gone to the
wrong person.

2022-03-21 23:09:05

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH 13/43] kmsan: add KMSAN runtime core

> >
> > Just to make sure I don't misunderstand - for example for "kmsan: mm:
> > call KMSAN hooks from SLUB code", would it be better to pull the code
> > in mm/kmsan/core.c implementing kmsan_slab_alloc() and
> > kmsan_slab_free() into that patch?
>
> Yes.
>
> > I thought maintainers would prefer to have patches to their code
> > separated from KMSAN code, but if it's not true, I can surely fix
> > that.
>
> As a maintainer, I want to know what the function call that you just
> added to my subsystem to call does. Wouldn't you? Put it all in the
> same patch.

Ok, will be done in v2, thanks!

> Think about submitting a patch series as telling a story. You need to
> show the progression forward of the feature so that everyone can
> understand what is going on. Just throwing tiny snippets at us makes it
> impossible to follow what your goal is.
>
> You want reviewers to be able to easily see if the things you describe
> being done in the changelog actually are implemented in the diff.
> Dividing stuff up by files does not show that at all.
>
> thanks,
>
> greg k-h



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Managing Directors: Paul Manicle, Liana Sebastian
Registration court and number: Hamburg, HRB 86891
Registered office: Hamburg

This e-mail is confidential. If you received this communication by
mistake, please don't forward it to anyone else, please erase all
copies and attachments, and please let me know that it has gone to the
wrong person.

2022-03-25 19:14:27

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH 16/43] kmsan: mm: call KMSAN hooks from SLUB code

> > static inline void *get_freepointer(struct kmem_cache *s, void *object)
> > {
> > object = kasan_reset_tag(object);
> > - return freelist_dereference(s, object + s->offset);
> > + return kmsan_init(freelist_dereference(s, object + s->offset));
>
> ... but I don't see why it applies to get_freepointer() too? What am I missing?

Agreed, kmsan_init() is not needed here.

> > }
> >
> > static void prefetch_freepointer(const struct kmem_cache *s, void *object)
> > @@ -357,18 +361,28 @@ static void prefetch_freepointer(const struct kmem_cache *s, void *object)
> > prefetchw(object + s->offset);
> > }
> >
> > +/*
> > + * When running under KMSAN, get_freepointer_safe() may return an uninitialized
> > + * pointer value in the case the current thread loses the race for the next
> > + * memory chunk in the freelist. In that case this_cpu_cmpxchg_double() in
> > + * slab_alloc_node() will fail, so the uninitialized value won't be used, but
> > + * KMSAN will still check all arguments of cmpxchg because of imperfect
> > + * handling of inline assembly.
> > + * To work around this problem, use kmsan_init() to force initialize the
> > + * return value of get_freepointer_safe().
> > + */
> > static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
> > {
> > unsigned long freepointer_addr;
> > void *p;
> >
> > if (!debug_pagealloc_enabled_static())
> > - return get_freepointer(s, object);
> > + return kmsan_init(get_freepointer(s, object));
>
> So here kmsan_init() is done twice?

Yeah, removing it from get_freepointer() does not introduce new
errors. I'll fix this in v2.
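
Putting the two answers together, v2 would presumably keep kmsan_init() only on the
copy_from_kernel_nofault() path. A sketch of the anticipated result (an expectation
based on this exchange, not a quote from the actual v2):

static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
{
        unsigned long freepointer_addr;
        void *p;

        if (!debug_pagealloc_enabled_static())
                /* get_freepointer() no longer calls kmsan_init() itself. */
                return get_freepointer(s, object);

        object = kasan_reset_tag(object);
        freepointer_addr = (unsigned long)object + s->offset;
        copy_from_kernel_nofault(&p, (void **)freepointer_addr, sizeof(p));
        /* Only this racy path needs to force-initialize the result. */
        return kmsan_init(freelist_ptr(s, p, freepointer_addr));
}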