2022-04-27 08:56:47

by Alexander Potapenko

Subject: [PATCH v3 00/46] Add KernelMemorySanitizer infrastructure

KernelMemorySanitizer (KMSAN) is a detector of errors related to uses of
uninitialized memory. It relies on compile-time Clang instrumentation
(similar to userspace MSan [1]) and tracks the state of every bit of
kernel memory, which allows it to report an error whenever an
uninitialized value is used in a condition, dereferenced, or escapes to
userspace, USB or DMA.
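
To illustrate the class of bugs KMSAN is after, consider a hypothetical
driver snippet (the names foo_dev and foo_hw_read_status are made up for
this cover letter; this is not code from the series):

  int foo_get_status(struct foo_dev *dev, int __user *arg)
  {
          int status;     /* stack variable, not initialized */

          if (dev->ready)
                  status = foo_hw_read_status(dev);

          /*
           * When dev->ready is false, @status is still uninitialized
           * here; KMSAN reports both the branch on it and the copy to
           * userspace below.
           */
          if (status < 0)
                  return status;
          return put_user(status, arg);
  }

For each such use KMSAN also prints the stack trace at which the
uninitialized value was created.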

KMSAN has reported more than 300 bugs in the past few years (recently
fixed bugs: [2]), most of them with the help of syzkaller. Such bugs
keep getting introduced into the kernel despite new compiler warnings and
other analyses (the 5.16 cycle already resulted in several KMSAN-reported
bugs, e.g. [3]). Mitigations like total stack and heap initialization are
unfortunately very far from being deployable.

The proposed patchset contains KMSAN runtime implementation together with
small changes to other subsystems needed to make KMSAN work.

The latter changes fall into several categories:

1. Changes and refactorings of existing code required to add KMSAN:
- [1/46] x86: add missing include to sparsemem.h
- [2/46] stackdepot: reserve 5 extra bits in depot_stack_handle_t
- [3/46] kasan: common: adapt to the new prototype of __stack_depot_save()
- [4/46] instrumented.h: allow instrumenting both sides of copy_from_user()
- [5/46] x86: asm: instrument usercopy in get_user() and __put_user_size()
- [6/46] asm-generic: instrument usercopy in cacheflush.h
- [11/46] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE

2. KMSAN-related declarations in generic code, KMSAN runtime library,
docs and configs:
- [7/46] kmsan: add ReST documentation
- [8/46] kmsan: introduce __no_sanitize_memory and __no_kmsan_checks
- [10/46] x86: kmsan: pgtable: reduce vmalloc space
- [12/46] kmsan: add KMSAN runtime core
- [15/46] MAINTAINERS: add entry for KMSAN
- [29/46] kmsan: add tests for KMSAN
- [36/46] objtool: kmsan: list KMSAN API functions as uaccess-safe
- [41/46] x86: kmsan: use __msan_ string functions where possible.
- [46/46] x86: kmsan: enable KMSAN builds for x86

3. Adding hooks from different subsystems to notify KMSAN about memory
state changes:
- [16/46] kmsan: mm: maintain KMSAN metadata for page operations
- [17/46] kmsan: mm: call KMSAN hooks from SLUB code
- [18/46] kmsan: handle task creation and exiting
- [19/46] kmsan: init: call KMSAN initialization routines
- [20/46] instrumented.h: add KMSAN support
- [22/46] kmsan: add iomap support
- [23/46] Input: libps2: mark data received in __ps2_command() as initialized
- [24/46] kmsan: dma: unpoison DMA mappings
- [40/46] x86: kmsan: handle open-coded assembly in lib/iomem.c
- [42/46] x86: kmsan: sync metadata pages on page fault

4. Changes that prevent false reports by explicitly initializing memory,
disabling optimized code that may trick KMSAN, selectively skipping
instrumentation:
- [13/46] kmsan: implement kmsan_init(), initialize READ_ONCE_NOCHECK()
- [14/46] kmsan: disable instrumentation of unsupported common kernel code
- [21/46] kmsan: unpoison @tlb in arch_tlb_gather_mmu()
- [25/46] kmsan: virtio: check/unpoison scatterlist in vring_map_one_sg()
- [26/46] kmsan: handle memory sent to/from USB
- [30/46] kmsan: disable strscpy() optimization under KMSAN
- [31/46] crypto: kmsan: disable accelerated configs under KMSAN
- [32/46] kmsan: disable physical page merging in biovec
- [33/46] kmsan: block: skip bio block merging logic for KMSAN
- [34/46] kmsan: kcov: unpoison area->list in kcov_remote_area_put()
- [35/46] security: kmsan: fix interoperability with auto-initialization
- [37/46] x86: kmsan: make READ_ONCE_TASK_STACK() return initialized values
- [38/46] x86: kmsan: disable instrumentation of unsupported code
- [39/46] x86: kmsan: skip shadow checks in __switch_to()
- [43/46] x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for KASAN/KMSAN
- [44/46] x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS

5. Noinstr handling:
- [9/46] kmsan: mark noinstr as __no_sanitize_memory
- [27/46] kmsan: instrumentation.h: add instrumentation_begin_with_regs()
- [28/46] kmsan: entry: handle register passing from uninstrumented code
- [45/46] x86: kmsan: handle register passing from uninstrumented code

This patchset allows one to boot and run a defconfig+KMSAN kernel under
QEMU without known false positives. It does not, however, guarantee that
there are no false positives in drivers for certain devices or in less
tested subsystems, although KMSAN is actively tested on syzbot with a
large config.

The patchset was generated relative to Linux v5.18-rc4. The most
up-to-date KMSAN tree currently resides at
https://github.com/google/kmsan/.
One may find it handy to review these patches in Gerrit:
https://linux-review.googlesource.com/c/linux/kernel/git/torvalds/linux/+/12604/25

A huge thanks goes to the reviewers of the RFC patch series sent to LKML
in 2020
(https://lore.kernel.org/all/[email protected]/).

[1] https://clang.llvm.org/docs/MemorySanitizer.html
[2] https://syzkaller.appspot.com/upstream/fixed?manager=ci-upstream-kmsan-gce
[3] https://lore.kernel.org/all/[email protected]/


Alexander Potapenko (45):
stackdepot: reserve 5 extra bits in depot_stack_handle_t
kasan: common: adapt to the new prototype of __stack_depot_save()
instrumented.h: allow instrumenting both sides of copy_from_user()
x86: asm: instrument usercopy in get_user() and __put_user_size()
asm-generic: instrument usercopy in cacheflush.h
kmsan: add ReST documentation
kmsan: introduce __no_sanitize_memory and __no_kmsan_checks
kmsan: mark noinstr as __no_sanitize_memory
x86: kmsan: pgtable: reduce vmalloc space
libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE
kmsan: add KMSAN runtime core
kmsan: implement kmsan_init(), initialize READ_ONCE_NOCHECK()
kmsan: disable instrumentation of unsupported common kernel code
MAINTAINERS: add entry for KMSAN
kmsan: mm: maintain KMSAN metadata for page operations
kmsan: mm: call KMSAN hooks from SLUB code
kmsan: handle task creation and exiting
kmsan: init: call KMSAN initialization routines
instrumented.h: add KMSAN support
kmsan: unpoison @tlb in arch_tlb_gather_mmu()
kmsan: add iomap support
Input: libps2: mark data received in __ps2_command() as initialized
kmsan: dma: unpoison DMA mappings
kmsan: virtio: check/unpoison scatterlist in vring_map_one_sg()
kmsan: handle memory sent to/from USB
kmsan: instrumentation.h: add instrumentation_begin_with_regs()
kmsan: entry: handle register passing from uninstrumented code
kmsan: add tests for KMSAN
kmsan: disable strscpy() optimization under KMSAN
crypto: kmsan: disable accelerated configs under KMSAN
kmsan: disable physical page merging in biovec
kmsan: block: skip bio block merging logic for KMSAN
kmsan: kcov: unpoison area->list in kcov_remote_area_put()
security: kmsan: fix interoperability with auto-initialization
objtool: kmsan: list KMSAN API functions as uaccess-safe
x86: kmsan: make READ_ONCE_TASK_STACK() return initialized values
x86: kmsan: disable instrumentation of unsupported code
x86: kmsan: skip shadow checks in __switch_to()
x86: kmsan: handle open-coded assembly in lib/iomem.c
x86: kmsan: use __msan_ string functions where possible.
x86: kmsan: sync metadata pages on page fault
x86: kasan: kmsan: support CONFIG_GENERIC_CSUM on x86, enable it for
KASAN/KMSAN
x86: fs: kmsan: disable CONFIG_DCACHE_WORD_ACCESS
x86: kmsan: handle register passing from uninstrumented code
x86: kmsan: enable KMSAN builds for x86

Dmitry Vyukov (1):
x86: add missing include to sparsemem.h

Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/kmsan.rst | 414 ++++++++++++++++++
MAINTAINERS | 12 +
Makefile | 1 +
arch/x86/Kconfig | 9 +-
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/entry/common.c | 3 +-
arch/x86/entry/vdso/Makefile | 3 +
arch/x86/include/asm/checksum.h | 16 +-
arch/x86/include/asm/idtentry.h | 10 +-
arch/x86/include/asm/page_64.h | 13 +
arch/x86/include/asm/pgtable_64_types.h | 41 +-
arch/x86/include/asm/sparsemem.h | 2 +
arch/x86/include/asm/string_64.h | 23 +-
arch/x86/include/asm/uaccess.h | 7 +
arch/x86/include/asm/unwind.h | 23 +-
arch/x86/kernel/Makefile | 2 +
arch/x86/kernel/cpu/Makefile | 1 +
arch/x86/kernel/cpu/mce/core.c | 2 +-
arch/x86/kernel/kvm.c | 2 +-
arch/x86/kernel/nmi.c | 2 +-
arch/x86/kernel/process_64.c | 1 +
arch/x86/kernel/sev.c | 4 +-
arch/x86/kernel/traps.c | 14 +-
arch/x86/lib/Makefile | 2 +
arch/x86/lib/iomem.c | 5 +
arch/x86/mm/Makefile | 2 +
arch/x86/mm/fault.c | 25 +-
arch/x86/mm/init_64.c | 2 +-
arch/x86/mm/ioremap.c | 3 +
arch/x86/realmode/rm/Makefile | 1 +
block/bio.c | 2 +
block/blk.h | 7 +
crypto/Kconfig | 30 ++
drivers/firmware/efi/libstub/Makefile | 1 +
drivers/input/serio/libps2.c | 5 +-
drivers/net/Kconfig | 1 +
drivers/nvdimm/nd.h | 2 +-
drivers/nvdimm/pfn_devs.c | 2 +-
drivers/usb/core/urb.c | 2 +
drivers/virtio/virtio_ring.c | 10 +-
include/asm-generic/cacheflush.h | 9 +-
include/asm-generic/rwonce.h | 5 +-
include/linux/compiler-clang.h | 23 +
include/linux/compiler-gcc.h | 6 +
include/linux/compiler_types.h | 3 +-
include/linux/fortify-string.h | 2 +
include/linux/highmem.h | 3 +
include/linux/instrumentation.h | 6 +
include/linux/instrumented.h | 26 +-
include/linux/kmsan-checks.h | 123 ++++++
include/linux/kmsan.h | 359 ++++++++++++++++
include/linux/mm_types.h | 12 +
include/linux/sched.h | 5 +
include/linux/stackdepot.h | 8 +
include/linux/uaccess.h | 19 +-
init/main.c | 3 +
kernel/Makefile | 1 +
kernel/dma/mapping.c | 9 +-
kernel/entry/common.c | 22 +-
kernel/exit.c | 2 +
kernel/fork.c | 2 +
kernel/kcov.c | 7 +
kernel/locking/Makefile | 3 +-
lib/Kconfig.debug | 1 +
lib/Kconfig.kmsan | 39 ++
lib/Makefile | 3 +
lib/iomap.c | 40 ++
lib/iov_iter.c | 9 +-
lib/stackdepot.c | 29 +-
lib/string.c | 8 +
lib/usercopy.c | 3 +-
mm/Makefile | 1 +
mm/internal.h | 6 +
mm/kasan/common.c | 2 +-
mm/kmsan/Makefile | 26 ++
mm/kmsan/annotations.c | 28 ++
mm/kmsan/core.c | 468 +++++++++++++++++++++
mm/kmsan/hooks.c | 384 +++++++++++++++++
mm/kmsan/init.c | 240 +++++++++++
mm/kmsan/instrumentation.c | 267 ++++++++++++
mm/kmsan/kmsan.h | 188 +++++++++
mm/kmsan/kmsan_test.c | 536 ++++++++++++++++++++++++
mm/kmsan/report.c | 211 ++++++++++
mm/kmsan/shadow.c | 336 +++++++++++++++
mm/memory.c | 2 +
mm/mmu_gather.c | 10 +
mm/page_alloc.c | 18 +
mm/slab.h | 1 +
mm/slub.c | 21 +-
mm/vmalloc.c | 20 +-
scripts/Makefile.kmsan | 1 +
scripts/Makefile.lib | 9 +
security/Kconfig.hardening | 4 +
tools/objtool/check.c | 19 +
96 files changed, 4211 insertions(+), 87 deletions(-)
create mode 100644 Documentation/dev-tools/kmsan.rst
create mode 100644 include/linux/kmsan-checks.h
create mode 100644 include/linux/kmsan.h
create mode 100644 lib/Kconfig.kmsan
create mode 100644 mm/kmsan/Makefile
create mode 100644 mm/kmsan/annotations.c
create mode 100644 mm/kmsan/core.c
create mode 100644 mm/kmsan/hooks.c
create mode 100644 mm/kmsan/init.c
create mode 100644 mm/kmsan/instrumentation.c
create mode 100644 mm/kmsan/kmsan.h
create mode 100644 mm/kmsan/kmsan_test.c
create mode 100644 mm/kmsan/report.c
create mode 100644 mm/kmsan/shadow.c
create mode 100644 scripts/Makefile.kmsan

--
2.36.0.rc2.479.g8af0fa9b8e-goog


2022-04-27 08:57:01

by Alexander Potapenko

Subject: [PATCH v3 23/46] Input: libps2: mark data received in __ps2_command() as initialized

KMSAN does not know that the device initializes certain bytes in
ps2dev->cmdbuf. Call kmsan_unpoison_memory() to explicitly mark them as
initialized.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I2d26f6baa45271d37320d3f4a528c39cb7e545f0
---
drivers/input/serio/libps2.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/input/serio/libps2.c b/drivers/input/serio/libps2.c
index 250e213cc80c6..3e19344eda93c 100644
--- a/drivers/input/serio/libps2.c
+++ b/drivers/input/serio/libps2.c
@@ -12,6 +12,7 @@
#include <linux/sched.h>
#include <linux/interrupt.h>
#include <linux/input.h>
+#include <linux/kmsan-checks.h>
#include <linux/serio.h>
#include <linux/i8042.h>
#include <linux/libps2.h>
@@ -294,9 +295,11 @@ int __ps2_command(struct ps2dev *ps2dev, u8 *param, unsigned int command)

serio_pause_rx(ps2dev->serio);

- if (param)
+ if (param) {
for (i = 0; i < receive; i++)
param[i] = ps2dev->cmdbuf[(receive - 1) - i];
+ kmsan_unpoison_memory(param, receive);
+ }

if (ps2dev->cmdcnt &&
(command != PS2_CMD_RESET_BAT || ps2dev->cmdcnt != 1)) {
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 09:09:28

by Alexander Potapenko

Subject: [PATCH v3 30/46] kmsan: disable strscpy() optimization under KMSAN

Disable the efficient 8-byte reading under KMSAN to avoid false positives.

Signed-off-by: Alexander Potapenko <[email protected]>

---

Link: https://linux-review.googlesource.com/id/Iffd8336965e88fce915db2e6a9d6524422975f69
---
lib/string.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/lib/string.c b/lib/string.c
index 485777c9da832..4ece4c7e7831b 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -197,6 +197,14 @@ ssize_t strscpy(char *dest, const char *src, size_t count)
max = 0;
#endif

+ /*
+ * read_word_at_a_time() below may read uninitialized bytes after the
+ * trailing zero and use them in comparisons. Disable this optimization
+ * under KMSAN to prevent false positive reports.
+ */
+ if (IS_ENABLED(CONFIG_KMSAN))
+ max = 0;
+
while (max >= sizeof(unsigned long)) {
unsigned long c, data;

--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 09:23:05

by Alexander Potapenko

Subject: [PATCH v3 02/46] stackdepot: reserve 5 extra bits in depot_stack_handle_t

Some users (currently only KMSAN) may want to use spare bits in
depot_stack_handle_t. Let them do so by adding @extra_bits to
__stack_depot_save() to store arbitrary flags, and providing
stack_depot_get_extra_bits() to retrieve those flags.
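
For reference, here is a minimal sketch (not part of this patch) of how a
caller could use the new parameter, assuming the flag value fits into
STACK_DEPOT_EXTRA_BITS bits:

  unsigned long entries[16];
  depot_stack_handle_t handle;
  unsigned int nr_entries;

  nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
  /* Store an arbitrary small flag value together with the stack. */
  handle = __stack_depot_save(entries, nr_entries, /*extra_bits*/ 0x3,
                              GFP_ATOMIC, true);
  /* ... later, read the flags back without fetching the trace: */
  if (stack_depot_get_extra_bits(handle) & 0x1)
          pr_info("flag bit 0 was set\n");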

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I0587f6c777667864768daf07821d594bce6d8ff9
---
include/linux/stackdepot.h | 8 ++++++++
lib/stackdepot.c | 29 ++++++++++++++++++++++++-----
2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 17f992fe6355b..fd641d266bead 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -14,9 +14,15 @@
#include <linux/gfp.h>

typedef u32 depot_stack_handle_t;
+/*
+ * Number of bits in the handle that stack depot doesn't use. Users may store
+ * information in them.
+ */
+#define STACK_DEPOT_EXTRA_BITS 5

depot_stack_handle_t __stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
+ unsigned int extra_bits,
gfp_t gfp_flags, bool can_alloc);

/*
@@ -41,6 +47,8 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
unsigned int stack_depot_fetch(depot_stack_handle_t handle,
unsigned long **entries);

+unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle);
+
int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size,
int spaces);

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index bf5ba9af05009..6dc11a3b7b88e 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -42,7 +42,8 @@
#define STACK_ALLOC_OFFSET_BITS (STACK_ALLOC_ORDER + PAGE_SHIFT - \
STACK_ALLOC_ALIGN)
#define STACK_ALLOC_INDEX_BITS (DEPOT_STACK_BITS - \
- STACK_ALLOC_NULL_PROTECTION_BITS - STACK_ALLOC_OFFSET_BITS)
+ STACK_ALLOC_NULL_PROTECTION_BITS - \
+ STACK_ALLOC_OFFSET_BITS - STACK_DEPOT_EXTRA_BITS)
#define STACK_ALLOC_SLABS_CAP 8192
#define STACK_ALLOC_MAX_SLABS \
(((1LL << (STACK_ALLOC_INDEX_BITS)) < STACK_ALLOC_SLABS_CAP) ? \
@@ -55,6 +56,7 @@ union handle_parts {
u32 slabindex : STACK_ALLOC_INDEX_BITS;
u32 offset : STACK_ALLOC_OFFSET_BITS;
u32 valid : STACK_ALLOC_NULL_PROTECTION_BITS;
+ u32 extra : STACK_DEPOT_EXTRA_BITS;
};
};

@@ -73,6 +75,14 @@ static int next_slab_inited;
static size_t depot_offset;
static DEFINE_RAW_SPINLOCK(depot_lock);

+unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle)
+{
+ union handle_parts parts = { .handle = handle };
+
+ return parts.extra;
+}
+EXPORT_SYMBOL(stack_depot_get_extra_bits);
+
static bool init_stack_slab(void **prealloc)
{
if (!*prealloc)
@@ -136,6 +146,7 @@ depot_alloc_stack(unsigned long *entries, int size, u32 hash, void **prealloc)
stack->handle.slabindex = depot_index;
stack->handle.offset = depot_offset >> STACK_ALLOC_ALIGN;
stack->handle.valid = 1;
+ stack->handle.extra = 0;
memcpy(stack->entries, entries, flex_array_size(stack, entries, size));
depot_offset += required_size;

@@ -320,6 +331,7 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
*
* @entries: Pointer to storage array
* @nr_entries: Size of the storage array
+ * @extra_bits: Flags to store in unused bits of depot_stack_handle_t
* @alloc_flags: Allocation gfp flags
* @can_alloc: Allocate stack slabs (increased chance of failure if false)
*
@@ -331,6 +343,10 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
* If the stack trace in @entries is from an interrupt, only the portion up to
* interrupt entry is saved.
*
+ * Additional opaque flags can be passed in @extra_bits, stored in the unused
+ * bits of the stack handle, and retrieved using stack_depot_get_extra_bits()
+ * without calling stack_depot_fetch().
+ *
* Context: Any context, but setting @can_alloc to %false is required if
* alloc_pages() cannot be used from the current context. Currently
* this is the case from contexts where neither %GFP_ATOMIC nor
@@ -340,10 +356,11 @@ EXPORT_SYMBOL_GPL(stack_depot_fetch);
*/
depot_stack_handle_t __stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
+ unsigned int extra_bits,
gfp_t alloc_flags, bool can_alloc)
{
struct stack_record *found = NULL, **bucket;
- depot_stack_handle_t retval = 0;
+ union handle_parts retval = { .handle = 0 };
struct page *page = NULL;
void *prealloc = NULL;
unsigned long flags;
@@ -427,9 +444,11 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries,
free_pages((unsigned long)prealloc, STACK_ALLOC_ORDER);
}
if (found)
- retval = found->handle.handle;
+ retval.handle = found->handle.handle;
fast_exit:
- return retval;
+ retval.extra = extra_bits;
+
+ return retval.handle;
}
EXPORT_SYMBOL_GPL(__stack_depot_save);

@@ -449,6 +468,6 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries,
unsigned int nr_entries,
gfp_t alloc_flags)
{
- return __stack_depot_save(entries, nr_entries, alloc_flags, true);
+ return __stack_depot_save(entries, nr_entries, 0, alloc_flags, true);
}
EXPORT_SYMBOL_GPL(stack_depot_save);
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 09:41:16

by Alexander Potapenko

Subject: [PATCH v3 06/46] asm-generic: instrument usercopy in cacheflush.h

Notify memory tools about usercopy events in copy_to_user_page() and
copy_from_user_page().

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ic1ee8da1886325f46ad67f52176f48c2c836c48f
---
include/asm-generic/cacheflush.h | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/asm-generic/cacheflush.h b/include/asm-generic/cacheflush.h
index 4f07afacbc239..0f63eb325025f 100644
--- a/include/asm-generic/cacheflush.h
+++ b/include/asm-generic/cacheflush.h
@@ -2,6 +2,8 @@
#ifndef _ASM_GENERIC_CACHEFLUSH_H
#define _ASM_GENERIC_CACHEFLUSH_H

+#include <linux/instrumented.h>
+
struct mm_struct;
struct vm_area_struct;
struct page;
@@ -105,6 +107,7 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
#ifndef copy_to_user_page
#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
do { \
+ instrument_copy_to_user(dst, src, len); \
memcpy(dst, src, len); \
flush_icache_user_page(vma, page, vaddr, len); \
} while (0)
@@ -112,7 +115,11 @@ static inline void flush_cache_vunmap(unsigned long start, unsigned long end)

#ifndef copy_from_user_page
#define copy_from_user_page(vma, page, vaddr, dst, src, len) \
- memcpy(dst, src, len)
+ do { \
+ instrument_copy_from_user_before(dst, src, len); \
+ memcpy(dst, src, len); \
+ instrument_copy_from_user_after(dst, src, len, 0); \
+ } while (0)
#endif

#endif /* _ASM_GENERIC_CACHEFLUSH_H */
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 09:47:02

by Alexander Potapenko

Subject: [PATCH v3 12/46] kmsan: add KMSAN runtime core

For each memory location KernelMemorySanitizer maintains two types of
metadata:
1. The so-called shadow of that location - a byte:byte mapping describing
whether individual bits of memory are initialized (shadow is 0) or not
(shadow is 1).
2. The origins of that location - a 4-byte:4-byte mapping containing
4-byte IDs of the stack traces where uninitialized values were
created.

Each struct page now contains pointers to two struct pages holding
KMSAN metadata (shadow and origins) for the original struct page.
Utility routines in mm/kmsan/core.c and mm/kmsan/shadow.c handle the
metadata creation, addressing, copying and checking.
mm/kmsan/report.c performs error reporting in cases where an
uninitialized value is used in a way that leads to undefined behavior.

KMSAN compiler instrumentation is responsible for tracking the metadata
along with the kernel memory. mm/kmsan/instrumentation.c provides the
implementation for instrumentation hooks that are called from files
compiled with -fsanitize=kernel-memory.
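
Schematically (this is an illustration, not literal compiler output), a
4-byte store "*p = x;" in an instrumented function becomes a sequence
like the following, where s_x/o_x denote the shadow and origin of @x that
the compiler tracks alongside the value:

  struct shadow_origin_ptr m = __msan_metadata_ptr_for_store_4(p);

  *(u32 *)m.shadow = s_x;         /* propagate the shadow of @x         */
  if (s_x)                        /* value is (at least partly) uninit  */
          *(u32 *)m.origin = __msan_chain_origin(o_x);
  *p = x;

Uses of uninitialized values in conditions are reported by calling
__msan_warning() with the corresponding origin.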

To aid parameter passing (also done at the instrumentation level), each
task_struct now contains a struct kmsan_ctx with a struct
kmsan_context_state used to track the metadata of function parameters and
return values for that task.

Finally, this patch provides the CONFIG_KMSAN option that enables KMSAN,
and declares CFLAGS_KMSAN, the compiler flags applied to files compiled
with KMSAN.
The KMSAN_SANITIZE:=n Makefile directive can be used to completely
disable KMSAN instrumentation for certain files.

Similarly, KMSAN_ENABLE_CHECKS:=n disables KMSAN checks and makes newly
created stack memory initialized.

Users can also use functions from include/linux/kmsan-checks.h to mark
certain memory regions as uninitialized or initialized (this is called
"poisoning" and "unpoisoning") or check that a particular region is
initialized.
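
As a sketch (hypothetical driver code, not part of this patch), such
annotations look as follows:

  #include <linux/kmsan-checks.h>

  /* The device fills @buf behind the compiler's back, e.g. via DMA. */
  static void foo_read_from_device(struct foo_dev *dev, void *buf, size_t len)
  {
          foo_hw_read(dev, buf, len);
          /* Tell KMSAN these bytes are now initialized. */
          kmsan_unpoison_memory(buf, len);
  }

  /* Before handing data to the hardware, assert it is fully initialized. */
  static void foo_send_to_device(struct foo_dev *dev, const void *buf,
                                 size_t len)
  {
          kmsan_check_memory(buf, len);
          foo_hw_write(dev, buf, len);
  }

Here foo_dev, foo_hw_read() and foo_hw_write() are made-up names; the
kmsan_*() calls are the ones declared in include/linux/kmsan-checks.h.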

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- as requested by Greg K-H, moved hooks for different subsystems to respective patches,
rewrote the patch description;
-- addressed comments by Dmitry Vyukov;
-- added a note about KMSAN being not intended for production use.
-- fix case of unaligned dst in kmsan_internal_memmove_metadata()

v3:
-- print build IDs in reports where applicable
-- drop redundant filter_irq_stacks(), unpoison the local passed to __stack_depot_save()
-- remove a stray BUG()

Link: https://linux-review.googlesource.com/id/I9b71bfe3425466c97159f9de0062e5e8e4fec866
---
Makefile | 1 +
include/linux/kmsan-checks.h | 64 +++++
include/linux/kmsan.h | 47 ++++
include/linux/mm_types.h | 12 +
include/linux/sched.h | 5 +
lib/Kconfig.debug | 1 +
lib/Kconfig.kmsan | 23 ++
mm/Makefile | 1 +
mm/kmsan/Makefile | 18 ++
mm/kmsan/core.c | 458 +++++++++++++++++++++++++++++++++++
mm/kmsan/hooks.c | 66 +++++
mm/kmsan/instrumentation.c | 267 ++++++++++++++++++++
mm/kmsan/kmsan.h | 183 ++++++++++++++
mm/kmsan/report.c | 211 ++++++++++++++++
mm/kmsan/shadow.c | 186 ++++++++++++++
scripts/Makefile.kmsan | 1 +
scripts/Makefile.lib | 9 +
17 files changed, 1553 insertions(+)
create mode 100644 include/linux/kmsan-checks.h
create mode 100644 include/linux/kmsan.h
create mode 100644 lib/Kconfig.kmsan
create mode 100644 mm/kmsan/Makefile
create mode 100644 mm/kmsan/core.c
create mode 100644 mm/kmsan/hooks.c
create mode 100644 mm/kmsan/instrumentation.c
create mode 100644 mm/kmsan/kmsan.h
create mode 100644 mm/kmsan/report.c
create mode 100644 mm/kmsan/shadow.c
create mode 100644 scripts/Makefile.kmsan

diff --git a/Makefile b/Makefile
index c3ec1ea423797..d3c7dcd9f0fea 100644
--- a/Makefile
+++ b/Makefile
@@ -1009,6 +1009,7 @@ include-y := scripts/Makefile.extrawarn
include-$(CONFIG_DEBUG_INFO) += scripts/Makefile.debug
include-$(CONFIG_KASAN) += scripts/Makefile.kasan
include-$(CONFIG_KCSAN) += scripts/Makefile.kcsan
+include-$(CONFIG_KMSAN) += scripts/Makefile.kmsan
include-$(CONFIG_UBSAN) += scripts/Makefile.ubsan
include-$(CONFIG_KCOV) += scripts/Makefile.kcov
include-$(CONFIG_GCC_PLUGINS) += scripts/Makefile.gcc-plugins
diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h
new file mode 100644
index 0000000000000..a6522a0c28df9
--- /dev/null
+++ b/include/linux/kmsan-checks.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KMSAN checks to be used for one-off annotations in subsystems.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#ifndef _LINUX_KMSAN_CHECKS_H
+#define _LINUX_KMSAN_CHECKS_H
+
+#include <linux/types.h>
+
+#ifdef CONFIG_KMSAN
+
+/**
+ * kmsan_poison_memory() - Mark the memory range as uninitialized.
+ * @address: address to start with.
+ * @size: size of buffer to poison.
+ * @flags: GFP flags for allocations done by this function.
+ *
+ * Until other data is written to this range, KMSAN will treat it as
+ * uninitialized. Error reports for this memory will reference the call site of
+ * kmsan_poison_memory() as origin.
+ */
+void kmsan_poison_memory(const void *address, size_t size, gfp_t flags);
+
+/**
+ * kmsan_unpoison_memory() - Mark the memory range as initialized.
+ * @address: address to start with.
+ * @size: size of buffer to unpoison.
+ *
+ * Until other data is written to this range, KMSAN will treat it as
+ * initialized.
+ */
+void kmsan_unpoison_memory(const void *address, size_t size);
+
+/**
+ * kmsan_check_memory() - Check the memory range for being initialized.
+ * @address: address to start with.
+ * @size: size of buffer to check.
+ *
+ * If any piece of the given range is marked as uninitialized, KMSAN will report
+ * an error.
+ */
+void kmsan_check_memory(const void *address, size_t size);
+
+#else
+
+static inline void kmsan_poison_memory(const void *address, size_t size,
+ gfp_t flags)
+{
+}
+static inline void kmsan_unpoison_memory(const void *address, size_t size)
+{
+}
+static inline void kmsan_check_memory(const void *address, size_t size)
+{
+}
+
+#endif
+
+#endif /* _LINUX_KMSAN_CHECKS_H */
diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
new file mode 100644
index 0000000000000..4e35f43eceaa9
--- /dev/null
+++ b/include/linux/kmsan.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * KMSAN API for subsystems.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+#ifndef _LINUX_KMSAN_H
+#define _LINUX_KMSAN_H
+
+#include <linux/gfp.h>
+#include <linux/kmsan-checks.h>
+#include <linux/stackdepot.h>
+#include <linux/types.h>
+#include <linux/vmalloc.h>
+
+struct page;
+
+#ifdef CONFIG_KMSAN
+
+/* These constants are defined in the MSan LLVM instrumentation pass. */
+#define KMSAN_RETVAL_SIZE 800
+#define KMSAN_PARAM_SIZE 800
+
+struct kmsan_context_state {
+ char param_tls[KMSAN_PARAM_SIZE];
+ char retval_tls[KMSAN_RETVAL_SIZE];
+ char va_arg_tls[KMSAN_PARAM_SIZE];
+ char va_arg_origin_tls[KMSAN_PARAM_SIZE];
+ u64 va_arg_overflow_size_tls;
+ char param_origin_tls[KMSAN_PARAM_SIZE];
+ depot_stack_handle_t retval_origin_tls;
+};
+
+#undef KMSAN_PARAM_SIZE
+#undef KMSAN_RETVAL_SIZE
+
+struct kmsan_ctx {
+ struct kmsan_context_state cstate;
+ int kmsan_in_runtime;
+ bool allow_reporting;
+};
+
+#endif
+
+#endif /* _LINUX_KMSAN_H */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8834e38c06a4f..85c97a2145f7e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -218,6 +218,18 @@ struct page {
not kmapped, ie. highmem) */
#endif /* WANT_PAGE_VIRTUAL */

+#ifdef CONFIG_KMSAN
+ /*
+ * KMSAN metadata for this page:
+ * - shadow page: every bit indicates whether the corresponding
+ * bit of the original page is initialized (0) or not (1);
+ * - origin page: every 4 bytes contain an id of the stack trace
+ * where the uninitialized value was created.
+ */
+ struct page *kmsan_shadow;
+ struct page *kmsan_origin;
+#endif
+
#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
int _last_cpupid;
#endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a8911b1f35aad..9e53624cd73ac 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -14,6 +14,7 @@
#include <linux/pid.h>
#include <linux/sem.h>
#include <linux/shm.h>
+#include <linux/kmsan.h>
#include <linux/mutex.h>
#include <linux/plist.h>
#include <linux/hrtimer.h>
@@ -1352,6 +1353,10 @@ struct task_struct {
#endif
#endif

+#ifdef CONFIG_KMSAN
+ struct kmsan_ctx kmsan_ctx;
+#endif
+
#if IS_ENABLED(CONFIG_KUNIT)
struct kunit *kunit_test;
#endif
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 075cd25363ac3..b81670878acae 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -996,6 +996,7 @@ config DEBUG_STACKOVERFLOW

source "lib/Kconfig.kasan"
source "lib/Kconfig.kfence"
+source "lib/Kconfig.kmsan"

endmenu # "Memory Debugging"

diff --git a/lib/Kconfig.kmsan b/lib/Kconfig.kmsan
new file mode 100644
index 0000000000000..199f79d031f94
--- /dev/null
+++ b/lib/Kconfig.kmsan
@@ -0,0 +1,23 @@
+config HAVE_ARCH_KMSAN
+ bool
+
+config HAVE_KMSAN_COMPILER
+ def_bool (CC_IS_CLANG && $(cc-option,-fsanitize=kernel-memory -mllvm -msan-disable-checks=1))
+
+config KMSAN
+ bool "KMSAN: detector of uninitialized values use"
+ depends on HAVE_ARCH_KMSAN && HAVE_KMSAN_COMPILER
+ depends on SLUB && DEBUG_KERNEL && !KASAN && !KCSAN
+ depends on CC_IS_CLANG && CLANG_VERSION >= 140000
+ select STACKDEPOT
+ select STACKDEPOT_ALWAYS_INIT
+ help
+ KernelMemorySanitizer (KMSAN) is a dynamic detector of uses of
+ uninitialized values in the kernel. It is based on compiler
+ instrumentation provided by Clang and thus requires Clang to build.
+
+ An important note is that KMSAN is not intended for production use,
+ because it drastically increases kernel memory footprint and slows
+ the whole system down.
+
+ See <file:Documentation/dev-tools/kmsan.rst> for more details.
diff --git a/mm/Makefile b/mm/Makefile
index 4cc13f3179a51..4da7eeaecc214 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -89,6 +89,7 @@ obj-$(CONFIG_SLAB) += slab.o
obj-$(CONFIG_SLUB) += slub.o
obj-$(CONFIG_KASAN) += kasan/
obj-$(CONFIG_KFENCE) += kfence/
+obj-$(CONFIG_KMSAN) += kmsan/
obj-$(CONFIG_FAILSLAB) += failslab.o
obj-$(CONFIG_MEMTEST) += memtest.o
obj-$(CONFIG_MIGRATION) += migrate.o
diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
new file mode 100644
index 0000000000000..a80dde1de7048
--- /dev/null
+++ b/mm/kmsan/Makefile
@@ -0,0 +1,18 @@
+obj-y := core.o instrumentation.o hooks.o report.o shadow.o
+
+KMSAN_SANITIZE := n
+KCOV_INSTRUMENT := n
+UBSAN_SANITIZE := n
+
+# Disable instrumentation of KMSAN runtime with other tools.
+CC_FLAGS_KMSAN_RUNTIME := -fno-stack-protector
+CC_FLAGS_KMSAN_RUNTIME += $(call cc-option,-fno-conserve-stack)
+CC_FLAGS_KMSAN_RUNTIME += -DDISABLE_BRANCH_PROFILING
+
+CFLAGS_REMOVE.o = $(CC_FLAGS_FTRACE)
+
+CFLAGS_core.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_hooks.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
new file mode 100644
index 0000000000000..933d864d9d467
--- /dev/null
+++ b/mm/kmsan/core.c
@@ -0,0 +1,458 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN runtime library.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#include <asm/page.h>
+#include <linux/compiler.h>
+#include <linux/export.h>
+#include <linux/highmem.h>
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
+#include <linux/kmsan.h>
+#include <linux/memory.h>
+#include <linux/mm.h>
+#include <linux/mm_types.h>
+#include <linux/mmzone.h>
+#include <linux/percpu-defs.h>
+#include <linux/preempt.h>
+#include <linux/slab.h>
+#include <linux/stackdepot.h>
+#include <linux/stacktrace.h>
+#include <linux/types.h>
+#include <linux/vmalloc.h>
+
+#include "../slab.h"
+#include "kmsan.h"
+
+/*
+ * Avoid creating too long origin chains, these are unlikely to participate in
+ * real reports.
+ */
+#define MAX_CHAIN_DEPTH 7
+#define NUM_SKIPPED_TO_WARN 10000
+
+bool kmsan_enabled __read_mostly;
+
+/*
+ * Per-CPU KMSAN context to be used in interrupts, where current->kmsan is
+ * unavailable.
+ */
+DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
+
+void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
+ unsigned int poison_flags)
+{
+ u32 extra_bits =
+ kmsan_extra_bits(/*depth*/ 0, poison_flags & KMSAN_POISON_FREE);
+ bool checked = poison_flags & KMSAN_POISON_CHECK;
+ depot_stack_handle_t handle;
+
+ handle = kmsan_save_stack_with_flags(flags, extra_bits);
+ kmsan_internal_set_shadow_origin(address, size, -1, handle, checked);
+}
+
+void kmsan_internal_unpoison_memory(void *address, size_t size, bool checked)
+{
+ kmsan_internal_set_shadow_origin(address, size, 0, 0, checked);
+}
+
+depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,
+ unsigned int extra)
+{
+ unsigned long entries[KMSAN_STACK_DEPTH];
+ unsigned int nr_entries;
+
+ nr_entries = stack_trace_save(entries, KMSAN_STACK_DEPTH, 0);
+
+ /* Don't sleep (see might_sleep_if() in __alloc_pages_nodemask()). */
+ flags &= ~__GFP_DIRECT_RECLAIM;
+
+ return __stack_depot_save(entries, nr_entries, extra, flags, true);
+}
+
+/* Copy the metadata following the memmove() behavior. */
+void kmsan_internal_memmove_metadata(void *dst, void *src, size_t n)
+{
+ depot_stack_handle_t old_origin = 0, new_origin = 0;
+ int src_slots, dst_slots, i, iter, step, skip_bits;
+ depot_stack_handle_t *origin_src, *origin_dst;
+ void *shadow_src, *shadow_dst;
+ u32 *align_shadow_src, shadow;
+ bool backwards;
+
+ shadow_dst = kmsan_get_metadata(dst, KMSAN_META_SHADOW);
+ if (!shadow_dst)
+ return;
+ KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(dst, n));
+
+ shadow_src = kmsan_get_metadata(src, KMSAN_META_SHADOW);
+ if (!shadow_src) {
+ /*
+ * |src| is untracked: zero out destination shadow, ignore the
+ * origins, we're done.
+ */
+ __memset(shadow_dst, 0, n);
+ return;
+ }
+ KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(src, n));
+
+ __memmove(shadow_dst, shadow_src, n);
+
+ origin_dst = kmsan_get_metadata(dst, KMSAN_META_ORIGIN);
+ origin_src = kmsan_get_metadata(src, KMSAN_META_ORIGIN);
+ KMSAN_WARN_ON(!origin_dst || !origin_src);
+ src_slots = (ALIGN((u64)src + n, KMSAN_ORIGIN_SIZE) -
+ ALIGN_DOWN((u64)src, KMSAN_ORIGIN_SIZE)) /
+ KMSAN_ORIGIN_SIZE;
+ dst_slots = (ALIGN((u64)dst + n, KMSAN_ORIGIN_SIZE) -
+ ALIGN_DOWN((u64)dst, KMSAN_ORIGIN_SIZE)) /
+ KMSAN_ORIGIN_SIZE;
+ KMSAN_WARN_ON((src_slots < 1) || (dst_slots < 1));
+ KMSAN_WARN_ON((src_slots - dst_slots > 1) ||
+ (dst_slots - src_slots < -1));
+
+ backwards = dst > src;
+ i = backwards ? min(src_slots, dst_slots) - 1 : 0;
+ iter = backwards ? -1 : 1;
+
+ align_shadow_src =
+ (u32 *)ALIGN_DOWN((u64)shadow_src, KMSAN_ORIGIN_SIZE);
+ for (step = 0; step < min(src_slots, dst_slots); step++, i += iter) {
+ KMSAN_WARN_ON(i < 0);
+ shadow = align_shadow_src[i];
+ if (i == 0) {
+ /*
+ * If |src| isn't aligned on KMSAN_ORIGIN_SIZE, don't
+ * look at the first |src % KMSAN_ORIGIN_SIZE| bytes
+ * of the first shadow slot.
+ */
+ skip_bits = ((u64)src % KMSAN_ORIGIN_SIZE) * 8;
+ shadow = (shadow >> skip_bits) << skip_bits;
+ }
+ if (i == src_slots - 1) {
+ /*
+ * If |src + n| isn't aligned on
+ * KMSAN_ORIGIN_SIZE, don't look at the last
+ * |(src + n) % KMSAN_ORIGIN_SIZE| bytes of the
+ * last shadow slot.
+ */
+ skip_bits = (((u64)src + n) % KMSAN_ORIGIN_SIZE) * 8;
+ shadow = (shadow << skip_bits) >> skip_bits;
+ }
+ /*
+ * Overwrite the origin only if the corresponding
+ * shadow is nonempty.
+ */
+ if (origin_src[i] && (origin_src[i] != old_origin) && shadow) {
+ old_origin = origin_src[i];
+ new_origin = kmsan_internal_chain_origin(old_origin);
+ /*
+ * kmsan_internal_chain_origin() may return
+ * NULL, but we don't want to lose the previous
+ * origin value.
+ */
+ if (!new_origin)
+ new_origin = old_origin;
+ }
+ if (shadow)
+ origin_dst[i] = new_origin;
+ else
+ origin_dst[i] = 0;
+ }
+ /*
+ * If dst_slots is greater than src_slots (i.e.
+ * dst_slots == src_slots + 1), there is an extra origin slot at the
+ * beginning or end of the destination buffer, for which we take the
+ * origin from the previous slot.
+ * This is only done if the part of the source shadow corresponding to
+ * slot is non-zero.
+ *
+ * E.g. if we copy 8 aligned bytes that are marked as uninitialized
+ * and have origins o111 and o222, to an unaligned buffer with offset 1,
+ * these two origins are copied to three origin slots, so one of then
+ * needs to be duplicated, depending on the copy direction (@backwards)
+ *
+ * src shadow: |uuuu|uuuu|....|
+ * src origin: |o111|o222|....|
+ *
+ * backwards = 0:
+ * dst shadow: |.uuu|uuuu|u...|
+ * dst origin: |....|o111|o222| - fill the empty slot with o111
+ * backwards = 1:
+ * dst shadow: |.uuu|uuuu|u...|
+ * dst origin: |o111|o222|....| - fill the empty slot with o222
+ */
+ if (src_slots < dst_slots) {
+ if (backwards) {
+ shadow = align_shadow_src[src_slots - 1];
+ skip_bits = (((u64)dst + n) % KMSAN_ORIGIN_SIZE) * 8;
+ shadow = (shadow << skip_bits) >> skip_bits;
+ if (shadow)
+ /* src_slots > 0, therefore dst_slots is at least 2 */
+ origin_dst[dst_slots - 1] = origin_dst[dst_slots - 2];
+ } else {
+ shadow = align_shadow_src[0];
+ skip_bits = ((u64)dst % KMSAN_ORIGIN_SIZE) * 8;
+ shadow = (shadow >> skip_bits) << skip_bits;
+ if (shadow)
+ origin_dst[0] = origin_dst[1];
+ }
+ }
+}
+
+depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id)
+{
+ unsigned long entries[3];
+ u32 extra_bits;
+ int depth;
+ bool uaf;
+
+ if (!id)
+ return id;
+ /*
+ * Make sure we have enough spare bits in |id| to hold the UAF bit and
+ * the chain depth.
+ */
+ BUILD_BUG_ON((1 << STACK_DEPOT_EXTRA_BITS) <= (MAX_CHAIN_DEPTH << 1));
+
+ extra_bits = stack_depot_get_extra_bits(id);
+ depth = kmsan_depth_from_eb(extra_bits);
+ uaf = kmsan_uaf_from_eb(extra_bits);
+
+ if (depth >= MAX_CHAIN_DEPTH) {
+ static atomic_long_t kmsan_skipped_origins;
+ long skipped = atomic_long_inc_return(&kmsan_skipped_origins);
+
+ if (skipped % NUM_SKIPPED_TO_WARN == 0) {
+ pr_warn("not chained %ld origins\n", skipped);
+ dump_stack();
+ kmsan_print_origin(id);
+ }
+ return id;
+ }
+ depth++;
+ extra_bits = kmsan_extra_bits(depth, uaf);
+
+ entries[0] = KMSAN_CHAIN_MAGIC_ORIGIN;
+ entries[1] = kmsan_save_stack_with_flags(GFP_ATOMIC, 0);
+ entries[2] = id;
+ /*
+ * @entries is a local var in non-instrumented code, so KMSAN does not
+ * know it is initialized. Explicitly unpoison it to avoid false
+ * positives when __stack_depot_save() passes it to instrumented code.
+ */
+ kmsan_internal_unpoison_memory(entries, sizeof(entries), false);
+ return __stack_depot_save(entries, ARRAY_SIZE(entries), extra_bits,
+ GFP_ATOMIC, true);
+}
+
+void kmsan_internal_set_shadow_origin(void *addr, size_t size, int b,
+ u32 origin, bool checked)
+{
+ u64 address = (u64)addr;
+ void *shadow_start;
+ u32 *origin_start;
+ size_t pad = 0;
+ int i;
+
+ KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
+ shadow_start = kmsan_get_metadata(addr, KMSAN_META_SHADOW);
+ if (!shadow_start) {
+ /*
+ * kmsan_metadata_is_contiguous() is true, so either all shadow
+ * and origin pages are NULL, or all are non-NULL.
+ */
+ if (checked) {
+ pr_err("%s: not memsetting %ld bytes starting at %px, because the shadow is NULL\n",
+ __func__, size, addr);
+ KMSAN_WARN_ON(true);
+ }
+ return;
+ }
+ __memset(shadow_start, b, size);
+
+ if (!IS_ALIGNED(address, KMSAN_ORIGIN_SIZE)) {
+ pad = address % KMSAN_ORIGIN_SIZE;
+ address -= pad;
+ size += pad;
+ }
+ size = ALIGN(size, KMSAN_ORIGIN_SIZE);
+ origin_start =
+ (u32 *)kmsan_get_metadata((void *)address, KMSAN_META_ORIGIN);
+
+ for (i = 0; i < size / KMSAN_ORIGIN_SIZE; i++)
+ origin_start[i] = origin;
+}
+
+struct page *kmsan_vmalloc_to_page_or_null(void *vaddr)
+{
+ struct page *page;
+
+ if (!kmsan_internal_is_vmalloc_addr(vaddr) &&
+ !kmsan_internal_is_module_addr(vaddr))
+ return NULL;
+ page = vmalloc_to_page(vaddr);
+ if (pfn_valid(page_to_pfn(page)))
+ return page;
+ else
+ return NULL;
+}
+
+void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
+ int reason)
+{
+ depot_stack_handle_t cur_origin = 0, new_origin = 0;
+ unsigned long addr64 = (unsigned long)addr;
+ depot_stack_handle_t *origin = NULL;
+ unsigned char *shadow = NULL;
+ int cur_off_start = -1;
+ int i, chunk_size;
+ size_t pos = 0;
+
+ if (!size)
+ return;
+ KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
+ while (pos < size) {
+ chunk_size = min(size - pos,
+ PAGE_SIZE - ((addr64 + pos) % PAGE_SIZE));
+ shadow = kmsan_get_metadata((void *)(addr64 + pos),
+ KMSAN_META_SHADOW);
+ if (!shadow) {
+ /*
+ * This page is untracked. If there were uninitialized
+ * bytes before, report them.
+ */
+ if (cur_origin) {
+ kmsan_enter_runtime();
+ kmsan_report(cur_origin, addr, size,
+ cur_off_start, pos - 1, user_addr,
+ reason);
+ kmsan_leave_runtime();
+ }
+ cur_origin = 0;
+ cur_off_start = -1;
+ pos += chunk_size;
+ continue;
+ }
+ for (i = 0; i < chunk_size; i++) {
+ if (!shadow[i]) {
+ /*
+ * This byte is unpoisoned. If there were
+ * poisoned bytes before, report them.
+ */
+ if (cur_origin) {
+ kmsan_enter_runtime();
+ kmsan_report(cur_origin, addr, size,
+ cur_off_start, pos + i - 1,
+ user_addr, reason);
+ kmsan_leave_runtime();
+ }
+ cur_origin = 0;
+ cur_off_start = -1;
+ continue;
+ }
+ origin = kmsan_get_metadata((void *)(addr64 + pos + i),
+ KMSAN_META_ORIGIN);
+ KMSAN_WARN_ON(!origin);
+ new_origin = *origin;
+ /*
+ * Encountered new origin - report the previous
+ * uninitialized range.
+ */
+ if (cur_origin != new_origin) {
+ if (cur_origin) {
+ kmsan_enter_runtime();
+ kmsan_report(cur_origin, addr, size,
+ cur_off_start, pos + i - 1,
+ user_addr, reason);
+ kmsan_leave_runtime();
+ }
+ cur_origin = new_origin;
+ cur_off_start = pos + i;
+ }
+ }
+ pos += chunk_size;
+ }
+ KMSAN_WARN_ON(pos != size);
+ if (cur_origin) {
+ kmsan_enter_runtime();
+ kmsan_report(cur_origin, addr, size, cur_off_start, pos - 1,
+ user_addr, reason);
+ kmsan_leave_runtime();
+ }
+}
+
+bool kmsan_metadata_is_contiguous(void *addr, size_t size)
+{
+ char *cur_shadow = NULL, *next_shadow = NULL, *cur_origin = NULL,
+ *next_origin = NULL;
+ u64 cur_addr = (u64)addr, next_addr = cur_addr + PAGE_SIZE;
+ depot_stack_handle_t *origin_p;
+ bool all_untracked = false;
+
+ if (!size)
+ return true;
+
+ /* The whole range belongs to the same page. */
+ if (ALIGN_DOWN(cur_addr + size - 1, PAGE_SIZE) ==
+ ALIGN_DOWN(cur_addr, PAGE_SIZE))
+ return true;
+
+ cur_shadow = kmsan_get_metadata((void *)cur_addr, /*is_origin*/ false);
+ if (!cur_shadow)
+ all_untracked = true;
+ cur_origin = kmsan_get_metadata((void *)cur_addr, /*is_origin*/ true);
+ if (all_untracked && cur_origin)
+ goto report;
+
+ for (; next_addr < (u64)addr + size;
+ cur_addr = next_addr, cur_shadow = next_shadow,
+ cur_origin = next_origin, next_addr += PAGE_SIZE) {
+ next_shadow = kmsan_get_metadata((void *)next_addr, false);
+ next_origin = kmsan_get_metadata((void *)next_addr, true);
+ if (all_untracked) {
+ if (next_shadow || next_origin)
+ goto report;
+ if (!next_shadow && !next_origin)
+ continue;
+ }
+ if (((u64)cur_shadow == ((u64)next_shadow - PAGE_SIZE)) &&
+ ((u64)cur_origin == ((u64)next_origin - PAGE_SIZE)))
+ continue;
+ goto report;
+ }
+ return true;
+
+report:
+ pr_err("%s: attempting to access two shadow page ranges.\n", __func__);
+ pr_err("Access of size %ld at %px.\n", size, addr);
+ pr_err("Addresses belonging to different ranges: %px and %px\n",
+ (void *)cur_addr, (void *)next_addr);
+ pr_err("page[0].shadow: %px, page[1].shadow: %px\n", cur_shadow,
+ next_shadow);
+ pr_err("page[0].origin: %px, page[1].origin: %px\n", cur_origin,
+ next_origin);
+ origin_p = kmsan_get_metadata(addr, KMSAN_META_ORIGIN);
+ if (origin_p) {
+ pr_err("Origin: %08x\n", *origin_p);
+ kmsan_print_origin(*origin_p);
+ } else {
+ pr_err("Origin: unavailable\n");
+ }
+ return false;
+}
+
+bool kmsan_internal_is_module_addr(void *vaddr)
+{
+ return ((u64)vaddr >= MODULES_VADDR) && ((u64)vaddr < MODULES_END);
+}
+
+bool kmsan_internal_is_vmalloc_addr(void *addr)
+{
+ return ((u64)addr >= VMALLOC_START) && ((u64)addr < VMALLOC_END);
+}
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
new file mode 100644
index 0000000000000..4ac62fa67a02a
--- /dev/null
+++ b/mm/kmsan/hooks.c
@@ -0,0 +1,66 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN hooks for kernel subsystems.
+ *
+ * These functions handle creation of KMSAN metadata for memory allocations.
+ *
+ * Copyright (C) 2018-2022 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#include <linux/cacheflush.h>
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/mm_types.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include "../internal.h"
+#include "../slab.h"
+#include "kmsan.h"
+
+/*
+ * Instrumented functions shouldn't be called under
+ * kmsan_enter_runtime()/kmsan_leave_runtime(), because this will lead to
+ * skipping effects of functions like memset() inside instrumented code.
+ */
+
+/* Functions from kmsan-checks.h follow. */
+void kmsan_poison_memory(const void *address, size_t size, gfp_t flags)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ /* The users may want to poison/unpoison random memory. */
+ kmsan_internal_poison_memory((void *)address, size, flags,
+ KMSAN_POISON_NOCHECK);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_poison_memory);
+
+void kmsan_unpoison_memory(const void *address, size_t size)
+{
+ unsigned long ua_flags;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ ua_flags = user_access_save();
+ kmsan_enter_runtime();
+ /* The users may want to poison/unpoison random memory. */
+ kmsan_internal_unpoison_memory((void *)address, size,
+ KMSAN_POISON_NOCHECK);
+ kmsan_leave_runtime();
+ user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(kmsan_unpoison_memory);
+
+void kmsan_check_memory(const void *addr, size_t size)
+{
+ if (!kmsan_enabled)
+ return;
+ return kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
+ REASON_ANY);
+}
+EXPORT_SYMBOL(kmsan_check_memory);
diff --git a/mm/kmsan/instrumentation.c b/mm/kmsan/instrumentation.c
new file mode 100644
index 0000000000000..fe062d123a76f
--- /dev/null
+++ b/mm/kmsan/instrumentation.c
@@ -0,0 +1,267 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN compiler API.
+ *
+ * This file implements __msan_XXX hooks that Clang inserts into the code
+ * compiled with -fsanitize=kernel-memory.
+ * See Documentation/dev-tools/kmsan.rst for more information on how KMSAN
+ * instrumentation works.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#include "kmsan.h"
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/uaccess.h>
+
+static inline bool is_bad_asm_addr(void *addr, uintptr_t size, bool is_store)
+{
+ if ((u64)addr < TASK_SIZE)
+ return true;
+ if (!kmsan_get_metadata(addr, KMSAN_META_SHADOW))
+ return true;
+ return false;
+}
+
+static inline struct shadow_origin_ptr
+get_shadow_origin_ptr(void *addr, u64 size, bool store)
+{
+ unsigned long ua_flags = user_access_save();
+ struct shadow_origin_ptr ret;
+
+ ret = kmsan_get_shadow_origin_ptr(addr, size, store);
+ user_access_restore(ua_flags);
+ return ret;
+}
+
+/* Get shadow and origin pointers for a memory load with non-standard size. */
+struct shadow_origin_ptr __msan_metadata_ptr_for_load_n(void *addr,
+ uintptr_t size)
+{
+ return get_shadow_origin_ptr(addr, size, /*store*/ false);
+}
+EXPORT_SYMBOL(__msan_metadata_ptr_for_load_n);
+
+/* Get shadow and origin pointers for a memory store with non-standard size. */
+struct shadow_origin_ptr __msan_metadata_ptr_for_store_n(void *addr,
+ uintptr_t size)
+{
+ return get_shadow_origin_ptr(addr, size, /*store*/ true);
+}
+EXPORT_SYMBOL(__msan_metadata_ptr_for_store_n);
+
+/*
+ * Declare functions that obtain shadow/origin pointers for loads and stores
+ * with fixed size.
+ */
+#define DECLARE_METADATA_PTR_GETTER(size) \
+ struct shadow_origin_ptr __msan_metadata_ptr_for_load_##size( \
+ void *addr) \
+ { \
+ return get_shadow_origin_ptr(addr, size, /*store*/ false); \
+ } \
+ EXPORT_SYMBOL(__msan_metadata_ptr_for_load_##size); \
+ struct shadow_origin_ptr __msan_metadata_ptr_for_store_##size( \
+ void *addr) \
+ { \
+ return get_shadow_origin_ptr(addr, size, /*store*/ true); \
+ } \
+ EXPORT_SYMBOL(__msan_metadata_ptr_for_store_##size)
+
+DECLARE_METADATA_PTR_GETTER(1);
+DECLARE_METADATA_PTR_GETTER(2);
+DECLARE_METADATA_PTR_GETTER(4);
+DECLARE_METADATA_PTR_GETTER(8);
+
+/*
+ * Handle a memory store performed by inline assembly. KMSAN conservatively
+ * attempts to unpoison the outputs of asm() directives to prevent false
+ * positives caused by missed stores.
+ */
+void __msan_instrument_asm_store(void *addr, uintptr_t size)
+{
+ unsigned long ua_flags;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ ua_flags = user_access_save();
+ /*
+ * Most of the accesses are below 32 bytes. The two exceptions so far
+ * are clwb() (64 bytes) and FPU state (512 bytes).
+ * It's unlikely that the assembly will touch more than 512 bytes.
+ */
+ if (size > 512) {
+ WARN_ONCE(1, "assembly store size too big: %ld\n", size);
+ size = 8;
+ }
+ if (is_bad_asm_addr(addr, size, /*is_store*/ true)) {
+ user_access_restore(ua_flags);
+ return;
+ }
+ kmsan_enter_runtime();
+ /* Unpoisoning the memory on best effort. */
+ kmsan_internal_unpoison_memory(addr, size, /*checked*/ false);
+ kmsan_leave_runtime();
+ user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(__msan_instrument_asm_store);
+
+/* Handle llvm.memmove intrinsic. */
+void *__msan_memmove(void *dst, const void *src, uintptr_t n)
+{
+ void *result;
+
+ result = __memmove(dst, src, n);
+ if (!n)
+ /* Some people call memmove() with zero length. */
+ return result;
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return result;
+
+ kmsan_internal_memmove_metadata(dst, (void *)src, n);
+
+ return result;
+}
+EXPORT_SYMBOL(__msan_memmove);
+
+/* Handle llvm.memcpy intrinsic. */
+void *__msan_memcpy(void *dst, const void *src, uintptr_t n)
+{
+ void *result;
+
+ result = __memcpy(dst, src, n);
+ if (!n)
+ /* Some people call memcpy() with zero length. */
+ return result;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return result;
+
+ /* Using memmove instead of memcpy doesn't affect correctness. */
+ kmsan_internal_memmove_metadata(dst, (void *)src, n);
+
+ return result;
+}
+EXPORT_SYMBOL(__msan_memcpy);
+
+/* Handle llvm.memset intrinsic. */
+void *__msan_memset(void *dst, int c, uintptr_t n)
+{
+ void *result;
+
+ result = __memset(dst, c, n);
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return result;
+
+ kmsan_enter_runtime();
+ /*
+ * Clang doesn't pass parameter metadata here, so it is impossible to
+ * use shadow of @c to set up the shadow for @dst.
+ */
+ kmsan_internal_unpoison_memory(dst, n, /*checked*/ false);
+ kmsan_leave_runtime();
+
+ return result;
+}
+EXPORT_SYMBOL(__msan_memset);
+
+/*
+ * Create a new origin from an old one. This is done when storing an
+ * uninitialized value to memory. When reporting an error, KMSAN unrolls and
+ * prints the whole chain of stores that preceded the use of this value.
+ */
+depot_stack_handle_t __msan_chain_origin(depot_stack_handle_t origin)
+{
+ depot_stack_handle_t ret = 0;
+ unsigned long ua_flags;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return ret;
+
+ ua_flags = user_access_save();
+
+ /* Creating new origins may allocate memory. */
+ kmsan_enter_runtime();
+ ret = kmsan_internal_chain_origin(origin);
+ kmsan_leave_runtime();
+ user_access_restore(ua_flags);
+ return ret;
+}
+EXPORT_SYMBOL(__msan_chain_origin);
+
+/* Poison a local variable when entering a function. */
+void __msan_poison_alloca(void *address, uintptr_t size, char *descr)
+{
+ depot_stack_handle_t handle;
+ unsigned long entries[4];
+ unsigned long ua_flags;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ ua_flags = user_access_save();
+ entries[0] = KMSAN_ALLOCA_MAGIC_ORIGIN;
+ entries[1] = (u64)descr;
+ entries[2] = (u64)__builtin_return_address(0);
+ /*
+ * With frame pointers enabled, it is possible to quickly fetch the
+ * second frame of the caller stack without calling the unwinder.
+ * Without them, simply do not bother.
+ */
+ if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER))
+ entries[3] = (u64)__builtin_return_address(1);
+ else
+ entries[3] = 0;
+
+ /* stack_depot_save() may allocate memory. */
+ kmsan_enter_runtime();
+ handle = stack_depot_save(entries, ARRAY_SIZE(entries), GFP_ATOMIC);
+ kmsan_leave_runtime();
+
+ kmsan_internal_set_shadow_origin(address, size, -1, handle,
+ /*checked*/ true);
+ user_access_restore(ua_flags);
+}
+EXPORT_SYMBOL(__msan_poison_alloca);
+
+/* Unpoison a local variable. */
+void __msan_unpoison_alloca(void *address, uintptr_t size)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ kmsan_enter_runtime();
+ kmsan_internal_unpoison_memory(address, size, /*checked*/ true);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(__msan_unpoison_alloca);
+
+/*
+ * Report that an uninitialized value with the given origin was used in a way
+ * that constituted undefined behavior.
+ */
+void __msan_warning(u32 origin)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ kmsan_report(origin, /*address*/ 0, /*size*/ 0,
+ /*off_first*/ 0, /*off_last*/ 0, /*user_addr*/ 0,
+ REASON_ANY);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(__msan_warning);
+
+/*
+ * At the beginning of an instrumented function, obtain the pointer to
+ * `struct kmsan_context_state` holding the metadata for function parameters.
+ */
+struct kmsan_context_state *__msan_get_context_state(void)
+{
+ return &kmsan_get_context()->cstate;
+}
+EXPORT_SYMBOL(__msan_get_context_state);
diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
new file mode 100644
index 0000000000000..bfe38789950a6
--- /dev/null
+++ b/mm/kmsan/kmsan.h
@@ -0,0 +1,183 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Functions used by the KMSAN runtime.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#ifndef __MM_KMSAN_KMSAN_H
+#define __MM_KMSAN_KMSAN_H
+
+#include <asm/pgtable_64_types.h>
+#include <linux/irqflags.h>
+#include <linux/sched.h>
+#include <linux/stackdepot.h>
+#include <linux/stacktrace.h>
+#include <linux/nmi.h>
+#include <linux/mm.h>
+#include <linux/printk.h>
+
+#define KMSAN_ALLOCA_MAGIC_ORIGIN 0xabcd0100
+#define KMSAN_CHAIN_MAGIC_ORIGIN 0xabcd0200
+
+#define KMSAN_POISON_NOCHECK 0x0
+#define KMSAN_POISON_CHECK 0x1
+#define KMSAN_POISON_FREE 0x2
+
+#define KMSAN_ORIGIN_SIZE 4
+
+#define KMSAN_STACK_DEPTH 64
+
+#define KMSAN_META_SHADOW (false)
+#define KMSAN_META_ORIGIN (true)
+
+extern bool kmsan_enabled;
+extern int panic_on_kmsan;
+
+/*
+ * KMSAN performs a lot of consistency checks that are currently enabled by
+ * default. BUG_ON is normally discouraged in the kernel, unless used for
+ * debugging, but KMSAN itself is a debugging tool, so it makes little sense to
+ * recover if something goes wrong.
+ */
+#define KMSAN_WARN_ON(cond) \
+ ({ \
+ const bool __cond = WARN_ON(cond); \
+ if (unlikely(__cond)) { \
+ WRITE_ONCE(kmsan_enabled, false); \
+ if (panic_on_kmsan) { \
+ /* Can't call panic() here because */ \
+ /* of uaccess checks.*/ \
+ BUG(); \
+ } \
+ } \
+ __cond; \
+ })
+
+/*
+ * A pair of metadata pointers to be returned by the instrumentation functions.
+ */
+struct shadow_origin_ptr {
+ void *shadow, *origin;
+};
+
+struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *addr, u64 size,
+ bool store);
+void *kmsan_get_metadata(void *addr, bool is_origin);
+
+enum kmsan_bug_reason {
+ REASON_ANY,
+ REASON_COPY_TO_USER,
+ REASON_SUBMIT_URB,
+};
+
+void kmsan_print_origin(depot_stack_handle_t origin);
+
+/**
+ * kmsan_report() - Report a use of uninitialized value.
+ * @origin: Stack ID of the uninitialized value.
+ * @address: Address at which the memory access happens.
+ * @size: Memory access size.
+ * @off_first: Offset (from @address) of the first byte to be reported.
+ * @off_last: Offset (from @address) of the last byte to be reported.
+ * @user_addr: When non-NULL, denotes the userspace address to which the kernel
+ * is leaking data.
+ * @reason: Error type from enum kmsan_bug_reason.
+ *
+ * kmsan_report() prints an error message for a consequent group of bytes
+ * sharing the same origin. If an uninitialized value is used in a comparison,
+ * this function is called once without specifying the addresses. When checking
+ * a memory range, KMSAN may call kmsan_report() multiple times with the same
+ * @address, @size, @user_addr and @reason, but different @off_first and
+ * @off_last corresponding to different @origin values.
+ */
+void kmsan_report(depot_stack_handle_t origin, void *address, int size,
+ int off_first, int off_last, const void *user_addr,
+ enum kmsan_bug_reason reason);
+
+DECLARE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
+
+static __always_inline struct kmsan_ctx *kmsan_get_context(void)
+{
+ return in_task() ? &current->kmsan_ctx : raw_cpu_ptr(&kmsan_percpu_ctx);
+}
+
+/*
+ * When a compiler hook is invoked, it may make a call to instrumented code
+ * and eventually call itself recursively. To avoid that, we protect the
+ * runtime entry points with kmsan_enter_runtime()/kmsan_leave_runtime() and
+ * exit the hook if kmsan_in_runtime() is true.
+ */
+
+static __always_inline bool kmsan_in_runtime(void)
+{
+ if ((hardirq_count() >> HARDIRQ_SHIFT) > 1)
+ return true;
+ return kmsan_get_context()->kmsan_in_runtime;
+}
+
+static __always_inline void kmsan_enter_runtime(void)
+{
+ struct kmsan_ctx *ctx;
+
+ ctx = kmsan_get_context();
+ KMSAN_WARN_ON(ctx->kmsan_in_runtime++);
+}
+
+static __always_inline void kmsan_leave_runtime(void)
+{
+ struct kmsan_ctx *ctx = kmsan_get_context();
+
+ KMSAN_WARN_ON(--ctx->kmsan_in_runtime);
+}
+
+depot_stack_handle_t kmsan_save_stack(void);
+depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,
+ unsigned int extra_bits);
+
+/*
+ * Pack and unpack the origin chain depth and UAF flag to/from the extra bits
+ * provided by the stack depot.
+ * The UAF flag is stored in the lowest bit, followed by the depth in the upper
+ * bits.
+ * set_dsh_extra_bits() is responsible for clamping the value.
+ */
+static __always_inline unsigned int kmsan_extra_bits(unsigned int depth,
+ bool uaf)
+{
+ return (depth << 1) | uaf;
+}
+
+static __always_inline bool kmsan_uaf_from_eb(unsigned int extra_bits)
+{
+ return extra_bits & 1;
+}
+
+static __always_inline unsigned int kmsan_depth_from_eb(unsigned int extra_bits)
+{
+ return extra_bits >> 1;
+}
+
+/*
+ * kmsan_internal_ functions are supposed to be very simple and not require the
+ * kmsan_in_runtime() checks.
+ */
+void kmsan_internal_memmove_metadata(void *dst, void *src, size_t n);
+void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
+ unsigned int poison_flags);
+void kmsan_internal_unpoison_memory(void *address, size_t size, bool checked);
+void kmsan_internal_set_shadow_origin(void *address, size_t size, int b,
+ u32 origin, bool checked);
+depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id);
+
+bool kmsan_metadata_is_contiguous(void *addr, size_t size);
+void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
+ int reason);
+bool kmsan_internal_is_module_addr(void *vaddr);
+bool kmsan_internal_is_vmalloc_addr(void *addr);
+
+struct page *kmsan_vmalloc_to_page_or_null(void *vaddr);
+
+#endif /* __MM_KMSAN_KMSAN_H */
diff --git a/mm/kmsan/report.c b/mm/kmsan/report.c
new file mode 100644
index 0000000000000..f36fca452e313
--- /dev/null
+++ b/mm/kmsan/report.c
@@ -0,0 +1,211 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN error reporting routines.
+ *
+ * Copyright (C) 2019-2022 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#include <linux/console.h>
+#include <linux/moduleparam.h>
+#include <linux/stackdepot.h>
+#include <linux/stacktrace.h>
+#include <linux/uaccess.h>
+
+#include "kmsan.h"
+
+static DEFINE_SPINLOCK(kmsan_report_lock);
+#define DESCR_SIZE 128
+/* Protected by kmsan_report_lock */
+static char report_local_descr[DESCR_SIZE];
+int panic_on_kmsan __read_mostly;
+
+#ifdef MODULE_PARAM_PREFIX
+#undef MODULE_PARAM_PREFIX
+#endif
+#define MODULE_PARAM_PREFIX "kmsan."
+module_param_named(panic, panic_on_kmsan, int, 0);
+
+/*
+ * Skip internal KMSAN frames.
+ */
+static int get_stack_skipnr(const unsigned long stack_entries[],
+ int num_entries)
+{
+ int len, skip;
+ char buf[64];
+
+ for (skip = 0; skip < num_entries; ++skip) {
+ len = scnprintf(buf, sizeof(buf), "%ps",
+ (void *)stack_entries[skip]);
+
+ /* Never show __msan_* or kmsan_* functions. */
+ if ((strnstr(buf, "__msan_", len) == buf) ||
+ (strnstr(buf, "kmsan_", len) == buf))
+ continue;
+
+ /*
+ * No match for runtime functions -- @skip entries to skip to
+ * get to first frame of interest.
+ */
+ break;
+ }
+
+ return skip;
+}
+
+/*
+ * Currently the descriptions of locals generated by Clang look as follows:
+ * ----local_name@function_name
+ * We want to print only the name of the local, as other information in that
+ * description can be confusing.
+ * The meaningful part of the description is copied to a global buffer to avoid
+ * allocating memory.
+ */
+static char *pretty_descr(char *descr)
+{
+ int i, pos = 0, len = strlen(descr);
+
+ for (i = 0; i < len; i++) {
+ if (descr[i] == '@')
+ break;
+ if (descr[i] == '-')
+ continue;
+ report_local_descr[pos] = descr[i];
+ if (pos + 1 == DESCR_SIZE)
+ break;
+ pos++;
+ }
+ report_local_descr[pos] = 0;
+ return report_local_descr;
+}
+
+void kmsan_print_origin(depot_stack_handle_t origin)
+{
+ unsigned long *entries = NULL, *chained_entries = NULL;
+ unsigned int nr_entries, chained_nr_entries, skipnr;
+ void *pc1 = NULL, *pc2 = NULL;
+ depot_stack_handle_t head;
+ unsigned long magic;
+ char *descr = NULL;
+
+ if (!origin)
+ return;
+
+ while (true) {
+ nr_entries = stack_depot_fetch(origin, &entries);
+ magic = nr_entries ? entries[0] : 0;
+ if ((nr_entries == 4) && (magic == KMSAN_ALLOCA_MAGIC_ORIGIN)) {
+ descr = (char *)entries[1];
+ pc1 = (void *)entries[2];
+ pc2 = (void *)entries[3];
+ pr_err("Local variable %s created at:\n",
+ pretty_descr(descr));
+ if (pc1)
+ pr_err(" %pSb\n", pc1);
+ if (pc2)
+ pr_err(" %pSb\n", pc2);
+ break;
+ }
+ if ((nr_entries == 3) && (magic == KMSAN_CHAIN_MAGIC_ORIGIN)) {
+ head = entries[1];
+ origin = entries[2];
+ pr_err("Uninit was stored to memory at:\n");
+ chained_nr_entries =
+ stack_depot_fetch(head, &chained_entries);
+ kmsan_internal_unpoison_memory(
+ chained_entries,
+ chained_nr_entries * sizeof(*chained_entries),
+ /*checked*/ false);
+ skipnr = get_stack_skipnr(chained_entries,
+ chained_nr_entries);
+ stack_trace_print(chained_entries + skipnr,
+ chained_nr_entries - skipnr, 0);
+ pr_err("\n");
+ continue;
+ }
+ pr_err("Uninit was created at:\n");
+ if (nr_entries) {
+ skipnr = get_stack_skipnr(entries, nr_entries);
+ stack_trace_print(entries + skipnr, nr_entries - skipnr,
+ 0);
+ } else {
+ pr_err("(stack is not available)\n");
+ }
+ break;
+ }
+}
+
+void kmsan_report(depot_stack_handle_t origin, void *address, int size,
+ int off_first, int off_last, const void *user_addr,
+ enum kmsan_bug_reason reason)
+{
+ unsigned long stack_entries[KMSAN_STACK_DEPTH];
+ int num_stack_entries, skipnr;
+ char *bug_type = NULL;
+ unsigned long flags, ua_flags;
+ bool is_uaf;
+
+ if (!kmsan_enabled)
+ return;
+ if (!current->kmsan_ctx.allow_reporting)
+ return;
+ if (!origin)
+ return;
+
+ current->kmsan_ctx.allow_reporting = false;
+ ua_flags = user_access_save();
+ spin_lock_irqsave(&kmsan_report_lock, flags);
+ pr_err("=====================================================\n");
+ is_uaf = kmsan_uaf_from_eb(stack_depot_get_extra_bits(origin));
+ switch (reason) {
+ case REASON_ANY:
+ bug_type = is_uaf ? "use-after-free" : "uninit-value";
+ break;
+ case REASON_COPY_TO_USER:
+ bug_type = is_uaf ? "kernel-infoleak-after-free" :
+ "kernel-infoleak";
+ break;
+ case REASON_SUBMIT_URB:
+ bug_type = is_uaf ? "kernel-usb-infoleak-after-free" :
+ "kernel-usb-infoleak";
+ break;
+ }
+
+ num_stack_entries =
+ stack_trace_save(stack_entries, KMSAN_STACK_DEPTH, 1);
+ skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
+
+ pr_err("BUG: KMSAN: %s in %pSb\n",
+ bug_type, (void *)stack_entries[skipnr]);
+ stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
+ 0);
+ pr_err("\n");
+
+ kmsan_print_origin(origin);
+
+ if (size) {
+ pr_err("\n");
+ if (off_first == off_last)
+ pr_err("Byte %d of %d is uninitialized\n", off_first,
+ size);
+ else
+ pr_err("Bytes %d-%d of %d are uninitialized\n",
+ off_first, off_last, size);
+ }
+ if (address)
+ pr_err("Memory access of size %d starts at %px\n", size,
+ address);
+ if (user_addr && reason == REASON_COPY_TO_USER)
+ pr_err("Data copied to user address %px\n", user_addr);
+ pr_err("\n");
+ dump_stack_print_info(KERN_ERR);
+ pr_err("=====================================================\n");
+ add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
+ spin_unlock_irqrestore(&kmsan_report_lock, flags);
+ if (panic_on_kmsan)
+ panic("kmsan.panic set ...\n");
+ user_access_restore(ua_flags);
+ current->kmsan_ctx.allow_reporting = true;
+}
diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
new file mode 100644
index 0000000000000..de58cfbc55b9d
--- /dev/null
+++ b/mm/kmsan/shadow.c
@@ -0,0 +1,186 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN shadow implementation.
+ *
+ * Copyright (C) 2017-2022 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#include <asm/page.h>
+#include <asm/pgtable_64_types.h>
+#include <asm/tlbflush.h>
+#include <linux/cacheflush.h>
+#include <linux/memblock.h>
+#include <linux/mm_types.h>
+#include <linux/percpu-defs.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+#include <linux/stddef.h>
+
+#include "../internal.h"
+#include "kmsan.h"
+
+#define shadow_page_for(page) ((page)->kmsan_shadow)
+
+#define origin_page_for(page) ((page)->kmsan_origin)
+
+static void *shadow_ptr_for(struct page *page)
+{
+ return page_address(shadow_page_for(page));
+}
+
+static void *origin_ptr_for(struct page *page)
+{
+ return page_address(origin_page_for(page));
+}
+
+static bool page_has_metadata(struct page *page)
+{
+ return shadow_page_for(page) && origin_page_for(page);
+}
+
+static void set_no_shadow_origin_page(struct page *page)
+{
+ shadow_page_for(page) = NULL;
+ origin_page_for(page) = NULL;
+}
+
+/*
+ * Dummy load and store pages to be used when the real metadata is unavailable.
+ * There are separate pages for loads and stores, so that every load returns a
+ * zero, and every store doesn't affect other loads.
+ */
+static char dummy_load_page[PAGE_SIZE] __aligned(PAGE_SIZE);
+static char dummy_store_page[PAGE_SIZE] __aligned(PAGE_SIZE);
+
+/*
+ * Taken from arch/x86/mm/physaddr.h to avoid using an instrumented version.
+ */
+static int kmsan_phys_addr_valid(unsigned long addr)
+{
+ if (IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
+ return !(addr >> boot_cpu_data.x86_phys_bits);
+ else
+ return 1;
+}
+
+/*
+ * Taken from arch/x86/mm/physaddr.c to avoid using an instrumented version.
+ */
+static bool kmsan_virt_addr_valid(void *addr)
+{
+ unsigned long x = (unsigned long)addr;
+ unsigned long y = x - __START_KERNEL_map;
+
+ /* use the carry flag to determine if x was < __START_KERNEL_map */
+ if (unlikely(x > y)) {
+ x = y + phys_base;
+
+ if (y >= KERNEL_IMAGE_SIZE)
+ return false;
+ } else {
+ x = y + (__START_KERNEL_map - PAGE_OFFSET);
+
+ /* carry flag will be set if starting x was >= PAGE_OFFSET */
+ if ((x > y) || !kmsan_phys_addr_valid(x))
+ return false;
+ }
+
+ return pfn_valid(x >> PAGE_SHIFT);
+}
+
+static unsigned long vmalloc_meta(void *addr, bool is_origin)
+{
+ unsigned long addr64 = (unsigned long)addr, off;
+
+ KMSAN_WARN_ON(is_origin && !IS_ALIGNED(addr64, KMSAN_ORIGIN_SIZE));
+ if (kmsan_internal_is_vmalloc_addr(addr)) {
+ off = addr64 - VMALLOC_START;
+ return off + (is_origin ? KMSAN_VMALLOC_ORIGIN_START :
+ KMSAN_VMALLOC_SHADOW_START);
+ }
+ if (kmsan_internal_is_module_addr(addr)) {
+ off = addr64 - MODULES_VADDR;
+ return off + (is_origin ? KMSAN_MODULES_ORIGIN_START :
+ KMSAN_MODULES_SHADOW_START);
+ }
+ return 0;
+}
+
+static struct page *virt_to_page_or_null(void *vaddr)
+{
+ if (kmsan_virt_addr_valid(vaddr))
+ return virt_to_page(vaddr);
+ else
+ return NULL;
+}
+
+struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *address, u64 size,
+ bool store)
+{
+ struct shadow_origin_ptr ret;
+ void *shadow;
+
+ /*
+ * Even if we redirect this memory access to the dummy page, it will
+ * go out of bounds.
+ */
+ KMSAN_WARN_ON(size > PAGE_SIZE);
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ goto return_dummy;
+
+ KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(address, size));
+ shadow = kmsan_get_metadata(address, KMSAN_META_SHADOW);
+ if (!shadow)
+ goto return_dummy;
+
+ ret.shadow = shadow;
+ ret.origin = kmsan_get_metadata(address, KMSAN_META_ORIGIN);
+ return ret;
+
+return_dummy:
+ if (store) {
+ /* Ignore this store. */
+ ret.shadow = dummy_store_page;
+ ret.origin = dummy_store_page;
+ } else {
+ /* This load will return zero. */
+ ret.shadow = dummy_load_page;
+ ret.origin = dummy_load_page;
+ }
+ return ret;
+}
+
+/*
+ * Obtain the shadow or origin pointer for the given address, or NULL if there's
+ * none. The caller must check the return value for being non-NULL if needed.
+ * The return value of this function should not depend on whether we're in the
+ * runtime or not.
+ */
+void *kmsan_get_metadata(void *address, bool is_origin)
+{
+ u64 addr = (u64)address, pad, off;
+ struct page *page;
+ void *ret;
+
+ if (is_origin && !IS_ALIGNED(addr, KMSAN_ORIGIN_SIZE)) {
+ pad = addr % KMSAN_ORIGIN_SIZE;
+ addr -= pad;
+ }
+ address = (void *)addr;
+ if (kmsan_internal_is_vmalloc_addr(address) ||
+ kmsan_internal_is_module_addr(address))
+ return (void *)vmalloc_meta(address, is_origin);
+
+ page = virt_to_page_or_null(address);
+ if (!page)
+ return NULL;
+ if (!page_has_metadata(page))
+ return NULL;
+ off = addr % PAGE_SIZE;
+
+ ret = (is_origin ? origin_ptr_for(page) : shadow_ptr_for(page)) + off;
+ return ret;
+}
diff --git a/scripts/Makefile.kmsan b/scripts/Makefile.kmsan
new file mode 100644
index 0000000000000..9793591f9855c
--- /dev/null
+++ b/scripts/Makefile.kmsan
@@ -0,0 +1 @@
+export CFLAGS_KMSAN := -fsanitize=kernel-memory
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 9f69ecdd7977a..49e6e57fdf4c8 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -157,6 +157,15 @@ _c_flags += $(if $(patsubst n%,, \
endif
endif

+ifeq ($(CONFIG_KMSAN),y)
+_c_flags += $(if $(patsubst n%,, \
+ $(KMSAN_SANITIZE_$(basetarget).o)$(KMSAN_SANITIZE)y), \
+ $(CFLAGS_KMSAN))
+_c_flags += $(if $(patsubst n%,, \
+ $(KMSAN_ENABLE_CHECKS_$(basetarget).o)$(KMSAN_ENABLE_CHECKS)y), \
+ , -mllvm -msan-disable-checks=1)
+endif
+
ifeq ($(CONFIG_UBSAN),y)
_c_flags += $(if $(patsubst n%,, \
$(UBSAN_SANITIZE_$(basetarget).o)$(UBSAN_SANITIZE)$(CONFIG_UBSAN_SANITIZE_ALL)), \
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 09:48:04

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 26/46] kmsan: handle memory sent to/from USB

Depending on the value of is_out, kmsan_handle_urb() either marks the
data copied to the kernel from a USB device as initialized, or checks
the data sent to the device for being initialized.
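
As a rough illustration (hypothetical driver code, not part of this
patch), an outgoing transfer with an uninitialized buffer is reported
at submission time:

	/* buf is allocated but never written, so it stays poisoned. */
	buf = kmalloc(len, GFP_KERNEL);
	usb_fill_bulk_urb(urb, udev, pipe, buf, len, complete_fn, ctx);
	/*
	 * For an OUT pipe is_out is true, so kmsan_handle_urb() checks
	 * buf and reports a kernel-usb-infoleak if any of its bytes are
	 * uninitialized. For an IN transfer the buffer is unpoisoned
	 * instead, because the device is going to overwrite it.
	 */
	usb_submit_urb(urb, GFP_KERNEL);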

Signed-off-by: Alexander Potapenko <[email protected]>

---
v2:
-- move kmsan_handle_urb() implementation to this patch

Link: https://linux-review.googlesource.com/id/Ifa67fb72015d4de14c30e971556f99fc8b2ee506
---
drivers/usb/core/urb.c | 2 ++
include/linux/kmsan.h | 15 +++++++++++++++
mm/kmsan/hooks.c | 17 +++++++++++++++++
3 files changed, 34 insertions(+)

diff --git a/drivers/usb/core/urb.c b/drivers/usb/core/urb.c
index 33d62d7e3929f..1fe3f23205624 100644
--- a/drivers/usb/core/urb.c
+++ b/drivers/usb/core/urb.c
@@ -8,6 +8,7 @@
#include <linux/bitops.h>
#include <linux/slab.h>
#include <linux/log2.h>
+#include <linux/kmsan-checks.h>
#include <linux/usb.h>
#include <linux/wait.h>
#include <linux/usb/hcd.h>
@@ -426,6 +427,7 @@ int usb_submit_urb(struct urb *urb, gfp_t mem_flags)
URB_SETUP_MAP_SINGLE | URB_SETUP_MAP_LOCAL |
URB_DMA_SG_COMBINED);
urb->transfer_flags |= (is_out ? URB_DIR_OUT : URB_DIR_IN);
+ kmsan_handle_urb(urb, is_out);

if (xfertype != USB_ENDPOINT_XFER_CONTROL &&
dev->state < USB_STATE_CONFIGURED)
diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index d8667161a10c8..55f976b721566 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -20,6 +20,7 @@ struct page;
struct kmem_cache;
struct task_struct;
struct scatterlist;
+struct urb;

#ifdef CONFIG_KMSAN

@@ -236,6 +237,16 @@ void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
enum dma_data_direction dir);

+/**
+ * kmsan_handle_urb() - Handle a USB data transfer.
+ * @urb: struct urb pointer.
+ * @is_out: data transfer direction (true means output to hardware).
+ *
+ * If @is_out is true, KMSAN checks the transfer buffer of @urb. Otherwise,
+ * KMSAN initializes the transfer buffer.
+ */
+void kmsan_handle_urb(const struct urb *urb, bool is_out);
+
#else

static inline void kmsan_init_shadow(void)
@@ -328,6 +339,10 @@ static inline void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
{
}

+static inline void kmsan_handle_urb(const struct urb *urb, bool is_out)
+{
+}
+
#endif

#endif /* _LINUX_KMSAN_H */
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 8a6947a2a2f22..9aecbf2825837 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -17,6 +17,7 @@
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/uaccess.h>
+#include <linux/usb.h>

#include "../internal.h"
#include "../slab.h"
@@ -252,6 +253,22 @@ void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy,
}
EXPORT_SYMBOL(kmsan_copy_to_user);

+/* Helper function to check an URB. */
+void kmsan_handle_urb(const struct urb *urb, bool is_out)
+{
+ if (!urb)
+ return;
+ if (is_out)
+ kmsan_internal_check_memory(urb->transfer_buffer,
+ urb->transfer_buffer_length,
+ /*user_addr*/ 0, REASON_SUBMIT_URB);
+ else
+ kmsan_internal_unpoison_memory(urb->transfer_buffer,
+ urb->transfer_buffer_length,
+ /*checked*/ false);
+}
+EXPORT_SYMBOL(kmsan_handle_urb);
+
static void kmsan_handle_dma_page(const void *addr, size_t size,
enum dma_data_direction dir)
{
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 09:51:19

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 35/46] security: kmsan: fix interoperability with auto-initialization

Heap and stack initialization is great, but not when we are trying to
catch uses of uninitialized memory. When the kernel is built with KMSAN,
having kernel memory initialization enabled may introduce false
negatives.

We disable CONFIG_INIT_STACK_ALL_PATTERN and CONFIG_INIT_STACK_ALL_ZERO
under CONFIG_KMSAN, making it impossible to auto-initialize stack
variables in KMSAN builds. We also disable CONFIG_INIT_ON_ALLOC_DEFAULT_ON
and CONFIG_INIT_ON_FREE_DEFAULT_ON to prevent accidental use of heap
auto-initialization.

We however still let the users enable heap auto-initialization at
boot-time (by setting init_on_alloc=1 or init_on_free=1), in which case
a warning is printed.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I86608dd867018683a14ae1870f1928ad925f42e9
---
mm/page_alloc.c | 4 ++++
security/Kconfig.hardening | 4 ++++
2 files changed, 8 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 35b1fedb2f09c..4c89729cac7ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -849,6 +849,10 @@ void init_mem_debugging_and_hardening(void)
else
static_branch_disable(&init_on_free);

+ if (IS_ENABLED(CONFIG_KMSAN) &&
+ (_init_on_alloc_enabled_early || _init_on_free_enabled_early))
+ pr_info("mem auto-init: please make sure init_on_alloc and init_on_free are disabled when running KMSAN\n");
+
#ifdef CONFIG_DEBUG_PAGEALLOC
if (!debug_pagealloc_enabled())
return;
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index ded4d7c0d1322..d6cce64899d13 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -106,6 +106,7 @@ choice
config INIT_STACK_ALL_PATTERN
bool "pattern-init everything (strongest)"
depends on CC_HAS_AUTO_VAR_INIT_PATTERN
+ depends on !KMSAN
help
Initializes everything on the stack (including padding)
with a specific debug value. This is intended to eliminate
@@ -124,6 +125,7 @@ choice
config INIT_STACK_ALL_ZERO
bool "zero-init everything (strongest and safest)"
depends on CC_HAS_AUTO_VAR_INIT_ZERO
+ depends on !KMSAN
help
Initializes everything on the stack (including padding)
with a zero value. This is intended to eliminate all
@@ -218,6 +220,7 @@ config STACKLEAK_RUNTIME_DISABLE

config INIT_ON_ALLOC_DEFAULT_ON
bool "Enable heap memory zeroing on allocation by default"
+ depends on !KMSAN
help
This has the effect of setting "init_on_alloc=1" on the kernel
command line. This can be disabled with "init_on_alloc=0".
@@ -230,6 +233,7 @@ config INIT_ON_ALLOC_DEFAULT_ON

config INIT_ON_FREE_DEFAULT_ON
bool "Enable heap memory zeroing on free by default"
+ depends on !KMSAN
help
This has the effect of setting "init_on_free=1" on the kernel
command line. This can be disabled with "init_on_free=0".
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 09:53:36

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 46/46] x86: kmsan: enable KMSAN builds for x86

Make KMSAN usable by adding the necessary Kconfig bits.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I1d295ce8159ce15faa496d20089d953a919c125e
---
arch/x86/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3209073f96415..592f5ca2017c2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -168,6 +168,7 @@ config X86
select HAVE_ARCH_KASAN if X86_64
select HAVE_ARCH_KASAN_VMALLOC if X86_64
select HAVE_ARCH_KFENCE
+ select HAVE_ARCH_KMSAN if X86_64
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS if MMU
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 09:56:07

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 14/46] kmsan: disable instrumentation of unsupported common kernel code

EFI stub cannot be linked with KMSAN runtime, so we disable
instrumentation for it.

Instrumenting kcov, stackdepot or lockdep leads to infinite recursion
caused by instrumentation hooks calling instrumented code again.
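
For reference, per-file opt-out uses the usual convention handled by
scripts/Makefile.lib (directory and file names below are illustrative):

	# In a subsystem Makefile:
	KMSAN_SANITIZE_foo.o := n	# do not instrument foo.o
	KMSAN_SANITIZE := n		# do not instrument the whole directory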

This patch was previously part of "kmsan: disable KMSAN instrumentation
for certain kernel parts", but was split away per Mark Rutland's
request.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I41ae706bd3474f074f6a870bfc3f0f90e9c720f7
---
drivers/firmware/efi/libstub/Makefile | 1 +
kernel/Makefile | 1 +
kernel/locking/Makefile | 3 ++-
lib/Makefile | 1 +
4 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index d0537573501e9..81432d0c904b1 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -46,6 +46,7 @@ GCOV_PROFILE := n
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
+KMSAN_SANITIZE := n
UBSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y

diff --git a/kernel/Makefile b/kernel/Makefile
index 847a82bfe0e3a..2a98e46479817 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -39,6 +39,7 @@ KCOV_INSTRUMENT_kcov.o := n
KASAN_SANITIZE_kcov.o := n
KCSAN_SANITIZE_kcov.o := n
UBSAN_SANITIZE_kcov.o := n
+KMSAN_SANITIZE_kcov.o := n
CFLAGS_kcov.o := $(call cc-option, -fno-conserve-stack) -fno-stack-protector

# Don't instrument error handlers
diff --git a/kernel/locking/Makefile b/kernel/locking/Makefile
index d51cabf28f382..ea925731fa40f 100644
--- a/kernel/locking/Makefile
+++ b/kernel/locking/Makefile
@@ -5,8 +5,9 @@ KCOV_INSTRUMENT := n

obj-y += mutex.o semaphore.o rwsem.o percpu-rwsem.o

-# Avoid recursion lockdep -> KCSAN -> ... -> lockdep.
+# Avoid recursion lockdep -> sanitizer -> ... -> lockdep.
KCSAN_SANITIZE_lockdep.o := n
+KMSAN_SANITIZE_lockdep.o := n

ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_lockdep.o = $(CC_FLAGS_FTRACE)
diff --git a/lib/Makefile b/lib/Makefile
index 6b9ffc1bd1eed..caeb55f661726 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -269,6 +269,7 @@ obj-$(CONFIG_IRQ_POLL) += irq_poll.o
CFLAGS_stackdepot.o += -fno-builtin
obj-$(CONFIG_STACKDEPOT) += stackdepot.o
KASAN_SANITIZE_stackdepot.o := n
+KMSAN_SANITIZE_stackdepot.o := n
KCOV_INSTRUMENT_stackdepot.o := n

obj-$(CONFIG_REF_TRACKER) += ref_tracker.o
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 09:58:46

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 19/46] kmsan: init: call KMSAN initialization routines

kmsan_init_shadow() scans the mappings created at boot time and creates
metadata pages for those mappings.

When the memblock allocator returns pages to pagealloc, we reserve 2/3
of those pages and use them as metadata for the remaining 1/3. Once KMSAN
starts, every page allocated by pagealloc has its associated shadow and
origin pages.

kmsan_init_runtime() initializes the bookkeeping for init_task and
enables KMSAN.

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- move mm/kmsan/init.c and kmsan_memblock_free_pages() to this patch
-- print a warning that KMSAN is a debugging tool (per Greg K-H's
request)

Link: https://linux-review.googlesource.com/id/I7bc53706141275914326df2345881ffe0cdd16bd
---
include/linux/kmsan.h | 48 +++++++++
init/main.c | 3 +
mm/kmsan/Makefile | 3 +-
mm/kmsan/init.c | 240 ++++++++++++++++++++++++++++++++++++++++++
mm/kmsan/kmsan.h | 3 +
mm/kmsan/shadow.c | 36 +++++++
mm/page_alloc.c | 3 +
7 files changed, 335 insertions(+), 1 deletion(-)
create mode 100644 mm/kmsan/init.c

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index dca42e0e91991..a5767c728a46b 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -52,6 +52,40 @@ void kmsan_task_create(struct task_struct *task);
*/
void kmsan_task_exit(struct task_struct *task);

+/**
+ * kmsan_init_shadow() - Initialize KMSAN shadow at boot time.
+ *
+ * Allocate and initialize KMSAN metadata for early allocations.
+ */
+void __init kmsan_init_shadow(void);
+
+/**
+ * kmsan_init_runtime() - Initialize KMSAN state and enable KMSAN.
+ */
+void __init kmsan_init_runtime(void);
+
+/**
+ * kmsan_memblock_free_pages() - handle freeing of memblock pages.
+ * @page: struct page to free.
+ * @order: order of @page.
+ *
+ * Freed pages are either returned to buddy allocator or held back to be used
+ * as metadata pages.
+ */
+bool __init kmsan_memblock_free_pages(struct page *page, unsigned int order);
+
+/**
+ * kmsan_task_create() - Initialize KMSAN state for the task.
+ * @task: task to initialize.
+ */
+void kmsan_task_create(struct task_struct *task);
+
+/**
+ * kmsan_task_exit() - Notify KMSAN that a task has exited.
+ * @task: task about to finish.
+ */
+void kmsan_task_exit(struct task_struct *task);
+
/**
* kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
* @page: struct page pointer returned by alloc_pages().
@@ -173,6 +207,20 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);

#else

+static inline void kmsan_init_shadow(void)
+{
+}
+
+static inline void kmsan_init_runtime(void)
+{
+}
+
+static inline bool kmsan_memblock_free_pages(struct page *page,
+ unsigned int order)
+{
+ return true;
+}
+
static inline void kmsan_task_create(struct task_struct *task)
{
}
diff --git a/init/main.c b/init/main.c
index 98182c3c2c4b3..5c6937921c890 100644
--- a/init/main.c
+++ b/init/main.c
@@ -34,6 +34,7 @@
#include <linux/percpu.h>
#include <linux/kmod.h>
#include <linux/kprobes.h>
+#include <linux/kmsan.h>
#include <linux/vmalloc.h>
#include <linux/kernel_stat.h>
#include <linux/start_kernel.h>
@@ -835,6 +836,7 @@ static void __init mm_init(void)
init_mem_debugging_and_hardening();
kfence_alloc_pool();
report_meminit();
+ kmsan_init_shadow();
stack_depot_early_init();
mem_init();
mem_init_print_info();
@@ -852,6 +854,7 @@ static void __init mm_init(void)
init_espfix_bsp();
/* Should be run after espfix64 is set up. */
pti_init();
+ kmsan_init_runtime();
}

#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
index 73b705cbf75b9..f57a956cb1c8b 100644
--- a/mm/kmsan/Makefile
+++ b/mm/kmsan/Makefile
@@ -1,4 +1,4 @@
-obj-y := core.o instrumentation.o hooks.o report.o shadow.o annotations.o
+obj-y := core.o instrumentation.o init.o hooks.o report.o shadow.o annotations.o

KMSAN_SANITIZE := n
KCOV_INSTRUMENT := n
@@ -16,6 +16,7 @@ CFLAGS_REMOVE.o = $(CC_FLAGS_FTRACE)
CFLAGS_annotations.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_core.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_hooks.o := $(CC_FLAGS_KMSAN_RUNTIME)
+CFLAGS_init.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
diff --git a/mm/kmsan/init.c b/mm/kmsan/init.c
new file mode 100644
index 0000000000000..45757d1390402
--- /dev/null
+++ b/mm/kmsan/init.c
@@ -0,0 +1,240 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KMSAN initialization routines.
+ *
+ * Copyright (C) 2017-2021 Google LLC
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#include "kmsan.h"
+
+#include <asm/sections.h>
+#include <linux/mm.h>
+#include <linux/memblock.h>
+
+#include "../internal.h"
+
+#define NUM_FUTURE_RANGES 128
+struct start_end_pair {
+ u64 start, end;
+};
+
+static struct start_end_pair start_end_pairs[NUM_FUTURE_RANGES] __initdata;
+static int future_index __initdata;
+
+/*
+ * Record a range of memory for which the metadata pages will be created once
+ * the page allocator becomes available.
+ */
+static void __init kmsan_record_future_shadow_range(void *start, void *end)
+{
+ u64 nstart = (u64)start, nend = (u64)end, cstart, cend;
+ bool merged = false;
+ int i;
+
+ KMSAN_WARN_ON(future_index == NUM_FUTURE_RANGES);
+ KMSAN_WARN_ON((nstart >= nend) || !nstart || !nend);
+ nstart = ALIGN_DOWN(nstart, PAGE_SIZE);
+ nend = ALIGN(nend, PAGE_SIZE);
+
+ /*
+ * Scan the existing ranges to see if any of them overlaps with
+ * [start, end). In that case, merge the two ranges instead of
+ * creating a new one.
+ * The number of ranges is less than 20, so there is no need to organize
+ * them into a more intelligent data structure.
+ */
+ for (i = 0; i < future_index; i++) {
+ cstart = start_end_pairs[i].start;
+ cend = start_end_pairs[i].end;
+ if ((cstart < nstart && cend < nstart) ||
+ (cstart > nend && cend > nend))
+ /* ranges are disjoint - do not merge */
+ continue;
+ start_end_pairs[i].start = min(nstart, cstart);
+ start_end_pairs[i].end = max(nend, cend);
+ merged = true;
+ break;
+ }
+ if (merged)
+ return;
+ start_end_pairs[future_index].start = nstart;
+ start_end_pairs[future_index].end = nend;
+ future_index++;
+}
+
+/*
+ * Initialize the shadow for existing mappings during kernel initialization.
+ * These include kernel text/data sections, NODE_DATA and future ranges
+ * registered while creating other data (e.g. percpu).
+ *
+ * Allocations via memblock can be only done before slab is initialized.
+ */
+void __init kmsan_init_shadow(void)
+{
+ const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
+ phys_addr_t p_start, p_end;
+ int nid;
+ u64 i;
+
+ for_each_reserved_mem_range(i, &p_start, &p_end)
+ kmsan_record_future_shadow_range(phys_to_virt(p_start),
+ phys_to_virt(p_end));
+ /* Allocate shadow for .data */
+ kmsan_record_future_shadow_range(_sdata, _edata);
+
+ for_each_online_node(nid)
+ kmsan_record_future_shadow_range(
+ NODE_DATA(nid), (char *)NODE_DATA(nid) + nd_size);
+
+ for (i = 0; i < future_index; i++)
+ kmsan_init_alloc_meta_for_range(
+ (void *)start_end_pairs[i].start,
+ (void *)start_end_pairs[i].end);
+}
+EXPORT_SYMBOL(kmsan_init_shadow);
+
+struct page_pair {
+ struct page *shadow, *origin;
+};
+static struct page_pair held_back[MAX_ORDER] __initdata;
+
+/*
+ * Eager metadata allocation. When the memblock allocator is freeing pages to
+ * pagealloc, we use 2/3 of them as metadata for the remaining 1/3.
+ * We store the pointers to the returned blocks of pages in held_back[] grouped
+ * by their order: when kmsan_memblock_free_pages() is called for the first
+ * time with a certain order, it is reserved as a shadow block, for the second
+ * time - as an origin block. On the third time the incoming block receives its
+ * shadow and origin ranges from the previously saved shadow and origin blocks,
+ * after which held_back[order] can be used again.
+ *
+ * At the very end there may be leftover blocks in held_back[]. They are
+ * collected later by kmsan_memblock_discard().
+ */
+bool kmsan_memblock_free_pages(struct page *page, unsigned int order)
+{
+ struct page *shadow, *origin;
+
+ if (!held_back[order].shadow) {
+ held_back[order].shadow = page;
+ return false;
+ }
+ if (!held_back[order].origin) {
+ held_back[order].origin = page;
+ return false;
+ }
+ shadow = held_back[order].shadow;
+ origin = held_back[order].origin;
+ kmsan_setup_meta(page, shadow, origin, order);
+
+ held_back[order].shadow = NULL;
+ held_back[order].origin = NULL;
+ return true;
+}
+
+#define MAX_BLOCKS 8
+struct smallstack {
+ struct page *items[MAX_BLOCKS];
+ int index;
+ int order;
+};
+
+static struct smallstack collect = {
+ .index = 0,
+ .order = MAX_ORDER,
+};
+
+static void smallstack_push(struct smallstack *stack, struct page *pages)
+{
+ KMSAN_WARN_ON(stack->index == MAX_BLOCKS);
+ stack->items[stack->index] = pages;
+ stack->index++;
+}
+#undef MAX_BLOCKS
+
+static struct page *smallstack_pop(struct smallstack *stack)
+{
+ struct page *ret;
+
+ KMSAN_WARN_ON(stack->index == 0);
+ stack->index--;
+ ret = stack->items[stack->index];
+ stack->items[stack->index] = NULL;
+ return ret;
+}
+
+static void do_collection(void)
+{
+ struct page *page, *shadow, *origin;
+
+ while (collect.index >= 3) {
+ page = smallstack_pop(&collect);
+ shadow = smallstack_pop(&collect);
+ origin = smallstack_pop(&collect);
+ kmsan_setup_meta(page, shadow, origin, collect.order);
+ __free_pages_core(page, collect.order);
+ }
+}
+
+static void collect_split(void)
+{
+ struct smallstack tmp = {
+ .order = collect.order - 1,
+ .index = 0,
+ };
+ struct page *page;
+
+ if (!collect.order)
+ return;
+ while (collect.index) {
+ page = smallstack_pop(&collect);
+ smallstack_push(&tmp, &page[0]);
+ smallstack_push(&tmp, &page[1 << tmp.order]);
+ }
+ __memcpy(&collect, &tmp, sizeof(struct smallstack));
+}
+
+/*
+ * Memblock is about to go away. Split the page blocks left over in held_back[]
+ * and return 1/3 of that memory to the system.
+ */
+static void kmsan_memblock_discard(void)
+{
+ int i;
+
+ /*
+ * For each order=N:
+ * - push held_back[N].shadow and .origin to |collect|;
+ * - while there are >= 3 elements in |collect|, do garbage collection:
+ * - pop 3 ranges from |collect|;
+ * - use two of them as shadow and origin for the third one;
+ * - repeat;
+ * - split each remaining element from |collect| into 2 ranges of
+ * order=N-1,
+ * - repeat.
+ */
+ collect.order = MAX_ORDER - 1;
+ for (i = MAX_ORDER - 1; i >= 0; i--) {
+ if (held_back[i].shadow)
+ smallstack_push(&collect, held_back[i].shadow);
+ if (held_back[i].origin)
+ smallstack_push(&collect, held_back[i].origin);
+ held_back[i].shadow = NULL;
+ held_back[i].origin = NULL;
+ do_collection();
+ collect_split();
+ }
+}
+
+void __init kmsan_init_runtime(void)
+{
+ /* Assuming current is init_task */
+ kmsan_internal_task_create(current);
+ kmsan_memblock_discard();
+ pr_info("Starting KernelMemorySanitizer\n");
+ pr_info("ATTENTION: KMSAN is a debugging tool! Do not use it on production machines!\n");
+ kmsan_enabled = true;
+}
+EXPORT_SYMBOL(kmsan_init_runtime);
diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
index a1b5900ffd97b..059f21c39ec1b 100644
--- a/mm/kmsan/kmsan.h
+++ b/mm/kmsan/kmsan.h
@@ -66,6 +66,7 @@ struct shadow_origin_ptr {
struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *addr, u64 size,
bool store);
void *kmsan_get_metadata(void *addr, bool is_origin);
+void __init kmsan_init_alloc_meta_for_range(void *start, void *end);

enum kmsan_bug_reason {
REASON_ANY,
@@ -181,5 +182,7 @@ bool kmsan_internal_is_module_addr(void *vaddr);
bool kmsan_internal_is_vmalloc_addr(void *addr);

struct page *kmsan_vmalloc_to_page_or_null(void *vaddr);
+void kmsan_setup_meta(struct page *page, struct page *shadow,
+ struct page *origin, int order);

#endif /* __MM_KMSAN_KMSAN_H */
diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
index 8fe6a5ed05e67..99cb9436eddc6 100644
--- a/mm/kmsan/shadow.c
+++ b/mm/kmsan/shadow.c
@@ -298,3 +298,39 @@ void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
kfree(s_pages);
kfree(o_pages);
}
+
+/* Allocate metadata for pages allocated at boot time. */
+void __init kmsan_init_alloc_meta_for_range(void *start, void *end)
+{
+ struct page *shadow_p, *origin_p;
+ void *shadow, *origin;
+ struct page *page;
+ u64 addr, size;
+
+ start = (void *)ALIGN_DOWN((u64)start, PAGE_SIZE);
+ size = ALIGN((u64)end - (u64)start, PAGE_SIZE);
+ shadow = memblock_alloc(size, PAGE_SIZE);
+ origin = memblock_alloc(size, PAGE_SIZE);
+ for (addr = 0; addr < size; addr += PAGE_SIZE) {
+ page = virt_to_page_or_null((char *)start + addr);
+ shadow_p = virt_to_page_or_null((char *)shadow + addr);
+ set_no_shadow_origin_page(shadow_p);
+ shadow_page_for(page) = shadow_p;
+ origin_p = virt_to_page_or_null((char *)origin + addr);
+ set_no_shadow_origin_page(origin_p);
+ origin_page_for(page) = origin_p;
+ }
+}
+
+void kmsan_setup_meta(struct page *page, struct page *shadow,
+ struct page *origin, int order)
+{
+ int i;
+
+ for (i = 0; i < (1 << order); i++) {
+ set_no_shadow_origin_page(&shadow[i]);
+ set_no_shadow_origin_page(&origin[i]);
+ shadow_page_for(&page[i]) = &shadow[i];
+ origin_page_for(&page[i]) = &origin[i];
+ }
+}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 98393e01e4259..35b1fedb2f09c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1716,6 +1716,9 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn,
{
if (early_page_uninitialised(pfn))
return;
+ if (!kmsan_memblock_free_pages(page, order))
+ /* KMSAN will take care of these pages. */
+ return;
__free_pages_core(page, order);
}

--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 10:09:29

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 22/46] kmsan: add iomap support

Functions from lib/iomap.c interact with hardware, so KMSAN must ensure
that:
- every read function returns an initialized value
- every write function checks values before sending them to hardware.

Signed-off-by: Alexander Potapenko <[email protected]>

---
Link: https://linux-review.googlesource.com/id/I45527599f09090aca046dfe1a26df453adab100d
---
lib/iomap.c | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)

diff --git a/lib/iomap.c b/lib/iomap.c
index fbaa3e8f19d6c..bdda1a42771b2 100644
--- a/lib/iomap.c
+++ b/lib/iomap.c
@@ -6,6 +6,7 @@
*/
#include <linux/pci.h>
#include <linux/io.h>
+#include <linux/kmsan-checks.h>

#include <linux/export.h>

@@ -70,26 +71,31 @@ static void bad_io_access(unsigned long port, const char *access)
#define mmio_read64be(addr) swab64(readq(addr))
#endif

+__no_sanitize_memory
unsigned int ioread8(const void __iomem *addr)
{
IO_COND(addr, return inb(port), return readb(addr));
return 0xff;
}
+__no_sanitize_memory
unsigned int ioread16(const void __iomem *addr)
{
IO_COND(addr, return inw(port), return readw(addr));
return 0xffff;
}
+__no_sanitize_memory
unsigned int ioread16be(const void __iomem *addr)
{
IO_COND(addr, return pio_read16be(port), return mmio_read16be(addr));
return 0xffff;
}
+__no_sanitize_memory
unsigned int ioread32(const void __iomem *addr)
{
IO_COND(addr, return inl(port), return readl(addr));
return 0xffffffff;
}
+__no_sanitize_memory
unsigned int ioread32be(const void __iomem *addr)
{
IO_COND(addr, return pio_read32be(port), return mmio_read32be(addr));
@@ -142,18 +148,21 @@ static u64 pio_read64be_hi_lo(unsigned long port)
return lo | (hi << 32);
}

+__no_sanitize_memory
u64 ioread64_lo_hi(const void __iomem *addr)
{
IO_COND(addr, return pio_read64_lo_hi(port), return readq(addr));
return 0xffffffffffffffffULL;
}

+__no_sanitize_memory
u64 ioread64_hi_lo(const void __iomem *addr)
{
IO_COND(addr, return pio_read64_hi_lo(port), return readq(addr));
return 0xffffffffffffffffULL;
}

+__no_sanitize_memory
u64 ioread64be_lo_hi(const void __iomem *addr)
{
IO_COND(addr, return pio_read64be_lo_hi(port),
@@ -161,6 +170,7 @@ u64 ioread64be_lo_hi(const void __iomem *addr)
return 0xffffffffffffffffULL;
}

+__no_sanitize_memory
u64 ioread64be_hi_lo(const void __iomem *addr)
{
IO_COND(addr, return pio_read64be_hi_lo(port),
@@ -188,22 +198,32 @@ EXPORT_SYMBOL(ioread64be_hi_lo);

void iowrite8(u8 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, outb(val,port), writeb(val, addr));
}
void iowrite16(u16 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, outw(val,port), writew(val, addr));
}
void iowrite16be(u16 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write16be(val,port), mmio_write16be(val, addr));
}
void iowrite32(u32 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, outl(val,port), writel(val, addr));
}
void iowrite32be(u32 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write32be(val,port), mmio_write32be(val, addr));
}
EXPORT_SYMBOL(iowrite8);
@@ -239,24 +259,32 @@ static void pio_write64be_hi_lo(u64 val, unsigned long port)

void iowrite64_lo_hi(u64 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write64_lo_hi(val, port),
writeq(val, addr));
}

void iowrite64_hi_lo(u64 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write64_hi_lo(val, port),
writeq(val, addr));
}

void iowrite64be_lo_hi(u64 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write64be_lo_hi(val, port),
mmio_write64be(val, addr));
}

void iowrite64be_hi_lo(u64 val, void __iomem *addr)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(&val, sizeof(val));
IO_COND(addr, pio_write64be_hi_lo(val, port),
mmio_write64be(val, addr));
}
@@ -328,14 +356,20 @@ static inline void mmio_outsl(void __iomem *addr, const u32 *src, int count)
void ioread8_rep(const void __iomem *addr, void *dst, unsigned long count)
{
IO_COND(addr, insb(port,dst,count), mmio_insb(addr, dst, count));
+ /* KMSAN must treat values read from devices as initialized. */
+ kmsan_unpoison_memory(dst, count);
}
void ioread16_rep(const void __iomem *addr, void *dst, unsigned long count)
{
IO_COND(addr, insw(port,dst,count), mmio_insw(addr, dst, count));
+ /* KMSAN must treat values read from devices as initialized. */
+ kmsan_unpoison_memory(dst, count * 2);
}
void ioread32_rep(const void __iomem *addr, void *dst, unsigned long count)
{
IO_COND(addr, insl(port,dst,count), mmio_insl(addr, dst, count));
+ /* KMSAN must treat values read from devices as initialized. */
+ kmsan_unpoison_memory(dst, count * 4);
}
EXPORT_SYMBOL(ioread8_rep);
EXPORT_SYMBOL(ioread16_rep);
@@ -343,14 +377,20 @@ EXPORT_SYMBOL(ioread32_rep);

void iowrite8_rep(void __iomem *addr, const void *src, unsigned long count)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(src, count);
IO_COND(addr, outsb(port, src, count), mmio_outsb(addr, src, count));
}
void iowrite16_rep(void __iomem *addr, const void *src, unsigned long count)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(src, count * 2);
IO_COND(addr, outsw(port, src, count), mmio_outsw(addr, src, count));
}
void iowrite32_rep(void __iomem *addr, const void *src, unsigned long count)
{
+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(src, count * 4);
IO_COND(addr, outsl(port, src,count), mmio_outsl(addr, src, count));
}
EXPORT_SYMBOL(iowrite8_rep);
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 10:14:59

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 39/46] x86: kmsan: skip shadow checks in __switch_to()

When instrumenting functions, KMSAN obtains the per-task state (mostly
pointers to metadata for function arguments and return values) once per
function at its beginning, using the `current` pointer.

Every time the instrumented function calls another function, this state
(`struct kmsan_context_state`) is updated with shadow/origin data of the
passed and returned values.

When `current` changes in the low-level arch code, instrumented code
cannot notice that, and will still refer to the old state, possibly
corrupting it or using stale data. This may result in false positive
reports.
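
Conceptually, every instrumented function behaves like the following
hand-written sketch (an illustration of the instrumentation model, not
the actual compiler output):

	void instrumented_func(int arg)
	{
		/* Fetched once at function entry, derived from `current`. */
		struct kmsan_context_state *kcs = __msan_get_context_state();

		/*
		 * Around every call to another instrumented function, the
		 * shadow/origin values of the arguments and the return value
		 * are written to and read from *kcs. If `current` changes in
		 * the meantime (as it does inside __switch_to()), kcs still
		 * points to the previous task's state.
		 */
	}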

To deal with that, we need to apply __no_kmsan_checks to the functions
performing context switching - this will result in skipping all KMSAN
shadow checks and marking newly created values as initialized,
preventing all false positive reports in those functions. False negatives
are still possible, but we expect them to be rare and transient.

Suggested-by: Marco Elver <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>

---
v2:
-- This patch was previously called "kmsan: skip shadow checks in files
doing context switches". Per Mark Rutland's suggestion, we now only
skip checks in low-level arch-specific code, as context switches in
common code should be invisible to KMSAN. We also apply the checks
to precisely the functions performing the context switch instead of
the whole file.

Link: https://linux-review.googlesource.com/id/I45e3ed9c5f66ee79b0409d1673d66ae419029bcb

Replace KMSAN_ENABLE_CHECKS_process_64.o with __no_kmsan_checks
---
arch/x86/kernel/process_64.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index e459253649be2..9952a4c7e1d20 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -553,6 +553,7 @@ void compat_start_thread(struct pt_regs *regs, u32 new_ip, u32 new_sp, bool x32)
* Kprobes not supported here. Set the probe on schedule instead.
* Function graph tracer not supported too.
*/
+__no_kmsan_checks
__visible __notrace_funcgraph struct task_struct *
__switch_to(struct task_struct *prev_p, struct task_struct *next_p)
{
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 10:18:10

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 36/46] objtool: kmsan: list KMSAN API functions as uaccess-safe

KMSAN inserts API function calls in a lot of places (function entries
and exits, local variables, memory accesses), so they may get called
from the uaccess regions as well.

KMSAN API functions are used to update the metadata (shadow/origin pages)
for kernel memory accesses. The metadata pages for kernel pointers are
also located in the kernel memory, so touching them is not a problem.
For userspace pointers, no metadata is allocated.

If an API function is supposed to read or modify the metadata, it does so
for kernel pointers and ignores userspace pointers.
If an API function is supposed to return a pair of metadata pointers for
the instrumentation to use (like all __msan_metadata_ptr_for_TYPE_SIZE()
functions do), it returns the allocated metadata for kernel pointers and
special dummy buffers residing in the kernel memory for userspace
pointers.

As a result, none of the KMSAN API functions perform userspace accesses,
but since they might be called from UACCESS regions, they use
user_access_save()/user_access_restore().
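
For example, here is a simplified sketch of what the compiler emits
around a 4-byte store (for illustration only):

	struct shadow_origin_ptr sp = __msan_metadata_ptr_for_store_4(addr);
	/*
	 * For a kernel pointer, sp points into the real metadata pages;
	 * for a userspace pointer, it points at dummy pages residing in
	 * kernel memory, so no userspace access happens either way.
	 */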

Signed-off-by: Alexander Potapenko <[email protected]>
---
v3:
-- updated the patch description

Link: https://linux-review.googlesource.com/id/I242bc9816273fecad4ea3d977393784396bb3c35
---
tools/objtool/check.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index bd0c2c828940a..44825a96adc7c 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1008,6 +1008,25 @@ static const char *uaccess_safe_builtin[] = {
"__sanitizer_cov_trace_cmp4",
"__sanitizer_cov_trace_cmp8",
"__sanitizer_cov_trace_switch",
+ /* KMSAN */
+ "kmsan_copy_to_user",
+ "kmsan_report",
+ "kmsan_unpoison_memory",
+ "__msan_chain_origin",
+ "__msan_get_context_state",
+ "__msan_instrument_asm_store",
+ "__msan_metadata_ptr_for_load_1",
+ "__msan_metadata_ptr_for_load_2",
+ "__msan_metadata_ptr_for_load_4",
+ "__msan_metadata_ptr_for_load_8",
+ "__msan_metadata_ptr_for_load_n",
+ "__msan_metadata_ptr_for_store_1",
+ "__msan_metadata_ptr_for_store_2",
+ "__msan_metadata_ptr_for_store_4",
+ "__msan_metadata_ptr_for_store_8",
+ "__msan_metadata_ptr_for_store_n",
+ "__msan_poison_alloca",
+ "__msan_warning",
/* UBSAN */
"ubsan_type_mismatch_common",
"__ubsan_handle_type_mismatch",
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 10:39:36

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 37/46] x86: kmsan: make READ_ONCE_TASK_STACK() return initialized values

To avoid false positives, assume that reading from the task stack
always produces initialized values.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I9e2350bf3e88688dd83537e12a23456480141997
---
arch/x86/include/asm/unwind.h | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index 7cede4dc21f00..87acc90875b74 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -128,18 +128,19 @@ unsigned long unwind_recover_ret_addr(struct unwind_state *state,
}

/*
- * This disables KASAN checking when reading a value from another task's stack,
- * since the other task could be running on another CPU and could have poisoned
- * the stack in the meantime.
+ * This disables KASAN/KMSAN checking when reading a value from another task's
+ * stack, since the other task could be running on another CPU and could have
+ * poisoned the stack in the meantime. Frame pointers are uninitialized by
+ * default, so for KMSAN we mark the return value initialized unconditionally.
*/
-#define READ_ONCE_TASK_STACK(task, x) \
-({ \
- unsigned long val; \
- if (task == current) \
- val = READ_ONCE(x); \
- else \
- val = READ_ONCE_NOCHECK(x); \
- val; \
+#define READ_ONCE_TASK_STACK(task, x) \
+({ \
+ unsigned long val; \
+ if (task == current && !IS_ENABLED(CONFIG_KMSAN)) \
+ val = READ_ONCE(x); \
+ else \
+ val = READ_ONCE_NOCHECK(x); \
+ val; \
})

static inline bool task_on_another_cpu(struct task_struct *task)
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 10:40:59

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 24/46] kmsan: dma: unpoison DMA mappings

KMSAN doesn't know about DMA memory writes performed by devices.
We unpoison such memory when it's mapped to avoid false positive
reports.
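
A minimal sketch (hypothetical driver code, shown only to illustrate the
effect of the hook added to dma_map_page_attrs()):

	/* buf will be written by the device, not by the CPU. */
	buf = kmalloc(len, GFP_KERNEL);
	handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	/*
	 * kmsan_handle_dma() has unpoisoned buf at this point, so reading
	 * it after the device completes the transfer does not produce
	 * false positives. For DMA_TO_DEVICE the buffer is checked for
	 * being initialized instead.
	 */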

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- move implementation of kmsan_handle_dma() and kmsan_handle_dma_sg() here

Link: https://linux-review.googlesource.com/id/Ia162dc4c5a92e74d4686c1be32a4dfeffc5c32cd
---
include/linux/kmsan.h | 41 +++++++++++++++++++++++++++++
kernel/dma/mapping.c | 9 ++++---
mm/kmsan/hooks.c | 61 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 108 insertions(+), 3 deletions(-)

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index a5767c728a46b..d8667161a10c8 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -9,6 +9,7 @@
#ifndef _LINUX_KMSAN_H
#define _LINUX_KMSAN_H

+#include <linux/dma-direction.h>
#include <linux/gfp.h>
#include <linux/kmsan-checks.h>
#include <linux/stackdepot.h>
@@ -18,6 +19,7 @@
struct page;
struct kmem_cache;
struct task_struct;
+struct scatterlist;

#ifdef CONFIG_KMSAN

@@ -205,6 +207,35 @@ void kmsan_ioremap_page_range(unsigned long addr, unsigned long end,
*/
void kmsan_iounmap_page_range(unsigned long start, unsigned long end);

+/**
+ * kmsan_handle_dma() - Handle a DMA data transfer.
+ * @page: first page of the buffer.
+ * @offset: offset of the buffer within the first page.
+ * @size: buffer size.
+ * @dir: one of possible dma_data_direction values.
+ *
+ * Depending on @direction, KMSAN:
+ * * checks the buffer, if it is copied to device;
+ * * initializes the buffer, if it is copied from device;
+ * * does both, if this is a DMA_BIDIRECTIONAL transfer.
+ */
+void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+ enum dma_data_direction dir);
+
+/**
+ * kmsan_handle_dma_sg() - Handle a DMA transfer using scatterlist.
+ * @sg: scatterlist holding DMA buffers.
+ * @nents: number of scatterlist entries.
+ * @dir: one of possible dma_data_direction values.
+ *
+ * Depending on @direction, KMSAN:
+ * * checks the buffers in the scatterlist, if they are copied to device;
+ * * initializes the buffers, if they are copied from device;
+ * * does both, if this is a DMA_BIDIRECTIONAL transfer.
+ */
+void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
+ enum dma_data_direction dir);
+
#else

static inline void kmsan_init_shadow(void)
@@ -287,6 +318,16 @@ static inline void kmsan_iounmap_page_range(unsigned long start,
{
}

+static inline void kmsan_handle_dma(struct page *page, size_t offset,
+ size_t size, enum dma_data_direction dir)
+{
+}
+
+static inline void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
+ enum dma_data_direction dir)
+{
+}
+
#endif

#endif /* _LINUX_KMSAN_H */
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index db7244291b745..5d17d5d62166b 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -156,6 +156,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
+ kmsan_handle_dma(page, offset, size, dir);
debug_dma_map_page(dev, page, offset, size, dir, addr, attrs);

return addr;
@@ -194,11 +195,13 @@ static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
else
ents = ops->map_sg(dev, sg, nents, dir, attrs);

- if (ents > 0)
+ if (ents > 0) {
+ kmsan_handle_dma_sg(sg, nents, dir);
debug_dma_map_sg(dev, sg, nents, ents, dir, attrs);
- else if (WARN_ON_ONCE(ents != -EINVAL && ents != -ENOMEM &&
- ents != -EIO))
+ } else if (WARN_ON_ONCE(ents != -EINVAL && ents != -ENOMEM &&
+ ents != -EIO)) {
return -EIO;
+ }

return ents;
}
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 1cdb4420977f1..8a6947a2a2f22 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -10,9 +10,11 @@
*/

#include <linux/cacheflush.h>
+#include <linux/dma-direction.h>
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/mm_types.h>
+#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/uaccess.h>

@@ -250,6 +252,65 @@ void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy,
}
EXPORT_SYMBOL(kmsan_copy_to_user);

+static void kmsan_handle_dma_page(const void *addr, size_t size,
+ enum dma_data_direction dir)
+{
+ switch (dir) {
+ case DMA_BIDIRECTIONAL:
+ kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
+ REASON_ANY);
+ kmsan_internal_unpoison_memory((void *)addr, size,
+ /*checked*/ false);
+ break;
+ case DMA_TO_DEVICE:
+ kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
+ REASON_ANY);
+ break;
+ case DMA_FROM_DEVICE:
+ kmsan_internal_unpoison_memory((void *)addr, size,
+ /*checked*/ false);
+ break;
+ case DMA_NONE:
+ break;
+ }
+}
+
+/* Helper function to handle DMA data transfers. */
+void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+ enum dma_data_direction dir)
+{
+ u64 page_offset, to_go, addr;
+
+ if (PageHighMem(page))
+ return;
+ addr = (u64)page_address(page) + offset;
+ /*
+ * The kernel may occasionally give us adjacent DMA pages not belonging
+ * to the same allocation. Process them separately to avoid triggering
+ * internal KMSAN checks.
+ */
+ while (size > 0) {
+ page_offset = addr % PAGE_SIZE;
+ to_go = min(PAGE_SIZE - page_offset, (u64)size);
+ kmsan_handle_dma_page((void *)addr, to_go, dir);
+ addr += to_go;
+ size -= to_go;
+ }
+}
+EXPORT_SYMBOL(kmsan_handle_dma);
+
+void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
+ enum dma_data_direction dir)
+{
+ struct scatterlist *item;
+ int i;
+
+ for_each_sg(sg, item, nents, i)
+ kmsan_handle_dma(sg_page(item), item->offset, item->length,
+ dir);
+}
+EXPORT_SYMBOL(kmsan_handle_dma_sg);
+
/* Functions from kmsan-checks.h follow. */
void kmsan_poison_memory(const void *address, size_t size, gfp_t flags)
{
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 10:43:34

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 03/46] kasan: common: adapt to the new prototype of __stack_depot_save()

Pass extra_bits=0, as KASAN does not intend to store additional
information in the stack handle. No functional change.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I932d8f4f11a41b7483e0d57078744cc94697607a
---
mm/kasan/common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index d9079ec11f313..5d244746ac4fe 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -36,7 +36,7 @@ depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)
unsigned int nr_entries;

nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
- return __stack_depot_save(entries, nr_entries, flags, can_alloc);
+ return __stack_depot_save(entries, nr_entries, 0, flags, can_alloc);
}

void kasan_set_track(struct kasan_track *track, gfp_t flags)
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 10:44:29

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 15/46] MAINTAINERS: add entry for KMSAN

Add entry for KMSAN maintainers/reviewers.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ic5836c2bceb6b63f71a60d3327d18af3aa3dab77
---
MAINTAINERS | 12 ++++++++++++
1 file changed, 12 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5e8c2f6117661..dc73b124971f1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10937,6 +10937,18 @@ F: kernel/kmod.c
F: lib/test_kmod.c
F: tools/testing/selftests/kmod/

+KMSAN
+M: Alexander Potapenko <[email protected]>
+R: Marco Elver <[email protected]>
+R: Dmitry Vyukov <[email protected]>
+L: [email protected]
+S: Maintained
+F: Documentation/dev-tools/kmsan.rst
+F: include/linux/kmsan*.h
+F: lib/Kconfig.kmsan
+F: mm/kmsan/
+F: scripts/Makefile.kmsan
+
KPROBES
M: Naveen N. Rao <[email protected]>
M: Anil S Keshavamurthy <[email protected]>
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 10:45:46

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 33/46] kmsan: block: skip bio block merging logic for KMSAN

KMSAN doesn't allow treating physically adjacent memory pages as a single
contiguous buffer if they were allocated by different alloc_pages() calls.
The block layer, however, does exactly that: adjacent pages end up being
used together in the same segment. To prevent this, make
page_is_mergeable() return false under KMSAN.
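
For illustration only (a hypothetical snippet, not part of the patch), the
situation KMSAN objects to looks roughly like this:

	/* Two order-0 pages from separate allocations that happen to be adjacent. */
	struct page *a = alloc_pages(GFP_KERNEL, 0);
	struct page *b = alloc_pages(GFP_KERNEL, 0);

	if (page_address(b) == page_address(a) + PAGE_SIZE) {
		/*
		 * The block layer may merge these into one bio_vec and later
		 * access them as a single 2 * PAGE_SIZE buffer; KMSAN metadata
		 * for the two pages is not contiguous, so such an access can
		 * trip the tool's internal consistency checks.
		 */
	}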

Suggested-by: Eric Biggers <[email protected]>
Signed-off-by: Alexander Potapenko <[email protected]>

---

Link: https://linux-review.googlesource.com/id/Ie29cc2464c70032347c32ab2a22e1e7a0b37b905
---
block/bio.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index 4259125e16ab2..db56090c00bae 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -836,6 +836,8 @@ static inline bool page_is_mergeable(const struct bio_vec *bv,
return false;

*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
+ if (!*same_page && IS_ENABLED(CONFIG_KMSAN))
+ return false;
if (*same_page)
return true;
return (bv->bv_page + bv_end / PAGE_SIZE) == (page + off / PAGE_SIZE);
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 10:46:33

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 45/46] x86: kmsan: handle register passing from uninstrumented code

Replace instrumentation_begin() with instrumentation_begin_with_regs()
to let KMSAN know where execution enters the instrumented part of the
kernel and unpoison the pt_regs set up by the preceding non-instrumented
code. This is done to reduce the number of false positive reports.

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- this patch was previously called "x86: kmsan: handle register
passing from uninstrumented code". Instead of adding KMSAN-specific
code to every instrumentation_begin()/instrumentation_end() section,
we changed instrumentation_begin() to
instrumentation_begin_with_regs() where applicable.

Link: https://linux-review.googlesource.com/id/I435ec076cd21752c2f877f5da81f5eced62a2ea4
---
arch/x86/entry/common.c | 3 ++-
arch/x86/include/asm/idtentry.h | 10 +++++-----
arch/x86/kernel/cpu/mce/core.c | 2 +-
arch/x86/kernel/kvm.c | 2 +-
arch/x86/kernel/nmi.c | 2 +-
arch/x86/kernel/sev.c | 4 ++--
arch/x86/kernel/traps.c | 14 +++++++-------
arch/x86/mm/fault.c | 2 +-
8 files changed, 20 insertions(+), 19 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 6c2826417b337..047d157987859 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -14,6 +14,7 @@
#include <linux/mm.h>
#include <linux/smp.h>
#include <linux/errno.h>
+#include <linux/kmsan.h>
#include <linux/ptrace.h>
#include <linux/export.h>
#include <linux/nospec.h>
@@ -75,7 +76,7 @@ __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
add_random_kstack_offset();
nr = syscall_enter_from_user_mode(regs, nr);

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);

if (!do_syscall_x64(regs, nr) && !do_syscall_x32(regs, nr) && nr != -1) {
/* Invalid system call, but still a system call. */
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index 7924f27f5c8b1..172b9b6f90628 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -53,7 +53,7 @@ __visible noinstr void func(struct pt_regs *regs) \
{ \
irqentry_state_t state = irqentry_enter(regs); \
\
- instrumentation_begin(); \
+ instrumentation_begin_with_regs(regs); \
__##func (regs); \
instrumentation_end(); \
irqentry_exit(regs, state); \
@@ -100,7 +100,7 @@ __visible noinstr void func(struct pt_regs *regs, \
{ \
irqentry_state_t state = irqentry_enter(regs); \
\
- instrumentation_begin(); \
+ instrumentation_begin_with_regs(regs); \
__##func (regs, error_code); \
instrumentation_end(); \
irqentry_exit(regs, state); \
@@ -197,7 +197,7 @@ __visible noinstr void func(struct pt_regs *regs, \
irqentry_state_t state = irqentry_enter(regs); \
u32 vector = (u32)(u8)error_code; \
\
- instrumentation_begin(); \
+ instrumentation_begin_with_regs(regs); \
kvm_set_cpu_l1tf_flush_l1d(); \
run_irq_on_irqstack_cond(__##func, regs, vector); \
instrumentation_end(); \
@@ -237,7 +237,7 @@ __visible noinstr void func(struct pt_regs *regs) \
{ \
irqentry_state_t state = irqentry_enter(regs); \
\
- instrumentation_begin(); \
+ instrumentation_begin_with_regs(regs); \
kvm_set_cpu_l1tf_flush_l1d(); \
run_sysvec_on_irqstack_cond(__##func, regs); \
instrumentation_end(); \
@@ -264,7 +264,7 @@ __visible noinstr void func(struct pt_regs *regs) \
{ \
irqentry_state_t state = irqentry_enter(regs); \
\
- instrumentation_begin(); \
+ instrumentation_begin_with_regs(regs); \
__irq_enter_raw(); \
kvm_set_cpu_l1tf_flush_l1d(); \
__##func (regs); \
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 981496e6bc0e4..e5acff54f7d55 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1376,7 +1376,7 @@ static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callba
/* Handle unconfigured int18 (should never happen) */
static noinstr void unexpected_machine_check(struct pt_regs *regs)
{
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
pr_err("CPU#%d: Unexpected int18 (Machine Check)\n",
smp_processor_id());
instrumentation_end();
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 8b1c45c9cda87..3df82a51ab1b5 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -250,7 +250,7 @@ noinstr bool __kvm_handle_async_pf(struct pt_regs *regs, u32 token)
return false;

state = irqentry_enter(regs);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);

/*
* If the host managed to inject an async #PF into an interrupt
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index e73f7df362f5d..5078417e16ec1 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -328,7 +328,7 @@ static noinstr void default_do_nmi(struct pt_regs *regs)

__this_cpu_write(last_nmi_rip, regs->ip);

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);

handled = nmi_handle(NMI_LOCAL, regs);
__this_cpu_add(nmi_stats.normal, handled);
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index e6d316a01fdd4..9bfc29fc9c983 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -1330,7 +1330,7 @@ DEFINE_IDTENTRY_VC_KERNEL(exc_vmm_communication)

irq_state = irqentry_nmi_enter(regs);

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);

if (!vc_raw_handle_exception(regs, error_code)) {
/* Show some debug info */
@@ -1362,7 +1362,7 @@ DEFINE_IDTENTRY_VC_USER(exc_vmm_communication)
}

irqentry_enter_from_user_mode(regs);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);

if (!vc_raw_handle_exception(regs, error_code)) {
/*
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 1563fb9950059..9d3c9c4de94d3 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -305,7 +305,7 @@ static noinstr bool handle_bug(struct pt_regs *regs)
/*
* All lies, just get the WARN/BUG out.
*/
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
/*
* Since we're emulating a CALL with exceptions, restore the interrupt
* state to what it was at the exception site.
@@ -336,7 +336,7 @@ DEFINE_IDTENTRY_RAW(exc_invalid_op)
return;

state = irqentry_enter(regs);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
handle_invalid_op(regs);
instrumentation_end();
irqentry_exit(regs, state);
@@ -490,7 +490,7 @@ DEFINE_IDTENTRY_DF(exc_double_fault)
#endif

irqentry_nmi_enter(regs);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV);

tsk->thread.error_code = error_code;
@@ -820,14 +820,14 @@ DEFINE_IDTENTRY_RAW(exc_int3)
*/
if (user_mode(regs)) {
irqentry_enter_from_user_mode(regs);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
do_int3_user(regs);
instrumentation_end();
irqentry_exit_to_user_mode(regs);
} else {
irqentry_state_t irq_state = irqentry_nmi_enter(regs);

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
if (!do_int3(regs))
die("int3", regs, 0);
instrumentation_end();
@@ -1026,7 +1026,7 @@ static __always_inline void exc_debug_kernel(struct pt_regs *regs,
*/
unsigned long dr7 = local_db_save();
irqentry_state_t irq_state = irqentry_nmi_enter(regs);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);

/*
* If something gets miswired and we end up here for a user mode
@@ -1105,7 +1105,7 @@ static __always_inline void exc_debug_user(struct pt_regs *regs,
*/

irqentry_enter_from_user_mode(regs);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);

/*
* Start the virtual/ptrace DR6 value with just the DR_STEP mask
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f2250a32a10ca..676e394f1af5b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1557,7 +1557,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
*/
state = irqentry_enter(regs);

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
handle_page_fault(regs, error_code, address);
instrumentation_end();

--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 10:51:57

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 11/46] libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE

KMSAN adds extra metadata fields to struct page, so it does not fit into
64 bytes anymore.
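
A rough sketch of why (the field names below are illustrative assumptions,
not copied verbatim from this series): with CONFIG_KMSAN=y each struct page
carries pointers to its metadata pages, e.g.:

	struct page {
		/* ... existing fields ... */
	#ifdef CONFIG_KMSAN
		/*
		 * Assumed per-page metadata links; together with the existing
		 * fields they push sizeof(struct page) past 64 bytes on
		 * 64-bit kernels.
		 */
		struct page *kmsan_shadow;
		struct page *kmsan_origin;
	#endif
	};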

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I353796acc6a850bfd7bb342aa1b63e616fc614f1
---
drivers/nvdimm/nd.h | 2 +-
drivers/nvdimm/pfn_devs.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index ec5219680092d..85ca5b4da3cf3 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -652,7 +652,7 @@ void devm_namespace_disable(struct device *dev,
struct nd_namespace_common *ndns);
#if IS_ENABLED(CONFIG_ND_CLAIM)
/* max struct page size independent of kernel config */
-#define MAX_STRUCT_PAGE_SIZE 64
+#define MAX_STRUCT_PAGE_SIZE 128
int nvdimm_setup_pfn(struct nd_pfn *nd_pfn, struct dev_pagemap *pgmap);
#else
static inline int nvdimm_setup_pfn(struct nd_pfn *nd_pfn,
diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
index c31e184bfa45e..d51a3cd6581b1 100644
--- a/drivers/nvdimm/pfn_devs.c
+++ b/drivers/nvdimm/pfn_devs.c
@@ -784,7 +784,7 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
* when populating the vmemmap. This *should* be equal to
* PMD_SIZE for most architectures.
*
- * Also make sure size of struct page is less than 64. We
+ * Also make sure size of struct page is less than 128. We
* want to make sure we use large enough size here so that
* we don't have a dynamic reserve space depending on
* struct page size. But we also want to make sure we notice
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 11:01:34

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 18/46] kmsan: handle task creation and exiting

Tell KMSAN that a new task is created, so the tool creates a backing
metadata structure for that task. Similarly, notify KMSAN when a task
exits, so that error reporting is disabled for the dying task.

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- move implementation of kmsan_task_create() and kmsan_task_exit() here

Link: https://linux-review.googlesource.com/id/I0f41c3a1c7d66f7e14aabcfdfc7c69addb945805
---
include/linux/kmsan.h | 17 +++++++++++++++++
kernel/exit.c | 2 ++
kernel/fork.c | 2 ++
mm/kmsan/core.c | 10 ++++++++++
mm/kmsan/hooks.c | 19 +++++++++++++++++++
mm/kmsan/kmsan.h | 2 ++
6 files changed, 52 insertions(+)

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index ed3630068e2ef..dca42e0e91991 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -17,6 +17,7 @@

struct page;
struct kmem_cache;
+struct task_struct;

#ifdef CONFIG_KMSAN

@@ -43,6 +44,14 @@ struct kmsan_ctx {
bool allow_reporting;
};

+void kmsan_task_create(struct task_struct *task);
+
+/**
+ * kmsan_task_exit() - Notify KMSAN that a task has exited.
+ * @task: task about to finish.
+ */
+void kmsan_task_exit(struct task_struct *task);
+
/**
* kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
* @page: struct page pointer returned by alloc_pages().
@@ -164,6 +173,14 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);

#else

+static inline void kmsan_task_create(struct task_struct *task)
+{
+}
+
+static inline void kmsan_task_exit(struct task_struct *task)
+{
+}
+
static inline int kmsan_alloc_page(struct page *page, unsigned int order,
gfp_t flags)
{
diff --git a/kernel/exit.c b/kernel/exit.c
index f072959fcab7f..1784b7a741ddd 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -60,6 +60,7 @@
#include <linux/writeback.h>
#include <linux/shm.h>
#include <linux/kcov.h>
+#include <linux/kmsan.h>
#include <linux/random.h>
#include <linux/rcuwait.h>
#include <linux/compat.h>
@@ -741,6 +742,7 @@ void __noreturn do_exit(long code)
WARN_ON(tsk->plug);

kcov_task_exit(tsk);
+ kmsan_task_exit(tsk);

coredump_task_exit(tsk);
ptrace_event(PTRACE_EVENT_EXIT, code);
diff --git a/kernel/fork.c b/kernel/fork.c
index 9796897560ab1..a6178bd28c409 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -37,6 +37,7 @@
#include <linux/fdtable.h>
#include <linux/iocontext.h>
#include <linux/key.h>
+#include <linux/kmsan.h>
#include <linux/binfmts.h>
#include <linux/mman.h>
#include <linux/mmu_notifier.h>
@@ -1027,6 +1028,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
tsk->worker_private = NULL;

kcov_task_init(tsk);
+ kmsan_task_create(tsk);
kmap_local_fork(tsk);

#ifdef CONFIG_FAULT_INJECTION
diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
index 933d864d9d467..4b405abbb6c03 100644
--- a/mm/kmsan/core.c
+++ b/mm/kmsan/core.c
@@ -44,6 +44,16 @@ bool kmsan_enabled __read_mostly;
*/
DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);

+void kmsan_internal_task_create(struct task_struct *task)
+{
+ struct kmsan_ctx *ctx = &task->kmsan_ctx;
+
+ __memset(ctx, 0, sizeof(struct kmsan_ctx));
+ ctx->allow_reporting = true;
+ kmsan_internal_unpoison_memory(current_thread_info(),
+ sizeof(struct thread_info), false);
+}
+
void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
unsigned int poison_flags)
{
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 052e17b7a717d..43a529569053d 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -26,6 +26,25 @@
* skipping effects of functions like memset() inside instrumented code.
*/

+void kmsan_task_create(struct task_struct *task)
+{
+ kmsan_enter_runtime();
+ kmsan_internal_task_create(task);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_task_create);
+
+void kmsan_task_exit(struct task_struct *task)
+{
+ struct kmsan_ctx *ctx = &task->kmsan_ctx;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ ctx->allow_reporting = false;
+}
+EXPORT_SYMBOL(kmsan_task_exit);
+
void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags)
{
if (unlikely(object == NULL))
diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
index bfe38789950a6..a1b5900ffd97b 100644
--- a/mm/kmsan/kmsan.h
+++ b/mm/kmsan/kmsan.h
@@ -172,6 +172,8 @@ void kmsan_internal_set_shadow_origin(void *address, size_t size, int b,
u32 origin, bool checked);
depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id);

+void kmsan_internal_task_create(struct task_struct *task);
+
bool kmsan_metadata_is_contiguous(void *addr, size_t size);
void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
int reason);
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 11:03:26

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 09/46] kmsan: mark noinstr as __no_sanitize_memory

noinstr functions should never be instrumented, so make KMSAN skip them
by applying the __no_sanitize_memory attribute.
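
For illustration (the function below is hypothetical, not part of the
patch), every function marked noinstr now implicitly carries the attribute:

	/*
	 * After this change the noinstr macro expands to
	 * "noinline notrace ... __no_sanitize_memory", so the body below is
	 * compiled without any KMSAN instrumentation.
	 */
	noinstr void example_early_entry(void)
	{
		/* No KMSAN checks or shadow updates are emitted here. */
	}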

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- moved this patch earlier in the series per Mark Rutland's request

Link: https://linux-review.googlesource.com/id/I3c9abe860b97b49bc0c8026918b17a50448dec0d
---
include/linux/compiler_types.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 1c2c33ae1b37d..a9ba5edd8208b 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -227,7 +227,8 @@ struct ftrace_likely_data {
/* Section for code which can't be instrumented at all */
#define noinstr \
noinline notrace __attribute((__section__(".noinstr.text"))) \
- __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage
+ __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage \
+ __no_sanitize_memory

#endif /* __KERNEL__ */

--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 11:12:45

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 34/46] kmsan: kcov: unpoison area->list in kcov_remote_area_put()

KMSAN does not instrument kernel/kcov.c for performance reasons (with
CONFIG_KCOV=y virtually every place in the kernel invokes kcov
instrumentation). Therefore the tool may miss writes from kcov.c that
initialize memory.

When CONFIG_DEBUG_LIST is enabled, list pointers from kernel/kcov.c are
passed to instrumented helpers in lib/list_debug.c, resulting in false
positives.

To work around these reports, we unpoison the contents of area->list after
initializing it.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ie17f2ee47a7af58f5cdf716d585ebf0769348a5a
---
kernel/kcov.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/kernel/kcov.c b/kernel/kcov.c
index b3732b2105930..9e38209a7e0a9 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -11,6 +11,7 @@
#include <linux/fs.h>
#include <linux/hashtable.h>
#include <linux/init.h>
+#include <linux/kmsan-checks.h>
#include <linux/mm.h>
#include <linux/preempt.h>
#include <linux/printk.h>
@@ -152,6 +153,12 @@ static void kcov_remote_area_put(struct kcov_remote_area *area,
INIT_LIST_HEAD(&area->list);
area->size = size;
list_add(&area->list, &kcov_remote_areas);
+ /*
+ * KMSAN doesn't instrument this file, so it may not know area->list
+ * is initialized. Unpoison it explicitly to avoid reports in
+ * kcov_remote_area_get().
+ */
+ kmsan_unpoison_memory(&area->list, sizeof(struct list_head));
}

static notrace bool check_kcov_mode(enum kcov_mode needed_mode, struct task_struct *t)
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 11:15:25

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 21/46] kmsan: unpoison @tlb in arch_tlb_gather_mmu()

This is a hack to reduce stackdepot pressure.

struct mmu_gather contains 7 1-bit fields packed into a 32-bit unsigned
int value. The remaining 25 bits remain uninitialized and are never used,
but KMSAN updates the origin for them in zap_pXX_range() in mm/memory.c,
thus creating very long origin chains. This is technically correct, but
consumes too much memory.

Unpoisoning the whole structure will prevent creating such chains.
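
For reference, the relevant part of struct mmu_gather looks roughly like
this (simplified sketch; the authoritative definition lives in
include/asm-generic/tlb.h):

	struct mmu_gather {
		/* ... */
		/* Seven 1-bit flags packed into one unsigned int. */
		unsigned int fullmm : 1;
		unsigned int need_flush_all : 1;
		unsigned int freed_tables : 1;
		unsigned int cleared_ptes : 1;
		unsigned int cleared_pmds : 1;
		unsigned int cleared_puds : 1;
		unsigned int cleared_p4ds : 1;
		/* The remaining 25 bits are never read, but stay uninitialized. */
		/* ... */
	};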

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I76abee411b8323acfdbc29bc3a60dca8cff2de77
---
mm/mmu_gather.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index afb7185ffdc45..2f3821268b311 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -1,6 +1,7 @@
#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/kernel.h>
+#include <linux/kmsan-checks.h>
#include <linux/mmdebug.h>
#include <linux/mm_types.h>
#include <linux/mm_inline.h>
@@ -253,6 +254,15 @@ void tlb_flush_mmu(struct mmu_gather *tlb)
static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
bool fullmm)
{
+ /*
+ * struct mmu_gather contains 7 1-bit fields packed into a 32-bit
+ * unsigned int value. The remaining 25 bits remain uninitialized
+ * and are never used, but KMSAN updates the origin for them in
+ * zap_pXX_range() in mm/memory.c, thus creating very long origin
+ * chains. This is technically correct, but consumes too much memory.
+ * Unpoisoning the whole structure will prevent creating such chains.
+ */
+ kmsan_unpoison_memory(tlb, sizeof(*tlb));
tlb->mm = mm;
tlb->fullmm = fullmm;

--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 11:15:38

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

Replace instrumentation_begin() with instrumentation_begin_with_regs()
to let KMSAN know where execution enters the instrumented part of the
kernel and unpoison the pt_regs set up by the preceding non-instrumented
code.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I7f0a9809b66bd85faae43142971d0095771b7a42
---
kernel/entry/common.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 93c3b86e781c1..ce2324374882c 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -23,7 +23,7 @@ static __always_inline void __enter_from_user_mode(struct pt_regs *regs)
CT_WARN_ON(ct_state() != CONTEXT_USER);
user_exit_irqoff();

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
trace_hardirqs_off_finish();
instrumentation_end();
}
@@ -105,7 +105,7 @@ noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)

__enter_from_user_mode(regs);

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
local_irq_enable();
ret = __syscall_enter_from_user_work(regs, syscall);
instrumentation_end();
@@ -116,7 +116,7 @@ noinstr long syscall_enter_from_user_mode(struct pt_regs *regs, long syscall)
noinstr void syscall_enter_from_user_mode_prepare(struct pt_regs *regs)
{
__enter_from_user_mode(regs);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
local_irq_enable();
instrumentation_end();
}
@@ -290,7 +290,7 @@ void syscall_exit_to_user_mode_work(struct pt_regs *regs)

__visible noinstr void syscall_exit_to_user_mode(struct pt_regs *regs)
{
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
__syscall_exit_to_user_mode_work(regs);
instrumentation_end();
__exit_to_user_mode();
@@ -303,7 +303,7 @@ noinstr void irqentry_enter_from_user_mode(struct pt_regs *regs)

noinstr void irqentry_exit_to_user_mode(struct pt_regs *regs)
{
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
exit_to_user_mode_prepare(regs);
instrumentation_end();
__exit_to_user_mode();
@@ -351,7 +351,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
*/
lockdep_hardirqs_off(CALLER_ADDR0);
rcu_irq_enter();
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
trace_hardirqs_off_finish();
instrumentation_end();

@@ -366,7 +366,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
* in having another one here.
*/
lockdep_hardirqs_off(CALLER_ADDR0);
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
rcu_irq_enter_check_tick();
trace_hardirqs_off_finish();
instrumentation_end();
@@ -413,7 +413,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
* and RCU as the return to user mode path.
*/
if (state.exit_rcu) {
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
/* Tell the tracer that IRET will enable interrupts */
trace_hardirqs_on_prepare();
lockdep_hardirqs_on_prepare(CALLER_ADDR0);
@@ -423,7 +423,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
return;
}

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
if (IS_ENABLED(CONFIG_PREEMPTION))
irqentry_exit_cond_resched();

@@ -451,7 +451,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
lockdep_hardirq_enter();
rcu_nmi_enter();

- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
trace_hardirqs_off_finish();
ftrace_nmi_enter();
instrumentation_end();
@@ -461,7 +461,7 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)

void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state)
{
- instrumentation_begin();
+ instrumentation_begin_with_regs(regs);
ftrace_nmi_exit();
if (irq_state.lockdep) {
trace_hardirqs_on_prepare();
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 11:15:53

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 27/46] kmsan: instrumentation.h: add instrumentation_begin_with_regs()

When calling KMSAN-instrumented functions from non-instrumented
functions, function parameters may not be initialized properly, leading
to false positive reports. In particular, this happens all the time when
calling interrupt handlers from `noinstr` IDT entries.

We introduce instrumentation_begin_with_regs(), which calls
instrumentation_begin() and notifies KMSAN about the beginning of the
potentially instrumented region by calling
kmsan_instrumentation_begin(), which:
- wipes the current KMSAN state at the beginning of the region, ensuring
that the first call of an instrumented function receives initialized
parameters (this is a pretty good approximation of having all other
instrumented functions receive initialized parameters);
- unpoisons the `struct pt_regs` set up by the non-instrumented assembly
code.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I0f5e3372e00bd5fe25ddbf286f7260aae9011858
---
include/linux/instrumentation.h | 6 ++++++
include/linux/kmsan.h | 11 +++++++++++
mm/kmsan/hooks.c | 16 ++++++++++++++++
3 files changed, 33 insertions(+)

diff --git a/include/linux/instrumentation.h b/include/linux/instrumentation.h
index 24359b4a96053..3bbce9d556381 100644
--- a/include/linux/instrumentation.h
+++ b/include/linux/instrumentation.h
@@ -15,6 +15,11 @@
})
#define instrumentation_begin() __instrumentation_begin(__COUNTER__)

+#define instrumentation_begin_with_regs(regs) do { \
+ __instrumentation_begin(__COUNTER__); \
+ kmsan_instrumentation_begin(regs); \
+} while (0)
+
/*
* Because instrumentation_{begin,end}() can nest, objtool validation considers
* _begin() a +1 and _end() a -1 and computes a sum over the instructions.
@@ -55,6 +60,7 @@
#define instrumentation_end() __instrumentation_end(__COUNTER__)
#else
# define instrumentation_begin() do { } while(0)
+# define instrumentation_begin_with_regs(regs) kmsan_instrumentation_begin(regs)
# define instrumentation_end() do { } while(0)
#endif

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 55f976b721566..209a5a2192e22 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -247,6 +247,13 @@ void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
*/
void kmsan_handle_urb(const struct urb *urb, bool is_out);

+/**
+ * kmsan_instrumentation_begin() - handle instrumentation_begin().
+ * @regs: pointer to struct pt_regs that non-instrumented code passes to
+ * instrumented code.
+ */
+void kmsan_instrumentation_begin(struct pt_regs *regs);
+
#else

static inline void kmsan_init_shadow(void)
@@ -343,6 +350,10 @@ static inline void kmsan_handle_urb(const struct urb *urb, bool is_out)
{
}

+static inline void kmsan_instrumentation_begin(struct pt_regs *regs)
+{
+}
+
#endif

#endif /* _LINUX_KMSAN_H */
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 9aecbf2825837..c20d105c143c1 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -366,3 +366,19 @@ void kmsan_check_memory(const void *addr, size_t size)
REASON_ANY);
}
EXPORT_SYMBOL(kmsan_check_memory);
+
+void kmsan_instrumentation_begin(struct pt_regs *regs)
+{
+ struct kmsan_context_state *state = &kmsan_get_context()->cstate;
+
+ if (state)
+ __memset(state, 0, sizeof(struct kmsan_context_state));
+ if (!kmsan_enabled || !regs)
+ return;
+ /*
+ * @regs may reside in cpu_entry_area, for which KMSAN does not allocate
+ * metadata. Do not force an error in that case.
+ */
+ kmsan_internal_unpoison_memory(regs, sizeof(*regs), /*checked*/ false);
+}
+EXPORT_SYMBOL(kmsan_instrumentation_begin);
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 11:18:54

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 16/46] kmsan: mm: maintain KMSAN metadata for page operations

Insert KMSAN hooks that make the necessary bookkeeping changes (a short
behavioral sketch follows the list):
- poison page shadow and origins in alloc_pages()/free_page();
- clear page shadow and origins in clear_page(), copy_user_highpage();
- copy page metadata in copy_highpage(), wp_page_copy();
- handle vmap()/vunmap()/iounmap();
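
Taken together, these hooks give roughly the following behavior
(illustrative snippet, not part of the patch):

	struct page *p = alloc_pages(GFP_KERNEL, 0);
	/* Shadow of p is now all-ones: using its contents reports an error. */

	struct page *z = alloc_pages(GFP_KERNEL | __GFP_ZERO, 0);
	/* Shadow of z is zeroed: the page counts as fully initialized. */

	__free_pages(p, 0);
	/*
	 * The freed page is poisoned again, so any later read of its former
	 * contents is reported as a use of uninitialized memory.
	 */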

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- move page metadata hooks implementation here
-- remove call to kmsan_memblock_free_pages()

v3:
-- use PAGE_SHIFT in kmsan_ioremap_page_range()

Link: https://linux-review.googlesource.com/id/I6d4f53a0e7eab46fa29f0348f3095d9f2e326850
---
arch/x86/include/asm/page_64.h | 13 ++++
arch/x86/mm/ioremap.c | 3 +
include/linux/highmem.h | 3 +
include/linux/kmsan.h | 123 +++++++++++++++++++++++++++++++++
mm/internal.h | 6 ++
mm/kmsan/hooks.c | 87 +++++++++++++++++++++++
mm/kmsan/shadow.c | 114 ++++++++++++++++++++++++++++++
mm/memory.c | 2 +
mm/page_alloc.c | 11 +++
mm/vmalloc.c | 20 +++++-
10 files changed, 380 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h
index e9c86299b8351..36e270a8ea9a4 100644
--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -45,14 +45,27 @@ void clear_page_orig(void *page);
void clear_page_rep(void *page);
void clear_page_erms(void *page);

+/* This is an assembly header, avoid including too much of kmsan.h */
+#ifdef CONFIG_KMSAN
+void kmsan_unpoison_memory(const void *addr, size_t size);
+#endif
+__no_sanitize_memory
static inline void clear_page(void *page)
{
+#ifdef CONFIG_KMSAN
+ /* alternative_call_2() changes @page. */
+ void *page_copy = page;
+#endif
alternative_call_2(clear_page_orig,
clear_page_rep, X86_FEATURE_REP_GOOD,
clear_page_erms, X86_FEATURE_ERMS,
"=D" (page),
"0" (page)
: "cc", "memory", "rax", "rcx");
+#ifdef CONFIG_KMSAN
+ /* Clear KMSAN shadow for the pages that have it. */
+ kmsan_unpoison_memory(page_copy, PAGE_SIZE);
+#endif
}

void copy_page(void *to, void *from);
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 17a492c273069..0da8608778221 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -17,6 +17,7 @@
#include <linux/cc_platform.h>
#include <linux/efi.h>
#include <linux/pgtable.h>
+#include <linux/kmsan.h>

#include <asm/set_memory.h>
#include <asm/e820/api.h>
@@ -474,6 +475,8 @@ void iounmap(volatile void __iomem *addr)
return;
}

+ kmsan_iounmap_page_range((unsigned long)addr,
+ (unsigned long)addr + get_vm_area_size(p));
memtype_free(p->phys_addr, p->phys_addr + get_vm_area_size(p));

/* Finally remove it */
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 39bb9b47fa9cd..3e1898a44d7e3 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -6,6 +6,7 @@
#include <linux/kernel.h>
#include <linux/bug.h>
#include <linux/cacheflush.h>
+#include <linux/kmsan.h>
#include <linux/mm.h>
#include <linux/uaccess.h>
#include <linux/hardirq.h>
@@ -277,6 +278,7 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
vfrom = kmap_local_page(from);
vto = kmap_local_page(to);
copy_user_page(vto, vfrom, vaddr, to);
+ kmsan_unpoison_memory(page_address(to), PAGE_SIZE);
kunmap_local(vto);
kunmap_local(vfrom);
}
@@ -292,6 +294,7 @@ static inline void copy_highpage(struct page *to, struct page *from)
vfrom = kmap_local_page(from);
vto = kmap_local_page(to);
copy_page(vto, vfrom);
+ kmsan_copy_page_meta(to, from);
kunmap_local(vto);
kunmap_local(vfrom);
}
diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 4e35f43eceaa9..da41850b46cbd 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -42,6 +42,129 @@ struct kmsan_ctx {
bool allow_reporting;
};

+/**
+ * kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call.
+ * @page: struct page pointer returned by alloc_pages().
+ * @order: order of allocated struct page.
+ * @flags: GFP flags used by alloc_pages()
+ *
+ * KMSAN marks 1<<@order pages starting at @page as uninitialized, unless
+ * @flags contain __GFP_ZERO.
+ */
+void kmsan_alloc_page(struct page *page, unsigned int order, gfp_t flags);
+
+/**
+ * kmsan_free_page() - Notify KMSAN about a free_pages() call.
+ * @page: struct page pointer passed to free_pages().
+ * @order: order of deallocated struct page.
+ *
+ * KMSAN marks freed memory as uninitialized.
+ */
+void kmsan_free_page(struct page *page, unsigned int order);
+
+/**
+ * kmsan_copy_page_meta() - Copy KMSAN metadata between two pages.
+ * @dst: destination page.
+ * @src: source page.
+ *
+ * KMSAN copies the contents of metadata pages for @src into the metadata pages
+ * for @dst. If @dst has no associated metadata pages, nothing happens.
+ * If @src has no associated metadata pages, @dst metadata pages are unpoisoned.
+ */
+void kmsan_copy_page_meta(struct page *dst, struct page *src);
+
+/**
+ * kmsan_map_kernel_range_noflush() - Notify KMSAN about a vmap.
+ * @start: start of vmapped range.
+ * @end: end of vmapped range.
+ * @prot: page protection flags used for vmap.
+ * @pages: array of pages.
+ * @page_shift: page_shift passed to vmap_range_noflush().
+ *
+ * KMSAN maps shadow and origin pages of @pages into contiguous ranges in
+ * vmalloc metadata address range.
+ */
+void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
+ pgprot_t prot, struct page **pages,
+ unsigned int page_shift);
+
+/**
+ * kmsan_vunmap_kernel_range_noflush() - Notify KMSAN about a vunmap.
+ * @start: start of vunmapped range.
+ * @end: end of vunmapped range.
+ *
+ * KMSAN unmaps the contiguous metadata ranges created by
+ * kmsan_map_kernel_range_noflush().
+ */
+void kmsan_vunmap_range_noflush(unsigned long start, unsigned long end);
+
+/**
+ * kmsan_ioremap_page_range() - Notify KMSAN about a ioremap_page_range() call.
+ * @addr: range start.
+ * @end: range end.
+ * @phys_addr: physical range start.
+ * @prot: page protection flags used for ioremap_page_range().
+ * @page_shift: page_shift argument passed to vmap_range_noflush().
+ *
+ * KMSAN creates new metadata pages for the physical pages mapped into the
+ * virtual memory.
+ */
+void kmsan_ioremap_page_range(unsigned long addr, unsigned long end,
+ phys_addr_t phys_addr, pgprot_t prot,
+ unsigned int page_shift);
+
+/**
+ * kmsan_iounmap_page_range() - Notify KMSAN about a iounmap_page_range() call.
+ * @start: range start.
+ * @end: range end.
+ *
+ * KMSAN unmaps the metadata pages for the given range and, unlike for
+ * vunmap_page_range(), also deallocates them.
+ */
+void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
+
+#else
+
+static inline int kmsan_alloc_page(struct page *page, unsigned int order,
+ gfp_t flags)
+{
+ return 0;
+}
+
+static inline void kmsan_free_page(struct page *page, unsigned int order)
+{
+}
+
+static inline void kmsan_copy_page_meta(struct page *dst, struct page *src)
+{
+}
+
+static inline void kmsan_vmap_pages_range_noflush(unsigned long start,
+ unsigned long end,
+ pgprot_t prot,
+ struct page **pages,
+ unsigned int page_shift)
+{
+}
+
+static inline void kmsan_vunmap_range_noflush(unsigned long start,
+ unsigned long end)
+{
+}
+
+static inline void kmsan_ioremap_page_range(unsigned long start,
+ unsigned long end,
+ phys_addr_t phys_addr,
+ pgprot_t prot,
+ unsigned int page_shift)
+{
+}
+
+static inline void kmsan_iounmap_page_range(unsigned long start,
+ unsigned long end)
+{
+}
+
#endif

#endif /* _LINUX_KMSAN_H */
diff --git a/mm/internal.h b/mm/internal.h
index cf16280ce1321..3cf6fde8f02c4 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -744,8 +744,14 @@ int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
}
#endif

+int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
+ pgprot_t prot, struct page **pages,
+ unsigned int page_shift);
+
void vunmap_range_noflush(unsigned long start, unsigned long end);

+void __vunmap_range_noflush(unsigned long start, unsigned long end);
+
int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
unsigned long addr, int page_nid, int *flags);

diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 4ac62fa67a02a..070756be70e3a 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -26,6 +26,93 @@
* skipping effects of functions like memset() inside instrumented code.
*/

+static unsigned long vmalloc_shadow(unsigned long addr)
+{
+ return (unsigned long)kmsan_get_metadata((void *)addr,
+ KMSAN_META_SHADOW);
+}
+
+static unsigned long vmalloc_origin(unsigned long addr)
+{
+ return (unsigned long)kmsan_get_metadata((void *)addr,
+ KMSAN_META_ORIGIN);
+}
+
+void kmsan_vunmap_range_noflush(unsigned long start, unsigned long end)
+{
+ __vunmap_range_noflush(vmalloc_shadow(start), vmalloc_shadow(end));
+ __vunmap_range_noflush(vmalloc_origin(start), vmalloc_origin(end));
+ flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
+ flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
+}
+EXPORT_SYMBOL(kmsan_vunmap_range_noflush);
+
+/*
+ * This function creates new shadow/origin pages for the physical pages mapped
+ * into the virtual memory. If those physical pages already had shadow/origin,
+ * those are ignored.
+ */
+void kmsan_ioremap_page_range(unsigned long start, unsigned long end,
+ phys_addr_t phys_addr, pgprot_t prot,
+ unsigned int page_shift)
+{
+ gfp_t gfp_mask = GFP_KERNEL | __GFP_ZERO;
+ struct page *shadow, *origin;
+ unsigned long off = 0;
+ int i, nr;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ nr = (end - start) / PAGE_SIZE;
+ kmsan_enter_runtime();
+ for (i = 0; i < nr; i++, off += PAGE_SIZE) {
+ shadow = alloc_pages(gfp_mask, 1);
+ origin = alloc_pages(gfp_mask, 1);
+ __vmap_pages_range_noflush(
+ vmalloc_shadow(start + off),
+ vmalloc_shadow(start + off + PAGE_SIZE), prot, &shadow,
+ PAGE_SHIFT);
+ __vmap_pages_range_noflush(
+ vmalloc_origin(start + off),
+ vmalloc_origin(start + off + PAGE_SIZE), prot, &origin,
+ PAGE_SHIFT);
+ }
+ flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
+ flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_ioremap_page_range);
+
+void kmsan_iounmap_page_range(unsigned long start, unsigned long end)
+{
+ unsigned long v_shadow, v_origin;
+ struct page *shadow, *origin;
+ int i, nr;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ nr = (end - start) / PAGE_SIZE;
+ kmsan_enter_runtime();
+ v_shadow = (unsigned long)vmalloc_shadow(start);
+ v_origin = (unsigned long)vmalloc_origin(start);
+ for (i = 0; i < nr; i++, v_shadow += PAGE_SIZE, v_origin += PAGE_SIZE) {
+ shadow = kmsan_vmalloc_to_page_or_null((void *)v_shadow);
+ origin = kmsan_vmalloc_to_page_or_null((void *)v_origin);
+ __vunmap_range_noflush(v_shadow, vmalloc_shadow(end));
+ __vunmap_range_noflush(v_origin, vmalloc_origin(end));
+ if (shadow)
+ __free_pages(shadow, 1);
+ if (origin)
+ __free_pages(origin, 1);
+ }
+ flush_cache_vmap(vmalloc_shadow(start), vmalloc_shadow(end));
+ flush_cache_vmap(vmalloc_origin(start), vmalloc_origin(end));
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_iounmap_page_range);
+
/* Functions from kmsan-checks.h follow. */
void kmsan_poison_memory(const void *address, size_t size, gfp_t flags)
{
diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
index de58cfbc55b9d..8fe6a5ed05e67 100644
--- a/mm/kmsan/shadow.c
+++ b/mm/kmsan/shadow.c
@@ -184,3 +184,117 @@ void *kmsan_get_metadata(void *address, bool is_origin)
ret = (is_origin ? origin_ptr_for(page) : shadow_ptr_for(page)) + off;
return ret;
}
+
+void kmsan_copy_page_meta(struct page *dst, struct page *src)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ if (!dst || !page_has_metadata(dst))
+ return;
+ if (!src || !page_has_metadata(src)) {
+ kmsan_internal_unpoison_memory(page_address(dst), PAGE_SIZE,
+ /*checked*/ false);
+ return;
+ }
+
+ kmsan_enter_runtime();
+ __memcpy(shadow_ptr_for(dst), shadow_ptr_for(src), PAGE_SIZE);
+ __memcpy(origin_ptr_for(dst), origin_ptr_for(src), PAGE_SIZE);
+ kmsan_leave_runtime();
+}
+
+void kmsan_alloc_page(struct page *page, unsigned int order, gfp_t flags)
+{
+ bool initialized = (flags & __GFP_ZERO) || !kmsan_enabled;
+ struct page *shadow, *origin;
+ depot_stack_handle_t handle;
+ int pages = 1 << order;
+ int i;
+
+ if (!page)
+ return;
+
+ shadow = shadow_page_for(page);
+ origin = origin_page_for(page);
+
+ if (initialized) {
+ __memset(page_address(shadow), 0, PAGE_SIZE * pages);
+ __memset(page_address(origin), 0, PAGE_SIZE * pages);
+ return;
+ }
+
+ /* Zero pages allocated by the runtime should also be initialized. */
+ if (kmsan_in_runtime())
+ return;
+
+ __memset(page_address(shadow), -1, PAGE_SIZE * pages);
+ kmsan_enter_runtime();
+ handle = kmsan_save_stack_with_flags(flags, /*extra_bits*/ 0);
+ kmsan_leave_runtime();
+ /*
+ * Addresses are page-aligned, pages are contiguous, so it's ok
+ * to just fill the origin pages with |handle|.
+ */
+ for (i = 0; i < PAGE_SIZE * pages / sizeof(handle); i++)
+ ((depot_stack_handle_t *)page_address(origin))[i] = handle;
+}
+
+void kmsan_free_page(struct page *page, unsigned int order)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ kmsan_internal_poison_memory(page_address(page),
+ PAGE_SIZE << compound_order(page),
+ GFP_KERNEL,
+ KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
+ kmsan_leave_runtime();
+}
+
+void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end,
+ pgprot_t prot, struct page **pages,
+ unsigned int page_shift)
+{
+ unsigned long shadow_start, origin_start, shadow_end, origin_end;
+ struct page **s_pages, **o_pages;
+ int nr, i, mapped;
+
+ if (!kmsan_enabled)
+ return;
+
+ shadow_start = vmalloc_meta((void *)start, KMSAN_META_SHADOW);
+ shadow_end = vmalloc_meta((void *)end, KMSAN_META_SHADOW);
+ if (!shadow_start)
+ return;
+
+ nr = (end - start) / PAGE_SIZE;
+ s_pages = kcalloc(nr, sizeof(struct page *), GFP_KERNEL);
+ o_pages = kcalloc(nr, sizeof(struct page *), GFP_KERNEL);
+ if (!s_pages || !o_pages)
+ goto ret;
+ for (i = 0; i < nr; i++) {
+ s_pages[i] = shadow_page_for(pages[i]);
+ o_pages[i] = origin_page_for(pages[i]);
+ }
+ prot = __pgprot(pgprot_val(prot) | _PAGE_NX);
+ prot = PAGE_KERNEL;
+
+ origin_start = vmalloc_meta((void *)start, KMSAN_META_ORIGIN);
+ origin_end = vmalloc_meta((void *)end, KMSAN_META_ORIGIN);
+ kmsan_enter_runtime();
+ mapped = __vmap_pages_range_noflush(shadow_start, shadow_end, prot,
+ s_pages, page_shift);
+ KMSAN_WARN_ON(mapped);
+ mapped = __vmap_pages_range_noflush(origin_start, origin_end, prot,
+ o_pages, page_shift);
+ KMSAN_WARN_ON(mapped);
+ kmsan_leave_runtime();
+ flush_tlb_kernel_range(shadow_start, shadow_end);
+ flush_tlb_kernel_range(origin_start, origin_end);
+ flush_cache_vmap(shadow_start, shadow_end);
+ flush_cache_vmap(origin_start, origin_end);
+
+ret:
+ kfree(s_pages);
+ kfree(o_pages);
+}
diff --git a/mm/memory.c b/mm/memory.c
index 76e3af9639d93..04aa68acffeb3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -52,6 +52,7 @@
#include <linux/highmem.h>
#include <linux/pagemap.h>
#include <linux/memremap.h>
+#include <linux/kmsan.h>
#include <linux/ksm.h>
#include <linux/rmap.h>
#include <linux/export.h>
@@ -3032,6 +3033,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
put_page(old_page);
return 0;
}
+ kmsan_copy_page_meta(new_page, old_page);
}

if (mem_cgroup_charge(page_folio(new_page), mm, GFP_KERNEL))
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e42038382c12..98393e01e4259 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -27,6 +27,7 @@
#include <linux/compiler.h>
#include <linux/kernel.h>
#include <linux/kasan.h>
+#include <linux/kmsan.h>
#include <linux/module.h>
#include <linux/suspend.h>
#include <linux/pagevec.h>
@@ -1305,6 +1306,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
VM_BUG_ON_PAGE(PageTail(page), page);

trace_mm_page_free(page, order);
+ kmsan_free_page(page, order);

if (unlikely(PageHWPoison(page)) && !order) {
/*
@@ -3696,6 +3698,14 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
/*
* Allocate a page from the given zone. Use pcplists for order-0 allocations.
*/
+
+/*
+ * Do not instrument rmqueue() with KMSAN. This function may call
+ * __msan_poison_alloca() through a call to set_pfnblock_flags_mask().
+ * If __msan_poison_alloca() attempts to allocate pages for the stack depot, it
+ * may call rmqueue() again, which will result in a deadlock.
+ */
+__no_sanitize_memory
static inline
struct page *rmqueue(struct zone *preferred_zone,
struct zone *zone, unsigned int order,
@@ -5428,6 +5438,7 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
}

trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
+ kmsan_alloc_page(page, order, alloc_gfp);

return page;
}
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index cadfbb5155ea5..76a54007e5142 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -320,6 +320,9 @@ int ioremap_page_range(unsigned long addr, unsigned long end,
err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
ioremap_max_page_shift);
flush_cache_vmap(addr, end);
+ if (!err)
+ kmsan_ioremap_page_range(addr, end, phys_addr, prot,
+ ioremap_max_page_shift);
return err;
}

@@ -419,7 +422,7 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
*
* This is an internal function only. Do not use outside mm/.
*/
-void vunmap_range_noflush(unsigned long start, unsigned long end)
+void __vunmap_range_noflush(unsigned long start, unsigned long end)
{
unsigned long next;
pgd_t *pgd;
@@ -441,6 +444,12 @@ void vunmap_range_noflush(unsigned long start, unsigned long end)
arch_sync_kernel_mappings(start, end);
}

+void vunmap_range_noflush(unsigned long start, unsigned long end)
+{
+ kmsan_vunmap_range_noflush(start, end);
+ __vunmap_range_noflush(start, end);
+}
+
/**
* vunmap_range - unmap kernel virtual addresses
* @addr: start of the VM area to unmap
@@ -575,7 +584,7 @@ static int vmap_small_pages_range_noflush(unsigned long addr, unsigned long end,
*
* This is an internal function only. Do not use outside mm/.
*/
-int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
+int __vmap_pages_range_noflush(unsigned long addr, unsigned long end,
pgprot_t prot, struct page **pages, unsigned int page_shift)
{
unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
@@ -601,6 +610,13 @@ int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
return 0;
}

+int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
+ pgprot_t prot, struct page **pages, unsigned int page_shift)
+{
+ kmsan_vmap_pages_range_noflush(addr, end, prot, pages, page_shift);
+ return __vmap_pages_range_noflush(addr, end, prot, pages, page_shift);
+}
+
/**
* vmap_pages_range - map pages to a kernel virtual address
* @addr: start of the VM area to map
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 11:19:45

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 40/46] x86: kmsan: handle open-coded assembly in lib/iomem.c

KMSAN cannot intercept memory accesses within asm() statements.
That's why we add kmsan_unpoison_memory() and kmsan_check_memory() calls
to tell it how to handle memory copied from/to I/O memory.
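
The general pattern (shown here as an illustrative sketch rather than the
exact patch hunks) is to pair the copy, which is opaque to KMSAN, with an
explicit hint; rep_movs() is the file's local assembly-based copy helper:

	static void copy_from_io(void *to, const volatile void __iomem *from,
				 size_t n)
	{
		rep_movs(to, (const void *)from, n);	/* invisible to KMSAN */
		/* Values read from devices must be treated as initialized. */
		kmsan_unpoison_memory(to, n);
	}

	static void copy_to_io(volatile void __iomem *to, const void *from,
			       size_t n)
	{
		/* Report if uninitialized kernel memory would reach a device. */
		kmsan_check_memory(from, n);
		rep_movs((void *)to, from, n);
	}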

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Icb16bf17269087e475debf07a7fe7d4bebc3df23
---
arch/x86/lib/iomem.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/lib/iomem.c b/arch/x86/lib/iomem.c
index 3e2f33fc33de2..e0411a3774d49 100644
--- a/arch/x86/lib/iomem.c
+++ b/arch/x86/lib/iomem.c
@@ -1,6 +1,7 @@
#include <linux/string.h>
#include <linux/module.h>
#include <linux/io.h>
+#include <linux/kmsan-checks.h>

#define movs(type,to,from) \
asm volatile("movs" type:"=&D" (to), "=&S" (from):"0" (to), "1" (from):"memory")
@@ -37,6 +38,8 @@ static void string_memcpy_fromio(void *to, const volatile void __iomem *from, si
n-=2;
}
rep_movs(to, (const void *)from, n);
+ /* KMSAN must treat values read from devices as initialized. */
+ kmsan_unpoison_memory(to, n);
}

static void string_memcpy_toio(volatile void __iomem *to, const void *from, size_t n)
@@ -44,6 +47,8 @@ static void string_memcpy_toio(volatile void __iomem *to, const void *from, size
if (unlikely(!n))
return;

+ /* Make sure uninitialized memory isn't copied to devices. */
+ kmsan_check_memory(from, n);
/* Align any unaligned destination IO */
if (unlikely(1 & (unsigned long)to)) {
movs("b", to, from);
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 11:22:20

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 29/46] kmsan: add tests for KMSAN

The testing module triggers KMSAN warnings in different cases and checks
that the errors are properly reported, using console probes to capture
the tool's output.
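
To try the suite, a kernel config along these lines should be enough
(a sketch based on the Kconfig entry added below; CONFIG_KMSAN_KUNIT_TEST
additionally depends on TRACEPOINTS):

	CONFIG_KMSAN=y
	CONFIG_KUNIT=y
	CONFIG_KMSAN_KUNIT_TEST=y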

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- add memcpy tests

Link: https://linux-review.googlesource.com/id/I49c3f59014cc37fd13541c80beb0b75a75244650
---
lib/Kconfig.kmsan | 16 ++
mm/kmsan/Makefile | 4 +
mm/kmsan/kmsan_test.c | 536 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 556 insertions(+)
create mode 100644 mm/kmsan/kmsan_test.c

diff --git a/lib/Kconfig.kmsan b/lib/Kconfig.kmsan
index 199f79d031f94..a68fdb5ed5d92 100644
--- a/lib/Kconfig.kmsan
+++ b/lib/Kconfig.kmsan
@@ -21,3 +21,19 @@ config KMSAN
the whole system down.

See <file:Documentation/dev-tools/kmsan.rst> for more details.
+
+if KMSAN
+
+config KMSAN_KUNIT_TEST
+ tristate "KMSAN integration test suite" if !KUNIT_ALL_TESTS
+ default KUNIT_ALL_TESTS
+ depends on TRACEPOINTS && KUNIT
+ help
+ Test suite for KMSAN, testing various error detection scenarios,
+ and checking that reports are correctly output to console.
+
+ Say Y here if you want the test to be built into the kernel and run
+ during boot; say M if you want the test to build as a module; say N
+ if you are unsure.
+
+endif
diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
index f57a956cb1c8b..7be6a7e92394f 100644
--- a/mm/kmsan/Makefile
+++ b/mm/kmsan/Makefile
@@ -20,3 +20,7 @@ CFLAGS_init.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
+
+obj-$(CONFIG_KMSAN_KUNIT_TEST) += kmsan_test.o
+KMSAN_SANITIZE_kmsan_test.o := y
+CFLAGS_kmsan_test.o += $(call cc-disable-warning, uninitialized)
diff --git a/mm/kmsan/kmsan_test.c b/mm/kmsan/kmsan_test.c
new file mode 100644
index 0000000000000..44bb2e0f87d81
--- /dev/null
+++ b/mm/kmsan/kmsan_test.c
@@ -0,0 +1,536 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test cases for KMSAN.
+ * For each test case checks the presence (or absence) of generated reports.
+ * Relies on 'console' tracepoint to capture reports as they appear in the
+ * kernel log.
+ *
+ * Copyright (C) 2021-2022, Google LLC.
+ * Author: Alexander Potapenko <[email protected]>
+ *
+ */
+
+#include <kunit/test.h>
+#include "kmsan.h"
+
+#include <linux/jiffies.h>
+#include <linux/kernel.h>
+#include <linux/kmsan.h>
+#include <linux/mm.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/string.h>
+#include <linux/tracepoint.h>
+#include <trace/events/printk.h>
+
+static DEFINE_PER_CPU(int, per_cpu_var);
+
+/* Report as observed from console. */
+static struct {
+ spinlock_t lock;
+ bool available;
+ bool ignore; /* Stop console output collection. */
+ char header[256];
+} observed = {
+ .lock = __SPIN_LOCK_UNLOCKED(observed.lock),
+};
+
+/* Probe for console output: obtains observed lines of interest. */
+static void probe_console(void *ignore, const char *buf, size_t len)
+{
+ unsigned long flags;
+
+ if (observed.ignore)
+ return;
+ spin_lock_irqsave(&observed.lock, flags);
+
+ if (strnstr(buf, "BUG: KMSAN: ", len)) {
+ /*
+ * KMSAN report and related to the test.
+ *
+ * The provided @buf is not NUL-terminated; copy no more than
+ * @len bytes and let strscpy() add the missing NUL-terminator.
+ */
+ strscpy(observed.header, buf,
+ min(len + 1, sizeof(observed.header)));
+ WRITE_ONCE(observed.available, true);
+ observed.ignore = true;
+ }
+ spin_unlock_irqrestore(&observed.lock, flags);
+}
+
+/* Check if a report related to the test exists. */
+static bool report_available(void)
+{
+ return READ_ONCE(observed.available);
+}
+
+/* Information we expect in a report. */
+struct expect_report {
+ const char *error_type; /* Error type. */
+ /*
+ * Kernel symbol from the error header, or NULL if no report is
+ * expected.
+ */
+ const char *symbol;
+};
+
+/* Check observed report matches information in @r. */
+static bool report_matches(const struct expect_report *r)
+{
+ typeof(observed.header) expected_header;
+ unsigned long flags;
+ bool ret = false;
+ const char *end;
+ char *cur;
+
+ /* Doubled-checked locking. */
+ if (!report_available() || !r->symbol)
+ return (!report_available() && !r->symbol);
+
+ /* Generate expected report contents. */
+
+ /* Title */
+ cur = expected_header;
+ end = &expected_header[sizeof(expected_header) - 1];
+
+ cur += scnprintf(cur, end - cur, "BUG: KMSAN: %s", r->error_type);
+
+ scnprintf(cur, end - cur, " in %s", r->symbol);
+ /* The exact offset won't match, remove it; also strip module name. */
+ cur = strchr(expected_header, '+');
+ if (cur)
+ *cur = '\0';
+
+ spin_lock_irqsave(&observed.lock, flags);
+ if (!report_available())
+ goto out; /* A new report is being captured. */
+
+ /* Finally match expected output to what we actually observed. */
+ ret = strstr(observed.header, expected_header);
+out:
+ spin_unlock_irqrestore(&observed.lock, flags);
+
+ return ret;
+}
+
+/* ===== Test cases ===== */
+
+/* Prevent replacing branch with select in LLVM. */
+static noinline void check_true(char *arg)
+{
+ pr_info("%s is true\n", arg);
+}
+
+static noinline void check_false(char *arg)
+{
+ pr_info("%s is false\n", arg);
+}
+
+#define USE(x) \
+ do { \
+ if (x) \
+ check_true(#x); \
+ else \
+ check_false(#x); \
+ } while (0)
+
+#define EXPECTATION_ETYPE_FN(e, reason, fn) \
+ struct expect_report e = { \
+ .error_type = reason, \
+ .symbol = fn, \
+ }
+
+#define EXPECTATION_NO_REPORT(e) EXPECTATION_ETYPE_FN(e, NULL, NULL)
+#define EXPECTATION_UNINIT_VALUE_FN(e, fn) \
+ EXPECTATION_ETYPE_FN(e, "uninit-value", fn)
+#define EXPECTATION_UNINIT_VALUE(e) EXPECTATION_UNINIT_VALUE_FN(e, __func__)
+#define EXPECTATION_USE_AFTER_FREE(e) \
+ EXPECTATION_ETYPE_FN(e, "use-after-free", __func__)
+
+/* Test case: ensure that kmalloc() returns uninitialized memory. */
+static void test_uninit_kmalloc(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE(expect);
+ int *ptr;
+
+ kunit_info(test, "uninitialized kmalloc test (UMR report)\n");
+ ptr = kmalloc(sizeof(int), GFP_KERNEL);
+ USE(*ptr);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that kmalloc'ed memory becomes initialized after memset().
+ */
+static void test_init_kmalloc(struct kunit *test)
+{
+ EXPECTATION_NO_REPORT(expect);
+ int *ptr;
+
+ kunit_info(test, "initialized kmalloc test (no reports)\n");
+ ptr = kmalloc(sizeof(int), GFP_KERNEL);
+ memset(ptr, 0, sizeof(int));
+ USE(*ptr);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test case: ensure that kzalloc() returns initialized memory. */
+static void test_init_kzalloc(struct kunit *test)
+{
+ EXPECTATION_NO_REPORT(expect);
+ int *ptr;
+
+ kunit_info(test, "initialized kzalloc test (no reports)\n");
+ ptr = kzalloc(sizeof(int), GFP_KERNEL);
+ USE(*ptr);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test case: ensure that local variables are uninitialized by default. */
+static void test_uninit_stack_var(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE(expect);
+ volatile int cond;
+
+ kunit_info(test, "uninitialized stack variable (UMR report)\n");
+ USE(cond);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test case: ensure that local variables with initializers are initialized. */
+static void test_init_stack_var(struct kunit *test)
+{
+ EXPECTATION_NO_REPORT(expect);
+ volatile int cond = 1;
+
+ kunit_info(test, "initialized stack variable (no reports)\n");
+ USE(cond);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
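+/*
+ * Helpers for test_params(): pass values through a chain of noinline calls so
+ * that KMSAN has to track parameter metadata across function boundaries.
+ */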
+static noinline void two_param_fn_2(int arg1, int arg2)
+{
+ USE(arg1);
+ USE(arg2);
+}
+
+static noinline void one_param_fn(int arg)
+{
+ two_param_fn_2(arg, arg);
+ USE(arg);
+}
+
+static noinline void two_param_fn(int arg1, int arg2)
+{
+ int init = 0;
+
+ one_param_fn(init);
+ USE(arg1);
+ USE(arg2);
+}
+
+static void test_params(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE_FN(expect, "two_param_fn");
+ volatile int uninit, init = 1;
+
+ kunit_info(test,
+ "uninit passed through a function parameter (UMR report)\n");
+ two_param_fn(uninit, init);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
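+/* Helper for test_uninit_multiple_params(). */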
+static int signed_sum3(int a, int b, int c)
+{
+ return a + b + c;
+}
+
+/*
+ * Test case: ensure that uninitialized values are tracked through function
+ * arguments.
+ */
+static void test_uninit_multiple_params(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE(expect);
+ volatile char b = 3, c;
+ volatile int a;
+
+ kunit_info(test, "uninitialized local passed to fn (UMR report)\n");
+ USE(signed_sum3(a, b, c));
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Helper function to make an array uninitialized. */
+static noinline void do_uninit_local_array(char *array, int start, int stop)
+{
+ volatile char uninit;
+ int i;
+
+ for (i = start; i < stop; i++)
+ array[i] = uninit;
+}
+
+/*
+ * Test case: ensure kmsan_check_memory() reports an error when checking
+ * uninitialized memory.
+ */
+static void test_uninit_kmsan_check_memory(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE_FN(expect, "test_uninit_kmsan_check_memory");
+ volatile char local_array[8];
+
+ kunit_info(
+ test,
+ "kmsan_check_memory() called on uninit local (UMR report)\n");
+ do_uninit_local_array((char *)local_array, 5, 7);
+
+ kmsan_check_memory((char *)local_array, 8);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: check that a virtual memory range created with vmap() from
+ * initialized pages is still considered as initialized.
+ */
+static void test_init_kmsan_vmap_vunmap(struct kunit *test)
+{
+ EXPECTATION_NO_REPORT(expect);
+ const int npages = 2;
+ struct page **pages;
+ void *vbuf;
+ int i;
+
+ kunit_info(test, "pages initialized via vmap (no reports)\n");
+
+ pages = kmalloc_array(npages, sizeof(*pages), GFP_KERNEL);
+ for (i = 0; i < npages; i++)
+ pages[i] = alloc_page(GFP_KERNEL);
+ vbuf = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
+ memset(vbuf, 0xfe, npages * PAGE_SIZE);
+ for (i = 0; i < npages; i++)
+ kmsan_check_memory(page_address(pages[i]), PAGE_SIZE);
+
+ if (vbuf)
+ vunmap(vbuf);
+ for (i = 0; i < npages; i++)
+ if (pages[i])
+ __free_page(pages[i]);
+ kfree(pages);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that memset() can initialize a buffer allocated via
+ * vmalloc().
+ */
+static void test_init_vmalloc(struct kunit *test)
+{
+ EXPECTATION_NO_REPORT(expect);
+ int npages = 8, i;
+ char *buf;
+
+ kunit_info(test, "vmalloc buffer can be initialized (no reports)\n");
+ buf = vmalloc(PAGE_SIZE * npages);
+ buf[0] = 1;
+ memset(buf, 0xfe, PAGE_SIZE * npages);
+ USE(buf[0]);
+ for (i = 0; i < npages; i++)
+ kmsan_check_memory(&buf[PAGE_SIZE * i], PAGE_SIZE);
+ vfree(buf);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/* Test case: ensure that use-after-free reporting works. */
+static void test_uaf(struct kunit *test)
+{
+ EXPECTATION_USE_AFTER_FREE(expect);
+ volatile int value;
+ volatile int *var;
+
+ kunit_info(test, "use-after-free in kmalloc-ed buffer (UMR report)\n");
+ var = kmalloc(80, GFP_KERNEL);
+ var[3] = 0xfeedface;
+ kfree((int *)var);
+ /* Copy the invalid value before checking it. */
+ value = var[3];
+ USE(value);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that uninitialized values are propagated through per-CPU
+ * memory.
+ */
+static void test_percpu_propagate(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE(expect);
+ volatile int uninit, check;
+
+ kunit_info(test,
+ "uninit local stored to per_cpu memory (UMR report)\n");
+
+ this_cpu_write(per_cpu_var, uninit);
+ check = this_cpu_read(per_cpu_var);
+ USE(check);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that passing uninitialized values to printk() leads to an
+ * error report.
+ */
+static void test_printk(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE_FN(expect, "number");
+ volatile int uninit;
+
+ kunit_info(test, "uninit local passed to pr_info() (UMR report)\n");
+ pr_info("%px contains %d\n", &uninit, uninit);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that memcpy() correctly copies uninitialized values between
+ * aligned `src` and `dst`.
+ */
+static void test_memcpy_aligned_to_aligned(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE_FN(expect, "test_memcpy_aligned_to_aligned");
+ volatile int uninit_src;
+ volatile int dst = 0;
+
+ kunit_info(test, "memcpy()ing aligned uninit src to aligned dst (UMR report)\n");
+ memcpy((void *)&dst, (void *)&uninit_src, sizeof(uninit_src));
+ kmsan_check_memory((void *)&dst, sizeof(dst));
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that memcpy() correctly copies uninitialized values between
+ * aligned `src` and unaligned `dst`.
+ *
+ * Copying aligned 4-byte value to an unaligned one leads to touching two
+ * aligned 4-byte values. This test case checks that KMSAN correctly reports an
+ * error on the first of the two values.
+ */
+static void test_memcpy_aligned_to_unaligned(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE_FN(expect, "test_memcpy_aligned_to_unaligned");
+ volatile int uninit_src;
+ volatile char dst[8] = {0};
+
+ kunit_info(test, "memcpy()ing aligned uninit src to unaligned dst (UMR report)\n");
+ memcpy((void *)&dst[1], (void *)&uninit_src, sizeof(uninit_src));
+ kmsan_check_memory((void *)dst, 4);
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+/*
+ * Test case: ensure that memcpy() correctly copies uninitialized values between
+ * aligned `src` and unaligned `dst`.
+ *
+ * Copying aligned 4-byte value to an unaligned one leads to touching two
+ * aligned 4-byte values. This test case checks that KMSAN correctly reports an
+ * error on the second of the two values.
+ */
+static void test_memcpy_aligned_to_unaligned2(struct kunit *test)
+{
+ EXPECTATION_UNINIT_VALUE_FN(expect, "test_memcpy_aligned_to_unaligned2");
+ volatile int uninit_src;
+ volatile char dst[8] = {0};
+
+ kunit_info(test, "memcpy()ing aligned uninit src to unaligned dst - part 2 (UMR report)\n");
+ memcpy((void *)&dst[1], (void *)&uninit_src, sizeof(uninit_src));
+ kmsan_check_memory((void *)&dst[4], sizeof(uninit_src));
+ KUNIT_EXPECT_TRUE(test, report_matches(&expect));
+}
+
+static struct kunit_case kmsan_test_cases[] = {
+ KUNIT_CASE(test_uninit_kmalloc),
+ KUNIT_CASE(test_init_kmalloc),
+ KUNIT_CASE(test_init_kzalloc),
+ KUNIT_CASE(test_uninit_stack_var),
+ KUNIT_CASE(test_init_stack_var),
+ KUNIT_CASE(test_params),
+ KUNIT_CASE(test_uninit_multiple_params),
+ KUNIT_CASE(test_uninit_kmsan_check_memory),
+ KUNIT_CASE(test_init_kmsan_vmap_vunmap),
+ KUNIT_CASE(test_init_vmalloc),
+ KUNIT_CASE(test_uaf),
+ KUNIT_CASE(test_percpu_propagate),
+ KUNIT_CASE(test_printk),
+ KUNIT_CASE(test_memcpy_aligned_to_aligned),
+ KUNIT_CASE(test_memcpy_aligned_to_unaligned),
+ KUNIT_CASE(test_memcpy_aligned_to_unaligned2),
+ {},
+};
+
+/* ===== End test cases ===== */
+
+static int test_init(struct kunit *test)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&observed.lock, flags);
+ observed.header[0] = '\0';
+ observed.ignore = false;
+ observed.available = false;
+ spin_unlock_irqrestore(&observed.lock, flags);
+
+ return 0;
+}
+
+static void test_exit(struct kunit *test)
+{
+}
+
+static struct kunit_suite kmsan_test_suite = {
+ .name = "kmsan",
+ .test_cases = kmsan_test_cases,
+ .init = test_init,
+ .exit = test_exit,
+};
+static struct kunit_suite *kmsan_test_suites[] = { &kmsan_test_suite, NULL };
+
+static void register_tracepoints(struct tracepoint *tp, void *ignore)
+{
+ check_trace_callback_type_console(probe_console);
+ if (!strcmp(tp->name, "console"))
+ WARN_ON(tracepoint_probe_register(tp, probe_console, NULL));
+}
+
+static void unregister_tracepoints(struct tracepoint *tp, void *ignore)
+{
+ if (!strcmp(tp->name, "console"))
+ tracepoint_probe_unregister(tp, probe_console, NULL);
+}
+
+/*
+ * We only want to do tracepoint setup and teardown once; therefore we have to
+ * customize the init and exit functions and cannot rely on kunit_test_suite().
+ */
+static int __init kmsan_test_init(void)
+{
+ /*
+ * Because we want to be able to build the test as a module, we need to
+ * iterate through all known tracepoints, since the static registration
+ * won't work here.
+ */
+ for_each_kernel_tracepoint(register_tracepoints, NULL);
+ return __kunit_test_suites_init(kmsan_test_suites);
+}
+
+static void kmsan_test_exit(void)
+{
+ __kunit_test_suites_exit(kmsan_test_suites);
+ for_each_kernel_tracepoint(unregister_tracepoints, NULL);
+ tracepoint_synchronize_unregister();
+}
+
+late_initcall_sync(kmsan_test_init);
+module_exit(kmsan_test_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Alexander Potapenko <[email protected]>");
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 11:24:07

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 05/46] x86: asm: instrument usercopy in get_user() and __put_user_size()

Use hooks from instrumented.h to notify bug detection tools about
usercopy events in get_user() and __put_user_size().

It's still unclear how to instrument put_user(), which assumes that
instrumentation code doesn't clobber RAX.

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/Ia9f12bfe5832623250e20f1859fdf5cc485a2fce
---
arch/x86/include/asm/uaccess.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index f78e2b3501a19..0373d52a0543e 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -5,6 +5,7 @@
* User space memory access functions
*/
#include <linux/compiler.h>
+#include <linux/instrumented.h>
#include <linux/kasan-checks.h>
#include <linux/string.h>
#include <asm/asm.h>
@@ -99,11 +100,13 @@ extern int __get_user_bad(void);
int __ret_gu; \
register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \
__chk_user_ptr(ptr); \
+ instrument_copy_from_user_before((void *)&(x), ptr, sizeof(*(ptr))); \
asm volatile("call __" #fn "_%P4" \
: "=a" (__ret_gu), "=r" (__val_gu), \
ASM_CALL_CONSTRAINT \
: "0" (ptr), "i" (sizeof(*(ptr)))); \
(x) = (__force __typeof__(*(ptr))) __val_gu; \
+ instrument_copy_from_user_after((void *)&(x), ptr, sizeof(*(ptr)), 0); \
__builtin_expect(__ret_gu, 0); \
})

@@ -248,7 +251,9 @@ extern void __put_user_nocheck_8(void);

#define __put_user_size(x, ptr, size, label) \
do { \
+ __typeof__(*(ptr)) __pus_val = x; \
__chk_user_ptr(ptr); \
+ instrument_copy_to_user(ptr, &(__pus_val), size); \
switch (size) { \
case 1: \
__put_user_goto(x, ptr, "b", "iq", label); \
@@ -286,6 +291,7 @@ do { \
#define __get_user_size(x, ptr, size, label) \
do { \
__chk_user_ptr(ptr); \
+ instrument_copy_from_user_before((void *)&(x), ptr, size); \
switch (size) { \
case 1: { \
unsigned char x_u8__; \
@@ -305,6 +311,7 @@ do { \
default: \
(x) = __get_user_bad(); \
} \
+ instrument_copy_from_user_after((void *)&(x), ptr, size, 0); \
} while (0)

#define __get_user_asm(x, addr, itype, ltype, label) \
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 11:27:34

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 25/46] kmsan: virtio: check/unpoison scatterlist in vring_map_one_sg()

If vring doesn't use the DMA API, KMSAN is unable to tell whether the
memory is initialized by hardware. Explicitly call kmsan_handle_dma()
from vring_map_one_sg() in this case to prevent false positives.

Signed-off-by: Alexander Potapenko <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>

---
Link: https://linux-review.googlesource.com/id/I211533ecb86a66624e151551f83ddd749536b3af
---
drivers/virtio/virtio_ring.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cfb028ca238eb..faecd9e3d6560 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -11,6 +11,7 @@
#include <linux/module.h>
#include <linux/hrtimer.h>
#include <linux/dma-mapping.h>
+#include <linux/kmsan-checks.h>
#include <linux/spinlock.h>
#include <xen/xen.h>

@@ -331,8 +332,15 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
struct scatterlist *sg,
enum dma_data_direction direction)
{
- if (!vq->use_dma_api)
+ if (!vq->use_dma_api) {
+ /*
+ * If DMA is not used, KMSAN doesn't know that the scatterlist
+ * is initialized by the hardware. Explicitly check/unpoison it
+ * depending on the direction.
+ */
+ kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
return (dma_addr_t)sg_phys(sg);
+ }

/*
* We can't use dma_map_sg, because we don't use scatterlists in
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 11:29:54

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 07/46] kmsan: add ReST documentation

Add Documentation/dev-tools/kmsan.rst and reference it in the dev-tools
index.

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- added a note that KMSAN is not intended for production use

Link: https://linux-review.googlesource.com/id/I751586f79418b95550a83c6035c650b5b01567cc
---
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/kmsan.rst | 414 ++++++++++++++++++++++++++++++
2 files changed, 415 insertions(+)
create mode 100644 Documentation/dev-tools/kmsan.rst

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 4621eac290f46..6b0663075dc04 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -24,6 +24,7 @@ Documentation/dev-tools/testing-overview.rst
kcov
gcov
kasan
+ kmsan
ubsan
kmemleak
kcsan
diff --git a/Documentation/dev-tools/kmsan.rst b/Documentation/dev-tools/kmsan.rst
new file mode 100644
index 0000000000000..e116889da79d5
--- /dev/null
+++ b/Documentation/dev-tools/kmsan.rst
@@ -0,0 +1,414 @@
+=============================
+KernelMemorySanitizer (KMSAN)
+=============================
+
+KMSAN is a dynamic error detector aimed at finding uses of uninitialized
+values. It is based on compiler instrumentation, and is quite similar to the
+userspace `MemorySanitizer tool`_.
+
+An important note is that KMSAN is not intended for production use, because it
+drastically increases kernel memory footprint and slows the whole system down.
+
+Example report
+==============
+
+Here is an example of a KMSAN report::
+
+ =====================================================
+ BUG: KMSAN: uninit-value in test_uninit_kmsan_check_memory+0x1be/0x380 [kmsan_test]
+ test_uninit_kmsan_check_memory+0x1be/0x380 mm/kmsan/kmsan_test.c:273
+ kunit_run_case_internal lib/kunit/test.c:333
+ kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
+ kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
+ kthread+0x721/0x850 kernel/kthread.c:327
+ ret_from_fork+0x1f/0x30 ??:?
+
+ Uninit was stored to memory at:
+ do_uninit_local_array+0xfa/0x110 mm/kmsan/kmsan_test.c:260
+ test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
+ kunit_run_case_internal lib/kunit/test.c:333
+ kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
+ kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
+ kthread+0x721/0x850 kernel/kthread.c:327
+ ret_from_fork+0x1f/0x30 ??:?
+
+ Local variable uninit created at:
+ do_uninit_local_array+0x4a/0x110 mm/kmsan/kmsan_test.c:256
+ test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
+
+ Bytes 4-7 of 8 are uninitialized
+ Memory access of size 8 starts at ffff888083fe3da0
+
+ CPU: 0 PID: 6731 Comm: kunit_try_catch Tainted: G B E 5.16.0-rc3+ #104
+ Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
+ =====================================================
+
+
+The report says that the local variable ``uninit`` was created uninitialized in
+``do_uninit_local_array()``. The lower stack trace corresponds to the place
+where this variable was created.
+
+The upper stack shows where the uninit value was used - in
+``test_uninit_kmsan_check_memory()``. The tool shows the bytes which were left
+uninitialized in the local variable, as well as the stack where the value was
+copied to another memory location before use.
+
+Please note that KMSAN only reports an error when an uninitialized value is
+actually used (e.g. in a condition or pointer dereference). A lot of
+uninitialized values in the kernel are never used, and reporting them would
+result in too many false positives.
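+
+For instance, in the following purely illustrative snippet only the use of
+``b`` in a condition would be reported, while the never-used ``a`` stays
+silent::
+
+ int a, b; /* both uninitialized: nothing is reported yet */
+
+ if (b) /* uninit value used in a condition: KMSAN reports here */
+ pr_info("b is nonzero\n");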
+
+KMSAN and Clang
+===============
+
+In order for KMSAN to work the kernel must be built with Clang, which so far is
+the only compiler that has KMSAN support. The kernel instrumentation pass is
+based on the userspace `MemorySanitizer tool`_.
+
+How to build
+============
+
+In order to build a kernel with KMSAN you will need a fresh Clang (14.0.0+).
+Please refer to `LLVM documentation`_ for the instructions on how to build Clang.
+
+Now configure and build the kernel with CONFIG_KMSAN enabled.
+
+How KMSAN works
+===============
+
+KMSAN shadow memory
+-------------------
+
+KMSAN associates a metadata byte (also called shadow byte) with every byte of
+kernel memory. A bit in the shadow byte is set iff the corresponding bit of the
+kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
+setting its shadow bytes to ``0xff``) is called poisoning, marking it
+initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.
+
+When a new variable is allocated on the stack, it is poisoned by default by
+instrumentation code inserted by the compiler (unless it is a stack variable
+that is immediately initialized). Any new heap allocation done without
+``__GFP_ZERO`` is also poisoned.
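+
+For example (an illustrative sketch, not code from the kernel tree)::
+
+ int *p = kmalloc(sizeof(int), GFP_KERNEL); /* poisoned: shadow is 0xff */
+ int *q = kzalloc(sizeof(int), GFP_KERNEL); /* unpoisoned: shadow is 0x00 */
+ int r; /* stack variable: poisoned */
+ int s = 42; /* immediately initialized: unpoisoned */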
+
+Compiler instrumentation also tracks the shadow values with help from the
+runtime library in ``mm/kmsan/``.
+
+The shadow value of a basic or compound type is an array of bytes of the same
+length. When a constant value is written into memory, that memory is unpoisoned.
+When a value is read from memory, its shadow memory is also obtained and
+propagated into all the operations which use that value. For every instruction
+that takes one or more values the compiler generates code that calculates the
+shadow of the result depending on those values and their shadows.
+
+Example::
+
+ int a = 0xff; // i.e. 0x000000ff
+ int b;
+ int c = a | b;
+
+In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
+shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
+``c`` are uninitialized, while the lower byte is initialized.
+
+
+Origin tracking
+---------------
+
+Every four bytes of kernel memory also have a so-called origin assigned to
+them. This origin describes the point in program execution at which the
+uninitialized value was created. Every origin is associated with either the
+full allocation stack (for heap-allocated memory), or the function containing
+the uninitialized variable (for locals).
+
+When an uninitialized variable is allocated on stack or heap, a new origin
+value is created, and that variable's origin is filled with that value.
+When a value is read from memory, its origin is also read and kept together
+with the shadow. For every instruction that takes one or more values the origin
+of the result is one of the origins corresponding to any of the uninitialized
+inputs. If a poisoned value is written into memory, its origin is written to the
+corresponding storage as well.
+
+Example 1::
+
+ int a = 42;
+ int b;
+ int c = a + b;
+
+In this case the origin of ``b`` is generated upon function entry, and is
+stored to the origin of ``c`` right before the addition result is written into
+memory.
+
+Several variables may share the same origin address, if they are stored in the
+same four-byte chunk. In this case every write to either variable updates the
+origin for all of them. We have to sacrifice precision in this case, because
+storing origins for individual bits (and even bytes) would be too costly.
+
+Example 2::
+
+ int combine(short a, short b) {
+ union ret_t {
+ int i;
+ short s[2];
+ } ret;
+ ret.s[0] = a;
+ ret.s[1] = b;
+ return ret.i;
+ }
+
+If ``a`` is initialized and ``b`` is not, the shadow of the result would be
+``0xffff0000``, and the origin of the result would be the origin of ``b``.
+``ret.s[0]`` would have the same origin, but it will never be used, because
+that variable is initialized.
+
+If both function arguments are uninitialized, only the origin of the second
+argument is preserved.
+
+Origin chaining
+~~~~~~~~~~~~~~~
+
+To ease debugging, KMSAN creates a new origin for every store of an
+uninitialized value to memory. The new origin references both its creation stack
+and the previous origin the value had. This may cause increased memory
+consumption, so we limit the length of origin chains in the runtime.
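+
+For example, in an illustrative snippet like the following (where ``p`` is just
+some pointer to kernel memory)::
+
+ int a; /* origin A: creation of the uninitialized local */
+ int b = a; /* storing the uninit value creates origin B, chained to A */
+ *p = b; /* yet another store creates origin C, chained to B */
+
+every store of the still-uninitialized value records a new origin linked to the
+previous one, so the report can show the whole path the value has travelled.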
+
+Clang instrumentation API
+-------------------------
+
+The Clang instrumentation pass inserts calls to functions defined in
+``mm/kmsan/instrumentation.c`` into the kernel code.
+
+Shadow manipulation
+~~~~~~~~~~~~~~~~~~~
+
+For every memory access the compiler emits a call to a function that returns a
+pair of pointers to the shadow and origin addresses of the given memory::
+
+ typedef struct {
+ void *shadow, *origin;
+ } shadow_origin_ptr_t
+
+ shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr)
+ shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr)
+ shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, uintptr_t size)
+ shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, uintptr_t size)
+
+The function name depends on the memory access size.
+
+The compiler makes sure that for every loaded value its shadow and origin
+values are read from memory. When a value is stored to memory, its shadow and
+origin are also stored using the metadata pointers.
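+
+Conceptually, for a 4-byte store ``*p = v;`` the instrumented code does roughly
+the following (``shadow_of_v`` and ``origin_of_v`` are placeholders for the
+metadata the compiler tracks for ``v``)::
+
+ shadow_origin_ptr_t sop = __msan_metadata_ptr_for_store_4(p);
+
+ *(u32 *)sop.shadow = shadow_of_v; /* store the shadow of v */
+ *(u32 *)sop.origin = origin_of_v; /* store the origin of v */
+ *p = v; /* the original store */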
+
+Origin tracking
+~~~~~~~~~~~~~~~
+
+A special function is used to create a new origin value for a local variable and
+set the origin of that variable to that value::
+
+ void __msan_poison_alloca(void *addr, uintptr_t size, char *descr)
+
+Access to per-task data
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+At the beginning of every instrumented function KMSAN inserts a call to
+``__msan_get_context_state()``::
+
+ kmsan_context_state *__msan_get_context_state(void)
+
+``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::
+
+ struct kmsan_context_state {
+ char param_tls[KMSAN_PARAM_SIZE];
+ char retval_tls[KMSAN_RETVAL_SIZE];
+ char va_arg_tls[KMSAN_PARAM_SIZE];
+ char va_arg_origin_tls[KMSAN_PARAM_SIZE];
+ u64 va_arg_overflow_size_tls;
+ char param_origin_tls[KMSAN_PARAM_SIZE];
+ depot_stack_handle_t retval_origin_tls;
+ };
+
+This structure is used by KMSAN to pass parameter shadows and origins between
+instrumented functions.
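+
+Conceptually, before a call ``foo(x)`` the instrumented caller does something
+like the following (``shadow_of_x`` and ``origin_of_x`` are placeholders for
+the metadata of ``x``)::
+
+ kmsan_context_state *cs = __msan_get_context_state();
+
+ memcpy(cs->param_tls, &shadow_of_x, sizeof(shadow_of_x));
+ memcpy(cs->param_origin_tls, &origin_of_x, sizeof(origin_of_x));
+ foo(x); /* foo() fetches its parameter metadata from the same state */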
+
+String functions
+~~~~~~~~~~~~~~~~
+
+The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
+following functions. These functions are also called when data structures are
+initialized or copied, making sure shadow and origin values are copied
+alongside the data::
+
+ void *__msan_memcpy(void *dst, void *src, uintptr_t n)
+ void *__msan_memmove(void *dst, void *src, uintptr_t n)
+ void *__msan_memset(void *dst, int c, uintptr_t n)
+
+Error reporting
+~~~~~~~~~~~~~~~
+
+For each pointer dereference and each condition the compiler emits a shadow
+check that calls ``__msan_warning()`` if a poisoned value is being used::
+
+ void __msan_warning(u32 origin)
+
+``__msan_warning()`` causes KMSAN runtime to print an error report.
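+
+For a branch like ``if (cond) { ... }`` the emitted check is conceptually the
+following (``shadow_of_cond`` and ``origin_of_cond`` stand for the metadata
+that the compiler keeps for ``cond``)::
+
+ if (shadow_of_cond) /* are any bits of cond uninitialized? */
+ __msan_warning(origin_of_cond);
+ if (cond) {
+ /* ... the original branch ... */
+ }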
+
+Inline assembly instrumentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+KMSAN instruments every inline assembly output with a call to::
+
+ void __msan_instrument_asm_store(void *addr, uintptr_t size)
+
+This call unpoisons the memory region.
+
+This approach may mask certain errors, but it also helps to avoid a lot of
+false positives in bitwise operations, atomics etc.
+
+Sometimes the pointers passed into inline assembly do not point to valid memory.
+In such cases they are ignored at runtime.
+
+Disabling the instrumentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A function can be marked with ``__no_kmsan_checks``. Doing so makes KMSAN
+ignore uninitialized values in that function and mark its output as initialized.
+As a result, the user will not get KMSAN reports related to that function.
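+
+For example, a hypothetical helper that deliberately passes on data received
+from hardware could be annotated like this::
+
+ __no_kmsan_checks
+ static void read_device_status(struct my_device *dev, u32 *buf)
+ {
+ /* No reports here; values written to *buf are treated as initialized. */
+ ...
+ }
+
+Here ``my_device`` and ``read_device_status()`` are made-up names used purely
+for illustration.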
+
+Another function attribute supported by KMSAN is ``__no_sanitize_memory``.
+Applying this attribute to a function will result in KMSAN not instrumenting it,
+which can be helpful if we do not want the compiler to mess up some low-level
+code (e.g. that marked with ``noinstr``).
+
+This however comes at a cost: stack allocations from such functions will have
+incorrect shadow/origin values, likely leading to false positives. Functions
+called from non-instrumented code may also receive incorrect metadata for their
+parameters.
+
+As a rule of thumb, avoid using ``__no_sanitize_memory`` explicitly.
+
+It is also possible to disable KMSAN for a single file (e.g. main.o)::
+
+ KMSAN_SANITIZE_main.o := n
+
+or for the whole directory::
+
+ KMSAN_SANITIZE := n
+
+in the Makefile. Think of this as applying ``__no_sanitize_memory`` to every
+function in the file or directory. Most users won't need KMSAN_SANITIZE, unless
+their code gets broken by KMSAN (e.g. runs at early boot time).
+
+Runtime library
+---------------
+
+The code is located in ``mm/kmsan/``.
+
+Per-task KMSAN state
+~~~~~~~~~~~~~~~~~~~~
+
+Every task_struct has an associated KMSAN task state that holds the KMSAN
+context (see above) and a per-task flag disallowing KMSAN reports::
+
+ struct kmsan_ctx {
+ ...
+ bool allow_reporting;
+ struct kmsan_context_state cstate;
+ ...
+ }
+
+ struct task_struct {
+ ...
+ struct kmsan_ctx kmsan_ctx;
+ ...
+ }
+
+
+KMSAN contexts
+~~~~~~~~~~~~~~
+
+When running in a kernel task context, KMSAN uses ``current->kmsan_ctx.cstate``
+to hold the metadata for function parameters and return values.
+
+But when the kernel is running in interrupt, softirq or NMI context, where
+``current`` is unavailable, KMSAN switches to a per-CPU interrupt state::
+
+ DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
+
+Metadata allocation
+~~~~~~~~~~~~~~~~~~~
+
+Depending on where a piece of kernel memory lives, its KMSAN metadata is
+stored in one of several places:
+
+1. Each ``struct page`` instance contains two pointers to its shadow and
+origin pages::
+
+ struct page {
+ ...
+ struct page *kmsan_shadow, *kmsan_origin;
+ ...
+ };
+
+At boot-time, the kernel allocates shadow and origin pages for every available
+kernel page. This is done quite late, when the kernel address space is already
+fragmented, so normal data pages may arbitrarily interleave with the metadata
+pages.
+
+This means that in general for two contiguous memory pages their shadow/origin
+pages may not be contiguous. So, if a memory access crosses the boundary
+of a memory block, accesses to shadow/origin memory may potentially corrupt
+other pages or read incorrect values from them.
+
+In practice, contiguous memory pages returned by the same ``alloc_pages()``
+call will have contiguous metadata, whereas if these pages belong to two
+different allocations their metadata pages can be fragmented.
+
+For the kernel data (``.data``, ``.bss`` etc.) and percpu memory regions
+there are also no guarantees on metadata contiguity.
+
+In the case ``__msan_metadata_ptr_for_XXX_YYY()`` hits the border between two
+pages with non-contiguous metadata, it returns pointers to fake shadow/origin regions::
+
+ char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+ char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+
+``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
+All stores to ``dummy_store_page`` are ignored.
+
+2. For vmalloc memory and modules, there is a direct mapping between the memory
+range, its shadow and origin. KMSAN reduces the vmalloc area by 3/4, making only
+the first quarter available to ``vmalloc()``. The second quarter of the vmalloc
+area contains shadow memory for the first quarter, the third one holds the
+origins. A small part of the fourth quarter contains shadow and origins for the
+kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for
+more details.
+
+When an array of pages is mapped into a contiguous virtual memory space, their
+shadow and origin pages are similarly mapped into contiguous regions.
+
+3. For CPU entry area there are separate per-CPU arrays that hold its
+metadata::
+
+ DEFINE_PER_CPU(char[CPU_ENTRY_AREA_SIZE], cpu_entry_area_shadow);
+ DEFINE_PER_CPU(char[CPU_ENTRY_AREA_SIZE], cpu_entry_area_origin);
+
+When calculating shadow and origin addresses for a given memory address, KMSAN
+checks whether the address belongs to the physical page range, the virtual page
+range or CPU entry area.
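+
+A heavily simplified sketch of that dispatch (helper names below are
+illustrative placeholders; the real logic lives in ``mm/kmsan/shadow.c`` and
+handles many more corner cases)::
+
+ void *shadow_address_for(void *addr)
+ {
+ if (is_vmalloc_or_module_addr(addr))
+ return addr + vmalloc_shadow_offset; /* fixed offset mapping */
+ if (addr_in_cpu_entry_area(addr))
+ return percpu_shadow_for(addr);
+ /* Ordinary pages: follow the per-page metadata pointer. */
+ return page_address(virt_to_page(addr)->kmsan_shadow) +
+ offset_in_page(addr);
+ }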
+
+Handling ``pt_regs``
+~~~~~~~~~~~~~~~~~~~~
+
+Many functions receive a ``struct pt_regs`` holding the register state at a
+certain point. Registers do not have (easily calculatable) shadow or origin
+associated with them, so we assume they are always initialized.
+
+References
+==========
+
+E. Stepanov, K. Serebryany. `MemorySanitizer: fast detector of uninitialized
+memory use in C++
+<https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf>`_.
+In Proceedings of CGO 2015.
+
+.. _MemorySanitizer tool: https://clang.llvm.org/docs/MemorySanitizer.html
+.. _LLVM documentation: https://llvm.org/docs/GettingStarted.html
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 11:31:25

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 04/46] instrumented.h: allow instrumenting both sides of copy_from_user()

Introduce instrument_copy_from_user_before() and
instrument_copy_from_user_after() hooks to be invoked before and after
the call to copy_from_user().

KASAN and KCSAN will only be using instrument_copy_from_user_before(),
but for KMSAN we'll need to insert code after copy_from_user().

Signed-off-by: Alexander Potapenko <[email protected]>
---
Link: https://linux-review.googlesource.com/id/I855034578f0b0f126734cbd734fb4ae1d3a6af99
---
include/linux/instrumented.h | 21 +++++++++++++++++++--
include/linux/uaccess.h | 19 ++++++++++++++-----
lib/iov_iter.c | 9 ++++++---
lib/usercopy.c | 3 ++-
4 files changed, 41 insertions(+), 11 deletions(-)

diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h
index 42faebbaa202a..ee8f7d17d34f5 100644
--- a/include/linux/instrumented.h
+++ b/include/linux/instrumented.h
@@ -120,7 +120,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
}

/**
- * instrument_copy_from_user - instrument writes of copy_from_user
+ * instrument_copy_from_user_before - add instrumentation before copy_from_user
*
* Instrument writes to kernel memory, that are due to copy_from_user (and
* variants). The instrumentation should be inserted before the accesses.
@@ -130,10 +130,27 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n)
* @n number of bytes to copy
*/
static __always_inline void
-instrument_copy_from_user(const void *to, const void __user *from, unsigned long n)
+instrument_copy_from_user_before(const void *to, const void __user *from, unsigned long n)
{
kasan_check_write(to, n);
kcsan_check_write(to, n);
}

+/**
+ * instrument_copy_from_user_after - add instrumentation after copy_from_user
+ *
+ * Instrument writes to kernel memory, that are due to copy_from_user (and
+ * variants). The instrumentation should be inserted after the accesses.
+ *
+ * @to destination address
+ * @from source address
+ * @n number of bytes to copy
+ * @left number of bytes not copied (as returned by copy_from_user)
+ */
+static __always_inline void
+instrument_copy_from_user_after(const void *to, const void __user *from,
+ unsigned long n, unsigned long left)
+{
+}
+
#endif /* _LINUX_INSTRUMENTED_H */
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index 546179418ffa2..079bdea3b9dcd 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -58,20 +58,28 @@
static __always_inline __must_check unsigned long
__copy_from_user_inatomic(void *to, const void __user *from, unsigned long n)
{
- instrument_copy_from_user(to, from, n);
+ unsigned long res;
+
+ instrument_copy_from_user_before(to, from, n);
check_object_size(to, n, false);
- return raw_copy_from_user(to, from, n);
+ res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
+ return res;
}

static __always_inline __must_check unsigned long
__copy_from_user(void *to, const void __user *from, unsigned long n)
{
+ unsigned long res;
+
might_fault();
+ instrument_copy_from_user_before(to, from, n);
if (should_fail_usercopy())
return n;
- instrument_copy_from_user(to, from, n);
check_object_size(to, n, false);
- return raw_copy_from_user(to, from, n);
+ res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
+ return res;
}

/**
@@ -115,8 +123,9 @@ _copy_from_user(void *to, const void __user *from, unsigned long n)
unsigned long res = n;
might_fault();
if (!should_fail_usercopy() && likely(access_ok(from, n))) {
- instrument_copy_from_user(to, from, n);
+ instrument_copy_from_user_before(to, from, n);
res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
}
if (unlikely(res))
memset(to + (n - res), 0, res);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 6dd5330f7a995..fb19401c29c4f 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -159,13 +159,16 @@ static int copyout(void __user *to, const void *from, size_t n)

static int copyin(void *to, const void __user *from, size_t n)
{
+ size_t res = n;
+
if (should_fail_usercopy())
return n;
if (access_ok(from, n)) {
- instrument_copy_from_user(to, from, n);
- n = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_before(to, from, n);
+ res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
}
- return n;
+ return res;
}

static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
diff --git a/lib/usercopy.c b/lib/usercopy.c
index 7413dd300516e..1505a52f23a01 100644
--- a/lib/usercopy.c
+++ b/lib/usercopy.c
@@ -12,8 +12,9 @@ unsigned long _copy_from_user(void *to, const void __user *from, unsigned long n
unsigned long res = n;
might_fault();
if (!should_fail_usercopy() && likely(access_ok(from, n))) {
- instrument_copy_from_user(to, from, n);
+ instrument_copy_from_user_before(to, from, n);
res = raw_copy_from_user(to, from, n);
+ instrument_copy_from_user_after(to, from, n, res);
}
if (unlikely(res))
memset(to + (n - res), 0, res);
--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 11:32:39

by Alexander Potapenko

[permalink] [raw]
Subject: [PATCH v3 17/46] kmsan: mm: call KMSAN hooks from SLUB code

In order to report uninitialized memory coming from heap allocations
KMSAN has to poison them unless they're created with __GFP_ZERO.

Conveniently, the KMSAN hooks need to be added in exactly the places where the
init_on_alloc/init_on_free initialization is performed.

Signed-off-by: Alexander Potapenko <[email protected]>
---
v2:
-- move the implementation of SLUB hooks here

Link: https://linux-review.googlesource.com/id/I6954b386c5c5d7f99f48bb6cbcc74b75136ce86e
---
include/linux/kmsan.h | 57 ++++++++++++++++++++++++++++++
mm/kmsan/hooks.c | 80 +++++++++++++++++++++++++++++++++++++++++++
mm/slab.h | 1 +
mm/slub.c | 21 ++++++++++--
4 files changed, 157 insertions(+), 2 deletions(-)

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index da41850b46cbd..ed3630068e2ef 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -16,6 +16,7 @@
#include <linux/vmalloc.h>

struct page;
+struct kmem_cache;

#ifdef CONFIG_KMSAN

@@ -73,6 +74,44 @@ void kmsan_free_page(struct page *page, unsigned int order);
*/
void kmsan_copy_page_meta(struct page *dst, struct page *src);

+/**
+ * kmsan_slab_alloc() - Notify KMSAN about a slab allocation.
+ * @s: slab cache the object belongs to.
+ * @object: object pointer.
+ * @flags: GFP flags passed to the allocator.
+ *
+ * Depending on cache flags and GFP flags, KMSAN sets up the metadata of the
+ * newly created object, marking it as initialized or uninitialized.
+ */
+void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags);
+
+/**
+ * kmsan_slab_free() - Notify KMSAN about a slab deallocation.
+ * @s: slab cache the object belongs to.
+ * @object: object pointer.
+ *
+ * KMSAN marks the freed object as uninitialized.
+ */
+void kmsan_slab_free(struct kmem_cache *s, void *object);
+
+/**
+ * kmsan_kmalloc_large() - Notify KMSAN about a large slab allocation.
+ * @ptr: object pointer.
+ * @size: object size.
+ * @flags: GFP flags passed to the allocator.
+ *
+ * Similar to kmsan_slab_alloc(), but for large allocations.
+ */
+void kmsan_kmalloc_large(const void *ptr, size_t size, gfp_t flags);
+
+/**
+ * kmsan_kfree_large() - Notify KMSAN about a large slab deallocation.
+ * @ptr: object pointer.
+ *
+ * Similar to kmsan_slab_free(), but for large allocations.
+ */
+void kmsan_kfree_large(const void *ptr);
+
/**
* kmsan_map_kernel_range_noflush() - Notify KMSAN about a vmap.
* @start: start of vmapped range.
@@ -139,6 +178,24 @@ static inline void kmsan_copy_page_meta(struct page *dst, struct page *src)
{
}

+static inline void kmsan_slab_alloc(struct kmem_cache *s, void *object,
+ gfp_t flags)
+{
+}
+
+static inline void kmsan_slab_free(struct kmem_cache *s, void *object)
+{
+}
+
+static inline void kmsan_kmalloc_large(const void *ptr, size_t size,
+ gfp_t flags)
+{
+}
+
+static inline void kmsan_kfree_large(const void *ptr)
+{
+}
+
static inline void kmsan_vmap_pages_range_noflush(unsigned long start,
unsigned long end,
pgprot_t prot,
diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 070756be70e3a..052e17b7a717d 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -26,6 +26,86 @@
* skipping effects of functions like memset() inside instrumented code.
*/

+void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags)
+{
+ if (unlikely(object == NULL))
+ return;
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ /*
+ * There's a ctor or this is an RCU cache - do nothing. The memory
+ * status hasn't changed since last use.
+ */
+ if (s->ctor || (s->flags & SLAB_TYPESAFE_BY_RCU))
+ return;
+
+ kmsan_enter_runtime();
+ if (flags & __GFP_ZERO)
+ kmsan_internal_unpoison_memory(object, s->object_size,
+ KMSAN_POISON_CHECK);
+ else
+ kmsan_internal_poison_memory(object, s->object_size, flags,
+ KMSAN_POISON_CHECK);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_slab_alloc);
+
+void kmsan_slab_free(struct kmem_cache *s, void *object)
+{
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+
+ /* RCU slabs could be legally used after free within the RCU period */
+ if (unlikely(s->flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)))
+ return;
+ /*
+ * If there's a constructor, freed memory must remain in the same state
+ * until the next allocation. We cannot save its state to detect
+ * use-after-free bugs, instead we just keep it unpoisoned.
+ */
+ if (s->ctor)
+ return;
+ kmsan_enter_runtime();
+ kmsan_internal_poison_memory(object, s->object_size, GFP_KERNEL,
+ KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_slab_free);
+
+void kmsan_kmalloc_large(const void *ptr, size_t size, gfp_t flags)
+{
+ if (unlikely(ptr == NULL))
+ return;
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ if (flags & __GFP_ZERO)
+ kmsan_internal_unpoison_memory((void *)ptr, size,
+ /*checked*/ true);
+ else
+ kmsan_internal_poison_memory((void *)ptr, size, flags,
+ KMSAN_POISON_CHECK);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_kmalloc_large);
+
+void kmsan_kfree_large(const void *ptr)
+{
+ struct page *page;
+
+ if (!kmsan_enabled || kmsan_in_runtime())
+ return;
+ kmsan_enter_runtime();
+ page = virt_to_head_page((void *)ptr);
+ KMSAN_WARN_ON(ptr != page_address(page));
+ kmsan_internal_poison_memory((void *)ptr,
+ PAGE_SIZE << compound_order(page),
+ GFP_KERNEL,
+ KMSAN_POISON_CHECK | KMSAN_POISON_FREE);
+ kmsan_leave_runtime();
+}
+EXPORT_SYMBOL(kmsan_kfree_large);
+
static unsigned long vmalloc_shadow(unsigned long addr)
{
return (unsigned long)kmsan_get_metadata((void *)addr,
diff --git a/mm/slab.h b/mm/slab.h
index 95eb34174c1bb..1276b83656091 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -751,6 +751,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
memset(p[i], 0, s->object_size);
kmemleak_alloc_recursive(p[i], s->object_size, 1,
s->flags, flags);
+ kmsan_slab_alloc(s, p[i], flags);
}

memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
diff --git a/mm/slub.c b/mm/slub.c
index ed5c2c03a47aa..45082acaa6739 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -22,6 +22,7 @@
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/kasan.h>
+#include <linux/kmsan.h>
#include <linux/cpu.h>
#include <linux/cpuset.h>
#include <linux/mempolicy.h>
@@ -357,18 +358,28 @@ static void prefetch_freepointer(const struct kmem_cache *s, void *object)
prefetchw(object + s->offset);
}

+/*
+ * When running under KMSAN, get_freepointer_safe() may return an uninitialized
+ * pointer value in the case the current thread loses the race for the next
+ * memory chunk in the freelist. In that case this_cpu_cmpxchg_double() in
+ * slab_alloc_node() will fail, so the uninitialized value won't be used, but
+ * KMSAN will still check all arguments of cmpxchg because of imperfect
+ * handling of inline assembly.
+ * To work around this problem, use kmsan_init() to force initialize the
+ * return value of get_freepointer_safe().
+ */
static inline void *get_freepointer_safe(struct kmem_cache *s, void *object)
{
unsigned long freepointer_addr;
void *p;

if (!debug_pagealloc_enabled_static())
- return get_freepointer(s, object);
+ return kmsan_init(get_freepointer(s, object));

object = kasan_reset_tag(object);
freepointer_addr = (unsigned long)object + s->offset;
copy_from_kernel_nofault(&p, (void **)freepointer_addr, sizeof(p));
- return freelist_ptr(s, p, freepointer_addr);
+ return kmsan_init(freelist_ptr(s, p, freepointer_addr));
}

static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp)
@@ -1683,6 +1694,7 @@ static inline void *kmalloc_large_node_hook(void *ptr, size_t size, gfp_t flags)
ptr = kasan_kmalloc_large(ptr, size, flags);
/* As ptr might get tagged, call kmemleak hook after KASAN. */
kmemleak_alloc(ptr, size, 1, flags);
+ kmsan_kmalloc_large(ptr, size, flags);
return ptr;
}

@@ -1690,12 +1702,14 @@ static __always_inline void kfree_hook(void *x)
{
kmemleak_free(x);
kasan_kfree_large(x);
+ kmsan_kfree_large(x);
}

static __always_inline bool slab_free_hook(struct kmem_cache *s,
void *x, bool init)
{
kmemleak_free_recursive(x, s->flags);
+ kmsan_slab_free(s, x);

debug_check_no_locks_freed(x, s->object_size);

@@ -3730,6 +3744,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
*/
slab_post_alloc_hook(s, objcg, flags, size, p,
slab_want_init_on_alloc(flags, s));
+
return i;
error:
slub_put_cpu_ptr(s->cpu_slab);
@@ -5898,6 +5913,7 @@ static char *create_unique_id(struct kmem_cache *s)
p += sprintf(p, "%07u", s->size);

BUG_ON(p > name + ID_STR_LENGTH - 1);
+ kmsan_unpoison_memory(name, p - name);
return name;
}

@@ -5999,6 +6015,7 @@ static int sysfs_slab_alias(struct kmem_cache *s, const char *name)
al->name = name;
al->next = alias_list;
alias_list = al;
+ kmsan_unpoison_memory(al, sizeof(struct saved_alias));
return 0;
}

--
2.36.0.rc2.479.g8af0fa9b8e-goog

2022-04-27 13:14:52

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v3 03/46] kasan: common: adapt to the new prototype of __stack_depot_save()

On Tue, Apr 26, 2022 at 06:42PM +0200, Alexander Potapenko wrote:
> Pass extra_bits=0, as KASAN does not intend to store additional
> information in the stack handle. No functional change.
>
> Signed-off-by: Alexander Potapenko <[email protected]>

I think this patch needs to be folded into the previous one, otherwise
bisection will be broken.

> ---
> Link: https://linux-review.googlesource.com/id/I932d8f4f11a41b7483e0d57078744cc94697607a
> ---
> mm/kasan/common.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> index d9079ec11f313..5d244746ac4fe 100644
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -36,7 +36,7 @@ depot_stack_handle_t kasan_save_stack(gfp_t flags, bool can_alloc)
> unsigned int nr_entries;
>
> nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
> - return __stack_depot_save(entries, nr_entries, flags, can_alloc);
> + return __stack_depot_save(entries, nr_entries, 0, flags, can_alloc);
> }
>
> void kasan_set_track(struct kasan_track *track, gfp_t flags)
> --
> 2.36.0.rc2.479.g8af0fa9b8e-goog
>

2022-04-27 13:55:50

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 27/46] kmsan: instrumentation.h: add instrumentation_begin_with_regs()

On Tue, Apr 26 2022 at 18:42, Alexander Potapenko wrote:
> +void kmsan_instrumentation_begin(struct pt_regs *regs)
> +{
> + struct kmsan_context_state *state = &kmsan_get_context()->cstate;
> +
> + if (state)
> + __memset(state, 0, sizeof(struct kmsan_context_state));

sizeof(*state) please

> + if (!kmsan_enabled || !regs)
> + return;

Why has state to be cleared when kmsan is not enabled and how do you end up
with regs == NULL here?

Thanks,

tglx

2022-04-27 14:02:31

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

On Tue, Apr 26 2022 at 18:42, Alexander Potapenko wrote:

Can you please use 'entry:' as prefix. Slapping kmsan in front of
everything does not really make sense.

> Replace instrumentation_begin() with instrumentation_begin_with_regs()
> to let KMSAN handle the non-instrumented code and unpoison pt_regs
> passed from the instrumented part.

That should be:

from the non-instrumented part
or
passed to the instrumented part

right?

> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -23,7 +23,7 @@ static __always_inline void __enter_from_user_mode(struct pt_regs *regs)
> CT_WARN_ON(ct_state() != CONTEXT_USER);
> user_exit_irqoff();
>
> - instrumentation_begin();
> + instrumentation_begin_with_regs(regs);

I can see what you are trying to do, but this will end up doing the same
thing over and over. Let's just look at a syscall.

__visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
{
...
nr = syscall_enter_from_user_mode(regs, nr)

__enter_from_user_mode(regs)
.....
instrumentation_begin_with_regs(regs);
....

instrumentation_begin_with_regs(regs);
....

instrumentation_begin_with_regs(regs);

if (!do_syscall_x64(regs, nr) && !do_syscall_x32(regs, nr) && nr != -1) {
/* Invalid system call, but still a system call. */
regs->ax = __x64_sys_ni_syscall(regs);
}

instrumentation_end();

syscall_exit_to_user_mode(regs);
instrumentation_begin_with_regs(regs);
__syscall_exit_to_user_mode_work(regs);
instrumentation_end();
__exit_to_user_mode();

That means you memset state four times and unpoison regs four times. I'm
not sure whether that's desired.

instrumentation_begin()/end() are not really suitable IMO. They were
added to allow objtool to validate that nothing escapes into
instrumentable code unless annotated accordingly.

Thanks,

tglx

2022-04-27 14:38:53

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v3 12/46] kmsan: add KMSAN runtime core

On Tue, Apr 26, 2022 at 06:42PM +0200, Alexander Potapenko wrote:
> For each memory location KernelMemorySanitizer maintains two types of
> metadata:
> 1. The so-called shadow of that location - a byte:byte mapping describing
> whether individual bits of memory are initialized (shadow is 0)
> or not (shadow is 1).
> 2. The origins of that location - a 4-byte:4-byte mapping containing
> 4-byte IDs of the stack traces where uninitialized values were
> created.
>
> Each struct page now contains pointers to two struct pages holding
> KMSAN metadata (shadow and origins) for the original struct page.
> Utility routines in mm/kmsan/core.c and mm/kmsan/shadow.c handle the
> metadata creation, addressing, copying and checking.
> mm/kmsan/report.c performs error reporting in the cases an uninitialized
> value is used in a way that leads to undefined behavior.
>
> KMSAN compiler instrumentation is responsible for tracking the metadata
> along with the kernel memory. mm/kmsan/instrumentation.c provides the
> implementation for instrumentation hooks that are called from files
> compiled with -fsanitize=kernel-memory.
>
> To aid parameter passing (also done at instrumentation level), each
> task_struct now contains a struct kmsan_task_state used to track the
> metadata of function parameters and return values for that task.
>
> Finally, this patch provides CONFIG_KMSAN that enables KMSAN, and
> declares CFLAGS_KMSAN, which are applied to files compiled with KMSAN.
> The KMSAN_SANITIZE:=n Makefile directive can be used to completely
> disable KMSAN instrumentation for certain files.
>
> Similarly, KMSAN_ENABLE_CHECKS:=n disables KMSAN checks and makes newly
> created stack memory initialized.
>
> Users can also use functions from include/linux/kmsan-checks.h to mark
> certain memory regions as uninitialized or initialized (this is called
> "poisoning" and "unpoisoning") or check that a particular region is
> initialized.
>
> Signed-off-by: Alexander Potapenko <[email protected]>
> ---
> v2:
> -- as requested by Greg K-H, moved hooks for different subsystems to respective patches,
> rewrote the patch description;
> -- addressed comments by Dmitry Vyukov;
> -- added a note about KMSAN being not intended for production use.
> -- fix case of unaligned dst in kmsan_internal_memmove_metadata()
>
> v3:
> -- print build IDs in reports where applicable
> -- drop redundant filter_irq_stacks(), unpoison the local passed to __stack_depot_save()
> -- remove a stray BUG()
>
> Link: https://linux-review.googlesource.com/id/I9b71bfe3425466c97159f9de0062e5e8e4fec866
> ---
> Makefile | 1 +
> include/linux/kmsan-checks.h | 64 +++++
> include/linux/kmsan.h | 47 ++++
> include/linux/mm_types.h | 12 +
> include/linux/sched.h | 5 +
> lib/Kconfig.debug | 1 +
> lib/Kconfig.kmsan | 23 ++
> mm/Makefile | 1 +
> mm/kmsan/Makefile | 18 ++
> mm/kmsan/core.c | 458 +++++++++++++++++++++++++++++++++++
> mm/kmsan/hooks.c | 66 +++++
> mm/kmsan/instrumentation.c | 267 ++++++++++++++++++++
> mm/kmsan/kmsan.h | 183 ++++++++++++++
> mm/kmsan/report.c | 211 ++++++++++++++++
> mm/kmsan/shadow.c | 186 ++++++++++++++
> scripts/Makefile.kmsan | 1 +
> scripts/Makefile.lib | 9 +
> 17 files changed, 1553 insertions(+)
> create mode 100644 include/linux/kmsan-checks.h
> create mode 100644 include/linux/kmsan.h
> create mode 100644 lib/Kconfig.kmsan
> create mode 100644 mm/kmsan/Makefile
> create mode 100644 mm/kmsan/core.c
> create mode 100644 mm/kmsan/hooks.c
> create mode 100644 mm/kmsan/instrumentation.c
> create mode 100644 mm/kmsan/kmsan.h
> create mode 100644 mm/kmsan/report.c
> create mode 100644 mm/kmsan/shadow.c
> create mode 100644 scripts/Makefile.kmsan
>
> diff --git a/Makefile b/Makefile
> index c3ec1ea423797..d3c7dcd9f0fea 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1009,6 +1009,7 @@ include-y := scripts/Makefile.extrawarn
> include-$(CONFIG_DEBUG_INFO) += scripts/Makefile.debug
> include-$(CONFIG_KASAN) += scripts/Makefile.kasan
> include-$(CONFIG_KCSAN) += scripts/Makefile.kcsan
> +include-$(CONFIG_KMSAN) += scripts/Makefile.kmsan
> include-$(CONFIG_UBSAN) += scripts/Makefile.ubsan
> include-$(CONFIG_KCOV) += scripts/Makefile.kcov
> include-$(CONFIG_GCC_PLUGINS) += scripts/Makefile.gcc-plugins
> diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h
> new file mode 100644
> index 0000000000000..a6522a0c28df9
> --- /dev/null
> +++ b/include/linux/kmsan-checks.h
> @@ -0,0 +1,64 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * KMSAN checks to be used for one-off annotations in subsystems.
> + *
> + * Copyright (C) 2017-2022 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +
> +#ifndef _LINUX_KMSAN_CHECKS_H
> +#define _LINUX_KMSAN_CHECKS_H
> +
> +#include <linux/types.h>
> +
> +#ifdef CONFIG_KMSAN
> +
> +/**
> + * kmsan_poison_memory() - Mark the memory range as uninitialized.
> + * @address: address to start with.
> + * @size: size of buffer to poison.
> + * @flags: GFP flags for allocations done by this function.
> + *
> + * Until other data is written to this range, KMSAN will treat it as
> + * uninitialized. Error reports for this memory will reference the call site of
> + * kmsan_poison_memory() as origin.
> + */
> +void kmsan_poison_memory(const void *address, size_t size, gfp_t flags);
> +
> +/**
> + * kmsan_unpoison_memory() - Mark the memory range as initialized.
> + * @address: address to start with.
> + * @size: size of buffer to unpoison.
> + *
> + * Until other data is written to this range, KMSAN will treat it as
> + * initialized.
> + */
> +void kmsan_unpoison_memory(const void *address, size_t size);
> +
> +/**
> + * kmsan_check_memory() - Check the memory range for being initialized.
> + * @address: address to start with.
> + * @size: size of buffer to check.
> + *
> + * If any piece of the given range is marked as uninitialized, KMSAN will report
> + * an error.
> + */
> +void kmsan_check_memory(const void *address, size_t size);
> +
> +#else
> +
> +static inline void kmsan_poison_memory(const void *address, size_t size,
> + gfp_t flags)
> +{
> +}
> +static inline void kmsan_unpoison_memory(const void *address, size_t size)
> +{
> +}
> +static inline void kmsan_check_memory(const void *address, size_t size)
> +{
> +}
> +
> +#endif
> +
> +#endif /* _LINUX_KMSAN_CHECKS_H */
> diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
> new file mode 100644
> index 0000000000000..4e35f43eceaa9
> --- /dev/null
> +++ b/include/linux/kmsan.h
> @@ -0,0 +1,47 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * KMSAN API for subsystems.
> + *
> + * Copyright (C) 2017-2022 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +#ifndef _LINUX_KMSAN_H
> +#define _LINUX_KMSAN_H
> +
> +#include <linux/gfp.h>
> +#include <linux/kmsan-checks.h>
> +#include <linux/stackdepot.h>
> +#include <linux/types.h>
> +#include <linux/vmalloc.h>
> +
> +struct page;
> +
> +#ifdef CONFIG_KMSAN
> +
> +/* These constants are defined in the MSan LLVM instrumentation pass. */
> +#define KMSAN_RETVAL_SIZE 800
> +#define KMSAN_PARAM_SIZE 800
> +
> +struct kmsan_context_state {
> + char param_tls[KMSAN_PARAM_SIZE];
> + char retval_tls[KMSAN_RETVAL_SIZE];
> + char va_arg_tls[KMSAN_PARAM_SIZE];
> + char va_arg_origin_tls[KMSAN_PARAM_SIZE];
> + u64 va_arg_overflow_size_tls;
> + char param_origin_tls[KMSAN_PARAM_SIZE];
> + depot_stack_handle_t retval_origin_tls;
> +};
> +
> +#undef KMSAN_PARAM_SIZE
> +#undef KMSAN_RETVAL_SIZE
> +
> +struct kmsan_ctx {
> + struct kmsan_context_state cstate;
> + int kmsan_in_runtime;
> + bool allow_reporting;
> +};
> +
> +#endif
> +
> +#endif /* _LINUX_KMSAN_H */
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 8834e38c06a4f..85c97a2145f7e 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -218,6 +218,18 @@ struct page {
> not kmapped, ie. highmem) */
> #endif /* WANT_PAGE_VIRTUAL */
>
> +#ifdef CONFIG_KMSAN
> + /*
> + * KMSAN metadata for this page:
> + * - shadow page: every bit indicates whether the corresponding
> + * bit of the original page is initialized (0) or not (1);
> + * - origin page: every 4 bytes contain an id of the stack trace
> + * where the uninitialized value was created.
> + */
> + struct page *kmsan_shadow;
> + struct page *kmsan_origin;
> +#endif
> +
> #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
> int _last_cpupid;
> #endif
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index a8911b1f35aad..9e53624cd73ac 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -14,6 +14,7 @@
> #include <linux/pid.h>
> #include <linux/sem.h>
> #include <linux/shm.h>
> +#include <linux/kmsan.h>
> #include <linux/mutex.h>
> #include <linux/plist.h>
> #include <linux/hrtimer.h>
> @@ -1352,6 +1353,10 @@ struct task_struct {
> #endif
> #endif
>
> +#ifdef CONFIG_KMSAN
> + struct kmsan_ctx kmsan_ctx;
> +#endif
> +
> #if IS_ENABLED(CONFIG_KUNIT)
> struct kunit *kunit_test;
> #endif
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 075cd25363ac3..b81670878acae 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -996,6 +996,7 @@ config DEBUG_STACKOVERFLOW
>
> source "lib/Kconfig.kasan"
> source "lib/Kconfig.kfence"
> +source "lib/Kconfig.kmsan"
>
> endmenu # "Memory Debugging"
>
> diff --git a/lib/Kconfig.kmsan b/lib/Kconfig.kmsan
> new file mode 100644
> index 0000000000000..199f79d031f94
> --- /dev/null
> +++ b/lib/Kconfig.kmsan
> @@ -0,0 +1,23 @@

Missing SPDX-License-Identifier.
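E.g. something like this at the top of the file (assuming GPL-2.0 is the
intended license, as for the rest of the series):

	# SPDX-License-Identifier: GPL-2.0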

> +config HAVE_ARCH_KMSAN
> + bool
> +
> +config HAVE_KMSAN_COMPILER
> + def_bool (CC_IS_CLANG && $(cc-option,-fsanitize=kernel-memory -mllvm -msan-disable-checks=1))
> +
> +config KMSAN
> + bool "KMSAN: detector of uninitialized values use"
> + depends on HAVE_ARCH_KMSAN && HAVE_KMSAN_COMPILER
> + depends on SLUB && DEBUG_KERNEL && !KASAN && !KCSAN
> + depends on CC_IS_CLANG && CLANG_VERSION >= 140000

Shouldn't the "CC_IS_CLANG && CLANG_VERSION ..." check be a "depends on"
in HAVE_KMSAN_COMPILER? That way all the compiler-related checks are
confined to HAVE_KMSAN_COMPILER.

I guess it might also be worth mentioning why the version check is
required at all (something about older compilers supporting
-fsanitize=kernel-memory, but not having all the features we need).
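Something along these lines, perhaps (an untested sketch; the comment
wording is just my guess at the reason for the version check):

	config HAVE_KMSAN_COMPILER
		# Clang versions before 14.0.0 also support
		# -fsanitize=kernel-memory, but lack some of the features we
		# need to build the kernel with KMSAN.
		depends on CC_IS_CLANG && CLANG_VERSION >= 140000
		def_bool $(cc-option,-fsanitize=kernel-memory -mllvm -msan-disable-checks=1)

Then the KMSAN entry would only need
"depends on HAVE_ARCH_KMSAN && HAVE_KMSAN_COMPILER".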

> + select STACKDEPOT
> + select STACKDEPOT_ALWAYS_INIT
> + help
> + KernelMemorySanitizer (KMSAN) is a dynamic detector of uses of
> + uninitialized values in the kernel. It is based on compiler
> + instrumentation provided by Clang and thus requires Clang to build.
> +
> + An important note is that KMSAN is not intended for production use,
> + because it drastically increases kernel memory footprint and slows
> + the whole system down.
> +
> + See <file:Documentation/dev-tools/kmsan.rst> for more details.
> diff --git a/mm/Makefile b/mm/Makefile
> index 4cc13f3179a51..4da7eeaecc214 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -89,6 +89,7 @@ obj-$(CONFIG_SLAB) += slab.o
> obj-$(CONFIG_SLUB) += slub.o
> obj-$(CONFIG_KASAN) += kasan/
> obj-$(CONFIG_KFENCE) += kfence/
> +obj-$(CONFIG_KMSAN) += kmsan/
> obj-$(CONFIG_FAILSLAB) += failslab.o
> obj-$(CONFIG_MEMTEST) += memtest.o
> obj-$(CONFIG_MIGRATION) += migrate.o
> diff --git a/mm/kmsan/Makefile b/mm/kmsan/Makefile
> new file mode 100644
> index 0000000000000..a80dde1de7048
> --- /dev/null
> +++ b/mm/kmsan/Makefile
> @@ -0,0 +1,18 @@

Makefile needs an SPDX-License-Identifier.

> +obj-y := core.o instrumentation.o hooks.o report.o shadow.o
> +
> +KMSAN_SANITIZE := n
> +KCOV_INSTRUMENT := n
> +UBSAN_SANITIZE := n
> +
> +# Disable instrumentation of KMSAN runtime with other tools.
> +CC_FLAGS_KMSAN_RUNTIME := -fno-stack-protector
> +CC_FLAGS_KMSAN_RUNTIME += $(call cc-option,-fno-conserve-stack)
> +CC_FLAGS_KMSAN_RUNTIME += -DDISABLE_BRANCH_PROFILING
> +
> +CFLAGS_REMOVE.o = $(CC_FLAGS_FTRACE)
> +
> +CFLAGS_core.o := $(CC_FLAGS_KMSAN_RUNTIME)
> +CFLAGS_hooks.o := $(CC_FLAGS_KMSAN_RUNTIME)
> +CFLAGS_instrumentation.o := $(CC_FLAGS_KMSAN_RUNTIME)
> +CFLAGS_report.o := $(CC_FLAGS_KMSAN_RUNTIME)
> +CFLAGS_shadow.o := $(CC_FLAGS_KMSAN_RUNTIME)
> diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
> new file mode 100644
> index 0000000000000..933d864d9d467
> --- /dev/null
> +++ b/mm/kmsan/core.c
> @@ -0,0 +1,458 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN runtime library.
> + *
> + * Copyright (C) 2017-2022 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +
> +#include <asm/page.h>
> +#include <linux/compiler.h>
> +#include <linux/export.h>
> +#include <linux/highmem.h>
> +#include <linux/interrupt.h>
> +#include <linux/kernel.h>
> +#include <linux/kmsan.h>
> +#include <linux/memory.h>
> +#include <linux/mm.h>
> +#include <linux/mm_types.h>
> +#include <linux/mmzone.h>
> +#include <linux/percpu-defs.h>
> +#include <linux/preempt.h>
> +#include <linux/slab.h>
> +#include <linux/stackdepot.h>
> +#include <linux/stacktrace.h>
> +#include <linux/types.h>
> +#include <linux/vmalloc.h>
> +
> +#include "../slab.h"
> +#include "kmsan.h"
> +
> +/*
> + * Avoid creating too long origin chains, as these are unlikely to participate in
> + * real reports.
> + */
> +#define MAX_CHAIN_DEPTH 7
> +#define NUM_SKIPPED_TO_WARN 10000
> +
> +bool kmsan_enabled __read_mostly;
> +
> +/*
> + * Per-CPU KMSAN context to be used in interrupts, where current->kmsan is
> + * unavailable.
> + */
> +DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
> +
> +void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
> + unsigned int poison_flags)
> +{
> + u32 extra_bits =
> + kmsan_extra_bits(/*depth*/ 0, poison_flags & KMSAN_POISON_FREE);
> + bool checked = poison_flags & KMSAN_POISON_CHECK;
> + depot_stack_handle_t handle;
> +
> + handle = kmsan_save_stack_with_flags(flags, extra_bits);
> + kmsan_internal_set_shadow_origin(address, size, -1, handle, checked);
> +}
> +
> +void kmsan_internal_unpoison_memory(void *address, size_t size, bool checked)
> +{
> + kmsan_internal_set_shadow_origin(address, size, 0, 0, checked);
> +}
> +
> +depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,
> + unsigned int extra)
> +{
> + unsigned long entries[KMSAN_STACK_DEPTH];
> + unsigned int nr_entries;
> +
> + nr_entries = stack_trace_save(entries, KMSAN_STACK_DEPTH, 0);
> +
> + /* Don't sleep (see might_sleep_if() in __alloc_pages_nodemask()). */
> + flags &= ~__GFP_DIRECT_RECLAIM;
> +
> + return __stack_depot_save(entries, nr_entries, extra, flags, true);
> +}
> +
> +/* Copy the metadata following the memmove() behavior. */
> +void kmsan_internal_memmove_metadata(void *dst, void *src, size_t n)
> +{
> + depot_stack_handle_t old_origin = 0, new_origin = 0;
> + int src_slots, dst_slots, i, iter, step, skip_bits;
> + depot_stack_handle_t *origin_src, *origin_dst;
> + void *shadow_src, *shadow_dst;
> + u32 *align_shadow_src, shadow;
> + bool backwards;
> +
> + shadow_dst = kmsan_get_metadata(dst, KMSAN_META_SHADOW);
> + if (!shadow_dst)
> + return;
> + KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(dst, n));
> +
> + shadow_src = kmsan_get_metadata(src, KMSAN_META_SHADOW);
> + if (!shadow_src) {
> + /*
> + * |src| is untracked: zero out destination shadow, ignore the

Probably doesn't matter too much, but for consistency elsewhere - @src?

> + * origins, we're done.
> + */
> + __memset(shadow_dst, 0, n);
> + return;
> + }
> + KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(src, n));
> +
> + __memmove(shadow_dst, shadow_src, n);
> +
> + origin_dst = kmsan_get_metadata(dst, KMSAN_META_ORIGIN);
> + origin_src = kmsan_get_metadata(src, KMSAN_META_ORIGIN);
> + KMSAN_WARN_ON(!origin_dst || !origin_src);
> + src_slots = (ALIGN((u64)src + n, KMSAN_ORIGIN_SIZE) -
> + ALIGN_DOWN((u64)src, KMSAN_ORIGIN_SIZE)) /
> + KMSAN_ORIGIN_SIZE;
> + dst_slots = (ALIGN((u64)dst + n, KMSAN_ORIGIN_SIZE) -
> + ALIGN_DOWN((u64)dst, KMSAN_ORIGIN_SIZE)) /
> + KMSAN_ORIGIN_SIZE;
> + KMSAN_WARN_ON((src_slots < 1) || (dst_slots < 1));
> + KMSAN_WARN_ON((src_slots - dst_slots > 1) ||
> + (dst_slots - src_slots < -1));
> +
> + backwards = dst > src;
> + i = backwards ? min(src_slots, dst_slots) - 1 : 0;
> + iter = backwards ? -1 : 1;
> +
> + align_shadow_src =
> + (u32 *)ALIGN_DOWN((u64)shadow_src, KMSAN_ORIGIN_SIZE);
> + for (step = 0; step < min(src_slots, dst_slots); step++, i += iter) {
> + KMSAN_WARN_ON(i < 0);
> + shadow = align_shadow_src[i];
> + if (i == 0) {
> + /*
> + * If |src| isn't aligned on KMSAN_ORIGIN_SIZE, don't
> + * look at the first |src % KMSAN_ORIGIN_SIZE| bytes
> + * of the first shadow slot.
> + */
> + skip_bits = ((u64)src % KMSAN_ORIGIN_SIZE) * 8;
> + shadow = (shadow >> skip_bits) << skip_bits;
> + }
> + if (i == src_slots - 1) {
> + /*
> + * If |src + n| isn't aligned on
> + * KMSAN_ORIGIN_SIZE, don't look at the last
> + * |(src + n) % KMSAN_ORIGIN_SIZE| bytes of the
> + * last shadow slot.
> + */
> + skip_bits = (((u64)src + n) % KMSAN_ORIGIN_SIZE) * 8;
> + shadow = (shadow << skip_bits) >> skip_bits;
> + }
> + /*
> + * Overwrite the origin only if the corresponding
> + * shadow is nonempty.
> + */
> + if (origin_src[i] && (origin_src[i] != old_origin) && shadow) {
> + old_origin = origin_src[i];
> + new_origin = kmsan_internal_chain_origin(old_origin);
> + /*
> + * kmsan_internal_chain_origin() may return
> + * NULL, but we don't want to lose the previous
> + * origin value.
> + */
> + if (!new_origin)
> + new_origin = old_origin;
> + }
> + if (shadow)
> + origin_dst[i] = new_origin;
> + else
> + origin_dst[i] = 0;
> + }
> + /*
> + * If dst_slots is greater than src_slots (i.e.
> + * dst_slots == src_slots + 1), there is an extra origin slot at the
> + * beginning or end of the destination buffer, for which we take the
> + * origin from the previous slot.
> + * This is only done if the part of the source shadow corresponding to
> + * that slot is non-zero.
> + *
> + * E.g. if we copy 8 aligned bytes that are marked as uninitialized
> + * and have origins o111 and o222, to an unaligned buffer with offset 1,
> + * these two origins are copied to three origin slots, so one of them
> + * needs to be duplicated, depending on the copy direction (@backwards)
> + *
> + * src shadow: |uuuu|uuuu|....|
> + * src origin: |o111|o222|....|
> + *
> + * backwards = 0:
> + * dst shadow: |.uuu|uuuu|u...|
> + * dst origin: |....|o111|o222| - fill the empty slot with o111
> + * backwards = 1:
> + * dst shadow: |.uuu|uuuu|u...|
> + * dst origin: |o111|o222|....| - fill the empty slot with o222
> + */
> + if (src_slots < dst_slots) {
> + if (backwards) {
> + shadow = align_shadow_src[src_slots - 1];
> + skip_bits = (((u64)dst + n) % KMSAN_ORIGIN_SIZE) * 8;
> + shadow = (shadow << skip_bits) >> skip_bits;
> + if (shadow)
> + /* src_slots > 0, therefore dst_slots is at least 2 */
> + origin_dst[dst_slots - 1] = origin_dst[dst_slots - 2];
> + } else {
> + shadow = align_shadow_src[0];
> + skip_bits = ((u64)dst % KMSAN_ORIGIN_SIZE) * 8;
> + shadow = (shadow >> skip_bits) << skip_bits;
> + if (shadow)
> + origin_dst[0] = origin_dst[1];
> + }
> + }
> +}
> +
> +depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id)
> +{
> + unsigned long entries[3];
> + u32 extra_bits;
> + int depth;
> + bool uaf;
> +
> + if (!id)
> + return id;
> + /*
> + * Make sure we have enough spare bits in |id| to hold the UAF bit and
> + * the chain depth.
> + */
> + BUILD_BUG_ON((1 << STACK_DEPOT_EXTRA_BITS) <= (MAX_CHAIN_DEPTH << 1));
> +
> + extra_bits = stack_depot_get_extra_bits(id);
> + depth = kmsan_depth_from_eb(extra_bits);
> + uaf = kmsan_uaf_from_eb(extra_bits);
> +
> + if (depth >= MAX_CHAIN_DEPTH) {
> + static atomic_long_t kmsan_skipped_origins;
> + long skipped = atomic_long_inc_return(&kmsan_skipped_origins);
> +
> + if (skipped % NUM_SKIPPED_TO_WARN == 0) {
> + pr_warn("not chained %ld origins\n", skipped);
> + dump_stack();
> + kmsan_print_origin(id);
> + }
> + return id;
> + }
> + depth++;
> + extra_bits = kmsan_extra_bits(depth, uaf);
> +
> + entries[0] = KMSAN_CHAIN_MAGIC_ORIGIN;
> + entries[1] = kmsan_save_stack_with_flags(GFP_ATOMIC, 0);
> + entries[2] = id;
> + /*
> + * @entries is a local var in non-instrumented code, so KMSAN does not
> + * know it is initialized. Explicitly unpoison it to avoid false
> + * positives when __stack_depot_save() passes it to instrumented code.
> + */
> + kmsan_internal_unpoison_memory(entries, sizeof(entries), false);
> + return __stack_depot_save(entries, ARRAY_SIZE(entries), extra_bits,
> + GFP_ATOMIC, true);
> +}
> +
> +void kmsan_internal_set_shadow_origin(void *addr, size_t size, int b,
> + u32 origin, bool checked)
> +{
> + u64 address = (u64)addr;
> + void *shadow_start;
> + u32 *origin_start;
> + size_t pad = 0;
> + int i;
> +
> + KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
> + shadow_start = kmsan_get_metadata(addr, KMSAN_META_SHADOW);
> + if (!shadow_start) {
> + /*
> + * kmsan_metadata_is_contiguous() is true, so either all shadow
> + * and origin pages are NULL, or all are non-NULL.
> + */
> + if (checked) {
> + pr_err("%s: not memsetting %ld bytes starting at %px, because the shadow is NULL\n",
> + __func__, size, addr);
> + KMSAN_WARN_ON(true);
> + }
> + return;
> + }
> + __memset(shadow_start, b, size);
> +
> + if (!IS_ALIGNED(address, KMSAN_ORIGIN_SIZE)) {
> + pad = address % KMSAN_ORIGIN_SIZE;
> + address -= pad;
> + size += pad;
> + }
> + size = ALIGN(size, KMSAN_ORIGIN_SIZE);
> + origin_start =
> + (u32 *)kmsan_get_metadata((void *)address, KMSAN_META_ORIGIN);
> +
> + for (i = 0; i < size / KMSAN_ORIGIN_SIZE; i++)
> + origin_start[i] = origin;
> +}
> +
> +struct page *kmsan_vmalloc_to_page_or_null(void *vaddr)
> +{
> + struct page *page;
> +
> + if (!kmsan_internal_is_vmalloc_addr(vaddr) &&
> + !kmsan_internal_is_module_addr(vaddr))
> + return NULL;
> + page = vmalloc_to_page(vaddr);
> + if (pfn_valid(page_to_pfn(page)))
> + return page;
> + else
> + return NULL;
> +}
> +
> +void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
> + int reason)
> +{
> + depot_stack_handle_t cur_origin = 0, new_origin = 0;
> + unsigned long addr64 = (unsigned long)addr;
> + depot_stack_handle_t *origin = NULL;
> + unsigned char *shadow = NULL;
> + int cur_off_start = -1;
> + int i, chunk_size;
> + size_t pos = 0;
> +
> + if (!size)
> + return;
> + KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
> + while (pos < size) {
> + chunk_size = min(size - pos,
> + PAGE_SIZE - ((addr64 + pos) % PAGE_SIZE));
> + shadow = kmsan_get_metadata((void *)(addr64 + pos),
> + KMSAN_META_SHADOW);
> + if (!shadow) {
> + /*
> + * This page is untracked. If there were uninitialized
> + * bytes before, report them.
> + */
> + if (cur_origin) {
> + kmsan_enter_runtime();
> + kmsan_report(cur_origin, addr, size,
> + cur_off_start, pos - 1, user_addr,
> + reason);
> + kmsan_leave_runtime();
> + }
> + cur_origin = 0;
> + cur_off_start = -1;
> + pos += chunk_size;
> + continue;
> + }
> + for (i = 0; i < chunk_size; i++) {
> + if (!shadow[i]) {
> + /*
> + * This byte is unpoisoned. If there were
> + * poisoned bytes before, report them.
> + */
> + if (cur_origin) {
> + kmsan_enter_runtime();
> + kmsan_report(cur_origin, addr, size,
> + cur_off_start, pos + i - 1,
> + user_addr, reason);
> + kmsan_leave_runtime();
> + }
> + cur_origin = 0;
> + cur_off_start = -1;
> + continue;
> + }
> + origin = kmsan_get_metadata((void *)(addr64 + pos + i),
> + KMSAN_META_ORIGIN);
> + KMSAN_WARN_ON(!origin);
> + new_origin = *origin;
> + /*
> + * Encountered new origin - report the previous
> + * uninitialized range.
> + */
> + if (cur_origin != new_origin) {
> + if (cur_origin) {
> + kmsan_enter_runtime();
> + kmsan_report(cur_origin, addr, size,
> + cur_off_start, pos + i - 1,
> + user_addr, reason);
> + kmsan_leave_runtime();
> + }
> + cur_origin = new_origin;
> + cur_off_start = pos + i;
> + }
> + }
> + pos += chunk_size;
> + }
> + KMSAN_WARN_ON(pos != size);
> + if (cur_origin) {
> + kmsan_enter_runtime();
> + kmsan_report(cur_origin, addr, size, cur_off_start, pos - 1,
> + user_addr, reason);
> + kmsan_leave_runtime();
> + }
> +}
> +
> +bool kmsan_metadata_is_contiguous(void *addr, size_t size)
> +{
> + char *cur_shadow = NULL, *next_shadow = NULL, *cur_origin = NULL,
> + *next_origin = NULL;
> + u64 cur_addr = (u64)addr, next_addr = cur_addr + PAGE_SIZE;
> + depot_stack_handle_t *origin_p;
> + bool all_untracked = false;
> +
> + if (!size)
> + return true;
> +
> + /* The whole range belongs to the same page. */
> + if (ALIGN_DOWN(cur_addr + size - 1, PAGE_SIZE) ==
> + ALIGN_DOWN(cur_addr, PAGE_SIZE))
> + return true;
> +
> + cur_shadow = kmsan_get_metadata((void *)cur_addr, /*is_origin*/ false);
> + if (!cur_shadow)
> + all_untracked = true;
> + cur_origin = kmsan_get_metadata((void *)cur_addr, /*is_origin*/ true);
> + if (all_untracked && cur_origin)
> + goto report;
> +
> + for (; next_addr < (u64)addr + size;
> + cur_addr = next_addr, cur_shadow = next_shadow,
> + cur_origin = next_origin, next_addr += PAGE_SIZE) {
> + next_shadow = kmsan_get_metadata((void *)next_addr, false);
> + next_origin = kmsan_get_metadata((void *)next_addr, true);
> + if (all_untracked) {
> + if (next_shadow || next_origin)
> + goto report;
> + if (!next_shadow && !next_origin)
> + continue;
> + }
> + if (((u64)cur_shadow == ((u64)next_shadow - PAGE_SIZE)) &&
> + ((u64)cur_origin == ((u64)next_origin - PAGE_SIZE)))
> + continue;
> + goto report;
> + }
> + return true;
> +
> +report:
> + pr_err("%s: attempting to access two shadow page ranges.\n", __func__);
> + pr_err("Access of size %ld at %px.\n", size, addr);
> + pr_err("Addresses belonging to different ranges: %px and %px\n",
> + (void *)cur_addr, (void *)next_addr);
> + pr_err("page[0].shadow: %px, page[1].shadow: %px\n", cur_shadow,
> + next_shadow);
> + pr_err("page[0].origin: %px, page[1].origin: %px\n", cur_origin,
> + next_origin);
> + origin_p = kmsan_get_metadata(addr, KMSAN_META_ORIGIN);
> + if (origin_p) {
> + pr_err("Origin: %08x\n", *origin_p);
> + kmsan_print_origin(*origin_p);
> + } else {
> + pr_err("Origin: unavailable\n");
> + }
> + return false;
> +}
> +
> +bool kmsan_internal_is_module_addr(void *vaddr)
> +{
> + return ((u64)vaddr >= MODULES_VADDR) && ((u64)vaddr < MODULES_END);
> +}
> +
> +bool kmsan_internal_is_vmalloc_addr(void *addr)
> +{
> + return ((u64)addr >= VMALLOC_START) && ((u64)addr < VMALLOC_END);
> +}
> diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
> new file mode 100644
> index 0000000000000..4ac62fa67a02a
> --- /dev/null
> +++ b/mm/kmsan/hooks.c
> @@ -0,0 +1,66 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN hooks for kernel subsystems.
> + *
> + * These functions handle creation of KMSAN metadata for memory allocations.
> + *
> + * Copyright (C) 2018-2022 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +
> +#include <linux/cacheflush.h>
> +#include <linux/gfp.h>
> +#include <linux/mm.h>
> +#include <linux/mm_types.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +
> +#include "../internal.h"
> +#include "../slab.h"
> +#include "kmsan.h"
> +
> +/*
> + * Instrumented functions shouldn't be called under
> + * kmsan_enter_runtime()/kmsan_leave_runtime(), because this will lead to
> + * skipping effects of functions like memset() inside instrumented code.
> + */
> +
> +/* Functions from kmsan-checks.h follow. */
> +void kmsan_poison_memory(const void *address, size_t size, gfp_t flags)
> +{
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> + kmsan_enter_runtime();
> + /* The users may want to poison/unpoison random memory. */
> + kmsan_internal_poison_memory((void *)address, size, flags,
> + KMSAN_POISON_NOCHECK);
> + kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(kmsan_poison_memory);
> +
> +void kmsan_unpoison_memory(const void *address, size_t size)
> +{
> + unsigned long ua_flags;
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> +
> + ua_flags = user_access_save();
> + kmsan_enter_runtime();
> + /* The users may want to poison/unpoison random memory. */
> + kmsan_internal_unpoison_memory((void *)address, size,
> + KMSAN_POISON_NOCHECK);
> + kmsan_leave_runtime();
> + user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(kmsan_unpoison_memory);
> +
> +void kmsan_check_memory(const void *addr, size_t size)
> +{
> + if (!kmsan_enabled)
> + return;
> + return kmsan_internal_check_memory((void *)addr, size, /*user_addr*/ 0,
> + REASON_ANY);
> +}
> +EXPORT_SYMBOL(kmsan_check_memory);
> diff --git a/mm/kmsan/instrumentation.c b/mm/kmsan/instrumentation.c
> new file mode 100644
> index 0000000000000..fe062d123a76f
> --- /dev/null
> +++ b/mm/kmsan/instrumentation.c
> @@ -0,0 +1,267 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN compiler API.
> + *
> + * This file implements __msan_XXX hooks that Clang inserts into the code
> + * compiled with -fsanitize=kernel-memory.
> + * See Documentation/dev-tools/kmsan.rst for more information on how KMSAN
> + * instrumentation works.
> + *
> + * Copyright (C) 2017-2022 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +
> +#include "kmsan.h"
> +#include <linux/gfp.h>
> +#include <linux/mm.h>
> +#include <linux/uaccess.h>
> +
> +static inline bool is_bad_asm_addr(void *addr, uintptr_t size, bool is_store)
> +{
> + if ((u64)addr < TASK_SIZE)
> + return true;
> + if (!kmsan_get_metadata(addr, KMSAN_META_SHADOW))
> + return true;
> + return false;
> +}
> +
> +static inline struct shadow_origin_ptr
> +get_shadow_origin_ptr(void *addr, u64 size, bool store)
> +{
> + unsigned long ua_flags = user_access_save();
> + struct shadow_origin_ptr ret;
> +
> + ret = kmsan_get_shadow_origin_ptr(addr, size, store);
> + user_access_restore(ua_flags);
> + return ret;
> +}
> +
> +/* Get shadow and origin pointers for a memory load with non-standard size. */
> +struct shadow_origin_ptr __msan_metadata_ptr_for_load_n(void *addr,
> + uintptr_t size)
> +{
> + return get_shadow_origin_ptr(addr, size, /*store*/ false);
> +}
> +EXPORT_SYMBOL(__msan_metadata_ptr_for_load_n);
> +
> +/* Get shadow and origin pointers for a memory store with non-standard size. */
> +struct shadow_origin_ptr __msan_metadata_ptr_for_store_n(void *addr,
> + uintptr_t size)
> +{
> + return get_shadow_origin_ptr(addr, size, /*store*/ true);
> +}
> +EXPORT_SYMBOL(__msan_metadata_ptr_for_store_n);
> +
> +/*
> + * Declare functions that obtain shadow/origin pointers for loads and stores
> + * with fixed size.
> + */
> +#define DECLARE_METADATA_PTR_GETTER(size) \
> + struct shadow_origin_ptr __msan_metadata_ptr_for_load_##size( \
> + void *addr) \
> + { \
> + return get_shadow_origin_ptr(addr, size, /*store*/ false); \
> + } \
> + EXPORT_SYMBOL(__msan_metadata_ptr_for_load_##size); \
> + struct shadow_origin_ptr __msan_metadata_ptr_for_store_##size( \
> + void *addr) \
> + { \
> + return get_shadow_origin_ptr(addr, size, /*store*/ true); \
> + } \
> + EXPORT_SYMBOL(__msan_metadata_ptr_for_store_##size)
> +
> +DECLARE_METADATA_PTR_GETTER(1);
> +DECLARE_METADATA_PTR_GETTER(2);
> +DECLARE_METADATA_PTR_GETTER(4);
> +DECLARE_METADATA_PTR_GETTER(8);
> +
> +/*
> + * Handle a memory store performed by inline assembly. KMSAN conservatively
> + * attempts to unpoison the outputs of asm() directives to prevent false
> + * positives caused by missed stores.
> + */
> +void __msan_instrument_asm_store(void *addr, uintptr_t size)
> +{
> + unsigned long ua_flags;
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> +
> + ua_flags = user_access_save();
> + /*
> + * Most of the accesses are below 32 bytes. The two exceptions so far
> + * are clwb() (64 bytes) and FPU state (512 bytes).
> + * It's unlikely that the assembly will touch more than 512 bytes.
> + */
> + if (size > 512) {
> + WARN_ONCE(1, "assembly store size too big: %ld\n", size);
> + size = 8;
> + }
> + if (is_bad_asm_addr(addr, size, /*is_store*/ true)) {
> + user_access_restore(ua_flags);
> + return;
> + }
> + kmsan_enter_runtime();
> + /* Unpoisoning the memory on best effort. */
> + kmsan_internal_unpoison_memory(addr, size, /*checked*/ false);
> + kmsan_leave_runtime();
> + user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__msan_instrument_asm_store);
> +
> +/* Handle llvm.memmove intrinsic. */
> +void *__msan_memmove(void *dst, const void *src, uintptr_t n)
> +{
> + void *result;
> +
> + result = __memmove(dst, src, n);
> + if (!n)
> + /* Some people call memmove() with zero length. */
> + return result;
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return result;
> +
> + kmsan_internal_memmove_metadata(dst, (void *)src, n);
> +
> + return result;
> +}
> +EXPORT_SYMBOL(__msan_memmove);
> +
> +/* Handle llvm.memcpy intrinsic. */
> +void *__msan_memcpy(void *dst, const void *src, uintptr_t n)
> +{
> + void *result;
> +
> + result = __memcpy(dst, src, n);
> + if (!n)
> + /* Some people call memcpy() with zero length. */
> + return result;
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return result;
> +
> + /* Using memmove instead of memcpy doesn't affect correctness. */
> + kmsan_internal_memmove_metadata(dst, (void *)src, n);
> +
> + return result;
> +}
> +EXPORT_SYMBOL(__msan_memcpy);
> +
> +/* Handle llvm.memset intrinsic. */
> +void *__msan_memset(void *dst, int c, uintptr_t n)
> +{
> + void *result;
> +
> + result = __memset(dst, c, n);
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return result;
> +
> + kmsan_enter_runtime();
> + /*
> + * Clang doesn't pass parameter metadata here, so it is impossible to
> + * use shadow of @c to set up the shadow for @dst.
> + */
> + kmsan_internal_unpoison_memory(dst, n, /*checked*/ false);
> + kmsan_leave_runtime();
> +
> + return result;
> +}
> +EXPORT_SYMBOL(__msan_memset);
> +
> +/*
> + * Create a new origin from an old one. This is done when storing an
> + * uninitialized value to memory. When reporting an error, KMSAN unrolls and
> + * prints the whole chain of stores that preceded the use of this value.
> + */
> +depot_stack_handle_t __msan_chain_origin(depot_stack_handle_t origin)
> +{
> + depot_stack_handle_t ret = 0;
> + unsigned long ua_flags;
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return ret;
> +
> + ua_flags = user_access_save();
> +
> + /* Creating new origins may allocate memory. */
> + kmsan_enter_runtime();
> + ret = kmsan_internal_chain_origin(origin);
> + kmsan_leave_runtime();
> + user_access_restore(ua_flags);
> + return ret;
> +}
> +EXPORT_SYMBOL(__msan_chain_origin);
> +
> +/* Poison a local variable when entering a function. */
> +void __msan_poison_alloca(void *address, uintptr_t size, char *descr)
> +{
> + depot_stack_handle_t handle;
> + unsigned long entries[4];
> + unsigned long ua_flags;
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> +
> + ua_flags = user_access_save();
> + entries[0] = KMSAN_ALLOCA_MAGIC_ORIGIN;
> + entries[1] = (u64)descr;
> + entries[2] = (u64)__builtin_return_address(0);
> + /*
> + * With frame pointers enabled, it is possible to quickly fetch the
> + * second frame of the caller stack without calling the unwinder.
> + * Without them, simply do not bother.
> + */
> + if (IS_ENABLED(CONFIG_UNWINDER_FRAME_POINTER))
> + entries[3] = (u64)__builtin_return_address(1);
> + else
> + entries[3] = 0;
> +
> + /* stack_depot_save() may allocate memory. */
> + kmsan_enter_runtime();
> + handle = stack_depot_save(entries, ARRAY_SIZE(entries), GFP_ATOMIC);
> + kmsan_leave_runtime();
> +
> + kmsan_internal_set_shadow_origin(address, size, -1, handle,
> + /*checked*/ true);
> + user_access_restore(ua_flags);
> +}
> +EXPORT_SYMBOL(__msan_poison_alloca);
> +
> +/* Unpoison a local variable. */
> +void __msan_unpoison_alloca(void *address, uintptr_t size)
> +{
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> +
> + kmsan_enter_runtime();
> + kmsan_internal_unpoison_memory(address, size, /*checked*/ true);
> + kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(__msan_unpoison_alloca);
> +
> +/*
> + * Report that an uninitialized value with the given origin was used in a way
> + * that constituted undefined behavior.
> + */
> +void __msan_warning(u32 origin)
> +{
> + if (!kmsan_enabled || kmsan_in_runtime())
> + return;
> + kmsan_enter_runtime();
> + kmsan_report(origin, /*address*/ 0, /*size*/ 0,
> + /*off_first*/ 0, /*off_last*/ 0, /*user_addr*/ 0,
> + REASON_ANY);
> + kmsan_leave_runtime();
> +}
> +EXPORT_SYMBOL(__msan_warning);
> +
> +/*
> + * At the beginning of an instrumented function, obtain the pointer to
> + * `struct kmsan_context_state` holding the metadata for function parameters.
> + */
> +struct kmsan_context_state *__msan_get_context_state(void)
> +{
> + return &kmsan_get_context()->cstate;
> +}
> +EXPORT_SYMBOL(__msan_get_context_state);
> diff --git a/mm/kmsan/kmsan.h b/mm/kmsan/kmsan.h
> new file mode 100644
> index 0000000000000..bfe38789950a6
> --- /dev/null
> +++ b/mm/kmsan/kmsan.h
> @@ -0,0 +1,183 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Functions used by the KMSAN runtime.
> + *
> + * Copyright (C) 2017-2022 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +
> +#ifndef __MM_KMSAN_KMSAN_H
> +#define __MM_KMSAN_KMSAN_H
> +
> +#include <asm/pgtable_64_types.h>
> +#include <linux/irqflags.h>
> +#include <linux/sched.h>
> +#include <linux/stackdepot.h>
> +#include <linux/stacktrace.h>
> +#include <linux/nmi.h>
> +#include <linux/mm.h>
> +#include <linux/printk.h>
> +
> +#define KMSAN_ALLOCA_MAGIC_ORIGIN 0xabcd0100
> +#define KMSAN_CHAIN_MAGIC_ORIGIN 0xabcd0200
> +
> +#define KMSAN_POISON_NOCHECK 0x0
> +#define KMSAN_POISON_CHECK 0x1
> +#define KMSAN_POISON_FREE 0x2
> +
> +#define KMSAN_ORIGIN_SIZE 4
> +
> +#define KMSAN_STACK_DEPTH 64
> +
> +#define KMSAN_META_SHADOW (false)
> +#define KMSAN_META_ORIGIN (true)
> +
> +extern bool kmsan_enabled;
> +extern int panic_on_kmsan;
> +
> +/*
> + * KMSAN performs a lot of consistency checks that are currently enabled by
> + * default. BUG_ON is normally discouraged in the kernel, unless used for
> + * debugging, but KMSAN itself is a debugging tool, so it makes little sense to
> + * recover if something goes wrong.
> + */
> +#define KMSAN_WARN_ON(cond) \
> + ({ \
> + const bool __cond = WARN_ON(cond); \
> + if (unlikely(__cond)) { \
> + WRITE_ONCE(kmsan_enabled, false); \
> + if (panic_on_kmsan) { \
> + /* Can't call panic() here because */ \
> + /* of uaccess checks.*/ \

space after '.'

> + BUG(); \
> + } \
> + } \
> + __cond; \
> + })
> +
> +/*
> + * A pair of metadata pointers to be returned by the instrumentation functions.
> + */
> +struct shadow_origin_ptr {
> + void *shadow, *origin;
> +};
> +
> +struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *addr, u64 size,
> + bool store);
> +void *kmsan_get_metadata(void *addr, bool is_origin);
> +
> +enum kmsan_bug_reason {
> + REASON_ANY,
> + REASON_COPY_TO_USER,
> + REASON_SUBMIT_URB,
> +};
> +
> +void kmsan_print_origin(depot_stack_handle_t origin);
> +
> +/**
> + * kmsan_report() - Report a use of uninitialized value.
> + * @origin: Stack ID of the uninitialized value.
> + * @address: Address at which the memory access happens.
> + * @size: Memory access size.
> + * @off_first: Offset (from @address) of the first byte to be reported.
> + * @off_last: Offset (from @address) of the last byte to be reported.
> + * @user_addr: When non-NULL, denotes the userspace address to which the kernel
> + * is leaking data.
> + * @reason: Error type from enum kmsan_bug_reason.
> + *
> + * kmsan_report() prints an error message for a consecutive group of bytes
> + * sharing the same origin. If an uninitialized value is used in a comparison,
> + * this function is called once without specifying the addresses. When checking
> + * a memory range, KMSAN may call kmsan_report() multiple times with the same
> + * @address, @size, @user_addr and @reason, but different @off_first and
> + * @off_last corresponding to different @origin values.
> + */
> +void kmsan_report(depot_stack_handle_t origin, void *address, int size,
> + int off_first, int off_last, const void *user_addr,
> + enum kmsan_bug_reason reason);
> +
> +DECLARE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
> +
> +static __always_inline struct kmsan_ctx *kmsan_get_context(void)
> +{
> + return in_task() ? &current->kmsan_ctx : raw_cpu_ptr(&kmsan_percpu_ctx);
> +}
> +
> +/*
> + * When a compiler hook is invoked, it may make a call to instrumented code
> + * and eventually call itself recursively. To avoid that, we protect the
> + * runtime entry points with kmsan_enter_runtime()/kmsan_leave_runtime() and
> + * exit the hook if kmsan_in_runtime() is true.
> + */
> +
> +static __always_inline bool kmsan_in_runtime(void)
> +{
> + if ((hardirq_count() >> HARDIRQ_SHIFT) > 1)
> + return true;
> + return kmsan_get_context()->kmsan_in_runtime;
> +}
> +
> +static __always_inline void kmsan_enter_runtime(void)
> +{
> + struct kmsan_ctx *ctx;
> +
> + ctx = kmsan_get_context();
> + KMSAN_WARN_ON(ctx->kmsan_in_runtime++);
> +}
> +
> +static __always_inline void kmsan_leave_runtime(void)
> +{
> + struct kmsan_ctx *ctx = kmsan_get_context();
> +
> + KMSAN_WARN_ON(--ctx->kmsan_in_runtime);
> +}
> +
> +depot_stack_handle_t kmsan_save_stack(void);
> +depot_stack_handle_t kmsan_save_stack_with_flags(gfp_t flags,
> + unsigned int extra_bits);
> +
> +/*
> + * Pack and unpack the origin chain depth and UAF flag to/from the extra bits
> + * provided by the stack depot.
> + * The UAF flag is stored in the lowest bit, followed by the depth in the upper
> + * bits.
> + * set_dsh_extra_bits() is responsible for clamping the value.
> + */
> +static __always_inline unsigned int kmsan_extra_bits(unsigned int depth,
> + bool uaf)
> +{
> + return (depth << 1) | uaf;
> +}
> +
> +static __always_inline bool kmsan_uaf_from_eb(unsigned int extra_bits)
> +{
> + return extra_bits & 1;
> +}
> +
> +static __always_inline unsigned int kmsan_depth_from_eb(unsigned int extra_bits)
> +{
> + return extra_bits >> 1;
> +}
> +
> +/*
> + * kmsan_internal_ functions are supposed to be very simple and not require the
> + * kmsan_in_runtime() checks.
> + */
> +void kmsan_internal_memmove_metadata(void *dst, void *src, size_t n);
> +void kmsan_internal_poison_memory(void *address, size_t size, gfp_t flags,
> + unsigned int poison_flags);
> +void kmsan_internal_unpoison_memory(void *address, size_t size, bool checked);
> +void kmsan_internal_set_shadow_origin(void *address, size_t size, int b,
> + u32 origin, bool checked);
> +depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id);
> +
> +bool kmsan_metadata_is_contiguous(void *addr, size_t size);
> +void kmsan_internal_check_memory(void *addr, size_t size, const void *user_addr,
> + int reason);
> +bool kmsan_internal_is_module_addr(void *vaddr);
> +bool kmsan_internal_is_vmalloc_addr(void *addr);
> +
> +struct page *kmsan_vmalloc_to_page_or_null(void *vaddr);
> +
> +#endif /* __MM_KMSAN_KMSAN_H */
> diff --git a/mm/kmsan/report.c b/mm/kmsan/report.c
> new file mode 100644
> index 0000000000000..f36fca452e313
> --- /dev/null
> +++ b/mm/kmsan/report.c
> @@ -0,0 +1,211 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN error reporting routines.
> + *
> + * Copyright (C) 2019-2022 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +
> +#include <linux/console.h>
> +#include <linux/moduleparam.h>
> +#include <linux/stackdepot.h>
> +#include <linux/stacktrace.h>
> +#include <linux/uaccess.h>
> +
> +#include "kmsan.h"
> +
> +static DEFINE_SPINLOCK(kmsan_report_lock);
> +#define DESCR_SIZE 128
> +/* Protected by kmsan_report_lock */
> +static char report_local_descr[DESCR_SIZE];
> +int panic_on_kmsan __read_mostly;
> +
> +#ifdef MODULE_PARAM_PREFIX
> +#undef MODULE_PARAM_PREFIX
> +#endif
> +#define MODULE_PARAM_PREFIX "kmsan."
> +module_param_named(panic, panic_on_kmsan, int, 0);
> +
> +/*
> + * Skip internal KMSAN frames.
> + */
> +static int get_stack_skipnr(const unsigned long stack_entries[],
> + int num_entries)
> +{
> + int len, skip;
> + char buf[64];
> +
> + for (skip = 0; skip < num_entries; ++skip) {
> + len = scnprintf(buf, sizeof(buf), "%ps",
> + (void *)stack_entries[skip]);
> +
> + /* Never show __msan_* or kmsan_* functions. */
> + if ((strnstr(buf, "__msan_", len) == buf) ||
> + (strnstr(buf, "kmsan_", len) == buf))
> + continue;
> +
> + /*
> + * No match for runtime functions -- @skip entries to skip to
> + * get to first frame of interest.
> + */
> + break;
> + }
> +
> + return skip;
> +}
> +
> +/*
> + * Currently the descriptions of locals generated by Clang look as follows:
> + * ----local_name@function_name
> + * We want to print only the name of the local, as other information in that
> + * description can be confusing.
> + * The meaningful part of the description is copied to a global buffer to avoid
> + * allocating memory.
> + */
> +static char *pretty_descr(char *descr)
> +{
> + int i, pos = 0, len = strlen(descr);
> +
> + for (i = 0; i < len; i++) {
> + if (descr[i] == '@')
> + break;
> + if (descr[i] == '-')
> + continue;
> + report_local_descr[pos] = descr[i];
> + if (pos + 1 == DESCR_SIZE)
> + break;
> + pos++;
> + }
> + report_local_descr[pos] = 0;
> + return report_local_descr;
> +}
> +
> +void kmsan_print_origin(depot_stack_handle_t origin)
> +{
> + unsigned long *entries = NULL, *chained_entries = NULL;
> + unsigned int nr_entries, chained_nr_entries, skipnr;
> + void *pc1 = NULL, *pc2 = NULL;
> + depot_stack_handle_t head;
> + unsigned long magic;
> + char *descr = NULL;
> +
> + if (!origin)
> + return;
> +
> + while (true) {
> + nr_entries = stack_depot_fetch(origin, &entries);
> + magic = nr_entries ? entries[0] : 0;
> + if ((nr_entries == 4) && (magic == KMSAN_ALLOCA_MAGIC_ORIGIN)) {
> + descr = (char *)entries[1];
> + pc1 = (void *)entries[2];
> + pc2 = (void *)entries[3];
> + pr_err("Local variable %s created at:\n",
> + pretty_descr(descr));
> + if (pc1)
> + pr_err(" %pSb\n", pc1);
> + if (pc2)
> + pr_err(" %pSb\n", pc2);
> + break;
> + }
> + if ((nr_entries == 3) && (magic == KMSAN_CHAIN_MAGIC_ORIGIN)) {
> + head = entries[1];
> + origin = entries[2];
> + pr_err("Uninit was stored to memory at:\n");
> + chained_nr_entries =
> + stack_depot_fetch(head, &chained_entries);
> + kmsan_internal_unpoison_memory(
> + chained_entries,
> + chained_nr_entries * sizeof(*chained_entries),
> + /*checked*/ false);
> + skipnr = get_stack_skipnr(chained_entries,
> + chained_nr_entries);
> + stack_trace_print(chained_entries + skipnr,
> + chained_nr_entries - skipnr, 0);
> + pr_err("\n");
> + continue;
> + }
> + pr_err("Uninit was created at:\n");
> + if (nr_entries) {
> + skipnr = get_stack_skipnr(entries, nr_entries);
> + stack_trace_print(entries + skipnr, nr_entries - skipnr,
> + 0);
> + } else {
> + pr_err("(stack is not available)\n");
> + }
> + break;
> + }
> +}
> +
> +void kmsan_report(depot_stack_handle_t origin, void *address, int size,
> + int off_first, int off_last, const void *user_addr,
> + enum kmsan_bug_reason reason)
> +{
> + unsigned long stack_entries[KMSAN_STACK_DEPTH];
> + int num_stack_entries, skipnr;
> + char *bug_type = NULL;
> + unsigned long flags, ua_flags;
> + bool is_uaf;
> +
> + if (!kmsan_enabled)
> + return;
> + if (!current->kmsan_ctx.allow_reporting)
> + return;
> + if (!origin)
> + return;
> +
> + current->kmsan_ctx.allow_reporting = false;
> + ua_flags = user_access_save();
> + spin_lock_irqsave(&kmsan_report_lock, flags);

I think this might want to be a raw_spin_lock, since the reporting can
be called from any context, including from within other raw_spin_lock'd
critical sections (practically this will only matter in RT kernels).

Also, do you have to do lockdep_off/on() (like kernel/kcsan/report.c
does, see comment there)?
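I.e. roughly the following (untested, only to illustrate the idea):

	-static DEFINE_SPINLOCK(kmsan_report_lock);
	+static DEFINE_RAW_SPINLOCK(kmsan_report_lock);
	 ...
	-	spin_lock_irqsave(&kmsan_report_lock, flags);
	+	raw_spin_lock_irqsave(&kmsan_report_lock, flags);
	+	lockdep_off();
	 ...
	+	lockdep_on();
	-	spin_unlock_irqrestore(&kmsan_report_lock, flags);
	+	raw_spin_unlock_irqrestore(&kmsan_report_lock, flags);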

> + pr_err("=====================================================\n");
> + is_uaf = kmsan_uaf_from_eb(stack_depot_get_extra_bits(origin));
> + switch (reason) {
> + case REASON_ANY:
> + bug_type = is_uaf ? "use-after-free" : "uninit-value";
> + break;
> + case REASON_COPY_TO_USER:
> + bug_type = is_uaf ? "kernel-infoleak-after-free" :
> + "kernel-infoleak";
> + break;
> + case REASON_SUBMIT_URB:
> + bug_type = is_uaf ? "kernel-usb-infoleak-after-free" :
> + "kernel-usb-infoleak";
> + break;
> + }
> +
> + num_stack_entries =
> + stack_trace_save(stack_entries, KMSAN_STACK_DEPTH, 1);
> + skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
> +
> + pr_err("BUG: KMSAN: %s in %pSb\n",
> + bug_type, (void *)stack_entries[skipnr]);
> + stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
> + 0);
> + pr_err("\n");
> +
> + kmsan_print_origin(origin);
> +
> + if (size) {
> + pr_err("\n");
> + if (off_first == off_last)
> + pr_err("Byte %d of %d is uninitialized\n", off_first,
> + size);
> + else
> + pr_err("Bytes %d-%d of %d are uninitialized\n",
> + off_first, off_last, size);
> + }
> + if (address)
> + pr_err("Memory access of size %d starts at %px\n", size,
> + address);
> + if (user_addr && reason == REASON_COPY_TO_USER)
> + pr_err("Data copied to user address %px\n", user_addr);
> + pr_err("\n");
> + dump_stack_print_info(KERN_ERR);
> + pr_err("=====================================================\n");
> + add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
> + spin_unlock_irqrestore(&kmsan_report_lock, flags);
> + if (panic_on_kmsan)
> + panic("kmsan.panic set ...\n");
> + user_access_restore(ua_flags);
> + current->kmsan_ctx.allow_reporting = true;
> +}
> diff --git a/mm/kmsan/shadow.c b/mm/kmsan/shadow.c
> new file mode 100644
> index 0000000000000..de58cfbc55b9d
> --- /dev/null
> +++ b/mm/kmsan/shadow.c
> @@ -0,0 +1,186 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KMSAN shadow implementation.
> + *
> + * Copyright (C) 2017-2022 Google LLC
> + * Author: Alexander Potapenko <[email protected]>
> + *
> + */
> +
> +#include <asm/page.h>
> +#include <asm/pgtable_64_types.h>
> +#include <asm/tlbflush.h>
> +#include <linux/cacheflush.h>
> +#include <linux/memblock.h>
> +#include <linux/mm_types.h>
> +#include <linux/percpu-defs.h>
> +#include <linux/slab.h>
> +#include <linux/smp.h>
> +#include <linux/stddef.h>
> +
> +#include "../internal.h"
> +#include "kmsan.h"
> +
> +#define shadow_page_for(page) ((page)->kmsan_shadow)
> +
> +#define origin_page_for(page) ((page)->kmsan_origin)
> +
> +static void *shadow_ptr_for(struct page *page)
> +{
> + return page_address(shadow_page_for(page));
> +}
> +
> +static void *origin_ptr_for(struct page *page)
> +{
> + return page_address(origin_page_for(page));
> +}
> +
> +static bool page_has_metadata(struct page *page)
> +{
> + return shadow_page_for(page) && origin_page_for(page);
> +}
> +
> +static void set_no_shadow_origin_page(struct page *page)
> +{
> + shadow_page_for(page) = NULL;
> + origin_page_for(page) = NULL;
> +}
> +
> +/*
> + * Dummy load and store pages to be used when the real metadata is unavailable.
> + * There are separate pages for loads and stores, so that every load returns a
> + * zero, and every store doesn't affect other loads.
> + */
> +static char dummy_load_page[PAGE_SIZE] __aligned(PAGE_SIZE);
> +static char dummy_store_page[PAGE_SIZE] __aligned(PAGE_SIZE);
> +
> +/*
> + * Taken from arch/x86/mm/physaddr.h to avoid using an instrumented version.
> + */
> +static int kmsan_phys_addr_valid(unsigned long addr)

int -> bool? (it already deviates from the original by using IS_ENABLED
instead of #ifdef)

> +{
> + if (IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
> + return !(addr >> boot_cpu_data.x86_phys_bits);
> + else
> + return 1;
> +}
> +
> +/*
> + * Taken from arch/x86/mm/physaddr.c to avoid using an instrumented version.
> + */
> +static bool kmsan_virt_addr_valid(void *addr)
> +{
> + unsigned long x = (unsigned long)addr;
> + unsigned long y = x - __START_KERNEL_map;
> +
> + /* use the carry flag to determine if x was < __START_KERNEL_map */
> + if (unlikely(x > y)) {
> + x = y + phys_base;
> +
> + if (y >= KERNEL_IMAGE_SIZE)
> + return false;
> + } else {
> + x = y + (__START_KERNEL_map - PAGE_OFFSET);
> +
> + /* carry flag will be set if starting x was >= PAGE_OFFSET */
> + if ((x > y) || !kmsan_phys_addr_valid(x))
> + return false;
> + }
> +
> + return pfn_valid(x >> PAGE_SHIFT);
> +}

These seem quite x86-specific - to ease eventual porting to other
architectures, it would make sense to introduce <asm/kmsan.h> holding
these two functions (and anything else arch-specific like this that
future ports would need to provide).
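Roughly something like the sketch below (header name, guards and includes
are guesses only; it also folds in the int -> bool change suggested
above):

	/* SPDX-License-Identifier: GPL-2.0 */
	/* arch/x86/include/asm/kmsan.h */
	#ifndef _ASM_X86_KMSAN_H
	#define _ASM_X86_KMSAN_H

	#include <asm/processor.h>
	#include <linux/mmzone.h>

	/*
	 * Taken from arch/x86/mm/physaddr.h to avoid using an instrumented
	 * version.
	 */
	static inline bool kmsan_phys_addr_valid(unsigned long addr)
	{
		if (IS_ENABLED(CONFIG_PHYS_ADDR_T_64BIT))
			return !(addr >> boot_cpu_data.x86_phys_bits);
		return true;
	}

	/* kmsan_virt_addr_valid() would move here as well. */

	#endif /* _ASM_X86_KMSAN_H */

and mm/kmsan/shadow.c would then only need to #include <asm/kmsan.h>.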

> +static unsigned long vmalloc_meta(void *addr, bool is_origin)
> +{
> + unsigned long addr64 = (unsigned long)addr, off;
> +
> + KMSAN_WARN_ON(is_origin && !IS_ALIGNED(addr64, KMSAN_ORIGIN_SIZE));
> + if (kmsan_internal_is_vmalloc_addr(addr)) {
> + off = addr64 - VMALLOC_START;
> + return off + (is_origin ? KMSAN_VMALLOC_ORIGIN_START :
> + KMSAN_VMALLOC_SHADOW_START);
> + }
> + if (kmsan_internal_is_module_addr(addr)) {
> + off = addr64 - MODULES_VADDR;
> + return off + (is_origin ? KMSAN_MODULES_ORIGIN_START :
> + KMSAN_MODULES_SHADOW_START);
> + }
> + return 0;
> +}
> +
> +static struct page *virt_to_page_or_null(void *vaddr)
> +{
> + if (kmsan_virt_addr_valid(vaddr))
> + return virt_to_page(vaddr);
> + else
> + return NULL;
> +}
> +
> +struct shadow_origin_ptr kmsan_get_shadow_origin_ptr(void *address, u64 size,
> + bool store)
> +{
> + struct shadow_origin_ptr ret;
> + void *shadow;
> +
> + /*
> + * Even if we redirect this memory access to the dummy page, it will
> + * go out of bounds.
> + */
> + KMSAN_WARN_ON(size > PAGE_SIZE);
> +
> + if (!kmsan_enabled || kmsan_in_runtime())
> + goto return_dummy;
> +
> + KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(address, size));
> + shadow = kmsan_get_metadata(address, KMSAN_META_SHADOW);
> + if (!shadow)
> + goto return_dummy;
> +
> + ret.shadow = shadow;
> + ret.origin = kmsan_get_metadata(address, KMSAN_META_ORIGIN);
> + return ret;
> +
> +return_dummy:
> + if (store) {
> + /* Ignore this store. */
> + ret.shadow = dummy_store_page;
> + ret.origin = dummy_store_page;
> + } else {
> + /* This load will return zero. */
> + ret.shadow = dummy_load_page;
> + ret.origin = dummy_load_page;
> + }
> + return ret;
> +}
> +
> +/*
> + * Obtain the shadow or origin pointer for the given address, or NULL if there's
> + * none. The caller must check the return value for being non-NULL if needed.
> + * The return value of this function should not depend on whether we're in the
> + * runtime or not.
> + */
> +void *kmsan_get_metadata(void *address, bool is_origin)
> +{
> + u64 addr = (u64)address, pad, off;
> + struct page *page;
> + void *ret;
> +
> + if (is_origin && !IS_ALIGNED(addr, KMSAN_ORIGIN_SIZE)) {
> + pad = addr % KMSAN_ORIGIN_SIZE;
> + addr -= pad;
> + }
> + address = (void *)addr;
> + if (kmsan_internal_is_vmalloc_addr(address) ||
> + kmsan_internal_is_module_addr(address))
> + return (void *)vmalloc_meta(address, is_origin);
> +
> + page = virt_to_page_or_null(address);
> + if (!page)
> + return NULL;
> + if (!page_has_metadata(page))
> + return NULL;
> + off = addr % PAGE_SIZE;
> +
> + ret = (is_origin ? origin_ptr_for(page) : shadow_ptr_for(page)) + off;

Just return this and avoid 'ret'?
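I.e. just

	return (is_origin ? origin_ptr_for(page) : shadow_ptr_for(page)) + off;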

> + return ret;
> +}
> diff --git a/scripts/Makefile.kmsan b/scripts/Makefile.kmsan
> new file mode 100644
> index 0000000000000..9793591f9855c
> --- /dev/null
> +++ b/scripts/Makefile.kmsan
> @@ -0,0 +1 @@

Makefile.kmsan needs an SPDX-License-Identifier.

> +export CFLAGS_KMSAN := -fsanitize=kernel-memory
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 9f69ecdd7977a..49e6e57fdf4c8 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -157,6 +157,15 @@ _c_flags += $(if $(patsubst n%,, \
> endif
> endif
>
> +ifeq ($(CONFIG_KMSAN),y)
> +_c_flags += $(if $(patsubst n%,, \
> + $(KMSAN_SANITIZE_$(basetarget).o)$(KMSAN_SANITIZE)y), \
> + $(CFLAGS_KMSAN))
> +_c_flags += $(if $(patsubst n%,, \
> + $(KMSAN_ENABLE_CHECKS_$(basetarget).o)$(KMSAN_ENABLE_CHECKS)y), \
> + , -mllvm -msan-disable-checks=1)
> +endif
> +
> ifeq ($(CONFIG_UBSAN),y)
> _c_flags += $(if $(patsubst n%,, \
> $(UBSAN_SANITIZE_$(basetarget).o)$(UBSAN_SANITIZE)$(CONFIG_UBSAN_SANITIZE_ALL)), \
> --
> 2.36.0.rc2.479.g8af0fa9b8e-goog
>

2022-05-02 23:37:42

by kernel test robot

[permalink] [raw]
Subject: [x86] d216de19c8: kernel-selftests.x86.ioperm_32.fail



Greetings,

FYI, we noticed the following commit (built with gcc-11):

commit: d216de19c8dd97fb6b0eac84fce4362489a61b2e ("[PATCH v3 05/46] x86: asm: instrument usercopy in get_user() and __put_user_size()")
url: https://github.com/intel-lab-lkp/linux/commits/Alexander-Potapenko/Add-KernelMemorySanitizer-infrastructure/20220427-004851
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 203d8919a9eda5d1bc68ac3cd7637588334c9dc1
patch link: https://lore.kernel.org/linux-mm/[email protected]

in testcase: kernel-selftests
version: kernel-selftests-x86_64-f6559bea-1_20220425
with following parameters:

group: x86
ucode: 0xec

test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt


on test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz with 32G memory

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>


We also observed other tests that fail on this commit but pass on its parent:

c30e163fc48e6944  d216de19c8dd97fb6b0eac84fce
----------------  ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
              :6         100%          6:6    kmsg.segfault_at_ip_sp_error
              :6         100%          6:6    kernel-selftests.x86.fsgsbase_restore_32.fail
              :6         100%          6:6    kernel-selftests.x86.fsgsbase_restore_64.fail
              :6         100%          6:6    kernel-selftests.x86.ioperm_32.fail
              :6         100%          6:6    kernel-selftests.x86.iopl_32.fail
              :6         100%          6:6    kernel-selftests.x86.ptrace_syscall_32.fail
              :6         100%          6:6    kernel-selftests.x86.ptrace_syscall_64.fail
              :6         100%          6:6    kernel-selftests.x86.syscall_numbering_64.fail



# selftests: x86: iopl_32
# iopl_32: sched_setaffinity to CPU 0: Invalid argument
not ok 7 selftests: x86: iopl_32 # exit=1
# selftests: x86: ioperm_32
# ioperm_32: sched_setaffinity to CPU 0: Invalid argument
not ok 8 selftests: x86: ioperm_32 # exit=1

....

# selftests: x86: fsgsbase_restore_32
# fsgsbase_restore_32: PTRACE_GETREGS: Input/output error
# Setting up a segment
# segment base address = 0xf7fb4000
# using LDT slot 0
# [OK] The segment points to the right place.
# Child FS=0x7
# Tracer: redirecting tracee to tracee_zap_segment()
not ok 12 selftests: x86: fsgsbase_restore_32 # exit=1

....

# selftests: x86: ptrace_syscall_32
# ptrace_syscall_32: PTRACE_SETREGS: Input/output error
# [RUN] Check int80 return regs
# [OK] getpid() preserves regs
# [OK] kill(getpid(), SIGUSR1) preserves regs
# [RUN] Check AT_SYSINFO return regs
# [OK] getpid() preserves regs
# [OK] kill(getpid(), SIGUSR1) preserves regs
# [RUN] ptrace-induced syscall restart
# [RUN] SYSEMU
# [OK] Initial nr and args are correct
# [RUN] Restart the syscall (ip = 0xf7edb549)
not ok 22 selftests: x86: ptrace_syscall_32 # exit=1

....

# selftests: x86: fsgsbase_restore_64
# fsgsbase_restore_64: PTRACE_GETREGS: Input/output error
# Setting up a segment
# segment base address = 0x4075c000
# using LDT slot 0
# [OK] The segment points to the right place.
# Child GS=0x7, GSBASE=0x4075c000
# Tracer: redirecting tracee to tracee_zap_segment()
not ok 34 selftests: x86: fsgsbase_restore_64 # exit=1

....

# selftests: x86: syscall_numbering_64
# [RUN] Checking for x32 by calling x32 getpid()
# [INFO] x32 is not supported
# [RUN] Running tests without ptrace...
# [RUN] Checking system calls with msb = 0 (0x0)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 0:0 returned 0 as expected
# [OK] x64 syscall 0:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 0:19 returned 0 as expected
# [OK] x64 syscall 0:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 0:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 0:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 0:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 0:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 0:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 0:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = 1 (0x1)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 1:0 returned 0 as expected
# [OK] x64 syscall 1:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 1:19 returned 0 as expected
# [OK] x64 syscall 1:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 1:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 1:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 1:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 1:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 1:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 1:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = -1 (0xffffffff)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall -1:0 returned 0 as expected
# [OK] x64 syscall -1:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall -1:19 returned 0 as expected
# [OK] x64 syscall -1:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls -1:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall -1:-1 returned -ENOSYS as expected
# [OK] x64 syscalls -1:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls -1:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls -1:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls -1:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = 1073741823 (0x3fffffff)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 1073741823:0 returned 0 as expected
# [OK] x64 syscall 1073741823:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 1073741823:19 returned 0 as expected
# [OK] x64 syscall 1073741823:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 1073741823:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 1073741823:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741823:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741823:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 1073741823:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 1073741823:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = 1073741824 (0x40000000)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 1073741824:0 returned 0 as expected
# [OK] x64 syscall 1073741824:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 1073741824:19 returned 0 as expected
# [OK] x64 syscall 1073741824:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 1073741824:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 1073741824:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741824:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741824:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 1073741824:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 1073741824:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = 1073741823 (0x3fffffff)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 1073741823:0 returned 0 as expected
# [OK] x64 syscall 1073741823:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 1073741823:19 returned 0 as expected
# [OK] x64 syscall 1073741823:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 1073741823:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 1073741823:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741823:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741823:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 1073741823:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 1073741823:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = -1073741824 (0xc0000000)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall -1073741824:0 returned 0 as expected
# [OK] x64 syscall -1073741824:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall -1073741824:19 returned 0 as expected
# [OK] x64 syscall -1073741824:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls -1073741824:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall -1073741824:-1 returned -ENOSYS as expected
# [OK] x64 syscalls -1073741824:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls -1073741824:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls -1073741824:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls -1073741824:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = 2147483647 (0x7fffffff)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 2147483647:0 returned 0 as expected
# [OK] x64 syscall 2147483647:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 2147483647:19 returned 0 as expected
# [OK] x64 syscall 2147483647:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 2147483647:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 2147483647:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 2147483647:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 2147483647:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 2147483647:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 2147483647:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = -2147483648 (0x80000000)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall -2147483648:0 returned 0 as expected
# [OK] x64 syscall -2147483648:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall -2147483648:19 returned 0 as expected
# [OK] x64 syscall -2147483648:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls -2147483648:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall -2147483648:-1 returned -ENOSYS as expected
# [OK] x64 syscalls -2147483648:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls -2147483648:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls -2147483648:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls -2147483648:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = -2147483647 (0x80000001)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall -2147483647:0 returned 0 as expected
# [OK] x64 syscall -2147483647:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall -2147483647:19 returned 0 as expected
# [OK] x64 syscall -2147483647:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls -2147483647:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall -2147483647:-1 returned -ENOSYS as expected
# [OK] x64 syscalls -2147483647:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls -2147483647:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls -2147483647:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls -2147483647:0..999 returned -ENOSYS as expected
# [RUN] Running tests under ptrace: just stop, no data read
# [RUN] Checking system calls with msb = 0 (0x0)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 0:0 returned 0 as expected
# [OK] x64 syscall 0:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 0:19 returned 0 as expected
# [OK] x64 syscall 0:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 0:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 0:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 0:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 0:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 0:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 0:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = 1 (0x1)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 1:0 returned 0 as expected
# [OK] x64 syscall 1:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 1:19 returned 0 as expected
# [OK] x64 syscall 1:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 1:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 1:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 1:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 1:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 1:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 1:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = -1 (0xffffffff)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall -1:0 returned 0 as expected
# [OK] x64 syscall -1:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall -1:19 returned 0 as expected
# [OK] x64 syscall -1:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls -1:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall -1:-1 returned -ENOSYS as expected
# [OK] x64 syscalls -1:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls -1:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls -1:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls -1:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = 1073741823 (0x3fffffff)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 1073741823:0 returned 0 as expected
# [OK] x64 syscall 1073741823:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 1073741823:19 returned 0 as expected
# [OK] x64 syscall 1073741823:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 1073741823:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 1073741823:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741823:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741823:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 1073741823:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 1073741823:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = 1073741824 (0x40000000)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 1073741824:0 returned 0 as expected
# [OK] x64 syscall 1073741824:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 1073741824:19 returned 0 as expected
# [OK] x64 syscall 1073741824:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 1073741824:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 1073741824:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741824:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741824:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 1073741824:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 1073741824:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = 1073741823 (0x3fffffff)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 1073741823:0 returned 0 as expected
# [OK] x64 syscall 1073741823:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 1073741823:19 returned 0 as expected
# [OK] x64 syscall 1073741823:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 1073741823:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 1073741823:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741823:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741823:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 1073741823:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 1073741823:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = -1073741824 (0xc0000000)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall -1073741824:0 returned 0 as expected
# [OK] x64 syscall -1073741824:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall -1073741824:19 returned 0 as expected
# [OK] x64 syscall -1073741824:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls -1073741824:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall -1073741824:-1 returned -ENOSYS as expected
# [OK] x64 syscalls -1073741824:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls -1073741824:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls -1073741824:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls -1073741824:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = 2147483647 (0x7fffffff)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 2147483647:0 returned 0 as expected
# [OK] x64 syscall 2147483647:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 2147483647:19 returned 0 as expected
# [OK] x64 syscall 2147483647:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 2147483647:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 2147483647:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 2147483647:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 2147483647:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 2147483647:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 2147483647:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = -2147483648 (0x80000000)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall -2147483648:0 returned 0 as expected
# [OK] x64 syscall -2147483648:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall -2147483648:19 returned 0 as expected
# [OK] x64 syscall -2147483648:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls -2147483648:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall -2147483648:-1 returned -ENOSYS as expected
# [OK] x64 syscalls -2147483648:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls -2147483648:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls -2147483648:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls -2147483648:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = -2147483647 (0x80000001)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall -2147483647:0 returned 0 as expected
# [OK] x64 syscall -2147483647:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall -2147483647:19 returned 0 as expected
# [OK] x64 syscall -2147483647:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls -2147483647:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall -2147483647:-1 returned -ENOSYS as expected
# [OK] x64 syscalls -2147483647:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls -2147483647:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls -2147483647:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls -2147483647:0..999 returned -ENOSYS as expected
# [RUN] Running tests under ptrace: only getregs
# [RUN] Checking system calls with msb = 0 (0x0)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 0:0 returned 0 as expected
# [OK] x64 syscall 0:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 0:19 returned 0 as expected
# [OK] x64 syscall 0:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 0:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 0:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 0:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 0:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 0:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 0:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = 1 (0x1)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 1:0 returned 0 as expected
# [OK] x64 syscall 1:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 1:19 returned 0 as expected
# [OK] x64 syscall 1:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 1:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 1:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 1:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 1:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 1:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 1:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = -1 (0xffffffff)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall -1:0 returned 0 as expected
# [OK] x64 syscall -1:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall -1:19 returned 0 as expected
# [OK] x64 syscall -1:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls -1:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall -1:-1 returned -ENOSYS as expected
# [OK] x64 syscalls -1:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls -1:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls -1:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls -1:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = 1073741823 (0x3fffffff)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 1073741823:0 returned 0 as expected
# [OK] x64 syscall 1073741823:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 1073741823:19 returned 0 as expected
# [OK] x64 syscall 1073741823:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 1073741823:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 1073741823:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741823:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741823:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 1073741823:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 1073741823:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = 1073741824 (0x40000000)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 1073741824:0 returned 0 as expected
# [OK] x64 syscall 1073741824:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 1073741824:19 returned 0 as expected
# [OK] x64 syscall 1073741824:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 1073741824:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 1073741824:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741824:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741824:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 1073741824:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 1073741824:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = 1073741823 (0x3fffffff)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 1073741823:0 returned 0 as expected
# [OK] x64 syscall 1073741823:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 1073741823:19 returned 0 as expected
# [OK] x64 syscall 1073741823:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 1073741823:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 1073741823:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741823:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 1073741823:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 1073741823:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 1073741823:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = -1073741824 (0xc0000000)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall -1073741824:0 returned 0 as expected
# [OK] x64 syscall -1073741824:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall -1073741824:19 returned 0 as expected
# [OK] x64 syscall -1073741824:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls -1073741824:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall -1073741824:-1 returned -ENOSYS as expected
# [OK] x64 syscalls -1073741824:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls -1073741824:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls -1073741824:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls -1073741824:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = 2147483647 (0x7fffffff)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall 2147483647:0 returned 0 as expected
# [OK] x64 syscall 2147483647:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall 2147483647:19 returned 0 as expected
# [OK] x64 syscall 2147483647:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls 2147483647:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall 2147483647:-1 returned -ENOSYS as expected
# [OK] x64 syscalls 2147483647:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls 2147483647:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls 2147483647:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls 2147483647:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = -2147483648 (0x80000000)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall -2147483648:0 returned 0 as expected
# [OK] x64 syscall -2147483648:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall -2147483648:19 returned 0 as expected
# [OK] x64 syscall -2147483648:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls -2147483648:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall -2147483648:-1 returned -ENOSYS as expected
# [OK] x64 syscalls -2147483648:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls -2147483648:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls -2147483648:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls -2147483648:0..999 returned -ENOSYS as expected
# [RUN] Checking system calls with msb = -2147483647 (0x80000001)
# [RUN] Checking some common syscalls as 64 bit
# [OK] x64 syscall -2147483647:0 returned 0 as expected
# [OK] x64 syscall -2147483647:1 returned 0 as expected
# [RUN] Checking some 64-bit only syscalls as 64 bit
# [OK] x64 syscall -2147483647:19 returned 0 as expected
# [OK] x64 syscall -2147483647:20 returned 0 as expected
# [RUN] Checking out of range system calls
# [OK] x32 syscalls -2147483647:-64..-2 returned -ENOSYS as expected
# [OK] x32 syscall -2147483647:-1 returned -ENOSYS as expected
# [OK] x64 syscalls -2147483647:1073741760..1073741823 returned -ENOSYS as expected
# [OK] x64 syscalls -2147483647:-64..-1 returned -ENOSYS as expected
# [OK] x32 syscalls -2147483647:1073741759..1073741822 returned -ENOSYS as expected
# [RUN] Checking for absence of x32 system calls
# [OK] x32 syscalls -2147483647:0..999 returned -ENOSYS as expected
# [RUN] Running tests under ptrace: getregs, unmodified setregs
# [RUN] Checking system calls with msb = 0 (0x0)
# [RUN] Checking some common syscalls as 64 bit
#
not ok 38 selftests: x86: syscall_numbering_64 # TIMEOUT 300 seconds

....

# selftests: x86: ptrace_syscall_64
# ptrace_syscall_64: PTRACE_SETREGS: Input/output error
# [RUN] Check int80 return regs
# [OK] getpid() preserves regs
# [OK] kill(getpid(), SIGUSR1) preserves regs
# [RUN] ptrace-induced syscall restart
# [RUN] SYSEMU
# [OK] Initial nr and args are correct
# [RUN] Restart the syscall (ip = 0x7f424c815989)
not ok 42 selftests: x86: ptrace_syscall_64 # exit=1



To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
(No filename) (37.22 kB)
config-5.18.0-rc1-00009-gd216de19c8dd (169.66 kB)
job-script (6.05 kB)
dmesg.xz (41.67 kB)
kernel-selftests (94.95 kB)
job.yaml (5.00 kB)
reproduce (150.00 B)

2022-05-03 00:30:22

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

On Wed, Apr 27, 2022 at 3:32 PM Thomas Gleixner <[email protected]> wrote:
>
> On Tue, Apr 26 2022 at 18:42, Alexander Potapenko wrote:
>
> Can you please use 'entry:' as prefix. Slapping kmsan in front of
> everything does not really make sense.
Sure, will do.

> > Replace instrumentation_begin() with instrumentation_begin_with_regs()
> > to let KMSAN handle the non-instrumented code and unpoison pt_regs
> > passed from the instrumented part.
>
> That should be:
>
> from the non-instrumented part
> or
> passed to the instrumented part
>
> right?

That should be "from the non-instrumented part", you are right.

> > --- a/kernel/entry/common.c
> > +++ b/kernel/entry/common.c
> > @@ -23,7 +23,7 @@ static __always_inline void __enter_from_user_mode(struct pt_regs *regs)
> > CT_WARN_ON(ct_state() != CONTEXT_USER);
> > user_exit_irqoff();
> >
> > - instrumentation_begin();
> > + instrumentation_begin_with_regs(regs);
>
> I can see what you are trying to do, but this will end up doing the same
> thing over and over. Let's just look at a syscall.
>
> __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
> {
> ...
> nr = syscall_enter_from_user_mode(regs, nr)
>
> __enter_from_user_mode(regs)
> .....
> instrumentation_begin_with_regs(regs);
> ....
>
> instrumentation_begin_with_regs(regs);
> ....
>
> instrumentation_begin_with_regs(regs);
>
> if (!do_syscall_x64(regs, nr) && !do_syscall_x32(regs, nr) && nr != -1) {
> /* Invalid system call, but still a system call. */
> regs->ax = __x64_sys_ni_syscall(regs);
> }
>
> instrumentation_end();
>
> syscall_exit_to_user_mode(regs);
> instrumentation_begin_with_regs(regs);
> __syscall_exit_to_user_mode_work(regs);
> instrumentation_end();
> __exit_to_user_mode();
>
> That means you memset state four times and unpoison regs four times. I'm
> not sure whether that's desired.

Regarding the regs, you are right. It should be enough to unpoison the
regs at idtentry prologue instead.
I tried that initially, but IIRC it required patching each of the
DEFINE_IDTENTRY_XXX macros, which already use instrumentation_begin().
This decision can probably be revisited.

As for the state, what we are doing here is still not enough, although
it appears to work.

Every time an instrumented function calls another function, it sets up
the metadata for the function arguments in the per-task struct
kmsan_context_state.
Similarly, every instrumented function expects its caller to put the
metadata into that structure.
Now, if a non-instrumented function (e.g. every `noinstr` function)
calls an instrumented one (which happens inside the
instrumentation_begin()/instrumentation_end() region), nobody sets up
the state for that instrumented function, so it may report false
positives when accessing its arguments, if there are leftover poisoned
values in the state.

To overcome this problem, ideally we need to wipe kmsan_context_state
every time a call from the non-instrumented function occurs.
But this cannot be done automatically exactly because we cannot
instrument the named function :)

We therefore apply an approximation, wiping the state at the point of
the first transition between instrumented and non-instrumented code.
Because poison values are generally rare, and instrumented regions
tend to be short, it is unlikely that further calls from the same
non-instrumented function will result in false positives.
Yet it is not completely impossible, so wiping the state for the
second/third etc. time won't hurt.
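
To make this more concrete, here is a rough sketch of the per-task state
and of the wipe being discussed (field names and sizes below are
illustrative only, not the exact layout used by the runtime):

/* Illustrative sketch, not the actual KMSAN definitions. */
#define KMSAN_PARAM_SIZE	800	/* size chosen for illustration */
#define KMSAN_RETVAL_SIZE	800

struct kmsan_context_state {
	char param_tls[KMSAN_PARAM_SIZE];	/* shadow of call arguments   */
	char retval_tls[KMSAN_RETVAL_SIZE];	/* shadow of the return value */
	/* ... origin counterparts omitted ... */
};

/* An all-zero shadow means "fully initialized", so wiping the state
 * guarantees that leftover poison from earlier calls cannot produce
 * false positives once a non-instrumented caller enters instrumented
 * code. */
static void kmsan_wipe_context_state(struct kmsan_context_state *state)
{
	memset(state, 0, sizeof(*state));
}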

>
> instrumentation_begin()/end() are not really suitable IMO. They were
> added to allow objtool to validate that nothing escapes into
> instrumentable code unless annotated accordingly.

An alternative to this would be adding some extra code unpoisoning the
state to every non-instrumented function that contains an instrumented
region.
That code would have to precede the first instrumentation_begin()
anyway, so I thought it would be reasonable to piggyback on the
existing annotation.

>
> Thanks,
>
> tglx



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


2022-05-03 01:24:13

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

Alexander,

On Mon, May 02 2022 at 19:00, Alexander Potapenko wrote:
> On Wed, Apr 27, 2022 at 3:32 PM Thomas Gleixner <[email protected]> wrote:
>> > --- a/kernel/entry/common.c
>> > +++ b/kernel/entry/common.c
>> > @@ -23,7 +23,7 @@ static __always_inline void __enter_from_user_mode(struct pt_regs *regs)
>> > CT_WARN_ON(ct_state() != CONTEXT_USER);
>> > user_exit_irqoff();
>> >
>> > - instrumentation_begin();
>> > + instrumentation_begin_with_regs(regs);
>>
>> I can see what you are trying to do, but this will end up doing the same
>> thing over and over. Let's just look at a syscall.
>>
>> __visible noinstr void do_syscall_64(struct pt_regs *regs, int nr)
>> {
>> ...
>> nr = syscall_enter_from_user_mode(regs, nr)
>>
>> __enter_from_user_mode(regs)
>> .....
>> instrumentation_begin_with_regs(regs);
>> ....
>>
>> instrumentation_begin_with_regs(regs);
>> ....
>>
>> instrumentation_begin_with_regs(regs);
>>
>> if (!do_syscall_x64(regs, nr) && !do_syscall_x32(regs, nr) && nr != -1) {
>> /* Invalid system call, but still a system call. */
>> regs->ax = __x64_sys_ni_syscall(regs);
>> }
>>
>> instrumentation_end();
>>
>> syscall_exit_to_user_mode(regs);
>> instrumentation_begin_with_regs(regs);
>> __syscall_exit_to_user_mode_work(regs);
>> instrumentation_end();
>> __exit_to_user_mode();
>>
>> That means you memset state four times and unpoison regs four times. I'm
>> not sure whether that's desired.
>
> Regarding the regs, you are right. It should be enough to unpoison the
> regs at idtentry prologue instead.
> I tried that initially, but IIRC it required patching each of the
> DEFINE_IDTENTRY_XXX macros, which already use instrumentation_begin().

Exactly 4 instances :)

> This decision can probably be revisited.

It has to be revisited because the whole thing is incomplete if this is
not addressed.

> As for the state, what we are doing here is still not enough, although
> it appears to work.
>
> Every time an instrumented function calls another function, it sets up
> the metadata for the function arguments in the per-task struct
> kmsan_context_state.
> Similarly, every instrumented function expects its caller to put the
> metadata into that structure.
> Now, if a non-instrumented function (e.g. every `noinstr` function)
> calls an instrumented one (which happens inside the
> instrumentation_begin()/instrumentation_end() region), nobody sets up
> the state for that instrumented function, so it may report false
> positives when accessing its arguments, if there are leftover poisoned
> values in the state.
>
> To overcome this problem, ideally we need to wipe kmsan_context_state
> every time a call from the non-instrumented function occurs.
> But this cannot be done automatically exactly because we cannot
> instrument the named function :)
>
> We therefore apply an approximation, wiping the state at the point of
> the first transition between instrumented and non-instrumented code.
> Because poison values are generally rare, and instrumented regions
> tend to be short, it is unlikely that further calls from the same
> non-instrumented function will result in false positives.
> Yet it is not completely impossible, so wiping the state for the
> second/third etc. time won't hurt.

Understood. But if I understand you correctly:

> Similarly, every instrumented function expects its caller to put the
> metadata into that structure.

then

instrumentation_begin();
foo(fargs...);
bar(bargs...);
instrumentation_end();

is a source of potential false positives because the state is not
guaranteed to be correct, neither for foo() nor for bar(), even if you
wipe the state in instrumentation_begin(), right?

This approximation approach smells fishy and it's inevitably going to be
a constant source of 'add yet another kmsan annotation/fixup' patches,
which I'm not interested in at all.

As this needs compiler support anyway, then why not doing the obvious:

#define noinstr \
.... __kmsan_conditional

#define instrumentation_begin() \
..... __kmsan_cond_begin

#define instrumentation_end() \
__kmsan_cond_end .......

and let the compiler stick whatever is required into that code section
between instrumentation_begin() and instrumentation_end()?

That's not violating any of the noinstr constraints at all. In fact we
allow _any_ instrumentation to be placed between these two points. We
have tracepoints there today.

We could also allow breakpoints, kprobes or whatever, but handling this
at that granularity level for a production kernel is just overkill and
the code in those instrumentable sections is usually not that
interesting as it's mostly function calls.

But if the compiler converts

instrumentation_begin();
foo(fargs...);
bar(bargs...);
instrumentation_end();

to

instrumentation_begin();
kmsan_instr_begin_magic();
kmsan_magic(fargs...);
foo(fargs...);
kmsan_magic(bargs...);
bar(bargs...);
kmsan_instr_end_magic();
instrumentation_end();

for the kmsan case and leaves anything outside of these sections alone,
then you have:

- a minimal code change
- the best possible coverage
- the least false positive crap to chase and annotate

IOW, a solution which is solid and future proof.

I'm all for making use of advanced instrumentation, validation and
debugging features, but this mindset of 'make the code comply to what
the tool of today provides' is fundamentally wrong. Tools have to
provide value to the programmer and not the other way round.

Yes, it's more work on the tooling side, but the tooling side is mostly
a one time effort while chasing the false positives is a long term
nightmare.

Thanks,

tglx

2022-05-05 19:28:29

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

On Tue, May 3, 2022 at 12:00 AM Thomas Gleixner <[email protected]> wrote:
>
> Alexander,

First of all, thanks a lot for the comments, those are greatly appreciated!
I tried to revert this patch and the previous one ("kmsan:
instrumentation.h: add instrumentation_begin_with_regs()") and
reimplement unpoisoning pt_regs without breaking into
instrumentation_begin(), see below.

> >
> > Regarding the regs, you are right. It should be enough to unpoison the
> > regs at idtentry prologue instead.
> > I tried that initially, but IIRC it required patching each of the
> > DEFINE_IDTENTRY_XXX macros, which already use instrumentation_begin().
>
> Exactly 4 instances :)
>

Not really, I had to add a call to `kmsan_unpoison_memory(regs,
sizeof(*regs));` to the following places in
arch/x86/include/asm/idtentry.h:
- DEFINE_IDTENTRY()
- DEFINE_IDTENTRY_ERRORCODE()
- DEFINE_IDTENTRY_RAW()
- DEFINE_IDTENTRY_RAW_ERRORCODE()
- DEFINE_IDTENTRY_IRQ()
- DEFINE_IDTENTRY_SYSVEC()
- DEFINE_IDTENTRY_SYSVEC_SIMPLE()
- DEFINE_IDTENTRY_DF()

, but even that wasn't enough. For some reason I also had to unpoison
pt_regs directly in
DEFINE_IDTENTRY_SYSVEC(sysvec_apic_timer_interrupt) and
DEFINE_IDTENTRY_IRQ(common_interrupt).
In the latter case, this could have been caused by
asm_common_interrupt being entered from irq_entries_start(), but I am
not sure what is so special about sysvec_apic_timer_interrupt().

Ideally, it would be great to find that single point where pt_regs are
set up before being passed to all IDT entries.
I used to do that by inserting calls to kmsan_unpoison_memory right
into arch/x86/entry/entry_64.S
(https://github.com/google/kmsan/commit/3b0583f45f74f3a09f4c7e0e0588169cef918026),
but that required storing/restoring all GP registers. Maybe there's a
better way?


>
> then
>
> instrumentation_begin();
> foo(fargs...);
> bar(bargs...);
> instrumentation_end();
>
> is a source of potential false positives because the state is not
> guaranteed to be correct, neither for foo() nor for bar(), even if you
> wipe the state in instrumentation_begin(), right?

Yes, this is right.

> This approximation approach smells fishy and it's inevitably going to be
> a constant source of 'add yet another kmsan annotation/fixup' patches,
> which I'm not interested in at all.
>
> As this needs compiler support anyway, then why not doing the obvious:
>
> #define noinstr \
> .... __kmsan_conditional
>
> #define instrumentation_begin() \
> ..... __kmsan_cond_begin
>
> #define instrumentation_end() \
> __kmsan_cond_end .......
>
> and let the compiler stick whatever is required into that code section
> between instrumentation_begin() and instrumentation_end()?

We define noinstr as
__attribute__((disable_sanitizer_instrumentation))
(https://llvm.org/docs/LangRef.html#:~:text=disable_sanitizer_instrumentation),
which means no instrumentation will be applied to the annotated
function.
Changing that behavior by adding subregions that can be instrumented
sounds questionable.
C also doesn't have good syntactic means to define these subregions -
perhaps some __xxx_begin()/__xxx_end() intrinsics would work, but they
would require both compile-time and run-time validation.

Fortunately, I don't think we need to insert extra instrumentation
into instrumentation_begin()/instrumentation_end() regions.

What I have in mind is adding a bool flag to kmsan_context_state, that
the instrumentation sets to true before the function call.
When entering an instrumented function, KMSAN would check that flag
and set it to false, so that the context state can be only used once.
If a function is called from another instrumented function, the
context state is properly set up, and there is nothing to worry about.
If it is called from non-instrumented code (either noinstr or the
skipped files that have KMSAN_SANITIZE:=n), KMSAN would detect that
and wipe the context state before use.
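
A minimal sketch of that idea (all names below are hypothetical, and the
flag is assumed to be a new member of kmsan_context_state) could look
roughly like this; the compiler-emitted prologue of every instrumented
function would call the check before reading its argument shadow:

/* Hypothetical sketch: instrumented callers set the flag right before
 * making a call, instrumented callees consume it before touching the
 * state. Assumes a new member "bool filled_by_instrumented_code" in
 * struct kmsan_context_state. */
static void kmsan_check_context_state(struct kmsan_context_state *state)
{
	if (!state->filled_by_instrumented_code) {
		/* The caller was not instrumented, so the argument shadow
		 * is stale: wipe it so it reads as fully initialized. */
		memset(state, 0, sizeof(*state));
	}
	/* Consume the flag so the state can only be used once. */
	state->filled_by_instrumented_code = false;
}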

By the way, I've noticed that at least for now (with pt_regs
unpoisoning performed in IDT entries) the problem with false positives
in noinstr code is entirely gone, so maybe we don't even have to
bother.

> Yes, it's more work on the tooling side, but the tooling side is mostly
> a one time effort while chasing the false positives is a long term
> nightmare.

Well said.

> Thanks,
>
> tglx



--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


2022-05-09 03:49:16

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

On Thu, May 5, 2022 at 11:56 PM Thomas Gleixner <[email protected]> wrote:
>
> Alexander,
>
> On Thu, May 05 2022 at 20:04, Alexander Potapenko wrote:
> > On Tue, May 3, 2022 at 12:00 AM Thomas Gleixner <[email protected]> wrote:
> >> > Regarding the regs, you are right. It should be enough to unpoison the
> >> > regs at idtentry prologue instead.
> >> > I tried that initially, but IIRC it required patching each of the
> >> > DEFINE_IDTENTRY_XXX macros, which already use instrumentation_begin().
> >>
> >> Exactly 4 instances :)
> >>
> >
> > Not really, I had to add a call to `kmsan_unpoison_memory(regs,
> > sizeof(*regs));` to the following places in
> > arch/x86/include/asm/idtentry.h:
> > - DEFINE_IDTENTRY()
> > - DEFINE_IDTENTRY_ERRORCODE()
> > - DEFINE_IDTENTRY_RAW()
> > - DEFINE_IDTENTRY_RAW_ERRORCODE()
> > - DEFINE_IDTENTRY_IRQ()
> > - DEFINE_IDTENTRY_SYSVEC()
> > - DEFINE_IDTENTRY_SYSVEC_SIMPLE()
> > - DEFINE_IDTENTRY_DF()
> >
> > , but even that wasn't enough. For some reason I also had to unpoison
> > pt_regs directly in
> > DEFINE_IDTENTRY_SYSVEC(sysvec_apic_timer_interrupt) and
> > DEFINE_IDTENTRY_IRQ(common_interrupt).
> > In the latter case, this could have been caused by
> > asm_common_interrupt being entered from irq_entries_start(), but I am
> > not sure what is so special about sysvec_apic_timer_interrupt().
> >
> > Ideally, it would be great to find that single point where pt_regs are
> > set up before being passed to all IDT entries.
> > I used to do that by inserting calls to kmsan_unpoison_memory right
> > into arch/x86/entry/entry_64.S
> > (https://github.com/google/kmsan/commit/3b0583f45f74f3a09f4c7e0e0588169cef918026),
> > but that required storing/restoring all GP registers. Maybe there's a
> > better way?
>
> Yes. Something like this should cover all exceptions and syscalls before
> anything instrumentable can touch @regs. Anything up to those points is
> either off-limit for instrumentation or does not deal with @regs.
>
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -24,6 +24,7 @@ static __always_inline void __enter_from
> user_exit_irqoff();
>
> instrumentation_begin();
> + unpoison(regs);
> trace_hardirqs_off_finish();
> instrumentation_end();
> }
> @@ -352,6 +353,7 @@ noinstr irqentry_state_t irqentry_enter(
> lockdep_hardirqs_off(CALLER_ADDR0);
> rcu_irq_enter();
> instrumentation_begin();
> + unpoison(regs);
> trace_hardirqs_off_finish();
> instrumentation_end();
>
> @@ -367,6 +369,7 @@ noinstr irqentry_state_t irqentry_enter(
> */
> lockdep_hardirqs_off(CALLER_ADDR0);
> instrumentation_begin();
> + unpoison(regs);
> rcu_irq_enter_check_tick();
> trace_hardirqs_off_finish();
> instrumentation_end();
> @@ -452,6 +455,7 @@ irqentry_state_t noinstr irqentry_nmi_en
> rcu_nmi_enter();
>
> instrumentation_begin();
> + unpoison(regs);
> trace_hardirqs_off_finish();
> ftrace_nmi_enter();
> instrumentation_end();
>
> As I said: 4 places :)

These four instances still do not look sufficient.
Right now I am seeing e.g. reports with the following stack trace:

BUG: KMSAN: uninit-value in irqtime_account_process_tick+0x255/0x580
kernel/sched/cputime.c:382
irqtime_account_process_tick+0x255/0x580 kernel/sched/cputime.c:382
account_process_tick+0x98/0x450 kernel/sched/cputime.c:476
update_process_times+0xe4/0x3e0 kernel/time/timer.c:1786
tick_sched_handle kernel/time/tick-sched.c:243
tick_sched_timer+0x83e/0x9e0 kernel/time/tick-sched.c:1473
__run_hrtimer+0x518/0xe50 kernel/time/hrtimer.c:1685
__hrtimer_run_queues kernel/time/hrtimer.c:1749
hrtimer_interrupt+0x838/0x15a0 kernel/time/hrtimer.c:1811
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086
__sysvec_apic_timer_interrupt+0x1ae/0x680 arch/x86/kernel/apic/apic.c:1103
sysvec_apic_timer_interrupt+0x95/0xc0 arch/x86/kernel/apic/apic.c:1097
...
(uninit creation stack trace is irrelevant here, because it is some
random value from the stack)

sysvec_apic_timer_interrupt() receives struct pt_regs from
uninstrumented code, so regs can be partially uninitialized.
They are not passed down the call stack directly, but are instead
saved by set_irq_regs() in sysvec_apic_timer_interrupt() and loaded by
get_irq_regs() in tick_sched_timer().

The remaining false positives can be fixed by unpoisoning the
registers in set_irq_regs():

static inline struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
{
	struct pt_regs *old_regs;
+	kmsan_unpoison_memory(new_regs, sizeof(*new_regs));

	old_regs = __this_cpu_read(__irq_regs);
	__this_cpu_write(__irq_regs, new_regs);

Does that sound viable? Is it correct to assume that set_irq_regs() is
always called for registers received from non-instrumented code?

(It seems that just unpoisoning registers in set_irq_regs() is not
enough, i.e. we still need to do what you suggest in
kernel/entry/common.c)
--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg


2022-05-09 04:02:13

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

On Fri, May 6, 2022 at 6:14 PM Thomas Gleixner <[email protected]> wrote:
>
> On Fri, May 06 2022 at 16:52, Alexander Potapenko wrote:
> > On Thu, May 5, 2022 at 11:56 PM Thomas Gleixner <[email protected]> wrote:
> >> @@ -452,6 +455,7 @@ irqentry_state_t noinstr irqentry_nmi_en
> >> rcu_nmi_enter();
> >>
> >> instrumentation_begin();
> >> + unpoison(regs);
> >> trace_hardirqs_off_finish();
> >> ftrace_nmi_enter();
> >> instrumentation_end();
> >>
> >> As I said: 4 places :)
> >
> > These four instances still do not look sufficient.
> > Right now I am seeing e.g. reports with the following stack trace:
> >
> > BUG: KMSAN: uninit-value in irqtime_account_process_tick+0x255/0x580
> > kernel/sched/cputime.c:382
> > irqtime_account_process_tick+0x255/0x580 kernel/sched/cputime.c:382
> > account_process_tick+0x98/0x450 kernel/sched/cputime.c:476
> > update_process_times+0xe4/0x3e0 kernel/time/timer.c:1786
> > tick_sched_handle kernel/time/tick-sched.c:243
> > tick_sched_timer+0x83e/0x9e0 kernel/time/tick-sched.c:1473
> > __run_hrtimer+0x518/0xe50 kernel/time/hrtimer.c:1685
> > __hrtimer_run_queues kernel/time/hrtimer.c:1749
> > hrtimer_interrupt+0x838/0x15a0 kernel/time/hrtimer.c:1811
> > local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086
> > __sysvec_apic_timer_interrupt+0x1ae/0x680 arch/x86/kernel/apic/apic.c:1103
> > sysvec_apic_timer_interrupt+0x95/0xc0 arch/x86/kernel/apic/apic.c:1097
> > ...
> > (uninit creation stack trace is irrelevant here, because it is some
> > random value from the stack)
> >
> > sysvec_apic_timer_interrupt() receives struct pt_regs from
> > uninstrumented code, so regs can be partially uninitialized.
> > They are not passed down the call stack directly, but are instead
> > saved by set_irq_regs() in sysvec_apic_timer_interrupt() and loaded by
> > get_irq_regs() in tick_sched_timer().
>
> sysvec_apic_timer_interrupt() invokes irqentry_enter() _before_
> set_irq_regs() and irqentry_enter() unpoisons @reg.
>
> Confused...

As far as I can tell in this case sysvec_apic_timer_interrupt() is
called by the following code in arch/x86/kernel/idt.c:

INTG(LOCAL_TIMER_VECTOR, asm_sysvec_apic_timer_interrupt),

, which does not use IDTENTRY_SYSVEC framework and thus does not call
irqentry_enter().

I guess handling those will require wrapping every interrupt gate into
a function that performs register unpoisoning?

By the way, if it helps, I think we don't necessarily have to call
kmsan_unpoison_memory() from within the
instrumentation_begin()/instrumentation_end() region?
We could move the call to the beginning of irqentry_enter(), removing
unnecessary duplication.

2022-05-09 05:39:23

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

On Fri, May 06 2022 at 16:52, Alexander Potapenko wrote:
> On Thu, May 5, 2022 at 11:56 PM Thomas Gleixner <[email protected]> wrote:
>> @@ -452,6 +455,7 @@ irqentry_state_t noinstr irqentry_nmi_en
>> rcu_nmi_enter();
>>
>> instrumentation_begin();
>> + unpoison(regs);
>> trace_hardirqs_off_finish();
>> ftrace_nmi_enter();
>> instrumentation_end();
>>
>> As I said: 4 places :)
>
> These four instances still do not look sufficient.
> Right now I am seeing e.g. reports with the following stack trace:
>
> BUG: KMSAN: uninit-value in irqtime_account_process_tick+0x255/0x580
> kernel/sched/cputime.c:382
> irqtime_account_process_tick+0x255/0x580 kernel/sched/cputime.c:382
> account_process_tick+0x98/0x450 kernel/sched/cputime.c:476
> update_process_times+0xe4/0x3e0 kernel/time/timer.c:1786
> tick_sched_handle kernel/time/tick-sched.c:243
> tick_sched_timer+0x83e/0x9e0 kernel/time/tick-sched.c:1473
> __run_hrtimer+0x518/0xe50 kernel/time/hrtimer.c:1685
> __hrtimer_run_queues kernel/time/hrtimer.c:1749
> hrtimer_interrupt+0x838/0x15a0 kernel/time/hrtimer.c:1811
> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086
> __sysvec_apic_timer_interrupt+0x1ae/0x680 arch/x86/kernel/apic/apic.c:1103
> sysvec_apic_timer_interrupt+0x95/0xc0 arch/x86/kernel/apic/apic.c:1097
> ...
> (uninit creation stack trace is irrelevant here, because it is some
> random value from the stack)
>
> sysvec_apic_timer_interrupt() receives struct pt_regs from
> uninstrumented code, so regs can be partially uninitialized.
> They are not passed down the call stack directly, but are instead
> saved by set_irq_regs() in sysvec_apic_timer_interrupt() and loaded by
> get_irq_regs() in tick_sched_timer().

sysvec_apic_timer_interrupt() invokes irqentry_enter() _before_
set_irq_regs() and irqentry_enter() unpoisons @reg.

Confused...





2022-05-09 06:46:25

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

Alexander,

On Thu, May 05 2022 at 20:04, Alexander Potapenko wrote:
> On Tue, May 3, 2022 at 12:00 AM Thomas Gleixner <[email protected]> wrote:
>> > Regarding the regs, you are right. It should be enough to unpoison the
>> > regs at idtentry prologue instead.
>> > I tried that initially, but IIRC it required patching each of the
>> > DEFINE_IDTENTRY_XXX macros, which already use instrumentation_begin().
>>
>> Exactly 4 instances :)
>>
>
> Not really, I had to add a call to `kmsan_unpoison_memory(regs,
> sizeof(*regs));` to the following places in
> arch/x86/include/asm/idtentry.h:
> - DEFINE_IDTENTRY()
> - DEFINE_IDTENTRY_ERRORCODE()
> - DEFINE_IDTENTRY_RAW()
> - DEFINE_IDTENTRY_RAW_ERRORCODE()
> - DEFINE_IDTENTRY_IRQ()
> - DEFINE_IDTENTRY_SYSVEC()
> - DEFINE_IDTENTRY_SYSVEC_SIMPLE()
> - DEFINE_IDTENTRY_DF()
>
> , but even that wasn't enough. For some reason I also had to unpoison
> pt_regs directly in
> DEFINE_IDTENTRY_SYSVEC(sysvec_apic_timer_interrupt) and
> DEFINE_IDTENTRY_IRQ(common_interrupt).
> In the latter case, this could have been caused by
> asm_common_interrupt being entered from irq_entries_start(), but I am
> not sure what is so special about sysvec_apic_timer_interrupt().
>
> Ideally, it would be great to find that single point where pt_regs are
> set up before being passed to all IDT entries.
> I used to do that by inserting calls to kmsan_unpoison_memory right
> into arch/x86/entry/entry_64.S
> (https://github.com/google/kmsan/commit/3b0583f45f74f3a09f4c7e0e0588169cef918026),
> but that required storing/restoring all GP registers. Maybe there's a
> better way?

Yes. Something like this should cover all exceptions and syscalls before
anything instrumentable can touch @regs. Anything up to those points is
either off-limit for instrumentation or does not deal with @regs.

--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -24,6 +24,7 @@ static __always_inline void __enter_from
 	user_exit_irqoff();
 
 	instrumentation_begin();
+	unpoison(regs);
 	trace_hardirqs_off_finish();
 	instrumentation_end();
 }
@@ -352,6 +353,7 @@ noinstr irqentry_state_t irqentry_enter(
 	lockdep_hardirqs_off(CALLER_ADDR0);
 	rcu_irq_enter();
 	instrumentation_begin();
+	unpoison(regs);
 	trace_hardirqs_off_finish();
 	instrumentation_end();
 
@@ -367,6 +369,7 @@ noinstr irqentry_state_t irqentry_enter(
 	 */
 	lockdep_hardirqs_off(CALLER_ADDR0);
 	instrumentation_begin();
+	unpoison(regs);
 	rcu_irq_enter_check_tick();
 	trace_hardirqs_off_finish();
 	instrumentation_end();
@@ -452,6 +455,7 @@ irqentry_state_t noinstr irqentry_nmi_en
 	rcu_nmi_enter();
 
 	instrumentation_begin();
+	unpoison(regs);
 	trace_hardirqs_off_finish();
 	ftrace_nmi_enter();
 	instrumentation_end();

As I said: 4 places :)

> Fortunately, I don't think we need to insert extra instrumentation
> into instrumentation_begin()/instrumentation_end() regions.
>
> What I have in mind is adding a bool flag to kmsan_context_state, that
> the instrumentation sets to true before the function call.
> When entering an instrumented function, KMSAN would check that flag
> and set it to false, so that the context state can be only used once.
> If a function is called from another instrumented function, the
> context state is properly set up, and there is nothing to worry about.
> If it is called from non-instrumented code (either noinstr or the
> skipped files that have KMSAN_SANITIZE:=n), KMSAN would detect that
> and wipe the context state before use.

Sounds good.

Thanks,

tglx

2022-05-09 07:15:43

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

On Fri, May 06 2022 at 19:41, Alexander Potapenko wrote:
> On Fri, May 6, 2022 at 6:14 PM Thomas Gleixner <[email protected]> wrote:
>> sysvec_apic_timer_interrupt() invokes irqentry_enter() _before_
>> set_irq_regs() and irqentry_enter() unpoisons @reg.
>>
>> Confused...
>
> As far as I can tell in this case sysvec_apic_timer_interrupt() is
> called by the following code in arch/x86/kernel/idt.c:
>
> INTG(LOCAL_TIMER_VECTOR, asm_sysvec_apic_timer_interrupt),
>
> , which does not use IDTENTRY_SYSVEC framework and thus does not call
> irqentry_enter().

asm_sysvec_apic_timer_interrupt != sysvec_apic_timer_interrupt

arch/x86/kernel/apic/apic.c:
DEFINE_IDTENTRY_SYSVEC(sysvec_apic_timer_interrupt)
{
....

#define DEFINE_IDTENTRY_SYSVEC(func)				\
static void __##func(struct pt_regs *regs);			\
								\
__visible noinstr void func(struct pt_regs *regs)		\
{								\
	irqentry_state_t state = irqentry_enter(regs);		\
	....
	__##func (regs);					\
	....
}								\
								\
static noinline void __##func(struct pt_regs *regs)

So it goes through that code path _before_ the actual implementation
which does set_irq_regs() is reached.

The callchain is:

asm_sysvec_apic_timer_interrupt                 <- ASM entry in gate
  sysvec_apic_timer_interrupt(regs)             <- noinstr C entry point
    irqentry_enter(regs)                        <- unpoisons @reg
    __sysvec_apic_timer_interrupt(regs)         <- the actual handler
      set_irq_regs(regs)                        <- stores regs
      local_apic_timer_interrupt()
        ...
        tick_handler()                          <- One of the 4 variants
          regs = get_irq_regs();                <- retrieves regs
          update_process_times(user_tick = user_mode(regs))
            account_process_tick(user_tick)
              irqtime_account_process_tick(user_tick)
                line 382: } else if (user_tick) {   <- KMSAN complains

I'm even more confused now.

> I guess handling those will require wrapping every interrupt gate into
> a function that performs register unpoisoning?

No, guessing does not help here.

The gates point to the ASM entry point, which then invokes the C entry
point. All C entry points use a DEFINE_IDTENTRY variant.

Some of the DEFINE_IDTENTRY_* C entry points are not doing anything in
the macro, but the C function either invokes irqentry_enter() or
irqentry_nmi_enter() open coded _before_ invoking any instrumentable
function. So the unpoisoning of @regs in these functions should tell
KMSAN that @regs or something derived from @regs are not some random
uninitialized values.

There should be no difference between unpoisoning @regs in
irqentry_enter() or in set_irq_regs(), right?

If so, then the problem is definitely _not_ the idt entry code.

> By the way, if it helps, I think we don't necessarily have to call
> kmsan_unpoison_memory() from within the
> instrumentation_begin()/instrumentation_end() region?
> We could move the call to the beginning of irqentry_enter(), removing
> unnecessary duplication.

We could, but then you need to mark unpoison_memory() noinstr too and you
have to add the unpoison into the syscall code. No win and irrelevant to
the problem at hand.

Thanks,

tglx



2022-05-09 17:01:03

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

> The callchain is:
>
> asm_sysvec_apic_timer_interrupt <- ASM entry in gate
> sysvec_apic_timer_interrupt(regs) <- noinstr C entry point
> irqentry_enter(regs) <- unpoisons @reg
> __sysvec_apic_timer_interrupt(regs) <- the actual handler
> set_irq_regs(regs) <- stores regs
> local_apic_timer_interrupt()
> ...
> tick_handler() <- One of the 4 variants
> regs = get_irq_regs(); <- retrieves regs
> update_process_times(user_tick = user_mode(regs))
> account_process_tick(user_tick)
> irqtime_account_process_tick(user_tick)
> line 382: } else if { user_tick } <- KMSAN complains
>
> I'm even more confused now.

Ok, I think I know what's going on.

Indeed, calling kmsan_unpoison_memory() in irqentry_enter() was
supposed to be enough, but we have code in kmsan_unpoison_memory() (as
well as other runtime functions) that checks for kmsan_in_runtime()
and bails out to prevent potential recursion if KMSAN code starts
calling itself.

kmsan_in_runtime() is implemented as follows:

==============================================
static __always_inline bool kmsan_in_runtime(void)
{
	if ((hardirq_count() >> HARDIRQ_SHIFT) > 1)
		return true;
	return kmsan_get_context()->kmsan_in_runtime;
}
==============================================
(see the code here:
https://lore.kernel.org/lkml/[email protected]/#Z31mm:kmsan:kmsan.h)

If we are running in the task context (in_task()==true),
kmsan_get_context() returns a per-task `struct kmsan_ctx *`.
If `in_task()==false` and `hardirq_count()>>HARDIRQ_SHIFT==1`, it
returns a per-CPU one.
Otherwise kmsan_in_runtime() is considered true to avoid dealing with
nested interrupts.

So in the case when `hardirq_count()>>HARDIRQ_SHIFT` is greater than
1, kmsan_in_runtime() becomes a no-op, which leads to false positives.

The solution I currently have in mind is to provide a specialized
version of kmsan_unpoison_memory() for entry.c, which will not perform
the reentrancy checks.
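
Something along these lines (an untested sketch; the function name and
the exact internal helper are provisional):

==============================================
/*
 * Unpoison the registers saved at an entry point without the usual
 * kmsan_in_runtime() reentrancy check.
 */
void kmsan_unpoison_entry_regs(const struct pt_regs *regs)
{
	unsigned long ua_flags;

	if (!kmsan_enabled)
		return;

	ua_flags = user_access_save();
	/* Internal helper from this series: zeroes shadow and origins. */
	kmsan_internal_unpoison_memory((void *)regs, sizeof(*regs),
				       /*checked*/ false);
	user_access_restore(ua_flags);
}
==============================================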

> > I guess handling those will require wrapping every interrupt gate into
> > a function that performs register unpoisoning?
>
> No, guessing does not help here.
>
> The gates point to the ASM entry point, which then invokes the C entry
> point. All C entry points use a DEFINE_IDTENTRY variant.

Thanks for the explanation. I previously thought there were two
different entry points, one with asm_ and one without, that ended up
calling the same code.

> Some of the DEFINE_IDTENTRY_* C entry points are not doing anything in
> the macro, but the C function either invokes irqentry_enter() or
> irqentry_nmi_enter() open coded _before_ invoking any instrumentable
> function. So the unpoisoning of @regs in these functions should tell
> KMSAN that @regs or something derived from @regs are not some random
> uninitialized values.
>
> There should be no difference between unpoisoning @regs in
> irqentry_enter() or in set_irq_regs(), right?
>
> If so, then the problem is definitely _not_ the idt entry code.
>
> > By the way, if it helps, I think we don't necessarily have to call
> > kmsan_unpoison_memory() from within the
> > instrumentation_begin()/instrumentation_end() region?
> > We could move the call to the beginning of irqentry_enter(), removing
> > unnecessary duplication.
>
> We could, but then you need to mark unpoison_memory() noinstr too and you
> have to add the unpoison into the syscall code. No win and irrelevant to
> the problem at hand.

Makes sense, thank you!

> Thanks,
>
> tglx
>
>



2022-05-09 17:01:12

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

On Mon, May 9, 2022 at 6:50 PM Alexander Potapenko <[email protected]> wrote:
>
> > The callchain is:
> >
> > asm_sysvec_apic_timer_interrupt <- ASM entry in gate
> > sysvec_apic_timer_interrupt(regs) <- noinstr C entry point
> > irqentry_enter(regs) <- unpoisons @reg
> > __sysvec_apic_timer_interrupt(regs) <- the actual handler
> > set_irq_regs(regs) <- stores regs
> > local_apic_timer_interrupt()
> > ...
> > tick_handler() <- One of the 4 variants
> > regs = get_irq_regs(); <- retrieves regs
> > update_process_times(user_tick = user_mode(regs))
> > account_process_tick(user_tick)
> > irqtime_account_process_tick(user_tick)
> > line 382: } else if { user_tick } <- KMSAN complains
> >
> > I'm even more confused now.
>
> Ok, I think I know what's going on.
>
> Indeed, calling kmsan_unpoison_memory() in irqentry_enter() was
> supposed to be enough, but we have code in kmsan_unpoison_memory() (as
> well as other runtime functions) that checks for kmsan_in_runtime()
> and bails out to prevent potential recursion if KMSAN code starts
> calling itself.
>
> kmsan_in_runtime() is implemented as follows:
>
> ==============================================
> static __always_inline bool kmsan_in_runtime(void)
> {
> if ((hardirq_count() >> HARDIRQ_SHIFT) > 1)
> return true;
> return kmsan_get_context()->kmsan_in_runtime;
> }
> ==============================================
> (see the code here:
> https://lore.kernel.org/lkml/[email protected]/#Z31mm:kmsan:kmsan.h)
>
> If we are running in the task context (in_task()==true),
> kmsan_get_context() returns a per-task `struct *kmsan_ctx`.
> If `in_task()==false` and `hardirq_count()>>HARDIRQ_SHIFT==1`, it
> returns a per-CPU one.
> Otherwise kmsan_in_runtime() is considered true to avoid dealing with
> nested interrupts.
>
> So in the case when `hardirq_count()>>HARDIRQ_SHIFT` is greater than
> 1, kmsan_in_runtime() becomes a no-op, which leads to false positives.
Should be "kmsan_unpoison_memory() becomes a no-op..."

2022-05-09 19:15:23

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

On Mon, May 09 2022 at 18:50, Alexander Potapenko wrote:
> Indeed, calling kmsan_unpoison_memory() in irqentry_enter() was
> supposed to be enough, but we have code in kmsan_unpoison_memory() (as
> well as other runtime functions) that checks for kmsan_in_runtime()
> and bails out to prevent potential recursion if KMSAN code starts
> calling itself.
>
> kmsan_in_runtime() is implemented as follows:
>
> ==============================================
> static __always_inline bool kmsan_in_runtime(void)
> {
> if ((hardirq_count() >> HARDIRQ_SHIFT) > 1)
> return true;
> return kmsan_get_context()->kmsan_in_runtime;
> }
> ==============================================
> (see the code here:
> https://lore.kernel.org/lkml/[email protected]/#Z31mm:kmsan:kmsan.h)
>
> If we are running in the task context (in_task()==true),
> kmsan_get_context() returns a per-task `struct *kmsan_ctx`.
> If `in_task()==false` and `hardirq_count()>>HARDIRQ_SHIFT==1`, it
> returns a per-CPU one.
> Otherwise kmsan_in_runtime() is considered true to avoid dealing with
> nested interrupts.
>
> So in the case when `hardirq_count()>>HARDIRQ_SHIFT` is greater than
> 1, kmsan_in_runtime() becomes a no-op, which leads to false positives.

But that'd only be > 1 when there is a nested interrupt, which is not the
case. Interrupt handlers keep interrupts disabled. The last exception from
that rule was some legacy IDE driver which is gone by now.

So no, not a good explanation either.

Thanks,

tglx

2022-05-13 06:16:56

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

On Mon, May 9, 2022 at 9:09 PM Thomas Gleixner <[email protected]> wrote:
>
> On Mon, May 09 2022 at 18:50, Alexander Potapenko wrote:
> > Indeed, calling kmsan_unpoison_memory() in irqentry_enter() was
> > supposed to be enough, but we have code in kmsan_unpoison_memory() (as
> > well as other runtime functions) that checks for kmsan_in_runtime()
> > and bails out to prevent potential recursion if KMSAN code starts
> > calling itself.
> >
> > kmsan_in_runtime() is implemented as follows:
> >
> > ==============================================
> > static __always_inline bool kmsan_in_runtime(void)
> > {
> > if ((hardirq_count() >> HARDIRQ_SHIFT) > 1)
> > return true;
> > return kmsan_get_context()->kmsan_in_runtime;
> > }
> > ==============================================
> > (see the code here:
> > https://lore.kernel.org/lkml/[email protected]/#Z31mm:kmsan:kmsan.h)
> >
> > If we are running in the task context (in_task()==true),
> > kmsan_get_context() returns a per-task `struct *kmsan_ctx`.
> > If `in_task()==false` and `hardirq_count()>>HARDIRQ_SHIFT==1`, it
> > returns a per-CPU one.
> > Otherwise kmsan_in_runtime() is considered true to avoid dealing with
> > nested interrupts.
> >
> > So in the case when `hardirq_count()>>HARDIRQ_SHIFT` is greater than
> > 1, kmsan_in_runtime() becomes a no-op, which leads to false positives.
>
> But, that'd only > 1 when there is a nested interrupt, which is not the
> case. Interrupt handlers keep interrupts disabled. The last exception from
> that rule was some legacy IDE driver which is gone by now.

That's good to know, then we probably don't need this hardirq_count()
check anymore.

> So no, not a good explanation either.

After looking deeper I see that unpoisoning was indeed skipped because
kmsan_in_runtime() returned true, but I was wrong about the root
cause.
The problem was not caused by a nested hardirq, but rather by the fact
that the KMSAN hook in irqentry_enter() was called with in_task()==1.

Roughly speaking, T0 was running some code in the task context, then it
started executing KMSAN instrumentation and entered the runtime by
setting current->kmsan_ctx.kmsan_in_runtime.
Then an IRQ kicked in and started calling
asm_sysvec_apic_timer_interrupt() => sysvec_apic_timer_interrupt(regs)
=> irqentry_enter(regs) - but at that point in_task() was still true,
therefore kmsan_unpoison_memory() became a no-op.

As far as I can see, it is irq_enter_rcu() that makes in_task() return
0 by incrementing the preempt count in __irq_enter_raw(), so our
unpoisoning can only work if we perform it after we enter the IRQ
context.
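
To make that concrete, the context selection boils down to roughly the
following (simplified sketch; see mm/kmsan/kmsan.h in the series for the
exact definition):

==============================================
static __always_inline struct kmsan_ctx *kmsan_get_context(void)
{
	/* Task context: per-task state; otherwise: per-CPU state. */
	return in_task() ? &current->kmsan_ctx : raw_cpu_ptr(&kmsan_percpu_ctx);
}
==============================================

Since irqentry_enter() runs before the preempt count is incremented,
in_task() is still true there, so kmsan_get_context() picks the task
context whose kmsan_in_runtime flag T0 has already set, and
kmsan_unpoison_memory() bails out.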

I think the best that can be done here is (as suggested above) to
provide some kmsan_unpoison_pt_regs() function that will only be
called from the entry points and won't be doing reentrancy checks.
It should be safe, because unpoisoning boils down to calculating
shadow/origin addresses and calling memset() on them, no instrumented
code will be involved.

We could try to figure out the places in idtentry code where normal
kmsan_unpoison_memory() can be called in IRQ context, but as far as I
can see it will depend on the type of the entry point.

Another way to deal with the problem is to not rely on in_task(), but
rather use some per-cpu counter in irqentry_enter()/irqentry_exit() to
figure out whether we are in IRQ code already.
However, this is only possible if irqentry_enter() itself guarantees that
the execution cannot be rescheduled to another CPU - is that the case?
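
For illustration, such a counter could look roughly like this (untested
sketch, all names are made up):

==============================================
/* Hypothetical per-CPU IRQ-entry nesting counter. */
static DEFINE_PER_CPU(int, kmsan_irqentry_depth);

/* Would be called from irqentry_enter()/irqentry_nmi_enter(). */
static inline void kmsan_irqentry_nest(void)
{
	this_cpu_inc(kmsan_irqentry_depth);
}

/* Would be called from irqentry_exit()/irqentry_nmi_exit(). */
static inline void kmsan_irqentry_unnest(void)
{
	this_cpu_dec(kmsan_irqentry_depth);
}

/* Would replace the in_task() check when picking the KMSAN context. */
static inline bool kmsan_in_irqentry(void)
{
	return this_cpu_read(kmsan_irqentry_depth) > 0;
}
==============================================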

> Thanks,
>
> tglx
>
> --




2022-05-14 02:26:59

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

On Thu, May 12 2022 at 14:24, Alexander Potapenko wrote:
> On Mon, May 9, 2022 at 9:09 PM Thomas Gleixner <[email protected]> wrote:
>> > So in the case when `hardirq_count()>>HARDIRQ_SHIFT` is greater than
>> > 1, kmsan_in_runtime() becomes a no-op, which leads to false positives.
>>
>> But, that'd only > 1 when there is a nested interrupt, which is not the
>> case. Interrupt handlers keep interrupts disabled. The last exception from
>> that rule was some legacy IDE driver which is gone by now.
>
> That's good to know, then we probably don't need this hardirq_count()
> check anymore.
>
>> So no, not a good explanation either.
>
> After looking deeper I see that unpoisoning was indeed skipped because
> kmsan_in_runtime() returned true, but I was wrong about the root
> cause.
> The problem was not caused by a nested hardirq, but rather by the fact
> that the KMSAN hook in irqentry_enter() was called with in_task()==1.

Argh, the preempt counter increment happens _after_ irqentry_enter().

> I think the best that can be done here is (as suggested above) to
> provide some kmsan_unpoison_pt_regs() function that will only be
> called from the entry points and won't be doing reentrancy checks.
> It should be safe, because unpoisoning boils down to calculating
> shadow/origin addresses and calling memset() on them, no instrumented
> code will be involved.

If you keep them where I placed them, then there is no need for a
noinstr function. It's already instrumentable.

> We could try to figure out the places in idtentry code where normal
> kmsan_unpoison_memory() can be called in IRQ context, but as far as I
> can see it will depend on the type of the entry point.

NMI is covered as it increments before it invokes the unpoison().

Let me figure out why we increment the preempt count late for
interrupts. IIRC it's for symmetry reasons related to softirq processing
on return, but let me double check.

> Another way to deal with the problem is to not rely on in_task(), but
> rather use some per-cpu counter in irqentry_enter()/irqentry_exit() to
> figure out whether we are in IRQ code already.

Well, if you have an irqentry()-specific unpoison, then you know the
context, right?

> However this is only possible irqentry_enter() itself guarantees that
> the execution cannot be rescheduled to another CPU - is that the case?

Obviously. It runs with interrupts disabled and eventually on a
separate interrupt stack.

Thanks,

tglx

2022-05-14 03:16:53

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

On Thu, May 12 2022 at 18:17, Thomas Gleixner wrote:
> On Thu, May 12 2022 at 14:24, Alexander Potapenko wrote:
>> We could try to figure out the places in idtentry code where normal
>> kmsan_unpoison_memory() can be called in IRQ context, but as far as I
>> can see it will depend on the type of the entry point.
>
> NMI is covered as it increments before it invokes the unpoison().
>
> Let me figure out why we increment the preempt count late for
> interrupts. IIRC it's for symmetry reasons related to softirq processing
> on return, but let me double check.

It's even documented:

https://www.kernel.org/doc/html/latest/core-api/entry.html#interrupts-and-regular-exceptions

But who reads documentation? :)

So, I think the simplest and least intrusive solution is to have special
purpose unpoison functions. See the patch below for illustration.

The reasons why I used specific ones:

1) User entry

Whether that's a syscall or interrupt/exception does not
matter. It's always on the task stack and your machinery cannot be
running at that point because it came from user space.

2) Interrupt/exception/NMI entry kernel

Those can nest into an already active context, so you really want
to unpoison @regs.

Also while regular interrupts cannot nest because of interrupts
staying disabled, exceptions triggered in the interrupt handler and
NMIs can nest.

-> device interrupt()
       irqentry_enter(regs)

-> NMI()
       irqentry_nmi_enter(regs)

-> fault()
       irqentry_enter(regs)

--> debug_exception()
       irqentry_nmi_enter(regs)

Soft interrupt processing on return from interrupt makes it more
interesting:

interrupt()
    handler()
    do_softirq()
        local_irq_enable()
        interrupt()
            NMI
                ....

And every time you get a new @regs pointer to deal with.

Wonderful, isn't it?

Thanks,

tglx

---
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -24,6 +24,7 @@ static __always_inline void __enter_from
 	user_exit_irqoff();
 
 	instrumentation_begin();
+	unpoison_user(regs);
 	trace_hardirqs_off_finish();
 	instrumentation_end();
 }
@@ -352,6 +353,7 @@ noinstr irqentry_state_t irqentry_enter(
 		lockdep_hardirqs_off(CALLER_ADDR0);
 		rcu_irq_enter();
 		instrumentation_begin();
+		unpoison_irq(regs);
 		trace_hardirqs_off_finish();
 		instrumentation_end();
 
@@ -367,6 +369,7 @@ noinstr irqentry_state_t irqentry_enter(
 	 */
 	lockdep_hardirqs_off(CALLER_ADDR0);
 	instrumentation_begin();
+	unpoison_irq(regs);
 	rcu_irq_enter_check_tick();
 	trace_hardirqs_off_finish();
 	instrumentation_end();
@@ -452,6 +455,7 @@ irqentry_state_t noinstr irqentry_nmi_en
 	rcu_nmi_enter();
 
 	instrumentation_begin();
+	unpoison_irq(regs);
 	trace_hardirqs_off_finish();
 	ftrace_nmi_enter();
 	instrumentation_end();
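
The KMSAN side of those hooks could then be something like this (just a
sketch to show the split between the two cases; the reentrancy-check-free
variant for the IRQ case is whatever you end up implementing):

#ifdef CONFIG_KMSAN
static __always_inline void unpoison_user(struct pt_regs *regs)
{
	/* Came from user space: task context, KMSAN cannot be running yet. */
	kmsan_unpoison_memory(regs, sizeof(*regs));
}

static __always_inline void unpoison_irq(struct pt_regs *regs)
{
	/*
	 * May nest into an already active context, so this needs the
	 * reentrancy-check-free variant (the name is a placeholder).
	 */
	kmsan_unpoison_entry_regs(regs);
}
#else
static __always_inline void unpoison_user(struct pt_regs *regs) { }
static __always_inline void unpoison_irq(struct pt_regs *regs) { }
#endif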

2022-05-16 18:00:52

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 27/46] kmsan: instrumentation.h: add instrumentation_begin_with_regs()

On Wed, Apr 27, 2022 at 3:28 PM Thomas Gleixner <[email protected]> wrote:
>
> On Tue, Apr 26 2022 at 18:42, Alexander Potapenko wrote:
> > +void kmsan_instrumentation_begin(struct pt_regs *regs)
> > +{
> > + struct kmsan_context_state *state = &kmsan_get_context()->cstate;
> > +
> > + if (state)
> > + __memset(state, 0, sizeof(struct kmsan_context_state));
>
> sizeof(*state) please
>
> > + if (!kmsan_enabled || !regs)
> > + return;
>
> Why has state to be cleared when kmsan is not enabled and how do you end up
> with regs == NULL here?
>
> Thanks,
>
> tglx
>
> --

As discussed in another thread, I'll be dropping this patch in favor
of the new kmsan_unpoison_entry_regs().

I'll also ensure I consistently use sizeof(*pointer) where applicable.

Regarding regs==NULL, this is actually not a thing.


2022-06-01 20:03:09

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 12/46] kmsan: add KMSAN runtime core

On Wed, Apr 27, 2022 at 4:10 PM Marco Elver <[email protected]> wrote:
>
> On Tue, Apr 26, 2022 at 06:42PM +0200, Alexander Potapenko wrote:
> > For each memory location KernelMemorySanitizer maintains two types of
> > metadata:
> > 1. The so-called shadow of that location - a byte:byte mapping describing
> > whether or not individual bits of memory are initialized (shadow is 0)
> > or not (shadow is 1).
> > 2. The origins of that location - a 4-byte:4-byte mapping containing
> > 4-byte IDs of the stack traces where uninitialized values were
> > created.
> >
> > Each struct page now contains pointers to two struct pages holding
> > KMSAN metadata (shadow and origins) for the original struct page.
> > Utility routines in mm/kmsan/core.c and mm/kmsan/shadow.c handle the
> > metadata creation, addressing, copying and checking.
> > mm/kmsan/report.c performs error reporting in the cases an uninitialized
> > value is used in a way that leads to undefined behavior.
> >
> > KMSAN compiler instrumentation is responsible for tracking the metadata
> > along with the kernel memory. mm/kmsan/instrumentation.c provides the
> > implementation for instrumentation hooks that are called from files
> > compiled with -fsanitize=kernel-memory.
> >
> > To aid parameter passing (also done at instrumentation level), each
> > task_struct now contains a struct kmsan_task_state used to track the
> > metadata of function parameters and return values for that task.
> >
> > Finally, this patch provides CONFIG_KMSAN that enables KMSAN, and
> > declares CFLAGS_KMSAN, which are applied to files compiled with KMSAN.
> > The KMSAN_SANITIZE:=n Makefile directive can be used to completely
> > disable KMSAN instrumentation for certain files.
> >
> > Similarly, KMSAN_ENABLE_CHECKS:=n disables KMSAN checks and makes newly
> > created stack memory initialized.
> >
> > Users can also use functions from include/linux/kmsan-checks.h to mark
> > certain memory regions as uninitialized or initialized (this is called
> > "poisoning" and "unpoisoning") or check that a particular region is
> > initialized.
> >
> > Signed-off-by: Alexander Potapenko <[email protected]>
> > ---
> > v2:
> > -- as requested by Greg K-H, moved hooks for different subsystems to respective patches,
> > rewrote the patch description;
> > -- addressed comments by Dmitry Vyukov;
> > -- added a note about KMSAN being not intended for production use.
> > -- fix case of unaligned dst in kmsan_internal_memmove_metadata()
> >
> > v3:
> > -- print build IDs in reports where applicable
> > -- drop redundant filter_irq_stacks(), unpoison the local passed to __stack_depot_save()
> > -- remove a stray BUG()
> >
> > Link: https://linux-review.googlesource.com/id/I9b71bfe3425466c97159f9de0062e5e8e4fec866
> > ---
> > Makefile | 1 +
> > include/linux/kmsan-checks.h | 64 +++++
> > include/linux/kmsan.h | 47 ++++
> > include/linux/mm_types.h | 12 +
> > include/linux/sched.h | 5 +
> > lib/Kconfig.debug | 1 +
> > lib/Kconfig.kmsan | 23 ++
> > mm/Makefile | 1 +
> > mm/kmsan/Makefile | 18 ++
> > mm/kmsan/core.c | 458 +++++++++++++++++++++++++++++++++++
> > mm/kmsan/hooks.c | 66 +++++
> > mm/kmsan/instrumentation.c | 267 ++++++++++++++++++++
> > mm/kmsan/kmsan.h | 183 ++++++++++++++
> > mm/kmsan/report.c | 211 ++++++++++++++++
> > mm/kmsan/shadow.c | 186 ++++++++++++++
> > scripts/Makefile.kmsan | 1 +
> > scripts/Makefile.lib | 9 +
> > 17 files changed, 1553 insertions(+)
> > create mode 100644 include/linux/kmsan-checks.h
> > create mode 100644 include/linux/kmsan.h
> > create mode 100644 lib/Kconfig.kmsan
> > create mode 100644 mm/kmsan/Makefile
> > create mode 100644 mm/kmsan/core.c
> > create mode 100644 mm/kmsan/hooks.c
> > create mode 100644 mm/kmsan/instrumentation.c
> > create mode 100644 mm/kmsan/kmsan.h
> > create mode 100644 mm/kmsan/report.c
> > create mode 100644 mm/kmsan/shadow.c
> > create mode 100644 scripts/Makefile.kmsan
> >
> > diff --git a/Makefile b/Makefile
> > index c3ec1ea423797..d3c7dcd9f0fea 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -1009,6 +1009,7 @@ include-y := scripts/Makefile.extrawarn
> > include-$(CONFIG_DEBUG_INFO) += scripts/Makefile.debug
> > include-$(CONFIG_KASAN) += scripts/Makefile.kasan
> > include-$(CONFIG_KCSAN) += scripts/Makefile.kcsan
> > +include-$(CONFIG_KMSAN) += scripts/Makefile.kmsan
> > include-$(CONFIG_UBSAN) += scripts/Makefile.ubsan
> > include-$(CONFIG_KCOV) += scripts/Makefile.kcov
> > include-$(CONFIG_GCC_PLUGINS) += scripts/Makefile.gcc-plugins
> > diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h
> > new file mode 100644
> > index 0000000000000..a6522a0c28df9
> > --- /dev/null
> > +++ b/include/linux/kmsan-checks.h
> > @@ -0,0 +1,64 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * KMSAN checks to be used for one-off annotations in subsystems.
> > + *
> > + * Copyright (C) 2017-2022 Google LLC
> > + * Author: Alexander Potapenko <[email protected]>
> > + *
> > + */
> > +
> > +#ifndef _LINUX_KMSAN_CHECKS_H
> > +#define _LINUX_KMSAN_CHECKS_H
> > +
> > +#include <linux/types.h>
> > +
> > +#ifdef CONFIG_KMSAN
> > +
> > +/**
> > + * kmsan_poison_memory() - Mark the memory range as uninitialized.
> > + * @address: address to start with.
> > + * @size: size of buffer to poison.
> > + * @flags: GFP flags for allocations done by this function.
> > + *
> > + * Until other data is written to this range, KMSAN will treat it as
> > + * uninitialized. Error reports for this memory will reference the call site of
> > + * kmsan_poison_memory() as origin.
> > + */
> > +void kmsan_poison_memory(const void *address, size_t size, gfp_t flags);
> > +
> > +/**
> > + * kmsan_unpoison_memory() - Mark the memory range as initialized.
> > + * @address: address to start with.
> > + * @size: size of buffer to unpoison.
> > + *
> > + * Until other data is written to this range, KMSAN will treat it as
> > + * initialized.
> > + */
> > +void kmsan_unpoison_memory(const void *address, size_t size);
> > +
> > +/**
> > + * kmsan_check_memory() - Check the memory range for being initialized.
> > + * @address: address to start with.
> > + * @size: size of buffer to check.
> > + *
> > + * If any piece of the given range is marked as uninitialized, KMSAN will report
> > + * an error.
> > + */
> > +void kmsan_check_memory(const void *address, size_t size);
> > +
> > +#else
> > +
> > +static inline void kmsan_poison_memory(const void *address, size_t size,
> > + gfp_t flags)
> > +{
> > +}
> > +static inline void kmsan_unpoison_memory(const void *address, size_t size)
> > +{
> > +}
> > +static inline void kmsan_check_memory(const void *address, size_t size)
> > +{
> > +}
> > +
> > +#endif
> > +
> > +#endif /* _LINUX_KMSAN_CHECKS_H */
> > diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
> > new file mode 100644
> > index 0000000000000..4e35f43eceaa9
> > --- /dev/null
> > +++ b/include/linux/kmsan.h
> > @@ -0,0 +1,47 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * KMSAN API for subsystems.
> > + *
> > + * Copyright (C) 2017-2022 Google LLC
> > + * Author: Alexander Potapenko <[email protected]>
> > + *
> > + */
> > +#ifndef _LINUX_KMSAN_H
> > +#define _LINUX_KMSAN_H
> > +
> > +#include <linux/gfp.h>
> > +#include <linux/kmsan-checks.h>
> > +#include <linux/stackdepot.h>
> > +#include <linux/types.h>
> > +#include <linux/vmalloc.h>
> > +
> > +struct page;
> > +
> > +#ifdef CONFIG_KMSAN
> > +
> > +/* These constants are defined in the MSan LLVM instrumentation pass. */
> > +#define KMSAN_RETVAL_SIZE 800
> > +#define KMSAN_PARAM_SIZE 800
> > +
> > +struct kmsan_context_state {
> > + char param_tls[KMSAN_PARAM_SIZE];
> > + char retval_tls[KMSAN_RETVAL_SIZE];
> > + char va_arg_tls[KMSAN_PARAM_SIZE];
> > + char va_arg_origin_tls[KMSAN_PARAM_SIZE];
> > + u64 va_arg_overflow_size_tls;
> > + char param_origin_tls[KMSAN_PARAM_SIZE];
> > + depot_stack_handle_t retval_origin_tls;
> > +};
> > +
> > +#undef KMSAN_PARAM_SIZE
> > +#undef KMSAN_RETVAL_SIZE
> > +
> > +struct kmsan_ctx {
> > + struct kmsan_context_state cstate;
> > + int kmsan_in_runtime;
> > + bool allow_reporting;
> > +};
> > +
> > +#endif
> > +
> > +#endif /* _LINUX_KMSAN_H */
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 8834e38c06a4f..85c97a2145f7e 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -218,6 +218,18 @@ struct page {
> > not kmapped, ie. highmem) */
> > #endif /* WANT_PAGE_VIRTUAL */
> >
> > +#ifdef CONFIG_KMSAN
> > + /*
> > + * KMSAN metadata for this page:
> > + * - shadow page: every bit indicates whether the corresponding
> > + * bit of the original page is initialized (0) or not (1);
> > + * - origin page: every 4 bytes contain an id of the stack trace
> > + * where the uninitialized value was created.
> > + */
> > + struct page *kmsan_shadow;
> > + struct page *kmsan_origin;
> > +#endif
> > +
> > #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
> > int _last_cpupid;
> > #endif
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index a8911b1f35aad..9e53624cd73ac 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -14,6 +14,7 @@
> > #include <linux/pid.h>
> > #include <linux/sem.h>
> > #include <linux/shm.h>
> > +#include <linux/kmsan.h>
> > #include <linux/mutex.h>
> > #include <linux/plist.h>
> > #include <linux/hrtimer.h>
> > @@ -1352,6 +1353,10 @@ struct task_struct {
> > #endif
> > #endif
> >
> > +#ifdef CONFIG_KMSAN
> > + struct kmsan_ctx kmsan_ctx;
> > +#endif
> > +
> > #if IS_ENABLED(CONFIG_KUNIT)
> > struct kunit *kunit_test;
> > #endif
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 075cd25363ac3..b81670878acae 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -996,6 +996,7 @@ config DEBUG_STACKOVERFLOW
> >
> > source "lib/Kconfig.kasan"
> > source "lib/Kconfig.kfence"
> > +source "lib/Kconfig.kmsan"
> >
> > endmenu # "Memory Debugging"
> >
> > diff --git a/lib/Kconfig.kmsan b/lib/Kconfig.kmsan
> > new file mode 100644
> > index 0000000000000..199f79d031f94
> > --- /dev/null
> > +++ b/lib/Kconfig.kmsan
> > @@ -0,0 +1,23 @@
>
> Missing SPDX-License-Identifier.
Will do in v4, thanks!

> > +config KMSAN
> > + bool "KMSAN: detector of uninitialized values use"
> > + depends on HAVE_ARCH_KMSAN && HAVE_KMSAN_COMPILER
> > + depends on SLUB && DEBUG_KERNEL && !KASAN && !KCSAN
> > + depends on CC_IS_CLANG && CLANG_VERSION >= 140000
>
> Shouldn't the "CC_IS_CLANG && CLANG_VERSION ..." check be a "depends on"
> in HAVE_KMSAN_COMPILER? That way all the compiler-related checks are
> confined to HAVE_KMSAN_COMPILER.
Good point, thanks!
I also think I can drop the redundant CC_IS_CLANG in the definition of
HAVE_KMSAN_COMPILER.

> I guess, it might also be worth mentioning why the version check is
> required at all (something about older compilers supporting
> fsanitize=kernel-memory, but not having all features we need).
Done.

> > index 0000000000000..a80dde1de7048
> > --- /dev/null
> > +++ b/mm/kmsan/Makefile
> > @@ -0,0 +1,18 @@
>
> Makefile needs a SPDX-License-Identifier.
Done.


> > + shadow_dst = kmsan_get_metadata(dst, KMSAN_META_SHADOW);
> > + if (!shadow_dst)
> > + return;
> > + KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(dst, n));
> > +
> > + shadow_src = kmsan_get_metadata(src, KMSAN_META_SHADOW);
> > + if (!shadow_src) {
> > + /*
> > + * |src| is untracked: zero out destination shadow, ignore the
>
> Probably doesn't matter too much, but for consistency elsewhere - @src?
Fixed here and in other places where |var| is used.

> > + * If |src| isn't aligned on KMSAN_ORIGIN_SIZE, don't
> > + * look at the first |src % KMSAN_ORIGIN_SIZE| bytes
> > + * of the first shadow slot.
> > + */
E.g. here

> > + /*
> > + * If |src + n| isn't aligned on
> > + * KMSAN_ORIGIN_SIZE, don't look at the last
> > + * |(src + n) % KMSAN_ORIGIN_SIZE| bytes of the
> > + * last shadow slot.
> > + */
and here.



> > +
> > +extern bool kmsan_enabled;
> > +extern int panic_on_kmsan;
> > +
> > +/*
> > + * KMSAN performs a lot of consistency checks that are currently enabled by
> > + * default. BUG_ON is normally discouraged in the kernel, unless used for
> > + * debugging, but KMSAN itself is a debugging tool, so it makes little sense to
> > + * recover if something goes wrong.
> > + */
> > +#define KMSAN_WARN_ON(cond) \
> > + ({ \
> > + const bool __cond = WARN_ON(cond); \
> > + if (unlikely(__cond)) { \
> > + WRITE_ONCE(kmsan_enabled, false); \
> > + if (panic_on_kmsan) { \
> > + /* Can't call panic() here because */ \
> > + /* of uaccess checks.*/ \
>
> space after '.'
Done; also reformatted the macro to use tabs instead of spaces.


> > +void kmsan_report(depot_stack_handle_t origin, void *address, int size,
> > + int off_first, int off_last, const void *user_addr,
> > + enum kmsan_bug_reason reason)
> > +{
> > + unsigned long stack_entries[KMSAN_STACK_DEPTH];
> > + int num_stack_entries, skipnr;
> > + char *bug_type = NULL;
> > + unsigned long flags, ua_flags;
> > + bool is_uaf;
> > +
> > + if (!kmsan_enabled)
> > + return;
> > + if (!current->kmsan_ctx.allow_reporting)
> > + return;
> > + if (!origin)
> > + return;
> > +
> > + current->kmsan_ctx.allow_reporting = false;
> > + ua_flags = user_access_save();
> > + spin_lock_irqsave(&kmsan_report_lock, flags);
>
> I think this might want to be a raw_spin_lock, since the reporting can
> be called from any context, including from within other raw_spin_lock'd
> critical sections (practically this will only matter in RT kernels).
(Marco elaborated off-list that lockdep will complain if a spin_lock
critical section is nested inside a raw_spin_lock one.)
Thanks, done.
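
For reference, the change boils down to roughly this (sketch; assuming
the lock is currently declared with DEFINE_SPINLOCK()):

==============================================
-static DEFINE_SPINLOCK(kmsan_report_lock);
+static DEFINE_RAW_SPINLOCK(kmsan_report_lock);
...
-	spin_lock_irqsave(&kmsan_report_lock, flags);
+	raw_spin_lock_irqsave(&kmsan_report_lock, flags);
...
-	spin_unlock_irqrestore(&kmsan_report_lock, flags);
+	raw_spin_unlock_irqrestore(&kmsan_report_lock, flags);
==============================================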

> Also, do you have to do lockdep_off/on() (like kernel/kcsan/report.c
> does, see comment there)?

I don't see lockdep reports from within mm/kmsan/report.c.
However, there's one boot-time report that I am struggling to comprehend:

DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:5481 check_flags+0x63/0x180
...
<TASK>
lock_acquire+0x85/0x1c0 kernel/locking/lockdep.c:5638
__raw_spin_lock_irqsave ./include/linux/spinlock_api_smp.h:110
_raw_spin_lock_irqsave+0x129/0x220 kernel/locking/spinlock.c:162
__stack_depot_save+0x1b1/0x4b0 lib/stackdepot.c:417
stack_depot_save+0x13/0x20 lib/stackdepot.c:471
__msan_poison_alloca+0x100/0x1a0 mm/kmsan/instrumentation.c:228
_raw_spin_unlock_irqrestore ??:?
arch_local_save_flags ./arch/x86/include/asm/irqflags.h:70
arch_irqs_disabled ./arch/x86/include/asm/irqflags.h:130
__raw_spin_unlock_irqrestore ./include/linux/spinlock_api_smp.h:151
_raw_spin_unlock_irqrestore+0xc6/0x190 kernel/locking/spinlock.c:194
tty_register_ldisc+0x15e/0x1c0 drivers/tty/tty_ldisc.c:68
n_tty_init+0x2f/0x32 drivers/tty/n_tty.c:2418
console_init+0x20/0x10d kernel/printk/printk.c:3220
start_kernel+0x6f0/0xd23 init/main.c:1071
x86_64_start_reservations+0x2a/0x2c arch/x86/kernel/head64.c:546
x86_64_start_kernel+0xf5/0xfa arch/x86/kernel/head64.c:527
secondary_startup_64_no_verify+0xc4/0xcb ??:?
</TASK>

Perhaps we need to disable lockdep in stackdepot as well?
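
E.g. something like this might do (untested sketch, the wrapper name is
made up):

==============================================
/*
 * Keep lockdep out of the stack depot lock when saving origin stacks
 * from KMSAN instrumentation, similar in spirit to what
 * kernel/kcsan/report.c does around reporting.
 */
static depot_stack_handle_t kmsan_save_stack_nolockdep(unsigned long *entries,
							unsigned int nr_entries)
{
	depot_stack_handle_t handle;

	lockdep_off();
	handle = stack_depot_save(entries, nr_entries, GFP_ATOMIC);
	lockdep_on();

	return handle;
}
==============================================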

> > + */
> > +static int kmsan_phys_addr_valid(unsigned long addr)
>
> int -> bool ? (it already deviates from the original by using IS_ENABLED
> instead of #ifdef)

Makes sense.

> > + * Taken from arch/x86/mm/physaddr.c to avoid using an instrumented version.
> > + */
> > +static bool kmsan_virt_addr_valid(void *addr)
> > +{
> > + unsigned long x = (unsigned long)addr;
> > + unsigned long y = x - __START_KERNEL_map;
> > +
> > + /* use the carry flag to determine if x was < __START_KERNEL_map */
> > + if (unlikely(x > y)) {
> > + x = y + phys_base;
> > +
> > + if (y >= KERNEL_IMAGE_SIZE)
> > + return false;
> > + } else {
> > + x = y + (__START_KERNEL_map - PAGE_OFFSET);
> > +
> > + /* carry flag will be set if starting x was >= PAGE_OFFSET */
> > + if ((x > y) || !kmsan_phys_addr_valid(x))
> > + return false;
> > + }
> > +
> > + return pfn_valid(x >> PAGE_SHIFT);
> > +}
>
> These seem quite x86-specific - to ease eventual porting to other
> architectures, it would make sense to introduce <asm/kmsan.h> which will
> have these 2 functions (and if there's anything else arch-specific like
> this, moving to <asm/kmsan.h> would help eventual ports).

Good idea, will do!
This part will probably need to go into "x86: kmsan: enable KMSAN
builds for x86"


> > + if (is_origin && !IS_ALIGNED(addr, KMSAN_ORIGIN_SIZE)) {
> > + pad = addr % KMSAN_ORIGIN_SIZE;
> > + addr -= pad;
> > + }
> > + address = (void *)addr;
> > + if (kmsan_internal_is_vmalloc_addr(address) ||
> > + kmsan_internal_is_module_addr(address))
> > + return (void *)vmalloc_meta(address, is_origin);
> > +
> > + page = virt_to_page_or_null(address);
> > + if (!page)
> > + return NULL;
> > + if (!page_has_metadata(page))
> > + return NULL;
> > + off = addr % PAGE_SIZE;
> > +
> > + ret = (is_origin ? origin_ptr_for(page) : shadow_ptr_for(page)) + off;
>
> Just return this and avoid 'ret'?
Good catch. There was some debugging code in the middle, but now we
don't need ret.

>
> > + return ret;
> > +}
> > diff --git a/scripts/Makefile.kmsan b/scripts/Makefile.kmsan
> > new file mode 100644
> > index 0000000000000..9793591f9855c
> > --- /dev/null
> > +++ b/scripts/Makefile.kmsan
> > @@ -0,0 +1 @@
>
> Makefile.kmsan needs SPDX-License-Identifier.
Done.






2022-06-01 20:22:44

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [PATCH v3 28/46] kmsan: entry: handle register passing from uninstrumented code

On Thu, May 12, 2022 at 6:48 PM Thomas Gleixner <[email protected]> wrote:
>
> On Thu, May 12 2022 at 18:17, Thomas Gleixner wrote:
> > On Thu, May 12 2022 at 14:24, Alexander Potapenko wrote:
> >> We could try to figure out the places in idtentry code where normal
> >> kmsan_unpoison_memory() can be called in IRQ context, but as far as I
> >> can see it will depend on the type of the entry point.
> >
> > NMI is covered as it increments before it invokes the unpoison().
> >
> > Let me figure out why we increment the preempt count late for
> > interrupts. IIRC it's for symmetry reasons related to softirq processing
> > on return, but let me double check.
>
> It's even documented:
>
> https://www.kernel.org/doc/html/latest/core-api/entry.html#interrupts-and-regular-exceptions
>
> But who reads documentation? :)
>
> So, I think the simplest and least intrusive solution is to have special
> purpose unpoison functions. See the patch below for illustration.

This patch works well and I am going to adopt it for my series.
But the problem with occasional calls of instrumented functions from
noinstr code still persists: if there is a noinstr function foo() and an
instrumented function bar() called from foo() with one or more
arguments, bar() must wipe its kmsan_context_state before using the
arguments.

I have a solution for this problem, described in https://reviews.llvm.org/D126385.
The plan is to pass __builtin_return_address(0) to
__msan_get_context_state_caller() at the beginning of each
instrumented function.
The KMSAN runtime can then check the passed return address and wipe the
context if it belongs to the .noinstr code section.
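
A rough sketch of that hook (assuming the .noinstr boundaries can be
taken from the __noinstr_text_start/__noinstr_text_end linker symbols):

==============================================
struct kmsan_context_state *__msan_get_context_state_caller(void *caller)
{
	struct kmsan_context_state *state = &kmsan_get_context()->cstate;
	unsigned long addr = (unsigned long)caller;

	/* Wipe the state if our caller was compiled without instrumentation. */
	if (addr >= (unsigned long)__noinstr_text_start &&
	    addr < (unsigned long)__noinstr_text_end)
		__memset(state, 0, sizeof(*state));

	return state;
}
==============================================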

Alternatively, we could employ MSan's -fsanitize-memory-param-retval
flag, which reports passing uninitialized values as function parameters.
Doing so is currently allowed in the kernel, but Clang aggressively
applies the noundef attribute (see https://llvm.org/docs/LangRef.html)
to function arguments, which effectively makes passing uninitialized
values as function parameters undefined behavior.
So if we make KMSAN detect such cases as well, we can ultimately get
rid of all cases where uninitialized values are passed to functions.
As a result, kmsan_context_state will become unnecessary, because it
will never contain nonzero values.


> The reasons why I used specific ones:
>
> 1) User entry
>
> Whether that's a syscall or interrupt/exception does not
> matter. It's always on the task stack and your machinery cannot be
> running at that point because it came from user space.
>
> 2) Interrupt/exception/NMI entry kernel
>
> Those can nest into an already active context, so you really want
> to unpoison @regs.
>
> Also while regular interrupts cannot nest because of interrupts
> staying disabled, exceptions triggered in the interrupt handler and
> NMIs can nest.
>
> -> device interrupt()
> irqentry_enter(regs)
>
> -> NMI()
> irqentry_nmi_enter(regs)
>
> -> fault()
> irqentry_enter(regs)
>
> --> debug_exception()
> irqentry_nmi_enter(regs)
>
> Soft interrupt processing on return from interrupt makes it more
> interesting:
>
> interrupt()
> handler()
> do_softirq()
> local_irq_enable()
> interrupt()
> NMI
> ....
>
> And everytime you get a new @regs pointer to deal with.
>
> Wonderful, isn't it?
>
> Thanks,
>
> tglx
>