Building on the work of Christophe, Aneesh and Balbir, I've ported
KASAN to 64-bit Book3S kernels running on the Radix MMU.
v11 applies to next-20210317. I had hoped to have it apply to
powerpc/next but once again there are changes in the kasan core that
clash. Also, thanks to mpe for fixing a build break with KASAN off.
I'm not sure how best to progress this towards actually being merged
when it has impacts across subsystems. I'd appreciate any input. Maybe
the first four patches could go in via the kasan tree? That should
make things easier for powerpc in a future cycle.
v10 rebases on top of next-20210125, fixing things up to work on top
of the latest changes, and fixing some review comments from
Christophe. I have tested host and guest with 64k pages for this spin.
There is now only 1 failing KUnit test: kasan_global_oob - gcc puts
the ASAN init code in a section called '.init_array'. Powerpc64 module
loading code goes through and _renames_ any section beginning with
'.init' to begin with '_init' in order to avoid some complexities
around our 24-bit indirect jumps. This means it renames '.init_array'
to '_init_array', and the generic module loading code then fails to
recognise the section as a constructor and thus doesn't run it. This
hack dates back to 2003 and so I'm not going to try to unpick it in
this series. (I suspect this may have previously worked if the code
ended up in .ctors rather than .init_array but I don't keep my old
binaries around so I have no real way of checking.)
(The previously failing stack tests are now skipped due to more
accurate configuration settings.)
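To make the '.init_array' failure above concrete, here is a minimal
userspace sketch of the two steps. Both function names are hypothetical
stand-ins, not the real loader functions, and the matching is simplified
to what is described above (a prefix rename, then an exact-name lookup).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-in for the powerpc64 module loader step: a section
 * name that begins with ".init" has its leading '.' replaced by '_'.
 */
static void ppc64_rename_init_section(char *name)
{
	assert(name != NULL);
	if (strncmp(name, ".init", 5) == 0)
		name[0] = '_';
}

/*
 * Hypothetical stand-in for the generic loader's constructor lookup,
 * which matches on the exact section name.
 */
static bool is_constructor_section(const char *name)
{
	return strcmp(name, ".ctors") == 0 ||
	       strcmp(name, ".init_array") == 0;
}
```

Passing ".init_array" through ppc64_rename_init_section() yields
"_init_array", which is_constructor_section() no longer matches, while
".ctors" is left alone and is still recognised.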
Details from v9: This is a significant reworking of the previous
versions. Instead of the previous approach which supported inline
instrumentation, this series provides only outline instrumentation.
To get around the problem of accessing the shadow region inside code we run
with translations off (in 'real mode'), we restrict checking to when
translations are enabled. This is done via a new hook in the kasan core and
by excluding larger quantities of arch code from instrumentation. The upside
is that we no longer require that you be able to specify the amount of
physically contiguous memory on the system at compile time. Hopefully this
is a better trade-off. More details in patch 6.
kexec works. Both 64k and 4k pages work. Running as a KVM host works, but
nothing in arch/powerpc/kvm is instrumented. It's also potentially a bit
fragile - if any real mode code paths call out to instrumented code, things
will go boom.
Kind regards,
Daniel
Daniel Axtens (6):
kasan: allow an architecture to disable inline instrumentation
kasan: allow architectures to provide an outline readiness check
kasan: define and use MAX_PTRS_PER_* for early shadow tables
kasan: Document support on 32-bit powerpc
powerpc/mm/kasan: rename kasan_init_32.c to init_32.c
powerpc: Book3S 64-bit outline-only KASAN support
Allow architectures to define a kasan_arch_is_ready() hook that bails
out of any function that's about to touch the shadow unless the arch
says that it is ready for the memory to be accessed. This is fairly
uninvasive and should have a negligible performance penalty.
This will only work in outline mode, so an arch must specify
ARCH_DISABLE_KASAN_INLINE if it requires this.
Cc: Balbir Singh <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Suggested-by: Christophe Leroy <[email protected]>
Signed-off-by: Daniel Axtens <[email protected]>
--
I discuss the justification for this later in the series. Also,
both previous RFCs for ppc64 - by 2 different people - have
needed this trick! See:
- https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
- https://patchwork.ozlabs.org/patch/795211/ # ppc radix series
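
For illustration only, the effect of the hook can be modelled in
userspace. This is a sketch under simplifying assumptions, not the kernel
code: the shadow is a small array, a non-zero shadow byte poisons its
whole 8-byte granule, and arch_ready, shadow_reads, check_region() and
poison() are names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_SCALE_SHIFT 3	/* one shadow byte per 8 bytes of memory */
#define MEM_SIZE 64

static uint8_t shadow[MEM_SIZE >> SHADOW_SCALE_SHIFT];
static bool arch_ready;		/* stand-in for the arch's own state */
static unsigned int shadow_reads;	/* counts reads of the shadow */

static bool kasan_arch_is_ready(void) { return arch_ready; }

/* Outline-style check: returns true if the access is allowed. */
static bool check_region(size_t offset, size_t size)
{
	size_t i;

	assert(offset + size <= MEM_SIZE);
	if (!kasan_arch_is_ready())
		return true;	/* shadow not usable yet, skip the check */
	if (size == 0)
		return true;

	for (i = offset >> SHADOW_SCALE_SHIFT;
	     i <= (offset + size - 1) >> SHADOW_SCALE_SHIFT; i++) {
		shadow_reads++;
		if (shadow[i] != 0)
			return false;
	}
	return true;
}

/* Poisoning is likewise a no-op until the arch says it is ready. */
static void poison(size_t offset, size_t size, uint8_t value)
{
	size_t i;

	assert(size > 0 && offset + size <= MEM_SIZE);
	if (!kasan_arch_is_ready())
		return;

	for (i = offset >> SHADOW_SCALE_SHIFT;
	     i <= (offset + size - 1) >> SHADOW_SCALE_SHIFT; i++)
		shadow[i] = value;
}
```

While arch_ready is false, neither function touches shadow[]. Once it is
set, poison(8, 8, 0xff) followed by check_region(8, 8) reports the access
as invalid.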
---
include/linux/kasan.h | 4 ++++
mm/kasan/common.c | 4 ++++
mm/kasan/generic.c | 3 +++
mm/kasan/shadow.c | 4 ++++
4 files changed, 15 insertions(+)
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 8b3b99d659b7..6bd8343f0033 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -23,6 +23,10 @@ struct kunit_kasan_expectation {
#endif
+#ifndef kasan_arch_is_ready
+static inline bool kasan_arch_is_ready(void) { return true; }
+#endif
+
#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
#include <linux/pgtable.h>
diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 6bb87f2acd4e..f23a9e2dce9f 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -345,6 +345,10 @@ static inline bool ____kasan_slab_free(struct kmem_cache *cache, void *object,
if (unlikely(cache->flags & SLAB_TYPESAFE_BY_RCU))
return false;
+ /* We can't read the shadow byte if the arch isn't ready */
+ if (!kasan_arch_is_ready())
+ return false;
+
if (!kasan_byte_accessible(tagged_object)) {
kasan_report_invalid_free(tagged_object, ip);
return true;
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 53cbf28859b5..c3f5ba7a294a 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -163,6 +163,9 @@ static __always_inline bool check_region_inline(unsigned long addr,
size_t size, bool write,
unsigned long ret_ip)
{
+ if (!kasan_arch_is_ready())
+ return true;
+
if (unlikely(size == 0))
return true;
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 727ad4629173..1f650c521037 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -80,6 +80,10 @@ void kasan_poison(const void *addr, size_t size, u8 value, bool init)
*/
addr = kasan_reset_tag(addr);
+ /* Don't touch the shadow memory if arch isn't ready */
+ if (!kasan_arch_is_ready())
+ return;
+
/* Skip KFENCE memory if called explicitly outside of sl*b. */
if (is_kfence_address(addr))
return;
--
2.27.0
powerpc has a variable number of PTRS_PER_*, set at runtime based
on the MMU that the kernel is booted under.
This means the PTRS_PER_* are no longer compile-time constants, which
breaks the build: the early shadow tables are arrays sized by them.
Define default MAX_PTRS_PER_*s in the same style as MAX_PTRS_PER_P4D.
As KASAN is the only user at the moment, just define them in the kasan
header, and have them default to PTRS_PER_* unless overridden in arch
code.
Suggested-by: Christophe Leroy <[email protected]>
Suggested-by: Balbir Singh <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Reviewed-by: Balbir Singh <[email protected]>
Signed-off-by: Daniel Axtens <[email protected]>
---
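A standalone sketch of the pattern, for illustration. The A_/B_ layouts
and their index sizes are made-up values, and select_mmu() is a
hypothetical helper. Only the #ifndef fallback mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical arch with two MMU layouts; index sizes are illustrative. */
#define A_PTE_INDEX_SIZE 8
#define B_PTE_INDEX_SIZE 5
#define A_PTRS_PER_PTE (1 << A_PTE_INDEX_SIZE)
#define B_PTRS_PER_PTE (1 << B_PTE_INDEX_SIZE)

/* Chosen at boot, so PTRS_PER_PTE is not a constant expression. */
static unsigned long ptrs_per_pte_runtime = B_PTRS_PER_PTE;
#define PTRS_PER_PTE ptrs_per_pte_runtime

/* Arch override: the larger of the two layouts, known at compile time. */
#define MAX_PTRS_PER_PTE \
	((A_PTRS_PER_PTE > B_PTRS_PER_PTE) ? A_PTRS_PER_PTE : B_PTRS_PER_PTE)

/* Generic fallback, used only when the arch provides no override. */
#ifndef MAX_PTRS_PER_PTE
#define MAX_PTRS_PER_PTE PTRS_PER_PTE
#endif

typedef struct { unsigned long val; } pte_t;

/* A file-scope array needs a constant size, hence MAX_PTRS_PER_PTE. */
static pte_t early_shadow_pte[MAX_PTRS_PER_PTE];

static size_t early_shadow_pte_entries(void)
{
	return sizeof(early_shadow_pte) / sizeof(early_shadow_pte[0]);
}

/* Hypothetical boot-time selection; the table always has enough room. */
static void select_mmu(int use_layout_a)
{
	ptrs_per_pte_runtime = use_layout_a ? A_PTRS_PER_PTE : B_PTRS_PER_PTE;
	assert(PTRS_PER_PTE <= early_shadow_pte_entries());
}
```

Sizing the array by PTRS_PER_PTE directly would not compile here, since
it expands to a variable.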
include/linux/kasan.h | 18 +++++++++++++++---
mm/kasan/init.c | 6 +++---
2 files changed, 18 insertions(+), 6 deletions(-)
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 6bd8343f0033..68cd6e55c872 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -44,10 +44,22 @@ static inline bool kasan_arch_is_ready(void) { return true; }
#define PTE_HWTABLE_PTRS 0
#endif
+#ifndef MAX_PTRS_PER_PTE
+#define MAX_PTRS_PER_PTE PTRS_PER_PTE
+#endif
+
+#ifndef MAX_PTRS_PER_PMD
+#define MAX_PTRS_PER_PMD PTRS_PER_PMD
+#endif
+
+#ifndef MAX_PTRS_PER_PUD
+#define MAX_PTRS_PER_PUD PTRS_PER_PUD
+#endif
+
extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
-extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS];
-extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
-extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
+extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
+extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
+extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
int kasan_populate_early_shadow(const void *shadow_start,
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index c4605ac9837b..b4d822dff1fb 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -41,7 +41,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
}
#endif
#if CONFIG_PGTABLE_LEVELS > 3
-pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
static inline bool kasan_pud_table(p4d_t p4d)
{
return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -53,7 +53,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
}
#endif
#if CONFIG_PGTABLE_LEVELS > 2
-pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
static inline bool kasan_pmd_table(pud_t pud)
{
return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -64,7 +64,7 @@ static inline bool kasan_pmd_table(pud_t pud)
return false;
}
#endif
-pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS]
+pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]
__page_aligned_bss;
static inline bool kasan_pte_table(pmd_t pmd)
--
2.27.0
KASAN is supported on 32-bit powerpc and the docs should reflect this.
Suggested-by: Christophe Leroy <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Signed-off-by: Daniel Axtens <[email protected]>
---
Documentation/dev-tools/kasan.rst | 8 ++++++--
Documentation/powerpc/kasan.txt | 12 ++++++++++++
2 files changed, 18 insertions(+), 2 deletions(-)
create mode 100644 Documentation/powerpc/kasan.txt
diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
index a8c3e0cff88d..2cfd5d9068c0 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -36,7 +36,8 @@ Both software KASAN modes work with SLUB and SLAB memory allocators,
while the hardware tag-based KASAN currently only supports SLUB.
Currently, generic KASAN is supported for the x86_64, arm, arm64, xtensa, s390,
-and riscv architectures, and tag-based KASAN modes are supported only for arm64.
+and riscv architectures. It is also supported on 32-bit powerpc kernels.
+Tag-based KASAN modes are supported only for arm64.
Usage
-----
@@ -334,7 +335,10 @@ CONFIG_KASAN_VMALLOC
With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
cost of greater memory usage. Currently, this is supported on x86,
-riscv, s390, and powerpc.
+riscv, s390, and 32-bit powerpc.
+
+It is optional, except on 32-bit powerpc kernels with module support,
+where it is required.
This works by hooking into vmalloc and vmap and dynamically
allocating real shadow memory to back the mappings.
diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
new file mode 100644
index 000000000000..26bb0e8bb18c
--- /dev/null
+++ b/Documentation/powerpc/kasan.txt
@@ -0,0 +1,12 @@
+KASAN is supported on powerpc on 32-bit only.
+
+32 bit support
+==============
+
+KASAN is supported on both hash and nohash MMUs on 32-bit.
+
+The shadow area sits at the top of the kernel virtual memory space above the
+fixmap area and occupies one eighth of the total kernel virtual memory space.
+
+Instrumentation of the vmalloc area is optional, unless built with modules,
+in which case it is required.
--
2.27.0
kasan is already implied by the directory name, so we don't need to
repeat it.
Suggested-by: Christophe Leroy <[email protected]>
Signed-off-by: Daniel Axtens <[email protected]>
---
arch/powerpc/mm/kasan/Makefile | 2 +-
arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} | 0
2 files changed, 1 insertion(+), 1 deletion(-)
rename arch/powerpc/mm/kasan/{kasan_init_32.c => init_32.c} (100%)
diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
index bb1a5408b86b..42fb628a44fd 100644
--- a/arch/powerpc/mm/kasan/Makefile
+++ b/arch/powerpc/mm/kasan/Makefile
@@ -2,6 +2,6 @@
KASAN_SANITIZE := n
-obj-$(CONFIG_PPC32) += kasan_init_32.o
+obj-$(CONFIG_PPC32) += init_32.o
obj-$(CONFIG_PPC_8xx) += 8xx.o
obj-$(CONFIG_PPC_BOOK3S_32) += book3s_32.o
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/init_32.c
similarity index 100%
rename from arch/powerpc/mm/kasan/kasan_init_32.c
rename to arch/powerpc/mm/kasan/init_32.c
--
2.27.0
For annoying architectural reasons, it's very difficult to support inline
instrumentation on powerpc64.
Add a Kconfig flag to allow an arch to disable inline. (It's a bit
annoying to be 'backwards', but I'm not aware of any way to have
an arch force a symbol to be 'n', rather than 'y'.)
We also disable stack instrumentation in this case as it does things that
are functionally equivalent to inline instrumentation, namely adding
code that touches the shadow directly without going through a C helper.
Signed-off-by: Daniel Axtens <[email protected]>
---
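A rough userspace model of why the readiness check cannot cover inline or
stack instrumentation. This is a sketch with invented names
(translations_on, unsafe_shadow_reads, outline_check(), inline_check()),
not what the compiler actually emits: the point is only that an outline
check is a call into a C helper that can bail out early, while an
inline-style check reads the shadow at the access site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_SCALE_SHIFT 3

static uint8_t shadow[8];
static bool translations_on;	/* hypothetical readiness state */
static unsigned int unsafe_shadow_reads;	/* shadow reads while not ready */

static bool load_shadow_is_clean(size_t offset)
{
	assert((offset >> SHADOW_SCALE_SHIFT) < sizeof(shadow));
	if (!translations_on)
		unsafe_shadow_reads++;
	return shadow[offset >> SHADOW_SCALE_SHIFT] == 0;
}

/* Outline: the helper can test readiness before touching the shadow. */
static bool outline_check(size_t offset)
{
	if (!translations_on)
		return true;
	return load_shadow_is_clean(offset);
}

/* Inline-style: the shadow read happens unconditionally. */
static bool inline_check(size_t offset)
{
	return load_shadow_is_clean(offset);
}
```

With translations_on false, outline_check() leaves unsafe_shadow_reads at
zero, whereas each inline_check() increments it.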
lib/Kconfig.kasan | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index cffc2ebbf185..7e237dbb6df3 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -12,6 +12,9 @@ config HAVE_ARCH_KASAN_HW_TAGS
config HAVE_ARCH_KASAN_VMALLOC
bool
+config ARCH_DISABLE_KASAN_INLINE
+ def_bool n
+
config CC_HAS_KASAN_GENERIC
def_bool $(cc-option, -fsanitize=kernel-address)
@@ -130,6 +133,7 @@ config KASAN_OUTLINE
config KASAN_INLINE
bool "Inline instrumentation"
+ depends on !ARCH_DISABLE_KASAN_INLINE
help
Compiler directly inserts code checking shadow memory before
memory accesses. This is faster than outline (in some workloads
@@ -142,6 +146,7 @@ config KASAN_STACK
bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && !COMPILE_TEST
depends on KASAN_GENERIC || KASAN_SW_TAGS
default y if CC_IS_GCC
+ depends on !ARCH_DISABLE_KASAN_INLINE
help
The LLVM stack address sanitizer has a know problem that
causes excessive stack usage in a lot of functions, see
@@ -154,6 +159,9 @@ config KASAN_STACK
but clang users can still enable it for builds without
CONFIG_COMPILE_TEST. On gcc it is assumed to always be safe
to use and enabled by default.
+ If the architecture disables inline instrumentation, this is
+ also disabled as it adds inline-style instrumentation that
+ is run unconditionally.
config KASAN_SW_TAGS_IDENTIFY
bool "Enable memory corruption identification"
--
2.27.0
Implement a limited form of KASAN for Book3S 64-bit machines running under
the Radix MMU, supporting only outline mode.
- Enable the compiler instrumentation to check addresses and maintain the
shadow region. (This is the guts of KASAN which we can easily reuse.)
- Require kasan-vmalloc support to handle modules and anything else in
vmalloc space.
- KASAN needs to be able to validate all pointer accesses, but we can't
instrument all kernel addresses - only linear map and vmalloc. On boot,
set up a single page of read-only shadow that marks all iomap and
vmemmap accesses as valid.
- Make our stack-walking code KASAN-safe by using READ_ONCE_NOCHECK -
generic code, arm64, s390 and x86 all do this for similar sorts of
reasons: when unwinding a stack, we might touch memory that KASAN has
marked as being out-of-bounds. In our case we often get this when
checking for an exception frame because we're checking an arbitrary
offset into the stack frame.
See commit 20955746320e ("s390/kasan: avoid false positives during stack
unwind"), commit bcaf669b4bdb ("arm64: disable kasan when accessing
frame->fp in unwind_frame"), commit 91e08ab0c851 ("x86/dumpstack:
Prevent KASAN false positive warnings") and commit 6e22c8366416
("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer")
- Document KASAN in both generic and powerpc docs.
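For readers unfamiliar with the stack-walking item above, a userspace
approximation of the idea: reads go through a small helper that the
sanitizer is told not to instrument. The helper, walk_frames() and the
frame layout are hypothetical; this sketch assumes a compiler that
accepts the GCC-style no_sanitize_address attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_LR_SAVE 2	/* hypothetical slot index, for illustration */

/* Uninstrumented read of one word, so the sanitizer never checks it. */
__attribute__((no_sanitize_address, noinline))
static uintptr_t read_word_nocheck(const uintptr_t *p)
{
	return *(const volatile uintptr_t *)p;
}

/*
 * Follow a chain of frames where slot 0 holds the previous frame's
 * address. Returns the number of frames visited.
 */
static size_t walk_frames(const uintptr_t *sp, uintptr_t *ips, size_t max)
{
	size_t n = 0;

	assert(ips != NULL);
	while (sp != NULL && n < max) {
		ips[n++] = read_word_nocheck(&sp[FRAME_LR_SAVE]);
		sp = (const uintptr_t *)read_word_nocheck(&sp[0]);
	}
	return n;
}
```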
Background
----------
KASAN support on Book3S is a bit tricky to get right:
- It would be good to support inline instrumentation so as to be able to
catch stack issues that cannot be caught with outline mode.
- Inline instrumentation requires a fixed offset.
- Book3S runs code with translations off ("real mode") during boot,
including a lot of generic device-tree parsing code which is used to
determine MMU features.
[ppc64 mm note: The kernel installs a linear mapping at effective
address c000...-c008.... This is a one-to-one mapping with physical
memory from 0000... onward. Because of how memory accesses work on
powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
same memory both with translations on (accessing as an 'effective
address'), and with translations off (accessing as a 'real
address'). This works in both guests and the hypervisor. For more
details, see s5.7 of Book III of version 3 of the ISA, in particular
the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
KASAN implementation currently only supports Radix.]
- Some code - most notably a lot of KVM code - also runs with translations
off after boot.
- Therefore any offset has to point to memory that is valid with
translations on or off.
One approach is just to give up on inline instrumentation. This way
boot-time checks can be delayed until after the MMU is set up, and we
can just not instrument any code that runs with translations off after
booting. Take this approach for now and require outline instrumentation.
Previous attempts allowed inline instrumentation. However, they came with
some unfortunate restrictions: only physically contiguous memory could be
used and it had to be specified at compile time. Maybe we can do better in
the future.
Cc: Balbir Singh <[email protected]> # ppc64 out-of-line radix version
Cc: Aneesh Kumar K.V <[email protected]> # ppc64 hash version
Cc: Christophe Leroy <[email protected]> # ppc32 version
Signed-off-by: Daniel Axtens <[email protected]>
---
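As a worked check of the layout constants in this patch: generic KASAN
maps an address to its shadow as (addr >> 3) + KASAN_SHADOW_OFFSET. The
sketch below plugs in the offset from Kconfig.debug and the addresses from
the radix.h comment; mem_to_shadow() and check_radix_layout() are names
made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET	0xa80e000000000000ULL

#define KERN_START	0xc000000000000000ULL	/* start of the linear map */
#define SHADOW_START	0xc00e000000000000ULL	/* vmemmap end/shadow start */
#define SHADOW_END	0xc00fc00000000000ULL	/* KASAN_SHADOW_END */

static uint64_t mem_to_shadow(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

static void check_radix_layout(void)
{
	/* The shadow of the first kernel address is where the shadow starts. */
	assert(mem_to_shadow(KERN_START) == SHADOW_START);
	/* There is no shadow of the shadow, so it ends at shadow(shadow start). */
	assert(mem_to_shadow(SHADOW_START) == SHADOW_END);
}
```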
Documentation/dev-tools/kasan.rst | 11 +--
Documentation/powerpc/kasan.txt | 48 +++++++++-
arch/powerpc/Kconfig | 4 +-
arch/powerpc/Kconfig.debug | 3 +-
arch/powerpc/include/asm/book3s/64/hash.h | 4 +
arch/powerpc/include/asm/book3s/64/pgtable.h | 4 +
arch/powerpc/include/asm/book3s/64/radix.h | 13 ++-
arch/powerpc/include/asm/kasan.h | 22 +++++
arch/powerpc/kernel/Makefile | 11 +++
arch/powerpc/kernel/process.c | 16 ++--
arch/powerpc/kvm/Makefile | 5 ++
arch/powerpc/mm/book3s64/Makefile | 9 ++
arch/powerpc/mm/kasan/Makefile | 1 +
arch/powerpc/mm/kasan/init_book3s_64.c | 95 ++++++++++++++++++++
arch/powerpc/mm/ptdump/ptdump.c | 20 ++++-
arch/powerpc/platforms/Kconfig.cputype | 1 +
arch/powerpc/platforms/powernv/Makefile | 6 ++
arch/powerpc/platforms/pseries/Makefile | 3 +
18 files changed, 257 insertions(+), 19 deletions(-)
create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c
diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
index 2cfd5d9068c0..8024b55c7aa8 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -36,8 +36,9 @@ Both software KASAN modes work with SLUB and SLAB memory allocators,
while the hardware tag-based KASAN currently only supports SLUB.
Currently, generic KASAN is supported for the x86_64, arm, arm64, xtensa, s390,
-and riscv architectures. It is also supported on 32-bit powerpc kernels.
-Tag-based KASAN modes are supported only for arm64.
+and riscv architectures. It is also supported on powerpc for 32-bit kernels and
+for 64-bit kernels running under the Radix MMU. Tag-based KASAN modes are
+supported only for arm64.
Usage
-----
@@ -335,10 +336,10 @@ CONFIG_KASAN_VMALLOC
With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
cost of greater memory usage. Currently, this is supported on x86,
-riscv, s390, and 32-bit powerpc.
+riscv, s390, and powerpc.
-It is optional, except on 32-bit powerpc kernels with module support,
-where it is required.
+It is optional, except on 64-bit powerpc kernels, and on 32-bit
+powerpc kernels with module support, where it is required.
This works by hooking into vmalloc and vmap and dynamically
allocating real shadow memory to back the mappings.
diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
index 26bb0e8bb18c..f032b4eaf205 100644
--- a/Documentation/powerpc/kasan.txt
+++ b/Documentation/powerpc/kasan.txt
@@ -1,4 +1,4 @@
-KASAN is supported on powerpc on 32-bit only.
+KASAN is supported on powerpc on 32-bit and Radix 64-bit only.
32 bit support
==============
@@ -10,3 +10,49 @@ fixmap area and occupies one eighth of the total kernel virtual memory space.
Instrumentation of the vmalloc area is optional, unless built with modules,
in which case it is required.
+
+64 bit support
+==============
+
+Currently, only the radix MMU is supported. There have been versions for hash
+and Book3E processors floating around on the mailing list, but nothing has been
+merged.
+
+KASAN support on Book3S is a bit tricky to get right:
+
+ - It would be good to support inline instrumentation so as to be able to catch
+ stack issues that cannot be caught with outline mode.
+
+ - Inline instrumentation requires a fixed offset.
+
+ - Book3S runs code with translations off ("real mode") during boot, including a
+ lot of generic device-tree parsing code which is used to determine MMU
+ features.
+
+ - Some code - most notably a lot of KVM code - also runs with translations off
+ after boot.
+
+ - Therefore any offset has to point to memory that is valid with
+ translations on or off.
+
+One approach is just to give up on inline instrumentation. This way boot-time
+checks can be delayed until after the MMU is set up, and we can just not
+instrument any code that runs with translations off after booting. This is the
+current approach.
+
+To avoid this limitation, the KASAN shadow would have to be placed inside the
+linear mapping, using the same high-bits trick we use for the rest of the linear
+mapping. This is tricky:
+
+ - We'd like to place it near the start of physical memory. In theory we can do
+ this at run-time based on how much physical memory we have, but this requires
+ being able to arbitrarily relocate the kernel, which is basically the tricky
+ part of KASLR. Not being game to implement both tricky things at once, this
+ is hopefully something we can revisit once we get KASLR for Book3S.
+
+ - Alternatively, we can place the shadow at the _end_ of memory, but this
+ requires knowing how much contiguous physical memory a system has _at compile
+ time_. This is a big hammer, and has some unfortunate consequences: inability
+ to handle discontiguous physical memory, total failure to boot on machines
+ with less memory than specified, and that machines with more memory than
+ specified can't use it. This was deemed unacceptable.
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 4232d3f539c8..04aa817d1c5a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -118,6 +118,7 @@ config PPC
# Please keep this list sorted alphabetically.
#
select ARCH_32BIT_OFF_T if PPC32
+ select ARCH_DISABLE_KASAN_INLINE if PPC_RADIX_MMU
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_ELF_RANDOMIZE
@@ -183,7 +184,8 @@ config PPC
select HAVE_ARCH_HUGE_VMAP if PPC_BOOK3S_64 && PPC_RADIX_MMU
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14
- select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14
+ select HAVE_ARCH_KASAN if PPC_RADIX_MMU
+ select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index ae084357994e..195f7845f41a 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -398,4 +398,5 @@ config PPC_FAST_ENDIAN_SWITCH
config KASAN_SHADOW_OFFSET
hex
depends on KASAN
- default 0xe0000000
+ default 0xe0000000 if PPC32
+ default 0xa80e000000000000 if PPC64
diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index d959b0195ad9..222669864ff6 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -18,6 +18,10 @@
#include <asm/book3s/64/hash-4k.h>
#endif
+#define H_PTRS_PER_PTE (1 << H_PTE_INDEX_SIZE)
+#define H_PTRS_PER_PMD (1 << H_PMD_INDEX_SIZE)
+#define H_PTRS_PER_PUD (1 << H_PUD_INDEX_SIZE)
+
/* Bits to set in a PMD/PUD/PGD entry valid bit*/
#define HASH_PMD_VAL_BITS (0x8000000000000000UL)
#define HASH_PUD_VAL_BITS (0x8000000000000000UL)
diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
index 058601efbc8a..7598a5b055bd 100644
--- a/arch/powerpc/include/asm/book3s/64/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -230,6 +230,10 @@ extern unsigned long __pmd_frag_size_shift;
#define PTRS_PER_PUD (1 << PUD_INDEX_SIZE)
#define PTRS_PER_PGD (1 << PGD_INDEX_SIZE)
+#define MAX_PTRS_PER_PTE ((H_PTRS_PER_PTE > R_PTRS_PER_PTE) ? H_PTRS_PER_PTE : R_PTRS_PER_PTE)
+#define MAX_PTRS_PER_PMD ((H_PTRS_PER_PMD > R_PTRS_PER_PMD) ? H_PTRS_PER_PMD : R_PTRS_PER_PMD)
+#define MAX_PTRS_PER_PUD ((H_PTRS_PER_PUD > R_PTRS_PER_PUD) ? H_PTRS_PER_PUD : R_PTRS_PER_PUD)
+
/* PMD_SHIFT determines what a second-level page table entry can map */
#define PMD_SHIFT (PAGE_SHIFT + PTE_INDEX_SIZE)
#define PMD_SIZE (1UL << PMD_SHIFT)
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index c7813dc628fc..b3492b80f858 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -35,6 +35,11 @@
#define RADIX_PMD_SHIFT (PAGE_SHIFT + RADIX_PTE_INDEX_SIZE)
#define RADIX_PUD_SHIFT (RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE)
#define RADIX_PGD_SHIFT (RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE)
+
+#define R_PTRS_PER_PTE (1 << RADIX_PTE_INDEX_SIZE)
+#define R_PTRS_PER_PMD (1 << RADIX_PMD_INDEX_SIZE)
+#define R_PTRS_PER_PUD (1 << RADIX_PUD_INDEX_SIZE)
+
/*
* Size of EA range mapped by our pagetables.
*/
@@ -68,11 +73,11 @@
*
*
* 3rd quadrant expanded:
- * +------------------------------+
+ * +------------------------------+ Highest address (0xc010000000000000)
+ * +------------------------------+ KASAN shadow end (0xc00fc00000000000)
* | |
* | |
- * | |
- * +------------------------------+ Kernel vmemmap end (0xc010000000000000)
+ * +------------------------------+ Kernel vmemmap end/shadow start (0xc00e000000000000)
* | |
* | 512TB |
* | |
@@ -126,6 +131,8 @@
#define RADIX_VMEMMAP_SIZE RADIX_KERN_MAP_SIZE
#define RADIX_VMEMMAP_END (RADIX_VMEMMAP_START + RADIX_VMEMMAP_SIZE)
+/* For the sizes of the shadow area, see kasan.h */
+
#ifndef __ASSEMBLY__
#define RADIX_PTE_TABLE_SIZE (sizeof(pte_t) << RADIX_PTE_INDEX_SIZE)
#define RADIX_PMD_TABLE_SIZE (sizeof(pmd_t) << RADIX_PMD_INDEX_SIZE)
diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
index 7355ed05e65e..df946165812d 100644
--- a/arch/powerpc/include/asm/kasan.h
+++ b/arch/powerpc/include/asm/kasan.h
@@ -30,9 +30,31 @@
#define KASAN_SHADOW_OFFSET ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
+#ifdef CONFIG_PPC32
#define KASAN_SHADOW_END (-(-KASAN_SHADOW_START >> KASAN_SHADOW_SCALE_SHIFT))
+#endif
#ifdef CONFIG_KASAN
+#ifdef CONFIG_PPC_BOOK3S_64
+/*
+ * The shadow ends before the highest accessible address
+ * because we don't need a shadow for the shadow. Instead:
+ * (c00e000000000000 >> 3) + a80e000000000000 = c00fc00000000000
+ */
+#define KASAN_SHADOW_END 0xc00fc00000000000UL
+
+DECLARE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);
+
+static __always_inline bool kasan_arch_is_ready(void)
+{
+ if (static_branch_likely(&powerpc_kasan_enabled_key))
+ return true;
+ return false;
+}
+
+#define kasan_arch_is_ready kasan_arch_is_ready
+#endif
+
void kasan_early_init(void);
void kasan_mmu_init(void);
void kasan_init(void);
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 6084fa499aa3..163755b1cef4 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -32,6 +32,17 @@ KASAN_SANITIZE_early_32.o := n
KASAN_SANITIZE_cputable.o := n
KASAN_SANITIZE_prom_init.o := n
KASAN_SANITIZE_btext.o := n
+KASAN_SANITIZE_paca.o := n
+KASAN_SANITIZE_setup_64.o := n
+KASAN_SANITIZE_mce.o := n
+KASAN_SANITIZE_mce_power.o := n
+
+# we have to be particularly careful in ppc64 to exclude code that
+# runs with translations off, as we cannot access the shadow with
+# translations off. However, ppc32 can sanitize this.
+ifdef CONFIG_PPC64
+KASAN_SANITIZE_traps.o := n
+endif
ifdef CONFIG_KASAN
CFLAGS_early_32.o += -DDISABLE_BRANCH_PROFILING
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 3231c2df9e26..d4ae21b9e9b7 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2160,8 +2160,8 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
break;
stack = (unsigned long *) sp;
- newsp = stack[0];
- ip = stack[STACK_FRAME_LR_SAVE];
+ newsp = READ_ONCE_NOCHECK(stack[0]);
+ ip = READ_ONCE_NOCHECK(stack[STACK_FRAME_LR_SAVE]);
if (!firstframe || ip != lr) {
printk("%s["REG"] ["REG"] %pS",
loglvl, sp, ip, (void *)ip);
@@ -2179,17 +2179,19 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
* See if this is an exception frame.
* We look for the "regshere" marker in the current frame.
*/
- if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS)
- && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
+ if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS) &&
+ (READ_ONCE_NOCHECK(stack[STACK_FRAME_MARKER]) ==
+ STACK_FRAME_REGS_MARKER)) {
struct pt_regs *regs = (struct pt_regs *)
(sp + STACK_FRAME_OVERHEAD);
- lr = regs->link;
+ lr = READ_ONCE_NOCHECK(regs->link);
printk("%s--- interrupt: %lx at %pS\n",
- loglvl, regs->trap, (void *)regs->nip);
+ loglvl, READ_ONCE_NOCHECK(regs->trap),
+ (void *)READ_ONCE_NOCHECK(regs->nip));
__show_regs(regs);
printk("%s--- interrupt: %lx\n",
- loglvl, regs->trap);
+ loglvl, READ_ONCE_NOCHECK(regs->trap));
firstframe = 1;
}
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 2bfeaa13befb..7f1592dacbeb 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -136,3 +136,8 @@ obj-$(CONFIG_KVM_BOOK3S_64_PR) += kvm-pr.o
obj-$(CONFIG_KVM_BOOK3S_64_HV) += kvm-hv.o
obj-y += $(kvm-book3s_64-builtin-objs-y)
+
+# KVM does a lot in real-mode, and 64-bit Book3S KASAN doesn't support that
+ifdef CONFIG_PPC_BOOK3S_64
+KASAN_SANITIZE := n
+endif
diff --git a/arch/powerpc/mm/book3s64/Makefile b/arch/powerpc/mm/book3s64/Makefile
index 1b56d3af47d4..a7d8a68bd2c5 100644
--- a/arch/powerpc/mm/book3s64/Makefile
+++ b/arch/powerpc/mm/book3s64/Makefile
@@ -21,3 +21,12 @@ obj-$(CONFIG_PPC_PKEY) += pkeys.o
# Instrumenting the SLB fault path can lead to duplicate SLB entries
KCOV_INSTRUMENT_slb.o := n
+
+# Parts of these can run in real mode and therefore are
+# not safe with the current outline KASAN implementation
+KASAN_SANITIZE_mmu_context.o := n
+KASAN_SANITIZE_pgtable.o := n
+KASAN_SANITIZE_radix_pgtable.o := n
+KASAN_SANITIZE_radix_tlb.o := n
+KASAN_SANITIZE_slb.o := n
+KASAN_SANITIZE_pkeys.o := n
diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
index 42fb628a44fd..07eef87abd6c 100644
--- a/arch/powerpc/mm/kasan/Makefile
+++ b/arch/powerpc/mm/kasan/Makefile
@@ -5,3 +5,4 @@ KASAN_SANITIZE := n
obj-$(CONFIG_PPC32) += init_32.o
obj-$(CONFIG_PPC_8xx) += 8xx.o
obj-$(CONFIG_PPC_BOOK3S_32) += book3s_32.o
+obj-$(CONFIG_PPC_BOOK3S_64) += init_book3s_64.o
diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c b/arch/powerpc/mm/kasan/init_book3s_64.c
new file mode 100644
index 000000000000..ca913ed951a2
--- /dev/null
+++ b/arch/powerpc/mm/kasan/init_book3s_64.c
@@ -0,0 +1,95 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KASAN for 64-bit Book3S powerpc
+ *
+ * Copyright (C) 2019-2020 IBM Corporation
+ * Author: Daniel Axtens <[email protected]>
+ */
+
+#define DISABLE_BRANCH_PROFILING
+
+#include <linux/kasan.h>
+#include <linux/printk.h>
+#include <linux/sched/task.h>
+#include <linux/memblock.h>
+#include <asm/pgalloc.h>
+
+DEFINE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);
+
+static void __init kasan_init_phys_region(void *start, void *end)
+{
+ unsigned long k_start, k_end, k_cur;
+ void *va;
+
+ if (start >= end)
+ return;
+
+ k_start = ALIGN_DOWN((unsigned long)kasan_mem_to_shadow(start), PAGE_SIZE);
+ k_end = ALIGN((unsigned long)kasan_mem_to_shadow(end), PAGE_SIZE);
+
+ va = memblock_alloc(k_end - k_start, PAGE_SIZE);
+ for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE, va += PAGE_SIZE)
+ map_kernel_page(k_cur, __pa(va), PAGE_KERNEL);
+}
+
+void __init kasan_init(void)
+{
+ /*
+ * We want to do the following things:
+ * 1) Map real memory into the shadow for all physical memblocks
+ * This takes us from c000... to c008...
+ * 2) Leave a hole over the shadow of vmalloc space. KASAN_VMALLOC
+ * will manage this for us.
+ * This takes us from c008... to c00a...
+ * 3) Map the 'early shadow'/zero page over iomap and vmemmap space.
+ * This takes us up to where we start at c00e...
+ */
+
+ void *k_start = kasan_mem_to_shadow((void *)RADIX_VMALLOC_END);
+ void *k_end = kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END);
+ phys_addr_t start, end;
+ u64 i;
+ pte_t zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL);
+
+ if (!early_radix_enabled())
+ panic("KASAN requires radix!");
+
+ for_each_mem_range(i, &start, &end)
+ kasan_init_phys_region((void *)start, (void *)end);
+
+ for (i = 0; i < PTRS_PER_PTE; i++)
+ __set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
+ &kasan_early_shadow_pte[i], zero_pte, 0);
+
+ for (i = 0; i < PTRS_PER_PMD; i++)
+ pmd_populate_kernel(&init_mm, &kasan_early_shadow_pmd[i],
+ kasan_early_shadow_pte);
+
+ for (i = 0; i < PTRS_PER_PUD; i++)
+ pud_populate(&init_mm, &kasan_early_shadow_pud[i],
+ kasan_early_shadow_pmd);
+
+ /* map the early shadow over the iomap and vmemmap space */
+ kasan_populate_early_shadow(k_start, k_end);
+
+ /* mark early shadow region as RO and wipe it */
+ zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL_RO);
+ for (i = 0; i < PTRS_PER_PTE; i++)
+ __set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
+ &kasan_early_shadow_pte[i], zero_pte, 0);
+
+ /*
+ * clear_page relies on some cache info that hasn't been set up yet.
+ * It ends up looping ~forever and blows up other data.
+ * Use memset instead.
+ */
+ memset(kasan_early_shadow_page, 0, PAGE_SIZE);
+
+ static_branch_inc(&powerpc_kasan_enabled_key);
+
+ /* Enable error messages */
+ init_task.kasan_depth = 0;
+ pr_info("KASAN init done (64-bit Book3S)\n");
+}
+
+void __init kasan_late_init(void) { }
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index aca354fb670b..63672aa656e8 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -20,6 +20,7 @@
#include <linux/seq_file.h>
#include <asm/fixmap.h>
#include <linux/const.h>
+#include <linux/kasan.h>
#include <asm/page.h>
#include <asm/hugetlb.h>
@@ -317,6 +318,23 @@ static void walk_pud(struct pg_state *st, p4d_t *p4d, unsigned long start)
unsigned long addr;
unsigned int i;
+#if defined(CONFIG_KASAN) && defined(CONFIG_PPC_BOOK3S_64)
+ /*
+ * On radix + KASAN, we want to check for the KASAN "early" shadow
+ * which covers huge quantities of memory with the same set of
+ * read-only PTEs. If it is, we want to note the first page (to see
+ * the status change), and then note the last page. This gives us good
+ * results without spending ages noting the exact same PTEs over 100s of
+ * terabytes of memory.
+ */
+ if (p4d_page(*p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud))) {
+ walk_pmd(st, pud, start);
+ addr = start + (PTRS_PER_PUD - 1) * PUD_SIZE;
+ walk_pmd(st, pud, addr);
+ return;
+ }
+#endif
+
for (i = 0; i < PTRS_PER_PUD; i++, pud++) {
addr = start + i * PUD_SIZE;
if (!pud_none(*pud) && !pud_is_leaf(*pud))
@@ -387,11 +405,11 @@ static void populate_markers(void)
#endif
address_markers[i++].start_address = FIXADDR_START;
address_markers[i++].start_address = FIXADDR_TOP;
+#endif /* CONFIG_PPC64 */
#ifdef CONFIG_KASAN
address_markers[i++].start_address = KASAN_SHADOW_START;
address_markers[i++].start_address = KASAN_SHADOW_END;
#endif
-#endif /* CONFIG_PPC64 */
}
static int ptdump_show(struct seq_file *m, void *v)
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 3ce907523b1e..9063c13e7221 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -101,6 +101,7 @@ config PPC_BOOK3S_64
select ARCH_SUPPORTS_NUMA_BALANCING
select IRQ_WORK
select PPC_MM_SLICES
+ select KASAN_VMALLOC if KASAN
config PPC_BOOK3E_64
bool "Embedded processors"
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 2eb6ae150d1f..f277e4793696 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,4 +1,10 @@
# SPDX-License-Identifier: GPL-2.0
+
+# nothing that deals with real mode is safe to KASAN
+# in particular, idle code runs a bunch of things in real mode
+KASAN_SANITIZE_idle.o := n
+KASAN_SANITIZE_pci-ioda.o := n
+
obj-y += setup.o opal-call.o opal-wrappers.o opal.o opal-async.o
obj-y += idle.o opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index c8a2b0b05ac0..202199ef9e5c 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -30,3 +30,6 @@ obj-$(CONFIG_PPC_SVM) += svm.o
obj-$(CONFIG_FA_DUMP) += rtas-fadump.o
obj-$(CONFIG_SUSPEND) += suspend.o
+
+# nothing that operates in real mode is safe for KASAN
+KASAN_SANITIZE_ras.o := n
--
2.27.0
On Sat, Mar 20, 2021 at 01:40:52AM +1100, Daniel Axtens wrote:
> Building on the work of Christophe, Aneesh and Balbir, I've ported
> KASAN to 64-bit Book3S kernels running on the Radix MMU.
>
> v11 applies to next-20210317. I had hoped to have it apply to
> powerpc/next but once again there are changes in the kasan core that
> clash. Also, thanks to mpe for fixing a build break with KASAN off.
>
> I'm not sure how best to progress this towards actually being merged
> when it has impacts across subsystems. I'd appreciate any input. Maybe
> the first four patches could go in via the kasan tree, that should
> make things easier for powerpc in a future cycle?
>
> v10 rebases on top of next-20210125, fixing things up to work on top
> of the latest changes, and fixing some review comments from
> Christophe. I have tested host and guest with 64k pages for this spin.
>
> There is now only 1 failing KUnit test: kasan_global_oob - gcc puts
> the ASAN init code in a section called '.init_array'. Powerpc64 module
> loading code goes through and _renames_ any section beginning with
> '.init' to begin with '_init' in order to avoid some complexities
> around our 24-bit indirect jumps. This means it renames '.init_array'
> to '_init_array', and the generic module loading code then fails to
> recognise the section as a constructor and thus doesn't run it. This
> hack dates back to 2003 and so I'm not going to try to unpick it in
> this series. (I suspect this may have previously worked if the code
> ended up in .ctors rather than .init_array but I don't keep my old
> binaries around so I have no real way of checking.)
>
> (The previously failing stack tests are now skipped due to more
> accurate configuration settings.)
>
> Details from v9: This is a significant reworking of the previous
> versions. Instead of the previous approach which supported inline
> instrumentation, this series provides only outline instrumentation.
>
> To get around the problem of accessing the shadow region inside code we run
> with translations off (in 'real mode'), we we restrict checking to when
> translations are enabled. This is done via a new hook in the kasan core and
> by excluding larger quantites of arch code from instrumentation. The upside
> is that we no longer require that you be able to specify the amount of
> physically contiguous memory on the system at compile time. Hopefully this
> is a better trade-off. More details in patch 6.
>
> kexec works. Both 64k and 4k pages work. Running as a KVM host works, but
> nothing in arch/powerpc/kvm is instrumented. It's also potentially a bit
> fragile - if any real mode code paths call out to instrumented code, things
> will go boom.
>
The last time I checked, the real mode changes made the code hard to
review and maintain. I am happy to see that we've decided to leave them
off the table for now. I am reviewing the series.
Balbir Singh.
On Sat, Mar 20, 2021 at 01:40:53AM +1100, Daniel Axtens wrote:
> For annoying architectural reasons, it's very difficult to support inline
> instrumentation on powerpc64.
I think we can expand here and talk about how in hash mode, the vmalloc
address space is in a region of memory different than where kernel virtual
addresses are mapped. Did I recollect the reason correctly?
>
> Add a Kconfig flag to allow an arch to disable inline. (It's a bit
> annoying to be 'backwards', but I'm not aware of any way to have
> an arch force a symbol to be 'n', rather than 'y'.)
>
> We also disable stack instrumentation in this case as it does things that
> are functionally equivalent to inline instrumentation, namely adding
> code that touches the shadow directly without going through a C helper.
>
> Signed-off-by: Daniel Axtens <[email protected]>
> ---
> lib/Kconfig.kasan | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
> index cffc2ebbf185..7e237dbb6df3 100644
> --- a/lib/Kconfig.kasan
> +++ b/lib/Kconfig.kasan
> @@ -12,6 +12,9 @@ config HAVE_ARCH_KASAN_HW_TAGS
> config HAVE_ARCH_KASAN_VMALLOC
> bool
>
> +config ARCH_DISABLE_KASAN_INLINE
> + def_bool n
> +
Some comments on which architectures want to disable KASAN inline,
and why, would be helpful.
Balbir Singh.
On Sat, Mar 20, 2021 at 01:40:58AM +1100, Daniel Axtens wrote:
> Implement a limited form of KASAN for Book3S 64-bit machines running under
> the Radix MMU, supporting only outline mode.
>
Could you highlight the changes from
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/[email protected]/?
Feel free to use my signed-off-by if you need to and add/update copyright
headers if appropriate.
> - Enable the compiler instrumentation to check addresses and maintain the
> shadow region. (This is the guts of KASAN which we can easily reuse.)
>
> - Require kasan-vmalloc support to handle modules and anything else in
> vmalloc space.
>
> - KASAN needs to be able to validate all pointer accesses, but we can't
> instrument all kernel addresses - only linear map and vmalloc. On boot,
> set up a single page of read-only shadow that marks all iomap and
> vmemmap accesses as valid.
>
> - Make our stack-walking code KASAN-safe by using READ_ONCE_NOCHECK -
> generic code, arm64, s390 and x86 all do this for similar sorts of
> reasons: when unwinding a stack, we might touch memory that KASAN has
> marked as being out-of-bounds. In our case we often get this when
> checking for an exception frame because we're checking an arbitrary
> offset into the stack frame.
>
> See commit 20955746320e ("s390/kasan: avoid false positives during stack
> unwind"), commit bcaf669b4bdb ("arm64: disable kasan when accessing
> frame->fp in unwind_frame"), commit 91e08ab0c851 ("x86/dumpstack:
> Prevent KASAN false positive warnings") and commit 6e22c8366416
> ("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer")
>
> - Document KASAN in both generic and powerpc docs.
>
> Background
> ----------
>
> KASAN support on Book3S is a bit tricky to get right:
>
> - It would be good to support inline instrumentation so as to be able to
> catch stack issues that cannot be caught with outline mode.
>
> - Inline instrumentation requires a fixed offset.
>
> - Book3S runs code with translations off ("real mode") during boot,
> including a lot of generic device-tree parsing code which is used to
> determine MMU features.
>
> [ppc64 mm note: The kernel installs a linear mapping at effective
> address c000...-c008.... This is a one-to-one mapping with physical
> memory from 0000... onward. Because of how memory accesses work on
> powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
> same memory both with translations on (accessing as an 'effective
> address'), and with translations off (accessing as a 'real
> address'). This works in both guests and the hypervisor. For more
> details, see s5.7 of Book III of version 3 of the ISA, in particular
> the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
> KASAN implementation currently only supports Radix.]
>
> - Some code - most notably a lot of KVM code - also runs with translations
> off after boot.
>
> - Therefore any offset has to point to memory that is valid with
> translations on or off.
>
> One approach is just to give up on inline instrumentation. This way
> boot-time checks can be delayed until after the MMU is set is up, and we
> can just not instrument any code that runs with translations off after
> booting. Take this approach for now and require outline instrumentation.
>
> Previous attempts allowed inline instrumentation. However, they came with
> some unfortunate restrictions: only physically contiguous memory could be
> used and it had to be specified at compile time. Maybe we can do better in
> the future.
>
> Cc: Balbir Singh <[email protected]> # ppc64 out-of-line radix version
> Cc: Aneesh Kumar K.V <[email protected]> # ppc64 hash version
> Cc: Christophe Leroy <[email protected]> # ppc32 version
> Signed-off-by: Daniel Axtens <[email protected]>
> ---
> Documentation/dev-tools/kasan.rst | 11 +--
> Documentation/powerpc/kasan.txt | 48 +++++++++-
> arch/powerpc/Kconfig | 4 +-
> arch/powerpc/Kconfig.debug | 3 +-
> arch/powerpc/include/asm/book3s/64/hash.h | 4 +
> arch/powerpc/include/asm/book3s/64/pgtable.h | 4 +
> arch/powerpc/include/asm/book3s/64/radix.h | 13 ++-
> arch/powerpc/include/asm/kasan.h | 22 +++++
> arch/powerpc/kernel/Makefile | 11 +++
> arch/powerpc/kernel/process.c | 16 ++--
> arch/powerpc/kvm/Makefile | 5 ++
> arch/powerpc/mm/book3s64/Makefile | 9 ++
> arch/powerpc/mm/kasan/Makefile | 1 +
> arch/powerpc/mm/kasan/init_book3s_64.c | 95 ++++++++++++++++++++
> arch/powerpc/mm/ptdump/ptdump.c | 20 ++++-
> arch/powerpc/platforms/Kconfig.cputype | 1 +
> arch/powerpc/platforms/powernv/Makefile | 6 ++
> arch/powerpc/platforms/pseries/Makefile | 3 +
> 18 files changed, 257 insertions(+), 19 deletions(-)
> create mode 100644 arch/powerpc/mm/kasan/init_book3s_64.c
>
> diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
> index 2cfd5d9068c0..8024b55c7aa8 100644
> --- a/Documentation/dev-tools/kasan.rst
> +++ b/Documentation/dev-tools/kasan.rst
> @@ -36,8 +36,9 @@ Both software KASAN modes work with SLUB and SLAB memory allocators,
> while the hardware tag-based KASAN currently only supports SLUB.
>
> Currently, generic KASAN is supported for the x86_64, arm, arm64, xtensa, s390,
> -and riscv architectures. It is also supported on 32-bit powerpc kernels.
> -Tag-based KASAN modes are supported only for arm64.
> +and riscv architectures. It is also supported on powerpc for 32-bit kernels and
> +for 64-bit kernels running under the Radix MMU. Tag-based KASAN modes are
> +supported only for arm64.
>
> Usage
> -----
> @@ -335,10 +336,10 @@ CONFIG_KASAN_VMALLOC
>
> With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the
> cost of greater memory usage. Currently, this is supported on x86,
> -riscv, s390, and 32-bit powerpc.
> +riscv, s390, and powerpc.
>
> -It is optional, except on 32-bit powerpc kernels with module support,
> -where it is required.
> +It is optional, except on 64-bit powerpc kernels, and on 32-bit
> +powerpc kernels with module support, where it is required.
>
> This works by hooking into vmalloc and vmap and dynamically
> allocating real shadow memory to back the mappings.
> diff --git a/Documentation/powerpc/kasan.txt b/Documentation/powerpc/kasan.txt
> index 26bb0e8bb18c..f032b4eaf205 100644
> --- a/Documentation/powerpc/kasan.txt
> +++ b/Documentation/powerpc/kasan.txt
> @@ -1,4 +1,4 @@
> -KASAN is supported on powerpc on 32-bit only.
> +KASAN is supported on powerpc on 32-bit and Radix 64-bit only.
>
> 32 bit support
> ==============
> @@ -10,3 +10,49 @@ fixmap area and occupies one eighth of the total kernel virtual memory space.
>
> Instrumentation of the vmalloc area is optional, unless built with modules,
> in which case it is required.
> +
> +64 bit support
> +==============
> +
> +Currently, only the radix MMU is supported. There have been versions for hash
> +and Book3E processors floating around on the mailing list, but nothing has been
> +merged.
> +
> +KASAN support on Book3S is a bit tricky to get right:
> +
> + - It would be good to support inline instrumentation so as to be able to catch
> + stack issues that cannot be caught with outline mode.
> +
> + - Inline instrumentation requires a fixed offset.
> +
> + - Book3S runs code with translations off ("real mode") during boot, including a
> + lot of generic device-tree parsing code which is used to determine MMU
> + features.
> +
> + - Some code - most notably a lot of KVM code - also runs with translations off
> + after boot.
> +
> + - Therefore any offset has to point to memory that is valid with
> + translations on or off.
> +
> +One approach is just to give up on inline instrumentation. This way boot-time
> +checks can be delayed until after the MMU is set is up, and we can just not
> +instrument any code that runs with translations off after booting. This is the
> +current approach.
> +
> +To avoid this limitiation, the KASAN shadow would have to be placed inside the
> +linear mapping, using the same high-bits trick we use for the rest of the linear
> +mapping. This is tricky:
> +
> + - We'd like to place it near the start of physical memory. In theory we can do
> + this at run-time based on how much physical memory we have, but this requires
> + being able to arbitrarily relocate the kernel, which is basically the tricky
> + part of KASLR. Not being game to implement both tricky things at once, this
> + is hopefully something we can revisit once we get KASLR for Book3S.
> +
> + - Alternatively, we can place the shadow at the _end_ of memory, but this
> + requires knowing how much contiguous physical memory a system has _at compile
> + time_. This is a big hammer, and has some unfortunate consequences: inablity
> + to handle discontiguous physical memory, total failure to boot on machines
> + with less memory than specified, and that machines with more memory than
> + specified can't use it. This was deemed unacceptable.
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 4232d3f539c8..04aa817d1c5a 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -118,6 +118,7 @@ config PPC
> # Please keep this list sorted alphabetically.
> #
> select ARCH_32BIT_OFF_T if PPC32
> + select ARCH_DISABLE_KASAN_INLINE if PPC_RADIX_MMU
> select ARCH_HAS_DEBUG_VIRTUAL
> select ARCH_HAS_DEVMEM_IS_ALLOWED
> select ARCH_HAS_ELF_RANDOMIZE
> @@ -183,7 +184,8 @@ config PPC
> select HAVE_ARCH_HUGE_VMAP if PPC_BOOK3S_64 && PPC_RADIX_MMU
> select HAVE_ARCH_JUMP_LABEL
> select HAVE_ARCH_KASAN if PPC32 && PPC_PAGE_SHIFT <= 14
> - select HAVE_ARCH_KASAN_VMALLOC if PPC32 && PPC_PAGE_SHIFT <= 14
> + select HAVE_ARCH_KASAN if PPC_RADIX_MMU
> + select HAVE_ARCH_KASAN_VMALLOC if HAVE_ARCH_KASAN
> select HAVE_ARCH_KGDB
> select HAVE_ARCH_MMAP_RND_BITS
> select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> index ae084357994e..195f7845f41a 100644
> --- a/arch/powerpc/Kconfig.debug
> +++ b/arch/powerpc/Kconfig.debug
> @@ -398,4 +398,5 @@ config PPC_FAST_ENDIAN_SWITCH
> config KASAN_SHADOW_OFFSET
> hex
> depends on KASAN
> - default 0xe0000000
> + default 0xe0000000 if PPC32
> + default 0xa80e000000000000 if PPC64
> diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
> index d959b0195ad9..222669864ff6 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash.h
> @@ -18,6 +18,10 @@
> #include <asm/book3s/64/hash-4k.h>
> #endif
>
> +#define H_PTRS_PER_PTE (1 << H_PTE_INDEX_SIZE)
> +#define H_PTRS_PER_PMD (1 << H_PMD_INDEX_SIZE)
> +#define H_PTRS_PER_PUD (1 << H_PUD_INDEX_SIZE)
> +
> /* Bits to set in a PMD/PUD/PGD entry valid bit*/
> #define HASH_PMD_VAL_BITS (0x8000000000000000UL)
> #define HASH_PUD_VAL_BITS (0x8000000000000000UL)
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index 058601efbc8a..7598a5b055bd 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -230,6 +230,10 @@ extern unsigned long __pmd_frag_size_shift;
> #define PTRS_PER_PUD (1 << PUD_INDEX_SIZE)
> #define PTRS_PER_PGD (1 << PGD_INDEX_SIZE)
>
> +#define MAX_PTRS_PER_PTE ((H_PTRS_PER_PTE > R_PTRS_PER_PTE) ? H_PTRS_PER_PTE : R_PTRS_PER_PTE)
> +#define MAX_PTRS_PER_PMD ((H_PTRS_PER_PMD > R_PTRS_PER_PMD) ? H_PTRS_PER_PMD : R_PTRS_PER_PMD)
> +#define MAX_PTRS_PER_PUD ((H_PTRS_PER_PUD > R_PTRS_PER_PUD) ? H_PTRS_PER_PUD : R_PTRS_PER_PUD)
> +
> /* PMD_SHIFT determines what a second-level page table entry can map */
> #define PMD_SHIFT (PAGE_SHIFT + PTE_INDEX_SIZE)
> #define PMD_SIZE (1UL << PMD_SHIFT)
> diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
> index c7813dc628fc..b3492b80f858 100644
> --- a/arch/powerpc/include/asm/book3s/64/radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/radix.h
> @@ -35,6 +35,11 @@
> #define RADIX_PMD_SHIFT (PAGE_SHIFT + RADIX_PTE_INDEX_SIZE)
> #define RADIX_PUD_SHIFT (RADIX_PMD_SHIFT + RADIX_PMD_INDEX_SIZE)
> #define RADIX_PGD_SHIFT (RADIX_PUD_SHIFT + RADIX_PUD_INDEX_SIZE)
> +
> +#define R_PTRS_PER_PTE (1 << RADIX_PTE_INDEX_SIZE)
> +#define R_PTRS_PER_PMD (1 << RADIX_PMD_INDEX_SIZE)
> +#define R_PTRS_PER_PUD (1 << RADIX_PUD_INDEX_SIZE)
> +
> /*
> * Size of EA range mapped by our pagetables.
> */
> @@ -68,11 +73,11 @@
> *
> *
> * 3rd quadrant expanded:
> - * +------------------------------+
> + * +------------------------------+ Highest address (0xc010000000000000)
> + * +------------------------------+ KASAN shadow end (0xc00fc00000000000)
> * | |
> * | |
> - * | |
> - * +------------------------------+ Kernel vmemmap end (0xc010000000000000)
> + * +------------------------------+ Kernel vmemmap end/shadow start (0xc00e000000000000)
> * | |
> * | 512TB |
> * | |
> @@ -126,6 +131,8 @@
> #define RADIX_VMEMMAP_SIZE RADIX_KERN_MAP_SIZE
> #define RADIX_VMEMMAP_END (RADIX_VMEMMAP_START + RADIX_VMEMMAP_SIZE)
>
> +/* For the sizes of the shadow area, see kasan.h */
> +
> #ifndef __ASSEMBLY__
> #define RADIX_PTE_TABLE_SIZE (sizeof(pte_t) << RADIX_PTE_INDEX_SIZE)
> #define RADIX_PMD_TABLE_SIZE (sizeof(pmd_t) << RADIX_PMD_INDEX_SIZE)
> diff --git a/arch/powerpc/include/asm/kasan.h b/arch/powerpc/include/asm/kasan.h
> index 7355ed05e65e..df946165812d 100644
> --- a/arch/powerpc/include/asm/kasan.h
> +++ b/arch/powerpc/include/asm/kasan.h
> @@ -30,9 +30,31 @@
>
> #define KASAN_SHADOW_OFFSET ASM_CONST(CONFIG_KASAN_SHADOW_OFFSET)
>
> +#ifdef CONFIG_PPC32
> #define KASAN_SHADOW_END (-(-KASAN_SHADOW_START >> KASAN_SHADOW_SCALE_SHIFT))
> +#endif
>
> #ifdef CONFIG_KASAN
> +#ifdef CONFIG_PPC_BOOK3S_64
> +/*
> + * The shadow ends before the highest accessible address
> + * because we don't need a shadow for the shadow. Instead:
> + * c00e000000000000 << 3 + a80e000000000000000 = c00fc00000000000
The comment has one extra 0 in a80e... I did the math and had to use
the values from the defines :)
> + */
> +#define KASAN_SHADOW_END 0xc00fc00000000000UL
> +
> +DECLARE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);
> +
> +static __always_inline bool kasan_arch_is_ready(void)
> +{
> + if (static_branch_likely(&powerpc_kasan_enabled_key))
> + return true;
> + return false;
> +}
> +
> +#define kasan_arch_is_ready kasan_arch_is_ready
> +#endif
> +
> void kasan_early_init(void);
> void kasan_mmu_init(void);
> void kasan_init(void);
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 6084fa499aa3..163755b1cef4 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -32,6 +32,17 @@ KASAN_SANITIZE_early_32.o := n
> KASAN_SANITIZE_cputable.o := n
> KASAN_SANITIZE_prom_init.o := n
> KASAN_SANITIZE_btext.o := n
> +KASAN_SANITIZE_paca.o := n
> +KASAN_SANITIZE_setup_64.o := n
> +KASAN_SANITIZE_mce.o := n
> +KASAN_SANITIZE_mce_power.o := n
> +
> +# we have to be particularly careful in ppc64 to exclude code that
> +# runs with translations off, as we cannot access the shadow with
> +# translations off. However, ppc32 can sanitize this.
> +ifdef CONFIG_PPC64
> +KASAN_SANITIZE_traps.o := n
> +endif
>
> ifdef CONFIG_KASAN
> CFLAGS_early_32.o += -DDISABLE_BRANCH_PROFILING
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 3231c2df9e26..d4ae21b9e9b7 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -2160,8 +2160,8 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
> break;
>
> stack = (unsigned long *) sp;
> - newsp = stack[0];
> - ip = stack[STACK_FRAME_LR_SAVE];
> + newsp = READ_ONCE_NOCHECK(stack[0]);
> + ip = READ_ONCE_NOCHECK(stack[STACK_FRAME_LR_SAVE]);
> if (!firstframe || ip != lr) {
> printk("%s["REG"] ["REG"] %pS",
> loglvl, sp, ip, (void *)ip);
> @@ -2179,17 +2179,19 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
> * See if this is an exception frame.
> * We look for the "regshere" marker in the current frame.
> */
> - if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS)
> - && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
> + if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS) &&
> + (READ_ONCE_NOCHECK(stack[STACK_FRAME_MARKER]) ==
> + STACK_FRAME_REGS_MARKER)) {
> struct pt_regs *regs = (struct pt_regs *)
> (sp + STACK_FRAME_OVERHEAD);
>
> - lr = regs->link;
> + lr = READ_ONCE_NOCHECK(regs->link);
> printk("%s--- interrupt: %lx at %pS\n",
> - loglvl, regs->trap, (void *)regs->nip);
> + loglvl, READ_ONCE_NOCHECK(regs->trap),
> + (void *)READ_ONCE_NOCHECK(regs->nip));
> __show_regs(regs);
> printk("%s--- interrupt: %lx\n",
> - loglvl, regs->trap);
> + loglvl, READ_ONCE_NOCHECK(regs->trap));
>
> firstframe = 1;
> }
> diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
> index 2bfeaa13befb..7f1592dacbeb 100644
> --- a/arch/powerpc/kvm/Makefile
> +++ b/arch/powerpc/kvm/Makefile
> @@ -136,3 +136,8 @@ obj-$(CONFIG_KVM_BOOK3S_64_PR) += kvm-pr.o
> obj-$(CONFIG_KVM_BOOK3S_64_HV) += kvm-hv.o
>
> obj-y += $(kvm-book3s_64-builtin-objs-y)
> +
> +# KVM does a lot in real-mode, and 64-bit Book3S KASAN doesn't support that
> +ifdef CONFIG_PPC_BOOK3S_64
> +KASAN_SANITIZE := n
> +endif
> diff --git a/arch/powerpc/mm/book3s64/Makefile b/arch/powerpc/mm/book3s64/Makefile
> index 1b56d3af47d4..a7d8a68bd2c5 100644
> --- a/arch/powerpc/mm/book3s64/Makefile
> +++ b/arch/powerpc/mm/book3s64/Makefile
> @@ -21,3 +21,12 @@ obj-$(CONFIG_PPC_PKEY) += pkeys.o
>
> # Instrumenting the SLB fault path can lead to duplicate SLB entries
> KCOV_INSTRUMENT_slb.o := n
> +
> +# Parts of these can run in real mode and therefore are
> +# not safe with the current outline KASAN implementation
> +KASAN_SANITIZE_mmu_context.o := n
> +KASAN_SANITIZE_pgtable.o := n
> +KASAN_SANITIZE_radix_pgtable.o := n
> +KASAN_SANITIZE_radix_tlb.o := n
> +KASAN_SANITIZE_slb.o := n
> +KASAN_SANITIZE_pkeys.o := n
> diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
> index 42fb628a44fd..07eef87abd6c 100644
> --- a/arch/powerpc/mm/kasan/Makefile
> +++ b/arch/powerpc/mm/kasan/Makefile
> @@ -5,3 +5,4 @@ KASAN_SANITIZE := n
> obj-$(CONFIG_PPC32) += init_32.o
> obj-$(CONFIG_PPC_8xx) += 8xx.o
> obj-$(CONFIG_PPC_BOOK3S_32) += book3s_32.o
> +obj-$(CONFIG_PPC_BOOK3S_64) += init_book3s_64.o
> diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c b/arch/powerpc/mm/kasan/init_book3s_64.c
> new file mode 100644
> index 000000000000..ca913ed951a2
> --- /dev/null
> +++ b/arch/powerpc/mm/kasan/init_book3s_64.c
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KASAN for 64-bit Book3S powerpc
> + *
> + * Copyright (C) 2019-2020 IBM Corporation
> + * Author: Daniel Axtens <[email protected]>
> + */
> +
> +#define DISABLE_BRANCH_PROFILING
> +
> +#include <linux/kasan.h>
> +#include <linux/printk.h>
> +#include <linux/sched/task.h>
> +#include <linux/memblock.h>
> +#include <asm/pgalloc.h>
> +
> +DEFINE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);
> +
> +static void __init kasan_init_phys_region(void *start, void *end)
> +{
> + unsigned long k_start, k_end, k_cur;
> + void *va;
> +
> + if (start >= end)
> + return;
> +
> + k_start = ALIGN_DOWN((unsigned long)kasan_mem_to_shadow(start), PAGE_SIZE);
> + k_end = ALIGN((unsigned long)kasan_mem_to_shadow(end), PAGE_SIZE);
> +
> + va = memblock_alloc(k_end - k_start, PAGE_SIZE);
> + for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE, va += PAGE_SIZE)
> + map_kernel_page(k_cur, __pa(va), PAGE_KERNEL);
> +}
> +
> +void __init kasan_init(void)
> +{
> + /*
> + * We want to do the following things:
> + * 1) Map real memory into the shadow for all physical memblocks
> + * This takes us from c000... to c008...
> + * 2) Leave a hole over the shadow of vmalloc space. KASAN_VMALLOC
> + * will manage this for us.
> + * This takes us from c008... to c00a...
> + * 3) Map the 'early shadow'/zero page over iomap and vmemmap space.
> + * This takes us up to where we start at c00e...
> + */
> +
Assuming we have
#define VMEMMAP_END R_VMEMMAP_END
and ditto for hash, we probably need
BUILD_BUG_ON(VMEMMAP_END + KASAN_SHADOW_OFFSET != KASAN_SHADOW_END);
Looks good otherwise. I've not been able to test it yet.
Balbir Singh.
Balbir Singh <[email protected]> writes:
> On Sat, Mar 20, 2021 at 01:40:53AM +1100, Daniel Axtens wrote:
>> For annoying architectural reasons, it's very difficult to support inline
>> instrumentation on powerpc64.
>
> I think we can expand here and talk about how in hash mode, the vmalloc
> address space is in a region of memory different than where kernel virtual
> addresses are mapped. Did I recollect the reason correctly?
I think that's _a_ reason, but for radix mode (which is all I support at
the moment), the reason is a bit simpler. We call into generic code like
the DT parser and printk when we have translations off. The shadow
region lives at c00e..., which is not part of the linear mapping, so if
you try to access the shadow while in real mode you will access unmapped
memory and (at least on PowerNV) take a machine check.
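To make the address arithmetic concrete, here is a small user-space
sketch (not kernel code). The shadow offset and the linear map bounds are
the values used in this series; the helper names mem_to_shadow() and
in_linear_map() are made up for illustration, and treating "inside the
linear map" as "reachable with translations off" is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from this series; the names are hypothetical. */
#define SHADOW_SCALE_SHIFT 3
#define SHADOW_OFFSET      0xa80e000000000000ULL /* PPC64 KASAN_SHADOW_OFFSET */
#define LINEAR_MAP_START   0xc000000000000000ULL
#define LINEAR_MAP_END     0xc008000000000000ULL /* vmalloc space starts here */

/* Same arithmetic as the generic kasan_mem_to_shadow(). */
static uint64_t mem_to_shadow(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* Simplification: only linear map addresses work with translations off. */
static bool in_linear_map(uint64_t addr)
{
	return addr >= LINEAR_MAP_START && addr < LINEAR_MAP_END;
}
```

Feeding the start of the linear map, c000000000000000, through
mem_to_shadow() gives c00e000000000000, which in_linear_map() rejects.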
>>
>> Add a Kconfig flag to allow an arch to disable inline. (It's a bit
>> annoying to be 'backwards', but I'm not aware of any way to have
>> an arch force a symbol to be 'n', rather than 'y'.)
>>
>> We also disable stack instrumentation in this case as it does things that
>> are functionally equivalent to inline instrumentation, namely adding
>> code that touches the shadow directly without going through a C helper.
>>
>> Signed-off-by: Daniel Axtens <[email protected]>
>> ---
>> lib/Kconfig.kasan | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
>> index cffc2ebbf185..7e237dbb6df3 100644
>> --- a/lib/Kconfig.kasan
>> +++ b/lib/Kconfig.kasan
>> @@ -12,6 +12,9 @@ config HAVE_ARCH_KASAN_HW_TAGS
>> config HAVE_ARCH_KASAN_VMALLOC
>> bool
>>
>> +config ARCH_DISABLE_KASAN_INLINE
>> + def_bool n
>> +
>
> Some comments on what arch's want to disable kasan inline would
> be helpful and why.
Sure, added.
Kind regards,
Daniel
Hi Balbir,
> Could you highlight the changes from
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/[email protected]/?
>
> Feel free to use my signed-off-by if you need to and add/update copyright
> headers if appropriate.
There's not really anything in common any more:
- ppc32 KASAN landed, so a kasan.h for powerpc, the explicit memcpy
changes, the support for non-instrumented files, prom_check.sh, etc.
were all already in place.
- I locate the shadow region differently and don't resize any virtual
memory areas.
- The ARCH_DEFINES_KASAN_ZERO_PTE handling changed upstream, and our
side of it is now largely handled by patch 3.
- The outline hook is now an inline function rather than a #define.
- The init function has been totally rewritten as it's gone from
supporting real mode to not supporting real mode and back.
- The list of non-instrumented files has grown a lot.
- There's new stuff: stack walking is now safe, KASAN vmalloc support
means modules are better supported now, ptdump works, and there's
documentation.
It's been a while now, but I don't think when I started this process 2
years ago that I directly reused much of your code. So I'm not sure that
a signed-off-by makes sense here? Would a different tag (Originally-by?)
make more sense?
>> + * The shadow ends before the highest accessible address
>> + * because we don't need a shadow for the shadow. Instead:
>> + * c00e000000000000 << 3 + a80e 0000 0000 0000 000 = c00fc00000000000
>
> The comment has one extra 0 in a80e.., I did the math and had to use
> the data from the defines :)
3 extra 0s, even! Fixed.
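
For reference, the arithmetic in that comment can be checked with a small
userspace sketch. It assumes the generic KASAN mapping
shadow = (addr >> 3) + offset (so the shift in the comment is read as a
right shift) and takes the offset 0xa80e000000000000 from this thread. The
names ex_mem_to_shadow, EX_SHADOW_SCALE_SHIFT and EX_SHADOW_OFFSET are made
up for the example; this is not the kernel's kasan_mem_to_shadow().

```c
#include <stdint.h>

/* Assumed values: 1 shadow byte per 8 bytes, offset as quoted in the thread. */
#define EX_SHADOW_SCALE_SHIFT 3
#define EX_SHADOW_OFFSET      0xa80e000000000000ULL

/* Hypothetical helper: map a kernel address to its shadow address. */
static uint64_t ex_mem_to_shadow(uint64_t addr)
{
	return (addr >> EX_SHADOW_SCALE_SHIFT) + EX_SHADOW_OFFSET;
}
```

With these assumptions, ex_mem_to_shadow(0xc000000000000000) gives
0xc00e000000000000 (the start of the shadow) and
ex_mem_to_shadow(0xc00e000000000000) gives 0xc00fc00000000000, matching the
corrected comment.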
>> +void __init kasan_init(void)
>> +{
>> + /*
>> + * We want to do the following things:
>> + * 1) Map real memory into the shadow for all physical memblocks
>> + * This takes us from c000... to c008...
>> + * 2) Leave a hole over the shadow of vmalloc space. KASAN_VMALLOC
>> + * will manage this for us.
>> + * This takes us from c008... to c00a...
>> + * 3) Map the 'early shadow'/zero page over iomap and vmemmap space.
>> + * This takes us up to where we start at c00e...
>> + */
>> +
>
> assuming we have
> #define VMEMMAP_END R_VMEMMAP_END
> and ditto for hash we probably need
>
> BUILD_BUG_ON(VMEMMAP_END + KASAN_SHADOW_OFFSET != KASAN_SHADOW_END);
Sorry, I'm not sure what this is supposed to be testing? In what
situation would this trigger?
Kind regards,
Daniel
>
> Looks good otherwise, I've not been able to test it yet
>
> Balbir Singh.
On Mon, Mar 22, 2021 at 11:55:08AM +1100, Daniel Axtens wrote:
> Hi Balbir,
>
> > Could you highlight the changes from
> > https://patchwork.ozlabs.org/project/linuxppc-dev/patch/[email protected]/?
> >
> > Feel free to use my signed-off-by if you need to and add/update copyright
> > headers if appropriate.
>
> There's not really anything in common any more:
>
> - ppc32 KASAN landed, so there was already a kasan.h for powerpc, the
> explicit memcpy changes, the support for non-instrumented files,
> prom_check.sh, etc. all already landed.
>
> - I locate the shadow region differently and don't resize any virtual
> memory areas.
>
> - The ARCH_DEFINES_KASAN_ZERO_PTE handling changed upstream and our
> handling for that is now handled more by patch 3.
>
> - The outline hook is now an inline function rather than a #define.
>
> - The init function has been totally rewritten as it's gone from
> supporting real mode to not supporting real mode and back.
>
> - The list of non-instrumented files has grown a lot.
>
> - There's new stuff: stack walking is now safe, KASAN vmalloc support
> means modules are better supported now, ptdump works, and there's
> documentation.
>
> It's been a while now, but I don't think when I started this process 2
> years ago that I directly reused much of your code. So I'm not sure that
> a signed-off-by makes sense here? Would a different tag (Originally-by?)
> make more sense?
>
Sure
> >> + * The shadow ends before the highest accessible address
> >> + * because we don't need a shadow for the shadow. Instead:
> >> + * c00e000000000000 << 3 + a80e 0000 0000 0000 000 = c00fc00000000000
> >
> > The comment has one extra 0 in a80e.., I did the math and had to use
> > the data from the defines :)
>
> 3 extra 0s, even! Fixed.
>
> >> +void __init kasan_init(void)
> >> +{
> >> + /*
> >> + * We want to do the following things:
> >> + * 1) Map real memory into the shadow for all physical memblocks
> >> + * This takes us from c000... to c008...
> >> + * 2) Leave a hole over the shadow of vmalloc space. KASAN_VMALLOC
> >> + * will manage this for us.
> >> + * This takes us from c008... to c00a...
> >> + * 3) Map the 'early shadow'/zero page over iomap and vmemmap space.
> >> + * This takes us up to where we start at c00e...
> >> + */
> >> +
> >
> > assuming we have
> > #define VMEMMAP_END R_VMEMMAP_END
> > and ditto for hash we probably need
> >
> > BUILD_BUG_ON(VMEMMAP_END + KASAN_SHADOW_OFFSET != KASAN_SHADOW_END);
>
> Sorry, I'm not sure what this is supposed to be testing? In what
> situation would this trigger?
>
I am a bit concerned that we have hard-coded (IIRC) 0xa80e... in the
config; any changes to VMEMMAP_END or KASAN_SHADOW_OFFSET/END
should be guarded.
Balbir Singh.
Balbir Singh <[email protected]> writes:
> On Mon, Mar 22, 2021 at 11:55:08AM +1100, Daniel Axtens wrote:
>> Hi Balbir,
>>
>> > Could you highlight the changes from
>> > https://patchwork.ozlabs.org/project/linuxppc-dev/patch/[email protected]/?
>> >
>> > Feel free to use my signed-off-by if you need to and add/update copyright
>> > headers if appropriate.
>>
>> There's not really anything in common any more:
>>
>> - ppc32 KASAN landed, so there was already a kasan.h for powerpc, the
>> explicit memcpy changes, the support for non-instrumented files,
>> prom_check.sh, etc. all already landed.
>>
>> - I locate the shadow region differently and don't resize any virtual
>> memory areas.
>>
>> - The ARCH_DEFINES_KASAN_ZERO_PTE handling changed upstream and our
>> handling for that is now handled more by patch 3.
>>
>> - The outline hook is now an inline function rather than a #define.
>>
>> - The init function has been totally rewritten as it's gone from
>> supporting real mode to not supporting real mode and back.
>>
>> - The list of non-instrumented files has grown a lot.
>>
>> - There's new stuff: stack walking is now safe, KASAN vmalloc support
>> means modules are better supported now, ptdump works, and there's
>> documentation.
>>
>> It's been a while now, but I don't think when I started this process 2
>> years ago that I directly reused much of your code. So I'm not sure that
>> a signed-off-by makes sense here? Would a different tag (Originally-by?)
>> make more sense?
>>
>
> Sure
Will do.
>
>> >> + * The shadow ends before the highest accessible address
>> >> + * because we don't need a shadow for the shadow. Instead:
>> >> + * c00e000000000000 << 3 + a80e 0000 0000 0000 000 = c00fc00000000000
>> >
>> > The comment has one extra 0 in a80e.., I did the math and had to use
>> > the data from the defines :)
>>
>> 3 extra 0s, even! Fixed.
>>
>> >> +void __init kasan_init(void)
>> >> +{
>> >> + /*
>> >> + * We want to do the following things:
>> >> + * 1) Map real memory into the shadow for all physical memblocks
>> >> + * This takes us from c000... to c008...
>> >> + * 2) Leave a hole over the shadow of vmalloc space. KASAN_VMALLOC
>> >> + * will manage this for us.
>> >> + * This takes us from c008... to c00a...
>> >> + * 3) Map the 'early shadow'/zero page over iomap and vmemmap space.
>> >> + * This takes us up to where we start at c00e...
>> >> + */
>> >> +
>> >
>> > assuming we have
>> > #define VMEMMAP_END R_VMEMMAP_END
>> > and ditto for hash we probably need
>> >
>> > BUILD_BUG_ON(VMEMMAP_END + KASAN_SHADOW_OFFSET != KASAN_SHADOW_END);
>>
>> Sorry, I'm not sure what this is supposed to be testing? In what
>> situation would this trigger?
>>
>
> I am bit concerned that we have hard coded (IIR) 0xa80e... in the
> config, any changes to VMEMMAP_END, KASAN_SHADOW_OFFSET/END
> should be guarded.
>
Ah that makes sense. I'll come up with some test that should catch any
unsynchronised changes to VMEMMAP_END, KASAN_SHADOW_OFFSET or
KASAN_SHADOW_END.
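
A minimal sketch of the sort of check I have in mind, written as plain C11
so it can be compiled outside the kernel. All EX_ names and values are
placeholders for illustration: the vmemmap end is inferred from the comment
arithmetic above rather than copied from the headers, and in the kernel this
would use BUILD_BUG_ON() or static_assert() on the real defines. Note that
it applies the shadow scale shift before adding the offset.

```c
#include <stdint.h>

/* Placeholder values for illustration only, not the real kernel defines. */
#define EX_SHADOW_SCALE_SHIFT  3
#define EX_VMEMMAP_END         0xc00e000000000000ULL
#define EX_KASAN_SHADOW_OFFSET 0xa80e000000000000ULL
#define EX_KASAN_SHADOW_END    0xc00fc00000000000ULL

/* Fails the build if the three constants drift out of sync. */
_Static_assert((EX_VMEMMAP_END >> EX_SHADOW_SCALE_SHIFT) +
	       EX_KASAN_SHADOW_OFFSET == EX_KASAN_SHADOW_END,
	       "shadow end must match the shadow of the end of vmemmap");

/* Runtime form of the same relation, handy for a quick sanity test. */
static int ex_layout_is_consistent(void)
{
	return (EX_VMEMMAP_END >> EX_SHADOW_SCALE_SHIFT) +
	       EX_KASAN_SHADOW_OFFSET == EX_KASAN_SHADOW_END;
}
```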
Kind regards,
Daniel Axtens
> Balbir Singh.
On Fri, 19 Mar 2021 at 15:41, Daniel Axtens <[email protected]> wrote:
> Allow architectures to define a kasan_arch_is_ready() hook that bails
> out of any function that's about to touch the shadow unless the arch
> says that it is ready for the memory to be accessed. This is fairly
> uninvasive and should have a negligible performance penalty.
>
> This will only work in outline mode, so an arch must specify
> ARCH_DISABLE_KASAN_INLINE if it requires this.
>
> Cc: Balbir Singh <[email protected]>
> Cc: Aneesh Kumar K.V <[email protected]>
> Suggested-by: Christophe Leroy <[email protected]>
> Signed-off-by: Daniel Axtens <[email protected]>
>
> --
>
> I discuss the justification for this later in the series. Also,
> both previous RFCs for ppc64 - by 2 different people - have
> needed this trick! See:
> - https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
> - https://patchwork.ozlabs.org/patch/795211/ # ppc radix series
> ---
> include/linux/kasan.h | 4 ++++
> mm/kasan/common.c | 4 ++++
> mm/kasan/generic.c | 3 +++
> mm/kasan/shadow.c | 4 ++++
> 4 files changed, 15 insertions(+)
>
> diff --git a/include/linux/kasan.h b/include/linux/kasan.h
> index 8b3b99d659b7..6bd8343f0033 100644
> --- a/include/linux/kasan.h
> +++ b/include/linux/kasan.h
Does kasan_arch_is_ready() need to be defined in the public interface
of KASAN? Could it instead be moved to mm/kasan/kasan.h?
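
For readers following along, here is a rough userspace model of the pattern
the patch uses: a fallback hook that returns true unless the arch supplies
its own, and an early bail-out before anything touches the shadow. Only the
name kasan_arch_is_ready comes from the patch; ex_arch_ready, ex_shadow and
ex_poison are invented for this sketch and are not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical arch state: set once translations are on and shadow is mapped. */
static bool ex_arch_ready;

/* An arch that needs the hook defines it (and the macro) first... */
static inline bool kasan_arch_is_ready(void) { return ex_arch_ready; }
#define kasan_arch_is_ready kasan_arch_is_ready

/* ...so the generic fallback below is skipped for that arch. */
#ifndef kasan_arch_is_ready
static inline bool kasan_arch_is_ready(void) { return true; }
#endif

/* Toy shadow region standing in for the real shadow memory. */
static unsigned char ex_shadow[16];

/* Write 'value' into the toy shadow; returns false if nothing was written. */
static bool ex_poison(size_t idx, size_t len, unsigned char value)
{
	/* Don't touch the shadow if the arch isn't ready. */
	if (!kasan_arch_is_ready())
		return false;
	if (idx > sizeof(ex_shadow) || len > sizeof(ex_shadow) - idx)
		return false;
	memset(&ex_shadow[idx], value, len);
	return true;
}
```

Where the fallback lives (public header or mm/kasan/kasan.h) doesn't change
the pattern; the arch definition just has to be visible before it.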
> @@ -23,6 +23,10 @@ struct kunit_kasan_expectation {
>
> #endif
>
> +#ifndef kasan_arch_is_ready
> +static inline bool kasan_arch_is_ready(void) { return true; }
> +#endif
> +
> #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)
>
> #include <linux/pgtable.h>
> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> index 6bb87f2acd4e..f23a9e2dce9f 100644
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -345,6 +345,10 @@ static inline bool ____kasan_slab_free(struct kmem_cache *cache, void *object,
> if (unlikely(cache->flags & SLAB_TYPESAFE_BY_RCU))
> return false;
>
> + /* We can't read the shadow byte if the arch isn't ready */
> + if (!kasan_arch_is_ready())
> + return false;
> +
While it probably doesn't matter much, it seems this check could be
moved up, rather than having it in the middle here.
> if (!kasan_byte_accessible(tagged_object)) {
> kasan_report_invalid_free(tagged_object, ip);
> return true;
> diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
> index 53cbf28859b5..c3f5ba7a294a 100644
> --- a/mm/kasan/generic.c
> +++ b/mm/kasan/generic.c
> @@ -163,6 +163,9 @@ static __always_inline bool check_region_inline(unsigned long addr,
> size_t size, bool write,
> unsigned long ret_ip)
> {
> + if (!kasan_arch_is_ready())
> + return true;
> +
> if (unlikely(size == 0))
> return true;
>
> diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
> index 727ad4629173..1f650c521037 100644
> --- a/mm/kasan/shadow.c
> +++ b/mm/kasan/shadow.c
> @@ -80,6 +80,10 @@ void kasan_poison(const void *addr, size_t size, u8 value, bool init)
> */
> addr = kasan_reset_tag(addr);
>
> + /* Don't touch the shadow memory if arch isn't ready */
> + if (!kasan_arch_is_ready())
> + return;
> +
> /* Skip KFENCE memory if called explicitly outside of sl*b. */
> if (is_kfence_address(addr))
> return;
> --
> 2.27.0
>
> --
> You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/20210319144058.772525-3-dja%40axtens.net.
On Fri, 19 Mar 2021 at 15:41, Daniel Axtens <[email protected]> wrote:
>
> For annoying architectural reasons, it's very difficult to support inline
> instrumentation on powerpc64.
>
> Add a Kconfig flag to allow an arch to disable inline. (It's a bit
> annoying to be 'backwards', but I'm not aware of any way to have
> an arch force a symbol to be 'n', rather than 'y'.)
>
> We also disable stack instrumentation in this case as it does things that
> are functionally equivalent to inline instrumentation, namely adding
> code that touches the shadow directly without going through a C helper.
>
> Signed-off-by: Daniel Axtens <[email protected]>
> ---
> lib/Kconfig.kasan | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
> index cffc2ebbf185..7e237dbb6df3 100644
> --- a/lib/Kconfig.kasan
> +++ b/lib/Kconfig.kasan
> @@ -12,6 +12,9 @@ config HAVE_ARCH_KASAN_HW_TAGS
> config HAVE_ARCH_KASAN_VMALLOC
> bool
>
> +config ARCH_DISABLE_KASAN_INLINE
> + def_bool n
> +
Does just "bool" work here?
> config CC_HAS_KASAN_GENERIC
> def_bool $(cc-option, -fsanitize=kernel-address)
>
> @@ -130,6 +133,7 @@ config KASAN_OUTLINE
>
> config KASAN_INLINE
> bool "Inline instrumentation"
> + depends on !ARCH_DISABLE_KASAN_INLINE
> help
> Compiler directly inserts code checking shadow memory before
> memory accesses. This is faster than outline (in some workloads
> @@ -142,6 +146,7 @@ config KASAN_STACK
> bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && !COMPILE_TEST
> depends on KASAN_GENERIC || KASAN_SW_TAGS
> default y if CC_IS_GCC
> + depends on !ARCH_DISABLE_KASAN_INLINE
Minor, but perhaps this 'depends on' line could be moved up 1 line to
be grouped with the other 'depends on'.
> help
> The LLVM stack address sanitizer has a know problem that
> causes excessive stack usage in a lot of functions, see
> @@ -154,6 +159,9 @@ config KASAN_STACK
> but clang users can still enable it for builds without
> CONFIG_COMPILE_TEST. On gcc it is assumed to always be safe
> to use and enabled by default.
> + If the architecture disables inline instrumentation, this is
> + also disabled as it adds inline-style instrumentation that
> + is run unconditionally.
>
> config KASAN_SW_TAGS_IDENTIFY
> bool "Enable memory corruption identification"
> --
> 2.27.0
Daniel Axtens <[email protected]> writes:
> Balbir Singh <[email protected]> writes:
>
>> On Sat, Mar 20, 2021 at 01:40:53AM +1100, Daniel Axtens wrote:
>>> For annoying architectural reasons, it's very difficult to support inline
>>> instrumentation on powerpc64.
>>
>> I think we can expand here and talk about how in hash mode, the vmalloc
>> address space is in a region of memory different than where kernel virtual
>> addresses are mapped. Did I recollect the reason correctly?
>
> I think that's _a_ reason, but for radix mode (which is all I support at
> the moment), the reason is a bit simpler.
Actually Aneesh fixed that in:
0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range")
The problem we had prior to that was that the linear mapping was at
(0xc << 60), vmalloc was at (0xd << 60), and vmemmap was at (0xf << 60),
meaning our shadow region would need to be more than (3 << 60) in size.
cheers
On 19/03/2021 at 15:40, Daniel Axtens wrote:
> Building on the work of Christophe, Aneesh and Balbir, I've ported
> KASAN to 64-bit Book3S kernels running on the Radix MMU.
>
> v11 applies to next-20210317. I had hoped to have it apply to
> powerpc/next but once again there are changes in the kasan core that
> clash. Also, thanks to mpe for fixing a build break with KASAN off.
>
> I'm not sure how best to progress this towards actually being merged
> when it has impacts across subsystems. I'd appreciate any input. Maybe
> the first four patches could go in via the kasan tree, that should
> make things easier for powerpc in a future cycle?
>
> v10 rebases on top of next-20210125, fixing things up to work on top
> of the latest changes, and fixing some review comments from
> Christophe. I have tested host and guest with 64k pages for this spin.
>
> There is now only 1 failing KUnit test: kasan_global_oob - gcc puts
> the ASAN init code in a section called '.init_array'. Powerpc64 module
> loading code goes through and _renames_ any section beginning with
> '.init' to begin with '_init' in order to avoid some complexities
> around our 24-bit indirect jumps. This means it renames '.init_array'
> to '_init_array', and the generic module loading code then fails to
> recognise the section as a constructor and thus doesn't run it. This
> hack dates back to 2003 and so I'm not going to try to unpick it in
> this series. (I suspect this may have previously worked if the code
> ended up in .ctors rather than .init_array but I don't keep my old
> binaries around so I have no real way of checking.)
>
> (The previously failing stack tests are now skipped due to more
> accurate configuration settings.)
>
> Details from v9: This is a significant reworking of the previous
> versions. Instead of the previous approach which supported inline
> instrumentation, this series provides only outline instrumentation.
>
> To get around the problem of accessing the shadow region inside code we run
> with translations off (in 'real mode'), we restrict checking to when
> translations are enabled. This is done via a new hook in the kasan core and
> by excluding larger quantities of arch code from instrumentation. The upside
> is that we no longer require that you be able to specify the amount of
> physically contiguous memory on the system at compile time. Hopefully this
> is a better trade-off. More details in patch 6.
>
> kexec works. Both 64k and 4k pages work. Running as a KVM host works, but
> nothing in arch/powerpc/kvm is instrumented. It's also potentially a bit
> fragile - if any real mode code paths call out to instrumented code, things
> will go boom.
>
In the discussion we had a long time ago,
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/[email protected]/#2321067
I challenged you on why it was not possible to implement things the same way as other
architectures, that is, with an early mapping.
Your first answer was that too many things were done in real mode at startup. After some discussion
you said that in the end there were not that many things at startup, but that the issue was KVM.
Now you say that instrumentation on KVM is fully disabled.
So my question is: if KVM is not a problem anymore, why not go the standard way with an early
shadow? Then you could also support inline instrumentation.
Christophe
On 19/03/2021 at 15:40, Daniel Axtens wrote:
> Implement a limited form of KASAN for Book3S 64-bit machines running under
> the Radix MMU, supporting only outline mode.
>
> - Enable the compiler instrumentation to check addresses and maintain the
> shadow region. (This is the guts of KASAN which we can easily reuse.)
>
> - Require kasan-vmalloc support to handle modules and anything else in
> vmalloc space.
>
> - KASAN needs to be able to validate all pointer accesses, but we can't
> instrument all kernel addresses - only linear map and vmalloc. On boot,
> set up a single page of read-only shadow that marks all iomap and
> vmemmap accesses as valid.
>
> - Make our stack-walking code KASAN-safe by using READ_ONCE_NOCHECK -
> generic code, arm64, s390 and x86 all do this for similar sorts of
> reasons: when unwinding a stack, we might touch memory that KASAN has
> marked as being out-of-bounds. In our case we often get this when
> checking for an exception frame because we're checking an arbitrary
> offset into the stack frame.
>
> See commit 20955746320e ("s390/kasan: avoid false positives during stack
> unwind"), commit bcaf669b4bdb ("arm64: disable kasan when accessing
> frame->fp in unwind_frame"), commit 91e08ab0c851 ("x86/dumpstack:
> Prevent KASAN false positive warnings") and commit 6e22c8366416
> ("tracing, kasan: Silence Kasan warning in check_stack of stack_tracer")
>
> - Document KASAN in both generic and powerpc docs.
>
> Background
> ----------
>
> KASAN support on Book3S is a bit tricky to get right:
>
> - It would be good to support inline instrumentation so as to be able to
> catch stack issues that cannot be caught with outline mode.
>
> - Inline instrumentation requires a fixed offset.
>
> - Book3S runs code with translations off ("real mode") during boot,
> including a lot of generic device-tree parsing code which is used to
> determine MMU features.
>
> [ppc64 mm note: The kernel installs a linear mapping at effective
> address c000...-c008.... This is a one-to-one mapping with physical
> memory from 0000... onward. Because of how memory accesses work on
> powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
> same memory both with translations on (accessing as an 'effective
> address'), and with translations off (accessing as a 'real
> address'). This works in both guests and the hypervisor. For more
> details, see s5.7 of Book III of version 3 of the ISA, in particular
> the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
> KASAN implementation currently only supports Radix.]
>
> - Some code - most notably a lot of KVM code - also runs with translations
> off after boot.
>
> - Therefore any offset has to point to memory that is valid with
> translations on or off.
>
> One approach is just to give up on inline instrumentation. This way
> boot-time checks can be delayed until after the MMU is set up, and we
> can just not instrument any code that runs with translations off after
> booting. Take this approach for now and require outline instrumentation.
>
> Previous attempts allowed inline instrumentation. However, they came with
> some unfortunate restrictions: only physically contiguous memory could be
> used and it had to be specified at compile time. Maybe we can do better in
> the future.
>
> Cc: Balbir Singh <[email protected]> # ppc64 out-of-line radix version
> Cc: Aneesh Kumar K.V <[email protected]> # ppc64 hash version
> Cc: Christophe Leroy <[email protected]> # ppc32 version
> Signed-off-by: Daniel Axtens <[email protected]>
> ---
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index 6084fa499aa3..163755b1cef4 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -32,6 +32,17 @@ KASAN_SANITIZE_early_32.o := n
> KASAN_SANITIZE_cputable.o := n
> KASAN_SANITIZE_prom_init.o := n
> KASAN_SANITIZE_btext.o := n
> +KASAN_SANITIZE_paca.o := n
> +KASAN_SANITIZE_setup_64.o := n
> +KASAN_SANITIZE_mce.o := n
> +KASAN_SANITIZE_mce_power.o := n
> +
> +# we have to be particularly careful in ppc64 to exclude code that
> +# runs with translations off, as we cannot access the shadow with
> +# translations off. However, ppc32 can sanitize this.
Which functions in this file can run with translations off on PPC64?
On PPC32 no functions run with translations off.
> +ifdef CONFIG_PPC64
> +KASAN_SANITIZE_traps.o := n
> +endif
>
> ifdef CONFIG_KASAN
> CFLAGS_early_32.o += -DDISABLE_BRANCH_PROFILING
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index 3231c2df9e26..d4ae21b9e9b7 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -2160,8 +2160,8 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
> break;
>
> stack = (unsigned long *) sp;
> - newsp = stack[0];
> - ip = stack[STACK_FRAME_LR_SAVE];
> + newsp = READ_ONCE_NOCHECK(stack[0]);
> + ip = READ_ONCE_NOCHECK(stack[STACK_FRAME_LR_SAVE]);
> if (!firstframe || ip != lr) {
> printk("%s["REG"] ["REG"] %pS",
> loglvl, sp, ip, (void *)ip);
> @@ -2179,17 +2179,19 @@ void show_stack(struct task_struct *tsk, unsigned long *stack,
> * See if this is an exception frame.
> * We look for the "regshere" marker in the current frame.
> */
> - if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS)
> - && stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
> + if (validate_sp(sp, tsk, STACK_FRAME_WITH_PT_REGS) &&
> + (READ_ONCE_NOCHECK(stack[STACK_FRAME_MARKER]) ==
> + STACK_FRAME_REGS_MARKER)) {
> struct pt_regs *regs = (struct pt_regs *)
> (sp + STACK_FRAME_OVERHEAD);
>
> - lr = regs->link;
> + lr = READ_ONCE_NOCHECK(regs->link);
> printk("%s--- interrupt: %lx at %pS\n",
> - loglvl, regs->trap, (void *)regs->nip);
> + loglvl, READ_ONCE_NOCHECK(regs->trap),
> + (void *)READ_ONCE_NOCHECK(regs->nip));
> __show_regs(regs);
> printk("%s--- interrupt: %lx\n",
> - loglvl, regs->trap);
> + loglvl, READ_ONCE_NOCHECK(regs->trap));
>
> firstframe = 1;
> }
All the changes in that file look more like a bug fix than something specific to PPC64 KASAN.
Could they be a separate patch?
> diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
> index 2bfeaa13befb..7f1592dacbeb 100644
> --- a/arch/powerpc/kvm/Makefile
> +++ b/arch/powerpc/kvm/Makefile
> @@ -136,3 +136,8 @@ obj-$(CONFIG_KVM_BOOK3S_64_PR) += kvm-pr.o
> obj-$(CONFIG_KVM_BOOK3S_64_HV) += kvm-hv.o
>
> obj-y += $(kvm-book3s_64-builtin-objs-y)
> +
> +# KVM does a lot in real-mode, and 64-bit Book3S KASAN doesn't support that
> +ifdef CONFIG_PPC_BOOK3S_64
> +KASAN_SANITIZE := n
> +endif
> diff --git a/arch/powerpc/mm/book3s64/Makefile b/arch/powerpc/mm/book3s64/Makefile
> index 1b56d3af47d4..a7d8a68bd2c5 100644
> --- a/arch/powerpc/mm/book3s64/Makefile
> +++ b/arch/powerpc/mm/book3s64/Makefile
> @@ -21,3 +21,12 @@ obj-$(CONFIG_PPC_PKEY) += pkeys.o
>
> # Instrumenting the SLB fault path can lead to duplicate SLB entries
> KCOV_INSTRUMENT_slb.o := n
> +
> +# Parts of these can run in real mode and therefore are
> +# not safe with the current outline KASAN implementation
> +KASAN_SANITIZE_mmu_context.o := n
> +KASAN_SANITIZE_pgtable.o := n
> +KASAN_SANITIZE_radix_pgtable.o := n
> +KASAN_SANITIZE_radix_tlb.o := n
> +KASAN_SANITIZE_slb.o := n
> +KASAN_SANITIZE_pkeys.o := n
> diff --git a/arch/powerpc/mm/kasan/Makefile b/arch/powerpc/mm/kasan/Makefile
> index 42fb628a44fd..07eef87abd6c 100644
> --- a/arch/powerpc/mm/kasan/Makefile
> +++ b/arch/powerpc/mm/kasan/Makefile
> @@ -5,3 +5,4 @@ KASAN_SANITIZE := n
> obj-$(CONFIG_PPC32) += init_32.o
> obj-$(CONFIG_PPC_8xx) += 8xx.o
> obj-$(CONFIG_PPC_BOOK3S_32) += book3s_32.o
> +obj-$(CONFIG_PPC_BOOK3S_64) += init_book3s_64.o
> diff --git a/arch/powerpc/mm/kasan/init_book3s_64.c b/arch/powerpc/mm/kasan/init_book3s_64.c
> new file mode 100644
> index 000000000000..ca913ed951a2
> --- /dev/null
> +++ b/arch/powerpc/mm/kasan/init_book3s_64.c
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * KASAN for 64-bit Book3S powerpc
> + *
> + * Copyright (C) 2019-2020 IBM Corporation
> + * Author: Daniel Axtens <[email protected]>
> + */
> +
> +#define DISABLE_BRANCH_PROFILING
> +
> +#include <linux/kasan.h>
> +#include <linux/printk.h>
> +#include <linux/sched/task.h>
> +#include <linux/memblock.h>
> +#include <asm/pgalloc.h>
> +
> +DEFINE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);
> +
> +static void __init kasan_init_phys_region(void *start, void *end)
> +{
> + unsigned long k_start, k_end, k_cur;
> + void *va;
> +
> + if (start >= end)
> + return;
> +
> + k_start = ALIGN_DOWN((unsigned long)kasan_mem_to_shadow(start), PAGE_SIZE);
> + k_end = ALIGN((unsigned long)kasan_mem_to_shadow(end), PAGE_SIZE);
> +
> + va = memblock_alloc(k_end - k_start, PAGE_SIZE);
> + for (k_cur = k_start; k_cur < k_end; k_cur += PAGE_SIZE, va += PAGE_SIZE)
> + map_kernel_page(k_cur, __pa(va), PAGE_KERNEL);
> +}
> +
> +void __init kasan_init(void)
> +{
> + /*
> + * We want to do the following things:
> + * 1) Map real memory into the shadow for all physical memblocks
> + * This takes us from c000... to c008...
> + * 2) Leave a hole over the shadow of vmalloc space. KASAN_VMALLOC
> + * will manage this for us.
> + * This takes us from c008... to c00a...
> + * 3) Map the 'early shadow'/zero page over iomap and vmemmap space.
> + * This takes us up to where we start at c00e...
> + */
> +
> + void *k_start = kasan_mem_to_shadow((void *)RADIX_VMALLOC_END);
> + void *k_end = kasan_mem_to_shadow((void *)RADIX_VMEMMAP_END);
> + phys_addr_t start, end;
> + u64 i;
> + pte_t zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL);
> +
> + if (!early_radix_enabled())
> + panic("KASAN requires radix!");
> +
> + for_each_mem_range(i, &start, &end)
> + kasan_init_phys_region((void *)start, (void *)end);
> +
> + for (i = 0; i < PTRS_PER_PTE; i++)
> + __set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
> + &kasan_early_shadow_pte[i], zero_pte, 0);
> +
> + for (i = 0; i < PTRS_PER_PMD; i++)
> + pmd_populate_kernel(&init_mm, &kasan_early_shadow_pmd[i],
> + kasan_early_shadow_pte);
> +
> + for (i = 0; i < PTRS_PER_PUD; i++)
> + pud_populate(&init_mm, &kasan_early_shadow_pud[i],
> + kasan_early_shadow_pmd);
> +
> + /* map the early shadow over the iomap and vmemmap space */
> + kasan_populate_early_shadow(k_start, k_end);
> +
> + /* mark early shadow region as RO and wipe it */
> + zero_pte = pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL_RO);
> + for (i = 0; i < PTRS_PER_PTE; i++)
> + __set_pte_at(&init_mm, (unsigned long)kasan_early_shadow_page,
> + &kasan_early_shadow_pte[i], zero_pte, 0);
> +
> + /*
> + * clear_page relies on some cache info that hasn't been set up yet.
> + * It ends up looping ~forever and blows up other data.
> + * Use memset instead.
> + */
> + memset(kasan_early_shadow_page, 0, PAGE_SIZE);
> +
> + static_branch_inc(&powerpc_kasan_enabled_key);
> +
> + /* Enable error messages */
> + init_task.kasan_depth = 0;
> + pr_info("KASAN init done (64-bit Book3S)\n");
> +}
> +
> +void __init kasan_late_init(void) { }
> diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
> index aca354fb670b..63672aa656e8 100644
> --- a/arch/powerpc/mm/ptdump/ptdump.c
> +++ b/arch/powerpc/mm/ptdump/ptdump.c
> @@ -20,6 +20,7 @@
> #include <linux/seq_file.h>
> #include <asm/fixmap.h>
> #include <linux/const.h>
> +#include <linux/kasan.h>
> #include <asm/page.h>
> #include <asm/hugetlb.h>
>
> @@ -317,6 +318,23 @@ static void walk_pud(struct pg_state *st, p4d_t *p4d, unsigned long start)
> unsigned long addr;
> unsigned int i;
>
> +#if defined(CONFIG_KASAN) && defined(CONFIG_PPC_BOOK3S_64)
> + /*
> + * On radix + KASAN, we want to check for the KASAN "early" shadow
> + * which covers huge quantities of memory with the same set of
> + * read-only PTEs. If it is, we want to note the first page (to see
> + * the status change), and then note the last page. This gives us good
> + * results without spending ages noting the exact same PTEs over 100s of
> + * terabytes of memory.
> + */
Could you use huge pages to map shadow memory?
We do that on PPC32 now.
> + if (p4d_page(*p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud))) {
> + walk_pmd(st, pud, start);
> + addr = start + (PTRS_PER_PUD - 1) * PUD_SIZE;
> + walk_pmd(st, pud, addr);
> + return;
> + }
> +#endif
> +
> for (i = 0; i < PTRS_PER_PUD; i++, pud++) {
> addr = start + i * PUD_SIZE;
> if (!pud_none(*pud) && !pud_is_leaf(*pud))
> @@ -387,11 +405,11 @@ static void populate_markers(void)
> #endif
> address_markers[i++].start_address = FIXADDR_START;
> address_markers[i++].start_address = FIXADDR_TOP;
> +#endif /* CONFIG_PPC64 */
> #ifdef CONFIG_KASAN
> address_markers[i++].start_address = KASAN_SHADOW_START;
> address_markers[i++].start_address = KASAN_SHADOW_END;
> #endif
> -#endif /* CONFIG_PPC64 */
> }
>
> static int ptdump_show(struct seq_file *m, void *v)
> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
> index 3ce907523b1e..9063c13e7221 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -101,6 +101,7 @@ config PPC_BOOK3S_64
> select ARCH_SUPPORTS_NUMA_BALANCING
> select IRQ_WORK
> select PPC_MM_SLICES
> + select KASAN_VMALLOC if KASAN
>
> config PPC_BOOK3E_64
> bool "Embedded processors"
> diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
> index 2eb6ae150d1f..f277e4793696 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -1,4 +1,10 @@
> # SPDX-License-Identifier: GPL-2.0
> +
> +# nothing that deals with real mode is safe to KASAN
> +# in particular, idle code runs a bunch of things in real mode
> +KASAN_SANITIZE_idle.o := n
> +KASAN_SANITIZE_pci-ioda.o := n
> +
> obj-y += setup.o opal-call.o opal-wrappers.o opal.o opal-async.o
> obj-y += idle.o opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
> obj-y += rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
> diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
> index c8a2b0b05ac0..202199ef9e5c 100644
> --- a/arch/powerpc/platforms/pseries/Makefile
> +++ b/arch/powerpc/platforms/pseries/Makefile
> @@ -30,3 +30,6 @@ obj-$(CONFIG_PPC_SVM) += svm.o
> obj-$(CONFIG_FA_DUMP) += rtas-fadump.o
>
> obj-$(CONFIG_SUSPEND) += suspend.o
> +
> +# nothing that operates in real mode is safe for KASAN
> +KASAN_SANITIZE_ras.o := n
>
Christophe
Hi Christophe,
> In the discussion we had a long time ago,
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/[email protected]/#2321067
> , I challenged you on why it was not possible to implement things the same way as other
> architectures, i.e. with an early mapping.
>
> Your first answer was that too many things were done in real mode at startup. After some discussion
> you said that in the end there were not that many things at startup, but the issue was KVM.
>
> Now you say that instrumentation on KVM is fully disabled.
>
> So my question is: if KVM is not a problem anymore, why not go the standard way with an early
> shadow? Then you could also support inline instrumentation.
Fair enough, I've had some trouble both understanding the problem myself
and clearly articulating it. Let me try again.
We need translations on to access the shadow area.
We reach setup_64.c::early_setup() with translations off. At this point
we don't know what MMU we're running under, or our CPU features.
To determine our MMU and CPU features, early_setup() calls functions
(dt_cpu_ftrs_init, early_init_devtree) that call out to generic code
like of_scan_flat_dt. We need to do this before we turn on translations
because we can't set up the MMU until we know what MMU we have.
So this puts us in a bind:
- We can't set up an early shadow until we have translations on, which
requires that the MMU is set up.
- We can't set up an MMU until we call out to generic code for FDT
parsing.
So there will be calls to generic FDT parsing code that happen before the
early shadow is set up.
The setup code also prints a bunch of information about the platform
with printk() while translations are off, so it wouldn't even be enough
to disable instrumentation for bits of the generic DT code on ppc64.
Does that make sense? If you can figure out how to 'square the circle'
here I'm all ears.
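The ordering constraint above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions: every name in it
(model_shadow_ready, model_check_access, model_scan_fdt, and so on) is
hypothetical and none of it is kernel code. It shows why an outline check,
being a real function call, can bail out while translations are off, whereas
generic code that runs before the MMU is enabled would otherwise touch a
shadow that is not yet reachable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel code: all names below are illustrative. */

static bool translations_on;          /* models the MMU being set up and enabled */
static unsigned int checks_done;      /* accesses actually checked against the shadow */
static unsigned int checks_skipped;   /* accesses that bailed out in real mode */

/* Models an arch hook: the shadow is only reachable with translations on. */
static bool model_shadow_ready(void)
{
	return translations_on;
}

/* Models touching shadow memory; in real mode this would fault. */
static void model_read_shadow(const void *addr)
{
	(void)addr;
	assert(translations_on);
	checks_done++;
}

/* Outline-style check: a real function call, so it can return early. */
static void model_check_access(const void *addr)
{
	if (!model_shadow_ready()) {
		checks_skipped++;
		return;
	}
	model_read_shadow(addr);
}

/* Stands in for generic, instrumented code such as flat device tree parsing. */
static int model_scan_fdt(const int *fdt, size_t len)
{
	int sum = 0;

	for (size_t i = 0; i < len; i++) {
		model_check_access(&fdt[i]);
		sum += fdt[i];
	}
	return sum;
}

/* Models the ordering in early boot: parse first, then enable the MMU. */
static void model_early_setup(const int *fdt, size_t len)
{
	model_scan_fdt(fdt, len);    /* translations off: checks are skipped */
	translations_on = true;      /* MMU type is now known and set up */
	model_scan_fdt(fdt, len);    /* same code, now actually checked */
}
```

Calling model_early_setup() on a three-element array skips three checks and
then performs three. With inline instrumentation the shadow load is emitted
directly at each access, so there is no call in which to place such a test.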
Other notes:
- There's a comment about printk() being 'safe' in early_setup(), that
refers to having a valid PACA, it doesn't mean that it's safe in any
other sense.
- KVM does indeed also run stuff with translations off but we can catch
all of that by disabling instrumentation on the real-mode handlers:
it doesn't seem to leak out to generic code. So you are right that
KVM is no longer an issue.
Kind regards,
Daniel
>
> Christophe
On 23/03/2021 at 02:21, Daniel Axtens wrote:
> Hi Christophe,
>
>> In the discussion we had a long time ago,
>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/[email protected]/#2321067
>> , I challenged you on why it was not possible to implement things the same way as other
>> architectures, i.e. with an early mapping.
>>
>> Your first answer was that too many things were done in real mode at startup. After some discussion
>> you said that in the end there were not that many things at startup, but the issue was KVM.
>>
>> Now you say that instrumentation on KVM is fully disabled.
>>
>> So my question is: if KVM is not a problem anymore, why not go the standard way with an early
>> shadow? Then you could also support inline instrumentation.
>
> Fair enough, I've had some trouble both understanding the problem myself
> and clearly articulating it. Let me try again.
>
> We need translations on to access the shadow area.
>
> We reach setup_64.c::early_setup() with translations off. At this point
> we don't know what MMU we're running under, or our CPU features.
What do you need to know? Whether it is Hash or Radix, or more/different details?
IIUC, today we only support KASAN on Radix. Would it make sense to say that a kernel built with
KASAN can only run on processors with Radix capability? Then select CONFIG_PPC_RADIX_MMU_DEFAULT
when KASAN is set, and accept that the kernel crashes if Radix is not available?
>
> To determine our MMU and CPU features, early_setup() calls functions
> (dt_cpu_ftrs_init, early_init_devtree) that call out to generic code
> like of_scan_flat_dt. We need to do this before we turn on translations
> because we can't set up the MMU until we know what MMU we have.
>
> So this puts us in a bind:
>
> - We can't set up an early shadow until we have translations on, which
> requires that the MMU is set up.
>
> - We can't set up an MMU until we call out to generic code for FDT
> parsing.
>
> So there will be calls to generic FDT parsing code that happen before the
> early shadow is set up.
I see some logic in kernel/prom_init.c for detecting the MMU. Can we get the information from there
in order to set up the MMU?
>
> The setup code also prints a bunch of information about the platform
> with printk() while translations are off, so it wouldn't even be enough
> to disable instrumentation for bits of the generic DT code on ppc64.
I'm sure the printk() stuff can be avoided or delayed without much trouble. I guess the main
problem is the DT code, isn't it?
As far as I can see, the code only uses udbg_printf() before the MMU is on, and this could simply be
skipped when KASAN is selected. I see no situation where you need early printk together with KASAN.
>
> Does that make sense? If you can figure out how to 'square the circle'
> here I'm all ears.
Yes, it is a lot clearer now, thank you. I gave a few ideas above; do they help?
>
> Other notes:
>
> - There's a comment about printk() being 'safe' in early_setup(), that
> refers to having a valid PACA, it doesn't mean that it's safe in any
> other sense.
>
> - KVM does indeed also run stuff with translations off but we can catch
> all of that by disabling instrumentation on the real-mode handlers:
> it doesn't seem to leak out to generic code. So you are right that
> KVM is no longer an issue.
>
Christophe
Christophe Leroy <[email protected]> writes:
> On 23/03/2021 at 02:21, Daniel Axtens wrote:
>> Hi Christophe,
>>
>>> In the discussion we had a long time ago,
>>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/[email protected]/#2321067
>>> , I challenged you on why it was not possible to implement things the same way as other
>>> architectures, i.e. with an early mapping.
>>>
>>> Your first answer was that too many things were done in real mode at startup. After some discussion
>>> you said that in the end there were not that many things at startup, but the issue was KVM.
>>>
>>> Now you say that instrumentation on KVM is fully disabled.
>>>
>>> So my question is: if KVM is not a problem anymore, why not go the standard way with an early
>>> shadow? Then you could also support inline instrumentation.
>>
>> Fair enough, I've had some trouble both understanding the problem myself
>> and clearly articulating it. Let me try again.
>>
>> We need translations on to access the shadow area.
>>
>> We reach setup_64.c::early_setup() with translations off. At this point
>> we don't know what MMU we're running under, or our CPU features.
>
> What do you need to know? Whether it is Hash or Radix, or
> more/different details?
Yes, as well as some other details like SLB size, supported segment &
page sizes, possibly the CPU version for workarounds, various other
device tree things.
You also need to know if you're bare metal or in a guest, or on a PS3 ...
> IIUC, today we only support KASAN on Radix. Would it make sense to say that a kernel built with
> KASAN can only run on processors with Radix capability? Then select CONFIG_PPC_RADIX_MMU_DEFAULT
> when KASAN is set, and accept that the kernel crashes if Radix is not available?
I would rather not. We already have some options like that
(EARLY_DEBUG), and they have caused people to waste time debugging
crashes over the years that turned out to be just due to the wrong
CONFIG option being selected.
>> To determine our MMU and CPU features, early_setup() calls functions
>> (dt_cpu_ftrs_init, early_init_devtree) that call out to generic code
>> like of_scan_flat_dt. We need to do this before we turn on translations
>> because we can't set up the MMU until we know what MMU we have.
>>
>> So this puts us in a bind:
>>
>> - We can't set up an early shadow until we have translations on, which
>> requires that the MMU is set up.
>>
>> - We can't set up an MMU until we call out to generic code for FDT
>> parsing.
>>
>> So there will be calls to generic FDT parsing code that happen before the
>> early shadow is set up.
>
> I see some logic in kernel/prom_init.c for detecting the MMU. Can we get the information from there
> in order to set up the MMU?
You could find some of the information, but you'd need to stash it
somewhere (like the flat device tree :P) because you can't turn the MMU
on until we shut down Open Firmware.
That also doesn't help you on bare metal, where we don't use prom_init.
>> The setup code also prints a bunch of information about the platform
>> with printk() while translations are off, so it wouldn't even be enough
>> to disable instrumentation for bits of the generic DT code on ppc64.
>
> I'm sure the printk() stuff can be avoided or delayed without much trouble. I guess the main
> problem is the DT code, isn't it?
We spent many years making printk() work for early boot messages,
because it has the nice property of being persisted in dmesg.
But possibly we could come up with some workaround for that.
Disabling KASAN for the flat DT code seems like it wouldn't be a huge
loss, most (all?) of that code should only run at boot anyway.
But we also have code spread out in various files that would need to be
built without KASAN. See e.g. everything called by of_scan_flat_dt(),
mmu_early_init_devtree(), pseries_probe_fw_features(),
pkey_early_init_devtree() etc.
Because we can only disable KASAN per-file that would require quite a
bit of code movement and related churn.
> As far as I can see, the code only uses udbg_printf() before the MMU is on, and this could simply be
> skipped when KASAN is selected. I see no situation where you need early printk together with KASAN.
We definitely use printk() before the MMU is on.
>> Does that make sense? If you can figure out how to 'square the circle'
>> here I'm all ears.
>
> Yes, it is a lot clearer now, thank you. I gave a few ideas above;
> do they help?
A little? :)
It's possible we could do slightly less of the current boot sequence
before turning the MMU on. But we would still need to scan the flat
device tree, so all that code would be implicated either way.
We could also rearrange the early boot code to put bits in separate
files so they can be built without KASAN, but like I said above that
would be a lot of churn.
I don't see a way to fix printk() though, other than not using it during
early boot. Maybe that's OK but it feels like a bit of a backward step.
There are also other issues. For example, if we WARN during early boot,
that causes a program check, which runs all sorts of code, some of which
would have KASAN enabled.
So I don't see an easy path to enabling inline instrumentation. It's
obviously possible, but I don't think it's something we can get done in
any reasonable time frame.
cheers
On 19/03/2021 at 15:40, Daniel Axtens wrote:
> diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
> index aca354fb670b..63672aa656e8 100644
> --- a/arch/powerpc/mm/ptdump/ptdump.c
> +++ b/arch/powerpc/mm/ptdump/ptdump.c
> @@ -20,6 +20,7 @@
> #include <linux/seq_file.h>
> #include <asm/fixmap.h>
> #include <linux/const.h>
> +#include <linux/kasan.h>
> #include <asm/page.h>
> #include <asm/hugetlb.h>
>
> @@ -317,6 +318,23 @@ static void walk_pud(struct pg_state *st, p4d_t *p4d, unsigned long start)
> unsigned long addr;
> unsigned int i;
>
> +#if defined(CONFIG_KASAN) && defined(CONFIG_PPC_BOOK3S_64)
> + /*
> + * On radix + KASAN, we want to check for the KASAN "early" shadow
> + * which covers huge quantities of memory with the same set of
> + * read-only PTEs. If it is, we want to note the first page (to see
> + * the status change), and then note the last page. This gives us good
> + * results without spending ages noting the exact same PTEs over 100s of
> + * terabytes of memory.
> + */
> + if (p4d_page(*p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud))) {
> + walk_pmd(st, pud, start);
> + addr = start + (PTRS_PER_PUD - 1) * PUD_SIZE;
> + walk_pmd(st, pud, addr);
> + return;
> + }
> +#endif
> +
> for (i = 0; i < PTRS_PER_PUD; i++, pud++) {
> addr = start + i * PUD_SIZE;
> if (!pud_none(*pud) && !pud_is_leaf(*pud))
The above changes should not be necessary once PPC_PTDUMP is converted to GENERIC_PTDUMP.
See https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=239795
Christophe
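For readers following the quoted walk_pud() hunk, the early-shadow shortcut
described in its comment can be sketched in isolation. This is a user-space
model under stated assumptions, not the ptdump code: the names
(model_table, model_early_shadow, model_walk) and the table size are
hypothetical, and the model compares the table pointer directly, whereas the
patch compares the page behind the parent p4d entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not the ptdump code: names and sizes are illustrative. */
#define MODEL_PTRS_PER_TABLE 512

struct model_table {
	const void *entry[MODEL_PTRS_PER_TABLE];
};

/* One shared table, reused for every slot of the "early" shadow. */
static struct model_table model_early_shadow;

static size_t model_visits;   /* how many lower-level walks were performed */

static void model_walk_lower(const void *entry, size_t index)
{
	(void)entry;
	(void)index;
	model_visits++;
}

/*
 * Walk one table. If it is the shared early-shadow table, every entry is
 * identical, so note only the first and last slot instead of all of them.
 */
static void model_walk(const struct model_table *table)
{
	if (table == &model_early_shadow) {
		model_walk_lower(table->entry[0], 0);
		model_walk_lower(table->entry[0], MODEL_PTRS_PER_TABLE - 1);
		return;
	}

	for (size_t i = 0; i < MODEL_PTRS_PER_TABLE; i++) {
		if (table->entry[i] != NULL)
			model_walk_lower(table->entry[i], i);
	}
}
```

Walking the shared table costs two lower-level visits regardless of its size,
while an ordinary table is still walked entry by entry.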