Hi all,
This is v4 of the series adding support for nVHE hypervisor stacktraces;
and is based on arm64 for-next/stacktrace.
Thanks all for your feedback on previous revisions. Mark Brown, I
appreciate your Reviewed-by tags on v3; I have dropped them in this
new version since the series has changed quite a bit.
The previous versions were posted at:
v3: https://lore.kernel.org/r/[email protected]/
v2: https://lore.kernel.org/r/[email protected]/
v1: https://lore.kernel.org/r/[email protected]/
The main updates in this version address Marc's concerns about memory
usage, and reuse the common code by refactoring it into a shared header.
Thanks,
Kalesh
============
KVM nVHE Stack unwinding
====
nVHE has two modes of operation: protected (pKVM) and unprotected
(conventional nVHE). Depending on the mode, a slightly different approach
is used to dump the hypervisor stacktrace, but the core unwinding logic
remains the same.
Protected nVHE (pKVM) stacktraces
====
In protected nVHE mode, the host cannot directly access hypervisor memory.
The hypervisor stack unwinding happens in EL2 and is made accessible to
the host via a shared buffer. Symbolizing and printing the stacktrace
addresses is delegated to the host and happens in EL1.
Non-protected (Conventional) nVHE stacktraces
====
In non-protected mode, the host is able to directly access the hypervisor
stack pages.
The hypervisor stack unwinding and dumping of the stacktrace is performed
by the host in EL1, as this avoids the memory overhead of setting up
shared buffers between the host and hypervisor.
Reusing the Core Unwinding Logic
====
Since the hypervisor cannot link against kernel code in protected mode,
the common stack unwinding code is moved to a shared header to allow
reuse in the nVHE hypervisor.
Reducing the memory footprint
====
In this version, the following steps were taken to reduce the memory
usage of nVHE stack unwinding:
1) The nVHE overflow stack is reduced from PAGE_SIZE to 4KB; this is
   beneficial for configurations with non-4KB pages (16KB or 64KB pages).
2) In protected nVHE mode (pKVM), the stacktrace buffers shared with the
   host are reduced from PAGE_SIZE to the minimum size required.
3) On systems other than Android, conventional nVHE makes up the vast
   majority of use cases, so pKVM stack tracing is disabled by default
   (!CONFIG_PROTECTED_NVHE_STACKTRACE), which avoids the memory cost of
   setting up shared buffers.
4) In non-protected nVHE mode (conventional nVHE), the stack unwinding
   is done directly in EL1 by the host, and no shared buffers with the
   hypervisor are needed.
Sample Output
====
Below is example output from a simple stack overflow test:
[ 126.862960] kvm [371]: nVHE hyp panic at: [<ffff8000090a51d0>] __kvm_nvhe_recursive_death+0x10/0x34!
[ 126.869920] kvm [371]: Protected nVHE HYP call trace:
[ 126.870528] kvm [371]: [<ffff8000090a5570>] __kvm_nvhe_hyp_panic+0xac/0xf8
[ 126.871342] kvm [371]: [<ffff8000090a55cc>] __kvm_nvhe_hyp_panic_bad_stack+0x10/0x10
[ 126.872174] kvm [371]: [<ffff8000090a51e4>] __kvm_nvhe_recursive_death+0x24/0x34
[ 126.872971] kvm [371]: [<ffff8000090a51e4>] __kvm_nvhe_recursive_death+0x24/0x34
. . .
[ 126.927314] kvm [371]: [<ffff8000090a51e4>] __kvm_nvhe_recursive_death+0x24/0x34
[ 126.927727] kvm [371]: [<ffff8000090a51e4>] __kvm_nvhe_recursive_death+0x24/0x34
[ 126.928137] kvm [371]: [<ffff8000090a4de4>] __kvm_nvhe___kvm_vcpu_run+0x30/0x40c
[ 126.928561] kvm [371]: [<ffff8000090a7b64>] __kvm_nvhe_handle___kvm_vcpu_run+0x30/0x48
[ 126.928984] kvm [371]: [<ffff8000090a78b8>] __kvm_nvhe_handle_trap+0xc4/0x128
[ 126.929385] kvm [371]: [<ffff8000090a6864>] __kvm_nvhe___host_exit+0x64/0x64
[ 126.929804] kvm [371]: ---- End of Protected nVHE HYP call trace ----
============
Kalesh Singh (18):
arm64: stacktrace: Add shared header for common stack unwinding code
arm64: stacktrace: Factor out on_accessible_stack_common()
arm64: stacktrace: Factor out unwind_next_common()
arm64: stacktrace: Handle frame pointer from different address spaces
arm64: stacktrace: Factor out common unwind()
arm64: stacktrace: Add description of stacktrace/common.h
KVM: arm64: On stack overflow switch to hyp overflow_stack
KVM: arm64: Add PROTECTED_NVHE_STACKTRACE Kconfig
KVM: arm64: Allocate shared pKVM hyp stacktrace buffers
KVM: arm64: Stub implementation of pKVM HYP stack unwinder
KVM: arm64: Stub implementation of non-protected nVHE HYP stack
unwinder
KVM: arm64: Save protected-nVHE (pKVM) hyp stacktrace
KVM: arm64: Prepare non-protected nVHE hypervisor stacktrace
KVM: arm64: Implement protected nVHE hyp stack unwinder
KVM: arm64: Implement non-protected nVHE hyp stack unwinder
KVM: arm64: Introduce pkvm_dump_backtrace()
KVM: arm64: Introduce hyp_dump_backtrace()
KVM: arm64: Dump nVHE hypervisor stack on panic
arch/arm64/include/asm/kvm_asm.h | 16 ++
arch/arm64/include/asm/memory.h | 7 +
arch/arm64/include/asm/stacktrace.h | 92 ++++---
arch/arm64/include/asm/stacktrace/common.h | 224 ++++++++++++++++
arch/arm64/include/asm/stacktrace/nvhe.h | 291 +++++++++++++++++++++
arch/arm64/kernel/stacktrace.c | 157 -----------
arch/arm64/kvm/Kconfig | 15 ++
arch/arm64/kvm/arm.c | 2 +-
arch/arm64/kvm/handle_exit.c | 4 +
arch/arm64/kvm/hyp/nvhe/Makefile | 2 +-
arch/arm64/kvm/hyp/nvhe/host.S | 9 +-
arch/arm64/kvm/hyp/nvhe/stacktrace.c | 108 ++++++++
arch/arm64/kvm/hyp/nvhe/switch.c | 5 +
13 files changed, 727 insertions(+), 205 deletions(-)
create mode 100644 arch/arm64/include/asm/stacktrace/common.h
create mode 100644 arch/arm64/include/asm/stacktrace/nvhe.h
create mode 100644 arch/arm64/kvm/hyp/nvhe/stacktrace.c
base-commit: 82a592c13b0aeff94d84d54183dae0b26384c95f
--
2.37.0.170.g444d1eabd0-goog
Add stub implementations of the non-protected nVHE stack unwinder, to
keep the series building. These are implemented later in this series.
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/include/asm/stacktrace/nvhe.h | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/arch/arm64/include/asm/stacktrace/nvhe.h b/arch/arm64/include/asm/stacktrace/nvhe.h
index 1eac4e57f2ae..36cf7858ddd8 100644
--- a/arch/arm64/include/asm/stacktrace/nvhe.h
+++ b/arch/arm64/include/asm/stacktrace/nvhe.h
@@ -8,6 +8,12 @@
* the HYP memory. The stack is unwinded in EL2 and dumped to a shared
* buffer where the host can read and print the stacktrace.
*
+ * 2) Non-protected nVHE mode - the host can directly access the
+ * HYP stack pages and unwind the HYP stack in EL1. This saves having
+ * to allocate shared buffers for the host to read the unwound
+ * stacktrace.
+ *
+ *
* Copyright (C) 2022 Google LLC
*/
#ifndef __ASM_STACKTRACE_NVHE_H
@@ -53,5 +59,21 @@ static int notrace unwind_next(struct unwind_state *state)
NOKPROBE_SYMBOL(unwind_next);
#endif /* CONFIG_PROTECTED_NVHE_STACKTRACE */
+/*
+ * Non-protected nVHE HYP stack unwinder
+ */
+#else /* !__KVM_NVHE_HYPERVISOR__ */
+static inline bool on_overflow_stack(unsigned long sp, unsigned long size,
+ struct stack_info *info)
+{
+ return false;
+}
+
+static int notrace unwind_next(struct unwind_state *state)
+{
+ return 0;
+}
+NOKPROBE_SYMBOL(unwind_next);
+
#endif /* __KVM_NVHE_HYPERVISOR__ */
#endif /* __ASM_STACKTRACE_NVHE_H */
--
2.37.0.170.g444d1eabd0-goog
This can be used to disable stacktraces for the protected KVM
nVHE hypervisor, in order to save on the associated memory usage.
This option is disabled by default, since protected KVM is currently
not widely used on platforms other than Android.
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/kvm/Kconfig | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 8a5fbbf084df..1edab6f8a3b8 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -46,6 +46,21 @@ menuconfig KVM
If unsure, say N.
+config PROTECTED_NVHE_STACKTRACE
+ bool "Protected KVM hypervisor stacktraces"
+ depends on KVM
+ default n
+ help
+ Say Y here to enable pKVM hypervisor stacktraces on hyp_panic()
+
+ If you are not using protected nVHE (pKVM), say N.
+
+ If using protected nVHE mode, but cannot afford the associated
+ memory cost (less than 0.75 page per CPU) of pKVM stacktraces,
+ say N.
+
+ If unsure, say N.
+
config NVHE_EL2_DEBUG
bool "Debug mode for non-VHE EL2 object"
depends on KVM
--
2.37.0.170.g444d1eabd0-goog
In order to reuse the arm64 stack unwinding logic for the nVHE
hypervisor stack, move the common code to a shared header
(arch/arm64/include/asm/stacktrace/common.h).
The nVHE hypervisor cannot safely link against kernel code, so we
make use of the shared header to avoid duplicated logic later in
this series.
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/include/asm/stacktrace.h | 35 +------
arch/arm64/include/asm/stacktrace/common.h | 105 +++++++++++++++++++++
arch/arm64/kernel/stacktrace.c | 57 -----------
3 files changed, 106 insertions(+), 91 deletions(-)
create mode 100644 arch/arm64/include/asm/stacktrace/common.h
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index aec9315bf156..79f455b37c84 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -8,52 +8,19 @@
#include <linux/percpu.h>
#include <linux/sched.h>
#include <linux/sched/task_stack.h>
-#include <linux/types.h>
#include <linux/llist.h>
#include <asm/memory.h>
#include <asm/ptrace.h>
#include <asm/sdei.h>
-enum stack_type {
- STACK_TYPE_UNKNOWN,
- STACK_TYPE_TASK,
- STACK_TYPE_IRQ,
- STACK_TYPE_OVERFLOW,
- STACK_TYPE_SDEI_NORMAL,
- STACK_TYPE_SDEI_CRITICAL,
- __NR_STACK_TYPES
-};
-
-struct stack_info {
- unsigned long low;
- unsigned long high;
- enum stack_type type;
-};
+#include <asm/stacktrace/common.h>
extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
const char *loglvl);
DECLARE_PER_CPU(unsigned long *, irq_stack_ptr);
-static inline bool on_stack(unsigned long sp, unsigned long size,
- unsigned long low, unsigned long high,
- enum stack_type type, struct stack_info *info)
-{
- if (!low)
- return false;
-
- if (sp < low || sp + size < sp || sp + size > high)
- return false;
-
- if (info) {
- info->low = low;
- info->high = high;
- info->type = type;
- }
- return true;
-}
-
static inline bool on_irq_stack(unsigned long sp, unsigned long size,
struct stack_info *info)
{
diff --git a/arch/arm64/include/asm/stacktrace/common.h b/arch/arm64/include/asm/stacktrace/common.h
new file mode 100644
index 000000000000..64ae4f6b06fe
--- /dev/null
+++ b/arch/arm64/include/asm/stacktrace/common.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Common arm64 stack unwinder code.
+ *
+ * Copyright (C) 2012 ARM Ltd.
+ */
+#ifndef __ASM_STACKTRACE_COMMON_H
+#define __ASM_STACKTRACE_COMMON_H
+
+#include <linux/bitmap.h>
+#include <linux/bitops.h>
+#include <linux/types.h>
+
+enum stack_type {
+ STACK_TYPE_UNKNOWN,
+ STACK_TYPE_TASK,
+ STACK_TYPE_IRQ,
+ STACK_TYPE_OVERFLOW,
+ STACK_TYPE_SDEI_NORMAL,
+ STACK_TYPE_SDEI_CRITICAL,
+ __NR_STACK_TYPES
+};
+
+struct stack_info {
+ unsigned long low;
+ unsigned long high;
+ enum stack_type type;
+};
+
+/*
+ * A snapshot of a frame record or fp/lr register values, along with some
+ * accounting information necessary for robust unwinding.
+ *
+ * @fp: The fp value in the frame record (or the real fp)
+ * @pc: The lr value in the frame record (or the real lr)
+ *
+ * @stacks_done: Stacks which have been entirely unwound, for which it is no
+ * longer valid to unwind to.
+ *
+ * @prev_fp: The fp that pointed to this frame record, or a synthetic value
+ * of 0. This is used to ensure that within a stack, each
+ * subsequent frame record is at an increasing address.
+ * @prev_type: The type of stack this frame record was on, or a synthetic
+ * value of STACK_TYPE_UNKNOWN. This is used to detect a
+ * transition from one stack to another.
+ *
+ * @kr_cur: When KRETPROBES is selected, holds the kretprobe instance
+ * associated with the most recently encountered replacement lr
+ * value.
+ *
+ * @task: The task being unwound.
+ */
+struct unwind_state {
+ unsigned long fp;
+ unsigned long pc;
+ DECLARE_BITMAP(stacks_done, __NR_STACK_TYPES);
+ unsigned long prev_fp;
+ enum stack_type prev_type;
+#ifdef CONFIG_KRETPROBES
+ struct llist_node *kr_cur;
+#endif
+ struct task_struct *task;
+};
+
+static inline bool on_stack(unsigned long sp, unsigned long size,
+ unsigned long low, unsigned long high,
+ enum stack_type type, struct stack_info *info)
+{
+ if (!low)
+ return false;
+
+ if (sp < low || sp + size < sp || sp + size > high)
+ return false;
+
+ if (info) {
+ info->low = low;
+ info->high = high;
+ info->type = type;
+ }
+ return true;
+}
+
+static inline void unwind_init_common(struct unwind_state *state,
+ struct task_struct *task)
+{
+ state->task = task;
+#ifdef CONFIG_KRETPROBES
+ state->kr_cur = NULL;
+#endif
+
+ /*
+ * Prime the first unwind.
+ *
+ * In unwind_next() we'll check that the FP points to a valid stack,
+ * which can't be STACK_TYPE_UNKNOWN, and the first unwind will be
+ * treated as a transition to whichever stack that happens to be. The
+ * prev_fp value won't be used, but we set it to 0 such that it is
+ * definitely not an accessible stack address.
+ */
+ bitmap_zero(state->stacks_done, __NR_STACK_TYPES);
+ state->prev_fp = 0;
+ state->prev_type = STACK_TYPE_UNKNOWN;
+}
+
+#endif /* __ASM_STACKTRACE_COMMON_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index fcaa151b81f1..94a5dd2ab8fd 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -18,63 +18,6 @@
#include <asm/stack_pointer.h>
#include <asm/stacktrace.h>
-/*
- * A snapshot of a frame record or fp/lr register values, along with some
- * accounting information necessary for robust unwinding.
- *
- * @fp: The fp value in the frame record (or the real fp)
- * @pc: The lr value in the frame record (or the real lr)
- *
- * @stacks_done: Stacks which have been entirely unwound, for which it is no
- * longer valid to unwind to.
- *
- * @prev_fp: The fp that pointed to this frame record, or a synthetic value
- * of 0. This is used to ensure that within a stack, each
- * subsequent frame record is at an increasing address.
- * @prev_type: The type of stack this frame record was on, or a synthetic
- * value of STACK_TYPE_UNKNOWN. This is used to detect a
- * transition from one stack to another.
- *
- * @kr_cur: When KRETPROBES is selected, holds the kretprobe instance
- * associated with the most recently encountered replacement lr
- * value.
- *
- * @task: The task being unwound.
- */
-struct unwind_state {
- unsigned long fp;
- unsigned long pc;
- DECLARE_BITMAP(stacks_done, __NR_STACK_TYPES);
- unsigned long prev_fp;
- enum stack_type prev_type;
-#ifdef CONFIG_KRETPROBES
- struct llist_node *kr_cur;
-#endif
- struct task_struct *task;
-};
-
-static void unwind_init_common(struct unwind_state *state,
- struct task_struct *task)
-{
- state->task = task;
-#ifdef CONFIG_KRETPROBES
- state->kr_cur = NULL;
-#endif
-
- /*
- * Prime the first unwind.
- *
- * In unwind_next() we'll check that the FP points to a valid stack,
- * which can't be STACK_TYPE_UNKNOWN, and the first unwind will be
- * treated as a transition to whichever stack that happens to be. The
- * prev_fp value won't be used, but we set it to 0 such that it is
- * definitely not an accessible stack address.
- */
- bitmap_zero(state->stacks_done, __NR_STACK_TYPES);
- state->prev_fp = 0;
- state->prev_type = STACK_TYPE_UNKNOWN;
-}
-
/*
* Start an unwind from a pt_regs.
*
--
2.37.0.170.g444d1eabd0-goog
In non-protected nVHE mode (non-pKVM), the host can directly access
hypervisor memory, and unwinding of the hypervisor stacktrace is
done from EL1 to save on memory for shared buffers.
To unwind the hypervisor stack from EL1, the host needs to know the
starting point for the unwind and information that will allow it to
translate hypervisor stack addresses to the corresponding kernel
addresses. This patch sets up this bookkeeping; it is made use of
later in the series.
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/include/asm/kvm_asm.h | 16 ++++++++++++++++
arch/arm64/include/asm/stacktrace/nvhe.h | 4 ++++
arch/arm64/kvm/hyp/nvhe/stacktrace.c | 24 ++++++++++++++++++++++++
3 files changed, 44 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 2e277f2ed671..0ae9d12c2b5a 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -176,6 +176,22 @@ struct kvm_nvhe_init_params {
unsigned long vtcr;
};
+/**
+ * Used by the host in EL1 to dump the nVHE hypervisor backtrace on
+ * hyp_panic() in non-protected mode.
+ *
+ * @stack_base: hyp VA of the hyp_stack base.
+ * @overflow_stack_base: hyp VA of the hyp_overflow_stack base.
+ * @fp: hyp FP where the backtrace begins.
+ * @pc: hyp PC where the backtrace begins.
+ */
+struct kvm_nvhe_stacktrace_info {
+ unsigned long stack_base;
+ unsigned long overflow_stack_base;
+ unsigned long fp;
+ unsigned long pc;
+};
+
/* Translate a kernel address @ptr into its equivalent linear mapping */
#define kvm_ksym_ref(ptr) \
({ \
diff --git a/arch/arm64/include/asm/stacktrace/nvhe.h b/arch/arm64/include/asm/stacktrace/nvhe.h
index 456a6ae08433..1aadfd8d7ac9 100644
--- a/arch/arm64/include/asm/stacktrace/nvhe.h
+++ b/arch/arm64/include/asm/stacktrace/nvhe.h
@@ -19,6 +19,7 @@
#ifndef __ASM_STACKTRACE_NVHE_H
#define __ASM_STACKTRACE_NVHE_H
+#include <asm/kvm_asm.h>
#include <asm/stacktrace/common.h>
/**
@@ -49,6 +50,9 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
*/
#ifdef __KVM_NVHE_HYPERVISOR__
+DECLARE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)], overflow_stack);
+DECLARE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
+
extern void kvm_nvhe_prepare_backtrace(unsigned long fp, unsigned long pc);
#ifdef CONFIG_PROTECTED_NVHE_STACKTRACE
diff --git a/arch/arm64/kvm/hyp/nvhe/stacktrace.c b/arch/arm64/kvm/hyp/nvhe/stacktrace.c
index 832a536e440f..315eb41c37a2 100644
--- a/arch/arm64/kvm/hyp/nvhe/stacktrace.c
+++ b/arch/arm64/kvm/hyp/nvhe/stacktrace.c
@@ -9,6 +9,28 @@
DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)], overflow_stack)
__aligned(16);
+DEFINE_PER_CPU(struct kvm_nvhe_stacktrace_info, kvm_stacktrace_info);
+
+/**
+ * hyp_prepare_backtrace - Prepare non-protected nVHE backtrace.
+ *
+ * @fp : frame pointer at which to start the unwinding.
+ * @pc : program counter at which to start the unwinding.
+ *
+ * Save the information needed by the host to unwind the non-protected
+ * nVHE hypervisor stack in EL1.
+ */
+static void hyp_prepare_backtrace(unsigned long fp, unsigned long pc)
+{
+ struct kvm_nvhe_stacktrace_info *stacktrace_info = this_cpu_ptr(&kvm_stacktrace_info);
+ struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
+
+ stacktrace_info->stack_base = (unsigned long)(params->stack_hyp_va - PAGE_SIZE);
+ stacktrace_info->overflow_stack_base = (unsigned long)this_cpu_ptr(overflow_stack);
+ stacktrace_info->fp = fp;
+ stacktrace_info->pc = pc;
+}
+
#ifdef CONFIG_PROTECTED_NVHE_STACKTRACE
DEFINE_PER_CPU(unsigned long [NVHE_STACKTRACE_SIZE/sizeof(long)], pkvm_stacktrace);
@@ -81,4 +103,6 @@ void kvm_nvhe_prepare_backtrace(unsigned long fp, unsigned long pc)
{
if (is_protected_kvm_enabled())
pkvm_save_backtrace(fp, pc);
+ else
+ hyp_prepare_backtrace(fp, pc);
}
--
2.37.0.170.g444d1eabd0-goog
Add a brief description of how to use stacktrace/common.h to implement
a stack unwinder.
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/include/asm/stacktrace/common.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/arm64/include/asm/stacktrace/common.h b/arch/arm64/include/asm/stacktrace/common.h
index f86efe71479d..b362086f4c70 100644
--- a/arch/arm64/include/asm/stacktrace/common.h
+++ b/arch/arm64/include/asm/stacktrace/common.h
@@ -2,6 +2,14 @@
/*
* Common arm64 stack unwinder code.
*
+ * To implement a new arm64 stack unwinder:
+ * 1) Include this header
+ *
+ * 2) Provide implementations for the following functions:
+ * - on_overflow_stack()
+ * - on_accessible_stack()
+ * - unwind_next()
+ *
* Copyright (C) 2012 ARM Ltd.
*/
#ifndef __ASM_STACKTRACE_COMMON_H
--
2.37.0.170.g444d1eabd0-goog
On hyp_panic(), unwind and dump the nVHE hypervisor stack trace.
In protected nVHE mode, hypervisor stacktraces are only produced
if CONFIG_PROTECTED_NVHE_STACKTRACE is enabled.
Example backtrace:
[ 126.862960] kvm [371]: nVHE hyp panic at: [<ffff8000090a51d0>] __kvm_nvhe_recursive_death+0x10/0x34!
[ 126.869920] kvm [371]: Protected nVHE HYP call trace:
[ 126.870528] kvm [371]: [<ffff8000090a5570>] __kvm_nvhe_hyp_panic+0xac/0xf8
[ 126.871342] kvm [371]: [<ffff8000090a55cc>] __kvm_nvhe_hyp_panic_bad_stack+0x10/0x10
[ 126.872174] kvm [371]: [<ffff8000090a51e4>] __kvm_nvhe_recursive_death+0x24/0x34
[ 126.872971] kvm [371]: [<ffff8000090a51e4>] __kvm_nvhe_recursive_death+0x24/0x34
. . .
[ 126.927314] kvm [371]: [<ffff8000090a51e4>] __kvm_nvhe_recursive_death+0x24/0x34
[ 126.927727] kvm [371]: [<ffff8000090a51e4>] __kvm_nvhe_recursive_death+0x24/0x34
[ 126.928137] kvm [371]: [<ffff8000090a4de4>] __kvm_nvhe___kvm_vcpu_run+0x30/0x40c
[ 126.928561] kvm [371]: [<ffff8000090a7b64>] __kvm_nvhe_handle___kvm_vcpu_run+0x30/0x48
[ 126.928984] kvm [371]: [<ffff8000090a78b8>] __kvm_nvhe_handle_trap+0xc4/0x128
[ 126.929385] kvm [371]: [<ffff8000090a6864>] __kvm_nvhe___host_exit+0x64/0x64
[ 126.929804] kvm [371]: ---- End of Protected nVHE HYP call trace ----
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/kvm/handle_exit.c | 4 ++++
arch/arm64/kvm/hyp/nvhe/switch.c | 5 +++++
2 files changed, 9 insertions(+)
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index f66c0142b335..ef8b57953aa2 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -17,6 +17,7 @@
#include <asm/kvm_emulate.h>
#include <asm/kvm_mmu.h>
#include <asm/debug-monitors.h>
+#include <asm/stacktrace/nvhe.h>
#include <asm/traps.h>
#include <kvm/arm_hypercalls.h>
@@ -353,6 +354,9 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
(void *)panic_addr);
}
+ /* Dump the nVHE hypervisor backtrace */
+ kvm_nvhe_dump_backtrace(hyp_offset);
+
/*
* Hyp has panicked and we're going to handle that by panicking the
* kernel. The kernel offset will be revealed in the panic so we're
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 6db801db8f27..a50cfd39dedb 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -25,6 +25,7 @@
#include <asm/fpsimd.h>
#include <asm/debug-monitors.h>
#include <asm/processor.h>
+#include <asm/stacktrace/nvhe.h>
#include <nvhe/fixed_config.h>
#include <nvhe/mem_protect.h>
@@ -375,6 +376,10 @@ asmlinkage void __noreturn hyp_panic(void)
__sysreg_restore_state_nvhe(host_ctxt);
}
+ /* Prepare to dump kvm nvhe hyp stacktrace */
+ kvm_nvhe_prepare_backtrace((unsigned long)__builtin_frame_address(0),
+ _THIS_IP_);
+
__hyp_do_panic(host_ctxt, spsr, elr, par);
unreachable();
}
--
2.37.0.170.g444d1eabd0-goog
Implement the common framework necessary for unwind() to work
in non-protected nVHE mode:
 - on_accessible_stack()
 - on_overflow_stack()
 - unwind_next()
Non-protected nVHE unwind() is used by the host in EL1 to unwind
and dump the hypervisor stacktrace.
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/include/asm/stacktrace/nvhe.h | 67 +++++++++++++++++++++++-
arch/arm64/kvm/arm.c | 2 +-
2 files changed, 66 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/stacktrace/nvhe.h b/arch/arm64/include/asm/stacktrace/nvhe.h
index c7c8ac889ec1..c3f94b10f8f0 100644
--- a/arch/arm64/include/asm/stacktrace/nvhe.h
+++ b/arch/arm64/include/asm/stacktrace/nvhe.h
@@ -115,15 +115,78 @@ NOKPROBE_SYMBOL(unwind_next);
* Non-protected nVHE HYP stack unwinder
*/
#else /* !__KVM_NVHE_HYPERVISOR__ */
+DECLARE_KVM_NVHE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], overflow_stack);
+DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_stacktrace_info, kvm_stacktrace_info);
+DECLARE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
+
+/**
+ * kvm_nvhe_stack_kern_va - Convert KVM nVHE HYP stack addresses to kernel VAs
+ *
+ * The nVHE hypervisor stack is mapped in the flexible 'private' VA range, to
+ * allow for guard pages below the stack. Consequently, the fixed offset address
+ * translation macros won't work here.
+ *
+ * The kernel VA is calculated as an offset from the kernel VA of the hypervisor
+ * stack base.
+ *
+ * Returns true on success and updates @addr to its corresponding kernel VA;
+ * otherwise returns false.
+ */
+static inline bool kvm_nvhe_stack_kern_va(unsigned long *addr,
+ enum stack_type type)
+{
+ struct kvm_nvhe_stacktrace_info *stacktrace_info;
+ unsigned long hyp_base, kern_base, hyp_offset;
+
+ stacktrace_info = this_cpu_ptr_nvhe_sym(kvm_stacktrace_info);
+
+ switch (type) {
+ case STACK_TYPE_HYP:
+ kern_base = (unsigned long)*this_cpu_ptr(&kvm_arm_hyp_stack_page);
+ hyp_base = (unsigned long)stacktrace_info->stack_base;
+ break;
+ case STACK_TYPE_OVERFLOW:
+ kern_base = (unsigned long)this_cpu_ptr_nvhe_sym(overflow_stack);
+ hyp_base = (unsigned long)stacktrace_info->overflow_stack_base;
+ break;
+ default:
+ return false;
+ }
+
+ hyp_offset = *addr - hyp_base;
+
+ *addr = kern_base + hyp_offset;
+
+ return true;
+}
+
static inline bool on_overflow_stack(unsigned long sp, unsigned long size,
struct stack_info *info)
{
- return false;
+ struct kvm_nvhe_stacktrace_info *stacktrace_info
+ = this_cpu_ptr_nvhe_sym(kvm_stacktrace_info);
+ unsigned long low = (unsigned long)stacktrace_info->overflow_stack_base;
+ unsigned long high = low + OVERFLOW_STACK_SIZE;
+
+ return on_stack(sp, size, low, high, STACK_TYPE_OVERFLOW, info);
+}
+
+static inline bool on_hyp_stack(unsigned long sp, unsigned long size,
+ struct stack_info *info)
+{
+ struct kvm_nvhe_stacktrace_info *stacktrace_info
+ = this_cpu_ptr_nvhe_sym(kvm_stacktrace_info);
+ unsigned long low = (unsigned long)stacktrace_info->stack_base;
+ unsigned long high = low + PAGE_SIZE;
+
+ return on_stack(sp, size, low, high, STACK_TYPE_HYP, info);
}
static int notrace unwind_next(struct unwind_state *state)
{
- return 0;
+ struct stack_info info;
+
+ return unwind_next_common(state, &info, kvm_nvhe_stack_kern_va);
}
NOKPROBE_SYMBOL(unwind_next);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a0188144a122..6a64293108c5 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -49,7 +49,7 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
-static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
+DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
--
2.37.0.170.g444d1eabd0-goog
Dump the pKVM hypervisor backtrace from EL1 by reading the unwound
addresses from the shared stacktrace buffer.
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/include/asm/stacktrace/nvhe.h | 49 ++++++++++++++++++++++++
1 file changed, 49 insertions(+)
diff --git a/arch/arm64/include/asm/stacktrace/nvhe.h b/arch/arm64/include/asm/stacktrace/nvhe.h
index c3f94b10f8f0..ec1a4ee21c21 100644
--- a/arch/arm64/include/asm/stacktrace/nvhe.h
+++ b/arch/arm64/include/asm/stacktrace/nvhe.h
@@ -190,5 +190,54 @@ static int notrace unwind_next(struct unwind_state *state)
}
NOKPROBE_SYMBOL(unwind_next);
+#ifdef CONFIG_PROTECTED_NVHE_STACKTRACE
+DECLARE_KVM_NVHE_PER_CPU(unsigned long [NVHE_STACKTRACE_SIZE/sizeof(long)], pkvm_stacktrace);
+
+/**
+ * pkvm_dump_backtrace - Dump the protected nVHE HYP backtrace.
+ *
+ * @hyp_offset: hypervisor offset, used for address translation.
+ *
+ * Dumping of the pKVM HYP backtrace is done by reading the
+ * stack addresses from the shared stacktrace buffer, since the
+ * host cannot directly access hypervisor memory in protected
+ * mode.
+ */
+static inline void pkvm_dump_backtrace(unsigned long hyp_offset)
+{
+ unsigned long *stacktrace_pos;
+ unsigned long va_mask, pc;
+
+ stacktrace_pos = (unsigned long *)this_cpu_ptr_nvhe_sym(pkvm_stacktrace);
+ va_mask = GENMASK_ULL(vabits_actual - 1, 0);
+
+ kvm_err("Protected nVHE HYP call trace:\n");
+
+ /* The stack trace is terminated by a null entry */
+ for (; *stacktrace_pos; stacktrace_pos++) {
+ /* Mask tags and convert to kern addr */
+ pc = (*stacktrace_pos & va_mask) + hyp_offset;
+ kvm_err(" [<%016lx>] %pB\n", pc, (void *)pc);
+ }
+
+ kvm_err("---- End of Protected nVHE HYP call trace ----\n");
+}
+#else /* !CONFIG_PROTECTED_NVHE_STACKTRACE */
+static inline void pkvm_dump_backtrace(unsigned long hyp_offset)
+{
+ kvm_err("Cannot dump pKVM nVHE stacktrace: !CONFIG_PROTECTED_NVHE_STACKTRACE\n");
+}
+#endif /* CONFIG_PROTECTED_NVHE_STACKTRACE */
+
+/**
+ * kvm_nvhe_dump_backtrace - Dump KVM nVHE hypervisor backtrace.
+ *
+ * @hyp_offset: hypervisor offset, used for address translation.
+ */
+static inline void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
+{
+ if (is_protected_kvm_enabled())
+ pkvm_dump_backtrace(hyp_offset);
+}
#endif /* __KVM_NVHE_HYPERVISOR__ */
#endif /* __ASM_STACKTRACE_NVHE_H */
--
2.37.0.170.g444d1eabd0-goog
In protected nVHE mode, the host cannot access privately owned
hypervisor memory. In addition, the hypervisor aims to remain simple to
reduce the attack surface, and does not provide any printk support.
For the above reasons, the approach taken to provide hypervisor stacktraces
in protected mode is:
1) Unwind and save the hyp stack addresses in EL2 to a shared buffer
with the host (done in this patch).
2) Delegate the dumping and symbolization of the addresses to the
host in EL1 (later patch in the series).
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/include/asm/stacktrace/nvhe.h | 18 ++++++
arch/arm64/kvm/hyp/nvhe/stacktrace.c | 70 ++++++++++++++++++++++++
2 files changed, 88 insertions(+)
diff --git a/arch/arm64/include/asm/stacktrace/nvhe.h b/arch/arm64/include/asm/stacktrace/nvhe.h
index 36cf7858ddd8..456a6ae08433 100644
--- a/arch/arm64/include/asm/stacktrace/nvhe.h
+++ b/arch/arm64/include/asm/stacktrace/nvhe.h
@@ -21,6 +21,22 @@
#include <asm/stacktrace/common.h>
+/**
+ * kvm_nvhe_unwind_init - Start an unwind from the given nVHE HYP fp and pc
+ *
+ * @fp : frame pointer at which to start the unwinding.
+ * @pc : program counter at which to start the unwinding.
+ */
+static __always_inline void kvm_nvhe_unwind_init(struct unwind_state *state,
+ unsigned long fp,
+ unsigned long pc)
+{
+ unwind_init_common(state, NULL);
+
+ state->fp = fp;
+ state->pc = pc;
+}
+
static inline bool on_accessible_stack(const struct task_struct *tsk,
unsigned long sp, unsigned long size,
struct stack_info *info)
@@ -33,6 +49,8 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
*/
#ifdef __KVM_NVHE_HYPERVISOR__
+extern void kvm_nvhe_prepare_backtrace(unsigned long fp, unsigned long pc);
+
#ifdef CONFIG_PROTECTED_NVHE_STACKTRACE
static inline bool on_overflow_stack(unsigned long sp, unsigned long size,
struct stack_info *info)
diff --git a/arch/arm64/kvm/hyp/nvhe/stacktrace.c b/arch/arm64/kvm/hyp/nvhe/stacktrace.c
index 96c8b93320eb..832a536e440f 100644
--- a/arch/arm64/kvm/hyp/nvhe/stacktrace.c
+++ b/arch/arm64/kvm/hyp/nvhe/stacktrace.c
@@ -11,4 +11,74 @@ DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)], overflow_stack)
#ifdef CONFIG_PROTECTED_NVHE_STACKTRACE
DEFINE_PER_CPU(unsigned long [NVHE_STACKTRACE_SIZE/sizeof(long)], pkvm_stacktrace);
+
+/**
+ * pkvm_save_backtrace_entry - Saves a protected nVHE HYP stacktrace entry
+ *
+ * @arg : the position of the entry in the stacktrace buffer
+ * @where : the program counter corresponding to the stack frame
+ *
+ * Save the return address of a stack frame to the shared stacktrace buffer.
+ * The host can access this shared buffer from EL1 to dump the backtrace.
+ */
+static bool pkvm_save_backtrace_entry(void *arg, unsigned long where)
+{
+ unsigned long **stacktrace_pos = (unsigned long **)arg;
+ unsigned long stacktrace_start, stacktrace_end;
+
+ stacktrace_start = (unsigned long)this_cpu_ptr(pkvm_stacktrace);
+ stacktrace_end = stacktrace_start + NVHE_STACKTRACE_SIZE - (2 * sizeof(long));
+
+ if ((unsigned long) *stacktrace_pos > stacktrace_end)
+ return false;
+
+ /* Save the entry to the current pos in stacktrace buffer */
+ **stacktrace_pos = where;
+
+ /* A zero entry delimits the end of the stacktrace. */
+ *(*stacktrace_pos + 1) = 0UL;
+
+ /* Increment the current pos */
+ ++*stacktrace_pos;
+
+ return true;
+}
+
+/**
+ * pkvm_save_backtrace - Saves the protected nVHE HYP stacktrace
+ *
+ * @fp : frame pointer at which to start the unwinding.
+ * @pc : program counter at which to start the unwinding.
+ *
+ * Save the unwound stack addresses to the shared stacktrace buffer.
+ * The host can access this shared buffer from EL1 to dump the backtrace.
+ */
+static void pkvm_save_backtrace(unsigned long fp, unsigned long pc)
+{
+ void *stacktrace_start = (void *)this_cpu_ptr(pkvm_stacktrace);
+ struct unwind_state state;
+
+ kvm_nvhe_unwind_init(&state, fp, pc);
+
+ unwind(&state, pkvm_save_backtrace_entry, &stacktrace_start);
+}
+#else /* !CONFIG_PROTECTED_NVHE_STACKTRACE */
+static void pkvm_save_backtrace(unsigned long fp, unsigned long pc)
+{
+}
#endif /* CONFIG_PROTECTED_NVHE_STACKTRACE */
+
+/**
+ * kvm_nvhe_prepare_backtrace - prepare to dump the nVHE backtrace
+ *
+ * @fp : frame pointer at which to start the unwinding.
+ * @pc : program counter at which to start the unwinding.
+ *
+ * Saves the information needed by the host to dump the nVHE hypervisor
+ * backtrace.
+ */
+void kvm_nvhe_prepare_backtrace(unsigned long fp, unsigned long pc)
+{
+ if (is_protected_kvm_enabled())
+ pkvm_save_backtrace(fp, pc);
+}
--
2.37.0.170.g444d1eabd0-goog
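As a plain-C illustration of the shared-buffer protocol above (the buffer size and names here are made up for the sketch; the kernel uses the per-CPU pkvm_stacktrace of NVHE_STACKTRACE_SIZE bytes), the save and read sides look roughly like:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative buffer size in entries; not the kernel's NVHE_STACKTRACE_SIZE. */
#define STACKTRACE_ENTRIES 16

static unsigned long stacktrace[STACKTRACE_ENTRIES];

/*
 * Mirror of pkvm_save_backtrace_entry(): write an entry at the current
 * position, keep the buffer null-terminated, and refuse to write past
 * the last two slots (one for the entry, one for the terminating zero).
 */
static int save_entry(unsigned long **pos, unsigned long where)
{
	unsigned long *end = &stacktrace[STACKTRACE_ENTRIES - 2];

	if (*pos > end)
		return 0;

	**pos = where;		/* save the return address */
	*(*pos + 1) = 0UL;	/* a zero entry delimits the trace */
	++*pos;
	return 1;
}

/* Mirror of the host-side read loop in pkvm_dump_backtrace(). */
static size_t count_entries(const unsigned long *trace)
{
	size_t n = 0;

	while (trace[n])
		n++;
	return n;
}
```

The terminating zero is what lets the host-side loop walk the buffer without a separate length field being shared between EL2 and EL1.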
In non-protected nVHE mode, unwind and dump the hypervisor backtrace
from EL1. This is possible because the host can directly access the
hypervisor stack pages in non-protected mode.
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/include/asm/stacktrace/nvhe.h | 64 +++++++++++++++++++++---
1 file changed, 56 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/include/asm/stacktrace/nvhe.h b/arch/arm64/include/asm/stacktrace/nvhe.h
index ec1a4ee21c21..c322ac95b256 100644
--- a/arch/arm64/include/asm/stacktrace/nvhe.h
+++ b/arch/arm64/include/asm/stacktrace/nvhe.h
@@ -190,6 +190,56 @@ static int notrace unwind_next(struct unwind_state *state)
}
NOKPROBE_SYMBOL(unwind_next);
+/**
+ * kvm_nvhe_print_backtrace_entry - Symbolizes and prints the HYP stack address
+ */
+static inline void kvm_nvhe_print_backtrace_entry(unsigned long addr,
+ unsigned long hyp_offset)
+{
+ unsigned long va_mask = GENMASK_ULL(vabits_actual - 1, 0);
+
+ /* Mask tags and convert to kern addr */
+ addr = (addr & va_mask) + hyp_offset;
+ kvm_err(" [<%016lx>] %pB\n", addr, (void *)addr);
+}
+
+/**
+ * hyp_dump_backtrace_entry - Dump an entry of the non-protected nVHE HYP stacktrace
+ *
+ * @arg : the hypervisor offset, used for address translation
+ * @where : the program counter corresponding to the stack frame
+ */
+static inline bool hyp_dump_backtrace_entry(void *arg, unsigned long where)
+{
+ kvm_nvhe_print_backtrace_entry(where, (unsigned long)arg);
+
+ return true;
+}
+
+/**
+ * hyp_dump_backtrace - Dump the non-protected nVHE HYP backtrace.
+ *
+ * @hyp_offset: hypervisor offset, used for address translation.
+ *
+ * The host can directly access HYP stack pages in non-protected
+ * mode, so the unwinding is done directly from EL1. This removes
+ * the need for shared buffers between host and hypervisor for
+ * the stacktrace.
+ */
+static inline void hyp_dump_backtrace(unsigned long hyp_offset)
+{
+ struct kvm_nvhe_stacktrace_info *stacktrace_info;
+ struct unwind_state state;
+
+ stacktrace_info = this_cpu_ptr_nvhe_sym(kvm_stacktrace_info);
+
+ kvm_nvhe_unwind_init(&state, stacktrace_info->fp, stacktrace_info->pc);
+
+ kvm_err("Non-protected nVHE HYP call trace:\n");
+ unwind(&state, hyp_dump_backtrace_entry, (void *)hyp_offset);
+ kvm_err("---- End of Non-protected nVHE HYP call trace ----\n");
+}
+
#ifdef CONFIG_PROTECTED_NVHE_STACKTRACE
DECLARE_KVM_NVHE_PER_CPU(unsigned long [NVHE_STACKTRACE_SIZE/sizeof(long)], pkvm_stacktrace);
@@ -206,22 +256,18 @@ DECLARE_KVM_NVHE_PER_CPU(unsigned long [NVHE_STACKTRACE_SIZE/sizeof(long)], pkvm
static inline void pkvm_dump_backtrace(unsigned long hyp_offset)
{
unsigned long *stacktrace_pos;
- unsigned long va_mask, pc;
stacktrace_pos = (unsigned long *)this_cpu_ptr_nvhe_sym(pkvm_stacktrace);
- va_mask = GENMASK_ULL(vabits_actual - 1, 0);
kvm_err("Protected nVHE HYP call trace:\n");
- /* The stack trace is terminated by a null entry */
- for (; *stacktrace_pos; stacktrace_pos++) {
- /* Mask tags and convert to kern addr */
- pc = (*stacktrace_pos & va_mask) + hyp_offset;
- kvm_err(" [<%016lx>] %pB\n", pc, (void *)pc);
- }
+ /* The saved stacktrace is terminated by a null entry */
+ for (; *stacktrace_pos; stacktrace_pos++)
+ kvm_nvhe_print_backtrace_entry(*stacktrace_pos, hyp_offset);
kvm_err("---- End of Protected nVHE HYP call trace ----\n");
}
+
#else /* !CONFIG_PROTECTED_NVHE_STACKTRACE */
static inline void pkvm_dump_backtrace(unsigned long hyp_offset)
{
@@ -238,6 +284,8 @@ static inline void kvm_nvhe_dump_backtrace(unsigned long hyp_offset)
{
if (is_protected_kvm_enabled())
pkvm_dump_backtrace(hyp_offset);
+ else
+ hyp_dump_backtrace(hyp_offset);
}
#endif /* __KVM_NVHE_HYPERVISOR__ */
#endif /* __ASM_STACKTRACE_NVHE_H */
--
2.37.0.170.g444d1eabd0-goog
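The tag-masking done by kvm_nvhe_print_backtrace_entry() can be sketched in standalone C (VABITS_ACTUAL is hard-coded here purely for illustration; the kernel reads vabits_actual at runtime, and GENMASK_ULL comes from linux/bits.h):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical VA width for the sketch; the kernel determines this at boot. */
#define VABITS_ACTUAL	48

/* Simplified stand-in for the kernel's GENMASK_ULL() from linux/bits.h. */
#define GENMASK_ULL(h, l) \
	(((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

/*
 * Mirror of the masking in kvm_nvhe_print_backtrace_entry(): strip the
 * tag bits above the VA range, then relocate the HYP address into the
 * kernel's view by adding hyp_offset, so %pB symbolization can work.
 */
static uint64_t hyp_to_kern_addr(uint64_t addr, uint64_t hyp_offset)
{
	uint64_t va_mask = GENMASK_ULL(VABITS_ACTUAL - 1, 0);

	return (addr & va_mask) + hyp_offset;
}
```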
Move unwind() to stacktrace/common.h and, as a result, the kernel's
unwind_next() to asm/stacktrace.h. This allows reusing unwind() in
the implementation of the nVHE HYP stack unwinder, later in the
series.
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/include/asm/stacktrace.h | 51 ++++++++++++++++
arch/arm64/include/asm/stacktrace/common.h | 19 ++++++
arch/arm64/kernel/stacktrace.c | 67 ----------------------
3 files changed, 70 insertions(+), 67 deletions(-)
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index a4f8b84fb459..4fa07f0f913d 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -11,6 +11,7 @@
#include <linux/llist.h>
#include <asm/memory.h>
+#include <asm/pointer_auth.h>
#include <asm/ptrace.h>
#include <asm/sdei.h>
@@ -78,4 +79,54 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
return false;
}
+
+/*
+ * Unwind from one frame record (A) to the next frame record (B).
+ *
+ * We terminate early if the location of B indicates a malformed chain of frame
+ * records (e.g. a cycle), determined based on the location and fp value of A
+ * and the location (but not the fp value) of B.
+ */
+static inline int notrace unwind_next(struct unwind_state *state)
+{
+ struct task_struct *tsk = state->task;
+ unsigned long fp = state->fp;
+ struct stack_info info;
+ int err;
+
+ /* Final frame; nothing to unwind */
+ if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
+ return -ENOENT;
+
+ err = unwind_next_common(state, &info, NULL);
+ if (err)
+ return err;
+
+ state->pc = ptrauth_strip_insn_pac(state->pc);
+
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+ if (tsk->ret_stack &&
+ (state->pc == (unsigned long)return_to_handler)) {
+ unsigned long orig_pc;
+ /*
+ * This is a case where function graph tracer has
+ * modified a return address (LR) in a stack frame
+ * to hook a function return.
+ * So replace it to an original value.
+ */
+ orig_pc = ftrace_graph_ret_addr(tsk, NULL, state->pc,
+ (void *)state->fp);
+ if (WARN_ON_ONCE(state->pc == orig_pc))
+ return -EINVAL;
+ state->pc = orig_pc;
+ }
+#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+#ifdef CONFIG_KRETPROBES
+ if (is_kretprobe_trampoline(state->pc))
+ state->pc = kretprobe_find_ret_addr(tsk, (void *)state->fp, &state->kr_cur);
+#endif
+
+ return 0;
+}
+NOKPROBE_SYMBOL(unwind_next);
#endif /* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/include/asm/stacktrace/common.h b/arch/arm64/include/asm/stacktrace/common.h
index 5f5d74a286f3..f86efe71479d 100644
--- a/arch/arm64/include/asm/stacktrace/common.h
+++ b/arch/arm64/include/asm/stacktrace/common.h
@@ -9,6 +9,7 @@
#include <linux/bitmap.h>
#include <linux/bitops.h>
+#include <linux/kprobes.h>
#include <linux/types.h>
enum stack_type {
@@ -69,6 +70,8 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
unsigned long sp, unsigned long size,
struct stack_info *info);
+static inline int unwind_next(struct unwind_state *state);
+
static inline bool on_stack(unsigned long sp, unsigned long size,
unsigned long low, unsigned long high,
enum stack_type type, struct stack_info *info)
@@ -192,4 +195,20 @@ static inline int unwind_next_common(struct unwind_state *state,
return 0;
}
+
+static inline void notrace unwind(struct unwind_state *state,
+ stack_trace_consume_fn consume_entry,
+ void *cookie)
+{
+ while (1) {
+ int ret;
+
+ if (!consume_entry(cookie, state->pc))
+ break;
+ ret = unwind_next(state);
+ if (ret < 0)
+ break;
+ }
+}
+NOKPROBE_SYMBOL(unwind);
#endif /* __ASM_STACKTRACE_COMMON_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index eef3cf6bf2d7..9fa60ee48499 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -7,14 +7,12 @@
#include <linux/kernel.h>
#include <linux/export.h>
#include <linux/ftrace.h>
-#include <linux/kprobes.h>
#include <linux/sched.h>
#include <linux/sched/debug.h>
#include <linux/sched/task_stack.h>
#include <linux/stacktrace.h>
#include <asm/irq.h>
-#include <asm/pointer_auth.h>
#include <asm/stack_pointer.h>
#include <asm/stacktrace.h>
@@ -69,71 +67,6 @@ static inline void unwind_init_from_task(struct unwind_state *state,
state->pc = thread_saved_pc(task);
}
-/*
- * Unwind from one frame record (A) to the next frame record (B).
- *
- * We terminate early if the location of B indicates a malformed chain of frame
- * records (e.g. a cycle), determined based on the location and fp value of A
- * and the location (but not the fp value) of B.
- */
-static int notrace unwind_next(struct unwind_state *state)
-{
- struct task_struct *tsk = state->task;
- unsigned long fp = state->fp;
- struct stack_info info;
- int err;
-
- /* Final frame; nothing to unwind */
- if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
- return -ENOENT;
-
- err = unwind_next_common(state, &info, NULL);
- if (err)
- return err;
-
- state->pc = ptrauth_strip_insn_pac(state->pc);
-
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
- if (tsk->ret_stack &&
- (state->pc == (unsigned long)return_to_handler)) {
- unsigned long orig_pc;
- /*
- * This is a case where function graph tracer has
- * modified a return address (LR) in a stack frame
- * to hook a function return.
- * So replace it to an original value.
- */
- orig_pc = ftrace_graph_ret_addr(tsk, NULL, state->pc,
- (void *)state->fp);
- if (WARN_ON_ONCE(state->pc == orig_pc))
- return -EINVAL;
- state->pc = orig_pc;
- }
-#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
-#ifdef CONFIG_KRETPROBES
- if (is_kretprobe_trampoline(state->pc))
- state->pc = kretprobe_find_ret_addr(tsk, (void *)state->fp, &state->kr_cur);
-#endif
-
- return 0;
-}
-NOKPROBE_SYMBOL(unwind_next);
-
-static void notrace unwind(struct unwind_state *state,
- stack_trace_consume_fn consume_entry, void *cookie)
-{
- while (1) {
- int ret;
-
- if (!consume_entry(cookie, state->pc))
- break;
- ret = unwind_next(state);
- if (ret < 0)
- break;
- }
-}
-NOKPROBE_SYMBOL(unwind);
-
static bool dump_backtrace_entry(void *arg, unsigned long where)
{
char *loglvl = arg;
--
2.37.0.170.g444d1eabd0-goog
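The shape of the shared unwind() loop is easy to demonstrate outside the kernel; below is a userspace sketch with a toy unwind_next() walking a fake frame-record chain (all names and frame layouts here are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-ins for the kernel types; illustrative only. */
struct unwind_state {
	unsigned long fp;
	unsigned long pc;
};

typedef bool (*consume_fn)(void *cookie, unsigned long where);

/* Toy unwind_next(): follow a {next_fp, saved_lr} chain, stop at fp == 0. */
static int unwind_next(struct unwind_state *state)
{
	unsigned long *frame = (unsigned long *)state->fp;

	if (!frame)
		return -1;	/* final frame; nothing to unwind */

	state->fp = frame[0];	/* next frame record */
	state->pc = frame[1];	/* saved lr */
	return 0;
}

/* Same shape as the shared unwind() in stacktrace/common.h. */
static void unwind(struct unwind_state *state, consume_fn consume, void *cookie)
{
	while (1) {
		if (!consume(cookie, state->pc))
			break;
		if (unwind_next(state) < 0)
			break;
	}
}

/* Example consumer: count the entries handed back by unwind(). */
static bool count_cb(void *cookie, unsigned long where)
{
	(void)where;
	++*(int *)cookie;
	return true;
}
```

The callback-driven structure is what lets the same loop serve both consumers in this series: the EL2 entry-saving callback and the EL1 dump callback.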
Implement the common framework necessary for unwind() to work in
the protected nVHE context:
 - on_accessible_stack()
 - on_overflow_stack()
 - unwind_next()
In protected mode, unwind() is used to unwind and save the hyp stack
addresses to the shared stacktrace buffer. The host reads the
entries in this buffer, then symbolizes and dumps the stacktrace (in a
later patch in this series).
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/include/asm/stacktrace/common.h | 2 ++
arch/arm64/include/asm/stacktrace/nvhe.h | 34 ++++++++++++++++++++--
2 files changed, 34 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/stacktrace/common.h b/arch/arm64/include/asm/stacktrace/common.h
index b362086f4c70..cf442e67dccd 100644
--- a/arch/arm64/include/asm/stacktrace/common.h
+++ b/arch/arm64/include/asm/stacktrace/common.h
@@ -27,6 +27,7 @@ enum stack_type {
STACK_TYPE_OVERFLOW,
STACK_TYPE_SDEI_NORMAL,
STACK_TYPE_SDEI_CRITICAL,
+ STACK_TYPE_HYP,
__NR_STACK_TYPES
};
@@ -171,6 +172,7 @@ static inline int unwind_next_common(struct unwind_state *state,
*
* TASK -> IRQ -> OVERFLOW -> SDEI_NORMAL
* TASK -> SDEI_NORMAL -> SDEI_CRITICAL -> OVERFLOW
+ * HYP -> OVERFLOW
*
* ... but the nesting itself is strict. Once we transition from one
* stack to another, it's never valid to unwind back to that first
diff --git a/arch/arm64/include/asm/stacktrace/nvhe.h b/arch/arm64/include/asm/stacktrace/nvhe.h
index 1aadfd8d7ac9..c7c8ac889ec1 100644
--- a/arch/arm64/include/asm/stacktrace/nvhe.h
+++ b/arch/arm64/include/asm/stacktrace/nvhe.h
@@ -38,10 +38,19 @@ static __always_inline void kvm_nvhe_unwind_init(struct unwind_state *state,
state->pc = pc;
}
+static inline bool on_hyp_stack(unsigned long sp, unsigned long size,
+ struct stack_info *info);
+
static inline bool on_accessible_stack(const struct task_struct *tsk,
unsigned long sp, unsigned long size,
struct stack_info *info)
{
+ if (on_accessible_stack_common(tsk, sp, size, info))
+ return true;
+
+ if (on_hyp_stack(sp, size, info))
+ return true;
+
return false;
}
@@ -59,12 +68,27 @@ extern void kvm_nvhe_prepare_backtrace(unsigned long fp, unsigned long pc);
static inline bool on_overflow_stack(unsigned long sp, unsigned long size,
struct stack_info *info)
{
- return false;
+ unsigned long low = (unsigned long)this_cpu_ptr(overflow_stack);
+ unsigned long high = low + OVERFLOW_STACK_SIZE;
+
+ return on_stack(sp, size, low, high, STACK_TYPE_OVERFLOW, info);
+}
+
+static inline bool on_hyp_stack(unsigned long sp, unsigned long size,
+ struct stack_info *info)
+{
+ struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
+ unsigned long high = params->stack_hyp_va;
+ unsigned long low = high - PAGE_SIZE;
+
+ return on_stack(sp, size, low, high, STACK_TYPE_HYP, info);
}
static int notrace unwind_next(struct unwind_state *state)
{
- return 0;
+ struct stack_info info;
+
+ return unwind_next_common(state, &info, NULL);
}
NOKPROBE_SYMBOL(unwind_next);
#else /* !CONFIG_PROTECTED_NVHE_STACKTRACE */
@@ -74,6 +98,12 @@ static inline bool on_overflow_stack(unsigned long sp, unsigned long size,
return false;
}
+static inline bool on_hyp_stack(unsigned long sp, unsigned long size,
+ struct stack_info *info)
+{
+ return false;
+}
+
static int notrace unwind_next(struct unwind_state *state)
{
return 0;
--
2.37.0.170.g444d1eabd0-goog
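The on_stack() range check that on_overflow_stack() and on_hyp_stack() build on can be sketched in isolation; this is a userspace mirror of the common-header helper, minus the stack_info bookkeeping:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Mirror of on_stack() from stacktrace/common.h: an object of `size`
 * bytes at `sp` is on the stack iff it lies entirely within [low, high].
 * The `sp + size < sp` test catches address wraparound, and a zero
 * `low` marks a stack that was never set up.
 */
static bool on_stack(unsigned long sp, unsigned long size,
		     unsigned long low, unsigned long high)
{
	if (!low)
		return false;

	if (sp < low || sp + size < sp || sp + size > high)
		return false;

	return true;
}
```

on_hyp_stack() simply feeds this check the page below params->stack_hyp_va, and on_overflow_stack() the per-CPU overflow stack bounds.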
On Thu, Jul 14, 2022 at 11:10:10PM -0700, Kalesh Singh wrote:
> In order to reuse the arm64 stack unwinding logic for the nVHE
> hypervisor stack, move the common code to a shared header
> (arch/arm64/include/asm/stacktrace/common.h).
>
> The nVHE hypervisor cannot safely link against kernel code, so we
> make use of the shared header to avoid duplicated logic later in
> this series.
Reviewed-by: Mark Brown <[email protected]>
Hi Kalesh,
On Fri, Jul 15, 2022 at 7:11 AM Kalesh Singh <[email protected]> wrote:
>
> Move unwind() to stacktrace/common.h, and as a result
> the kernel unwind_next() to asm/stacktrace.h. This allow
> reusing unwind() in the implementation of the nVHE HYP
> stack unwinder, later in the series.
>
> Signed-off-by: Kalesh Singh <[email protected]>
> ---
Reviewed-by: Fuad Tabba <[email protected]>
Thanks,
/fuad
> arch/arm64/include/asm/stacktrace.h | 51 ++++++++++++++++
> arch/arm64/include/asm/stacktrace/common.h | 19 ++++++
> arch/arm64/kernel/stacktrace.c | 67 ----------------------
> 3 files changed, 70 insertions(+), 67 deletions(-)
>
> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> index a4f8b84fb459..4fa07f0f913d 100644
> --- a/arch/arm64/include/asm/stacktrace.h
> +++ b/arch/arm64/include/asm/stacktrace.h
> @@ -11,6 +11,7 @@
> #include <linux/llist.h>
>
> #include <asm/memory.h>
> +#include <asm/pointer_auth.h>
> #include <asm/ptrace.h>
> #include <asm/sdei.h>
>
> @@ -78,4 +79,54 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
>
> return false;
> }
> +
> +/*
> + * Unwind from one frame record (A) to the next frame record (B).
> + *
> + * We terminate early if the location of B indicates a malformed chain of frame
> + * records (e.g. a cycle), determined based on the location and fp value of A
> + * and the location (but not the fp value) of B.
> + */
> +static inline int notrace unwind_next(struct unwind_state *state)
> +{
> + struct task_struct *tsk = state->task;
> + unsigned long fp = state->fp;
> + struct stack_info info;
> + int err;
> +
> + /* Final frame; nothing to unwind */
> + if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
> + return -ENOENT;
> +
> + err = unwind_next_common(state, &info, NULL);
> + if (err)
> + return err;
> +
> + state->pc = ptrauth_strip_insn_pac(state->pc);
> +
> +#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> + if (tsk->ret_stack &&
> + (state->pc == (unsigned long)return_to_handler)) {
> + unsigned long orig_pc;
> + /*
> + * This is a case where function graph tracer has
> + * modified a return address (LR) in a stack frame
> + * to hook a function return.
> + * So replace it to an original value.
> + */
> + orig_pc = ftrace_graph_ret_addr(tsk, NULL, state->pc,
> + (void *)state->fp);
> + if (WARN_ON_ONCE(state->pc == orig_pc))
> + return -EINVAL;
> + state->pc = orig_pc;
> + }
> +#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> +#ifdef CONFIG_KRETPROBES
> + if (is_kretprobe_trampoline(state->pc))
> + state->pc = kretprobe_find_ret_addr(tsk, (void *)state->fp, &state->kr_cur);
> +#endif
> +
> + return 0;
> +}
> +NOKPROBE_SYMBOL(unwind_next);
> #endif /* __ASM_STACKTRACE_H */
> diff --git a/arch/arm64/include/asm/stacktrace/common.h b/arch/arm64/include/asm/stacktrace/common.h
> index 5f5d74a286f3..f86efe71479d 100644
> --- a/arch/arm64/include/asm/stacktrace/common.h
> +++ b/arch/arm64/include/asm/stacktrace/common.h
> @@ -9,6 +9,7 @@
>
> #include <linux/bitmap.h>
> #include <linux/bitops.h>
> +#include <linux/kprobes.h>
> #include <linux/types.h>
>
> enum stack_type {
> @@ -69,6 +70,8 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
> unsigned long sp, unsigned long size,
> struct stack_info *info);
>
> +static inline int unwind_next(struct unwind_state *state);
> +
> static inline bool on_stack(unsigned long sp, unsigned long size,
> unsigned long low, unsigned long high,
> enum stack_type type, struct stack_info *info)
> @@ -192,4 +195,20 @@ static inline int unwind_next_common(struct unwind_state *state,
>
> return 0;
> }
> +
> +static inline void notrace unwind(struct unwind_state *state,
> + stack_trace_consume_fn consume_entry,
> + void *cookie)
> +{
> + while (1) {
> + int ret;
> +
> + if (!consume_entry(cookie, state->pc))
> + break;
> + ret = unwind_next(state);
> + if (ret < 0)
> + break;
> + }
> +}
> +NOKPROBE_SYMBOL(unwind);
> #endif /* __ASM_STACKTRACE_COMMON_H */
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index eef3cf6bf2d7..9fa60ee48499 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -7,14 +7,12 @@
> #include <linux/kernel.h>
> #include <linux/export.h>
> #include <linux/ftrace.h>
> -#include <linux/kprobes.h>
> #include <linux/sched.h>
> #include <linux/sched/debug.h>
> #include <linux/sched/task_stack.h>
> #include <linux/stacktrace.h>
>
> #include <asm/irq.h>
> -#include <asm/pointer_auth.h>
> #include <asm/stack_pointer.h>
> #include <asm/stacktrace.h>
>
> @@ -69,71 +67,6 @@ static inline void unwind_init_from_task(struct unwind_state *state,
> state->pc = thread_saved_pc(task);
> }
>
> -/*
> - * Unwind from one frame record (A) to the next frame record (B).
> - *
> - * We terminate early if the location of B indicates a malformed chain of frame
> - * records (e.g. a cycle), determined based on the location and fp value of A
> - * and the location (but not the fp value) of B.
> - */
> -static int notrace unwind_next(struct unwind_state *state)
> -{
> - struct task_struct *tsk = state->task;
> - unsigned long fp = state->fp;
> - struct stack_info info;
> - int err;
> -
> - /* Final frame; nothing to unwind */
> - if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
> - return -ENOENT;
> -
> - err = unwind_next_common(state, &info, NULL);
> - if (err)
> - return err;
> -
> - state->pc = ptrauth_strip_insn_pac(state->pc);
> -
> -#ifdef CONFIG_FUNCTION_GRAPH_TRACER
> - if (tsk->ret_stack &&
> - (state->pc == (unsigned long)return_to_handler)) {
> - unsigned long orig_pc;
> - /*
> - * This is a case where function graph tracer has
> - * modified a return address (LR) in a stack frame
> - * to hook a function return.
> - * So replace it to an original value.
> - */
> - orig_pc = ftrace_graph_ret_addr(tsk, NULL, state->pc,
> - (void *)state->fp);
> - if (WARN_ON_ONCE(state->pc == orig_pc))
> - return -EINVAL;
> - state->pc = orig_pc;
> - }
> -#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> -#ifdef CONFIG_KRETPROBES
> - if (is_kretprobe_trampoline(state->pc))
> - state->pc = kretprobe_find_ret_addr(tsk, (void *)state->fp, &state->kr_cur);
> -#endif
> -
> - return 0;
> -}
> -NOKPROBE_SYMBOL(unwind_next);
> -
> -static void notrace unwind(struct unwind_state *state,
> - stack_trace_consume_fn consume_entry, void *cookie)
> -{
> - while (1) {
> - int ret;
> -
> - if (!consume_entry(cookie, state->pc))
> - break;
> - ret = unwind_next(state);
> - if (ret < 0)
> - break;
> - }
> -}
> -NOKPROBE_SYMBOL(unwind);
> -
> static bool dump_backtrace_entry(void *arg, unsigned long where)
> {
> char *loglvl = arg;
> --
> 2.37.0.170.g444d1eabd0-goog
>
Hi Kalesh,
On Fri, Jul 15, 2022 at 7:11 AM Kalesh Singh <[email protected]> wrote:
>
> Add brief description on how to use stacktrace/common.h to implement
> a stack unwinder.
>
> Signed-off-by: Kalesh Singh <[email protected]>
> ---
> arch/arm64/include/asm/stacktrace/common.h | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/arm64/include/asm/stacktrace/common.h b/arch/arm64/include/asm/stacktrace/common.h
> index f86efe71479d..b362086f4c70 100644
> --- a/arch/arm64/include/asm/stacktrace/common.h
> +++ b/arch/arm64/include/asm/stacktrace/common.h
> @@ -2,6 +2,14 @@
> /*
> * Common arm64 stack unwinder code.
> *
> + * To implement a new arm64 stack unwinder:
> + * 1) Include this header
> + *
> + * 2) Provide implementations for the following functions:
> + * - on_overflow_stack()
> + * - on_accessible_stack()
> + * - unwind_next()
> + *
> * Copyright (C) 2012 ARM Ltd.
> */
> #ifndef __ASM_STACKTRACE_COMMON_H
> --
Ideally it would be nice to have a description of what these functions
are expected to do, but that said,
Reviewed-by: Fuad Tabba <[email protected]>
Thanks,
/fuad
> 2.37.0.170.g444d1eabd0-goog
>
Hi Kalesh,
On Fri, Jul 15, 2022 at 7:10 AM Kalesh Singh <[email protected]> wrote:
>
> In order to reuse the arm64 stack unwinding logic for the nVHE
> hypervisor stack, move the common code to a shared header
> (arch/arm64/include/asm/stacktrace/common.h).
>
> The nVHE hypervisor cannot safely link against kernel code, so we
> make use of the shared header to avoid duplicated logic later in
> this series.
>
> Signed-off-by: Kalesh Singh <[email protected]>
> ---
Reviewed-by: Fuad Tabba <[email protected]>
Thanks,
/fuad
> arch/arm64/include/asm/stacktrace.h | 35 +------
> arch/arm64/include/asm/stacktrace/common.h | 105 +++++++++++++++++++++
> arch/arm64/kernel/stacktrace.c | 57 -----------
> 3 files changed, 106 insertions(+), 91 deletions(-)
> create mode 100644 arch/arm64/include/asm/stacktrace/common.h
>
> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> index aec9315bf156..79f455b37c84 100644
> --- a/arch/arm64/include/asm/stacktrace.h
> +++ b/arch/arm64/include/asm/stacktrace.h
> @@ -8,52 +8,19 @@
> #include <linux/percpu.h>
> #include <linux/sched.h>
> #include <linux/sched/task_stack.h>
> -#include <linux/types.h>
> #include <linux/llist.h>
>
> #include <asm/memory.h>
> #include <asm/ptrace.h>
> #include <asm/sdei.h>
>
> -enum stack_type {
> - STACK_TYPE_UNKNOWN,
> - STACK_TYPE_TASK,
> - STACK_TYPE_IRQ,
> - STACK_TYPE_OVERFLOW,
> - STACK_TYPE_SDEI_NORMAL,
> - STACK_TYPE_SDEI_CRITICAL,
> - __NR_STACK_TYPES
> -};
> -
> -struct stack_info {
> - unsigned long low;
> - unsigned long high;
> - enum stack_type type;
> -};
> +#include <asm/stacktrace/common.h>
>
> extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
> const char *loglvl);
>
> DECLARE_PER_CPU(unsigned long *, irq_stack_ptr);
>
> -static inline bool on_stack(unsigned long sp, unsigned long size,
> - unsigned long low, unsigned long high,
> - enum stack_type type, struct stack_info *info)
> -{
> - if (!low)
> - return false;
> -
> - if (sp < low || sp + size < sp || sp + size > high)
> - return false;
> -
> - if (info) {
> - info->low = low;
> - info->high = high;
> - info->type = type;
> - }
> - return true;
> -}
> -
> static inline bool on_irq_stack(unsigned long sp, unsigned long size,
> struct stack_info *info)
> {
> diff --git a/arch/arm64/include/asm/stacktrace/common.h b/arch/arm64/include/asm/stacktrace/common.h
> new file mode 100644
> index 000000000000..64ae4f6b06fe
> --- /dev/null
> +++ b/arch/arm64/include/asm/stacktrace/common.h
> @@ -0,0 +1,105 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Common arm64 stack unwinder code.
> + *
> + * Copyright (C) 2012 ARM Ltd.
> + */
> +#ifndef __ASM_STACKTRACE_COMMON_H
> +#define __ASM_STACKTRACE_COMMON_H
> +
> +#include <linux/bitmap.h>
> +#include <linux/bitops.h>
> +#include <linux/types.h>
> +
> +enum stack_type {
> + STACK_TYPE_UNKNOWN,
> + STACK_TYPE_TASK,
> + STACK_TYPE_IRQ,
> + STACK_TYPE_OVERFLOW,
> + STACK_TYPE_SDEI_NORMAL,
> + STACK_TYPE_SDEI_CRITICAL,
> + __NR_STACK_TYPES
> +};
> +
> +struct stack_info {
> + unsigned long low;
> + unsigned long high;
> + enum stack_type type;
> +};
> +
> +/*
> + * A snapshot of a frame record or fp/lr register values, along with some
> + * accounting information necessary for robust unwinding.
> + *
> + * @fp: The fp value in the frame record (or the real fp)
> + * @pc: The lr value in the frame record (or the real lr)
> + *
> + * @stacks_done: Stacks which have been entirely unwound, for which it is no
> + * longer valid to unwind to.
> + *
> + * @prev_fp: The fp that pointed to this frame record, or a synthetic value
> + * of 0. This is used to ensure that within a stack, each
> + * subsequent frame record is at an increasing address.
> + * @prev_type: The type of stack this frame record was on, or a synthetic
> + * value of STACK_TYPE_UNKNOWN. This is used to detect a
> + * transition from one stack to another.
> + *
> + * @kr_cur: When KRETPROBES is selected, holds the kretprobe instance
> + * associated with the most recently encountered replacement lr
> + * value.
> + *
> + * @task: The task being unwound.
> + */
> +struct unwind_state {
> + unsigned long fp;
> + unsigned long pc;
> + DECLARE_BITMAP(stacks_done, __NR_STACK_TYPES);
> + unsigned long prev_fp;
> + enum stack_type prev_type;
> +#ifdef CONFIG_KRETPROBES
> + struct llist_node *kr_cur;
> +#endif
> + struct task_struct *task;
> +};
> +
> +static inline bool on_stack(unsigned long sp, unsigned long size,
> + unsigned long low, unsigned long high,
> + enum stack_type type, struct stack_info *info)
> +{
> + if (!low)
> + return false;
> +
> + if (sp < low || sp + size < sp || sp + size > high)
> + return false;
> +
> + if (info) {
> + info->low = low;
> + info->high = high;
> + info->type = type;
> + }
> + return true;
> +}
> +
> +static inline void unwind_init_common(struct unwind_state *state,
> + struct task_struct *task)
> +{
> + state->task = task;
> +#ifdef CONFIG_KRETPROBES
> + state->kr_cur = NULL;
> +#endif
> +
> + /*
> + * Prime the first unwind.
> + *
> + * In unwind_next() we'll check that the FP points to a valid stack,
> + * which can't be STACK_TYPE_UNKNOWN, and the first unwind will be
> + * treated as a transition to whichever stack that happens to be. The
> + * prev_fp value won't be used, but we set it to 0 such that it is
> + * definitely not an accessible stack address.
> + */
> + bitmap_zero(state->stacks_done, __NR_STACK_TYPES);
> + state->prev_fp = 0;
> + state->prev_type = STACK_TYPE_UNKNOWN;
> +}
> +
> +#endif /* __ASM_STACKTRACE_COMMON_H */
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index fcaa151b81f1..94a5dd2ab8fd 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -18,63 +18,6 @@
> #include <asm/stack_pointer.h>
> #include <asm/stacktrace.h>
>
> -/*
> - * A snapshot of a frame record or fp/lr register values, along with some
> - * accounting information necessary for robust unwinding.
> - *
> - * @fp: The fp value in the frame record (or the real fp)
> - * @pc: The lr value in the frame record (or the real lr)
> - *
> - * @stacks_done: Stacks which have been entirely unwound, for which it is no
> - * longer valid to unwind to.
> - *
> - * @prev_fp: The fp that pointed to this frame record, or a synthetic value
> - * of 0. This is used to ensure that within a stack, each
> - * subsequent frame record is at an increasing address.
> - * @prev_type: The type of stack this frame record was on, or a synthetic
> - * value of STACK_TYPE_UNKNOWN. This is used to detect a
> - * transition from one stack to another.
> - *
> - * @kr_cur: When KRETPROBES is selected, holds the kretprobe instance
> - * associated with the most recently encountered replacement lr
> - * value.
> - *
> - * @task: The task being unwound.
> - */
> -struct unwind_state {
> - unsigned long fp;
> - unsigned long pc;
> - DECLARE_BITMAP(stacks_done, __NR_STACK_TYPES);
> - unsigned long prev_fp;
> - enum stack_type prev_type;
> -#ifdef CONFIG_KRETPROBES
> - struct llist_node *kr_cur;
> -#endif
> - struct task_struct *task;
> -};
> -
> -static void unwind_init_common(struct unwind_state *state,
> - struct task_struct *task)
> -{
> - state->task = task;
> -#ifdef CONFIG_KRETPROBES
> - state->kr_cur = NULL;
> -#endif
> -
> - /*
> - * Prime the first unwind.
> - *
> - * In unwind_next() we'll check that the FP points to a valid stack,
> - * which can't be STACK_TYPE_UNKNOWN, and the first unwind will be
> - * treated as a transition to whichever stack that happens to be. The
> - * prev_fp value won't be used, but we set it to 0 such that it is
> - * definitely not an accessible stack address.
> - */
> - bitmap_zero(state->stacks_done, __NR_STACK_TYPES);
> - state->prev_fp = 0;
> - state->prev_type = STACK_TYPE_UNKNOWN;
> -}
> -
> /*
> * Start an unwind from a pt_regs.
> *
> --
> 2.37.0.170.g444d1eabd0-goog
>
Hi Kalesh,
On Fri, Jul 15, 2022 at 7:10 AM Kalesh Singh <[email protected]> wrote:
>
> Hi all,
>
> This is v4 of the series adding support for nVHE hypervisor stacktraces;
> and is based on arm64 for-next/stacktrace.
>
> Thanks all for your feedback on previous revisions. Mark Brown, I
> appreciate your Reviewed-by on the v3, I have dropped the tags in this
> new verision since I think the series has changed quite a bit.
>
> The previous versions were posted at:
> v3: https://lore.kernel.org/r/[email protected]/
> v2: https://lore.kernel.org/r/[email protected]/
> v1: https://lore.kernel.org/r/[email protected]/
>
> The main updates in this version are to address concerns from Marc on the
> memory usage and reusing the common code by refactoring into a shared header.
>
> Thanks,
> Kalesh
I tested an earlier version of this patch series, and it worked fine,
with symbolization. However, testing it now, symbolization isn't
working for me with either nvhe or pkvm, e.g.:
[ 32.986706] kvm [251]: Protected nVHE HYP call trace:
[ 32.986796] kvm [251]: [<ffff800008f8b0e0>] 0xffff800008f8b0e0
[ 32.987391] kvm [251]: [<ffff800008f8b388>] 0xffff800008f8b388
[ 32.987493] kvm [251]: [<ffff800008f8d230>] 0xffff800008f8d230
[ 32.987591] kvm [251]: [<ffff800008f8d51c>] 0xffff800008f8d51c
[ 32.987695] kvm [251]: [<ffff800008f8c064>] 0xffff800008f8c064
[ 32.987803] kvm [251]: ---- End of Protected nVHE HYP call trace ----
CONFIG_PROTECTED_NVHE_STACKTRACE, CONFIG_NVHE_EL2_DEBUG, and
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT are all enabled. Generating
a backtrace in the host, I get proper symbolisation.
Is there anything else you'd like to know about my setup that would
help get to the bottom of this?
Thanks,
/fuad
>
> ============
>
> KVM nVHE Stack Unwinding
> ===
>
> nVHE has two modes of operation: protected (pKVM) and unprotected
> (conventional nVHE). Depending on the mode, a slightly different approach
> is used to dump the hypervisor stacktrace, but the core unwinding logic
> remains the same.
>
> Protected nVHE (pKVM) stacktraces
> ====
>
> In protected nVHE mode, the host cannot directly access hypervisor memory.
>
> The hypervisor stack unwinding happens in EL2 and is made accessible to
> the host via a shared buffer. Symbolizing and printing the stacktrace
> addresses is delegated to the host and happens in EL1.
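The pKVM scheme described above can be modeled in a few lines of C. This is an illustrative sketch, not the actual patch code: the buffer size, variable names, and helpers here are hypothetical. EL2 appends unwound return addresses to a fixed-size per-CPU buffer shared with the host, keeping a zero entry as a delimiter; the host later walks the buffer from EL1 and symbolizes the addresses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer size; the series sizes the real buffer per CPU. */
#define STACKTRACE_SIZE 64
static unsigned long stacktrace_buf[STACKTRACE_SIZE / sizeof(long)];

/* "EL2" side: append one entry, keeping room for the zero delimiter. */
static int trace_push(unsigned long **pos, unsigned long where)
{
	unsigned long start = (unsigned long)stacktrace_buf;
	unsigned long end = start + STACKTRACE_SIZE - 2 * sizeof(long);

	if ((unsigned long)*pos > end)
		return 0;	/* buffer full */
	**pos = where;		/* save the return address */
	*(*pos + 1) = 0UL;	/* zero entry marks the end of the trace */
	(*pos)++;
	return 1;
}

/* "EL1" side: count saved entries by walking to the zero delimiter. */
static size_t trace_count(void)
{
	size_t n = 0;

	while (stacktrace_buf[n])
		n++;
	return n;
}
```

Because every successful push writes both the entry and the trailing zero, the host never needs a separate length field: it simply reads until the first zero word.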
>
> Non-protected (Conventional) nVHE stacktraces
> ====
>
> In non-protected mode, the host is able to directly access the hypervisor
> stack pages.
>
> The hypervisor stack unwinding and dumping of the stacktrace is performed
> by the host in EL1, as this avoids the memory overhead of setting up
> shared buffers between the host and hypervisor.
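The direct unwind the host performs in the non-protected case boils down to walking AArch64 frame records, which are {saved fp, saved lr} pairs linked through the frame pointer. The toy version below walks an in-memory chain under simplified assumptions; in the real series the host additionally translates hyp VAs and bounds-checks each frame against the known stack pages.

```c
#include <assert.h>
#include <stddef.h>

/* A frame record as pushed by AArch64 function prologues: {fp, lr}. */
struct frame_record {
	struct frame_record *next_fp;	/* saved frame pointer */
	unsigned long lr;		/* saved return address */
};

/* Minimal unwind loop: follow fp links until the chain terminates. */
static int unwind(struct frame_record *fp, unsigned long *trace, int max)
{
	int n = 0;

	while (fp && n < max) {
		trace[n++] = fp->lr;
		fp = fp->next_fp;
	}
	return n;
}
```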
>
> Reusing the Core Unwinding Logic
> ====
>
> Since the hypervisor cannot link against kernel code in protected mode,
> the common stack unwinding code is moved to a shared header to allow
> reuse in the nVHE hypervisor.
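The header-reuse pattern can be illustrated as follows. This is a simplified sketch with made-up names and a toy frame layout, not the series code: the shared header implements the generic step in terms of functions that each includer (the kernel unwinder or the nVHE hyp object) must provide, so no cross-object linking is needed. Both halves are shown in one file here.

```c
#include <assert.h>
#include <stdbool.h>

struct unwind_state {
	unsigned long fp;
	unsigned long pc;
};

/* Provided by whichever unwinder includes the shared header. */
static bool on_accessible_stack(unsigned long fp);

/* "Shared header" logic: step to the previous frame record. */
static inline int unwind_next_common(struct unwind_state *state,
				     const unsigned long *stack)
{
	unsigned long fp = state->fp;

	if (!on_accessible_stack(fp))
		return -1;
	/* Toy layout: stack[fp] = previous fp, stack[fp + 1] = lr. */
	state->pc = stack[fp + 1];
	state->fp = stack[fp];
	return 0;
}

/* "Includer" side: a trivial bounds check for the toy stack. */
static bool on_accessible_stack(unsigned long fp)
{
	return fp < 14;	/* room for the {fp, lr} pair */
}
```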
>
> Reducing the memory footprint
> ====
>
> In this version the below steps were taken to reduce the memory usage of
> nVHE stack unwinding:
>
> 1) The nVHE overflow stack is reduced from PAGE_SIZE to 4KB; beneficial
> for configurations with non-4KB pages (16KB or 64KB pages).
> 2) In protected nVHE mode (pKVM), the stacktrace buffers shared with the
> host are reduced from PAGE_SIZE to the minimum size required.
> 3) On systems other than Android, conventional nVHE makes up the vast
> majority of use cases, so pKVM stack tracing is disabled by default
> (!CONFIG_PROTECTED_NVHE_STACKTRACE), which avoids the memory cost of
> setting up shared buffers.
> 4) In non-protected nVHE mode (conventional nVHE), the stack unwinding
> is done directly in EL1 by the host, and no shared buffers with the
> hypervisor are needed.
>
> Sample Output
> ====
>
> The below shows an example output from a simple stack overflow test:
>
> [ 126.862960] kvm [371]: nVHE hyp panic at: [<ffff8000090a51d0>] __kvm_nvhe_recursive_death+0x10/0x34!
> [ 126.869920] kvm [371]: Protected nVHE HYP call trace:
> [ 126.870528] kvm [371]: [<ffff8000090a5570>] __kvm_nvhe_hyp_panic+0xac/0xf8
> [ 126.871342] kvm [371]: [<ffff8000090a55cc>] __kvm_nvhe_hyp_panic_bad_stack+0x10/0x10
> [ 126.872174] kvm [371]: [<ffff8000090a51e4>] __kvm_nvhe_recursive_death+0x24/0x34
> [ 126.872971] kvm [371]: [<ffff8000090a51e4>] __kvm_nvhe_recursive_death+0x24/0x34
> . . .
>
> [ 126.927314] kvm [371]: [<ffff8000090a51e4>] __kvm_nvhe_recursive_death+0x24/0x34
> [ 126.927727] kvm [371]: [<ffff8000090a51e4>] __kvm_nvhe_recursive_death+0x24/0x34
> [ 126.928137] kvm [371]: [<ffff8000090a4de4>] __kvm_nvhe___kvm_vcpu_run+0x30/0x40c
> [ 126.928561] kvm [371]: [<ffff8000090a7b64>] __kvm_nvhe_handle___kvm_vcpu_run+0x30/0x48
> [ 126.928984] kvm [371]: [<ffff8000090a78b8>] __kvm_nvhe_handle_trap+0xc4/0x128
> [ 126.929385] kvm [371]: [<ffff8000090a6864>] __kvm_nvhe___host_exit+0x64/0x64
> [ 126.929804] kvm [371]: ---- End of Protected nVHE HYP call trace ----
>
> ============
>
>
> Kalesh Singh (18):
> arm64: stacktrace: Add shared header for common stack unwinding code
> arm64: stacktrace: Factor out on_accessible_stack_common()
> arm64: stacktrace: Factor out unwind_next_common()
> arm64: stacktrace: Handle frame pointer from different address spaces
> arm64: stacktrace: Factor out common unwind()
> arm64: stacktrace: Add description of stacktrace/common.h
> KVM: arm64: On stack overflow switch to hyp overflow_stack
> KVM: arm64: Add PROTECTED_NVHE_STACKTRACE Kconfig
> KVM: arm64: Allocate shared pKVM hyp stacktrace buffers
> KVM: arm64: Stub implementation of pKVM HYP stack unwinder
> KVM: arm64: Stub implementation of non-protected nVHE HYP stack
> unwinder
> KVM: arm64: Save protected-nVHE (pKVM) hyp stacktrace
> KVM: arm64: Prepare non-protected nVHE hypervisor stacktrace
> KVM: arm64: Implement protected nVHE hyp stack unwinder
> KVM: arm64: Implement non-protected nVHE hyp stack unwinder
> KVM: arm64: Introduce pkvm_dump_backtrace()
> KVM: arm64: Introduce hyp_dump_backtrace()
> KVM: arm64: Dump nVHE hypervisor stack on panic
>
> arch/arm64/include/asm/kvm_asm.h | 16 ++
> arch/arm64/include/asm/memory.h | 7 +
> arch/arm64/include/asm/stacktrace.h | 92 ++++---
> arch/arm64/include/asm/stacktrace/common.h | 224 ++++++++++++++++
> arch/arm64/include/asm/stacktrace/nvhe.h | 291 +++++++++++++++++++++
> arch/arm64/kernel/stacktrace.c | 157 -----------
> arch/arm64/kvm/Kconfig | 15 ++
> arch/arm64/kvm/arm.c | 2 +-
> arch/arm64/kvm/handle_exit.c | 4 +
> arch/arm64/kvm/hyp/nvhe/Makefile | 2 +-
> arch/arm64/kvm/hyp/nvhe/host.S | 9 +-
> arch/arm64/kvm/hyp/nvhe/stacktrace.c | 108 ++++++++
> arch/arm64/kvm/hyp/nvhe/switch.c | 5 +
> 13 files changed, 727 insertions(+), 205 deletions(-)
> create mode 100644 arch/arm64/include/asm/stacktrace/common.h
> create mode 100644 arch/arm64/include/asm/stacktrace/nvhe.h
> create mode 100644 arch/arm64/kvm/hyp/nvhe/stacktrace.c
>
>
> base-commit: 82a592c13b0aeff94d84d54183dae0b26384c95f
> --
> 2.37.0.170.g444d1eabd0-goog
>
On Fri, Jul 15, 2022 at 6:55 AM 'Fuad Tabba' via kernel-team
<[email protected]> wrote:
>
> Hi Kalesh,
>
> On Fri, Jul 15, 2022 at 7:10 AM Kalesh Singh <[email protected]> wrote:
> > [...]
>
> I tested an earlier version of this patch series, and it worked fine,
> with symbolization. However, testing it now, both with nvhe and with
> pkvm the symbolization isn't working for me. e.g.
>
> [ 32.986706] kvm [251]: Protected nVHE HYP call trace:
> [ 32.986796] kvm [251]: [<ffff800008f8b0e0>] 0xffff800008f8b0e0
> [ 32.987391] kvm [251]: [<ffff800008f8b388>] 0xffff800008f8b388
> [ 32.987493] kvm [251]: [<ffff800008f8d230>] 0xffff800008f8d230
> [ 32.987591] kvm [251]: [<ffff800008f8d51c>] 0xffff800008f8d51c
> [ 32.987695] kvm [251]: [<ffff800008f8c064>] 0xffff800008f8c064
> [ 32.987803] kvm [251]: ---- End of Protected nVHE HYP call trace ----
>
> CONFIG_PROTECTED_NVHE_STACKTRACE CONFIG_NVHE_EL2_DEBUG and
> CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT are all enabled. Generating
> a backtrace in the host I get proper symbolisation.
>
> Is there anything else you'd like to know about my setup that would
> help get to the bottom of this?
Hi Fuad,
Thanks for reviewing it. Can you please attach your .config when you
have a chance? I will try to reproduce it on my end.
--Kalesh
>
> Thanks,
> /fuad
>
>
>
>
> > [...]
>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>
On Fri, Jul 15, 2022 at 11:58 AM Kalesh Singh <[email protected]> wrote:
>
> On Fri, Jul 15, 2022 at 6:55 AM 'Fuad Tabba' via kernel-team
> <[email protected]> wrote:
> >
> > Hi Kalesh,
> >
> > On Fri, Jul 15, 2022 at 7:10 AM Kalesh Singh <[email protected]> wrote:
> > > [...]
> >
> > I tested an earlier version of this patch series, and it worked fine,
> > with symbolization. However, testing it now, both with nvhe and with
> > pkvm the symbolization isn't working for me. e.g.
> >
> > [ 32.986706] kvm [251]: Protected nVHE HYP call trace:
> > [ 32.986796] kvm [251]: [<ffff800008f8b0e0>] 0xffff800008f8b0e0
> > [ 32.987391] kvm [251]: [<ffff800008f8b388>] 0xffff800008f8b388
> > [ 32.987493] kvm [251]: [<ffff800008f8d230>] 0xffff800008f8d230
> > [ 32.987591] kvm [251]: [<ffff800008f8d51c>] 0xffff800008f8d51c
> > [ 32.987695] kvm [251]: [<ffff800008f8c064>] 0xffff800008f8c064
> > [ 32.987803] kvm [251]: ---- End of Protected nVHE HYP call trace ----
> >
> > CONFIG_PROTECTED_NVHE_STACKTRACE CONFIG_NVHE_EL2_DEBUG and
> > CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT are all enabled. Generating
> > a backtrace in the host I get proper symbolisation.
> >
> > Is there anything else you'd like to know about my setup that would
> > help get to the bottom of this?
>
> Hi Fuad,
>
> Thanks for reviewing it. Can you attach the .config when you have a
> chance please? I will try reproducing it on my end.
My local config had CONFIG_RANDOMIZE_BASE off. I have posted a fix for
the existing occurrences [1]. I'll address those for the unwinder in
the next version of this series.
[1] https://lore.kernel.org/r/[email protected]/
Thanks,
Kalesh
>
> --Kalesh
>
> >
> > Thanks,
> > /fuad
> >
> >
> >
> >
> > > [...]
On Fri, 15 Jul 2022 07:10:15 +0100,
Kalesh Singh <[email protected]> wrote:
>
> Add a brief description of how to use stacktrace/common.h to implement
> a stack unwinder.
>
> Signed-off-by: Kalesh Singh <[email protected]>
> ---
> arch/arm64/include/asm/stacktrace/common.h | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/arm64/include/asm/stacktrace/common.h b/arch/arm64/include/asm/stacktrace/common.h
> index f86efe71479d..b362086f4c70 100644
> --- a/arch/arm64/include/asm/stacktrace/common.h
> +++ b/arch/arm64/include/asm/stacktrace/common.h
> @@ -2,6 +2,14 @@
> /*
> * Common arm64 stack unwinder code.
> *
> + * To implement a new arm64 stack unwinder:
> + * 1) Include this header
> + *
> + * 2) Provide implementations for the following functions:
> + * - on_overflow_stack()
> + * - on_accessible_stack()
> + * - unwind_next()
A short description of what these helpers are supposed to do would
also be helpful.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
[- Drew and android-mm, as both addresses bounce]
On Fri, 15 Jul 2022 07:10:17 +0100,
Kalesh Singh <[email protected]> wrote:
>
> This can be used to disable stacktrace for the protected KVM
> nVHE hypervisor, in order to save on the associated memory usage.
>
> This option is disabled by default, since protected KVM is not widely
> used on platforms other than Android currently.
>
> Signed-off-by: Kalesh Singh <[email protected]>
> ---
> arch/arm64/kvm/Kconfig | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 8a5fbbf084df..1edab6f8a3b8 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -46,6 +46,21 @@ menuconfig KVM
>
> If unsure, say N.
>
> +config PROTECTED_NVHE_STACKTRACE
> + bool "Protected KVM hypervisor stacktraces"
> + depends on KVM
> + default n
> + help
> + Say Y here to enable pKVM hypervisor stacktraces on hyp_panic()
> +
> + If you are not using protected nVHE (pKVM), say N.
> +
> + If using protected nVHE mode, but cannot afford the associated
> + memory cost (less than 0.75 page per CPU) of pKVM stacktraces,
> + say N.
> +
> + If unsure, say N.
> +
Can we make this depend on NVHE_EL2_DEBUG instead? I'd like to keep
the disclosing of EL2 information in protected mode a strict debug
feature.
> config NVHE_EL2_DEBUG
> bool "Debug mode for non-VHE EL2 object"
> depends on KVM
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
On Fri, 15 Jul 2022 07:10:20 +0100,
Kalesh Singh <[email protected]> wrote:
>
> Add stub implementations of the non-protected nVHE stack unwinder,
> for building. These are implemented later in this series.
>
> Signed-off-by: Kalesh Singh <[email protected]>
> ---
> arch/arm64/include/asm/stacktrace/nvhe.h | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/arch/arm64/include/asm/stacktrace/nvhe.h b/arch/arm64/include/asm/stacktrace/nvhe.h
> index 1eac4e57f2ae..36cf7858ddd8 100644
> --- a/arch/arm64/include/asm/stacktrace/nvhe.h
> +++ b/arch/arm64/include/asm/stacktrace/nvhe.h
> @@ -8,6 +8,12 @@
> > * the HYP memory. The stack is unwound in EL2 and dumped to a shared
> * buffer where the host can read and print the stacktrace.
> *
> + * 2) Non-protected nVHE mode - the host can directly access the
> + * HYP stack pages and unwind the HYP stack in EL1. This saves having
> > + * to allocate shared buffers for the host to read the unwound
> + * stacktrace.
> + *
> + *
> * Copyright (C) 2022 Google LLC
> */
> #ifndef __ASM_STACKTRACE_NVHE_H
> @@ -53,5 +59,21 @@ static int notrace unwind_next(struct unwind_state *state)
> NOKPROBE_SYMBOL(unwind_next);
> #endif /* CONFIG_PROTECTED_NVHE_STACKTRACE */
>
> +/*
> + * Non-protected nVHE HYP stack unwinder
> + */
> +#else /* !__KVM_NVHE_HYPERVISOR__ */
I don't get this path. This either represents the VHE hypervisor or
the kernel proper. Which one is it?
> +static inline bool on_overflow_stack(unsigned long sp, unsigned long size,
> + struct stack_info *info)
> +{
> + return false;
> +}
> +
> +static int notrace unwind_next(struct unwind_state *state)
> +{
> + return 0;
> +}
> +NOKPROBE_SYMBOL(unwind_next);
> +
> #endif /* __KVM_NVHE_HYPERVISOR__ */
> #endif /* __ASM_STACKTRACE_NVHE_H */
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
On Fri, 15 Jul 2022 07:10:21 +0100,
Kalesh Singh <[email protected]> wrote:
>
> In protected nVHE mode, the host cannot access privately owned hypervisor
> memory. Also, the hypervisor aims to remain simple to reduce the attack
> surface, and does not provide any printk support.
>
> For the above reasons, the approach taken to provide hypervisor stacktraces
> in protected mode is:
> 1) Unwind and save the hyp stack addresses in EL2 to a shared buffer
> with the host (done in this patch).
> 2) Delegate the dumping and symbolization of the addresses to the
> host in EL1 (later patch in the series).
>
> Signed-off-by: Kalesh Singh <[email protected]>
> ---
> arch/arm64/include/asm/stacktrace/nvhe.h | 18 ++++++
> arch/arm64/kvm/hyp/nvhe/stacktrace.c | 70 ++++++++++++++++++++++++
> 2 files changed, 88 insertions(+)
>
> diff --git a/arch/arm64/include/asm/stacktrace/nvhe.h b/arch/arm64/include/asm/stacktrace/nvhe.h
> index 36cf7858ddd8..456a6ae08433 100644
> --- a/arch/arm64/include/asm/stacktrace/nvhe.h
> +++ b/arch/arm64/include/asm/stacktrace/nvhe.h
> @@ -21,6 +21,22 @@
>
> #include <asm/stacktrace/common.h>
>
> +/**
> + * kvm_nvhe_unwind_init - Start an unwind from the given nVHE HYP fp and pc
> + *
> + * @fp : frame pointer at which to start the unwinding.
> + * @pc : program counter at which to start the unwinding.
> + */
> +static __always_inline void kvm_nvhe_unwind_init(struct unwind_state *state,
> + unsigned long fp,
> + unsigned long pc)
> +{
> + unwind_init_common(state, NULL);
Huh. Be careful here. This function is only 'inline', which means it
may not actually be inlined. We've had tons of similar issues in the
past, and although this will not break at runtime anymore, it will
definitely stop the kernel from linking.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
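The hazard Marc describes can be sketched with a hypothetical helper. A plain 'static inline' is only a hint, so the compiler may emit an out-of-line copy of a shared-header function; if that copy then references symbols the nVHE object cannot link against, the build fails at link time. Forcing inlining guarantees the body is expanded at every call site instead.

```c
#include <assert.h>

/*
 * Hypothetical shared-header helper. The always_inline attribute
 * guarantees expansion at the call site, so no out-of-line symbol
 * is ever emitted into the including object.
 */
static inline __attribute__((__always_inline__))
void unwind_init(unsigned long *fp, unsigned long *pc,
		 unsigned long new_fp, unsigned long new_pc)
{
	*fp = new_fp;
	*pc = new_pc;
}
```

In the kernel this is what the `__always_inline` macro provides; relying on bare `inline` in a header shared between the kernel and the nVHE object leaves the decision to the compiler.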
Hi Kalesh,
On Fri, Jul 15, 2022 at 7:11 AM Kalesh Singh <[email protected]> wrote:
>
> In protected nVHE mode, the host cannot access privately owned hypervisor
> memory. Also, the hypervisor aims to remain simple to reduce the attack
> surface, and does not provide any printk support.
>
> For the above reasons, the approach taken to provide hypervisor stacktraces
> in protected mode is:
> 1) Unwind and save the hyp stack addresses in EL2 to a shared buffer
> with the host (done in this patch).
> 2) Delegate the dumping and symbolization of the addresses to the
> host in EL1 (later patch in the series).
>
> Signed-off-by: Kalesh Singh <[email protected]>
> ---
> arch/arm64/include/asm/stacktrace/nvhe.h | 18 ++++++
> arch/arm64/kvm/hyp/nvhe/stacktrace.c | 70 ++++++++++++++++++++++++
> 2 files changed, 88 insertions(+)
>
> diff --git a/arch/arm64/include/asm/stacktrace/nvhe.h b/arch/arm64/include/asm/stacktrace/nvhe.h
> index 36cf7858ddd8..456a6ae08433 100644
> --- a/arch/arm64/include/asm/stacktrace/nvhe.h
> +++ b/arch/arm64/include/asm/stacktrace/nvhe.h
> @@ -21,6 +21,22 @@
>
> #include <asm/stacktrace/common.h>
>
> +/**
> + * kvm_nvhe_unwind_init - Start an unwind from the given nVHE HYP fp and pc
> + *
> + * @fp : frame pointer at which to start the unwinding.
> + * @pc : program counter at which to start the unwinding.
> + */
> +static __always_inline void kvm_nvhe_unwind_init(struct unwind_state *state,
> + unsigned long fp,
> + unsigned long pc)
> +{
> + unwind_init_common(state, NULL);
> +
> + state->fp = fp;
> + state->pc = pc;
> +}
> +
> static inline bool on_accessible_stack(const struct task_struct *tsk,
> unsigned long sp, unsigned long size,
> struct stack_info *info)
> @@ -33,6 +49,8 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
> */
> #ifdef __KVM_NVHE_HYPERVISOR__
>
> +extern void kvm_nvhe_prepare_backtrace(unsigned long fp, unsigned long pc);
> +
> #ifdef CONFIG_PROTECTED_NVHE_STACKTRACE
> static inline bool on_overflow_stack(unsigned long sp, unsigned long size,
> struct stack_info *info)
> diff --git a/arch/arm64/kvm/hyp/nvhe/stacktrace.c b/arch/arm64/kvm/hyp/nvhe/stacktrace.c
> index 96c8b93320eb..832a536e440f 100644
> --- a/arch/arm64/kvm/hyp/nvhe/stacktrace.c
> +++ b/arch/arm64/kvm/hyp/nvhe/stacktrace.c
> @@ -11,4 +11,74 @@ DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)], overflow_stack)
>
> #ifdef CONFIG_PROTECTED_NVHE_STACKTRACE
> DEFINE_PER_CPU(unsigned long [NVHE_STACKTRACE_SIZE/sizeof(long)], pkvm_stacktrace);
> +
> +/**
> + * pkvm_save_backtrace_entry - Saves a protected nVHE HYP stacktrace entry
> + *
> + * @arg : the position of the entry in the stacktrace buffer
> + * @where : the program counter corresponding to the stack frame
> + *
> + * Save the return address of a stack frame to the shared stacktrace buffer.
> + * The host can access this shared buffer from EL1 to dump the backtrace.
> + */
> +static bool pkvm_save_backtrace_entry(void *arg, unsigned long where)
> +{
> + unsigned long **stacktrace_pos = (unsigned long **)arg;
> + unsigned long stacktrace_start, stacktrace_end;
> +
> + stacktrace_start = (unsigned long)this_cpu_ptr(pkvm_stacktrace);
> + stacktrace_end = stacktrace_start + NVHE_STACKTRACE_SIZE - (2 * sizeof(long));
I guess this is related to my comment in patch 9, but why does the end
come at 2 * sizeof(long) instead of just 1 * sizeof(long) before the
actual end? I guess because it's inclusive. That said, a comment would
be helpful.
Thanks,
/fuad
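The arithmetic behind the question above can be checked directly (illustrative sketch with a hypothetical buffer size, not the patch code): each successful save writes two words, the entry at *pos and a zero terminator at *pos + 1, and the bound is checked inclusively ('> end' rather than '>= end'), so 'end' must sit 2 * sizeof(long) before the true end of the buffer for the final entry-plus-terminator pair to stay in bounds.

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_SIZE 64	/* hypothetical buffer size */
static unsigned long buf[TRACE_SIZE / sizeof(long)];

/* Mirrors the bound check in pkvm_save_backtrace_entry(). */
static int can_save(const unsigned long *pos)
{
	unsigned long start = (unsigned long)buf;
	unsigned long end = start + TRACE_SIZE - 2 * sizeof(long);

	return (unsigned long)pos <= end;
}
```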
Hi,
Can you please explain why you are targetting my @oracle.com email
address with this patch set?
This causes me problems, as I use Outlook's Web interface for that
address, which doesn't appear to cope with the threading and is
certainly only capable of top-posting, which is against Linux kernel
email standards.
Thanks.
On Thu, Jul 14, 2022 at 11:10:10PM -0700, Kalesh Singh wrote:
> In order to reuse the arm64 stack unwinding logic for the nVHE
> hypervisor stack, move the common code to a shared header
> (arch/arm64/include/asm/stacktrace/common.h).
>
> The nVHE hypervisor cannot safely link against kernel code, so we
> make use of the shared header to avoid duplicated logic later in
> this series.
>
> Signed-off-by: Kalesh Singh <[email protected]>
> ---
> arch/arm64/include/asm/stacktrace.h | 35 +------
> arch/arm64/include/asm/stacktrace/common.h | 105 +++++++++++++++++++++
> arch/arm64/kernel/stacktrace.c | 57 -----------
> 3 files changed, 106 insertions(+), 91 deletions(-)
> create mode 100644 arch/arm64/include/asm/stacktrace/common.h
>
> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> index aec9315bf156..79f455b37c84 100644
> --- a/arch/arm64/include/asm/stacktrace.h
> +++ b/arch/arm64/include/asm/stacktrace.h
> @@ -8,52 +8,19 @@
> #include <linux/percpu.h>
> #include <linux/sched.h>
> #include <linux/sched/task_stack.h>
> -#include <linux/types.h>
> #include <linux/llist.h>
>
> #include <asm/memory.h>
> #include <asm/ptrace.h>
> #include <asm/sdei.h>
>
> -enum stack_type {
> - STACK_TYPE_UNKNOWN,
> - STACK_TYPE_TASK,
> - STACK_TYPE_IRQ,
> - STACK_TYPE_OVERFLOW,
> - STACK_TYPE_SDEI_NORMAL,
> - STACK_TYPE_SDEI_CRITICAL,
> - __NR_STACK_TYPES
> -};
> -
> -struct stack_info {
> - unsigned long low;
> - unsigned long high;
> - enum stack_type type;
> -};
> +#include <asm/stacktrace/common.h>
>
> extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
> const char *loglvl);
>
> DECLARE_PER_CPU(unsigned long *, irq_stack_ptr);
>
> -static inline bool on_stack(unsigned long sp, unsigned long size,
> - unsigned long low, unsigned long high,
> - enum stack_type type, struct stack_info *info)
> -{
> - if (!low)
> - return false;
> -
> - if (sp < low || sp + size < sp || sp + size > high)
> - return false;
> -
> - if (info) {
> - info->low = low;
> - info->high = high;
> - info->type = type;
> - }
> - return true;
> -}
> -
> static inline bool on_irq_stack(unsigned long sp, unsigned long size,
> struct stack_info *info)
> {
> diff --git a/arch/arm64/include/asm/stacktrace/common.h b/arch/arm64/include/asm/stacktrace/common.h
> new file mode 100644
> index 000000000000..64ae4f6b06fe
> --- /dev/null
> +++ b/arch/arm64/include/asm/stacktrace/common.h
> @@ -0,0 +1,105 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Common arm64 stack unwinder code.
> + *
> + * Copyright (C) 2012 ARM Ltd.
> + */
> +#ifndef __ASM_STACKTRACE_COMMON_H
> +#define __ASM_STACKTRACE_COMMON_H
> +
> +#include <linux/bitmap.h>
> +#include <linux/bitops.h>
> +#include <linux/types.h>
> +
> +enum stack_type {
> + STACK_TYPE_UNKNOWN,
> + STACK_TYPE_TASK,
> + STACK_TYPE_IRQ,
> + STACK_TYPE_OVERFLOW,
> + STACK_TYPE_SDEI_NORMAL,
> + STACK_TYPE_SDEI_CRITICAL,
> + __NR_STACK_TYPES
> +};
> +
> +struct stack_info {
> + unsigned long low;
> + unsigned long high;
> + enum stack_type type;
> +};
> +
> +/*
> + * A snapshot of a frame record or fp/lr register values, along with some
> + * accounting information necessary for robust unwinding.
> + *
> + * @fp: The fp value in the frame record (or the real fp)
> + * @pc: The lr value in the frame record (or the real lr)
> + *
> + * @stacks_done: Stacks which have been entirely unwound, for which it is no
> + * longer valid to unwind to.
> + *
> + * @prev_fp: The fp that pointed to this frame record, or a synthetic value
> + * of 0. This is used to ensure that within a stack, each
> + * subsequent frame record is at an increasing address.
> + * @prev_type: The type of stack this frame record was on, or a synthetic
> + * value of STACK_TYPE_UNKNOWN. This is used to detect a
> + * transition from one stack to another.
> + *
> + * @kr_cur: When KRETPROBES is selected, holds the kretprobe instance
> + * associated with the most recently encountered replacement lr
> + * value.
> + *
> + * @task: The task being unwound.
> + */
> +struct unwind_state {
> + unsigned long fp;
> + unsigned long pc;
> + DECLARE_BITMAP(stacks_done, __NR_STACK_TYPES);
> + unsigned long prev_fp;
> + enum stack_type prev_type;
> +#ifdef CONFIG_KRETPROBES
> + struct llist_node *kr_cur;
> +#endif
> + struct task_struct *task;
> +};
> +
> +static inline bool on_stack(unsigned long sp, unsigned long size,
> + unsigned long low, unsigned long high,
> + enum stack_type type, struct stack_info *info)
> +{
> + if (!low)
> + return false;
> +
> + if (sp < low || sp + size < sp || sp + size > high)
> + return false;
> +
> + if (info) {
> + info->low = low;
> + info->high = high;
> + info->type = type;
> + }
> + return true;
> +}
> +
> +static inline void unwind_init_common(struct unwind_state *state,
> + struct task_struct *task)
> +{
> + state->task = task;
> +#ifdef CONFIG_KRETPROBES
> + state->kr_cur = NULL;
> +#endif
> +
> + /*
> + * Prime the first unwind.
> + *
> + * In unwind_next() we'll check that the FP points to a valid stack,
> + * which can't be STACK_TYPE_UNKNOWN, and the first unwind will be
> + * treated as a transition to whichever stack that happens to be. The
> + * prev_fp value won't be used, but we set it to 0 such that it is
> + * definitely not an accessible stack address.
> + */
> + bitmap_zero(state->stacks_done, __NR_STACK_TYPES);
> + state->prev_fp = 0;
> + state->prev_type = STACK_TYPE_UNKNOWN;
> +}
> +
> +#endif /* __ASM_STACKTRACE_COMMON_H */
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index fcaa151b81f1..94a5dd2ab8fd 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -18,63 +18,6 @@
> #include <asm/stack_pointer.h>
> #include <asm/stacktrace.h>
>
> -/*
> - * A snapshot of a frame record or fp/lr register values, along with some
> - * accounting information necessary for robust unwinding.
> - *
> - * @fp: The fp value in the frame record (or the real fp)
> - * @pc: The lr value in the frame record (or the real lr)
> - *
> - * @stacks_done: Stacks which have been entirely unwound, for which it is no
> - * longer valid to unwind to.
> - *
> - * @prev_fp: The fp that pointed to this frame record, or a synthetic value
> - * of 0. This is used to ensure that within a stack, each
> - * subsequent frame record is at an increasing address.
> - * @prev_type: The type of stack this frame record was on, or a synthetic
> - * value of STACK_TYPE_UNKNOWN. This is used to detect a
> - * transition from one stack to another.
> - *
> - * @kr_cur: When KRETPROBES is selected, holds the kretprobe instance
> - * associated with the most recently encountered replacement lr
> - * value.
> - *
> - * @task: The task being unwound.
> - */
> -struct unwind_state {
> - unsigned long fp;
> - unsigned long pc;
> - DECLARE_BITMAP(stacks_done, __NR_STACK_TYPES);
> - unsigned long prev_fp;
> - enum stack_type prev_type;
> -#ifdef CONFIG_KRETPROBES
> - struct llist_node *kr_cur;
> -#endif
> - struct task_struct *task;
> -};
> -
> -static void unwind_init_common(struct unwind_state *state,
> - struct task_struct *task)
> -{
> - state->task = task;
> -#ifdef CONFIG_KRETPROBES
> - state->kr_cur = NULL;
> -#endif
> -
> - /*
> - * Prime the first unwind.
> - *
> - * In unwind_next() we'll check that the FP points to a valid stack,
> - * which can't be STACK_TYPE_UNKNOWN, and the first unwind will be
> - * treated as a transition to whichever stack that happens to be. The
> - * prev_fp value won't be used, but we set it to 0 such that it is
> - * definitely not an accessible stack address.
> - */
> - bitmap_zero(state->stacks_done, __NR_STACK_TYPES);
> - state->prev_fp = 0;
> - state->prev_type = STACK_TYPE_UNKNOWN;
> -}
> -
> /*
> * Start an unwind from a pt_regs.
> *
> --
> 2.37.0.170.g444d1eabd0-goog
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Mon, Jul 18, 2022 at 5:52 AM Russell King (Oracle)
<[email protected]> wrote:
>
> Hi,
>
> Can you please explain why you are targetting my @oracle.com email
> address with this patch set?
>
> This causes me problems as I use Outlook's Web interface for that
> which doesn't appear to cope with the threading, and most certainly
> is only capable of top-reply only which is against Linux kernel email
> standards.
Hi Russell,
Sorry, I wasn't aware of that (I got your oracle.com address from the
get_maintainer script). Going forward I'll use the address you
responded from instead.
Thanks,
Kalesh
>
> Thanks.
>
> On Thu, Jul 14, 2022 at 11:10:10PM -0700, Kalesh Singh wrote:
> > In order to reuse the arm64 stack unwinding logic for the nVHE
> > hypervisor stack, move the common code to a shared header
> > (arch/arm64/include/asm/stacktrace/common.h).
> >
> > The nVHE hypervisor cannot safely link against kernel code, so we
> > make use of the shared header to avoid duplicated logic later in
> > this series.
> >
> > Signed-off-by: Kalesh Singh <[email protected]>
> > ---
> > arch/arm64/include/asm/stacktrace.h | 35 +------
> > arch/arm64/include/asm/stacktrace/common.h | 105 +++++++++++++++++++++
> > arch/arm64/kernel/stacktrace.c | 57 -----------
> > 3 files changed, 106 insertions(+), 91 deletions(-)
> > create mode 100644 arch/arm64/include/asm/stacktrace/common.h
> >
> > diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> > index aec9315bf156..79f455b37c84 100644
> > --- a/arch/arm64/include/asm/stacktrace.h
> > +++ b/arch/arm64/include/asm/stacktrace.h
> > @@ -8,52 +8,19 @@
> > #include <linux/percpu.h>
> > #include <linux/sched.h>
> > #include <linux/sched/task_stack.h>
> > -#include <linux/types.h>
> > #include <linux/llist.h>
> >
> > #include <asm/memory.h>
> > #include <asm/ptrace.h>
> > #include <asm/sdei.h>
> >
> > -enum stack_type {
> > - STACK_TYPE_UNKNOWN,
> > - STACK_TYPE_TASK,
> > - STACK_TYPE_IRQ,
> > - STACK_TYPE_OVERFLOW,
> > - STACK_TYPE_SDEI_NORMAL,
> > - STACK_TYPE_SDEI_CRITICAL,
> > - __NR_STACK_TYPES
> > -};
> > -
> > -struct stack_info {
> > - unsigned long low;
> > - unsigned long high;
> > - enum stack_type type;
> > -};
> > +#include <asm/stacktrace/common.h>
> >
> > extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
> > const char *loglvl);
> >
> > DECLARE_PER_CPU(unsigned long *, irq_stack_ptr);
> >
> > -static inline bool on_stack(unsigned long sp, unsigned long size,
> > - unsigned long low, unsigned long high,
> > - enum stack_type type, struct stack_info *info)
> > -{
> > - if (!low)
> > - return false;
> > -
> > - if (sp < low || sp + size < sp || sp + size > high)
> > - return false;
> > -
> > - if (info) {
> > - info->low = low;
> > - info->high = high;
> > - info->type = type;
> > - }
> > - return true;
> > -}
> > -
> > static inline bool on_irq_stack(unsigned long sp, unsigned long size,
> > struct stack_info *info)
> > {
> > diff --git a/arch/arm64/include/asm/stacktrace/common.h b/arch/arm64/include/asm/stacktrace/common.h
> > new file mode 100644
> > index 000000000000..64ae4f6b06fe
> > --- /dev/null
> > +++ b/arch/arm64/include/asm/stacktrace/common.h
> > @@ -0,0 +1,105 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Common arm64 stack unwinder code.
> > + *
> > + * Copyright (C) 2012 ARM Ltd.
> > + */
> > +#ifndef __ASM_STACKTRACE_COMMON_H
> > +#define __ASM_STACKTRACE_COMMON_H
> > +
> > +#include <linux/bitmap.h>
> > +#include <linux/bitops.h>
> > +#include <linux/types.h>
> > +
> > +enum stack_type {
> > + STACK_TYPE_UNKNOWN,
> > + STACK_TYPE_TASK,
> > + STACK_TYPE_IRQ,
> > + STACK_TYPE_OVERFLOW,
> > + STACK_TYPE_SDEI_NORMAL,
> > + STACK_TYPE_SDEI_CRITICAL,
> > + __NR_STACK_TYPES
> > +};
> > +
> > +struct stack_info {
> > + unsigned long low;
> > + unsigned long high;
> > + enum stack_type type;
> > +};
> > +
> > +/*
> > + * A snapshot of a frame record or fp/lr register values, along with some
> > + * accounting information necessary for robust unwinding.
> > + *
> > + * @fp: The fp value in the frame record (or the real fp)
> > + * @pc: The lr value in the frame record (or the real lr)
> > + *
> > + * @stacks_done: Stacks which have been entirely unwound, for which it is no
> > + * longer valid to unwind to.
> > + *
> > + * @prev_fp: The fp that pointed to this frame record, or a synthetic value
> > + * of 0. This is used to ensure that within a stack, each
> > + * subsequent frame record is at an increasing address.
> > + * @prev_type: The type of stack this frame record was on, or a synthetic
> > + * value of STACK_TYPE_UNKNOWN. This is used to detect a
> > + * transition from one stack to another.
> > + *
> > + * @kr_cur: When KRETPROBES is selected, holds the kretprobe instance
> > + * associated with the most recently encountered replacement lr
> > + * value.
> > + *
> > + * @task: The task being unwound.
> > + */
> > +struct unwind_state {
> > + unsigned long fp;
> > + unsigned long pc;
> > + DECLARE_BITMAP(stacks_done, __NR_STACK_TYPES);
> > + unsigned long prev_fp;
> > + enum stack_type prev_type;
> > +#ifdef CONFIG_KRETPROBES
> > + struct llist_node *kr_cur;
> > +#endif
> > + struct task_struct *task;
> > +};
> > +
> > +static inline bool on_stack(unsigned long sp, unsigned long size,
> > + unsigned long low, unsigned long high,
> > + enum stack_type type, struct stack_info *info)
> > +{
> > + if (!low)
> > + return false;
> > +
> > + if (sp < low || sp + size < sp || sp + size > high)
> > + return false;
> > +
> > + if (info) {
> > + info->low = low;
> > + info->high = high;
> > + info->type = type;
> > + }
> > + return true;
> > +}
> > +
> > +static inline void unwind_init_common(struct unwind_state *state,
> > + struct task_struct *task)
> > +{
> > + state->task = task;
> > +#ifdef CONFIG_KRETPROBES
> > + state->kr_cur = NULL;
> > +#endif
> > +
> > + /*
> > + * Prime the first unwind.
> > + *
> > + * In unwind_next() we'll check that the FP points to a valid stack,
> > + * which can't be STACK_TYPE_UNKNOWN, and the first unwind will be
> > + * treated as a transition to whichever stack that happens to be. The
> > + * prev_fp value won't be used, but we set it to 0 such that it is
> > + * definitely not an accessible stack address.
> > + */
> > + bitmap_zero(state->stacks_done, __NR_STACK_TYPES);
> > + state->prev_fp = 0;
> > + state->prev_type = STACK_TYPE_UNKNOWN;
> > +}
> > +
> > +#endif /* __ASM_STACKTRACE_COMMON_H */
> > diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> > index fcaa151b81f1..94a5dd2ab8fd 100644
> > --- a/arch/arm64/kernel/stacktrace.c
> > +++ b/arch/arm64/kernel/stacktrace.c
> > @@ -18,63 +18,6 @@
> > #include <asm/stack_pointer.h>
> > #include <asm/stacktrace.h>
> >
> > -/*
> > - * A snapshot of a frame record or fp/lr register values, along with some
> > - * accounting information necessary for robust unwinding.
> > - *
> > - * @fp: The fp value in the frame record (or the real fp)
> > - * @pc: The lr value in the frame record (or the real lr)
> > - *
> > - * @stacks_done: Stacks which have been entirely unwound, for which it is no
> > - * longer valid to unwind to.
> > - *
> > - * @prev_fp: The fp that pointed to this frame record, or a synthetic value
> > - * of 0. This is used to ensure that within a stack, each
> > - * subsequent frame record is at an increasing address.
> > - * @prev_type: The type of stack this frame record was on, or a synthetic
> > - * value of STACK_TYPE_UNKNOWN. This is used to detect a
> > - * transition from one stack to another.
> > - *
> > - * @kr_cur: When KRETPROBES is selected, holds the kretprobe instance
> > - * associated with the most recently encountered replacement lr
> > - * value.
> > - *
> > - * @task: The task being unwound.
> > - */
> > -struct unwind_state {
> > - unsigned long fp;
> > - unsigned long pc;
> > - DECLARE_BITMAP(stacks_done, __NR_STACK_TYPES);
> > - unsigned long prev_fp;
> > - enum stack_type prev_type;
> > -#ifdef CONFIG_KRETPROBES
> > - struct llist_node *kr_cur;
> > -#endif
> > - struct task_struct *task;
> > -};
> > -
> > -static void unwind_init_common(struct unwind_state *state,
> > - struct task_struct *task)
> > -{
> > - state->task = task;
> > -#ifdef CONFIG_KRETPROBES
> > - state->kr_cur = NULL;
> > -#endif
> > -
> > - /*
> > - * Prime the first unwind.
> > - *
> > - * In unwind_next() we'll check that the FP points to a valid stack,
> > - * which can't be STACK_TYPE_UNKNOWN, and the first unwind will be
> > - * treated as a transition to whichever stack that happens to be. The
> > - * prev_fp value won't be used, but we set it to 0 such that it is
> > - * definitely not an accessible stack address.
> > - */
> > - bitmap_zero(state->stacks_done, __NR_STACK_TYPES);
> > - state->prev_fp = 0;
> > - state->prev_type = STACK_TYPE_UNKNOWN;
> > -}
> > -
> > /*
> > * Start an unwind from a pt_regs.
> > *
> > --
> > 2.37.0.170.g444d1eabd0-goog
> >
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > [email protected]
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> >
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Mon, Jul 18, 2022 at 08:26:14AM -0700, Kalesh Singh wrote:
> On Mon, Jul 18, 2022 at 5:52 AM Russell King (Oracle)
> <[email protected]> wrote:
> >
> > Hi,
> >
> > Can you please explain why you are targetting my @oracle.com email
> > address with this patch set?
> >
> > This causes me problems as I use Outlook's Web interface for that
> > which doesn't appear to cope with the threading, and most certainly
> > is only capable of top-reply only which is against Linux kernel email
> > standards.
>
> Hi Russell,
>
> Sorry, I wasn't aware of that (I got your oracle.com address from the
> get_maintainer script). Going forward I'll use the address you
> responded from instead.
Oh, this is the very annoying default behaviour of get_maintainer.pl:
it assumes that if someone has touched a file, they're interested in
future changes to it. In this case, it's because we both touched
arch/arm64/include/asm/memory.h back in November 2021, and this
silly script thinks I'll still be interested.
b89ddf4cca43 arm64/bpf: Remove 128MB limit for BPF JIT programs
(The patch was originally developed for Oracle's UEK kernels, which is
why it carries my @oracle.com address, but it was later merged upstream.
Interestingly, no one spotted that Alan Maguire's s-o-b should've
been on it, as he was involved in the submission path to mainline.)
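As an aside, anyone wanting to sidestep this heuristic can restrict
get_maintainer.pl to MAINTAINERS-file entries. A sketch of the
invocation (run from the root of a kernel tree; the exact flag names
are assumed from recent kernel versions and may vary):

```shell
# Query only the MAINTAINERS file, skipping the git-history heuristic
# that CCs people who merely touched the file in the past.
./scripts/get_maintainer.pl --no-git --no-git-fallback \
    arch/arm64/include/asm/memory.h
```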
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On 2022-07-18 17:51, Kalesh Singh wrote:
> On Mon, Jul 18, 2022 at 12:31 AM Marc Zyngier <[email protected]> wrote:
>>
>> On Fri, 15 Jul 2022 07:10:20 +0100,
>> Kalesh Singh <[email protected]> wrote:
>> >
>> > Add stub implementations of non-protected nVHE stack unwinder, for
>> > building. These are implemented later in this series.
>> >
>> > Signed-off-by: Kalesh Singh <[email protected]>
>> > ---
>> > arch/arm64/include/asm/stacktrace/nvhe.h | 22 ++++++++++++++++++++++
>> > 1 file changed, 22 insertions(+)
>> >
>> > diff --git a/arch/arm64/include/asm/stacktrace/nvhe.h b/arch/arm64/include/asm/stacktrace/nvhe.h
>> > index 1eac4e57f2ae..36cf7858ddd8 100644
>> > --- a/arch/arm64/include/asm/stacktrace/nvhe.h
>> > +++ b/arch/arm64/include/asm/stacktrace/nvhe.h
>> > @@ -8,6 +8,12 @@
>> > * the HYP memory. The stack is unwound in EL2 and dumped to a shared
>> > * buffer where the host can read and print the stacktrace.
>> > *
>> > + * 2) Non-protected nVHE mode - the host can directly access the
>> > + * HYP stack pages and unwind the HYP stack in EL1. This saves having
>> > + * to allocate shared buffers for the host to read the unwound
>> > + * stacktrace.
>> > + *
>> > + *
>> > * Copyright (C) 2022 Google LLC
>> > */
>> > #ifndef __ASM_STACKTRACE_NVHE_H
>> > @@ -53,5 +59,21 @@ static int notrace unwind_next(struct unwind_state *state)
>> > NOKPROBE_SYMBOL(unwind_next);
>> > #endif /* CONFIG_PROTECTED_NVHE_STACKTRACE */
>> >
>> > +/*
>> > + * Non-protected nVHE HYP stack unwinder
>> > + */
>> > +#else /* !__KVM_NVHE_HYPERVISOR__ */
>>
>> I don't get this path. This either represents the VHE hypervisor or
>> the kernel proper. Which one is it?
>
> Hi Marc. This runs in kernel-proper context and is the unwinder for
> conventional (non-protected) nVHE; the unwinding is done from the
> host kernel in EL1.
This really deserves a comment here, as the one you currently
have is a bit misleading (and you probably want to move it
below the #else).
Thanks,
M.
--
Jazz is not dead. It just smells funny...
On Sun, Jul 17, 2022 at 2:58 AM Marc Zyngier <[email protected]> wrote:
>
> On Fri, 15 Jul 2022 07:10:15 +0100,
> Kalesh Singh <[email protected]> wrote:
> >
> > Add brief description on how to use stacktrace/common.h to implement
> > a stack unwinder.
> >
> > Signed-off-by: Kalesh Singh <[email protected]>
> > ---
> > arch/arm64/include/asm/stacktrace/common.h | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/stacktrace/common.h b/arch/arm64/include/asm/stacktrace/common.h
> > index f86efe71479d..b362086f4c70 100644
> > --- a/arch/arm64/include/asm/stacktrace/common.h
> > +++ b/arch/arm64/include/asm/stacktrace/common.h
> > @@ -2,6 +2,14 @@
> > /*
> > * Common arm64 stack unwinder code.
> > *
> > + * To implement a new arm64 stack unwinder:
> > + * 1) Include this header
> > + *
> > + * 2) Provide implementations for the following functions:
> > + * - on_overflow_stack()
> > + * - on_accessible_stack()
> > + * - unwind_next()
>
> A short description of what these helpers are supposed to do would
> also be helpful.
Thanks Fuad, Marc. I'll add descriptions in the next version.
>
> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.
On Sun, Jul 17, 2022 at 11:56 PM Marc Zyngier <[email protected]> wrote:
>
> [- Drew and android-mm, as both addresses bounce]
>
> On Fri, 15 Jul 2022 07:10:17 +0100,
> Kalesh Singh <[email protected]> wrote:
> >
> > This can be used to disable stacktrace for the protected KVM
> > nVHE hypervisor, in order to save on the associated memory usage.
> >
> > This option is disabled by default, since protected KVM is not widely
> > used on platforms other than Android currently.
> >
> > Signed-off-by: Kalesh Singh <[email protected]>
> > ---
> > arch/arm64/kvm/Kconfig | 15 +++++++++++++++
> > 1 file changed, 15 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> > index 8a5fbbf084df..1edab6f8a3b8 100644
> > --- a/arch/arm64/kvm/Kconfig
> > +++ b/arch/arm64/kvm/Kconfig
> > @@ -46,6 +46,21 @@ menuconfig KVM
> >
> > If unsure, say N.
> >
> > +config PROTECTED_NVHE_STACKTRACE
> > + bool "Protected KVM hypervisor stacktraces"
> > + depends on KVM
> > + default n
> > + help
> > + Say Y here to enable pKVM hypervisor stacktraces on hyp_panic()
> > +
> > + If you are not using protected nVHE (pKVM), say N.
> > +
> > + If using protected nVHE mode, but cannot afford the associated
> > + memory cost (less than 0.75 page per CPU) of pKVM stacktraces,
> > + say N.
> > +
> > + If unsure, say N.
> > +
>
> Can we make this depend on NVHE_EL2_DEBUG instead? I'd like to keep
> the disclosing of EL2 information in protected mode a strict debug
> feature.
Hi Marc,
An earlier version was similar to what you propose: the unwinding
depended on NVHE_EL2_DEBUG, and all unwinding was done from EL1 with
host stage 2 disabled. The design was changed because Android
expressed the need for pKVM hyp stacktraces in production environments. [1]
[1] https://lore.kernel.org/all/CAC_TJveNRaDFcQGo9-eqKa3=1DnuVDs4U+ye795VcJ1ozVkMyg@mail.gmail.com/
--Kalesh
>
> > config NVHE_EL2_DEBUG
> > bool "Debug mode for non-VHE EL2 object"
> > depends on KVM
>
> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.
On Mon, Jul 18, 2022 at 12:31 AM Marc Zyngier <[email protected]> wrote:
>
> On Fri, 15 Jul 2022 07:10:20 +0100,
> Kalesh Singh <[email protected]> wrote:
> >
> > Add stub implementations of non-protected nVHE stack unwinder, for
> > building. These are implemented later in this series.
> >
> > Signed-off-by: Kalesh Singh <[email protected]>
> > ---
> > arch/arm64/include/asm/stacktrace/nvhe.h | 22 ++++++++++++++++++++++
> > 1 file changed, 22 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/stacktrace/nvhe.h b/arch/arm64/include/asm/stacktrace/nvhe.h
> > index 1eac4e57f2ae..36cf7858ddd8 100644
> > --- a/arch/arm64/include/asm/stacktrace/nvhe.h
> > +++ b/arch/arm64/include/asm/stacktrace/nvhe.h
> > @@ -8,6 +8,12 @@
> > * the HYP memory. The stack is unwound in EL2 and dumped to a shared
> > * buffer where the host can read and print the stacktrace.
> > *
> > + * 2) Non-protected nVHE mode - the host can directly access the
> > + * HYP stack pages and unwind the HYP stack in EL1. This saves having
> > + * to allocate shared buffers for the host to read the unwound
> > + * stacktrace.
> > + *
> > + *
> > * Copyright (C) 2022 Google LLC
> > */
> > #ifndef __ASM_STACKTRACE_NVHE_H
> > @@ -53,5 +59,21 @@ static int notrace unwind_next(struct unwind_state *state)
> > NOKPROBE_SYMBOL(unwind_next);
> > #endif /* CONFIG_PROTECTED_NVHE_STACKTRACE */
> >
> > +/*
> > + * Non-protected nVHE HYP stack unwinder
> > + */
> > +#else /* !__KVM_NVHE_HYPERVISOR__ */
>
> I don't get this path. This either represents the VHE hypervisor or
> the kernel proper. Which one is it?
Hi Marc. This runs in kernel-proper context and is the unwinder for
conventional (non-protected) nVHE; the unwinding is done from the
host kernel in EL1.
>
> > +static inline bool on_overflow_stack(unsigned long sp, unsigned long size,
> > + struct stack_info *info)
> > +{
> > + return false;
> > +}
> > +
> > +static int notrace unwind_next(struct unwind_state *state)
> > +{
> > + return 0;
> > +}
> > +NOKPROBE_SYMBOL(unwind_next);
> > +
> > #endif /* __KVM_NVHE_HYPERVISOR__ */
> > #endif /* __ASM_STACKTRACE_NVHE_H */
>
> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.
On Mon, Jul 18, 2022 at 3:07 AM Fuad Tabba <[email protected]> wrote:
>
> Hi Kalesh,
>
> On Fri, Jul 15, 2022 at 7:11 AM Kalesh Singh <[email protected]> wrote:
> >
> > In protected nVHE mode, the host cannot access privately owned hypervisor
> > memory. Also, the hypervisor aims to remain simple to reduce the attack
> > surface and does not provide any printk support.
> >
> > For the above reasons, the approach taken to provide hypervisor stacktraces
> > in protected mode is:
> > 1) Unwind and save the hyp stack addresses in EL2 to a shared buffer
> > with the host (done in this patch).
> > 2) Delegate the dumping and symbolization of the addresses to the
> > host in EL1 (later patch in the series).
> >
> > Signed-off-by: Kalesh Singh <[email protected]>
> > ---
> > arch/arm64/include/asm/stacktrace/nvhe.h | 18 ++++++
> > arch/arm64/kvm/hyp/nvhe/stacktrace.c | 70 ++++++++++++++++++++++++
> > 2 files changed, 88 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/stacktrace/nvhe.h b/arch/arm64/include/asm/stacktrace/nvhe.h
> > index 36cf7858ddd8..456a6ae08433 100644
> > --- a/arch/arm64/include/asm/stacktrace/nvhe.h
> > +++ b/arch/arm64/include/asm/stacktrace/nvhe.h
> > @@ -21,6 +21,22 @@
> >
> > #include <asm/stacktrace/common.h>
> >
> > +/**
> > + * kvm_nvhe_unwind_init - Start an unwind from the given nVHE HYP fp and pc
> > + *
> > + * @fp : frame pointer at which to start the unwinding.
> > + * @pc : program counter at which to start the unwinding.
> > + */
> > +static __always_inline void kvm_nvhe_unwind_init(struct unwind_state *state,
> > + unsigned long fp,
> > + unsigned long pc)
> > +{
> > + unwind_init_common(state, NULL);
> > +
> > + state->fp = fp;
> > + state->pc = pc;
> > +}
> > +
> > static inline bool on_accessible_stack(const struct task_struct *tsk,
> > unsigned long sp, unsigned long size,
> > struct stack_info *info)
> > @@ -33,6 +49,8 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
> > */
> > #ifdef __KVM_NVHE_HYPERVISOR__
> >
> > +extern void kvm_nvhe_prepare_backtrace(unsigned long fp, unsigned long pc);
> > +
> > #ifdef CONFIG_PROTECTED_NVHE_STACKTRACE
> > static inline bool on_overflow_stack(unsigned long sp, unsigned long size,
> > struct stack_info *info)
> > diff --git a/arch/arm64/kvm/hyp/nvhe/stacktrace.c b/arch/arm64/kvm/hyp/nvhe/stacktrace.c
> > index 96c8b93320eb..832a536e440f 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/stacktrace.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/stacktrace.c
> > @@ -11,4 +11,74 @@ DEFINE_PER_CPU(unsigned long [OVERFLOW_STACK_SIZE/sizeof(long)], overflow_stack)
> >
> > #ifdef CONFIG_PROTECTED_NVHE_STACKTRACE
> > DEFINE_PER_CPU(unsigned long [NVHE_STACKTRACE_SIZE/sizeof(long)], pkvm_stacktrace);
> > +
> > +/**
> > + * pkvm_save_backtrace_entry - Saves a protected nVHE HYP stacktrace entry
> > + *
> > + * @arg : the position of the entry in the stacktrace buffer
> > + * @where : the program counter corresponding to the stack frame
> > + *
> > + * Save the return address of a stack frame to the shared stacktrace buffer.
> > + * The host can access this shared buffer from EL1 to dump the backtrace.
> > + */
> > +static bool pkvm_save_backtrace_entry(void *arg, unsigned long where)
> > +{
> > + unsigned long **stacktrace_pos = (unsigned long **)arg;
> > + unsigned long stacktrace_start, stacktrace_end;
> > +
> > + stacktrace_start = (unsigned long)this_cpu_ptr(pkvm_stacktrace);
> > + stacktrace_end = stacktrace_start + NVHE_STACKTRACE_SIZE - (2 * sizeof(long));
>
> I guess this is related to my comment in patch 9, but why does the end
> happen at 2 * instead of just 1 * before the actual end? I guess
> because it's inclusive. That said, a comment would be helpful.
>
The intention is to check that there is room for two entries: the
stacktrace entry itself and the zero entry that delimits the end of the
trace. I agree the "end" naming is a bit misleading; I'll try to
clarify it better in the next version.
Thanks,
Kalesh
> Thanks,
> /fuad
>
> > +
> > + if ((unsigned long) *stacktrace_pos > stacktrace_end)
> > + return false;
> > +
> > + /* Save the entry to the current pos in stacktrace buffer */
> > + **stacktrace_pos = where;
> > +
> > + /* A zero entry delimits the end of the stacktrace. */
> > + *(*stacktrace_pos + 1) = 0UL;
> > +
> > + /* Increment the current pos */
> > + ++*stacktrace_pos;
> > +
> > + return true;
> > +}
> > +
> > +/**
> > + * pkvm_save_backtrace - Saves the protected nVHE HYP stacktrace
> > + *
> > + * @fp : frame pointer at which to start the unwinding.
> > + * @pc : program counter at which to start the unwinding.
> > + *
> > + * Save the unwinded stack addresses to the shared stacktrace buffer.
> > + * The host can access this shared buffer from EL1 to dump the backtrace.
> > + */
> > +static void pkvm_save_backtrace(unsigned long fp, unsigned long pc)
> > +{
> > + void *stacktrace_start = (void *)this_cpu_ptr(pkvm_stacktrace);
> > + struct unwind_state state;
> > +
> > + kvm_nvhe_unwind_init(&state, fp, pc);
> > +
> > + unwind(&state, pkvm_save_backtrace_entry, &stacktrace_start);
> > +}
> > +#else /* !CONFIG_PROTECTED_NVHE_STACKTRACE */
> > +static void pkvm_save_backtrace(unsigned long fp, unsigned long pc)
> > +{
> > +}
> > #endif /* CONFIG_PROTECTED_NVHE_STACKTRACE */
> > +
> > +/**
> > + * kvm_nvhe_prepare_backtrace - prepare to dump the nVHE backtrace
> > + *
> > + * @fp : frame pointer at which to start the unwinding.
> > + * @pc : program counter at which to start the unwinding.
> > + *
> > + * Saves the information needed by the host to dump the nVHE hypervisor
> > + * backtrace.
> > + */
> > +void kvm_nvhe_prepare_backtrace(unsigned long fp, unsigned long pc)
> > +{
> > + if (is_protected_kvm_enabled())
> > + pkvm_save_backtrace(fp, pc);
> > +}
> > --
> > 2.37.0.170.g444d1eabd0-goog
> >
On Mon, Jul 18, 2022 at 2:36 AM Marc Zyngier <[email protected]> wrote:
>
> On Fri, 15 Jul 2022 07:10:21 +0100,
> Kalesh Singh <[email protected]> wrote:
> >
> > In protected nVHE mode, the host cannot access privately owned hypervisor
> > memory. Also, the hypervisor aims to remain simple to reduce the attack
> > surface and does not provide any printk support.
> >
> > For the above reasons, the approach taken to provide hypervisor stacktraces
> > in protected mode is:
> > 1) Unwind and save the hyp stack addresses in EL2 to a shared buffer
> > with the host (done in this patch).
> > 2) Delegate the dumping and symbolization of the addresses to the
> > host in EL1 (later patch in the series).
> >
> > Signed-off-by: Kalesh Singh <[email protected]>
> > ---
> > arch/arm64/include/asm/stacktrace/nvhe.h | 18 ++++++
> > arch/arm64/kvm/hyp/nvhe/stacktrace.c | 70 ++++++++++++++++++++++++
> > 2 files changed, 88 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/stacktrace/nvhe.h b/arch/arm64/include/asm/stacktrace/nvhe.h
> > index 36cf7858ddd8..456a6ae08433 100644
> > --- a/arch/arm64/include/asm/stacktrace/nvhe.h
> > +++ b/arch/arm64/include/asm/stacktrace/nvhe.h
> > @@ -21,6 +21,22 @@
> >
> > #include <asm/stacktrace/common.h>
> >
> > +/**
> > + * kvm_nvhe_unwind_init - Start an unwind from the given nVHE HYP fp and pc
> > + *
> > + * @fp : frame pointer at which to start the unwinding.
> > + * @pc : program counter at which to start the unwinding.
> > + */
> > +static __always_inline void kvm_nvhe_unwind_init(struct unwind_state *state,
> > + unsigned long fp,
> > + unsigned long pc)
> > +{
> > + unwind_init_common(state, NULL);
>
> Huh. Be careful here. This function is only 'inline', which means it
> may not be really inlined. We've had tons of similar issues like this
> in the past, and although this will not break at runtime anymore, it
> will definitely stop the kernel from linking.
Ah, there are a few other __always_inline *unwind_init* functions that
call this. I'll update it in the next version.
Thanks,
Kalesh
>
> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.
On Mon, 18 Jul 2022 18:03:30 +0100,
Kalesh Singh <[email protected]> wrote:
>
> On Sun, Jul 17, 2022 at 11:56 PM Marc Zyngier <[email protected]> wrote:
> >
> > [- Drew and android-mm, as both addresses bounce]
> >
> > On Fri, 15 Jul 2022 07:10:17 +0100,
> > Kalesh Singh <[email protected]> wrote:
> > >
> > > This can be used to disable stacktrace for the protected KVM
> > > nVHE hypervisor, in order to save on the associated memory usage.
> > >
> > > This option is disabled by default, since protected KVM is not widely
> > > used on platforms other than Android currently.
> > >
> > > Signed-off-by: Kalesh Singh <[email protected]>
> > > ---
> > > arch/arm64/kvm/Kconfig | 15 +++++++++++++++
> > > 1 file changed, 15 insertions(+)
> > >
> > > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> > > index 8a5fbbf084df..1edab6f8a3b8 100644
> > > --- a/arch/arm64/kvm/Kconfig
> > > +++ b/arch/arm64/kvm/Kconfig
> > > @@ -46,6 +46,21 @@ menuconfig KVM
> > >
> > > If unsure, say N.
> > >
> > > +config PROTECTED_NVHE_STACKTRACE
> > > + bool "Protected KVM hypervisor stacktraces"
> > > + depends on KVM
> > > + default n
> > > + help
> > > + Say Y here to enable pKVM hypervisor stacktraces on hyp_panic()
> > > +
> > > + If you are not using protected nVHE (pKVM), say N.
> > > +
> > > + If using protected nVHE mode, but cannot afford the associated
> > > + memory cost (less than 0.75 page per CPU) of pKVM stacktraces,
> > > + say N.
> > > +
> > > + If unsure, say N.
> > > +
> >
> > Can we make this depend on NVHE_EL2_DEBUG instead? I'd like to keep
> > the disclosing of EL2 information in protected mode a strict debug
> > feature.
>
> Hi Marc,
>
> An earlier version was similar to what you propose. The unwinding
> depended on NVHE_EL2_DEBUG and all unwinding was done from EL1 with
> host stage 2 being disabled. The reason the design was changed is
> because Android expressed the need for pKVM hyp stacktraces in
> production environments. [1]
I think that's an Android-specific requirement that doesn't apply to
upstream. If Android wants to enable this in production (and
potentially leak details of the hypervisor address space), that's
Android's business, and they can carry a patch for that. Upstream
shouldn't have to cater for such a thing.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
On Fri, 15 Jul 2022 07:10:09 +0100,
Kalesh Singh <[email protected]> wrote:
>
> Hi all,
>
> This is v4 of the series adding support for nVHE hypervisor stacktraces;
> and is based on arm64 for-next/stacktrace.
>
> Thanks all for your feedback on previous revisions. Mark Brown, I
> appreciate your Reviewed-by on the v3, I have dropped the tags in this
> new version since I think the series has changed quite a bit.
>
> The previous versions were posted at:
> v3: https://lore.kernel.org/r/[email protected]/
> v2: https://lore.kernel.org/r/[email protected]/
> v1: https://lore.kernel.org/r/[email protected]/
>
> The main updates in this version are to address concerns from Marc on the
> memory usage and reusing the common code by refactoring into a shared header.
Overall, the series looks better. I've pointed out a few things that
need changing, but my overall gripe is around the abuse of
stacktrace/nvhe.h as a dumping ground. A lot of the code there could
be pushed to handle_exit.c (or some other compilation unit).
I've pushed an example of a 10-minute refactor to my tree
(kvm-arm64/nvhe-stacktrace), and I'm sure that's just the lowest-hanging
fruit.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
On Tue, Jul 19, 2022 at 3:35 AM Marc Zyngier <[email protected]> wrote:
>
> On Mon, 18 Jul 2022 18:03:30 +0100,
> Kalesh Singh <[email protected]> wrote:
> >
> > On Sun, Jul 17, 2022 at 11:56 PM Marc Zyngier <[email protected]> wrote:
> > >
> > > [- Drew and android-mm, as both addresses bounce]
> > >
> > > On Fri, 15 Jul 2022 07:10:17 +0100,
> > > Kalesh Singh <[email protected]> wrote:
> > > >
> > > > This can be used to disable stacktrace for the protected KVM
> > > > nVHE hypervisor, in order to save on the associated memory usage.
> > > >
> > > > This option is disabled by default, since protected KVM is not widely
> > > > used on platforms other than Android currently.
> > > >
> > > > Signed-off-by: Kalesh Singh <[email protected]>
> > > > ---
> > > > arch/arm64/kvm/Kconfig | 15 +++++++++++++++
> > > > 1 file changed, 15 insertions(+)
> > > >
> > > > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> > > > index 8a5fbbf084df..1edab6f8a3b8 100644
> > > > --- a/arch/arm64/kvm/Kconfig
> > > > +++ b/arch/arm64/kvm/Kconfig
> > > > @@ -46,6 +46,21 @@ menuconfig KVM
> > > >
> > > > If unsure, say N.
> > > >
> > > > +config PROTECTED_NVHE_STACKTRACE
> > > > + bool "Protected KVM hypervisor stacktraces"
> > > > + depends on KVM
> > > > + default n
> > > > + help
> > > > + Say Y here to enable pKVM hypervisor stacktraces on hyp_panic()
> > > > +
> > > > + If you are not using protected nVHE (pKVM), say N.
> > > > +
> > > > + If using protected nVHE mode, but cannot afford the associated
> > > > + memory cost (less than 0.75 page per CPU) of pKVM stacktraces,
> > > > + say N.
> > > > +
> > > > + If unsure, say N.
> > > > +
> > >
> > > Can we make this depend on NVHE_EL2_DEBUG instead? I'd like to keep
> > > the disclosing of EL2 information in protected mode a strict debug
> > > feature.
> >
> > Hi Marc,
> >
> > An earlier version was similar to what you propose. The unwinding
> > depended on NVHE_EL2_DEBUG and all unwinding was done from EL1 with
> > host stage 2 being disabled. The reason the design was changed is
> > because Android expressed the need for pKVM hyp stacktraces in
> > production environments. [1]
>
> I think that's an Android-specific requirement that doesn't apply to
> upstream. If Android wants to enable this in production (and
> potentially leak details of the hypervisor address space), that's
> Android's business, and they can carry a patch for that. Upstream
> shouldn't have to cater for such a thing.
Hi Marc,
For Android it's important to be able to debug issues from the field,
but I agree there's no need to subject upstream to the same
requirements. I'll guard this with the NVHE_EL2_DEBUG config in the
next version.
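i.e. something along these lines (a sketch only; the final help text
may differ):

```kconfig
config PROTECTED_NVHE_STACKTRACE
	bool "Protected KVM hypervisor stacktraces"
	depends on NVHE_EL2_DEBUG
	help
	  Say Y here to enable pKVM hypervisor stacktraces on hyp_panic()

	  If unsure, or not using protected nVHE (pKVM), say N.
```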
Thanks,
Kalesh
>
> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.