Hi all,
This is v2 of the nVHE hypervisor stack enhancements. v1 can be found at:
https://lore.kernel.org/r/[email protected]/
This version has been updated to work for 'classic' KVM in nVHE mode in
addition to pKVM, per Marc; and rebased on 5.17-rc5.
The cover letter has been copied below for convenience.
Thanks,
Kalesh
-----
This series adds the following stack features to the KVM nVHE hypervisor:
== Hyp Stack Guard Pages ==
Based on the technique used by arm64 VMAP_STACK to detect overflow:
the stack is aligned to twice its size, which ensures that the
'stack shift' bit of any valid SP is 0. The 'stack shift' bit can be
tested on exception entry to detect overflow without corrupting GPRs.
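A minimal C sketch of the invariant (illustrative only, with a
hypothetical helper name; the series does the actual check in asm on
exception entry, via the test_sp_overflow macro below):

	/*
	 * With a PAGE_SIZE stack whose base is aligned to 2 * PAGE_SIZE,
	 * every valid SP has bit PAGE_SHIFT clear; an SP that has
	 * overflowed into the guard page below it has that bit set.
	 */
	static inline bool hyp_sp_overflowed(unsigned long sp)
	{
		return sp & (1UL << PAGE_SHIFT);
	}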
== Hyp Stack Unwinder ==
Based on the arm64 kernel stack unwinder
(See: arch/arm64/kernel/stacktrace.c)
The unwinding and dumping of the hyp stack is not enabled by default and
depends on CONFIG_NVHE_EL2_DEBUG to avoid potential information leaks.
When CONFIG_NVHE_EL2_DEBUG is enabled the host stage 2 protection is
disabled, allowing the host to read the hypervisor stack pages and unwind
the stack from EL1. This allows us to print the hypervisor stacktrace
before panicking the host, as shown below:
kvm [408]: nVHE hyp panic at: \
[<ffffffc01161460c>] __kvm_nvhe_overflow_stack+0x10/0x34!
kvm [408]: nVHE HYP call trace:
kvm [408]: [<ffffffc011614974>] __kvm_nvhe_hyp_panic_bad_stack+0xc/0x10
kvm [408]: [<ffffffc01160fa48>] __kvm_nvhe___kvm_hyp_host_vector+0x248/0x794
kvm [408]: [<ffffffc01161461c>] __kvm_nvhe_overflow_stack+0x20/0x34
. . .
kvm [408]: [<ffffffc01161461c>] __kvm_nvhe_overflow_stack+0x20/0x34
kvm [408]: [<ffffffc01161421c>] __kvm_nvhe___kvm_vcpu_run+0x2c/0x40c
kvm [408]: [<ffffffc011615e14>] __kvm_nvhe_handle___kvm_vcpu_run+0x1c8/0x36c
kvm [408]: [<ffffffc0116157c4>] __kvm_nvhe_handle_trap+0xa4/0x124
kvm [408]: [<ffffffc01160f060>] __kvm_nvhe___host_exit+0x60/0x64
kvm [408]: ---- end of nVHE HYP call trace ----
Kalesh Singh (8):
KVM: arm64: Introduce hyp_alloc_private_va_range()
KVM: arm64: Introduce pkvm_alloc_private_va_range()
KVM: arm64: Add guard pages for KVM nVHE hypervisor stack
KVM: arm64: Add guard pages for pKVM (protected nVHE) hypervisor stack
KVM: arm64: Detect and handle hypervisor stack overflows
KVM: arm64: Add hypervisor overflow stack
KVM: arm64: Unwind and dump nVHE HYP stacktrace
KVM: arm64: Symbolize the nVHE HYP backtrace
Quentin Perret (1):
arm64: asm: Introduce test_sp_overflow macro
arch/arm64/include/asm/assembler.h | 11 +
arch/arm64/include/asm/kvm_asm.h | 18 ++
arch/arm64/include/asm/kvm_mmu.h | 4 +
arch/arm64/kernel/entry.S | 7 +-
arch/arm64/kvm/Kconfig | 5 +-
arch/arm64/kvm/Makefile | 1 +
arch/arm64/kvm/arm.c | 34 +++-
arch/arm64/kvm/handle_exit.c | 16 +-
arch/arm64/kvm/hyp/include/nvhe/mm.h | 3 +-
arch/arm64/kvm/hyp/nvhe/host.S | 21 ++
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 5 +-
arch/arm64/kvm/hyp/nvhe/mm.c | 49 +++--
arch/arm64/kvm/hyp/nvhe/setup.c | 25 ++-
arch/arm64/kvm/hyp/nvhe/switch.c | 29 +++
arch/arm64/kvm/mmu.c | 61 ++++--
arch/arm64/kvm/stacktrace.c | 290 +++++++++++++++++++++++++++
arch/arm64/kvm/stacktrace.h | 17 ++
scripts/kallsyms.c | 2 +-
18 files changed, 533 insertions(+), 65 deletions(-)
create mode 100644 arch/arm64/kvm/stacktrace.c
create mode 100644 arch/arm64/kvm/stacktrace.h
base-commit: cfb92440ee71adcc2105b0890bb01ac3cddb8507
--
2.35.1.473.g83b2b277ed-goog
Maps the stack pages in the flexible private VA range and allocates
guard pages below the stack as unbacked VA space. The stack is aligned
to twice its size to aid overflow detection (implemented in a subsequent
patch in the series).
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/setup.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 27af337f9fea..69df21320b09 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -105,11 +105,28 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
if (ret)
return ret;
- end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va;
+ /*
+ * Private mappings are allocated upwards from __io_map_base
+ * so allocate the guard page first then the stack.
+ */
+ start = (void *)pkvm_alloc_private_va_range(PAGE_SIZE, PAGE_SIZE);
+ if (IS_ERR_OR_NULL(start))
+ return PTR_ERR(start);
+
+ /*
+ * The stack is aligned to twice its size to facilitate overflow
+ * detection.
+ */
+ end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_pa;
start = end - PAGE_SIZE;
- ret = pkvm_create_mappings(start, end, PAGE_HYP);
- if (ret)
- return ret;
+ start = (void *)__pkvm_create_private_mapping((phys_addr_t)start,
+ PAGE_SIZE, PAGE_SIZE * 2, PAGE_HYP);
+ if (IS_ERR_OR_NULL(start))
+ return PTR_ERR(start);
+ end = start + PAGE_SIZE;
+
+ /* Update stack_hyp_va to end of the stack's private VA range */
+ per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va = (unsigned long) end;
}
/*
--
2.35.1.473.g83b2b277ed-goog
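A worked example of the hunk above, assuming 4K pages (PAGE_SHIFT == 12)
and illustrative addresses:

	/*
	 * __io_map_base = 0x49000
	 *
	 * 1) pkvm_alloc_private_va_range(PAGE_SIZE, PAGE_SIZE)
	 *    -> reserves [0x49000, 0x4A000) as the guard page, left unmapped.
	 * 2) __pkvm_create_private_mapping(stack_pa, PAGE_SIZE, PAGE_SIZE * 2, ...)
	 *    -> ALIGN(0x4A000, 0x2000) = 0x4A000; the stack is mapped at
	 *       [0x4A000, 0x4B000).
	 *
	 * Any valid SP in the stack page has bit 12 clear; an overflow into
	 * the unmapped guard page below (0x49xxx) sets bit 12 and faults on
	 * access.
	 */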
From: Quentin Perret <[email protected]>
The asm entry code in the kernel uses a trick to check if VMAP'd stacks
have overflowed by aligning them at THREAD_SHIFT * 2 granularity and
checking the SP's THREAD_SHIFT bit.
Protected KVM will soon make use of a similar trick to detect stack
overflows, so factor out the asm code in a re-usable macro.
Signed-off-by: Quentin Perret <[email protected]>
[Kalesh - Resolve minor conflicts]
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/include/asm/assembler.h | 11 +++++++++++
arch/arm64/kernel/entry.S | 7 +------
2 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index e8bd0af0141c..ad40eb0eee83 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -850,4 +850,15 @@ alternative_endif
#endif /* GNU_PROPERTY_AARCH64_FEATURE_1_DEFAULT */
+/*
+ * Test whether the SP has overflowed, without corrupting a GPR.
+ */
+.macro test_sp_overflow shift, label
+ add sp, sp, x0 // sp' = sp + x0
+ sub x0, sp, x0 // x0' = sp' - x0 = (sp + x0) - x0 = sp
+ tbnz x0, #\shift, \label
+ sub x0, sp, x0 // x0'' = sp' - x0' = (sp + x0) - sp = x0
+ sub sp, sp, x0 // sp'' = sp' - x0 = (sp + x0) - x0 = sp
+.endm
+
#endif /* __ASM_ASSEMBLER_H */
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 772ec2ecf488..ce99ee30c77e 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -53,15 +53,10 @@ alternative_else_nop_endif
sub sp, sp, #PT_REGS_SIZE
#ifdef CONFIG_VMAP_STACK
/*
- * Test whether the SP has overflowed, without corrupting a GPR.
* Task and IRQ stacks are aligned so that SP & (1 << THREAD_SHIFT)
* should always be zero.
*/
- add sp, sp, x0 // sp' = sp + x0
- sub x0, sp, x0 // x0' = sp' - x0 = (sp + x0) - x0 = sp
- tbnz x0, #THREAD_SHIFT, 0f
- sub x0, sp, x0 // x0'' = sp' - x0' = (sp + x0) - sp = x0
- sub sp, sp, x0 // sp'' = sp' - x0 = (sp + x0) - x0 = sp
+ test_sp_overflow THREAD_SHIFT, 0f
b el\el\ht\()_\regsize\()_\label
0:
--
2.35.1.473.g83b2b277ed-goog
hyp_alloc_private_va_range() can be used to reserve private VA ranges
in the nVHE hypervisor. Also update __create_hyp_private_mapping()
to allow specifying an alignment for the private VA mapping.
These will be used to implement stack guard pages for the KVM nVHE
hypervisor (nVHE Hyp mode, not pKVM), in a subsequent patch in the series.
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/include/asm/kvm_mmu.h | 4 +++
arch/arm64/kvm/mmu.c | 61 +++++++++++++++++++++-----------
2 files changed, 44 insertions(+), 21 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 81839e9a8a24..0b0c71302b92 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -153,6 +153,10 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
int kvm_share_hyp(void *from, void *to);
void kvm_unshare_hyp(void *from, void *to);
int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
+unsigned long hyp_alloc_private_va_range(size_t size, size_t align);
+int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
+ size_t align, unsigned long *haddr,
+ enum kvm_pgtable_prot prot);
int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
void __iomem **kaddr,
void __iomem **haddr);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index bc2aba953299..e5abcce44ad0 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -457,22 +457,16 @@ int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
return 0;
}
-static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
- unsigned long *haddr,
- enum kvm_pgtable_prot prot)
+
+/*
+ * Allocates a private VA range below io_map_base.
+ *
+ * @size: The size of the VA range to reserve.
+ * @align: The required alignment for the allocation.
+ */
+unsigned long hyp_alloc_private_va_range(size_t size, size_t align)
{
unsigned long base;
- int ret = 0;
-
- if (!kvm_host_owns_hyp_mappings()) {
- base = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
- phys_addr, size, prot);
- if (IS_ERR_OR_NULL((void *)base))
- return PTR_ERR((void *)base);
- *haddr = base;
-
- return 0;
- }
mutex_lock(&kvm_hyp_pgd_mutex);
@@ -484,8 +478,8 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
*
* The allocated size is always a multiple of PAGE_SIZE.
*/
- size = PAGE_ALIGN(size + offset_in_page(phys_addr));
- base = io_map_base - size;
+ base = io_map_base - PAGE_ALIGN(size);
+ base = ALIGN_DOWN(base, align);
/*
* Verify that BIT(VA_BITS - 1) hasn't been flipped by
@@ -493,20 +487,45 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
* overflowed the idmap/IO address range.
*/
if ((base ^ io_map_base) & BIT(VA_BITS - 1))
- ret = -ENOMEM;
+ base = (unsigned long)ERR_PTR(-ENOMEM);
else
io_map_base = base;
mutex_unlock(&kvm_hyp_pgd_mutex);
+ return base;
+}
+
+int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
+ size_t align, unsigned long *haddr,
+ enum kvm_pgtable_prot prot)
+{
+ unsigned long addr;
+ int ret = 0;
+
+ if (!kvm_host_owns_hyp_mappings()) {
+ addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
+ phys_addr, size, prot);
+ if (IS_ERR_OR_NULL((void *)addr))
+ return PTR_ERR((void *)addr);
+ *haddr = addr;
+
+ return 0;
+ }
+
+ size += offset_in_page(phys_addr);
+ addr = hyp_alloc_private_va_range(size, align);
+ if (IS_ERR_OR_NULL((void *)addr))
+ return PTR_ERR((void *)addr);
+
if (ret)
goto out;
- ret = __create_hyp_mappings(base, size, phys_addr, prot);
+ ret = __create_hyp_mappings(addr, size, phys_addr, prot);
if (ret)
goto out;
- *haddr = base + offset_in_page(phys_addr);
+ *haddr = addr + offset_in_page(phys_addr);
out:
return ret;
}
@@ -537,7 +556,7 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
return 0;
}
- ret = __create_hyp_private_mapping(phys_addr, size,
+ ret = __create_hyp_private_mapping(phys_addr, size, PAGE_SIZE,
&addr, PAGE_HYP_DEVICE);
if (ret) {
iounmap(*kaddr);
@@ -564,7 +583,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
BUG_ON(is_kernel_in_hyp_mode());
- ret = __create_hyp_private_mapping(phys_addr, size,
+ ret = __create_hyp_private_mapping(phys_addr, size, PAGE_SIZE,
&addr, PAGE_HYP_EXEC);
if (ret) {
*haddr = NULL;
--
2.35.1.473.g83b2b277ed-goog
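A usage sketch of the two new interfaces (this mirrors how a later patch
in the series maps the classic nVHE stack; error paths abbreviated):

	unsigned long stack_hyp_va, guard_hyp_va;
	int err;

	/* Map the stack page, aligned to twice its size. */
	err = __create_hyp_private_mapping(__pa(stack_page), PAGE_SIZE,
					   PAGE_SIZE * 2, &stack_hyp_va,
					   PAGE_HYP);
	if (err)
		return err;

	/*
	 * Private mappings are allocated downwards from io_map_base, so
	 * the next allocation lands directly below the stack: an unbacked
	 * VA range that acts as the guard page.
	 */
	guard_hyp_va = hyp_alloc_private_va_range(PAGE_SIZE, PAGE_SIZE);
	if (IS_ERR((void *)guard_hyp_va))
		return PTR_ERR((void *)guard_hyp_va);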
Allocate and switch to a 16-byte aligned secondary stack on overflow. This
gives us stack space to better handle overflows and is used in a subsequent
patch to dump the hypervisor stacktrace. The overflow stack is only
allocated if CONFIG_NVHE_EL2_DEBUG is enabled, as the hypervisor stacktrace
is a debug feature dependent on CONFIG_NVHE_EL2_DEBUG.
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/host.S | 5 +++++
arch/arm64/kvm/hyp/nvhe/switch.c | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
index 78e4b612ac06..751a4b9e429f 100644
--- a/arch/arm64/kvm/hyp/nvhe/host.S
+++ b/arch/arm64/kvm/hyp/nvhe/host.S
@@ -171,6 +171,10 @@ SYM_FUNC_END(__host_hvc)
b hyp_panic
.L__hyp_sp_overflow\@:
+#ifdef CONFIG_NVHE_EL2_DEBUG
+ /* Switch to the overflow stack */
+ adr_this_cpu sp, hyp_overflow_stack + PAGE_SIZE, x0
+#else
/*
* Reset SP to the top of the stack, to allow handling the hyp_panic.
* This corrupts the stack but is ok, since we won't be attempting
@@ -178,6 +182,7 @@ SYM_FUNC_END(__host_hvc)
*/
ldr_this_cpu x0, kvm_init_params + NVHE_INIT_STACK_HYP_VA, x1
mov sp, x0
+#endif
bl hyp_panic_bad_stack
ASM_BUG()
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 5a2e1ab79913..2accc158210f 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -34,6 +34,11 @@ DEFINE_PER_CPU(struct kvm_host_data, kvm_host_data);
DEFINE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
+#ifdef CONFIG_NVHE_EL2_DEBUG
+DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
+ __aligned(16);
+#endif
+
static void __activate_traps(struct kvm_vcpu *vcpu)
{
u64 val;
--
2.35.1.473.g83b2b277ed-goog
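For reference, a C sketch of what the SP switch in the asm hunk above
computes (stacks grow downwards, so SP starts at the top of the per-CPU
overflow stack; the __aligned(16) on the array satisfies the AArch64
requirement that SP be 16-byte aligned):

	/* adr_this_cpu sp, hyp_overflow_stack + PAGE_SIZE, x0 computes: */
	unsigned long new_sp =
		(unsigned long)this_cpu_ptr(hyp_overflow_stack) + PAGE_SIZE;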
On Tue, Feb 22, 2022 at 08:51:02AM -0800, Kalesh Singh wrote:
> hyp_alloc_private_va_range() can be used to reserve private VA ranges
> in the nVHE hypervisor. Also update __create_hyp_private_mapping()
> to allow specifying an alignment for the private VA mapping.
>
> These will be used to implement stack guard pages for the KVM nVHE
> hypervisor (nVHE Hyp mode, not pKVM), in a subsequent patch in the series.
>
> Signed-off-by: Kalesh Singh <[email protected]>
> ---
> arch/arm64/include/asm/kvm_mmu.h | 4 +++
> arch/arm64/kvm/mmu.c | 61 +++++++++++++++++++++-----------
> 2 files changed, 44 insertions(+), 21 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 81839e9a8a24..0b0c71302b92 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -153,6 +153,10 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
> int kvm_share_hyp(void *from, void *to);
> void kvm_unshare_hyp(void *from, void *to);
> int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> +unsigned long hyp_alloc_private_va_range(size_t size, size_t align);
> +int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> + size_t align, unsigned long *haddr,
> + enum kvm_pgtable_prot prot);
> int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
> void __iomem **kaddr,
> void __iomem **haddr);
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index bc2aba953299..e5abcce44ad0 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -457,22 +457,16 @@ int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> return 0;
> }
>
> -static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> - unsigned long *haddr,
> - enum kvm_pgtable_prot prot)
> +
> +/*
> + * Allocates a private VA range below io_map_base.
> + *
> + * @size: The size of the VA range to reserve.
> + * @align: The required alignment for the allocation.
> + */
> +unsigned long hyp_alloc_private_va_range(size_t size, size_t align)
> {
> unsigned long base;
> - int ret = 0;
> -
> - if (!kvm_host_owns_hyp_mappings()) {
> - base = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
> - phys_addr, size, prot);
> - if (IS_ERR_OR_NULL((void *)base))
> - return PTR_ERR((void *)base);
There is a latent bug here; PTR_ERR() is not valid for NULL.
Today on arm64 that will happen to return 0, which may or may not be
what you want, but it's a bad pattern regardless.
That applies to the two copies below that this has been transformed
into.
Thanks,
Mark
> - *haddr = base;
> -
> - return 0;
> - }
>
> mutex_lock(&kvm_hyp_pgd_mutex);
>
> @@ -484,8 +478,8 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> *
> * The allocated size is always a multiple of PAGE_SIZE.
> */
> - size = PAGE_ALIGN(size + offset_in_page(phys_addr));
> - base = io_map_base - size;
> + base = io_map_base - PAGE_ALIGN(size);
> + base = ALIGN_DOWN(base, align);
>
> /*
> * Verify that BIT(VA_BITS - 1) hasn't been flipped by
> @@ -493,20 +487,45 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> * overflowed the idmap/IO address range.
> */
> if ((base ^ io_map_base) & BIT(VA_BITS - 1))
> - ret = -ENOMEM;
> + base = (unsigned long)ERR_PTR(-ENOMEM);
> else
> io_map_base = base;
>
> mutex_unlock(&kvm_hyp_pgd_mutex);
>
> + return base;
> +}
> +
> +int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
> + size_t align, unsigned long *haddr,
> + enum kvm_pgtable_prot prot)
> +{
> + unsigned long addr;
> + int ret = 0;
> +
> + if (!kvm_host_owns_hyp_mappings()) {
> + addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
> + phys_addr, size, prot);
> + if (IS_ERR_OR_NULL((void *)addr))
> + return PTR_ERR((void *)addr);
> + *haddr = addr;
> +
> + return 0;
> + }
> +
> + size += offset_in_page(phys_addr);
> + addr = hyp_alloc_private_va_range(size, align);
> + if (IS_ERR_OR_NULL((void *)addr))
> + return PTR_ERR((void *)addr);
> +
> if (ret)
> goto out;
>
> - ret = __create_hyp_mappings(base, size, phys_addr, prot);
> + ret = __create_hyp_mappings(addr, size, phys_addr, prot);
> if (ret)
> goto out;
>
> - *haddr = base + offset_in_page(phys_addr);
> + *haddr = addr + offset_in_page(phys_addr);
> out:
> return ret;
> }
> @@ -537,7 +556,7 @@ int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
> return 0;
> }
>
> - ret = __create_hyp_private_mapping(phys_addr, size,
> + ret = __create_hyp_private_mapping(phys_addr, size, PAGE_SIZE,
> &addr, PAGE_HYP_DEVICE);
> if (ret) {
> iounmap(*kaddr);
> @@ -564,7 +583,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>
> BUG_ON(is_kernel_in_hyp_mode());
>
> - ret = __create_hyp_private_mapping(phys_addr, size,
> + ret = __create_hyp_private_mapping(phys_addr, size, PAGE_SIZE,
> &addr, PAGE_HYP_EXEC);
> if (ret) {
> *haddr = NULL;
> --
> 2.35.1.473.g83b2b277ed-goog
>
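For context on the report above: PTR_ERR(NULL) evaluates to 0, so pairing
IS_ERR_OR_NULL() with PTR_ERR() silently turns a NULL result into
"success". A sketch of a shape that handles NULL explicitly (illustrative
only, not necessarily the fix that was ultimately applied):

	addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
				 phys_addr, size, prot);
	if (!addr)
		return -ENOMEM;		/* PTR_ERR(NULL) would be 0 */
	if (IS_ERR((void *)addr))
		return PTR_ERR((void *)addr);
	*haddr = addr;
	return 0;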
On Tue, Feb 22, 2022 at 08:51:05AM -0800, Kalesh Singh wrote:
> Maps the stack pages in the flexible private VA range and allocates
> guard pages below the stack as unbacked VA space. The stack is aligned
> to twice its size to aid overflow detection (implemented in a subsequent
> patch in the series).
>
> Signed-off-by: Kalesh Singh <[email protected]>
> ---
> arch/arm64/kvm/hyp/nvhe/setup.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 27af337f9fea..69df21320b09 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -105,11 +105,28 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> if (ret)
> return ret;
>
> - end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va;
> + /*
> + * Private mappings are allocated upwards from __io_map_base
> + * so allocate the guard page first then the stack.
> + */
> + start = (void *)pkvm_alloc_private_va_range(PAGE_SIZE, PAGE_SIZE);
> + if (IS_ERR_OR_NULL(start))
> + return PTR_ERR(start);
As on a prior patch, this usage of PTR_ERR() pattern is wrong when the
ptr is NULL.
> + /*
> + * The stack is aligned to twice its size to facilitate overflow
> + * detection.
> + */
> + end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_pa;
> start = end - PAGE_SIZE;
> - ret = pkvm_create_mappings(start, end, PAGE_HYP);
> - if (ret)
> - return ret;
> + start = (void *)__pkvm_create_private_mapping((phys_addr_t)start,
> + PAGE_SIZE, PAGE_SIZE * 2, PAGE_HYP);
> + if (IS_ERR_OR_NULL(start))
> + return PTR_ERR(start);
Likewise.
Thanks,
Mark.
> + end = start + PAGE_SIZE;
> +
> + /* Update stack_hyp_va to end of the stack's private VA range */
> + per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va = (unsigned long) end;
> }
>
> /*
> --
> 2.35.1.473.g83b2b277ed-goog
>
pkvm_alloc_private_va_range() can be used to reserve private VA ranges
in the pKVM nVHE hypervisor. Also update __pkvm_create_private_mapping()
to allow specifying an alignment for the private VA mapping.
These will be used to implement stack guard pages for the pKVM nVHE
hypervisor (in a subsequent patch in the series).
Credits to Quentin Perret <[email protected]> for the idea of moving
private VA allocation out of __pkvm_create_private_mapping().
Signed-off-by: Kalesh Singh <[email protected]>
---
Changes in v2:
- Allow specifying an alignment for the private VA allocations, per Marc
arch/arm64/kvm/hyp/include/nvhe/mm.h | 3 +-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 5 +--
arch/arm64/kvm/hyp/nvhe/mm.c | 49 +++++++++++++++++++---------
arch/arm64/kvm/mmu.c | 2 +-
4 files changed, 39 insertions(+), 20 deletions(-)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index 2d08510c6cc1..05d06ad00347 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -20,7 +20,8 @@ int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
- enum kvm_pgtable_prot prot);
+ size_t align, enum kvm_pgtable_prot prot);
+unsigned long pkvm_alloc_private_va_range(size_t size, size_t align);
static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
unsigned long *start, unsigned long *end)
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 5e2197db0d32..96b2312a0f1d 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -158,9 +158,10 @@ static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ct
{
DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
DECLARE_REG(size_t, size, host_ctxt, 2);
- DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
+ DECLARE_REG(size_t, align, host_ctxt, 3);
+ DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 4);
- cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, prot);
+ cpu_reg(host_ctxt, 1) = __pkvm_create_private_mapping(phys, size, align, prot);
}
static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 526a7d6fa86f..298fbbe4651d 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -37,26 +37,46 @@ static int __pkvm_create_mappings(unsigned long start, unsigned long size,
return err;
}
-unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
- enum kvm_pgtable_prot prot)
+/*
+ * Allocates a private VA range above __io_map_base.
+ *
+ * @size: The size of the VA range to reserve.
+ * @align: The required alignment for the allocation.
+ */
+unsigned long pkvm_alloc_private_va_range(size_t size, size_t align)
{
- unsigned long addr;
- int err;
+ unsigned long base, addr;
hyp_spin_lock(&pkvm_pgd_lock);
- size = PAGE_ALIGN(size + offset_in_page(phys));
- addr = __io_map_base;
- __io_map_base += size;
+ addr = ALIGN(__io_map_base, align);
+
+ /* The allocated size is always a multiple of PAGE_SIZE */
+ base = addr + PAGE_ALIGN(size);
/* Are we overflowing on the vmemmap ? */
- if (__io_map_base > __hyp_vmemmap) {
- __io_map_base -= size;
+ if (base > __hyp_vmemmap)
addr = (unsigned long)ERR_PTR(-ENOMEM);
+ else
+ __io_map_base = base;
+
+ hyp_spin_unlock(&pkvm_pgd_lock);
+
+ return addr;
+}
+
+unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
+ size_t align, enum kvm_pgtable_prot prot)
+{
+ unsigned long addr;
+ int err;
+
+ size += offset_in_page(phys);
+ addr = pkvm_alloc_private_va_range(size, align);
+ if (IS_ERR((void *)addr))
goto out;
- }
- err = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, size, phys, prot);
+ err = __pkvm_create_mappings(addr, size, phys, prot);
if (err) {
addr = (unsigned long)ERR_PTR(err);
goto out;
@@ -64,8 +84,6 @@ unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
addr = addr + offset_in_page(phys);
out:
- hyp_spin_unlock(&pkvm_pgd_lock);
-
return addr;
}
@@ -152,9 +170,8 @@ int hyp_map_vectors(void)
return 0;
phys = __hyp_pa(__bp_harden_hyp_vecs);
- bp_base = (void *)__pkvm_create_private_mapping(phys,
- __BP_HARDEN_HYP_VECS_SZ,
- PAGE_HYP_EXEC);
+ bp_base = (void *)__pkvm_create_private_mapping(phys, __BP_HARDEN_HYP_VECS_SZ,
+ PAGE_SIZE, PAGE_HYP_EXEC);
if (IS_ERR_OR_NULL(bp_base))
return PTR_ERR(bp_base);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index e5abcce44ad0..18a711d6a52f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -505,7 +505,7 @@ int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
if (!kvm_host_owns_hyp_mappings()) {
addr = kvm_call_hyp_nvhe(__pkvm_create_private_mapping,
- phys_addr, size, prot);
+ phys_addr, size, align, prot);
if (IS_ERR_OR_NULL((void *)addr))
return PTR_ERR((void *)addr);
*haddr = addr;
--
2.35.1.473.g83b2b277ed-goog
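A worked example of the hyp-side allocator above (illustrative values,
4K pages):

	/*
	 * __io_map_base = 0x5800, size = 0x1800, align = 0x2000
	 *
	 *   addr = ALIGN(0x5800, 0x2000)      = 0x6000
	 *   base = addr + PAGE_ALIGN(0x1800)  = 0x6000 + 0x2000 = 0x8000
	 *
	 * If base does not run into __hyp_vmemmap, __io_map_base is advanced
	 * to 0x8000 and the call returns 0x6000; otherwise it returns
	 * ERR_PTR(-ENOMEM) and __io_map_base is left untouched.
	 */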
On Tue, Feb 22, 2022 at 10:32 AM Mark Rutland <[email protected]> wrote:
>
> On Tue, Feb 22, 2022 at 08:51:06AM -0800, Kalesh Singh wrote:
> > From: Quentin Perret <[email protected]>
> >
> > The asm entry code in the kernel uses a trick to check if VMAP'd stacks
> > have overflowed by aligning them at THREAD_SHIFT * 2 granularity and
> > checking the SP's THREAD_SHIFT bit.
> >
> > Protected KVM will soon make use of a similar trick to detect stack
> > overflows, so factor out the asm code in a re-usable macro.
> >
> > Signed-off-by: Quentin Perret <[email protected]>
> > [Kalesh - Resolve minor conflicts]
> > Signed-off-by: Kalesh Singh <[email protected]>
> > ---
> > arch/arm64/include/asm/assembler.h | 11 +++++++++++
> > arch/arm64/kernel/entry.S | 7 +------
> > 2 files changed, 12 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> > index e8bd0af0141c..ad40eb0eee83 100644
> > --- a/arch/arm64/include/asm/assembler.h
> > +++ b/arch/arm64/include/asm/assembler.h
> > @@ -850,4 +850,15 @@ alternative_endif
> >
> > #endif /* GNU_PROPERTY_AARCH64_FEATURE_1_DEFAULT */
> >
> > +/*
> > + * Test whether the SP has overflowed, without corrupting a GPR.
> > + */
> > +.macro test_sp_overflow shift, label
> > + add sp, sp, x0 // sp' = sp + x0
> > + sub x0, sp, x0 // x0' = sp' - x0 = (sp + x0) - x0 = sp
> > + tbnz x0, #\shift, \label
> > + sub x0, sp, x0 // x0'' = sp' - x0' = (sp + x0) - sp = x0
> > + sub sp, sp, x0 // sp'' = sp' - x0 = (sp + x0) - x0 = sp
> > +.endm
>
> I'm a little unhappy about factoring this out, since it's not really
> self-contained and leaves sp and x0 partially-swapped when it branches
> to the label. You can't really make that clear with comments on the
> macro, and you need comments at each use-site, so I'd rather we just
> open-coded a copy of this.
>
> > +
> > #endif /* __ASM_ASSEMBLER_H */
> > diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> > index 772ec2ecf488..ce99ee30c77e 100644
> > --- a/arch/arm64/kernel/entry.S
> > +++ b/arch/arm64/kernel/entry.S
> > @@ -53,15 +53,10 @@ alternative_else_nop_endif
> > sub sp, sp, #PT_REGS_SIZE
> > #ifdef CONFIG_VMAP_STACK
> > /*
> > - * Test whether the SP has overflowed, without corrupting a GPR.
> > * Task and IRQ stacks are aligned so that SP & (1 << THREAD_SHIFT)
> > * should always be zero.
> > */
> > - add sp, sp, x0 // sp' = sp + x0
> > - sub x0, sp, x0 // x0' = sp' - x0 = (sp + x0) - x0 = sp
> > - tbnz x0, #THREAD_SHIFT, 0f
> > - sub x0, sp, x0 // x0'' = sp' - x0' = (sp + x0) - sp = x0
> > - sub sp, sp, x0 // sp'' = sp' - x0 = (sp + x0) - x0 = sp
> > + test_sp_overflow THREAD_SHIFT, 0f
> > b el\el\ht\()_\regsize\()_\label
> >
> > 0:
>
> Further to my comment above, immediately after this we have:
>
> /* Stash the original SP (minus PT_REGS_SIZE) in tpidr_el0. */
> msr tpidr_el0, x0
>
> /* Recover the original x0 value and stash it in tpidrro_el0 */
> sub x0, sp, x0
> msr tpidrro_el0, x0
>
> ... which is really surprising with the `test_sp_overflow` macro because
> it's not clear that it modifies x0 and sp in this way.
Hi Mark,
I agree the macro hides the fact that sp and x0 are left in a
'corrupt' state if the branch happens. Not a problem in this case, but
it could be misleading to new users. I'll remove this per your
suggestion in the next version.
Thanks,
Kalesh
>
> Thanks,
> Mark.
> ...
>
> > --
> > 2.35.1.473.g83b2b277ed-goog
> >
Unwind the stack in EL1 when CONFIG_NVHE_EL2_DEBUG is enabled. This is
possible because CONFIG_NVHE_EL2_DEBUG disables the host stage 2 protection,
which allows the host to access the hypervisor stack pages in EL1.
Unwinding and dumping hyp call traces is gated on CONFIG_NVHE_EL2_DEBUG
to avoid potentially leaking information to the host.
A simple stack overflow test produces the following output:
[ 580.376051][ T412] kvm: nVHE hyp panic at: ffffffc0116145c4!
[ 580.378034][ T412] kvm [412]: nVHE HYP call trace:
[ 580.378591][ T412] kvm [412]: [<ffffffc011614934>]
[ 580.378993][ T412] kvm [412]: [<ffffffc01160fa48>]
[ 580.379386][ T412] kvm [412]: [<ffffffc0116145dc>] // Non-terminating recursive call
[ 580.379772][ T412] kvm [412]: [<ffffffc0116145dc>]
[ 580.380158][ T412] kvm [412]: [<ffffffc0116145dc>]
[ 580.380544][ T412] kvm [412]: [<ffffffc0116145dc>]
[ 580.380928][ T412] kvm [412]: [<ffffffc0116145dc>]
. . .
Since nVHE hyp symbols are not included by kallsyms to avoid issues
with aliasing, we fall back to the vmlinux addresses. Symbolizing the
addresses is handled in the next patch in this series.
Signed-off-by: Kalesh Singh <[email protected]>
---
Changes in v2:
- Add cpu_prepare_nvhe_panic_info()
- Move updating the panic info to hyp_panic(), so that unwinding also
works for conventional nVHE Hyp-mode.
arch/arm64/include/asm/kvm_asm.h | 17 ++
arch/arm64/kvm/Kconfig | 5 +-
arch/arm64/kvm/Makefile | 1 +
arch/arm64/kvm/arm.c | 2 +-
arch/arm64/kvm/handle_exit.c | 3 +
arch/arm64/kvm/hyp/nvhe/switch.c | 19 ++
arch/arm64/kvm/stacktrace.c | 290 +++++++++++++++++++++++++++++++
arch/arm64/kvm/stacktrace.h | 17 ++
8 files changed, 351 insertions(+), 3 deletions(-)
create mode 100644 arch/arm64/kvm/stacktrace.c
create mode 100644 arch/arm64/kvm/stacktrace.h
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 2e277f2ed671..af44b3a0596b 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -176,6 +176,23 @@ struct kvm_nvhe_init_params {
unsigned long vtcr;
};
+#ifdef CONFIG_NVHE_EL2_DEBUG
+/*
+ * Used by the host in EL1 to dump the nVHE hypervisor backtrace on
+ * hyp_panic. This is possible because CONFIG_NVHE_EL2_DEBUG disables
+ * the host stage 2 protection. See: __hyp_do_panic()
+ *
+ * @hyp_stack_base: hyp VA of the hyp_stack base.
+ * @hyp_overflow_stack_base: hyp VA of the hyp_overflow_stack base.
+ * @start_fp: hyp FP where the hyp backtrace should begin.
+ */
+struct kvm_nvhe_panic_info {
+ unsigned long hyp_stack_base;
+ unsigned long hyp_overflow_stack_base;
+ unsigned long start_fp;
+};
+#endif
+
/* Translate a kernel address @ptr into its equivalent linear mapping */
#define kvm_ksym_ref(ptr) \
({ \
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 8a5fbbf084df..75f2c8255ff0 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -51,8 +51,9 @@ config NVHE_EL2_DEBUG
depends on KVM
help
Say Y here to enable the debug mode for the non-VHE KVM EL2 object.
- Failure reports will BUG() in the hypervisor. This is intended for
- local EL2 hypervisor development.
+ Failure reports will BUG() in the hypervisor; and panics will print
+ the hypervisor call stack. This is intended for local EL2 hypervisor
+ development.
If unsure, say N.
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 91861fd8b897..262b5c58cc62 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -23,6 +23,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
vgic/vgic-its.o vgic/vgic-debug.o
kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o
+kvm-$(CONFIG_NVHE_EL2_DEBUG) += stacktrace.o
always-y := hyp_constants.h hyp-constants.s
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7e2e680c3ffb..491cf1eb28f6 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -49,7 +49,7 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
-static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
+DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index e3140abd2e2e..b038c32a3236 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -23,6 +23,7 @@
#define CREATE_TRACE_POINTS
#include "trace_handle_exit.h"
+#include "stacktrace.h"
typedef int (*exit_handle_fn)(struct kvm_vcpu *);
@@ -326,6 +327,8 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
kvm_err("nVHE hyp panic at: %016llx!\n", elr_virt + hyp_offset);
}
+ hyp_dump_backtrace(hyp_offset);
+
/*
* Hyp has panicked and we're going to handle that by panicking the
* kernel. The kernel offset will be revealed in the panic so we're
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 2accc158210f..57ab23f03b1e 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -37,6 +37,23 @@ DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
#ifdef CONFIG_NVHE_EL2_DEBUG
DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
__aligned(16);
+DEFINE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
+
+DECLARE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
+
+static void cpu_prepare_nvhe_panic_info(void)
+{
+ struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr(&kvm_panic_info);
+ struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
+
+ panic_info->hyp_stack_base = (unsigned long)(params->stack_hyp_va - PAGE_SIZE);
+ panic_info->hyp_overflow_stack_base = (unsigned long)this_cpu_ptr(hyp_overflow_stack);
+ panic_info->start_fp = (unsigned long)__builtin_frame_address(0);
+}
+#else
+static void cpu_prepare_nvhe_panic_info(void)
+{
+}
#endif
static void __activate_traps(struct kvm_vcpu *vcpu)
@@ -360,6 +377,8 @@ void __noreturn hyp_panic(void)
struct kvm_cpu_context *host_ctxt;
struct kvm_vcpu *vcpu;
+ cpu_prepare_nvhe_panic_info();
+
host_ctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
vcpu = host_ctxt->__hyp_running_vcpu;
diff --git a/arch/arm64/kvm/stacktrace.c b/arch/arm64/kvm/stacktrace.c
new file mode 100644
index 000000000000..cdd672bf0ea8
--- /dev/null
+++ b/arch/arm64/kvm/stacktrace.c
@@ -0,0 +1,290 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Stack unwinder for EL2 nVHE hypervisor.
+ *
+ * Code mostly copied from the arm64 kernel stack unwinder
+ * and adapted to the nVHE hypervisor.
+ *
+ * See: arch/arm64/kernel/stacktrace.c
+ *
+ * CONFIG_NVHE_EL2_DEBUG disables the host stage-2 protection
+ * allowing us to access the hypervisor stack pages and
+ * consequently unwind its stack from the host in EL1.
+ *
+ * See: __hyp_do_panic()
+ */
+
+#include <asm/kvm_asm.h>
+#include <asm/kvm_hyp.h>
+#include <linux/kvm_host.h>
+#include "stacktrace.h"
+
+DECLARE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
+DECLARE_KVM_NVHE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack);
+DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_panic_info, kvm_panic_info);
+
+enum hyp_stack_type {
+ HYP_STACK_TYPE_UNKNOWN,
+ HYP_STACK_TYPE_HYP,
+ HYP_STACK_TYPE_OVERFLOW,
+ __NR_HYP_STACK_TYPES
+};
+
+struct hyp_stack_info {
+ unsigned long low;
+ unsigned long high;
+ enum hyp_stack_type type;
+};
+
+/*
+ * A snapshot of a frame record or fp/lr register values, along with some
+ * accounting information necessary for robust unwinding.
+ *
+ * @fp: The fp value in the frame record (or the real fp)
+ * @pc: The pc value calculated from lr in the frame record.
+ *
+ * @stacks_done: Stacks which have been entirely unwound, for which it is no
+ * longer valid to unwind to.
+ *
+ * @prev_fp: The fp that pointed to this frame record, or a synthetic value
+ * of 0. This is used to ensure that within a stack, each
+ * subsequent frame record is at an increasing address.
+ * @prev_type: The type of stack this frame record was on, or a synthetic
+ * value of HYP_STACK_TYPE_UNKNOWN. This is used to detect a
+ * transition from one stack to another.
+ */
+struct hyp_stackframe {
+ unsigned long fp;
+ unsigned long pc;
+ DECLARE_BITMAP(stacks_done, __NR_HYP_STACK_TYPES);
+ unsigned long prev_fp;
+ enum hyp_stack_type prev_type;
+};
+
+static inline bool __on_hyp_stack(unsigned long hyp_sp, unsigned long size,
+ unsigned long low, unsigned long high,
+ enum hyp_stack_type type,
+ struct hyp_stack_info *info)
+{
+ if (!low)
+ return false;
+
+ if (hyp_sp < low || hyp_sp + size < hyp_sp || hyp_sp + size > high)
+ return false;
+
+ if (info) {
+ info->low = low;
+ info->high = high;
+ info->type = type;
+ }
+ return true;
+}
+
+static inline bool on_hyp_overflow_stack(unsigned long hyp_sp, unsigned long size,
+ struct hyp_stack_info *info)
+{
+ struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+ unsigned long low = (unsigned long)panic_info->hyp_overflow_stack_base;
+ unsigned long high = low + PAGE_SIZE;
+
+ return __on_hyp_stack(hyp_sp, size, low, high, HYP_STACK_TYPE_OVERFLOW, info);
+}
+
+static inline bool on_hyp_stack(unsigned long hyp_sp, unsigned long size,
+ struct hyp_stack_info *info)
+{
+ struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+ unsigned long low = (unsigned long)panic_info->hyp_stack_base;
+ unsigned long high = low + PAGE_SIZE;
+
+ return __on_hyp_stack(hyp_sp, size, low, high, HYP_STACK_TYPE_HYP, info);
+}
+
+static inline bool on_hyp_accessible_stack(unsigned long hyp_sp, unsigned long size,
+ struct hyp_stack_info *info)
+{
+ if (info)
+ info->type = HYP_STACK_TYPE_UNKNOWN;
+
+ if (on_hyp_stack(hyp_sp, size, info))
+ return true;
+ if (on_hyp_overflow_stack(hyp_sp, size, info))
+ return true;
+
+ return false;
+}
+
+static unsigned long __hyp_stack_kern_va(unsigned long hyp_va)
+{
+ struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+ unsigned long hyp_base, kern_base, hyp_offset;
+
+ hyp_base = (unsigned long)panic_info->hyp_stack_base;
+ hyp_offset = hyp_va - hyp_base;
+
+ kern_base = (unsigned long)*this_cpu_ptr(&kvm_arm_hyp_stack_page);
+
+ return kern_base + hyp_offset;
+}
+
+static unsigned long __hyp_overflow_stack_kern_va(unsigned long hyp_va)
+{
+ struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+ unsigned long hyp_base, kern_base, hyp_offset;
+
+ hyp_base = (unsigned long)panic_info->hyp_overflow_stack_base;
+ hyp_offset = hyp_va - hyp_base;
+
+ kern_base = (unsigned long)this_cpu_ptr_nvhe_sym(hyp_overflow_stack);
+
+ return kern_base + hyp_offset;
+}
+
+/*
+ * Convert hypervisor stack VA to a kernel VA.
+ *
+ * The hypervisor stack is mapped in the flexible 'private' VA range, to allow
+ * for guard pages below the stack. Consequently, the fixed offset address
+ * translation macros won't work here.
+ *
+ * The kernel VA is calculated as an offset from the kernel VA of the hypervisor
+ * stack base. See: __hyp_stack_kern_va(), __hyp_overflow_stack_kern_va()
+ */
+static unsigned long hyp_stack_kern_va(unsigned long hyp_va,
+ enum hyp_stack_type stack_type)
+{
+ switch (stack_type) {
+ case HYP_STACK_TYPE_HYP:
+ return __hyp_stack_kern_va(hyp_va);
+ case HYP_STACK_TYPE_OVERFLOW:
+ return __hyp_overflow_stack_kern_va(hyp_va);
+ default:
+ return 0UL;
+ }
+}
+
+/*
+ * Unwind from one frame record (A) to the next frame record (B).
+ *
+ * We terminate early if the location of B indicates a malformed chain of frame
+ * records (e.g. a cycle), determined based on the location and fp value of A
+ * and the location (but not the fp value) of B.
+ */
+static int notrace hyp_unwind_frame(struct hyp_stackframe *frame)
+{
+ unsigned long fp = frame->fp, fp_kern_va;
+ struct hyp_stack_info info;
+
+ if (fp & 0x7)
+ return -EINVAL;
+
+ if (!on_hyp_accessible_stack(fp, 16, &info))
+ return -EINVAL;
+
+ if (test_bit(info.type, frame->stacks_done))
+ return -EINVAL;
+
+ /*
+ * As stacks grow downward, any valid record on the same stack must be
+ * at a strictly higher address than the prior record.
+ *
+ * Stacks can nest in the following order:
+ *
+ * HYP -> OVERFLOW
+ *
+ * ... but the nesting itself is strict. Once we transition from one
+ * stack to another, it's never valid to unwind back to that first
+ * stack.
+ */
+ if (info.type == frame->prev_type) {
+ if (fp <= frame->prev_fp)
+ return -EINVAL;
+ } else {
+ set_bit(frame->prev_type, frame->stacks_done);
+ }
+
+ /* Translate the hyp stack address to a kernel address */
+ fp_kern_va = hyp_stack_kern_va(fp, info.type);
+ if (!fp_kern_va)
+ return -EINVAL;
+
+ /*
+ * Record this frame record's values and location. The prev_fp and
+ * prev_type are only meaningful to the next hyp_unwind_frame()
+ * invocation.
+ */
+ frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp_kern_va));
+ /* PC = LR - 4; All aarch64 instructions are 32-bits in size */
+ frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp_kern_va + 8)) - 4;
+ frame->prev_fp = fp;
+ frame->prev_type = info.type;
+
+ return 0;
+}
+
+/*
+ * AArch64 PCS assigns the frame pointer to x29.
+ *
+ * A simple function prologue looks like this:
+ * sub sp, sp, #0x10
+ * stp x29, x30, [sp]
+ * mov x29, sp
+ *
+ * A simple function epilogue looks like this:
+ * mov sp, x29
+ * ldp x29, x30, [sp]
+ * add sp, sp, #0x10
+ */
+static void hyp_start_backtrace(struct hyp_stackframe *frame, unsigned long fp)
+{
+ frame->fp = fp;
+
+ /*
+ * Prime the first unwind.
+ *
+ * In hyp_unwind_frame() we'll check that the FP points to a valid
+ * stack, which can't be HYP_STACK_TYPE_UNKNOWN, and the first unwind
+ * will be treated as a transition to whichever stack that happens to
+ * be. The prev_fp value won't be used, but we set it to 0 such that
+ * it is definitely not an accessible stack address. The first frame
+ * (hyp_panic()) is skipped, so we also set PC to 0.
+ */
+ bitmap_zero(frame->stacks_done, __NR_HYP_STACK_TYPES);
+ frame->pc = frame->prev_fp = 0;
+ frame->prev_type = HYP_STACK_TYPE_UNKNOWN;
+}
+
+static void hyp_dump_backtrace_entry(unsigned long hyp_pc, unsigned long hyp_offset)
+{
+ unsigned long va_mask = GENMASK_ULL(vabits_actual - 1, 0);
+
+ hyp_pc &= va_mask; /* Mask tags */
+ hyp_pc += hyp_offset;
+
+ kvm_err("[<%016lx>]\n", hyp_pc);
+}
+
+void hyp_dump_backtrace(unsigned long hyp_offset)
+{
+ struct kvm_nvhe_panic_info *panic_info = this_cpu_ptr_nvhe_sym(kvm_panic_info);
+ struct hyp_stackframe frame;
+ int frame_nr = 0;
+ int skip = 1; /* Skip the first frame: hyp_panic() */
+
+ kvm_err("nVHE HYP call trace:\n");
+
+ hyp_start_backtrace(&frame, (unsigned long)panic_info->start_fp);
+
+ do {
+ if (skip) {
+ skip--;
+ continue;
+ }
+
+ hyp_dump_backtrace_entry(frame.pc, hyp_offset);
+
+ frame_nr++;
+ } while (!hyp_unwind_frame(&frame));
+
+ kvm_err("---- end of nVHE HYP call trace ----\n");
+}
diff --git a/arch/arm64/kvm/stacktrace.h b/arch/arm64/kvm/stacktrace.h
new file mode 100644
index 000000000000..40c397394b9b
--- /dev/null
+++ b/arch/arm64/kvm/stacktrace.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Stack unwinder for EL2 nVHE hypervisor.
+ */
+
+#ifndef __KVM_HYP_STACKTRACE_H
+#define __KVM_HYP_STACKTRACE_H
+
+#ifdef CONFIG_NVHE_EL2_DEBUG
+void hyp_dump_backtrace(unsigned long hyp_offset);
+#else
+static inline void hyp_dump_backtrace(unsigned long hyp_offset)
+{
+}
+#endif /* CONFIG_NVHE_EL2_DEBUG */
+
+#endif /* __KVM_HYP_STACKTRACE_H */
--
2.35.1.473.g83b2b277ed-goog
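For readers new to AArch64 unwinding, the frame record layout that
hyp_unwind_frame() dereferences, expressed as a hypothetical struct (the
code above uses raw loads rather than defining one):

	/* Each frame record that x29 points at is two adjacent words: */
	struct frame_record {
		unsigned long fp;	/* caller's x29; read at fp_kern_va */
		unsigned long lr;	/* return address; read at
					 * fp_kern_va + 8. pc = lr - 4, since
					 * all AArch64 instructions are 4
					 * bytes. */
	};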
Maps the stack pages in the flexible private VA range and allocates
guard pages below the stack as unbacked VA space. The stack is aligned
to twice its size to aid overflow detection (implemented in a subsequent
patch in the series).
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kvm/arm.c | 32 +++++++++++++++++++++++++++++---
2 files changed, 30 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index d5b0386ef765..2e277f2ed671 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -169,6 +169,7 @@ struct kvm_nvhe_init_params {
unsigned long tcr_el2;
unsigned long tpidr_el2;
unsigned long stack_hyp_va;
+ unsigned long stack_pa;
phys_addr_t pgd_pa;
unsigned long hcr_el2;
unsigned long vttbr;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index ecc5958e27fe..7e2e680c3ffb 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1541,7 +1541,6 @@ static void cpu_prepare_hyp_mode(int cpu)
tcr |= (idmap_t0sz & GENMASK(TCR_TxSZ_WIDTH - 1, 0)) << TCR_T0SZ_OFFSET;
params->tcr_el2 = tcr;
- params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) + PAGE_SIZE);
params->pgd_pa = kvm_mmu_get_httbr();
if (is_protected_kvm_enabled())
params->hcr_el2 = HCR_HOST_NVHE_PROTECTED_FLAGS;
@@ -1990,14 +1989,41 @@ static int init_hyp_mode(void)
* Map the Hyp stack pages
*/
for_each_possible_cpu(cpu) {
+ struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
char *stack_page = (char *)per_cpu(kvm_arm_hyp_stack_page, cpu);
- err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE,
- PAGE_HYP);
+ unsigned long stack_hyp_va, guard_hyp_va;
+ /*
+ * Private mappings are allocated downwards from io_map_base
+ * so allocate the stack first then the guard page.
+ *
+ * The stack is aligned to twice its size to facilitate overflow
+ * detection.
+ */
+ err = __create_hyp_private_mapping(__pa(stack_page), PAGE_SIZE,
+ PAGE_SIZE * 2, &stack_hyp_va, PAGE_HYP);
if (err) {
kvm_err("Cannot map hyp stack\n");
goto out_err;
}
+
+ /* Allocate unbacked private VA range for stack guard page */
+ guard_hyp_va = hyp_alloc_private_va_range(PAGE_SIZE, PAGE_SIZE);
+ if (IS_ERR((void *)guard_hyp_va)) {
+ err = PTR_ERR((void *)guard_hyp_va);
+ kvm_err("Cannot allocate hyp stack guard page\n");
+ goto out_err;
+ }
+
+ /*
+ * Save the stack PA in nvhe_init_params. This will be needed to recreate
+ * the stack mapping in protected nVHE mode. __hyp_pa() won't do the right
+ * thing there, since the stack has been mapped in the flexible private
+ * VA space.
+ */
+ params->stack_pa = __pa(stack_page) + PAGE_SIZE;
+
+ params->stack_hyp_va = stack_hyp_va + PAGE_SIZE;
}
for_each_possible_cpu(cpu) {
--
2.35.1.473.g83b2b277ed-goog
On Tue, Feb 22, 2022 at 08:51:06AM -0800, Kalesh Singh wrote:
> From: Quentin Perret <[email protected]>
>
> The asm entry code in the kernel uses a trick to check if VMAP'd stacks
> have overflowed by aligning them at THREAD_SHIFT * 2 granularity and
> checking the SP's THREAD_SHIFT bit.
>
> Protected KVM will soon make use of a similar trick to detect stack
> overflows, so factor out the asm code in a re-usable macro.
>
> Signed-off-by: Quentin Perret <[email protected]>
> [Kalesh - Resolve minor conflicts]
> Signed-off-by: Kalesh Singh <[email protected]>
> ---
> arch/arm64/include/asm/assembler.h | 11 +++++++++++
> arch/arm64/kernel/entry.S | 7 +------
> 2 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index e8bd0af0141c..ad40eb0eee83 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -850,4 +850,15 @@ alternative_endif
>
> #endif /* GNU_PROPERTY_AARCH64_FEATURE_1_DEFAULT */
>
> +/*
> + * Test whether the SP has overflowed, without corrupting a GPR.
> + */
> +.macro test_sp_overflow shift, label
> + add sp, sp, x0 // sp' = sp + x0
> + sub x0, sp, x0 // x0' = sp' - x0 = (sp + x0) - x0 = sp
> + tbnz x0, #\shift, \label
> + sub x0, sp, x0 // x0'' = sp' - x0' = (sp + x0) - sp = x0
> + sub sp, sp, x0 // sp'' = sp' - x0 = (sp + x0) - x0 = sp
> +.endm
I'm a little unhappy about factoring this out, since it's not really
self-contained and leaves sp and x0 partially-swapped when it branches
to the label. You can't really make that clear with comments on the
macro, and you need comments at each use-site, so I'd rather we just
open-coded a copy of this.
> +
> #endif /* __ASM_ASSEMBLER_H */
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index 772ec2ecf488..ce99ee30c77e 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -53,15 +53,10 @@ alternative_else_nop_endif
> sub sp, sp, #PT_REGS_SIZE
> #ifdef CONFIG_VMAP_STACK
> /*
> - * Test whether the SP has overflowed, without corrupting a GPR.
> * Task and IRQ stacks are aligned so that SP & (1 << THREAD_SHIFT)
> * should always be zero.
> */
> - add sp, sp, x0 // sp' = sp + x0
> - sub x0, sp, x0 // x0' = sp' - x0 = (sp + x0) - x0 = sp
> - tbnz x0, #THREAD_SHIFT, 0f
> - sub x0, sp, x0 // x0'' = sp' - x0' = (sp + x0) - sp = x0
> - sub sp, sp, x0 // sp'' = sp' - x0 = (sp + x0) - x0 = sp
> + test_sp_overflow THREAD_SHIFT, 0f
> b el\el\ht\()_\regsize\()_\label
>
> 0:
Further to my comment above, immediately after this we have:
/* Stash the original SP (minus PT_REGS_SIZE) in tpidr_el0. */
msr tpidr_el0, x0
/* Recover the original x0 value and stash it in tpidrro_el0 */
sub x0, sp, x0
msr tpidrro_el0, x0
... which is really surprising with the `test_sp_overflow` macro because
it's not clear that it modifies x0 and sp in this way.
Thanks,
Mark.
...
> --
> 2.35.1.473.g83b2b277ed-goog
>
The hypervisor stacks (for both nVHE Hyp mode and nVHE protected mode)
are aligned to twice their size (PAGE_SIZE), meaning that any valid stack
address has its PAGE_SHIFT bit clear. This allows us to conveniently check
for overflow in the exception entry without corrupting any GPRs. We won't
recover from a stack overflow, so panic the hypervisor.
Signed-off-by: Kalesh Singh <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/host.S | 16 ++++++++++++++++
arch/arm64/kvm/hyp/nvhe/switch.c | 5 +++++
2 files changed, 21 insertions(+)
diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
index 3d613e721a75..78e4b612ac06 100644
--- a/arch/arm64/kvm/hyp/nvhe/host.S
+++ b/arch/arm64/kvm/hyp/nvhe/host.S
@@ -153,6 +153,10 @@ SYM_FUNC_END(__host_hvc)
.macro invalid_host_el2_vect
.align 7
+
+ /* Test stack overflow without corrupting GPRs */
+ test_sp_overflow PAGE_SHIFT, .L__hyp_sp_overflow\@
+
/* If a guest is loaded, panic out of it. */
stp x0, x1, [sp, #-16]!
get_loaded_vcpu x0, x1
@@ -165,6 +169,18 @@ SYM_FUNC_END(__host_hvc)
* been partially clobbered by __host_enter.
*/
b hyp_panic
+
+.L__hyp_sp_overflow\@:
+ /*
+ * Reset SP to the top of the stack, to allow handling the hyp_panic.
+ * This corrupts the stack but is ok, since we won't be attempting
+ * any unwinding here.
+ */
+ ldr_this_cpu x0, kvm_init_params + NVHE_INIT_STACK_HYP_VA, x1
+ mov sp, x0
+
+ bl hyp_panic_bad_stack
+ ASM_BUG()
.endm
.macro invalid_host_el1_vect
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 6410d21d8695..5a2e1ab79913 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -369,6 +369,11 @@ void __noreturn hyp_panic(void)
unreachable();
}
+void __noreturn hyp_panic_bad_stack(void)
+{
+ hyp_panic();
+}
+
asmlinkage void kvm_unexpected_el2_exception(void)
{
return __kvm_unexpected_el2_exception();
--
2.35.1.473.g83b2b277ed-goog
On Tue, Feb 22, 2022 at 10:55 AM Mark Rutland <[email protected]> wrote:
>
> On Tue, Feb 22, 2022 at 08:51:05AM -0800, Kalesh Singh wrote:
> > Maps the stack pages in the flexible private VA range and allocates
> > guard pages below the stack as unbacked VA space. The stack is aligned
> > to twice its size to aid overflow detection (implemented in a subsequent
> > patch in the series).
> >
> > Signed-off-by: Kalesh Singh <[email protected]>
> > ---
> > arch/arm64/kvm/hyp/nvhe/setup.c | 25 +++++++++++++++++++++----
> > 1 file changed, 21 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> > index 27af337f9fea..69df21320b09 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> > @@ -105,11 +105,28 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> > if (ret)
> > return ret;
> >
> > - end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va;
> > + /*
> > + * Private mappings are allocated upwards from __io_map_base
> > + * so allocate the guard page first then the stack.
> > + */
> > + start = (void *)pkvm_alloc_private_va_range(PAGE_SIZE, PAGE_SIZE);
> > + if (IS_ERR_OR_NULL(start))
> > + return PTR_ERR(start);
>
> As on a prior patch, this usage of PTR_ERR() pattern is wrong when the
> ptr is NULL.
Ack. I'll fix these in the next version.
Thanks,
Kalesh
>
> > + /*
> > + * The stack is aligned to twice its size to facilitate overflow
> > + * detection.
> > + */
> > + end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_pa;
> > start = end - PAGE_SIZE;
> > - ret = pkvm_create_mappings(start, end, PAGE_HYP);
> > - if (ret)
> > - return ret;
> > + start = (void *)__pkvm_create_private_mapping((phys_addr_t)start,
> > + PAGE_SIZE, PAGE_SIZE * 2, PAGE_HYP);
> > + if (IS_ERR_OR_NULL(start))
> > + return PTR_ERR(start);
>
> Likewise.
>
> Thanks,
> Mark.
>
> > + end = start + PAGE_SIZE;
> > +
> > + /* Update stack_hyp_va to end of the stack's private VA range */
> > + per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va = (unsigned long) end;
> > }
> >
> > /*
> > --
> > 2.35.1.473.g83b2b277ed-goog
> >
On Wed, 23 Feb 2022 09:05:18 +0000,
kernel test robot <[email protected]> wrote:
>
> Hi Kalesh,
>
> Thank you for the patch! Perhaps something to improve:
>
> [auto build test WARNING on cfb92440ee71adcc2105b0890bb01ac3cddb8507]
>
> url: https://github.com/0day-ci/linux/commits/Kalesh-Singh/KVM-arm64-Hypervisor-stack-enhancements/20220223-010522
> base: cfb92440ee71adcc2105b0890bb01ac3cddb8507
> config: arm64-randconfig-r011-20220221 (https://download.01.org/0day-ci/archive/20220223/[email protected]/config)
> compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d271fc04d5b97b12e6b797c6067d3c96a8d7470e)
> reproduce (this is a W=1 build):
> wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # install arm64 cross compiling tool for clang build
> # apt-get install binutils-aarch64-linux-gnu
> # https://github.com/0day-ci/linux/commit/7fe99fd40f7c4b2973218045ca5b9c9160524db1
> git remote add linux-review https://github.com/0day-ci/linux
> git fetch --no-tags linux-review Kalesh-Singh/KVM-arm64-Hypervisor-stack-enhancements/20220223-010522
> git checkout 7fe99fd40f7c4b2973218045ca5b9c9160524db1
> # save the config file to linux build tree
> mkdir build_dir
> COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash arch/arm64/
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <[email protected]>
>
> All warnings (new ones prefixed by >>):
>
> include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
> #define NULL ((void *)0)
> ^~~~~~~~~~~
> arch/arm64/kvm/hyp/nvhe/switch.c:200:27: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
> [ESR_ELx_EC_FP_ASIMD] = kvm_hyp_handle_fpsimd,
> ^~~~~~~~~~~~~~~~~~~~~
> arch/arm64/kvm/hyp/nvhe/switch.c:196:28: note: previous initialization is here
> [0 ... ESR_ELx_EC_MAX] = NULL,
> ^~~~
> include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
> #define NULL ((void *)0)
> ^~~~~~~~~~~
Kalesh, please ignore this nonsense. There may be things to improve,
but this is *NOT* one of them.
These reports are pretty useless, and just lead people to ignore real
bug reports.
M.
--
Without deviation from the norm, progress is not possible.
Hi Kalesh,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on cfb92440ee71adcc2105b0890bb01ac3cddb8507]
url: https://github.com/0day-ci/linux/commits/Kalesh-Singh/KVM-arm64-Hypervisor-stack-enhancements/20220223-010522
base: cfb92440ee71adcc2105b0890bb01ac3cddb8507
config: arm64-randconfig-r011-20220221 (https://download.01.org/0day-ci/archive/20220223/[email protected]/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d271fc04d5b97b12e6b797c6067d3c96a8d7470e)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm64 cross compiling tool for clang build
# apt-get install binutils-aarch64-linux-gnu
# https://github.com/0day-ci/linux/commit/7fe99fd40f7c4b2973218045ca5b9c9160524db1
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Kalesh-Singh/KVM-arm64-Hypervisor-stack-enhancements/20220223-010522
git checkout 7fe99fd40f7c4b2973218045ca5b9c9160524db1
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash arch/arm64/
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All warnings (new ones prefixed by >>):
include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
#define NULL ((void *)0)
^~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:200:27: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
[ESR_ELx_EC_FP_ASIMD] = kvm_hyp_handle_fpsimd,
^~~~~~~~~~~~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:196:28: note: previous initialization is here
[0 ... ESR_ELx_EC_MAX] = NULL,
^~~~
include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
#define NULL ((void *)0)
^~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:201:27: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
[ESR_ELx_EC_IABT_LOW] = kvm_hyp_handle_iabt_low,
^~~~~~~~~~~~~~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:196:28: note: previous initialization is here
[0 ... ESR_ELx_EC_MAX] = NULL,
^~~~
include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
#define NULL ((void *)0)
^~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:202:27: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
[ESR_ELx_EC_DABT_LOW] = kvm_hyp_handle_dabt_low,
^~~~~~~~~~~~~~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:196:28: note: previous initialization is here
[0 ... ESR_ELx_EC_MAX] = NULL,
^~~~
include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
#define NULL ((void *)0)
^~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:203:22: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
[ESR_ELx_EC_PAC] = kvm_hyp_handle_ptrauth,
^~~~~~~~~~~~~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:196:28: note: previous initialization is here
[0 ... ESR_ELx_EC_MAX] = NULL,
^~~~
include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
#define NULL ((void *)0)
^~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:208:24: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
[ESR_ELx_EC_SYS64] = kvm_handle_pvm_sys64,
^~~~~~~~~~~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:207:28: note: previous initialization is here
[0 ... ESR_ELx_EC_MAX] = NULL,
^~~~
include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
#define NULL ((void *)0)
^~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:209:22: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
[ESR_ELx_EC_SVE] = kvm_handle_pvm_restricted,
^~~~~~~~~~~~~~~~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:207:28: note: previous initialization is here
[0 ... ESR_ELx_EC_MAX] = NULL,
^~~~
include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
#define NULL ((void *)0)
^~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:210:27: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
[ESR_ELx_EC_FP_ASIMD] = kvm_handle_pvm_fpsimd,
^~~~~~~~~~~~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:207:28: note: previous initialization is here
[0 ... ESR_ELx_EC_MAX] = NULL,
^~~~
include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
#define NULL ((void *)0)
^~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:211:27: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
[ESR_ELx_EC_IABT_LOW] = kvm_hyp_handle_iabt_low,
^~~~~~~~~~~~~~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:207:28: note: previous initialization is here
[0 ... ESR_ELx_EC_MAX] = NULL,
^~~~
include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
#define NULL ((void *)0)
^~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:212:27: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
[ESR_ELx_EC_DABT_LOW] = kvm_hyp_handle_dabt_low,
^~~~~~~~~~~~~~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:207:28: note: previous initialization is here
[0 ... ESR_ELx_EC_MAX] = NULL,
^~~~
include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
#define NULL ((void *)0)
^~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:213:22: warning: initializer overrides prior initialization of this subobject [-Winitializer-overrides]
[ESR_ELx_EC_PAC] = kvm_hyp_handle_ptrauth,
^~~~~~~~~~~~~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:207:28: note: previous initialization is here
[0 ... ESR_ELx_EC_MAX] = NULL,
^~~~
include/linux/stddef.h:8:14: note: expanded from macro 'NULL'
#define NULL ((void *)0)
^~~~~~~~~~~
arch/arm64/kvm/hyp/nvhe/switch.c:350:17: warning: no previous prototype for function 'hyp_panic' [-Wmissing-prototypes]
void __noreturn hyp_panic(void)
^
arch/arm64/kvm/hyp/nvhe/switch.c:350:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void __noreturn hyp_panic(void)
^
static
>> arch/arm64/kvm/hyp/nvhe/switch.c:372:17: warning: no previous prototype for function 'hyp_panic_bad_stack' [-Wmissing-prototypes]
void __noreturn hyp_panic_bad_stack(void)
^
arch/arm64/kvm/hyp/nvhe/switch.c:372:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
void __noreturn hyp_panic_bad_stack(void)
^
static
arch/arm64/kvm/hyp/nvhe/switch.c:377:17: warning: no previous prototype for function 'kvm_unexpected_el2_exception' [-Wmissing-prototypes]
asmlinkage void kvm_unexpected_el2_exception(void)
^
arch/arm64/kvm/hyp/nvhe/switch.c:377:12: note: declare 'static' if the function is not intended to be used outside of this translation unit
asmlinkage void kvm_unexpected_el2_exception(void)
^
static
16 warnings generated.
vim +/hyp_panic_bad_stack +372 arch/arm64/kvm/hyp/nvhe/switch.c
371
> 372 void __noreturn hyp_panic_bad_stack(void)
373 {
374 hyp_panic();
375 }
376
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
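For context on the bulk of the warnings above: they come from a deliberate pattern, reduced below to a standalone example (the array size and names here are made up). A GNU range designator fills every slot, and specific entries then deliberately override a few of them; the code is correct, but clang's -Winitializer-overrides (enabled at W=1) fires on each override:

typedef void (*handler_t)(void);

static void handle_fpsimd(void) { }

static handler_t handlers[16] = {
	[0 ... 15] = 0,			/* default every entry to NULL */
	[3]        = handle_fpsimd,	/* intentional override: warning site */
};

int main(void)
{
	return handlers[3] == handle_fpsimd ? 0 : 1;
}

This is the same shape as the [0 ... ESR_ELx_EC_MAX] = NULL initializers in switch.c, which is why those particular warnings are noise.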
Reintroduce the __kvm_nvhe_ symbols in kallsyms, ignoring the local
symbols in this namespace. The local symbols are not informative and
can cause aliasing issues when symbolizing the addresses.
With the necessary symbols now in kallsyms we can symbolize nVHE
stacktrace addresses using the %pB print format specifier.
Some sample call traces:
-------
[ 167.018598][ T407] kvm [407]: nVHE hyp panic at: [<ffffffc0116145cc>] __kvm_nvhe_overflow_stack+0x10/0x34!
[ 167.020841][ T407] kvm [407]: nVHE HYP call trace:
[ 167.021371][ T407] kvm [407]: [<ffffffc011614934>] __kvm_nvhe_hyp_panic_bad_stack+0xc/0x10
[ 167.021972][ T407] kvm [407]: [<ffffffc01160fa48>] __kvm_nvhe___kvm_hyp_host_vector+0x248/0x794
[ 167.022572][ T407] kvm [407]: [<ffffffc0116145dc>] __kvm_nvhe_overflow_stack+0x20/0x34
[ 167.023135][ T407] kvm [407]: [<ffffffc0116145dc>] __kvm_nvhe_overflow_stack+0x20/0x34
[ 167.023699][ T407] kvm [407]: [<ffffffc0116145dc>] __kvm_nvhe_overflow_stack+0x20/0x34
[ 167.024261][ T407] kvm [407]: [<ffffffc0116145dc>] __kvm_nvhe_overflow_stack+0x20/0x34
. . .
-------
[ 166.161699][ T409] kvm [409]: Invalid host exception to nVHE hyp!
[ 166.163789][ T409] kvm [409]: nVHE HYP call trace:
[ 166.164709][ T409] kvm [409]: [<ffffffc011614fa0>] __kvm_nvhe_handle___kvm_vcpu_run+0x198/0x21c
[ 166.165352][ T409] kvm [409]: [<ffffffc011614980>] __kvm_nvhe_handle_trap+0xa4/0x124
[ 166.165911][ T409] kvm [409]: [<ffffffc01160f060>] __kvm_nvhe___host_exit+0x60/0x64
[ 166.166657][ T409] Kernel panic - not syncing: HYP panic:
. . .
-------
Signed-off-by: Kalesh Singh <[email protected]>
---
Changes in v2:
- Fix printk warnings - %p expects (void *)
arch/arm64/kvm/handle_exit.c | 13 +++++--------
arch/arm64/kvm/stacktrace.c | 2 +-
scripts/kallsyms.c | 2 +-
3 files changed, 7 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index b038c32a3236..1b953005d301 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -296,13 +296,8 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
u64 elr_in_kimg = __phys_to_kimg(elr_phys);
u64 hyp_offset = elr_in_kimg - kaslr_offset() - elr_virt;
u64 mode = spsr & PSR_MODE_MASK;
+ u64 panic_addr = elr_virt + hyp_offset;
- /*
- * The nVHE hyp symbols are not included by kallsyms to avoid issues
- * with aliasing. That means that the symbols cannot be printed with the
- * "%pS" format specifier, so fall back to the vmlinux address if
- * there's no better option.
- */
if (mode != PSR_MODE_EL2t && mode != PSR_MODE_EL2h) {
kvm_err("Invalid host exception to nVHE hyp!\n");
} else if (ESR_ELx_EC(esr) == ESR_ELx_EC_BRK64 &&
@@ -322,9 +317,11 @@ void __noreturn __cold nvhe_hyp_panic_handler(u64 esr, u64 spsr,
if (file)
kvm_err("nVHE hyp BUG at: %s:%u!\n", file, line);
else
- kvm_err("nVHE hyp BUG at: %016llx!\n", elr_virt + hyp_offset);
+ kvm_err("nVHE hyp BUG at: [<%016llx>] %pB!\n", panic_addr,
+ (void *)panic_addr);
} else {
- kvm_err("nVHE hyp panic at: %016llx!\n", elr_virt + hyp_offset);
+ kvm_err("nVHE hyp panic at: [<%016llx>] %pB!\n", panic_addr,
+ (void *)panic_addr);
}
hyp_dump_backtrace(hyp_offset);
diff --git a/arch/arm64/kvm/stacktrace.c b/arch/arm64/kvm/stacktrace.c
index cdd672bf0ea8..896c225a4a89 100644
--- a/arch/arm64/kvm/stacktrace.c
+++ b/arch/arm64/kvm/stacktrace.c
@@ -261,7 +261,7 @@ static void hyp_dump_backtrace_entry(unsigned long hyp_pc, unsigned long hyp_off
hyp_pc &= va_mask; /* Mask tags */
hyp_pc += hyp_offset;
- kvm_err("[<%016lx>]\n", hyp_pc);
+ kvm_err("[<%016lx>] %pB\n", hyp_pc, (void *)hyp_pc);
}
void hyp_dump_backtrace(unsigned long hyp_offset)
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 54ad86d13784..19aba43d9da4 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -111,7 +111,7 @@ static bool is_ignored_symbol(const char *name, char type)
".LASANPC", /* s390 kasan local symbols */
"__crc_", /* modversions */
"__efistub_", /* arm64 EFI stub namespace */
- "__kvm_nvhe_", /* arm64 non-VHE KVM namespace */
+ "__kvm_nvhe_$", /* arm64 local symbols in non-VHE KVM namespace */
"__AArch64ADRPThunk_", /* arm64 lld */
"__ARMV5PILongThunk_", /* arm lld */
"__ARMV7PILongThunk_",
--
2.35.1.473.g83b2b277ed-goog
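To illustrate the kallsyms change at the end of the diff: the nVHE objects have all of their symbols prefixed with __kvm_nvhe_, including assembler-local symbols (on arm64, e.g. the $x/$d mapping symbols), so narrowing the ignored prefix to "__kvm_nvhe_$" drops only those locals while letting the named hyp functions back into kallsyms for %pB to resolve. A reduced sketch of the effect (simplified from scripts/kallsyms.c; the symbol names are illustrative):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool ignored(const char *name)
{
	const char *prefix = "__kvm_nvhe_$";	/* was "__kvm_nvhe_" */

	return strncmp(name, prefix, strlen(prefix)) == 0;
}

int main(void)
{
	printf("%d\n", ignored("__kvm_nvhe_$x.14"));		/* 1: local, still hidden */
	printf("%d\n", ignored("__kvm_nvhe_hyp_panic"));	/* 0: now visible to %pB */
	return 0;
}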
On Wed, Feb 23, 2022 at 09:16:59AM +0000, Marc Zyngier wrote:
> On Wed, 23 Feb 2022 09:05:18 +0000,
> kernel test robot <[email protected]> wrote:
> >
> > [...]
>
> Kalesh, please ignore this nonsense. There may be things to improve,
> but this is *NOT* one of them.
>
> These reports are pretty useless, and just lead people to ignore real
> bug reports.
Hi Kalesh, sorry, there are some irrelevant issues mixed into the report;
kindly ignore them. The valuable ones are the new ones prefixed by >>,
like the one below from the original report.
>> arch/arm64/kvm/hyp/nvhe/switch.c:372:17: warning: no previous prototype for function 'hyp_panic_bad_stack' [-Wmissing-prototypes]
void __noreturn hyp_panic_bad_stack(void)
^
Thanks
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.
> _______________________________________________
> kbuild-all mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
On 2022-02-23 12:34, Philip Li wrote:
> On Wed, Feb 23, 2022 at 09:16:59AM +0000, Marc Zyngier wrote:
>> On Wed, 23 Feb 2022 09:05:18 +0000,
>> kernel test robot <[email protected]> wrote:
>> >
>> > [...]
>>
>> Kalesh, please ignore this nonsense. There may be things to improve,
>> but this is *NOT* one of them.
>>
>> These reports are pretty useless, and just lead people to ignore real
>> bug reports.
>
> Hi Kalesh, sorry, there are some irrelevant issues mixed into the report;
> kindly ignore them. The valuable ones are the new ones prefixed by >>,
> like the one below from the original report.
>
>>> arch/arm64/kvm/hyp/nvhe/switch.c:372:17: warning: no previous
>>> prototype for function 'hyp_panic_bad_stack' [-Wmissing-prototypes]
> void __noreturn hyp_panic_bad_stack(void)
> ^
This is only called from assembly code, so a prototype wouldn't bring
much.
M.
--
Jazz is not dead. It just smells funny...
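For completeness, a sketch in kernel style of how the -Wmissing-prototypes warning discussed above could be silenced, if one wanted to. The header placement is an assumption for illustration, and as noted above, for a function entered only from assembly the declaration mostly serves to quiet W=1:

/* in a shared hyp header, say asm/kvm_hyp.h (placement assumed): */
void __noreturn hyp_panic(void);
void __noreturn hyp_panic_bad_stack(void);

/* arch/arm64/kvm/hyp/nvhe/switch.c: with a previous prototype in
 * scope, clang no longer warns on the definition.
 */
void __noreturn hyp_panic_bad_stack(void)
{
	hyp_panic();
}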