2022-02-25 08:53:52

by Kalesh Singh

Subject: [PATCH v4 0/8] KVM: arm64: Hypervisor stack enhancements

Hi all,

This is v4 of the nVHE hypervisor stack enhancements.

Previous versions can be found at:
v3: https://lore.kernel.org/r/[email protected]/
v2: https://lore.kernel.org/r/[email protected]/
v1: https://lore.kernel.org/r/[email protected]/

The previous cover letter has been copied below for convenience.

Thanks,
Kalesh

-----

This series is based on 5.17-rc5 and adds the following stack features to
the KVM nVHE hypervisor:

== Hyp Stack Guard Pages ==

Based on the technique used by arm64 VMAP_STACK to detect overflow: the
stack is aligned to twice its size, which ensures that the 'stack shift'
bit of any valid SP is 0. The 'stack shift' bit can then be tested on
exception entry to detect overflow without corrupting GPRs.
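
For illustration, a minimal C sketch of the check (not code from the
series, which performs the equivalent test in the exception-entry
assembly of patch 5; hyp_sp_overflowed() is a hypothetical name): with a
PAGE_SIZE stack whose base is aligned to 2 * PAGE_SIZE, every in-bounds
SP has bit PAGE_SHIFT clear, while an SP that has run into the guard
page below the base has it set.

	/*
	 * Illustrative only: the stack occupies [base, base + PAGE_SIZE)
	 * with base aligned to 2 * PAGE_SIZE, so bit PAGE_SHIFT of any
	 * valid SP is 0. Overflowing below the base lands in the guard
	 * page, where that bit is 1.
	 */
	static inline bool hyp_sp_overflowed(unsigned long sp)
	{
		return sp & (1UL << PAGE_SHIFT);
	}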

== Hyp Stack Unwinder ==

Based on the arm64 kernel stack unwinder
(See: arch/arm64/kernel/stacktrace.c)
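
A rough sketch of the unwind loop (assumptions, not the series' code:
struct hyp_stackframe, on_hyp_stack() and hyp_dump_backtrace_entry() are
hypothetical names, and translating the hyp VA frame pointers to
host-accessible addresses is elided). It follows the usual arm64 frame
records, i.e. { fp, lr } pairs chained through the frame pointer:

	struct hyp_stackframe {
		unsigned long fp;
		unsigned long pc;
	};

	/* Step to the previous frame record; non-zero return ends the walk. */
	static int hyp_unwind_frame(struct hyp_stackframe *frame)
	{
		unsigned long fp = frame->fp;

		if (fp & 0x7)			/* frame records are 8-byte aligned */
			return -EINVAL;

		if (!on_hyp_stack(fp, 16))	/* assumed bounds check against the hyp stack */
			return -EINVAL;

		frame->fp = READ_ONCE(*(unsigned long *)fp);
		frame->pc = READ_ONCE(*(unsigned long *)(fp + 8));

		return 0;
	}

	static void hyp_walk_stackframe(struct hyp_stackframe *frame)
	{
		do {
			hyp_dump_backtrace_entry(frame->pc);	/* assumed print helper */
		} while (!hyp_unwind_frame(frame));
	}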

Unwinding and dumping of the hyp stack are not enabled by default; they
depend on CONFIG_NVHE_EL2_DEBUG to avoid potential information leaks.

When CONFIG_NVHE_EL2_DEBUG is enabled, the host stage 2 protection is
disabled, allowing the host to read the hypervisor stack pages and unwind
the stack from EL1. This allows us to print the hypervisor stacktrace
before panicking the host, as shown below.

Example call trace:

[ 98.916444][ T426] kvm [426]: nVHE hyp panic at: [<ffffffc0096156fc>] __kvm_nvhe_overflow_stack+0x8/0x34!
[ 98.918360][ T426] nVHE HYP call trace:
[ 98.918692][ T426] kvm [426]: [<ffffffc009615aac>] __kvm_nvhe_cpu_prepare_nvhe_panic_info+0x4c/0x68
[ 98.919545][ T426] kvm [426]: [<ffffffc0096159a4>] __kvm_nvhe_hyp_panic+0x2c/0xe8
[ 98.920107][ T426] kvm [426]: [<ffffffc009615ad8>] __kvm_nvhe_hyp_panic_bad_stack+0x10/0x10
[ 98.920665][ T426] kvm [426]: [<ffffffc009610a4c>] __kvm_nvhe___kvm_hyp_host_vector+0x24c/0x794
[ 98.921292][ T426] kvm [426]: [<ffffffc009615718>] __kvm_nvhe_overflow_stack+0x24/0x34
. . .

[ 98.973382][ T426] kvm [426]: [<ffffffc009615718>] __kvm_nvhe_overflow_stack+0x24/0x34
[ 98.973816][ T426] kvm [426]: [<ffffffc0096152f4>] __kvm_nvhe___kvm_vcpu_run+0x38/0x438
[ 98.974255][ T426] kvm [426]: [<ffffffc009616f80>] __kvm_nvhe_handle___kvm_vcpu_run+0x1c4/0x364
[ 98.974719][ T426] kvm [426]: [<ffffffc009616928>] __kvm_nvhe_handle_trap+0xa8/0x130
[ 98.975152][ T426] kvm [426]: [<ffffffc009610064>] __kvm_nvhe___host_exit+0x64/0x64
[ 98.975588][ T426] ---- end of nVHE HYP call trace ----


Kalesh Singh (8):
KVM: arm64: Introduce hyp_alloc_private_va_range()
KVM: arm64: Introduce pkvm_alloc_private_va_range()
KVM: arm64: Add guard pages for KVM nVHE hypervisor stack
KVM: arm64: Add guard pages for pKVM (protected nVHE) hypervisor stack
KVM: arm64: Detect and handle hypervisor stack overflows
KVM: arm64: Add hypervisor overflow stack
KVM: arm64: Unwind and dump nVHE HYP stacktrace
KVM: arm64: Symbolize the nVHE HYP backtrace

arch/arm64/include/asm/kvm_asm.h | 21 +++
arch/arm64/include/asm/kvm_mmu.h | 4 +
arch/arm64/include/asm/stacktrace.h | 12 ++
arch/arm64/kernel/stacktrace.c | 210 ++++++++++++++++++++++++---
arch/arm64/kvm/Kconfig | 5 +-
arch/arm64/kvm/arm.c | 34 ++++-
arch/arm64/kvm/handle_exit.c | 16 +-
arch/arm64/kvm/hyp/include/nvhe/mm.h | 3 +-
arch/arm64/kvm/hyp/nvhe/host.S | 29 ++++
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 5 +-
arch/arm64/kvm/hyp/nvhe/mm.c | 60 +++++---
arch/arm64/kvm/hyp/nvhe/setup.c | 25 +++-
arch/arm64/kvm/hyp/nvhe/switch.c | 30 +++-
arch/arm64/kvm/mmu.c | 65 ++++++---
scripts/kallsyms.c | 2 +-
15 files changed, 428 insertions(+), 93 deletions(-)


base-commit: cfb92440ee71adcc2105b0890bb01ac3cddb8507
--
2.35.1.574.g5d30c73bfb-goog


2022-02-25 09:23:56

by Kalesh Singh

Subject: [PATCH v4 6/8] KVM: arm64: Add hypervisor overflow stack

Allocate and switch to a 16-byte aligned secondary stack on overflow. This
provides stack space to better handle overflows, and is used in a
subsequent patch to dump the hypervisor stacktrace. The overflow stack is
only allocated if CONFIG_NVHE_EL2_DEBUG is enabled, since the hypervisor
stacktrace is a debug feature that depends on CONFIG_NVHE_EL2_DEBUG.

Signed-off-by: Kalesh Singh <[email protected]>
---

Changes in v4:
- Update comment to clarify that resetting the SP to the top of the stack
only happens if CONFIG_NVHE_EL2_DEBUG is disabled, per Fuad

arch/arm64/kvm/hyp/nvhe/host.S | 11 ++++++++---
arch/arm64/kvm/hyp/nvhe/switch.c | 5 +++++
2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
index 749961bfa5ba..2c04f3e6b3f0 100644
--- a/arch/arm64/kvm/hyp/nvhe/host.S
+++ b/arch/arm64/kvm/hyp/nvhe/host.S
@@ -179,13 +179,18 @@ SYM_FUNC_END(__host_hvc)
b hyp_panic

.L__hyp_sp_overflow\@:
+#ifdef CONFIG_NVHE_EL2_DEBUG
+ /* Switch to the overflow stack */
+ adr_this_cpu sp, hyp_overflow_stack + PAGE_SIZE, x0
+#else
/*
- * Reset SP to the top of the stack, to allow handling the hyp_panic.
- * This corrupts the stack but is ok, since we won't be attempting
- * any unwinding here.
+ * If !CONFIG_NVHE_EL2_DEBUG, reset SP to the top of the stack, to
+ * allow handling the hyp_panic. This corrupts the stack but is ok,
+ * since we won't be attempting any unwinding here.
*/
ldr_this_cpu x0, kvm_init_params + NVHE_INIT_STACK_HYP_VA, x1
mov sp, x0
+#endif

bl hyp_panic_bad_stack
ASM_BUG()
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 703a5d3f611b..efc20273a352 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -34,6 +34,11 @@ DEFINE_PER_CPU(struct kvm_host_data, kvm_host_data);
DEFINE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);
DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);

+#ifdef CONFIG_NVHE_EL2_DEBUG
+DEFINE_PER_CPU(unsigned long [PAGE_SIZE/sizeof(long)], hyp_overflow_stack)
+ __aligned(16);
+#endif
+
static void __activate_traps(struct kvm_vcpu *vcpu)
{
u64 val;
--
2.35.1.574.g5d30c73bfb-goog

2022-02-25 09:59:48

by Kalesh Singh

Subject: [PATCH v4 5/8] KVM: arm64: Detect and handle hypervisor stack overflows

The hypervisor stacks (for both nVHE Hyp mode and nVHE protected mode)
are PAGE_SIZE in size and aligned to twice that size, meaning that any
valid stack address has its PAGE_SHIFT bit set to 0. This allows us to
conveniently check for overflow on exception entry without corrupting any
GPRs. We won't recover from a stack overflow, so panic the hypervisor.

Signed-off-by: Kalesh Singh <[email protected]>
---

Changes in v3:
- Remove test_sp_overflow macro, per Mark
- Add asmlinkage attribute for hyp_panic, hyp_panic_bad_stack, per Ard

arch/arm64/kvm/hyp/nvhe/host.S | 24 ++++++++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/switch.c | 7 ++++++-
2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/host.S b/arch/arm64/kvm/hyp/nvhe/host.S
index 3d613e721a75..749961bfa5ba 100644
--- a/arch/arm64/kvm/hyp/nvhe/host.S
+++ b/arch/arm64/kvm/hyp/nvhe/host.S
@@ -153,6 +153,18 @@ SYM_FUNC_END(__host_hvc)

.macro invalid_host_el2_vect
.align 7
+
+ /*
+ * Test whether the SP has overflowed, without corrupting a GPR.
+ * nVHE hypervisor stacks are aligned so that SP & (1 << PAGE_SHIFT)
+ * should always be zero.
+ */
+ add sp, sp, x0 // sp' = sp + x0
+ sub x0, sp, x0 // x0' = sp' - x0 = (sp + x0) - x0 = sp
+ tbnz x0, #PAGE_SHIFT, .L__hyp_sp_overflow\@
+ sub x0, sp, x0 // x0'' = sp' - x0' = (sp + x0) - sp = x0
+ sub sp, sp, x0 // sp'' = sp' - x0 = (sp + x0) - x0 = sp
+
/* If a guest is loaded, panic out of it. */
stp x0, x1, [sp, #-16]!
get_loaded_vcpu x0, x1
@@ -165,6 +177,18 @@ SYM_FUNC_END(__host_hvc)
* been partially clobbered by __host_enter.
*/
b hyp_panic
+
+.L__hyp_sp_overflow\@:
+ /*
+ * Reset SP to the top of the stack, to allow handling the hyp_panic.
+ * This corrupts the stack but is ok, since we won't be attempting
+ * any unwinding here.
+ */
+ ldr_this_cpu x0, kvm_init_params + NVHE_INIT_STACK_HYP_VA, x1
+ mov sp, x0
+
+ bl hyp_panic_bad_stack
+ ASM_BUG()
.endm

.macro invalid_host_el1_vect
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 6410d21d8695..703a5d3f611b 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -347,7 +347,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
return exit_code;
}

-void __noreturn hyp_panic(void)
+asmlinkage void __noreturn hyp_panic(void)
{
u64 spsr = read_sysreg_el2(SYS_SPSR);
u64 elr = read_sysreg_el2(SYS_ELR);
@@ -369,6 +369,11 @@ void __noreturn hyp_panic(void)
unreachable();
}

+asmlinkage void __noreturn hyp_panic_bad_stack(void)
+{
+ hyp_panic();
+}
+
asmlinkage void kvm_unexpected_el2_exception(void)
{
return __kvm_unexpected_el2_exception();
--
2.35.1.574.g5d30c73bfb-goog

2022-02-25 10:31:40

by Kalesh Singh

Subject: [PATCH v4 3/8] KVM: arm64: Add guard pages for KVM nVHE hypervisor stack

Maps the stack pages in the flexible private VA range and allocates
guard pages below the stack as unbacked VA space. The stack is aligned
to twice its size to aid overflow detection (implemented in a subsequent
patch in the series).

Signed-off-by: Kalesh Singh <[email protected]>
---

Changes in v4:
- Replace IS_ERR_OR_NULL check with IS_ERR check now that
hyp_alloc_private_va_range() returns an error for null
pointer, per Fuad
- Format comments to < 80 cols, per Fuad

Changes in v3:
- Handle null ptr in IS_ERR_OR_NULL checks, per Mark

arch/arm64/include/asm/kvm_asm.h | 1 +
arch/arm64/kvm/arm.c | 32 +++++++++++++++++++++++++++++---
2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index d5b0386ef765..2e277f2ed671 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -169,6 +169,7 @@ struct kvm_nvhe_init_params {
unsigned long tcr_el2;
unsigned long tpidr_el2;
unsigned long stack_hyp_va;
+ unsigned long stack_pa;
phys_addr_t pgd_pa;
unsigned long hcr_el2;
unsigned long vttbr;
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index ecc5958e27fe..0a83c0e7f838 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1541,7 +1541,6 @@ static void cpu_prepare_hyp_mode(int cpu)
tcr |= (idmap_t0sz & GENMASK(TCR_TxSZ_WIDTH - 1, 0)) << TCR_T0SZ_OFFSET;
params->tcr_el2 = tcr;

- params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) + PAGE_SIZE);
params->pgd_pa = kvm_mmu_get_httbr();
if (is_protected_kvm_enabled())
params->hcr_el2 = HCR_HOST_NVHE_PROTECTED_FLAGS;
@@ -1990,14 +1989,41 @@ static int init_hyp_mode(void)
* Map the Hyp stack pages
*/
for_each_possible_cpu(cpu) {
+ struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
char *stack_page = (char *)per_cpu(kvm_arm_hyp_stack_page, cpu);
- err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE,
- PAGE_HYP);
+ unsigned long stack_hyp_va, guard_hyp_va;

+ /*
+ * Private mappings are allocated downwards from io_map_base
+ * so allocate the stack first then the guard page.
+ *
+ * The stack is aligned to twice its size to facilitate overflow
+ * detection.
+ */
+ err = __create_hyp_private_mapping(__pa(stack_page), PAGE_SIZE,
+ PAGE_SIZE * 2, &stack_hyp_va, PAGE_HYP);
if (err) {
kvm_err("Cannot map hyp stack\n");
goto out_err;
}
+
+ /* Allocate unbacked private VA range for stack guard page */
+ guard_hyp_va = hyp_alloc_private_va_range(PAGE_SIZE, PAGE_SIZE);
+ if (IS_ERR((void *)guard_hyp_va)) {
+ err = PTR_ERR((void *)guard_hyp_va);
+ kvm_err("Cannot allocate hyp stack guard page\n");
+ goto out_err;
+ }
+
+ /*
+ * Save the stack PA in nvhe_init_params. This will be needed
+ * to recreate the stack mapping in protected nVHE mode.
+ * __hyp_pa() won't do the right thing there, since the stack
+ * has been mapped in the flexible private VA space.
+ */
+ params->stack_pa = __pa(stack_page) + PAGE_SIZE;
+
+ params->stack_hyp_va = stack_hyp_va + PAGE_SIZE;
}

for_each_possible_cpu(cpu) {
--
2.35.1.574.g5d30c73bfb-goog

2022-02-25 11:41:08

by Kalesh Singh

Subject: [PATCH v4 4/8] KVM: arm64: Add guard pages for pKVM (protected nVHE) hypervisor stack

Maps the stack pages in the flexible private VA range and allocates
guard pages below the stack as unbacked VA space. The stack is aligned
to twice its size to aid overflow detection (implemented in a subsequent
patch in the series).

Signed-off-by: Kalesh Singh <[email protected]>
---

Changes in v4:
- Replace IS_ERR_OR_NULL check with IS_ERR check now that
pkvm_alloc_private_va_range() returns an error for null
pointer, per Fuad

Changes in v3:
- Handle null ptr in IS_ERR_OR_NULL checks, per Mark

arch/arm64/kvm/hyp/nvhe/setup.c | 25 +++++++++++++++++++++----
1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 27af337f9fea..1b69a25c1861 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -105,11 +105,28 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
if (ret)
return ret;

- end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va;
+ /*
+ * Private mappings are allocated upwards from __io_map_base
+ * so allocate the guard page first then the stack.
+ */
+ start = (void *)pkvm_alloc_private_va_range(PAGE_SIZE, PAGE_SIZE);
+ if (IS_ERR(start))
+ return PTR_ERR(start);
+
+ /*
+ * The stack is aligned to twice its size to facilitate overflow
+ * detection.
+ */
+ end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_pa;
start = end - PAGE_SIZE;
- ret = pkvm_create_mappings(start, end, PAGE_HYP);
- if (ret)
- return ret;
+ start = (void *)__pkvm_create_private_mapping((phys_addr_t)start,
+ PAGE_SIZE, PAGE_SIZE * 2, PAGE_HYP);
+ if (IS_ERR(start))
+ return PTR_ERR(start);
+ end = start + PAGE_SIZE;
+
+ /* Update stack_hyp_va to end of the stack's private VA range */
+ per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va = (unsigned long) end;
}

/*
--
2.35.1.574.g5d30c73bfb-goog

2022-03-02 11:16:00

by Marc Zyngier

Subject: Re: [PATCH v4 4/8] KVM: arm64: Add guard pages for pKVM (protected nVHE) hypervisor stack

On Fri, 25 Feb 2022 03:34:49 +0000,
Kalesh Singh <[email protected]> wrote:
>
> Maps the stack pages in the flexible private VA range and allocates
> guard pages below the stack as unbacked VA space. The stack is aligned
> to twice its size to aid overflow detection (implemented in a subsequent
> patch in the series).
>
> Signed-off-by: Kalesh Singh <[email protected]>
> ---
>
> Changes in v4:
> - Replace IS_ERR_OR_NULL check with IS_ERR check now that
> pkvm_alloc_private_va_range() returns an error for null
> pointer, per Fuad
>
> Changes in v3:
> - Handle null ptr in IS_ERR_OR_NULL checks, per Mark
>
> arch/arm64/kvm/hyp/nvhe/setup.c | 25 +++++++++++++++++++++----
> 1 file changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 27af337f9fea..1b69a25c1861 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -105,11 +105,28 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> if (ret)
> return ret;
>
> - end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va;
> + /*
> + * Private mappings are allocated upwards from __io_map_base
> + * so allocate the guard page first then the stack.
> + */
> + start = (void *)pkvm_alloc_private_va_range(PAGE_SIZE, PAGE_SIZE);
> + if (IS_ERR(start))
> + return PTR_ERR(start);
> +
> + /*
> + * The stack is aligned to twice its size to facilitate overflow
> + * detection.
> + */
> + end = (void *)per_cpu_ptr(&kvm_init_params, i)->stack_pa;
> start = end - PAGE_SIZE;
> - ret = pkvm_create_mappings(start, end, PAGE_HYP);
> - if (ret)
> - return ret;
> + start = (void *)__pkvm_create_private_mapping((phys_addr_t)start,
> + PAGE_SIZE, PAGE_SIZE * 2, PAGE_HYP);

Similar comments as the previous patch. I'd rather you treat each
stack as a two-page VA, populated by a single page. It would be a lot
clearer, and less fragile.
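
A rough sketch of that approach for setup.c (illustrative only;
pkvm_map_page_at() is a hypothetical helper that maps one page at a
fixed private VA, not an existing API, and error handling is
abbreviated):

	/*
	 * Reserve a 2-page, 2-page-aligned private VA window per CPU and
	 * back only its upper page; the lower page stays unmapped as the
	 * guard. pkvm_map_page_at() is assumed/hypothetical.
	 */
	unsigned long va = pkvm_alloc_private_va_range(PAGE_SIZE * 2, PAGE_SIZE * 2);
	phys_addr_t stack_pa = per_cpu_ptr(&kvm_init_params, i)->stack_pa - PAGE_SIZE;

	if (IS_ERR((void *)va))
		return PTR_ERR((void *)va);

	ret = pkvm_map_page_at(va + PAGE_SIZE, stack_pa, PAGE_HYP);
	if (ret)
		return ret;

	/* Top of the stack is the end of the 2-page VA window. */
	per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va = va + PAGE_SIZE * 2;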

> + if (IS_ERR(start))
> + return PTR_ERR(start);
> + end = start + PAGE_SIZE;
> +
> + /* Update stack_hyp_va to end of the stack's private VA range */
> + per_cpu_ptr(&kvm_init_params, i)->stack_hyp_va = (unsigned long) end;
> }
>
> /*

Thanks,

M.

--
Without deviation from the norm, progress is not possible.

2022-03-02 18:21:04

by Kalesh Singh

Subject: Re: [PATCH v4 3/8] KVM: arm64: Add guard pages for KVM nVHE hypervisor stack

On Tue, Mar 1, 2022 at 11:53 PM Marc Zyngier <[email protected]> wrote:
>
> On Fri, 25 Feb 2022 03:34:48 +0000,
> Kalesh Singh <[email protected]> wrote:
> >
> > Maps the stack pages in the flexible private VA range and allocates
> > guard pages below the stack as unbacked VA space. The stack is aligned
> > to twice its size to aid overflow detection (implemented in a subsequent
> > patch in the series).
> >
> > Signed-off-by: Kalesh Singh <[email protected]>
> > ---
> >
> > Changes in v4:
> > - Replace IS_ERR_OR_NULL check with IS_ERR check now that
> > hyp_alloc_private_va_range() returns an error for null
> > pointer, per Fuad
> > - Format comments to < 80 cols, per Fuad
> >
> > Changes in v3:
> > - Handle null ptr in IS_ERR_OR_NULL checks, per Mark
> >
> > arch/arm64/include/asm/kvm_asm.h | 1 +
> > arch/arm64/kvm/arm.c | 32 +++++++++++++++++++++++++++++---
> > 2 files changed, 30 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> > index d5b0386ef765..2e277f2ed671 100644
> > --- a/arch/arm64/include/asm/kvm_asm.h
> > +++ b/arch/arm64/include/asm/kvm_asm.h
> > @@ -169,6 +169,7 @@ struct kvm_nvhe_init_params {
> > unsigned long tcr_el2;
> > unsigned long tpidr_el2;
> > unsigned long stack_hyp_va;
> > + unsigned long stack_pa;
> > phys_addr_t pgd_pa;
> > unsigned long hcr_el2;
> > unsigned long vttbr;
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index ecc5958e27fe..0a83c0e7f838 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -1541,7 +1541,6 @@ static void cpu_prepare_hyp_mode(int cpu)
> > tcr |= (idmap_t0sz & GENMASK(TCR_TxSZ_WIDTH - 1, 0)) << TCR_T0SZ_OFFSET;
> > params->tcr_el2 = tcr;
> >
> > - params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) + PAGE_SIZE);
> > params->pgd_pa = kvm_mmu_get_httbr();
> > if (is_protected_kvm_enabled())
> > params->hcr_el2 = HCR_HOST_NVHE_PROTECTED_FLAGS;
> > @@ -1990,14 +1989,41 @@ static int init_hyp_mode(void)
> > * Map the Hyp stack pages
> > */
> > for_each_possible_cpu(cpu) {
> > + struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
> > char *stack_page = (char *)per_cpu(kvm_arm_hyp_stack_page, cpu);
> > - err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE,
> > - PAGE_HYP);
> > + unsigned long stack_hyp_va, guard_hyp_va;
> >
> > + /*
> > + * Private mappings are allocated downwards from io_map_base
> > + * so allocate the stack first then the guard page.
> > + *
> > + * The stack is aligned to twice its size to facilitate overflow
> > + * detection.
> > + */
> > + err = __create_hyp_private_mapping(__pa(stack_page), PAGE_SIZE,
> > + PAGE_SIZE * 2, &stack_hyp_va, PAGE_HYP);
>
> Right, I guess that's where my earlier ask breaks, as you want an
> alignment that is *larger* than the allocation.
>
> > if (err) {
> > kvm_err("Cannot map hyp stack\n");
> > goto out_err;
> > }
> > +
> > + /* Allocate unbacked private VA range for stack guard page */
> > + guard_hyp_va = hyp_alloc_private_va_range(PAGE_SIZE, PAGE_SIZE);
>
> Huh. You are implicitly relying on the VA allocator handing you an
> address contiguous with the previous mapping. That's... brave. I'd
> rather you allocate the VA space upfront with the correct alignment
> and then map the single page where it should be in the VA region.
>
> That'd be a lot less fragile.

Agreed. I'll fix it in the next version.

Thanks,
Kalesh
>
> > + if (IS_ERR((void *)guard_hyp_va)) {
> > + err = PTR_ERR((void *)guard_hyp_va);
> > + kvm_err("Cannot allocate hyp stack guard page\n");
> > + goto out_err;
> > + }
> > +
> > + /*
> > + * Save the stack PA in nvhe_init_params. This will be needed
> > + * to recreate the stack mapping in protected nVHE mode.
> > + * __hyp_pa() won't do the right thing there, since the stack
> > + * has been mapped in the flexible private VA space.
> > + */
> > + params->stack_pa = __pa(stack_page) + PAGE_SIZE;
> > +
> > + params->stack_hyp_va = stack_hyp_va + PAGE_SIZE;
> > }
> >
> > for_each_possible_cpu(cpu) {
>
> Thanks,
>
> M.
>
> --
> Without deviation from the norm, progress is not possible.

2022-03-02 23:07:05

by Marc Zyngier

Subject: Re: [PATCH v4 3/8] KVM: arm64: Add guard pages for KVM nVHE hypervisor stack

On Fri, 25 Feb 2022 03:34:48 +0000,
Kalesh Singh <[email protected]> wrote:
>
> Maps the stack pages in the flexible private VA range and allocates
> guard pages below the stack as unbacked VA space. The stack is aligned
> to twice its size to aid overflow detection (implemented in a subsequent
> patch in the series).
>
> Signed-off-by: Kalesh Singh <[email protected]>
> ---
>
> Changes in v4:
> - Replace IS_ERR_OR_NULL check with IS_ERR check now that
> hyp_alloc_private_va_range() returns an error for null
> pointer, per Fuad
> - Format comments to < 80 cols, per Fuad
>
> Changes in v3:
> - Handle null ptr in IS_ERR_OR_NULL checks, per Mark
>
> arch/arm64/include/asm/kvm_asm.h | 1 +
> arch/arm64/kvm/arm.c | 32 +++++++++++++++++++++++++++++---
> 2 files changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index d5b0386ef765..2e277f2ed671 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -169,6 +169,7 @@ struct kvm_nvhe_init_params {
> unsigned long tcr_el2;
> unsigned long tpidr_el2;
> unsigned long stack_hyp_va;
> + unsigned long stack_pa;
> phys_addr_t pgd_pa;
> unsigned long hcr_el2;
> unsigned long vttbr;
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index ecc5958e27fe..0a83c0e7f838 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1541,7 +1541,6 @@ static void cpu_prepare_hyp_mode(int cpu)
> tcr |= (idmap_t0sz & GENMASK(TCR_TxSZ_WIDTH - 1, 0)) << TCR_T0SZ_OFFSET;
> params->tcr_el2 = tcr;
>
> - params->stack_hyp_va = kern_hyp_va(per_cpu(kvm_arm_hyp_stack_page, cpu) + PAGE_SIZE);
> params->pgd_pa = kvm_mmu_get_httbr();
> if (is_protected_kvm_enabled())
> params->hcr_el2 = HCR_HOST_NVHE_PROTECTED_FLAGS;
> @@ -1990,14 +1989,41 @@ static int init_hyp_mode(void)
> * Map the Hyp stack pages
> */
> for_each_possible_cpu(cpu) {
> + struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
> char *stack_page = (char *)per_cpu(kvm_arm_hyp_stack_page, cpu);
> - err = create_hyp_mappings(stack_page, stack_page + PAGE_SIZE,
> - PAGE_HYP);
> + unsigned long stack_hyp_va, guard_hyp_va;
>
> + /*
> + * Private mappings are allocated downwards from io_map_base
> + * so allocate the stack first then the guard page.
> + *
> + * The stack is aligned to twice its size to facilitate overflow
> + * detection.
> + */
> + err = __create_hyp_private_mapping(__pa(stack_page), PAGE_SIZE,
> + PAGE_SIZE * 2, &stack_hyp_va, PAGE_HYP);

Right, I guess that's where my earlier ask breaks, as you want an
alignment that is *larger* than the allocation.

> if (err) {
> kvm_err("Cannot map hyp stack\n");
> goto out_err;
> }
> +
> + /* Allocate unbacked private VA range for stack guard page */
> + guard_hyp_va = hyp_alloc_private_va_range(PAGE_SIZE, PAGE_SIZE);

Huh. You are implicitly relying on the VA allocator handing you an
address contiguous with the previous mapping. That's... brave. I'd
rather you allocate the VA space upfront with the correct alignment
and then map the single page where it should be in the VA region.

That'd be a lot less fragile.
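
For the host side in arm.c, the same idea could look roughly like this
(illustrative only; __create_hyp_mapping_at() is a hypothetical helper
that maps the stack page at a fixed hyp private VA, not an existing
API):

	/* Reserve the whole 2-page VA window up front, aligned to its size. */
	unsigned long va = hyp_alloc_private_va_range(PAGE_SIZE * 2, PAGE_SIZE * 2);

	if (IS_ERR((void *)va)) {
		err = PTR_ERR((void *)va);
		goto out_err;
	}

	/* Back only the upper page; the lower page remains the guard. */
	err = __create_hyp_mapping_at(va + PAGE_SIZE, __pa(stack_page),
				      PAGE_SIZE, PAGE_HYP);
	if (err)
		goto out_err;

	params->stack_hyp_va = va + PAGE_SIZE * 2;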

> + if (IS_ERR((void *)guard_hyp_va)) {
> + err = PTR_ERR((void *)guard_hyp_va);
> + kvm_err("Cannot allocate hyp stack guard page\n");
> + goto out_err;
> + }
> +
> + /*
> + * Save the stack PA in nvhe_init_params. This will be needed
> + * to recreate the stack mapping in protected nVHE mode.
> + * __hyp_pa() won't do the right thing there, since the stack
> + * has been mapped in the flexible private VA space.
> + */
> + params->stack_pa = __pa(stack_page) + PAGE_SIZE;
> +
> + params->stack_hyp_va = stack_hyp_va + PAGE_SIZE;
> }
>
> for_each_possible_cpu(cpu) {

Thanks,

M.

--
Without deviation from the norm, progress is not possible.