Hi all,
This is the v3 of the series previously posted here:
https://lore.kernel.org/kvmarm/[email protected]/
This basically allows us to wrap the host with a stage 2 when running in
nVHE, hence paving the way for protecting guest memory from the host in
the future (among other use-cases). For more details about the
motivation and the design angle taken here, I would recommend to have a
look at the cover letter of v1, and/or to watch these presentations at
LPC [1] and KVM forum 2020 [2].
V3 includes a bunch of clean-ups and small refactorings all over the
place as well as a few new features. Specifically, this now allows us to
remove memory pages from the host stage 2 cleanly, and this series does
so for all the .hyp memory sections (which has uncovered existing bugs
upstream and in v2 of this series -- see [3] and [4]). This also now
makes good use of block mappings whenever that is possible, and has
gotten a bit more testing on real hardware (which helped uncover other
bugs [5]).
The other changes to v3 include:
- clean-ups, refactoring and extra comments all over the place (Will);
- dropped fdt hook in favor of memblock API now that the relevant
patches ([6]) are merged (Rob);
- moved the CPU feature copy stuff to __init/__initdata (Marc);
- fixed FWB support (Mate);
- rebased on v5.12-rc1.
This series depends on Will's vCPU context fix ([5]) and Marc's PMU
fixes ([7]). And here's a branch with all the goodies applied:
https://android-kvm.googlesource.com/linux qperret/host-stage2-v3
Thanks,
Quentin
[1] https://youtu.be/54q6RzS9BpQ?t=10859
[2] https://youtu.be/wY-u6n75iXc
[3] https://lore.kernel.org/kvmarm/[email protected]/
[4] https://lore.kernel.org/kvmarm/[email protected]/
[5] https://lore.kernel.org/kvmarm/[email protected]/
[6] https://lore.kernel.org/lkml/[email protected]/
[7] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pmu-undef-NV
Quentin Perret (29):
KVM: arm64: Initialize kvm_nvhe_init_params early
KVM: arm64: Avoid free_page() in page-table allocator
KVM: arm64: Factor memory allocation out of pgtable.c
KVM: arm64: Introduce a BSS section for use at Hyp
KVM: arm64: Make kvm_call_hyp() a function call at Hyp
KVM: arm64: Allow using kvm_nvhe_sym() in hyp code
KVM: arm64: Introduce an early Hyp page allocator
KVM: arm64: Stub CONFIG_DEBUG_LIST at Hyp
KVM: arm64: Introduce a Hyp buddy page allocator
KVM: arm64: Enable access to sanitized CPU features at EL2
KVM: arm64: Factor out vector address calculation
KVM: arm64: Prepare the creation of s1 mappings at EL2
KVM: arm64: Elevate hypervisor mappings creation at EL2
KVM: arm64: Use kvm_arch for stage 2 pgtable
KVM: arm64: Use kvm_arch in kvm_s2_mmu
KVM: arm64: Set host stage 2 using kvm_nvhe_init_params
KVM: arm64: Refactor kvm_arm_setup_stage2()
KVM: arm64: Refactor __load_guest_stage2()
KVM: arm64: Refactor __populate_fault_info()
KVM: arm64: Make memcache anonymous in pgtable allocator
KVM: arm64: Reserve memory for host stage 2
KVM: arm64: Sort the hypervisor memblocks
KVM: arm64: Introduce PROT_NONE mappings for stage 2
KVM: arm64: Refactor stage2_map_set_prot_attr()
KVM: arm64: Add kvm_pgtable_stage2_idmap_greedy()
KVM: arm64: Wrap the host with a stage 2
KVM: arm64: Page-align the .hyp sections
KVM: arm64: Disable PMU support in protected mode
KVM: arm64: Protect the .hyp sections from the host
Will Deacon (3):
arm64: lib: Annotate {clear,copy}_page() as position-independent
KVM: arm64: Link position-independent string routines into .hyp.text
arm64: kvm: Add standalone ticket spinlock implementation for use at
hyp
arch/arm64/include/asm/cpufeature.h | 1 +
arch/arm64/include/asm/hyp_image.h | 7 +
arch/arm64/include/asm/kvm_asm.h | 9 +
arch/arm64/include/asm/kvm_cpufeature.h | 19 ++
arch/arm64/include/asm/kvm_host.h | 19 +-
arch/arm64/include/asm/kvm_hyp.h | 8 +
arch/arm64/include/asm/kvm_mmu.h | 23 +-
arch/arm64/include/asm/kvm_pgtable.h | 117 ++++++-
arch/arm64/include/asm/sections.h | 1 +
arch/arm64/kernel/asm-offsets.c | 3 +
arch/arm64/kernel/cpufeature.c | 13 +
arch/arm64/kernel/image-vars.h | 30 ++
arch/arm64/kernel/vmlinux.lds.S | 74 +++--
arch/arm64/kvm/arm.c | 199 ++++++++++--
arch/arm64/kvm/hyp/Makefile | 2 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 37 ++-
arch/arm64/kvm/hyp/include/nvhe/early_alloc.h | 14 +
arch/arm64/kvm/hyp/include/nvhe/gfp.h | 55 ++++
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 36 +++
arch/arm64/kvm/hyp/include/nvhe/memory.h | 52 +++
arch/arm64/kvm/hyp/include/nvhe/mm.h | 92 ++++++
arch/arm64/kvm/hyp/include/nvhe/spinlock.h | 92 ++++++
arch/arm64/kvm/hyp/nvhe/Makefile | 9 +-
arch/arm64/kvm/hyp/nvhe/cache.S | 13 +
arch/arm64/kvm/hyp/nvhe/cpufeature.c | 8 +
arch/arm64/kvm/hyp/nvhe/early_alloc.c | 54 ++++
arch/arm64/kvm/hyp/nvhe/hyp-init.S | 46 ++-
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 69 ++++
arch/arm64/kvm/hyp/nvhe/hyp.lds.S | 1 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 235 ++++++++++++++
arch/arm64/kvm/hyp/nvhe/mm.c | 173 ++++++++++
arch/arm64/kvm/hyp/nvhe/page_alloc.c | 195 ++++++++++++
arch/arm64/kvm/hyp/nvhe/psci-relay.c | 4 +-
arch/arm64/kvm/hyp/nvhe/setup.c | 212 +++++++++++++
arch/arm64/kvm/hyp/nvhe/stub.c | 22 ++
arch/arm64/kvm/hyp/nvhe/switch.c | 12 +-
arch/arm64/kvm/hyp/nvhe/tlb.c | 4 +-
arch/arm64/kvm/hyp/pgtable.c | 298 ++++++++++++++----
arch/arm64/kvm/hyp/reserved_mem.c | 113 +++++++
arch/arm64/kvm/mmu.c | 115 ++++++-
arch/arm64/kvm/perf.c | 3 +-
arch/arm64/kvm/pmu.c | 8 +-
arch/arm64/kvm/reset.c | 42 +--
arch/arm64/kvm/sys_regs.c | 21 ++
arch/arm64/lib/clear_page.S | 4 +-
arch/arm64/lib/copy_page.S | 4 +-
arch/arm64/mm/init.c | 3 +
47 files changed, 2356 insertions(+), 215 deletions(-)
create mode 100644 arch/arm64/include/asm/kvm_cpufeature.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/early_alloc.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/gfp.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/memory.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/mm.h
create mode 100644 arch/arm64/kvm/hyp/include/nvhe/spinlock.h
create mode 100644 arch/arm64/kvm/hyp/nvhe/cache.S
create mode 100644 arch/arm64/kvm/hyp/nvhe/cpufeature.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/early_alloc.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/mem_protect.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/mm.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/page_alloc.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/setup.c
create mode 100644 arch/arm64/kvm/hyp/nvhe/stub.c
create mode 100644 arch/arm64/kvm/hyp/reserved_mem.c
--
2.30.1.766.gb4fecdf3b7-goog
kvm_call_hyp() has some logic to issue a function call or a hypercall
depending on the EL at which the kernel is running. However, all the
code compiled under __KVM_NVHE_HYPERVISOR__ is guaranteed to only run
at EL2 which allows us to simplify.
Add ifdefery to kvm_host.h to simplify kvm_call_hyp() in .hyp.text.
Acked-by: Will Deacon <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3d10e6527f7d..06ca4828005f 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -591,6 +591,7 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
void kvm_arm_halt_guest(struct kvm *kvm);
void kvm_arm_resume_guest(struct kvm *kvm);
+#ifndef __KVM_NVHE_HYPERVISOR__
#define kvm_call_hyp_nvhe(f, ...) \
({ \
struct arm_smccc_res res; \
@@ -630,6 +631,11 @@ void kvm_arm_resume_guest(struct kvm *kvm);
\
ret; \
})
+#else /* __KVM_NVHE_HYPERVISOR__ */
+#define kvm_call_hyp(f, ...) f(__VA_ARGS__)
+#define kvm_call_hyp_ret(f, ...) f(__VA_ARGS__)
+#define kvm_call_hyp_nvhe(f, ...) f(__VA_ARGS__)
+#endif /* __KVM_NVHE_HYPERVISOR__ */
void force_vm_exit(const cpumask_t *mask);
void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
--
2.30.1.766.gb4fecdf3b7-goog
From: Will Deacon <[email protected]>
clear_page() and copy_page() are suitable for use outside of the kernel
address space, so annotate them as position-independent code.
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/lib/clear_page.S | 4 ++--
arch/arm64/lib/copy_page.S | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/lib/clear_page.S b/arch/arm64/lib/clear_page.S
index 073acbf02a7c..b84b179edba3 100644
--- a/arch/arm64/lib/clear_page.S
+++ b/arch/arm64/lib/clear_page.S
@@ -14,7 +14,7 @@
* Parameters:
* x0 - dest
*/
-SYM_FUNC_START(clear_page)
+SYM_FUNC_START_PI(clear_page)
mrs x1, dczid_el0
and w1, w1, #0xf
mov x2, #4
@@ -25,5 +25,5 @@ SYM_FUNC_START(clear_page)
tst x0, #(PAGE_SIZE - 1)
b.ne 1b
ret
-SYM_FUNC_END(clear_page)
+SYM_FUNC_END_PI(clear_page)
EXPORT_SYMBOL(clear_page)
diff --git a/arch/arm64/lib/copy_page.S b/arch/arm64/lib/copy_page.S
index e7a793961408..29144f4cd449 100644
--- a/arch/arm64/lib/copy_page.S
+++ b/arch/arm64/lib/copy_page.S
@@ -17,7 +17,7 @@
* x0 - dest
* x1 - src
*/
-SYM_FUNC_START(copy_page)
+SYM_FUNC_START_PI(copy_page)
alternative_if ARM64_HAS_NO_HW_PREFETCH
// Prefetch three cache lines ahead.
prfm pldl1strm, [x1, #128]
@@ -75,5 +75,5 @@ alternative_else_nop_endif
stnp x16, x17, [x0, #112 - 256]
ret
-SYM_FUNC_END(copy_page)
+SYM_FUNC_END_PI(copy_page)
EXPORT_SYMBOL(copy_page)
--
2.30.1.766.gb4fecdf3b7-goog
In order to ease its re-use in other code paths, refactor
stage2_map_set_prot_attr() to not depend on a stage2_map_data struct.
No functional change intended.
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/pgtable.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 8e7059fcfd40..8aa01a9e2603 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -494,8 +494,7 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
return vtcr;
}
-static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
- struct stage2_map_data *data)
+static kvm_pte_t stage2_get_prot_attr(enum kvm_pgtable_prot prot)
{
bool device = prot & KVM_PGTABLE_PROT_DEVICE;
kvm_pte_t attr = device ? PAGE_S2_MEMATTR(DEVICE_nGnRE) :
@@ -504,15 +503,15 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
if (prot & KVM_PGTABLE_PROT_NONE) {
if (prot != KVM_PGTABLE_PROT_NONE)
- return -EINVAL;
+ return 0;
attr |= KVM_PTE_LEAF_SW_BIT_PROT_NONE;
- goto out;
+ return attr;
}
if (!(prot & KVM_PGTABLE_PROT_X))
attr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
else if (device)
- return -EINVAL;
+ return 0;
if (prot & KVM_PGTABLE_PROT_R)
attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R;
@@ -523,9 +522,7 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
-out:
- data->attr = attr;
- return 0;
+ return attr;
}
static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
@@ -708,9 +705,9 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
.arg = &map_data,
};
- ret = stage2_map_set_prot_attr(prot, &map_data);
- if (ret)
- return ret;
+ map_data.attr = stage2_get_prot_attr(prot);
+ if (!map_data.attr)
+ return -EINVAL;
ret = kvm_pgtable_walk(pgt, addr, size, &walker);
dsb(ishst);
--
2.30.1.766.gb4fecdf3b7-goog
On Tue, Mar 02, 2021 at 02:59:57PM +0000, Quentin Perret wrote:
> In order to ease its re-use in other code paths, refactor
> stage2_map_set_prot_attr() to not depend on a stage2_map_data struct.
> No functional change intended.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/kvm/hyp/pgtable.c | 19 ++++++++-----------
> 1 file changed, 8 insertions(+), 11 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 8e7059fcfd40..8aa01a9e2603 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -494,8 +494,7 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
> return vtcr;
> }
>
> -static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
> - struct stage2_map_data *data)
> +static kvm_pte_t stage2_get_prot_attr(enum kvm_pgtable_prot prot)
> {
> bool device = prot & KVM_PGTABLE_PROT_DEVICE;
> kvm_pte_t attr = device ? PAGE_S2_MEMATTR(DEVICE_nGnRE) :
> @@ -504,15 +503,15 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
>
> if (prot & KVM_PGTABLE_PROT_NONE) {
> if (prot != KVM_PGTABLE_PROT_NONE)
> - return -EINVAL;
> + return 0;
Hmm, does the architecture actually say that having all these attributes
as 0 is illegal? If not, I think it would be better to keep the int return
code and replace the 'data' parameter with a pointer to a kvm_pte_t.
Does that work?
Will
On Thursday 04 Mar 2021 at 20:03:36 (+0000), Will Deacon wrote:
> On Tue, Mar 02, 2021 at 02:59:57PM +0000, Quentin Perret wrote:
> > In order to ease its re-use in other code paths, refactor
> > stage2_map_set_prot_attr() to not depend on a stage2_map_data struct.
> > No functional change intended.
> >
> > Signed-off-by: Quentin Perret <[email protected]>
> > ---
> > arch/arm64/kvm/hyp/pgtable.c | 19 ++++++++-----------
> > 1 file changed, 8 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 8e7059fcfd40..8aa01a9e2603 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -494,8 +494,7 @@ u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
> > return vtcr;
> > }
> >
> > -static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
> > - struct stage2_map_data *data)
> > +static kvm_pte_t stage2_get_prot_attr(enum kvm_pgtable_prot prot)
> > {
> > bool device = prot & KVM_PGTABLE_PROT_DEVICE;
> > kvm_pte_t attr = device ? PAGE_S2_MEMATTR(DEVICE_nGnRE) :
> > @@ -504,15 +503,15 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
> >
> > if (prot & KVM_PGTABLE_PROT_NONE) {
> > if (prot != KVM_PGTABLE_PROT_NONE)
> > - return -EINVAL;
> > + return 0;
>
> Hmm, does the architecture actually say that having all these attributes
> as 0 is illegal?
Hmm, that's a good point, that might not be the case. I assumed we would
have no use for this, but there we can easily avoid the restriction
so...
> If not, I think it would be better to keep the int return
> code and replace the 'data' parameter with a pointer to a kvm_pte_t.
>
> Does that work?
I think so yes, I'll fix it up.
Cheers,
Quentin