A subsequent change to the page table walkers adds RCU protection for
walking stage-2 page tables. KVM uses a global lock to serialize hyp
stage-1 walks, so RCU protection provides no benefit to hyp stage-1
walkers.

Add a new helper, kvm_pgtable_hyp_walk(), for use when walking hyp
stage-1 tables. Call directly into __kvm_pgtable_walk(), as table
concatenation is not a supported feature at stage-1.

No functional change intended.

Signed-off-by: Oliver Upton <[email protected]>
---
arch/arm64/include/asm/kvm_pgtable.h | 24 ++++++++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/setup.c | 2 +-
arch/arm64/kvm/hyp/pgtable.c | 18 +++++++++++++++---
3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index a874ce0ce7b5..43b2f1882e11 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -596,6 +596,30 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size);
int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
struct kvm_pgtable_walker *walker);

+/**
+ * kvm_pgtable_hyp_walk() - Walk a hyp stage-1 page-table.
+ * @pgt: Page-table structure initialized by kvm_pgtable_hyp_init().
+ * @addr: Input address for the start of the walk.
+ * @size: Size of the range to walk.
+ * @walker: Walker callback description.
+ *
+ * The offset of @addr within a page is ignored and @size is rounded up to
+ * the next page boundary.
+ *
+ * The walker will walk the page-table entries corresponding to the input
+ * address range specified, visiting entries according to the walker flags.
+ * Invalid entries are treated as leaf entries. Leaf entries are reloaded
+ * after invoking the walker callback, allowing the walker to descend into
+ * a newly installed table.
+ *
+ * Returning a negative error code from the walker callback function will
+ * terminate the walk immediately with the same error code.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_hyp_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
+ struct kvm_pgtable_walker *walker);
+
/**
* kvm_pgtable_get_leaf() - Walk a page-table and retrieve the leaf entry
* with its level.
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 1068338d77f3..55eeb3ed1891 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -246,7 +246,7 @@ static int finalize_host_mappings(void)
struct memblock_region *reg = &hyp_memory[i];
u64 start = (u64)hyp_phys_to_virt(reg->base);

- ret = kvm_pgtable_walk(&pkvm_pgtable, start, reg->size, &walker);
+ ret = kvm_pgtable_hyp_walk(&pkvm_pgtable, start, reg->size, &walker);
if (ret)
return ret;
}
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 5bca9610d040..385fa1051b5d 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -335,6 +335,18 @@ int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
return ret;
}

+int kvm_pgtable_hyp_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
+ struct kvm_pgtable_walker *walker)
+{
+ struct kvm_pgtable_walk_data data = {
+ .walker = walker,
+ .addr = ALIGN_DOWN(addr, PAGE_SIZE),
+ .end = PAGE_ALIGN(addr + size),
+ };
+
+ return __kvm_pgtable_walk(&data, pgt->mm_ops, pgt->pgd, pgt->start_level);
+}
+
struct hyp_map_data {
u64 phys;
kvm_pte_t attr;
@@ -454,7 +466,7 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
if (ret)
return ret;

- ret = kvm_pgtable_walk(pgt, addr, size, &walker);
+ ret = kvm_pgtable_hyp_walk(pgt, addr, size, &walker);
dsb(ishst);
isb();
return ret;
@@ -512,7 +524,7 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
if (!pgt->mm_ops->page_count)
return 0;

- kvm_pgtable_walk(pgt, addr, size, &walker);
+ kvm_pgtable_hyp_walk(pgt, addr, size, &walker);
return unmapped;
}

@@ -557,7 +569,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
.flags = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
};

- WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
+ WARN_ON(kvm_pgtable_hyp_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
pgt->mm_ops->put_page(kvm_dereference_pteref(pgt->pgd, false));
pgt->pgd = NULL;
}
--
2.38.1.431.g37b22c650d-goog

On Mon, Nov 14, 2022 at 08:11:27PM +0000, Oliver Upton wrote:

[...]
> +int kvm_pgtable_hyp_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
> + struct kvm_pgtable_walker *walker);

Hmm, this feels like slightly the wrong abstraction to me -- there's nothing
hyp-specific about the problem being solved; it's just that the only user
is for hyp walks.

Could we instead rework 'struct kvm_pgtable' slightly so that the existing
'flags' field is no longer stage-2 specific and includes a KVM_PGTABLE_LOCKED
flag which could be set by kvm_pgtable_hyp_init()?

That way the top-level API remains unchanged and the existing callers will
continue to work.
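
Something like the below, perhaps? Completely untested, and the exact names
(KVM_PGTABLE_LOCKED, the generalised 'flags' plumbing) are only illustrative:

	/* kvm_pgtable.h: turn the stage-2-only flags field into a generic one */
	#define KVM_PGTABLE_LOCKED	BIT(0)

	/*
	 * pgtable.c: hyp stage-1 walks are always serialized by a global
	 * lock, so opt out of RCU protection once at init time.
	 */
	int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
				 struct kvm_pgtable_mm_ops *mm_ops)
	{
		/* ... existing pgd allocation and level setup unchanged ... */
		pgt->flags = KVM_PGTABLE_LOCKED;
		return 0;
	}

	/* walker core: only walks of unlocked tables need RCU protection */
	static bool kvm_pgtable_walk_locked(struct kvm_pgtable *pgt)
	{
		return pgt->flags & KVM_PGTABLE_LOCKED;
	}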

Cheers,

Will

Hey Will,

On Tue, Nov 15, 2022 at 01:25:34PM +0000, Will Deacon wrote:
[...]
> On Mon, Nov 14, 2022 at 08:11:27PM +0000, Oliver Upton wrote:
> > +int kvm_pgtable_hyp_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > + struct kvm_pgtable_walker *walker);
>
> Hmm, this feels like slightly the wrong abstraction to me -- there's nothing
> hyp-specific about the problem being solved, it's just that the only user
> is for hyp walks.
>
> Could we instead rework 'struct kvm_pgtable' slightly so that the existing
> 'flags' field is no-longer stage-2 specific and includes a KVM_PGTABLE_LOCKED
> flag which could be set by kvm_pgtable_hyp_init()?
>
> That way the top-level API remains unchanged and the existing callers will
> continue to work.

Thanks for the suggestion! Yeah, this should be described by the flags
instead.

We already have KVM_PGTABLE_WALK_SHARED, so I could condition the RCU
lock/unlock on that one. That would make RCU protection an explicit opt-in,
rather than requiring an opt-out where callers pass KVM_PGTABLE_LOCKED.
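
Roughly the following (untested, and the walk_begin/walk_end helper names
are just a sketch):

	/* Only walks that can race with a concurrent writer need RCU. */
	static void kvm_pgtable_walk_begin(struct kvm_pgtable_walker *walker)
	{
		if (walker->flags & KVM_PGTABLE_WALK_SHARED)
			rcu_read_lock();
	}

	static void kvm_pgtable_walk_end(struct kvm_pgtable_walker *walker)
	{
		if (walker->flags & KVM_PGTABLE_WALK_SHARED)
			rcu_read_unlock();
	}

Hyp stage-1 walkers never pass KVM_PGTABLE_WALK_SHARED, so they'd never
take the RCU read lock at all.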

Thoughts?

--
Thanks,
Oliver