2021-07-29 13:29:24

by Quentin Perret

Subject: [PATCH v3 00/21] Track shared pages at EL2 in protected mode

Hi all,

This is v3 of the patch series previously posted here:

https://lore.kernel.org/kvmarm/[email protected]/

This series aims to improve how the nVHE hypervisor tracks ownership of
memory pages when running in protected mode ("kvm-arm.mode=protected" on
the kernel command line).

The main issue with the existing ownership tracking code is that it is
completely binary: a page is either owned by an entity (e.g. the host)
or not. However, we'll need something smarter to track shared pages,
which are needed for virtio, or even just for host/hypervisor
communication.

This series introduces a few changes to the kvm page-table library to
allow annotating shared pages in ignored bits (a.k.a. software bits) of
leaf entries, and makes use of that infrastructure to track all pages
that are shared between the host and the hypervisor. We will obviously
want to apply the same treatment to guest stage-2 page-tables, but that
is not really possible to do until EL2 manages them directly, so I'll
keep that for another series.
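As a rough illustration of the idea (names and helpers here are invented for the sketch; the series' real API manipulates these bits through the kvm_pgtable_prot enum), annotating a page amounts to masking a software value into bits [58:55] of a leaf descriptor, which the MMU ignores:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the annotation scheme. Bits [58:55] of a page
 * or block descriptor are ignored by the MMU, so the hypervisor is free
 * to store a per-page software state in them. */
#define PTE_SW_SHIFT 55
#define PTE_SW_MASK  (0xfULL << PTE_SW_SHIFT)

/* Replace the software-bit field of a PTE with 'sw' (low 4 bits used). */
static uint64_t pte_set_sw(uint64_t pte, uint64_t sw)
{
	return (pte & ~PTE_SW_MASK) | ((sw << PTE_SW_SHIFT) & PTE_SW_MASK);
}

/* Read back the software-bit field of a PTE. */
static uint64_t pte_get_sw(uint64_t pte)
{
	return (pte & PTE_SW_MASK) >> PTE_SW_SHIFT;
}
```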

The series is based on the kvmarm/fixes branch, and has been tested on
AML-S905X-CC (Le Potato) and using various Qemu configurations.

Changes since v2:
- Renamed and refactored the find_range() path for host memory aborts;
- Added hyp_assert_lock_held() using Will's hyp_spin_is_locked()
helper, and sprinkled a few of them throughout the series;
- Changed how host stage-2 mappings are adjusted after __pkvm_init() by
walking the hyp stage-1 instead of relying on the host calling
__pkvm_mark_hyp.

Changes since v1:
- Changed the 'share' hypercall to accept a single page at a time;
- Dropped the patch allowing to continue stage-2 map when hitting the
EAGAIN case;
- Dropped some of the custom pgtable walkers and used Marc's get_leaf()
patch instead;
- Changed pgtable API to manipulate SW bits directly rather than
specifying shared pages;
- Added comments and documentation all over;
- Cleanups and small refactoring.

Thanks,
Quentin

Marc Zyngier (1):
KVM: arm64: Introduce helper to retrieve a PTE and its level

Quentin Perret (19):
KVM: arm64: Introduce hyp_assert_lock_held()
KVM: arm64: Provide the host_stage2_try() helper macro
KVM: arm64: Expose page-table helpers
KVM: arm64: Optimize host memory aborts
KVM: arm64: Rename KVM_PTE_LEAF_ATTR_S2_IGNORED
KVM: arm64: Don't overwrite software bits with owner id
KVM: arm64: Tolerate re-creating hyp mappings to set software bits
KVM: arm64: Enable forcing page-level stage-2 mappings
KVM: arm64: Allow populating software bits
KVM: arm64: Add helpers to tag shared pages in SW bits
KVM: arm64: Expose host stage-2 manipulation helpers
KVM: arm64: Expose pkvm_hyp_id
KVM: arm64: Introduce addr_is_memory()
KVM: arm64: Enable retrieving protections attributes of PTEs
KVM: arm64: Mark host bss and rodata section as shared
KVM: arm64: Remove __pkvm_mark_hyp
KVM: arm64: Refactor protected nVHE stage-1 locking
KVM: arm64: Restrict EL2 stage-1 changes in protected mode
KVM: arm64: Make __pkvm_create_mappings static

Will Deacon (1):
KVM: arm64: Add hyp_spin_is_locked() for basic locking assertions at
EL2

arch/arm64/include/asm/kvm_asm.h | 5 +-
arch/arm64/include/asm/kvm_pgtable.h | 166 ++++++++----
arch/arm64/kvm/Kconfig | 9 +
arch/arm64/kvm/arm.c | 46 ----
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 33 ++-
arch/arm64/kvm/hyp/include/nvhe/mm.h | 3 +-
arch/arm64/kvm/hyp/include/nvhe/spinlock.h | 25 ++
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 20 +-
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 221 ++++++++++++++--
arch/arm64/kvm/hyp/nvhe/mm.c | 22 +-
arch/arm64/kvm/hyp/nvhe/setup.c | 82 +++++-
arch/arm64/kvm/hyp/pgtable.c | 247 +++++++++---------
arch/arm64/kvm/mmu.c | 28 +-
13 files changed, 625 insertions(+), 282 deletions(-)

--
2.32.0.432.gabb21c7263-goog



2021-07-29 13:29:33

by Quentin Perret

Subject: [PATCH v3 01/21] KVM: arm64: Add hyp_spin_is_locked() for basic locking assertions at EL2

From: Will Deacon <[email protected]>

Introduce hyp_spin_is_locked() so that functions can easily assert that
a given lock is held (albeit possibly by another CPU!) without having to
drag full lockdep support up to EL2.

Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/spinlock.h | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
index 76b537f8d1c6..04f65b655fcf 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
@@ -15,6 +15,7 @@

#include <asm/alternative.h>
#include <asm/lse.h>
+#include <asm/rwonce.h>

typedef union hyp_spinlock {
u32 __val;
@@ -89,4 +90,11 @@ static inline void hyp_spin_unlock(hyp_spinlock_t *lock)
: "memory");
}

+static inline bool hyp_spin_is_locked(hyp_spinlock_t *lock)
+{
+ hyp_spinlock_t lockval = READ_ONCE(*lock);
+
+ return lockval.owner != lockval.next;
+}
+
#endif /* __ARM64_KVM_NVHE_SPINLOCK_H__ */
--
2.32.0.432.gabb21c7263-goog
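For reference, the check boils down to comparing the two halves of the ticket lock: it is held whenever the owner and next tickets differ. A userspace model of the idea (a sketch only; the real lock lives in arch/arm64/kvm/hyp/include/nvhe/spinlock.h and snapshots the lock with READ_ONCE()):

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the nVHE ticket spinlock, for illustration only. */
typedef union {
	uint32_t __val;
	struct {
		uint16_t owner;
		uint16_t next;
	};
} hyp_spinlock_t;

static int hyp_spin_is_locked(const hyp_spinlock_t *lock)
{
	hyp_spinlock_t v = *lock; /* one snapshot, as READ_ONCE() does */

	return v.owner != v.next;
}
```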


2021-07-29 13:29:40

by Quentin Perret

Subject: [PATCH v3 03/21] KVM: arm64: Provide the host_stage2_try() helper macro

We currently unmap all MMIO mappings from the host stage-2 to recycle
the pages whenever we run out of them. In order to make this pattern
easy to re-use from other places, factor the logic out into a dedicated
macro. While at it, apply the macro to the kvm_pgtable_stage2_set_owner()
calls. They're currently only called early on and are guaranteed to
succeed, but making them robust to the -ENOMEM case doesn't hurt and
will avoid painful debugging sessions later on.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 40 +++++++++++++++------------
1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index d938ce95d3bd..74280a753efb 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -208,6 +208,25 @@ static inline int __host_stage2_idmap(u64 start, u64 end,
prot, &host_s2_pool);
}

+/*
+ * The pool has been provided with enough pages to cover all of memory with
+ * page granularity, but it is difficult to know how much of the MMIO range
+ * we will need to cover upfront, so we may need to 'recycle' the pages if we
+ * run out.
+ */
+#define host_stage2_try(fn, ...) \
+ ({ \
+ int __ret; \
+ hyp_assert_lock_held(&host_kvm.lock); \
+ __ret = fn(__VA_ARGS__); \
+ if (__ret == -ENOMEM) { \
+ __ret = host_stage2_unmap_dev_all(); \
+ if (!__ret) \
+ __ret = fn(__VA_ARGS__); \
+ } \
+ __ret; \
+ })
+
static int host_stage2_idmap(u64 addr)
{
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
@@ -223,22 +242,7 @@ static int host_stage2_idmap(u64 addr)
if (ret)
goto unlock;

- ret = __host_stage2_idmap(range.start, range.end, prot);
- if (ret != -ENOMEM)
- goto unlock;
-
- /*
- * The pool has been provided with enough pages to cover all of memory
- * with page granularity, but it is difficult to know how much of the
- * MMIO range we will need to cover upfront, so we may need to 'recycle'
- * the pages if we run out.
- */
- ret = host_stage2_unmap_dev_all();
- if (ret)
- goto unlock;
-
- ret = __host_stage2_idmap(range.start, range.end, prot);
-
+ ret = host_stage2_try(__host_stage2_idmap, range.start, range.end, prot);
unlock:
hyp_spin_unlock(&host_kvm.lock);

@@ -257,8 +261,8 @@ int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end)
return -EINVAL;

hyp_spin_lock(&host_kvm.lock);
- ret = kvm_pgtable_stage2_set_owner(&host_kvm.pgt, start, end - start,
- &host_s2_pool, pkvm_hyp_id);
+ ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_kvm.pgt,
+ start, end - start, &host_s2_pool, pkvm_hyp_id);
hyp_spin_unlock(&host_kvm.lock);

return ret != -EAGAIN ? ret : 0;
--
2.32.0.432.gabb21c7263-goog
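The retry pattern the macro implements can be modelled in isolation as follows (fake_map(), fake_unmap_dev_all() and the page accounting are invented for the sketch; the real macro also asserts that host_kvm.lock is held):

```c
#include <assert.h>
#include <errno.h>

/* Toy page pool standing in for host_s2_pool. */
static int pool_pages;

static int fake_map(int need)
{
	if (pool_pages < need)
		return -ENOMEM;
	pool_pages -= need;
	return 0;
}

static int fake_unmap_dev_all(void)
{
	pool_pages += 8; /* pretend recycling frees 8 pages */
	return 0;
}

/* Call fn; on -ENOMEM, recycle the MMIO pages and retry exactly once. */
#define host_stage2_try(fn, ...)			\
({							\
	int __ret = fn(__VA_ARGS__);			\
	if (__ret == -ENOMEM) {				\
		__ret = fake_unmap_dev_all();		\
		if (!__ret)				\
			__ret = fn(__VA_ARGS__);	\
	}						\
	__ret;						\
})
```

The macro uses a GCC statement expression, as the kernel original does, so it evaluates to the final return code.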


2021-07-29 13:29:41

by Quentin Perret

Subject: [PATCH v3 04/21] KVM: arm64: Introduce helper to retrieve a PTE and its level

From: Marc Zyngier <[email protected]>

It is becoming a common need to fetch the PTE for a given address
together with its level. Add such a helper.

Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_pgtable.h | 19 ++++++++++++++
arch/arm64/kvm/hyp/pgtable.c | 39 ++++++++++++++++++++++++++++
2 files changed, 58 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index f004c0115d89..082b9d65f40b 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -432,6 +432,25 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size);
int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
struct kvm_pgtable_walker *walker);

+/**
+ * kvm_pgtable_get_leaf() - Walk a page-table and retrieve the leaf entry
+ * with its level.
+ * @pgt: Page-table structure initialised by kvm_pgtable_*_init().
+ * @addr: Input address for the start of the walk.
+ * @ptep: Pointer to storage for the retrieved PTE.
+ * @level: Pointer to storage for the level of the retrieved PTE.
+ *
+ * The offset of @addr within a page is ignored.
+ *
+ * The walker will walk the page-table entries corresponding to the input
+ * address specified, retrieving the leaf corresponding to this address.
+ * Invalid entries are treated as leaf entries.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
+ kvm_pte_t *ptep, u32 *level);
+
/**
* kvm_pgtable_stage2_find_range() - Find a range of Intermediate Physical
* Addresses with compatible permission
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 05321f4165e3..78f36bd5df6c 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -326,6 +326,45 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
return _kvm_pgtable_walk(&walk_data);
}

+struct leaf_walk_data {
+ kvm_pte_t pte;
+ u32 level;
+};
+
+static int leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+ enum kvm_pgtable_walk_flags flag, void * const arg)
+{
+ struct leaf_walk_data *data = arg;
+
+ data->pte = *ptep;
+ data->level = level;
+
+ return 0;
+}
+
+int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
+ kvm_pte_t *ptep, u32 *level)
+{
+ struct leaf_walk_data data;
+ struct kvm_pgtable_walker walker = {
+ .cb = leaf_walker,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ .arg = &data,
+ };
+ int ret;
+
+ ret = kvm_pgtable_walk(pgt, ALIGN_DOWN(addr, PAGE_SIZE),
+ PAGE_SIZE, &walker);
+ if (!ret) {
+ if (ptep)
+ *ptep = data.pte;
+ if (level)
+ *level = data.level;
+ }
+
+ return ret;
+}
+
struct hyp_map_data {
u64 phys;
kvm_pte_t attr;
--
2.32.0.432.gabb21c7263-goog
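To make the contract concrete, here is a toy two-level walk with the same semantics as kvm_pgtable_get_leaf(): report the entry mapping the address together with its level, treating invalid entries as leaves. Everything here (table layout, index extraction) is invented for illustration and bears no relation to the real arm64 format:

```c
#include <assert.h>
#include <stdint.h>

#define VALID 1ULL
#define TABLE 2ULL

/* Two tiny levels of 4 entries each. */
static uint64_t l0[4], l1[4];

/* Walk to the leaf covering addr; invalid entries are leaves too. */
static int get_leaf(uint64_t addr, uint64_t *ptep, uint32_t *level)
{
	uint64_t idx0 = (addr >> 4) & 3, idx1 = (addr >> 2) & 3;
	uint64_t pte = l0[idx0];

	*level = 0;
	if ((pte & (VALID | TABLE)) == (VALID | TABLE)) {
		pte = l1[idx1];
		*level = 1;
	}
	*ptep = pte; /* may be invalid: callers check VALID themselves */
	return 0;
}
```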


2021-07-29 13:29:55

by Quentin Perret

Subject: [PATCH v3 06/21] KVM: arm64: Optimize host memory aborts

The kvm_pgtable_stage2_find_range() function is used in the host memory
abort path to try and look for the largest block mapping that can be
used to map the faulting address. In order to do so, the function
currently walks the stage-2 page-table and looks for existing
incompatible mappings within the range of the largest possible block.
If incompatible mappings are found, it tries the same procedure again,
but using a smaller block range, and repeats until a matching range is
found (potentially down to page granularity). While this approach has
benefits (mostly in the fact that it proactively coalesces host stage-2
mappings), it can be slow if the ranges are fragmented, and it isn't
optimized to deal with multiple CPUs faulting on the same IPA, as each
of them will redo all the work every time.

To avoid these issues, remove kvm_pgtable_stage2_find_range(), and walk
the page-table only once in the host_mem_abort() path to find the
closest leaf to the input address. With this, use the corresponding
range if the leaf is invalid and not owned by another entity. If a valid
leaf is found, return -EAGAIN, similar to what is done in the
kvm_pgtable_stage2_map() path, to optimize concurrent faults.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_pgtable.h | 30 -----------
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 45 +++++++++++++++-
arch/arm64/kvm/hyp/pgtable.c | 74 ---------------------------
3 files changed, 44 insertions(+), 105 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 6938eac72c1f..83c5c97d9eac 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -154,16 +154,6 @@ enum kvm_pgtable_prot {
#define PAGE_HYP_RO (KVM_PGTABLE_PROT_R)
#define PAGE_HYP_DEVICE (PAGE_HYP | KVM_PGTABLE_PROT_DEVICE)

-/**
- * struct kvm_mem_range - Range of Intermediate Physical Addresses
- * @start: Start of the range.
- * @end: End of the range.
- */
-struct kvm_mem_range {
- u64 start;
- u64 end;
-};
-
/**
* enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table walk.
* @KVM_PGTABLE_WALK_LEAF: Visit leaf entries, including invalid
@@ -490,24 +480,4 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
*/
int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
kvm_pte_t *ptep, u32 *level);
-
-/**
- * kvm_pgtable_stage2_find_range() - Find a range of Intermediate Physical
- * Addresses with compatible permission
- * attributes.
- * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
- * @addr: Address that must be covered by the range.
- * @prot: Protection attributes that the range must be compatible with.
- * @range: Range structure used to limit the search space at call time and
- * that will hold the result.
- *
- * The offset of @addr within a page is ignored. An IPA is compatible with @prot
- * iff its corresponding stage-2 page-table entry has default ownership and, if
- * valid, is mapped with protection attributes identical to @prot.
- *
- * Return: 0 on success, negative error code on failure.
- */
-int kvm_pgtable_stage2_find_range(struct kvm_pgtable *pgt, u64 addr,
- enum kvm_pgtable_prot prot,
- struct kvm_mem_range *range);
#endif /* __ARM64_KVM_PGTABLE_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 74280a753efb..2148d3968aa5 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -159,6 +159,11 @@ static int host_stage2_unmap_dev_all(void)
return kvm_pgtable_stage2_unmap(pgt, addr, BIT(pgt->ia_bits) - addr);
}

+struct kvm_mem_range {
+ u64 start;
+ u64 end;
+};
+
static bool find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
{
int cur, left = 0, right = hyp_memblock_nr;
@@ -227,6 +232,44 @@ static inline int __host_stage2_idmap(u64 start, u64 end,
__ret; \
})

+static inline bool range_included(struct kvm_mem_range *child,
+ struct kvm_mem_range *parent)
+{
+ return parent->start <= child->start && child->end <= parent->end;
+}
+
+static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
+{
+ struct kvm_mem_range cur;
+ kvm_pte_t pte;
+ u32 level;
+ int ret;
+
+ hyp_assert_lock_held(&host_kvm.lock);
+ ret = kvm_pgtable_get_leaf(&host_kvm.pgt, addr, &pte, &level);
+ if (ret)
+ return ret;
+
+ if (kvm_pte_valid(pte))
+ return -EAGAIN;
+
+ if (pte)
+ return -EPERM;
+
+ do {
+ u64 granule = kvm_granule_size(level);
+ cur.start = ALIGN_DOWN(addr, granule);
+ cur.end = cur.start + granule;
+ level++;
+ } while ((level < KVM_PGTABLE_MAX_LEVELS) &&
+ !(kvm_level_supports_block_mapping(level) &&
+ range_included(&cur, range)));
+
+ *range = cur;
+
+ return 0;
+}
+
static int host_stage2_idmap(u64 addr)
{
enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
@@ -238,7 +281,7 @@ static int host_stage2_idmap(u64 addr)
prot |= KVM_PGTABLE_PROT_X;

hyp_spin_lock(&host_kvm.lock);
- ret = kvm_pgtable_stage2_find_range(&host_kvm.pgt, addr, prot, &range);
+ ret = host_stage2_adjust_range(addr, &range);
if (ret)
goto unlock;

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 49d768b92997..4dff2ad39ee4 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1102,77 +1102,3 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
pgt->pgd = NULL;
}
-
-#define KVM_PTE_LEAF_S2_COMPAT_MASK (KVM_PTE_LEAF_ATTR_S2_PERMS | \
- KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR | \
- KVM_PTE_LEAF_ATTR_S2_IGNORED)
-
-static int stage2_check_permission_walker(u64 addr, u64 end, u32 level,
- kvm_pte_t *ptep,
- enum kvm_pgtable_walk_flags flag,
- void * const arg)
-{
- kvm_pte_t old_attr, pte = *ptep, *new_attr = arg;
-
- /*
- * Compatible mappings are either invalid and owned by the page-table
- * owner (whose id is 0), or valid with matching permission attributes.
- */
- if (kvm_pte_valid(pte)) {
- old_attr = pte & KVM_PTE_LEAF_S2_COMPAT_MASK;
- if (old_attr != *new_attr)
- return -EEXIST;
- } else if (pte) {
- return -EEXIST;
- }
-
- return 0;
-}
-
-int kvm_pgtable_stage2_find_range(struct kvm_pgtable *pgt, u64 addr,
- enum kvm_pgtable_prot prot,
- struct kvm_mem_range *range)
-{
- kvm_pte_t attr;
- struct kvm_pgtable_walker check_perm_walker = {
- .cb = stage2_check_permission_walker,
- .flags = KVM_PGTABLE_WALK_LEAF,
- .arg = &attr,
- };
- u64 granule, start, end;
- u32 level;
- int ret;
-
- ret = stage2_set_prot_attr(pgt, prot, &attr);
- if (ret)
- return ret;
- attr &= KVM_PTE_LEAF_S2_COMPAT_MASK;
-
- for (level = pgt->start_level; level < KVM_PGTABLE_MAX_LEVELS; level++) {
- granule = kvm_granule_size(level);
- start = ALIGN_DOWN(addr, granule);
- end = start + granule;
-
- if (!kvm_level_supports_block_mapping(level))
- continue;
-
- if (start < range->start || range->end < end)
- continue;
-
- /*
- * Check the presence of existing mappings with incompatible
- * permissions within the current block range, and try one level
- * deeper if one is found.
- */
- ret = kvm_pgtable_walk(pgt, start, granule, &check_perm_walker);
- if (ret != -EEXIST)
- break;
- }
-
- if (!ret) {
- range->start = start;
- range->end = end;
- }
-
- return ret;
-}
--
2.32.0.432.gabb21c7263-goog
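The core of the new logic is the loop that shrinks the candidate block around the faulting address until it both supports block mappings and fits within the memblock range. A simplified model for a 4K-granule, 4-level configuration (constants and helper names are illustrative stand-ins for kvm_granule_size() and kvm_level_supports_block_mapping(), not the kernel's definitions):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LEVELS 4

struct mem_range {
	uint64_t start;
	uint64_t end;
};

static uint64_t granule_size(uint32_t level)
{
	/* 4K granule: level 3 -> 4K, level 2 -> 2M, level 1 -> 1G */
	return 1ULL << (12 + 9 * (MAX_LEVELS - 1 - level));
}

static int level_supports_block(uint32_t level)
{
	return level != 0; /* no level-0 (512G) block mappings */
}

/* Starting from the level of the invalid leaf, pick the largest
 * block-aligned range around addr that fits within *range, falling
 * back to page granularity if nothing larger fits. */
static void adjust_range(uint64_t addr, uint32_t level,
			 struct mem_range *range)
{
	struct mem_range cur;

	do {
		uint64_t granule = granule_size(level);

		cur.start = addr & ~(granule - 1);
		cur.end = cur.start + granule;
		level++;
	} while (level < MAX_LEVELS &&
		 !(level_supports_block(level - 1) &&
		   range->start <= cur.start && cur.end <= range->end));

	*range = cur;
}
```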


2021-07-29 13:30:05

by Quentin Perret

Subject: [PATCH v3 05/21] KVM: arm64: Expose page-table helpers

The KVM pgtable API exposes the kvm_pgtable_walk() function to allow
the definition of walkers outside of pgtable.c. However, it is not easy
to implement any of those walkers without some of the low-level helpers.
Move some of them to the header file to allow re-use from other places.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_pgtable.h | 40 ++++++++++++++++++++++++++++
arch/arm64/kvm/hyp/pgtable.c | 39 ---------------------------
2 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 082b9d65f40b..6938eac72c1f 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -25,6 +25,46 @@ static inline u64 kvm_get_parange(u64 mmfr0)

typedef u64 kvm_pte_t;

+#define KVM_PTE_VALID BIT(0)
+
+#define KVM_PTE_ADDR_MASK GENMASK(47, PAGE_SHIFT)
+#define KVM_PTE_ADDR_51_48 GENMASK(15, 12)
+
+static inline bool kvm_pte_valid(kvm_pte_t pte)
+{
+ return pte & KVM_PTE_VALID;
+}
+
+static inline u64 kvm_pte_to_phys(kvm_pte_t pte)
+{
+ u64 pa = pte & KVM_PTE_ADDR_MASK;
+
+ if (PAGE_SHIFT == 16)
+ pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
+
+ return pa;
+}
+
+static inline u64 kvm_granule_shift(u32 level)
+{
+ /* Assumes KVM_PGTABLE_MAX_LEVELS is 4 */
+ return ARM64_HW_PGTABLE_LEVEL_SHIFT(level);
+}
+
+static inline u64 kvm_granule_size(u32 level)
+{
+ return BIT(kvm_granule_shift(level));
+}
+
+static inline bool kvm_level_supports_block_mapping(u32 level)
+{
+ /*
+ * Reject invalid block mappings and don't bother with 4TB mappings for
+ * 52-bit PAs.
+ */
+ return !(level == 0 || (PAGE_SIZE != SZ_4K && level == 1));
+}
+
/**
* struct kvm_pgtable_mm_ops - Memory management callbacks.
* @zalloc_page: Allocate a single zeroed memory page.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 78f36bd5df6c..49d768b92997 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -11,16 +11,12 @@
#include <asm/kvm_pgtable.h>
#include <asm/stage2_pgtable.h>

-#define KVM_PTE_VALID BIT(0)

#define KVM_PTE_TYPE BIT(1)
#define KVM_PTE_TYPE_BLOCK 0
#define KVM_PTE_TYPE_PAGE 1
#define KVM_PTE_TYPE_TABLE 1

-#define KVM_PTE_ADDR_MASK GENMASK(47, PAGE_SHIFT)
-#define KVM_PTE_ADDR_51_48 GENMASK(15, 12)
-
#define KVM_PTE_LEAF_ATTR_LO GENMASK(11, 2)

#define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX GENMASK(4, 2)
@@ -61,17 +57,6 @@ struct kvm_pgtable_walk_data {
u64 end;
};

-static u64 kvm_granule_shift(u32 level)
-{
- /* Assumes KVM_PGTABLE_MAX_LEVELS is 4 */
- return ARM64_HW_PGTABLE_LEVEL_SHIFT(level);
-}
-
-static u64 kvm_granule_size(u32 level)
-{
- return BIT(kvm_granule_shift(level));
-}
-
#define KVM_PHYS_INVALID (-1ULL)

static bool kvm_phys_is_valid(u64 phys)
@@ -79,15 +64,6 @@ static bool kvm_phys_is_valid(u64 phys)
return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_PARANGE_MAX));
}

-static bool kvm_level_supports_block_mapping(u32 level)
-{
- /*
- * Reject invalid block mappings and don't bother with 4TB mappings for
- * 52-bit PAs.
- */
- return !(level == 0 || (PAGE_SIZE != SZ_4K && level == 1));
-}
-
static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
{
u64 granule = kvm_granule_size(level);
@@ -135,11 +111,6 @@ static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
return __kvm_pgd_page_idx(&pgt, -1ULL) + 1;
}

-static bool kvm_pte_valid(kvm_pte_t pte)
-{
- return pte & KVM_PTE_VALID;
-}
-
static bool kvm_pte_table(kvm_pte_t pte, u32 level)
{
if (level == KVM_PGTABLE_MAX_LEVELS - 1)
@@ -151,16 +122,6 @@ static bool kvm_pte_table(kvm_pte_t pte, u32 level)
return FIELD_GET(KVM_PTE_TYPE, pte) == KVM_PTE_TYPE_TABLE;
}

-static u64 kvm_pte_to_phys(kvm_pte_t pte)
-{
- u64 pa = pte & KVM_PTE_ADDR_MASK;
-
- if (PAGE_SHIFT == 16)
- pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
-
- return pa;
-}
-
static kvm_pte_t kvm_phys_to_pte(u64 pa)
{
kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
--
2.32.0.432.gabb21c7263-goog


2021-07-29 13:30:13

by Quentin Perret

Subject: [PATCH v3 02/21] KVM: arm64: Introduce hyp_assert_lock_held()

Introduce a poor man's lockdep implementation at EL2 which triggers a
BUG() whenever a hyp spinlock is not held when it should be. Hide this
feature behind a new Kconfig option that targets the EL2 object
specifically, instead of piggybacking on the existing CONFIG_LOCKDEP.
EL2 cannot WARN() cleanly to report locking issues, hence BUG() is the
only option, and it is not clear whether we want this widely enabled.
This is most likely going to be useful for local testing until the EL2
WARN() situation has improved.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/Kconfig | 9 +++++++++
arch/arm64/kvm/hyp/include/nvhe/spinlock.h | 17 +++++++++++++++++
2 files changed, 26 insertions(+)

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index a4eba0908bfa..9b9721895e5c 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -46,6 +46,15 @@ if KVM

source "virt/kvm/Kconfig"

+config NVHE_EL2_DEBUG
+ bool "Debug mode for non-VHE EL2 object"
+ help
+ Say Y here to enable the debug mode for the non-VHE KVM EL2 object.
+ Failure reports will BUG() in the hypervisor. This is intended for
+ local EL2 hypervisor development.
+
+ If unsure, say N.
+
endif # KVM

endif # VIRTUALIZATION
diff --git a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
index 04f65b655fcf..4652fd04bdbe 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
@@ -97,4 +97,21 @@ static inline bool hyp_spin_is_locked(hyp_spinlock_t *lock)
return lockval.owner != lockval.next;
}

+#ifdef CONFIG_NVHE_EL2_DEBUG
+static inline void hyp_assert_lock_held(hyp_spinlock_t *lock)
+{
+ /*
+ * The __pkvm_init() path accesses protected data-structures without
+ * holding locks as the other CPUs are guaranteed to not enter EL2
+ * concurrently at this point in time. The point by which EL2 is
+ * initialized on all CPUs is reflected in the pkvm static key, so
+ * wait until it is set before checking the lock state.
+ */
+ if (static_branch_likely(&kvm_protected_mode_initialized))
+ BUG_ON(!hyp_spin_is_locked(lock));
+}
+#else
+static inline void hyp_assert_lock_held(hyp_spinlock_t *lock) { }
+#endif
+
#endif /* __ARM64_KVM_NVHE_SPINLOCK_H__ */
--
2.32.0.432.gabb21c7263-goog


2021-07-29 13:30:20

by Quentin Perret

Subject: [PATCH v3 07/21] KVM: arm64: Rename KVM_PTE_LEAF_ATTR_S2_IGNORED

The ignored bits for both stage-1 and stage-2 page and block
descriptors are in bits [58:55], so rename KVM_PTE_LEAF_ATTR_S2_IGNORED to
make it applicable to both. And while at it, since these bits are more
commonly known as 'software' bits, rename accordingly.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/pgtable.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 4dff2ad39ee4..59a394d82de3 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -36,6 +36,8 @@

#define KVM_PTE_LEAF_ATTR_HI GENMASK(63, 51)

+#define KVM_PTE_LEAF_ATTR_HI_SW GENMASK(58, 55)
+
#define KVM_PTE_LEAF_ATTR_HI_S1_XN BIT(54)

#define KVM_PTE_LEAF_ATTR_HI_S2_XN BIT(54)
@@ -44,8 +46,6 @@
KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
KVM_PTE_LEAF_ATTR_HI_S2_XN)

-#define KVM_PTE_LEAF_ATTR_S2_IGNORED GENMASK(58, 55)
-
#define KVM_INVALID_PTE_OWNER_MASK GENMASK(63, 56)
#define KVM_MAX_OWNER_ID 1

--
2.32.0.432.gabb21c7263-goog


2021-07-29 13:30:22

by Quentin Perret

Subject: [PATCH v3 08/21] KVM: arm64: Don't overwrite software bits with owner id

We will soon start annotating page-tables with new flags to track shared
pages and such, and we will do so in valid mappings using software bits
in the PTEs, as provided by the architecture. However, it is possible
that we will need to use those flags to annotate invalid mappings as
well in the future, similar to what we do to track page ownership in the
host stage-2.

In order to facilitate the annotation of invalid mappings with such
flags, it would be preferable to re-use the same bits as for valid
mappings (bits [58:55]), but these are currently used for ownership
encoding. Since we have plenty of bits left to use in invalid
mappings, move the ownership bits further down the PTE to avoid the
conflict.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/pgtable.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 59a394d82de3..1ee1168ac32d 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -46,7 +46,7 @@
KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
KVM_PTE_LEAF_ATTR_HI_S2_XN)

-#define KVM_INVALID_PTE_OWNER_MASK GENMASK(63, 56)
+#define KVM_INVALID_PTE_OWNER_MASK GENMASK(9, 2)
#define KVM_MAX_OWNER_ID 1

struct kvm_pgtable_walk_data {
--
2.32.0.432.gabb21c7263-goog
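A quick way to see the conflict the move resolves: the old owner mask [63:56] overlapped the software bits [58:55], while the new mask [9:2] leaves them free. GENMASK64() below is a local stand-in for the kernel's GENMASK_ULL():

```c
#include <assert.h>

/* Local stand-in for the kernel's GENMASK_ULL(). */
#define GENMASK64(h, l) ((~0ULL << (l)) & (~0ULL >> (63 - (h))))

#define PTE_SW_BITS    GENMASK64(58, 55) /* ignored/software bits */
#define OLD_OWNER_MASK GENMASK64(63, 56) /* overlapped the SW bits */
#define NEW_OWNER_MASK GENMASK64(9, 2)   /* clear of the SW bits */
```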


2021-07-29 13:30:34

by Quentin Perret

Subject: [PATCH v3 11/21] KVM: arm64: Allow populating software bits

Introduce infrastructure for manipulating software bits in stage-1
and stage-2 page-tables using additional entries in the kvm_pgtable_prot
enum.

This is heavily inspired by Marc's implementation of a similar feature
in the NV patch series, but adapted to allow stage-1 changes as well:

https://lore.kernel.org/kvmarm/[email protected]/

Suggested-by: Marc Zyngier <[email protected]>
Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_pgtable.h | 12 +++++++++++-
arch/arm64/kvm/hyp/pgtable.c | 5 +++++
2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index ba7dcade2798..d5ca9b6ce241 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -121,6 +121,10 @@ enum kvm_pgtable_stage2_flags {
* @KVM_PGTABLE_PROT_W: Write permission.
* @KVM_PGTABLE_PROT_R: Read permission.
* @KVM_PGTABLE_PROT_DEVICE: Device attributes.
+ * @KVM_PGTABLE_PROT_SW0: Software bit 0.
+ * @KVM_PGTABLE_PROT_SW1: Software bit 1.
+ * @KVM_PGTABLE_PROT_SW2: Software bit 2.
+ * @KVM_PGTABLE_PROT_SW3: Software bit 3.
*/
enum kvm_pgtable_prot {
KVM_PGTABLE_PROT_X = BIT(0),
@@ -128,6 +132,11 @@ enum kvm_pgtable_prot {
KVM_PGTABLE_PROT_R = BIT(2),

KVM_PGTABLE_PROT_DEVICE = BIT(3),
+
+ KVM_PGTABLE_PROT_SW0 = BIT(55),
+ KVM_PGTABLE_PROT_SW1 = BIT(56),
+ KVM_PGTABLE_PROT_SW2 = BIT(57),
+ KVM_PGTABLE_PROT_SW3 = BIT(58),
};

#define KVM_PGTABLE_PROT_RW (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
@@ -419,7 +428,8 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
* If there is a valid, leaf page-table entry used to translate @addr, then
* relax the permissions in that entry according to the read, write and
* execute permissions specified by @prot. No permissions are removed, and
- * TLB invalidation is performed after updating the entry.
+ * TLB invalidation is performed after updating the entry. Software bits cannot
+ * be set or cleared using kvm_pgtable_stage2_relax_perms().
*
* Return: 0 on success, negative error code on failure.
*/
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index bcc02e6e0f62..1915489bb127 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -357,6 +357,7 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_AP, ap);
attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
attr |= KVM_PTE_LEAF_ATTR_LO_S1_AF;
+ attr |= prot & KVM_PTE_LEAF_ATTR_HI_SW;
*ptep = attr;

return 0;
@@ -558,6 +559,7 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p

attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ attr |= prot & KVM_PTE_LEAF_ATTR_HI_SW;
*ptep = attr;

return 0;
@@ -1025,6 +1027,9 @@ int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
u32 level;
kvm_pte_t set = 0, clr = 0;

+ if (prot & KVM_PTE_LEAF_ATTR_HI_SW)
+ return -EINVAL;
+
if (prot & KVM_PGTABLE_PROT_R)
set |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R;

--
2.32.0.432.gabb21c7263-goog


2021-07-29 13:30:42

by Quentin Perret

Subject: [PATCH v3 12/21] KVM: arm64: Add helpers to tag shared pages in SW bits

We will soon start annotating shared pages in page-tables in nVHE
protected mode. Define all the states in which a page can be (owned,
shared and owned, shared and borrowed), and provide helpers to convert
this state into SW-bit annotations using the matching prot attributes.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 26 +++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 9c227d87c36d..ae355bfd8c01 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -12,6 +12,32 @@
#include <asm/virt.h>
#include <nvhe/spinlock.h>

+/*
+ * SW bits 0-1 are reserved to track the memory ownership state of each page:
+ * 00: The page is owned solely by the page-table owner.
+ * 01: The page is owned by the page-table owner, but is shared
+ * with another entity.
+ * 10: The page is shared with, but not owned by the page-table owner.
+ * 11: Reserved for future use (lending).
+ */
+enum pkvm_page_state {
+ PKVM_PAGE_OWNED = 0ULL,
+ PKVM_PAGE_SHARED_OWNED = KVM_PGTABLE_PROT_SW0,
+ PKVM_PAGE_SHARED_BORROWED = KVM_PGTABLE_PROT_SW1,
+};
+
+#define PKVM_PAGE_STATE_PROT_MASK (KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
+static inline enum kvm_pgtable_prot pkvm_mkstate(enum kvm_pgtable_prot prot,
+ enum pkvm_page_state state)
+{
+ return (prot & ~PKVM_PAGE_STATE_PROT_MASK) | state;
+}
+
+static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
+{
+ return prot & PKVM_PAGE_STATE_PROT_MASK;
+}
+
struct host_kvm {
struct kvm_arch arch;
struct kvm_pgtable pgt;
--
2.32.0.432.gabb21c7263-goog


2021-07-29 13:30:50

by Quentin Perret

Subject: [PATCH v3 14/21] KVM: arm64: Expose pkvm_hyp_id

Allow references to the hypervisor's owner id from outside
mem_protect.c.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 ++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 2 +-
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 47c2a0c51612..cc86598654b9 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -46,6 +46,8 @@ struct host_kvm {
};
extern struct host_kvm host_kvm;

+extern const u8 pkvm_hyp_id;
+
int __pkvm_prot_finalize(void);
int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end);

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index a7f6134789e0..bb18940c3d12 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -31,7 +31,7 @@ static struct hyp_pool host_s2_pool;
u64 id_aa64mmfr0_el1_sys_val;
u64 id_aa64mmfr1_el1_sys_val;

-static const u8 pkvm_hyp_id = 1;
+const u8 pkvm_hyp_id = 1;

static void *host_s2_zalloc_pages_exact(size_t size)
{
--
2.32.0.432.gabb21c7263-goog


2021-07-29 13:30:55

by Quentin Perret

Subject: [PATCH v3 16/21] KVM: arm64: Enable retrieving protections attributes of PTEs

Introduce helper functions in the KVM stage-2 and stage-1 page-table
manipulation library to retrieve the enum kvm_pgtable_prot of a
PTE. This will be useful for implementing custom walkers outside of
pgtable.c.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_pgtable.h | 20 +++++++++++++++
arch/arm64/kvm/hyp/pgtable.c | 37 ++++++++++++++++++++++++++++
2 files changed, 57 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index d5ca9b6ce241..7ff9f52239ba 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -505,4 +505,24 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
*/
int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
kvm_pte_t *ptep, u32 *level);
+
+/**
+ * kvm_pgtable_stage2_pte_prot() - Retrieve the protection attributes of a
+ * stage-2 Page-Table Entry.
+ * @pte: Page-table entry
+ *
+ * Return: protection attributes of the page-table entry in the enum
+ * kvm_pgtable_prot format.
+ */
+enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
+
+/**
+ * kvm_pgtable_hyp_pte_prot() - Retrieve the protection attributes of a stage-1
+ * Page-Table Entry.
+ * @pte: Page-table entry
+ *
+ * Return: protection attributes of the page-table entry in the enum
+ * kvm_pgtable_prot format.
+ */
+enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
#endif /* __ARM64_KVM_PGTABLE_H__ */
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 1915489bb127..a6eda8f23cb6 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -363,6 +363,26 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
return 0;
}

+enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
+{
+ enum kvm_pgtable_prot prot = pte & KVM_PTE_LEAF_ATTR_HI_SW;
+ u32 ap;
+
+ if (!kvm_pte_valid(pte))
+ return prot;
+
+ if (!(pte & KVM_PTE_LEAF_ATTR_HI_S1_XN))
+ prot |= KVM_PGTABLE_PROT_X;
+
+ ap = FIELD_GET(KVM_PTE_LEAF_ATTR_LO_S1_AP, pte);
+ if (ap == KVM_PTE_LEAF_ATTR_LO_S1_AP_RO)
+ prot |= KVM_PGTABLE_PROT_R;
+ else if (ap == KVM_PTE_LEAF_ATTR_LO_S1_AP_RW)
+ prot |= KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
+
+ return prot;
+}
+
static bool hyp_pte_needs_update(kvm_pte_t old, kvm_pte_t new)
{
/*
@@ -565,6 +585,23 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p
return 0;
}

+enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte)
+{
+ enum kvm_pgtable_prot prot = pte & KVM_PTE_LEAF_ATTR_HI_SW;
+
+ if (!kvm_pte_valid(pte))
+ return prot;
+
+ if (pte & KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R)
+ prot |= KVM_PGTABLE_PROT_R;
+ if (pte & KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W)
+ prot |= KVM_PGTABLE_PROT_W;
+ if (!(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN))
+ prot |= KVM_PGTABLE_PROT_X;
+
+ return prot;
+}
+
static bool stage2_pte_needs_update(kvm_pte_t old, kvm_pte_t new)
{
if (!kvm_pte_valid(old) || !kvm_pte_valid(new))
--
2.32.0.432.gabb21c7263-goog


2021-07-29 13:31:05

by Quentin Perret

Subject: [PATCH v3 15/21] KVM: arm64: Introduce addr_is_memory()

Introduce a helper usable in nVHE protected mode to check whether a
physical address is in a RAM region or not.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 7 +++++++
2 files changed, 8 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index cc86598654b9..5968fbbb3514 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -51,6 +51,7 @@ extern const u8 pkvm_hyp_id;
int __pkvm_prot_finalize(void);
int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end);

+bool addr_is_memory(phys_addr_t phys);
int host_stage2_idmap_locked(u64 start, u64 end, enum kvm_pgtable_prot prot);
int host_stage2_set_owner_locked(u64 start, u64 end, u8 owner_id);
int kvm_host_prepare_stage2(void *pgt_pool_base);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index bb18940c3d12..4532f3d55a1a 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -196,6 +196,13 @@ static bool find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
return false;
}

+bool addr_is_memory(phys_addr_t phys)
+{
+ struct kvm_mem_range range;
+
+ return find_mem_range(phys, &range);
+}
+
static bool range_is_memory(u64 start, u64 end)
{
struct kvm_mem_range r1, r2;
--
2.32.0.432.gabb21c7263-goog


2021-07-29 13:31:18

by Quentin Perret

Subject: [PATCH v3 09/21] KVM: arm64: Tolerate re-creating hyp mappings to set software bits

The current hypervisor stage-1 mapping code doesn't allow changing an
existing valid mapping. Relax this condition by allowing changes that
only target software bits, as that will soon be needed to annotate shared
pages.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/pgtable.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 1ee1168ac32d..2689fcb7901d 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -362,6 +362,21 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
return 0;
}

+static bool hyp_pte_needs_update(kvm_pte_t old, kvm_pte_t new)
+{
+ /*
+ * Tolerate KVM recreating the exact same mapping, or changing software
+ * bits if the existing mapping was valid.
+ */
+ if (old == new)
+ return false;
+
+ if (!kvm_pte_valid(old))
+ return true;
+
+ return !WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW);
+}
+
static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
kvm_pte_t *ptep, struct hyp_map_data *data)
{
@@ -371,9 +386,8 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
if (!kvm_block_mapping_supported(addr, end, phys, level))
return false;

- /* Tolerate KVM recreating the exact same mapping */
new = kvm_init_valid_leaf_pte(phys, data->attr, level);
- if (old != new && !WARN_ON(kvm_pte_valid(old)))
+ if (hyp_pte_needs_update(old, new))
smp_store_release(ptep, new);

data->phys += granule;
--
2.32.0.432.gabb21c7263-goog


2021-07-29 13:31:38

by Quentin Perret

Subject: [PATCH v3 17/21] KVM: arm64: Mark host bss and rodata section as shared

As the hypervisor maps the host's .bss and .rodata sections in its
stage-1, make sure to tag them as shared in hyp and host page-tables.

But since the hypervisor relies on the presence of these mappings, we
cannot leave the host in complete control of these memory regions -- it
must not, for example, unshare or donate them to another entity. To
prevent this, let's transfer the ownership of those ranges to the
hypervisor itself, and share the pages back with the host.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/setup.c | 82 +++++++++++++++++++++++++++++----
1 file changed, 74 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 0b574d106519..7f557b264f62 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -58,6 +58,7 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
{
void *start, *end, *virt = hyp_phys_to_virt(phys);
unsigned long pgt_size = hyp_s1_pgtable_pages() << PAGE_SHIFT;
+ enum kvm_pgtable_prot prot;
int ret, i;

/* Recreate the hyp page-table using the early page allocator */
@@ -83,10 +84,6 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
if (ret)
return ret;

- ret = pkvm_create_mappings(__start_rodata, __end_rodata, PAGE_HYP_RO);
- if (ret)
- return ret;
-
ret = pkvm_create_mappings(__hyp_rodata_start, __hyp_rodata_end, PAGE_HYP_RO);
if (ret)
return ret;
@@ -95,10 +92,6 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
if (ret)
return ret;

- ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, PAGE_HYP_RO);
- if (ret)
- return ret;
-
ret = pkvm_create_mappings(virt, virt + size, PAGE_HYP);
if (ret)
return ret;
@@ -117,6 +110,24 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
return ret;
}

+ /*
+ * Map the host's .bss and .rodata sections RO in the hypervisor, but
+ * transfer the ownership from the host to the hypervisor itself to
+ * make sure it can't be donated or shared with another entity.
+ *
+ * The ownership transition requires matching changes in the host
+ * stage-2. This will be done later (see finalize_host_mappings()) once the
+ * hyp_vmemmap is addressable.
+ */
+ prot = pkvm_mkstate(PAGE_HYP_RO, PKVM_PAGE_SHARED_OWNED);
+ ret = pkvm_create_mappings(__start_rodata, __end_rodata, prot);
+ if (ret)
+ return ret;
+
+ ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, prot);
+ if (ret)
+ return ret;
+
return 0;
}

@@ -148,6 +159,57 @@ static void hpool_put_page(void *addr)
hyp_put_page(&hpool, addr);
}

+static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
+ kvm_pte_t *ptep,
+ enum kvm_pgtable_walk_flags flag,
+ void * const arg)
+{
+ enum kvm_pgtable_prot prot;
+ enum pkvm_page_state state;
+ kvm_pte_t pte = *ptep;
+ phys_addr_t phys;
+
+ if (!kvm_pte_valid(pte))
+ return 0;
+
+ if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
+ return -EINVAL;
+
+ phys = kvm_pte_to_phys(pte);
+ if (!addr_is_memory(phys))
+ return 0;
+
+ /*
+ * Adjust the host stage-2 mappings to match the ownership attributes
+ * configured in the hypervisor stage-1.
+ */
+ state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
+ switch (state) {
+ case PKVM_PAGE_OWNED:
+ return host_stage2_set_owner_locked(phys, phys + PAGE_SIZE, pkvm_hyp_id);
+ case PKVM_PAGE_SHARED_OWNED:
+ prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_BORROWED);
+ break;
+ case PKVM_PAGE_SHARED_BORROWED:
+ prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_OWNED);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return host_stage2_idmap_locked(phys, phys + PAGE_SIZE, prot);
+}
+
+static int finalize_host_mappings(void)
+{
+ struct kvm_pgtable_walker walker = {
+ .cb = finalize_host_mappings_walker,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+
+ return kvm_pgtable_walk(&pkvm_pgtable, 0, BIT(pkvm_pgtable.ia_bits), &walker);
+}
+
void __noreturn __pkvm_init_finalise(void)
{
struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
@@ -167,6 +229,10 @@ void __noreturn __pkvm_init_finalise(void)
if (ret)
goto out;

+ ret = finalize_host_mappings();
+ if (ret)
+ goto out;
+
pkvm_pgtable_mm_ops = (struct kvm_pgtable_mm_ops) {
.zalloc_page = hyp_zalloc_hyp_page,
.phys_to_virt = hyp_phys_to_virt,
--
2.32.0.432.gabb21c7263-goog


2021-07-29 13:31:50

by Quentin Perret

Subject: [PATCH v3 13/21] KVM: arm64: Expose host stage-2 manipulation helpers

We will need to manipulate the host stage-2 page-table from outside
mem_protect.c soon. Introduce two functions allowing this, and expose
them to users of mem_protect.h.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 ++
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 17 ++++++++++++++++-
2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index ae355bfd8c01..47c2a0c51612 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -49,6 +49,8 @@ extern struct host_kvm host_kvm;
int __pkvm_prot_finalize(void);
int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end);

+int host_stage2_idmap_locked(u64 start, u64 end, enum kvm_pgtable_prot prot);
+int host_stage2_set_owner_locked(u64 start, u64 end, u8 owner_id);
int kvm_host_prepare_stage2(void *pgt_pool_base);
void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 70c57d2c3024..a7f6134789e0 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -272,6 +272,21 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
return 0;
}

+int host_stage2_idmap_locked(u64 start, u64 end, enum kvm_pgtable_prot prot)
+{
+ hyp_assert_lock_held(&host_kvm.lock);
+
+ return host_stage2_try(__host_stage2_idmap, start, end, prot);
+}
+
+int host_stage2_set_owner_locked(u64 start, u64 end, u8 owner_id)
+{
+ hyp_assert_lock_held(&host_kvm.lock);
+
+ return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_kvm.pgt,
+ start, end - start, &host_s2_pool, owner_id);
+}
+
static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
{
/*
@@ -309,7 +324,7 @@ static int host_stage2_idmap(u64 addr)
if (ret)
goto unlock;

- ret = host_stage2_try(__host_stage2_idmap, range.start, range.end, prot);
+ ret = host_stage2_idmap_locked(range.start, range.end, prot);
unlock:
hyp_spin_unlock(&host_kvm.lock);

--
2.32.0.432.gabb21c7263-goog


2021-07-29 13:31:50

by Quentin Perret

Subject: [PATCH v3 10/21] KVM: arm64: Enable forcing page-level stage-2 mappings

Much of the stage-2 manipulation logic relies on being able to destroy
block mappings when e.g. installing a smaller mapping in the range. The
rationale for this behaviour is that stage-2 mappings can always be
re-created lazily. However, this gets more complicated when the stage-2
page-table is used to store metadata about the underlying pages. In such
cases, destroying a block mapping may lead to losing part of the state,
and confuse the users of that metadata (such as the hypervisor in nVHE
protected mode).

To avoid this, introduce a callback function in the pgtable struct which
is called during all map operations to determine whether the mappings
can use blocks, or should be forced to page granularity. This is used by
the hypervisor when creating the host stage-2 to force page-level
mappings when using non-default protection attributes.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_pgtable.h | 65 ++++++++++++++++-----------
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 30 +++++++++++--
arch/arm64/kvm/hyp/pgtable.c | 29 +++++++++---
3 files changed, 91 insertions(+), 33 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 83c5c97d9eac..ba7dcade2798 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -115,25 +115,6 @@ enum kvm_pgtable_stage2_flags {
KVM_PGTABLE_S2_IDMAP = BIT(1),
};

-/**
- * struct kvm_pgtable - KVM page-table.
- * @ia_bits: Maximum input address size, in bits.
- * @start_level: Level at which the page-table walk starts.
- * @pgd: Pointer to the first top-level entry of the page-table.
- * @mm_ops: Memory management callbacks.
- * @mmu: Stage-2 KVM MMU struct. Unused for stage-1 page-tables.
- */
-struct kvm_pgtable {
- u32 ia_bits;
- u32 start_level;
- kvm_pte_t *pgd;
- struct kvm_pgtable_mm_ops *mm_ops;
-
- /* Stage-2 only */
- struct kvm_s2_mmu *mmu;
- enum kvm_pgtable_stage2_flags flags;
-};
-
/**
* enum kvm_pgtable_prot - Page-table permissions and attributes.
* @KVM_PGTABLE_PROT_X: Execute permission.
@@ -149,11 +130,41 @@ enum kvm_pgtable_prot {
KVM_PGTABLE_PROT_DEVICE = BIT(3),
};

-#define PAGE_HYP (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
+#define KVM_PGTABLE_PROT_RW (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
+#define KVM_PGTABLE_PROT_RWX (KVM_PGTABLE_PROT_RW | KVM_PGTABLE_PROT_X)
+
+#define PAGE_HYP KVM_PGTABLE_PROT_RW
#define PAGE_HYP_EXEC (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_X)
#define PAGE_HYP_RO (KVM_PGTABLE_PROT_R)
#define PAGE_HYP_DEVICE (PAGE_HYP | KVM_PGTABLE_PROT_DEVICE)

+typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
+ enum kvm_pgtable_prot prot);
+
+/**
+ * struct kvm_pgtable - KVM page-table.
+ * @ia_bits: Maximum input address size, in bits.
+ * @start_level: Level at which the page-table walk starts.
+ * @pgd: Pointer to the first top-level entry of the page-table.
+ * @mm_ops: Memory management callbacks.
+ * @mmu: Stage-2 KVM MMU struct. Unused for stage-1 page-tables.
+ * @flags: Stage-2 page-table flags.
+ * @force_pte_cb: Callback function used during map operations to decide
+ * whether block mappings can be used to map the given IPA
+ * range.
+ */
+struct kvm_pgtable {
+ u32 ia_bits;
+ u32 start_level;
+ kvm_pte_t *pgd;
+ struct kvm_pgtable_mm_ops *mm_ops;
+
+ /* Stage-2 only */
+ struct kvm_s2_mmu *mmu;
+ enum kvm_pgtable_stage2_flags flags;
+ kvm_pgtable_force_pte_cb_t force_pte_cb;
+};
+
/**
* enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table walk.
* @KVM_PGTABLE_WALK_LEAF: Visit leaf entries, including invalid
@@ -246,21 +257,25 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift);

/**
- * kvm_pgtable_stage2_init_flags() - Initialise a guest stage-2 page-table.
+ * __kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
* @pgt: Uninitialised page-table structure to initialise.
* @arch: Arch-specific KVM structure representing the guest virtual
* machine.
* @mm_ops: Memory management callbacks.
* @flags: Stage-2 configuration flags.
+ * @force_pte_cb: Callback function used during map operations to decide
+ * whether block mappings can be used to map the given IPA
+ * range.
*
* Return: 0 on success, negative error code on failure.
*/
-int kvm_pgtable_stage2_init_flags(struct kvm_pgtable *pgt, struct kvm_arch *arch,
- struct kvm_pgtable_mm_ops *mm_ops,
- enum kvm_pgtable_stage2_flags flags);
+int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_arch *arch,
+ struct kvm_pgtable_mm_ops *mm_ops,
+ enum kvm_pgtable_stage2_flags flags,
+ kvm_pgtable_force_pte_cb_t force_pte_cb);

#define kvm_pgtable_stage2_init(pgt, arch, mm_ops) \
- kvm_pgtable_stage2_init_flags(pgt, arch, mm_ops, 0)
+ __kvm_pgtable_stage2_init(pgt, arch, mm_ops, 0, NULL)

/**
* kvm_pgtable_stage2_destroy() - Destroy an unused guest stage-2 page-table.
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 2148d3968aa5..70c57d2c3024 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -89,6 +89,7 @@ static void prepare_host_vtcr(void)
id_aa64mmfr1_el1_sys_val, phys_shift);
}

+static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot);
int kvm_host_prepare_stage2(void *pgt_pool_base)
{
struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
@@ -101,8 +102,9 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
if (ret)
return ret;

- ret = kvm_pgtable_stage2_init_flags(&host_kvm.pgt, &host_kvm.arch,
- &host_kvm.mm_ops, KVM_HOST_S2_FLAGS);
+ ret = __kvm_pgtable_stage2_init(&host_kvm.pgt, &host_kvm.arch,
+ &host_kvm.mm_ops, KVM_HOST_S2_FLAGS,
+ host_stage2_force_pte_cb);
if (ret)
return ret;

@@ -270,9 +272,31 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
return 0;
}

+static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
+{
+ /*
+ * Block mappings must be used with care in the host stage-2 as a
+ * kvm_pgtable_stage2_map() operation targeting a page in the range of
+ * an existing block will delete the block under the assumption that
+ * mappings in the rest of the block range can always be rebuilt lazily.
+ * That assumption is correct for the host stage-2 with RWX mappings
+ * targeting memory or RW mappings targeting MMIO ranges (see
+ * host_stage2_idmap() below which implements some of the host memory
+ * abort logic). However, this is not safe for any other mappings where
+ * the host stage-2 page-table is in fact the only place where this
+ * state is stored. In all those cases, it is safer to use page-level
+ * mappings, hence avoiding to lose the state because of side-effects in
+ * kvm_pgtable_stage2_map().
+ */
+ if (range_is_memory(addr, end))
+ return prot != KVM_PGTABLE_PROT_RWX;
+ else
+ return prot != KVM_PGTABLE_PROT_RW;
+}
+
static int host_stage2_idmap(u64 addr)
{
- enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
+ enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_RW;
struct kvm_mem_range range;
bool is_memory = find_mem_range(addr, &range);
int ret;
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 2689fcb7901d..bcc02e6e0f62 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -452,6 +452,8 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
pgt->start_level = KVM_PGTABLE_MAX_LEVELS - levels;
pgt->mm_ops = mm_ops;
pgt->mmu = NULL;
+ pgt->force_pte_cb = NULL;
+
return 0;
}

@@ -489,6 +491,9 @@ struct stage2_map_data {
void *memcache;

struct kvm_pgtable_mm_ops *mm_ops;
+
+ /* Force mappings to page granularity */
+ bool force_pte;
};

u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
@@ -602,6 +607,15 @@ static bool stage2_pte_executable(kvm_pte_t pte)
return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
}

+static bool stage2_block_mapping_allowed(u64 addr, u64 end, u32 level,
+ struct stage2_map_data *data)
+{
+ if (data->force_pte && (level < (KVM_PGTABLE_MAX_LEVELS - 1)))
+ return false;
+
+ return kvm_block_mapping_supported(addr, end, data->phys, level);
+}
+
static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
kvm_pte_t *ptep,
struct stage2_map_data *data)
@@ -611,7 +625,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
struct kvm_pgtable *pgt = data->mmu->pgt;
struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;

- if (!kvm_block_mapping_supported(addr, end, phys, level))
+ if (!stage2_block_mapping_allowed(addr, end, level, data))
return -E2BIG;

if (kvm_phys_is_valid(phys))
@@ -655,7 +669,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
if (data->anchor)
return 0;

- if (!kvm_block_mapping_supported(addr, end, data->phys, level))
+ if (!stage2_block_mapping_allowed(addr, end, level, data))
return 0;

data->childp = kvm_pte_follow(*ptep, data->mm_ops);
@@ -785,6 +799,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
.mmu = pgt->mmu,
.memcache = mc,
.mm_ops = pgt->mm_ops,
+ .force_pte = pgt->force_pte_cb && pgt->force_pte_cb(addr, addr + size, prot),
};
struct kvm_pgtable_walker walker = {
.cb = stage2_map_walker,
@@ -816,6 +831,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
.memcache = mc,
.mm_ops = pgt->mm_ops,
.owner_id = owner_id,
+ .force_pte = true,
};
struct kvm_pgtable_walker walker = {
.cb = stage2_map_walker,
@@ -1057,9 +1073,11 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
return kvm_pgtable_walk(pgt, addr, size, &walker);
}

-int kvm_pgtable_stage2_init_flags(struct kvm_pgtable *pgt, struct kvm_arch *arch,
- struct kvm_pgtable_mm_ops *mm_ops,
- enum kvm_pgtable_stage2_flags flags)
+
+int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_arch *arch,
+ struct kvm_pgtable_mm_ops *mm_ops,
+ enum kvm_pgtable_stage2_flags flags,
+ kvm_pgtable_force_pte_cb_t force_pte_cb)
{
size_t pgd_sz;
u64 vtcr = arch->vtcr;
@@ -1077,6 +1095,7 @@ int kvm_pgtable_stage2_init_flags(struct kvm_pgtable *pgt, struct kvm_arch *arch
pgt->mm_ops = mm_ops;
pgt->mmu = &arch->mmu;
pgt->flags = flags;
+ pgt->force_pte_cb = force_pte_cb;

/* Ensure zeroed PGD pages are visible to the hardware walker */
dsb(ishst);
--
2.32.0.432.gabb21c7263-goog


2021-07-29 13:31:58

by Quentin Perret

Subject: [PATCH v3 19/21] KVM: arm64: Refactor protected nVHE stage-1 locking

Refactor the hypervisor stage-1 locking in nVHE protected mode to expose
a new pkvm_create_mappings_locked() function. This will be used in later
patches to allow walking and changing the hypervisor stage-1 without
releasing the lock.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/mm.h | 1 +
arch/arm64/kvm/hyp/nvhe/mm.c | 18 ++++++++++++++++--
2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index 8ec3a5a7744b..c76d7136ed9b 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -23,6 +23,7 @@ int hyp_map_vectors(void);
int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
+int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
int __pkvm_create_mappings(unsigned long start, unsigned long size,
unsigned long phys, enum kvm_pgtable_prot prot);
unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index a8efdf0f9003..6fbe8e8030f6 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -67,13 +67,15 @@ unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
return addr;
}

-int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
+int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot)
{
unsigned long start = (unsigned long)from;
unsigned long end = (unsigned long)to;
unsigned long virt_addr;
phys_addr_t phys;

+ hyp_assert_lock_held(&pkvm_pgd_lock);
+
start = start & PAGE_MASK;
end = PAGE_ALIGN(end);

@@ -81,7 +83,8 @@ int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
int err;

phys = hyp_virt_to_phys((void *)virt_addr);
- err = __pkvm_create_mappings(virt_addr, PAGE_SIZE, phys, prot);
+ err = kvm_pgtable_hyp_map(&pkvm_pgtable, virt_addr, PAGE_SIZE,
+ phys, prot);
if (err)
return err;
}
@@ -89,6 +92,17 @@ int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
return 0;
}

+int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
+{
+ int ret;
+
+ hyp_spin_lock(&pkvm_pgd_lock);
+ ret = pkvm_create_mappings_locked(from, to, prot);
+ hyp_spin_unlock(&pkvm_pgd_lock);
+
+ return ret;
+}
+
int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back)
{
unsigned long start, end;
--
2.32.0.432.gabb21c7263-goog


2021-07-29 13:32:01

by Quentin Perret

Subject: [PATCH v3 18/21] KVM: arm64: Remove __pkvm_mark_hyp

Now that we mark memory owned by the hypervisor in the host stage-2
during __pkvm_init(), we no longer need to rely on the host to
explicitly mark the hyp sections later on.

Remove the __pkvm_mark_hyp() hypercall altogether.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_asm.h | 3 +-
arch/arm64/kvm/arm.c | 46 -------------------
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 -
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 9 ----
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 19 --------
5 files changed, 1 insertion(+), 77 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 9f0bf2109be7..432a9ea1f02e 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -63,8 +63,7 @@
#define __KVM_HOST_SMCCC_FUNC___pkvm_create_private_mapping 17
#define __KVM_HOST_SMCCC_FUNC___pkvm_cpu_set_vector 18
#define __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize 19
-#define __KVM_HOST_SMCCC_FUNC___pkvm_mark_hyp 20
-#define __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc 21
+#define __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc 20

#ifndef __ASSEMBLY__

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e9a2b8f27792..2f378482471b 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1954,57 +1954,11 @@ static void _kvm_host_prot_finalize(void *discard)
WARN_ON(kvm_call_hyp_nvhe(__pkvm_prot_finalize));
}

-static inline int pkvm_mark_hyp(phys_addr_t start, phys_addr_t end)
-{
- return kvm_call_hyp_nvhe(__pkvm_mark_hyp, start, end);
-}
-
-#define pkvm_mark_hyp_section(__section) \
- pkvm_mark_hyp(__pa_symbol(__section##_start), \
- __pa_symbol(__section##_end))
-
static int finalize_hyp_mode(void)
{
- int cpu, ret;
-
if (!is_protected_kvm_enabled())
return 0;

- ret = pkvm_mark_hyp_section(__hyp_idmap_text);
- if (ret)
- return ret;
-
- ret = pkvm_mark_hyp_section(__hyp_text);
- if (ret)
- return ret;
-
- ret = pkvm_mark_hyp_section(__hyp_rodata);
- if (ret)
- return ret;
-
- ret = pkvm_mark_hyp_section(__hyp_bss);
- if (ret)
- return ret;
-
- ret = pkvm_mark_hyp(hyp_mem_base, hyp_mem_base + hyp_mem_size);
- if (ret)
- return ret;
-
- for_each_possible_cpu(cpu) {
- phys_addr_t start = virt_to_phys((void *)kvm_arm_hyp_percpu_base[cpu]);
- phys_addr_t end = start + (PAGE_SIZE << nvhe_percpu_order());
-
- ret = pkvm_mark_hyp(start, end);
- if (ret)
- return ret;
-
- start = virt_to_phys((void *)per_cpu(kvm_arm_hyp_stack_page, cpu));
- end = start + PAGE_SIZE;
- ret = pkvm_mark_hyp(start, end);
- if (ret)
- return ret;
- }
-
/*
* Flip the static key upfront as that may no longer be possible
* once the host stage 2 is installed.
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 5968fbbb3514..7ce36fbf5158 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -49,7 +49,6 @@ extern struct host_kvm host_kvm;
extern const u8 pkvm_hyp_id;

int __pkvm_prot_finalize(void);
-int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end);

bool addr_is_memory(phys_addr_t phys);
int host_stage2_idmap_locked(u64 start, u64 end, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 1632f001f4ed..7900d5b66ba3 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -163,14 +163,6 @@ static void handle___pkvm_prot_finalize(struct kvm_cpu_context *host_ctxt)
{
cpu_reg(host_ctxt, 1) = __pkvm_prot_finalize();
}
-
-static void handle___pkvm_mark_hyp(struct kvm_cpu_context *host_ctxt)
-{
- DECLARE_REG(phys_addr_t, start, host_ctxt, 1);
- DECLARE_REG(phys_addr_t, end, host_ctxt, 2);
-
- cpu_reg(host_ctxt, 1) = __pkvm_mark_hyp(start, end);
-}
typedef void (*hcall_t)(struct kvm_cpu_context *);

#define HANDLE_FUNC(x) [__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
@@ -196,7 +188,6 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__pkvm_create_mappings),
HANDLE_FUNC(__pkvm_create_private_mapping),
HANDLE_FUNC(__pkvm_prot_finalize),
- HANDLE_FUNC(__pkvm_mark_hyp),
};

static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 4532f3d55a1a..0ccea58df7e0 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -338,25 +338,6 @@ static int host_stage2_idmap(u64 addr)
return ret;
}

-int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end)
-{
- int ret;
-
- /*
- * host_stage2_unmap_dev_all() currently relies on MMIO mappings being
- * non-persistent, so don't allow changing page ownership in MMIO range.
- */
- if (!range_is_memory(start, end))
- return -EINVAL;
-
- hyp_spin_lock(&host_kvm.lock);
- ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_kvm.pgt,
- start, end - start, &host_s2_pool, pkvm_hyp_id);
- hyp_spin_unlock(&host_kvm.lock);
-
- return ret != -EAGAIN ? ret : 0;
-}
-
void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
{
struct kvm_vcpu_fault_info fault;
--
2.32.0.432.gabb21c7263-goog


2021-07-29 13:32:06

by Quentin Perret

Subject: [PATCH v3 20/21] KVM: arm64: Restrict EL2 stage-1 changes in protected mode

The host kernel is currently able to change EL2 stage-1 mappings without
restrictions thanks to the __pkvm_create_mappings() hypercall. But in a
world where the host is no longer part of the TCB, this clearly poses a
problem.

To fix this, introduce a new hypercall to allow the host to share a
physical memory page with the hypervisor, and remove the
__pkvm_create_mappings() variant. The new hypercall implements
ownership and permission checks before allowing the sharing operation,
and it annotates the shared page in the hypervisor stage-1 and host
stage-2 page-tables.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/include/asm/kvm_asm.h | 2 +-
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 11 +--
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 89 +++++++++++++++++++
arch/arm64/kvm/mmu.c | 28 +++++-
5 files changed, 119 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 432a9ea1f02e..aed2aa61766a 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -59,7 +59,7 @@
#define __KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs 13
#define __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs 14
#define __KVM_HOST_SMCCC_FUNC___pkvm_init 15
-#define __KVM_HOST_SMCCC_FUNC___pkvm_create_mappings 16
+#define __KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp 16
#define __KVM_HOST_SMCCC_FUNC___pkvm_create_private_mapping 17
#define __KVM_HOST_SMCCC_FUNC___pkvm_cpu_set_vector 18
#define __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize 19
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 7ce36fbf5158..3d78d7782f1b 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -49,6 +49,7 @@ extern struct host_kvm host_kvm;
extern const u8 pkvm_hyp_id;

int __pkvm_prot_finalize(void);
+int __pkvm_host_share_hyp(u64 pfn);

bool addr_is_memory(phys_addr_t phys);
int host_stage2_idmap_locked(u64 start, u64 end, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 7900d5b66ba3..2da6aa8da868 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -140,14 +140,11 @@ static void handle___pkvm_cpu_set_vector(struct kvm_cpu_context *host_ctxt)
cpu_reg(host_ctxt, 1) = pkvm_cpu_set_vector(slot);
}

-static void handle___pkvm_create_mappings(struct kvm_cpu_context *host_ctxt)
+static void handle___pkvm_host_share_hyp(struct kvm_cpu_context *host_ctxt)
{
- DECLARE_REG(unsigned long, start, host_ctxt, 1);
- DECLARE_REG(unsigned long, size, host_ctxt, 2);
- DECLARE_REG(unsigned long, phys, host_ctxt, 3);
- DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 4);
+ DECLARE_REG(u64, pfn, host_ctxt, 1);

- cpu_reg(host_ctxt, 1) = __pkvm_create_mappings(start, size, phys, prot);
+ cpu_reg(host_ctxt, 1) = __pkvm_host_share_hyp(pfn);
}

static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ctxt)
@@ -185,7 +182,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__vgic_v3_restore_aprs),
HANDLE_FUNC(__pkvm_init),
HANDLE_FUNC(__pkvm_cpu_set_vector),
- HANDLE_FUNC(__pkvm_create_mappings),
+ HANDLE_FUNC(__pkvm_host_share_hyp),
HANDLE_FUNC(__pkvm_create_private_mapping),
HANDLE_FUNC(__pkvm_prot_finalize),
};
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 0ccea58df7e0..1b67f562b6fc 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -338,6 +338,95 @@ static int host_stage2_idmap(u64 addr)
return ret;
}

+static inline bool check_prot(enum kvm_pgtable_prot prot,
+ enum kvm_pgtable_prot required,
+ enum kvm_pgtable_prot denied)
+{
+ return (prot & (required | denied)) == required;
+}
+
+int __pkvm_host_share_hyp(u64 pfn)
+{
+ phys_addr_t addr = hyp_pfn_to_phys(pfn);
+ enum kvm_pgtable_prot prot, cur;
+ void *virt = __hyp_va(addr);
+ enum pkvm_page_state state;
+ kvm_pte_t pte;
+ u32 level;
+ int ret;
+
+ if (!range_is_memory(addr, addr + PAGE_SIZE))
+ return -EINVAL;
+
+ hyp_spin_lock(&host_kvm.lock);
+ hyp_spin_lock(&pkvm_pgd_lock);
+
+ ret = kvm_pgtable_get_leaf(&host_kvm.pgt, addr, &pte, &level);
+ if (ret)
+ goto unlock;
+ if (!pte)
+ goto map_shared;
+
+ /*
+ * Check attributes in the host stage-2 PTE. We need the page to be:
+ * - mapped RWX as we're sharing memory;
+ * - not borrowed, as that implies absence of ownership.
+ * Otherwise, we can't let it go through.
+ */
+ cur = kvm_pgtable_stage2_pte_prot(pte);
+ prot = pkvm_mkstate(0, PKVM_PAGE_SHARED_BORROWED);
+ if (!check_prot(cur, KVM_PGTABLE_PROT_RWX, prot)) {
+ ret = -EPERM;
+ goto unlock;
+ }
+
+ state = pkvm_getstate(cur);
+ if (state == PKVM_PAGE_OWNED)
+ goto map_shared;
+
+ /*
+ * Tolerate double-sharing the same page, but this requires
+ * cross-checking the hypervisor stage-1.
+ */
+ if (state != PKVM_PAGE_SHARED_OWNED) {
+ ret = -EPERM;
+ goto unlock;
+ }
+
+ ret = kvm_pgtable_get_leaf(&pkvm_pgtable, (u64)virt, &pte, &level);
+ if (ret)
+ goto unlock;
+
+ /*
+ * If the page has been shared with the hypervisor, it must be
+ * SHARED_BORROWED already.
+ */
+ cur = kvm_pgtable_hyp_pte_prot(pte);
+ prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED);
+ if (!check_prot(cur, prot, ~prot))
+ ret = -EPERM;
+ goto unlock;
+
+map_shared:
+ /*
+ * If the page is not yet shared, adjust mappings in both page-tables
+ * while both locks are held.
+ */
+ prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED);
+ ret = pkvm_create_mappings_locked(virt, virt + PAGE_SIZE, prot);
+ BUG_ON(ret);
+
+ prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_OWNED);
+ ret = host_stage2_idmap_locked(addr, addr + PAGE_SIZE, prot);
+ BUG_ON(ret);
+
+unlock:
+ hyp_spin_unlock(&pkvm_pgd_lock);
+ hyp_spin_unlock(&host_kvm.lock);
+
+ return ret;
+}
+
void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
{
struct kvm_vcpu_fault_info fault;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 0625bf2353c2..cbab146cda6a 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -259,10 +259,8 @@ static int __create_hyp_mappings(unsigned long start, unsigned long size,
{
int err;

- if (!kvm_host_owns_hyp_mappings()) {
- return kvm_call_hyp_nvhe(__pkvm_create_mappings,
- start, size, phys, prot);
- }
+ if (WARN_ON(!kvm_host_owns_hyp_mappings()))
+ return -EINVAL;

mutex_lock(&kvm_hyp_pgd_mutex);
err = kvm_pgtable_hyp_map(hyp_pgtable, start, size, phys, prot);
@@ -282,6 +280,21 @@ static phys_addr_t kvm_kaddr_to_phys(void *kaddr)
}
}

+static int pkvm_share_hyp(phys_addr_t start, phys_addr_t end)
+{
+ phys_addr_t addr;
+ int ret;
+
+ for (addr = ALIGN_DOWN(start, PAGE_SIZE); addr < end; addr += PAGE_SIZE) {
+ ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp,
+ __phys_to_pfn(addr));
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
/**
* create_hyp_mappings - duplicate a kernel virtual address range in Hyp mode
* @from: The virtual kernel start address of the range
@@ -302,6 +315,13 @@ int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
if (is_kernel_in_hyp_mode())
return 0;

+ if (!kvm_host_owns_hyp_mappings()) {
+ if (WARN_ON(prot != PAGE_HYP))
+ return -EPERM;
+ return pkvm_share_hyp(kvm_kaddr_to_phys(from),
+ kvm_kaddr_to_phys(to));
+ }
+
start = start & PAGE_MASK;
end = PAGE_ALIGN(end);

--
2.32.0.432.gabb21c7263-goog


2021-07-29 13:35:10

by Quentin Perret

Subject: [PATCH v3 21/21] KVM: arm64: Make __pkvm_create_mappings static

The __pkvm_create_mappings() function is no longer used outside of
nvhe/mm.c, make it static.

Signed-off-by: Quentin Perret <[email protected]>
---
arch/arm64/kvm/hyp/include/nvhe/mm.h | 2 --
arch/arm64/kvm/hyp/nvhe/mm.c | 4 ++--
2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index c76d7136ed9b..c9a8f535212e 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -24,8 +24,6 @@ int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
-int __pkvm_create_mappings(unsigned long start, unsigned long size,
- unsigned long phys, enum kvm_pgtable_prot prot);
unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
enum kvm_pgtable_prot prot);

diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 6fbe8e8030f6..2fabeceb889a 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -23,8 +23,8 @@ u64 __io_map_base;
struct memblock_region hyp_memory[HYP_MEMBLOCK_REGIONS];
unsigned int hyp_memblock_nr;

-int __pkvm_create_mappings(unsigned long start, unsigned long size,
- unsigned long phys, enum kvm_pgtable_prot prot)
+static int __pkvm_create_mappings(unsigned long start, unsigned long size,
+ unsigned long phys, enum kvm_pgtable_prot prot)
{
int err;

--
2.32.0.432.gabb21c7263-goog


2021-08-02 09:38:45

by Fuad Tabba

Subject: Re: [PATCH v3 06/21] KVM: arm64: Optimize host memory aborts

Hi Quentin,

On Thu, Jul 29, 2021 at 3:28 PM Quentin Perret <[email protected]> wrote:
>
> The kvm_pgtable_stage2_find_range() function is used in the host memory
> abort path to try and look for the largest block mapping that can be
> used to map the faulting address. In order to do so, the function
> currently walks the stage-2 page-table and looks for existing
> incompatible mappings within the range of the largest possible block.
> If incompatible mappings are found, it tries the same procedure again,
> but using a smaller block range, and repeats until a matching range is
> found (potentially up to page granularity). While this approach has
> benefits (mostly in the fact that it proactively coalesces host stage-2
> mappings), it can be slow if the ranges are fragmented, and it isn't
> optimized to deal with CPUs faulting on the same IPA as all of them will
> do all the work every time.
>
> To avoid these issues, remove kvm_pgtable_stage2_find_range(), and walk
> the page-table only once in the host_mem_abort() path to find the
> closest leaf to the input address. With this, use the corresponding
> range if it is invalid and not owned by another entity. If a valid leaf
> is found, return -EAGAIN similar to what is done in the
> kvm_pgtable_stage2_map() path to optimize concurrent faults.
>
> Signed-off-by: Quentin Perret <[email protected]>

Reviewing the code, it seems to work as described, with the lock
assertion ensuring that the caller knows which lock to hold.

Reviewed-by: Fuad Tabba <[email protected]>

Thanks,
/fuad


> ---
> arch/arm64/include/asm/kvm_pgtable.h | 30 -----------
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 45 +++++++++++++++-
> arch/arm64/kvm/hyp/pgtable.c | 74 ---------------------------
> 3 files changed, 44 insertions(+), 105 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 6938eac72c1f..83c5c97d9eac 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -154,16 +154,6 @@ enum kvm_pgtable_prot {
> #define PAGE_HYP_RO (KVM_PGTABLE_PROT_R)
> #define PAGE_HYP_DEVICE (PAGE_HYP | KVM_PGTABLE_PROT_DEVICE)
>
> -/**
> - * struct kvm_mem_range - Range of Intermediate Physical Addresses
> - * @start: Start of the range.
> - * @end: End of the range.
> - */
> -struct kvm_mem_range {
> - u64 start;
> - u64 end;
> -};
> -
> /**
> * enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table walk.
> * @KVM_PGTABLE_WALK_LEAF: Visit leaf entries, including invalid
> @@ -490,24 +480,4 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
> */
> int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
> kvm_pte_t *ptep, u32 *level);
> -
> -/**
> - * kvm_pgtable_stage2_find_range() - Find a range of Intermediate Physical
> - * Addresses with compatible permission
> - * attributes.
> - * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
> - * @addr: Address that must be covered by the range.
> - * @prot: Protection attributes that the range must be compatible with.
> - * @range: Range structure used to limit the search space at call time and
> - * that will hold the result.
> - *
> - * The offset of @addr within a page is ignored. An IPA is compatible with @prot
> - * iff its corresponding stage-2 page-table entry has default ownership and, if
> - * valid, is mapped with protection attributes identical to @prot.
> - *
> - * Return: 0 on success, negative error code on failure.
> - */
> -int kvm_pgtable_stage2_find_range(struct kvm_pgtable *pgt, u64 addr,
> - enum kvm_pgtable_prot prot,
> - struct kvm_mem_range *range);
> #endif /* __ARM64_KVM_PGTABLE_H__ */
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 74280a753efb..2148d3968aa5 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -159,6 +159,11 @@ static int host_stage2_unmap_dev_all(void)
> return kvm_pgtable_stage2_unmap(pgt, addr, BIT(pgt->ia_bits) - addr);
> }
>
> +struct kvm_mem_range {
> + u64 start;
> + u64 end;
> +};
> +
> static bool find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
> {
> int cur, left = 0, right = hyp_memblock_nr;
> @@ -227,6 +232,44 @@ static inline int __host_stage2_idmap(u64 start, u64 end,
> __ret; \
> })
>
> +static inline bool range_included(struct kvm_mem_range *child,
> + struct kvm_mem_range *parent)
> +{
> + return parent->start <= child->start && child->end <= parent->end;
> +}
> +
> +static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
> +{
> + struct kvm_mem_range cur;
> + kvm_pte_t pte;
> + u32 level;
> + int ret;
> +
> + hyp_assert_lock_held(&host_kvm.lock);
> + ret = kvm_pgtable_get_leaf(&host_kvm.pgt, addr, &pte, &level);
> + if (ret)
> + return ret;
> +
> + if (kvm_pte_valid(pte))
> + return -EAGAIN;
> +
> + if (pte)
> + return -EPERM;
> +
> + do {
> + u64 granule = kvm_granule_size(level);
> + cur.start = ALIGN_DOWN(addr, granule);
> + cur.end = cur.start + granule;
> + level++;
> + } while ((level < KVM_PGTABLE_MAX_LEVELS) &&
> + !(kvm_level_supports_block_mapping(level) &&
> + range_included(&cur, range)));
> +
> + *range = cur;
> +
> + return 0;
> +}
> +
> static int host_stage2_idmap(u64 addr)
> {
> enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
> @@ -238,7 +281,7 @@ static int host_stage2_idmap(u64 addr)
> prot |= KVM_PGTABLE_PROT_X;
>
> hyp_spin_lock(&host_kvm.lock);
> - ret = kvm_pgtable_stage2_find_range(&host_kvm.pgt, addr, prot, &range);
> + ret = host_stage2_adjust_range(addr, &range);
> if (ret)
> goto unlock;
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 49d768b92997..4dff2ad39ee4 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -1102,77 +1102,3 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
> pgt->mm_ops->free_pages_exact(pgt->pgd, pgd_sz);
> pgt->pgd = NULL;
> }
> -
> -#define KVM_PTE_LEAF_S2_COMPAT_MASK (KVM_PTE_LEAF_ATTR_S2_PERMS | \
> - KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR | \
> - KVM_PTE_LEAF_ATTR_S2_IGNORED)
> -
> -static int stage2_check_permission_walker(u64 addr, u64 end, u32 level,
> - kvm_pte_t *ptep,
> - enum kvm_pgtable_walk_flags flag,
> - void * const arg)
> -{
> - kvm_pte_t old_attr, pte = *ptep, *new_attr = arg;
> -
> - /*
> - * Compatible mappings are either invalid and owned by the page-table
> - * owner (whose id is 0), or valid with matching permission attributes.
> - */
> - if (kvm_pte_valid(pte)) {
> - old_attr = pte & KVM_PTE_LEAF_S2_COMPAT_MASK;
> - if (old_attr != *new_attr)
> - return -EEXIST;
> - } else if (pte) {
> - return -EEXIST;
> - }
> -
> - return 0;
> -}
> -
> -int kvm_pgtable_stage2_find_range(struct kvm_pgtable *pgt, u64 addr,
> - enum kvm_pgtable_prot prot,
> - struct kvm_mem_range *range)
> -{
> - kvm_pte_t attr;
> - struct kvm_pgtable_walker check_perm_walker = {
> - .cb = stage2_check_permission_walker,
> - .flags = KVM_PGTABLE_WALK_LEAF,
> - .arg = &attr,
> - };
> - u64 granule, start, end;
> - u32 level;
> - int ret;
> -
> - ret = stage2_set_prot_attr(pgt, prot, &attr);
> - if (ret)
> - return ret;
> - attr &= KVM_PTE_LEAF_S2_COMPAT_MASK;
> -
> - for (level = pgt->start_level; level < KVM_PGTABLE_MAX_LEVELS; level++) {
> - granule = kvm_granule_size(level);
> - start = ALIGN_DOWN(addr, granule);
> - end = start + granule;
> -
> - if (!kvm_level_supports_block_mapping(level))
> - continue;
> -
> - if (start < range->start || range->end < end)
> - continue;
> -
> - /*
> - * Check the presence of existing mappings with incompatible
> - * permissions within the current block range, and try one level
> - * deeper if one is found.
> - */
> - ret = kvm_pgtable_walk(pgt, start, granule, &check_perm_walker);
> - if (ret != -EEXIST)
> - break;
> - }
> -
> - if (!ret) {
> - range->start = start;
> - range->end = end;
> - }
> -
> - return ret;
> -}
> --
> 2.32.0.432.gabb21c7263-goog
>

2021-08-02 09:39:12

by Fuad Tabba

Subject: Re: [PATCH v3 03/21] KVM: arm64: Provide the host_stage2_try() helper macro

Hi Quentin,

On Thu, Jul 29, 2021 at 3:28 PM Quentin Perret <[email protected]> wrote:
>
> We currently unmap all MMIO mappings from the host stage-2 to recycle
> the pages whenever we run out. In order to make this pattern easy to
> re-use from other places, factor the logic out into a dedicated macro.
> While at it, apply the macro for the kvm_pgtable_stage2_set_owner()
> calls. They're currently only called early on and are guaranteed to
> succeed, but making them robust to the -ENOMEM case doesn't hurt and
> will avoid painful debugging sessions later on.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 40 +++++++++++++++------------
> 1 file changed, 22 insertions(+), 18 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index d938ce95d3bd..74280a753efb 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -208,6 +208,25 @@ static inline int __host_stage2_idmap(u64 start, u64 end,
> prot, &host_s2_pool);
> }
>
> +/*
> + * The pool has been provided with enough pages to cover all of memory with
> + * page granularity, but it is difficult to know how much of the MMIO range
> + * we will need to cover upfront, so we may need to 'recycle' the pages if we
> + * run out.
> + */

The comment you added in V2 about host_kvm.lock got dropped in favor
of asserting that the lock is held.

Reviewed-by: Fuad Tabba <[email protected]>

Thanks,
/fuad




> +#define host_stage2_try(fn, ...) \
> + ({ \
> + int __ret; \
> + hyp_assert_lock_held(&host_kvm.lock); \
> + __ret = fn(__VA_ARGS__); \
> + if (__ret == -ENOMEM) { \
> + __ret = host_stage2_unmap_dev_all(); \
> + if (!__ret) \
> + __ret = fn(__VA_ARGS__); \
> + } \
> + __ret; \
> + })
> +
> static int host_stage2_idmap(u64 addr)
> {
> enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
> @@ -223,22 +242,7 @@ static int host_stage2_idmap(u64 addr)
> if (ret)
> goto unlock;
>
> - ret = __host_stage2_idmap(range.start, range.end, prot);
> - if (ret != -ENOMEM)
> - goto unlock;
> -
> - /*
> - * The pool has been provided with enough pages to cover all of memory
> - * with page granularity, but it is difficult to know how much of the
> - * MMIO range we will need to cover upfront, so we may need to 'recycle'
> - * the pages if we run out.
> - */
> - ret = host_stage2_unmap_dev_all();
> - if (ret)
> - goto unlock;
> -
> - ret = __host_stage2_idmap(range.start, range.end, prot);
> -
> + ret = host_stage2_try(__host_stage2_idmap, range.start, range.end, prot);
> unlock:
> hyp_spin_unlock(&host_kvm.lock);
>
> @@ -257,8 +261,8 @@ int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end)
> return -EINVAL;
>
> hyp_spin_lock(&host_kvm.lock);
> - ret = kvm_pgtable_stage2_set_owner(&host_kvm.pgt, start, end - start,
> - &host_s2_pool, pkvm_hyp_id);
> + ret = host_stage2_try(kvm_pgtable_stage2_set_owner, &host_kvm.pgt,
> + start, end - start, &host_s2_pool, pkvm_hyp_id);
> hyp_spin_unlock(&host_kvm.lock);
>
> return ret != -EAGAIN ? ret : 0;
> --
> 2.32.0.432.gabb21c7263-goog
>

2021-08-02 09:39:50

by Fuad Tabba

Subject: Re: [PATCH v3 07/21] KVM: arm64: Rename KVM_PTE_LEAF_ATTR_S2_IGNORED

Hi Quentin,

On Thu, Jul 29, 2021 at 3:28 PM Quentin Perret <[email protected]> wrote:
>
> The ignored bits for both stage-1 and stage-2 page and block
> descriptors are in [55:58], so rename KVM_PTE_LEAF_ATTR_S2_IGNORED to
> make it applicable to both. And while at it, since these bits are more
> commonly known as 'software' bits, rename accordingly.

As in the Armv8-A Address Translation spec.

Reviewed-by: Fuad Tabba <[email protected]>

Thanks,
/fuad



> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/kvm/hyp/pgtable.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 4dff2ad39ee4..59a394d82de3 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -36,6 +36,8 @@
>
> #define KVM_PTE_LEAF_ATTR_HI GENMASK(63, 51)
>
> +#define KVM_PTE_LEAF_ATTR_HI_SW GENMASK(58, 55)
> +
> #define KVM_PTE_LEAF_ATTR_HI_S1_XN BIT(54)
>
> #define KVM_PTE_LEAF_ATTR_HI_S2_XN BIT(54)
> @@ -44,8 +46,6 @@
> KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
> KVM_PTE_LEAF_ATTR_HI_S2_XN)
>
> -#define KVM_PTE_LEAF_ATTR_S2_IGNORED GENMASK(58, 55)
> -
> #define KVM_INVALID_PTE_OWNER_MASK GENMASK(63, 56)
> #define KVM_MAX_OWNER_ID 1
>
> --
> 2.32.0.432.gabb21c7263-goog
>

2021-08-02 09:41:18

by Fuad Tabba

Subject: Re: [PATCH v3 08/21] KVM: arm64: Don't overwrite software bits with owner id

Hi Quentin,

On Thu, Jul 29, 2021 at 3:28 PM Quentin Perret <[email protected]> wrote:
>
> We will soon start annotating page-tables with new flags to track shared
> pages and such, and we will do so in valid mappings using software bits
> in the PTEs, as provided by the architecture. However, it is possible
> that we will need to use those flags to annotate invalid mappings as
> well in the future, similar to what we do to track page ownership in the
> host stage-2.
>
> In order to facilitate the annotation of invalid mappings with such
> flags, it would be preferable to re-use the same bits as for valid
> mappings (bits [58-55]), but these are currently used for ownership
> encoding. Since we have plenty of bits left to use in invalid
> mappings, move the ownership bits further down the PTE to avoid the
> conflict.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/kvm/hyp/pgtable.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 59a394d82de3..1ee1168ac32d 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -46,7 +46,7 @@
> KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
> KVM_PTE_LEAF_ATTR_HI_S2_XN)
>
> -#define KVM_INVALID_PTE_OWNER_MASK GENMASK(63, 56)
> +#define KVM_INVALID_PTE_OWNER_MASK GENMASK(9, 2)
> #define KVM_MAX_OWNER_ID 1

This change makes sense so that the same bits can be used to track sharing.

Reviewed-by: Fuad Tabba <[email protected]>

Thanks,
/fuad

>
> struct kvm_pgtable_walk_data {

> --
> 2.32.0.432.gabb21c7263-goog
>

2021-08-02 09:52:01

by Fuad Tabba

Subject: Re: [PATCH v3 10/21] KVM: arm64: Enable forcing page-level stage-2 mappings

Hi Quentin,


On Thu, Jul 29, 2021 at 3:28 PM Quentin Perret <[email protected]> wrote:
>
> Much of the stage-2 manipulation logic relies on being able to destroy
> block mappings if e.g. installing a smaller mapping in the range. The
> rationale for this behaviour is that stage-2 mappings can always be
> re-created lazily. However, this gets more complicated when the stage-2
> page-table is used to store metadata about the underlying pages. In such
> cases, destroying a block mapping may lead to losing part of the state,
> and confuse the user of those metadata (such as the hypervisor in nVHE
> protected mode).
>
> To avoid this, introduce a callback function in the pgtable struct which
> is called during all map operations to determine whether the mappings
> can use blocks, or should be forced to page granularity. This is used by
> the hypervisor when creating the host stage-2 to force page-level
> mappings when using non-default protection attributes.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/include/asm/kvm_pgtable.h | 65 ++++++++++++++++-----------
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 30 +++++++++++--
> arch/arm64/kvm/hyp/pgtable.c | 29 +++++++++---
> 3 files changed, 91 insertions(+), 33 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 83c5c97d9eac..ba7dcade2798 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -115,25 +115,6 @@ enum kvm_pgtable_stage2_flags {
> KVM_PGTABLE_S2_IDMAP = BIT(1),
> };
>
> -/**
> - * struct kvm_pgtable - KVM page-table.
> - * @ia_bits: Maximum input address size, in bits.
> - * @start_level: Level at which the page-table walk starts.
> - * @pgd: Pointer to the first top-level entry of the page-table.
> - * @mm_ops: Memory management callbacks.
> - * @mmu: Stage-2 KVM MMU struct. Unused for stage-1 page-tables.
> - */
> -struct kvm_pgtable {
> - u32 ia_bits;
> - u32 start_level;
> - kvm_pte_t *pgd;
> - struct kvm_pgtable_mm_ops *mm_ops;
> -
> - /* Stage-2 only */
> - struct kvm_s2_mmu *mmu;
> - enum kvm_pgtable_stage2_flags flags;
> -};
> -
> /**
> * enum kvm_pgtable_prot - Page-table permissions and attributes.
> * @KVM_PGTABLE_PROT_X: Execute permission.
> @@ -149,11 +130,41 @@ enum kvm_pgtable_prot {
> KVM_PGTABLE_PROT_DEVICE = BIT(3),
> };
>
> -#define PAGE_HYP (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
> +#define KVM_PGTABLE_PROT_RW (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
> +#define KVM_PGTABLE_PROT_RWX (KVM_PGTABLE_PROT_RW | KVM_PGTABLE_PROT_X)
> +
> +#define PAGE_HYP KVM_PGTABLE_PROT_RW
> #define PAGE_HYP_EXEC (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_X)
> #define PAGE_HYP_RO (KVM_PGTABLE_PROT_R)
> #define PAGE_HYP_DEVICE (PAGE_HYP | KVM_PGTABLE_PROT_DEVICE)

I wonder if it would be useful to add a couple of other aliases for
default memory and default mmio protections, e.g.,

#define KVM_PGTABLE_PROT_MEM KVM_PGTABLE_PROT_RWX
#define KVM_PGTABLE_PROT_MMIO KVM_PGTABLE_PROT_RW

I think that using these below, e.g., host_stage2_force_pte_cb(),
might make it clearer and answer comments you had in earlier patches
about why "RWX" for memory.

>
> +typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
> + enum kvm_pgtable_prot prot);
> +
> +/**
> + * struct kvm_pgtable - KVM page-table.
> + * @ia_bits: Maximum input address size, in bits.
> + * @start_level: Level at which the page-table walk starts.
> + * @pgd: Pointer to the first top-level entry of the page-table.
> + * @mm_ops: Memory management callbacks.
> + * @mmu: Stage-2 KVM MMU struct. Unused for stage-1 page-tables.
> + * @flags: Stage-2 page-table flags.
> + * @force_pte_cb: Callback function used during map operations to decide
> + * whether block mappings can be used to map the given IPA
> + * range.
> + */

nit: I think it might be clearer (and probably not longer) to rephrase
to describe in terms of the return value of the callback, e.g., "...
function that returns true if page level mappings must be used instead
of block mappings."

> +struct kvm_pgtable {
> + u32 ia_bits;
> + u32 start_level;
> + kvm_pte_t *pgd;
> + struct kvm_pgtable_mm_ops *mm_ops;
> +
> + /* Stage-2 only */
> + struct kvm_s2_mmu *mmu;
> + enum kvm_pgtable_stage2_flags flags;
> + kvm_pgtable_force_pte_cb_t force_pte_cb;
> +};
> +
> /**
> * enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table walk.
> * @KVM_PGTABLE_WALK_LEAF: Visit leaf entries, including invalid
> @@ -246,21 +257,25 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
> u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift);
>
> /**
> - * kvm_pgtable_stage2_init_flags() - Initialise a guest stage-2 page-table.
> + * __kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
> * @pgt: Uninitialised page-table structure to initialise.
> * @arch: Arch-specific KVM structure representing the guest virtual
> * machine.
> * @mm_ops: Memory management callbacks.
> * @flags: Stage-2 configuration flags.
> + * @force_pte_cb: Callback function used during map operations to decide
> + * whether block mappings can be used to map the given IPA
> + * range.

nit: same nit as above with describing the callback in terms of its return value

> *
> * Return: 0 on success, negative error code on failure.
> */
> -int kvm_pgtable_stage2_init_flags(struct kvm_pgtable *pgt, struct kvm_arch *arch,
> - struct kvm_pgtable_mm_ops *mm_ops,
> - enum kvm_pgtable_stage2_flags flags);
> +int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_arch *arch,
> + struct kvm_pgtable_mm_ops *mm_ops,
> + enum kvm_pgtable_stage2_flags flags,
> + kvm_pgtable_force_pte_cb_t force_pte_cb);
>
> #define kvm_pgtable_stage2_init(pgt, arch, mm_ops) \
> - kvm_pgtable_stage2_init_flags(pgt, arch, mm_ops, 0)
> + __kvm_pgtable_stage2_init(pgt, arch, mm_ops, 0, NULL)
>
> /**
> * kvm_pgtable_stage2_destroy() - Destroy an unused guest stage-2 page-table.
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 2148d3968aa5..70c57d2c3024 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -89,6 +89,7 @@ static void prepare_host_vtcr(void)
> id_aa64mmfr1_el1_sys_val, phys_shift);
> }
>
> +static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot);

nit: newline

> int kvm_host_prepare_stage2(void *pgt_pool_base)
> {
> struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
> @@ -101,8 +102,9 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
> if (ret)
> return ret;
>
> - ret = kvm_pgtable_stage2_init_flags(&host_kvm.pgt, &host_kvm.arch,
> - &host_kvm.mm_ops, KVM_HOST_S2_FLAGS);
> + ret = __kvm_pgtable_stage2_init(&host_kvm.pgt, &host_kvm.arch,
> + &host_kvm.mm_ops, KVM_HOST_S2_FLAGS,
> + host_stage2_force_pte_cb);
> if (ret)
> return ret;
>
> @@ -270,9 +272,31 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
> return 0;
> }
>
> +static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
> +{
> + /*
> + * Block mappings must be used with care in the host stage-2 as a
> + * kvm_pgtable_stage2_map() operation targeting a page in the range of
> + * an existing block will delete the block under the assumption that
> + * mappings in the rest of the block range can always be rebuilt lazily.
> + * That assumption is correct for the host stage-2 with RWX mappings
> + * targeting memory or RW mappings targeting MMIO ranges (see
> + * host_stage2_idmap() below which implements some of the host memory
> + * abort logic). However, this is not safe for any other mappings where
> + * the host stage-2 page-table is in fact the only place where this
> + * state is stored. In all those cases, it is safer to use page-level
> + * mappings, hence avoiding to lose the state because of side-effects in
> + * kvm_pgtable_stage2_map().
> + */
> + if (range_is_memory(addr, end))
> + return prot != KVM_PGTABLE_PROT_RWX;
> + else
> + return prot != KVM_PGTABLE_PROT_RW;
> +}

Just checking: I don't think it's possible for the range to be big
enough to somehow include both memory and MMIO, neither now nor in
future use cases, is it?

> +
> static int host_stage2_idmap(u64 addr)
> {
> - enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
> + enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_RW;
> struct kvm_mem_range range;
> bool is_memory = find_mem_range(addr, &range);
> int ret;
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 2689fcb7901d..bcc02e6e0f62 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -452,6 +452,8 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
> pgt->start_level = KVM_PGTABLE_MAX_LEVELS - levels;
> pgt->mm_ops = mm_ops;
> pgt->mmu = NULL;
> + pgt->force_pte_cb = NULL;
> +
> return 0;
> }
>
> @@ -489,6 +491,9 @@ struct stage2_map_data {
> void *memcache;
>
> struct kvm_pgtable_mm_ops *mm_ops;
> +
> + /* Force mappings to page granularity */
> + bool force_pte;
> };
>
> u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
> @@ -602,6 +607,15 @@ static bool stage2_pte_executable(kvm_pte_t pte)
> return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
> }
>
> +static bool stage2_block_mapping_allowed(u64 addr, u64 end, u32 level,
> + struct stage2_map_data *data)
> +{
> + if (data->force_pte && (level < (KVM_PGTABLE_MAX_LEVELS - 1)))
> + return false;

I'm not sure I understand why checking the level is necessary. Can
there be a block mapping at the last possible level?

Thanks,
/fuad

> +
> + return kvm_block_mapping_supported(addr, end, data->phys, level);
> +}
> +
> static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> kvm_pte_t *ptep,
> struct stage2_map_data *data)
> @@ -611,7 +625,7 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> struct kvm_pgtable *pgt = data->mmu->pgt;
> struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
>
> - if (!kvm_block_mapping_supported(addr, end, phys, level))
> + if (!stage2_block_mapping_allowed(addr, end, level, data))
> return -E2BIG;
>
> if (kvm_phys_is_valid(phys))
> @@ -655,7 +669,7 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> if (data->anchor)
> return 0;
>
> - if (!kvm_block_mapping_supported(addr, end, data->phys, level))
> + if (!stage2_block_mapping_allowed(addr, end, level, data))
> return 0;
>
> data->childp = kvm_pte_follow(*ptep, data->mm_ops);
> @@ -785,6 +799,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> .mmu = pgt->mmu,
> .memcache = mc,
> .mm_ops = pgt->mm_ops,
> + .force_pte = pgt->force_pte_cb && pgt->force_pte_cb(addr, addr + size, prot),
> };
> struct kvm_pgtable_walker walker = {
> .cb = stage2_map_walker,
> @@ -816,6 +831,7 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
> .memcache = mc,
> .mm_ops = pgt->mm_ops,
> .owner_id = owner_id,
> + .force_pte = true,
> };
> struct kvm_pgtable_walker walker = {
> .cb = stage2_map_walker,
> @@ -1057,9 +1073,11 @@ int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
> return kvm_pgtable_walk(pgt, addr, size, &walker);
> }
>
> -int kvm_pgtable_stage2_init_flags(struct kvm_pgtable *pgt, struct kvm_arch *arch,
> - struct kvm_pgtable_mm_ops *mm_ops,
> - enum kvm_pgtable_stage2_flags flags)
> +
> +int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_arch *arch,
> + struct kvm_pgtable_mm_ops *mm_ops,
> + enum kvm_pgtable_stage2_flags flags,
> + kvm_pgtable_force_pte_cb_t force_pte_cb)
> {
> size_t pgd_sz;
> u64 vtcr = arch->vtcr;
> @@ -1077,6 +1095,7 @@ int kvm_pgtable_stage2_init_flags(struct kvm_pgtable *pgt, struct kvm_arch *arch
> pgt->mm_ops = mm_ops;
> pgt->mmu = &arch->mmu;
> pgt->flags = flags;
> + pgt->force_pte_cb = force_pte_cb;
>
> /* Ensure zeroed PGD pages are visible to the hardware walker */
> dsb(ishst);
> --
> 2.32.0.432.gabb21c7263-goog
>

2021-08-02 09:52:48

by Fuad Tabba

Subject: Re: [PATCH v3 09/21] KVM: arm64: Tolerate re-creating hyp mappings to set software bits

Hi Quentin,

On Thu, Jul 29, 2021 at 3:28 PM Quentin Perret <[email protected]> wrote:
>
> The current hypervisor stage-1 mapping code doesn't allow changing an
> existing valid mapping. Relax this condition by allowing changes that
> only target software bits, as that will soon be needed to annotate shared
> pages.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/kvm/hyp/pgtable.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 1ee1168ac32d..2689fcb7901d 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -362,6 +362,21 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
> return 0;
> }
>
> +static bool hyp_pte_needs_update(kvm_pte_t old, kvm_pte_t new)
> +{
> + /*
> + * Tolerate KVM recreating the exact same mapping, or changing software
> + * bits if the existing mapping was valid.
> + */
> + if (old == new)
> + return false;

The added comment clarifies the rationale here. Thanks.

Reviewed-by: Fuad Tabba <[email protected]>

/fuad

> + if (!kvm_pte_valid(old))
> + return true;
> +
> + return !WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW);
> +}
> +
> static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> kvm_pte_t *ptep, struct hyp_map_data *data)
> {
> @@ -371,9 +386,8 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> if (!kvm_block_mapping_supported(addr, end, phys, level))
> return false;
>
> - /* Tolerate KVM recreating the exact same mapping */
> new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> - if (old != new && !WARN_ON(kvm_pte_valid(old)))
> + if (hyp_pte_needs_update(old, new))
> smp_store_release(ptep, new);
>
> data->phys += granule;
> --
> 2.32.0.432.gabb21c7263-goog
>

2021-08-02 10:35:03

by Fuad Tabba

Subject: Re: [PATCH v3 12/21] KVM: arm64: Add helpers to tag shared pages in SW bits

Hi Quentin,


On Thu, Jul 29, 2021 at 3:28 PM Quentin Perret <[email protected]> wrote:
>
> We will soon start annotating shared pages in page-tables in nVHE
> protected mode. Define all the states in which a page can be (owned,
> shared and owned, shared and borrowed), and provide helpers allowing to
> convert this into SW bits annotations using the matching prot
> attributes.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 26 +++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index 9c227d87c36d..ae355bfd8c01 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -12,6 +12,32 @@
> #include <asm/virt.h>
> #include <nvhe/spinlock.h>
>
> +/*
> + * SW bits 0-1 are reserved to track the memory ownership state of each page:
> + * 00: The page is owned solely by the page-table owner.

nit: solely -> exclusively, because "exclusive" is the more common
term in the context of shared resources

> + * 01: The page is owned by the page-table owner, but is shared
> + * with another entity.
> + * 10: The page is shared with, but not owned by the page-table owner.
> + * 11: Reserved for future use (lending).
> + */
> +enum pkvm_page_state {
> + PKVM_PAGE_OWNED = 0ULL,
> + PKVM_PAGE_SHARED_OWNED = KVM_PGTABLE_PROT_SW0,
> + PKVM_PAGE_SHARED_BORROWED = KVM_PGTABLE_PROT_SW1,
> +};
> +
> +#define PKVM_PAGE_STATE_PROT_MASK (KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
> +static inline enum kvm_pgtable_prot pkvm_mkstate(enum kvm_pgtable_prot prot,
> + enum pkvm_page_state state)
> +{
> + return (prot & ~PKVM_PAGE_STATE_PROT_MASK) | state;
> +}
> +
> +static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
> +{
> + return prot & PKVM_PAGE_STATE_PROT_MASK;
> +}
> +

I think that this encoding is pretty neat and easy to follow.

Reviewed-by: Fuad Tabba <[email protected]>

Thanks,
/fuad

> struct host_kvm {
> struct kvm_arch arch;
> struct kvm_pgtable pgt;
> --
> 2.32.0.432.gabb21c7263-goog
>

2021-08-02 11:15:41

by Fuad Tabba

Subject: Re: [PATCH v3 13/21] KVM: arm64: Expose host stage-2 manipulation helpers

Hi Quentin,

On Thu, Jul 29, 2021 at 3:28 PM Quentin Perret <[email protected]> wrote:
>
> We will need to manipulate the host stage-2 page-table from outside
> mem_protect.c soon. Introduce two functions allowing this, and make
> them usable to users of mem_protect.h.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 ++
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 17 ++++++++++++++++-
> 2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index ae355bfd8c01..47c2a0c51612 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -49,6 +49,8 @@ extern struct host_kvm host_kvm;
> int __pkvm_prot_finalize(void);
> int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end);
>
> +int host_stage2_idmap_locked(u64 start, u64 end, enum kvm_pgtable_prot prot);
> +int host_stage2_set_owner_locked(u64 start, u64 end, u8 owner_id);
> int kvm_host_prepare_stage2(void *pgt_pool_base);
> void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 70c57d2c3024..a7f6134789e0 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -272,6 +272,21 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
> return 0;
> }
>
> +int host_stage2_idmap_locked(u64 start, u64 end, enum kvm_pgtable_prot prot)
> +{
> + hyp_assert_lock_held(&host_kvm.lock);
> +
> + return host_stage2_try(__host_stage2_idmap, start, end, prot);
> +}
> +
> +int host_stage2_set_owner_locked(u64 start, u64 end, u8 owner_id)
> +{
> + hyp_assert_lock_held(&host_kvm.lock);
> +
> + return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_kvm.pgt,
> + start, end - start, &host_s2_pool, owner_id);
> +}

This is a potential issue elsewhere as well, but all functions in
kvm_pgtable.h, including kvm_pgtable_stage2_set_owner, specify an
address range via address and size. The two you have introduced here
take a start and an end. I'm not sure if making these two consistent
with the ones in kvm_pgtable.h would be good, or would just complicate
things in other places.

Thanks,
/fuad

> static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
> {
> /*
> @@ -309,7 +324,7 @@ static int host_stage2_idmap(u64 addr)
> if (ret)
> goto unlock;
>
> - ret = host_stage2_try(__host_stage2_idmap, range.start, range.end, prot);
> + ret = host_stage2_idmap_locked(range.start, range.end, prot);
> unlock:
> hyp_spin_unlock(&host_kvm.lock);
>
> --
> 2.32.0.432.gabb21c7263-goog
>

2021-08-02 14:55:38

by Fuad Tabba

Subject: Re: [PATCH v3 15/21] KVM: arm64: Introduce addr_is_memory()

Hi Quentin.

On Thu, Jul 29, 2021 at 3:28 PM Quentin Perret <[email protected]> wrote:
>
> Introduce a helper usable in nVHE protected mode to check whether a
> physical address is in a RAM region or not.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
> arch/arm64/kvm/hyp/nvhe/mem_protect.c | 7 +++++++
> 2 files changed, 8 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index cc86598654b9..5968fbbb3514 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -51,6 +51,7 @@ extern const u8 pkvm_hyp_id;
> int __pkvm_prot_finalize(void);
> int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end);
>
> +bool addr_is_memory(phys_addr_t phys);

I'm just wondering about the naming of the function. I understand what
you're trying to achieve with it, but an address without a unit that
conveys size or type seems to be missing something. Would
memregion_addr_is_memory or something like that be a better
description, since it is what find_mem_range finds?

Thanks,
/fuad


> int host_stage2_idmap_locked(u64 start, u64 end, enum kvm_pgtable_prot prot);
> int host_stage2_set_owner_locked(u64 start, u64 end, u8 owner_id);
> int kvm_host_prepare_stage2(void *pgt_pool_base);
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index bb18940c3d12..4532f3d55a1a 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -196,6 +196,13 @@ static bool find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
> return false;
> }
>
> +bool addr_is_memory(phys_addr_t phys)
> +{
> + struct kvm_mem_range range;
> +
> + return find_mem_range(phys, &range);
> +}
> +
> static bool range_is_memory(u64 start, u64 end)
> {
> struct kvm_mem_range r1, r2;
> --
> 2.32.0.432.gabb21c7263-goog
>

2021-08-02 14:55:44

by Fuad Tabba

Subject: Re: [PATCH v3 16/21] KVM: arm64: Enable retrieving protections attributes of PTEs

Hi Quentin,

On Thu, Jul 29, 2021 at 3:29 PM Quentin Perret <[email protected]> wrote:
>
> Introduce helper functions in the KVM stage-2 and stage-1 page-table
> manipulation library allowing to retrieve the enum kvm_pgtable_prot of a
> PTE. This will be useful to implement custom walkers outside of
> pgtable.c.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/include/asm/kvm_pgtable.h | 20 +++++++++++++++
> arch/arm64/kvm/hyp/pgtable.c | 37 ++++++++++++++++++++++++++++
> 2 files changed, 57 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index d5ca9b6ce241..7ff9f52239ba 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -505,4 +505,24 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
> */
> int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
> kvm_pte_t *ptep, u32 *level);
> +
> +/**
> + * kvm_pgtable_stage2_pte_prot() - Retrieve the protection attributes of a
> + * stage-2 Page-Table Entry.
> + * @pte: Page-table entry
> + *
> + * Return: protection attributes of the page-table entry in the enum
> + * kvm_pgtable_prot format.
> + */
> +enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
> +
> +/**
> + * kvm_pgtable_hyp_pte_prot() - Retrieve the protection attributes of a stage-1
> + * Page-Table Entry.
> + * @pte: Page-table entry
> + *
> + * Return: protection attributes of the page-table entry in the enum
> + * kvm_pgtable_prot format.
> + */
> +enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
> #endif /* __ARM64_KVM_PGTABLE_H__ */
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 1915489bb127..a6eda8f23cb6 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -363,6 +363,26 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
> return 0;
> }
>
> +enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
> +{
> + enum kvm_pgtable_prot prot = pte & KVM_PTE_LEAF_ATTR_HI_SW;
> + u32 ap;
> +
> + if (!kvm_pte_valid(pte))
> + return prot;
> +
> + if (!(pte & KVM_PTE_LEAF_ATTR_HI_S1_XN))
> + prot |= KVM_PGTABLE_PROT_X;
> +
> + ap = FIELD_GET(KVM_PTE_LEAF_ATTR_LO_S1_AP, pte);
> + if (ap == KVM_PTE_LEAF_ATTR_LO_S1_AP_RO)
> + prot |= KVM_PGTABLE_PROT_R;
> + else if (ap == KVM_PTE_LEAF_ATTR_LO_S1_AP_RW)
> + prot |= KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;

nit: why not use the freshly minted KVM_PGTABLE_PROT_RW?

Thanks,
/fuad


> +
> + return prot;
> +}
> +
> static bool hyp_pte_needs_update(kvm_pte_t old, kvm_pte_t new)
> {
> /*
> @@ -565,6 +585,23 @@ static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot p
> return 0;
> }
>
> +enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte)
> +{
> + enum kvm_pgtable_prot prot = pte & KVM_PTE_LEAF_ATTR_HI_SW;
> +
> + if (!kvm_pte_valid(pte))
> + return prot;
> +
> + if (pte & KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R)
> + prot |= KVM_PGTABLE_PROT_R;
> + if (pte & KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W)
> + prot |= KVM_PGTABLE_PROT_W;
> + if (!(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN))
> + prot |= KVM_PGTABLE_PROT_X;
> +
> + return prot;
> +}
> +
> static bool stage2_pte_needs_update(kvm_pte_t old, kvm_pte_t new)
> {
> if (!kvm_pte_valid(old) || !kvm_pte_valid(new))
> --
> 2.32.0.432.gabb21c7263-goog
>

2021-08-03 05:05:48

by Fuad Tabba

Subject: Re: [PATCH v3 17/21] KVM: arm64: Mark host bss and rodata section as shared

Hi Quentin,

On Thu, Jul 29, 2021 at 3:29 PM Quentin Perret <[email protected]> wrote:
>
> As the hypervisor maps the host's .bss and .rodata sections in its
> stage-1, make sure to tag them as shared in hyp and host page-tables.
>
> But since the hypervisor relies on the presence of these mappings, we
> cannot let the host in complete control of the memory regions -- it
> must not unshare or donate them to another entity for example. To
> prevent this, let's transfer the ownership of those ranges to the
> hypervisor itself, and share the pages back with the host.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/kvm/hyp/nvhe/setup.c | 82 +++++++++++++++++++++++++++++----
> 1 file changed, 74 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 0b574d106519..7f557b264f62 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -58,6 +58,7 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> {
> void *start, *end, *virt = hyp_phys_to_virt(phys);
> unsigned long pgt_size = hyp_s1_pgtable_pages() << PAGE_SHIFT;
> + enum kvm_pgtable_prot prot;
> int ret, i;
>
> /* Recreate the hyp page-table using the early page allocator */
> @@ -83,10 +84,6 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> if (ret)
> return ret;
>
> - ret = pkvm_create_mappings(__start_rodata, __end_rodata, PAGE_HYP_RO);
> - if (ret)
> - return ret;
> -
> ret = pkvm_create_mappings(__hyp_rodata_start, __hyp_rodata_end, PAGE_HYP_RO);
> if (ret)
> return ret;
> @@ -95,10 +92,6 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> if (ret)
> return ret;
>
> - ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, PAGE_HYP_RO);
> - if (ret)
> - return ret;
> -
> ret = pkvm_create_mappings(virt, virt + size, PAGE_HYP);
> if (ret)
> return ret;
> @@ -117,6 +110,24 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> return ret;
> }
>
> + /*
> + * Map the host's .bss and .rodata sections RO in the hypervisor, but
> + * transfer the ownerhsip from the host to the hypervisor itself to
> + * make sure it can't be donated or shared with another entity.

nit: ownerhsip -> ownership

> + *
> + * The ownership transtion requires matching changes in the host

nit: transtion -> transition

> + * stage-2. This will done later (see finalize_host_mappings()) once the

nit: will done -> will be done

> + * hyp_vmemmap is addressable.
> + */
> + prot = pkvm_mkstate(PAGE_HYP_RO, PKVM_PAGE_SHARED_OWNED);
> + ret = pkvm_create_mappings(__start_rodata, __end_rodata, prot);
> + if (ret)
> + return ret;
> +
> + ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, prot);

nit: for clarity, I wonder if it might be good to create an alias of
__hyp_bss_end as __bss_start or something. When it's been moved here,
it sticks out a bit more and makes the reader wonder about the
significance of __hyp_bss_end.

> + if (ret)
> + return ret;
> +
> return 0;
> }
>
> @@ -148,6 +159,57 @@ static void hpool_put_page(void *addr)
> hyp_put_page(&hpool, addr);
> }
>
> +static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
> + kvm_pte_t *ptep,
> + enum kvm_pgtable_walk_flags flag,
> + void * const arg)
> +{
> + enum kvm_pgtable_prot prot;
> + enum pkvm_page_state state;
> + kvm_pte_t pte = *ptep;
> + phys_addr_t phys;
> +
> + if (!kvm_pte_valid(pte))
> + return 0;
> +
> + if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
> + return -EINVAL;

I know that it's not in scope here, but I'm wondering whether we
should be checking for KVM_PTE_TYPE_PAGE instead of the level. Maybe
it would be good to have a helper somewhere for all these checks, both
for clarity and to ensure that nothing has gone wrong with the pte.

> +
> + phys = kvm_pte_to_phys(pte);
> + if (!addr_is_memory(phys))
> + return 0;
> +
> + /*
> + * Adjust the host stage-2 mappings to match the ownership attributes
> + * configured in the hypervisor stage-1.
> + */
> + state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
> + switch (state) {
> + case PKVM_PAGE_OWNED:
> + return host_stage2_set_owner_locked(phys, phys + PAGE_SIZE, pkvm_hyp_id);
> + case PKVM_PAGE_SHARED_OWNED:
> + prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_BORROWED);
> + break;
> + case PKVM_PAGE_SHARED_BORROWED:
> + prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_OWNED);
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + return host_stage2_idmap_locked(phys, phys + PAGE_SIZE, prot);
> +}
> +
> +static int finalize_host_mappings(void)
> +{
> + struct kvm_pgtable_walker walker = {
> + .cb = finalize_host_mappings_walker,
> + .flags = KVM_PGTABLE_WALK_LEAF,
> + };
> +
> + return kvm_pgtable_walk(&pkvm_pgtable, 0, BIT(pkvm_pgtable.ia_bits), &walker);
> +}
> +
> void __noreturn __pkvm_init_finalise(void)
> {
> struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
> @@ -167,6 +229,10 @@ void __noreturn __pkvm_init_finalise(void)
> if (ret)
> goto out;
>
> + ret = finalize_host_mappings();
> + if (ret)
> + goto out;
> +
> pkvm_pgtable_mm_ops = (struct kvm_pgtable_mm_ops) {
> .zalloc_page = hyp_zalloc_hyp_page,
> .phys_to_virt = hyp_phys_to_virt,
> --
> 2.32.0.432.gabb21c7263-goog
>

Thanks,
/fuad

2021-08-03 05:32:42

by Fuad Tabba

Subject: Re: [PATCH v3 19/21] KVM: arm64: Refactor protected nVHE stage-1 locking

Hi Quentin,

On Thu, Jul 29, 2021 at 3:29 PM Quentin Perret <[email protected]> wrote:
>
> Refactor the hypervisor stage-1 locking in nVHE protected mode to expose
> a new pkvm_create_mappings_locked() function. This will be used in later
> patches to allow walking and changing the hypervisor stage-1 without
> releasing the lock.
>
> Signed-off-by: Quentin Perret <[email protected]>
> ---
> arch/arm64/kvm/hyp/include/nvhe/mm.h | 1 +
> arch/arm64/kvm/hyp/nvhe/mm.c | 18 ++++++++++++++++--
> 2 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> index 8ec3a5a7744b..c76d7136ed9b 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> @@ -23,6 +23,7 @@ int hyp_map_vectors(void);
> int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
> int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
> int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> +int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
> int __pkvm_create_mappings(unsigned long start, unsigned long size,
> unsigned long phys, enum kvm_pgtable_prot prot);
> unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> index a8efdf0f9003..6fbe8e8030f6 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> @@ -67,13 +67,15 @@ unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> return addr;
> }
>
> -int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> +int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot)
> {
> unsigned long start = (unsigned long)from;
> unsigned long end = (unsigned long)to;
> unsigned long virt_addr;
> phys_addr_t phys;
>
> + hyp_assert_lock_held(&pkvm_pgd_lock);
> +
> start = start & PAGE_MASK;
> end = PAGE_ALIGN(end);
>
> @@ -81,7 +83,8 @@ int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> int err;
>
> phys = hyp_virt_to_phys((void *)virt_addr);
> - err = __pkvm_create_mappings(virt_addr, PAGE_SIZE, phys, prot);
> + err = kvm_pgtable_hyp_map(&pkvm_pgtable, virt_addr, PAGE_SIZE,
> + phys, prot);
> if (err)
> return err;
> }
> @@ -89,6 +92,17 @@ int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> return 0;
> }
>
> +int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> +{
> + int ret;
> +
> + hyp_spin_lock(&pkvm_pgd_lock);
> + ret = pkvm_create_mappings_locked(from, to, prot);
> + hyp_spin_unlock(&pkvm_pgd_lock);
> +
> + return ret;
> +}
> +

I'm wondering whether this patch should also refactor
__pkvm_create_mappings. It doesn't quite do the exact same thing and
has different parameters.

Thanks,
/fuad

> int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back)
> {
> unsigned long start, end;
> --
> 2.32.0.432.gabb21c7263-goog
>

2021-08-03 08:26:43

by Fuad Tabba

Subject: Re: [PATCH v3 20/21] KVM: arm64: Restrict EL2 stage-1 changes in protected mode

Hi Quentin,

> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 0ccea58df7e0..1b67f562b6fc 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -338,6 +338,95 @@ static int host_stage2_idmap(u64 addr)
> return ret;
> }
>
> +static inline bool check_prot(enum kvm_pgtable_prot prot,
> + enum kvm_pgtable_prot required,
> + enum kvm_pgtable_prot denied)
> +{
> + return (prot & (required | denied)) == required;
> +}
> +
> +int __pkvm_host_share_hyp(u64 pfn)
> +{
> + phys_addr_t addr = hyp_pfn_to_phys(pfn);
> + enum kvm_pgtable_prot prot, cur;
> + void *virt = __hyp_va(addr);
> + enum pkvm_page_state state;
> + kvm_pte_t pte;
> + u32 level;
> + int ret;
> +
> + if (!range_is_memory(addr, addr + PAGE_SIZE))
> + return -EINVAL;
> +
> + hyp_spin_lock(&host_kvm.lock);
> + hyp_spin_lock(&pkvm_pgd_lock);
> +
> + ret = kvm_pgtable_get_leaf(&host_kvm.pgt, addr, &pte, &level);
> + if (ret)
> + goto unlock;
> + if (!pte)
> + goto map_shared;

Should this also check kvm_pte_valid()? Is that guaranteed to always
be the case here, or is it implicitly handled later?

> +
> + /*
> + * Check attributes in the host stage-2 PTE. We need the page to be:
> + * - mapped RWX as we're sharing memory;
> + * - not borrowed, as that implies absence of ownership.
> + * Otherwise, we can't let it go through.
> + */
> + cur = kvm_pgtable_stage2_pte_prot(pte);
> + prot = pkvm_mkstate(0, PKVM_PAGE_SHARED_BORROWED);
> + if (!check_prot(cur, KVM_PGTABLE_PROT_RWX, prot)) {
> + ret = -EPERM;
> + goto unlock;
> + }
> +
> + state = pkvm_getstate(cur);
> + if (state == PKVM_PAGE_OWNED)
> + goto map_shared;
> +
> + /*
> + * Tolerate double-sharing the same page, but this requires
> + * cross-checking the hypervisor stage-1.
> + */
> + if (state != PKVM_PAGE_SHARED_OWNED) {
> + ret = -EPERM;
> + goto unlock;
> + }
> +
> + ret = kvm_pgtable_get_leaf(&pkvm_pgtable, (u64)virt, &pte, &level);
> + if (ret)
> + goto unlock;
> +
> + /*
> + * If the page has been shared with the hypervisor, it must be
> + * SHARED_BORROWED already.
> + */

This comment confused me at first, but then I realized it's referring
to the page from the hyp's point of view. Could you add something to
the comment to that effect?

It might also make it easier to follow if the variables could be
annotated to specify whether cur, state, and prot are the host's or
hyp's (and not reuse the same one for both).

> + cur = kvm_pgtable_hyp_pte_prot(pte);
> + prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED);
> + if (!check_prot(cur, prot, ~prot))
> + ret = -EPERM;
> + goto unlock;
> +
> +map_shared:
> + /*
> + * If the page is not yet shared, adjust mappings in both page-tables
> + * while both locks are held.
> + */
> + prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED);
> + ret = pkvm_create_mappings_locked(virt, virt + PAGE_SIZE, prot);
> + BUG_ON(ret);
> +
> + prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_OWNED);
> + ret = host_stage2_idmap_locked(addr, addr + PAGE_SIZE, prot);
> + BUG_ON(ret);
> +
> +unlock:
> + hyp_spin_unlock(&pkvm_pgd_lock);
> + hyp_spin_unlock(&host_kvm.lock);
> +
> + return ret;
> +}
> +
> void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
> {
> struct kvm_vcpu_fault_info fault;
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 0625bf2353c2..cbab146cda6a 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -259,10 +259,8 @@ static int __create_hyp_mappings(unsigned long start, unsigned long size,
> {
> int err;
>
> - if (!kvm_host_owns_hyp_mappings()) {
> - return kvm_call_hyp_nvhe(__pkvm_create_mappings,
> - start, size, phys, prot);
> - }
> + if (WARN_ON(!kvm_host_owns_hyp_mappings()))
> + return -EINVAL;
>
> mutex_lock(&kvm_hyp_pgd_mutex);
> err = kvm_pgtable_hyp_map(hyp_pgtable, start, size, phys, prot);
> @@ -282,6 +280,21 @@ static phys_addr_t kvm_kaddr_to_phys(void *kaddr)
> }
> }
>
> +static int pkvm_share_hyp(phys_addr_t start, phys_addr_t end)
> +{
> + phys_addr_t addr;
> + int ret;
> +
> + for (addr = ALIGN_DOWN(start, PAGE_SIZE); addr < end; addr += PAGE_SIZE) {
> + ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp,
> + __phys_to_pfn(addr));

I guess we don't expect this to happen often, but I wonder if it would
be better to have the looping in the hyp call rather than here, to
reduce the number of hyp calls when sharing.

Thanks,
/fuad

> + if (ret)
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> /**
> * create_hyp_mappings - duplicate a kernel virtual address range in Hyp mode
> * @from: The virtual kernel start address of the range
> @@ -302,6 +315,13 @@ int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> if (is_kernel_in_hyp_mode())
> return 0;
>
> + if (!kvm_host_owns_hyp_mappings()) {
> + if (WARN_ON(prot != PAGE_HYP))
> + return -EPERM;
> + return pkvm_share_hyp(kvm_kaddr_to_phys(from),
> + kvm_kaddr_to_phys(to));
> + }
> +
> start = start & PAGE_MASK;
> end = PAGE_ALIGN(end);
>
> --
> 2.32.0.432.gabb21c7263-goog
>

2021-08-03 10:17:50

by Quentin Perret

Subject: Re: [PATCH v3 10/21] KVM: arm64: Enable forcing page-level stage-2 mappings

Hi Fuad,

On Monday 02 Aug 2021 at 11:49:28 (+0200), Fuad Tabba wrote:
> On Thu, Jul 29, 2021 at 3:28 PM Quentin Perret <[email protected]> wrote:
> >
> > Much of the stage-2 manipulation logic relies on being able to destroy
> > block mappings if e.g. installing a smaller mapping in the range. The
> > rationale for this behaviour is that stage-2 mappings can always be
> > re-created lazily. However, this gets more complicated when the stage-2
> > page-table is used to store metadata about the underlying pages. In such
> > cases, destroying a block mapping may lead to losing part of the state,
> > and confuse the user of those metadata (such as the hypervisor in nVHE
> > protected mode).
> >
> > To avoid this, introduce a callback function in the pgtable struct which
> > is called during all map operations to determine whether the mappings
> > can use blocks, or should be forced to page granularity. This is used by
> > the hypervisor when creating the host stage-2 to force page-level
> > mappings when using non-default protection attributes.
> >
> > Signed-off-by: Quentin Perret <[email protected]>
> > ---
> > arch/arm64/include/asm/kvm_pgtable.h | 65 ++++++++++++++++-----------
> > arch/arm64/kvm/hyp/nvhe/mem_protect.c | 30 +++++++++++--
> > arch/arm64/kvm/hyp/pgtable.c | 29 +++++++++---
> > 3 files changed, 91 insertions(+), 33 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > index 83c5c97d9eac..ba7dcade2798 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -115,25 +115,6 @@ enum kvm_pgtable_stage2_flags {
> > KVM_PGTABLE_S2_IDMAP = BIT(1),
> > };
> >
> > -/**
> > - * struct kvm_pgtable - KVM page-table.
> > - * @ia_bits: Maximum input address size, in bits.
> > - * @start_level: Level at which the page-table walk starts.
> > - * @pgd: Pointer to the first top-level entry of the page-table.
> > - * @mm_ops: Memory management callbacks.
> > - * @mmu: Stage-2 KVM MMU struct. Unused for stage-1 page-tables.
> > - */
> > -struct kvm_pgtable {
> > - u32 ia_bits;
> > - u32 start_level;
> > - kvm_pte_t *pgd;
> > - struct kvm_pgtable_mm_ops *mm_ops;
> > -
> > - /* Stage-2 only */
> > - struct kvm_s2_mmu *mmu;
> > - enum kvm_pgtable_stage2_flags flags;
> > -};
> > -
> > /**
> > * enum kvm_pgtable_prot - Page-table permissions and attributes.
> > * @KVM_PGTABLE_PROT_X: Execute permission.
> > @@ -149,11 +130,41 @@ enum kvm_pgtable_prot {
> > KVM_PGTABLE_PROT_DEVICE = BIT(3),
> > };
> >
> > -#define PAGE_HYP (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
> > +#define KVM_PGTABLE_PROT_RW (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
> > +#define KVM_PGTABLE_PROT_RWX (KVM_PGTABLE_PROT_RW | KVM_PGTABLE_PROT_X)
> > +
> > +#define PAGE_HYP KVM_PGTABLE_PROT_RW
> > #define PAGE_HYP_EXEC (KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_X)
> > #define PAGE_HYP_RO (KVM_PGTABLE_PROT_R)
> > #define PAGE_HYP_DEVICE (PAGE_HYP | KVM_PGTABLE_PROT_DEVICE)
>
> I wonder if it would be useful to add a couple of other aliases for
> default memory and default mmio protections, e.g.,
>
> #define KVM_PGTABLE_PROT_MEM KVM_PGTABLE_PROT_RWX
> #define KVM_PGTABLE_PROT_MMIO KVM_PGTABLE_PROT_RW
>
> I think that using these below, e.g., host_stage2_force_pte_cb(),
> might make it clearer and answer comments you had in earlier patches
> about why "RWX" for memory.

Sure, I can add something. I'll probably call them something other than
KVM_PGTABLE_PROT_{MEM,MMIO} though, just to make it clear that this is
all specific to the host stage-2 and not a general requirement of the
pgtable code to map things like this.

> >
> > +typedef bool (*kvm_pgtable_force_pte_cb_t)(u64 addr, u64 end,
> > + enum kvm_pgtable_prot prot);
> > +
> > +/**
> > + * struct kvm_pgtable - KVM page-table.
> > + * @ia_bits: Maximum input address size, in bits.
> > + * @start_level: Level at which the page-table walk starts.
> > + * @pgd: Pointer to the first top-level entry of the page-table.
> > + * @mm_ops: Memory management callbacks.
> > + * @mmu: Stage-2 KVM MMU struct. Unused for stage-1 page-tables.
> > + * @flags: Stage-2 page-table flags.
> > + * @force_pte_cb: Callback function used during map operations to decide
> > + * whether block mappings can be used to map the given IPA
> > + * range.
> > + */
>
> nit: I think it might be clearer (and probably not longer) to rephrase
> to describe in terms of the return value of the callback, e.g., "...
> function that returns true if page level mappings must be used instead
> of block mappings."

Works for me, thanks for the suggestion.

> > +struct kvm_pgtable {
> > + u32 ia_bits;
> > + u32 start_level;
> > + kvm_pte_t *pgd;
> > + struct kvm_pgtable_mm_ops *mm_ops;
> > +
> > + /* Stage-2 only */
> > + struct kvm_s2_mmu *mmu;
> > + enum kvm_pgtable_stage2_flags flags;
> > + kvm_pgtable_force_pte_cb_t force_pte_cb;
> > +};
> > +
> > /**
> > * enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table walk.
> > * @KVM_PGTABLE_WALK_LEAF: Visit leaf entries, including invalid
> > @@ -246,21 +257,25 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
> > u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift);
> >
> > /**
> > - * kvm_pgtable_stage2_init_flags() - Initialise a guest stage-2 page-table.
> > + * __kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
> > * @pgt: Uninitialised page-table structure to initialise.
> > * @arch: Arch-specific KVM structure representing the guest virtual
> > * machine.
> > * @mm_ops: Memory management callbacks.
> > * @flags: Stage-2 configuration flags.
> > + * @force_pte_cb: Callback function used during map operations to decide
> > + * whether block mappings can be used to map the given IPA
> > + * range.
>
> nit: same nit as above with describing the callback in terms of its return value
>
> > *
> > * Return: 0 on success, negative error code on failure.
> > */
> > -int kvm_pgtable_stage2_init_flags(struct kvm_pgtable *pgt, struct kvm_arch *arch,
> > - struct kvm_pgtable_mm_ops *mm_ops,
> > - enum kvm_pgtable_stage2_flags flags);
> > +int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_arch *arch,
> > + struct kvm_pgtable_mm_ops *mm_ops,
> > + enum kvm_pgtable_stage2_flags flags,
> > + kvm_pgtable_force_pte_cb_t force_pte_cb);
> >
> > #define kvm_pgtable_stage2_init(pgt, arch, mm_ops) \
> > - kvm_pgtable_stage2_init_flags(pgt, arch, mm_ops, 0)
> > + __kvm_pgtable_stage2_init(pgt, arch, mm_ops, 0, NULL)
> >
> > /**
> > * kvm_pgtable_stage2_destroy() - Destroy an unused guest stage-2 page-table.
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index 2148d3968aa5..70c57d2c3024 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -89,6 +89,7 @@ static void prepare_host_vtcr(void)
> > id_aa64mmfr1_el1_sys_val, phys_shift);
> > }
> >
> > +static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot);
>
> nit: newline
>
> > int kvm_host_prepare_stage2(void *pgt_pool_base)
> > {
> > struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
> > @@ -101,8 +102,9 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
> > if (ret)
> > return ret;
> >
> > - ret = kvm_pgtable_stage2_init_flags(&host_kvm.pgt, &host_kvm.arch,
> > - &host_kvm.mm_ops, KVM_HOST_S2_FLAGS);
> > + ret = __kvm_pgtable_stage2_init(&host_kvm.pgt, &host_kvm.arch,
> > + &host_kvm.mm_ops, KVM_HOST_S2_FLAGS,
> > + host_stage2_force_pte_cb);
> > if (ret)
> > return ret;
> >
> > @@ -270,9 +272,31 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
> > return 0;
> > }
> >
> > +static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot prot)
> > +{
> > + /*
> > + * Block mappings must be used with care in the host stage-2 as a
> > + * kvm_pgtable_stage2_map() operation targeting a page in the range of
> > + * an existing block will delete the block under the assumption that
> > + * mappings in the rest of the block range can always be rebuilt lazily.
> > + * That assumption is correct for the host stage-2 with RWX mappings
> > + * targeting memory or RW mappings targeting MMIO ranges (see
> > + * host_stage2_idmap() below which implements some of the host memory
> > + * abort logic). However, this is not safe for any other mappings where
> > + * the host stage-2 page-table is in fact the only place where this
> > + * state is stored. In all those cases, it is safer to use page-level
> > + * mappings, hence avoiding to lose the state because of side-effects in
> > + * kvm_pgtable_stage2_map().
> > + */
> > + if (range_is_memory(addr, end))
> > + return prot != KVM_PGTABLE_PROT_RWX;
> > + else
> > + return prot != KVM_PGTABLE_PROT_RW;
> > +}
>
> Just checking, I don't think that it's possible for the range to be
> big enough to somehow include both memory and mmio, neither now nor in
> future use cases, is it?

That really shouldn't be the case, no -- the host_stage2_idmap() function
tries hard to respect that, so I figured that as long as these two are
consistent we should be fine.

> > +
> > static int host_stage2_idmap(u64 addr)
> > {
> > - enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
> > + enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_RW;
> > struct kvm_mem_range range;
> > bool is_memory = find_mem_range(addr, &range);
> > int ret;
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 2689fcb7901d..bcc02e6e0f62 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -452,6 +452,8 @@ int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
> > pgt->start_level = KVM_PGTABLE_MAX_LEVELS - levels;
> > pgt->mm_ops = mm_ops;
> > pgt->mmu = NULL;
> > + pgt->force_pte_cb = NULL;
> > +
> > return 0;
> > }
> >
> > @@ -489,6 +491,9 @@ struct stage2_map_data {
> > void *memcache;
> >
> > struct kvm_pgtable_mm_ops *mm_ops;
> > +
> > + /* Force mappings to page granularity */
> > + bool force_pte;
> > };
> >
> > u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift)
> > @@ -602,6 +607,15 @@ static bool stage2_pte_executable(kvm_pte_t pte)
> > return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
> > }
> >
> > +static bool stage2_block_mapping_allowed(u64 addr, u64 end, u32 level,
> > + struct stage2_map_data *data)
> > +{
> > + if (data->force_pte && (level < (KVM_PGTABLE_MAX_LEVELS - 1)))
> > + return false;
>
> I'm not sure I understand why checking the level is necessary. Can
> there be block mapping at the last possible level?

That's probably just a matter of naming, but this function is in fact
called at every level, just like kvm_block_mapping_supported() was
before. And we rely on it returning true at the last level, so I need to
do that check here.

Maybe renaming this stage2_leaf_mapping_allowed() would clarify?

Thanks,
Quentin

2021-08-03 10:22:17

by Quentin Perret

Subject: Re: [PATCH v3 13/21] KVM: arm64: Expose host stage-2 manipulation helpers

On Monday 02 Aug 2021 at 13:13:20 (+0200), Fuad Tabba wrote:
> Hi Quentin,
>
> On Thu, Jul 29, 2021 at 3:28 PM Quentin Perret <[email protected]> wrote:
> >
> > We will need to manipulate the host stage-2 page-table from outside
> > mem_protect.c soon. Introduce two functions allowing this, and make
> > them usable to users of mem_protect.h.
> >
> > Signed-off-by: Quentin Perret <[email protected]>
> > ---
> > arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 2 ++
> > arch/arm64/kvm/hyp/nvhe/mem_protect.c | 17 ++++++++++++++++-
> > 2 files changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > index ae355bfd8c01..47c2a0c51612 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > @@ -49,6 +49,8 @@ extern struct host_kvm host_kvm;
> > int __pkvm_prot_finalize(void);
> > int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end);
> >
> > +int host_stage2_idmap_locked(u64 start, u64 end, enum kvm_pgtable_prot prot);
> > +int host_stage2_set_owner_locked(u64 start, u64 end, u8 owner_id);
> > int kvm_host_prepare_stage2(void *pgt_pool_base);
> > void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
> >
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index 70c57d2c3024..a7f6134789e0 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -272,6 +272,21 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
> > return 0;
> > }
> >
> > +int host_stage2_idmap_locked(u64 start, u64 end, enum kvm_pgtable_prot prot)
> > +{
> > + hyp_assert_lock_held(&host_kvm.lock);
> > +
> > + return host_stage2_try(__host_stage2_idmap, start, end, prot);
> > +}
> > +
> > +int host_stage2_set_owner_locked(u64 start, u64 end, u8 owner_id)
> > +{
> > + hyp_assert_lock_held(&host_kvm.lock);
> > +
> > + return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_kvm.pgt,
> > + start, end - start, &host_s2_pool, owner_id);
> > +}
>
> This is a potential issue elsewhere as well, but all functions in
> kvm_pgtable.h, including kvm_pgtable_stage2_set_owner, specify an
> address range via address and size. The two you have introduced here
> take a start and an end. I'm not sure if making these two consistent
> with the ones in kvm_pgtable.h would be good, or would just complicate
> things in other places.

Good point, and it looks like specifying these two with start/size
parameters would simplify the callers a tiny bit as well, so I'll fold
that into v4.

Thanks,
Quentin

2021-08-03 10:25:45

by Quentin Perret

Subject: Re: [PATCH v3 15/21] KVM: arm64: Introduce addr_is_memory()

On Monday 02 Aug 2021 at 16:52:31 (+0200), Fuad Tabba wrote:
> Hi Quentin.
>
> On Thu, Jul 29, 2021 at 3:28 PM Quentin Perret <[email protected]> wrote:
> >
> > Introduce a helper usable in nVHE protected mode to check whether a
> > physical address is in a RAM region or not.
> >
> > Signed-off-by: Quentin Perret <[email protected]>
> > ---
> > arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
> > arch/arm64/kvm/hyp/nvhe/mem_protect.c | 7 +++++++
> > 2 files changed, 8 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > index cc86598654b9..5968fbbb3514 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> > @@ -51,6 +51,7 @@ extern const u8 pkvm_hyp_id;
> > int __pkvm_prot_finalize(void);
> > int __pkvm_mark_hyp(phys_addr_t start, phys_addr_t end);
> >
> > +bool addr_is_memory(phys_addr_t phys);
>
> I'm just wondering about the naming of the function. I understand what
> you're trying to achieve with it, but an address without a unit that
> conveys size or type seems to be missing something. Would

Well, it does have a type, no? I was hoping this would make it clear what
it actually does.

> memregion_addr_is_memory or something like that be a better
> description, since it is what find_mem_range finds?

I think the callers shouldn't need to care about the implementation
details though. This just answers the question 'is this physical
address in a RAM range or not?'. And I could actually imagine that we
would change the implementation some day to avoid the binary search, but
the users probably don't need to care.

Thanks,
Quentin

2021-08-03 10:26:02

by Quentin Perret

Subject: Re: [PATCH v3 16/21] KVM: arm64: Enable retrieving protections attributes of PTEs

On Monday 02 Aug 2021 at 16:52:49 (+0200), Fuad Tabba wrote:
> Hi Quentin,
>
> On Thu, Jul 29, 2021 at 3:29 PM Quentin Perret <[email protected]> wrote:
> >
> > Introduce helper functions in the KVM stage-2 and stage-1 page-table
> > manipulation library allowing to retrieve the enum kvm_pgtable_prot of a
> > PTE. This will be useful to implement custom walkers outside of
> > pgtable.c.
> >
> > Signed-off-by: Quentin Perret <[email protected]>
> > ---
> > arch/arm64/include/asm/kvm_pgtable.h | 20 +++++++++++++++
> > arch/arm64/kvm/hyp/pgtable.c | 37 ++++++++++++++++++++++++++++
> > 2 files changed, 57 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > index d5ca9b6ce241..7ff9f52239ba 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -505,4 +505,24 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
> > */
> > int kvm_pgtable_get_leaf(struct kvm_pgtable *pgt, u64 addr,
> > kvm_pte_t *ptep, u32 *level);
> > +
> > +/**
> > + * kvm_pgtable_stage2_pte_prot() - Retrieve the protection attributes of a
> > + * stage-2 Page-Table Entry.
> > + * @pte: Page-table entry
> > + *
> > + * Return: protection attributes of the page-table entry in the enum
> > + * kvm_pgtable_prot format.
> > + */
> > +enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
> > +
> > +/**
> > + * kvm_pgtable_hyp_pte_prot() - Retrieve the protection attributes of a stage-1
> > + * Page-Table Entry.
> > + * @pte: Page-table entry
> > + *
> > + * Return: protection attributes of the page-table entry in the enum
> > + * kvm_pgtable_prot format.
> > + */
> > +enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
> > #endif /* __ARM64_KVM_PGTABLE_H__ */
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 1915489bb127..a6eda8f23cb6 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -363,6 +363,26 @@ static int hyp_set_prot_attr(enum kvm_pgtable_prot prot, kvm_pte_t *ptep)
> > return 0;
> > }
> >
> > +enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
> > +{
> > + enum kvm_pgtable_prot prot = pte & KVM_PTE_LEAF_ATTR_HI_SW;
> > + u32 ap;
> > +
> > + if (!kvm_pte_valid(pte))
> > + return prot;
> > +
> > + if (!(pte & KVM_PTE_LEAF_ATTR_HI_S1_XN))
> > + prot |= KVM_PGTABLE_PROT_X;
> > +
> > + ap = FIELD_GET(KVM_PTE_LEAF_ATTR_LO_S1_AP, pte);
> > + if (ap == KVM_PTE_LEAF_ATTR_LO_S1_AP_RO)
> > + prot |= KVM_PGTABLE_PROT_R;
> > + else if (ap == KVM_PTE_LEAF_ATTR_LO_S1_AP_RW)
> > + prot |= KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
>
> nit: why not use the freshly minted KVM_PGTABLE_PROT_RW?

No good reason, I'll fix that up, thanks!

2021-08-03 10:38:45

by Quentin Perret

Subject: Re: [PATCH v3 17/21] KVM: arm64: Mark host bss and rodata section as shared

On Tuesday 03 Aug 2021 at 07:02:42 (+0200), Fuad Tabba wrote:
> Hi Quentin,
>
> On Thu, Jul 29, 2021 at 3:29 PM Quentin Perret <[email protected]> wrote:
> >
> > As the hypervisor maps the host's .bss and .rodata sections in its
> > stage-1, make sure to tag them as shared in hyp and host page-tables.
> >
> > But since the hypervisor relies on the presence of these mappings, we
> > cannot let the host in complete control of the memory regions -- it
> > must not unshare or donate them to another entity for example. To
> > prevent this, let's transfer the ownership of those ranges to the
> > hypervisor itself, and share the pages back with the host.
> >
> > Signed-off-by: Quentin Perret <[email protected]>
> > ---
> > arch/arm64/kvm/hyp/nvhe/setup.c | 82 +++++++++++++++++++++++++++++----
> > 1 file changed, 74 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> > index 0b574d106519..7f557b264f62 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> > @@ -58,6 +58,7 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> > {
> > void *start, *end, *virt = hyp_phys_to_virt(phys);
> > unsigned long pgt_size = hyp_s1_pgtable_pages() << PAGE_SHIFT;
> > + enum kvm_pgtable_prot prot;
> > int ret, i;
> >
> > /* Recreate the hyp page-table using the early page allocator */
> > @@ -83,10 +84,6 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> > if (ret)
> > return ret;
> >
> > - ret = pkvm_create_mappings(__start_rodata, __end_rodata, PAGE_HYP_RO);
> > - if (ret)
> > - return ret;
> > -
> > ret = pkvm_create_mappings(__hyp_rodata_start, __hyp_rodata_end, PAGE_HYP_RO);
> > if (ret)
> > return ret;
> > @@ -95,10 +92,6 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> > if (ret)
> > return ret;
> >
> > - ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, PAGE_HYP_RO);
> > - if (ret)
> > - return ret;
> > -
> > ret = pkvm_create_mappings(virt, virt + size, PAGE_HYP);
> > if (ret)
> > return ret;
> > @@ -117,6 +110,24 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
> > return ret;
> > }
> >
> > + /*
> > + * Map the host's .bss and .rodata sections RO in the hypervisor, but
> > + * transfer the ownerhsip from the host to the hypervisor itself to
> > + * make sure it can't be donated or shared with another entity.
>
> nit: ownerhsip -> ownership
>
> > + *
> > + * The ownership transtion requires matching changes in the host
>
> nit: transtion -> transition
>
> > + * stage-2. This will done later (see finalize_host_mappings()) once the
>
> nit: will done -> will be done

Urgh, I clearly went too fast writing this, thanks!

> > + * hyp_vmemmap is addressable.
> > + */
> > + prot = pkvm_mkstate(PAGE_HYP_RO, PKVM_PAGE_SHARED_OWNED);
> > + ret = pkvm_create_mappings(__start_rodata, __end_rodata, prot);
> > + if (ret)
> > + return ret;
> > +
> > + ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, prot);
>
> nit: for clarity, I wonder if it might be good to create an alias of
> __hyp_bss_end as __bss_start or something. When it's been moved here,
> it sticks out a bit more and makes the reader wonder about the
> significance of __hyp_bss_end.

I understand what you mean, but I'm not sure this aliasing is really
going to clarify things much. We have a comment in arm.c (see
init_hyp_mode()) to explain exactly why we're doing this, so maybe it
would be worth adding it here too. WDYT?

> > + if (ret)
> > + return ret;
> > +
> > return 0;
> > }
> >
> > @@ -148,6 +159,57 @@ static void hpool_put_page(void *addr)
> > hyp_put_page(&hpool, addr);
> > }
> >
> > +static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
> > + kvm_pte_t *ptep,
> > + enum kvm_pgtable_walk_flags flag,
> > + void * const arg)
> > +{
> > + enum kvm_pgtable_prot prot;
> > + enum pkvm_page_state state;
> > + kvm_pte_t pte = *ptep;
> > + phys_addr_t phys;
> > +
> > + if (!kvm_pte_valid(pte))
> > + return 0;
> > +
> > + if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
> > + return -EINVAL;
>
> I know that it's not in scope here, but I'm wondering whether we
> should be checking for KVM_PTE_TYPE_PAGE instead of the level. Maybe

Well, these would check different things, no?

> it would be good to have a helper somewhere for all these checks both
> for clarity and to ensure that nothing has gone wrong with the pte.

The reason I need this check is just to make sure the call to
host_stage2_idmap_locked() further down is correct with a hardcoded
PAGE_SIZE. The alternative would be to not be lazy and actually
compute the current granule size based on the level and use that, which
would make this code robust to using block mappings at EL2 stage-1
in the future.

And I'll fix this up for v4.

Cheers,
Quentin

> > +
> > + phys = kvm_pte_to_phys(pte);
> > + if (!addr_is_memory(phys))
> > + return 0;
> > +
> > + /*
> > + * Adjust the host stage-2 mappings to match the ownership attributes
> > + * configured in the hypervisor stage-1.
> > + */
> > + state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
> > + switch (state) {
> > + case PKVM_PAGE_OWNED:
> > + return host_stage2_set_owner_locked(phys, phys + PAGE_SIZE, pkvm_hyp_id);
> > + case PKVM_PAGE_SHARED_OWNED:
> > + prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_BORROWED);
> > + break;
> > + case PKVM_PAGE_SHARED_BORROWED:
> > + prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_OWNED);
> > + break;
> > + default:
> > + return -EINVAL;
> > + }
> > +
> > + return host_stage2_idmap_locked(phys, phys + PAGE_SIZE, prot);
> > +}
> > +
> > +static int finalize_host_mappings(void)
> > +{
> > + struct kvm_pgtable_walker walker = {
> > + .cb = finalize_host_mappings_walker,
> > + .flags = KVM_PGTABLE_WALK_LEAF,
> > + };
> > +
> > + return kvm_pgtable_walk(&pkvm_pgtable, 0, BIT(pkvm_pgtable.ia_bits), &walker);
> > +}
> > +
> > void __noreturn __pkvm_init_finalise(void)
> > {
> > struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
> > @@ -167,6 +229,10 @@ void __noreturn __pkvm_init_finalise(void)
> > if (ret)
> > goto out;
> >
> > + ret = finalize_host_mappings();
> > + if (ret)
> > + goto out;
> > +
> > pkvm_pgtable_mm_ops = (struct kvm_pgtable_mm_ops) {
> > .zalloc_page = hyp_zalloc_hyp_page,
> > .phys_to_virt = hyp_phys_to_virt,
> > --
> > 2.32.0.432.gabb21c7263-goog
> >
>
> Thanks,
> /fuad

2021-08-03 10:38:53

by Quentin Perret

Subject: Re: [PATCH v3 19/21] KVM: arm64: Refactor protected nVHE stage-1 locking

Hey Fuad,

On Tuesday 03 Aug 2021 at 07:31:03 (+0200), Fuad Tabba wrote:
> Hi Quentin,
>
> On Thu, Jul 29, 2021 at 3:29 PM Quentin Perret <[email protected]> wrote:
> >
> > Refactor the hypervisor stage-1 locking in nVHE protected mode to expose
> > a new pkvm_create_mappings_locked() function. This will be used in later
> > patches to allow walking and changing the hypervisor stage-1 without
> > releasing the lock.
> >
> > Signed-off-by: Quentin Perret <[email protected]>
> > ---
> > arch/arm64/kvm/hyp/include/nvhe/mm.h | 1 +
> > arch/arm64/kvm/hyp/nvhe/mm.c | 18 ++++++++++++++++--
> > 2 files changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > index 8ec3a5a7744b..c76d7136ed9b 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> > @@ -23,6 +23,7 @@ int hyp_map_vectors(void);
> > int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
> > int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
> > int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
> > +int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
> > int __pkvm_create_mappings(unsigned long start, unsigned long size,
> > unsigned long phys, enum kvm_pgtable_prot prot);
> > unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> > index a8efdf0f9003..6fbe8e8030f6 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> > @@ -67,13 +67,15 @@ unsigned long __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
> > return addr;
> > }
> >
> > -int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> > +int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot)
> > {
> > unsigned long start = (unsigned long)from;
> > unsigned long end = (unsigned long)to;
> > unsigned long virt_addr;
> > phys_addr_t phys;
> >
> > + hyp_assert_lock_held(&pkvm_pgd_lock);
> > +
> > start = start & PAGE_MASK;
> > end = PAGE_ALIGN(end);
> >
> > @@ -81,7 +83,8 @@ int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> > int err;
> >
> > phys = hyp_virt_to_phys((void *)virt_addr);
> > - err = __pkvm_create_mappings(virt_addr, PAGE_SIZE, phys, prot);
> > + err = kvm_pgtable_hyp_map(&pkvm_pgtable, virt_addr, PAGE_SIZE,
> > + phys, prot);
> > if (err)
> > return err;
> > }
> > @@ -89,6 +92,17 @@ int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> > return 0;
> > }
> >
> > +int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> > +{
> > + int ret;
> > +
> > + hyp_spin_lock(&pkvm_pgd_lock);
> > + ret = pkvm_create_mappings_locked(from, to, prot);
> > + hyp_spin_unlock(&pkvm_pgd_lock);
> > +
> > + return ret;
> > +}
> > +
>
> I'm wondering whether this patch should also refactor
> __pkvm_create_mappings. It doesn't quite do the exact same thing and
> has different parameters.

Sorry, not sure I'm understanding your suggestion here. What do you
think should be done to __pkvm_create_mappings?

Cheers,
Quentin

2021-08-03 10:45:26

by Quentin Perret

Subject: Re: [PATCH v3 20/21] KVM: arm64: Restrict EL2 stage-1 changes in protected mode

On Tuesday 03 Aug 2021 at 10:22:03 (+0200), Fuad Tabba wrote:
> Hi Quentin,
>
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index 0ccea58df7e0..1b67f562b6fc 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -338,6 +338,95 @@ static int host_stage2_idmap(u64 addr)
> > return ret;
> > }
> >
> > +static inline bool check_prot(enum kvm_pgtable_prot prot,
> > + enum kvm_pgtable_prot required,
> > + enum kvm_pgtable_prot denied)
> > +{
> > + return (prot & (required | denied)) == required;
> > +}
> > +
> > +int __pkvm_host_share_hyp(u64 pfn)
> > +{
> > + phys_addr_t addr = hyp_pfn_to_phys(pfn);
> > + enum kvm_pgtable_prot prot, cur;
> > + void *virt = __hyp_va(addr);
> > + enum pkvm_page_state state;
> > + kvm_pte_t pte;
> > + u32 level;
> > + int ret;
> > +
> > + if (!range_is_memory(addr, addr + PAGE_SIZE))
> > + return -EINVAL;
> > +
> > + hyp_spin_lock(&host_kvm.lock);
> > + hyp_spin_lock(&pkvm_pgd_lock);
> > +
> > + ret = kvm_pgtable_get_leaf(&host_kvm.pgt, addr, &pte, &level);
> > + if (ret)
> > + goto unlock;
> > + if (!pte)
> > + goto map_shared;
>
> Should this check whether kvm_pte_valid as well, is that guaranteed to
> always be the case, or implicitly handled later?

Yep, this is implicitly handled by kvm_pgtable_stage2_pte_prot(), which
is guaranteed not to return KVM_PGTABLE_PROT_RWX for an invalid mapping.

> > +
> > + /*
> > + * Check attributes in the host stage-2 PTE. We need the page to be:
> > + * - mapped RWX as we're sharing memory;
> > + * - not borrowed, as that implies absence of ownership.
> > + * Otherwise, we can't let it go through
> > + */
> > + cur = kvm_pgtable_stage2_pte_prot(pte);
> > + prot = pkvm_mkstate(0, PKVM_PAGE_SHARED_BORROWED);
> > + if (!check_prot(cur, KVM_PGTABLE_PROT_RWX, prot)) {
> > + ret = -EPERM;
> > + goto unlock;
> > + }
> > +
> > + state = pkvm_getstate(cur);
> > + if (state == PKVM_PAGE_OWNED)
> > + goto map_shared;
> > +
> > + /*
> > + * Tolerate double-sharing the same page, but this requires
> > + * cross-checking the hypervisor stage-1.
> > + */
> > + if (state != PKVM_PAGE_SHARED_OWNED) {
> > + ret = -EPERM;
> > + goto unlock;
> > + }
> > +
> > + ret = kvm_pgtable_get_leaf(&pkvm_pgtable, (u64)virt, &pte, &level);
> > + if (ret)
> > + goto unlock;
> > +
> > + /*
> > + * If the page has been shared with the hypervisor, it must be
> > + * SHARED_BORROWED already.
> > + */
>
> This comment confused me at first, but then I realized it's referring
> to the page from the hyp's point of view. Could you add something to
> the comment to that effect?

Sure thing.

> It might also make it easier to follow if the variables could be
> annotated to specify whether cur, state, and prot are the host's or
> hyps (and not reuse the same one for both).
>
> > + cur = kvm_pgtable_hyp_pte_prot(pte);
> > + prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED);
> > + if (!check_prot(cur, prot, ~prot)) {
> > + ret = -EPERM;
> > + goto unlock;
> > + }
> > +
> > +map_shared:
> > + /*
> > + * If the page is not yet shared, adjust mappings in both page-tables
> > + * while both locks are held.
> > + */
> > + prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED);
> > + ret = pkvm_create_mappings_locked(virt, virt + PAGE_SIZE, prot);
> > + BUG_ON(ret);
> > +
> > + prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_OWNED);
> > + ret = host_stage2_idmap_locked(addr, addr + PAGE_SIZE, prot);
> > + BUG_ON(ret);
> > +
> > +unlock:
> > + hyp_spin_unlock(&pkvm_pgd_lock);
> > + hyp_spin_unlock(&host_kvm.lock);
> > +
> > + return ret;
> > +}
> > +
> > void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
> > {
> > struct kvm_vcpu_fault_info fault;
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 0625bf2353c2..cbab146cda6a 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -259,10 +259,8 @@ static int __create_hyp_mappings(unsigned long start, unsigned long size,
> > {
> > int err;
> >
> > - if (!kvm_host_owns_hyp_mappings()) {
> > - return kvm_call_hyp_nvhe(__pkvm_create_mappings,
> > - start, size, phys, prot);
> > - }
> > + if (WARN_ON(!kvm_host_owns_hyp_mappings()))
> > + return -EINVAL;
> >
> > mutex_lock(&kvm_hyp_pgd_mutex);
> > err = kvm_pgtable_hyp_map(hyp_pgtable, start, size, phys, prot);
> > @@ -282,6 +280,21 @@ static phys_addr_t kvm_kaddr_to_phys(void *kaddr)
> > }
> > }
> >
> > +static int pkvm_share_hyp(phys_addr_t start, phys_addr_t end)
> > +{
> > + phys_addr_t addr;
> > + int ret;
> > +
> > + for (addr = ALIGN_DOWN(start, PAGE_SIZE); addr < end; addr += PAGE_SIZE) {
> > + ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp,
> > + __phys_to_pfn(addr));
>
> I guess we don't expect this to happen often, but I wonder if it would
> be better to have the looping in the hyp call rather than here, to
> reduce the number of hyp calls when sharing.

Yes, I was wondering the same thing, but ended up doing the looping here
to avoid spending long periods of time in a non-preemptible state at
EL2. Probably doesn't make a big difference for now, but it might if we
ever need to share large memory regions.

Cheers,
Quentin

>
> Thanks,
> /fuad
>
> > + if (ret)
> > + return ret;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > /**
> > * create_hyp_mappings - duplicate a kernel virtual address range in Hyp mode
> > * @from: The virtual kernel start address of the range
> > @@ -302,6 +315,13 @@ int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> > if (is_kernel_in_hyp_mode())
> > return 0;
> >
> > + if (!kvm_host_owns_hyp_mappings()) {
> > + if (WARN_ON(prot != PAGE_HYP))
> > + return -EPERM;
> > + return pkvm_share_hyp(kvm_kaddr_to_phys(from),
> > + kvm_kaddr_to_phys(to));
> > + }
> > +
> > start = start & PAGE_MASK;
> > end = PAGE_ALIGN(end);
> >
> > --
> > 2.32.0.432.gabb21c7263-goog
> >

2021-08-03 10:45:41

by Fuad Tabba

[permalink] [raw]
Subject: Re: [PATCH v3 10/21] KVM: arm64: Enable forcing page-level stage-2 mappings

Hi Quentin,

> > > +static bool stage2_block_mapping_allowed(u64 addr, u64 end, u32 level,
> > > + struct stage2_map_data *data)
> > > +{
> > > + if (data->force_pte && (level < (KVM_PGTABLE_MAX_LEVELS - 1)))
> > > + return false;
> >
> > I'm not sure I understand why checking the level is necessary. Can
> > there be a block mapping at the last possible level?
>
> That's probably just a matter of naming, but this function is in fact
> called at every level, just like kvm_block_mapping_supported() was
> before. And we rely on it returning true at the last level, so I need to
> do that check here.
>
> Maybe renaming this stage2_leaf_mapping_allowed() would clarify?

Yes it would.

Thanks,
/fuad

2021-08-03 10:54:06

by Fuad Tabba

[permalink] [raw]
Subject: Re: [PATCH v3 19/21] KVM: arm64: Refactor protected nVHE stage-1 locking

Hi Quentin,

> > > +int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
> > > +{
> > > + int ret;
> > > +
> > > + hyp_spin_lock(&pkvm_pgd_lock);
> > > + ret = pkvm_create_mappings_locked(from, to, prot);
> > > + hyp_spin_unlock(&pkvm_pgd_lock);
> > > +
> > > + return ret;
> > > +}
> > > +
> >
> > I'm wondering whether this patch should also refactor
> > __pkvm_create_mappings. It doesn't quite do the exact same thing and
> > has different parameters.
>
> Sorry, not sure I'm understanding your suggestion here. What do you
> think should be done to __pkvm_create_mappings?

Sorry, my comment wasn't very clear, and "refactor" is the wrong word.
I think it should probably be renamed, because __pkvm_create_mappings
isn't called by either pkvm_create_mappings or
pkvm_create_mappings_locked. It also has different parameters and
behaves slightly differently.

Thanks,
/fuad

> Cheers,
> Quentin

2021-08-03 10:58:02

by Fuad Tabba

[permalink] [raw]
Subject: Re: [PATCH v3 17/21] KVM: arm64: Mark host bss and rodata section as shared

Hi Quentin,

> > > + ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, prot);
> >
> > nit: for clarity, I wonder if it might be good to create an alias of
> > __hyp_bss_end as __bss_start or something. When it's been moved here,
> > it sticks out a bit more and makes the reader wonder about the
> > significance of __hyp_bss_end.
>
> I understand what you mean, but I'm not sure this aliasing is really
> going to clarify things much. We have a comment in arm.c (see
> init_hyp_mode()) to explain exactly why we're doing this, so maybe it
> would be worth adding it here too. WDYT?

Not sure, to be honest. Comments are good, until they're stale, and
replicating the comment increases the odds of that happening. No
strong opinion either way.

> > > + if (ret)
> > > + return ret;
> > > +
> > > return 0;
> > > }
> > >
> > > @@ -148,6 +159,57 @@ static void hpool_put_page(void *addr)
> > > hyp_put_page(&hpool, addr);
> > > }
> > >
> > > +static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
> > > + kvm_pte_t *ptep,
> > > + enum kvm_pgtable_walk_flags flag,
> > > + void * const arg)
> > > +{
> > > + enum kvm_pgtable_prot prot;
> > > + enum pkvm_page_state state;
> > > + kvm_pte_t pte = *ptep;
> > > + phys_addr_t phys;
> > > +
> > > + if (!kvm_pte_valid(pte))
> > > + return 0;
> > > +
> > > + if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
> > > + return -EINVAL;
> >
> > I know that it's not in scope here, but I'm wondering whether we
> > should be checking for KVM_PTE_TYPE_PAGE instead of the level. Maybe
>
> Well, these would check different things, no?
>
> > it would be good to have a helper somewhere for all these checks both
> > for clarity and to ensure that nothing has gone wrong with the pte.
>
> The reason I need this check is just to make sure the call to
> host_stage2_idmap_locked() further down is correct with a hardcoded
> PAGE_SIZE size. The alternative would be to not be lazy and actually
> compute the current granule size based on the level and use that, as
> that would make this code robust to using block mappings at EL2 stage-1
> in the future.
>
> And I'll fix this up for v4.

I get it now. Thanks!
/fuad


> Cheers,
> Quentin
>
> > > +
> > > + phys = kvm_pte_to_phys(pte);
> > > + if (!addr_is_memory(phys))
> > > + return 0;
> > > +
> > > + /*
> > > + * Adjust the host stage-2 mappings to match the ownership attributes
> > > + * configured in the hypervisor stage-1.
> > > + */
> > > + state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
> > > + switch (state) {
> > > + case PKVM_PAGE_OWNED:
> > > + return host_stage2_set_owner_locked(phys, phys + PAGE_SIZE, pkvm_hyp_id);
> > > + case PKVM_PAGE_SHARED_OWNED:
> > > + prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_BORROWED);
> > > + break;
> > > + case PKVM_PAGE_SHARED_BORROWED:
> > > + prot = pkvm_mkstate(KVM_PGTABLE_PROT_RWX, PKVM_PAGE_SHARED_OWNED);
> > > + break;
> > > + default:
> > > + return -EINVAL;
> > > + }
> > > +
> > > + return host_stage2_idmap_locked(phys, phys + PAGE_SIZE, prot);
> > > +}
> > > +
> > > +static int finalize_host_mappings(void)
> > > +{
> > > + struct kvm_pgtable_walker walker = {
> > > + .cb = finalize_host_mappings_walker,
> > > + .flags = KVM_PGTABLE_WALK_LEAF,
> > > + };
> > > +
> > > + return kvm_pgtable_walk(&pkvm_pgtable, 0, BIT(pkvm_pgtable.ia_bits), &walker);
> > > +}
> > > +
> > > void __noreturn __pkvm_init_finalise(void)
> > > {
> > > struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
> > > @@ -167,6 +229,10 @@ void __noreturn __pkvm_init_finalise(void)
> > > if (ret)
> > > goto out;
> > >
> > > + ret = finalize_host_mappings();
> > > + if (ret)
> > > + goto out;
> > > +
> > > pkvm_pgtable_mm_ops = (struct kvm_pgtable_mm_ops) {
> > > .zalloc_page = hyp_zalloc_hyp_page,
> > > .phys_to_virt = hyp_phys_to_virt,
> > > --
> > > 2.32.0.432.gabb21c7263-goog
> > >
> >
> > Thanks,
> > /fuad