2021-01-22 10:32:38

by Yanan Wang

Subject: [RFC PATCH v4 0/2] Some optimization for stage-2 translation

Hi, Will, Marc,
Are there any further comments on the v3 series I posted previously?
If it does not look fine to you, then I think we should go back to
the original solution in v1, where I suggested filtering out the case
of only updating access permissions in the map handler and handling
it right there.

Here are the reasons for my current opinion:
With an errno returned from the map handler for this single case, there
will be one more vcpu exit from the guest and we also have to consider
the spurious dirty pages. Besides, it seems that the EAGAIN errno has been
chosen specially for this case and cannot be used elsewhere for other
reasons, as we will change this errno to zero at the end of the function.

The v1 solution looks more concise after all, so I have refined the diff
and am posting v4 with two patches here, just for contrast.

Which solution do you prefer now? Please let me know.

Thanks,
Yanan.

Links:
v1: https://lore.kernel.org/lkml/[email protected]
v2: https://lore.kernel.org/lkml/[email protected]
v3: https://lore.kernel.org/lkml/[email protected]

---

About patch-1:
The procedures for the hyp stage-1 map and the guest stage-2 map are quite
different, but they are currently tied closely together by the function
kvm_set_valid_leaf_pte(). So adjust the relevant code to ease future
maintenance.

About patch-2:
(1) While a VM with many vCPUs is running, if several vCPUs access the
same GPA at almost the same time and the stage-2 mapping of the GPA has
not been built yet, they will all cause translation faults. The first
vCPU builds the mapping, and the following ones end up updating the
valid leaf PTE. Note that these vCPUs might want different access
permissions (RO, RW, RX, RWX, etc.).

(2) It's inevitable that we will sometimes update an existing valid leaf
PTE in the map path, and we always perform break-before-make (BBM) in
this case. More unnecessary translation faults can then be caused if
other vCPUs happen to hit the *break stage* of BBM.

With (1) and (2), something unsatisfactory could happen: vCPU A causes
a translation fault and builds the mapping with RW permissions, vCPU B
then updates the valid leaf PTE with break-before-make and the permissions
are changed back to RO. Besides, the *break stage* of BBM may trigger more
translation faults. In the end, some useless small loops could occur.
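
The scenario above can be illustrated with a small user-space model. This is
only a sketch and not code from the patch: the bit positions, `bbm_update()`,
`observer_fn` and `count_fault()` are hypothetical names introduced for
illustration, and the TLB invalidation is reduced to a comment.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pte_t;

/* Illustrative bit positions, not taken from the patch. */
#define PTE_VALID  (1ULL << 0)
#define PTE_S2AP_R (1ULL << 6)
#define PTE_S2AP_W (1ULL << 7)

/* Hypothetical hook standing in for another vCPU walking the table. */
typedef void (*observer_fn)(pte_t seen, void *ctx);

/* Replace a valid entry using a break-before-make sequence. */
static void bbm_update(pte_t *ptep, pte_t new_pte,
		       observer_fn observe, void *ctx)
{
	assert(ptep && (new_pte & PTE_VALID));

	*ptep = 0;			/* break: the entry becomes invalid */
	/* a real implementation would invalidate the TLB here */
	if (observe)
		observe(*ptep, ctx);	/* a concurrent access lands in the window */
	*ptep = new_pte;		/* make: install the new entry */
}

/* Count an access as a translation fault if it saw an invalid entry. */
static void count_fault(pte_t seen, void *ctx)
{
	if (!(seen & PTE_VALID))
		++*(int *)ctx;
}
```

Starting from an RW entry built by "vCPU A" and calling
`bbm_update(&pte, PTE_VALID | PTE_S2AP_R, count_fault, &faults)` on behalf of
"vCPU B" leaves the entry read-only and records one extra fault from the
observer, which is the combination of effects described in the paragraph above.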

We can optimize this to solve the above problems: when we need to
update a valid leaf PTE in the translation fault handler, let's filter
out the case where the update only changes access permissions, which
doesn't require break-before-make. If the PTE already has the permissions
we want, don't bother to update it. If more permissions still need to be
added, then update the PTE directly without break-before-make.
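
The decision described above can be sketched as a stand-alone classifier. This
is a simplified user-space model under stated assumptions, not the kernel
implementation: `classify_update()` and `enum update_action` are hypothetical
names, the bit positions are illustrative, and XN is modelled as a "negative"
permission bit (set means execution is not allowed).

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pte_t;

/* Illustrative bit positions modelled on a stage-2 leaf descriptor. */
#define PTE_VALID    (1ULL << 0)
#define PTE_S2AP_R   (1ULL << 6)
#define PTE_S2AP_W   (1ULL << 7)
#define PTE_S2_XN    (1ULL << 54)
#define PTE_S2_PERMS (PTE_S2AP_R | PTE_S2AP_W | PTE_S2_XN)

enum update_action { UPDATE_SKIP, UPDATE_DIRECT, UPDATE_BBM };

/*
 * Decide how a valid leaf PTE should be updated and store the value that
 * would end up in the table in *result.
 */
static enum update_action classify_update(pte_t old, pte_t new_pte,
					  pte_t *result)
{
	assert(result && (old & PTE_VALID) && (new_pte & PTE_VALID));

	/* Anything other than the permission bits differs: BBM is needed. */
	if ((old ^ new_pte) & ~PTE_S2_PERMS) {
		*result = new_pte;
		return UPDATE_BBM;
	}

	/* Flip XN so that a set bit always means "permission granted". */
	old ^= PTE_S2_XN;
	new_pte ^= PTE_S2_XN;
	new_pte |= old;			/* never drop existing permissions */

	*result = new_pte ^ PTE_S2_XN;	/* flip XN back to its real encoding */
	return (new_pte == old) ? UPDATE_SKIP : UPDATE_DIRECT;
}
```

With this model, a request for RO on an existing RW entry is classified as
`UPDATE_SKIP`, a request for RW on an RO entry as `UPDATE_DIRECT`, and any
change outside the permission bits (for example a different output address)
as `UPDATE_BBM`.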

---

Changelogs

v3->v4:
- Go back to the original solution in v1 and refine the diff
- Rebased on top of v5.11-rc4

v2->v3:
- Rebased on top of v5.11-rc3
- Refine the commit messages
- Adjust the return values in patch-2 and patch-3

v1->v2:
- Make part of the diff a separate patch (patch-1)
- Add Will's Signed-off-by for patch-1
- Return an errno when encountering the permission-change case in the map path
- Add a new patch (patch-3)

---

Yanan Wang (2):
KVM: arm64: Adjust partial code of hyp stage-1 map and guest stage-2
map
KVM: arm64: Filter out the case of only changing permissions from
stage-2 map path

arch/arm64/include/asm/kvm_pgtable.h | 4 ++
arch/arm64/kvm/hyp/pgtable.c | 88 +++++++++++++++++++---------
2 files changed, 63 insertions(+), 29 deletions(-)

--
2.19.1


2021-01-22 10:32:44

by Yanan Wang

Subject: [RFC PATCH v4 2/2] KVM: arm64: Filter out the case of only changing permissions from stage-2 map path

(1) While a VM with many vCPUs is running, if several vCPUs access the
same GPA at almost the same time and the stage-2 mapping of the GPA has
not been built yet, they will all cause translation faults. The first
vCPU builds the mapping, and the following ones end up updating the
valid leaf PTE. Note that these vCPUs might want different access
permissions (RO, RW, RX, RWX, etc.).

(2) It's inevitable that we will sometimes update an existing valid leaf
PTE in the map path, and we always perform break-before-make (BBM) in
this case. More unnecessary translation faults can then be caused if
other vCPUs happen to hit the *break stage* of BBM.

With (1) and (2), something unsatisfactory could happen: vCPU A causes
a translation fault and builds the mapping with RW permissions, vCPU B
then updates the valid leaf PTE with break-before-make and the permissions
are changed back to RO. Besides, the *break stage* of BBM may trigger more
translation faults. In the end, some useless small loops could occur.

We can optimize this to solve the above problems: when we need to
update a valid leaf PTE in the translation fault handler, let's filter
out the case where the update only changes access permissions, which
doesn't require break-before-make. If the PTE already has the permissions
we want, don't bother to update it. If more permissions still need to be
added, then update the PTE directly without break-before-make.

Signed-off-by: Yanan Wang <[email protected]>
---
arch/arm64/include/asm/kvm_pgtable.h | 4 ++
arch/arm64/kvm/hyp/pgtable.c | 62 +++++++++++++++++++++-------
2 files changed, 50 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 52ab38db04c7..2bd4e772ca57 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -157,6 +157,10 @@ void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
* If device attributes are not explicitly requested in @prot, then the
* mapping will be normal, cacheable.
*
+ * When there is an existing valid leaf PTE to be updated in this function,
+ * perform break-before-make only if the parameters to be changed for this
+ * update require it, otherwise the PTE can be updated directly.
+ *
* Note that this function will both coalesce existing table entries and split
* existing block mappings, relying on page-faults to fault back areas outside
* of the new mapping lazily.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 2878aaf53b3c..aac1915f9770 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -45,6 +45,10 @@

#define KVM_PTE_LEAF_ATTR_HI_S2_XN BIT(54)

+#define KVM_PTE_LEAF_ATTR_S2_PERMS (KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R | \
+ KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
+ KVM_PTE_LEAF_ATTR_HI_S2_XN)
+
struct kvm_pgtable_walk_data {
struct kvm_pgtable *pgt;
struct kvm_pgtable_walker *walker;
@@ -460,34 +464,60 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
return 0;
}

+static void stage2_map_update_valid_leaf_pte(u64 addr, u32 level,
+ kvm_pte_t *ptep, kvm_pte_t new,
+ struct stage2_map_data *data)
+{
+ kvm_pte_t old = *ptep;
+
+ /*
+ * It's inevitable that we sometimes end up updating an existing valid
+ * leaf PTE on the map path for various reasons, for instance, multiple
+ * vcpus accessing the same GPA page all cause translation faults at the
+ * same time. So perform break-before-make here only if the parameters
+ * to be changed for this update require it, otherwise the PTE can be
+ * updated directly.
+ */
+ if ((old ^ new) & (~KVM_PTE_LEAF_ATTR_S2_PERMS)) {
+ kvm_set_invalid_pte(ptep);
+ kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
+ smp_store_release(ptep, new);
+ return;
+ }
+
+ old ^= KVM_PTE_LEAF_ATTR_HI_S2_XN;
+ new ^= KVM_PTE_LEAF_ATTR_HI_S2_XN;
+ new |= old;
+
+ /*
+ * Update the valid leaf PTE directly without break-before-make if more
+ * permissions need to be added, and skip the update if the PTE already
+ * has the permissions that we want.
+ */
+ if (new != old) {
+ WRITE_ONCE(*ptep, new ^ KVM_PTE_LEAF_ATTR_HI_S2_XN);
+ kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
+ }
+}
+
static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
kvm_pte_t *ptep,
struct stage2_map_data *data)
{
- kvm_pte_t new, old = *ptep;
+ kvm_pte_t new;
u64 granule = kvm_granule_size(level), phys = data->phys;

if (!kvm_block_mapping_supported(addr, end, phys, level))
return false;

new = kvm_init_valid_leaf_pte(phys, data->attr, level);
- if (kvm_pte_valid(old)) {
- /* Tolerate KVM recreating the exact same mapping */
- if (old == new)
- goto out;
-
- /*
- * There's an existing different valid leaf entry, so perform
- * break-before-make.
- */
- kvm_set_invalid_pte(ptep);
- kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
- put_page(virt_to_page(ptep));
+ if (kvm_pte_valid(*ptep)) {
+ stage2_map_update_valid_leaf_pte(addr, level, ptep, new, data);
+ } else {
+ smp_store_release(ptep, new);
+ get_page(virt_to_page(ptep));
}

- smp_store_release(ptep, new);
- get_page(virt_to_page(ptep));
-out:
data->phys += granule;
return true;
}
--
2.19.1

2021-01-22 11:52:21

by Marc Zyngier

Subject: Re: [RFC PATCH v4 0/2] Some optimization for stage-2 translation

Hi Yanan,

On 2021-01-22 10:13, Yanan Wang wrote:
> Hi, Will, Marc,
> Are there any further comments on the v3 series I posted previously?

None, I was planning to queue them for 5.12 over the weekend.

> If it does not look fine to you, then I think we should go back to
> the original solution in v1, where I suggested filtering out the case
> of only updating access permissions in the map handler and handling
> it right there.
>
> Here are the reasons for my current opinion:
> With an errno returned from the map handler for this single case, there
> will be one more vcpu exit from guest and we also have to consider the
> spurious dirty pages. Besides, it seems that the EAGAIN errno has been
> chosen specially for this case and can not be used elsewhere for other
> reasons, as we will change this errno to zero at the end of the
> function.
>
> The v1 solution looks more concise after all, so I have refined the diff
> and am posting v4 with two patches here, just for contrast.
>
> Which solution do you prefer now? Please let me know.

I'm still very much opposed to mixing mapping and permission changes.
How bad is the spurious return to a vcpu?

Thanks,

M.
--
Jazz is not dead. It just smells funny...