2021-01-26 13:52:30

by Yanan Wang

Subject: [RFC PATCH v1 0/5] Enable CPU TTRem feature for stage-2

Hi all,
This series enables the CPU TTRem feature for stage-2 page tables. It is sent
as an RFC to gather comments, thanks.

The ARMv8.4 TTRem feature offers 3 levels of support when changing block
size without changing any other parameters that are listed as requiring use
of break-before-make (BBM). I think this feature can be used to improve
stage-2 page table handling, and the following explains what TTRem does to
achieve that.

If migration of a VM with hugepages is canceled midway, KVM will adjust the
stage-2 table mappings back to block mappings. We currently use BBM to replace
the table entry with a block entry. Taking adjustment of a 1G block mapping as
an example, the BBM procedure requires us to invalidate the old table entry
first, flush the TLB and unmap the old table mappings, and only then install
the new block entry.

So there is a fairly long period during which the old table entry is invalid
before the new block entry is installed. If other vCPUs access any guest page
within the 1G range during this period and find the table entry invalid, they
will all exit from the guest with a translation fault. These translation faults
are unnecessary, because the block mapping will be built shortly afterwards
anyway. Besides, KVM will still try to build 1G block mappings for these
spurious translation faults, performing cache maintenance operations, page
table walks, etc.

In summary, the spurious faults are caused by the invalidation step of the BBM
procedure. The TTRem level 1 and level 2 approaches ensure that there is never
a moment when the old table entry is invalid before the new block entry is
installed. However, the level-2 method can lead to a TLB conflict abort, which
is troublesome to handle, so we use the nT bit in both the level-1 and level-2
cases to avoid handling TLB conflict aborts.

For an implementation that meets level 1 or level 2, the CPU has two permitted
responses when accessing a block entry with the nT bit set. First, the CPU may
generate a translation fault; the effect of this response is similar to BBM.
Second, the CPU may use the block entry for translation. So with the second
kind of implementation, the spurious translation faults described above can be
prevented.
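As a rough illustration of the difference between the two update sequences,
here is a user-space sketch (not kernel code). It records every value written
to a toy descriptor and checks whether any of them has the valid bit clear.
All names (pte_trace, replace_with_bbm, replace_with_nt, ...) are hypothetical
and exist only for this sketch; the bit positions are assumptions taken from
the arm64 descriptor format (bit 0 valid, bit 1 table, bit 16 nT). Note that
"valid" here only means the valid bit is set: as described above, a CPU may
still choose to fault on an entry with nT set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed arm64 descriptor bits, for illustration only. */
#define PTE_VALID    (UINT64_C(1) << 0)   /* entry is valid */
#define PTE_TABLE    (UINT64_C(1) << 1)   /* table (set) or block (clear) */
#define PTE_BLOCK_NT (UINT64_C(1) << 16)  /* nT bit of a block descriptor */

#define MAX_STEPS 4

/* Every value that was ever visible in the entry, in write order. */
struct pte_trace {
	uint64_t step[MAX_STEPS];
	size_t n;
};

static void pte_write(uint64_t *ptep, uint64_t val, struct pte_trace *t)
{
	assert(t->n < MAX_STEPS);
	*ptep = val;
	t->step[t->n++] = val;
}

/* Break-before-make: invalidate, flush, then install the block. */
static void replace_with_bbm(uint64_t *ptep, uint64_t block,
			     struct pte_trace *t)
{
	pte_write(ptep, 0, t);		/* break: entry invalid from here */
	/* TLB invalidation and unmapping of old tables would happen here */
	pte_write(ptep, block, t);	/* make: entry valid again */
}

/* nT sequence: block with nT set, flush, then block with nT cleared. */
static void replace_with_nt(uint64_t *ptep, uint64_t block,
			    struct pte_trace *t)
{
	pte_write(ptep, block | PTE_BLOCK_NT, t);
	/* TLB invalidation would happen here */
	pte_write(ptep, block & ~PTE_BLOCK_NT, t);
}

/* True if the entry had its valid bit clear at any recorded step. */
static bool trace_has_invalid_window(const struct pte_trace *t)
{
	for (size_t i = 0; i < t->n; i++)
		if (!(t->step[i] & PTE_VALID))
			return true;
	return false;
}
```

Starting from a valid table entry, replace_with_bbm() leaves a trace that
contains an invalid step, while replace_with_nt() does not; both end with the
same block descriptor installed.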

Yanan Wang (5):
KVM: arm64: Detect the ARMv8.4 TTRem feature
KVM: arm64: Add an API to get level of TTRem supported by hardware
KVM: arm64: Support usage of TTRem in guest stage-2 translation
KVM: arm64: Add handling of coalescing tables into a block mapping
KVM: arm64: Adapt page-table code to new handling of coalescing tables

arch/arm64/include/asm/cpucaps.h | 3 +-
arch/arm64/include/asm/cpufeature.h | 13 ++++++
arch/arm64/kernel/cpufeature.c | 10 +++++
arch/arm64/kvm/hyp/pgtable.c | 62 +++++++++++++++++++++++------
4 files changed, 74 insertions(+), 14 deletions(-)

--
2.19.1


2021-01-26 13:52:47

by Yanan Wang

Subject: [RFC PATCH v1 5/5] KVM: arm64: Adapt page-table code to new handling of coalescing tables

With the new handling of coalescing tables, we can install the block entry
before unmapping the old table mappings. So perform the installation in
stage2_map_walk_table_pre(), and drop it from stage2_map_walk_table_post().

Signed-off-by: Yanan Wang <[email protected]>
---
arch/arm64/kvm/hyp/pgtable.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index ab1c94985ed0..fb755aac4384 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -436,6 +436,7 @@ struct stage2_map_data {
kvm_pte_t attr;

kvm_pte_t *anchor;
+ kvm_pte_t *follow;

struct kvm_s2_mmu *mmu;
struct kvm_mmu_memory_cache *memcache;
@@ -550,13 +551,13 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
kvm_set_invalid_pte(ptep);

/*
- * Invalidate the whole stage-2, as we may have numerous leaf
- * entries below us which would otherwise need invalidating
- * individually.
+ * If there is an existing table entry and block mapping is needed here,
+ * then set the anchor and replace it with a block entry. The sub-level
+ * mappings will later be unmapped lazily.
*/
- kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
data->anchor = ptep;
- return 0;
+ data->follow = kvm_pte_follow(*ptep);
+ return stage2_coalesce_tables_into_block(addr, level, ptep, data);
}

static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
@@ -608,20 +609,18 @@ static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
kvm_pte_t *ptep,
struct stage2_map_data *data)
{
- int ret = 0;
-
if (!data->anchor)
return 0;

- free_page((unsigned long)kvm_pte_follow(*ptep));
- put_page(virt_to_page(ptep));
-
- if (data->anchor == ptep) {
+ if (data->anchor != ptep) {
+ free_page((unsigned long)kvm_pte_follow(*ptep));
+ put_page(virt_to_page(ptep));
+ } else {
+ free_page((unsigned long)data->follow);
data->anchor = NULL;
- ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
}

- return ret;
+ return 0;
}

/*
--
2.19.1

2021-01-26 14:22:33

by Marc Zyngier

Subject: Re: [RFC PATCH v1 0/5] Enable CPU TTRem feature for stage-2

Hi Yanan,

On 2021-01-26 13:41, Yanan Wang wrote:
> Hi all,
> This series enable CPU TTRem feature for stage-2 page table and a RFC
> is sent for some comments, thanks.
>
> The ARMv8.4 TTRem feature offers 3 levels of support when changing
> block size without changing any other parameters that are listed as
> requiring use of break-before-make. And I found that maybe we can use
> this feature to make some improvement for stage-2 page table and the
> following explains what TTRem exactly does for the improvement.
>
> If migration of a VM with hugepages is canceled midway, KVM will adjust
> the stage-2 table mappings back to block mappings. We currently use BBM
> to replace the table entry with a block entry. Take adjustment of 1G
> block mapping as an example, with BBM procedures, we have to invalidate
> the old table entry first, flush TLB and unmap the old table mappings,
> right before installing the new block entry.

In all honesty, I think the amount of work that is getting added to
support this "migration cancelled mid-way" use case is getting out
of control.

This is adding complexity and corner cases for a use case that
really shouldn't happen that often. And it is adding it at the worst
possible place, where we really should keep things as straightforward
as possible.

I would expect userspace to have a good enough knowledge of whether
the migration is likely to succeed, and not to attempt it if it is
likely to fail. And yes, it will fail sometimes. But it should be
so rare that adding these various stages of BBM support shouldn't be
that useful.

Or is there something else that I am missing?

Thanks,

M.
--
Jazz is not dead. It just smells funny...

2021-01-26 18:58:19

by Yanan Wang

Subject: [RFC PATCH v1 2/5] arm64: cpufeature: Add an API to get level of TTRem supported by hardware

The ARMv8.4 architecture offers 3 levels of support when changing
block size without changing any other parameters that are listed
as requiring use of break-before-make. So get the level of TTRem
supported by the hardware, so that software can use the
corresponding procedure when changing block size.

Signed-off-by: Yanan Wang <[email protected]>
---
arch/arm64/include/asm/cpufeature.h | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 9a555809b89c..f8ee7d30829b 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -50,6 +50,11 @@ enum ftr_type {
#define FTR_VISIBLE true /* Feature visible to the user space */
#define FTR_HIDDEN false /* Feature is hidden from the user */

+/* Supported levels of ARMv8.4 TTRem feature */
+#define TTREM_LEVEL0 0
+#define TTREM_LEVEL1 1
+#define TTREM_LEVEL2 2
+
#define FTR_VISIBLE_IF_IS_ENABLED(config) \
(IS_ENABLED(config) ? FTR_VISIBLE : FTR_HIDDEN)

@@ -739,6 +744,14 @@ static inline bool system_supports_tlb_range(void)
cpus_have_const_cap(ARM64_HAS_TLB_RANGE);
}

+static inline u32 system_support_level_of_ttrem(void)
+{
+ u64 mmfr2 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR2_EL1);
+
+ return cpuid_feature_extract_unsigned_field(mmfr2,
+ ID_AA64MMFR2_BBM_SHIFT);
+}
+
extern int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt);

static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange)
--
2.19.1
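
For readers who want to see what the helper above computes, here is a
user-space sketch of the same field extraction. It assumes that the BBM field
occupies bits [55:52] of ID_AA64MMFR2_EL1 (the patch only refers to
ID_AA64MMFR2_BBM_SHIFT and does not spell out the value) and that ID register
fields are 4 bits wide. The names extract_unsigned_field() and
ttrem_level_from_mmfr2() are hypothetical stand-ins, not kernel APIs, and the
register value is passed in as a plain integer rather than read from hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: ID_AA64MMFR2_EL1.BBM lives in bits [55:52]. */
#define BBM_SHIFT	52
/* ID register fields are 4 bits wide. */
#define FIELD_WIDTH	4

/* Same meaning as the TTREM_LEVEL* macros added by the patch. */
enum ttrem_level {
	TTREM_LEVEL0 = 0,
	TTREM_LEVEL1 = 1,
	TTREM_LEVEL2 = 2,
};

/* Extract an unsigned 4-bit field starting at bit 'shift'. */
static unsigned int extract_unsigned_field(uint64_t reg, unsigned int shift)
{
	assert(shift <= 64 - FIELD_WIDTH);
	return (unsigned int)((reg >> shift) &
			      ((UINT64_C(1) << FIELD_WIDTH) - 1));
}

/* Stand-in for system_support_level_of_ttrem(), taking the raw value. */
static unsigned int ttrem_level_from_mmfr2(uint64_t mmfr2)
{
	return extract_unsigned_field(mmfr2, BBM_SHIFT);
}
```

With these assumptions, a register value of (2ULL << 52) yields TTREM_LEVEL2,
and bits outside [55:52] do not affect the result. A field value above 2 is
returned as-is, so callers need to handle values they do not recognise.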

2021-01-26 18:59:03

by Yanan Wang

Subject: [RFC PATCH v1 3/5] KVM: arm64: Support usage of TTRem in guest stage-2 translation

TTRem can be used when coalescing existing table mappings into a block
in guest stage-2 translation, so add support for using it.

Signed-off-by: Yanan Wang <[email protected]>
---
arch/arm64/kvm/hyp/pgtable.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 4d177ce1d536..c8b959e3951b 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -437,6 +437,7 @@ struct stage2_map_data {

struct kvm_s2_mmu *mmu;
struct kvm_mmu_memory_cache *memcache;
+ u32 ttrem_level;
};

static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
@@ -633,6 +634,7 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
.phys = ALIGN_DOWN(phys, PAGE_SIZE),
.mmu = pgt->mmu,
.memcache = mc,
+ .ttrem_level = system_support_level_of_ttrem(),
};
struct kvm_pgtable_walker walker = {
.cb = stage2_map_walker,
--
2.19.1

2021-01-26 19:02:06

by Yanan Wang

Subject: [RFC PATCH v1 4/5] KVM: arm64: Add handling of coalescing tables into a block mapping

If migration of a VM with hugepages is canceled midway, KVM will adjust
the stage-2 table mappings back to block mappings. We currently use BBM
to replace the table entry with a block entry. Taking adjustment of a 1G
block mapping as an example, the BBM procedure requires us to invalidate
the old level-1 table entry first, flush the TLB and unmap the old table
mappings, and only then install the new block entry.

So there is a fairly long period during which the level-1 table entry is
invalid before the block entry is installed. If other vCPUs access any
guest page within the 1G range during this period and find the table
entry invalid, they will all exit from the guest with a translation fault.
These translation faults are unnecessary, because the block mapping will
be built shortly afterwards anyway. Besides, KVM will try to build 1G
block mappings for these translation faults, performing cache maintenance
operations, page table walks, etc.

The TTRem level 1 and level 2 approaches ensure that there is never a
moment when the old table entry is invalid before the new block entry is
installed, so no unnecessary translation faults are caused. But the
level-2 method can lead to a TLB conflict abort, which is troublesome to
handle, so we use the nT bit in both the level-1 and level-2 cases to
avoid handling TLB conflict aborts.

Signed-off-by: Yanan Wang <[email protected]>
---
arch/arm64/kvm/hyp/pgtable.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index c8b959e3951b..ab1c94985ed0 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -49,6 +49,8 @@
KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
KVM_PTE_LEAF_ATTR_HI_S2_XN)

+#define KVM_PTE_LEAF_BLOCK_S2_NT BIT(16)
+
struct kvm_pgtable_walk_data {
struct kvm_pgtable *pgt;
struct kvm_pgtable_walker *walker;
@@ -502,6 +504,39 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
return 0;
}

+static int stage2_coalesce_tables_into_block(u64 addr, u32 level,
+ kvm_pte_t *ptep,
+ struct stage2_map_data *data)
+{
+ u32 ttrem_level = data->ttrem_level;
+ u64 granule = kvm_granule_size(level), phys = data->phys;
+ kvm_pte_t new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+
+ switch (ttrem_level) {
+ case TTREM_LEVEL0:
+ kvm_set_invalid_pte(ptep);
+
+ /*
+ * Invalidate the whole stage-2, as we may have numerous leaf
+ * entries below us which would otherwise need invalidating
+ * individually.
+ */
+ kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
+ smp_store_release(ptep, new);
+ data->phys += granule;
+ return 0;
+ case TTREM_LEVEL1:
+ case TTREM_LEVEL2:
+ WRITE_ONCE(*ptep, new | KVM_PTE_LEAF_BLOCK_S2_NT);
+ kvm_call_hyp(__kvm_tlb_flush_vmid, data->mmu);
+ WRITE_ONCE(*ptep, new & ~KVM_PTE_LEAF_BLOCK_S2_NT);
+ data->phys += granule;
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
kvm_pte_t *ptep,
struct stage2_map_data *data)
--
2.19.1
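
The dispatch in stage2_coalesce_tables_into_block() and the granule by which
data->phys advances can be sketched in user space as follows. This is only an
illustration under stated assumptions: 4K pages (PAGE_SHIFT of 12) and a
4-level table, so that a level-1 block covers 1G as in the example above. The
names coalesce_method, pick_coalesce_method() and granule_size() are
hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: 4K pages, 4 levels of page table (levels 0..3). */
#define PAGE_SHIFT	12
#define MAX_LEVELS	4

enum coalesce_method {
	COALESCE_BBM,	/* invalidate, flush, install block */
	COALESCE_NT,	/* install block with nT, flush, clear nT */
};

/* Size in bytes mapped by one entry at 'level' (level 1 -> 1G with 4K pages). */
static uint64_t granule_size(unsigned int level)
{
	assert(level < MAX_LEVELS);
	return UINT64_C(1) << ((PAGE_SHIFT - 3) * (MAX_LEVELS - level) + 3);
}

/*
 * Mirror of the switch in the patch: level 0 falls back to BBM, levels 1
 * and 2 both use the nT sequence, anything else is rejected.
 */
static int pick_coalesce_method(unsigned int ttrem_level,
				enum coalesce_method *method)
{
	switch (ttrem_level) {
	case 0:
		*method = COALESCE_BBM;
		return 0;
	case 1:
	case 2:
		*method = COALESCE_NT;
		return 0;
	default:
		return -EINVAL;
	}
}
```

Under these assumptions granule_size(1) is 1G and granule_size(2) is 2M, and
any TTRem level above 2 is reported as -EINVAL rather than being silently
treated as one of the known levels.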