2021-04-15 11:52:13

by Yanan Wang

Subject: [PATCH v5 0/6] KVM: arm64: Improve efficiency of stage2 page table

Hi,

This series makes some efficiency improvements to the guest stage-2 page
table code, and includes test results to quantify the benefit. The code
has been rebased onto the latest kvmarm/next tree.

Description:
We currently uniformly perform CMOs (cache maintenance operations) on
the D-cache and I-cache in user_mem_abort() before calling the fault
handlers. If we get concurrent guest faults (e.g. translation faults,
permission faults) or some unnecessary guest faults caused by
break-before-make (BBM), the CMOs performed for the first vcpu are
necessary, while those performed for the later ones are not.

By moving the CMOs into the fault handlers, we can easily identify the
conditions where they are really needed and avoid the unnecessary ones.
Since performing CMOs is time-consuming, especially when flushing a
block range, this change takes significant load off KVM and improves
the efficiency of the stage-2 page table code.
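
As a rough sketch (illustrative pseudo-C; the real checks live in the
stage-2 map and attr walkers touched by patches #2 and #5), a fault
handler only performs a CMO when it is about to install a PTE that
actually needs one:

	/* Only reached when a new valid PTE is about to be installed. */
	if (stage2_pte_cacheable(pgt, new) && !stage2_has_fwb(pgt))
		__flush_dcache_area(mm_ops->phys_to_virt(phys), granule);
	if (stage2_pte_executable(new))
		stage2_invalidate_icache(mm_ops->phys_to_virt(phys), granule);

A vcpu that finds a valid PTE already in place skips the maintenance
entirely.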

In this series, patches #1, #3 and #4 prepare for the movement of the
CMOs (adapting to the latest stage-2 page table framework), patches #2
and #5 move the D-cache and I-cache CMOs into the fault handlers, and
patch #6 introduces a new way to distinguish the cases of memcache
allocations.

The following results, measured on v3, show the benefit introduced by
the movement of the CMOs; they were obtained with the kvm selftest [1]
that I posted recently.
[1] https://lore.kernel.org/lkml/[email protected]/

With multiple vcpus concurrently accessing the same memory region, the
test measures the execution time of KVM creating new mappings, updating
the permissions of old mappings from RO to RW, and re-creating block
mappings after they have been split.

hardware platform: HiSilicon Kunpeng920 Server
host kernel: Linux mainline v5.12-rc2

cmdline: ./kvm_page_table_test -m 4 -s anonymous -b 1G -v 80
(80 vcpus, 1G memory, page mappings (normal 4K))
KVM_CREATE_MAPPINGS: before 104.35s -> after 90.42s +13.35%
KVM_UPDATE_MAPPINGS: before 78.64s -> after 75.45s + 4.06%

cmdline: ./kvm_page_table_test -m 4 -s anonymous_thp -b 20G -v 40
(40 vcpus, 20G memory, block mappings (THP 2M))
KVM_CREATE_MAPPINGS: before 15.66s -> after 6.92s +55.80%
KVM_UPDATE_MAPPINGS: before 178.80s -> after 123.35s +31.00%
KVM_REBUILD_BLOCKS: before 187.34s -> after 131.76s +30.65%

cmdline: ./kvm_page_table_test -m 4 -s anonymous_hugetlb_1gb -b 20G -v 40
(40 vcpus, 20G memory, block mappings (HUGETLB 1G))
KVM_CREATE_MAPPINGS: before 104.54s -> after 3.70s +96.46%
KVM_UPDATE_MAPPINGS: before 174.20s -> after 115.94s +33.44%
KVM_REBUILD_BLOCKS: before 103.95s -> after 2.96s +97.15%

---

Changelogs:
v4->v5:
- rebased on the latest kvmarm/next tree to adapt to the new stage-2 page-table code
- v4: https://lore.kernel.org/lkml/[email protected]/

v3->v4:
- perform D-cache flush if we are not mapping device memory
- rebased on top of mainline v5.12-rc6
- v3: https://lore.kernel.org/lkml/[email protected]/

v2->v3:
- drop patch #3 in v2
- retest v3 based on v5.12-rc2
- v2: https://lore.kernel.org/lkml/[email protected]/

v1->v2:
- rebased on top of mainline v5.12-rc2
- also move CMOs of I-cache to the fault handlers
- retest v2 based on v5.12-rc2
- v1: https://lore.kernel.org/lkml/[email protected]/

---

Yanan Wang (6):
KVM: arm64: Introduce KVM_PGTABLE_S2_GUEST stage-2 flag
KVM: arm64: Move D-cache flush to the fault handlers
KVM: arm64: Add mm_ops member for structure stage2_attr_data
KVM: arm64: Provide invalidate_icache_range at non-VHE EL2
KVM: arm64: Move I-cache flush to the fault handlers
KVM: arm64: Distinguish cases of memcache allocations completely

arch/arm64/include/asm/kvm_mmu.h | 31 -------------
arch/arm64/include/asm/kvm_pgtable.h | 38 ++++++++++------
arch/arm64/kvm/hyp/nvhe/cache.S | 11 +++++
arch/arm64/kvm/hyp/pgtable.c | 65 +++++++++++++++++++++++-----
arch/arm64/kvm/mmu.c | 51 ++++++++--------------
5 files changed, 107 insertions(+), 89 deletions(-)

--
2.23.0


2021-04-15 11:52:21

by Yanan Wang

Subject: [PATCH v5 4/6] KVM: arm64: Provide invalidate_icache_range at non-VHE EL2

We want to move the I-cache maintenance for the guest into the stage-2
page table code for a performance improvement. Before that can work,
we must first make invalidate_icache_range() available to non-VHE EL2
to avoid compile-time or runtime errors, as pgtable.c is now linked
into the non-VHE EL2 code for pKVM mode.

In this patch, we only introduce the invalidate_icache_range symbol in
nvhe/cache.S, with no real functionality, because no situation has been
found so far where I-cache maintenance is also needed at non-VHE EL2 in
pKVM mode.

Signed-off-by: Yanan Wang <[email protected]>
---
arch/arm64/kvm/hyp/nvhe/cache.S | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/cache.S b/arch/arm64/kvm/hyp/nvhe/cache.S
index 36cef6915428..a125ec9aeed2 100644
--- a/arch/arm64/kvm/hyp/nvhe/cache.S
+++ b/arch/arm64/kvm/hyp/nvhe/cache.S
@@ -11,3 +11,14 @@ SYM_FUNC_START_PI(__flush_dcache_area)
dcache_by_line_op civac, sy, x0, x1, x2, x3
ret
SYM_FUNC_END_PI(__flush_dcache_area)
+
+/*
+ * invalidate_icache_range(start,end)
+ *
+ * Ensure that the I cache is invalid within specified region.
+ *
+ * - start - virtual start address of region
+ * - end - virtual end address of region
+ */
+SYM_FUNC_START(invalidate_icache_range)
+SYM_FUNC_END(invalidate_icache_range)
--
2.23.0

2021-04-15 11:53:25

by Yanan Wang

Subject: [PATCH v5 5/6] KVM: arm64: Move I-cache flush to the fault handlers

In this patch, we move invalidation of I-cache to the fault handlers to
avoid unnecessary I-cache maintenance. On the map path, invalidate the
I-cache if we are going to create an executable stage-2 mapping for the
guest. And on the permission path, invalidate the I-cache if we are
going to add executable permission to the existing guest stage-2
mapping.
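For example, a page that previously held other data may still have
stale lines in the I-cache; invalidating at the point where the mapping
becomes executable ensures the guest cannot fetch them.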

Signed-off-by: Yanan Wang <[email protected]>
---
arch/arm64/include/asm/kvm_mmu.h | 15 --------------
arch/arm64/kvm/hyp/pgtable.c | 35 +++++++++++++++++++++++++++++++-
arch/arm64/kvm/mmu.c | 9 +-------
3 files changed, 35 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index e9b163c5f023..155492fe5b15 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -187,21 +187,6 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
}

-static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
- unsigned long size)
-{
- if (icache_is_aliasing()) {
- /* any kind of VIPT cache */
- __flush_icache_all();
- } else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
- /* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
- void *va = page_address(pfn_to_page(pfn));
-
- invalidate_icache_range((unsigned long)va,
- (unsigned long)va + size);
- }
-}
-
void kvm_set_way_flush(struct kvm_vcpu *vcpu);
void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index b480f6d1171e..9f4429d80df0 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -568,6 +568,26 @@ static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
}

+static bool stage2_pte_executable(kvm_pte_t pte)
+{
+ return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
+}
+
+static void stage2_invalidate_icache(void *addr, u64 size)
+{
+ if (icache_is_aliasing()) {
+ /* Any kind of VIPT cache */
+ __flush_icache_all();
+ } else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
+ /*
+ * See comment in __kvm_tlb_flush_vmid_ipa().
+ * Invalidate PIPT, or VPIPT at EL2.
+ */
+ invalidate_icache_range((unsigned long)addr,
+ (unsigned long)addr + size);
+ }
+}
+
static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
u32 level, struct kvm_pgtable_mm_ops *mm_ops)
{
@@ -618,6 +638,10 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
if (stage2_pte_cacheable(pgt, new) && !stage2_has_fwb(pgt))
__flush_dcache_area(mm_ops->phys_to_virt(phys),
granule);
+
+ if (stage2_pte_executable(new))
+ stage2_invalidate_icache(mm_ops->phys_to_virt(phys),
+ granule);
}

smp_store_release(ptep, new);
@@ -896,8 +920,17 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
* but worst-case the access flag update gets lost and will be
* set on the next access instead.
*/
- if (data->pte != pte)
+ if (data->pte != pte) {
+ /*
+ * Invalidate the instruction cache before updating
+ * if we are going to add the executable permission
+ * for the guest stage-2 PTE.
+ */
+ if (!stage2_pte_executable(*ptep) && stage2_pte_executable(pte))
+ stage2_invalidate_icache(kvm_pte_follow(pte, data->mm_ops),
+ kvm_granule_size(level));
WRITE_ONCE(*ptep, pte);
+ }

return 0;
}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 86f7dd1c234f..aa536392b308 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -694,11 +694,6 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
}

-static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
-{
- __invalidate_icache_guest_page(pfn, size);
-}
-
static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
{
send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
@@ -967,10 +962,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (writable)
prot |= KVM_PGTABLE_PROT_W;

- if (exec_fault) {
+ if (exec_fault)
prot |= KVM_PGTABLE_PROT_X;
- invalidate_icache_guest_page(pfn, vma_pagesize);
- }

if (device)
prot |= KVM_PGTABLE_PROT_DEVICE;
--
2.23.0

2021-04-15 11:53:55

by Yanan Wang

Subject: [PATCH v5 3/6] KVM: arm64: Add mm_ops member for structure stage2_attr_data

Add an mm_ops member to structure stage2_attr_data, since we will move
the I-cache maintenance for the guest stage-2 tables into the permission
path and, as a result, will need mm_ops there for address translation.

Signed-off-by: Yanan Wang <[email protected]>
---
arch/arm64/kvm/hyp/pgtable.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index e3606c9dcec7..b480f6d1171e 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -869,10 +869,11 @@ int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
}

struct stage2_attr_data {
- kvm_pte_t attr_set;
- kvm_pte_t attr_clr;
- kvm_pte_t pte;
- u32 level;
+ kvm_pte_t attr_set;
+ kvm_pte_t attr_clr;
+ kvm_pte_t pte;
+ u32 level;
+ struct kvm_pgtable_mm_ops *mm_ops;
};

static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
@@ -911,6 +912,7 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
struct stage2_attr_data data = {
.attr_set = attr_set & attr_mask,
.attr_clr = attr_clr & attr_mask,
+ .mm_ops = pgt->mm_ops,
};
struct kvm_pgtable_walker walker = {
.cb = stage2_attr_walker,
--
2.23.0

2021-04-15 11:54:00

by Yanan Wang

Subject: [PATCH v5 6/6] KVM: arm64: Distinguish cases of memcache allocations completely

With a guest translation fault, the memcache pages are not needed if KVM
is only about to install a new leaf entry into the existing page table.
And with a guest permission fault, the memcache pages are also not needed
for a write_fault during dirty logging if KVM is only about to update
the existing leaf entry instead of collapsing a block entry into a table.

By comparing fault_granule and vma_pagesize, cases that require allocations
from memcache and cases that don't can be distinguished completely.
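
For example (illustrative numbers): if the fault was taken at a level
whose granule is 2M (fault_granule = 2M) while the VMA only supports 4K
pages (vma_pagesize = 4K), a new last-level table page has to be
installed, so the memcache must be topped up; if fault_granule equals
vma_pagesize, only a leaf entry is written and no table pages are
needed.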

Signed-off-by: Yanan Wang <[email protected]>
---
arch/arm64/kvm/mmu.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index aa536392b308..9e35aa5d29f2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -895,19 +895,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
gfn = fault_ipa >> PAGE_SHIFT;
mmap_read_unlock(current->mm);

- /*
- * Permission faults just need to update the existing leaf entry,
- * and so normally don't require allocations from the memcache. The
- * only exception to this is when dirty logging is enabled at runtime
- * and a write fault needs to collapse a block entry into a table.
- */
- if (fault_status != FSC_PERM || (logging_active && write_fault)) {
- ret = kvm_mmu_topup_memory_cache(memcache,
- kvm_mmu_cache_min_pages(kvm));
- if (ret)
- return ret;
- }
-
mmu_seq = vcpu->kvm->mmu_notifier_seq;
/*
* Ensure the read of mmu_notifier_seq happens before we call
@@ -970,6 +957,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
prot |= KVM_PGTABLE_PROT_X;

+ /*
+ * Allocations from the memcache are required only when granule of the
+ * lookup level where the guest fault happened exceeds vma_pagesize,
+ * which means new page tables will be created in the fault handlers.
+ */
+ if (fault_granule > vma_pagesize) {
+ ret = kvm_mmu_topup_memory_cache(memcache,
+ kvm_mmu_cache_min_pages(kvm));
+ if (ret)
+ return ret;
+ }
+
/*
* Under the premise of getting a FSC_PERM fault, we just need to relax
* permissions only if vma_pagesize equals fault_granule. Otherwise,
--
2.23.0

2021-05-12 13:34:39

by Yanan Wang

Subject: Re: [PATCH v5 0/6] KVM: arm64: Improve efficiency of stage2 page table

A really gentle ping ...

Sincerely,
Yanan


On 2021/4/15 19:50, Yanan Wang wrote:
> Hi,
>
> This series makes some efficiency improvements to the guest stage-2 page
> table code, and includes test results to quantify the benefit.
>
> [...]

2021-06-02 11:02:14

by Quentin Perret

Subject: Re: [PATCH v5 5/6] KVM: arm64: Move I-cache flush to the fault handlers

On Thursday 15 Apr 2021 at 19:50:31 (+0800), Yanan Wang wrote:
> In this patch, we move invalidation of I-cache to the fault handlers to

Nit: please avoid using 'This patch' in commit messages, see
Documentation/process/submitting-patches.rst.

> avoid unnecessary I-cache maintenance. On the map path, invalidate the
> I-cache if we are going to create an executable stage-2 mapping for the
> guest. And on the permission path, invalidate the I-cache if we are going
> to add executable permission to the existing guest stage-2 mapping.
>
> Signed-off-by: Yanan Wang <[email protected]>
> ---
> arch/arm64/include/asm/kvm_mmu.h | 15 --------------
> arch/arm64/kvm/hyp/pgtable.c | 35 +++++++++++++++++++++++++++++++-
> arch/arm64/kvm/mmu.c | 9 +-------
> 3 files changed, 35 insertions(+), 24 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index e9b163c5f023..155492fe5b15 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -187,21 +187,6 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
> return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
> }
>
> -static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
> - unsigned long size)
> -{
> - if (icache_is_aliasing()) {
> - /* any kind of VIPT cache */
> - __flush_icache_all();
> - } else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
> - /* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
> - void *va = page_address(pfn_to_page(pfn));
> -
> - invalidate_icache_range((unsigned long)va,
> - (unsigned long)va + size);
> - }
> -}
> -
> void kvm_set_way_flush(struct kvm_vcpu *vcpu);
> void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index b480f6d1171e..9f4429d80df0 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -568,6 +568,26 @@ static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
> return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
> }
>
> +static bool stage2_pte_executable(kvm_pte_t pte)
> +{
> + return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
> +}
> +
> +static void stage2_invalidate_icache(void *addr, u64 size)
> +{
> + if (icache_is_aliasing()) {
> + /* Any kind of VIPT cache */
> + __flush_icache_all();
> + } else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
> + /*
> + * See comment in __kvm_tlb_flush_vmid_ipa().
> + * Invalidate PIPT, or VPIPT at EL2.
> + */
> + invalidate_icache_range((unsigned long)addr,
> + (unsigned long)addr + size);
> + }
> +}
> +
> static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
> u32 level, struct kvm_pgtable_mm_ops *mm_ops)
> {
> @@ -618,6 +638,10 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> if (stage2_pte_cacheable(pgt, new) && !stage2_has_fwb(pgt))
> __flush_dcache_area(mm_ops->phys_to_virt(phys),
> granule);
> +
> + if (stage2_pte_executable(new))
> + stage2_invalidate_icache(mm_ops->phys_to_virt(phys),
> + granule);
> }
>
> smp_store_release(ptep, new);
> @@ -896,8 +920,17 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> * but worst-case the access flag update gets lost and will be
> * set on the next access instead.
> */
> - if (data->pte != pte)
> + if (data->pte != pte) {
> + /*
> + * Invalidate the instruction cache before updating
> + * if we are going to add the executable permission
> + * for the guest stage-2 PTE.
> + */
> + if (!stage2_pte_executable(*ptep) && stage2_pte_executable(pte))
> + stage2_invalidate_icache(kvm_pte_follow(pte, data->mm_ops),
> + kvm_granule_size(level));
> WRITE_ONCE(*ptep, pte);
> + }

As for the dcache stuff, it seems like this would be best placed in an
optional mm_ops callback, and have the kernel implement it.

Thanks,
Quentin
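
A minimal sketch of the kind of hook meant here (the field and helper
names are illustrative, not the actual kvm_pgtable API of this series):

	struct kvm_pgtable_mm_ops {
		/* ... existing ops (zalloc_page, phys_to_virt, ...) ... */
		/* Optional I-cache invalidation; hyp code may leave it NULL. */
		void (*icache_inval_pou)(void *addr, size_t size);
	};

	static void stage2_invalidate_icache(struct kvm_pgtable_mm_ops *mm_ops,
					     void *addr, u64 size)
	{
		if (mm_ops->icache_inval_pou)
			mm_ops->icache_inval_pou(addr, size);
	}

The kernel would install an implementation built on
invalidate_icache_range(), and the nVHE hyp code would no longer need a
stub symbol at all.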

2021-06-02 11:08:49

by Quentin Perret

Subject: Re: [PATCH v5 6/6] KVM: arm64: Distinguish cases of memcache allocations completely

On Thursday 15 Apr 2021 at 19:50:32 (+0800), Yanan Wang wrote:
> With a guest translation fault, the memcache pages are not needed if KVM
> is only about to install a new leaf entry into the existing page table.
> And with a guest permission fault, the memcache pages are also not needed
> for a write_fault during dirty logging if KVM is only about to update
> the existing leaf entry instead of collapsing a block entry into a table.
>
> By comparing fault_granule and vma_pagesize, cases that require allocations
> from memcache and cases that don't can be distinguished completely.
>
> Signed-off-by: Yanan Wang <[email protected]>
> ---
> arch/arm64/kvm/mmu.c | 25 ++++++++++++-------------
> 1 file changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index aa536392b308..9e35aa5d29f2 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -895,19 +895,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> gfn = fault_ipa >> PAGE_SHIFT;
> mmap_read_unlock(current->mm);
>
> - /*
> - * Permission faults just need to update the existing leaf entry,
> - * and so normally don't require allocations from the memcache. The
> - * only exception to this is when dirty logging is enabled at runtime
> - * and a write fault needs to collapse a block entry into a table.
> - */
> - if (fault_status != FSC_PERM || (logging_active && write_fault)) {
> - ret = kvm_mmu_topup_memory_cache(memcache,
> - kvm_mmu_cache_min_pages(kvm));
> - if (ret)
> - return ret;
> - }
> -
> mmu_seq = vcpu->kvm->mmu_notifier_seq;
> /*
> * Ensure the read of mmu_notifier_seq happens before we call
> @@ -970,6 +957,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
> prot |= KVM_PGTABLE_PROT_X;
>
> + /*
> + * Allocations from the memcache are required only when granule of the
> + * lookup level where the guest fault happened exceeds vma_pagesize,
> + * which means new page tables will be created in the fault handlers.
> + */
> + if (fault_granule > vma_pagesize) {
> + ret = kvm_mmu_topup_memory_cache(memcache,
> + kvm_mmu_cache_min_pages(kvm));
> + if (ret)
> + return ret;
> + }

You're now doing the top-up in the kvm->mmu_lock critical section. Isn't
this more or less what we try to avoid by using a memory cache?

Thanks,
Quentin
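
For reference, the pattern the memory cache enables looks roughly like
this (a sketch of the surrounding mmu.c logic, not the exact code):

	/* May sleep (GFP_KERNEL), so it must run before mmu_lock is taken. */
	ret = kvm_mmu_topup_memory_cache(memcache,
					 kvm_mmu_cache_min_pages(kvm));
	if (ret)
		return ret;

	spin_lock(&kvm->mmu_lock);
	/* Table pages are now taken from the memcache without sleeping. */
	ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
				     __pfn_to_phys(pfn), prot, memcache);
	spin_unlock(&kvm->mmu_lock);

Doing the top-up after mmu_lock has been taken defeats that ordering.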

2021-06-03 12:38:32

by Yanan Wang

Subject: Re: [PATCH v5 4/6] KVM: arm64: Provide invalidate_icache_range at non-VHE EL2

Hi Marc,

On 2021/6/2 18:22, Marc Zyngier wrote:
> On Thu, 15 Apr 2021 12:50:30 +0100,
> Yanan Wang <[email protected]> wrote:
>> We want to move the I-cache maintenance for the guest into the stage-2
>> page table code for a performance improvement. Before that can work,
>> we must first make invalidate_icache_range() available to non-VHE EL2
>> to avoid compile-time or runtime errors, as pgtable.c is now linked
>> into the non-VHE EL2 code for pKVM mode.
>>
>> In this patch, we only introduce the invalidate_icache_range symbol in
>> nvhe/cache.S, with no real functionality, because no situation has been
>> found so far where I-cache maintenance is also needed at non-VHE EL2 in
>> pKVM mode.
>>
>> Signed-off-by: Yanan Wang <[email protected]>
>> ---
>> arch/arm64/kvm/hyp/nvhe/cache.S | 11 +++++++++++
>> 1 file changed, 11 insertions(+)
>>
>> diff --git a/arch/arm64/kvm/hyp/nvhe/cache.S b/arch/arm64/kvm/hyp/nvhe/cache.S
>> index 36cef6915428..a125ec9aeed2 100644
>> --- a/arch/arm64/kvm/hyp/nvhe/cache.S
>> +++ b/arch/arm64/kvm/hyp/nvhe/cache.S
>> @@ -11,3 +11,14 @@ SYM_FUNC_START_PI(__flush_dcache_area)
>> dcache_by_line_op civac, sy, x0, x1, x2, x3
>> ret
>> SYM_FUNC_END_PI(__flush_dcache_area)
>> +
>> +/*
>> + * invalidate_icache_range(start,end)
>> + *
>> + * Ensure that the I cache is invalid within specified region.
>> + *
>> + * - start - virtual start address of region
>> + * - end - virtual end address of region
>> + */
>> +SYM_FUNC_START(invalidate_icache_range)
>> +SYM_FUNC_END(invalidate_icache_range)
> This is a good indication that something is really wrong.
>
> If you were to provide cache management callbacks as part of the
> mm_ops themselves (or a similar abstraction), you wouldn't have to do
> these things.
Yes, we definitely don't need this work if we do that.

Thanks,
Yanan
>
> M.
>

2021-06-03 12:39:43

by Yanan Wang

Subject: Re: [PATCH v5 5/6] KVM: arm64: Move I-cache flush to the fault handlers

Hi Quentin,

On 2021/6/2 18:58, Quentin Perret wrote:
> On Thursday 15 Apr 2021 at 19:50:31 (+0800), Yanan Wang wrote:
>> In this patch, we move invalidation of I-cache to the fault handlers to
> Nit: please avoid using 'This patch' in commit messages, see
> Documentation/process/submitting-patches.rst.
Thanks!
I will get rid of this.
>> avoid unnecessary I-cache maintenance. On the map path, invalidate the
>> I-cache if we are going to create an executable stage-2 mapping for the
>> guest. And on the permission path, invalidate the I-cache if we are going
>> to add executable permission to the existing guest stage-2 mapping.
>>
>> Signed-off-by: Yanan Wang <[email protected]>
>> ---
>> arch/arm64/include/asm/kvm_mmu.h | 15 --------------
>> arch/arm64/kvm/hyp/pgtable.c | 35 +++++++++++++++++++++++++++++++-
>> arch/arm64/kvm/mmu.c | 9 +-------
>> 3 files changed, 35 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>> index e9b163c5f023..155492fe5b15 100644
>> --- a/arch/arm64/include/asm/kvm_mmu.h
>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>> @@ -187,21 +187,6 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
>> return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
>> }
>>
>> -static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
>> - unsigned long size)
>> -{
>> - if (icache_is_aliasing()) {
>> - /* any kind of VIPT cache */
>> - __flush_icache_all();
>> - } else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
>> - /* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
>> - void *va = page_address(pfn_to_page(pfn));
>> -
>> - invalidate_icache_range((unsigned long)va,
>> - (unsigned long)va + size);
>> - }
>> -}
>> -
>> void kvm_set_way_flush(struct kvm_vcpu *vcpu);
>> void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>>
>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> index b480f6d1171e..9f4429d80df0 100644
>> --- a/arch/arm64/kvm/hyp/pgtable.c
>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>> @@ -568,6 +568,26 @@ static bool stage2_pte_cacheable(struct kvm_pgtable *pgt, kvm_pte_t pte)
>> return memattr == KVM_S2_MEMATTR(pgt, NORMAL);
>> }
>>
>> +static bool stage2_pte_executable(kvm_pte_t pte)
>> +{
>> + return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
>> +}
>> +
>> +static void stage2_invalidate_icache(void *addr, u64 size)
>> +{
>> + if (icache_is_aliasing()) {
>> + /* Any kind of VIPT cache */
>> + __flush_icache_all();
>> + } else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
>
>> + /*
>> + * See comment in __kvm_tlb_flush_vmid_ipa().
>> + * Invalidate PIPT, or VPIPT at EL2.
>> + */
>> + invalidate_icache_range((unsigned long)addr,
>> + (unsigned long)addr + size);
>> + }
>> +}
>> +
>> static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
>> u32 level, struct kvm_pgtable_mm_ops *mm_ops)
>> {
>> @@ -618,6 +638,10 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>> if (stage2_pte_cacheable(pgt, new) && !stage2_has_fwb(pgt))
>> __flush_dcache_area(mm_ops->phys_to_virt(phys),
>> granule);
>> +
>> + if (stage2_pte_executable(new))
>> + stage2_invalidate_icache(mm_ops->phys_to_virt(phys),
>> + granule);
>> }
>>
>> smp_store_release(ptep, new);
>> @@ -896,8 +920,17 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>> * but worst-case the access flag update gets lost and will be
>> * set on the next access instead.
>> */
>> - if (data->pte != pte)
>> + if (data->pte != pte) {
>> + /*
>> + * Invalidate the instruction cache before updating
>> + * if we are going to add the executable permission
>> + * for the guest stage-2 PTE.
>> + */
>> + if (!stage2_pte_executable(*ptep) && stage2_pte_executable(pte))
>> + stage2_invalidate_icache(kvm_pte_follow(pte, data->mm_ops),
>> + kvm_granule_size(level));
>> WRITE_ONCE(*ptep, pte);
>> + }
> As for the dcache stuff, it seems like this would be best placed in an
> optional mm_ops callback, and have the kernel implement it.
I think so; that is the preferred way.

Thanks,
Yanan
> Thanks,
> Quentin
> .

2021-06-03 12:56:34

by Yanan Wang

Subject: Re: [PATCH v5 6/6] KVM: arm64: Distinguish cases of memcache allocations completely

Hi Quentin,

On 2021/6/2 19:07, Quentin Perret wrote:
> On Thursday 15 Apr 2021 at 19:50:32 (+0800), Yanan Wang wrote:
>> With a guest translation fault, the memcache pages are not needed if KVM
>> is only about to install a new leaf entry into the existing page table.
>> And with a guest permission fault, the memcache pages are also not needed
>> for a write_fault during dirty logging if KVM is only about to update
>> the existing leaf entry instead of collapsing a block entry into a table.
>>
>> By comparing fault_granule and vma_pagesize, cases that require allocations
>> from memcache and cases that don't can be distinguished completely.
>>
>> Signed-off-by: Yanan Wang <[email protected]>
>> ---
>> arch/arm64/kvm/mmu.c | 25 ++++++++++++-------------
>> 1 file changed, 12 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index aa536392b308..9e35aa5d29f2 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -895,19 +895,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> gfn = fault_ipa >> PAGE_SHIFT;
>> mmap_read_unlock(current->mm);
>>
>> - /*
>> - * Permission faults just need to update the existing leaf entry,
>> - * and so normally don't require allocations from the memcache. The
>> - * only exception to this is when dirty logging is enabled at runtime
>> - * and a write fault needs to collapse a block entry into a table.
>> - */
>> - if (fault_status != FSC_PERM || (logging_active && write_fault)) {
>> - ret = kvm_mmu_topup_memory_cache(memcache,
>> - kvm_mmu_cache_min_pages(kvm));
>> - if (ret)
>> - return ret;
>> - }
>> -
>> mmu_seq = vcpu->kvm->mmu_notifier_seq;
>> /*
>> * Ensure the read of mmu_notifier_seq happens before we call
>> @@ -970,6 +957,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
>> prot |= KVM_PGTABLE_PROT_X;
>>
>> + /*
>> + * Allocations from the memcache are required only when granule of the
>> + * lookup level where the guest fault happened exceeds vma_pagesize,
>> + * which means new page tables will be created in the fault handlers.
>> + */
>> + if (fault_granule > vma_pagesize) {
>> + ret = kvm_mmu_topup_memory_cache(memcache,
>> + kvm_mmu_cache_min_pages(kvm));
>> + if (ret)
>> + return ret;
>> + }
> You're now doing the top-up in the kvm->mmu_lock critical section. Isn't
> this more or less what we try to avoid by using a memory cache?
Oh, right!

This patch was intended to clean up the code and avoid the unnecessary
top-ups, but it's a bad idea to do the top-up while holding mmu_lock. I
will rearrange this part and keep the top-up where it should be.

Thanks,
Yanan
> Thanks,
> Quentin
> .