LinuxLists.cc - [PATCH v4 0/2] KVM: arm64: Improve efficiency of stage2 page table

2021-04-09 03:39:21

Subject: [PATCH v4 0/2] KVM: arm64: Improve efficiency of stage2 page table

Hi,

This series improves the efficiency of the stage2 page table code, and
includes test results to quantify the benefit.

Changelogs:
v3->v4:
- perform D-cache flush if we are not mapping device memory
- rebased on top of mainline v5.12-rc6
- v3: https://lore.kernel.org/lkml/[email protected]/

v2->v3:
- drop patch #3 in v2
- retest v3 based on v5.12-rc2
- v2: https://lore.kernel.org/lkml/[email protected]/

v1->v2:
- rebased on top of mainline v5.12-rc2
- also move CMOs of I-cache to the fault handlers
- retest v2 based on v5.12-rc2
- v1: https://lore.kernel.org/lkml/[email protected]/

About this v4 series:
Patch #1:
We currently perform cache maintenance operations (CMOs) of the D-cache
and I-cache uniformly in user_mem_abort() before calling the fault
handlers. If we get concurrent guest faults (e.g. translation faults,
permission faults) or some really unnecessary guest faults caused by
break-before-make (BBM), the CMOs for the first vcpu are necessary while
those for the later vcpus are not.

By moving the CMOs to the fault handlers, we can easily identify the
conditions where they are really needed and avoid the unnecessary ones.
Performing CMOs is time consuming, especially when flushing a block
range, so this change reduces the load on KVM considerably and improves
the efficiency of the page table code.

So let's move both the clean of the D-cache and the invalidation of the
I-cache to the map path, and move only the invalidation of the I-cache
to the permission path. Since the original APIs for CMOs in mmu.c are
only called in user_mem_abort(), we now also move them to pgtable.c.

After this patch, in function stage2_map_walker_try_leaf (map path),
we flush D-cache if we are not mapping device memory and invalidate
I-cache if we are adding executable permission. And in the function
stage2_attr_walker (permission path), we invalidate I-cache if we are
adding executable permission. The logic is consistent with current
code in user_mem_abort (without this patch).
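
The decision logic described above can be sketched as a small user-space
model. This is only an illustration under stated assumptions: the bit
positions and every toy_* name are hypothetical and do not match the real
stage-2 descriptor layout or any kernel API, and the FWB shortcut that
skips the D-cache flush is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of the CMO decisions. The bit positions and all toy_* names
 * are hypothetical; they are not the real stage-2 descriptor layout.
 */
#define TOY_PTE_XN	(UINT64_C(1) << 0)	/* execute-never */
#define TOY_PTE_DEVICE	(UINT64_C(1) << 1)	/* device (non-cacheable) memory */

#define TOY_CMO_DCACHE	(1u << 0)	/* flush the D-cache */
#define TOY_CMO_ICACHE	(1u << 1)	/* invalidate the I-cache */

static bool toy_pte_cacheable(uint64_t pte)
{
	return !(pte & TOY_PTE_DEVICE);
}

static bool toy_pte_executable(uint64_t pte)
{
	return !(pte & TOY_PTE_XN);
}

/* Map path: CMOs to perform before installing a new leaf entry. */
static unsigned int toy_map_cmos(uint64_t new_pte)
{
	unsigned int cmos = 0;

	if (toy_pte_cacheable(new_pte))
		cmos |= TOY_CMO_DCACHE;
	if (toy_pte_executable(new_pte))
		cmos |= TOY_CMO_ICACHE;
	return cmos;
}

/* Permission path: I-cache invalidation only, and only when exec is added. */
static unsigned int toy_perm_cmos(uint64_t old_pte, uint64_t new_pte)
{
	if (!toy_pte_executable(old_pte) && toy_pte_executable(new_pte))
		return TOY_CMO_ICACHE;
	return 0;
}
```

In this model a device mapping with XN set needs no CMOs at all on the
map path, and a permission update that leaves the executable bit
unchanged needs none either.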

The following results show the benefit of patch #1 alone. They were
measured with the KVM selftest [1] that I posted recently.
[1] https://lore.kernel.org/lkml/[email protected]/

When there are multiple vcpus concurrently accessing the same memory region,
we can test the execution time of KVM creating new mappings, updating the
permissions of old mappings from RO to RW, and rebuilding the blocks after
they have been split.

hardware platform: HiSilicon Kunpeng920 Server
host kernel: Linux mainline v5.12-rc2

cmdline: ./kvm_page_table_test -m 4 -s anonymous -b 1G -v 80
(80 vcpus, 1G memory, page mappings(normal 4K))
KVM_CREATE_MAPPINGS: before 104.35s -> after 90.42s +13.35%
KVM_UPDATE_MAPPINGS: before 78.64s -> after 75.45s + 4.06%

cmdline: ./kvm_page_table_test -m 4 -s anonymous_thp -b 20G -v 40
(40 vcpus, 20G memory, block mappings(THP 2M))
KVM_CREATE_MAPPINGS: before 15.66s -> after 6.92s +55.80%
KVM_UPDATE_MAPPINGS: before 178.80s -> after 123.35s +31.00%
KVM_REBUILD_BLOCKS: before 187.34s -> after 131.76s +30.65%

cmdline: ./kvm_page_table_test -m 4 -s anonymous_hugetlb_1gb -b 20G -v 40
(40 vcpus, 20G memory, block mappings(HUGETLB 1G))
KVM_CREATE_MAPPINGS: before 104.54s -> after 3.70s +96.46%
KVM_UPDATE_MAPPINGS: before 174.20s -> after 115.94s +33.44%
KVM_REBUILD_BLOCKS: before 103.95s -> after 2.96s +97.15%
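
The percentages appear to be the relative reduction in execution time,
(before - after) / before. A minimal sketch of that arithmetic, assuming
this reading is right (toy_improvement_pct is a hypothetical helper, not
part of the selftest):

```c
#include <assert.h>

/*
 * Relative improvement in percent: (before - after) / before * 100.
 * Illustrative helper only; the name is made up for this sketch.
 */
static double toy_improvement_pct(double before_s, double after_s)
{
	assert(before_s > 0.0);	/* a zero baseline has no defined ratio */
	return (before_s - after_s) / before_s * 100.0;
}
```

For instance, 104.35s -> 90.42s gives (104.35 - 90.42) / 104.35, which is
about 13.35%.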

Patch #2:
A new method to distinguish cases of memcache allocations is introduced.
By comparing fault_granule and vma_pagesize, cases that require allocations
from memcache and cases that don't can be distinguished completely.
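
One plausible reading of this comparison, sketched as a standalone helper.
The rule below is an assumption, since the cover letter does not spell it
out: memcache pages are taken to be needed only when the granule of the
level where the fault happened is larger than the size being mapped, so
that new page-table levels have to be created. toy_needs_memcache is a
hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed rule, for illustration only: page-table pages must be
 * allocated from the memcache when the fault granule exceeds the
 * mapping size, because the walk then has to create new table levels.
 */
static bool toy_needs_memcache(uint64_t fault_granule, uint64_t vma_pagesize)
{
	return fault_granule > vma_pagesize;
}
```

Under this assumption, a fault at a 1G level that is resolved with a 4K
mapping needs the memcache, while a fault at a 4K leaf that only updates
the existing 4K entry does not.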

Yanan Wang (2):
KVM: arm64: Move CMOs from user_mem_abort to the fault handlers
KVM: arm64: Distinguish cases of memcache allocations completely

arch/arm64/include/asm/kvm_mmu.h | 31 ---------------
arch/arm64/kvm/hyp/pgtable.c | 68 +++++++++++++++++++++++++-------
arch/arm64/kvm/mmu.c | 48 ++++++++--------------
3 files changed, 69 insertions(+), 78 deletions(-)

--
2.19.1

2021-04-09 03:39:53

by Yanan Wang

Subject: [PATCH v4 1/2] KVM: arm64: Move CMOs from user_mem_abort to the fault handlers

We currently perform cache maintenance operations (CMOs) of the D-cache
and I-cache uniformly in user_mem_abort() before calling the fault
handlers. If we get concurrent guest faults (e.g. translation faults,
permission faults) or some really unnecessary guest faults caused by
break-before-make (BBM), the CMOs for the first vcpu are necessary while
those for the later vcpus are not.

By moving the CMOs to the fault handlers, we can easily identify the
conditions where they are really needed and avoid the unnecessary ones.
Performing CMOs is time consuming, especially when flushing a block
range, so this change reduces the load on KVM considerably and improves
the efficiency of the page table code.

So let's move both the clean of the D-cache and the invalidation of the
I-cache to the map path, and move only the invalidation of the I-cache
to the permission path. Since the original APIs for CMOs in mmu.c are
only called in user_mem_abort(), we now also move them to pgtable.c.

Signed-off-by: Yanan Wang <[email protected]>
---
arch/arm64/include/asm/kvm_mmu.h | 31 ---------------
arch/arm64/kvm/hyp/pgtable.c | 68 +++++++++++++++++++++++++-------
arch/arm64/kvm/mmu.c | 23 ++---------
3 files changed, 57 insertions(+), 65 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 90873851f677..c31f88306d4e 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -177,37 +177,6 @@ static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
return (vcpu_read_sys_reg(vcpu, SCTLR_EL1) & 0b101) == 0b101;
}

-static inline void __clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
-{
- void *va = page_address(pfn_to_page(pfn));
-
- /*
- * With FWB, we ensure that the guest always accesses memory using
- * cacheable attributes, and we don't have to clean to PoC when
- * faulting in pages. Furthermore, FWB implies IDC, so cleaning to
- * PoU is not required either in this case.
- */
- if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
- return;
-
- kvm_flush_dcache_to_poc(va, size);
-}
-
-static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
- unsigned long size)
-{
- if (icache_is_aliasing()) {
- /* any kind of VIPT cache */
- __flush_icache_all();
- } else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
- /* PIPT or VPIPT at EL2 (see comment in __kvm_tlb_flush_vmid_ipa) */
- void *va = page_address(pfn_to_page(pfn));
-
- invalidate_icache_range((unsigned long)va,
- (unsigned long)va + size);
- }
-}
-
void kvm_set_way_flush(struct kvm_vcpu *vcpu);
void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 4d177ce1d536..0e811c86fd06 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -464,6 +464,43 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
return 0;
}

+static bool stage2_pte_cacheable(kvm_pte_t pte)
+{
+ u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
+ return memattr == PAGE_S2_MEMATTR(NORMAL);
+}
+
+static bool stage2_pte_executable(kvm_pte_t pte)
+{
+ return !(pte & KVM_PTE_LEAF_ATTR_HI_S2_XN);
+}
+
+static void stage2_flush_dcache(void *addr, u64 size)
+{
+ /*
+ * With FWB, we ensure that the guest always accesses memory using
+ * cacheable attributes, and we don't have to clean to PoC when
+ * faulting in pages. Furthermore, FWB implies IDC, so cleaning to
+ * PoU is not required either in this case.
+ */
+ if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
+ return;
+
+ __flush_dcache_area(addr, size);
+}
+
+static void stage2_invalidate_icache(void *addr, u64 size)
+{
+ if (icache_is_aliasing()) {
+ /* Flush any kind of VIPT icache */
+ __flush_icache_all();
+ } else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
+ /* PIPT or VPIPT at EL2 */
+ invalidate_icache_range((unsigned long)addr,
+ (unsigned long)addr + size);
+ }
+}
+
static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
kvm_pte_t *ptep,
struct stage2_map_data *data)
@@ -495,6 +532,13 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
put_page(page);
}

+ /* Perform CMOs before installation of the new PTE */
+ if (stage2_pte_cacheable(new))
+ stage2_flush_dcache(__va(phys), granule);
+
+ if (stage2_pte_executable(new))
+ stage2_invalidate_icache(__va(phys), granule);
+
smp_store_release(ptep, new);
get_page(page);
data->phys += granule;
@@ -651,20 +695,6 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
return ret;
}

-static void stage2_flush_dcache(void *addr, u64 size)
-{
- if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
- return;
-
- __flush_dcache_area(addr, size);
-}
-
-static bool stage2_pte_cacheable(kvm_pte_t pte)
-{
- u64 memattr = pte & KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR;
- return memattr == PAGE_S2_MEMATTR(NORMAL);
-}
-
static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
enum kvm_pgtable_walk_flags flag,
void * const arg)
@@ -743,8 +773,16 @@ static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
* but worst-case the access flag update gets lost and will be
* set on the next access instead.
*/
- if (data->pte != pte)
+ if (data->pte != pte) {
+ /*
+ * Invalidate the instruction cache before updating
+ * if we are going to add the executable permission.
+ */
+ if (!stage2_pte_executable(*ptep) && stage2_pte_executable(pte))
+ stage2_invalidate_icache(kvm_pte_follow(pte),
+ kvm_granule_size(level));
WRITE_ONCE(*ptep, pte);
+ }

return 0;
}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 77cb2d28f2a4..1eec9f63bc6f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -609,16 +609,6 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask);
}

-static void clean_dcache_guest_page(kvm_pfn_t pfn, unsigned long size)
-{
- __clean_dcache_guest_page(pfn, size);
-}
-
-static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
-{
- __invalidate_icache_guest_page(pfn, size);
-}
-
static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
{
send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
@@ -882,13 +872,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (writable)
prot |= KVM_PGTABLE_PROT_W;

- if (fault_status != FSC_PERM && !device)
- clean_dcache_guest_page(pfn, vma_pagesize);
-
- if (exec_fault) {
+ if (exec_fault)
prot |= KVM_PGTABLE_PROT_X;
- invalidate_icache_guest_page(pfn, vma_pagesize);
- }

if (device)
prot |= KVM_PGTABLE_PROT_DEVICE;
@@ -1144,10 +1129,10 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
trace_kvm_set_spte_hva(hva);

/*
- * We've moved a page around, probably through CoW, so let's treat it
- * just like a translation fault and clean the cache to the PoC.
+ * We've moved a page around, probably through CoW, so let's treat
+ * it just like a translation fault and the map handler will clean
+ * the cache to the PoC.
*/
- clean_dcache_guest_page(pfn, PAGE_SIZE);
handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &pfn);
return 0;
}
--
2.19.1

2021-04-09 08:11:06

by Quentin Perret

Subject: Re: [PATCH v4 1/2] KVM: arm64: Move CMOs from user_mem_abort to the fault handlers

Hi Yanan,

On Friday 09 Apr 2021 at 11:36:51 (+0800), Yanan Wang wrote:
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> +static void stage2_invalidate_icache(void *addr, u64 size)
> +{
> + if (icache_is_aliasing()) {
> + /* Flush any kind of VIPT icache */
> + __flush_icache_all();
> + } else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
> + /* PIPT or VPIPT at EL2 */
> + invalidate_icache_range((unsigned long)addr,
> + (unsigned long)addr + size);
> + }
> +}
> +

I would recommend to try and rebase this patch on kvmarm/next because
we've made a few changes in pgtable.c recently. It is now linked into
the EL2 NVHE code which means there are constraints on what can be used
from there -- you'll need a bit of extra work to make some of these
functions available to EL2.

Thanks,
Quentin

2021-04-09 09:00:55

by Yanan Wang

Subject: Re: [PATCH v4 1/2] KVM: arm64: Move CMOs from user_mem_abort to the fault handlers

Hi Quentin,

On 2021/4/9 16:08, Quentin Perret wrote:
> Hi Yanan,
>
> On Friday 09 Apr 2021 at 11:36:51 (+0800), Yanan Wang wrote:
>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> +static void stage2_invalidate_icache(void *addr, u64 size)
>> +{
>> + if (icache_is_aliasing()) {
>> + /* Flush any kind of VIPT icache */
>> + __flush_icache_all();
>> + } else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
>> + /* PIPT or VPIPT at EL2 */
>> + invalidate_icache_range((unsigned long)addr,
>> + (unsigned long)addr + size);
>> + }
>> +}
>> +
> I would recommend to try and rebase this patch on kvmarm/next because
> we've made a few changes in pgtable.c recently. It is now linked into
> the EL2 NVHE code which means there are constraints on what can be used
> from there -- you'll need a bit of extra work to make some of these
> functions available to EL2.
I see, thanks for reminding me of this.
I will work on kvmarm/next and send a new version later.

Thanks,
Yanan
>
> Thanks,
> Quentin
> .

2021-04-09 09:02:29

by Marc Zyngier

Subject: Re: [PATCH v4 1/2] KVM: arm64: Move CMOs from user_mem_abort to the fault handlers

On Fri, 09 Apr 2021 09:08:11 +0100,
Quentin Perret <[email protected]> wrote:
>
> Hi Yanan,
>
> On Friday 09 Apr 2021 at 11:36:51 (+0800), Yanan Wang wrote:
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > +static void stage2_invalidate_icache(void *addr, u64 size)
> > +{
> > + if (icache_is_aliasing()) {
> > + /* Flush any kind of VIPT icache */
> > + __flush_icache_all();
> > + } else if (is_kernel_in_hyp_mode() || !icache_is_vpipt()) {
> > + /* PIPT or VPIPT at EL2 */
> > + invalidate_icache_range((unsigned long)addr,
> > + (unsigned long)addr + size);
> > + }
> > +}
> > +
>
> I would recommend to try and rebase this patch on kvmarm/next because
> we've made a few changes in pgtable.c recently. It is now linked into
> the EL2 NVHE code which means there are constraints on what can be used
> from there -- you'll need a bit of extra work to make some of these
> functions available to EL2.

That's an interesting point.

I wonder whether we are missing something on the i-side for VPIPT +
host stage-2 due to switching HCR_EL2.VM. We haven't changed the VMID
(still 0), but I can't bring myself to be sure it doesn't affect the
icache in this case...

M.

--
Without deviation from the norm, progress is not possible.