2023-06-06 19:40:15

by Raghavendra Rao Ananta

Subject: [PATCH v5 0/7] KVM: arm64: Add support for FEAT_TLBIRANGE

In certain code paths, KVM/ARM currently invalidates the entire VM's
page-tables instead of just invalidating a necessary range. For example,
when collapsing a table PTE to a block PTE, instead of iterating over
each PTE and flushing them, KVM uses 'vmalls12e1is' TLBI operation to
flush all the entries. This is inefficient since the guest would have
to refill the TLBs again, even for the addresses that aren't covered
by the table entry. The performance impact would scale poorly if many
addresses in the VM are going through this remapping.

For architectures that implement FEAT_TLBIRANGE, KVM can replace such
inefficient paths by performing the invalidations only on the range of
addresses that are in scope. This series tries to achieve the same in
the areas of stage-2 map, unmap and write-protecting the pages.
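
As a rough illustration of the difference described above, the following
is a minimal user-space sketch, not kernel code. It models a TLB as a
small array and compares dropping every entry of the VM against dropping
only the entries inside a range. All names (toy_tlb_entry, toy_flush_all,
toy_flush_range) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a TLB: each slot caches one IPA page of the VM. */
struct toy_tlb_entry {
	uint64_t ipa;
	bool valid;
};

/* Models a full-VMID flush: every cached entry of the VM is dropped. */
static size_t toy_flush_all(struct toy_tlb_entry *tlb, size_t n)
{
	size_t dropped = 0;

	assert(tlb != NULL);

	for (size_t i = 0; i < n; i++) {
		if (tlb[i].valid) {
			tlb[i].valid = false;
			dropped++;
		}
	}
	return dropped;
}

/* Models a range-based flush: only entries in [start, start + size) go. */
static size_t toy_flush_range(struct toy_tlb_entry *tlb, size_t n,
			      uint64_t start, uint64_t size)
{
	size_t dropped = 0;

	assert(tlb != NULL);

	for (size_t i = 0; i < n; i++) {
		if (tlb[i].valid && tlb[i].ipa >= start &&
		    tlb[i].ipa - start < size) {
			tlb[i].valid = false;
			dropped++;
		}
	}
	return dropped;
}
```

In this model, entries outside the flushed range stay valid after
toy_flush_range(), which is the refill cost the series tries to avoid.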

Patch-1 refactors the core arm64's __flush_tlb_range() to be used by
other entities.

Patch-2,3 add a range-based TLBI mechanism for KVM (VHE and nVHE).

Patch-4 implements the kvm_arch_flush_remote_tlbs_range() for arm64.

Patch-5 aims to flush only the memslot that undergoes a write-protect,
instead of the entire VM.

Patch-6 operates on stage2_try_break_pte() to use the range based
TLBI instructions when collapsing a table entry. The map path is the
immediate consumer of this when KVM remaps a table entry into a block.

Patch-7 modifies the stage-2 unmap path in which, if the system supports
FEAT_TLBIRANGE, the TLB invalidations are skipped during the page-table
walk. Instead, they are done in one go after the entire walk is finished.
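
The deferral described for the unmap path can be sketched in user space
as below. This is only a model under simplifying assumptions: the walk
is a flat loop over an array of PTEs, the whole range is assumed to fit
in a single range invalidation, and the FWB condition mentioned in the
v5 changelog is left out. The names toy_unmap_stats and toy_unmap are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_unmap_stats {
	size_t per_page_tlbis;	/* invalidations issued during the walk */
	size_t range_tlbis;	/* invalidations issued after the walk */
};

/* Clear 'nr' PTEs; invalidate per entry unless the flush is deferred. */
static struct toy_unmap_stats toy_unmap(uint64_t *ptes, size_t nr,
					bool has_tlbirange)
{
	struct toy_unmap_stats stats = { 0, 0 };

	assert(ptes != NULL);

	for (size_t i = 0; i < nr; i++) {
		if (!ptes[i])
			continue;	/* nothing mapped here */
		ptes[i] = 0;
		if (!has_tlbirange)
			stats.per_page_tlbis++;
	}

	/* Deferred case: one invalidation covering the walked range. */
	if (has_tlbirange && nr > 0)
		stats.range_tlbis = 1;

	return stats;
}
```

With 512 valid entries, the model issues 512 invalidations without
range support and a single one with it.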

The series is based off of upstream v6.4-rc2, and applied David
Matlack's common API for TLB invalidations[1] on top.

The performance evaluation was done on hardware that supports
FEAT_TLBIRANGE, in a VHE configuration, using a modified
kvm_page_table_test. The modified version updates the guest code in the
ADJUST_MAPPINGS case to access not only the current page but also up to
512 pages backwards for every new page it iterates through. This is done
to test the effect of TLB misses after KVM has handled a fault.

The series captures the impact in the map and unmap paths as described
above.

$ kvm_page_table_test -m 2 -v 128 -s anonymous_hugetlb_2mb -b $i

+--------+------------------------------+------------------------------+
| mem_sz |     ADJUST_MAPPINGS (s)      |         Unmap VM (s)         |
|  (GB)  | Baseline | Baseline + series | Baseline | Baseline + series |
+--------+----------+-------------------+----------+-------------------+
|      1 |     3.44 |              2.97 |    0.007 |             0.005 |
|      2 |     5.56 |              5.63 |    0.010 |             0.006 |
|      4 |    11.03 |             10.44 |    0.015 |             0.008 |
|      8 |    24.54 |             19.00 |    0.024 |             0.011 |
|     16 |    40.16 |             36.83 |    0.041 |             0.018 |
|     32 |    75.76 |             73.84 |    0.074 |             0.029 |
|     64 |   151.58 |            152.62 |    0.148 |             0.050 |
|    128 |   330.42 |            306.86 |    0.280 |             0.090 |
+--------+----------+-------------------+----------+-------------------+

$ kvm_page_table_test -m 2 -b 128G -s anonymous_hugetlb_2mb -v $i

+--------+------------------------------+
| vCPUs  |     ADJUST_MAPPINGS (s)      |
|        | Baseline | Baseline + series |
+--------+----------+-------------------+
|      1 |   138.69 |            135.58 |
|      2 |   138.77 |            137.54 |
|      4 |   162.57 |            135.82 |
|      8 |   154.92 |            143.67 |
|     16 |   122.02 |            118.86 |
|     32 |   119.99 |            118.81 |
|     64 |   190.70 |            169.36 |
|    128 |   330.42 |            306.86 |
+--------+----------+-------------------+

For the ADJUST_MAPPINGS cases, which map the 4K table entries back to
2M hugepages, the series sees an average improvement of ~7%. For
unmapping 2M hugepages, we see up to a ~3x improvement at the larger
memory sizes.

$ kvm_page_table_test -m 2 -b $i

+--------+------------------------------+
| mem_sz |         Unmap VM (s)         |
|  (GB)  | Baseline | Baseline + series |
+--------+----------+-------------------+
|      1 |     0.52 |              0.13 |
|      2 |     1.03 |              0.25 |
|      4 |     2.04 |              0.47 |
|      8 |     4.05 |              0.94 |
|     16 |     8.11 |              1.82 |
|     32 |    16.11 |              3.69 |
|     64 |    32.35 |              7.22 |
|    128 |    64.66 |             14.69 |
+--------+----------+-------------------+

The series sees an average gain of 4x when the guest is backed by
PAGE_SIZE (4K) pages.
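
The average can be rechecked from the 4K unmap table with a short
sketch like the one below. The data arrays are copied from the table;
the helper toy_mean_speedup() is a hypothetical name, and taking the
arithmetic mean of the per-row ratios is an assumption about how the
average was formed.

```c
#include <assert.h>
#include <stddef.h>

/* "Unmap VM (s)" columns for 1..128 GB with 4K pages, from the table. */
static const double unmap_4k_baseline[] = {
	0.52, 1.03, 2.04, 4.05, 8.11, 16.11, 32.35, 64.66,
};
static const double unmap_4k_series[] = {
	0.13, 0.25, 0.47, 0.94, 1.82, 3.69, 7.22, 14.69,
};

/* Arithmetic mean of the per-row speedups (baseline / series). */
static double toy_mean_speedup(const double *baseline, const double *series,
			       size_t n)
{
	double sum = 0.0;

	assert(n > 0);

	for (size_t i = 0; i < n; i++)
		sum += baseline[i] / series[i];

	return sum / (double)n;
}
```

Calling toy_mean_speedup(unmap_4k_baseline, unmap_4k_series, 8) gives a
value a little above 4, in line with the figure quoted above.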

v5:
Thank you, Marc and Oliver for the comments
- Introduced a helper, kvm_tlb_flush_vmid_range(), to handle
the decision of using range-based TLBI instructions or
invalidating the entire VMID, rather than depending on
__kvm_tlb_flush_vmid_range() for it.
- kvm_tlb_flush_vmid_range() splits the range-based invalidations
if the requested range exceeds MAX_TLBI_RANGE_PAGES.
- All the users that need to invalidate the TLB over a range
now depend on kvm_tlb_flush_vmid_range() rather than directly
on __kvm_tlb_flush_vmid_range().
- stage2_unmap_defer_tlb_flush() introduces a WARN_ON() to
track if there's any change in TLBIRANGE or FWB support
during the unmap process as the features are based on
alternative patching and the TLBI operations solely depend
on this check.
- Corrected an incorrect hunk being present on v4's patch-3.
- Updated the patches changelog and code comments as per the
suggestions.
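
The splitting behaviour listed for kvm_tlb_flush_vmid_range() can be
pictured with the user-space sketch below. It is not the patch's code:
the toy_* names are hypothetical, the 4K page size is an assumption, and
TOY_MAX_TLBI_RANGE_PAGES is assumed to mirror arm64's
__TLBI_RANGE_PAGES(31, 3), that is 32 << 16 pages. Instead of issuing
TLBIs, it only counts how many range operations or full flushes would be
requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT			12	/* assumption: 4K pages */
/* Assumed to mirror __TLBI_RANGE_PAGES(31, 3) on arm64. */
#define TOY_MAX_TLBI_RANGE_PAGES	(UINT64_C(32) << 16)

struct toy_flush_log {
	uint64_t range_calls;	/* range-based invalidations requested */
	uint64_t full_calls;	/* whole-VMID invalidations requested */
	uint64_t next_addr;	/* address following the last chunk */
};

static void toy_flush_vmid_range(struct toy_flush_log *log, bool has_tlbirange,
				 uint64_t addr, uint64_t size)
{
	uint64_t pages = size >> TOY_PAGE_SHIFT;

	assert(log != NULL);

	/* Without range support, fall back to flushing the whole VMID. */
	if (!has_tlbirange) {
		log->full_calls++;
		return;
	}

	/* Otherwise walk the range in chunks of at most the maximum. */
	while (pages > 0) {
		uint64_t chunk = pages < TOY_MAX_TLBI_RANGE_PAGES ?
				 pages : TOY_MAX_TLBI_RANGE_PAGES;

		log->range_calls++;
		addr += chunk << TOY_PAGE_SHIFT;
		pages -= chunk;
	}
	log->next_addr = addr;
}
```

A request slightly larger than three maximum-sized chunks ends up as
four range operations in this model, and as a single full flush when
range support is absent.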

v4:
https://lore.kernel.org/all/[email protected]/
Thanks again, Oliver for all the comments
- Updated the __kvm_tlb_flush_vmid_range() implementation for
nVHE to adjust with the modified __tlb_switch_to_guest() that
accepts a new 'bool nsh' arg.
- Renamed stage2_put_pte() to stage2_unmap_put_pte() and removed
the 'skip_flush' argument.
- Defined stage2_unmap_defer_tlb_flush() to check if the PTE
flushes can be deferred during the unmap table walk. It's
being called from stage2_unmap_put_pte() and
kvm_pgtable_stage2_unmap().
- Got rid of the 'struct stage2_unmap_data'.

v3:
https://lore.kernel.org/all/[email protected]/
Thanks, Oliver for all the suggestions.
- The core flush API (__kvm_tlb_flush_vmid_range()) now checks if
the system supports FEAT_TLBIRANGE or not, thus eliminating the
redundancy in the upper layers.
- If FEAT_TLBIRANGE is not supported, the implementation falls
back to invalidating all the TLB entries with the VMID, instead
of doing an iterative flush for the range.
- The kvm_arch_flush_remote_tlbs_range() doesn't return -EOPNOTSUPP
if the system doesn't implement FEAT_TLBIRANGE. It depends on
__kvm_tlb_flush_vmid_range() to take care of the decisions
and return 0 regardless of the underlying feature support.
- __kvm_tlb_flush_vmid_range() doesn't take 'level' as input to
calculate the 'stride'. Instead, it always assumes PAGE_SIZE.
- Fast unmap path is eliminated. Instead, the existing unmap walker
is modified to skip the TLBIs during the walk, and do it all at
once after the walk, using the range-based instructions.

v2:
https://lore.kernel.org/all/[email protected]/
- Rebased the series on top of David Matlack's series for common
TLB invalidation API[1].
- Implement kvm_arch_flush_remote_tlbs_range() for arm64, by extending
the support introduced by [1].
- Use kvm_flush_remote_tlbs_memslot() introduced by [1] to flush
only the current memslot after write-protect.
- Modified the __kvm_tlb_flush_range() macro to accept 'level' as an
argument to calculate the 'stride' instead of just using PAGE_SIZE.
- Split the patch that introduces the range-based TLBI to KVM and the
implementation of IPA-based invalidation into its own patches.
- Dropped the patch that tries to optimize the mmu notifiers paths.
- Rename the function kvm_table_pte_flush() to
kvm_pgtable_stage2_flush_range(), and accept the range of addresses to
flush. [Oliver]
- Drop the 'tlb_level' argument for stage2_try_break_pte() and directly
pass '0' as 'tlb_level' to kvm_pgtable_stage2_flush_range(). [Oliver]

v1:
https://lore.kernel.org/all/[email protected]/

Thank you.
Raghavendra

[1]:
https://lore.kernel.org/linux-arm-kernel/[email protected]/

Raghavendra Rao Ananta (7):
arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range
KVM: arm64: Implement __kvm_tlb_flush_vmid_range()
KVM: arm64: Define kvm_tlb_flush_vmid_range()
KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()
KVM: arm64: Flush only the memslot after write-protect
KVM: arm64: Invalidate the table entries upon a range
KVM: arm64: Use TLBI range-based instructions for unmap

arch/arm64/include/asm/kvm_asm.h | 3 +
arch/arm64/include/asm/kvm_host.h | 3 +
arch/arm64/include/asm/kvm_pgtable.h | 10 +++
arch/arm64/include/asm/tlbflush.h | 108 ++++++++++++++-------------
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 11 +++
arch/arm64/kvm/hyp/nvhe/tlb.c | 30 ++++++++
arch/arm64/kvm/hyp/pgtable.c | 90 +++++++++++++++++++---
arch/arm64/kvm/hyp/vhe/tlb.c | 28 +++++++
arch/arm64/kvm/mmu.c | 9 ++-
9 files changed, 228 insertions(+), 64 deletions(-)

--
2.41.0.rc0.172.g3f132b7071-goog



2023-06-06 19:41:35

by Raghavendra Rao Ananta

Subject: [PATCH v5 6/7] KVM: arm64: Invalidate the table entries upon a range

Currently, during the operations such as a hugepage collapse,
KVM would flush the entire VM's context using 'vmalls12e1is'
TLBI operation. Specifically, if the VM is faulting on many
hugepages (say after dirty-logging), it creates a performance
penalty for the guest whose pages have already been faulted
earlier as they would have to refill their TLBs again.

Instead, leverage kvm_tlb_flush_vmid_range() for table entries.
If the system supports it, only the required range will be
flushed. Else, it'll fallback to the previous mechanism.

Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/kvm/hyp/pgtable.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index df8ac14d9d3d4..50ef7623c54db 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -766,7 +766,8 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
 	 * value (if any).
 	 */
 	if (kvm_pte_table(ctx->old, ctx->level))
-		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
+		kvm_tlb_flush_vmid_range(mmu, ctx->addr,
+					 kvm_granule_size(ctx->level));
 	else if (kvm_pte_valid(ctx->old))
 		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);

--
2.41.0.rc0.172.g3f132b7071-goog
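
For reference, the size of the range that the hunk above passes for a
table entry can be worked out with a sketch like the following. It
assumes a 4K granule and that the granule shift follows the usual arm64
level formula, (PAGE_SHIFT - 3) * (4 - level) + 3. The toy_* helpers are
hypothetical stand-ins, not the kernel's kvm_granule_size().

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12	/* assumption: 4K granule */

/* Number of address bits covered by one entry at 'level' (0..3). */
static unsigned int toy_granule_shift(unsigned int level)
{
	assert(level <= 3);
	return (TOY_PAGE_SHIFT - 3) * (4 - level) + 3;
}

/* Bytes of IPA space covered by one entry at 'level'. */
static uint64_t toy_granule_size(unsigned int level)
{
	return UINT64_C(1) << toy_granule_shift(level);
}
```

Under these assumptions a level-2 entry covers 2M and a level-1 entry
covers 1G, so collapsing a level-2 table into a block would invalidate
2M of IPA space rather than the whole VM.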


2023-06-14 13:12:05

by Oliver Upton

Subject: Re: [PATCH v5 0/7] KVM: arm64: Add support for FEAT_TLBIRANGE

Hi Raghavendra,

On Tue, Jun 06, 2023 at 07:28:51PM +0000, Raghavendra Rao Ananta wrote:
> The series is based off of upstream v6.4-rc2, and applied David
> Matlack's common API for TLB invalidations[1] on top.

Sorry I didn't spot the dependency earlier, but this isn't helpful TBH.

David's series was partially applied, and what remains no longer cleanly
applies to the base you suggest. Independent of that, my *strong*
preference is that you just send out a series containing your patches as
well as David's. Coordinating dependent efforts is the only sane thing
to do. Also, those patches are 5 months old at this point which is
ancient history.

> [1]:
> https://lore.kernel.org/linux-arm-kernel/[email protected]/

--
Thanks,
Oliver

2023-06-15 02:23:14

by Raghavendra Rao Ananta

Subject: Re: [PATCH v5 0/7] KVM: arm64: Add support for FEAT_TLBIRANGE

On Wed, Jun 14, 2023 at 5:19 AM Oliver Upton <[email protected]> wrote:
>
> Hi Raghavendra,
>
> On Tue, Jun 06, 2023 at 07:28:51PM +0000, Raghavendra Rao Ananta wrote:
> > The series is based off of upstream v6.4-rc2, and applied David
> > Matlack's common API for TLB invalidations[1] on top.
>
> Sorry I didn't spot the dependency earlier, but this isn't helpful TBH.
>
> David's series was partially applied, and what remains no longer cleanly
> applies to the base you suggest. Independent of that, my *strong*
> preference is that you just send out a series containing your patches as
> well as David's. Coordinating dependent efforts is the only sane thing
> to do. Also, those patches are 5 months old at this point which is
> ancient history.
>
Would you rather prefer I detach this series from David's as I'm not
sure what his plans are for future versions?
On the other hand, the patches seem simple enough to rebase and give
another shot at review, but may end up delaying this series.
WDYT?

Thank you.
Raghavendra

> > [1]:
> > https://lore.kernel.org/linux-arm-kernel/[email protected]/
>
> --
> Thanks,
> Oliver

2023-06-15 09:25:51

by Oliver Upton

Subject: Re: [PATCH v5 0/7] KVM: arm64: Add support for FEAT_TLBIRANGE

+cc Sean

On Wed, Jun 14, 2023 at 06:57:01PM -0700, Raghavendra Rao Ananta wrote:
> On Wed, Jun 14, 2023 at 5:19 AM Oliver Upton <[email protected]> wrote:
> >
> > Hi Raghavendra,
> >
> > On Tue, Jun 06, 2023 at 07:28:51PM +0000, Raghavendra Rao Ananta wrote:
> > > The series is based off of upstream v6.4-rc2, and applied David
> > > Matlack's common API for TLB invalidations[1] on top.
> >
> > Sorry I didn't spot the dependency earlier, but this isn't helpful TBH.
> >
> > David's series was partially applied, and what remains no longer cleanly
> > applies to the base you suggest. Independent of that, my *strong*
> > preference is that you just send out a series containing your patches as
> > well as David's. Coordinating dependent efforts is the only sane thing
> > to do. Also, those patches are 5 months old at this point which is
> > ancient history.
> >
> Would you rather prefer I detach this series from David's as I'm not
> sure what his plans are for future versions?
> On the other hand, the patches seem simple enough to rebase and give
> another shot at review, but may end up delaying this series.
> WDYT?

In cases such as this you'd typically coordinate with the other
developer to pick up their changes as part of your series. Especially
for this case -- David's refactoring is _pointless_ without another
user for that code (i.e. arm64). As fun as it might be to antagonize
Sean, that series pokes x86 and I'd like an ack from him on it.

So, please post a combined series that applies cleanly to an early 6.4
rc of your choosing, and cc all affected reviewers/maintainers.

--
Thanks,
Oliver

2023-06-15 14:33:02

by Sean Christopherson

Subject: Re: [PATCH v5 0/7] KVM: arm64: Add support for FEAT_TLBIRANGE

On Thu, Jun 15, 2023, Oliver Upton wrote:
> +cc Sean
>
> On Wed, Jun 14, 2023 at 06:57:01PM -0700, Raghavendra Rao Ananta wrote:
> > On Wed, Jun 14, 2023 at 5:19 AM Oliver Upton <[email protected]> wrote:
> > >
> > > Hi Raghavendra,
> > >
> > > On Tue, Jun 06, 2023 at 07:28:51PM +0000, Raghavendra Rao Ananta wrote:
> > > > The series is based off of upstream v6.4-rc2, and applied David
> > > > Matlack's common API for TLB invalidations[1] on top.
> > >
> > > Sorry I didn't spot the dependency earlier, but this isn't helpful TBH.
> > >
> > > David's series was partially applied, and what remains no longer cleanly
> > > applies to the base you suggest. Independent of that, my *strong*
> > > preference is that you just send out a series containing your patches as
> > > well as David's. Coordinating dependent efforts is the only sane thing
> > > to do. Also, those patches are 5 months old at this point which is
> > > ancient history.
> > >
> > Would you rather prefer I detach this series from David's as I'm not
> > sure what his plans are for future versions?
> > On the other hand, the patches seem simple enough to rebase and give
> > another shot at review, but may end up delaying this series.
> > WDYT?
>
> In cases such as this you'd typically coordinate with the other
> developer to pick up their changes as part of your series. Especially
> for this case -- David's refactoring is _pointless_ without another
> user for that code (i.e. arm64). As fun as it might be to antagonize
> Sean, that series pokes x86 and I'd like an ack from him on it.
>
> So, please post a combined series that applies cleanly to an early 6.4
> rc of your choosing, and cc all affected reviewers/maintainers.

+1

2023-06-15 17:43:23

by Raghavendra Rao Ananta

Subject: Re: [PATCH v5 0/7] KVM: arm64: Add support for FEAT_TLBIRANGE

All right, I'll resend the series along with David's patches.

Thank you.
Raghavendra

On Thu, Jun 15, 2023 at 7:14 AM Sean Christopherson <[email protected]> wrote:
>
> On Thu, Jun 15, 2023, Oliver Upton wrote:
> > +cc Sean
> >
> > On Wed, Jun 14, 2023 at 06:57:01PM -0700, Raghavendra Rao Ananta wrote:
> > > On Wed, Jun 14, 2023 at 5:19 AM Oliver Upton <[email protected]> wrote:
> > > >
> > > > Hi Raghavendra,
> > > >
> > > > On Tue, Jun 06, 2023 at 07:28:51PM +0000, Raghavendra Rao Ananta wrote:
> > > > > The series is based off of upstream v6.4-rc2, and applied David
> > > > > Matlack's common API for TLB invalidations[1] on top.
> > > >
> > > > Sorry I didn't spot the dependency earlier, but this isn't helpful TBH.
> > > >
> > > > David's series was partially applied, and what remains no longer cleanly
> > > > applies to the base you suggest. Independent of that, my *strong*
> > > > preference is that you just send out a series containing your patches as
> > > > well as David's. Coordinating dependent efforts is the only sane thing
> > > > to do. Also, those patches are 5 months old at this point which is
> > > > ancient history.
> > > >
> > > Would you rather prefer I detach this series from David's as I'm not
> > > sure what his plans are for future versions?
> > > On the other hand, the patches seem simple enough to rebase and give
> > > another shot at review, but may end up delaying this series.
> > > WDYT?
> >
> > In cases such as this you'd typically coordinate with the other
> > developer to pick up their changes as part of your series. Especially
> > for this case -- David's refactoring is _pointless_ without another
> > user for that code (i.e. arm64). As fun as it might be to antagonize
> > Sean, that series pokes x86 and I'd like an ack from him on it.
> >
> > So, please post a combined series that applies cleanly to an early 6.4
> > rc of your choosing, and cc all affected reviewers/maintainers.
>
> +1