2023-06-21 17:53:29

by Raghavendra Rao Ananta

Subject: [RESEND PATCH v5 00/11] KVM: arm64: Add support for FEAT_TLBIRANGE

In certain code paths, KVM/ARM currently invalidates the TLB entries for
the entire VM instead of just the necessary range. For example, when
collapsing a table PTE into a block PTE, instead of iterating over each
PTE and flushing it, KVM uses the 'vmalls12e1is' TLBI operation to
flush all the entries. This is inefficient since the guest would have
to refill the TLBs again, even for the addresses that aren't covered
by the table entry. The performance impact scales poorly when many
addresses in the VM go through this remapping.

For architectures that implement FEAT_TLBIRANGE, KVM can replace such
inefficient paths by performing the invalidations only on the range of
addresses that are in scope. This series does exactly that for the
stage-2 map, unmap, and write-protect paths.

As suggested by Oliver in the original v5 of the series [1], I'm
reposting the series by including v2 of David Matlack's 'KVM: Add a
common API for range-based TLB invalidation' series [2].

Patches 1-4 include David Matlack's patches 1, 2, 6, and 7 from [2].

Patch-5 refactors arm64's core __flush_tlb_range() so that it can be
reused by other callers.

Patches 6 and 7 add a range-based TLBI mechanism for KVM (VHE and nVHE).

Patch-8 implements the kvm_arch_flush_remote_tlbs_range() for arm64.

Patch-9 aims to flush only the memslot that undergoes a write-protect,
instead of the entire VM.

Patch-10 modifies stage2_try_break_pte() to use the range-based
TLBI instructions when collapsing a table entry. The map path is the
immediate consumer of this when KVM remaps a table entry into a block.

Patch-11 modifies the stage-2 unmap path so that, if the system supports
FEAT_TLBIRANGE, the TLB invalidations are skipped during the page-table
walk. Instead, they are done in one go after the entire walk is finished,
as sketched below.
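
A simplified view of the resulting unmap flow (condensed from the hunk
in patch-11; only the deferral logic is shown):

    /* Walk the range without issuing per-PTE TLBIs ... */
    ret = kvm_pgtable_walk(pgt, addr, size, &walker);
    /* ... and do one range-based invalidation for the whole range. */
    if (stage2_unmap_defer_tlb_flush(&unmap_data))
            kvm_tlb_flush_vmid_range(pgt->mmu, addr, size);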

The series is based off of upstream v6.4-rc2.

The performance evaluation was done on hardware that supports
FEAT_TLBIRANGE, in a VHE configuration, using a modified
kvm_page_table_test. The modified version updates the guest code in
the ADJUST_MAPPINGS case so that, for every new page it iterates
through, it not only accesses that page but also accesses up to 512
preceding pages. This is done to test the effect of TLB misses after
KVM has handled a fault.
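
For reference, the guest-side change is conceptually along these lines
(an illustrative sketch only, not the actual selftest diff; the variable
names are made up, and start/end/guest_page_size are assumed to come
from the test harness — only the bound of 512 pages is taken from the
description above):

    uint64_t gva;
    int i;

    /*
     * For every newly touched page, also read back up to 512 of the
     * previously touched pages, so that translations KVM has already
     * installed get exercised again and any over-invalidation shows
     * up as extra TLB refills.
     */
    for (gva = start; gva < end; gva += guest_page_size) {
            *(volatile uint64_t *)gva = 1;            /* fault in the new page */
            for (i = 1; i <= 512; i++) {
                    uint64_t prev = gva - (uint64_t)i * guest_page_size;

                    if (prev < start)
                            break;
                    (void)*(volatile uint64_t *)prev; /* re-read an earlier page */
            }
    }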

The series captures the impact in the map and unmap paths as described
above.

$ kvm_page_table_test -m 2 -v 128 -s anonymous_hugetlb_2mb -b $i

+--------+------------------------------+------------------------------+
| mem_sz |     ADJUST_MAPPINGS (s)      |         Unmap VM (s)         |
|  (GB)  | Baseline | Baseline + series | Baseline | Baseline + series |
+--------+----------+-------------------+----------+-------------------+
|      1 |     3.44 |              2.97 |    0.007 |             0.005 |
|      2 |     5.56 |              5.63 |    0.010 |             0.006 |
|      4 |    11.03 |             10.44 |    0.015 |             0.008 |
|      8 |    24.54 |             19.00 |    0.024 |             0.011 |
|     16 |    40.16 |             36.83 |    0.041 |             0.018 |
|     32 |    75.76 |             73.84 |    0.074 |             0.029 |
|     64 |   151.58 |            152.62 |    0.148 |             0.050 |
|    128 |   330.42 |            306.86 |    0.280 |             0.090 |
+--------+----------+-------------------+----------+-------------------+

$ kvm_page_table_test -m 2 -b 128G -s anonymous_hugetlb_2mb -v $i

+--------+------------------------------+
| vCPUs  |     ADJUST_MAPPINGS (s)      |
|        | Baseline | Baseline + series |
+--------+----------+-------------------+
|      1 |   138.69 |            135.58 |
|      2 |   138.77 |            137.54 |
|      4 |   162.57 |            135.82 |
|      8 |   154.92 |            143.67 |
|     16 |   122.02 |            118.86 |
|     32 |   119.99 |            118.81 |
|     64 |   190.70 |            169.36 |
|    128 |   330.42 |            306.86 |
+--------+----------+-------------------+

For the ADJUST_MAPPINGS cases, which map the 4K table entries back to
2M hugepages, the series sees an average improvement of ~7%. For
unmapping 2M hugepages, we see at least a 3x improvement.

$ kvm_page_table_test -m 2 -b $i

+--------+------------------------------+
| mem_sz |         Unmap VM (s)         |
|  (GB)  | Baseline | Baseline + series |
+--------+----------+-------------------+
|      1 |     0.52 |              0.13 |
|      2 |     1.03 |              0.25 |
|      4 |     2.04 |              0.47 |
|      8 |     4.05 |              0.94 |
|     16 |     8.11 |              1.82 |
|     32 |    16.11 |              3.69 |
|     64 |    32.35 |              7.22 |
|    128 |    64.66 |             14.69 |
+--------+----------+-------------------+

The series sees an average gain of 4x when the guest is backed by
PAGE_SIZE (4K) pages.

Other testing:
- Booted on x86_64 and ran KVM selftests.
- Build tested for MIPS and RISCV architectures.

Cc: David Matlack <[email protected]>

v5:
https://lore.kernel.org/all/[email protected]/
Thank you, Marc and Oliver for the comments
- Introduced a helper, kvm_tlb_flush_vmid_range(), to handle
the decision of using range-based TLBI instructions or
invalidating the entire VMID, rather than depending on
__kvm_tlb_flush_vmid_range() for it.
- kvm_tlb_flush_vmid_range() splits the range-based invalidations
if the requested range exceeds MAX_TLBI_RANGE_PAGES.
- All the users that need to invalidate the TLB for a range
now depend on kvm_tlb_flush_vmid_range() rather than directly
on __kvm_tlb_flush_vmid_range().
- stage2_unmap_defer_tlb_flush() introduces a WARN_ON() to
catch any change in TLBIRANGE or FWB support during the unmap
process, as the feature checks are based on alternative
patching and the TLBI operations depend solely on this check.
- Corrected an incorrect hunk that was present in v4's patch-3.
- Updated the patches' changelogs and code comments as per the
suggestions.

v4:
https://lore.kernel.org/all/[email protected]/
Thanks again, Oliver for all the comments
- Updated the __kvm_tlb_flush_vmid_range() implementation for
nVHE to align with the modified __tlb_switch_to_guest() that
accepts a new 'bool nsh' arg.
- Renamed stage2_put_pte() to stage2_unmap_put_pte() and removed
the 'skip_flush' argument.
- Defined stage2_unmap_defer_tlb_flush() to check if the PTE
flushes can be deferred during the unmap table walk. It's
being called from stage2_unmap_put_pte() and
kvm_pgtable_stage2_unmap().
- Got rid of the 'struct stage2_unmap_data'.

v3:
https://lore.kernel.org/all/[email protected]/
Thanks, Oliver for all the suggestions.
- The core flush API (__kvm_tlb_flush_vmid_range()) now checks if
the system supports FEAT_TLBIRANGE or not, thus eliminating the
redundancy in the upper layers.
- If FEAT_TLBIRANGE is not supported, the implementation falls
back to invalidating all the TLB entries with the VMID, instead
of doing an iterative flush for the range.
- The kvm_arch_flush_remote_tlbs_range() no longer returns -EOPNOTSUPP
if the system doesn't implement FEAT_TLBIRANGE. It relies on
__kvm_tlb_flush_vmid_range() to take care of the decision and
returns 0 regardless of the underlying feature support.
- __kvm_tlb_flush_vmid_range() doesn't take 'level' as input to
calculate the 'stride'. Instead, it always assumes PAGE_SIZE.
- Fast unmap path is eliminated. Instead, the existing unmap walker
is modified to skip the TLBIs during the walk, and do it all at
once after the walk, using the range-based instructions.

v2:
https://lore.kernel.org/all/[email protected]/
- Rebased the series on top of David Matlack's series for common
TLB invalidation API[1].
- Implement kvm_arch_flush_remote_tlbs_range() for arm64, by extending
the support introduced by [1].
- Use kvm_flush_remote_tlbs_memslot() introduced by [1] to flush
only the current memslot after write-protect.
- Modified the __kvm_tlb_flush_range() macro to accept 'level' as an
argument to calculate the 'stride' instead of just using PAGE_SIZE.
- Split the patch that introduces the range-based TLBI to KVM and the
implementation of IPA-based invalidation into separate patches.
- Dropped the patch that tries to optimize the mmu notifiers paths.
- Rename the function kvm_table_pte_flush() to
kvm_pgtable_stage2_flush_range(), and accept the range of addresses to
flush. [Oliver]
- Drop the 'tlb_level' argument for stage2_try_break_pte() and directly
pass '0' as 'tlb_level' to kvm_pgtable_stage2_flush_range(). [Oliver]

v1:
https://lore.kernel.org/all/[email protected]/

Thank you.
Raghavendra

[1]: https://lore.kernel.org/all/[email protected]/
[2]:
https://lore.kernel.org/linux-arm-kernel/[email protected]/

David Matlack (4):
KVM: Rename kvm_arch_flush_remote_tlb() to
kvm_arch_flush_remote_tlbs()
KVM: arm64: Use kvm_arch_flush_remote_tlbs()
KVM: Allow range-based TLB invalidation from common code
KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code

Raghavendra Rao Ananta (7):
arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range
KVM: arm64: Implement __kvm_tlb_flush_vmid_range()
KVM: arm64: Define kvm_tlb_flush_vmid_range()
KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()
KVM: arm64: Flush only the memslot after write-protect
KVM: arm64: Invalidate the table entries upon a range
KVM: arm64: Use TLBI range-based instructions for unmap

arch/arm64/include/asm/kvm_asm.h | 3 +
arch/arm64/include/asm/kvm_host.h | 6 ++
arch/arm64/include/asm/kvm_pgtable.h | 10 +++
arch/arm64/include/asm/tlbflush.h | 108 ++++++++++++++-------------
arch/arm64/kvm/Kconfig | 1 -
arch/arm64/kvm/arm.c | 6 --
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 11 +++
arch/arm64/kvm/hyp/nvhe/tlb.c | 30 ++++++++
arch/arm64/kvm/hyp/pgtable.c | 90 +++++++++++++++++++---
arch/arm64/kvm/hyp/vhe/tlb.c | 28 +++++++
arch/arm64/kvm/mmu.c | 15 +++-
arch/mips/include/asm/kvm_host.h | 4 +-
arch/mips/kvm/mips.c | 12 +--
arch/riscv/kvm/mmu.c | 6 --
arch/x86/include/asm/kvm_host.h | 7 +-
arch/x86/kvm/mmu/mmu.c | 25 ++-----
arch/x86/kvm/mmu/mmu_internal.h | 3 -
arch/x86/kvm/x86.c | 2 +-
include/linux/kvm_host.h | 20 +++--
virt/kvm/Kconfig | 3 -
virt/kvm/kvm_main.c | 35 +++++++--
21 files changed, 294 insertions(+), 131 deletions(-)

--
2.41.0.162.gfafddb0af9-goog



2023-06-21 17:53:31

by Raghavendra Rao Ananta

Subject: [RESEND PATCH v5 03/11] KVM: Allow range-based TLB invalidation from common code

From: David Matlack <[email protected]>

Make kvm_flush_remote_tlbs_range() visible in common code and create a
default implementation that just invalidates the whole TLB.

This paves the way for several future features/cleanups:

- Introduction of range-based TLBI on ARM.
- Eliminating kvm_arch_flush_remote_tlbs_memslot()
- Moving the KVM/x86 TDP MMU to common code.

No functional change intended.

Signed-off-by: David Matlack <[email protected]>
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/mmu/mmu.c | 9 ++++-----
arch/x86/kvm/mmu/mmu_internal.h | 3 ---
include/linux/kvm_host.h | 9 +++++++++
virt/kvm/kvm_main.c | 13 +++++++++++++
5 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f6e3aa617d8b8..5c4dc547c030c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1804,6 +1804,9 @@ static inline int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
return -ENOTSUPP;
}

+#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
+int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages);
+
#define kvm_arch_pmi_in_guest(vcpu) \
((vcpu) && (vcpu)->arch.handling_intr_from_guest)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c8961f45e3b1c..32f7fb1b66c8d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -275,16 +275,15 @@ static inline bool kvm_available_flush_remote_tlbs_range(void)
return kvm_x86_ops.flush_remote_tlbs_range;
}

-void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn,
- gfn_t nr_pages)
+int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages)
{
int ret = -EOPNOTSUPP;

if (kvm_x86_ops.flush_remote_tlbs_range)
ret = static_call(kvm_x86_flush_remote_tlbs_range)(kvm, start_gfn,
- nr_pages);
- if (ret)
- kvm_flush_remote_tlbs(kvm);
+ pages);
+
+ return ret;
}

static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index);
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index d39af5639ce97..86cb83bb34804 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -170,9 +170,6 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
struct kvm_memory_slot *slot, u64 gfn,
int min_level);

-void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn,
- gfn_t nr_pages);
-
/* Flush the given page (huge or not) of guest memory. */
static inline void kvm_flush_remote_tlbs_gfn(struct kvm *kvm, gfn_t gfn, int level)
{
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 95c3e364f24b4..a054f48498a8f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1357,6 +1357,7 @@ int kvm_vcpu_yield_to(struct kvm_vcpu *target);
void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool yield_to_kernel_mode);

void kvm_flush_remote_tlbs(struct kvm *kvm);
+void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 pages);

#ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
@@ -1484,6 +1485,14 @@ static inline int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
}
#endif

+#ifndef __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
+static inline int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm,
+ gfn_t gfn, u64 pages)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
#ifdef __KVM_HAVE_ARCH_NONCOHERENT_DMA
void kvm_arch_register_noncoherent_dma(struct kvm *kvm);
void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 600a985b86215..fc4ee20d33cc0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -371,6 +371,19 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
}
EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);

+void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 pages)
+{
+ if (!kvm_arch_flush_remote_tlbs_range(kvm, gfn, pages))
+ return;
+
+ /*
+ * Fall back to flushing all TLBs if the architecture's range-based
+ * TLB invalidation is unsupported or can't be performed for whatever
+ * reason.
+ */
+ kvm_flush_remote_tlbs(kvm);
+}
+
static void kvm_flush_shadow_all(struct kvm *kvm)
{
kvm_arch_flush_shadow_all(kvm);
--
2.41.0.162.gfafddb0af9-goog


2023-06-21 17:54:45

by Raghavendra Rao Ananta

Subject: [RESEND PATCH v5 07/11] KVM: arm64: Define kvm_tlb_flush_vmid_range()

Implement the helper kvm_tlb_flush_vmid_range() that acts
as a wrapper for range-based TLB invalidations. For the
given VMID, use the range-based TLBI instructions to do
the job or fall back to invalidating all the TLB entries.
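
For example, the table-collapse path later in this series ends up
invoking the helper roughly as follows (see patch 10 of this series
for the exact hunk):

    /* Invalidate only the IPA range covered by the old table entry. */
    kvm_tlb_flush_vmid_range(mmu, ctx->addr, kvm_granule_size(ctx->level));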

Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/include/asm/kvm_pgtable.h | 10 ++++++++++
arch/arm64/kvm/hyp/pgtable.c | 20 ++++++++++++++++++++
2 files changed, 30 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 4cd6762bda805..1b12295a83595 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -682,4 +682,14 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
* kvm_pgtable_prot format.
*/
enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
+
+/**
+ * kvm_tlb_flush_vmid_range() - Invalidate/flush a range of TLB entries
+ *
+ * @mmu: Stage-2 KVM MMU struct
+ * @addr: The base Intermediate physical address from which to invalidate
+ * @size: Size of the range from the base to invalidate
+ */
+void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
+ phys_addr_t addr, size_t size);
#endif /* __ARM64_KVM_PGTABLE_H__ */
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 3d61bd3e591d2..df8ac14d9d3d4 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -631,6 +631,26 @@ static bool stage2_has_fwb(struct kvm_pgtable *pgt)
return !(pgt->flags & KVM_PGTABLE_S2_NOFWB);
}

+void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
+ phys_addr_t addr, size_t size)
+{
+ unsigned long pages, inval_pages;
+
+ if (!system_supports_tlb_range()) {
+ kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
+ return;
+ }
+
+ pages = size >> PAGE_SHIFT;
+ while (pages > 0) {
+ inval_pages = min(pages, MAX_TLBI_RANGE_PAGES);
+ kvm_call_hyp(__kvm_tlb_flush_vmid_range, mmu, addr, inval_pages);
+
+ addr += inval_pages << PAGE_SHIFT;
+ pages -= inval_pages;
+ }
+}
+
#define KVM_S2_MEMATTR(pgt, attr) PAGE_S2_MEMATTR(attr, stage2_has_fwb(pgt))

static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot prot,
--
2.41.0.162.gfafddb0af9-goog


2023-06-21 18:01:05

by Raghavendra Rao Ananta

Subject: [RESEND PATCH v5 09/11] KVM: arm64: Flush only the memslot after write-protect

After write-protecting the region, currently KVM invalidates
the entire TLB entries using kvm_flush_remote_tlbs(). Instead,
scope the invalidation only to the targeted memslot. If
supported, the architecture would use the range-based TLBI
instructions to flush the memslot or else fallback to flushing
all of the TLBs.
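
With the rest of the series applied, the call chain behind this
one-line change is roughly the following (a simplified sketch; the
functions come from patches 4, 7, and 8 of this series):

    kvm_flush_remote_tlbs_memslot(kvm, memslot)            /* common code, patch 4 */
        kvm_flush_remote_tlbs_range(kvm, base_gfn, npages)
            kvm_arch_flush_remote_tlbs_range()             /* arm64, patch 8 */
                kvm_tlb_flush_vmid_range()                 /* range TLBI, or full VMID flush, patch 7 */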

Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/kvm/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c3ec2141c3284..94f10e670c100 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -992,7 +992,7 @@ static void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
write_lock(&kvm->mmu_lock);
stage2_wp_range(&kvm->arch.mmu, start, end);
write_unlock(&kvm->mmu_lock);
- kvm_flush_remote_tlbs(kvm);
+ kvm_flush_remote_tlbs_memslot(kvm, memslot);
}

/**
--
2.41.0.162.gfafddb0af9-goog


2023-06-21 18:01:29

by Raghavendra Rao Ananta

Subject: [RESEND PATCH v5 11/11] KVM: arm64: Use TLBI range-based instructions for unmap

The current implementation of the stage-2 unmap walker traverses
the given range and, as a part of break-before-make, performs
TLB invalidations with a DSB for every PTE. A large number of
these back-to-back invalidations can cause a performance
bottleneck on some systems.

Hence, if the system supports FEAT_TLBIRANGE, defer the TLB
invalidations until the entire walk is finished, and then
use range-based instructions to invalidate the TLBs in one go.
Condition deferred TLB invalidation on the system supporting FWB,
as the optimization is entirely pointless when the unmap walker
needs to perform CMOs.

Rename stage2_put_pte() to stage2_unmap_put_pte() as the function
now serves the stage-2 unmap walker specifically, rather than
being generic.

Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/kvm/hyp/pgtable.c | 67 +++++++++++++++++++++++++++++++-----
1 file changed, 58 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 50ef7623c54db..1451f9011590d 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -789,16 +789,54 @@ static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t n
smp_store_release(ctx->ptep, new);
}

-static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
- struct kvm_pgtable_mm_ops *mm_ops)
+struct stage2_unmap_data {
+ struct kvm_pgtable *pgt;
+ bool defer_tlb_flush_init;
+};
+
+static bool __stage2_unmap_defer_tlb_flush(struct kvm_pgtable *pgt)
+{
+ /*
+ * If FEAT_TLBIRANGE is implemented, defer the individual
+ * TLB invalidations until the entire walk is finished, and
+ * then use the range-based TLBI instructions to do the
+ * invalidations. Condition deferred TLB invalidation on the
+ * system supporting FWB, as the optimization is entirely
+ * pointless when the unmap walker needs to perform CMOs.
+ */
+ return system_supports_tlb_range() && stage2_has_fwb(pgt);
+}
+
+static bool stage2_unmap_defer_tlb_flush(struct stage2_unmap_data *unmap_data)
+{
+ bool defer_tlb_flush = __stage2_unmap_defer_tlb_flush(unmap_data->pgt);
+
+ /*
+ * Since __stage2_unmap_defer_tlb_flush() is based on alternative
+ * patching and the TLBIs' operations behavior depend on this,
+ * track if there's any change in the state during the unmap sequence.
+ */
+ WARN_ON(unmap_data->defer_tlb_flush_init != defer_tlb_flush);
+ return defer_tlb_flush;
+}
+
+static void stage2_unmap_put_pte(const struct kvm_pgtable_visit_ctx *ctx,
+ struct kvm_s2_mmu *mmu,
+ struct kvm_pgtable_mm_ops *mm_ops)
{
+ struct stage2_unmap_data *unmap_data = ctx->arg;
+
/*
- * Clear the existing PTE, and perform break-before-make with
- * TLB maintenance if it was valid.
+ * Clear the existing PTE, and perform break-before-make if it was
+ * valid. Depending on the system support, the TLB maintenance for
+ * the same can be deferred until the entire unmap is completed.
*/
if (kvm_pte_valid(ctx->old)) {
kvm_clear_pte(ctx->ptep);
- kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
+
+ if (!stage2_unmap_defer_tlb_flush(unmap_data))
+ kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu,
+ ctx->addr, ctx->level);
}

mm_ops->put_page(ctx->ptep);
@@ -1005,7 +1043,8 @@ int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
enum kvm_pgtable_walk_flags visit)
{
- struct kvm_pgtable *pgt = ctx->arg;
+ struct stage2_unmap_data *unmap_data = ctx->arg;
+ struct kvm_pgtable *pgt = unmap_data->pgt;
struct kvm_s2_mmu *mmu = pgt->mmu;
struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
kvm_pte_t *childp = NULL;
@@ -1033,7 +1072,7 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,
* block entry and rely on the remaining portions being faulted
* back lazily.
*/
- stage2_put_pte(ctx, mmu, mm_ops);
+ stage2_unmap_put_pte(ctx, mmu, mm_ops);

if (need_flush && mm_ops->dcache_clean_inval_poc)
mm_ops->dcache_clean_inval_poc(kvm_pte_follow(ctx->old, mm_ops),
@@ -1047,13 +1086,23 @@ static int stage2_unmap_walker(const struct kvm_pgtable_visit_ctx *ctx,

int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
{
+ int ret;
+ struct stage2_unmap_data unmap_data = {
+ .pgt = pgt,
+ .defer_tlb_flush_init = __stage2_unmap_defer_tlb_flush(pgt),
+ };
struct kvm_pgtable_walker walker = {
.cb = stage2_unmap_walker,
- .arg = pgt,
+ .arg = &unmap_data,
.flags = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
};

- return kvm_pgtable_walk(pgt, addr, size, &walker);
+ ret = kvm_pgtable_walk(pgt, addr, size, &walker);
+ if (stage2_unmap_defer_tlb_flush(&unmap_data))
+ /* Perform the deferred TLB invalidations */
+ kvm_tlb_flush_vmid_range(pgt->mmu, addr, size);
+
+ return ret;
}

struct stage2_attr_data {
--
2.41.0.162.gfafddb0af9-goog


2023-06-21 18:03:23

by Raghavendra Rao Ananta

Subject: [RESEND PATCH v5 06/11] KVM: arm64: Implement __kvm_tlb_flush_vmid_range()

Define __kvm_tlb_flush_vmid_range() (for VHE and nVHE)
to flush a range of stage-2 page-tables using IPA in one go.
If the system supports FEAT_TLBIRANGE, the following patches
will conveniently replace global TLBI operations such as
vmalls12e1is in the map, unmap, and dirty-logging paths with
ripas2e1is.

Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/include/asm/kvm_asm.h | 3 +++
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 11 +++++++++++
arch/arm64/kvm/hyp/nvhe/tlb.c | 30 ++++++++++++++++++++++++++++++
arch/arm64/kvm/hyp/vhe/tlb.c | 28 ++++++++++++++++++++++++++++
4 files changed, 72 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 43c3bc0f9544d..60ed0880cc9d6 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -69,6 +69,7 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
__KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_ipa,
__KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid,
+ __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_range,
__KVM_HOST_SMCCC_FUNC___kvm_flush_cpu_context,
__KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff,
__KVM_HOST_SMCCC_FUNC___vgic_v3_read_vmcr,
@@ -225,6 +226,8 @@ extern void __kvm_flush_vm_context(void);
extern void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu);
extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
int level);
+extern void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
+ phys_addr_t start, unsigned long pages);
extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);

extern void __kvm_timer_set_cntvoff(u64 cntvoff);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 728e01d4536b0..a19a9299c8362 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -125,6 +125,16 @@ static void handle___kvm_tlb_flush_vmid_ipa(struct kvm_cpu_context *host_ctxt)
__kvm_tlb_flush_vmid_ipa(kern_hyp_va(mmu), ipa, level);
}

+static void
+handle___kvm_tlb_flush_vmid_range(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
+ DECLARE_REG(phys_addr_t, start, host_ctxt, 2);
+ DECLARE_REG(unsigned long, pages, host_ctxt, 3);
+
+ __kvm_tlb_flush_vmid_range(kern_hyp_va(mmu), start, pages);
+}
+
static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
@@ -316,6 +326,7 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__kvm_flush_vm_context),
HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa),
HANDLE_FUNC(__kvm_tlb_flush_vmid),
+ HANDLE_FUNC(__kvm_tlb_flush_vmid_range),
HANDLE_FUNC(__kvm_flush_cpu_context),
HANDLE_FUNC(__kvm_timer_set_cntvoff),
HANDLE_FUNC(__vgic_v3_read_vmcr),
diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
index 978179133f4b9..213b11952f641 100644
--- a/arch/arm64/kvm/hyp/nvhe/tlb.c
+++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
@@ -130,6 +130,36 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
__tlb_switch_to_host(&cxt);
}

+void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
+ phys_addr_t start, unsigned long pages)
+{
+ struct tlb_inv_context cxt;
+ unsigned long stride;
+
+ /*
+ * Since the range of addresses may not be mapped at
+ * the same level, assume the worst case as PAGE_SIZE
+ */
+ stride = PAGE_SIZE;
+ start = round_down(start, stride);
+
+ /* Switch to requested VMID */
+ __tlb_switch_to_guest(mmu, &cxt, false);
+
+ __flush_tlb_range_op(ipas2e1is, start, pages, stride, 0, 0, false);
+
+ dsb(ish);
+ __tlbi(vmalle1is);
+ dsb(ish);
+ isb();
+
+ /* See the comment below in __kvm_tlb_flush_vmid_ipa() */
+ if (icache_is_vpipt())
+ icache_inval_all_pou();
+
+ __tlb_switch_to_host(&cxt);
+}
+
void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
{
struct tlb_inv_context cxt;
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
index 24cef9b87f9e9..3ca3d38b7eb23 100644
--- a/arch/arm64/kvm/hyp/vhe/tlb.c
+++ b/arch/arm64/kvm/hyp/vhe/tlb.c
@@ -111,6 +111,34 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
__tlb_switch_to_host(&cxt);
}

+void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
+ phys_addr_t start, unsigned long pages)
+{
+ struct tlb_inv_context cxt;
+ unsigned long stride;
+
+ /*
+ * Since the range of addresses may not be mapped at
+ * the same level, assume the worst case as PAGE_SIZE
+ */
+ stride = PAGE_SIZE;
+ start = round_down(start, stride);
+
+ dsb(ishst);
+
+ /* Switch to requested VMID */
+ __tlb_switch_to_guest(mmu, &cxt);
+
+ __flush_tlb_range_op(ipas2e1is, start, pages, stride, 0, 0, false);
+
+ dsb(ish);
+ __tlbi(vmalle1is);
+ dsb(ish);
+ isb();
+
+ __tlb_switch_to_host(&cxt);
+}
+
void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
{
struct tlb_inv_context cxt;
--
2.41.0.162.gfafddb0af9-goog


2023-06-21 18:04:42

by Raghavendra Rao Ananta

Subject: [RESEND PATCH v5 04/11] KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code

From: David Matlack <[email protected]>

Move kvm_arch_flush_remote_tlbs_memslot() to common code and drop
"arch_" from the name. kvm_arch_flush_remote_tlbs_memslot() is just a
range-based TLB invalidation where the range is defined by the memslot.
Now that kvm_flush_remote_tlbs_range() can be called from common code we
can just use that and drop a bunch of duplicate code from the arch
directories.

Note this adds a lockdep assertion for slots_lock being held when
calling kvm_flush_remote_tlbs_memslot(), which was previously only
asserted on x86. MIPS has calls to kvm_flush_remote_tlbs_memslot(),
but they all hold the slots_lock, so the lockdep assertion continues to
hold true.

Also drop the CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT ifdef gating
kvm_flush_remote_tlbs_memslot(), since it is no longer necessary.

Signed-off-by: David Matlack <[email protected]>
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/kvm/arm.c | 6 ------
arch/mips/kvm/mips.c | 10 ++--------
arch/riscv/kvm/mmu.c | 6 ------
arch/x86/kvm/mmu/mmu.c | 16 +---------------
arch/x86/kvm/x86.c | 2 +-
include/linux/kvm_host.h | 7 +++----
virt/kvm/kvm_main.c | 18 ++++++++++++++++--
7 files changed, 23 insertions(+), 42 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 14391826241c8..9a75c23351aca 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1459,12 +1459,6 @@ void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)

}

-void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
- const struct kvm_memory_slot *memslot)
-{
- kvm_flush_remote_tlbs(kvm);
-}
-
static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
struct kvm_arm_device_addr *dev_addr)
{
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 6e34bbe244462..614cb7322b6d5 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -199,7 +199,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
/* Flush slot from GPA */
kvm_mips_flush_gpa_pt(kvm, slot->base_gfn,
slot->base_gfn + slot->npages - 1);
- kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
+ kvm_flush_remote_tlbs_memslot(kvm, slot);
spin_unlock(&kvm->mmu_lock);
}

@@ -235,7 +235,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
needs_flush = kvm_mips_mkclean_gpa_pt(kvm, new->base_gfn,
new->base_gfn + new->npages - 1);
if (needs_flush)
- kvm_arch_flush_remote_tlbs_memslot(kvm, new);
+ kvm_flush_remote_tlbs_memslot(kvm, new);
spin_unlock(&kvm->mmu_lock);
}
}
@@ -987,12 +987,6 @@ int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
return 1;
}

-void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
- const struct kvm_memory_slot *memslot)
-{
- kvm_flush_remote_tlbs(kvm);
-}
-
int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
{
int r;
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index f2eb47925806b..97e129620686c 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -406,12 +406,6 @@ void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
}

-void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
- const struct kvm_memory_slot *memslot)
-{
- kvm_flush_remote_tlbs(kvm);
-}
-
void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free)
{
}
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 32f7fb1b66c8d..d8cfad72f2b1f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6654,7 +6654,7 @@ static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
*/
if (walk_slot_rmaps(kvm, slot, kvm_mmu_zap_collapsible_spte,
PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL - 1, true))
- kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
+ kvm_flush_remote_tlbs_memslot(kvm, slot);
}

void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
@@ -6673,20 +6673,6 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
}
}

-void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
- const struct kvm_memory_slot *memslot)
-{
- /*
- * All current use cases for flushing the TLBs for a specific memslot
- * related to dirty logging, and many do the TLB flush out of mmu_lock.
- * The interaction between the various operations on memslot must be
- * serialized by slots_locks to ensure the TLB flush from one operation
- * is observed by any other operation on the same memslot.
- */
- lockdep_assert_held(&kvm->slots_lock);
- kvm_flush_remote_tlbs_range(kvm, memslot->base_gfn, memslot->npages);
-}
-
void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
const struct kvm_memory_slot *memslot)
{
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ceb7c5e9cf9e9..2ea17b51037af 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12745,7 +12745,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
* See is_writable_pte() for more details (the case involving
* access-tracked SPTEs is particularly relevant).
*/
- kvm_arch_flush_remote_tlbs_memslot(kvm, new);
+ kvm_flush_remote_tlbs_memslot(kvm, new);
}
}

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a054f48498a8f..1eec1119cc4c9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1358,6 +1358,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool yield_to_kernel_mode);

void kvm_flush_remote_tlbs(struct kvm *kvm);
void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 pages);
+void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
+ const struct kvm_memory_slot *memslot);

#ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
@@ -1386,10 +1388,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
unsigned long mask);
void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot);

-#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
-void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
- const struct kvm_memory_slot *memslot);
-#else /* !CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
+#ifndef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
int *is_dirty, struct kvm_memory_slot **memslot);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fc4ee20d33cc0..92218edfc6b84 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -384,6 +384,20 @@ void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 pages)
kvm_flush_remote_tlbs(kvm);
}

+void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
+ const struct kvm_memory_slot *memslot)
+{
+ /*
+ * All current use cases for flushing the TLBs for a specific memslot
+ * related to dirty logging, and many do the TLB flush out of mmu_lock.
+ * The interaction between the various operations on memslot must be
+ * serialized by slots_locks to ensure the TLB flush from one operation
+ * is observed by any other operation on the same memslot.
+ */
+ lockdep_assert_held(&kvm->slots_lock);
+ kvm_flush_remote_tlbs_range(kvm, memslot->base_gfn, memslot->npages);
+}
+
static void kvm_flush_shadow_all(struct kvm *kvm)
{
kvm_arch_flush_shadow_all(kvm);
@@ -2191,7 +2205,7 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
}

if (flush)
- kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
+ kvm_flush_remote_tlbs_memslot(kvm, memslot);

if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
return -EFAULT;
@@ -2308,7 +2322,7 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
KVM_MMU_UNLOCK(kvm);

if (flush)
- kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
+ kvm_flush_remote_tlbs_memslot(kvm, memslot);

return 0;
}
--
2.41.0.162.gfafddb0af9-goog


2023-06-21 18:06:47

by Raghavendra Rao Ananta

Subject: [RESEND PATCH v5 10/11] KVM: arm64: Invalidate the table entries upon a range

Currently, during operations such as a hugepage collapse,
KVM flushes the entire VM's context using the 'vmalls12e1is'
TLBI operation. Specifically, if the VM is faulting on many
hugepages (say, after dirty-logging), this creates a performance
penalty for the guest whose pages have already been faulted in
earlier, as their TLB entries have to be refilled again.

Instead, leverage kvm_tlb_flush_vmid_range() for table entries.
If the system supports it, only the required range is flushed.
Otherwise, it falls back to the previous mechanism.

Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/kvm/hyp/pgtable.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index df8ac14d9d3d4..50ef7623c54db 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -766,7 +766,8 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
* value (if any).
*/
if (kvm_pte_table(ctx->old, ctx->level))
- kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
+ kvm_tlb_flush_vmid_range(mmu, ctx->addr,
+ kvm_granule_size(ctx->level));
else if (kvm_pte_valid(ctx->old))
kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);

--
2.41.0.162.gfafddb0af9-goog


2023-06-21 18:06:59

by Raghavendra Rao Ananta

Subject: [RESEND PATCH v5 08/11] KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()

Implement kvm_arch_flush_remote_tlbs_range() for arm64
to invalidate the given range in the TLB.

Signed-off-by: Raghavendra Rao Ananta <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 3 +++
arch/arm64/kvm/mmu.c | 7 +++++++
2 files changed, 10 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 81ab41b84f436..343fb530eea9c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1081,6 +1081,9 @@ struct kvm *kvm_arch_alloc_vm(void);
#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS
int kvm_arch_flush_remote_tlbs(struct kvm *kvm);

+#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
+int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages);
+
static inline bool kvm_vm_is_protected(struct kvm *kvm)
{
return false;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d0a0d3dca9316..c3ec2141c3284 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -92,6 +92,13 @@ int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
return 0;
}

+int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages)
+{
+ kvm_tlb_flush_vmid_range(&kvm->arch.mmu,
+ start_gfn << PAGE_SHIFT, pages << PAGE_SHIFT);
+ return 0;
+}
+
static bool kvm_is_device_pfn(unsigned long pfn)
{
return !pfn_is_map_memory(pfn);
--
2.41.0.162.gfafddb0af9-goog


2023-06-21 18:08:01

by Raghavendra Rao Ananta

Subject: [RESEND PATCH v5 02/11] KVM: arm64: Use kvm_arch_flush_remote_tlbs()

From: David Matlack <[email protected]>

Use kvm_arch_flush_remote_tlbs() instead of
CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL. The two mechanisms solve the same
problem, allowing architecture-specific code to provide a non-IPI
implementation of remote TLB flushing.

Dropping CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL allows KVM to standardize
all architectures on kvm_arch_flush_remote_tlbs() instead of maintaining
two mechanisms.

Opt to standardize on kvm_arch_flush_remote_tlbs() since it avoids
duplicating the generic TLB stats across architectures that implement
their own remote TLB flush.

This adds an extra function call to the ARM64 kvm_flush_remote_tlbs()
path, but that is a small cost in comparison to flushing remote TLBs.

No functional change intended.

Signed-off-by: David Matlack <[email protected]>
Signed-off-by: Raghavendra Rao Ananta <[email protected]>
Reviewed-by: Zenghui Yu <[email protected]>
Acked-by: Oliver Upton <[email protected]>
---
arch/arm64/include/asm/kvm_host.h | 3 +++
arch/arm64/kvm/Kconfig | 1 -
arch/arm64/kvm/mmu.c | 6 +++---
virt/kvm/Kconfig | 3 ---
virt/kvm/kvm_main.c | 2 --
5 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7e7e19ef6993e..81ab41b84f436 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1078,6 +1078,9 @@ int __init kvm_set_ipa_limit(void);
#define __KVM_HAVE_ARCH_VM_ALLOC
struct kvm *kvm_arch_alloc_vm(void);

+#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS
+int kvm_arch_flush_remote_tlbs(struct kvm *kvm);
+
static inline bool kvm_vm_is_protected(struct kvm *kvm)
{
return false;
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index f531da6b362e9..6b730fcfee379 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -25,7 +25,6 @@ menuconfig KVM
select MMU_NOTIFIER
select PREEMPT_NOTIFIERS
select HAVE_KVM_CPU_RELAX_INTERCEPT
- select HAVE_KVM_ARCH_TLB_FLUSH_ALL
select KVM_MMIO
select KVM_GENERIC_DIRTYLOG_READ_PROTECT
select KVM_XFER_TO_GUEST_WORK
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 3b9d4d24c361a..d0a0d3dca9316 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -81,15 +81,15 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
}

/**
- * kvm_flush_remote_tlbs() - flush all VM TLB entries for v7/8
+ * kvm_arch_flush_remote_tlbs() - flush all VM TLB entries for v7/8
* @kvm: pointer to kvm structure.
*
* Interface to HYP function to flush all VM TLB entries
*/
-void kvm_flush_remote_tlbs(struct kvm *kvm)
+int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
{
- ++kvm->stat.generic.remote_tlb_flush_requests;
kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
+ return 0;
}

static bool kvm_is_device_pfn(unsigned long pfn)
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index b74916de5183a..484d0873061ca 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -62,9 +62,6 @@ config HAVE_KVM_CPU_RELAX_INTERCEPT
config KVM_VFIO
bool

-config HAVE_KVM_ARCH_TLB_FLUSH_ALL
- bool
-
config HAVE_KVM_INVALID_WAKEUPS
bool

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a475ff9ef156d..600a985b86215 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -350,7 +350,6 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
}
EXPORT_SYMBOL_GPL(kvm_make_all_cpus_request);

-#ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
void kvm_flush_remote_tlbs(struct kvm *kvm)
{
++kvm->stat.generic.remote_tlb_flush_requests;
@@ -371,7 +370,6 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
++kvm->stat.generic.remote_tlb_flush;
}
EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
-#endif

static void kvm_flush_shadow_all(struct kvm *kvm)
{
--
2.41.0.162.gfafddb0af9-goog


2023-06-21 18:08:37

by Raghavendra Rao Ananta

Subject: [RESEND PATCH v5 05/11] arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range

Currently, the core TLB flush functionality of __flush_tlb_range()
hardcodes vae1is (and variants) for the flush operation. In the
upcoming patches, the KVM code reuses this core algorithm with
ipas2e1is for range based TLB invalidations based on the IPA.
Hence, extract the core flush functionality of __flush_tlb_range()
into its own macro that accepts an 'op' argument to pass any
TLBI operation, such that other callers (KVM) can benefit.

No functional changes intended.

Signed-off-by: Raghavendra Rao Ananta <[email protected]>
Reviewed-by: Catalin Marinas <[email protected]>
---
arch/arm64/include/asm/tlbflush.h | 108 +++++++++++++++---------------
1 file changed, 55 insertions(+), 53 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 412a3b9a3c25d..4775378b6da1b 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -278,14 +278,61 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
*/
#define MAX_TLBI_OPS PTRS_PER_PTE

+/* When the CPU does not support TLB range operations, flush the TLB
+ * entries one by one at the granularity of 'stride'. If the TLB
+ * range ops are supported, then:
+ *
+ * 1. If 'pages' is odd, flush the first page through non-range
+ * operations;
+ *
+ * 2. For remaining pages: the minimum range granularity is decided
+ * by 'scale', so multiple range TLBI operations may be required.
+ * Start from scale = 0, flush the corresponding number of pages
+ * ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
+ * until no pages left.
+ *
+ * Note that certain ranges can be represented by either num = 31 and
+ * scale or num = 0 and scale + 1. The loop below favours the latter
+ * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
+ */
+#define __flush_tlb_range_op(op, start, pages, stride, \
+ asid, tlb_level, tlbi_user) do { \
+ int num = 0; \
+ int scale = 0; \
+ unsigned long addr; \
+ \
+ while (pages > 0) { \
+ if (!system_supports_tlb_range() || \
+ pages % 2 == 1) { \
+ addr = __TLBI_VADDR(start, asid); \
+ __tlbi_level(op, addr, tlb_level); \
+ if (tlbi_user) \
+ __tlbi_user_level(op, addr, tlb_level); \
+ start += stride; \
+ pages -= stride >> PAGE_SHIFT; \
+ continue; \
+ } \
+ \
+ num = __TLBI_RANGE_NUM(pages, scale); \
+ if (num >= 0) { \
+ addr = __TLBI_VADDR_RANGE(start, asid, scale, \
+ num, tlb_level); \
+ __tlbi(r##op, addr); \
+ if (tlbi_user) \
+ __tlbi_user(r##op, addr); \
+ start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
+ pages -= __TLBI_RANGE_PAGES(num, scale); \
+ } \
+ scale++; \
+ } \
+} while (0)
+
static inline void __flush_tlb_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end,
unsigned long stride, bool last_level,
int tlb_level)
{
- int num = 0;
- int scale = 0;
- unsigned long asid, addr, pages;
+ unsigned long asid, pages;

start = round_down(start, stride);
end = round_up(end, stride);
@@ -307,56 +354,11 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
dsb(ishst);
asid = ASID(vma->vm_mm);

- /*
- * When the CPU does not support TLB range operations, flush the TLB
- * entries one by one at the granularity of 'stride'. If the TLB
- * range ops are supported, then:
- *
- * 1. If 'pages' is odd, flush the first page through non-range
- * operations;
- *
- * 2. For remaining pages: the minimum range granularity is decided
- * by 'scale', so multiple range TLBI operations may be required.
- * Start from scale = 0, flush the corresponding number of pages
- * ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
- * until no pages left.
- *
- * Note that certain ranges can be represented by either num = 31 and
- * scale or num = 0 and scale + 1. The loop below favours the latter
- * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
- */
- while (pages > 0) {
- if (!system_supports_tlb_range() ||
- pages % 2 == 1) {
- addr = __TLBI_VADDR(start, asid);
- if (last_level) {
- __tlbi_level(vale1is, addr, tlb_level);
- __tlbi_user_level(vale1is, addr, tlb_level);
- } else {
- __tlbi_level(vae1is, addr, tlb_level);
- __tlbi_user_level(vae1is, addr, tlb_level);
- }
- start += stride;
- pages -= stride >> PAGE_SHIFT;
- continue;
- }
-
- num = __TLBI_RANGE_NUM(pages, scale);
- if (num >= 0) {
- addr = __TLBI_VADDR_RANGE(start, asid, scale,
- num, tlb_level);
- if (last_level) {
- __tlbi(rvale1is, addr);
- __tlbi_user(rvale1is, addr);
- } else {
- __tlbi(rvae1is, addr);
- __tlbi_user(rvae1is, addr);
- }
- start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT;
- pages -= __TLBI_RANGE_PAGES(num, scale);
- }
- scale++;
- }
+ if (last_level)
+ __flush_tlb_range_op(vale1is, start, pages, stride, asid, tlb_level, true);
+ else
+ __flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true);
+
dsb(ish);
}

--
2.41.0.162.gfafddb0af9-goog


2023-07-04 23:53:02

by Gavin Shan

Subject: Re: [RESEND PATCH v5 02/11] KVM: arm64: Use kvm_arch_flush_remote_tlbs()

On 6/22/23 03:49, Raghavendra Rao Ananta wrote:
> From: David Matlack <[email protected]>
>
> Use kvm_arch_flush_remote_tlbs() instead of
> CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL. The two mechanisms solve the same
> problem, allowing architecture-specific code to provide a non-IPI
> implementation of remote TLB flushing.
>
> Dropping CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL allows KVM to standardize
> all architectures on kvm_arch_flush_remote_tlbs() instead of maintaining
> two mechanisms.
>
> Opt to standardize on kvm_arch_flush_remote_tlbs() since it avoids
> duplicating the generic TLB stats across architectures that implement
> their own remote TLB flush.
>
> This adds an extra function call to the ARM64 kvm_flush_remote_tlbs()
> path, but that is a small cost in comparison to flushing remote TLBs.
>
> No functional change intended.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It's not true; please refer to the explanation below.

>
> Signed-off-by: David Matlack <[email protected]>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> Reviewed-by: Zenghui Yu <[email protected]>
> Acked-by: Oliver Upton <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 3 +++
> arch/arm64/kvm/Kconfig | 1 -
> arch/arm64/kvm/mmu.c | 6 +++---
> virt/kvm/Kconfig | 3 ---
> virt/kvm/kvm_main.c | 2 --
> 5 files changed, 6 insertions(+), 9 deletions(-)
>

The changes make sense and look good to me. However, there is a functional change because
the generic kvm_arch_flush_remote_tlbs() isn't exactly the same as ARM64's variant.
Strictly speaking, they're not interchangeable. In the generic function, both statistics
(remote_tlb_flush_requests and remote_tlb_flush) are increased, while only the former
statistic is increased in ARM64's variant.

It looks correct to increase both statistics, but the commit log may need a tweak to
reflect it. With this resolved:

Reviewed-by: Gavin Shan <[email protected]>

> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 7e7e19ef6993e..81ab41b84f436 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1078,6 +1078,9 @@ int __init kvm_set_ipa_limit(void);
> #define __KVM_HAVE_ARCH_VM_ALLOC
> struct kvm *kvm_arch_alloc_vm(void);
>
> +#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS
> +int kvm_arch_flush_remote_tlbs(struct kvm *kvm);
> +
> static inline bool kvm_vm_is_protected(struct kvm *kvm)
> {
> return false;
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index f531da6b362e9..6b730fcfee379 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -25,7 +25,6 @@ menuconfig KVM
> select MMU_NOTIFIER
> select PREEMPT_NOTIFIERS
> select HAVE_KVM_CPU_RELAX_INTERCEPT
> - select HAVE_KVM_ARCH_TLB_FLUSH_ALL
> select KVM_MMIO
> select KVM_GENERIC_DIRTYLOG_READ_PROTECT
> select KVM_XFER_TO_GUEST_WORK
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 3b9d4d24c361a..d0a0d3dca9316 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -81,15 +81,15 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
> }
>
> /**
> - * kvm_flush_remote_tlbs() - flush all VM TLB entries for v7/8
> + * kvm_arch_flush_remote_tlbs() - flush all VM TLB entries for v7/8
> * @kvm: pointer to kvm structure.
> *
> * Interface to HYP function to flush all VM TLB entries
> */
> -void kvm_flush_remote_tlbs(struct kvm *kvm)
> +int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
> {
> - ++kvm->stat.generic.remote_tlb_flush_requests;
> kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
> + return 0;
> }
>
> static bool kvm_is_device_pfn(unsigned long pfn)
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index b74916de5183a..484d0873061ca 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -62,9 +62,6 @@ config HAVE_KVM_CPU_RELAX_INTERCEPT
> config KVM_VFIO
> bool
>
> -config HAVE_KVM_ARCH_TLB_FLUSH_ALL
> - bool
> -
> config HAVE_KVM_INVALID_WAKEUPS
> bool
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a475ff9ef156d..600a985b86215 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -350,7 +350,6 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
> }
> EXPORT_SYMBOL_GPL(kvm_make_all_cpus_request);
>
> -#ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
> void kvm_flush_remote_tlbs(struct kvm *kvm)
> {
> ++kvm->stat.generic.remote_tlb_flush_requests;
> @@ -371,7 +370,6 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
> ++kvm->stat.generic.remote_tlb_flush;
> }
> EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
> -#endif
>
> static void kvm_flush_shadow_all(struct kvm *kvm)
> {


2023-07-05 00:18:10

by Gavin Shan

Subject: Re: [RESEND PATCH v5 05/11] arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range

On 6/22/23 03:49, Raghavendra Rao Ananta wrote:
> Currently, the core TLB flush functionality of __flush_tlb_range()
> hardcodes vae1is (and variants) for the flush operation. In the
> upcoming patches, the KVM code reuses this core algorithm with
> ipas2e1is for range based TLB invalidations based on the IPA.
> Hence, extract the core flush functionality of __flush_tlb_range()
> into its own macro that accepts an 'op' argument to pass any
> TLBI operation, such that other callers (KVM) can benefit.
>
> No functional changes intended.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> Reviewed-by: Catalin Marinas <[email protected]>
> ---
> arch/arm64/include/asm/tlbflush.h | 108 +++++++++++++++---------------
> 1 file changed, 55 insertions(+), 53 deletions(-)
>

With the following nits addressed:

Reviewed-by: Gavin Shan <[email protected]>

> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index 412a3b9a3c25d..4775378b6da1b 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -278,14 +278,61 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
> */
> #define MAX_TLBI_OPS PTRS_PER_PTE
>
> +/* When the CPU does not support TLB range operations, flush the TLB
> + * entries one by one at the granularity of 'stride'. If the TLB
> + * range ops are supported, then:
> + *
> + * 1. If 'pages' is odd, flush the first page through non-range
> + * operations;
> + *
> + * 2. For remaining pages: the minimum range granularity is decided
> + * by 'scale', so multiple range TLBI operations may be required.
> + * Start from scale = 0, flush the corresponding number of pages
> + * ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
> + * until no pages left.
> + *
> + * Note that certain ranges can be represented by either num = 31 and
> + * scale or num = 0 and scale + 1. The loop below favours the latter
> + * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
> + */
> +#define __flush_tlb_range_op(op, start, pages, stride, \
> + asid, tlb_level, tlbi_user) do { \
> + int num = 0; \
> + int scale = 0; \
> + unsigned long addr; \
> + \
> + while (pages > 0) { \
> + if (!system_supports_tlb_range() || \
> + pages % 2 == 1) { \
> + addr = __TLBI_VADDR(start, asid); \
> + __tlbi_level(op, addr, tlb_level); \
> + if (tlbi_user) \
> + __tlbi_user_level(op, addr, tlb_level); \
> + start += stride; \
> + pages -= stride >> PAGE_SHIFT; \
> + continue; \
> + } \
> + \
> + num = __TLBI_RANGE_NUM(pages, scale); \
> + if (num >= 0) { \
> + addr = __TLBI_VADDR_RANGE(start, asid, scale, \
> + num, tlb_level); \
> + __tlbi(r##op, addr); \
> + if (tlbi_user) \
> + __tlbi_user(r##op, addr); \
> + start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
> + pages -= __TLBI_RANGE_PAGES(num, scale); \
> + } \
> + scale++; \
> + } \
> +} while (0)
> +

There is a warning reported from 'checkpatch.pl'.

WARNING: suspect code indent for conditional statements (32, 8)
#52: FILE: arch/arm64/include/asm/tlbflush.h:299:
+ asid, tlb_level, tlbi_user) do { \
[...]
+ unsigned long addr; \

total: 0 errors, 1 warnings, 125 lines checked

You probably need to tweak it as below, to avoid the warning.

#define __flush_tlb_range_op(op, start, pages, stride, \
asid, tlb_level, tlbi_user) \
do { \


> static inline void __flush_tlb_range(struct vm_area_struct *vma,
> unsigned long start, unsigned long end,
> unsigned long stride, bool last_level,
> int tlb_level)
> {
> - int num = 0;
> - int scale = 0;
> - unsigned long asid, addr, pages;
> + unsigned long asid, pages;
>
> start = round_down(start, stride);
> end = round_up(end, stride);
> @@ -307,56 +354,11 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
> dsb(ishst);
> asid = ASID(vma->vm_mm);
>
> - /*
> - * When the CPU does not support TLB range operations, flush the TLB
> - * entries one by one at the granularity of 'stride'. If the TLB
> - * range ops are supported, then:
> - *
> - * 1. If 'pages' is odd, flush the first page through non-range
> - * operations;
> - *
> - * 2. For remaining pages: the minimum range granularity is decided
> - * by 'scale', so multiple range TLBI operations may be required.
> - * Start from scale = 0, flush the corresponding number of pages
> - * ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
> - * until no pages left.
> - *
> - * Note that certain ranges can be represented by either num = 31 and
> - * scale or num = 0 and scale + 1. The loop below favours the latter
> - * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
> - */
> - while (pages > 0) {
> - if (!system_supports_tlb_range() ||
> - pages % 2 == 1) {
> - addr = __TLBI_VADDR(start, asid);
> - if (last_level) {
> - __tlbi_level(vale1is, addr, tlb_level);
> - __tlbi_user_level(vale1is, addr, tlb_level);
> - } else {
> - __tlbi_level(vae1is, addr, tlb_level);
> - __tlbi_user_level(vae1is, addr, tlb_level);
> - }
> - start += stride;
> - pages -= stride >> PAGE_SHIFT;
> - continue;
> - }
> -
> - num = __TLBI_RANGE_NUM(pages, scale);
> - if (num >= 0) {
> - addr = __TLBI_VADDR_RANGE(start, asid, scale,
> - num, tlb_level);
> - if (last_level) {
> - __tlbi(rvale1is, addr);
> - __tlbi_user(rvale1is, addr);
> - } else {
> - __tlbi(rvae1is, addr);
> - __tlbi_user(rvae1is, addr);
> - }
> - start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT;
> - pages -= __TLBI_RANGE_PAGES(num, scale);
> - }
> - scale++;
> - }
> + if (last_level)
> + __flush_tlb_range_op(vale1is, start, pages, stride, asid, tlb_level, true);
> + else
> + __flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true);
> +
> dsb(ish);
> }
>
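
As an aside, the decomposition described in the new comment is easy to see with a
small stand-alone sketch. This is user-space only and purely illustrative: the two
macros below are re-derived from the (num+1)*2^(5*scale+1) formula in the comment
(they mirror __TLBI_RANGE_PAGES()/__TLBI_RANGE_NUM() but are not the kernel
definitions), no TLBI is issued, and the !system_supports_tlb_range() fallback is
left out.

#include <stdio.h>

/* Re-derived from the formula in the comment; not the kernel macros. */
#define RANGE_PAGES(num, scale)  (((unsigned long)(num) + 1) << (5 * (scale) + 1))
#define RANGE_NUM(pages, scale)  ((int)(((pages) >> (5 * (scale) + 1)) & 0x1f) - 1)

int main(void)
{
        unsigned long pages = 2051;     /* arbitrary example range, in pages */
        int scale = 0;
        int num;

        while (pages > 0) {
                if (pages % 2 == 1) {
                        /* Odd leftover page: one non-range (single-page) op. */
                        printf("single-page op: 1 page\n");
                        pages -= 1;
                        continue;
                }

                num = RANGE_NUM(pages, scale);
                if (num >= 0) {
                        printf("range op: scale=%d num=%d -> %lu pages\n",
                               scale, num, RANGE_PAGES(num, scale));
                        pages -= RANGE_PAGES(num, scale);
                }
                scale++;
        }

        return 0;
}

For 2051 pages this prints one single-page op, a scale=0/num=0 op covering 2 pages,
and a scale=2/num=0 op covering the remaining 2048 pages, which also illustrates the
"num = 0 and scale + 1" preference mentioned at the end of the comment.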

Thanks,
Gavin


2023-07-05 00:25:54

by Gavin Shan

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 04/11] KVM: Move kvm_arch_flush_remote_tlbs_memslot() to common code

On 6/22/23 03:49, Raghavendra Rao Ananta wrote:
> From: David Matlack <[email protected]>
>
> Move kvm_arch_flush_remote_tlbs_memslot() to common code and drop
> "arch_" from the name. kvm_arch_flush_remote_tlbs_memslot() is just a
> range-based TLB invalidation where the range is defined by the memslot.
> Now that kvm_flush_remote_tlbs_range() can be called from common code we
> can just use that and drop a bunch of duplicate code from the arch
> directories.
>
> Note this adds a lockdep assertion for slots_lock being held when
> calling kvm_flush_remote_tlbs_memslot(), which was previously only
> asserted on x86. MIPS has calls to kvm_flush_remote_tlbs_memslot(),
> but they all hold the slots_lock, so the lockdep assertion continues to
> hold true.
>
> Also drop the CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT ifdef gating
> kvm_flush_remote_tlbs_memslot(), since it is no longer necessary.
>
> Signed-off-by: David Matlack <[email protected]>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> arch/arm64/kvm/arm.c | 6 ------
> arch/mips/kvm/mips.c | 10 ++--------
> arch/riscv/kvm/mmu.c | 6 ------
> arch/x86/kvm/mmu/mmu.c | 16 +---------------
> arch/x86/kvm/x86.c | 2 +-
> include/linux/kvm_host.h | 7 +++----
> virt/kvm/kvm_main.c | 18 ++++++++++++++++--
> 7 files changed, 23 insertions(+), 42 deletions(-)
>

It's a nice cleanup to remove kvm_arch_flush_remote_tlbs_memslot() and replace
it with the generic kvm_flush_remote_tlbs_memslot():

Reviewed-by: Gavin Shan <[email protected]>

Thanks,
Gavin

> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 14391826241c8..9a75c23351aca 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1459,12 +1459,6 @@ void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
>
> }
>
> -void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
> - const struct kvm_memory_slot *memslot)
> -{
> - kvm_flush_remote_tlbs(kvm);
> -}
> -
> static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
> struct kvm_arm_device_addr *dev_addr)
> {
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index 6e34bbe244462..614cb7322b6d5 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -199,7 +199,7 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> /* Flush slot from GPA */
> kvm_mips_flush_gpa_pt(kvm, slot->base_gfn,
> slot->base_gfn + slot->npages - 1);
> - kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
> + kvm_flush_remote_tlbs_memslot(kvm, slot);
> spin_unlock(&kvm->mmu_lock);
> }
>
> @@ -235,7 +235,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> needs_flush = kvm_mips_mkclean_gpa_pt(kvm, new->base_gfn,
> new->base_gfn + new->npages - 1);
> if (needs_flush)
> - kvm_arch_flush_remote_tlbs_memslot(kvm, new);
> + kvm_flush_remote_tlbs_memslot(kvm, new);
> spin_unlock(&kvm->mmu_lock);
> }
> }
> @@ -987,12 +987,6 @@ int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
> return 1;
> }
>
> -void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
> - const struct kvm_memory_slot *memslot)
> -{
> - kvm_flush_remote_tlbs(kvm);
> -}
> -
> int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
> {
> int r;
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index f2eb47925806b..97e129620686c 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -406,12 +406,6 @@ void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
> {
> }
>
> -void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
> - const struct kvm_memory_slot *memslot)
> -{
> - kvm_flush_remote_tlbs(kvm);
> -}
> -
> void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free)
> {
> }
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 32f7fb1b66c8d..d8cfad72f2b1f 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6654,7 +6654,7 @@ static void kvm_rmap_zap_collapsible_sptes(struct kvm *kvm,
> */
> if (walk_slot_rmaps(kvm, slot, kvm_mmu_zap_collapsible_spte,
> PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL - 1, true))
> - kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
> + kvm_flush_remote_tlbs_memslot(kvm, slot);
> }
>
> void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
> @@ -6673,20 +6673,6 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
> }
> }
>
> -void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
> - const struct kvm_memory_slot *memslot)
> -{
> - /*
> - * All current use cases for flushing the TLBs for a specific memslot
> - * related to dirty logging, and many do the TLB flush out of mmu_lock.
> - * The interaction between the various operations on memslot must be
> - * serialized by slots_locks to ensure the TLB flush from one operation
> - * is observed by any other operation on the same memslot.
> - */
> - lockdep_assert_held(&kvm->slots_lock);
> - kvm_flush_remote_tlbs_range(kvm, memslot->base_gfn, memslot->npages);
> -}
> -
> void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
> const struct kvm_memory_slot *memslot)
> {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ceb7c5e9cf9e9..2ea17b51037af 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12745,7 +12745,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
> * See is_writable_pte() for more details (the case involving
> * access-tracked SPTEs is particularly relevant).
> */
> - kvm_arch_flush_remote_tlbs_memslot(kvm, new);
> + kvm_flush_remote_tlbs_memslot(kvm, new);
> }
> }
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index a054f48498a8f..1eec1119cc4c9 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1358,6 +1358,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool yield_to_kernel_mode);
>
> void kvm_flush_remote_tlbs(struct kvm *kvm);
> void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 pages);
> +void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
> + const struct kvm_memory_slot *memslot);
>
> #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
> int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
> @@ -1386,10 +1388,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
> unsigned long mask);
> void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot);
>
> -#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> -void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
> - const struct kvm_memory_slot *memslot);
> -#else /* !CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
> +#ifndef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
> int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
> int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
> int *is_dirty, struct kvm_memory_slot **memslot);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index fc4ee20d33cc0..92218edfc6b84 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -384,6 +384,20 @@ void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 pages)
> kvm_flush_remote_tlbs(kvm);
> }
>
> +void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
> + const struct kvm_memory_slot *memslot)
> +{
> + /*
> + * All current use cases for flushing the TLBs for a specific memslot
> + * related to dirty logging, and many do the TLB flush out of mmu_lock.
> + * The interaction between the various operations on memslot must be
> + * serialized by slots_locks to ensure the TLB flush from one operation
> + * is observed by any other operation on the same memslot.
> + */
> + lockdep_assert_held(&kvm->slots_lock);
> + kvm_flush_remote_tlbs_range(kvm, memslot->base_gfn, memslot->npages);
> +}
> +
> static void kvm_flush_shadow_all(struct kvm *kvm)
> {
> kvm_arch_flush_shadow_all(kvm);
> @@ -2191,7 +2205,7 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
> }
>
> if (flush)
> - kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
> + kvm_flush_remote_tlbs_memslot(kvm, memslot);
>
> if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
> return -EFAULT;
> @@ -2308,7 +2322,7 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
> KVM_MMU_UNLOCK(kvm);
>
> if (flush)
> - kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
> + kvm_flush_remote_tlbs_memslot(kvm, memslot);
>
> return 0;
> }


2023-07-05 00:40:50

by Gavin Shan

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 06/11] KVM: arm64: Implement __kvm_tlb_flush_vmid_range()

On 6/22/23 03:49, Raghavendra Rao Ananta wrote:
> Define __kvm_tlb_flush_vmid_range() (for VHE and nVHE)
> to flush a range of stage-2 page-tables using IPA in one go.
> If the system supports FEAT_TLBIRANGE, the following patches
> would conviniently replace global TLBI such as vmalls12e1is
^^^^^^^^^^^^
conveniently
> in the map, unmap, and dirty-logging paths with ripas2e1is
> instead.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> arch/arm64/include/asm/kvm_asm.h | 3 +++
> arch/arm64/kvm/hyp/nvhe/hyp-main.c | 11 +++++++++++
> arch/arm64/kvm/hyp/nvhe/tlb.c | 30 ++++++++++++++++++++++++++++++
> arch/arm64/kvm/hyp/vhe/tlb.c | 28 ++++++++++++++++++++++++++++
> 4 files changed, 72 insertions(+)
>

Reviewed-by: Gavin Shan <[email protected]>

> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 43c3bc0f9544d..60ed0880cc9d6 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -69,6 +69,7 @@ enum __kvm_host_smccc_func {
> __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
> __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_ipa,
> __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid,
> + __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_range,
> __KVM_HOST_SMCCC_FUNC___kvm_flush_cpu_context,
> __KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff,
> __KVM_HOST_SMCCC_FUNC___vgic_v3_read_vmcr,
> @@ -225,6 +226,8 @@ extern void __kvm_flush_vm_context(void);
> extern void __kvm_flush_cpu_context(struct kvm_s2_mmu *mmu);
> extern void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
> int level);
> +extern void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> + phys_addr_t start, unsigned long pages);
> extern void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu);
>
> extern void __kvm_timer_set_cntvoff(u64 cntvoff);
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 728e01d4536b0..a19a9299c8362 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -125,6 +125,16 @@ static void handle___kvm_tlb_flush_vmid_ipa(struct kvm_cpu_context *host_ctxt)
> __kvm_tlb_flush_vmid_ipa(kern_hyp_va(mmu), ipa, level);
> }
>
> +static void
> +handle___kvm_tlb_flush_vmid_range(struct kvm_cpu_context *host_ctxt)
> +{
> + DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
> + DECLARE_REG(phys_addr_t, start, host_ctxt, 2);
> + DECLARE_REG(unsigned long, pages, host_ctxt, 3);
> +
> + __kvm_tlb_flush_vmid_range(kern_hyp_va(mmu), start, pages);
> +}
> +
> static void handle___kvm_tlb_flush_vmid(struct kvm_cpu_context *host_ctxt)
> {
> DECLARE_REG(struct kvm_s2_mmu *, mmu, host_ctxt, 1);
> @@ -316,6 +326,7 @@ static const hcall_t host_hcall[] = {
> HANDLE_FUNC(__kvm_flush_vm_context),
> HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa),
> HANDLE_FUNC(__kvm_tlb_flush_vmid),
> + HANDLE_FUNC(__kvm_tlb_flush_vmid_range),
> HANDLE_FUNC(__kvm_flush_cpu_context),
> HANDLE_FUNC(__kvm_timer_set_cntvoff),
> HANDLE_FUNC(__vgic_v3_read_vmcr),
> diff --git a/arch/arm64/kvm/hyp/nvhe/tlb.c b/arch/arm64/kvm/hyp/nvhe/tlb.c
> index 978179133f4b9..213b11952f641 100644
> --- a/arch/arm64/kvm/hyp/nvhe/tlb.c
> +++ b/arch/arm64/kvm/hyp/nvhe/tlb.c
> @@ -130,6 +130,36 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
> __tlb_switch_to_host(&cxt);
> }
>
> +void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> + phys_addr_t start, unsigned long pages)
> +{
> + struct tlb_inv_context cxt;
> + unsigned long stride;
> +
> + /*
> + * Since the range of addresses may not be mapped at
> + * the same level, assume the worst case as PAGE_SIZE
> + */
> + stride = PAGE_SIZE;
> + start = round_down(start, stride);
> +
> + /* Switch to requested VMID */
> + __tlb_switch_to_guest(mmu, &cxt, false);
> +
> + __flush_tlb_range_op(ipas2e1is, start, pages, stride, 0, 0, false);
> +
> + dsb(ish);
> + __tlbi(vmalle1is);
> + dsb(ish);
> + isb();
> +
> + /* See the comment below in __kvm_tlb_flush_vmid_ipa() */
> + if (icache_is_vpipt())
> + icache_inval_all_pou();
> +
> + __tlb_switch_to_host(&cxt);
> +}
> +
> void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
> {
> struct tlb_inv_context cxt;
> diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
> index 24cef9b87f9e9..3ca3d38b7eb23 100644
> --- a/arch/arm64/kvm/hyp/vhe/tlb.c
> +++ b/arch/arm64/kvm/hyp/vhe/tlb.c
> @@ -111,6 +111,34 @@ void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu,
> __tlb_switch_to_host(&cxt);
> }
>
> +void __kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> + phys_addr_t start, unsigned long pages)
> +{
> + struct tlb_inv_context cxt;
> + unsigned long stride;
> +
> + /*
> + * Since the range of addresses may not be mapped at
> + * the same level, assume the worst case as PAGE_SIZE
> + */
> + stride = PAGE_SIZE;
> + start = round_down(start, stride);
> +
> + dsb(ishst);
> +
> + /* Switch to requested VMID */
> + __tlb_switch_to_guest(mmu, &cxt);
> +
> + __flush_tlb_range_op(ipas2e1is, start, pages, stride, 0, 0, false);
> +
> + dsb(ish);
> + __tlbi(vmalle1is);
> + dsb(ish);
> + isb();
> +
> + __tlb_switch_to_host(&cxt);
> +}
> +
> void __kvm_tlb_flush_vmid(struct kvm_s2_mmu *mmu)
> {
> struct tlb_inv_context cxt;


2023-07-05 00:53:18

by Gavin Shan

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 08/11] KVM: arm64: Implement kvm_arch_flush_remote_tlbs_range()

On 6/22/23 03:49, Raghavendra Rao Ananta wrote:
> Implement kvm_arch_flush_remote_tlbs_range() for arm64
> to invalidate the given range in the TLB.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> arch/arm64/include/asm/kvm_host.h | 3 +++
> arch/arm64/kvm/mmu.c | 7 +++++++
> 2 files changed, 10 insertions(+)
>

Reviewed-by: Gavin Shan <[email protected]>

> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 81ab41b84f436..343fb530eea9c 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1081,6 +1081,9 @@ struct kvm *kvm_arch_alloc_vm(void);
> #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS
> int kvm_arch_flush_remote_tlbs(struct kvm *kvm);
>
> +#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
> +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages);
> +
> static inline bool kvm_vm_is_protected(struct kvm *kvm)
> {
> return false;
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index d0a0d3dca9316..c3ec2141c3284 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -92,6 +92,13 @@ int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
> return 0;
> }
>
> +int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, u64 pages)
> +{
> + kvm_tlb_flush_vmid_range(&kvm->arch.mmu,
> + start_gfn << PAGE_SHIFT, pages << PAGE_SHIFT);
> + return 0;
> +}
> +
> static bool kvm_is_device_pfn(unsigned long pfn)
> {
> return !pfn_is_map_memory(pfn);


2023-07-05 01:16:27

by Gavin Shan

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 09/11] KVM: arm64: Flush only the memslot after write-protect

On 6/22/23 03:50, Raghavendra Rao Ananta wrote:
> After write-protecting the region, currently KVM invalidates
> the entire TLB entries using kvm_flush_remote_tlbs(). Instead,
> scope the invalidation only to the targeted memslot. If
> supported, the architecture would use the range-based TLBI
> instructions to flush the memslot or else fallback to flushing
> all of the TLBs.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> arch/arm64/kvm/mmu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>

Reviewed-by: Gavin Shan <[email protected]>

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index c3ec2141c3284..94f10e670c100 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -992,7 +992,7 @@ static void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
> write_lock(&kvm->mmu_lock);
> stage2_wp_range(&kvm->arch.mmu, start, end);
> write_unlock(&kvm->mmu_lock);
> - kvm_flush_remote_tlbs(kvm);
> + kvm_flush_remote_tlbs_memslot(kvm, memslot);
> }
>
> /**


2023-07-05 01:24:03

by Gavin Shan

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 03/11] KVM: Allow range-based TLB invalidation from common code



On 6/22/23 03:49, Raghavendra Rao Ananta wrote:
> From: David Matlack <[email protected]>
>
> Make kvm_flush_remote_tlbs_range() visible in common code and create a
> default implementation that just invalidates the whole TLB.
>
> This paves the way for several future features/cleanups:
>
> - Introduction of range-based TLBI on ARM.
> - Eliminating kvm_arch_flush_remote_tlbs_memslot()
> - Moving the KVM/x86 TDP MMU to common code.
>
> No functional change intended.
>
> Signed-off-by: David Matlack <[email protected]>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 3 +++
> arch/x86/kvm/mmu/mmu.c | 9 ++++-----
> arch/x86/kvm/mmu/mmu_internal.h | 3 ---
> include/linux/kvm_host.h | 9 +++++++++
> virt/kvm/kvm_main.c | 13 +++++++++++++
> 5 files changed, 29 insertions(+), 8 deletions(-)
>
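
For anyone reading along, the default implementation described above works out to
roughly the following (a sketch pieced together from the commit message and the arch
hook used later in the series; it is not the literal hunk, and the -EOPNOTSUPP
fallback value is my assumption):

#ifndef __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS_RANGE
static inline int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm,
                                                   gfn_t gfn, u64 pages)
{
        return -EOPNOTSUPP;     /* no range support: caller falls back */
}
#endif

void kvm_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn, u64 pages)
{
        if (!kvm_arch_flush_remote_tlbs_range(kvm, gfn, pages))
                return;

        /* Range-based invalidation unsupported (or failed): flush everything. */
        kvm_flush_remote_tlbs(kvm);
}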

Reviewed-by: Gavin Shan <[email protected]>

Thanks,
Gavin


2023-07-05 01:25:08

by Gavin Shan

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 07/11] KVM: arm64: Define kvm_tlb_flush_vmid_range()


On 6/22/23 03:49, Raghavendra Rao Ananta wrote:
> Implement the helper kvm_tlb_flush_vmid_range() that acts
> as a wrapper for range-based TLB invalidations. For the
> given VMID, use the range-based TLBI instructions to do
> the job or fallback to invalidating all the TLB entries.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> arch/arm64/include/asm/kvm_pgtable.h | 10 ++++++++++
> arch/arm64/kvm/hyp/pgtable.c | 20 ++++++++++++++++++++
> 2 files changed, 30 insertions(+)
>

It may be reasonable to fold this into PATCH[08/11] since kvm_tlb_flush_vmid_range() is
only called by ARM64's kvm_arch_flush_remote_tlbs_range(), which is added by PATCH[08/11].
Either way, the changes look good to me:

Reviewed-by: Gavin Shan <[email protected]>

> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 4cd6762bda805..1b12295a83595 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -682,4 +682,14 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
> * kvm_pgtable_prot format.
> */
> enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
> +
> +/**
> + * kvm_tlb_flush_vmid_range() - Invalidate/flush a range of TLB entries
> + *
> + * @mmu: Stage-2 KVM MMU struct
> + * @addr: The base Intermediate physical address from which to invalidate
> + * @size: Size of the range from the base to invalidate
> + */
> +void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> + phys_addr_t addr, size_t size);
> #endif /* __ARM64_KVM_PGTABLE_H__ */
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 3d61bd3e591d2..df8ac14d9d3d4 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -631,6 +631,26 @@ static bool stage2_has_fwb(struct kvm_pgtable *pgt)
> return !(pgt->flags & KVM_PGTABLE_S2_NOFWB);
> }
>
> +void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> + phys_addr_t addr, size_t size)
> +{
> + unsigned long pages, inval_pages;
> +
> + if (!system_supports_tlb_range()) {
> + kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> + return;
> + }
> +
> + pages = size >> PAGE_SHIFT;
> + while (pages > 0) {
> + inval_pages = min(pages, MAX_TLBI_RANGE_PAGES);
> + kvm_call_hyp(__kvm_tlb_flush_vmid_range, mmu, addr, inval_pages);
> +
> + addr += inval_pages << PAGE_SHIFT;
> + pages -= inval_pages;
> + }
> +}
> +
> #define KVM_S2_MEMATTR(pgt, attr) PAGE_S2_MEMATTR(attr, stage2_has_fwb(pgt))
>
> static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot prot,
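
One note on the chunking in the while loop: a single range TLBI encodes at most
MAX_TLBI_RANGE_PAGES pages, so larger ranges have to be split across hypercalls.
Assuming that constant is (31+1)*2^(5*3+1) pages (the largest value the formula in
patch 5's comment allows), each call covers at most 8GB with 4KB pages. A
stand-alone, illustrative sketch of the split (no hypercalls are issued, and
MAX_RANGE_PAGES here is an assumption, not the kernel constant):

#include <stdio.h>

#define PAGE_SHIFT      12                              /* assume 4KB pages */
#define MAX_RANGE_PAGES ((31UL + 1) << (5 * 3 + 1))     /* assumed: num=31, scale=3 */

int main(void)
{
        unsigned long size = 20UL << 30;                /* example: a 20GB range */
        unsigned long pages = size >> PAGE_SHIFT;
        unsigned long addr = 0, inval_pages;

        while (pages > 0) {
                inval_pages = pages < MAX_RANGE_PAGES ? pages : MAX_RANGE_PAGES;
                printf("flush range: addr=0x%lx, pages=%lu\n", addr, inval_pages);
                addr += inval_pages << PAGE_SHIFT;
                pages -= inval_pages;
        }

        return 0;
}

For a 20GB range this results in three calls (8GB + 8GB + 4GB), and the address and
page bookkeeping mirrors what the loop in the patch does.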

Thanks,
Gavin


2023-07-05 01:25:44

by Gavin Shan

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 10/11] KVM: arm64: Invalidate the table entries upon a range


On 6/22/23 03:50, Raghavendra Rao Ananta wrote:
> Currently, during the operations such as a hugepage collapse,
> KVM would flush the entire VM's context using 'vmalls12e1is'
> TLBI operation. Specifically, if the VM is faulting on many
> hugepages (say after dirty-logging), it creates a performance
> penalty for the guest whose pages have already been faulted
> earlier as they would have to refill their TLBs again.
>
> Instead, leverage kvm_tlb_flush_vmid_range() for table entries.
> If the system supports it, only the required range will be
> flushed. Else, it'll fallback to the previous mechanism.
>
> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> ---
> arch/arm64/kvm/hyp/pgtable.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>

Reviewed-by: Gavin Shan <[email protected]>

> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index df8ac14d9d3d4..50ef7623c54db 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -766,7 +766,8 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
> * value (if any).
> */
> if (kvm_pte_table(ctx->old, ctx->level))
> - kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> + kvm_tlb_flush_vmid_range(mmu, ctx->addr,
> + kvm_granule_size(ctx->level));
> else if (kvm_pte_valid(ctx->old))
> kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ctx->addr, ctx->level);
>


2023-07-05 18:25:06

by Raghavendra Rao Ananta

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 02/11] KVM: arm64: Use kvm_arch_flush_remote_tlbs()

On Tue, Jul 4, 2023 at 4:38 PM Gavin Shan <[email protected]> wrote:
>
> On 6/22/23 03:49, Raghavendra Rao Ananta wrote:
> > From: David Matlack <[email protected]>
> >
> > Use kvm_arch_flush_remote_tlbs() instead of
> > CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL. The two mechanisms solve the same
> > problem, allowing architecture-specific code to provide a non-IPI
> > implementation of remote TLB flushing.
> >
> > Dropping CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL allows KVM to standardize
> > all architectures on kvm_arch_flush_remote_tlbs() instead of maintaining
> > two mechanisms.
> >
> > Opt to standardize on kvm_arch_flush_remote_tlbs() since it avoids
> > duplicating the generic TLB stats across architectures that implement
> > their own remote TLB flush.
> >
> > This adds an extra function call to the ARM64 kvm_flush_remote_tlbs()
> > path, but that is a small cost in comparison to flushing remote TLBs.
> >
> > No functional change intended.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> That's not true; please refer to the explanation below.
>
> >
> > Signed-off-by: David Matlack <[email protected]>
> > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > Reviewed-by: Zenghui Yu <[email protected]>
> > Acked-by: Oliver Upton <[email protected]>
> > ---
> > arch/arm64/include/asm/kvm_host.h | 3 +++
> > arch/arm64/kvm/Kconfig | 1 -
> > arch/arm64/kvm/mmu.c | 6 +++---
> > virt/kvm/Kconfig | 3 ---
> > virt/kvm/kvm_main.c | 2 --
> > 5 files changed, 6 insertions(+), 9 deletions(-)
> >
>
> The changes make sense and look good to me. However, there is a functional change because
> the generic kvm_flush_remote_tlbs() isn't exactly the same as ARM64's old variant. Strictly
> speaking, they're not interchangeable. In the generic function, both statistics
> (remote_tlb_flush_requests and remote_tlb_flush) are increased. Only the former statistic
> is increased in ARM64's variant.
>
> It looks correct to increase both statistics, but the commit log may need a tweak to reflect
> it. With this resolved:
Good catch! I agree, there's a slight functional change here and I'll
adjust the commit message in my next revision.

Thank you.
Raghavendra

> Reviewed-by: Gavin Shan <[email protected]>
>
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 7e7e19ef6993e..81ab41b84f436 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -1078,6 +1078,9 @@ int __init kvm_set_ipa_limit(void);
> > #define __KVM_HAVE_ARCH_VM_ALLOC
> > struct kvm *kvm_arch_alloc_vm(void);
> >
> > +#define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLBS
> > +int kvm_arch_flush_remote_tlbs(struct kvm *kvm);
> > +
> > static inline bool kvm_vm_is_protected(struct kvm *kvm)
> > {
> > return false;
> > diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> > index f531da6b362e9..6b730fcfee379 100644
> > --- a/arch/arm64/kvm/Kconfig
> > +++ b/arch/arm64/kvm/Kconfig
> > @@ -25,7 +25,6 @@ menuconfig KVM
> > select MMU_NOTIFIER
> > select PREEMPT_NOTIFIERS
> > select HAVE_KVM_CPU_RELAX_INTERCEPT
> > - select HAVE_KVM_ARCH_TLB_FLUSH_ALL
> > select KVM_MMIO
> > select KVM_GENERIC_DIRTYLOG_READ_PROTECT
> > select KVM_XFER_TO_GUEST_WORK
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 3b9d4d24c361a..d0a0d3dca9316 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -81,15 +81,15 @@ static bool memslot_is_logging(struct kvm_memory_slot *memslot)
> > }
> >
> > /**
> > - * kvm_flush_remote_tlbs() - flush all VM TLB entries for v7/8
> > + * kvm_arch_flush_remote_tlbs() - flush all VM TLB entries for v7/8
> > * @kvm: pointer to kvm structure.
> > *
> > * Interface to HYP function to flush all VM TLB entries
> > */
> > -void kvm_flush_remote_tlbs(struct kvm *kvm)
> > +int kvm_arch_flush_remote_tlbs(struct kvm *kvm)
> > {
> > - ++kvm->stat.generic.remote_tlb_flush_requests;
> > kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
> > + return 0;
> > }
> >
> > static bool kvm_is_device_pfn(unsigned long pfn)
> > diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> > index b74916de5183a..484d0873061ca 100644
> > --- a/virt/kvm/Kconfig
> > +++ b/virt/kvm/Kconfig
> > @@ -62,9 +62,6 @@ config HAVE_KVM_CPU_RELAX_INTERCEPT
> > config KVM_VFIO
> > bool
> >
> > -config HAVE_KVM_ARCH_TLB_FLUSH_ALL
> > - bool
> > -
> > config HAVE_KVM_INVALID_WAKEUPS
> > bool
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index a475ff9ef156d..600a985b86215 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -350,7 +350,6 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req)
> > }
> > EXPORT_SYMBOL_GPL(kvm_make_all_cpus_request);
> >
> > -#ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
> > void kvm_flush_remote_tlbs(struct kvm *kvm)
> > {
> > ++kvm->stat.generic.remote_tlb_flush_requests;
> > @@ -371,7 +370,6 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
> > ++kvm->stat.generic.remote_tlb_flush;
> > }
> > EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
> > -#endif
> >
> > static void kvm_flush_shadow_all(struct kvm *kvm)
> > {
>

2023-07-05 18:25:16

by Raghavendra Rao Ananta

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 05/11] arm64: tlb: Refactor the core flush algorithm of __flush_tlb_range

On Tue, Jul 4, 2023 at 5:11 PM Gavin Shan <[email protected]> wrote:
>
> On 6/22/23 03:49, Raghavendra Rao Ananta wrote:
> > Currently, the core TLB flush functionality of __flush_tlb_range()
> > hardcodes vae1is (and variants) for the flush operation. In the
> > upcoming patches, the KVM code reuses this core algorithm with
> > ipas2e1is for range based TLB invalidations based on the IPA.
> > Hence, extract the core flush functionality of __flush_tlb_range()
> > into its own macro that accepts an 'op' argument to pass any
> > TLBI operation, such that other callers (KVM) can benefit.
> >
> > No functional changes intended.
> >
> > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > Reviewed-by: Catalin Marinas <[email protected]>
> > ---
> > arch/arm64/include/asm/tlbflush.h | 108 +++++++++++++++---------------
> > 1 file changed, 55 insertions(+), 53 deletions(-)
> >
>
> With the following nits addressed:
>
> Reviewed-by: Gavin Shan <[email protected]>
>
> > diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> > index 412a3b9a3c25d..4775378b6da1b 100644
> > --- a/arch/arm64/include/asm/tlbflush.h
> > +++ b/arch/arm64/include/asm/tlbflush.h
> > @@ -278,14 +278,61 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
> > */
> > #define MAX_TLBI_OPS PTRS_PER_PTE
> >
> > +/* When the CPU does not support TLB range operations, flush the TLB
> > + * entries one by one at the granularity of 'stride'. If the TLB
> > + * range ops are supported, then:
> > + *
> > + * 1. If 'pages' is odd, flush the first page through non-range
> > + * operations;
> > + *
> > + * 2. For remaining pages: the minimum range granularity is decided
> > + * by 'scale', so multiple range TLBI operations may be required.
> > + * Start from scale = 0, flush the corresponding number of pages
> > + * ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
> > + * until no pages left.
> > + *
> > + * Note that certain ranges can be represented by either num = 31 and
> > + * scale or num = 0 and scale + 1. The loop below favours the latter
> > + * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
> > + */
> > +#define __flush_tlb_range_op(op, start, pages, stride, \
> > + asid, tlb_level, tlbi_user) do { \
> > + int num = 0; \
> > + int scale = 0; \
> > + unsigned long addr; \
> > + \
> > + while (pages > 0) { \
> > + if (!system_supports_tlb_range() || \
> > + pages % 2 == 1) { \
> > + addr = __TLBI_VADDR(start, asid); \
> > + __tlbi_level(op, addr, tlb_level); \
> > + if (tlbi_user) \
> > + __tlbi_user_level(op, addr, tlb_level); \
> > + start += stride; \
> > + pages -= stride >> PAGE_SHIFT; \
> > + continue; \
> > + } \
> > + \
> > + num = __TLBI_RANGE_NUM(pages, scale); \
> > + if (num >= 0) { \
> > + addr = __TLBI_VADDR_RANGE(start, asid, scale, \
> > + num, tlb_level); \
> > + __tlbi(r##op, addr); \
> > + if (tlbi_user) \
> > + __tlbi_user(r##op, addr); \
> > + start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT; \
> > + pages -= __TLBI_RANGE_PAGES(num, scale); \
> > + } \
> > + scale++; \
> > + } \
> > +} while (0)
> > +
>
> There is a warning reported from 'checkpatch.pl'.
>
> WARNING: suspect code indent for conditional statements (32, 8)
> #52: FILE: arch/arm64/include/asm/tlbflush.h:299:
> + asid, tlb_level, tlbi_user) do { \
> [...]
> + unsigned long addr; \
>
> total: 0 errors, 1 warnings, 125 lines checked
>
> You probably need to tweak it as below, to avoid the warning.
>
> #define __flush_tlb_range_op(op, start, pages, stride, \
> asid, tlb_level, tlbi_user) \
> do { \
>
Thanks for the suggestion, Gavin. I'll fix this.

- Raghavendra
>
> > static inline void __flush_tlb_range(struct vm_area_struct *vma,
> > unsigned long start, unsigned long end,
> > unsigned long stride, bool last_level,
> > int tlb_level)
> > {
> > - int num = 0;
> > - int scale = 0;
> > - unsigned long asid, addr, pages;
> > + unsigned long asid, pages;
> >
> > start = round_down(start, stride);
> > end = round_up(end, stride);
> > @@ -307,56 +354,11 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
> > dsb(ishst);
> > asid = ASID(vma->vm_mm);
> >
> > - /*
> > - * When the CPU does not support TLB range operations, flush the TLB
> > - * entries one by one at the granularity of 'stride'. If the TLB
> > - * range ops are supported, then:
> > - *
> > - * 1. If 'pages' is odd, flush the first page through non-range
> > - * operations;
> > - *
> > - * 2. For remaining pages: the minimum range granularity is decided
> > - * by 'scale', so multiple range TLBI operations may be required.
> > - * Start from scale = 0, flush the corresponding number of pages
> > - * ((num+1)*2^(5*scale+1) starting from 'addr'), then increase it
> > - * until no pages left.
> > - *
> > - * Note that certain ranges can be represented by either num = 31 and
> > - * scale or num = 0 and scale + 1. The loop below favours the latter
> > - * since num is limited to 30 by the __TLBI_RANGE_NUM() macro.
> > - */
> > - while (pages > 0) {
> > - if (!system_supports_tlb_range() ||
> > - pages % 2 == 1) {
> > - addr = __TLBI_VADDR(start, asid);
> > - if (last_level) {
> > - __tlbi_level(vale1is, addr, tlb_level);
> > - __tlbi_user_level(vale1is, addr, tlb_level);
> > - } else {
> > - __tlbi_level(vae1is, addr, tlb_level);
> > - __tlbi_user_level(vae1is, addr, tlb_level);
> > - }
> > - start += stride;
> > - pages -= stride >> PAGE_SHIFT;
> > - continue;
> > - }
> > -
> > - num = __TLBI_RANGE_NUM(pages, scale);
> > - if (num >= 0) {
> > - addr = __TLBI_VADDR_RANGE(start, asid, scale,
> > - num, tlb_level);
> > - if (last_level) {
> > - __tlbi(rvale1is, addr);
> > - __tlbi_user(rvale1is, addr);
> > - } else {
> > - __tlbi(rvae1is, addr);
> > - __tlbi_user(rvae1is, addr);
> > - }
> > - start += __TLBI_RANGE_PAGES(num, scale) << PAGE_SHIFT;
> > - pages -= __TLBI_RANGE_PAGES(num, scale);
> > - }
> > - scale++;
> > - }
> > + if (last_level)
> > + __flush_tlb_range_op(vale1is, start, pages, stride, asid, tlb_level, true);
> > + else
> > + __flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true);
> > +
> > dsb(ish);
> > }
> >
>
> Thanks,
> Gavin
>

2023-07-05 18:36:25

by Raghavendra Rao Ananta

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 07/11] KVM: arm64: Define kvm_tlb_flush_vmid_range()

On Tue, Jul 4, 2023 at 5:31 PM Gavin Shan <[email protected]> wrote:
>
>
> On 6/22/23 03:49, Raghavendra Rao Ananta wrote:
> > Implement the helper kvm_tlb_flush_vmid_range() that acts
> > as a wrapper for range-based TLB invalidations. For the
> > given VMID, use the range-based TLBI instructions to do
> > the job or fallback to invalidating all the TLB entries.
> >
> > Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> > ---
> > arch/arm64/include/asm/kvm_pgtable.h | 10 ++++++++++
> > arch/arm64/kvm/hyp/pgtable.c | 20 ++++++++++++++++++++
> > 2 files changed, 30 insertions(+)
> >
>
> It may be reasonable to fold this into PATCH[08/11] since kvm_tlb_flush_vmid_range() is
> only called by ARM64's kvm_arch_flush_remote_tlbs_range(), which is added by PATCH[08/11].
> Either way, the changes look good to me:
>
Ah, the patches 10 and 11 also call kvm_tlb_flush_vmid_range(), so
probably it's better to keep the definition isolated?

Regards,
Raghavendra
> Reviewed-by: Gavin Shan <[email protected]>
>
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > index 4cd6762bda805..1b12295a83595 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -682,4 +682,14 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
> > * kvm_pgtable_prot format.
> > */
> > enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
> > +
> > +/**
> > + * kvm_tlb_flush_vmid_range() - Invalidate/flush a range of TLB entries
> > + *
> > + * @mmu: Stage-2 KVM MMU struct
> > + * @addr: The base Intermediate physical address from which to invalidate
> > + * @size: Size of the range from the base to invalidate
> > + */
> > +void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> > + phys_addr_t addr, size_t size);
> > #endif /* __ARM64_KVM_PGTABLE_H__ */
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 3d61bd3e591d2..df8ac14d9d3d4 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -631,6 +631,26 @@ static bool stage2_has_fwb(struct kvm_pgtable *pgt)
> > return !(pgt->flags & KVM_PGTABLE_S2_NOFWB);
> > }
> >
> > +void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> > + phys_addr_t addr, size_t size)
> > +{
> > + unsigned long pages, inval_pages;
> > +
> > + if (!system_supports_tlb_range()) {
> > + kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> > + return;
> > + }
> > +
> > + pages = size >> PAGE_SHIFT;
> > + while (pages > 0) {
> > + inval_pages = min(pages, MAX_TLBI_RANGE_PAGES);
> > + kvm_call_hyp(__kvm_tlb_flush_vmid_range, mmu, addr, inval_pages);
> > +
> > + addr += inval_pages << PAGE_SHIFT;
> > + pages -= inval_pages;
> > + }
> > +}
> > +
> > #define KVM_S2_MEMATTR(pgt, attr) PAGE_S2_MEMATTR(attr, stage2_has_fwb(pgt))
> >
> > static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot prot,
>
> Thanks,
> Gavin
>

2023-07-06 00:19:00

by Gavin Shan

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 07/11] KVM: arm64: Define kvm_tlb_flush_vmid_range()

On 7/6/23 04:28, Raghavendra Rao Ananta wrote:
> On Tue, Jul 4, 2023 at 5:31 PM Gavin Shan <[email protected]> wrote:
>> On 6/22/23 03:49, Raghavendra Rao Ananta wrote:
>>> Implement the helper kvm_tlb_flush_vmid_range() that acts
>>> as a wrapper for range-based TLB invalidations. For the
>>> given VMID, use the range-based TLBI instructions to do
>>> the job or fallback to invalidating all the TLB entries.
>>>
>>> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
>>> ---
>>> arch/arm64/include/asm/kvm_pgtable.h | 10 ++++++++++
>>> arch/arm64/kvm/hyp/pgtable.c | 20 ++++++++++++++++++++
>>> 2 files changed, 30 insertions(+)
>>>
>>
>> It may be reasonable to fold this into PATCH[08/11] since kvm_tlb_flush_vmid_range() is
>> only called by ARM64's kvm_arch_flush_remote_tlbs_range(), which is added by PATCH[08/11].
>> Either way, the changes look good to me:
>>
> Ah, the patches 10 and 11 also call kvm_tlb_flush_vmid_range(), so
> probably it's better to keep the definition isolated?
>

Thanks for your explanation. It's fine to have two separate patches in this
case. I still need to spend some time to look at PATCH[11/11] whose subject
includes a typo (intructions -> instructions)

Thanks,
Gavin

>> Reviewed-by: Gavin Shan <[email protected]>
>>
>>> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
>>> index 4cd6762bda805..1b12295a83595 100644
>>> --- a/arch/arm64/include/asm/kvm_pgtable.h
>>> +++ b/arch/arm64/include/asm/kvm_pgtable.h
>>> @@ -682,4 +682,14 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
>>> * kvm_pgtable_prot format.
>>> */
>>> enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
>>> +
>>> +/**
>>> + * kvm_tlb_flush_vmid_range() - Invalidate/flush a range of TLB entries
>>> + *
>>> + * @mmu: Stage-2 KVM MMU struct
>>> + * @addr: The base Intermediate physical address from which to invalidate
>>> + * @size: Size of the range from the base to invalidate
>>> + */
>>> +void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
>>> + phys_addr_t addr, size_t size);
>>> #endif /* __ARM64_KVM_PGTABLE_H__ */
>>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>>> index 3d61bd3e591d2..df8ac14d9d3d4 100644
>>> --- a/arch/arm64/kvm/hyp/pgtable.c
>>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>>> @@ -631,6 +631,26 @@ static bool stage2_has_fwb(struct kvm_pgtable *pgt)
>>> return !(pgt->flags & KVM_PGTABLE_S2_NOFWB);
>>> }
>>>
>>> +void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
>>> + phys_addr_t addr, size_t size)
>>> +{
>>> + unsigned long pages, inval_pages;
>>> +
>>> + if (!system_supports_tlb_range()) {
>>> + kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
>>> + return;
>>> + }
>>> +
>>> + pages = size >> PAGE_SHIFT;
>>> + while (pages > 0) {
>>> + inval_pages = min(pages, MAX_TLBI_RANGE_PAGES);
>>> + kvm_call_hyp(__kvm_tlb_flush_vmid_range, mmu, addr, inval_pages);
>>> +
>>> + addr += inval_pages << PAGE_SHIFT;
>>> + pages -= inval_pages;
>>> + }
>>> +}
>>> +
>>> #define KVM_S2_MEMATTR(pgt, attr) PAGE_S2_MEMATTR(attr, stage2_has_fwb(pgt))
>>>
>>> static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot prot,


2023-07-13 18:57:28

by Raghavendra Rao Ananta

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 07/11] KVM: arm64: Define kvm_tlb_flush_vmid_range()

Hi Gavin,

On Wed, Jul 5, 2023 at 5:04 PM Gavin Shan <[email protected]> wrote:
>
> On 7/6/23 04:28, Raghavendra Rao Ananta wrote:
> > On Tue, Jul 4, 2023 at 5:31 PM Gavin Shan <[email protected]> wrote:
> >> On 6/22/23 03:49, Raghavendra Rao Ananta wrote:
> >>> Implement the helper kvm_tlb_flush_vmid_range() that acts
> >>> as a wrapper for range-based TLB invalidations. For the
> >>> given VMID, use the range-based TLBI instructions to do
> >>> the job or fallback to invalidating all the TLB entries.
> >>>
> >>> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
> >>> ---
> >>> arch/arm64/include/asm/kvm_pgtable.h | 10 ++++++++++
> >>> arch/arm64/kvm/hyp/pgtable.c | 20 ++++++++++++++++++++
> >>> 2 files changed, 30 insertions(+)
> >>>
> >>
> >> It may be reasonable to fold this into PATCH[08/11] since kvm_tlb_flush_vmid_range() is
> >> only called by ARM64's kvm_arch_flush_remote_tlbs_range(), which is added by PATCH[08/11].
> >> Either way, the changes look good to me:
> >>
> > Ah, the patches 10 and 11 also call kvm_tlb_flush_vmid_range(), so
> > probably it's better to keep the definition isolated?
> >
>
> Thanks for your explanation. It's fine to have two separate patches in this
> case. I still need to spend some time to look at PATCH[11/11] whose subject
> includes a typo (intructions -> instructions)
>
I'm planning to send v6 soon, but I'm happy to wait if you have any
other comments on v5 patch-11.
Appreciate your help with the reviews.

Thank you.
Raghavendra
> Thanks,
> Gavin
>
> >> Reviewed-by: Gavin Shan <[email protected]>
> >>
> >>> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> >>> index 4cd6762bda805..1b12295a83595 100644
> >>> --- a/arch/arm64/include/asm/kvm_pgtable.h
> >>> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> >>> @@ -682,4 +682,14 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
> >>> * kvm_pgtable_prot format.
> >>> */
> >>> enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
> >>> +
> >>> +/**
> >>> + * kvm_tlb_flush_vmid_range() - Invalidate/flush a range of TLB entries
> >>> + *
> >>> + * @mmu: Stage-2 KVM MMU struct
> >>> + * @addr: The base Intermediate physical address from which to invalidate
> >>> + * @size: Size of the range from the base to invalidate
> >>> + */
> >>> +void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> >>> + phys_addr_t addr, size_t size);
> >>> #endif /* __ARM64_KVM_PGTABLE_H__ */
> >>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> >>> index 3d61bd3e591d2..df8ac14d9d3d4 100644
> >>> --- a/arch/arm64/kvm/hyp/pgtable.c
> >>> +++ b/arch/arm64/kvm/hyp/pgtable.c
> >>> @@ -631,6 +631,26 @@ static bool stage2_has_fwb(struct kvm_pgtable *pgt)
> >>> return !(pgt->flags & KVM_PGTABLE_S2_NOFWB);
> >>> }
> >>>
> >>> +void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
> >>> + phys_addr_t addr, size_t size)
> >>> +{
> >>> + unsigned long pages, inval_pages;
> >>> +
> >>> + if (!system_supports_tlb_range()) {
> >>> + kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> >>> + return;
> >>> + }
> >>> +
> >>> + pages = size >> PAGE_SHIFT;
> >>> + while (pages > 0) {
> >>> + inval_pages = min(pages, MAX_TLBI_RANGE_PAGES);
> >>> + kvm_call_hyp(__kvm_tlb_flush_vmid_range, mmu, addr, inval_pages);
> >>> +
> >>> + addr += inval_pages << PAGE_SHIFT;
> >>> + pages -= inval_pages;
> >>> + }
> >>> +}
> >>> +
> >>> #define KVM_S2_MEMATTR(pgt, attr) PAGE_S2_MEMATTR(attr, stage2_has_fwb(pgt))
> >>>
> >>> static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot prot,
>

2023-07-14 01:34:03

by Gavin Shan

[permalink] [raw]
Subject: Re: [RESEND PATCH v5 07/11] KVM: arm64: Define kvm_tlb_flush_vmid_range()

Hi Raghavendra,

On 7/14/23 04:47, Raghavendra Rao Ananta wrote:
> On Wed, Jul 5, 2023 at 5:04 PM Gavin Shan <[email protected]> wrote:
>>
>> On 7/6/23 04:28, Raghavendra Rao Ananta wrote:
>>> On Tue, Jul 4, 2023 at 5:31 PM Gavin Shan <[email protected]> wrote:
>>>> On 6/22/23 03:49, Raghavendra Rao Ananta wrote:
>>>>> Implement the helper kvm_tlb_flush_vmid_range() that acts
>>>>> as a wrapper for range-based TLB invalidations. For the
>>>>> given VMID, use the range-based TLBI instructions to do
>>>>> the job or fallback to invalidating all the TLB entries.
>>>>>
>>>>> Signed-off-by: Raghavendra Rao Ananta <[email protected]>
>>>>> ---
>>>>> arch/arm64/include/asm/kvm_pgtable.h | 10 ++++++++++
>>>>> arch/arm64/kvm/hyp/pgtable.c | 20 ++++++++++++++++++++
>>>>> 2 files changed, 30 insertions(+)
>>>>>
>>>>
>>>> It may be reasonable to fold this into PATCH[08/11] since kvm_tlb_flush_vmid_range() is
>>>> only called by ARM64's kvm_arch_flush_remote_tlbs_range(), which is added by PATCH[08/11].
>>>> Either way, the changes look good to me:
>>>>
>>> Ah, the patches 10 and 11 also call kvm_tlb_flush_vmid_range(), so
>>> probably it's better to keep the definition isolated?
>>>
>>
>> Thanks for your explanation. It's fine to have two separate patches in this
>> case. I still need to spend some time to look at PATCH[11/11] whose subject
>> includes a typo (intructions -> instructions)
>>
> I'm planning to send v6 soon, but I'm happy to wait if you have any
> other comments on v5 patch-11.
> Appreciate your help with the reviews.
>

I haven't had a chance to look at PATCH[11/11] yet. Please post v6 and I will
take a look at PATCH[v6 11/11]. Sorry for the delay.

Thanks,
Gavin

>>>> Reviewed-by: Gavin Shan <[email protected]>
>>>>
>>>>> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
>>>>> index 4cd6762bda805..1b12295a83595 100644
>>>>> --- a/arch/arm64/include/asm/kvm_pgtable.h
>>>>> +++ b/arch/arm64/include/asm/kvm_pgtable.h
>>>>> @@ -682,4 +682,14 @@ enum kvm_pgtable_prot kvm_pgtable_stage2_pte_prot(kvm_pte_t pte);
>>>>> * kvm_pgtable_prot format.
>>>>> */
>>>>> enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte);
>>>>> +
>>>>> +/**
>>>>> + * kvm_tlb_flush_vmid_range() - Invalidate/flush a range of TLB entries
>>>>> + *
>>>>> + * @mmu: Stage-2 KVM MMU struct
>>>>> + * @addr: The base Intermediate physical address from which to invalidate
>>>>> + * @size: Size of the range from the base to invalidate
>>>>> + */
>>>>> +void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
>>>>> + phys_addr_t addr, size_t size);
>>>>> #endif /* __ARM64_KVM_PGTABLE_H__ */
>>>>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>>>>> index 3d61bd3e591d2..df8ac14d9d3d4 100644
>>>>> --- a/arch/arm64/kvm/hyp/pgtable.c
>>>>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>>>>> @@ -631,6 +631,26 @@ static bool stage2_has_fwb(struct kvm_pgtable *pgt)
>>>>> return !(pgt->flags & KVM_PGTABLE_S2_NOFWB);
>>>>> }
>>>>>
>>>>> +void kvm_tlb_flush_vmid_range(struct kvm_s2_mmu *mmu,
>>>>> + phys_addr_t addr, size_t size)
>>>>> +{
>>>>> + unsigned long pages, inval_pages;
>>>>> +
>>>>> + if (!system_supports_tlb_range()) {
>>>>> + kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> + pages = size >> PAGE_SHIFT;
>>>>> + while (pages > 0) {
>>>>> + inval_pages = min(pages, MAX_TLBI_RANGE_PAGES);
>>>>> + kvm_call_hyp(__kvm_tlb_flush_vmid_range, mmu, addr, inval_pages);
>>>>> +
>>>>> + addr += inval_pages << PAGE_SHIFT;
>>>>> + pages -= inval_pages;
>>>>> + }
>>>>> +}
>>>>> +
>>>>> #define KVM_S2_MEMATTR(pgt, attr) PAGE_S2_MEMATTR(attr, stage2_has_fwb(pgt))
>>>>>
>>>>> static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot prot,
>>
>