2023-03-21 22:01:53

by Sean Christopherson

Subject: [PATCH v4 00/13] KVM: x86/mmu: Optimize clear dirty log

This is a massaged version of Vipin's series to optimize clearing dirty
state in the TDP MMU. It's basically the same as v3, just spread out over
more patches. The only meaningful difference in the end is that
clear_dirty_gfn_range() also gets similar treatment in handling Dirty vs.
Writable logic.

Vipin, I'm still planning on applying this for 6.4, but the changes ended
up being a wee bit bigger than I'm comfortable making on the fly, thus the
formal posting.

v4:
- Split patches into more fine-grained chunks.
- Massage changelogs as needed.
- Collect reviews. [David]

v3:
- https://lore.kernel.org/all/[email protected]
- Tried to do a better job of writing commit messages.
- Made kvm_tdp_mmu_clear_spte_bits() similar to kvm_tdp_mmu_write_spte().
- clear_dirty_pt_masked() evaluates the mask for the bit to be cleared outside
the loop and uses it for all of the SPTEs instead of recalculating it for each
SPTE.
- Some naming changes based on feedback.
- Split out the dead code cleanup from the optimization code.


v2: https://lore.kernel.org/lkml/[email protected]/
- The clear-dirty-log and age-gfn-range paths no longer go through
handle_changed_spte(); they handle their SPTE changes locally to improve
their speed.
- Clear only specific bits atomically when updating SPTEs in the
clear-dirty-log and age-gfn-range functions.
- Removed the tdp_mmu_set_spte_no_[acc_track|dirty_log] APIs.
- Converged all handle_changed_spte() related functions into one place.

v1: https://lore.kernel.org/lkml/[email protected]

Vipin Sharma (13):
KVM: x86/mmu: Add a helper function to check if an SPTE needs atomic
write
KVM: x86/mmu: Use kvm_ad_enabled() to determine if TDP MMU SPTEs need
wrprot
KVM: x86/mmu: Consolidate Dirty vs. Writable clearing logic in TDP MMU
KVM: x86/mmu: Atomically clear SPTE dirty state in the clear-dirty-log
flow
KVM: x86/mmu: Drop access tracking checks when clearing TDP MMU dirty
bits
KVM: x86/mmu: Bypass __handle_changed_spte() when clearing TDP MMU
dirty bits
KVM: x86/mmu: Remove "record_dirty_log" in __tdp_mmu_set_spte()
KVM: x86/mmu: Clear only A-bit (if enabled) when aging TDP MMU SPTEs
KVM: x86/mmu: Drop unnecessary dirty log checks when aging TDP MMU
SPTEs
KVM: x86/mmu: Bypass __handle_changed_spte() when aging TDP MMU SPTEs
KVM: x86/mmu: Remove "record_acc_track" in __tdp_mmu_set_spte()
KVM: x86/mmu: Remove handle_changed_spte_dirty_log()
KVM: x86/mmu: Merge all handle_changed_pte*() functions

arch/x86/kvm/mmu/tdp_iter.h | 48 +++++---
arch/x86/kvm/mmu/tdp_mmu.c | 215 ++++++++++++------------------------
2 files changed, 106 insertions(+), 157 deletions(-)


base-commit: f3d90f901d18749dca096719540a075f59240051
--
2.40.0.rc2.332.ga46443480c-goog


2023-03-21 22:01:56

by Sean Christopherson

Subject: [PATCH v4 01/13] KVM: x86/mmu: Add a helper function to check if an SPTE needs atomic write

From: Vipin Sharma <[email protected]>

Move the conditions in kvm_tdp_mmu_write_spte() that check whether an SPTE
should be written atomically into a separate helper function.

This new function, kvm_tdp_mmu_spte_need_atomic_write(), will be used
in future commits to optimize clearing bits in SPTEs.

Signed-off-by: Vipin Sharma <[email protected]>
Reviewed-by: David Matlack <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/tdp_iter.h | 34 ++++++++++++++++++++--------------
1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index f0af385c56e0..c11c5d00b2c1 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -29,23 +29,29 @@ static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte)
WRITE_ONCE(*rcu_dereference(sptep), new_spte);
}

+/*
+ * SPTEs must be modified atomically if they are shadow-present, leaf
+ * SPTEs, and have volatile bits, i.e. have bits that can be set outside
+ * of mmu_lock. The Writable bit can be set by KVM's fast page fault
+ * handler, and Accessed and Dirty bits can be set by the CPU.
+ *
+ * Note, non-leaf SPTEs do have Accessed bits and those bits are
+ * technically volatile, but KVM doesn't consume the Accessed bit of
+ * non-leaf SPTEs, i.e. KVM doesn't care if it clobbers the bit. This
+ * logic needs to be reassessed if KVM were to use non-leaf Accessed
+ * bits, e.g. to skip stepping down into child SPTEs when aging SPTEs.
+ */
+static inline bool kvm_tdp_mmu_spte_need_atomic_write(u64 old_spte, int level)
+{
+ return is_shadow_present_pte(old_spte) &&
+ is_last_spte(old_spte, level) &&
+ spte_has_volatile_bits(old_spte);
+}
+
static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
u64 new_spte, int level)
{
- /*
- * Atomically write the SPTE if it is a shadow-present, leaf SPTE with
- * volatile bits, i.e. has bits that can be set outside of mmu_lock.
- * The Writable bit can be set by KVM's fast page fault handler, and
- * Accessed and Dirty bits can be set by the CPU.
- *
- * Note, non-leaf SPTEs do have Accessed bits and those bits are
- * technically volatile, but KVM doesn't consume the Accessed bit of
- * non-leaf SPTEs, i.e. KVM doesn't care if it clobbers the bit. This
- * logic needs to be reassessed if KVM were to use non-leaf Accessed
- * bits, e.g. to skip stepping down into child SPTEs when aging SPTEs.
- */
- if (is_shadow_present_pte(old_spte) && is_last_spte(old_spte, level) &&
- spte_has_volatile_bits(old_spte))
+ if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level))
return kvm_tdp_mmu_write_spte_atomic(sptep, new_spte);

__kvm_tdp_mmu_write_spte(sptep, new_spte);
--
2.40.0.rc2.332.ga46443480c-goog

2023-03-21 22:01:56

by Sean Christopherson

Subject: [PATCH v4 04/13] KVM: x86/mmu: Atomically clear SPTE dirty state in the clear-dirty-log flow

From: Vipin Sharma <[email protected]>

Optimize the clearing of dirty state in TDP MMU SPTEs by doing an
atomic-AND (on SPTEs that have volatile bits) instead of the full XCHG
that currently ends up being invoked (see kvm_tdp_mmu_write_spte()).
Clearing _only_ the bit in question will allow KVM to skip the many
irrelevant checks in __handle_changed_spte() by avoiding any collateral
damage due to the XCHG writing all SPTE bits, e.g. the XCHG could race
with fast_page_fault() setting the W-bit and the CPU setting the D-bit,
and thus incorrectly drop the CPU's D-bit update.
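
To make the race concrete, here is a minimal userspace sketch of the two
update styles (GCC atomic builtins standing in for the kernel's xchg()
and atomic64_fetch_and(); the names and bit positions are illustrative
only, not KVM's):

  #include <stdint.h>

  #define W_BIT (1ull << 1)  /* stand-in for PT_WRITABLE_MASK  */
  #define D_BIT (1ull << 9)  /* stand-in for shadow_dirty_mask */

  static uint64_t spte;

  /* Lossy: writes back *every* bit from a stale snapshot. */
  static void clear_dirty_xchg(uint64_t old)
  {
          /*
           * If the CPU set D_BIT (or fast_page_fault() set W_BIT)
           * after @old was snapshotted, that update is silently
           * overwritten here.
           */
          __atomic_exchange_n(&spte, old & ~D_BIT, __ATOMIC_SEQ_CST);
  }

  /* Safe: touches only D_BIT; concurrent updates to other bits survive. */
  static void clear_dirty_and(void)
  {
          __atomic_fetch_and(&spte, ~D_BIT, __ATOMIC_SEQ_CST);
  }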

Link: https://lore.kernel.org/all/Y9hXmz%[email protected]
Signed-off-by: Vipin Sharma <[email protected]>
Reviewed-by: David Matlack <[email protected]>
[sean: split the switch to atomic-AND to a separate patch]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/tdp_iter.h | 14 ++++++++++++++
arch/x86/kvm/mmu/tdp_mmu.c | 16 ++++++++--------
2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index c11c5d00b2c1..fae559559a80 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -58,6 +58,20 @@ static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
return old_spte;
}

+static inline u64 tdp_mmu_clear_spte_bits(tdp_ptep_t sptep, u64 old_spte,
+ u64 mask, int level)
+{
+ atomic64_t *sptep_atomic;
+
+ if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level)) {
+ sptep_atomic = (atomic64_t *)rcu_dereference(sptep);
+ return (u64)atomic64_fetch_and(~mask, sptep_atomic);
+ }
+
+ __kvm_tdp_mmu_write_spte(sptep, old_spte & ~mask);
+ return old_spte;
+}
+
/*
* A TDP iterator performs a pre-order walk over a TDP paging structure.
*/
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index b32c9ba05c89..a70cc1dae18a 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -770,13 +770,6 @@ static inline void tdp_mmu_set_spte_no_acc_track(struct kvm *kvm,
_tdp_mmu_set_spte(kvm, iter, new_spte, false, true);
}

-static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
- struct tdp_iter *iter,
- u64 new_spte)
-{
- _tdp_mmu_set_spte(kvm, iter, new_spte, true, false);
-}
-
#define tdp_root_for_each_pte(_iter, _root, _start, _end) \
for_each_tdp_pte(_iter, _root, _start, _end)

@@ -1692,7 +1685,14 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
if (!(iter.old_spte & dbit))
continue;

- tdp_mmu_set_spte_no_dirty_log(kvm, &iter, iter.old_spte & ~dbit);
+ iter.old_spte = tdp_mmu_clear_spte_bits(iter.sptep,
+ iter.old_spte, dbit,
+ iter.level);
+
+ __handle_changed_spte(kvm, iter.as_id, iter.gfn, iter.old_spte,
+ iter.old_spte & ~dbit, iter.level, false);
+ handle_changed_spte_acc_track(iter.old_spte, iter.old_spte & ~dbit,
+ iter.level);
}

rcu_read_unlock();
--
2.40.0.rc2.332.ga46443480c-goog

2023-03-21 22:01:59

by Sean Christopherson

Subject: [PATCH v4 02/13] KVM: x86/mmu: Use kvm_ad_enabled() to determine if TDP MMU SPTEs need wrprot

From: Vipin Sharma <[email protected]>

Use the constant-after-module-load kvm_ad_enabled() to check if SPTEs in
the TDP MMU need to be write-protected when clearing accessed/dirty status
instead of manually checking every SPTE. The per-SPTE A/D enabling is
specific to nested EPT MMUs, i.e. when KVM is using EPT A/D bits but L1 is
not, and so cannot happen in the TDP MMU (which is non-nested only).

Keep the original code as sanity checks buried under MMU_WARN_ON().
MMU_WARN_ON() is more or less useless at the moment, but there are plans
to change that.

Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Vipin Sharma <[email protected]>
[sean: split to separate patch, apply to dirty path, write changelog]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7c25dbf32ecc..5a5642650c3e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1621,7 +1621,10 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
if (!is_shadow_present_pte(iter.old_spte))
continue;

- if (spte_ad_need_write_protect(iter.old_spte)) {
+ MMU_WARN_ON(kvm_ad_enabled() &&
+ spte_ad_need_write_protect(iter.old_spte));
+
+ if (!kvm_ad_enabled()) {
if (is_writable_pte(iter.old_spte))
new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
else
@@ -1685,13 +1688,16 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
if (!mask)
break;

+ MMU_WARN_ON(kvm_ad_enabled() &&
+ spte_ad_need_write_protect(iter.old_spte));
+
if (iter.level > PG_LEVEL_4K ||
!(mask & (1UL << (iter.gfn - gfn))))
continue;

mask &= ~(1UL << (iter.gfn - gfn));

- if (wrprot || spte_ad_need_write_protect(iter.old_spte)) {
+ if (wrprot || !kvm_ad_enabled()) {
if (is_writable_pte(iter.old_spte))
new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
else
--
2.40.0.rc2.332.ga46443480c-goog

2023-03-21 22:01:59

by Sean Christopherson

Subject: [PATCH v4 05/13] KVM: x86/mmu: Drop access tracking checks when clearing TDP MMU dirty bits

From: Vipin Sharma <[email protected]>

Drop the unnecessary call to handle access-tracking changes when clearing
the dirty status of TDP MMU SPTEs. Neither the Dirty bit nor the Writable
bit has any impact on the accessed state of a page, i.e. clearing only
the aforementioned bits doesn't make an accessed SPTE suddenly not
accessed.

Signed-off-by: Vipin Sharma <[email protected]>
[sean: split to separate patch, write changelog]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index a70cc1dae18a..950c5d23ecee 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1691,8 +1691,6 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,

__handle_changed_spte(kvm, iter.as_id, iter.gfn, iter.old_spte,
iter.old_spte & ~dbit, iter.level, false);
- handle_changed_spte_acc_track(iter.old_spte, iter.old_spte & ~dbit,
- iter.level);
}

rcu_read_unlock();
--
2.40.0.rc2.332.ga46443480c-goog

2023-03-21 22:02:02

by Sean Christopherson

Subject: [PATCH v4 06/13] KVM: x86/mmu: Bypass __handle_changed_spte() when clearing TDP MMU dirty bits

From: Vipin Sharma <[email protected]>

Drop everything except marking the PFN dirty and the relevant tracepoint
parts of __handle_changed_spte() when clearing the dirty status of gfns in
the TDP MMU. Clearing only the Dirty (or Writable) bit doesn't affect
the SPTE's shadow-present status, whether or not the SPTE is a leaf, or
change the SPTE's PFN. I.e. other than marking the PFN dirty, none of the
functional updates handled by __handle_changed_spte() are relevant.

Losing __handle_changed_spte()'s sanity checks does mean that a bug could
theoretically go unnoticed, but that scenario is extremely unlikely, e.g.
would effectively require a misconfigured MMU or a locking bug elsewhere.

Opportunistically remove a comment blurb from __handle_changed_spte()
about all modifications to TDP MMU SPTEs needing to invoke said function;
that "rule" hasn't been true since fast page fault support was added for
the TDP MMU (and perhaps even before).

Tested on a VM (160 vCPUs, 160 GB memory) and found that performance of
clear dirty log stage improved by ~40% in dirty_log_perf_test (with the
full optimization applied).

Before optimization:
--------------------
Iteration 1 clear dirty log time: 3.638543593s
Iteration 2 clear dirty log time: 3.145032742s
Iteration 3 clear dirty log time: 3.142340358s
Clear dirty log over 3 iterations took 9.925916693s. (Avg 3.308638897s/iteration)

After optimization:
-------------------
Iteration 1 clear dirty log time: 2.318988110s
Iteration 2 clear dirty log time: 1.794470164s
Iteration 3 clear dirty log time: 1.791668628s
Clear dirty log over 3 iterations took 5.905126902s. (Avg 1.968375634s/iteration)
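
For reference, the numbers above are from the dirty_log_perf_test
selftest; an invocation matching the setup described (flag spellings
are an assumption here, check the test's usage text) would look
something like:

  ./dirty_log_perf_test -v 160 -b 1G -i 3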

Link: https://lore.kernel.org/all/Y9hXmz%[email protected]
Signed-off-by: Vipin Sharma <[email protected]>
Reviewed-by: David Matlack <[email protected]>
[sean: split the switch to atomic-AND to a separate patch]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 950c5d23ecee..467931c43968 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -517,7 +517,6 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
* threads that might be modifying SPTEs.
*
* Handle bookkeeping that might result from the modification of a SPTE.
- * This function must be called for all TDP SPTE modifications.
*/
static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
u64 old_spte, u64 new_spte, int level,
@@ -1689,8 +1688,10 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
iter.old_spte, dbit,
iter.level);

- __handle_changed_spte(kvm, iter.as_id, iter.gfn, iter.old_spte,
- iter.old_spte & ~dbit, iter.level, false);
+ trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,
+ iter.old_spte,
+ iter.old_spte & ~dbit);
+ kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));
}

rcu_read_unlock();
--
2.40.0.rc2.332.ga46443480c-goog

2023-03-21 22:02:07

by Sean Christopherson

Subject: [PATCH v4 07/13] KVM: x86/mmu: Remove "record_dirty_log" in __tdp_mmu_set_spte()

From: Vipin Sharma <[email protected]>

Remove bool parameter "record_dirty_log" from __tdp_mmu_set_spte() and
refactor the code as this variable is always set to true by its caller.

Signed-off-by: Vipin Sharma <[email protected]>
Reviewed-by: David Matlack <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 24 +++++++++---------------
1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 467931c43968..3cc81fa22b7f 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -708,18 +708,13 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
* notifier for access tracking. Leaving record_acc_track
* unset in that case prevents page accesses from being
* double counted.
- * @record_dirty_log: Record the page as dirty in the dirty bitmap if
- * appropriate for the change being made. Should be set
- * unless performing certain dirty logging operations.
- * Leaving record_dirty_log unset in that case prevents page
- * writes from being double counted.
*
* Returns the old SPTE value, which _may_ be different than @old_spte if the
* SPTE had volatile bits.
*/
static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
u64 old_spte, u64 new_spte, gfn_t gfn, int level,
- bool record_acc_track, bool record_dirty_log)
+ bool record_acc_track)
{
lockdep_assert_held_write(&kvm->mmu_lock);

@@ -738,35 +733,34 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,

if (record_acc_track)
handle_changed_spte_acc_track(old_spte, new_spte, level);
- if (record_dirty_log)
- handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte,
- new_spte, level);
+
+ handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte, new_spte,
+ level);
return old_spte;
}

static inline void _tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
- u64 new_spte, bool record_acc_track,
- bool record_dirty_log)
+ u64 new_spte, bool record_acc_track)
{
WARN_ON_ONCE(iter->yielded);

iter->old_spte = __tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep,
iter->old_spte, new_spte,
iter->gfn, iter->level,
- record_acc_track, record_dirty_log);
+ record_acc_track);
}

static inline void tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
u64 new_spte)
{
- _tdp_mmu_set_spte(kvm, iter, new_spte, true, true);
+ _tdp_mmu_set_spte(kvm, iter, new_spte, true);
}

static inline void tdp_mmu_set_spte_no_acc_track(struct kvm *kvm,
struct tdp_iter *iter,
u64 new_spte)
{
- _tdp_mmu_set_spte(kvm, iter, new_spte, false, true);
+ _tdp_mmu_set_spte(kvm, iter, new_spte, false);
}

#define tdp_root_for_each_pte(_iter, _root, _start, _end) \
@@ -916,7 +910,7 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
return false;

__tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
- sp->gfn, sp->role.level + 1, true, true);
+ sp->gfn, sp->role.level + 1, true);

return true;
}
--
2.40.0.rc2.332.ga46443480c-goog

2023-03-21 22:02:27

by Sean Christopherson

Subject: [PATCH v4 03/13] KVM: x86/mmu: Consolidate Dirty vs. Writable clearing logic in TDP MMU

From: Vipin Sharma <[email protected]>

Deduplicate the guts of the TDP MMU's clearing of dirty status by
snapshotting whether to check+clear the Dirty bit vs. the Writable bit,
which is the only difference between the two flavors of dirty tracking.

Note, kvm_ad_enabled() is just a wrapper for shadow_accessed_mask, i.e.
is constant after kvm-{intel,amd}.ko is loaded.

Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Vipin Sharma <[email protected]>
[sean: split to separate patch, apply to dirty log, write changelog]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 35 +++++++++--------------------------
1 file changed, 9 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 5a5642650c3e..b32c9ba05c89 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1607,8 +1607,8 @@ void kvm_tdp_mmu_try_split_huge_pages(struct kvm *kvm,
static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
gfn_t start, gfn_t end)
{
+ u64 dbit = kvm_ad_enabled() ? shadow_dirty_mask : PT_WRITABLE_MASK;
struct tdp_iter iter;
- u64 new_spte;
bool spte_set = false;

rcu_read_lock();
@@ -1624,19 +1624,10 @@ static bool clear_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
MMU_WARN_ON(kvm_ad_enabled() &&
spte_ad_need_write_protect(iter.old_spte));

- if (!kvm_ad_enabled()) {
- if (is_writable_pte(iter.old_spte))
- new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
- else
- continue;
- } else {
- if (iter.old_spte & shadow_dirty_mask)
- new_spte = iter.old_spte & ~shadow_dirty_mask;
- else
- continue;
- }
+ if (!(iter.old_spte & dbit))
+ continue;

- if (tdp_mmu_set_spte_atomic(kvm, &iter, new_spte))
+ if (tdp_mmu_set_spte_atomic(kvm, &iter, iter.old_spte & ~dbit))
goto retry;

spte_set = true;
@@ -1678,8 +1669,9 @@ bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm,
static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
gfn_t gfn, unsigned long mask, bool wrprot)
{
+ u64 dbit = (wrprot || !kvm_ad_enabled()) ? PT_WRITABLE_MASK :
+ shadow_dirty_mask;
struct tdp_iter iter;
- u64 new_spte;

rcu_read_lock();

@@ -1697,19 +1689,10 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,

mask &= ~(1UL << (iter.gfn - gfn));

- if (wrprot || !kvm_ad_enabled()) {
- if (is_writable_pte(iter.old_spte))
- new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
- else
- continue;
- } else {
- if (iter.old_spte & shadow_dirty_mask)
- new_spte = iter.old_spte & ~shadow_dirty_mask;
- else
- continue;
- }
+ if (!(iter.old_spte & dbit))
+ continue;

- tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte);
+ tdp_mmu_set_spte_no_dirty_log(kvm, &iter, iter.old_spte & ~dbit);
}

rcu_read_unlock();
--
2.40.0.rc2.332.ga46443480c-goog

2023-03-21 22:02:37

by Sean Christopherson

Subject: [PATCH v4 08/13] KVM: x86/mmu: Clear only A-bit (if enabled) when aging TDP MMU SPTEs

From: Vipin Sharma <[email protected]>

Use tdp_mmu_clear_spte_bits() when clearing the Accessed bit in TDP MMU
SPTEs so as to use an atomic-AND instead of XCHG to clear the A-bit.
Similar to the D-bit story, this will allow KVM to bypass
__handle_changed_spte() by ensuring only the A-bit is modified.

Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Vipin Sharma <[email protected]>
Reviewed-by: David Matlack <[email protected]>
[sean: massage changelog]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 38 +++++++++++++++++++++-----------------
1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3cc81fa22b7f..adbdfed287cc 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -756,13 +756,6 @@ static inline void tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
_tdp_mmu_set_spte(kvm, iter, new_spte, true);
}

-static inline void tdp_mmu_set_spte_no_acc_track(struct kvm *kvm,
- struct tdp_iter *iter,
- u64 new_spte)
-{
- _tdp_mmu_set_spte(kvm, iter, new_spte, false);
-}
-
#define tdp_root_for_each_pte(_iter, _root, _start, _end) \
for_each_tdp_pte(_iter, _root, _start, _end)

@@ -1248,33 +1241,44 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm,
/*
* Mark the SPTEs in the range of GFNs [start, end) unaccessed and return
* non-zero if any of the GFNs in the range have been accessed.
+ *
+ * No need to mark the corresponding PFN as accessed as this call is coming
+ * from the clear_young() or clear_flush_young() notifier, which uses the
+ * return value to determine if the page has been accessed.
*/
static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
struct kvm_gfn_range *range)
{
- u64 new_spte = 0;
+ u64 new_spte;

/* If we have a non-accessed entry we don't need to change the pte. */
if (!is_accessed_spte(iter->old_spte))
return false;

- new_spte = iter->old_spte;
-
- if (spte_ad_enabled(new_spte)) {
- new_spte &= ~shadow_accessed_mask;
+ if (spte_ad_enabled(iter->old_spte)) {
+ iter->old_spte = tdp_mmu_clear_spte_bits(iter->sptep,
+ iter->old_spte,
+ shadow_accessed_mask,
+ iter->level);
+ new_spte = iter->old_spte & ~shadow_accessed_mask;
} else {
/*
* Capture the dirty status of the page, so that it doesn't get
* lost when the SPTE is marked for access tracking.
*/
- if (is_writable_pte(new_spte))
- kvm_set_pfn_dirty(spte_to_pfn(new_spte));
+ if (is_writable_pte(iter->old_spte))
+ kvm_set_pfn_dirty(spte_to_pfn(iter->old_spte));

- new_spte = mark_spte_for_access_track(new_spte);
+ new_spte = mark_spte_for_access_track(iter->old_spte);
+ iter->old_spte = kvm_tdp_mmu_write_spte(iter->sptep,
+ iter->old_spte, new_spte,
+ iter->level);
}

- tdp_mmu_set_spte_no_acc_track(kvm, iter, new_spte);
-
+ __handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
+ new_spte, iter->level, false);
+ handle_changed_spte_dirty_log(kvm, iter->as_id, iter->gfn,
+ iter->old_spte, new_spte, iter->level);
return true;
}

--
2.40.0.rc2.332.ga46443480c-goog

2023-03-21 22:04:14

by Sean Christopherson

Subject: [PATCH v4 09/13] KVM: x86/mmu: Drop unnecessary dirty log checks when aging TDP MMU SPTEs

From: Vipin Sharma <[email protected]>

Drop the unnecessary call to handle dirty log updates when aging TDP MMU
SPTEs, as neither clearing the Accessed bit nor marking a SPTE for access
tracking can _set_ the Writable bit, i.e. can't trigger marking a gfn
dirty in its memslot. The access tracking path can _clear_ the Writable
bit, e.g. if the XCHG races with fast_page_fault() and writes the stale
value without the Writable bit set, but clearing the Writable bit outside
of mmu_lock is not allowed, i.e. access tracking can't spuriously set the
Writable bit.
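
Put differently, the dirty-log hook only fires when a SPTE *gains* the
Writable bit, which an update that only clears bits can never produce.
A simplified sketch of that condition (mirroring the shape of the check
in handle_changed_spte_dirty_log(); the helper name and W_BIT stand-in
are illustrative, and the real check also considers PFN changes):

  #include <stdbool.h>
  #include <stdint.h>

  #define W_BIT (1ull << 1)  /* stand-in for PT_WRITABLE_MASK */

  /* The memslot is marked dirty only when W goes 0 -> 1. */
  static bool would_mark_dirty(uint64_t old_spte, uint64_t new_spte)
  {
          return !(old_spte & W_BIT) && (new_spte & W_BIT);
  }

  /*
   * Aging only ever clears bits, so new_spte having W_BIT implies
   * old_spte had it too, and the condition above can never be true.
   */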

Signed-off-by: Vipin Sharma <[email protected]>
[sean: split to separate patch, apply to dirty path, write changelog]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index adbdfed287cc..29bb97ff266e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1277,8 +1277,6 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,

__handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
new_spte, iter->level, false);
- handle_changed_spte_dirty_log(kvm, iter->as_id, iter->gfn,
- iter->old_spte, new_spte, iter->level);
return true;
}

--
2.40.0.rc2.332.ga46443480c-goog

2023-03-21 22:07:37

by Sean Christopherson

Subject: [PATCH v4 13/13] KVM: x86/mmu: Merge all handle_changed_pte*() functions

From: Vipin Sharma <[email protected]>

Merge __handle_changed_pte() and handle_changed_spte_acc_track() into a
single function, handle_changed_pte(), as the two are always used
together. Remove the existing handle_changed_pte(), as it's just a
wrapper that calls __handle_changed_pte() and
handle_changed_spte_acc_track().

Signed-off-by: Vipin Sharma <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
Reviewed-by: David Matlack <[email protected]>
[sean: massage changelog]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 42 +++++++++++---------------------------
1 file changed, 12 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index e8ee49b6da5b..b2fca11b91ff 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -334,17 +334,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
u64 old_spte, u64 new_spte, int level,
bool shared);

-static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int level)
-{
- if (!is_shadow_present_pte(old_spte) || !is_last_spte(old_spte, level))
- return;
-
- if (is_accessed_spte(old_spte) &&
- (!is_shadow_present_pte(new_spte) || !is_accessed_spte(new_spte) ||
- spte_to_pfn(old_spte) != spte_to_pfn(new_spte)))
- kvm_set_pfn_accessed(spte_to_pfn(old_spte));
-}
-
static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
{
kvm_account_pgtable_pages((void *)sp->spt, +1);
@@ -487,7 +476,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
}

/**
- * __handle_changed_spte - handle bookkeeping associated with an SPTE change
+ * handle_changed_spte - handle bookkeeping associated with an SPTE change
* @kvm: kvm instance
* @as_id: the address space of the paging structure the SPTE was a part of
* @gfn: the base GFN that was mapped by the SPTE
@@ -502,9 +491,9 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
* dirty logging updates are handled in common code, not here (see make_spte()
* and fast_pf_fix_direct_spte()).
*/
-static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
- u64 old_spte, u64 new_spte, int level,
- bool shared)
+static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
+ u64 old_spte, u64 new_spte, int level,
+ bool shared)
{
bool was_present = is_shadow_present_pte(old_spte);
bool is_present = is_shadow_present_pte(new_spte);
@@ -588,15 +577,10 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
if (was_present && !was_leaf &&
(is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
-}

-static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
- u64 old_spte, u64 new_spte, int level,
- bool shared)
-{
- __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level,
- shared);
- handle_changed_spte_acc_track(old_spte, new_spte, level);
+ if (was_leaf && is_accessed_spte(old_spte) &&
+ (!is_present || !is_accessed_spte(new_spte) || pfn_changed))
+ kvm_set_pfn_accessed(spte_to_pfn(old_spte));
}

/*
@@ -639,9 +623,8 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *kvm,
if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte))
return -EBUSY;

- __handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
- new_spte, iter->level, true);
- handle_changed_spte_acc_track(iter->old_spte, new_spte, iter->level);
+ handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
+ new_spte, iter->level, true);

return 0;
}
@@ -705,8 +688,7 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,

old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);

- __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
- handle_changed_spte_acc_track(old_spte, new_spte, level);
+ handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
return old_spte;
}

@@ -1275,7 +1257,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
* Note, when changing a read-only SPTE, it's not strictly necessary to
* zero the SPTE before setting the new PFN, but doing so preserves the
* invariant that the PFN of a present leaf SPTE can never change.
- * See __handle_changed_spte().
+ * See handle_changed_spte().
*/
tdp_mmu_iter_set_spte(kvm, iter, 0);

@@ -1300,7 +1282,7 @@ bool kvm_tdp_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
/*
* No need to handle the remote TLB flush under RCU protection, the
* target SPTE _must_ be a leaf SPTE, i.e. cannot result in freeing a
- * shadow page. See the WARN on pfn_changed in __handle_changed_spte().
+ * shadow page. See the WARN on pfn_changed in handle_changed_spte().
*/
return kvm_tdp_mmu_handle_gfn(kvm, range, set_spte_gfn);
}
--
2.40.0.rc2.332.ga46443480c-goog

2023-03-21 22:12:18

by Sean Christopherson

Subject: [PATCH v4 12/13] KVM: x86/mmu: Remove handle_changed_spte_dirty_log()

From: Vipin Sharma <[email protected]>

Remove handle_changed_spte_dirty_log() as there is no code flow which
sets a 4KiB SPTE writable and hits this path. This function marks the
page dirty in a memslot only if the new SPTE is 4KiB in size and writable.

Current users of handle_changed_spte_dirty_log() are:
1. set_spte_gfn() - Creates only non-writable SPTEs.
2. write_protect_gfn() - Changes an SPTE to non-writable.
3. zap leaf and roots APIs - Everything is 0.
4. handle_removed_pt() - Sets SPTEs to REMOVED_SPTE.
5. tdp_mmu_link_sp() - Makes non-leaf SPTEs.

There is also no path which creates a writable 4KiB SPTE without going
through make_spte(), and that function takes care of marking the SPTE
dirty in the memslot if it is PT_WRITABLE.

Signed-off-by: Vipin Sharma <[email protected]>
Reviewed-by: David Matlack <[email protected]>
[sean: add blurb to __handle_changed_spte()'s comment]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 26 +++-----------------------
1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 9649e0fe4302..e8ee49b6da5b 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -345,24 +345,6 @@ static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int level)
kvm_set_pfn_accessed(spte_to_pfn(old_spte));
}

-static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
- u64 old_spte, u64 new_spte, int level)
-{
- bool pfn_changed;
- struct kvm_memory_slot *slot;
-
- if (level > PG_LEVEL_4K)
- return;
-
- pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
-
- if ((!is_writable_pte(old_spte) || pfn_changed) &&
- is_writable_pte(new_spte)) {
- slot = __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn);
- mark_page_dirty_in_slot(kvm, slot, gfn);
- }
-}
-
static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
{
kvm_account_pgtable_pages((void *)sp->spt, +1);
@@ -516,7 +498,9 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
* the MMU lock and the operation must synchronize with other
* threads that might be modifying SPTEs.
*
- * Handle bookkeeping that might result from the modification of a SPTE.
+ * Handle bookkeeping that might result from the modification of a SPTE. Note,
+ * dirty logging updates are handled in common code, not here (see make_spte()
+ * and fast_pf_fix_direct_spte()).
*/
static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
u64 old_spte, u64 new_spte, int level,
@@ -613,8 +597,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
__handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level,
shared);
handle_changed_spte_acc_track(old_spte, new_spte, level);
- handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte,
- new_spte, level);
}

/*
@@ -725,8 +707,6 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,

__handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
handle_changed_spte_acc_track(old_spte, new_spte, level);
- handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte, new_spte,
- level);
return old_spte;
}

--
2.40.0.rc2.332.ga46443480c-goog

2023-03-21 22:14:12

by Sean Christopherson

Subject: [PATCH v4 11/13] KVM: x86/mmu: Remove "record_acc_track" in __tdp_mmu_set_spte()

From: Vipin Sharma <[email protected]>

Remove bool parameter "record_acc_track" from __tdp_mmu_set_spte() and
refactor the code. This variable is always set to true by its caller.

Remove single and double underscore prefix from tdp_mmu_set_spte()
related APIs:
1. Change __tdp_mmu_set_spte() to tdp_mmu_set_spte()
2. Change _tdp_mmu_set_spte() to tdp_mmu_iter_set_spte()

Signed-off-by: Vipin Sharma <[email protected]>
Reviewed-by: David Matlack <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 51 +++++++++++++-------------------------
1 file changed, 17 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index cdfb67ef5800..9649e0fe4302 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -695,7 +695,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,


/*
- * __tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping
+ * tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping
* @kvm: KVM instance
* @as_id: Address space ID, i.e. regular vs. SMM
* @sptep: Pointer to the SPTE
@@ -703,18 +703,12 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
* @new_spte: The new value that will be set for the SPTE
* @gfn: The base GFN that was (or will be) mapped by the SPTE
* @level: The level _containing_ the SPTE (its parent PT's level)
- * @record_acc_track: Notify the MM subsystem of changes to the accessed state
- * of the page. Should be set unless handling an MMU
- * notifier for access tracking. Leaving record_acc_track
- * unset in that case prevents page accesses from being
- * double counted.
*
* Returns the old SPTE value, which _may_ be different than @old_spte if the
* SPTE had volatile bits.
*/
-static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
- u64 old_spte, u64 new_spte, gfn_t gfn, int level,
- bool record_acc_track)
+static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
+ u64 old_spte, u64 new_spte, gfn_t gfn, int level)
{
lockdep_assert_held_write(&kvm->mmu_lock);

@@ -730,30 +724,19 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);

__handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
-
- if (record_acc_track)
- handle_changed_spte_acc_track(old_spte, new_spte, level);
-
+ handle_changed_spte_acc_track(old_spte, new_spte, level);
handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte, new_spte,
level);
return old_spte;
}

-static inline void _tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
- u64 new_spte, bool record_acc_track)
+static inline void tdp_mmu_iter_set_spte(struct kvm *kvm, struct tdp_iter *iter,
+ u64 new_spte)
{
WARN_ON_ONCE(iter->yielded);
-
- iter->old_spte = __tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep,
- iter->old_spte, new_spte,
- iter->gfn, iter->level,
- record_acc_track);
-}
-
-static inline void tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
- u64 new_spte)
-{
- _tdp_mmu_set_spte(kvm, iter, new_spte, true);
+ iter->old_spte = tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep,
+ iter->old_spte, new_spte,
+ iter->gfn, iter->level);
}

#define tdp_root_for_each_pte(_iter, _root, _start, _end) \
@@ -845,7 +828,7 @@ static void __tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
continue;

if (!shared)
- tdp_mmu_set_spte(kvm, &iter, 0);
+ tdp_mmu_iter_set_spte(kvm, &iter, 0);
else if (tdp_mmu_set_spte_atomic(kvm, &iter, 0))
goto retry;
}
@@ -902,8 +885,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte)))
return false;

- __tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
- sp->gfn, sp->role.level + 1, true);
+ tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
+ sp->gfn, sp->role.level + 1);

return true;
}
@@ -937,7 +920,7 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
!is_last_spte(iter.old_spte, iter.level))
continue;

- tdp_mmu_set_spte(kvm, &iter, 0);
+ tdp_mmu_iter_set_spte(kvm, &iter, 0);
flush = true;
}

@@ -1107,7 +1090,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
if (ret)
return ret;
} else {
- tdp_mmu_set_spte(kvm, iter, spte);
+ tdp_mmu_iter_set_spte(kvm, iter, spte);
}

tdp_account_mmu_page(kvm, sp);
@@ -1314,13 +1297,13 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
* invariant that the PFN of a present leaf SPTE can never change.
* See __handle_changed_spte().
*/
- tdp_mmu_set_spte(kvm, iter, 0);
+ tdp_mmu_iter_set_spte(kvm, iter, 0);

if (!pte_write(range->pte)) {
new_spte = kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte,
pte_pfn(range->pte));

- tdp_mmu_set_spte(kvm, iter, new_spte);
+ tdp_mmu_iter_set_spte(kvm, iter, new_spte);
}

return true;
@@ -1805,7 +1788,7 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
if (new_spte == iter.old_spte)
break;

- tdp_mmu_set_spte(kvm, &iter, new_spte);
+ tdp_mmu_iter_set_spte(kvm, &iter, new_spte);
spte_set = true;
}

--
2.40.0.rc2.332.ga46443480c-goog

2023-03-21 22:15:10

by Sean Christopherson

Subject: [PATCH v4 10/13] KVM: x86/mmu: Bypass __handle_changed_spte() when aging TDP MMU SPTEs

From: Vipin Sharma <[email protected]>

Drop everything except the "tdp_mmu_spte_changed" tracepoint part of
__handle_changed_spte() when aging SPTEs in the TDP MMU, as clearing the
accessed status doesn't affect the SPTE's shadow-present status, whether
or not the SPTE is a leaf, or change the PFN. I.e. none of the functional
updates handled by __handle_changed_spte() are relevant.

Losing __handle_changed_spte()'s sanity checks does mean that a bug could
theoretically go unnoticed, but that scenario is extremely unlikely, e.g.
would effectively require a misconfigured MMU or a locking bug elsewhere.

Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Vipin Sharma <[email protected]>
Reviewed-by: David Matlack <[email protected]>
[sean: massage changelog]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 29bb97ff266e..cdfb67ef5800 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1275,8 +1275,8 @@ static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
iter->level);
}

- __handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
- new_spte, iter->level, false);
+ trace_kvm_tdp_mmu_spte_changed(iter->as_id, iter->gfn, iter->level,
+ iter->old_spte, new_spte);
return true;
}

--
2.40.0.rc2.332.ga46443480c-goog

2023-03-31 00:29:39

by Vipin Sharma

Subject: Re: [PATCH v4 00/13] KVM: x86/mmu: Optimize clear dirty log

On Tue, Mar 21, 2023 at 3:00 PM Sean Christopherson <[email protected]> wrote:
>
> This is a massaged version of Vipin's series to optimize clearing dirty
> state in the TDP MMU. It's basically the same as v3, just spread out over
> more patches. The only meaningful difference in the end is that
> clear_dirty_gfn_range() also gets similar treatment in handling Dirty vs.
> Writable logic.
>
> Vipin, I'm still planning on applying this for 6.4, but the changes ended
> up being a wee bit bigger than I'm comfortable making on the fly, thus the
> formal posting.
>

The changes look good to me, and I also verified by running the perf test
that the improvement is still similar to the previous version. Thanks for
the v4 work.

2023-04-04 23:46:42

by Sean Christopherson

Subject: Re: [PATCH v4 00/13] KVM: x86/mmu: Optimize clear dirty log

On Tue, 21 Mar 2023 15:00:08 -0700, Sean Christopherson wrote:
> This is a massaged version of Vipin's series to optimize clearing dirty
> state in the TDP MMU. It's basically the same as v3, just spread out over
> more patches. The only meaningful difference in the end is that
> clear_dirty_gfn_range() also gets similar treatment in handling Dirty vs.
> Writable logic.
>
> Vipin, I'm still planning on applying this for 6.4, but the changes ended
> up being a wee bit bigger than I'm comfortable making on the fly, thus the
> formal posting.
>
> [...]

Applied to kvm-x86 mmu, thanks!

[01/13] KVM: x86/mmu: Add a helper function to check if an SPTE needs atomic write
https://github.com/kvm-x86/linux/commit/41e07665f1a6
[02/13] KVM: x86/mmu: Use kvm_ad_enabled() to determine if TDP MMU SPTEs need wrprot
https://github.com/kvm-x86/linux/commit/5982a5392663
[03/13] KVM: x86/mmu: Consolidate Dirty vs. Writable clearing logic in TDP MMU
https://github.com/kvm-x86/linux/commit/697c89bed94e
[04/13] KVM: x86/mmu: Atomically clear SPTE dirty state in the clear-dirty-log flow
https://github.com/kvm-x86/linux/commit/89c313f20c1e
[05/13] KVM: x86/mmu: Drop access tracking checks when clearing TDP MMU dirty bits
https://github.com/kvm-x86/linux/commit/cf05e8c7325e
[06/13] KVM: x86/mmu: Bypass __handle_changed_spte() when clearing TDP MMU dirty bits
https://github.com/kvm-x86/linux/commit/1e0f42985ffa
[07/13] KVM: x86/mmu: Remove "record_dirty_log" in __tdp_mmu_set_spte()
https://github.com/kvm-x86/linux/commit/e73008705d0c
[08/13] KVM: x86/mmu: Clear only A-bit (if enabled) when aging TDP MMU SPTEs
https://github.com/kvm-x86/linux/commit/7ee131e3a3c3
[09/13] KVM: x86/mmu: Drop unnecessary dirty log checks when aging TDP MMU SPTEs
https://github.com/kvm-x86/linux/commit/6141df067d04
[10/13] KVM: x86/mmu: Bypass __handle_changed_spte() when aging TDP MMU SPTEs
https://github.com/kvm-x86/linux/commit/891f11596068
[11/13] KVM: x86/mmu: Remove "record_acc_track" in __tdp_mmu_set_spte()
https://github.com/kvm-x86/linux/commit/0b7cc2547d53
[12/13] KVM: x86/mmu: Remove handle_changed_spte_dirty_log()
https://github.com/kvm-x86/linux/commit/1f9973456e80
[13/13] KVM: x86/mmu: Merge all handle_changed_pte*() functions
https://github.com/kvm-x86/linux/commit/40fa907e5a69

--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes

2023-06-25 08:05:33

by Like Xu

Subject: Re: [PATCH v4 06/13] KVM: x86/mmu: Bypass __handle_changed_spte() when clearing TDP MMU dirty bits

On 22/3/2023 6:00 am, Sean Christopherson wrote:
> From: Vipin Sharma <[email protected]>
>
> Drop everything except marking the PFN dirty and the relevant tracepoint
> parts of __handle_changed_spte() when clearing the dirty status of gfns in
> the TDP MMU. Clearing only the Dirty (or Writable) bit doesn't affect
> the SPTE's shadow-present status, whether or not the SPTE is a leaf, or
> change the SPTE's PFN. I.e. other than marking the PFN dirty, none of the
> functional updates handled by __handle_changed_spte() are relevant.
>
> Losing __handle_changed_spte()'s sanity checks does mean that a bug could
> theoretically go unnoticed, but that scenario is extremely unlikely, e.g.
> would effectively require a misconfigured MMU or a locking bug elsewhere.
>
> Opportunistically remove a comment blurb from __handle_changed_spte()
> about all modifications to TDP MMU SPTEs needing to invoke said function;
> that "rule" hasn't been true since fast page fault support was added for
> the TDP MMU (and perhaps even before).
>
> Tested on a VM (160 vCPUs, 160 GB memory) and found that performance of
> clear dirty log stage improved by ~40% in dirty_log_perf_test (with the
> full optimization applied).
>
> Before optimization:
> --------------------
> Iteration 1 clear dirty log time: 3.638543593s
> Iteration 2 clear dirty log time: 3.145032742s
> Iteration 3 clear dirty log time: 3.142340358s
> Clear dirty log over 3 iterations took 9.925916693s. (Avg 3.308638897s/iteration)
>
> After optimization:
> -------------------
> Iteration 1 clear dirty log time: 2.318988110s
> Iteration 2 clear dirty log time: 1.794470164s
> Iteration 3 clear dirty log time: 1.791668628s
> Clear dirty log over 3 iterations took 5.905126902s. (Avg 1.968375634s/iteration)
>
> Link: https://lore.kernel.org/all/Y9hXmz%[email protected]
> Signed-off-by: Vipin Sharma <[email protected]>
> Reviewed-by: David Matlack <[email protected]>
> [sean: split the switch to atomic-AND to a separate patch]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 950c5d23ecee..467931c43968 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -517,7 +517,6 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
> * threads that might be modifying SPTEs.
> *
> * Handle bookkeeping that might result from the modification of a SPTE.
> - * This function must be called for all TDP SPTE modifications.
> */
> static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> u64 old_spte, u64 new_spte, int level,
> @@ -1689,8 +1688,10 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
> iter.old_spte, dbit,
> iter.level);
>
> - __handle_changed_spte(kvm, iter.as_id, iter.gfn, iter.old_spte,
> - iter.old_spte & ~dbit, iter.level, false);
> + trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,

Here the first parameter "kvm" is no longer used in this context.

Please help confirm: for clear_dirty_pt_masked(), should the
"struct kvm *kvm" parameter be removed from the list of incoming
parameters?

> + iter.old_spte,
> + iter.old_spte & ~dbit);
> + kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));
> }
>
> rcu_read_unlock();

2023-06-26 21:50:56

by Sean Christopherson

Subject: Re: [PATCH v4 06/13] KVM: x86/mmu: Bypass __handle_changed_spte() when clearing TDP MMU dirty bits

On Sun, Jun 25, 2023, Like Xu wrote:
> On 22/3/2023 6:00 am, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 950c5d23ecee..467931c43968 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -517,7 +517,6 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
> > * threads that might be modifying SPTEs.
> > *
> > * Handle bookkeeping that might result from the modification of a SPTE.
> > - * This function must be called for all TDP SPTE modifications.
> > */
> > static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> > u64 old_spte, u64 new_spte, int level,
> > @@ -1689,8 +1688,10 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
> > iter.old_spte, dbit,
> > iter.level);
> > - __handle_changed_spte(kvm, iter.as_id, iter.gfn, iter.old_spte,
> > - iter.old_spte & ~dbit, iter.level, false);
> > + trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,
>
> Here the first parameter "kvm" is no longer used in this context.
>
> Please help confirm: for clear_dirty_pt_masked(), should the
> "struct kvm *kvm" parameter be removed from the list of incoming
> parameters?

Hmm, there's only one caller, so keeping @kvm around "just in case" probably
doesn't make sense; e.g. adding it back in the future so that we could do a
KVM_BUG_ON() wouldn't require much churn.

That said, I'm tempted to move the lockdep so that it's more obvious why it's safe
for clear_dirty_pt_masked() to use the non-atomic (for non-volatile SPTEs)
tdp_mmu_clear_spte_bits() helper. for_each_tdp_mmu_root() does its own lockdep,
so the only "loss" in lockdep coverage is if the list is completely empty.

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 512163d52194..0b4f03bef70e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1600,6 +1600,8 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
shadow_dirty_mask;
struct tdp_iter iter;

+ lockdep_assert_held_write(&kvm->mmu_lock);
+
rcu_read_lock();

tdp_root_for_each_leaf_pte(iter, root, gfn + __ffs(mask),
@@ -1646,7 +1648,6 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
{
struct kvm_mmu_page *root;

- lockdep_assert_held_write(&kvm->mmu_lock);
for_each_tdp_mmu_root(kvm, root, slot->as_id)
clear_dirty_pt_masked(kvm, root, gfn, mask, wrprot);
}