Hi,
This patch series optimizes the control flow of clearing the dirty log
and improves its performance by ~40% (2% more than v2).
It also gets rid of many variants of the handle_changed_spte family of
functions and converges the logic into a single handle_changed_spte()
function. It also removes tdp_mmu_set_spte_no_[acc_track|dirty_log] and
the various booleans for controlling them.
Thanks,
Vipin
v3:
- Tried to do a better job at writing commit messages.
- Made kvm_tdp_mmu_clear_spte_bits() similar to the kvm_tdp_mmu_write_spte().
- clear_dirty_pt_masked() evaluates the mask for the bit to be cleared outside
the loop and uses it for all of the SPTEs instead of calculating it for each SPTE.
- Some naming changes based on the feedback.
- Split out the dead code cleanup from the optimization code.
v2: https://lore.kernel.org/lkml/[email protected]/
- Clear dirty log and age gfn range do not go through
handle_changed_spte(); they handle their SPTE changes locally to improve
their speed.
- Clear only specific bits atomically when updating SPTEs in clearing
dirty log and aging gfn range functions.
- Removed tdp_mmu_set_spte_no_[acc_track|dirty_log] APIs.
- Converged all handle_changed_spte related functions to one place.
v1: https://lore.kernel.org/lkml/[email protected]/
Vipin Sharma (7):
KVM: x86/mmu: Add a helper function to check if an SPTE needs atomic
write
KVM: x86/mmu: Atomically clear SPTE dirty state in the clear-dirty-log
flow
KVM: x86/mmu: Remove "record_dirty_log" in __tdp_mmu_set_spte()
KVM: x86/mmu: Optimize SPTE change for aging gfn range
KVM: x86/mmu: Remove "record_acc_track" in __tdp_mmu_set_spte()
KVM: x86/mmu: Remove handle_changed_spte_dirty_log()
KVM: x86/mmu: Merge all handle_changed_pte* functions.
arch/x86/kvm/mmu/tdp_iter.h | 48 ++++++---
arch/x86/kvm/mmu/tdp_mmu.c | 190 ++++++++++++------------------------
2 files changed, 96 insertions(+), 142 deletions(-)
--
2.39.1.581.gbfd45094c4-goog
Move the conditions in kvm_tdp_mmu_write_spte() that check whether an SPTE
should be written atomically into a separate function.
This new function, kvm_tdp_mmu_spte_need_atomic_write(), will be used
in future commits to optimize clearing bits in SPTEs.
Signed-off-by: Vipin Sharma <[email protected]>
Reviewed-by: David Matlack <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
---
arch/x86/kvm/mmu/tdp_iter.h | 34 ++++++++++++++++++++--------------
1 file changed, 20 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index f0af385c56e0..c11c5d00b2c1 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -29,23 +29,29 @@ static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte)
WRITE_ONCE(*rcu_dereference(sptep), new_spte);
}
+/*
+ * SPTEs must be modified atomically if they are shadow-present, leaf
+ * SPTEs, and have volatile bits, i.e. has bits that can be set outside
+ * of mmu_lock. The Writable bit can be set by KVM's fast page fault
+ * handler, and Accessed and Dirty bits can be set by the CPU.
+ *
+ * Note, non-leaf SPTEs do have Accessed bits and those bits are
+ * technically volatile, but KVM doesn't consume the Accessed bit of
+ * non-leaf SPTEs, i.e. KVM doesn't care if it clobbers the bit. This
+ * logic needs to be reassessed if KVM were to use non-leaf Accessed
+ * bits, e.g. to skip stepping down into child SPTEs when aging SPTEs.
+ */
+static inline bool kvm_tdp_mmu_spte_need_atomic_write(u64 old_spte, int level)
+{
+ return is_shadow_present_pte(old_spte) &&
+ is_last_spte(old_spte, level) &&
+ spte_has_volatile_bits(old_spte);
+}
+
static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
u64 new_spte, int level)
{
- /*
- * Atomically write the SPTE if it is a shadow-present, leaf SPTE with
- * volatile bits, i.e. has bits that can be set outside of mmu_lock.
- * The Writable bit can be set by KVM's fast page fault handler, and
- * Accessed and Dirty bits can be set by the CPU.
- *
- * Note, non-leaf SPTEs do have Accessed bits and those bits are
- * technically volatile, but KVM doesn't consume the Accessed bit of
- * non-leaf SPTEs, i.e. KVM doesn't care if it clobbers the bit. This
- * logic needs to be reassessed if KVM were to use non-leaf Accessed
- * bits, e.g. to skip stepping down into child SPTEs when aging SPTEs.
- */
- if (is_shadow_present_pte(old_spte) && is_last_spte(old_spte, level) &&
- spte_has_volatile_bits(old_spte))
+ if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level))
return kvm_tdp_mmu_write_spte_atomic(sptep, new_spte);
__kvm_tdp_mmu_write_spte(sptep, new_spte);
--
2.39.1.581.gbfd45094c4-goog
Use an atomic-AND to clear the dirty state of SPTEs. Optimize the
clear-dirty-log flow by not going through __handle_changed_spte() and
instead calling kvm_set_pfn_dirty() directly.
The atomic-AND fetches the latest value of the SPTE, clears only its
dirty state, and sets the new SPTE value. This optimization avoids
executing unnecessary checks by not calling __handle_changed_spte().
With the removal of tdp_mmu_set_spte_no_dirty_log(), "record_dirty_log"
parameter in __tdp_mmu_set_spte() is now obsolete. It will always be set
to true by its caller. This dead code will be cleaned up in future
commits.
Tested on a VM (160 vCPUs, 160 GB memory) and found that the performance of
the clear dirty log stage improved by ~40% in dirty_log_perf_test.
Before optimization:
--------------------
Iteration 1 clear dirty log time: 3.638543593s
Iteration 2 clear dirty log time: 3.145032742s
Iteration 3 clear dirty log time: 3.142340358s
Clear dirty log over 3 iterations took 9.925916693s. (Avg 3.308638897s/iteration)
After optimization:
-------------------
Iteration 1 clear dirty log time: 2.318988110s
Iteration 2 clear dirty log time: 1.794470164s
Iteration 3 clear dirty log time: 1.791668628s
Clear dirty log over 3 iterations took 5.905126902s. (Avg 1.968375634s/iteration)
Signed-off-by: Vipin Sharma <[email protected]>
---
arch/x86/kvm/mmu/tdp_iter.h | 14 ++++++++++++++
arch/x86/kvm/mmu/tdp_mmu.c | 35 +++++++++++++++--------------------
2 files changed, 29 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index c11c5d00b2c1..fae559559a80 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -58,6 +58,20 @@ static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
return old_spte;
}
+static inline u64 tdp_mmu_clear_spte_bits(tdp_ptep_t sptep, u64 old_spte,
+ u64 mask, int level)
+{
+ atomic64_t *sptep_atomic;
+
+ if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level)) {
+ sptep_atomic = (atomic64_t *)rcu_dereference(sptep);
+ return (u64)atomic64_fetch_and(~mask, sptep_atomic);
+ }
+
+ __kvm_tdp_mmu_write_spte(sptep, old_spte & ~mask);
+ return old_spte;
+}
+
/*
* A TDP iterator performs a pre-order walk over a TDP paging structure.
*/
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index bba33aea0fb0..66ccbeb9d845 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -771,13 +771,6 @@ static inline void tdp_mmu_set_spte_no_acc_track(struct kvm *kvm,
_tdp_mmu_set_spte(kvm, iter, new_spte, false, true);
}
-static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
- struct tdp_iter *iter,
- u64 new_spte)
-{
- _tdp_mmu_set_spte(kvm, iter, new_spte, true, false);
-}
-
#define tdp_root_for_each_pte(_iter, _root, _start, _end) \
for_each_tdp_pte(_iter, _root, _start, _end)
@@ -1677,8 +1670,13 @@ bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm,
static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
gfn_t gfn, unsigned long mask, bool wrprot)
{
+ /*
+ * Either all SPTEs in TDP MMU will need write protection or none. This
+ * contract will not be modified for TDP MMU pages.
+ */
+ u64 clear_bit = (wrprot || !kvm_ad_enabled()) ? PT_WRITABLE_MASK :
+ shadow_dirty_mask;
struct tdp_iter iter;
- u64 new_spte;
rcu_read_lock();
@@ -1693,19 +1691,16 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
mask &= ~(1UL << (iter.gfn - gfn));
- if (wrprot || spte_ad_need_write_protect(iter.old_spte)) {
- if (is_writable_pte(iter.old_spte))
- new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
- else
- continue;
- } else {
- if (iter.old_spte & shadow_dirty_mask)
- new_spte = iter.old_spte & ~shadow_dirty_mask;
- else
- continue;
- }
+ if (!(iter.old_spte & clear_bit))
+ continue;
- tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte);
+ iter.old_spte = tdp_mmu_clear_spte_bits(iter.sptep,
+ iter.old_spte,
+ clear_bit, iter.level);
+ trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,
+ iter.old_spte,
+ iter.old_spte & ~clear_bit);
+ kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));
}
rcu_read_unlock();
--
2.39.1.581.gbfd45094c4-goog
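For readers less familiar with the atomic-AND pattern the patch above relies
on, here is a minimal standalone sketch (illustrative only, not part of the
series; the helper name is made up) of how atomic64_fetch_and() returns the
pre-modification value so the caller can tell whether the cleared bits were
previously set:

#include <linux/atomic.h>

/*
 * Hypothetical helper: atomically clear @mask in @v and report whether any
 * of those bits were set beforehand, mirroring how tdp_mmu_clear_spte_bits()
 * hands the old SPTE value back to its caller.
 */
static inline bool clear_bits_and_test(atomic64_t *v, u64 mask)
{
	/* atomic64_fetch_and() returns the value *before* the AND is applied. */
	u64 old = (u64)atomic64_fetch_and(~mask, v);

	return old & mask;
}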
Remove bool parameter "record_dirty_log" from __tdp_mmu_set_spte() and
refactor the code as this variable is always set to true by its caller.
Signed-off-by: Vipin Sharma <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 24 +++++++++---------------
1 file changed, 9 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 66ccbeb9d845..c895560244de 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -710,18 +710,13 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
* notifier for access tracking. Leaving record_acc_track
* unset in that case prevents page accesses from being
* double counted.
- * @record_dirty_log: Record the page as dirty in the dirty bitmap if
- * appropriate for the change being made. Should be set
- * unless performing certain dirty logging operations.
- * Leaving record_dirty_log unset in that case prevents page
- * writes from being double counted.
*
* Returns the old SPTE value, which _may_ be different than @old_spte if the
* SPTE had voldatile bits.
*/
static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
u64 old_spte, u64 new_spte, gfn_t gfn, int level,
- bool record_acc_track, bool record_dirty_log)
+ bool record_acc_track)
{
lockdep_assert_held_write(&kvm->mmu_lock);
@@ -740,35 +735,34 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
if (record_acc_track)
handle_changed_spte_acc_track(old_spte, new_spte, level);
- if (record_dirty_log)
- handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte,
- new_spte, level);
+
+ handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte, new_spte,
+ level);
return old_spte;
}
static inline void _tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
- u64 new_spte, bool record_acc_track,
- bool record_dirty_log)
+ u64 new_spte, bool record_acc_track)
{
WARN_ON_ONCE(iter->yielded);
iter->old_spte = __tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep,
iter->old_spte, new_spte,
iter->gfn, iter->level,
- record_acc_track, record_dirty_log);
+ record_acc_track);
}
static inline void tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
u64 new_spte)
{
- _tdp_mmu_set_spte(kvm, iter, new_spte, true, true);
+ _tdp_mmu_set_spte(kvm, iter, new_spte, true);
}
static inline void tdp_mmu_set_spte_no_acc_track(struct kvm *kvm,
struct tdp_iter *iter,
u64 new_spte)
{
- _tdp_mmu_set_spte(kvm, iter, new_spte, false, true);
+ _tdp_mmu_set_spte(kvm, iter, new_spte, false);
}
#define tdp_root_for_each_pte(_iter, _root, _start, _end) \
@@ -918,7 +912,7 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
return false;
__tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
- sp->gfn, sp->role.level + 1, true, true);
+ sp->gfn, sp->role.level + 1, true);
return true;
}
--
2.39.1.581.gbfd45094c4-goog
No need to check all of the conditions in __handle_changed_spte(). Aging
a gfn range implies resetting the Accessed bit or marking the SPTE for
access tracking.
Use an atomic operation to only reset those bits. This avoids checking many
conditions in __handle_changed_spte(). Also, clean up the code by removing
dead code and API parameters.
Signed-off-by: Vipin Sharma <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 36 +++++++++++++++++++-----------------
1 file changed, 19 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index c895560244de..5d6e77554797 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -758,13 +758,6 @@ static inline void tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
_tdp_mmu_set_spte(kvm, iter, new_spte, true);
}
-static inline void tdp_mmu_set_spte_no_acc_track(struct kvm *kvm,
- struct tdp_iter *iter,
- u64 new_spte)
-{
- _tdp_mmu_set_spte(kvm, iter, new_spte, false);
-}
-
#define tdp_root_for_each_pte(_iter, _root, _start, _end) \
for_each_tdp_pte(_iter, _root, _start, _end)
@@ -1251,32 +1244,41 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm,
/*
* Mark the SPTEs range of GFNs [start, end) unaccessed and return non-zero
* if any of the GFNs in the range have been accessed.
+ *
+ * No need to mark corresponding PFN as accessed as this call is coming from
+ * the clear_young() or clear_flush_young() notifier, which uses the return
+ * value to determine if the page has been accessed.
*/
static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
struct kvm_gfn_range *range)
{
- u64 new_spte = 0;
+ u64 new_spte;
/* If we have a non-accessed entry we don't need to change the pte. */
if (!is_accessed_spte(iter->old_spte))
return false;
- new_spte = iter->old_spte;
-
- if (spte_ad_enabled(new_spte)) {
- new_spte &= ~shadow_accessed_mask;
+ if (spte_ad_enabled(iter->old_spte)) {
+ iter->old_spte = tdp_mmu_clear_spte_bits(iter->sptep,
+ iter->old_spte,
+ shadow_accessed_mask,
+ iter->level);
+ new_spte = iter->old_spte & ~shadow_accessed_mask;
} else {
+ new_spte = mark_spte_for_access_track(iter->old_spte);
+ iter->old_spte = kvm_tdp_mmu_write_spte(iter->sptep,
+ iter->old_spte, new_spte,
+ iter->level);
/*
* Capture the dirty status of the page, so that it doesn't get
* lost when the SPTE is marked for access tracking.
*/
- if (is_writable_pte(new_spte))
- kvm_set_pfn_dirty(spte_to_pfn(new_spte));
-
- new_spte = mark_spte_for_access_track(new_spte);
+ if (is_writable_pte(iter->old_spte))
+ kvm_set_pfn_dirty(spte_to_pfn(iter->old_spte));
}
- tdp_mmu_set_spte_no_acc_track(kvm, iter, new_spte);
+ trace_kvm_tdp_mmu_spte_changed(iter->as_id, iter->gfn, iter->level,
+ iter->old_spte, new_spte);
return true;
}
--
2.39.1.581.gbfd45094c4-goog
Remove bool parameter "record_acc_track" from __tdp_mmu_set_spte() and
refactor the code. This variable is always set to true by its caller.
Remove single and double underscore prefix from tdp_mmu_set_spte()
related APIs:
1. Change __tdp_mmu_set_spte() to tdp_mmu_set_spte()
2. Change _tdp_mmu_set_spte() to tdp_mmu_iter_set_spte()
Signed-off-by: Vipin Sharma <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 51 +++++++++++++-------------------------
1 file changed, 17 insertions(+), 34 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 5d6e77554797..e50e869bf879 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -697,7 +697,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
/*
- * __tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping
+ * tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping
* @kvm: KVM instance
* @as_id: Address space ID, i.e. regular vs. SMM
* @sptep: Pointer to the SPTE
@@ -705,18 +705,12 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
* @new_spte: The new value that will be set for the SPTE
* @gfn: The base GFN that was (or will be) mapped by the SPTE
* @level: The level _containing_ the SPTE (its parent PT's level)
- * @record_acc_track: Notify the MM subsystem of changes to the accessed state
- * of the page. Should be set unless handling an MMU
- * notifier for access tracking. Leaving record_acc_track
- * unset in that case prevents page accesses from being
- * double counted.
*
* Returns the old SPTE value, which _may_ be different than @old_spte if the
* SPTE had voldatile bits.
*/
-static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
- u64 old_spte, u64 new_spte, gfn_t gfn, int level,
- bool record_acc_track)
+static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
+ u64 old_spte, u64 new_spte, gfn_t gfn, int level)
{
lockdep_assert_held_write(&kvm->mmu_lock);
@@ -732,30 +726,19 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);
__handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
-
- if (record_acc_track)
- handle_changed_spte_acc_track(old_spte, new_spte, level);
-
+ handle_changed_spte_acc_track(old_spte, new_spte, level);
handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte, new_spte,
level);
return old_spte;
}
-static inline void _tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
- u64 new_spte, bool record_acc_track)
+static inline void tdp_mmu_iter_set_spte(struct kvm *kvm, struct tdp_iter *iter,
+ u64 new_spte)
{
WARN_ON_ONCE(iter->yielded);
-
- iter->old_spte = __tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep,
- iter->old_spte, new_spte,
- iter->gfn, iter->level,
- record_acc_track);
-}
-
-static inline void tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
- u64 new_spte)
-{
- _tdp_mmu_set_spte(kvm, iter, new_spte, true);
+ iter->old_spte = tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep,
+ iter->old_spte, new_spte,
+ iter->gfn, iter->level);
}
#define tdp_root_for_each_pte(_iter, _root, _start, _end) \
@@ -847,7 +830,7 @@ static void __tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
continue;
if (!shared)
- tdp_mmu_set_spte(kvm, &iter, 0);
+ tdp_mmu_iter_set_spte(kvm, &iter, 0);
else if (tdp_mmu_set_spte_atomic(kvm, &iter, 0))
goto retry;
}
@@ -904,8 +887,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte)))
return false;
- __tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
- sp->gfn, sp->role.level + 1, true);
+ tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
+ sp->gfn, sp->role.level + 1);
return true;
}
@@ -939,7 +922,7 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,
!is_last_spte(iter.old_spte, iter.level))
continue;
- tdp_mmu_set_spte(kvm, &iter, 0);
+ tdp_mmu_iter_set_spte(kvm, &iter, 0);
flush = true;
}
@@ -1110,7 +1093,7 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
if (ret)
return ret;
} else {
- tdp_mmu_set_spte(kvm, iter, spte);
+ tdp_mmu_iter_set_spte(kvm, iter, spte);
}
tdp_account_mmu_page(kvm, sp);
@@ -1317,13 +1300,13 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
* invariant that the PFN of a present * leaf SPTE can never change.
* See __handle_changed_spte().
*/
- tdp_mmu_set_spte(kvm, iter, 0);
+ tdp_mmu_iter_set_spte(kvm, iter, 0);
if (!pte_write(range->pte)) {
new_spte = kvm_mmu_changed_pte_notifier_make_spte(iter->old_spte,
pte_pfn(range->pte));
- tdp_mmu_set_spte(kvm, iter, new_spte);
+ tdp_mmu_iter_set_spte(kvm, iter, new_spte);
}
return true;
@@ -1814,7 +1797,7 @@ static bool write_protect_gfn(struct kvm *kvm, struct kvm_mmu_page *root,
if (new_spte == iter.old_spte)
break;
- tdp_mmu_set_spte(kvm, &iter, new_spte);
+ tdp_mmu_iter_set_spte(kvm, &iter, new_spte);
spte_set = true;
}
--
2.39.1.581.gbfd45094c4-goog
Remove handle_changed_spte_dirty_log() as there is no code flow which
sets a 4KiB SPTE writable and hits this path. This function marks the page
dirty in a memslot only if the new SPTE is 4KiB in size and writable.
Current users of handle_changed_spte_dirty_log() are:
1. set_spte_gfn() - Create only non writable SPTEs.
2. write_protect_gfn() - Change an SPTE to non writable.
3. zap leaf and roots APIs - Everything is 0.
4. handle_removed_pt() - Sets SPTEs to REMOVED_SPTE
5. tdp_mmu_link_sp() - Makes non leaf SPTEs.
There is also no path which creates a writable 4KiB SPTE without going
through make_spte(), and this function takes care of marking the SPTE dirty
in the memslot if it is PT_WRITABLE.
Signed-off-by: Vipin Sharma <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 22 ----------------------
1 file changed, 22 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index e50e869bf879..0c031319e901 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -345,24 +345,6 @@ static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int level)
kvm_set_pfn_accessed(spte_to_pfn(old_spte));
}
-static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
- u64 old_spte, u64 new_spte, int level)
-{
- bool pfn_changed;
- struct kvm_memory_slot *slot;
-
- if (level > PG_LEVEL_4K)
- return;
-
- pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
-
- if ((!is_writable_pte(old_spte) || pfn_changed) &&
- is_writable_pte(new_spte)) {
- slot = __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn);
- mark_page_dirty_in_slot(kvm, slot, gfn);
- }
-}
-
static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
{
kvm_account_pgtable_pages((void *)sp->spt, +1);
@@ -614,8 +596,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
__handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level,
shared);
handle_changed_spte_acc_track(old_spte, new_spte, level);
- handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte,
- new_spte, level);
}
/*
@@ -727,8 +707,6 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
__handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
handle_changed_spte_acc_track(old_spte, new_spte, level);
- handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte, new_spte,
- level);
return old_spte;
}
--
2.39.1.581.gbfd45094c4-goog
__handle_changed_pte() and handle_changed_spte_acc_track() are always
used together. Merge these two functions and name the new function
handle_changed_pte(). Remove the existing handle_changed_pte() function
which just calls __handle_changed_pte and
handle_changed_spte_acc_track().
This converges SPTE change handling code into a single place.
Signed-off-by: Vipin Sharma <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 42 +++++++++++---------------------------
1 file changed, 12 insertions(+), 30 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 0c031319e901..67538ec48ce5 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -334,17 +334,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
u64 old_spte, u64 new_spte, int level,
bool shared);
-static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int level)
-{
- if (!is_shadow_present_pte(old_spte) || !is_last_spte(old_spte, level))
- return;
-
- if (is_accessed_spte(old_spte) &&
- (!is_shadow_present_pte(new_spte) || !is_accessed_spte(new_spte) ||
- spte_to_pfn(old_spte) != spte_to_pfn(new_spte)))
- kvm_set_pfn_accessed(spte_to_pfn(old_spte));
-}
-
static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
{
kvm_account_pgtable_pages((void *)sp->spt, +1);
@@ -487,7 +476,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
}
/**
- * __handle_changed_spte - handle bookkeeping associated with an SPTE change
+ * handle_changed_spte - handle bookkeeping associated with an SPTE change
* @kvm: kvm instance
* @as_id: the address space of the paging structure the SPTE was a part of
* @gfn: the base GFN that was mapped by the SPTE
@@ -501,9 +490,9 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
* Handle bookkeeping that might result from the modification of a SPTE.
* This function must be called for all TDP SPTE modifications.
*/
-static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
- u64 old_spte, u64 new_spte, int level,
- bool shared)
+static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
+ u64 old_spte, u64 new_spte, int level,
+ bool shared)
{
bool was_present = is_shadow_present_pte(old_spte);
bool is_present = is_shadow_present_pte(new_spte);
@@ -587,15 +576,10 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
if (was_present && !was_leaf &&
(is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
-}
-static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
- u64 old_spte, u64 new_spte, int level,
- bool shared)
-{
- __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level,
- shared);
- handle_changed_spte_acc_track(old_spte, new_spte, level);
+ if (was_leaf && is_accessed_spte(old_spte) &&
+ (!is_present || !is_accessed_spte(new_spte) || pfn_changed))
+ kvm_set_pfn_accessed(spte_to_pfn(old_spte));
}
/*
@@ -638,9 +622,8 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *kvm,
if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte))
return -EBUSY;
- __handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
- new_spte, iter->level, true);
- handle_changed_spte_acc_track(iter->old_spte, new_spte, iter->level);
+ handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
+ new_spte, iter->level, true);
return 0;
}
@@ -705,8 +688,7 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);
- __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
- handle_changed_spte_acc_track(old_spte, new_spte, level);
+ handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
return old_spte;
}
@@ -1276,7 +1258,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
* Note, when changing a read-only SPTE, it's not strictly necessary to
* zero the SPTE before setting the new PFN, but doing so preserves the
* invariant that the PFN of a present * leaf SPTE can never change.
- * See __handle_changed_spte().
+ * See handle_changed_spte().
*/
tdp_mmu_iter_set_spte(kvm, iter, 0);
@@ -1301,7 +1283,7 @@ bool kvm_tdp_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
/*
* No need to handle the remote TLB flush under RCU protection, the
* target SPTE _must_ be a leaf SPTE, i.e. cannot result in freeing a
- * shadow page. See the WARN on pfn_changed in __handle_changed_spte().
+ * shadow page. See the WARN on pfn_changed in handle_changed_spte().
*/
return kvm_tdp_mmu_handle_gfn(kvm, range, set_spte_gfn);
}
--
2.39.1.581.gbfd45094c4-goog
On Fri, Feb 10, 2023 at 05:46:22PM -0800, Vipin Sharma wrote:
> Remove bool parameter "record_dirty_log" from __tdp_mmu_set_spte() and
> refactor the code as this variable is always set to true by its caller.
>
> Signed-off-by: Vipin Sharma <[email protected]>
> ---
Reviewed-by: David Matlack <[email protected]>
On Fri, Feb 10, 2023 at 05:46:21PM -0800, Vipin Sharma wrote:
> Use an atomic-AND to clear the dirty state of SPTEs. Optimize the
> clear-dirty-log flow by not going through __handle_changed_spte() and
> instead calling kvm_set_pfn_dirty() directly.
>
> The atomic-AND fetches the latest value of the SPTE, clears only its
> dirty state, and sets the new SPTE value. This optimization avoids
> executing unnecessary checks by not calling __handle_changed_spte().
>
> With the removal of tdp_mmu_set_spte_no_dirty_log(), "record_dirty_log"
> parameter in __tdp_mmu_set_spte() is now obsolete. It will always be set
> to true by its caller. This dead code will be cleaned up in future
> commits.
>
> Tested on a VM (160 vCPUs, 160 GB memory) and found that the performance of
> the clear dirty log stage improved by ~40% in dirty_log_perf_test.
>
> Before optimization:
> --------------------
> Iteration 1 clear dirty log time: 3.638543593s
> Iteration 2 clear dirty log time: 3.145032742s
> Iteration 3 clear dirty log time: 3.142340358s
> Clear dirty log over 3 iterations took 9.925916693s. (Avg 3.308638897s/iteration)
>
> After optimization:
> -------------------
> Iteration 1 clear dirty log time: 2.318988110s
> Iteration 2 clear dirty log time: 1.794470164s
> Iteration 3 clear dirty log time: 1.791668628s
> Clear dirty log over 3 iterations took 5.905126902s. (Avg 1.968375634s/iteration)
>
> Signed-off-by: Vipin Sharma <[email protected]>
Reviewed-by: David Matlack <[email protected]>
On Fri, Feb 10, 2023 at 05:46:23PM -0800, Vipin Sharma wrote:
> No need to check all of the conditions in __handle_changed_spte(). Aging
> a gfn range implies resetting the Accessed bit or marking the SPTE for
> access tracking.
nit: State what the patch does first.
>
> Use an atomic operation to only reset those bits. This avoids checking many
> conditions in __handle_changed_spte(). Also, clean up the code by removing
> dead code and API parameters.
>
> Signed-off-by: Vipin Sharma <[email protected]>
nits aside,
Reviewed-by: David Matlack <[email protected]>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 36 +++++++++++++++++++-----------------
> 1 file changed, 19 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index c895560244de..5d6e77554797 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -758,13 +758,6 @@ static inline void tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
> _tdp_mmu_set_spte(kvm, iter, new_spte, true);
> }
>
> -static inline void tdp_mmu_set_spte_no_acc_track(struct kvm *kvm,
> - struct tdp_iter *iter,
> - u64 new_spte)
> -{
> - _tdp_mmu_set_spte(kvm, iter, new_spte, false);
> -}
> -
> #define tdp_root_for_each_pte(_iter, _root, _start, _end) \
> for_each_tdp_pte(_iter, _root, _start, _end)
>
> @@ -1251,32 +1244,41 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm,
> /*
> * Mark the SPTEs range of GFNs [start, end) unaccessed and return non-zero
> * if any of the GFNs in the range have been accessed.
> + *
> + * No need to mark corresponding PFN as accessed as this call is coming from
> + * the clear_young() or clear_flush_young() notifier, which uses the return
> + * value to determine if the page has been accessed.
> */
> static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
> struct kvm_gfn_range *range)
> {
> - u64 new_spte = 0;
> + u64 new_spte;
>
> /* If we have a non-accessed entry we don't need to change the pte. */
> if (!is_accessed_spte(iter->old_spte))
> return false;
>
> - new_spte = iter->old_spte;
> -
> - if (spte_ad_enabled(new_spte)) {
> - new_spte &= ~shadow_accessed_mask;
> + if (spte_ad_enabled(iter->old_spte)) {
> + iter->old_spte = tdp_mmu_clear_spte_bits(iter->sptep,
nit: Extra space before =
On Fri, Feb 10, 2023 at 05:46:24PM -0800, Vipin Sharma wrote:
> Remove bool parameter "record_acc_track" from __tdp_mmu_set_spte() and
> refactor the code. This variable is always set to true by its caller.
>
> Remove single and double underscore prefix from tdp_mmu_set_spte()
uber-nit: I find it helpful to use phrasing like "Opportunistically do
X" for opportunistic cleanups that are separate from the primary change.
Otherwise the commit message reads as if 2 totally independent changes
are being made.
> related APIs:
> 1. Change __tdp_mmu_set_spte() to tdp_mmu_set_spte()
> 2. Change _tdp_mmu_set_spte() to tdp_mmu_iter_set_spte()
>
> Signed-off-by: Vipin Sharma <[email protected]>
> ---
Reviewed-by: David Matlack <[email protected]>
On Fri, Feb 10, 2023 at 05:46:25PM -0800, Vipin Sharma wrote:
> Remove handle_changed_spte_dirty_log() as there is no code flow which
> sets a 4KiB SPTE writable and hits this path. This function marks the page
> dirty in a memslot only if the new SPTE is 4KiB in size and writable.
>
> Current users of handle_changed_spte_dirty_log() are:
> 1. set_spte_gfn() - Create only non writable SPTEs.
> 2. write_protect_gfn() - Change an SPTE to non writable.
> 3. zap leaf and roots APIs - Everything is 0.
> 4. handle_removed_pt() - Sets SPTEs to REMOVED_SPTE
> 5. tdp_mmu_link_sp() - Makes non leaf SPTEs.
>
> There is also no path which creates a writable 4KiB SPTE without going
> through make_spte(), and this function takes care of marking the SPTE dirty
> in the memslot if it is PT_WRITABLE.
>
> Signed-off-by: Vipin Sharma <[email protected]>
Aside from the comment suggestion,
Reviewed-by: David Matlack <[email protected]>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 22 ----------------------
> 1 file changed, 22 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index e50e869bf879..0c031319e901 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -345,24 +345,6 @@ static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int level)
> kvm_set_pfn_accessed(spte_to_pfn(old_spte));
> }
>
> -static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
> - u64 old_spte, u64 new_spte, int level)
> -{
> - bool pfn_changed;
> - struct kvm_memory_slot *slot;
> -
> - if (level > PG_LEVEL_4K)
> - return;
> -
> - pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
> -
> - if ((!is_writable_pte(old_spte) || pfn_changed) &&
> - is_writable_pte(new_spte)) {
> - slot = __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn);
> - mark_page_dirty_in_slot(kvm, slot, gfn);
> - }
> -}
> -
> static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> {
> kvm_account_pgtable_pages((void *)sp->spt, +1);
> @@ -614,8 +596,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level,
> shared);
> handle_changed_spte_acc_track(old_spte, new_spte, level);
> - handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte,
> - new_spte, level);
Please leave a comment somewhere in handle_changed_spte() to document
why mark_page_dirty_in_slot() is not needed to help future readers and
in case something changes in the future.
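One possible shape for such a comment, based purely on the reasoning in this
patch's changelog (a sketch of the idea only, not the wording that was
ultimately applied):

	/*
	 * Note, no need to call mark_page_dirty_in_slot() here: there is no
	 * flow that makes a 4KiB SPTE writable without going through
	 * make_spte(), and make_spte() already marks the gfn dirty in the
	 * memslot for writable SPTEs.
	 */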
On Fri, Feb 10, 2023 at 05:46:26PM -0800, Vipin Sharma wrote:
> __handle_changed_pte() and handle_changed_spte_acc_track() are always
> used together. Merge these two functions and name the new function
nit: State what the patch does first.
> handle_changed_pte(). Remove the existing handle_changed_pte() function
> which just calls __handle_changed_pte and
> handle_changed_spte_acc_track().
s/_pte/_spte/
>
> This converges SPTE change handling code into a single place.
>
> Signed-off-by: Vipin Sharma <[email protected]>
> Reviewed-by: Ben Gardon <[email protected]>
Nice cleanup! Commit message nits aside,
Reviewed-by: David Matlack <[email protected]>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 42 +++++++++++---------------------------
> 1 file changed, 12 insertions(+), 30 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 0c031319e901..67538ec48ce5 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -334,17 +334,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> u64 old_spte, u64 new_spte, int level,
> bool shared);
>
> -static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int level)
> -{
> - if (!is_shadow_present_pte(old_spte) || !is_last_spte(old_spte, level))
> - return;
> -
> - if (is_accessed_spte(old_spte) &&
> - (!is_shadow_present_pte(new_spte) || !is_accessed_spte(new_spte) ||
> - spte_to_pfn(old_spte) != spte_to_pfn(new_spte)))
> - kvm_set_pfn_accessed(spte_to_pfn(old_spte));
> -}
> -
> static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> {
> kvm_account_pgtable_pages((void *)sp->spt, +1);
> @@ -487,7 +476,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
> }
>
> /**
> - * __handle_changed_spte - handle bookkeeping associated with an SPTE change
> + * handle_changed_spte - handle bookkeeping associated with an SPTE change
> * @kvm: kvm instance
> * @as_id: the address space of the paging structure the SPTE was a part of
> * @gfn: the base GFN that was mapped by the SPTE
> @@ -501,9 +490,9 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
> * Handle bookkeeping that might result from the modification of a SPTE.
> * This function must be called for all TDP SPTE modifications.
> */
> -static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> - u64 old_spte, u64 new_spte, int level,
> - bool shared)
> +static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> + u64 old_spte, u64 new_spte, int level,
> + bool shared)
> {
> bool was_present = is_shadow_present_pte(old_spte);
> bool is_present = is_shadow_present_pte(new_spte);
> @@ -587,15 +576,10 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> if (was_present && !was_leaf &&
> (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
> handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
> -}
>
> -static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> - u64 old_spte, u64 new_spte, int level,
> - bool shared)
> -{
> - __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level,
> - shared);
> - handle_changed_spte_acc_track(old_spte, new_spte, level);
> + if (was_leaf && is_accessed_spte(old_spte) &&
> + (!is_present || !is_accessed_spte(new_spte) || pfn_changed))
> + kvm_set_pfn_accessed(spte_to_pfn(old_spte));
> }
>
> /*
> @@ -638,9 +622,8 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *kvm,
> if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte))
> return -EBUSY;
>
> - __handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
> - new_spte, iter->level, true);
> - handle_changed_spte_acc_track(iter->old_spte, new_spte, iter->level);
> + handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
> + new_spte, iter->level, true);
>
> return 0;
> }
> @@ -705,8 +688,7 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
>
> old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);
>
> - __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
> - handle_changed_spte_acc_track(old_spte, new_spte, level);
> + handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
> return old_spte;
> }
>
> @@ -1276,7 +1258,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
> * Note, when changing a read-only SPTE, it's not strictly necessary to
> * zero the SPTE before setting the new PFN, but doing so preserves the
> * invariant that the PFN of a present * leaf SPTE can never change.
> - * See __handle_changed_spte().
> + * See handle_changed_spte().
> */
> tdp_mmu_iter_set_spte(kvm, iter, 0);
>
> @@ -1301,7 +1283,7 @@ bool kvm_tdp_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> /*
> * No need to handle the remote TLB flush under RCU protection, the
> * target SPTE _must_ be a leaf SPTE, i.e. cannot result in freeing a
> - * shadow page. See the WARN on pfn_changed in __handle_changed_spte().
> + * shadow page. See the WARN on pfn_changed in handle_changed_spte().
> */
> return kvm_tdp_mmu_handle_gfn(kvm, range, set_spte_gfn);
> }
> --
> 2.39.1.581.gbfd45094c4-goog
>
On Fri, Feb 10, 2023, Vipin Sharma wrote:
> @@ -1301,7 +1283,7 @@ bool kvm_tdp_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> /*
> * No need to handle the remote TLB flush under RCU protection, the
> * target SPTE _must_ be a leaf SPTE, i.e. cannot result in freeing a
> - * shadow page. See the WARN on pfn_changed in __handle_changed_spte().
> + * shadow page. See the WARN on pfn_changed in handle_changed_spte().
I was just starting to think you're an ok person, and then I find out you're a
heretic that only puts a single space after periods. ;-)
On Fri, Feb 10, 2023, Vipin Sharma wrote:
> This patch series optimizes the control flow of clearing the dirty log
> and improves its performance by ~40% (2% more than v2).
>
> It also gets rid of many variants of the handle_changed_spte family of
> functions and converges the logic into a single handle_changed_spte()
> function. It also removes tdp_mmu_set_spte_no_[acc_track|dirty_log] and
> the various booleans for controlling them.
>
> v3:
> - Tried to do a better job at writing commit messages.
LOL, that's the spirit!
Did a cursory glance, looks good. I'll do a more thorough pass next week and get
it queued up if all goes well. No need for a v4 at this point, I'll fixup David's
various nits when applying. I'll also add a link in patch 2 to the discussion
about why we determined that bypassing __tdp_mmu_set_spte() is safe; that's critical
information that isn't captured in the changelog.
On Fri, Feb 10, 2023, Vipin Sharma wrote:
> @@ -1677,8 +1670,13 @@ bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm,
> static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
> gfn_t gfn, unsigned long mask, bool wrprot)
> {
> + /*
> + * Either all SPTEs in TDP MMU will need write protection or none. This
> + * contract will not be modified for TDP MMU pages.
> + */
> + u64 clear_bit = (wrprot || !kvm_ad_enabled()) ? PT_WRITABLE_MASK :
> + shadow_dirty_mask;
Switching from spte_ad_need_write_protect() to kvm_ad_enabled() belongs in a
separate patch. In the unlikely event that the above assertion/contract is
invalid, then any issues should bisect to the switch, not to a much more
complex patch.
I'll make that happen when applying.
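For reference, the standalone switch would amount to roughly the following
change in clear_dirty_pt_masked() (and the equivalent in
clear_dirty_gfn_range()); a sketch only, not the commit that was actually
applied:

-		if (wrprot || spte_ad_need_write_protect(iter.old_spte)) {
+		if (wrprot || !kvm_ad_enabled()) {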
On Fri, Mar 17, 2023 at 3:51 PM Sean Christopherson <[email protected]> wrote:
>
> On Fri, Feb 10, 2023, Vipin Sharma wrote:
> > @@ -1301,7 +1283,7 @@ bool kvm_tdp_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> > /*
> > * No need to handle the remote TLB flush under RCU protection, the
> > * target SPTE _must_ be a leaf SPTE, i.e. cannot result in freeing a
> > - * shadow page. See the WARN on pfn_changed in __handle_changed_spte().
> > + * shadow page. See the WARN on pfn_changed in handle_changed_spte().
>
> I was just starting to think you're an ok person, and then I find out you're a
> heretic that only puts a single space after periods. ;-)
I know, I know, I can't help it. I just love the way single space
looks after periods.
On Fri, Mar 17, 2023 at 3:59 PM Sean Christopherson <[email protected]> wrote:
>
> On Fri, Feb 10, 2023, Vipin Sharma wrote:
> > @@ -1677,8 +1670,13 @@ bool kvm_tdp_mmu_clear_dirty_slot(struct kvm *kvm,
> > static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
> > gfn_t gfn, unsigned long mask, bool wrprot)
> > {
> > + /*
> > + * Either all SPTEs in TDP MMU will need write protection or none. This
> > + * contract will not be modified for TDP MMU pages.
> > + */
> > + u64 clear_bit = (wrprot || !kvm_ad_enabled()) ? PT_WRITABLE_MASK :
> > + shadow_dirty_mask;
>
> Switching from spte_ad_need_write_protect() to kvm_ad_enabled() belongs in a
> separate patch. In the unlikely event that the above assertion/contract is
> invalid, then any issues should bisect to the switch, not to a much more
> complex patch.
>
> I'll make that happen when applying.
Make sense, thanks!
On Fri, Mar 17, 2023 at 3:57 PM Sean Christopherson <[email protected]> wrote:
>
> On Fri, Feb 10, 2023, Vipin Sharma wrote:
> > This patch series optimizes the control flow of clearing the dirty log
> > and improves its performance by ~40% (2% more than v2).
> >
> > It also gets rid of many variants of the handle_changed_spte family of
> > functions and converges the logic into a single handle_changed_spte()
> > function. It also removes tdp_mmu_set_spte_no_[acc_track|dirty_log] and
> > the various booleans for controlling them.
> >
> > v3:
> > - Tried to do a better job at writing commit messages.
>
> LOL, that's the spirit!
>
> Did a cursory glance, looks good. I'll do a more thorough pass next week and get
> it queued up if all goes well. No need for a v4 at this point, I'll fixup David's
> various nits when applying. I'll also add a link in patch 2 to the discussion
Yeah, he is too demanding! :p
> about why we determined that bypassing __tdp_mmu_set_spte() is safe; that's critical
> information that isn't captured in the changelog.
Thanks!
On Fri, Mar 17, 2023, Sean Christopherson wrote:
> On Fri, Feb 10, 2023, Vipin Sharma wrote:
> > This patch series optimizes the control flow of clearing the dirty log
> > and improves its performance by ~40% (2% more than v2).
> >
> > It also gets rid of many variants of the handle_changed_spte family of
> > functions and converges the logic into a single handle_changed_spte()
> > function. It also removes tdp_mmu_set_spte_no_[acc_track|dirty_log] and
> > the various booleans for controlling them.
> >
> > v3:
> > - Tried to do a better job at writing commit messages.
>
> LOL, that's the spirit!
>
> Did a cursory glance, looks good. I'll do a more thorough pass next week and get
> it queued up if all goes well. No need for a v4 at this point, I'll fixup David's
> various nits when applying.
Ooof, that ended up being painful. In hindsight, I should have asked for a v4,
but damage done, and it's my fault for throwing you a big blob of code in the
first place.
I ended up splitting the "interesting" patches into three each:
1. Switch to the atomic-AND
2. Drop the access-tracking / dirty-logging (as appropriate)
3. Drop the call to __handle_changed_spte()
because logically they are three different things (although obviously related).
I have pushed the result to kvm-x86/mmu, but haven't merged to kvm-x86/next or
sent thanks because it's not yet tested. I'll do testing tomorrow, but if you
can take a look in the meantime to make sure I didn't do something completely
boneheaded, it'd be much appreciated.
On Fri, Feb 10, 2023, Vipin Sharma wrote:
> } else {
> + new_spte = mark_spte_for_access_track(iter->old_spte);
> + iter->old_spte = kvm_tdp_mmu_write_spte(iter->sptep,
> + iter->old_spte, new_spte,
> + iter->level);
> /*
> * Capture the dirty status of the page, so that it doesn't get
> * lost when the SPTE is marked for access tracking.
> */
> - if (is_writable_pte(new_spte))
> - kvm_set_pfn_dirty(spte_to_pfn(new_spte));
> -
> - new_spte = mark_spte_for_access_track(new_spte);
> + if (is_writable_pte(iter->old_spte))
> + kvm_set_pfn_dirty(spte_to_pfn(iter->old_spte));
Moving this block below kvm_tdp_mmu_write_spte() is an unrelated change. Much to
my chagrin, I discovered that past me gave you this code. I still think the change
is correct, but I dropped it for now, mostly because the legacy/shadow MMU has the
same pattern (marks the PFN dirty before setting the SPTE).
I think this might actually be a bug fix, e.g. if the XCHG races with a fast page
fault fix and drops the Writable bit, the CPU could insert a writable entry into the
TLB without KVM invoking kvm_set_pfn_dirty(). But I'm not 100% confident that I'm
not missing something, and _if_ there's a bug then mmu_spte_age() needs the same
fix, so for now, I dropped it.
On Mon, Mar 20, 2023 at 5:41 PM Sean Christopherson <[email protected]> wrote:
>
> On Fri, Mar 17, 2023, Sean Christopherson wrote:
> > Did a cursory glance, looks good. I'll do a more thorough pass next week and get
> > it queued up if all goes well. No need for a v4 at this point, I'll fixup David's
> > various nits when applying.
>
> Ooof, that ended up being painful. In hindsight, I should have asked for a v4,
> but damage done, and it's my fault for throwing you a big blob of code in the
> first place.
>
> I ended up splitting the "interesting" patches into three each:
>
> 1. Switch to the atomic-AND
> 2. Drop the access-tracking / dirty-logging (as appropriate)
> 3. Drop the call to __handle_changed_spte()
>
> because logically they are three different things (although obviously related).
>
> I have pushed the result to kvm-x86/mmu, but haven't merged to kvm-x86/next or
> sent thanks because it's not yet tested. I'll do testing tomorrow, but if you
> can take a look in the meantime to make sure I didn't do something completely
> boneheaded, it'd be much appreciated.
Thanks for refactoring the patches. I reviewed the commits, no obvious
red flags from my side. Few small nits I found:
commit e534a94eac07 ("KVM: x86/mmu: Use kvm_ad_enabled() to determine
if TDP MMU SPTEs need wrprot")
- kvm_ad_enabled() should be outside the loop.
commit 69032b5d71ef (" KVM: x86/mmu: Atomically clear SPTE dirty state
in the clear-dirty-log flow")
- MMU_WARN_ON(kvm_ad_enabled() &&
spte_ad_need_write_protect(iter.old_spte) should be after
if(iter.level > PG_LEVEL_4k...)
commit 93c375bb6aea ("KVM: x86/mmu: Bypass __handle_changed_spte()
when clearing TDP MMU dirty bits")
- Needs new performance numbers. Adding MMU_WARN_ON() might change
numbers. I will run a perf test on your mmu branch and see if
something changes a lot.
On Tue, Mar 21, 2023, Vipin Sharma wrote:
> On Mon, Mar 20, 2023 at 5:41 PM Sean Christopherson <[email protected]> wrote:
> >
> > On Fri, Mar 17, 2023, Sean Christopherson wrote:
> > > Did a cursory glance, looks good. I'll do a more thorough pass next week and get
> > > it queued up if all goes well. No need for a v4 at this point, I'll fixup David's
> > > various nits when applying.
> >
> > Ooof, that ended up being painful. In hindsight, I should have asked for a v4,
> > but damage done, and it's my fault for throwing you a big blob of code in the
> > first place.
> >
> > I ended up splitting the "interesting" patches into three each:
> >
> > 1. Switch to the atomic-AND
> > 2. Drop the access-tracking / dirty-logging (as appropriate)
> > 3. Drop the call to __handle_changed_spte()
> >
> > because logically they are three different things (although obviously related).
> >
> > I have pushed the result to kvm-x86/mmu, but haven't merged to kvm-x86/next or
> > sent thanks because it's not yet tested. I'll do testing tomorrow, but if you
> > can take a look in the meantime to make sure I didn't do something completely
> > boneheaded, it'd be much appreciated.
>
>
> Thanks for refactoring the patches. I reviewed the commits, no obvious
> red flags from my side. Few small nits I found:
>
> commit e534a94eac07 ("KVM: x86/mmu: Use kvm_ad_enabled() to determine
> if TDP MMU SPTEs need wrprot")
> - kvm_ad_enabled() should be outside the loop.
Hmm, I deliberately left it inside the loop, but I agree that it would be better
to hoist it out in that commit.
> commit 69032b5d71ef (" KVM: x86/mmu: Atomically clear SPTE dirty state
> in the clear-dirty-log flow")
> - MMU_WARN_ON(kvm_ad_enabled() &&
> spte_ad_need_write_protect(iter.old_spte) should be after
> if(iter.level > PG_LEVEL_4k...)
Ah, hrm. This was also deliberate, but looking at the diff I agree that relative
to the diff, it's an unnecessary/unrelated change. I think what I'll do is
land the assertion above the "if (iter.level > PG_LEVEL_4K ||" in the above
commit that switches to kvm_ad_enabled(). That way there shouldn't be any change
for the assertion in this commit.
> commit 93c375bb6aea ("KVM: x86/mmu: Bypass __handle_changed_spte()
> when clearing TDP MMU dirty bits")
> - Needs new performance numbers. Adding MMU_WARN_ON() might change
> numbers. I will run a perf test on your mmu branch and see if
> something changes a lot.
It won't. MMU_WARN_ON() is dead code without manual modification to define MMU_DEBUG.
Part of the reason I used MMU_WARN_ON() was to remind myself to send a patch/series
to overhaul MMU_WARN_ON[*]. My thought/hope is that a Kconfig will allow developers
and testers to run with a pile of assertions and sanity checks without impacting
the runtime overhead for production builds.
[*] https://lore.kernel.org/all/[email protected]/
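One rough shape such an overhaul could take (a sketch of the idea only; the
config symbol and macro name here are invented for illustration, not taken
from the posted series):

/* Gate the checks behind an opt-in Kconfig so production builds pay nothing. */
#ifdef CONFIG_KVM_DEBUG_MMU
#define KVM_MMU_WARN_ON(x)	WARN_ON_ONCE(x)
#else
#define KVM_MMU_WARN_ON(x)	BUILD_BUG_ON_INVALID(x)
#endif

With something like this, developers and testers can turn on the Kconfig to
get the full set of assertions, while the default build compiles them out
(the BUILD_BUG_ON_INVALID() form still type-checks the expression without
emitting any runtime code).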
On Tue, Mar 21, 2023, Sean Christopherson wrote:
> On Tue, Mar 21, 2023, Vipin Sharma wrote:
> > On Mon, Mar 20, 2023 at 5:41 PM Sean Christopherson <[email protected]> wrote:
> > >
> > > On Fri, Mar 17, 2023, Sean Christopherson wrote:
> > > > Did a cursory glance, looks good. I'll do a more thorough pass next week and get
> > > > it queued up if all goes well. No need for a v4 at this point, I'll fixup David's
> > > > various nits when applying.
> > >
> > > Ooof, that ended up being painful. In hindsight, I should have asked for a v4,
> > > but damage done, and it's my fault for throwing you a big blob of code in the
> > > first place.
> > >
> > > I ended up splitting the "interesting" patches into three each:
> > >
> > > 1. Switch to the atomic-AND
> > > 2. Drop the access-tracking / dirty-logging (as appropriate)
> > > 3. Drop the call to __handle_changed_spte()
> > >
> > > because logically they are three different things (although obviously related).
> > >
> > > I have pushed the result to kvm-x86/mmu, but haven't merged to kvm-x86/next or
> > > sent thanks because it's not yet tested. I'll do testing tomorrow, but if you
> > > can take a look in the meantime to make sure I didn't do something completely
> > > boneheaded, it'd be much appreciated.
> >
> >
> > Thanks for refactoring the patches. I reviewed the commits, no obvious
> > red flags from my side. Few small nits I found:
> >
> > commit e534a94eac07 ("KVM: x86/mmu: Use kvm_ad_enabled() to determine
> > if TDP MMU SPTEs need wrprot")
> > - kvm_ad_enabled() should be outside the loop.
>
> Hmm, I deliberately left it inside the loop, but I agree that it would be better
> to hoist it out in that commit.
>
> > commit 69032b5d71ef (" KVM: x86/mmu: Atomically clear SPTE dirty state
> > in the clear-dirty-log flow")
> > - MMU_WARN_ON(kvm_ad_enabled() &&
> > spte_ad_need_write_protect(iter.old_spte) should be after
> > if(iter.level > PG_LEVEL_4k...)
>
> Ah, hrm. This was also deliberate, but looking at the diff I agree that relative
> to the diff, it's an unnecessary/unrelated change. I think what I'll do is
> land the assertion above the "if (iter.level > PG_LEVEL_4K ||" in the above
> commit that switches to kvm_ad_enabled(). That way there shouldn't be any change
> for the assertion in this commit.
Aha! Even better, split this into yet one more patch to dedup the guts before
switching to the atomic-AND, and give clear_dirty_gfn_range() the same treatment.
That further isolates the changes, provides solid justification for hoisting the
kvm_ad_enabled() check out of the loop (it's basically guaranteed to be a single
memory read that hits the L1), and keeps clear_dirty_gfn_range() and
clear_dirty_pt_masked() as similar as is reasonably possible.
Speaking of which, I'll send a patch to remove the redundant is_shadow_present_pte()
check in clear_dirty_gfn_range(), that's already handled by tdp_root_for_each_leaf_pte().
On Tue, Mar 21, 2023, Sean Christopherson wrote:
> It won't. MMU_WARN_ON() is dead code without manual modification to define MMU_DEBUG.
> Part of the reason I used MMU_WARN_ON() was to remind myself to send a patch/series
> to overhaul MMU_WARN_ON[*]. My thought/hope is that a Kconfig will allow developers
> and testers to run with a pile of assertions and sanity checks without impacting
> the runtime overhead for production builds.
>
> [*] https://lore.kernel.org/all/[email protected]/
Ugh, I'm definitely sending that patch, MMU_DEBUG has bitrotted and broken the
build yet again.
arch/x86/kvm/mmu/mmu.c: In function ‘kvm_mmu_free_shadow_page’:
arch/x86/kvm/mmu/mmu.c:1738:15: error: implicit declaration of function ‘is_empty_shadow_page’; did you mean ‘to_shadow_page’? [-Werror=implicit-function-declaration]
1738 | MMU_WARN_ON(!is_empty_shadow_page(sp->spt));
| ^~~~~~~~~~~~~~~~~~~~
include/asm-generic/bug.h:110:25: note: in definition of macro ‘WARN_ON_ONCE’
110 | int __ret_warn_on = !!(condition); \
| ^~~~~~~~~
arch/x86/kvm/mmu/mmu.c:1738:2: note: in expansion of macro ‘MMU_WARN_ON’
1738 | MMU_WARN_ON(!is_empty_shadow_page(sp->spt));
| ^~~~~~~~~~~