2023-02-03 19:28:30

by Vipin Sharma

[permalink] [raw]
Subject: [Patch v2 0/5] Optimize clear dirty log

Hi,

This patch series optimizes the control flow of clearing the dirty log and
improves its performance by ~38%. Patch 2 has more details about the
optimization.

It also gets rid of many variants of the handle_changed_spte family of
functions, converging the logic into a single handle_changed_spte() function,
and removes tdp_mmu_set_spte_no_[acc_track|dirty_log] along with the various
booleans for controlling them.

Thanks
Vipin

v2:
- The clear-dirty-log and age-gfn-range paths no longer go through
handle_changed_spte(); they handle their SPTE changes locally to improve
their speed.
- Clear only specific bits atomically when updating SPTEs in the
clear-dirty-log and age-gfn-range functions.
- Removed tdp_mmu_set_spte_no_[acc_track|dirty_log] APIs.
- Converged all handle_changed_spte related functions to one place.

v1: https://lore.kernel.org/lkml/[email protected]/

Vipin Sharma (5):
KVM: x86/mmu: Make separate function to check for SPTEs atomic write
conditions
KVM: x86/mmu: Optimize SPTE change flow for clear-dirty-log
KVM: x86/mmu: Optimize SPTE change for aging gfn range
KVM: x86/mmu: Remove handle_changed_spte_dirty_log()
KVM: x86/mmu: Merge all handle_changed_pte* functions.

arch/x86/kvm/mmu/tdp_iter.h | 29 +++++--
arch/x86/kvm/mmu/tdp_mmu.c | 163 +++++++++++-------------------------
2 files changed, 72 insertions(+), 120 deletions(-)

--
2.39.1.519.gcb327c4b5f-goog



2023-02-03 19:28:32

by Vipin Sharma

[permalink] [raw]
Subject: [Patch v2 1/5] KVM: x86/mmu: Make separate function to check for SPTEs atomic write conditions

Move condition checks in kvm_tdp_mmu_write_spte() for writing spte
atomically in a separate function.

New function will be used in future commits to clear bits in SPTE.

Signed-off-by: Vipin Sharma <[email protected]>
---
arch/x86/kvm/mmu/tdp_iter.h | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index f0af385c56e0..30a52e5e68de 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -29,11 +29,10 @@ static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte)
WRITE_ONCE(*rcu_dereference(sptep), new_spte);
}

-static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
- u64 new_spte, int level)
+static inline bool kvm_tdp_mmu_spte_has_volatile_bits(u64 old_spte, int level)
{
/*
- * Atomically write the SPTE if it is a shadow-present, leaf SPTE with
+ * Atomically write SPTEs if it is a shadow-present, leaf SPTE with
* volatile bits, i.e. has bits that can be set outside of mmu_lock.
* The Writable bit can be set by KVM's fast page fault handler, and
* Accessed and Dirty bits can be set by the CPU.
@@ -44,8 +43,15 @@ static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
* logic needs to be reassessed if KVM were to use non-leaf Accessed
* bits, e.g. to skip stepping down into child SPTEs when aging SPTEs.
*/
- if (is_shadow_present_pte(old_spte) && is_last_spte(old_spte, level) &&
- spte_has_volatile_bits(old_spte))
+ return is_shadow_present_pte(old_spte) &&
+ is_last_spte(old_spte, level) &&
+ spte_has_volatile_bits(old_spte);
+}
+
+static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
+ u64 new_spte, int level)
+{
+ if (kvm_tdp_mmu_spte_has_volatile_bits(old_spte, level))
return kvm_tdp_mmu_write_spte_atomic(sptep, new_spte);

__kvm_tdp_mmu_write_spte(sptep, new_spte);
--
2.39.1.519.gcb327c4b5f-goog


2023-02-03 19:28:40

by Vipin Sharma

[permalink] [raw]
Subject: [Patch v2 2/5] KVM: x86/mmu: Optimize SPTE change flow for clear-dirty-log

No need to check all of the conditions in __handle_changed_spte() as
clearing dirty log only involves resetting dirty or writable bit.

Make atomic change to dirty or writable bit and mark pfn dirty.

Tested on 160 VCPU-160 GB VM and found that performance of clear dirty
log stage improved by ~38% in dirty_log_perf_test

Before optimization:
--------------------

Test iterations: 3
Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages
guest physical test memory: [0x3fd7c0000000, 0x3fffc0000000)
Populate memory time: 6.298459671s
Setting dirty log mode took : 0.000000052s

Enabling dirty logging time: 0.003815691s

Iteration 1 dirty memory time: 0.185538848s
Iteration 1 get dirty log time: 0.002562641s
Iteration 1 clear dirty log time: 3.638543593s
Iteration 2 dirty memory time: 0.192226071s
Iteration 2 get dirty log time: 0.001558446s
Iteration 2 clear dirty log time: 3.145032742s
Iteration 3 dirty memory time: 0.193606295s
Iteration 3 get dirty log time: 0.001559425s
Iteration 3 clear dirty log time: 3.142340358s
Disabling dirty logging time: 3.002873664s
Get dirty log over 3 iterations took 0.005680512s.
(Avg 0.001893504s/iteration)
Clear dirty log over 3 iterations took 9.925916693s. (Avg 3.308638897s/iteration)

After optimization:
-------------------

Test iterations: 3
Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages
guest physical test memory: [0x3fd7c0000000, 0x3fffc0000000)
Populate memory time: 6.581448437s
Setting dirty log mode took : 0.000000058s

Enabling dirty logging time: 0.003981283s

Iteration 1 dirty memory time: 0.285693420s
Iteration 1 get dirty log time: 0.002743004s
Iteration 1 clear dirty log time: 2.384343157s
Iteration 2 dirty memory time: 0.290414476s
Iteration 2 get dirty log time: 0.001720445s
Iteration 2 clear dirty log time: 1.882770288s
Iteration 3 dirty memory time: 0.289965965s
Iteration 3 get dirty log time: 0.001728232s
Iteration 3 clear dirty log time: 1.881043086s
Disabling dirty logging time: 2.930387523s
Get dirty log over 3 iterations took 0.006191681s.
(Avg 0.002063893s/iteration)
Clear dirty log over 3 iterations took 6.148156531s. (Avg 2.049385510s/iteration)

Signed-off-by: Vipin Sharma <[email protected]>
---
arch/x86/kvm/mmu/tdp_iter.h | 13 ++++++++++
arch/x86/kvm/mmu/tdp_mmu.c | 51 +++++++++++++++----------------------
2 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
index 30a52e5e68de..21046b34f94e 100644
--- a/arch/x86/kvm/mmu/tdp_iter.h
+++ b/arch/x86/kvm/mmu/tdp_iter.h
@@ -121,4 +121,17 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
void tdp_iter_next(struct tdp_iter *iter);
void tdp_iter_restart(struct tdp_iter *iter);

+static inline u64 kvm_tdp_mmu_clear_spte_bit(struct tdp_iter *iter, u64 mask)
+{
+ atomic64_t *sptep;
+
+ if (kvm_tdp_mmu_spte_has_volatile_bits(iter->old_spte, iter->level)) {
+ sptep = (atomic64_t *)rcu_dereference(iter->sptep);
+ return (u64)atomic64_fetch_and(~mask, sptep);
+ }
+
+ __kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte & ~mask);
+ return iter->old_spte;
+}
+
#endif /* __KVM_X86_MMU_TDP_ITER_H */
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index bba33aea0fb0..83f15052aa6c 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -710,18 +710,13 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
* notifier for access tracking. Leaving record_acc_track
* unset in that case prevents page accesses from being
* double counted.
- * @record_dirty_log: Record the page as dirty in the dirty bitmap if
- * appropriate for the change being made. Should be set
- * unless performing certain dirty logging operations.
- * Leaving record_dirty_log unset in that case prevents page
- * writes from being double counted.
*
* Returns the old SPTE value, which _may_ be different than @old_spte if the
* SPTE had voldatile bits.
*/
static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
u64 old_spte, u64 new_spte, gfn_t gfn, int level,
- bool record_acc_track, bool record_dirty_log)
+ bool record_acc_track)
{
lockdep_assert_held_write(&kvm->mmu_lock);

@@ -740,42 +735,34 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,

if (record_acc_track)
handle_changed_spte_acc_track(old_spte, new_spte, level);
- if (record_dirty_log)
- handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte,
- new_spte, level);
+
+ handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte, new_spte,
+ level);
return old_spte;
}

static inline void _tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
- u64 new_spte, bool record_acc_track,
- bool record_dirty_log)
+ u64 new_spte, bool record_acc_track)
{
WARN_ON_ONCE(iter->yielded);

iter->old_spte = __tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep,
iter->old_spte, new_spte,
iter->gfn, iter->level,
- record_acc_track, record_dirty_log);
+ record_acc_track);
}

static inline void tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
u64 new_spte)
{
- _tdp_mmu_set_spte(kvm, iter, new_spte, true, true);
+ _tdp_mmu_set_spte(kvm, iter, new_spte, true);
}

static inline void tdp_mmu_set_spte_no_acc_track(struct kvm *kvm,
struct tdp_iter *iter,
u64 new_spte)
{
- _tdp_mmu_set_spte(kvm, iter, new_spte, false, true);
-}
-
-static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
- struct tdp_iter *iter,
- u64 new_spte)
-{
- _tdp_mmu_set_spte(kvm, iter, new_spte, true, false);
+ _tdp_mmu_set_spte(kvm, iter, new_spte, false);
}

#define tdp_root_for_each_pte(_iter, _root, _start, _end) \
@@ -925,7 +912,7 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
return false;

__tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
- sp->gfn, sp->role.level + 1, true, true);
+ sp->gfn, sp->role.level + 1, true);

return true;
}
@@ -1678,7 +1665,7 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
gfn_t gfn, unsigned long mask, bool wrprot)
{
struct tdp_iter iter;
- u64 new_spte;
+ u64 clear_bits;

rcu_read_lock();

@@ -1694,18 +1681,22 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
mask &= ~(1UL << (iter.gfn - gfn));

if (wrprot || spte_ad_need_write_protect(iter.old_spte)) {
- if (is_writable_pte(iter.old_spte))
- new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
- else
+ if (!is_writable_pte(iter.old_spte))
continue;
+
+ clear_bits = PT_WRITABLE_MASK;
} else {
- if (iter.old_spte & shadow_dirty_mask)
- new_spte = iter.old_spte & ~shadow_dirty_mask;
- else
+ if (!(iter.old_spte & shadow_dirty_mask))
continue;
+
+ clear_bits = shadow_dirty_mask;
}

- tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte);
+ iter.old_spte = kvm_tdp_mmu_clear_spte_bit(&iter, clear_bits);
+ trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,
+ iter.old_spte,
+ iter.old_spte & ~clear_bits);
+ kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));
}

rcu_read_unlock();
--
2.39.1.519.gcb327c4b5f-goog


2023-02-03 19:28:43

by Vipin Sharma

[permalink] [raw]
Subject: [Patch v2 3/5] KVM: x86/mmu: Optimize SPTE change for aging gfn range

No need to check all of the conditions in __handle_changed_spte(). Aging
a gfn range implies resetting access bit or marking spte for access
tracking.

Use atomic operation to only reset those bits. This avoids checking many
conditions in __handle_changed_spte() API. Also, clean up code by
removing dead code and API parameters.

Signed-off-by: Vipin Sharma <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 68 ++++++++++++++------------------------
1 file changed, 25 insertions(+), 43 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 83f15052aa6c..18630a06fa1f 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -697,7 +697,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,


/*
- * __tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping
+ * _tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping
* @kvm: KVM instance
* @as_id: Address space ID, i.e. regular vs. SMM
* @sptep: Pointer to the SPTE
@@ -705,18 +705,12 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
* @new_spte: The new value that will be set for the SPTE
* @gfn: The base GFN that was (or will be) mapped by the SPTE
* @level: The level _containing_ the SPTE (its parent PT's level)
- * @record_acc_track: Notify the MM subsystem of changes to the accessed state
- * of the page. Should be set unless handling an MMU
- * notifier for access tracking. Leaving record_acc_track
- * unset in that case prevents page accesses from being
- * double counted.
*
* Returns the old SPTE value, which _may_ be different than @old_spte if the
* SPTE had voldatile bits.
*/
-static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
- u64 old_spte, u64 new_spte, gfn_t gfn, int level,
- bool record_acc_track)
+static u64 _tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
+ u64 old_spte, u64 new_spte, gfn_t gfn, int level)
{
lockdep_assert_held_write(&kvm->mmu_lock);

@@ -732,37 +726,20 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);

__handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
-
- if (record_acc_track)
- handle_changed_spte_acc_track(old_spte, new_spte, level);
-
+ handle_changed_spte_acc_track(old_spte, new_spte, level);
handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte, new_spte,
level);
return old_spte;
}

-static inline void _tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
- u64 new_spte, bool record_acc_track)
-{
- WARN_ON_ONCE(iter->yielded);
-
- iter->old_spte = __tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep,
- iter->old_spte, new_spte,
- iter->gfn, iter->level,
- record_acc_track);
-}
-
static inline void tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
u64 new_spte)
{
- _tdp_mmu_set_spte(kvm, iter, new_spte, true);
-}
+ WARN_ON_ONCE(iter->yielded);

-static inline void tdp_mmu_set_spte_no_acc_track(struct kvm *kvm,
- struct tdp_iter *iter,
- u64 new_spte)
-{
- _tdp_mmu_set_spte(kvm, iter, new_spte, false);
+ iter->old_spte = _tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep,
+ iter->old_spte, new_spte,
+ iter->gfn, iter->level);
}

#define tdp_root_for_each_pte(_iter, _root, _start, _end) \
@@ -911,8 +888,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
if (WARN_ON_ONCE(!is_shadow_present_pte(old_spte)))
return false;

- __tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
- sp->gfn, sp->role.level + 1, true);
+ _tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
+ sp->gfn, sp->role.level + 1);

return true;
}
@@ -1251,32 +1228,37 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm,
/*
* Mark the SPTEs range of GFNs [start, end) unaccessed and return non-zero
* if any of the GFNs in the range have been accessed.
+ *
+ * No need to mark corresponding PFN as accessed as this call is coming from
+ * MMU notifier for that page via HVA.
*/
static bool age_gfn_range(struct kvm *kvm, struct tdp_iter *iter,
struct kvm_gfn_range *range)
{
- u64 new_spte = 0;
+ u64 new_spte;

/* If we have a non-accessed entry we don't need to change the pte. */
if (!is_accessed_spte(iter->old_spte))
return false;

- new_spte = iter->old_spte;
-
- if (spte_ad_enabled(new_spte)) {
- new_spte &= ~shadow_accessed_mask;
+ if (spte_ad_enabled(iter->old_spte)) {
+ iter->old_spte = kvm_tdp_mmu_clear_spte_bit(iter,
+ shadow_accessed_mask);
+ new_spte = iter->old_spte & ~shadow_accessed_mask;
} else {
+ new_spte = mark_spte_for_access_track(iter->old_spte);
+ iter->old_spte = kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte,
+ new_spte, iter->level);
/*
* Capture the dirty status of the page, so that it doesn't get
* lost when the SPTE is marked for access tracking.
*/
- if (is_writable_pte(new_spte))
- kvm_set_pfn_dirty(spte_to_pfn(new_spte));
-
- new_spte = mark_spte_for_access_track(new_spte);
+ if (is_writable_pte(iter->old_spte))
+ kvm_set_pfn_dirty(spte_to_pfn(iter->old_spte));
}

- tdp_mmu_set_spte_no_acc_track(kvm, iter, new_spte);
+ trace_kvm_tdp_mmu_spte_changed(iter->as_id, iter->gfn, iter->level,
+ iter->old_spte, new_spte);

return true;
}
--
2.39.1.519.gcb327c4b5f-goog


2023-02-03 19:28:45

by Vipin Sharma

[permalink] [raw]
Subject: [Patch v2 4/5] KVM: x86/mmu: Remove handle_changed_spte_dirty_log()

Remove handle_changed_spte_dirty_log() as there is no code flow which
sets leaf SPTE writable and hit this path.

Signed-off-by: Vipin Sharma <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 22 ----------------------
1 file changed, 22 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 18630a06fa1f..afe0dcb1859e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -345,24 +345,6 @@ static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int level)
kvm_set_pfn_accessed(spte_to_pfn(old_spte));
}

-static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
- u64 old_spte, u64 new_spte, int level)
-{
- bool pfn_changed;
- struct kvm_memory_slot *slot;
-
- if (level > PG_LEVEL_4K)
- return;
-
- pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
-
- if ((!is_writable_pte(old_spte) || pfn_changed) &&
- is_writable_pte(new_spte)) {
- slot = __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn);
- mark_page_dirty_in_slot(kvm, slot, gfn);
- }
-}
-
static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
{
kvm_account_pgtable_pages((void *)sp->spt, +1);
@@ -614,8 +596,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
__handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level,
shared);
handle_changed_spte_acc_track(old_spte, new_spte, level);
- handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte,
- new_spte, level);
}

/*
@@ -727,8 +707,6 @@ static u64 _tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,

__handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
handle_changed_spte_acc_track(old_spte, new_spte, level);
- handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte, new_spte,
- level);
return old_spte;
}

--
2.39.1.519.gcb327c4b5f-goog


2023-02-03 19:28:53

by Vipin Sharma

[permalink] [raw]
Subject: [Patch v2 5/5] KVM: x86/mmu: Merge all handle_changed_pte* functions.

__handle_changed_pte() and handle_changed_spte_acc_track() are always
used together. Merge these two functions and name the new function
handle_changed_pte(). Remove the existing handle_changed_pte() function
which just calls __handle_changed_pte and
handle_changed_spte_acc_track().

This converges SPTEs change handling code to a single place.

Signed-off-by: Vipin Sharma <[email protected]>
---
arch/x86/kvm/mmu/tdp_mmu.c | 42 +++++++++++---------------------------
1 file changed, 12 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index afe0dcb1859e..9b0c81a28f97 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -334,17 +334,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
u64 old_spte, u64 new_spte, int level,
bool shared);

-static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int level)
-{
- if (!is_shadow_present_pte(old_spte) || !is_last_spte(old_spte, level))
- return;
-
- if (is_accessed_spte(old_spte) &&
- (!is_shadow_present_pte(new_spte) || !is_accessed_spte(new_spte) ||
- spte_to_pfn(old_spte) != spte_to_pfn(new_spte)))
- kvm_set_pfn_accessed(spte_to_pfn(old_spte));
-}
-
static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
{
kvm_account_pgtable_pages((void *)sp->spt, +1);
@@ -487,7 +476,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
}

/**
- * __handle_changed_spte - handle bookkeeping associated with an SPTE change
+ * handle_changed_spte - handle bookkeeping associated with an SPTE change
* @kvm: kvm instance
* @as_id: the address space of the paging structure the SPTE was a part of
* @gfn: the base GFN that was mapped by the SPTE
@@ -501,9 +490,9 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
* Handle bookkeeping that might result from the modification of a SPTE.
* This function must be called for all TDP SPTE modifications.
*/
-static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
- u64 old_spte, u64 new_spte, int level,
- bool shared)
+static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
+ u64 old_spte, u64 new_spte, int level,
+ bool shared)
{
bool was_present = is_shadow_present_pte(old_spte);
bool is_present = is_shadow_present_pte(new_spte);
@@ -587,15 +576,10 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
if (was_present && !was_leaf &&
(is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
-}

-static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
- u64 old_spte, u64 new_spte, int level,
- bool shared)
-{
- __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level,
- shared);
- handle_changed_spte_acc_track(old_spte, new_spte, level);
+ if (was_leaf && is_accessed_spte(old_spte) &&
+ (!is_present || !is_accessed_spte(new_spte) || pfn_changed))
+ kvm_set_pfn_accessed(spte_to_pfn(old_spte));
}

/*
@@ -638,9 +622,8 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *kvm,
if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte))
return -EBUSY;

- __handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
- new_spte, iter->level, true);
- handle_changed_spte_acc_track(iter->old_spte, new_spte, iter->level);
+ handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
+ new_spte, iter->level, true);

return 0;
}
@@ -705,8 +688,7 @@ static u64 _tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,

old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);

- __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
- handle_changed_spte_acc_track(old_spte, new_spte, level);
+ handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
return old_spte;
}

@@ -1273,7 +1255,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
* Note, when changing a read-only SPTE, it's not strictly necessary to
* zero the SPTE before setting the new PFN, but doing so preserves the
* invariant that the PFN of a present * leaf SPTE can never change.
- * See __handle_changed_spte().
+ * See handle_changed_spte().
*/
tdp_mmu_set_spte(kvm, iter, 0);

@@ -1298,7 +1280,7 @@ bool kvm_tdp_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
/*
* No need to handle the remote TLB flush under RCU protection, the
* target SPTE _must_ be a leaf SPTE, i.e. cannot result in freeing a
- * shadow page. See the WARN on pfn_changed in __handle_changed_spte().
+ * shadow page. See the WARN on pfn_changed in handle_changed_spte().
*/
return kvm_tdp_mmu_handle_gfn(kvm, range, set_spte_gfn);
}
--
2.39.1.519.gcb327c4b5f-goog


2023-02-06 22:07:01

by Ben Gardon

[permalink] [raw]
Subject: Re: [Patch v2 2/5] KVM: x86/mmu: Optimize SPTE change flow for clear-dirty-log

On Fri, Feb 3, 2023 at 11:28 AM Vipin Sharma <[email protected]> wrote:
>
> No need to check all of the conditions in __handle_changed_spte() as
> clearing dirty log only involves resetting dirty or writable bit.
>
> Make atomic change to dirty or writable bit and mark pfn dirty.
>
> Tested on 160 VCPU-160 GB VM and found that performance of clear dirty
> log stage improved by ~38% in dirty_log_perf_test

Dang! That's a big improvement.

...
> diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
> index 30a52e5e68de..21046b34f94e 100644
> --- a/arch/x86/kvm/mmu/tdp_iter.h
> +++ b/arch/x86/kvm/mmu/tdp_iter.h
> @@ -121,4 +121,17 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
> void tdp_iter_next(struct tdp_iter *iter);
> void tdp_iter_restart(struct tdp_iter *iter);
>
> +static inline u64 kvm_tdp_mmu_clear_spte_bit(struct tdp_iter *iter, u64 mask)
> +{
> + atomic64_t *sptep;
> +
> + if (kvm_tdp_mmu_spte_has_volatile_bits(iter->old_spte, iter->level)) {
> + sptep = (atomic64_t *)rcu_dereference(iter->sptep);
> + return (u64)atomic64_fetch_and(~mask, sptep);
> + }
> +
> + __kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte & ~mask);
> + return iter->old_spte;
> +}
> +

If you do end up sending another version of the series and feel like
breaking up this patch, you could probably split this part out since
the change to how we set the SPTE and how we handle that change are
somewhat independent. I like the switch to atomic64_fetch_and though.
No idea if it's faster, but I would believe it could be.

Totally optional, but there's currently no validation on the mask.
Maybe we're only calling this in one place, but it might be worth
clarifying the limits (if any) on what bits can be set in the mask. I
don't think there necessarily need to be limits as of this commit, but
the handling around this function where it's called here would
obviously not be sufficient if the mask were -1UL or something.
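
To make that concrete, one hypothetical form of such validation (the exact
set of allowed bits is my assumption, not something this series defines)
would be a one-line warning at the top of the helper:

	/* Hypothetical: warn if a caller passes bits beyond the ones this
	 * series actually clears (Writable, Dirty, Accessed). */
	WARN_ON_ONCE(mask & ~(PT_WRITABLE_MASK | shadow_dirty_mask |
			      shadow_accessed_mask));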

> #endif /* __KVM_X86_MMU_TDP_ITER_H */
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index bba33aea0fb0..83f15052aa6c 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -710,18 +710,13 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
> * notifier for access tracking. Leaving record_acc_track
> * unset in that case prevents page accesses from being
> * double counted.
> - * @record_dirty_log: Record the page as dirty in the dirty bitmap if
> - * appropriate for the change being made. Should be set
> - * unless performing certain dirty logging operations.
> - * Leaving record_dirty_log unset in that case prevents page
> - * writes from being double counted.

I was kind of hesitant about getting rid of this but now that I see it
going, I love it.

...

> @@ -1694,18 +1681,22 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
> mask &= ~(1UL << (iter.gfn - gfn));
>
> if (wrprot || spte_ad_need_write_protect(iter.old_spte)) {
> - if (is_writable_pte(iter.old_spte))
> - new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
> - else
> + if (!is_writable_pte(iter.old_spte))
> continue;
> +
> + clear_bits = PT_WRITABLE_MASK;
> } else {
> - if (iter.old_spte & shadow_dirty_mask)
> - new_spte = iter.old_spte & ~shadow_dirty_mask;
> - else
> + if (!(iter.old_spte & shadow_dirty_mask))
> continue;
> +
> + clear_bits = shadow_dirty_mask;
> }
>
> - tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte);
> + iter.old_spte = kvm_tdp_mmu_clear_spte_bit(&iter, clear_bits);
> + trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,
> + iter.old_spte,
> + iter.old_spte & ~clear_bits);
> + kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));
> }

Nice!

2023-02-06 22:09:59

by Ben Gardon

[permalink] [raw]
Subject: Re: [Patch v2 1/5] KVM: x86/mmu: Make separate function to check for SPTEs atomic write conditions

On Fri, Feb 3, 2023 at 11:28 AM Vipin Sharma <[email protected]> wrote:
>
> Move condition checks in kvm_tdp_mmu_write_spte() for writing spte
> atomically in a separate function.
>
> New function will be used in future commits to clear bits in SPTE.
>
> Signed-off-by: Vipin Sharma <[email protected]>

Reviewed-by: Ben Gardon <[email protected]>

> ---
> arch/x86/kvm/mmu/tdp_iter.h | 16 +++++++++++-----
> 1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
> index f0af385c56e0..30a52e5e68de 100644
> --- a/arch/x86/kvm/mmu/tdp_iter.h
> +++ b/arch/x86/kvm/mmu/tdp_iter.h
> @@ -29,11 +29,10 @@ static inline void __kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 new_spte)
> WRITE_ONCE(*rcu_dereference(sptep), new_spte);
> }
>
> -static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
> - u64 new_spte, int level)
> +static inline bool kvm_tdp_mmu_spte_has_volatile_bits(u64 old_spte, int level)
> {
> /*
> - * Atomically write the SPTE if it is a shadow-present, leaf SPTE with
> + * Atomically write SPTEs if it is a shadow-present, leaf SPTE with

Nit: SPTEs must be modified atomically if they are shadow-present,
leaf SPTEs with

> * volatile bits, i.e. has bits that can be set outside of mmu_lock.
> * The Writable bit can be set by KVM's fast page fault handler, and
> * Accessed and Dirty bits can be set by the CPU.
> @@ -44,8 +43,15 @@ static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
> * logic needs to be reassessed if KVM were to use non-leaf Accessed
> * bits, e.g. to skip stepping down into child SPTEs when aging SPTEs.
> */
> - if (is_shadow_present_pte(old_spte) && is_last_spte(old_spte, level) &&
> - spte_has_volatile_bits(old_spte))
> + return is_shadow_present_pte(old_spte) &&
> + is_last_spte(old_spte, level) &&
> + spte_has_volatile_bits(old_spte);
> +}
> +
> +static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
> + u64 new_spte, int level)
> +{
> + if (kvm_tdp_mmu_spte_has_volatile_bits(old_spte, level))
> return kvm_tdp_mmu_write_spte_atomic(sptep, new_spte);
>
> __kvm_tdp_mmu_write_spte(sptep, new_spte);
> --
> 2.39.1.519.gcb327c4b5f-goog
>

2023-02-06 22:17:23

by Ben Gardon

[permalink] [raw]
Subject: Re: [Patch v2 3/5] KVM: x86/mmu: Optimize SPTE change for aging gfn range

On Fri, Feb 3, 2023 at 11:28 AM Vipin Sharma <[email protected]> wrote:
>
> No need to check all of the conditions in __handle_changed_spte(). Aging
> a gfn range implies resetting access bit or marking spte for access
> tracking.
>
> Use atomic operation to only reset those bits. This avoids checking many
> conditions in __handle_changed_spte() API. Also, clean up code by
> removing dead code and API parameters.
>
> Signed-off-by: Vipin Sharma <[email protected]>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 68 ++++++++++++++------------------------
> 1 file changed, 25 insertions(+), 43 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 83f15052aa6c..18630a06fa1f 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -697,7 +697,7 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
>
>
> /*
> - * __tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping
> + * _tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping

Nit: Not sure how we got to the point of having a single and double
underscore of the function, but what do you think about just calling
this one tdp_mmu_set_spte and the other tdp_mmu_iter_set_spte?
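
In other words, just a sketch of the naming I have in mind (the bodies would
stay exactly as in this patch):

	static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
				    u64 old_spte, u64 new_spte, gfn_t gfn, int level);

	static inline void tdp_mmu_iter_set_spte(struct kvm *kvm, struct tdp_iter *iter,
						 u64 new_spte);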

2023-02-06 22:34:13

by Ben Gardon

[permalink] [raw]
Subject: Re: [Patch v2 5/5] KVM: x86/mmu: Merge all handle_changed_pte* functions.

On Fri, Feb 3, 2023 at 11:28 AM Vipin Sharma <[email protected]> wrote:
>
> __handle_changed_pte() and handle_changed_spte_acc_track() are always
> used together. Merge these two functions and name the new function
> handle_changed_pte(). Remove the existing handle_changed_pte() function
> which just calls __handle_changed_pte and
> handle_changed_spte_acc_track().
>
> This converges SPTEs change handling code to a single place.
>
> Signed-off-by: Vipin Sharma <[email protected]>

Reviewed-by: Ben Gardon <[email protected]>

> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 42 +++++++++++---------------------------
> 1 file changed, 12 insertions(+), 30 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index afe0dcb1859e..9b0c81a28f97 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -334,17 +334,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> u64 old_spte, u64 new_spte, int level,
> bool shared);
>
> -static void handle_changed_spte_acc_track(u64 old_spte, u64 new_spte, int level)
> -{
> - if (!is_shadow_present_pte(old_spte) || !is_last_spte(old_spte, level))
> - return;
> -
> - if (is_accessed_spte(old_spte) &&
> - (!is_shadow_present_pte(new_spte) || !is_accessed_spte(new_spte) ||
> - spte_to_pfn(old_spte) != spte_to_pfn(new_spte)))
> - kvm_set_pfn_accessed(spte_to_pfn(old_spte));
> -}
> -
> static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
> {
> kvm_account_pgtable_pages((void *)sp->spt, +1);
> @@ -487,7 +476,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
> }
>
> /**
> - * __handle_changed_spte - handle bookkeeping associated with an SPTE change
> + * handle_changed_spte - handle bookkeeping associated with an SPTE change
> * @kvm: kvm instance
> * @as_id: the address space of the paging structure the SPTE was a part of
> * @gfn: the base GFN that was mapped by the SPTE
> @@ -501,9 +490,9 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
> * Handle bookkeeping that might result from the modification of a SPTE.
> * This function must be called for all TDP SPTE modifications.
> */
> -static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> - u64 old_spte, u64 new_spte, int level,
> - bool shared)
> +static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> + u64 old_spte, u64 new_spte, int level,
> + bool shared)
> {
> bool was_present = is_shadow_present_pte(old_spte);
> bool is_present = is_shadow_present_pte(new_spte);
> @@ -587,15 +576,10 @@ static void __handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> if (was_present && !was_leaf &&
> (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
> handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
> -}
>
> -static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
> - u64 old_spte, u64 new_spte, int level,
> - bool shared)
> -{
> - __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level,
> - shared);
> - handle_changed_spte_acc_track(old_spte, new_spte, level);
> + if (was_leaf && is_accessed_spte(old_spte) &&
> + (!is_present || !is_accessed_spte(new_spte) || pfn_changed))
> + kvm_set_pfn_accessed(spte_to_pfn(old_spte));
> }
>
> /*
> @@ -638,9 +622,8 @@ static inline int tdp_mmu_set_spte_atomic(struct kvm *kvm,
> if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte))
> return -EBUSY;
>
> - __handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
> - new_spte, iter->level, true);
> - handle_changed_spte_acc_track(iter->old_spte, new_spte, iter->level);
> + handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
> + new_spte, iter->level, true);
>
> return 0;
> }
> @@ -705,8 +688,7 @@ static u64 _tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
>
> old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);
>
> - __handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
> - handle_changed_spte_acc_track(old_spte, new_spte, level);
> + handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
> return old_spte;
> }
>
> @@ -1273,7 +1255,7 @@ static bool set_spte_gfn(struct kvm *kvm, struct tdp_iter *iter,
> * Note, when changing a read-only SPTE, it's not strictly necessary to
> * zero the SPTE before setting the new PFN, but doing so preserves the
> * invariant that the PFN of a present * leaf SPTE can never change.
> - * See __handle_changed_spte().
> + * See handle_changed_spte().
> */
> tdp_mmu_set_spte(kvm, iter, 0);
>
> @@ -1298,7 +1280,7 @@ bool kvm_tdp_mmu_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> /*
> * No need to handle the remote TLB flush under RCU protection, the
> * target SPTE _must_ be a leaf SPTE, i.e. cannot result in freeing a
> - * shadow page. See the WARN on pfn_changed in __handle_changed_spte().
> + * shadow page. See the WARN on pfn_changed in handle_changed_spte().
> */
> return kvm_tdp_mmu_handle_gfn(kvm, range, set_spte_gfn);
> }
> --
> 2.39.1.519.gcb327c4b5f-goog
>

2023-02-06 23:18:04

by David Matlack

[permalink] [raw]
Subject: Re: [Patch v2 1/5] KVM: x86/mmu: Make separate function to check for SPTEs atomic write conditions

The shortlog is difficult to understand.

- I think it's more common to use "Add" or "Introduce" when talking
about adding a new function, rather than "Make".

- "atomic write conditions" does not mirror the code naming, which
checks for "volatile bits". e.g. The function is not called
kvm_tdp_mmu_spte_need_atomic_write().

"volatile bits" is, at this point, pretty standard terminology in KVM
MMU to refer to "bits that can change outside the MMU lock". So I would
suggest leaning on that here.

So something like this:

KVM: x86/mmu: Add helper function to check if an SPTE has volatile bits

On Fri, Feb 03, 2023 at 11:28:18AM -0800, Vipin Sharma wrote:
> Move condition checks in kvm_tdp_mmu_write_spte() for writing spte
> atomically in a separate function.

s/in a separate function/to a separate function/

>
> New function will be used in

nit: Use complete sentences. e.g. "This new function ..." or just state
the name directly, e.g. "kvm_tdp_mmu_spte_has_volatile_bits() will be
used in ...".

> future commits to clear bits in SPTE.

s/to clear bits in SPTE/to optimize clearing bits in SPTEs/

>
> Signed-off-by: Vipin Sharma <[email protected]>

Code looks fine, just grammar/writing nits above.

Reviewed-by: David Matlack <[email protected]>

2023-02-06 23:34:50

by David Matlack

[permalink] [raw]
Subject: Re: [Patch v2 2/5] KVM: x86/mmu: Optimize SPTE change flow for clear-dirty-log

On Mon, Feb 6, 2023 at 2:06 PM Ben Gardon <[email protected]> wrote:
>
> On Fri, Feb 3, 2023 at 11:28 AM Vipin Sharma <[email protected]> wrote:
> >
> > +static inline u64 kvm_tdp_mmu_clear_spte_bit(struct tdp_iter *iter, u64 mask)
> > +{
> > + atomic64_t *sptep;
> > +
> > + if (kvm_tdp_mmu_spte_has_volatile_bits(iter->old_spte, iter->level)) {
> > + sptep = (atomic64_t *)rcu_dereference(iter->sptep);
> > + return (u64)atomic64_fetch_and(~mask, sptep);
> > + }
> > +
> > + __kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte & ~mask);
> > + return iter->old_spte;
> > +}
> > +
>
> If you do end up sending another version of the series and feel like
> breaking up this patch, you could probably split this part out since
> the change to how we set the SPTE and how we handle that change are
> somewhat independent. I like the switch to atomic64_fetch_and though.
> No idea if it's faster, but I would believe it could be.

The changes are actually dependent. Using the atomic-AND makes it
impossible for KVM to clear a volatile bit that it wasn't intending to
clear, and that is what enables simplifying the SPTE change
propagation.
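
Roughly, as a conceptual sketch (not code from either version of the series):

	/*
	 * Full-value write: new_spte is computed from a possibly stale
	 * old_spte, so a volatile bit set by hardware after old_spte was
	 * read gets cleared too, and the returned old value must go
	 * through the full change-propagation bookkeeping.
	 */
	old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, old_spte & ~mask, level);

	/*
	 * Atomic AND: only the bits in @mask can change; every other bit,
	 * volatile or not, is left exactly as the CPU or the fast page
	 * fault handler left it.
	 */
	old_spte = (u64)atomic64_fetch_and(~mask, (atomic64_t *)rcu_dereference(sptep));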

Instead I would split out the opportunistic cleanup of
@record_dirty_log to a separate patch, since that's dead-code cleanup
and refactoring.

2023-02-06 23:41:53

by David Matlack

[permalink] [raw]
Subject: Re: [Patch v2 2/5] KVM: x86/mmu: Optimize SPTE change flow for clear-dirty-log

On Fri, Feb 03, 2023 at 11:28:19AM -0800, Vipin Sharma wrote:
> No need to check all of the conditions in __handle_changed_spte() as
> clearing dirty log only involves resetting dirty or writable bit.

Channelling Sean: State what the patch does first.

>
> Make atomic change to dirty or writable bit and mark pfn dirty.

This is way too vague. And the old code already did both of these
things. What's changed is that the bits are being cleared with an atomic
AND and taking advantage of the fact that that avoids needing to deal
with changes to volatile bits.

Please also explain what effect this has on @record_dirty_log and why it
can be opportunistically cleaned up in this commit.

>
> Tested on 160 VCPU-160 GB VM and found that performance of clear dirty
> log stage improved by ~38% in dirty_log_perf_test
>
> Before optimization:
> --------------------
>
> Test iterations: 3
> Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages
> guest physical test memory: [0x3fd7c0000000, 0x3fffc0000000)
> Populate memory time: 6.298459671s
> Setting dirty log mode took : 0.000000052s
>
> Enabling dirty logging time: 0.003815691s
>
> Iteration 1 dirty memory time: 0.185538848s
> Iteration 1 get dirty log time: 0.002562641s
> Iteration 1 clear dirty log time: 3.638543593s
> Iteration 2 dirty memory time: 0.192226071s
> Iteration 2 get dirty log time: 0.001558446s
> Iteration 2 clear dirty log time: 3.145032742s
> Iteration 3 dirty memory time: 0.193606295s
> Iteration 3 get dirty log time: 0.001559425s
> Iteration 3 clear dirty log time: 3.142340358s
> Disabling dirty logging time: 3.002873664s
> Get dirty log over 3 iterations took 0.005680512s.
> (Avg 0.001893504s/iteration)
> Clear dirty log over 3 iterations took 9.925916693s. (Avg 3.308638897s/iteration)
>
> After optimization:
> -------------------
>
> Test iterations: 3
> Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages
> guest physical test memory: [0x3fd7c0000000, 0x3fffc0000000)
> Populate memory time: 6.581448437s
> Setting dirty log mode took : 0.000000058s
>
> Enabling dirty logging time: 0.003981283s
>
> Iteration 1 dirty memory time: 0.285693420s
> Iteration 1 get dirty log time: 0.002743004s
> Iteration 1 clear dirty log time: 2.384343157s
> Iteration 2 dirty memory time: 0.290414476s
> Iteration 2 get dirty log time: 0.001720445s
> Iteration 2 clear dirty log time: 1.882770288s
> Iteration 3 dirty memory time: 0.289965965s
> Iteration 3 get dirty log time: 0.001728232s
> Iteration 3 clear dirty log time: 1.881043086s
> Disabling dirty logging time: 2.930387523s
> Get dirty log over 3 iterations took 0.006191681s.
> (Avg 0.002063893s/iteration)
> Clear dirty log over 3 iterations took 6.148156531s. (Avg 2.049385510s/iteration)

Can you trim these results to just show the clear times? (assuming none
of the rest are relevant)

>
> Signed-off-by: Vipin Sharma <[email protected]>
> ---
> arch/x86/kvm/mmu/tdp_iter.h | 13 ++++++++++
> arch/x86/kvm/mmu/tdp_mmu.c | 51 +++++++++++++++----------------------
> 2 files changed, 34 insertions(+), 30 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
> index 30a52e5e68de..21046b34f94e 100644
> --- a/arch/x86/kvm/mmu/tdp_iter.h
> +++ b/arch/x86/kvm/mmu/tdp_iter.h
> @@ -121,4 +121,17 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
> void tdp_iter_next(struct tdp_iter *iter);
> void tdp_iter_restart(struct tdp_iter *iter);
>
> +static inline u64 kvm_tdp_mmu_clear_spte_bit(struct tdp_iter *iter, u64 mask)
> +{

Make "bit" plural as long as the parameter is a raw mask.

Also drop "kvm_" since this is not intended to be called outside the TDP
MMU. (It'd be nice to make the same cleanup to the read/write
functions if you feel like it.)


> + atomic64_t *sptep;
> +
> + if (kvm_tdp_mmu_spte_has_volatile_bits(iter->old_spte, iter->level)) {
> + sptep = (atomic64_t *)rcu_dereference(iter->sptep);
> + return (u64)atomic64_fetch_and(~mask, sptep);
> + }
> +
> + __kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte & ~mask);
> + return iter->old_spte;
> +}
> +
> #endif /* __KVM_X86_MMU_TDP_ITER_H */
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index bba33aea0fb0..83f15052aa6c 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -710,18 +710,13 @@ static inline int tdp_mmu_zap_spte_atomic(struct kvm *kvm,
> * notifier for access tracking. Leaving record_acc_track
> * unset in that case prevents page accesses from being
> * double counted.
> - * @record_dirty_log: Record the page as dirty in the dirty bitmap if
> - * appropriate for the change being made. Should be set
> - * unless performing certain dirty logging operations.
> - * Leaving record_dirty_log unset in that case prevents page
> - * writes from being double counted.
> *
> * Returns the old SPTE value, which _may_ be different than @old_spte if the
> * SPTE had voldatile bits.
> */
> static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
> u64 old_spte, u64 new_spte, gfn_t gfn, int level,
> - bool record_acc_track, bool record_dirty_log)
> + bool record_acc_track)
> {
> lockdep_assert_held_write(&kvm->mmu_lock);
>
> @@ -740,42 +735,34 @@ static u64 __tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
>
> if (record_acc_track)
> handle_changed_spte_acc_track(old_spte, new_spte, level);
> - if (record_dirty_log)
> - handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte,
> - new_spte, level);
> +
> + handle_changed_spte_dirty_log(kvm, as_id, gfn, old_spte, new_spte,
> + level);
> return old_spte;
> }
>
> static inline void _tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
> - u64 new_spte, bool record_acc_track,
> - bool record_dirty_log)
> + u64 new_spte, bool record_acc_track)
> {
> WARN_ON_ONCE(iter->yielded);
>
> iter->old_spte = __tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep,
> iter->old_spte, new_spte,
> iter->gfn, iter->level,
> - record_acc_track, record_dirty_log);
> + record_acc_track);
> }
>
> static inline void tdp_mmu_set_spte(struct kvm *kvm, struct tdp_iter *iter,
> u64 new_spte)
> {
> - _tdp_mmu_set_spte(kvm, iter, new_spte, true, true);
> + _tdp_mmu_set_spte(kvm, iter, new_spte, true);
> }
>
> static inline void tdp_mmu_set_spte_no_acc_track(struct kvm *kvm,
> struct tdp_iter *iter,
> u64 new_spte)
> {
> - _tdp_mmu_set_spte(kvm, iter, new_spte, false, true);
> -}
> -
> -static inline void tdp_mmu_set_spte_no_dirty_log(struct kvm *kvm,
> - struct tdp_iter *iter,
> - u64 new_spte)
> -{
> - _tdp_mmu_set_spte(kvm, iter, new_spte, true, false);
> + _tdp_mmu_set_spte(kvm, iter, new_spte, false);
> }
>
> #define tdp_root_for_each_pte(_iter, _root, _start, _end) \
> @@ -925,7 +912,7 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> return false;
>
> __tdp_mmu_set_spte(kvm, kvm_mmu_page_as_id(sp), sp->ptep, old_spte, 0,
> - sp->gfn, sp->role.level + 1, true, true);
> + sp->gfn, sp->role.level + 1, true);
>
> return true;
> }
> @@ -1678,7 +1665,7 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
> gfn_t gfn, unsigned long mask, bool wrprot)
> {
> struct tdp_iter iter;
> - u64 new_spte;
> + u64 clear_bits;

nit: clear_bit since it's always a single bit?

>
> rcu_read_lock();
>
> @@ -1694,18 +1681,22 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
> mask &= ~(1UL << (iter.gfn - gfn));
>
> if (wrprot || spte_ad_need_write_protect(iter.old_spte)) {
> - if (is_writable_pte(iter.old_spte))
> - new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
> - else
> + if (!is_writable_pte(iter.old_spte))
> continue;
> +
> + clear_bits = PT_WRITABLE_MASK;
> } else {
> - if (iter.old_spte & shadow_dirty_mask)
> - new_spte = iter.old_spte & ~shadow_dirty_mask;
> - else
> + if (!(iter.old_spte & shadow_dirty_mask))
> continue;

You can factor out the continue check now that you have clear_bits. e.g.

if (wrprot || spte_ad_need_write_protect(iter.old_spte))
clear_bits = PT_WRITABLE_MASK;
else
clear_bits = shadow_dirty_mask;

if (!(iter->old_spte & clear_bits))
continue;

iter.old_spte = kvm_tdp_mmu_clear_spte_bit(&iter, clear_bits);

> +
> + clear_bits = shadow_dirty_mask;
> }
>
> - tdp_mmu_set_spte_no_dirty_log(kvm, &iter, new_spte);
> + iter.old_spte = kvm_tdp_mmu_clear_spte_bit(&iter, clear_bits);
> + trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,
> + iter.old_spte,
> + iter.old_spte & ~clear_bits);
> + kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));
> }
>
> rcu_read_unlock();
> --
> 2.39.1.519.gcb327c4b5f-goog
>

2023-02-06 23:53:53

by David Matlack

[permalink] [raw]
Subject: Re: [Patch v2 2/5] KVM: x86/mmu: Optimize SPTE change flow for clear-dirty-log

On Fri, Feb 3, 2023 at 11:28 AM Vipin Sharma <[email protected]> wrote:
>
> diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
> index 30a52e5e68de..21046b34f94e 100644
> --- a/arch/x86/kvm/mmu/tdp_iter.h
> +++ b/arch/x86/kvm/mmu/tdp_iter.h
> @@ -121,4 +121,17 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
> void tdp_iter_next(struct tdp_iter *iter);
> void tdp_iter_restart(struct tdp_iter *iter);
>
> +static inline u64 kvm_tdp_mmu_clear_spte_bit(struct tdp_iter *iter, u64 mask)
> +{
> + atomic64_t *sptep;
> +
> + if (kvm_tdp_mmu_spte_has_volatile_bits(iter->old_spte, iter->level)) {
> + sptep = (atomic64_t *)rcu_dereference(iter->sptep);
> + return (u64)atomic64_fetch_and(~mask, sptep);

I think you can just set iter->old_spte here and drop the return value?

> + }
> +
> + __kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte & ~mask);
> + return iter->old_spte;
> +}
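
I.e., something like the following untested sketch (folding in the rename
suggestions from my other reply, so treat the name as hypothetical):

	static inline void tdp_mmu_clear_spte_bits(struct tdp_iter *iter, u64 mask)
	{
		atomic64_t *sptep;

		if (kvm_tdp_mmu_spte_has_volatile_bits(iter->old_spte, iter->level)) {
			sptep = (atomic64_t *)rcu_dereference(iter->sptep);
			iter->old_spte = (u64)atomic64_fetch_and(~mask, sptep);
		} else {
			/* iter->old_spte already holds the pre-clear value here. */
			__kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte & ~mask);
		}
	}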

2023-02-06 23:57:10

by David Matlack

[permalink] [raw]
Subject: Re: [Patch v2 3/5] KVM: x86/mmu: Optimize SPTE change for aging gfn range

On Fri, Feb 03, 2023 at 11:28:20AM -0800, Vipin Sharma wrote:
> No need to check all of the conditions in __handle_changed_spte(). Aging
> a gfn range implies resetting access bit or marking spte for access
> tracking.
>
> Use atomic operation to only reset those bits. This avoids checking many
> conditions in __handle_changed_spte() API. Also, clean up code by
> removing dead code and API parameters.

Suggest splitting out the dead code cleanup to make it easier to review.

>
> Signed-off-by: Vipin Sharma <[email protected]>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 68 ++++++++++++++------------------------
> 1 file changed, 25 insertions(+), 43 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 83f15052aa6c..18630a06fa1f 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1251,32 +1228,37 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm,
> /*
> * Mark the SPTEs range of GFNs [start, end) unaccessed and return non-zero
> * if any of the GFNs in the range have been accessed.
> + *
> + * No need to mark corresponding PFN as accessed as this call is coming from
> + * MMU notifier for that page via HVA.

Thanks for adding this comment.

Can you just extend it to mention that the information is passed via the
return value? e.g.

* No need to mark corresponding PFN as accessed as this call is coming
* from the clear_young() or clear_flush_young() notifier, which uses
* the return value to determine if the page has been accessed.


2023-02-06 23:59:24

by David Matlack

[permalink] [raw]
Subject: Re: [Patch v2 4/5] KVM: x86/mmu: Remove handle_changed_spte_dirty_log()

On Fri, Feb 03, 2023 at 11:28:21AM -0800, Vipin Sharma wrote:
> Remove handle_changed_spte_dirty_log() as there is no code flow which
> sets leaf SPTE writable and hit this path.

Heh, way too vague again. e.g. This needs to explain how make_spte()
marks pages dirty when constructing new SPTEs for the TDP MMU.

2023-02-07 17:30:15

by Vipin Sharma

[permalink] [raw]
Subject: Re: [Patch v2 2/5] KVM: x86/mmu: Optimize SPTE change flow for clear-dirty-log

On Mon, Feb 6, 2023 at 2:06 PM Ben Gardon <[email protected]> wrote:
>
> On Fri, Feb 3, 2023 at 11:28 AM Vipin Sharma <[email protected]> wrote:
> > diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
> > index 30a52e5e68de..21046b34f94e 100644
> > --- a/arch/x86/kvm/mmu/tdp_iter.h
> > +++ b/arch/x86/kvm/mmu/tdp_iter.h
> > @@ -121,4 +121,17 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
> > void tdp_iter_next(struct tdp_iter *iter);
> > void tdp_iter_restart(struct tdp_iter *iter);
> >
> > +static inline u64 kvm_tdp_mmu_clear_spte_bit(struct tdp_iter *iter, u64 mask)
> > +{
> > + atomic64_t *sptep;
> > +
> > + if (kvm_tdp_mmu_spte_has_volatile_bits(iter->old_spte, iter->level)) {
> > + sptep = (atomic64_t *)rcu_dereference(iter->sptep);
> > + return (u64)atomic64_fetch_and(~mask, sptep);
> > + }
> > +
> > + __kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte & ~mask);
> > + return iter->old_spte;
> > +}
> > +
>
> If you do end up sending another version of the series and feel like
> breaking up this patch, you could probably split this part out since
> the change to how we set the SPTE and how we handle that change are
> somewhat independent. I like the switch to atomic64_fetch_and though.
> No idea if it's faster, but I would believe it could be.

David explained in his email why it is not independent.

>
> Totally optional, but there's currently no validation on the mask.
> Maybe we're only calling this in one place, but it might be worth
> clarifying the limits (if any) on what bits can be set in the mask. I
> don't think there necessarily need to be limits as of this commit, but
> the handling around this function where it's called here would
> obviously not be sufficient if the mask were -1UL or something.
>

I cannot think of any specific mask that would be useful here. Let us keep it
as it is; we can revisit this API if there is a need to add a mask in the
future. If someone passes -1UL then it will be on them how they are
using the API.

2023-02-07 17:37:39

by Vipin Sharma

[permalink] [raw]
Subject: Re: [Patch v2 2/5] KVM: x86/mmu: Optimize SPTE change flow for clear-dirty-log

On Mon, Feb 6, 2023 at 3:41 PM David Matlack <[email protected]> wrote:
>
> On Fri, Feb 03, 2023 at 11:28:19AM -0800, Vipin Sharma wrote:
> > No need to check all of the conditions in __handle_changed_spte() as
> > clearing dirty log only involves resetting dirty or writable bit.
>
> Channelling Sean: State what the patch does first.
>
> >
> > Make atomic change to dirty or writable bit and mark pfn dirty.
>
> This is way too vague. And the old code already did both of these
> things. What's changed is that the bits are being cleared with an atomic
> AND and taking advantage of the fact that that avoids needing to deal
> with changes to volatile bits.
>
> Please also explain what effect this has on @record_dirty_log and why it
> can be opportunistically cleaned up in this commit.
>

Okay, I will try to be better in the next one.

> > Iteration 3 clear dirty log time: 1.881043086s
> > Disabling dirty logging time: 2.930387523s
> > Get dirty log over 3 iterations took 0.006191681s.
> > (Avg 0.002063893s/iteration)
> > Clear dirty log over 3 iterations took 6.148156531s. (Avg 2.049385510s/iteration)
>
> Can you trim these results to just show the clear times? (assuming none
> of the rest are relevant)

I was not sure if just showing the clear dirty log times would be acceptable
or not. I will update the message to only show the clear dirty log times and
the average.

>
> >
> > +static inline u64 kvm_tdp_mmu_clear_spte_bit(struct tdp_iter *iter, u64 mask)
> > +{
>
> Make "bit" plural as long as the parameter is a raw mask.
>
> Also drop "kvm_" since this is not intended to be called outside the TDP
> MMU. (It'd be nice to make the same cleanup to the read/write
> functions if you feel like it.)
>

Sounds good.

> > @@ -1678,7 +1665,7 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
> > gfn_t gfn, unsigned long mask, bool wrprot)
> > {
> > struct tdp_iter iter;
> > - u64 new_spte;
> > + u64 clear_bits;
>
> nit: clear_bit since it's always a single bit?

Yes.

>
> >
> > rcu_read_lock();
> >
> > @@ -1694,18 +1681,22 @@ static void clear_dirty_pt_masked(struct kvm *kvm, struct kvm_mmu_page *root,
> > mask &= ~(1UL << (iter.gfn - gfn));
> >
> > if (wrprot || spte_ad_need_write_protect(iter.old_spte)) {
> > - if (is_writable_pte(iter.old_spte))
> > - new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
> > - else
> > + if (!is_writable_pte(iter.old_spte))
> > continue;
> > +
> > + clear_bits = PT_WRITABLE_MASK;
> > } else {
> > - if (iter.old_spte & shadow_dirty_mask)
> > - new_spte = iter.old_spte & ~shadow_dirty_mask;
> > - else
> > + if (!(iter.old_spte & shadow_dirty_mask))
> > continue;
>
> You can factor out the continue check now that you have clear_bits. e.g.
>
> if (wrprot || spte_ad_need_write_protect(iter.old_spte))
> clear_bits = PT_WRITABLE_MASK;
> else
> clear_bits = shadow_dirty_mask;
>
> if (!(iter->old_spte & clear_bits))
> continue;
>
> iter.old_spte = kvm_tdp_mmu_clear_spte_bit(&iter, clear_bits);
>

Yeah, this is better. Even better if I just initialize like:

u64 clear_bits = shadow_dirty_mask;

This will also get rid of the else part.
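
So the per-SPTE part of the loop would end up roughly as below (untested
sketch; I am assuming clear_bits is declared per iteration so the default
applies each time around):

	u64 clear_bits = shadow_dirty_mask;

	if (wrprot || spte_ad_need_write_protect(iter.old_spte))
		clear_bits = PT_WRITABLE_MASK;

	if (!(iter.old_spte & clear_bits))
		continue;

	iter.old_spte = kvm_tdp_mmu_clear_spte_bit(&iter, clear_bits);
	trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,
				       iter.old_spte,
				       iter.old_spte & ~clear_bits);
	kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));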

2023-02-07 17:42:33

by Vipin Sharma

[permalink] [raw]
Subject: Re: [Patch v2 2/5] KVM: x86/mmu: Optimize SPTE change flow for clear-dirty-log

On Mon, Feb 6, 2023 at 3:53 PM David Matlack <[email protected]> wrote:
>
> On Fri, Feb 3, 2023 at 11:28 AM Vipin Sharma <[email protected]> wrote:
> >
> > diff --git a/arch/x86/kvm/mmu/tdp_iter.h b/arch/x86/kvm/mmu/tdp_iter.h
> > index 30a52e5e68de..21046b34f94e 100644
> > --- a/arch/x86/kvm/mmu/tdp_iter.h
> > +++ b/arch/x86/kvm/mmu/tdp_iter.h
> > @@ -121,4 +121,17 @@ void tdp_iter_start(struct tdp_iter *iter, struct kvm_mmu_page *root,
> > void tdp_iter_next(struct tdp_iter *iter);
> > void tdp_iter_restart(struct tdp_iter *iter);
> >
> > +static inline u64 kvm_tdp_mmu_clear_spte_bit(struct tdp_iter *iter, u64 mask)
> > +{
> > + atomic64_t *sptep;
> > +
> > + if (kvm_tdp_mmu_spte_has_volatile_bits(iter->old_spte, iter->level)) {
> > + sptep = (atomic64_t *)rcu_dereference(iter->sptep);
> > + return (u64)atomic64_fetch_and(~mask, sptep);
>
> I think you can just set iter->old_spte here and drop the return value?
>

I can do this. However, my intention was to keep the return contract
similar to kvm_tdp_mmu_write_spte(). On second thought, should I make
this function signature similar to kvm_tdp_mmu_write_spte() just to be
consistent?



> > + }
> > +
> > + __kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte & ~mask);
> > + return iter->old_spte;
> > +}
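
For reference, the in-place variant being suggested here would look roughly
like the following (a sketch only; the name is hypothetical and folds in the
rename feedback from earlier in the thread):

        static inline void tdp_mmu_iter_clear_spte_bits(struct tdp_iter *iter,
                                                         u64 mask)
        {
                atomic64_t *sptep;

                if (kvm_tdp_mmu_spte_has_volatile_bits(iter->old_spte, iter->level)) {
                        sptep = (atomic64_t *)rcu_dereference(iter->sptep);
                        /* Refresh old_spte with the value the atomic op observed. */
                        iter->old_spte = (u64)atomic64_fetch_and(~mask, sptep);
                } else {
                        /* No volatile bits, so iter->old_spte is already accurate. */
                        __kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte & ~mask);
                }
        }

Whether to use that shape or keep the value-returning contract of
kvm_tdp_mmu_write_spte() is the open question in this exchange.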

2023-02-07 17:47:43

by David Matlack

[permalink] [raw]
Subject: Re: [Patch v2 2/5] KVM: x86/mmu: Optimize SPTE change flow for clear-dirty-log

On Tue, Feb 7, 2023 at 9:37 AM Vipin Sharma <[email protected]> wrote:
>
> On Mon, Feb 6, 2023 at 3:41 PM David Matlack <[email protected]> wrote:
> >
> > On Fri, Feb 03, 2023 at 11:28:19AM -0800, Vipin Sharma wrote:
> >
> > if (wrprot || spte_ad_need_write_protect(iter.old_spte))
> >         clear_bits = PT_WRITABLE_MASK;
> > else
> >         clear_bits = shadow_dirty_mask;
> >
> > if (!(iter.old_spte & clear_bits))
> >         continue;
> >
> > iter.old_spte = kvm_tdp_mmu_clear_spte_bit(&iter, clear_bits);
> >
>
> Yeah, this is better. Even better if I just initialize like:
>
> u64 clear_bits = shadow_dirty_mask;
>
> This will also get rid of the else part.

On that topic... Do we need to recalculate clear_bits for every spte?
wrprot is passed as a parameter so it will not change. And
spte_ad_need_write_protect() should return the same value for
all SPTEs in the TDP MMU. Specifically, make_spte() has this code:

if (sp->role.ad_disabled)
        spte |= SPTE_TDP_AD_DISABLED_MASK;
else if (kvm_mmu_page_ad_need_write_protect(sp))
        spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;

sp->role.ad_disabled is never modified in TDP MMU pages. So it should
be constant for all pages. And kvm_mmu_page_ad_need_write_protect()
will always return false for TDP MMU pages since sp->role.guest_mode
is always false.

So this can just be:

u64 clear_bit = (wrprot || !kvm_ad_enabled()) ? PT_WRITABLE_MASK :
shadow_dirty_mask;

2023-02-07 17:49:09

by Vipin Sharma

[permalink] [raw]
Subject: Re: [Patch v2 3/5] KVM: x86/mmu: Optimize SPTE change for aging gfn range

On Mon, Feb 6, 2023 at 2:17 PM Ben Gardon <[email protected]> wrote:
>
> On Fri, Feb 3, 2023 at 11:28 AM Vipin Sharma <[email protected]> wrote:

> > - * __tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping
> > + * _tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping
>
> Nit: Not sure how we ended up with single- and double-underscore
> versions of this function, but what do you think about just calling
> this one tdp_mmu_set_spte and the other tdp_mmu_iter_set_spte?

I like your idea. I also don't like the single and double underscore
prefix on functions.
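
To illustrate the proposed naming, the split might look roughly like this
(a sketch based on the existing helpers; the exact parameter lists could
change in the next version of the series):

        /* Takes the raw pieces, for callers that do not have an iterator. */
        static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
                                    u64 old_spte, u64 new_spte, gfn_t gfn, int level);

        /* Iterator-based wrapper, replacing the old double-underscore name. */
        static inline void tdp_mmu_iter_set_spte(struct kvm *kvm,
                                                 struct tdp_iter *iter, u64 new_spte)
        {
                iter->old_spte = tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep,
                                                  iter->old_spte, new_spte,
                                                  iter->gfn, iter->level);
        }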

2023-02-07 17:51:05

by Vipin Sharma

[permalink] [raw]
Subject: Re: [Patch v2 3/5] KVM: x86/mmu: Optimize SPTE change for aging gfn range

On Mon, Feb 6, 2023 at 3:57 PM David Matlack <[email protected]> wrote:
>
> On Fri, Feb 03, 2023 at 11:28:20AM -0800, Vipin Sharma wrote:
> > No need to check all of the conditions in __handle_changed_spte(). Aging
> > a gfn range implies resetting the Accessed bit or marking the spte for
> > access tracking.
> >
> > Use an atomic operation to only reset those bits. This avoids checking
> > many conditions in the __handle_changed_spte() API. Also, clean up the
> > code by removing dead code and API parameters.
>
> Suggest splitting out the dead code cleanup to make it easier to review.
>

Sounds good.

> >
> > Signed-off-by: Vipin Sharma <[email protected]>
> > ---
> > arch/x86/kvm/mmu/tdp_mmu.c | 68 ++++++++++++++------------------------
> > 1 file changed, 25 insertions(+), 43 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 83f15052aa6c..18630a06fa1f 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -1251,32 +1228,37 @@ static __always_inline bool kvm_tdp_mmu_handle_gfn(struct kvm *kvm,
> > /*
> > * Mark the SPTEs range of GFNs [start, end) unaccessed and return non-zero
> > * if any of the GFNs in the range have been accessed.
> > + *
> > + * No need to mark corresponding PFN as accessed as this call is coming from
> > + * MMU notifier for that page via HVA.
>
> Thanks for adding this comment.
>
> Can you just extend it to mention that the information is passed via the
> return value? e.g.
>
> * No need to mark corresponding PFN as accessed as this call is coming
> * from the clear_young() or clear_flush_young() notifier, which uses
> * the return value to determine if the page has been accessed.
>

Sure.

2023-02-07 17:52:40

by Vipin Sharma

[permalink] [raw]
Subject: Re: [Patch v2 4/5] KVM: x86/mmu: Remove handle_changed_spte_dirty_log()

On Mon, Feb 6, 2023 at 3:59 PM David Matlack <[email protected]> wrote:
>
> On Fri, Feb 03, 2023 at 11:28:21AM -0800, Vipin Sharma wrote:
> > Remove handle_changed_spte_dirty_log() as there is no code flow which
> > sets a leaf SPTE writable and hits this path.
>
> Heh, way too vague again. e.g. This needs to explain how make_spte()
> marks pages dirty when constructing new SPTEs for the TDP MMU.

Okay, I will try to be more detailed in the next version.
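
For context, make_spte() already handles dirty tracking when it constructs a
new SPTE; roughly (paraphrased from memory, details may differ):

        if (pte_access & ACC_WRITE_MASK)
                spte |= spte_shadow_dirty_mask(spte);
        ...
        if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot))
                mark_page_dirty_in_slot(kvm, slot, gfn);

Since writable TDP MMU SPTEs are only created via make_spte(), which marks
the page dirty itself, the dirty-log bookkeeping in
handle_changed_spte_dirty_log() has nothing left to do, which is the point
the commit message needs to spell out.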

2023-02-08 23:45:51

by Vipin Sharma

[permalink] [raw]
Subject: Re: [Patch v2 2/5] KVM: x86/mmu: Optimize SPTE change flow for clear-dirty-log

On Tue, Feb 7, 2023 at 9:47 AM David Matlack <[email protected]> wrote:
>
> On Tue, Feb 7, 2023 at 9:37 AM Vipin Sharma <[email protected]> wrote:
> >
> > On Mon, Feb 6, 2023 at 3:41 PM David Matlack <[email protected]> wrote:
> > >
> > > On Fri, Feb 03, 2023 at 11:28:19AM -0800, Vipin Sharma wrote:
> > >
> > > if (wrprot || spte_ad_need_write_protect(iter.old_spte))
> > >         clear_bits = PT_WRITABLE_MASK;
> > > else
> > >         clear_bits = shadow_dirty_mask;
> > >
> > > if (!(iter.old_spte & clear_bits))
> > >         continue;
> > >
> > > iter.old_spte = kvm_tdp_mmu_clear_spte_bit(&iter, clear_bits);
> > >
> >
> > Yeah, this is better. Even better if I just initialize like:
> >
> > u64 clear_bits = shadow_dirty_mask;
> >
> > This will also get rid of the else part.
>
> On that topic... Do we need to recalculate clear_bits for every spte?
> wrprot is passed as a parameter so it will not change. And
> spte_ad_need_write_protect() should return the same value for
> all SPTEs in the TDP MMU. Specifically, make_spte() has this code:
>
> if (sp->role.ad_disabled)
>         spte |= SPTE_TDP_AD_DISABLED_MASK;
> else if (kvm_mmu_page_ad_need_write_protect(sp))
>         spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;
>
> sp->role.ad_disabled is never modified in TDP MMU pages. So it should
> be constant for all pages. And kvm_mmu_page_ad_need_write_protect()
> will always return false for TDP MMU pages since sp->role.guest_mode
> is always false.
>
> So this can just be:
>
> u64 clear_bit = (wrprot || !kvm_ad_enabled()) ? PT_WRITABLE_MASK :
> shadow_dirty_mask;

I discussed it offline with David to understand this better. It makes
sense, as TDP MMU pages will not have nested (guest_mode) SPTEs; those
are handled by the shadow MMU via rmaps. So this looks good, I will add
it in the next version.
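
Putting the pieces of feedback together, the clearing loop would end up
looking roughly like this (an illustrative sketch of where the discussion
landed, not the final patch; the helper name follows the rename suggested
earlier in the thread):

        u64 clear_bit = (wrprot || !kvm_ad_enabled()) ? PT_WRITABLE_MASK :
                                                        shadow_dirty_mask;

        /* ... inside the tdp_root_for_each_leaf_pte() loop ... */
        if (!(iter.old_spte & clear_bit))
                continue;

        iter.old_spte = tdp_mmu_clear_spte_bits(&iter, clear_bit);

        trace_kvm_tdp_mmu_spte_changed(iter.as_id, iter.gfn, iter.level,
                                       iter.old_spte,
                                       iter.old_spte & ~clear_bit);
        kvm_set_pfn_dirty(spte_to_pfn(iter.old_spte));

Computing clear_bit once per call avoids re-evaluating
spte_ad_need_write_protect() for every SPTE, which is the point of the last
suggestion above.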

Thanks