2021-10-22 01:01:28

by Sean Christopherson

Subject: [PATCH 0/3] KVM: x86/mmu: Clean up kvm_zap_gfn_range()

Fix overzealous flushing in kvm_zap_gfn_range(), and clean up the mess
that it's become by extracting the legacy MMU logic to a separate
helper.

Sean Christopherson (3):
KVM: x86/mmu: Drop a redundant, broken remote TLB flush
KVM: x86/mmu: Drop a redundant remote TLB flush in kvm_zap_gfn_range()
KVM: x86/mmu: Extract zapping of rmaps for gfn range to separate
helper

arch/x86/kvm/mmu/mmu.c | 61 ++++++++++++++++++++++--------------------
1 file changed, 32 insertions(+), 29 deletions(-)

--
2.33.0.1079.g6e70778dc9-goog


2021-10-22 01:03:39

by Sean Christopherson

Subject: [PATCH 3/3] KVM: x86/mmu: Extract zapping of rmaps for gfn range to separate helper

Extract the zapping of rmaps, a.k.a. legacy MMU, for a gfn range to a
separate helper to clean up the unholy mess that kvm_zap_gfn_range() has
become. In addition to deep nesting, the rmaps zapping spreads out the
declaration of several variables and obscures the control flow. Clean up
the code now so that future work to improve the memslots implementation
doesn't need to deal with it.

Cc: Maciej S. Szmigiero <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 52 ++++++++++++++++++++++++------------------
1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e8b8a665e2e9..182d35a216d4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5667,40 +5667,48 @@ void kvm_mmu_uninit_vm(struct kvm *kvm)
kvm_mmu_uninit_tdp_mmu(kvm);
}

+static bool __kvm_zap_rmaps(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
+{
+ const struct kvm_memory_slot *memslot;
+ struct kvm_memslots *slots;
+ bool flush = false;
+ gfn_t start, end;
+ int i;
+
+ if (!kvm_memslots_have_rmaps(kvm))
+ return flush;
+
+ for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+ slots = __kvm_memslots(kvm, i);
+ kvm_for_each_memslot(memslot, slots) {
+ start = max(gfn_start, memslot->base_gfn);
+ end = min(gfn_end, memslot->base_gfn + memslot->npages);
+ if (start >= end)
+ continue;
+
+ flush = slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
+ PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
+ start, end - 1, true, flush);
+ }
+ }
+
+ return flush;
+}
+
/*
* Invalidate (zap) SPTEs that cover GFNs from gfn_start and up to gfn_end
* (not including it)
*/
void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
{
- struct kvm_memslots *slots;
- struct kvm_memory_slot *memslot;
+ bool flush;
int i;
- bool flush = false;

write_lock(&kvm->mmu_lock);

kvm_inc_notifier_count(kvm, gfn_start, gfn_end);

- if (kvm_memslots_have_rmaps(kvm)) {
- for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
- slots = __kvm_memslots(kvm, i);
- kvm_for_each_memslot(memslot, slots) {
- gfn_t start, end;
-
- start = max(gfn_start, memslot->base_gfn);
- end = min(gfn_end, memslot->base_gfn + memslot->npages);
- if (start >= end)
- continue;
-
- flush = slot_handle_level_range(kvm,
- (const struct kvm_memory_slot *) memslot,
- kvm_zap_rmapp, PG_LEVEL_4K,
- KVM_MAX_HUGEPAGE_LEVEL, start,
- end - 1, true, flush);
- }
- }
- }
+ flush = __kvm_zap_rmaps(kvm, gfn_start, gfn_end);

if (is_tdp_mmu_enabled(kvm)) {
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
--
2.33.0.1079.g6e70778dc9-goog

2021-10-22 01:04:36

by Sean Christopherson

Subject: [PATCH 2/3] KVM: x86/mmu: Drop a redundant remote TLB flush in kvm_zap_gfn_range()

Remove an unnecessary remote TLB flush in kvm_zap_gfn_range() now that
said function holds mmu_lock for write for its entire duration. The
flush was added by the now-reverted commit to allow TDP MMU to flush while
holding mmu_lock for read, as the transition from write=>read required
dropping the lock and thus a pending flush needed to be serviced.

Fixes: 5a324c24b638 ("Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock"")
Cc: Maxim Levitsky <[email protected]>
Cc: Maciej S. Szmigiero <[email protected]>
Cc: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f82b192bba0b..e8b8a665e2e9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5700,9 +5700,6 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
end - 1, true, flush);
}
}
- if (flush)
- kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
- gfn_end - gfn_start);
}

if (is_tdp_mmu_enabled(kvm)) {
--
2.33.0.1079.g6e70778dc9-goog

2021-10-22 11:47:11

by Maciej S. Szmigiero

Subject: Re: [PATCH 2/3] KVM: x86/mmu: Drop a redundant remote TLB flush in kvm_zap_gfn_range()

On 22.10.2021 03:00, Sean Christopherson wrote:
> Remove an unnecessary remote TLB flush in kvm_zap_gfn_range() now that
> said function holds mmu_lock for write for its entire duration. The
> flush was added by the now-reverted commit to allow TDP MMU to flush while
> holding mmu_lock for read, as the transition from write=>read required
> dropping the lock and thus a pending flush needed to be serviced.
>
> Fixes: 5a324c24b638 ("Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock"")
> Cc: Maxim Levitsky <[email protected]>
> Cc: Maciej S. Szmigiero <[email protected]>
> Cc: Ben Gardon <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/mmu/mmu.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index f82b192bba0b..e8b8a665e2e9 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5700,9 +5700,6 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
> end - 1, true, flush);
> }
> }
> - if (flush)
> - kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
> - gfn_end - gfn_start);
> }
>
> if (is_tdp_mmu_enabled(kvm)) {
>

Unfortunately, it seems that a pending flush from __kvm_zap_rmaps()
can be reset back to false by the following line:
> flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start, gfn_end, flush);

kvm_tdp_mmu_zap_gfn_range() calls __kvm_tdp_mmu_zap_gfn_range with
"can_yield" set to true, which passes it to zap_gfn_range, which has
this code:
> if (can_yield &&
> tdp_mmu_iter_cond_resched(kvm, &iter, flush, shared)) {
> flush = false;
> continue;
> }

Thanks,
Maciej

2021-10-22 15:14:29

by Paolo Bonzini

Subject: Re: [PATCH 0/3] KVM: x86/mmu: Clean up kvm_zap_gfn_range()

On 22/10/21 03:00, Sean Christopherson wrote:
> Fix overzealous flushing in kvm_zap_gfn_range(), and clean up the mess
> that it's become by extracting the legacy MMU logic to a separate
> helper.
>
> Sean Christopherson (3):
> KVM: x86/mmu: Drop a redundant, broken remote TLB flush
> KVM: x86/mmu: Drop a redundant remote TLB flush in kvm_zap_gfn_range()
> KVM: x86/mmu: Extract zapping of rmaps for gfn range to separate
> helper

Queued, with Cc: stable for patch 1. (The other two patches depend on
it, so I don't feel like including it in 5.15-rc).

Paolo

2021-10-25 15:42:31

by Sean Christopherson

Subject: Re: [PATCH 2/3] KVM: x86/mmu: Drop a redundant remote TLB flush in kvm_zap_gfn_range()

On Fri, Oct 22, 2021, Maciej S. Szmigiero wrote:
> On 22.10.2021 03:00, Sean Christopherson wrote:
> > Remove an unnecessary remote TLB flush in kvm_zap_gfn_range() now that
> > said function holds mmu_lock for write for its entire duration. The
> > flush was added by the now-reverted commit to allow TDP MMU to flush while
> > holding mmu_lock for read, as the transition from write=>read required
> > dropping the lock and thus a pending flush needed to be serviced.
> >
> > Fixes: 5a324c24b638 ("Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock"")
> > Cc: Maxim Levitsky <[email protected]>
> > Cc: Maciej S. Szmigiero <[email protected]>
> > Cc: Ben Gardon <[email protected]>
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/kvm/mmu/mmu.c | 3 ---
> > 1 file changed, 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index f82b192bba0b..e8b8a665e2e9 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -5700,9 +5700,6 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
> > end - 1, true, flush);
> > }
> > }
> > - if (flush)
> > - kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
> > - gfn_end - gfn_start);
> > }
> > if (is_tdp_mmu_enabled(kvm)) {
> >
>
> Unfortunately, it seems that a pending flush from __kvm_zap_rmaps()
> can be reset back to false by the following line:
> > flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start, gfn_end, flush);
>
> kvm_tdp_mmu_zap_gfn_range() calls __kvm_tdp_mmu_zap_gfn_range with
> "can_yield" set to true, which passes it to zap_gfn_range, which has
> this code:
> > if (can_yield &&
> > tdp_mmu_iter_cond_resched(kvm, &iter, flush, shared)) {
> > flush = false;
> > continue;
> > }

That's working by design. If the MMU (legacy or TDP) yields during zap, it _must_
flush before dropping mmu_lock so that any SPTE modifications are guaranteed to be
observed by all vCPUs. Clearing "flush" is deliberate/correct, as another flush
is needed if and only if additional SPTE modifications are made.


static inline bool tdp_mmu_iter_cond_resched(struct kvm *kvm,
struct tdp_iter *iter, bool flush,
bool shared)
{
/* Ensure forward progress has been made before yielding. */
if (iter->next_last_level_gfn == iter->yielded_gfn)
return false;

if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
rcu_read_unlock();

if (flush)
kvm_flush_remote_tlbs(kvm); <------- ****** HERE ******

if (shared)
cond_resched_rwlock_read(&kvm->mmu_lock);
else
cond_resched_rwlock_write(&kvm->mmu_lock);

rcu_read_lock();

WARN_ON(iter->gfn > iter->next_last_level_gfn);

tdp_iter_restart(iter);

return true;
}

return false;
}

2021-10-26 17:52:08

by Maciej S. Szmigiero

Subject: Re: [PATCH 2/3] KVM: x86/mmu: Drop a redundant remote TLB flush in kvm_zap_gfn_range()

On 25.10.2021 17:39, Sean Christopherson wrote:
> On Fri, Oct 22, 2021, Maciej S. Szmigiero wrote:
>> On 22.10.2021 03:00, Sean Christopherson wrote:
>>> Remove an unnecessary remote TLB flush in kvm_zap_gfn_range() now that
>>> said function holds mmu_lock for write for its entire duration. The
>>> flush was added by the now-reverted commit to allow TDP MMU to flush while
>>> holding mmu_lock for read, as the transition from write=>read required
>>> dropping the lock and thus a pending flush needed to be serviced.
>>>
>>> Fixes: 5a324c24b638 ("Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock"")
>>> Cc: Maxim Levitsky <[email protected]>
>>> Cc: Maciej S. Szmigiero <[email protected]>
>>> Cc: Ben Gardon <[email protected]>
>>> Signed-off-by: Sean Christopherson <[email protected]>
>>> ---
>>> arch/x86/kvm/mmu/mmu.c | 3 ---
>>> 1 file changed, 3 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>>> index f82b192bba0b..e8b8a665e2e9 100644
>>> --- a/arch/x86/kvm/mmu/mmu.c
>>> +++ b/arch/x86/kvm/mmu/mmu.c
>>> @@ -5700,9 +5700,6 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
>>> end - 1, true, flush);
>>> }
>>> }
>>> - if (flush)
>>> - kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
>>> - gfn_end - gfn_start);
>>> }
>>> if (is_tdp_mmu_enabled(kvm)) {
>>>
>>
>> Unfortunately, it seems that a pending flush from __kvm_zap_rmaps()
>> can be reset back to false by the following line:
>>> flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start, gfn_end, flush);
>>
>> kvm_tdp_mmu_zap_gfn_range() calls __kvm_tdp_mmu_zap_gfn_range with
>> "can_yield" set to true, which passes it to zap_gfn_range, which has
>> this code:
>>> if (can_yield &&
>>> tdp_mmu_iter_cond_resched(kvm, &iter, flush, shared)) {
>>> flush = false;
>>> continue;
>>> }
>
> That's working by design. If the MMU (legacy or TDP) yields during zap, it _must_
> flush before dropping mmu_lock so that any SPTE modifications are guaranteed to be
> observed by all vCPUs. Clearing "flush" is deliberate/correct, as another flush
> is needed if and only if additional SPTE modifications are made.
>

Got it, thanks.

Maciej