2020-06-23 19:38:31

by Sean Christopherson

Subject: [PATCH v2 0/4] KVM: x86/mmu: Zapping and recycling cleanups

Semi-random, but related, changes that deal with the handling of active
root shadow pages during zapping and the zapping of arbitrary/old pages.

Patch 1 changes the low level handling to keep zapped active roots off the
active page list. KVM already relies on the vCPU to explicitly free the
root, putting invalid root pages back on the list is just a quirk of the
implementation.

Patch 2 reworks the MMU page recycling to batch zap pages instead of
zapping them one at a time. This provides better handling for active root
pages and also avoids multiple remote TLB flushes.

Patch 3 applies the batch zapping to the .shrink_scan() path. This is a
significant change in behavior, i.e. is the scariest of the changes, but
unless I'm missing something it provides the intended functionality that
has been lacking since shrinker support was first added.

Patch 4 changes the page fault handlers to return an error to userspace
instead of restarting the guest if there are no MMU pages available. This
is dependent on patch 2 as theoretically the old recycling flow could
prematurely bail if it encountered an active root.

v2:
- Add a comment for the list shenanigans in patch 1. [Paolo]
- Add patches 2-4.
- Rebased to kvm/queue, commit a037ff353ba6 ("Merge branch ...")

Sean Christopherson (4):
KVM: x86/mmu: Don't put invalid SPs back on the list of active pages
KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages
KVM: x86/mmu: Batch zap MMU pages when shrinking the slab
KVM: x86/mmu: Exit to userspace on make_mmu_pages_available() error

arch/x86/kvm/mmu/mmu.c | 94 +++++++++++++++++++++-------------
arch/x86/kvm/mmu/paging_tmpl.h | 3 +-
2 files changed, 61 insertions(+), 36 deletions(-)

--
2.26.0
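
A rough sketch of the patch 3 approach, not the actual diff: the shrinker's
scan callback hands its scan budget to the batching helper introduced in
patch 2 instead of zapping one page per invocation. Names are borrowed from
the existing mmu_shrink_scan() callback and the kvm_mmu_zap_oldest_mmu_pages()
helper added in patch 2; the vm_list iteration, SRCU locking, and bailout
conditions are elided, and pick_victim_vm() is a placeholder for that logic.

static unsigned long
mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
{
	/* Placeholder for the real "walk vm_list, pick a victim VM" logic. */
	struct kvm *kvm = pick_victim_vm();
	unsigned long freed;

	spin_lock(&kvm->mmu_lock);

	/* Zap up to sc->nr_to_scan of the oldest pages in a single batch. */
	freed = kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan);

	spin_unlock(&kvm->mmu_lock);

	return freed ? freed : SHRINK_STOP;
}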


2020-06-23 19:39:15

by Sean Christopherson

Subject: [PATCH v2 1/4] KVM: x86/mmu: Don't put invalid SPs back on the list of active pages

Remove a root shadow page that still has active users from the list of
active pages when it is invalidated, instead of throwing it back on that
list. Invalid active root pages will be explicitly freed by
mmu_free_root_page() when the root_count hits zero, i.e. they don't need
to be put on the active list to avoid leakage.

Use sp->role.invalid to detect that a shadow page has already been
zapped, i.e. is not on a list.

WARN if an invalid page is encountered when zapping pages, as it should
now be impossible.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3dd0af7e7515..8e7df4ed4b55 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2757,10 +2757,23 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
if (!sp->root_count) {
/* Count self */
(*nr_zapped)++;
- list_move(&sp->link, invalid_list);
+
+ /*
+ * Already invalid pages (previously active roots) are not on
+ * the active page list. See list_del() in the "else" case of
+ * !sp->root_count.
+ */
+ if (sp->role.invalid)
+ list_add(&sp->link, invalid_list);
+ else
+ list_move(&sp->link, invalid_list);
kvm_mod_used_mmu_pages(kvm, -1);
} else {
- list_move(&sp->link, &kvm->arch.active_mmu_pages);
+ /*
+ * Remove the active root from the active page list, the root
+ * will be explicitly freed when the root_count hits zero.
+ */
+ list_del(&sp->link);

/*
* Obsolete pages cannot be used on any vCPUs, see the comment
@@ -5727,12 +5740,11 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
break;

/*
- * Skip invalid pages with a non-zero root count, zapping pages
- * with a non-zero root count will never succeed, i.e. the page
- * will get thrown back on active_mmu_pages and we'll get stuck
- * in an infinite loop.
+ * Invalid pages should never land back on the list of active
+ * pages. Skip the bogus page, otherwise we'll get stuck in an
+ * infinite loop if the page gets put back on the list (again).
*/
- if (sp->role.invalid && sp->root_count)
+ if (WARN_ON(sp->role.invalid))
continue;

/*
@@ -6010,7 +6022,7 @@ void kvm_mmu_zap_all(struct kvm *kvm)
spin_lock(&kvm->mmu_lock);
restart:
list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
- if (sp->role.invalid && sp->root_count)
+ if (WARN_ON(sp->role.invalid))
continue;
if (__kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list, &ign))
goto restart;
--
2.26.0
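
For reference on the list_add() vs. list_move() distinction above: list_move()
is defined in include/linux/list.h as a delete followed by an add, and
list_del() poisons the entry's next/prev pointers, so calling list_move() on a
page that was already list_del()'ed (a previously zapped root) would
dereference poisoned pointers. An off-list entry has to be re-added with a
plain list_add(). The helper, quoted here with a short annotation added:

static inline void list_move(struct list_head *list, struct list_head *head)
{
	__list_del_entry(list);		/* reads list->prev and list->next */
	list_add(list, head);
}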

2020-06-23 19:40:07

by Sean Christopherson

Subject: [PATCH v2 2/4] KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages

Collect MMU pages for zapping in a loop when making MMU pages available,
and skip over active roots when doing so as zapping an active root can
never immediately free up a page. Batching the zapping avoids multiple
remote TLB flushes and remedies the issue where the loop would bail
early if an active root was encountered.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 58 ++++++++++++++++++++++++++++++------------
1 file changed, 42 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8e7df4ed4b55..8c85a3a178f4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2838,20 +2838,51 @@ static bool prepare_zap_oldest_mmu_page(struct kvm *kvm,
return kvm_mmu_prepare_zap_page(kvm, sp, invalid_list);
}

+static unsigned long kvm_mmu_zap_oldest_mmu_pages(struct kvm *kvm,
+ unsigned long nr_to_zap)
+{
+ unsigned long total_zapped = 0;
+ struct kvm_mmu_page *sp, *tmp;
+ LIST_HEAD(invalid_list);
+ bool unstable;
+ int nr_zapped;
+
+ if (list_empty(&kvm->arch.active_mmu_pages))
+ return 0;
+
+restart:
+ list_for_each_entry_safe(sp, tmp, &kvm->arch.active_mmu_pages, link) {
+ /*
+ * Don't zap active root pages, the page itself can't be freed
+ * and zapping it will just force vCPUs to realloc and reload.
+ */
+ if (sp->root_count)
+ continue;
+
+ unstable = __kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list,
+ &nr_zapped);
+ total_zapped += nr_zapped;
+ if (total_zapped >= nr_to_zap)
+ break;
+
+ if (unstable)
+ goto restart;
+ }
+
+ kvm_mmu_commit_zap_page(kvm, &invalid_list);
+
+ kvm->stat.mmu_recycled += total_zapped;
+ return total_zapped;
+}
+
static int make_mmu_pages_available(struct kvm_vcpu *vcpu)
{
- LIST_HEAD(invalid_list);
+ unsigned long avail = kvm_mmu_available_pages(vcpu->kvm);

- if (likely(kvm_mmu_available_pages(vcpu->kvm) >= KVM_MIN_FREE_MMU_PAGES))
+ if (likely(avail >= KVM_MIN_FREE_MMU_PAGES))
return 0;

- while (kvm_mmu_available_pages(vcpu->kvm) < KVM_REFILL_PAGES) {
- if (!prepare_zap_oldest_mmu_page(vcpu->kvm, &invalid_list))
- break;
-
- ++vcpu->kvm->stat.mmu_recycled;
- }
- kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+ kvm_mmu_zap_oldest_mmu_pages(vcpu->kvm, KVM_REFILL_PAGES - avail);

if (!kvm_mmu_available_pages(vcpu->kvm))
return -ENOSPC;
@@ -2864,17 +2895,12 @@ static int make_mmu_pages_available(struct kvm_vcpu *vcpu)
*/
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long goal_nr_mmu_pages)
{
- LIST_HEAD(invalid_list);
-
spin_lock(&kvm->mmu_lock);

if (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages) {
- /* Need to free some mmu pages to achieve the goal. */
- while (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages)
- if (!prepare_zap_oldest_mmu_page(kvm, &invalid_list))
- break;
+ kvm_mmu_zap_oldest_mmu_pages(kvm, kvm->arch.n_used_mmu_pages -
+ goal_nr_mmu_pages);

- kvm_mmu_commit_zap_page(kvm, &invalid_list);
goal_nr_mmu_pages = kvm->arch.n_used_mmu_pages;
}

--
2.26.0
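
As a usage note, kvm_mmu_change_mmu_pages() is reachable from userspace via
the KVM_SET_NR_MMU_PAGES VM ioctl, so the batched zap can be exercised by
lowering a VM's shadow-page budget below its current usage. A minimal
userspace sketch, with VM setup and error handling elided; the value passed
must be at least KVM_MIN_ALLOC_MMU_PAGES:

#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Shrink the VM's shadow MMU page budget.  If the VM currently has more
 * pages allocated than the new limit, KVM zaps the oldest pages in one
 * batch via kvm_mmu_change_mmu_pages() -> kvm_mmu_zap_oldest_mmu_pages().
 */
static int set_mmu_page_limit(int vm_fd, unsigned long nr_pages)
{
	return ioctl(vm_fd, KVM_SET_NR_MMU_PAGES, nr_pages);
}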

2020-06-23 19:40:21

by Sean Christopherson

Subject: [PATCH v2 4/4] KVM: x86/mmu: Exit to userspace on make_mmu_pages_available() error

Propagate any error returned by make_mmu_pages_available() out to
userspace instead of resuming the guest if the error occurs while
handling a page fault. Now that zapping the oldest MMU pages skips
active roots, i.e. fails if and only if there are no zappable pages,
there is no chance for a false positive, i.e. no chance of returning a
spurious error to userspace.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 3 ++-
arch/x86/kvm/mmu/paging_tmpl.h | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4d40b21a67bd..82086d9eecb0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4157,7 +4157,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
spin_lock(&vcpu->kvm->mmu_lock);
if (mmu_notifier_retry(vcpu->kvm, mmu_seq))
goto out_unlock;
- if (make_mmu_pages_available(vcpu) < 0)
+ r = make_mmu_pages_available(vcpu);
+ if (r)
goto out_unlock;
r = __direct_map(vcpu, gpa, write, map_writable, max_level, pfn,
prefault, is_tdp && lpage_disallowed);
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 58234bfaca07..a2db6971231d 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -865,7 +865,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
goto out_unlock;

kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
- if (make_mmu_pages_available(vcpu) < 0)
+ r = make_mmu_pages_available(vcpu);
+ if (r)
goto out_unlock;
r = FNAME(fetch)(vcpu, addr, &walker, write_fault, max_level, pfn,
map_writable, prefault, lpage_disallowed);
--
2.26.0
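
For context on what exiting to userspace looks like from the VMM side: the
-ENOSPC returned by make_mmu_pages_available() (see patch 2) now propagates
out of the page fault handler and ultimately out of the KVM_RUN ioctl. A
rough userspace-side sketch, with vCPU setup elided; the errno seen is
whatever error make_mmu_pages_available() returned, -ENOSPC with this series:

#include <errno.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * With this series, failing to make MMU pages available surfaces as a
 * failed KVM_RUN instead of the guest silently retrying the page fault.
 */
static int run_vcpu(int vcpu_fd)
{
	int ret = ioctl(vcpu_fd, KVM_RUN, 0);

	if (ret < 0 && errno == ENOSPC)
		fprintf(stderr, "KVM: no MMU pages available\n");

	return ret;
}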

2020-07-03 17:20:31

by Paolo Bonzini

Subject: Re: [PATCH v2 0/4] KVM: x86/mmu: Zapping and recycling cleanups

On 23/06/20 21:35, Sean Christopherson wrote:
> Semi-random, but related, changes that deal with the handling of active
> root shadow pages during zapping and the zapping of arbitrary/old pages.
>
> Patch 1 changes the low level handling to keep zapped active roots off the
> active page list. KVM already relies on the vCPU to explicitly free the
> root, putting invalid root pages back on the list is just a quirk of the
> implementation.
>
> Patch 2 reworks the MMU page recycling to batch zap pages instead of
> zapping them one at a time. This provides better handling for active root
> pages and also avoids multiple remote TLB flushes.
>
> Patch 3 applies the batch zapping to the .shrink_scan() path. This is a
> significant change in behavior, i.e. is the scariest of the changes, but
> unless I'm missing something it provides the intended functionality that
> has been lacking since shrinker support was first added.
>
> Patch 4 changes the page fault handlers to return an error to userspace
> instead of restarting the guest if there are no MMU pages available. This
> is dependent on patch 2 as theoretically the old recycling flow could
> prematurely bail if it encountered an active root.
>
> v2:
> - Add a comment for the list shenanigans in patch 1. [Paolo]
> - Add patches 2-4.
> - Rebased to kvm/queue, commit a037ff353ba6 ("Merge branch ...")
>
> Sean Christopherson (4):
> KVM: x86/mmu: Don't put invalid SPs back on the list of active pages
> KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages
> KVM: x86/mmu: Batch zap MMU pages when shrinking the slab
> KVM: x86/mmu: Exit to userspace on make_mmu_pages_available() error
>
> arch/x86/kvm/mmu/mmu.c | 94 +++++++++++++++++++++-------------
> arch/x86/kvm/mmu/paging_tmpl.h | 3 +-
> 2 files changed, 61 insertions(+), 36 deletions(-)
>

Queued, thanks.

Paolo