2023-05-13 00:54:55

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 00/28] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups

Fix a variety of found-by-inspection bugs in KVMGT, and overhaul KVM's
page-track APIs to provide a leaner and cleaner interface. The motivation
for this series is to (significantly) reduce the number of KVM APIs that
KVMGT uses, with a long-term goal of making all kvm_host.h headers
KVM-internal.

As always for this series, the KVMGT changes are compile tested only.

Based on "git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-6.4-1".

v3:
- Collect reviewed/tested tags (I apologize if I missed any, I manually
gathered them this time due to a goof in my workflow). [Yan]
- Drop check on max KVM paging size from KVMGT. [Yan]
- Drop the explicit change on THP pages, and instead validate that the
pfns (not struct page pointers) are contiguous. [Yan]
- Fix buggy intel_gvt_dma_map_guest_page() usage by eliminating a helper
for shadowing 2MiB GTT entries. [Yan]
- Move kvm_arch_flush_shadow_{all,memslot}() to mmu.c instead of exposing
kvm_mmu_zap_all_fast() outside of mmu.c. [Yan]
- Fix an alignment goof in hlist_for_each_entry_srcu() usage. [Yan]
- Wrap full definition of external page track structures with
CONFIG_KVM_EXTERNAL_WRITE_TRACKING. [Yan]

v2:
- https://lore.kernel.org/all/[email protected]
- Reuse vgpu_lock to protect gfn hash instead of introducing a new (and
buggy) mutext. [Yan]
- Remove a spurious return from kvm_page_track_init(). [Yan]
- Take @kvm directly in the inner __kvm_page_track_write(). [Yan]
- Delete the gfn sanity check that relies on kvm_is_visible_gfn() instead
of providing a dedicated interface. [Yan]

v1: https://lore.kernel.org/lkml/[email protected]

Sean Christopherson (24):
drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
drm/i915/gvt: Verify hugepages are contiguous in physical address
space
drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
drm/i915/gvt: Explicitly check that vGPU is attached before shadowing
drm/i915/gvt: Error out on an attempt to shadowing an unknown GTT
entry type
drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M
GTT
drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
drm/i915/gvt: Protect gfn hash table with vgpu_lock
KVM: x86/mmu: Move kvm_arch_flush_shadow_{all,memslot}() to mmu.c
KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot
change
KVM: x86/mmu: Don't bounce through page-track mechanism for guest PTEs
KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
KVM: x86: Reject memslot MOVE operations if KVMGT is attached
drm/i915/gvt: Don't bother removing write-protection on to-be-deleted
slot
KVM: x86/mmu: Move KVM-only page-track declarations to internal header
KVM: x86/mmu: Use page-track notifiers iff there are external users
KVM: x86/mmu: Drop infrastructure for multiple page-track modes
KVM: x86/mmu: Rename page-track APIs to reflect the new reality
KVM: x86/mmu: Assert that correct locks are held for page
write-tracking
KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
drm/i915/gvt: Drop final dependencies on KVM internal details

Yan Zhao (4):
drm/i915/gvt: remove interface intel_gvt_is_valid_gfn
KVM: x86: Add a new page-track hook to handle memslot deletion
drm/i915/gvt: switch from ->track_flush_slot() to
->track_remove_region()
KVM: x86: Remove the unused page-track hook track_flush_slot()

arch/x86/include/asm/kvm_host.h | 16 +-
arch/x86/include/asm/kvm_page_track.h | 73 +++-----
arch/x86/kvm/mmu.h | 2 +
arch/x86/kvm/mmu/mmu.c | 51 +++--
arch/x86/kvm/mmu/page_track.c | 256 +++++++++++++-------------
arch/x86/kvm/mmu/page_track.h | 58 ++++++
arch/x86/kvm/x86.c | 22 +--
drivers/gpu/drm/i915/gvt/gtt.c | 102 ++--------
drivers/gpu/drm/i915/gvt/gtt.h | 1 -
drivers/gpu/drm/i915/gvt/gvt.h | 3 +-
drivers/gpu/drm/i915/gvt/kvmgt.c | 117 +++++-------
drivers/gpu/drm/i915/gvt/page_track.c | 10 +-
12 files changed, 320 insertions(+), 391 deletions(-)
create mode 100644 arch/x86/kvm/mmu/page_track.h


base-commit: b3c98052d46948a8d65d2778c7f306ff38366aac
--
2.40.1.606.ga4b1b128d6-goog



2023-05-13 00:56:59

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 19/28] KVM: x86: Remove the unused page-track hook track_flush_slot()

From: Yan Zhao <[email protected]>

Remove ->track_remove_slot(), there are no longer any users and it's
unlikely a "flush" hook will ever be the correct API to provide to an
external page-track user.

Cc: Zhenyu Wang <[email protected]>
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Yan Zhao <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_page_track.h | 11 -----------
arch/x86/kvm/mmu/mmu.c | 2 --
arch/x86/kvm/mmu/page_track.c | 26 --------------------------
3 files changed, 39 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index cfd36c22b467..5c348ffdc194 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -33,16 +33,6 @@ struct kvm_page_track_notifier_node {
*/
void (*track_write)(gpa_t gpa, const u8 *new, int bytes,
struct kvm_page_track_notifier_node *node);
- /*
- * It is called when memory slot is being moved or removed
- * users can drop write-protection for the pages in that memory slot
- *
- * @kvm: the kvm where memory slot being moved or removed
- * @slot: the memory slot being moved or removed
- * @node: this node
- */
- void (*track_flush_slot)(struct kvm *kvm, struct kvm_memory_slot *slot,
- struct kvm_page_track_notifier_node *node);

/*
* Invoked when a memory region is removed from the guest. Or in KVM
@@ -85,7 +75,6 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
struct kvm_page_track_notifier_node *n);
void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
int bytes);
-void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot);
void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot);

bool kvm_page_track_has_external_user(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index af3e562d3106..3f9030650c3d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6734,8 +6734,6 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot)
{
kvm_mmu_zap_all_fast(kvm);
-
- kvm_page_track_flush_slot(kvm, slot);
}

void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index d971c28be99d..2a6ab7c455c0 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -278,32 +278,6 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
kvm_mmu_track_write(vcpu, gpa, new, bytes);
}

-/*
- * Notify the node that memory slot is being removed or moved so that it can
- * drop write-protection for the pages in the memory slot.
- *
- * The node should figure out it has any write-protected pages in this slot
- * by itself.
- */
-void kvm_page_track_flush_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
-{
- struct kvm_page_track_notifier_head *head;
- struct kvm_page_track_notifier_node *n;
- int idx;
-
- head = &kvm->arch.track_notifier_head;
-
- if (hlist_empty(&head->track_notifier_list))
- return;
-
- idx = srcu_read_lock(&head->track_srcu);
- hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
- srcu_read_lock_held(&head->track_srcu))
- if (n->track_flush_slot)
- n->track_flush_slot(kvm, slot, n);
- srcu_read_unlock(&head->track_srcu, idx);
-}
-
/*
* Notify external page track nodes that a memory region is being removed from
* the VM, e.g. so that users can free any associated metadata.
--
2.40.1.606.ga4b1b128d6-goog


2023-05-13 00:57:36

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 27/28] KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers

Get/put references to KVM when a page-track notifier is (un)registered
instead of relying on the caller to do so. Forcing the caller to do the
bookkeeping is unnecessary and adds one more thing for users to get
wrong, e.g. see commit 9ed1fdee9ee3 ("drm/i915/gvt: Get reference to KVM
iff attachment to VM is successful").

Reviewed-by: Yan Zhao <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_page_track.h | 11 +++++------
arch/x86/kvm/mmu/page_track.c | 18 ++++++++++++------
drivers/gpu/drm/i915/gvt/kvmgt.c | 17 +++++++----------
3 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 4afab697e21c..3d040741044b 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -44,12 +44,11 @@ struct kvm_page_track_notifier_node {
struct kvm_page_track_notifier_node *node);
};

-void
-kvm_page_track_register_notifier(struct kvm *kvm,
- struct kvm_page_track_notifier_node *n);
-void
-kvm_page_track_unregister_notifier(struct kvm *kvm,
- struct kvm_page_track_notifier_node *n);
+int kvm_page_track_register_notifier(struct kvm *kvm,
+ struct kvm_page_track_notifier_node *n);
+void kvm_page_track_unregister_notifier(struct kvm *kvm,
+ struct kvm_page_track_notifier_node *n);
+
int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn);
int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn);
#else
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 2a64df38ccab..fd04e618ad2d 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -157,17 +157,22 @@ int kvm_page_track_init(struct kvm *kvm)
* register the notifier so that event interception for the tracked guest
* pages can be received.
*/
-void
-kvm_page_track_register_notifier(struct kvm *kvm,
- struct kvm_page_track_notifier_node *n)
+int kvm_page_track_register_notifier(struct kvm *kvm,
+ struct kvm_page_track_notifier_node *n)
{
struct kvm_page_track_notifier_head *head;

+ if (!kvm || kvm->mm != current->mm)
+ return -ESRCH;
+
+ kvm_get_kvm(kvm);
+
head = &kvm->arch.track_notifier_head;

write_lock(&kvm->mmu_lock);
hlist_add_head_rcu(&n->node, &head->track_notifier_list);
write_unlock(&kvm->mmu_lock);
+ return 0;
}
EXPORT_SYMBOL_GPL(kvm_page_track_register_notifier);

@@ -175,9 +180,8 @@ EXPORT_SYMBOL_GPL(kvm_page_track_register_notifier);
* stop receiving the event interception. It is the opposed operation of
* kvm_page_track_register_notifier().
*/
-void
-kvm_page_track_unregister_notifier(struct kvm *kvm,
- struct kvm_page_track_notifier_node *n)
+void kvm_page_track_unregister_notifier(struct kvm *kvm,
+ struct kvm_page_track_notifier_node *n)
{
struct kvm_page_track_notifier_head *head;

@@ -187,6 +191,8 @@ kvm_page_track_unregister_notifier(struct kvm *kvm,
hlist_del_rcu(&n->node);
write_unlock(&kvm->mmu_lock);
synchronize_srcu(&head->track_srcu);
+
+ kvm_put_kvm(kvm);
}
EXPORT_SYMBOL_GPL(kvm_page_track_unregister_notifier);

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index b995d75a19c3..597ffc9d12fd 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -653,21 +653,19 @@ static bool __kvmgt_vgpu_exist(struct intel_vgpu *vgpu)
static int intel_vgpu_open_device(struct vfio_device *vfio_dev)
{
struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev);
-
- if (!vgpu->vfio_device.kvm ||
- vgpu->vfio_device.kvm->mm != current->mm) {
- gvt_vgpu_err("KVM is required to use Intel vGPU\n");
- return -ESRCH;
- }
+ int ret;

if (__kvmgt_vgpu_exist(vgpu))
return -EEXIST;

vgpu->track_node.track_write = kvmgt_page_track_write;
vgpu->track_node.track_remove_region = kvmgt_page_track_remove_region;
- kvm_get_kvm(vgpu->vfio_device.kvm);
- kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
- &vgpu->track_node);
+ ret = kvm_page_track_register_notifier(vgpu->vfio_device.kvm,
+ &vgpu->track_node);
+ if (ret) {
+ gvt_vgpu_err("KVM is required to use Intel vGPU\n");
+ return ret;
+ }

set_bit(INTEL_VGPU_STATUS_ATTACHED, vgpu->status);

@@ -702,7 +700,6 @@ static void intel_vgpu_close_device(struct vfio_device *vfio_dev)

kvm_page_track_unregister_notifier(vgpu->vfio_device.kvm,
&vgpu->track_node);
- kvm_put_kvm(vgpu->vfio_device.kvm);

kvmgt_protect_table_destroy(vgpu);
gvt_cache_destroy(vgpu);
--
2.40.1.606.ga4b1b128d6-goog


2023-05-13 00:58:02

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 01/28] drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"

Check that the pfn found by gfn_to_pfn() is actually backed by "struct
page" memory prior to retrieving and dereferencing the page. KVM
supports backing guest memory with VM_PFNMAP, VM_IO, etc., and so
there is no guarantee the pfn returned by gfn_to_pfn() has an associated
"struct page".

Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
Reviewed-by: Yan Zhao <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
drivers/gpu/drm/i915/gvt/gtt.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 4ec85308379a..58b9b316ae46 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1183,6 +1183,10 @@ static int is_2MB_gtt_possible(struct intel_vgpu *vgpu,
pfn = gfn_to_pfn(vgpu->vfio_device.kvm, ops->get_pfn(entry));
if (is_error_noslot_pfn(pfn))
return -EINVAL;
+
+ if (!pfn_valid(pfn))
+ return -EINVAL;
+
return PageTransHuge(pfn_to_page(pfn));
}

--
2.40.1.606.ga4b1b128d6-goog


2023-05-13 00:59:50

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 11/28] KVM: x86/mmu: Move kvm_arch_flush_shadow_{all,memslot}() to mmu.c

Move x86's implementation of kvm_arch_flush_shadow_{all,memslot}() into
mmu.c, and make kvm_mmu_zap_all() static as it was globally visible only
for kvm_arch_flush_shadow_all(). This will allow refactoring
kvm_arch_flush_shadow_memslot() to call kvm_mmu_zap_all() directly without
having to expose kvm_mmu_zap_all_fast() outside of mmu.c. Keeping
everything in mmu.c will also likely simplify supporting TDX, which
intends to do zap only relevant SPTEs on memslot updates.

No functional change intended.

Suggested-by: Yan Zhao <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/mmu/mmu.c | 13 ++++++++++++-
arch/x86/kvm/x86.c | 11 -----------
3 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fb9d1f2d6136..564a29153cee 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1832,7 +1832,6 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
const struct kvm_memory_slot *memslot);
void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
const struct kvm_memory_slot *memslot);
-void kvm_mmu_zap_all(struct kvm *kvm);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c8961f45e3b1..2e4476d38377 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6717,7 +6717,7 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
*/
}

-void kvm_mmu_zap_all(struct kvm *kvm)
+static void kvm_mmu_zap_all(struct kvm *kvm)
{
struct kvm_mmu_page *sp, *node;
LIST_HEAD(invalid_list);
@@ -6742,6 +6742,17 @@ void kvm_mmu_zap_all(struct kvm *kvm)
write_unlock(&kvm->mmu_lock);
}

+void kvm_arch_flush_shadow_all(struct kvm *kvm)
+{
+ kvm_mmu_zap_all(kvm);
+}
+
+void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
+ struct kvm_memory_slot *slot)
+{
+ kvm_page_track_flush_slot(kvm, slot);
+}
+
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
{
WARN_ON(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 523c39a03c00..b2d9c5979df7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12758,17 +12758,6 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
kvm_arch_free_memslot(kvm, old);
}

-void kvm_arch_flush_shadow_all(struct kvm *kvm)
-{
- kvm_mmu_zap_all(kvm);
-}
-
-void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
- struct kvm_memory_slot *slot)
-{
- kvm_page_track_flush_slot(kvm, slot);
-}
-
static inline bool kvm_guest_apic_has_interrupt(struct kvm_vcpu *vcpu)
{
return (is_guest_mode(vcpu) &&
--
2.40.1.606.ga4b1b128d6-goog


2023-05-13 01:00:09

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 28/28] drm/i915/gvt: Drop final dependencies on KVM internal details

Open code gpa_to_gfn() in kvmgt_page_track_write() and drop KVMGT's
dependency on kvm_host.h, i.e. include only kvm_page_track.h. KVMGT
assumes "gfn == gpa >> PAGE_SHIFT" all over the place, including a few
lines below in the same function with the same gpa, i.e. there's no
reason to use KVM's helper for this one case.

No functional change intended.

Reviewed-by: Yan Zhao <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
drivers/gpu/drm/i915/gvt/gvt.h | 3 ++-
drivers/gpu/drm/i915/gvt/kvmgt.c | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index 2d65800d8e93..53a0a42a50db 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -34,10 +34,11 @@
#define _GVT_H_

#include <uapi/linux/pci_regs.h>
-#include <linux/kvm_host.h>
#include <linux/vfio.h>
#include <linux/mdev.h>

+#include <asm/kvm_page_track.h>
+
#include "i915_drv.h"
#include "intel_gvt.h"

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 597ffc9d12fd..191b4484cc7e 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1584,7 +1584,7 @@ static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,

mutex_lock(&info->vgpu_lock);

- if (kvmgt_gfn_is_write_protected(info, gpa_to_gfn(gpa)))
+ if (kvmgt_gfn_is_write_protected(info, gpa >> PAGE_SHIFT))
intel_vgpu_page_track_handler(info, gpa,
(void *)val, len);

--
2.40.1.606.ga4b1b128d6-goog


2023-05-13 01:01:46

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 26/28] KVM: x86/mmu: Drop @slot param from exported/external page-track APIs

Refactor KVM's exported/external page-track, a.k.a. write-track, APIs
to take only the gfn and do the required memslot lookup in KVM proper.
Forcing users of the APIs to get the memslot unnecessarily bleeds
KVM internals into KVMGT and complicates usage of the APIs.

No functional change intended.

Reviewed-by: Yan Zhao <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_page_track.h | 7 +--
arch/x86/kvm/mmu/mmu.c | 4 +-
arch/x86/kvm/mmu/page_track.c | 85 ++++++++++++++++++++-------
arch/x86/kvm/mmu/page_track.h | 5 ++
drivers/gpu/drm/i915/gvt/kvmgt.c | 37 +++---------
5 files changed, 80 insertions(+), 58 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index f5c1db36cdb7..4afab697e21c 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -4,11 +4,6 @@

#include <linux/kvm_types.h>

-void kvm_write_track_add_gfn(struct kvm *kvm,
- struct kvm_memory_slot *slot, gfn_t gfn);
-void kvm_write_track_remove_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
- gfn_t gfn);
-
#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
/*
* The notifier represented by @kvm_page_track_notifier_node is linked into
@@ -55,6 +50,8 @@ kvm_page_track_register_notifier(struct kvm *kvm,
void
kvm_page_track_unregister_notifier(struct kvm *kvm,
struct kvm_page_track_notifier_node *n);
+int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn);
+int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn);
#else
/*
* Allow defining a node in a structure even if page tracking is disabled, e.g.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1818c047891f..22f13963c320 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -837,7 +837,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)

/* the non-leaf shadow pages are keeping readonly. */
if (sp->role.level > PG_LEVEL_4K)
- return kvm_write_track_add_gfn(kvm, slot, gfn);
+ return __kvm_write_track_add_gfn(kvm, slot, gfn);

kvm_mmu_gfn_disallow_lpage(slot, gfn);

@@ -883,7 +883,7 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
slots = kvm_memslots_for_spte_role(kvm, sp->role);
slot = __gfn_to_memslot(slots, gfn);
if (sp->role.level > PG_LEVEL_4K)
- return kvm_write_track_remove_gfn(kvm, slot, gfn);
+ return __kvm_write_track_remove_gfn(kvm, slot, gfn);

kvm_mmu_gfn_allow_lpage(slot, gfn);
}
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index eedb5889d73e..2a64df38ccab 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -74,16 +74,8 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
slot->arch.gfn_write_track[index] += count;
}

-/*
- * add guest page to the tracking pool so that corresponding access on that
- * page will be intercepted.
- *
- * @kvm: the guest instance we are interested in.
- * @slot: the @gfn belongs to.
- * @gfn: the guest page.
- */
-void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
- gfn_t gfn)
+void __kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+ gfn_t gfn)
{
lockdep_assert_held_write(&kvm->mmu_lock);

@@ -104,18 +96,9 @@ void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
kvm_flush_remote_tlbs(kvm);
}
-EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);

-/*
- * remove the guest page from the tracking pool which stops the interception
- * of corresponding access on that page.
- *
- * @kvm: the guest instance we are interested in.
- * @slot: the @gfn belongs to.
- * @gfn: the guest page.
- */
-void kvm_write_track_remove_gfn(struct kvm *kvm,
- struct kvm_memory_slot *slot, gfn_t gfn)
+void __kvm_write_track_remove_gfn(struct kvm *kvm,
+ struct kvm_memory_slot *slot, gfn_t gfn)
{
lockdep_assert_held_write(&kvm->mmu_lock);

@@ -133,7 +116,6 @@ void kvm_write_track_remove_gfn(struct kvm *kvm,
*/
kvm_mmu_gfn_allow_lpage(slot, gfn);
}
-EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);

/*
* check if the corresponding access on the specified guest page is tracked.
@@ -257,4 +239,63 @@ void kvm_page_track_delete_slot(struct kvm *kvm, struct kvm_memory_slot *slot)
srcu_read_unlock(&head->track_srcu, idx);
}

+/*
+ * add guest page to the tracking pool so that corresponding access on that
+ * page will be intercepted.
+ *
+ * @kvm: the guest instance we are interested in.
+ * @gfn: the guest page.
+ */
+int kvm_write_track_add_gfn(struct kvm *kvm, gfn_t gfn)
+{
+ struct kvm_memory_slot *slot;
+ int idx;
+
+ idx = srcu_read_lock(&kvm->srcu);
+
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!slot) {
+ srcu_read_unlock(&kvm->srcu, idx);
+ return -EINVAL;
+ }
+
+ write_lock(&kvm->mmu_lock);
+ __kvm_write_track_add_gfn(kvm, slot, gfn);
+ write_unlock(&kvm->mmu_lock);
+
+ srcu_read_unlock(&kvm->srcu, idx);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);
+
+/*
+ * remove the guest page from the tracking pool which stops the interception
+ * of corresponding access on that page.
+ *
+ * @kvm: the guest instance we are interested in.
+ * @gfn: the guest page.
+ */
+int kvm_write_track_remove_gfn(struct kvm *kvm, gfn_t gfn)
+{
+ struct kvm_memory_slot *slot;
+ int idx;
+
+ idx = srcu_read_lock(&kvm->srcu);
+
+ slot = gfn_to_memslot(kvm, gfn);
+ if (!slot) {
+ srcu_read_unlock(&kvm->srcu, idx);
+ return -EINVAL;
+ }
+
+ write_lock(&kvm->mmu_lock);
+ __kvm_write_track_remove_gfn(kvm, slot, gfn);
+ write_unlock(&kvm->mmu_lock);
+
+ srcu_read_unlock(&kvm->srcu, idx);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);
#endif
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 50d3278e8c69..62f98c6c5af3 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -15,6 +15,11 @@ int kvm_page_track_create_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot,
unsigned long npages);

+void __kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+ gfn_t gfn);
+void __kvm_write_track_remove_gfn(struct kvm *kvm,
+ struct kvm_memory_slot *slot, gfn_t gfn);
+
bool kvm_gfn_is_write_tracked(struct kvm *kvm,
const struct kvm_memory_slot *slot, gfn_t gfn);

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 18f04493e103..b995d75a19c3 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1545,9 +1545,7 @@ static struct mdev_driver intel_vgpu_mdev_driver = {

int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
{
- struct kvm *kvm = info->vfio_device.kvm;
- struct kvm_memory_slot *slot;
- int idx;
+ int r;

if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, info->status))
return -ESRCH;
@@ -1555,18 +1553,9 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
if (kvmgt_gfn_is_write_protected(info, gfn))
return 0;

- idx = srcu_read_lock(&kvm->srcu);
- slot = gfn_to_memslot(kvm, gfn);
- if (!slot) {
- srcu_read_unlock(&kvm->srcu, idx);
- return -EINVAL;
- }
-
- write_lock(&kvm->mmu_lock);
- kvm_write_track_add_gfn(kvm, slot, gfn);
- write_unlock(&kvm->mmu_lock);
-
- srcu_read_unlock(&kvm->srcu, idx);
+ r = kvm_write_track_add_gfn(info->vfio_device.kvm, gfn);
+ if (r)
+ return r;

kvmgt_protect_table_add(info, gfn);
return 0;
@@ -1574,9 +1563,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)

int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
{
- struct kvm *kvm = info->vfio_device.kvm;
- struct kvm_memory_slot *slot;
- int idx;
+ int r;

if (!test_bit(INTEL_VGPU_STATUS_ATTACHED, info->status))
return -ESRCH;
@@ -1584,17 +1571,9 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
if (!kvmgt_gfn_is_write_protected(info, gfn))
return 0;

- idx = srcu_read_lock(&kvm->srcu);
- slot = gfn_to_memslot(kvm, gfn);
- if (!slot) {
- srcu_read_unlock(&kvm->srcu, idx);
- return -EINVAL;
- }
-
- write_lock(&kvm->mmu_lock);
- kvm_write_track_remove_gfn(kvm, slot, gfn);
- write_unlock(&kvm->mmu_lock);
- srcu_read_unlock(&kvm->srcu, idx);
+ r = kvm_write_track_remove_gfn(info->vfio_device.kvm, gfn);
+ if (r)
+ return r;

kvmgt_protect_table_del(info, gfn);
return 0;
--
2.40.1.606.ga4b1b128d6-goog


2023-05-13 01:02:58

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 25/28] KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled

Bug the VM if something attempts to write-track a gfn, but write-tracking
isn't enabled. The VM is doomed (and KVM has an egregious bug) if KVM or
KVMGT wants to shadow guest page tables but can't because write-tracking
isn't enabled.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/page_track.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 29ae61f1e303..eedb5889d73e 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -90,7 +90,7 @@ void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
srcu_read_lock_held(&kvm->srcu));

- if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
+ if (KVM_BUG_ON(!kvm_page_track_write_tracking_enabled(kvm), kvm))
return;

update_gfn_write_track(slot, gfn, 1);
@@ -122,7 +122,7 @@ void kvm_write_track_remove_gfn(struct kvm *kvm,
lockdep_assert_once(lockdep_is_held(&kvm->slots_lock) ||
srcu_read_lock_held(&kvm->srcu));

- if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
+ if (KVM_BUG_ON(!kvm_page_track_write_tracking_enabled(kvm), kvm))
return;

update_gfn_write_track(slot, gfn, -1);
--
2.40.1.606.ga4b1b128d6-goog


2023-05-13 01:03:56

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 14/28] KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook

Drop @vcpu from KVM's ->track_write() hook provided for external users of
the page-track APIs now that KVM itself doesn't use the page-track
mechanism.

Reviewed-by: Yan Zhao <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_page_track.h | 5 ++---
arch/x86/kvm/mmu/page_track.c | 2 +-
drivers/gpu/drm/i915/gvt/kvmgt.c | 10 ++++------
3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index eb186bc57f6a..8c4d216e3b2b 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -26,14 +26,13 @@ struct kvm_page_track_notifier_node {
* It is called when guest is writing the write-tracked page
* and write emulation is finished at that time.
*
- * @vcpu: the vcpu where the write access happened.
* @gpa: the physical address written by guest.
* @new: the data was written to the address.
* @bytes: the written length.
* @node: this node
*/
- void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
- int bytes, struct kvm_page_track_notifier_node *node);
+ void (*track_write)(gpa_t gpa, const u8 *new, int bytes,
+ struct kvm_page_track_notifier_node *node);
/*
* It is called when memory slot is being moved or removed
* users can drop write-protection for the pages in that memory slot
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 23088c90d2fd..891e5cc52b45 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -272,7 +272,7 @@ void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
hlist_for_each_entry_srcu(n, &head->track_notifier_list, node,
srcu_read_lock_held(&head->track_srcu))
if (n->track_write)
- n->track_write(vcpu, gpa, new, bytes, n);
+ n->track_write(gpa, new, bytes, n);
srcu_read_unlock(&head->track_srcu, idx);

kvm_mmu_track_write(vcpu, gpa, new, bytes);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 0785f9cb2c20..aaebb44c139f 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -106,9 +106,8 @@ struct gvt_dma {
#define vfio_dev_to_vgpu(vfio_dev) \
container_of((vfio_dev), struct intel_vgpu, vfio_device)

-static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
- const u8 *val, int len,
- struct kvm_page_track_notifier_node *node);
+static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
+ struct kvm_page_track_notifier_node *node);
static void kvmgt_page_track_flush_slot(struct kvm *kvm,
struct kvm_memory_slot *slot,
struct kvm_page_track_notifier_node *node);
@@ -1602,9 +1601,8 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
return 0;
}

-static void kvmgt_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa,
- const u8 *val, int len,
- struct kvm_page_track_notifier_node *node)
+static void kvmgt_page_track_write(gpa_t gpa, const u8 *val, int len,
+ struct kvm_page_track_notifier_node *node)
{
struct intel_vgpu *info =
container_of(node, struct intel_vgpu, track_node);
--
2.40.1.606.ga4b1b128d6-goog


2023-05-13 01:05:01

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 09/28] drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()

Drop intel_vgpu_reset_gtt() as it no longer has any callers. In addition
to eliminating dead code, this eliminates the last possible scenario where
__kvmgt_protect_table_find() can be reached without holding vgpu_lock.
Requiring vgpu_lock to be held when calling __kvmgt_protect_table_find()
will allow a protecting the gfn hash with vgpu_lock without too much fuss.

No functional change intended.

Fixes: ba25d977571e ("drm/i915/gvt: Do not destroy ppgtt_mm during vGPU D3->D0.")
Reviewed-by: Yan Zhao <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
drivers/gpu/drm/i915/gvt/gtt.c | 18 ------------------
drivers/gpu/drm/i915/gvt/gtt.h | 1 -
2 files changed, 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index f505be9e647a..c3c623b929ce 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -2817,24 +2817,6 @@ void intel_vgpu_reset_ggtt(struct intel_vgpu *vgpu, bool invalidate_old)
ggtt_invalidate(gvt->gt);
}

-/**
- * intel_vgpu_reset_gtt - reset the all GTT related status
- * @vgpu: a vGPU
- *
- * This function is called from vfio core to reset reset all
- * GTT related status, including GGTT, PPGTT, scratch page.
- *
- */
-void intel_vgpu_reset_gtt(struct intel_vgpu *vgpu)
-{
- /* Shadow pages are only created when there is no page
- * table tracking data, so remove page tracking data after
- * removing the shadow pages.
- */
- intel_vgpu_destroy_all_ppgtt_mm(vgpu);
- intel_vgpu_reset_ggtt(vgpu, true);
-}
-
/**
* intel_gvt_restore_ggtt - restore all vGPU's ggtt entries
* @gvt: intel gvt device
diff --git a/drivers/gpu/drm/i915/gvt/gtt.h b/drivers/gpu/drm/i915/gvt/gtt.h
index a3b0f59ec8bd..4cb183e06e95 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.h
+++ b/drivers/gpu/drm/i915/gvt/gtt.h
@@ -224,7 +224,6 @@ void intel_vgpu_reset_ggtt(struct intel_vgpu *vgpu, bool invalidate_old);
void intel_vgpu_invalidate_ppgtt(struct intel_vgpu *vgpu);

int intel_gvt_init_gtt(struct intel_gvt *gvt);
-void intel_vgpu_reset_gtt(struct intel_vgpu *vgpu);
void intel_gvt_clean_gtt(struct intel_gvt *gvt);

struct intel_vgpu_mm *intel_gvt_find_ppgtt_mm(struct intel_vgpu *vgpu,
--
2.40.1.606.ga4b1b128d6-goog


2023-05-13 01:06:00

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 23/28] KVM: x86/mmu: Rename page-track APIs to reflect the new reality

Rename the page-track APIs to capture that they're all about tracking
writes, now that the facade of supporting multiple modes is gone.

Opportunstically replace "slot" with "gfn" in anticipation of removing
the @slot param from the external APIs.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_page_track.h | 8 ++++----
arch/x86/kvm/mmu/mmu.c | 8 ++++----
arch/x86/kvm/mmu/page_track.c | 21 +++++++++------------
arch/x86/kvm/mmu/page_track.h | 4 ++--
drivers/gpu/drm/i915/gvt/kvmgt.c | 4 ++--
5 files changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 9e4ee26d1779..f5c1db36cdb7 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -4,10 +4,10 @@

#include <linux/kvm_types.h>

-void kvm_slot_page_track_add_page(struct kvm *kvm,
- struct kvm_memory_slot *slot, gfn_t gfn);
-void kvm_slot_page_track_remove_page(struct kvm *kvm,
- struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_write_track_add_gfn(struct kvm *kvm,
+ struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_write_track_remove_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+ gfn_t gfn);

#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
/*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8041f5747704..1818c047891f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -837,7 +837,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)

/* the non-leaf shadow pages are keeping readonly. */
if (sp->role.level > PG_LEVEL_4K)
- return kvm_slot_page_track_add_page(kvm, slot, gfn);
+ return kvm_write_track_add_gfn(kvm, slot, gfn);

kvm_mmu_gfn_disallow_lpage(slot, gfn);

@@ -883,7 +883,7 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
slots = kvm_memslots_for_spte_role(kvm, sp->role);
slot = __gfn_to_memslot(slots, gfn);
if (sp->role.level > PG_LEVEL_4K)
- return kvm_slot_page_track_remove_page(kvm, slot, gfn);
+ return kvm_write_track_remove_gfn(kvm, slot, gfn);

kvm_mmu_gfn_allow_lpage(slot, gfn);
}
@@ -2823,7 +2823,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
* track machinery is used to write-protect upper-level shadow pages,
* i.e. this guards the role.level == 4K assertion below!
*/
- if (kvm_slot_page_track_is_active(kvm, slot, gfn))
+ if (kvm_gfn_is_write_tracked(kvm, slot, gfn))
return -EPERM;

/*
@@ -4224,7 +4224,7 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
* guest is writing the page which is write tracked which can
* not be fixed by page fault handler.
*/
- if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn))
+ if (kvm_gfn_is_write_tracked(vcpu->kvm, fault->slot, fault->gfn))
return true;

return false;
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index cdc6069b8caf..b835ba7f325c 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -84,10 +84,9 @@ static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
* @slot: the @gfn belongs to.
* @gfn: the guest page.
*/
-void kvm_slot_page_track_add_page(struct kvm *kvm,
- struct kvm_memory_slot *slot, gfn_t gfn)
+void kvm_write_track_add_gfn(struct kvm *kvm, struct kvm_memory_slot *slot,
+ gfn_t gfn)
{
-
if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
return;

@@ -102,12 +101,11 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
kvm_flush_remote_tlbs(kvm);
}
-EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
+EXPORT_SYMBOL_GPL(kvm_write_track_add_gfn);

/*
* remove the guest page from the tracking pool which stops the interception
- * of corresponding access on that page. It is the opposed operation of
- * kvm_slot_page_track_add_page().
+ * of corresponding access on that page.
*
* It should be called under the protection both of mmu-lock and kvm->srcu
* or kvm->slots_lock.
@@ -116,8 +114,8 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
* @slot: the @gfn belongs to.
* @gfn: the guest page.
*/
-void kvm_slot_page_track_remove_page(struct kvm *kvm,
- struct kvm_memory_slot *slot, gfn_t gfn)
+void kvm_write_track_remove_gfn(struct kvm *kvm,
+ struct kvm_memory_slot *slot, gfn_t gfn)
{
if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
return;
@@ -130,14 +128,13 @@ void kvm_slot_page_track_remove_page(struct kvm *kvm,
*/
kvm_mmu_gfn_allow_lpage(slot, gfn);
}
-EXPORT_SYMBOL_GPL(kvm_slot_page_track_remove_page);
+EXPORT_SYMBOL_GPL(kvm_write_track_remove_gfn);

/*
* check if the corresponding access on the specified guest page is tracked.
*/
-bool kvm_slot_page_track_is_active(struct kvm *kvm,
- const struct kvm_memory_slot *slot,
- gfn_t gfn)
+bool kvm_gfn_is_write_tracked(struct kvm *kvm,
+ const struct kvm_memory_slot *slot, gfn_t gfn)
{
int index;

diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 789d0c479519..50d3278e8c69 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -15,8 +15,8 @@ int kvm_page_track_create_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot,
unsigned long npages);

-bool kvm_slot_page_track_is_active(struct kvm *kvm,
- const struct kvm_memory_slot *slot, gfn_t gfn);
+bool kvm_gfn_is_write_tracked(struct kvm *kvm,
+ const struct kvm_memory_slot *slot, gfn_t gfn);

#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
int kvm_page_track_init(struct kvm *kvm);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 25226e4e3417..18f04493e103 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1563,7 +1563,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
}

write_lock(&kvm->mmu_lock);
- kvm_slot_page_track_add_page(kvm, slot, gfn);
+ kvm_write_track_add_gfn(kvm, slot, gfn);
write_unlock(&kvm->mmu_lock);

srcu_read_unlock(&kvm->srcu, idx);
@@ -1592,7 +1592,7 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
}

write_lock(&kvm->mmu_lock);
- kvm_slot_page_track_remove_page(kvm, slot, gfn);
+ kvm_write_track_remove_gfn(kvm, slot, gfn);
write_unlock(&kvm->mmu_lock);
srcu_read_unlock(&kvm->srcu, idx);

--
2.40.1.606.ga4b1b128d6-goog


2023-05-13 01:06:17

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 12/28] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change

Call kvm_mmu_zap_all_fast() directly when flushing a memslot instead of
bouncing through the page-track mechanism. KVM (unfortunately) needs to
zap and flush all page tables on memslot DELETE/MOVE irrespective of
whether KVM is shadowing guest page tables.

This will allow changing KVM to register a page-track notifier on the
first shadow root allocation, and will also allow deleting the misguided
kvm_page_track_flush_slot() hook itself once KVM-GT also moves to a
different method for reacting to memslot changes.

No functional change intended.

Cc: Yan Zhao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 2e4476d38377..23a79723031b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6184,13 +6184,6 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
}

-static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
- struct kvm_memory_slot *slot,
- struct kvm_page_track_notifier_node *node)
-{
- kvm_mmu_zap_all_fast(kvm);
-}
-
int kvm_mmu_init_vm(struct kvm *kvm)
{
struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
@@ -6208,7 +6201,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
}

node->track_write = kvm_mmu_pte_write;
- node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
kvm_page_track_register_notifier(kvm, node);

kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
@@ -6750,6 +6742,8 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot)
{
+ kvm_mmu_zap_all_fast(kvm);
+
kvm_page_track_flush_slot(kvm, slot);
}

--
2.40.1.606.ga4b1b128d6-goog


2023-05-13 01:06:23

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 22/28] KVM: x86/mmu: Drop infrastructure for multiple page-track modes

Drop "support" for multiple page-track modes, as there is no evidence
that array-based and refcounted metadata is the optimal solution for
other modes, nor is there any evidence that other use cases, e.g. for
access-tracking, will be a good fit for the page-track machinery in
general.

E.g. one potential use case of access-tracking would be to prevent guest
access to poisoned memory (from the guest's perspective). In that case,
the number of poisoned pages is likely to be a very small percentage of
the guest memory, and there is no need to reference count the number of
access-tracking users, i.e. expanding gfn_track[] for a new mode would be
grossly inefficient. And for poisoned memory, host userspace would also
likely want to trap accesses, e.g. to inject #MC into the guest, and that
isn't currently supported by the page-track framework.

A better alternative for that poisoned page use case is likely a
variation of the proposed per-gfn attributes overlay (linked), which
would allow efficiently tracking the sparse set of poisoned pages, and by
default would exit to userspace on access.

Link: https://lore.kernel.org/all/[email protected]
Cc: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 12 +--
arch/x86/include/asm/kvm_page_track.h | 11 +--
arch/x86/kvm/mmu/mmu.c | 14 ++--
arch/x86/kvm/mmu/page_track.c | 111 ++++++++------------------
arch/x86/kvm/mmu/page_track.h | 3 +-
drivers/gpu/drm/i915/gvt/kvmgt.c | 4 +-
6 files changed, 51 insertions(+), 104 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5ce06a75d3de..3dde3a11113a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -288,13 +288,13 @@ struct kvm_kernel_irq_routing_entry;
* kvm_mmu_page_role tracks the properties of a shadow page (where shadow page
* also includes TDP pages) to determine whether or not a page can be used in
* the given MMU context. This is a subset of the overall kvm_cpu_role to
- * minimize the size of kvm_memory_slot.arch.gfn_track, i.e. allows allocating
- * 2 bytes per gfn instead of 4 bytes per gfn.
+ * minimize the size of kvm_memory_slot.arch.gfn_write_track, i.e. allows
+ * allocating 2 bytes per gfn instead of 4 bytes per gfn.
*
* Upper-level shadow pages having gptes are tracked for write-protection via
- * gfn_track. As above, gfn_track is a 16 bit counter, so KVM must not create
- * more than 2^16-1 upper-level shadow pages at a single gfn, otherwise
- * gfn_track will overflow and explosions will ensure.
+ * gfn_write_track. As above, gfn_write_track is a 16 bit counter, so KVM must
+ * not create more than 2^16-1 upper-level shadow pages at a single gfn,
+ * otherwise gfn_write_track will overflow and explosions will ensue.
*
* A unique shadow page (SP) for a gfn is created if and only if an existing SP
* cannot be reused. The ability to reuse a SP is tracked by its role, which
@@ -1005,7 +1005,7 @@ struct kvm_lpage_info {
struct kvm_arch_memory_slot {
struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
- unsigned short *gfn_track[KVM_PAGE_TRACK_MAX];
+ unsigned short *gfn_write_track;
};

/*
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 61adb07b5927..9e4ee26d1779 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -4,17 +4,10 @@

#include <linux/kvm_types.h>

-enum kvm_page_track_mode {
- KVM_PAGE_TRACK_WRITE,
- KVM_PAGE_TRACK_MAX,
-};
-
void kvm_slot_page_track_add_page(struct kvm *kvm,
- struct kvm_memory_slot *slot, gfn_t gfn,
- enum kvm_page_track_mode mode);
+ struct kvm_memory_slot *slot, gfn_t gfn);
void kvm_slot_page_track_remove_page(struct kvm *kvm,
- struct kvm_memory_slot *slot, gfn_t gfn,
- enum kvm_page_track_mode mode);
+ struct kvm_memory_slot *slot, gfn_t gfn);

#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
/*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0d9fe54ecb01..8041f5747704 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -837,8 +837,7 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)

/* the non-leaf shadow pages are keeping readonly. */
if (sp->role.level > PG_LEVEL_4K)
- return kvm_slot_page_track_add_page(kvm, slot, gfn,
- KVM_PAGE_TRACK_WRITE);
+ return kvm_slot_page_track_add_page(kvm, slot, gfn);

kvm_mmu_gfn_disallow_lpage(slot, gfn);

@@ -884,8 +883,7 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
slots = kvm_memslots_for_spte_role(kvm, sp->role);
slot = __gfn_to_memslot(slots, gfn);
if (sp->role.level > PG_LEVEL_4K)
- return kvm_slot_page_track_remove_page(kvm, slot, gfn,
- KVM_PAGE_TRACK_WRITE);
+ return kvm_slot_page_track_remove_page(kvm, slot, gfn);

kvm_mmu_gfn_allow_lpage(slot, gfn);
}
@@ -2825,7 +2823,7 @@ int mmu_try_to_unsync_pages(struct kvm *kvm, const struct kvm_memory_slot *slot,
* track machinery is used to write-protect upper-level shadow pages,
* i.e. this guards the role.level == 4K assertion below!
*/
- if (kvm_slot_page_track_is_active(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE))
+ if (kvm_slot_page_track_is_active(kvm, slot, gfn))
return -EPERM;

/*
@@ -4226,7 +4224,7 @@ static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
* guest is writing the page which is write tracked which can
* not be fixed by page fault handler.
*/
- if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn, KVM_PAGE_TRACK_WRITE))
+ if (kvm_slot_page_track_is_active(vcpu->kvm, fault->slot, fault->gfn))
return true;

return false;
@@ -5461,8 +5459,8 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
* physical address properties) in a single VM would require tracking
* all relevant CPUID information in kvm_mmu_page_role. That is very
* undesirable as it would increase the memory requirements for
- * gfn_track (see struct kvm_mmu_page_role comments). For now that
- * problem is swept under the rug; KVM's CPUID API is horrific and
+ * gfn_write_track (see struct kvm_mmu_page_role comments). For now
+ * that problem is swept under the rug; KVM's CPUID API is horrific and
* it's all but impossible to solve it without introducing a new API.
*/
vcpu->arch.root_mmu.root_role.word = 0;
diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index b20aad7ac3fe..cdc6069b8caf 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -27,76 +27,50 @@ bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)

void kvm_page_track_free_memslot(struct kvm_memory_slot *slot)
{
- int i;
+ kvfree(slot->arch.gfn_write_track);
+ slot->arch.gfn_write_track = NULL;
+}

- for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
- kvfree(slot->arch.gfn_track[i]);
- slot->arch.gfn_track[i] = NULL;
- }
+static int __kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot,
+ unsigned long npages)
+{
+ const size_t size = sizeof(*slot->arch.gfn_write_track);
+
+ if (!slot->arch.gfn_write_track)
+ slot->arch.gfn_write_track = __vcalloc(npages, size,
+ GFP_KERNEL_ACCOUNT);
+
+ return slot->arch.gfn_write_track ? 0 : -ENOMEM;
}

int kvm_page_track_create_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot,
unsigned long npages)
{
- int i;
-
- for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
- if (i == KVM_PAGE_TRACK_WRITE &&
- !kvm_page_track_write_tracking_enabled(kvm))
- continue;
-
- slot->arch.gfn_track[i] =
- __vcalloc(npages, sizeof(*slot->arch.gfn_track[i]),
- GFP_KERNEL_ACCOUNT);
- if (!slot->arch.gfn_track[i])
- goto track_free;
- }
-
- return 0;
-
-track_free:
- kvm_page_track_free_memslot(slot);
- return -ENOMEM;
-}
-
-static inline bool page_track_mode_is_valid(enum kvm_page_track_mode mode)
-{
- if (mode < 0 || mode >= KVM_PAGE_TRACK_MAX)
- return false;
-
- return true;
-}
-
-int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot)
-{
- unsigned short *gfn_track;
-
- if (slot->arch.gfn_track[KVM_PAGE_TRACK_WRITE])
+ if (!kvm_page_track_write_tracking_enabled(kvm))
return 0;

- gfn_track = __vcalloc(slot->npages, sizeof(*gfn_track),
- GFP_KERNEL_ACCOUNT);
- if (gfn_track == NULL)
- return -ENOMEM;
+ return __kvm_page_track_write_tracking_alloc(slot, npages);
+}

- slot->arch.gfn_track[KVM_PAGE_TRACK_WRITE] = gfn_track;
- return 0;
+int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot)
+{
+ return __kvm_page_track_write_tracking_alloc(slot, slot->npages);
}

-static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
- enum kvm_page_track_mode mode, short count)
+static void update_gfn_write_track(struct kvm_memory_slot *slot, gfn_t gfn,
+ short count)
{
int index, val;

index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);

- val = slot->arch.gfn_track[mode][index];
+ val = slot->arch.gfn_write_track[index];

if (WARN_ON(val + count < 0 || val + count > USHRT_MAX))
return;

- slot->arch.gfn_track[mode][index] += count;
+ slot->arch.gfn_write_track[index] += count;
}

/*
@@ -109,21 +83,15 @@ static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
* @kvm: the guest instance we are interested in.
* @slot: the @gfn belongs to.
* @gfn: the guest page.
- * @mode: tracking mode, currently only write track is supported.
*/
void kvm_slot_page_track_add_page(struct kvm *kvm,
- struct kvm_memory_slot *slot, gfn_t gfn,
- enum kvm_page_track_mode mode)
+ struct kvm_memory_slot *slot, gfn_t gfn)
{

- if (WARN_ON(!page_track_mode_is_valid(mode)))
+ if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
return;

- if (WARN_ON(mode == KVM_PAGE_TRACK_WRITE &&
- !kvm_page_track_write_tracking_enabled(kvm)))
- return;
-
- update_gfn_track(slot, gfn, mode, 1);
+ update_gfn_write_track(slot, gfn, 1);

/*
* new track stops large page mapping for the
@@ -131,9 +99,8 @@ void kvm_slot_page_track_add_page(struct kvm *kvm,
*/
kvm_mmu_gfn_disallow_lpage(slot, gfn);

- if (mode == KVM_PAGE_TRACK_WRITE)
- if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
- kvm_flush_remote_tlbs(kvm);
+ if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn, PG_LEVEL_4K))
+ kvm_flush_remote_tlbs(kvm);
}
EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);

@@ -148,20 +115,14 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_add_page);
* @kvm: the guest instance we are interested in.
* @slot: the @gfn belongs to.
* @gfn: the guest page.
- * @mode: tracking mode, currently only write track is supported.
*/
void kvm_slot_page_track_remove_page(struct kvm *kvm,
- struct kvm_memory_slot *slot, gfn_t gfn,
- enum kvm_page_track_mode mode)
+ struct kvm_memory_slot *slot, gfn_t gfn)
{
- if (WARN_ON(!page_track_mode_is_valid(mode)))
+ if (WARN_ON(!kvm_page_track_write_tracking_enabled(kvm)))
return;

- if (WARN_ON(mode == KVM_PAGE_TRACK_WRITE &&
- !kvm_page_track_write_tracking_enabled(kvm)))
- return;
-
- update_gfn_track(slot, gfn, mode, -1);
+ update_gfn_write_track(slot, gfn, -1);

/*
* allow large page mapping for the tracked page
@@ -176,22 +137,18 @@ EXPORT_SYMBOL_GPL(kvm_slot_page_track_remove_page);
*/
bool kvm_slot_page_track_is_active(struct kvm *kvm,
const struct kvm_memory_slot *slot,
- gfn_t gfn, enum kvm_page_track_mode mode)
+ gfn_t gfn)
{
int index;

- if (WARN_ON(!page_track_mode_is_valid(mode)))
- return false;
-
if (!slot)
return false;

- if (mode == KVM_PAGE_TRACK_WRITE &&
- !kvm_page_track_write_tracking_enabled(kvm))
+ if (!kvm_page_track_write_tracking_enabled(kvm))
return false;

index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
- return !!READ_ONCE(slot->arch.gfn_track[mode][index]);
+ return !!READ_ONCE(slot->arch.gfn_write_track[index]);
}

#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
diff --git a/arch/x86/kvm/mmu/page_track.h b/arch/x86/kvm/mmu/page_track.h
index 931b26b8fc8f..789d0c479519 100644
--- a/arch/x86/kvm/mmu/page_track.h
+++ b/arch/x86/kvm/mmu/page_track.h
@@ -16,8 +16,7 @@ int kvm_page_track_create_memslot(struct kvm *kvm,
unsigned long npages);

bool kvm_slot_page_track_is_active(struct kvm *kvm,
- const struct kvm_memory_slot *slot,
- gfn_t gfn, enum kvm_page_track_mode mode);
+ const struct kvm_memory_slot *slot, gfn_t gfn);

#ifdef CONFIG_KVM_EXTERNAL_WRITE_TRACKING
int kvm_page_track_init(struct kvm *kvm);
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 2e65901270ca..25226e4e3417 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1563,7 +1563,7 @@ int intel_gvt_page_track_add(struct intel_vgpu *info, u64 gfn)
}

write_lock(&kvm->mmu_lock);
- kvm_slot_page_track_add_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+ kvm_slot_page_track_add_page(kvm, slot, gfn);
write_unlock(&kvm->mmu_lock);

srcu_read_unlock(&kvm->srcu, idx);
@@ -1592,7 +1592,7 @@ int intel_gvt_page_track_remove(struct intel_vgpu *info, u64 gfn)
}

write_lock(&kvm->mmu_lock);
- kvm_slot_page_track_remove_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
+ kvm_slot_page_track_remove_page(kvm, slot, gfn);
write_unlock(&kvm->mmu_lock);
srcu_read_unlock(&kvm->srcu, idx);

--
2.40.1.606.ga4b1b128d6-goog


2023-05-17 03:11:39

by Yan Zhao

[permalink] [raw]
Subject: Re: [PATCH v3 12/28] KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot change

Reviewed-by: Yan Zhao <[email protected]>

On Fri, May 12, 2023 at 05:35:44PM -0700, Sean Christopherson wrote:
> Call kvm_mmu_zap_all_fast() directly when flushing a memslot instead of
> bouncing through the page-track mechanism. KVM (unfortunately) needs to
> zap and flush all page tables on memslot DELETE/MOVE irrespective of
> whether KVM is shadowing guest page tables.
>
> This will allow changing KVM to register a page-track notifier on the
> first shadow root allocation, and will also allow deleting the misguided
> kvm_page_track_flush_slot() hook itself once KVM-GT also moves to a
> different method for reacting to memslot changes.
>
> No functional change intended.
>
> Cc: Yan Zhao <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/mmu/mmu.c | 10 ++--------
> 1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 2e4476d38377..23a79723031b 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6184,13 +6184,6 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> }
>
> -static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm,
> - struct kvm_memory_slot *slot,
> - struct kvm_page_track_notifier_node *node)
> -{
> - kvm_mmu_zap_all_fast(kvm);
> -}
> -
> int kvm_mmu_init_vm(struct kvm *kvm)
> {
> struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
> @@ -6208,7 +6201,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
> }
>
> node->track_write = kvm_mmu_pte_write;
> - node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
> kvm_page_track_register_notifier(kvm, node);
>
> kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
> @@ -6750,6 +6742,8 @@ void kvm_arch_flush_shadow_all(struct kvm *kvm)
> void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
> struct kvm_memory_slot *slot)
> {
> + kvm_mmu_zap_all_fast(kvm);
> +
> kvm_page_track_flush_slot(kvm, slot);
> }
>
> --
> 2.40.1.606.ga4b1b128d6-goog
>

2023-06-14 03:19:05

by Ma, Yongwei

[permalink] [raw]
Subject: RE: [PATCH v3 00/28] drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups

> Fix a variety of found-by-inspection bugs in KVMGT, and overhaul KVM's
> page-track APIs to provide a leaner and cleaner interface. The motivation for
> this series is to (significantly) reduce the number of KVM APIs that KVMGT
> uses, with a long-term goal of making all kvm_host.h headers KVM-internal.
>
> As always for this series, the KVMGT changes are compile tested only.
>
> Based on "git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-6.4-1".
>
> v3:
> - Collect reviewed/tested tags (I apologize if I missed any, I manually
> gathered them this time due to a goof in my workflow). [Yan]
> - Drop check on max KVM paging size from KVMGT. [Yan]
> - Drop the explicit change on THP pages, and instead validate that the
> pfns (not struct page pointers) are contiguous. [Yan]
> - Fix buggy intel_gvt_dma_map_guest_page() usage by eliminating a helper
> for shadowing 2MiB GTT entries. [Yan]
> - Move kvm_arch_flush_shadow_{all,memslot}() to mmu.c instead of
> exposing
> kvm_mmu_zap_all_fast() outside of mmu.c. [Yan]
> - Fix an alignment goof in hlist_for_each_entry_srcu() usage. [Yan]
> - Wrap full definition of external page track structures with
> CONFIG_KVM_EXTERNAL_WRITE_TRACKING. [Yan]
>
> v2:
> - https://lore.kernel.org/all/[email protected]
> - Reuse vgpu_lock to protect gfn hash instead of introducing a new (and
> buggy) mutext. [Yan]
> - Remove a spurious return from kvm_page_track_init(). [Yan]
> - Take @kvm directly in the inner __kvm_page_track_write(). [Yan]
> - Delete the gfn sanity check that relies on kvm_is_visible_gfn() instead
> of providing a dedicated interface. [Yan]
>
> v1: https://lore.kernel.org/lkml/20221223005739.1295925-1-
> [email protected]
>
> Sean Christopherson (24):
> drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
> drm/i915/gvt: Verify hugepages are contiguous in physical address
> space
> drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
> drm/i915/gvt: Explicitly check that vGPU is attached before shadowing
> drm/i915/gvt: Error out on an attempt to shadowing an unknown GTT
> entry type
> drm/i915/gvt: Don't rely on KVM's gfn_to_pfn() to query possible 2M
> GTT
> drm/i915/gvt: Use an "unsigned long" to iterate over memslot gfns
> drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
> drm/i915/gvt: Protect gfn hash table with vgpu_lock
> KVM: x86/mmu: Move kvm_arch_flush_shadow_{all,memslot}() to mmu.c
> KVM: x86/mmu: Don't rely on page-track mechanism to flush on memslot
> change
> KVM: x86/mmu: Don't bounce through page-track mechanism for guest
> PTEs
> KVM: drm/i915/gvt: Drop @vcpu from KVM's ->track_write() hook
> KVM: x86: Reject memslot MOVE operations if KVMGT is attached
> drm/i915/gvt: Don't bother removing write-protection on to-be-deleted
> slot
> KVM: x86/mmu: Move KVM-only page-track declarations to internal header
> KVM: x86/mmu: Use page-track notifiers iff there are external users
> KVM: x86/mmu: Drop infrastructure for multiple page-track modes
> KVM: x86/mmu: Rename page-track APIs to reflect the new reality
> KVM: x86/mmu: Assert that correct locks are held for page
> write-tracking
> KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
> KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
> KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
> drm/i915/gvt: Drop final dependencies on KVM internal details
>
> Yan Zhao (4):
> drm/i915/gvt: remove interface intel_gvt_is_valid_gfn
> KVM: x86: Add a new page-track hook to handle memslot deletion
> drm/i915/gvt: switch from ->track_flush_slot() to
> ->track_remove_region()
> KVM: x86: Remove the unused page-track hook track_flush_slot()
>
> arch/x86/include/asm/kvm_host.h | 16 +-
> arch/x86/include/asm/kvm_page_track.h | 73 +++-----
> arch/x86/kvm/mmu.h | 2 +
> arch/x86/kvm/mmu/mmu.c | 51 +++--
> arch/x86/kvm/mmu/page_track.c | 256 +++++++++++++-------------
> arch/x86/kvm/mmu/page_track.h | 58 ++++++
> arch/x86/kvm/x86.c | 22 +--
> drivers/gpu/drm/i915/gvt/gtt.c | 102 ++--------
> drivers/gpu/drm/i915/gvt/gtt.h | 1 -
> drivers/gpu/drm/i915/gvt/gvt.h | 3 +-
> drivers/gpu/drm/i915/gvt/kvmgt.c | 117 +++++-------
> drivers/gpu/drm/i915/gvt/page_track.c | 10 +-
> 12 files changed, 320 insertions(+), 391 deletions(-) create mode 100644
> arch/x86/kvm/mmu/page_track.h
>
>
> base-commit: b3c98052d46948a8d65d2778c7f306ff38366aac
> --
> 2.40.1.606.ga4b1b128d6-goog
Verified GVT-g on Intel platform host Core(TM) i7-8559U CPU @ 2.70GHz
+ Ubuntu22.04 LTS.

Both Linux Ubuntu 22.04 VM and Windows10 VM could boot up successfully.
3D benchmark GLmark2 can run in the guest VM.

Tested-by: Yongwei Ma <[email protected]>

Best Regards,
Yongwei Ma