2023-11-21 18:04:02

by Paul Durrant

Subject: [PATCH v8 00/15] KVM: xen: update shared_info and vcpu_info handling

From: Paul Durrant <[email protected]>

Currently the shared_info page is treated as guest memory and the VMM
informs KVM of its location using a GFN. However it is not guest memory as
such; it's an overlay page. So the pfncache is pointlessly invalidated and
re-activated to create a new mapping of the *same page* of memory every
time the guest requests that shared_info be mapped into its address space.
Avoid doing that by modifying the pfncache code to allow activation using
a fixed userspace HVA as well as a GPA.
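
For illustration, once the new attribute is available a VMM could activate
the shared_info mapping by fixed HVA along the following lines. This is a
minimal userspace sketch based on the selftest changes later in the series;
vm_fd and shinfo_page are placeholders for the VMM's own VM file descriptor
and its mapping of the overlay page, and setup/error handling is omitted:

  struct kvm_xen_hvm_attr ha = {
          .type = KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA,
          /* Fixed userspace address of the overlay page. */
          .u.shared_info.hva = (unsigned long)shinfo_page,
  };

  if (ioctl(vm_fd, KVM_XEN_HVM_SET_ATTR, &ha))
          err(1, "KVM_XEN_HVM_SET_ATTR");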

Version 8 of the series has been substantially re-worked and includes
several clean-up patches, particularly the non-trivial clean-ups
"pfncache: remove KVM_GUEST_USES_PFN usage" and "xen: split up
kvm_xen_set_evtchn_fast()". Each patch carries its own version history.

Paul Durrant (15):
KVM: pfncache: Add a map helper function
KVM: pfncache: remove unnecessary exports
KVM: xen: mark guest pages dirty with the pfncache lock held
KVM: pfncache: add a mark-dirty helper
KVM: pfncache: remove KVM_GUEST_USES_PFN usage
KVM: pfncache: stop open-coding offset_in_page()
KVM: pfncache: include page offset in uhva and use it consistently
KVM: pfncache: allow a cache to be activated with a fixed (userspace)
HVA
KVM: xen: allow shared_info to be mapped by fixed HVA
KVM: xen: allow vcpu_info to be mapped by fixed HVA
KVM: selftests / xen: map shared_info using HVA rather than GFN
KVM: selftests / xen: re-map vcpu_info using HVA rather than GPA
KVM: xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA capability
KVM: xen: split up kvm_xen_set_evtchn_fast()
KVM: xen: allow vcpu_info content to be 'safely' copied

Documentation/virt/kvm/api.rst | 53 +++-
arch/x86/kvm/x86.c | 7 +-
arch/x86/kvm/xen.c | 260 +++++++++++-------
include/linux/kvm_host.h | 38 ++-
include/linux/kvm_types.h | 8 -
include/uapi/linux/kvm.h | 9 +-
.../selftests/kvm/x86_64/xen_shinfo_test.c | 59 +++-
virt/kvm/pfncache.c | 175 ++++++------
8 files changed, 374 insertions(+), 235 deletions(-)


base-commit: 45b890f7689eb0aba454fc5831d2d79763781677
---
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
--
2.39.2


2023-11-21 18:04:16

by Paul Durrant

Subject: [PATCH v8 01/15] KVM: pfncache: Add a map helper function

From: Paul Durrant <[email protected]>

There is a pfncache unmap helper but mapping is open-coded. Arguably this
is fine because mapping is done in only one place, hva_to_pfn_retry(), but
adding the helper does make that function more readable.

No functional change intended.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Paolo Bonzini <[email protected]>

v8:
- Re-work commit comment.
- Fix CONFIG_HAS_IOMEM=n build.
---
virt/kvm/pfncache.c | 47 ++++++++++++++++++++++++++++-----------------
1 file changed, 29 insertions(+), 18 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 2d6aba677830..10842f1eeeae 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -96,17 +96,32 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
}
EXPORT_SYMBOL_GPL(kvm_gpc_check);

-static void gpc_unmap_khva(kvm_pfn_t pfn, void *khva)
+static void *gpc_map(kvm_pfn_t pfn)
{
- /* Unmap the old pfn/page if it was mapped before. */
- if (!is_error_noslot_pfn(pfn) && khva) {
- if (pfn_valid(pfn))
- kunmap(pfn_to_page(pfn));
+ if (pfn_valid(pfn))
+ return kmap(pfn_to_page(pfn));
+
#ifdef CONFIG_HAS_IOMEM
- else
- memunmap(khva);
+ return memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB);
+#else
+ return NULL;
#endif
+}
+
+static void gpc_unmap(kvm_pfn_t pfn, void *khva)
+{
+ /* Unmap the old pfn/page if it was mapped before. */
+ if (is_error_noslot_pfn(pfn) || !khva)
+ return;
+
+ if (pfn_valid(pfn)) {
+ kunmap(pfn_to_page(pfn));
+ return;
}
+
+#ifdef CONFIG_HAS_IOMEM
+ memunmap(khva);
+#endif
}

static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_seq)
@@ -175,7 +190,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
* the existing mapping and didn't create a new one.
*/
if (new_khva != old_khva)
- gpc_unmap_khva(new_pfn, new_khva);
+ gpc_unmap(new_pfn, new_khva);

kvm_release_pfn_clean(new_pfn);

@@ -193,15 +208,11 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
* too must be done outside of gpc->lock!
*/
if (gpc->usage & KVM_HOST_USES_PFN) {
- if (new_pfn == gpc->pfn) {
+ if (new_pfn == gpc->pfn)
new_khva = old_khva;
- } else if (pfn_valid(new_pfn)) {
- new_khva = kmap(pfn_to_page(new_pfn));
-#ifdef CONFIG_HAS_IOMEM
- } else {
- new_khva = memremap(pfn_to_hpa(new_pfn), PAGE_SIZE, MEMREMAP_WB);
-#endif
- }
+ else
+ new_khva = gpc_map(new_pfn);
+
if (!new_khva) {
kvm_release_pfn_clean(new_pfn);
goto out_error;
@@ -326,7 +337,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
mutex_unlock(&gpc->refresh_lock);

if (unmap_old)
- gpc_unmap_khva(old_pfn, old_khva);
+ gpc_unmap(old_pfn, old_khva);

return ret;
}
@@ -412,7 +423,7 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
list_del(&gpc->list);
spin_unlock(&kvm->gpc_lock);

- gpc_unmap_khva(old_pfn, old_khva);
+ gpc_unmap(old_pfn, old_khva);
}
}
EXPORT_SYMBOL_GPL(kvm_gpc_deactivate);
--
2.39.2

2023-11-21 18:04:16

by Paul Durrant

Subject: [PATCH v8 02/15] KVM: pfncache: remove unnecessary exports

From: Paul Durrant <[email protected]>

There is no need for the existing kvm_gpc_XXX() functions to be exported.
Clean them up now, before additional functions are added in subsequent
patches.

Signed-off-by: Paul Durrant <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Paolo Bonzini <[email protected]>

v8:
- New in this version.
---
virt/kvm/pfncache.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 10842f1eeeae..f3571f44d9af 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -94,7 +94,6 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)

return true;
}
-EXPORT_SYMBOL_GPL(kvm_gpc_check);

static void *gpc_map(kvm_pfn_t pfn)
{
@@ -346,7 +345,6 @@ int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, unsigned long len)
{
return __kvm_gpc_refresh(gpc, gpc->gpa, len);
}
-EXPORT_SYMBOL_GPL(kvm_gpc_refresh);

void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
struct kvm_vcpu *vcpu, enum pfn_cache_usage usage)
@@ -363,7 +361,6 @@ void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
gpc->pfn = KVM_PFN_ERR_FAULT;
gpc->uhva = KVM_HVA_ERR_BAD;
}
-EXPORT_SYMBOL_GPL(kvm_gpc_init);

int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
{
@@ -388,7 +385,6 @@ int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
}
return __kvm_gpc_refresh(gpc, gpa, len);
}
-EXPORT_SYMBOL_GPL(kvm_gpc_activate);

void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
{
@@ -426,4 +422,3 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
gpc_unmap(old_pfn, old_khva);
}
}
-EXPORT_SYMBOL_GPL(kvm_gpc_deactivate);
--
2.39.2

2023-11-21 18:15:14

by Paul Durrant

Subject: [PATCH v8 09/15] KVM: xen: allow shared_info to be mapped by fixed HVA

From: Paul Durrant <[email protected]>

The shared_info page is not guest memory as such. It is a dedicated page
allocated by the VMM and overlaid onto guest memory in a GFN chosen by the
guest and specified in the XENMEM_add_to_physmap hypercall. The guest may
even request that shared_info be moved from one GFN to another by
re-issuing that hypercall, but the HVA is never going to change.

Because the shared_info page is an overlay the memory slots need to be
updated in response to the hypercall. However, memory slot adjustment is
not atomic and, whilst all vCPUs are paused, there is still the possibility
that events may be delivered (which requires the shared_info page to be
updated) whilst the shared_info GPA is absent. The HVA is never absent
though, so it makes much more sense to use that as the basis for the
kernel's mapping.

Hence add a new KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA attribute type for this
purpose and a KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA flag to advertize its
availability. Don't actually advertize it yet though. That will be done in
a subsequent patch, which will also add tests for the new attribute type.

Also update the KVM API documentation with the new attribute and fix it up
to consistently refer to 'shared_info' (with the underscore).
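
For context, the guest selects (and may later move) the overlay GFN with a
hypercall along these lines. This is a minimal guest-side sketch using the
standard Xen interface and is not part of this patch; chosen_gfn stands in
for whatever frame the guest picks:

  struct xen_add_to_physmap xatp = {
          .domid = DOMID_SELF,
          .space = XENMAPSPACE_shared_info,
          .idx   = 0,
          /* The guest may pick a different frame on a later call... */
          .gpfn  = chosen_gfn,
  };

  /* ...but the VMM-side HVA backing the overlay never changes. */
  if (HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp))
          BUG();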

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: David Woodhouse <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]

v8:
- Re-base.

v2:
- Define the new attribute and capability but don't advertize the
capability yet.
- Add API documentation.
---
Documentation/virt/kvm/api.rst | 25 +++++++++++++++++++------
arch/x86/kvm/xen.c | 32 +++++++++++++++++++++++++-------
include/uapi/linux/kvm.h | 6 +++++-
3 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7025b3751027..25ac75b6d8c5 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -353,7 +353,7 @@ The bits in the dirty bitmap are cleared before the ioctl returns, unless
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is enabled. For more information,
see the description of the capability.

-Note that the Xen shared info page, if configured, shall always be assumed
+Note that the Xen shared_info page, if configured, shall always be assumed
to be dirty. KVM will not explicitly mark it such.


@@ -5480,8 +5480,9 @@ KVM_PV_ASYNC_CLEANUP_PERFORM
__u8 long_mode;
__u8 vector;
__u8 runstate_update_flag;
- struct {
+ union {
__u64 gfn;
+ __u64 hva;
} shared_info;
struct {
__u32 send_port;
@@ -5509,10 +5510,10 @@ type values:

KVM_XEN_ATTR_TYPE_LONG_MODE
Sets the ABI mode of the VM to 32-bit or 64-bit (long mode). This
- determines the layout of the shared info pages exposed to the VM.
+ determines the layout of the shared_info page exposed to the VM.

KVM_XEN_ATTR_TYPE_SHARED_INFO
- Sets the guest physical frame number at which the Xen "shared info"
+ Sets the guest physical frame number at which the Xen shared_info
page resides. Note that although Xen places vcpu_info for the first
32 vCPUs in the shared_info page, KVM does not automatically do so
and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO be used
@@ -5521,7 +5522,7 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO
not be aware of the Xen CPU id which is used as the index into the
vcpu_info[] array, so may know the correct default location.

- Note that the shared info page may be constantly written to by KVM;
+ Note that the shared_info page may be constantly written to by KVM;
it contains the event channel bitmap used to deliver interrupts to
a Xen guest, amongst other things. It is exempt from dirty tracking
mechanisms — KVM will not explicitly mark the page as dirty each
@@ -5530,9 +5531,21 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO
any vCPU has been running or any event channel interrupts can be
routed to the guest.

- Setting the gfn to KVM_XEN_INVALID_GFN will disable the shared info
+ Setting the gfn to KVM_XEN_INVALID_GFN will disable the shared_info
page.

+KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA
+ If the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA flag is also set in the
+ Xen capabilities, then this attribute may be used to set the
+ userspace address at which the shared_info page resides, which
+ will always be fixed in the VMM regardless of where it is mapped
+ in guest physical address space. This attribute should be used in
+ preference to KVM_XEN_ATTR_TYPE_SHARED_INFO as it avoids
+ unnecessary invalidation of an internal cache when the page is
+ re-mapped in guest physcial address space.
+
+ Setting the hva to zero will disable the shared_info page.
+
KVM_XEN_ATTR_TYPE_UPCALL_VECTOR
Sets the exception vector used to deliver Xen event channel upcalls.
This is the HVM-wide vector injected directly by the hypervisor
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index e1967f970f54..618ae4c0e7f8 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -34,24 +34,27 @@ static bool kvm_xen_hcall_evtchn_send(struct kvm_vcpu *vcpu, u64 param, u64 *r);

DEFINE_STATIC_KEY_DEFERRED_FALSE(kvm_xen_enabled, HZ);

-static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
+static int kvm_xen_shared_info_init(struct kvm *kvm, u64 addr, bool addr_is_gfn)
{
struct gfn_to_pfn_cache *gpc = &kvm->arch.xen.shinfo_cache;
struct pvclock_wall_clock *wc;
- gpa_t gpa = gfn_to_gpa(gfn);
u32 *wc_sec_hi;
u32 wc_version;
u64 wall_nsec;
int ret = 0;
int idx = srcu_read_lock(&kvm->srcu);

- if (gfn == KVM_XEN_INVALID_GFN) {
+ if ((addr_is_gfn && addr == KVM_XEN_INVALID_GFN) ||
+ (!addr_is_gfn && addr == 0)) {
kvm_gpc_deactivate(gpc);
goto out;
}

do {
- ret = kvm_gpc_activate(gpc, gpa, PAGE_SIZE);
+ if (addr_is_gfn)
+ ret = kvm_gpc_activate(gpc, gfn_to_gpa(addr), PAGE_SIZE);
+ else
+ ret = kvm_gpc_activate_hva(gpc, addr, PAGE_SIZE);
if (ret)
goto out;

@@ -626,7 +629,6 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
{
int r = -ENOENT;

-
switch (data->type) {
case KVM_XEN_ATTR_TYPE_LONG_MODE:
if (!IS_ENABLED(CONFIG_64BIT) && data->u.long_mode) {
@@ -641,7 +643,13 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)

case KVM_XEN_ATTR_TYPE_SHARED_INFO:
mutex_lock(&kvm->arch.xen.xen_lock);
- r = kvm_xen_shared_info_init(kvm, data->u.shared_info.gfn);
+ r = kvm_xen_shared_info_init(kvm, data->u.shared_info.gfn, true);
+ mutex_unlock(&kvm->arch.xen.xen_lock);
+ break;
+
+ case KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA:
+ mutex_lock(&kvm->arch.xen.xen_lock);
+ r = kvm_xen_shared_info_init(kvm, data->u.shared_info.hva, false);
mutex_unlock(&kvm->arch.xen.xen_lock);
break;

@@ -698,13 +706,23 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
break;

case KVM_XEN_ATTR_TYPE_SHARED_INFO:
- if (kvm->arch.xen.shinfo_cache.active)
+ if (kvm->arch.xen.shinfo_cache.active &&
+ kvm->arch.xen.shinfo_cache.gpa != KVM_XEN_INVALID_GPA)
data->u.shared_info.gfn = gpa_to_gfn(kvm->arch.xen.shinfo_cache.gpa);
else
data->u.shared_info.gfn = KVM_XEN_INVALID_GFN;
r = 0;
break;

+ case KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA:
+ if (kvm->arch.xen.shinfo_cache.active &&
+ kvm->arch.xen.shinfo_cache.gpa == KVM_XEN_INVALID_GPA)
+ data->u.shared_info.hva = kvm->arch.xen.shinfo_cache.uhva;
+ else
+ data->u.shared_info.hva = 0;
+ r = 0;
+ break;
+
case KVM_XEN_ATTR_TYPE_UPCALL_VECTOR:
data->u.vector = kvm->arch.xen.upcall_vector;
r = 0;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 211b86de35ac..769361b91f95 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1291,6 +1291,7 @@ struct kvm_x86_mce {
#define KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL (1 << 4)
#define KVM_XEN_HVM_CONFIG_EVTCHN_SEND (1 << 5)
#define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG (1 << 6)
+#define KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA (1 << 7)

struct kvm_xen_hvm_config {
__u32 flags;
@@ -1804,9 +1805,10 @@ struct kvm_xen_hvm_attr {
__u8 long_mode;
__u8 vector;
__u8 runstate_update_flag;
- struct {
+ union {
__u64 gfn;
#define KVM_XEN_INVALID_GFN ((__u64)-1)
+ __u64 hva;
} shared_info;
struct {
__u32 send_port;
@@ -1848,6 +1850,8 @@ struct kvm_xen_hvm_attr {
#define KVM_XEN_ATTR_TYPE_XEN_VERSION 0x4
/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG */
#define KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG 0x5
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA */
+#define KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA 0x6

/* Per-vCPU Xen attributes */
#define KVM_XEN_VCPU_GET_ATTR _IOWR(KVMIO, 0xca, struct kvm_xen_vcpu_attr)
--
2.39.2

2023-11-21 18:16:43

by Paul Durrant

Subject: [PATCH v8 03/15] KVM: xen: mark guest pages dirty with the pfncache lock held

From: Paul Durrant <[email protected]>

Sampling the gpa and memslot from an unlocked pfncache may yield
inconsistent values. Since there is no problem with calling
mark_page_dirty_in_slot() with the pfncache lock held, relocate the calls
in kvm_xen_update_runstate_guest() and kvm_xen_inject_pending_events()
accordingly.

Signed-off-by: Paul Durrant <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: [email protected]

v8:
- New in this version.
---
arch/x86/kvm/xen.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index e53fad915a62..426306022c2f 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -452,14 +452,13 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
smp_wmb();
}

- if (user_len2)
+ if (user_len2) {
+ mark_page_dirty_in_slot(v->kvm, gpc2->memslot, gpc2->gpa >> PAGE_SHIFT);
read_unlock(&gpc2->lock);
-
- read_unlock_irqrestore(&gpc1->lock, flags);
+ }

mark_page_dirty_in_slot(v->kvm, gpc1->memslot, gpc1->gpa >> PAGE_SHIFT);
- if (user_len2)
- mark_page_dirty_in_slot(v->kvm, gpc2->memslot, gpc2->gpa >> PAGE_SHIFT);
+ read_unlock_irqrestore(&gpc1->lock, flags);
}

void kvm_xen_update_runstate(struct kvm_vcpu *v, int state)
@@ -565,13 +564,13 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v)
: "0" (evtchn_pending_sel32));
WRITE_ONCE(vi->evtchn_upcall_pending, 1);
}
+
+ mark_page_dirty_in_slot(v->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
read_unlock_irqrestore(&gpc->lock, flags);

/* For the per-vCPU lapic vector, deliver it as MSI. */
if (v->arch.xen.upcall_vector)
kvm_xen_inject_vcpu_vector(v);
-
- mark_page_dirty_in_slot(v->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
}

int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
--
2.39.2

2023-11-21 18:17:20

by Paul Durrant

Subject: [PATCH v8 04/15] KVM: pfncache: add a mark-dirty helper

From: Paul Durrant <[email protected]>

At the moment pages are marked dirty by open-coded calls to
mark_page_dirty_in_slot(), directly dereferencing the gpa and memslot
from the cache. After a subsequent patch these may not always be set,
so add a helper now to shield callers from needing to know about this
detail.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: David Woodhouse <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]

v8:
- Make the helper a static inline.
---
arch/x86/kvm/x86.c | 2 +-
arch/x86/kvm/xen.c | 6 +++---
include/linux/kvm_host.h | 10 ++++++++++
3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2c924075f6f1..f4ebac198ff5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3145,7 +3145,7 @@ static void kvm_setup_guest_pvclock(struct kvm_vcpu *v,

guest_hv_clock->version = ++vcpu->hv_clock.version;

- mark_page_dirty_in_slot(v->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
+ kvm_gpc_mark_dirty(gpc);
read_unlock_irqrestore(&gpc->lock, flags);

trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 426306022c2f..41a7c03f7204 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -453,11 +453,11 @@ static void kvm_xen_update_runstate_guest(struct kvm_vcpu *v, bool atomic)
}

if (user_len2) {
- mark_page_dirty_in_slot(v->kvm, gpc2->memslot, gpc2->gpa >> PAGE_SHIFT);
+ kvm_gpc_mark_dirty(gpc2);
read_unlock(&gpc2->lock);
}

- mark_page_dirty_in_slot(v->kvm, gpc1->memslot, gpc1->gpa >> PAGE_SHIFT);
+ kvm_gpc_mark_dirty(gpc1);
read_unlock_irqrestore(&gpc1->lock, flags);
}

@@ -565,7 +565,7 @@ void kvm_xen_inject_pending_events(struct kvm_vcpu *v)
WRITE_ONCE(vi->evtchn_upcall_pending, 1);
}

- mark_page_dirty_in_slot(v->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
+ kvm_gpc_mark_dirty(gpc);
read_unlock_irqrestore(&gpc->lock, flags);

/* For the per-vCPU lapic vector, deliver it as MSI. */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index fb6c6109fdca..ae83caa99974 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1367,6 +1367,16 @@ int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, unsigned long len);
*/
void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc);

+/**
+ * kvm_gpc_mark_dirty - mark a cached page as dirty.
+ *
+ * @gpc: struct gfn_to_pfn_cache object.
+ */
+static inline void kvm_gpc_mark_dirty(struct gfn_to_pfn_cache *gpc)
+{
+ mark_page_dirty_in_slot(gpc->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
+}
+
void kvm_sigset_activate(struct kvm_vcpu *vcpu);
void kvm_sigset_deactivate(struct kvm_vcpu *vcpu);

--
2.39.2

2023-11-21 18:19:25

by Paul Durrant

Subject: [PATCH v8 05/15] KVM: pfncache: remove KVM_GUEST_USES_PFN usage

From: Paul Durrant <[email protected]>

As noted in [1] the KVM_GUEST_USES_PFN usage flag is never set by any
callers of kvm_gpc_init(), which also makes the 'vcpu' argument redundant.
Moreover, all existing callers specify KVM_HOST_USES_PFN, so the usage
check in hva_to_pfn_retry() and hence the 'usage' argument to
kvm_gpc_init() are also redundant.
Remove the pfn_cache_usage enumeration along with the redundant arguments,
the related fields of struct gfn_to_pfn_cache, and all the associated code.

[1] https://lore.kernel.org/all/[email protected]/

Signed-off-by: Paul Durrant <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: [email protected]

v8:
- New in this version.
---
arch/x86/kvm/x86.c | 2 +-
arch/x86/kvm/xen.c | 14 ++++-----
include/linux/kvm_host.h | 11 +------
include/linux/kvm_types.h | 8 -----
virt/kvm/pfncache.c | 61 ++++++---------------------------------
5 files changed, 16 insertions(+), 80 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f4ebac198ff5..4afe9e447ba4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11976,7 +11976,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
vcpu->arch.regs_avail = ~0;
vcpu->arch.regs_dirty = ~0;

- kvm_gpc_init(&vcpu->arch.pv_time, vcpu->kvm, vcpu, KVM_HOST_USES_PFN);
+ kvm_gpc_init(&vcpu->arch.pv_time, vcpu->kvm);

if (!irqchip_in_kernel(vcpu->kvm) || kvm_vcpu_is_reset_bsp(vcpu))
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 41a7c03f7204..e1967f970f54 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -2101,14 +2101,10 @@ void kvm_xen_init_vcpu(struct kvm_vcpu *vcpu)

timer_setup(&vcpu->arch.xen.poll_timer, cancel_evtchn_poll, 0);

- kvm_gpc_init(&vcpu->arch.xen.runstate_cache, vcpu->kvm, NULL,
- KVM_HOST_USES_PFN);
- kvm_gpc_init(&vcpu->arch.xen.runstate2_cache, vcpu->kvm, NULL,
- KVM_HOST_USES_PFN);
- kvm_gpc_init(&vcpu->arch.xen.vcpu_info_cache, vcpu->kvm, NULL,
- KVM_HOST_USES_PFN);
- kvm_gpc_init(&vcpu->arch.xen.vcpu_time_info_cache, vcpu->kvm, NULL,
- KVM_HOST_USES_PFN);
+ kvm_gpc_init(&vcpu->arch.xen.runstate_cache, vcpu->kvm);
+ kvm_gpc_init(&vcpu->arch.xen.runstate2_cache, vcpu->kvm);
+ kvm_gpc_init(&vcpu->arch.xen.vcpu_info_cache, vcpu->kvm);
+ kvm_gpc_init(&vcpu->arch.xen.vcpu_time_info_cache, vcpu->kvm);
}

void kvm_xen_destroy_vcpu(struct kvm_vcpu *vcpu)
@@ -2151,7 +2147,7 @@ void kvm_xen_init_vm(struct kvm *kvm)
{
mutex_init(&kvm->arch.xen.xen_lock);
idr_init(&kvm->arch.xen.evtchn_ports);
- kvm_gpc_init(&kvm->arch.xen.shinfo_cache, kvm, NULL, KVM_HOST_USES_PFN);
+ kvm_gpc_init(&kvm->arch.xen.shinfo_cache, kvm);
}

void kvm_xen_destroy_vm(struct kvm *kvm)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ae83caa99974..b1dc2e5a64f3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1287,21 +1287,12 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);
*
* @gpc: struct gfn_to_pfn_cache object.
* @kvm: pointer to kvm instance.
- * @vcpu: vCPU to be used for marking pages dirty and to be woken on
- * invalidation.
- * @usage: indicates if the resulting host physical PFN is used while
- * the @vcpu is IN_GUEST_MODE (in which case invalidation of
- * the cache from MMU notifiers---but not for KVM memslot
- * changes!---will also force @vcpu to exit the guest and
- * refresh the cache); and/or if the PFN used directly
- * by KVM (and thus needs a kernel virtual mapping).
*
* This sets up a gfn_to_pfn_cache by initializing locks and assigning the
* immutable attributes. Note, the cache must be zero-allocated (or zeroed by
* the caller before init).
*/
-void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
- struct kvm_vcpu *vcpu, enum pfn_cache_usage usage);
+void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm);

/**
* kvm_gpc_activate - prepare a cached kernel mapping and HPA for a given guest
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index 6f4737d5046a..dad9121b3359 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -48,12 +48,6 @@ typedef u64 hfn_t;

typedef hfn_t kvm_pfn_t;

-enum pfn_cache_usage {
- KVM_GUEST_USES_PFN = BIT(0),
- KVM_HOST_USES_PFN = BIT(1),
- KVM_GUEST_AND_HOST_USE_PFN = KVM_GUEST_USES_PFN | KVM_HOST_USES_PFN,
-};
-
struct gfn_to_hva_cache {
u64 generation;
gpa_t gpa;
@@ -68,13 +62,11 @@ struct gfn_to_pfn_cache {
unsigned long uhva;
struct kvm_memory_slot *memslot;
struct kvm *kvm;
- struct kvm_vcpu *vcpu;
struct list_head list;
rwlock_t lock;
struct mutex refresh_lock;
void *khva;
kvm_pfn_t pfn;
- enum pfn_cache_usage usage;
bool active;
bool valid;
};
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index f3571f44d9af..6f4b537eb25b 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -25,9 +25,7 @@
void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
unsigned long end, bool may_block)
{
- DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);
struct gfn_to_pfn_cache *gpc;
- bool evict_vcpus = false;

spin_lock(&kvm->gpc_lock);
list_for_each_entry(gpc, &kvm->gpc_list, list) {
@@ -37,43 +35,10 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,
if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) &&
gpc->uhva >= start && gpc->uhva < end) {
gpc->valid = false;
-
- /*
- * If a guest vCPU could be using the physical address,
- * it needs to be forced out of guest mode.
- */
- if (gpc->usage & KVM_GUEST_USES_PFN) {
- if (!evict_vcpus) {
- evict_vcpus = true;
- bitmap_zero(vcpu_bitmap, KVM_MAX_VCPUS);
- }
- __set_bit(gpc->vcpu->vcpu_idx, vcpu_bitmap);
- }
}
write_unlock_irq(&gpc->lock);
}
spin_unlock(&kvm->gpc_lock);
-
- if (evict_vcpus) {
- /*
- * KVM needs to ensure the vCPU is fully out of guest context
- * before allowing the invalidation to continue.
- */
- unsigned int req = KVM_REQ_OUTSIDE_GUEST_MODE;
- bool called;
-
- /*
- * If the OOM reaper is active, then all vCPUs should have
- * been stopped already, so perform the request without
- * KVM_REQUEST_WAIT and be sad if any needed to be IPI'd.
- */
- if (!may_block)
- req &= ~KVM_REQUEST_WAIT;
-
- called = kvm_make_vcpus_request_mask(kvm, req, vcpu_bitmap);
-
- WARN_ON_ONCE(called && !may_block);
- }
}

bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
@@ -206,16 +171,14 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
* pfn. Note, kmap() and memremap() can both sleep, so this
* too must be done outside of gpc->lock!
*/
- if (gpc->usage & KVM_HOST_USES_PFN) {
- if (new_pfn == gpc->pfn)
- new_khva = old_khva;
- else
- new_khva = gpc_map(new_pfn);
-
- if (!new_khva) {
- kvm_release_pfn_clean(new_pfn);
- goto out_error;
- }
+ if (new_pfn == gpc->pfn)
+ new_khva = old_khva;
+ else
+ new_khva = gpc_map(new_pfn);
+
+ if (!new_khva) {
+ kvm_release_pfn_clean(new_pfn);
+ goto out_error;
}

write_lock_irq(&gpc->lock);
@@ -346,18 +309,12 @@ int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, unsigned long len)
return __kvm_gpc_refresh(gpc, gpc->gpa, len);
}

-void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm,
- struct kvm_vcpu *vcpu, enum pfn_cache_usage usage)
+void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm)
{
- WARN_ON_ONCE(!usage || (usage & KVM_GUEST_AND_HOST_USE_PFN) != usage);
- WARN_ON_ONCE((usage & KVM_GUEST_USES_PFN) && !vcpu);
-
rwlock_init(&gpc->lock);
mutex_init(&gpc->refresh_lock);

gpc->kvm = kvm;
- gpc->vcpu = vcpu;
- gpc->usage = usage;
gpc->pfn = KVM_PFN_ERR_FAULT;
gpc->uhva = KVM_HVA_ERR_BAD;
}
--
2.39.2

2023-11-21 18:23:16

by Paul Durrant

Subject: [PATCH v8 06/15] KVM: pfncache: stop open-coding offset_in_page()

From: Paul Durrant <[email protected]>

Some code in pfncache uses offset_in_page() but in other places it is open-
coded. Use offset_in_page() consistently everywhere.

Signed-off-by: Paul Durrant <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: David Woodhouse <[email protected]>

v8:
- New in this version.
---
virt/kvm/pfncache.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 6f4b537eb25b..0eeb034d0674 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -48,7 +48,7 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
if (!gpc->active)
return false;

- if ((gpc->gpa & ~PAGE_MASK) + len > PAGE_SIZE)
+ if (offset_in_page(gpc->gpa) + len > PAGE_SIZE)
return false;

if (gpc->generation != slots->generation || kvm_is_error_hva(gpc->uhva))
@@ -192,7 +192,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)

gpc->valid = true;
gpc->pfn = new_pfn;
- gpc->khva = new_khva + (gpc->gpa & ~PAGE_MASK);
+ gpc->khva = new_khva + offset_in_page(gpc->gpa);

/*
* Put the reference to the _new_ pfn. The pfn is now tracked by the
@@ -213,7 +213,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
unsigned long len)
{
struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
- unsigned long page_offset = gpa & ~PAGE_MASK;
+ unsigned long page_offset = offset_in_page(gpa);
bool unmap_old = false;
unsigned long old_uhva;
kvm_pfn_t old_pfn;
--
2.39.2

2023-11-21 18:26:27

by Paul Durrant

Subject: [PATCH v8 08/15] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA

From: Paul Durrant <[email protected]>

Some pfncache pages may actually be overlays on guest memory that have a
fixed HVA within the VMM. It's pointless to invalidate such cached
mappings if the overlay is moved, so allow a cache to be activated
directly with the HVA to cater for such cases. A subsequent patch will
make use of this facility.
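
As a rough sketch of how a caller is expected to choose between the two
activation paths (the real user is the shared_info patch later in this
series; addr_is_hva, hva and gpa are illustrative locals only):

  /* A fixed VMM overlay address bypasses GPA translation entirely. */
  if (addr_is_hva)
          ret = kvm_gpc_activate_hva(gpc, hva, PAGE_SIZE);
  else
          ret = kvm_gpc_activate(gpc, gpa, PAGE_SIZE);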

Signed-off-by: Paul Durrant <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: David Woodhouse <[email protected]>

v8:
- Re-worked to avoid messing with struct gfn_to_pfn_cache.
---
include/linux/kvm_host.h | 19 +++++++++++++++++-
virt/kvm/pfncache.c | 43 ++++++++++++++++++++++++++++++----------
2 files changed, 51 insertions(+), 11 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b1dc2e5a64f3..484c587e8290 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1312,6 +1312,22 @@ void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm);
*/
int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len);

+/**
+ * kvm_gpc_activate_hva - prepare a cached kernel mapping and HPA for a given HVA.
+ *
+ * @gpc: struct gfn_to_pfn_cache object.
+ * @hva: userspace virtual address to map.
+ * @len: sanity check; the range being access must fit a single page.
+ *
+ * @return: 0 for success.
+ * -EINVAL for a mapping which would cross a page boundary.
+ * -EFAULT for an untranslatable guest physical address.
+ *
+ * The semantics of this function are the same as those of kvm_gpc_activate(). It
+ * merely bypasses a layer of address translation.
+ */
+int kvm_gpc_activate_hva(struct gfn_to_pfn_cache *gpc, unsigned long hva, unsigned long len);
+
/**
* kvm_gpc_check - check validity of a gfn_to_pfn_cache.
*
@@ -1365,7 +1381,8 @@ void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc);
*/
static inline void kvm_gpc_mark_dirty(struct gfn_to_pfn_cache *gpc)
{
- mark_page_dirty_in_slot(gpc->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
+ if (gpc->gpa != KVM_XEN_INVALID_GPA)
+ mark_page_dirty_in_slot(gpc->kvm, gpc->memslot, gpc->gpa >> PAGE_SHIFT);
}

void kvm_sigset_activate(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index c545f6246501..ed700afeec49 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -209,11 +209,11 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
return -EFAULT;
}

-static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
+static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, u64 addr, bool addr_is_gpa,
unsigned long len)
{
struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
- unsigned long page_offset = offset_in_page(gpa);
+ unsigned long page_offset = offset_in_page(addr);
bool unmap_old = false;
kvm_pfn_t old_pfn;
bool hva_change = false;
@@ -244,12 +244,21 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
old_pfn = gpc->pfn;
old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);

- /* If the userspace HVA is invalid, refresh that first */
- if (gpc->gpa != gpa || gpc->generation != slots->generation ||
- kvm_is_error_hva(gpc->uhva)) {
- gfn_t gfn = gpa_to_gfn(gpa);
+ if (!addr_is_gpa) {
+ gpc->gpa = KVM_XEN_INVALID_GPA;
+ gpc->uhva = PAGE_ALIGN_DOWN(gpc->uhva);
+ addr = PAGE_ALIGN_DOWN(addr);
+
+ if (gpc->uhva != addr) {
+ gpc->uhva = addr;
+ hva_change = true;
+ }
+ } else if (gpc->gpa != addr ||
+ gpc->generation != slots->generation ||
+ kvm_is_error_hva(gpc->uhva)) {
+ gfn_t gfn = gpa_to_gfn(addr);

- gpc->gpa = gpa;
+ gpc->gpa = addr;
gpc->generation = slots->generation;
gpc->memslot = __gfn_to_memslot(slots, gfn);
gpc->uhva = gfn_to_hva_memslot(gpc->memslot, gfn);
@@ -317,7 +326,10 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,

int kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, unsigned long len)
{
- return __kvm_gpc_refresh(gpc, gpc->gpa, len);
+ if (gpc->gpa != KVM_XEN_INVALID_GPA)
+ return __kvm_gpc_refresh(gpc, gpc->gpa, true, len);
+
+ return __kvm_gpc_refresh(gpc, gpc->uhva, false, len);
}

void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm)
@@ -330,7 +342,8 @@ void kvm_gpc_init(struct gfn_to_pfn_cache *gpc, struct kvm *kvm)
gpc->uhva = KVM_HVA_ERR_BAD;
}

-int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
+static int __kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, u64 addr, bool addr_is_gpa,
+ unsigned long len)
{
struct kvm *kvm = gpc->kvm;

@@ -351,7 +364,17 @@ int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
gpc->active = true;
write_unlock_irq(&gpc->lock);
}
- return __kvm_gpc_refresh(gpc, gpa, len);
+ return __kvm_gpc_refresh(gpc, addr, addr_is_gpa, len);
+}
+
+int kvm_gpc_activate(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long len)
+{
+ return __kvm_gpc_activate(gpc, gpa, true, len);
+}
+
+int kvm_gpc_activate_hva(struct gfn_to_pfn_cache *gpc, unsigned long hva, unsigned long len)
+{
+ return __kvm_gpc_activate(gpc, hva, false, len);
}

void kvm_gpc_deactivate(struct gfn_to_pfn_cache *gpc)
--
2.39.2

2023-11-21 18:31:14

by Paul Durrant

Subject: [PATCH v8 11/15] KVM: selftests / xen: map shared_info using HVA rather than GFN

From: Paul Durrant <[email protected]>

Using the HVA of the shared_info page is more efficient, so if the
capability (KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA) is present, use that
method to do the mapping.

NOTE: Have the juggle_shinfo_state() thread map and unmap using both
GFN and HVA, to make sure the older mechanism is not broken.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: David Woodhouse <[email protected]>

v3:
- Re-work the juggle_shinfo_state() thread

v2:
- New in this version.
---
.../selftests/kvm/x86_64/xen_shinfo_test.c | 44 +++++++++++++++----
1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
index 9ec9ab60b63e..a61500ff0822 100644
--- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
@@ -389,6 +389,7 @@ static int cmp_timespec(struct timespec *a, struct timespec *b)
return 0;
}

+static struct shared_info *shinfo;
static struct vcpu_info *vinfo;
static struct kvm_vcpu *vcpu;

@@ -404,20 +405,38 @@ static void *juggle_shinfo_state(void *arg)
{
struct kvm_vm *vm = (struct kvm_vm *)arg;

- struct kvm_xen_hvm_attr cache_activate = {
+ struct kvm_xen_hvm_attr cache_activate_gfn = {
.type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
.u.shared_info.gfn = SHINFO_REGION_GPA / PAGE_SIZE
};

- struct kvm_xen_hvm_attr cache_deactivate = {
+ struct kvm_xen_hvm_attr cache_deactivate_gfn = {
.type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
.u.shared_info.gfn = KVM_XEN_INVALID_GFN
};

+ struct kvm_xen_hvm_attr cache_activate_hva = {
+ .type = KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA,
+ .u.shared_info.hva = (unsigned long)shinfo
+ };
+
+ struct kvm_xen_hvm_attr cache_deactivate_hva = {
+ .type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
+ .u.shared_info.hva = 0
+ };
+
+ int xen_caps = kvm_check_cap(KVM_CAP_XEN_HVM);
+
for (;;) {
- __vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_activate);
- __vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_deactivate);
+ __vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_activate_gfn);
pthread_testcancel();
+ __vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_deactivate_gfn);
+
+ if (xen_caps & KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA) {
+ __vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_activate_hva);
+ pthread_testcancel();
+ __vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_deactivate_hva);
+ }
}

return NULL;
@@ -442,6 +461,7 @@ int main(int argc, char *argv[])
bool do_runstate_flag = !!(xen_caps & KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG);
bool do_eventfd_tests = !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL);
bool do_evtchn_tests = do_eventfd_tests && !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_SEND);
+ bool has_shinfo_hva = !!(xen_caps & KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA);

clock_gettime(CLOCK_REALTIME, &min_ts);

@@ -452,7 +472,7 @@ int main(int argc, char *argv[])
SHINFO_REGION_GPA, SHINFO_REGION_SLOT, 3, 0);
virt_map(vm, SHINFO_REGION_GVA, SHINFO_REGION_GPA, 3);

- struct shared_info *shinfo = addr_gpa2hva(vm, SHINFO_VADDR);
+ shinfo = addr_gpa2hva(vm, SHINFO_VADDR);

int zero_fd = open("/dev/zero", O_RDONLY);
TEST_ASSERT(zero_fd != -1, "Failed to open /dev/zero");
@@ -488,10 +508,16 @@ int main(int argc, char *argv[])
"Failed to read back RUNSTATE_UPDATE_FLAG attr");
}

- struct kvm_xen_hvm_attr ha = {
- .type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
- .u.shared_info.gfn = SHINFO_REGION_GPA / PAGE_SIZE,
- };
+ struct kvm_xen_hvm_attr ha = {};
+
+ if (has_shinfo_hva) {
+ ha.type = KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA;
+ ha.u.shared_info.hva = (unsigned long)shinfo;
+ } else {
+ ha.type = KVM_XEN_ATTR_TYPE_SHARED_INFO;
+ ha.u.shared_info.gfn = SHINFO_ADDR / PAGE_SIZE;
+ }
+
vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &ha);

/*
--
2.39.2

2023-11-21 18:31:22

by Paul Durrant

Subject: [PATCH v8 14/15] KVM: xen: split up kvm_xen_set_evtchn_fast()

From: Paul Durrant <[email protected]>

The implementation of kvm_xen_set_evtchn_fast() is a rather lengthy piece
of code that performs two operations: updating the shared_info
evtchn_pending mask, and updating the vcpu_info evtchn_pending_sel
mask. Introduce a separate function to perform each of those operations
and re-work kvm_xen_set_evtchn_fast() to use them.

No functional change intended.

Signed-off-by: Paul Durrant <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: [email protected]

v8:
- New in this version.
---
arch/x86/kvm/xen.c | 170 +++++++++++++++++++++++++--------------------
1 file changed, 96 insertions(+), 74 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 42a9f1ea25b3..eff405eead1c 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1623,60 +1623,28 @@ static void kvm_xen_check_poller(struct kvm_vcpu *vcpu, int port)
}
}

-/*
- * The return value from this function is propagated to kvm_set_irq() API,
- * so it returns:
- * < 0 Interrupt was ignored (masked or not delivered for other reasons)
- * = 0 Interrupt was coalesced (previous irq is still pending)
- * > 0 Number of CPUs interrupt was delivered to
- *
- * It is also called directly from kvm_arch_set_irq_inatomic(), where the
- * only check on its return value is a comparison with -EWOULDBLOCK'.
- */
-int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
+static int set_shinfo_evtchn_pending(struct kvm_vcpu *vcpu, u32 port, u32 *port_word_bit)
{
+ struct kvm *kvm = vcpu->kvm;
struct gfn_to_pfn_cache *gpc = &kvm->arch.xen.shinfo_cache;
- struct kvm_vcpu *vcpu;
unsigned long *pending_bits, *mask_bits;
unsigned long flags;
- int port_word_bit;
- bool kick_vcpu = false;
- int vcpu_idx, idx, rc;
-
- vcpu_idx = READ_ONCE(xe->vcpu_idx);
- if (vcpu_idx >= 0)
- vcpu = kvm_get_vcpu(kvm, vcpu_idx);
- else {
- vcpu = kvm_get_vcpu_by_id(kvm, xe->vcpu_id);
- if (!vcpu)
- return -EINVAL;
- WRITE_ONCE(xe->vcpu_idx, vcpu->vcpu_idx);
- }
-
- if (!vcpu->arch.xen.vcpu_info_cache.active)
- return -EINVAL;
-
- if (xe->port >= max_evtchn_port(kvm))
- return -EINVAL;
-
- rc = -EWOULDBLOCK;
-
- idx = srcu_read_lock(&kvm->srcu);
+ int rc = -EWOULDBLOCK;

read_lock_irqsave(&gpc->lock, flags);
if (!kvm_gpc_check(gpc, PAGE_SIZE))
- goto out_rcu;
+ goto out;

if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
struct shared_info *shinfo = gpc->khva;
pending_bits = (unsigned long *)&shinfo->evtchn_pending;
mask_bits = (unsigned long *)&shinfo->evtchn_mask;
- port_word_bit = xe->port / 64;
+ *port_word_bit = port / 64;
} else {
struct compat_shared_info *shinfo = gpc->khva;
pending_bits = (unsigned long *)&shinfo->evtchn_pending;
mask_bits = (unsigned long *)&shinfo->evtchn_mask;
- port_word_bit = xe->port / 32;
+ *port_word_bit = port / 32;
}

/*
@@ -1686,52 +1654,106 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
* already set, then we kick the vCPU in question to write to the
* *real* evtchn_pending_sel in its own guest vcpu_info struct.
*/
- if (test_and_set_bit(xe->port, pending_bits)) {
+ if (test_and_set_bit(port, pending_bits)) {
rc = 0; /* It was already raised */
- } else if (test_bit(xe->port, mask_bits)) {
- rc = -ENOTCONN; /* Masked */
- kvm_xen_check_poller(vcpu, xe->port);
+ } else if (test_bit(port, mask_bits)) {
+ rc = -ENOTCONN; /* It is masked */
+ kvm_xen_check_poller(vcpu, port);
} else {
- rc = 1; /* Delivered to the bitmap in shared_info. */
- /* Now switch to the vCPU's vcpu_info to set the index and pending_sel */
- read_unlock_irqrestore(&gpc->lock, flags);
- gpc = &vcpu->arch.xen.vcpu_info_cache;
+ rc = 1; /* It is newly raised */
+ }

- read_lock_irqsave(&gpc->lock, flags);
- if (!kvm_gpc_check(gpc, sizeof(struct vcpu_info))) {
- /*
- * Could not access the vcpu_info. Set the bit in-kernel
- * and prod the vCPU to deliver it for itself.
- */
- if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel))
- kick_vcpu = true;
- goto out_rcu;
- }
+ out:
+ read_unlock_irqrestore(&gpc->lock, flags);
+ return rc;
+}

- if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
- struct vcpu_info *vcpu_info = gpc->khva;
- if (!test_and_set_bit(port_word_bit, &vcpu_info->evtchn_pending_sel)) {
- WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1);
- kick_vcpu = true;
- }
- } else {
- struct compat_vcpu_info *vcpu_info = gpc->khva;
- if (!test_and_set_bit(port_word_bit,
- (unsigned long *)&vcpu_info->evtchn_pending_sel)) {
- WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1);
- kick_vcpu = true;
- }
+static bool set_vcpu_info_evtchn_pending(struct kvm_vcpu *vcpu, u32 port_word_bit)
+{
+ struct kvm *kvm = vcpu->kvm;
+ struct gfn_to_pfn_cache *gpc = &vcpu->arch.xen.vcpu_info_cache;
+ unsigned long flags;
+ bool kick_vcpu = false;
+
+ read_lock_irqsave(&gpc->lock, flags);
+ if (!kvm_gpc_check(gpc, sizeof(struct vcpu_info))) {
+ /*
+ * Could not access the vcpu_info. Set the bit in-kernel
+ * and prod the vCPU to deliver it for itself.
+ */
+ if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel))
+ kick_vcpu = true;
+ goto out;
+ }
+
+ if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
+ struct vcpu_info *vcpu_info = gpc->khva;
+
+ if (!test_and_set_bit(port_word_bit, &vcpu_info->evtchn_pending_sel)) {
+ WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1);
+ kick_vcpu = true;
}
+ } else {
+ struct compat_vcpu_info *vcpu_info = gpc->khva;

- /* For the per-vCPU lapic vector, deliver it as MSI. */
- if (kick_vcpu && vcpu->arch.xen.upcall_vector) {
- kvm_xen_inject_vcpu_vector(vcpu);
- kick_vcpu = false;
+ if (!test_and_set_bit(port_word_bit,
+ (unsigned long *)&vcpu_info->evtchn_pending_sel)) {
+ WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1);
+ kick_vcpu = true;
}
}

- out_rcu:
+ /* For the per-vCPU lapic vector, deliver it as MSI. */
+ if (kick_vcpu && vcpu->arch.xen.upcall_vector) {
+ kvm_xen_inject_vcpu_vector(vcpu);
+ kick_vcpu = false;
+ }
+
+ out:
read_unlock_irqrestore(&gpc->lock, flags);
+ return kick_vcpu;
+}
+
+/*
+ * The return value from this function is propagated to kvm_set_irq() API,
+ * so it returns:
+ * < 0 Interrupt was ignored (masked or not delivered for other reasons)
+ * = 0 Interrupt was coalesced (previous irq is still pending)
+ * > 0 Number of CPUs interrupt was delivered to
+ *
+ * It is also called directly from kvm_arch_set_irq_inatomic(), where the
+ * only check on its return value is a comparison with -EWOULDBLOCK
+ * (which may be returned by set_shinfo_evtchn_pending()).
+ */
+int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
+{
+ struct kvm_vcpu *vcpu;
+ u32 port_word_bit;
+ bool kick_vcpu = false;
+ int vcpu_idx, idx, rc;
+
+ vcpu_idx = READ_ONCE(xe->vcpu_idx);
+ if (vcpu_idx >= 0)
+ vcpu = kvm_get_vcpu(kvm, vcpu_idx);
+ else {
+ vcpu = kvm_get_vcpu_by_id(kvm, xe->vcpu_id);
+ if (!vcpu)
+ return -EINVAL;
+ WRITE_ONCE(xe->vcpu_idx, vcpu->vcpu_idx);
+ }
+
+ if (!vcpu->arch.xen.vcpu_info_cache.active)
+ return -EINVAL;
+
+ if (xe->port >= max_evtchn_port(kvm))
+ return -EINVAL;
+
+ idx = srcu_read_lock(&kvm->srcu);
+
+ rc = set_shinfo_evtchn_pending(vcpu, xe->port, &port_word_bit);
+ if (rc == 1) /* Delivered to the bitmap in shared_info. */
+ kick_vcpu = set_vcpu_info_evtchn_pending(vcpu, port_word_bit);
+
srcu_read_unlock(&kvm->srcu, idx);

if (kick_vcpu) {
--
2.39.2

2023-11-21 18:32:08

by Paul Durrant

Subject: [PATCH v8 15/15] KVM: xen: allow vcpu_info content to be 'safely' copied

From: Paul Durrant <[email protected]>

If the guest sets an explicit vcpu_info GPA then, for any of the first 32
vCPUs, the content of the default vcpu_info in the shared_info page must be
copied into the new location. Because this copy may race with event
delivery (which updates the 'evtchn_pending_sel' field in vcpu_info) there
needs to be a way to defer that until the copy is complete.
Happily there is already a shadow of 'evtchn_pending_sel' in kvm_vcpu_xen
that is used in atomic context if the vcpu_info PFN cache has been
invalidated, so that the update of vcpu_info can be deferred until the
cache can be refreshed (on the vCPU thread's way back into guest context).

Also use this shadow if the vcpu_info cache has been *deactivated*, so that
the VMM can safely copy the vcpu_info content and then re-activate the
cache with the new GPA. To do this, stop considering an inactive vcpu_info
cache as a hard error in kvm_xen_set_evtchn_fast().
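
With that restriction lifted, the VMM can perform the copy with a sequence
along these lines. This is a hedged userspace sketch rather than code from
this series; vcpu_fd, shinfo, xen_cpu_id, new_vcpu_info_hva and new_gpa are
placeholders and error handling is omitted:

  struct kvm_xen_vcpu_attr va = {
          .type = KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO,
          .u.gpa = KVM_XEN_INVALID_GPA,   /* deactivate the vcpu_info cache */
  };
  ioctl(vcpu_fd, KVM_XEN_VCPU_SET_ATTR, &va);

  /* Events raised now accumulate in the in-kernel evtchn_pending_sel shadow. */
  memcpy(new_vcpu_info_hva, &shinfo->vcpu_info[xen_cpu_id],
         sizeof(struct vcpu_info));

  va.u.gpa = new_gpa;                     /* re-activate at the new location */
  ioctl(vcpu_fd, KVM_XEN_VCPU_SET_ATTR, &va);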

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: David Woodhouse <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]

v8:
- Update commit comment.

v6:
- New in this version.
---
arch/x86/kvm/xen.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index eff405eead1c..cfd5051e0800 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1742,9 +1742,6 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
WRITE_ONCE(xe->vcpu_idx, vcpu->vcpu_idx);
}

- if (!vcpu->arch.xen.vcpu_info_cache.active)
- return -EINVAL;
-
if (xe->port >= max_evtchn_port(kvm))
return -EINVAL;

--
2.39.2

2023-11-21 18:33:48

by Paul Durrant

Subject: [PATCH v8 12/15] KVM: selftests / xen: re-map vcpu_info using HVA rather than GPA

From: Paul Durrant <[email protected]>

If the relevant capability (KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA) is present
then re-map vcpu_info using the HVA part way through the tests to make
sure that there is no functional change.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: David Woodhouse <[email protected]>

v5:
- New in this version.
---
.../selftests/kvm/x86_64/xen_shinfo_test.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
index a61500ff0822..d2ea0435f4f7 100644
--- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
@@ -62,6 +62,7 @@ enum {
TEST_POLL_TIMEOUT,
TEST_POLL_MASKED,
TEST_POLL_WAKE,
+ SET_VCPU_INFO,
TEST_TIMER_PAST,
TEST_LOCKING_SEND_RACE,
TEST_LOCKING_POLL_RACE,
@@ -321,6 +322,10 @@ static void guest_code(void)

GUEST_SYNC(TEST_POLL_WAKE);

+ /* Set the vcpu_info to point at exactly the place it already is to
+ * make sure the attribute is functional. */
+ GUEST_SYNC(SET_VCPU_INFO);
+
/* A timer wake an *unmasked* port which should wake us with an
* actual interrupt, while we're polling on a different port. */
ports[0]++;
@@ -888,6 +893,16 @@ int main(int argc, char *argv[])
alarm(1);
break;

+ case SET_VCPU_INFO:
+ if (has_shinfo_hva) {
+ struct kvm_xen_vcpu_attr vih = {
+ .type = KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA,
+ .u.hva = (unsigned long)vinfo
+ };
+ vcpu_ioctl(vcpu, KVM_XEN_VCPU_SET_ATTR, &vih);
+ }
+ break;
+
case TEST_TIMER_PAST:
TEST_ASSERT(!evtchn_irq_expected,
"Expected event channel IRQ but it didn't happen");
--
2.39.2

2023-11-21 18:33:49

by Paul Durrant

Subject: [PATCH v8 10/15] KVM: xen: allow vcpu_info to be mapped by fixed HVA

From: Paul Durrant <[email protected]>

If the guest does not explicitly set the GPA of the vcpu_info structure in
guest memory then, for guests with 32 vCPUs or fewer, the vcpu_info
embedded in the shared_info page may be used. As described in a previous
commit, the shared_info page is an overlay at a fixed HVA within the VMM,
so in this case it is also more efficient to activate the vcpu_info cache
with a fixed HVA, avoiding unnecessary invalidation if the guest memory
layout is modified.
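
For example, a VMM that keeps a vCPU's vcpu_info at its default slot in the
shared_info page could register it by HVA as follows (a minimal sketch
mirroring the selftest added later in the series; vcpu_fd and vinfo are
placeholders for the vCPU file descriptor and the VMM's pointer into its
shared_info mapping):

  struct kvm_xen_vcpu_attr vih = {
          .type = KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA,
          .u.hva = (unsigned long)vinfo,
  };

  if (ioctl(vcpu_fd, KVM_XEN_VCPU_SET_ATTR, &vih))
          err(1, "KVM_XEN_VCPU_SET_ATTR");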

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: David Woodhouse <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]

v8:
- Re-base.

v5:
- New in this version.
---
Documentation/virt/kvm/api.rst | 26 +++++++++++++++++++++-----
arch/x86/kvm/xen.c | 34 ++++++++++++++++++++++++++++------
include/uapi/linux/kvm.h | 3 +++
3 files changed, 52 insertions(+), 11 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 25ac75b6d8c5..2c9d6bbba051 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5516,11 +5516,12 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO
Sets the guest physical frame number at which the Xen shared_info
page resides. Note that although Xen places vcpu_info for the first
32 vCPUs in the shared_info page, KVM does not automatically do so
- and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO be used
- explicitly even when the vcpu_info for a given vCPU resides at the
- "default" location in the shared_info page. This is because KVM may
- not be aware of the Xen CPU id which is used as the index into the
- vcpu_info[] array, so may know the correct default location.
+ and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO or
+ KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA be used explicitly even when
+ the vcpu_info for a given vCPU resides at the "default" location
+ in the shared_info page. This is because KVM may not be aware of
+ the Xen CPU id which is used as the index into the vcpu_info[]
+ array, so may know the correct default location.

Note that the shared_info page may be constantly written to by KVM;
it contains the event channel bitmap used to deliver interrupts to
@@ -5642,6 +5643,21 @@ KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO
on dirty logging. Setting the gpa to KVM_XEN_INVALID_GPA will disable
the vcpu_info.

+KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA
+ If the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA flag is also set in the
+ Xen capabilities, then this attribute may be used to set the
+ userspace address of the vcpu_info for a given vCPU. It should
+ only be used when the vcpu_info resides at the "default" location
+ in the shared_info page. In this case it is safe to assume the
+ userspace address will not change, because the shared_info page is
+ an overlay on guest memory and remains at a fixed host address
+ regardless of where it is mapped in guest physical address space
+ and hence unnecessary invalidation of an internal cache may be
+ avoided if the guest memory layout is modified.
+ If the vcpu_info does not reside at the "default" location then
+ it is not guaranteed to remain at the same host address and
+ hence the aforementioned cache invalidation is required.
+
KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO
Sets the guest physical address of an additional pvclock structure
for a given vCPU. This is typically used for guest vsyscall support.
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 618ae4c0e7f8..42a9f1ea25b3 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -759,20 +759,33 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)

switch (data->type) {
case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO:
+ case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA:
/* No compat necessary here. */
BUILD_BUG_ON(sizeof(struct vcpu_info) !=
sizeof(struct compat_vcpu_info));
BUILD_BUG_ON(offsetof(struct vcpu_info, time) !=
offsetof(struct compat_vcpu_info, time));

- if (data->u.gpa == KVM_XEN_INVALID_GPA) {
- kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache);
- r = 0;
- break;
+ if (data->type == KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO) {
+ if (data->u.gpa == KVM_XEN_INVALID_GPA) {
+ kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache);
+ r = 0;
+ break;
+ }
+
+ r = kvm_gpc_activate(&vcpu->arch.xen.vcpu_info_cache,
+ data->u.gpa, sizeof(struct vcpu_info));
+ } else {
+ if (data->u.hva == 0) {
+ kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache);
+ r = 0;
+ break;
+ }
+
+ r = kvm_gpc_activate_hva(&vcpu->arch.xen.vcpu_info_cache,
+ data->u.hva, sizeof(struct vcpu_info));
}

- r = kvm_gpc_activate(&vcpu->arch.xen.vcpu_info_cache,
- data->u.gpa, sizeof(struct vcpu_info));
if (!r)
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);

@@ -1001,6 +1014,15 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
r = 0;
break;

+ case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA:
+ if (vcpu->arch.xen.vcpu_info_cache.active &&
+ vcpu->arch.xen.vcpu_info_cache.gpa == KVM_XEN_INVALID_GPA)
+ data->u.hva = vcpu->arch.xen.vcpu_info_cache.uhva;
+ else
+ data->u.hva = 0;
+ r = 0;
+ break;
+
case KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO:
if (vcpu->arch.xen.vcpu_time_info_cache.active)
data->u.gpa = vcpu->arch.xen.vcpu_time_info_cache.gpa;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 769361b91f95..bb9d40b5e613 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1869,6 +1869,7 @@ struct kvm_xen_vcpu_attr {
union {
__u64 gpa;
#define KVM_XEN_INVALID_GPA ((__u64)-1)
+ __u64 hva;
__u64 pad[8];
struct {
__u64 state;
@@ -1899,6 +1900,8 @@ struct kvm_xen_vcpu_attr {
#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID 0x6
#define KVM_XEN_VCPU_ATTR_TYPE_TIMER 0x7
#define KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR 0x8
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA */
+#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA 0x9

/* Secure Encrypted Virtualization command */
enum sev_cmd_id {
--
2.39.2
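
A minimal userspace sketch of how a VMM might use the new attribute, purely for
illustration (not part of the patch): it assumes 'vcpu_fd' is an open KVM vCPU
file descriptor, that the shared_info overlay page is already mapped at 'shinfo'
in the VMM's address space, that linux/kvm.h carries this series' uapi additions,
and that sizeof(struct vcpu_info) is 64 bytes on x86 (the struct itself lives in
the Xen headers). Error handling is elided.

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define XEN_VCPU_INFO_SIZE 64	/* sizeof(struct vcpu_info) on x86 */

static int set_vcpu_info_hva(int vcpu_fd, void *shinfo, unsigned int xen_vcpu_id)
{
	struct kvm_xen_vcpu_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type = KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA;
	/* "Default" location: the vcpu_info[] array at the start of shared_info. */
	attr.u.hva = (uintptr_t)shinfo + xen_vcpu_id * XEN_VCPU_INFO_SIZE;

	return ioctl(vcpu_fd, KVM_XEN_VCPU_SET_ATTR, &attr);
}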

2023-11-21 18:40:22

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v8 13/15] KVM: xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA capability

From: Paul Durrant <[email protected]>

Now that all relevant kernel changes and selftests are in place, enable the
new capability.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: [email protected]

v2:
- New in this version.
---
arch/x86/kvm/x86.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4afe9e447ba4..270018cf9ce0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4638,7 +4638,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL |
KVM_XEN_HVM_CONFIG_SHARED_INFO |
KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL |
- KVM_XEN_HVM_CONFIG_EVTCHN_SEND;
+ KVM_XEN_HVM_CONFIG_EVTCHN_SEND |
+ KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA;
if (sched_info_on())
r |= KVM_XEN_HVM_CONFIG_RUNSTATE |
KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG;
--
2.39.2
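
As a rough illustration (not part of the patch), a VMM could probe for the new
bit before switching to the HVA-based attributes; this assumes 'kvm_fd' is the
open /dev/kvm file descriptor and a linux/kvm.h that defines the flag:

#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static bool xen_shared_info_hva_supported(int kvm_fd)
{
	/* KVM_CHECK_EXTENSION returns the mask of KVM_XEN_HVM_CONFIG_* flags. */
	int flags = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_XEN_HVM);

	return flags > 0 && (flags & KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA);
}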

2023-11-21 18:48:30

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v8 07/15] KVM: pfncache: include page offset in uhva and use it consistently

From: Paul Durrant <[email protected]>

Currently the pfncache page offset is sometimes determined using the gpa
and sometimes the khva, whilst the uhva is always page-aligned. After a
subsequent patch is applied the gpa will not always be valid so adjust
the code to include the page offset in the uhva and use it consistently
as the source of truth.

Also, where a page-aligned address is required, use PAGE_ALIGN_DOWN()
for clarity.

Signed-off-by: Paul Durrant <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: David Woodhouse <[email protected]>

v8:
- New in this version.
---
virt/kvm/pfncache.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 0eeb034d0674..c545f6246501 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -48,10 +48,10 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
if (!gpc->active)
return false;

- if (offset_in_page(gpc->gpa) + len > PAGE_SIZE)
+ if (gpc->generation != slots->generation || kvm_is_error_hva(gpc->uhva))
return false;

- if (gpc->generation != slots->generation || kvm_is_error_hva(gpc->uhva))
+ if (offset_in_page(gpc->uhva) + len > PAGE_SIZE)
return false;

if (!gpc->valid)
@@ -119,7 +119,7 @@ static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_s
static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
{
/* Note, the new page offset may be different than the old! */
- void *old_khva = gpc->khva - offset_in_page(gpc->khva);
+ void *old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
void *new_khva = NULL;
unsigned long mmu_seq;
@@ -192,7 +192,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)

gpc->valid = true;
gpc->pfn = new_pfn;
- gpc->khva = new_khva + offset_in_page(gpc->gpa);
+ gpc->khva = new_khva + offset_in_page(gpc->uhva);

/*
* Put the reference to the _new_ pfn. The pfn is now tracked by the
@@ -215,8 +215,8 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
unsigned long page_offset = offset_in_page(gpa);
bool unmap_old = false;
- unsigned long old_uhva;
kvm_pfn_t old_pfn;
+ bool hva_change = false;
void *old_khva;
int ret;

@@ -242,8 +242,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
}

old_pfn = gpc->pfn;
- old_khva = gpc->khva - offset_in_page(gpc->khva);
- old_uhva = gpc->uhva;
+ old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);

/* If the userspace HVA is invalid, refresh that first */
if (gpc->gpa != gpa || gpc->generation != slots->generation ||
@@ -259,13 +258,25 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
ret = -EFAULT;
goto out;
}
+
+ hva_change = true;
+ } else {
+ /*
+ * No need to do any re-mapping if the only thing that has
+ * changed is the page offset. Just page align it to allow the
+ * new offset to be added in.
+ */
+ gpc->uhva = PAGE_ALIGN_DOWN(gpc->uhva);
}

+ /* Note: the offset must be correct before calling hva_to_pfn_retry() */
+ gpc->uhva += page_offset;
+
/*
* If the userspace HVA changed or the PFN was already invalid,
* drop the lock and do the HVA to PFN lookup again.
*/
- if (!gpc->valid || old_uhva != gpc->uhva) {
+ if (!gpc->valid || hva_change) {
ret = hva_to_pfn_retry(gpc);
} else {
/*
--
2.39.2
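
The invariant behind the change is simply that an address splits losslessly into
a page-aligned base plus an in-page offset, i.e.
uhva == PAGE_ALIGN_DOWN(uhva) + offset_in_page(uhva). A kernel-style sketch of
the resulting length check (illustrative only; the helper name is made up, the
real check lives in kvm_gpc_check()):

#include <linux/mm.h>

/* A cached range is usable only if it does not cross a page boundary. */
static bool gpc_range_fits_in_page(unsigned long uhva, unsigned long len)
{
	return offset_in_page(uhva) + len <= PAGE_SIZE;
}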

2023-11-21 21:49:54

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v8 02/15] KVM: pfncache: remove unnecessary exports

On Tue, 2023-11-21 at 18:02 +0000, Paul Durrant wrote:
> From: Paul Durrant <[email protected]>
>
> There is need for the existing kvm_gpc_XXX() functions to be exported. Clean
> up now before additional functions are added in subsequent patches.
>

I think you mean "no need".

> Signed-off-by: Paul Durrant <[email protected]>

Reviewed-by: David Woodhouse <[email protected]>



2023-11-21 21:50:26

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v8 03/15] KVM: xen: mark guest pages dirty with the pfncache lock held

On Tue, 2023-11-21 at 18:02 +0000, Paul Durrant wrote:
> From: Paul Durrant <[email protected]>
>
> Sampling gpa and memslot from an unlocked pfncache may yield inconsistent
> values so, since there is no problem with calling mark_page_dirty_in_slot()
> with the pfncache lock held, relocate the calls in
> kvm_xen_update_runstate_guest() and kvm_xen_inject_pending_events()
> accordingly.
>
> Signed-off-by: Paul Durrant <[email protected]>

Reviewed-by: David Woodhouse <[email protected]>



2023-11-21 22:24:48

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v8 05/15] KVM: pfncache: remove KVM_GUEST_USES_PFN usage

On Tue, 2023-11-21 at 18:02 +0000, Paul Durrant wrote:
> From: Paul Durrant <[email protected]>
>
> As noted in [1] the KVM_GUEST_USES_PFN usage flag is never set by any
> callers of kvm_gpc_init(), which also makes the 'vcpu' argument redundant.
> Moreover, all existing callers specify KVM_HOST_USES_PFN so the usage
> check in hva_to_pfn_retry() and hence the 'usage' argument to
> kvm_gpc_init() are also redundant.
> Remove the pfn_cache_usage enumeration and remove the redundant arguments,
> fields of struct gfn_to_pfn_cache, and all the related code.
>
> [1] https://lore.kernel.org/all/[email protected]/
>
> Signed-off-by: Paul Durrant <[email protected]>

I think it's https://lore.kernel.org/all/[email protected]/
which is the key reference. I'm not sure I'm 100% on board, but I never
got round to replying to Sean's email because it was one of those "put
up or shut up situations" and I didn't have the bandwidth to actually
write the code to prove my point.

I think it *is* important to support non-pinned pages. There's a reason
we even made the vapic page migratable. We want to support memory
hotplug, we want to cope with machine checks telling us to move certain
pages (which I suppose is memory hotplug). See commit 38b9917350cb
("kvm: vmx: Implement set_apic_access_page_addr") for example.

I agree that in the first round of the nVMX code there were bugs. And
sure, of *course* it isn't sufficient to wire up the invalidation
without either a KVM_REQ_SOMETHIMG to put it back, or just a *check* on
the corresponding gpc on the way back into the guest. We'd have worked
that out.

And yes, the gpc has had bugs as we implemented it, but the point was
that we got to something which *is* working, and forms a usable
building block.

So I'm not really sold on the idea of ditching KVM_GUEST_USES_PFN. I
think we could get it working, and I think it's worth it. But my
opinion is worth very little unless I express it in 'diff -up' form
instead of prose, and reverting this particular patch is the least of
my barriers to doing so, so reluctantly...


Reviewed-by: David Woodhouse <[email protected]>



2023-11-21 22:28:06

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v8 06/15] KVM: pfncache: stop open-coding offset_in_page()

On Tue, 2023-11-21 at 18:02 +0000, Paul Durrant wrote:
> From: Paul Durrant <[email protected]>
>
> Some code in pfncache uses offset_in_page() but in other places it is open-
> coded. Use offset_in_page() consistently everywhere.
>
> Signed-off-by: Paul Durrant <[email protected]>

Reviewed-by: David Woodhouse <[email protected]>



2023-11-21 22:36:22

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v8 07/15] KVM: pfncache: include page offset in uhva and use it consistently

On Tue, 2023-11-21 at 18:02 +0000, Paul Durrant wrote:
> @@ -242,8 +242,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
>         }
>  
>         old_pfn = gpc->pfn;
> -       old_khva = gpc->khva - offset_in_page(gpc->khva);
> -       old_uhva = gpc->uhva;
> +       old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
>  
>         /* If the userspace HVA is invalid, refresh that first */
>         if (gpc->gpa != gpa || gpc->generation != slots->generation ||
> @@ -259,13 +258,25 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
>                         ret = -EFAULT;
>                         goto out;
>                 }


There's a subtle behaviour change here, isn't there? I'd *really* like
you to say 'No functional change intended' where that is true, and then
the absence of that sentence in this one would be meaningful.

You are now calling hva_to_pfn_retry() even when the uhva page hasn't
changed. Which is harmless and probably not important, but IIUC fixable
by the addition of:

+ if (gpc->uhva != PAGE_ALIGN_DOWN(old_uhva))
> +               hva_change = true;
> +       } else {
> +               /*
> +                * No need to do any re-mapping if the only thing that has
> +                * changed is the page offset. Just page align it to allow the
> +                * new offset to be added in.
> +                */
> +               gpc->uhva = PAGE_ALIGN_DOWN(gpc->uhva);
>         }
>  
> +       /* Note: the offset must be correct before calling hva_to_pfn_retry() */
> +       gpc->uhva += page_offset;
> +
>         /*
>          * If the userspace HVA changed or the PFN was already invalid,
>          * drop the lock and do the HVA to PFN lookup again.
>          */
> -       if (!gpc->valid || old_uhva != gpc->uhva) {
> +       if (!gpc->valid || hva_change) {
>                 ret = hva_to_pfn_retry(gpc);
>         } else {
>                 /*
> --

But I don't really think it's that important if you can come up with a
coherent justification for the change and note it in the commit
message. So either way:

Reviewed-by: David Woodhouse <[email protected]>
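
A hypothetical helper capturing the suggested optimisation, i.e. only treat the
uhva as changed (and hence worth remapping) when its page-aligned part differs;
the name is illustrative and not from the series:

static bool gpc_uhva_page_changed(unsigned long old_uhva, unsigned long new_uhva)
{
	/* Only a change of page, not of in-page offset, requires a remap. */
	return PAGE_ALIGN_DOWN(old_uhva) != PAGE_ALIGN_DOWN(new_uhva);
}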



2023-11-21 22:48:26

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v8 08/15] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA

On Tue, 2023-11-21 at 18:02 +0000, Paul Durrant wrote:
>
> -static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
> +static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, u64 addr, bool addr_is_gpa,
>                              unsigned long len)
>  {
>         struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
> -       unsigned long page_offset = offset_in_page(gpa);
> +       unsigned long page_offset = offset_in_page(addr);
>         bool unmap_old = false;
>         kvm_pfn_t old_pfn;
>         bool hva_change = false;
> @@ -244,12 +244,21 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
>         old_pfn = gpc->pfn;
>         old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
>  
> -       /* If the userspace HVA is invalid, refresh that first */
> -       if (gpc->gpa != gpa || gpc->generation != slots->generation ||
> -           kvm_is_error_hva(gpc->uhva)) {
> -               gfn_t gfn = gpa_to_gfn(gpa);
> +       if (!addr_is_gpa) {
> +               gpc->gpa = KVM_XEN_INVALID_GPA;
> +               gpc->uhva = PAGE_ALIGN_DOWN(gpc->uhva);
> +               addr = PAGE_ALIGN_DOWN(addr);
> +
> +               if (gpc->uhva != addr) {
> +                       gpc->uhva = addr;
> +                       hva_change = true;
> +               }
> +       } else if (gpc->gpa != addr ||
> +                  gpc->generation != slots->generation ||
> +                  kvm_is_error_hva(gpc->uhva)) {
> +               gfn_t gfn = gpa_to_gfn(addr);
>  
> -               gpc->gpa = gpa;
> +               gpc->gpa = addr;
>                 gpc->generation = slots->generation;
>                 gpc->memslot = __gfn_to_memslot(slots, gfn);
>                 gpc->uhva = gfn_to_hva_memslot(gpc->memslot, gfn);

Hrm, now that a previous patch means we're preserving the low bits of
gpc->uhva surely you don't *need* to mess with the gpc struct?

If gpc->gpa == KVM_XEN_INVALID_GPA (but gpc->uhva != KVM_HVA_ERR_BAD &&
gpc->active) surely that's enough to signal that gpc->uhva is canonical
and doesn't need to be looked up from the GPA?

And I think that means the 'bool addr_is_gpa' argument can go away from
__kvm_gpc_refresh(); you can set it up in {__,}kvm_gpc_activate*()
instead?
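
A sketch of the predicate being suggested here, using the gfn_to_pfn_cache
fields referred to above (hypothetical; no such helper exists in the series):

static bool gpc_uhva_is_canonical(struct gfn_to_pfn_cache *gpc)
{
	/*
	 * An active cache whose gpa is marked invalid but whose uhva is
	 * good means the uhva was supplied directly, so no GPA->HVA
	 * lookup is needed on refresh.
	 */
	return gpc->active && gpc->gpa == KVM_XEN_INVALID_GPA &&
	       !kvm_is_error_hva(gpc->uhva);
}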



2023-11-21 22:49:55

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v8 14/15] KVM: xen: split up kvm_xen_set_evtchn_fast()

On Tue, 2023-11-21 at 18:02 +0000, Paul Durrant wrote:
> From: Paul Durrant <[email protected]>
>
> The implementation of kvm_xen_set_evtchn_fast() is a rather lengthy piece
> of code that performs two operations: updating of the shared_info
> evtchn_pending mask, and updating of the vcpu_info evtchn_pending_sel
> mask. Introdude a separate function to perform each of those operations and

I like the intro dude, but I think you meant introduce.

> re-work kvm_xen_set_evtchn_fast() to use them.
>
> No functional change intended.
>
> Signed-off-by: Paul Durrant <[email protected]>

Reviewed-by: David Woodhouse <[email protected]>



2023-11-21 22:54:04

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v8 15/15] KVM: xen: allow vcpu_info content to be 'safely' copied

On Tue, 2023-11-21 at 18:02 +0000, Paul Durrant wrote:
> From: Paul Durrant <[email protected]>
>
> If the guest sets an explicit vcpu_info GPA then, for any of the first 32
> vCPUs, the content of the default vcpu_info in the shared_info page must be
> copied into the new location. Because this copy may race with event
> delivery (which updates the 'evtchn_pending_sel' field in vcpu_info) there
> needs to be a way to defer that until the copy is complete.
> Happily there is already a shadow of 'evtchn_pending_sel' in kvm_vcpu_xen
> that is used in atomic context if the vcpu_info PFN cache has been
> invalidated so that the update of vcpu_info can be deferred until the
> cache can be refreshed (on the vCPU thread's way back into guest context).
>
> Also use this shadow if the vcpu_info cache has been *deactivated*, so that
> the VMM can safely copy the vcpu_info content and then re-activate the
> cache with the new GPA. To do this, stop considering an inactive vcpu_info
> cache as a hard error in kvm_xen_set_evtchn_fast().
>
> Signed-off-by: Paul Durrant <[email protected]>
> Reviewed-by: David Woodhouse <[email protected]>

Wait, didn't we realise that this leaves the bits set in the shadow
evtchn_pending_sel that get lost on migration?

The point in your previous patch which split out a shiny new
set_shinfo_evtchn_pending() function was that you could then *call*
that function to ensure that the corresponding index bit was set on the
destination host after migration, if the bit in the shinfo is.

So we'd do that from kvm_xen_setup_evtchn(), kvm_xen_eventfd_assign(),
and when setting KVM_XEN_VCPU_ATTR_TYPE_TIMER.

if (bit_is_set_in_shinfo)
set_shinfo_evtchn_pending()




2023-11-22 08:44:23

by Paul Durrant

[permalink] [raw]
Subject: Re: [PATCH v8 02/15] KVM: pfncache: remove unnecessary exports

On 21/11/2023 21:49, David Woodhouse wrote:
> On Tue, 2023-11-21 at 18:02 +0000, Paul Durrant wrote:
>> From: Paul Durrant <[email protected]>
>>
>> There is need for the existing kvm_gpc_XXX() functions to be exported. Clean
>> up now before additional functions are added in subsequent patches.
>>
>
> I think you mean "no need".
>

Oh yes. Will fix.

>> Signed-off-by: Paul Durrant <[email protected]>
>
> Reviewed-by: David Woodhouse <[email protected]>
>

Thanks,

Paul

2023-11-22 08:58:23

by Xu Yilun

[permalink] [raw]
Subject: Re: [PATCH v8 07/15] KVM: pfncache: include page offset in uhva and use it consistently

On Tue, Nov 21, 2023 at 06:02:15PM +0000, Paul Durrant wrote:
> From: Paul Durrant <[email protected]>
>
> Currently the pfncache page offset is sometimes determined using the gpa
> and sometimes the khva, whilst the uhva is always page-aligned. After a
> subsequent patch is applied the gpa will not always be valid so adjust
> the code to include the page offset in the uhva and use it consistently
> as the source of truth.
>
> Also, where a page-aligned address is required, use PAGE_ALIGN_DOWN()
> for clarity.
>
> Signed-off-by: Paul Durrant <[email protected]>
> ---
> Cc: Sean Christopherson <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: David Woodhouse <[email protected]>
>
> v8:
> - New in this version.
> ---
> virt/kvm/pfncache.c | 27 +++++++++++++++++++--------
> 1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
> index 0eeb034d0674..c545f6246501 100644
> --- a/virt/kvm/pfncache.c
> +++ b/virt/kvm/pfncache.c
> @@ -48,10 +48,10 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
> if (!gpc->active)
> return false;
>
> - if (offset_in_page(gpc->gpa) + len > PAGE_SIZE)
> + if (gpc->generation != slots->generation || kvm_is_error_hva(gpc->uhva))
> return false;
>
> - if (gpc->generation != slots->generation || kvm_is_error_hva(gpc->uhva))
> + if (offset_in_page(gpc->uhva) + len > PAGE_SIZE)
> return false;
>
> if (!gpc->valid)
> @@ -119,7 +119,7 @@ static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_s
> static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
> {
> /* Note, the new page offset may be different than the old! */
> - void *old_khva = gpc->khva - offset_in_page(gpc->khva);
> + void *old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
> kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
> void *new_khva = NULL;
> unsigned long mmu_seq;
> @@ -192,7 +192,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
>
> gpc->valid = true;
> gpc->pfn = new_pfn;
> - gpc->khva = new_khva + offset_in_page(gpc->gpa);
> + gpc->khva = new_khva + offset_in_page(gpc->uhva);
>
> /*
> * Put the reference to the _new_ pfn. The pfn is now tracked by the
> @@ -215,8 +215,8 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
> struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
> unsigned long page_offset = offset_in_page(gpa);
> bool unmap_old = false;
> - unsigned long old_uhva;
> kvm_pfn_t old_pfn;
> + bool hva_change = false;
> void *old_khva;
> int ret;
>
> @@ -242,8 +242,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
> }
>
> old_pfn = gpc->pfn;
> - old_khva = gpc->khva - offset_in_page(gpc->khva);
> - old_uhva = gpc->uhva;
> + old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
>
> /* If the userspace HVA is invalid, refresh that first */
> if (gpc->gpa != gpa || gpc->generation != slots->generation ||
> @@ -259,13 +258,25 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
> ret = -EFAULT;
> goto out;
> }
> +
> + hva_change = true;
> + } else {
> + /*
> + * No need to do any re-mapping if the only thing that has
> + * changed is the page offset. Just page align it to allow the
> + * new offset to be added in.

I don't understand how the uhva('s offset) could be changed when both gpa and
slot are not changed. Maybe I have no knowledge of xen, but in later
patch you said your uhva would never change...

Thanks,
Yilun

> + */
> + gpc->uhva = PAGE_ALIGN_DOWN(gpc->uhva);
> }
>
> + /* Note: the offset must be correct before calling hva_to_pfn_retry() */
> + gpc->uhva += page_offset;
> +
> /*
> * If the userspace HVA changed or the PFN was already invalid,
> * drop the lock and do the HVA to PFN lookup again.
> */
> - if (!gpc->valid || old_uhva != gpc->uhva) {
> + if (!gpc->valid || hva_change) {
> ret = hva_to_pfn_retry(gpc);
> } else {
> /*
> --
> 2.39.2
>
>

2023-11-22 09:13:09

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v8 07/15] KVM: pfncache: include page offset in uhva and use it consistently

On Wed, 2023-11-22 at 16:54 +0800, Xu Yilun wrote:
>
> > @@ -259,13 +258,25 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
> >                         ret = -EFAULT;
> >                         goto out;
> >                 }
> > +
> > +               hva_change = true;
> > +       } else {
> > +               /*
> > +                * No need to do any re-mapping if the only thing that has
> > +                * changed is the page offset. Just page align it to allow the
> > +                * new offset to be added in.
>
> I don't understand how the uhva('s offset) could be changed when both gpa and
> slot are not changed. Maybe I have no knowledge of xen, but in later
> patch you said your uhva would never change...

It doesn't change on a normal refresh with kvm_gpc_refresh(), which is
just for revalidation after memslot changes or MMU invalidation.

But it can change if the gpc is being reinitialized with a new address
(perhaps because the guest has made another hypercall to set the
address, etc.)

That new address could happen to be in the *same* page as the previous
one. In fact the xen_shinfo_test explicitly tests that case, IIRC.

And kvm_gpc_activate() also happens to use __kvm_gpc_refresh()
internally.



2023-11-22 09:31:53

by Paul Durrant

[permalink] [raw]
Subject: Re: [PATCH v8 07/15] KVM: pfncache: include page offset in uhva and use it consistently

On 21/11/2023 22:35, David Woodhouse wrote:
> On Tue, 2023-11-21 at 18:02 +0000, Paul Durrant wrote:
>> @@ -242,8 +242,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
>>         }
>>
>>         old_pfn = gpc->pfn;
>> -       old_khva = gpc->khva - offset_in_page(gpc->khva);
>> -       old_uhva = gpc->uhva;
>> +       old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
>>
>>         /* If the userspace HVA is invalid, refresh that first */
>>         if (gpc->gpa != gpa || gpc->generation != slots->generation ||
>> @@ -259,13 +258,25 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
>>                         ret = -EFAULT;
>>                         goto out;
>>                 }
>
>
> There's a subtle behaviour change here, isn't there? I'd *really* like
> you to say 'No functional change intended' where that is true, and then
> the absence of that sentence in this one would be meaningful.
>
> You are now calling hva_to_pfn_retry() even when the uhva page hasn't
> changed. Which is harmless and probably not important, but IIUC fixable
> by the addition of:
>
> + if (gpc->uhva != PAGE_ALIGN_DOWN(old_uhva))

True; I can keep that optimization and then I will indeed add 'no
functional change'... Didn't seem worth it at the time, but no harm.

>> +               hva_change = true;
>> +       } else {
>> +               /*
>> +                * No need to do any re-mapping if the only thing that has
>> +                * changed is the page offset. Just page align it to allow the
>> +                * new offset to be added in.
>> +                */
>> +               gpc->uhva = PAGE_ALIGN_DOWN(gpc->uhva);
>>         }
>>
>> +       /* Note: the offset must be correct before calling hva_to_pfn_retry() */
>> +       gpc->uhva += page_offset;
>> +
>>         /*
>>          * If the userspace HVA changed or the PFN was already invalid,
>>          * drop the lock and do the HVA to PFN lookup again.
>>          */
>> -       if (!gpc->valid || old_uhva != gpc->uhva) {
>> +       if (!gpc->valid || hva_change) {
>>                 ret = hva_to_pfn_retry(gpc);
>>         } else {
>>                 /*
>> --
>
> But I don't really think it's that important if you can come up with a
> coherent justification for the change and note it in the commit
> message. So either way:
>
> Reviewed-by: David Woodhouse <[email protected]>

Thanks,

Paul

2023-11-22 10:09:03

by Paul Durrant

[permalink] [raw]
Subject: Re: [PATCH v8 08/15] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA

On 21/11/2023 22:47, David Woodhouse wrote:
> On Tue, 2023-11-21 at 18:02 +0000, Paul Durrant wrote:
>>
>> -static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
>> +static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, u64 addr, bool addr_is_gpa,
>>                              unsigned long len)
>>  {
>>         struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
>> -       unsigned long page_offset = offset_in_page(gpa);
>> +       unsigned long page_offset = offset_in_page(addr);
>>         bool unmap_old = false;
>>         kvm_pfn_t old_pfn;
>>         bool hva_change = false;
>> @@ -244,12 +244,21 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
>>         old_pfn = gpc->pfn;
>>         old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
>>
>> -       /* If the userspace HVA is invalid, refresh that first */
>> -       if (gpc->gpa != gpa || gpc->generation != slots->generation ||
>> -           kvm_is_error_hva(gpc->uhva)) {
>> -               gfn_t gfn = gpa_to_gfn(gpa);
>> +       if (!addr_is_gpa) {
>> +               gpc->gpa = KVM_XEN_INVALID_GPA;
>> +               gpc->uhva = PAGE_ALIGN_DOWN(gpc->uhva);
>> +               addr = PAGE_ALIGN_DOWN(addr);
>> +
>> +               if (gpc->uhva != addr) {
>> +                       gpc->uhva = addr;
>> +                       hva_change = true;
>> +               }
>> +       } else if (gpc->gpa != addr ||
>> +                  gpc->generation != slots->generation ||
>> +                  kvm_is_error_hva(gpc->uhva)) {
>> +               gfn_t gfn = gpa_to_gfn(addr);
>>
>> -               gpc->gpa = gpa;
>> +               gpc->gpa = addr;
>>                 gpc->generation = slots->generation;
>>                 gpc->memslot = __gfn_to_memslot(slots, gfn);
>>                 gpc->uhva = gfn_to_hva_memslot(gpc->memslot, gfn);
>
> Hrm, now that a previous patch means we're preserving the low bits of
> gpc->uhva surely you don't *need* to mess with the gpc struct?
>

I'm not messing with it, am I?

> If gpc->gpa == KVM_XEN_INVALID_GPA (but gpc->uhva != KVM_ERR_ERR_BAD &&
> gpc->active) surely that's enough to signal that gpc->uhva is canonical
> and doesn't need to be looked up from the GPA?
>
> And I think that means the 'bool addr_is_gpa' argument can go away from
> __kvm_gpc_refresh(); you can set it up in {__,}kvm_gpc_activate*()
> instead?

Alas not... __kvm_gpc_refresh() still needs to know *something* has
changed, otherwise the khva will be stale.

Paul


2023-11-22 10:39:55

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v8 15/15] KVM: xen: allow vcpu_info content to be 'safely' copied

On Tue, 2023-11-21 at 22:53 +0000, David Woodhouse wrote:
> On Tue, 2023-11-21 at 18:02 +0000, Paul Durrant wrote:
> > From: Paul Durrant <[email protected]>
> >
> > If the guest sets an explicit vcpu_info GPA then, for any of the first 32
> > vCPUs, the content of the default vcpu_info in the shared_info page must be
> > copied into the new location. Because this copy may race with event
> > delivery (which updates the 'evtchn_pending_sel' field in vcpu_info) there
> > needs to be a way to defer that until the copy is complete.
> > Happily there is already a shadow of 'evtchn_pending_sel' in kvm_vcpu_xen
> > that is used in atomic context if the vcpu_info PFN cache has been
> > invalidated so that the update of vcpu_info can be deferred until the
> > cache can be refreshed (on the vCPU thread's way back into guest context).
> >
> > Also use this shadow if the vcpu_info cache has been *deactivated*, so that
> > the VMM can safely copy the vcpu_info content and then re-activate the
> > cache with the new GPA. To do this, stop considering an inactive vcpu_info
> > cache as a hard error in kvm_xen_set_evtchn_fast().
> >
> > Signed-off-by: Paul Durrant <[email protected]>
> > Reviewed-by: David Woodhouse <[email protected]>
>
> Wait, didn't we realise that this leaves the bits set in the shadow
> evtchn_pending_sel that get lost on migration?
>
> The point in your previous patch which split out a shiny new
> set_shinfo_evtchn_pending() function was that you could then *call*
> that function to ensure that the corresponding index bit was set on the
> destination host after migration, if the bit in the shinfo is.
>
> So we'd do that from kvm_xen_setup_evtchn(), kvm_xen_eventfd_assign(),
> and when setting KVM_XEN_VCPU_ATTR_TYPE_TIMER.
>
>  if (bit_is_set_in_shinfo)
>    set_shinfo_evtchn_pending()

I mean set_vcpu_info_evtchn_pending() of course. And we probably want
to extend the xen_shinfo_test to test it, by setting the bit in the
shinfo to mark the event as pending, and then doing each of

• Set up timer (a bit like in TEST_TIMER_RESTORE at line 817).
• Add incoming eventfd with KVM_SET_GSI_ROUTING (cf. line 563)
• Add IPI with KVM_XEN_ATTR_TYPE_EVTCHN (cf. line 597)

Each of those should set the index bit in the vcpu_info immediately if
the evtchn port is already set (and unmasked) in the shinfo.


(Ignore this part if you're cleverer than me or have had more coffee…)

It took me a moment to get my head around the different setups we have
for event channels, but that's because the standard KVM_SET_GSI_ROUTING
one is for *incoming* events, just as we would for MSIs, and we use the
standard way of attaching an eventfd to an incoming GSI/MSI/evtchn.

The KVM_XEN_ATTR_TYPE_EVTCHN one is for *outbound* events where the
guest does an EVTCHNOP_send. That can *raise* events on an eventfd, or
it can be an IPI or loopback interdomain port, which is the case we
need to test.
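
Very roughly, the re-sync being described might look like the sketch below. It
is hypothetical and the signatures are approximate: evtchn_pending_and_unmasked_in_shinfo()
stands in for a read of the shared_info pending/mask bitmaps, and
set_vcpu_info_evtchn_pending() is the helper split out of
kvm_xen_set_evtchn_fast() in patch 14.

static void resync_evtchn_after_restore(struct kvm_vcpu *vcpu, u32 port)
{
	/*
	 * If the port is already pending (and unmasked) in shared_info,
	 * make sure the corresponding vcpu_info index bit is set too.
	 */
	if (evtchn_pending_and_unmasked_in_shinfo(vcpu->kvm, port))
		set_vcpu_info_evtchn_pending(vcpu, port);
}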



2023-11-22 10:58:13

by Paul Durrant

[permalink] [raw]
Subject: Re: [PATCH v8 15/15] KVM: xen: allow vcpu_info content to be 'safely' copied

On 22/11/2023 10:39, David Woodhouse wrote:
> On Tue, 2023-11-21 at 22:53 +0000, David Woodhouse wrote:
>> On Tue, 2023-11-21 at 18:02 +0000, Paul Durrant wrote:
>>> From: Paul Durrant <[email protected]>
>>>
>>> If the guest sets an explicit vcpu_info GPA then, for any of the first 32
>>> vCPUs, the content of the default vcpu_info in the shared_info page must be
>>> copied into the new location. Because this copy may race with event
>>> delivery (which updates the 'evtchn_pending_sel' field in vcpu_info) there
>>> needs to be a way to defer that until the copy is complete.
>>> Happily there is already a shadow of 'evtchn_pending_sel' in kvm_vcpu_xen
>>> that is used in atomic context if the vcpu_info PFN cache has been
>>> invalidated so that the update of vcpu_info can be deferred until the
>>> cache can be refreshed (on the vCPU thread's way back into guest context).
>>>
>>> Also use this shadow if the vcpu_info cache has been *deactivated*, so that
>>> the VMM can safely copy the vcpu_info content and then re-activate the
>>> cache with the new GPA. To do this, stop considering an inactive vcpu_info
>>> cache as a hard error in kvm_xen_set_evtchn_fast().
>>>
>>> Signed-off-by: Paul Durrant <[email protected]>
>>> Reviewed-by: David Woodhouse <[email protected]>
>>
>> Wait, didn't we realise that this leaves the bits set in the shadow
>> evtchn_pending_sel that get lost on migration?
>>

Indeed we did not, but that's not something that *this* patch, or even
this series, is dealing with. We also know that setting the 'width' of
shared_info has some issues, but again, can we keep that for other
patches? The series is at v9 and has already suffered a fair amount of
scope-creep.

Paul

2023-11-22 11:27:07

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v8 15/15] KVM: xen: allow vcpu_info content to be 'safely' copied

On Wed, 2023-11-22 at 10:55 +0000, Paul Durrant wrote:
>
> > > Wait, didn't we realise that this leaves the bits set in the shadow
> > > evtchn_pending_sel that get lost on migration?
> > >
>
> Indeed we did not, but that's not something that *this* patch, or even
> this series, is dealing with.  We also know that setting the 'width' of
> shared_info has some issues, but again, can we keep that for other
> patches? The series is at v9 and has already suffered a fair amount of
> scope-creep.

Hrm... OK, that makes sense. This series is attempting to address the
fact that we can't do overlays on memslots without temporarily taking
away GPA ranges that we didn't mean to change. This patch is sufficient
to allow evtchn delivery to work while the memslots are being frobbed,
because userspace takes the vcpu_info away during the update.

In that case the shadow works just fine.

So yeah, if you want to handle the migration case separately then that
seems reasonable.



2023-11-22 14:29:31

by Xu Yilun

[permalink] [raw]
Subject: Re: [PATCH v8 07/15] KVM: pfncache: include page offset in uhva and use it consistently

On Wed, Nov 22, 2023 at 09:12:18AM +0000, David Woodhouse wrote:
> On Wed, 2023-11-22 at 16:54 +0800, Xu Yilun wrote:
> >
> > > @@ -259,13 +258,25 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
> > >                         ret = -EFAULT;
> > >                         goto out;
> > >                 }
> > > +
> > > +               hva_change = true;
> > > +       } else {
> > > +               /*
> > > +                * No need to do any re-mapping if the only thing that has
> > > +                * changed is the page offset. Just page align it to allow the
> > > +                * new offset to be added in.
> >
> > I don't understand how the uhva('s offset) could be changed when both gpa and
> > slot are not changed. Maybe I have no knowledge of xen, but in later
> > patch you said your uhva would never change...
>
> It doesn't change on a normal refresh with kvm_gpc_refresh(), which is
> just for revalidation after memslot changes or MMU invalidation.
>
> But it can change if the gpc is being reinitialized with a new address
> (perhaps because the guest has made another hypercall to set the
> address, etc.)
>
> That new address could happen to be in the *same* page as the previous

In this case, the lower bits of new gpa should be different to gpc->gpa,
so will hit "if (gpc->gpa != gpa ...)" branch.

And I see this comment is deleted in v9, which makes sense to me.

Thanks,
Yilun

> one. In fact the xen_shinfo_test explicitly tests that case, IIRC.
>
> And kvm_gpc_activate() also happens to use __kvm_gpc_refresh()
> internally.


2023-11-22 15:47:16

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v8 07/15] KVM: pfncache: include page offset in uhva and use it consistently

On Wed, 2023-11-22 at 22:27 +0800, Xu Yilun wrote:
> On Wed, Nov 22, 2023 at 09:12:18AM +0000, David Woodhouse wrote:
> > On Wed, 2023-11-22 at 16:54 +0800, Xu Yilun wrote:
> > >
> > > > @@ -259,13 +258,25 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
> > > >                         ret = -EFAULT;
> > > >                         goto out;
> > > >                 }
> > > > +
> > > > +               hva_change = true;
> > > > +       } else {
> > > > +               /*
> > > > +                * No need to do any re-mapping if the only thing that has
> > > > +                * changed is the page offset. Just page align it to allow the
> > > > +                * new offset to be added in.
> > >
> > > I don't understand how the uhva('s offset) could be changed when both gpa and
> > > slot are not changed. Maybe I have no knowledge of xen, but in later
> > > patch you said your uhva would never change...
> >
> > It doesn't change on a normal refresh with kvm_gpc_refresh(), which is
> > just for revalidation after memslot changes or MMU invalidation.
> >
> > But it can change if the gpc is being reinitialized with a new address
> > (perhaps because the guest has made another hypercall to set the
> > address, etc.)
> >
> > That new address could happen to be in the *same* page as the previous
>
> In this case, the lower bits of new gpa should be different to gpc->gpa,
> so will hit "if (gpc->gpa != gpa ...)" branch.

I think that the 'if (gpc->gpa != gpa)' branch is also gratuitously
refreshing when it doesn't need to; it only needs to refresh if the
*aligned* gpas don't match.

But it was like that already, so I won't heckle Paul any further :)



2023-11-22 15:53:26

by Paul Durrant

[permalink] [raw]
Subject: Re: [PATCH v8 07/15] KVM: pfncache: include page offset in uhva and use it consistently

On 22/11/2023 15:42, David Woodhouse wrote:
> On Wed, 2023-11-22 at 22:27 +0800, Xu Yilun wrote:
>> On Wed, Nov 22, 2023 at 09:12:18AM +0000, David Woodhouse wrote:
>>> On Wed, 2023-11-22 at 16:54 +0800, Xu Yilun wrote:
>>>>
>>>>> @@ -259,13 +258,25 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
>>>>>                         ret = -EFAULT;
>>>>>                         goto out;
>>>>>                 }
>>>>> +
>>>>> +               hva_change = true;
>>>>> +       } else {
>>>>> +               /*
>>>>> +                * No need to do any re-mapping if the only thing that has
>>>>> +                * changed is the page offset. Just page align it to allow the
>>>>> +                * new offset to be added in.
>>>>
>>>> I don't understand how the uhva('s offset) could be changed when both gpa and
>>>> slot are not changed. Maybe I have no knowledge of xen, but in later
>>>> patch you said your uhva would never change...
>>>
>>> It doesn't change on a normal refresh with kvm_gpc_refresh(), which is
>>> just for revalidation after memslot changes or MMU invalidation.
>>>
>>> But it can change if the gpc is being reinitialized with a new address
>>> (perhaps because the guest has made another hypercall to set the
>>> address, etc.)
>>>
>>> That new address could happen to be in the *same* page as the previous
>>
>> In this case, the lower bits of new gpa should be different to gpc->gpa,
>> so will hit "if (gpc->gpa != gpa ...)" branch.
>
> I think that the 'if (gpc->gpa != gpa)' branch is also gratuitously
> refreshing when it doesn't need to; it only needs to refresh if the
> *aligned* gpas don't match.
>

I did look at that but decided that gfn_to_hva_memslot() was
sufficiently lightweight that it was not really worth optimising.

> But it was like that already, so I won't heckle Paul any further :)

I appreciate it! :-)

Paul

2023-11-27 23:36:21

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v8 05/15] KVM: pfncache: remove KVM_GUEST_USES_PFN usage

On Tue, Nov 21, 2023, David Woodhouse wrote:
> On Tue, 2023-11-21 at 18:02 +0000, Paul Durrant wrote:
> > From: Paul Durrant <[email protected]>
> >
> > As noted in [1] the KVM_GUEST_USES_PFN usage flag is never set by any
> > callers of kvm_gpc_init(), which also makes the 'vcpu' argument redundant.
> > Moreover, all existing callers specify KVM_HOST_USES_PFN so the usage
> > check in hva_to_pfn_retry() and hence the 'usage' argument to
> > kvm_gpc_init() are also redundant.
> > Remove the pfn_cache_usage enumeration and remove the redundant arguments,
> > fields of struct gfn_to_pfn_cache, and all the related code.
> >
> > [1] https://lore.kernel.org/all/[email protected]/
> >
> > Signed-off-by: Paul Durrant <[email protected]>
>
> I think it's https://lore.kernel.org/all/[email protected]/

Yeah, that's the more important link.

> which is the key reference. I'm not sure I'm 100% on board, but I never
> got round to replying to Sean's email because it was one of those "put
> up or shut up situations" and I didn't have the bandwidth to actually
> write the code to prove my point.
>
> I think it *is* important to support non-pinned pages. There's a reason
> we even made the vapic page migratable. We want to support memory
> hotplug, we want to cope with machine checks telling us to move certain
> pages (which I suppose is memory hotplug). See commit 38b9917350cb
> ("kvm: vmx: Implement set_apic_access_page_addr") for example.

The vAPIC page is slightly different in that it effectively never opened a window
for page migration, i.e. once a vCPU was created that page was stuck. For nested
virtualization pages, the probability of being able to migrate a page at any given
time might be relatively low, but it's extremely unlikely for a page to be pinned
for the entire lifetime of a (L1) VM.

> I agree that in the first round of the nVMX code there were bugs. And
> sure, of *course* it isn't sufficient to wire up the invalidation
> without either a KVM_REQ_SOMETHIMG to put it back, or just a *check* on
> the corresponding gpc on the way back into the guest. We'd have worked
> that out.

Maybe. I spent most of a day, maybe longer, hacking at the nVMX code and was
unable to get line of sight to an end result that I felt would be worth pursuing.

I'm definitely not saying it's impossible, and I'm not dead set against
re-introducing KVM_GUEST_USES_PFN or similar, but a complete solution crosses the
threshold where it's unreasonable to ask/expect someone to pick up the work in
order to get their code/series merged.

Which is effectively what you said below, I just wanted to explain why I'm pushing
to remove KVM_GUEST_USES_PFN, and to say that if you or someone else were to write
the code it wouldn't be an automatic nak.

> And yes, the gpc has had bugs as we implemented it, but the point was
> that we got to something which *is* working, and forms a usable
> building block.
>
> So I'm not really sold on the idea of ditching KVM_GUEST_USES_PFN. I
> think we could get it working, and I think it's worth it. But my
> opinion is worth very little unless I express it in 'diff -up' form
> instead of prose, and reverting this particular patch is the least of
> my barriers to doing so, so reluctantly...