2024-02-15 15:42:24

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v13 00/21] KVM: xen: update shared_info and vcpu_info handling

From: Paul Durrant <[email protected]>

This series contains a new patch from Sean added since v12 [1]:

* KVM: s390: Refactor kvm_is_error_gpa() into kvm_is_gpa_in_memslot()

This frees up the function name kvm_is_error_gpa() such that it can then be
re-defined in:

* KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA

to be used for a simple GPA validation helper function. The patch also now
contains an either/or address check for GPA versus HVA in
__kvm_gpc_refresh().

In:

* KVM: pfncache: add a mark-dirty helper

The function name has been changed from kvm_gpc_mark_dirty() to
kvm_gpc_mark_dirty_in_slot().

In:

* KVM: x86/xen: allow shared_info to be mapped by fixed HVA

missing HVA validation checks have been added and the 'hva == 0' test
has been changed to '!hva'. The KVM_XEN_ATTR_TYPE_SHARED_INFO and
KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA cases are still largely handled as one
though as separation leads to duplicate calls to
kvm_xen_shared_info_init() which looks messy.

Also, patches with a 'xen' prefix have now been modified to have a
'x86/xen' prefix and patches with a 'selftests / xen' prefix have been
modified to have simply a 'selftests' prefix.

[1] https://lore.kernel.org/kvm/[email protected]/

David Woodhouse (1):
KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues

Paul Durrant (19):
KVM: pfncache: Add a map helper function
KVM: pfncache: remove unnecessary exports
KVM: x86/xen: mark guest pages dirty with the pfncache lock held
KVM: pfncache: add a mark-dirty helper
KVM: pfncache: remove KVM_GUEST_USES_PFN usage
KVM: pfncache: stop open-coding offset_in_page()
KVM: pfncache: include page offset in uhva and use it consistently
KVM: pfncache: allow a cache to be activated with a fixed (userspace)
HVA
KVM: x86/xen: separate initialization of shared_info cache and content
KVM: x86/xen: re-initialize shared_info if guest (32/64-bit) mode is
set
KVM: x86/xen: allow shared_info to be mapped by fixed HVA
KVM: x86/xen: allow vcpu_info to be mapped by fixed HVA
KVM: selftests: map Xen's shared_info page using HVA rather than GFN
KVM: selftests: re-map Xen's vcpu_info using HVA rather than GPA
KVM: x86/xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA
capability
KVM: x86/xen: split up kvm_xen_set_evtchn_fast()
KVM: x86/xen: don't block on pfncache locks in
kvm_xen_set_evtchn_fast()
KVM: pfncache: check the need for invalidation under read lock first
KVM: x86/xen: allow vcpu_info content to be 'safely' copied

Sean Christopherson (1):
KVM: s390: Refactor kvm_is_error_gpa() into kvm_is_gpa_in_memslot()

Documentation/virt/kvm/api.rst | 53 ++-
arch/s390/kvm/diag.c | 2 +-
arch/s390/kvm/gaccess.c | 14 +-
arch/s390/kvm/kvm-s390.c | 4 +-
arch/s390/kvm/priv.c | 4 +-
arch/s390/kvm/sigp.c | 2 +-
arch/x86/kvm/x86.c | 7 +-
arch/x86/kvm/xen.c | 361 +++++++++++------
include/linux/kvm_host.h | 49 ++-
include/linux/kvm_types.h | 8 -
include/uapi/linux/kvm.h | 9 +-
.../selftests/kvm/x86_64/xen_shinfo_test.c | 59 ++-
virt/kvm/pfncache.c | 382 ++++++++++--------
13 files changed, 591 insertions(+), 363 deletions(-)


base-commit: 7455665a3521aa7b56245c0a2810f748adc5fdd4
--
2.39.2



2024-02-15 15:44:19

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v13 06/21] KVM: pfncache: stop open-coding offset_in_page()

From: Paul Durrant <[email protected]>

Some code in pfncache uses offset_in_page() but in other places it is open-
coded. Use offset_in_page() consistently everywhere.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: David Woodhouse <[email protected]>

v8:
- New in this version.
---
virt/kvm/pfncache.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 6f4b537eb25b..0eeb034d0674 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -48,7 +48,7 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
if (!gpc->active)
return false;

- if ((gpc->gpa & ~PAGE_MASK) + len > PAGE_SIZE)
+ if (offset_in_page(gpc->gpa) + len > PAGE_SIZE)
return false;

if (gpc->generation != slots->generation || kvm_is_error_hva(gpc->uhva))
@@ -192,7 +192,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)

gpc->valid = true;
gpc->pfn = new_pfn;
- gpc->khva = new_khva + (gpc->gpa & ~PAGE_MASK);
+ gpc->khva = new_khva + offset_in_page(gpc->gpa);

/*
* Put the reference to the _new_ pfn. The pfn is now tracked by the
@@ -213,7 +213,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
unsigned long len)
{
struct kvm_memslots *slots = kvm_memslots(gpc->kvm);
- unsigned long page_offset = gpa & ~PAGE_MASK;
+ unsigned long page_offset = offset_in_page(gpa);
bool unmap_old = false;
unsigned long old_uhva;
kvm_pfn_t old_pfn;
--
2.39.2


2024-02-15 15:44:47

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v13 20/21] KVM: x86/xen: allow vcpu_info content to be 'safely' copied

From: Paul Durrant <[email protected]>

If the guest sets an explicit vcpu_info GPA then, for any of the first 32
vCPUs, the content of the default vcpu_info in the shared_info page must be
copied into the new location. Because this copy may race with event
delivery (which updates the 'evtchn_pending_sel' field in vcpu_info) there
needs to be a way to defer that until the copy is complete.
Happily there is already a shadow of 'evtchn_pending_sel' in kvm_vcpu_xen
that is used in atomic context if the vcpu_info PFN cache has been
invalidated so that the update of vcpu_info can be deferred until the
cache can be refreshed (on vCPU thread's the way back into guest context).

Also use this shadow if the vcpu_info cache has been *deactivated*, so that
the VMM can safely copy the vcpu_info content and then re-activate the
cache with the new GPA. To do this, stop considering an inactive vcpu_info
cache as a hard error in kvm_xen_set_evtchn_fast().

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: David Woodhouse <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]

v13:
- Patch title change.

v8:
- Update commit comment.

v6:
- New in this version.
---
arch/x86/kvm/xen.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 8650141b266e..11ab62ca011d 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1802,9 +1802,6 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
WRITE_ONCE(xe->vcpu_idx, vcpu->vcpu_idx);
}

- if (!vcpu->arch.xen.vcpu_info_cache.active)
- return -EINVAL;
-
if (xe->port >= max_evtchn_port(kvm))
return -EINVAL;

--
2.39.2


2024-02-15 15:46:19

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v13 13/21] KVM: x86/xen: allow vcpu_info to be mapped by fixed HVA

From: Paul Durrant <[email protected]>

If the guest does not explicitly set the GPA of vcpu_info structure in
memory then, for guests with 32 vCPUs or fewer, the vcpu_info embedded
in the shared_info page may be used. As described in a previous commit,
the shared_info page is an overlay at a fixed HVA within the VMM, so in
this case it also more optimal to activate the vcpu_info cache with a
fixed HVA to avoid unnecessary invalidation if the guest memory layout
is modified.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: David Woodhouse <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]

v13:
- Patch title change.

v8:
- Re-base.

v5:
- New in this version.
---
Documentation/virt/kvm/api.rst | 26 +++++++++++++++++++++-----
arch/x86/kvm/xen.c | 34 ++++++++++++++++++++++++++++------
include/uapi/linux/kvm.h | 3 +++
3 files changed, 52 insertions(+), 11 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 3372be85b335..bd93cafd3e4e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5523,11 +5523,12 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO
Sets the guest physical frame number at which the Xen shared_info
page resides. Note that although Xen places vcpu_info for the first
32 vCPUs in the shared_info page, KVM does not automatically do so
- and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO be used
- explicitly even when the vcpu_info for a given vCPU resides at the
- "default" location in the shared_info page. This is because KVM may
- not be aware of the Xen CPU id which is used as the index into the
- vcpu_info[] array, so may know the correct default location.
+ and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO or
+ KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA be used explicitly even when
+ the vcpu_info for a given vCPU resides at the "default" location
+ in the shared_info page. This is because KVM may not be aware of
+ the Xen CPU id which is used as the index into the vcpu_info[]
+ array, so may know the correct default location.

Note that the shared_info page may be constantly written to by KVM;
it contains the event channel bitmap used to deliver interrupts to
@@ -5649,6 +5650,21 @@ KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO
on dirty logging. Setting the gpa to KVM_XEN_INVALID_GPA will disable
the vcpu_info.

+KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA
+ If the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA flag is also set in the
+ Xen capabilities, then this attribute may be used to set the
+ userspace address of the vcpu_info for a given vCPU. It should
+ only be used when the vcpu_info resides at the "default" location
+ in the shared_info page. In this case it is safe to assume the
+ userspace address will not change, because the shared_info page is
+ an overlay on guest memory and remains at a fixed host address
+ regardless of where it is mapped in guest physical address space
+ and hence unnecessary invalidation of an internal cache may be
+ avoided if the guest memory layout is modified.
+ If the vcpu_info does not reside at the "default" location then
+ it is not guaranteed to remain at the same host address and
+ hence the aforementioned cache invalidation is required.
+
KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO
Sets the guest physical address of an additional pvclock structure
for a given vCPU. This is typically used for guest vsyscall support.
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 6fb268e424fa..4e79cc68e0a9 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -784,20 +784,33 @@ int kvm_xen_vcpu_set_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)

switch (data->type) {
case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO:
+ case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA:
/* No compat necessary here. */
BUILD_BUG_ON(sizeof(struct vcpu_info) !=
sizeof(struct compat_vcpu_info));
BUILD_BUG_ON(offsetof(struct vcpu_info, time) !=
offsetof(struct compat_vcpu_info, time));

- if (data->u.gpa == KVM_XEN_INVALID_GPA) {
- kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache);
- r = 0;
- break;
+ if (data->type == KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO) {
+ if (data->u.gpa == KVM_XEN_INVALID_GPA) {
+ kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache);
+ r = 0;
+ break;
+ }
+
+ r = kvm_gpc_activate(&vcpu->arch.xen.vcpu_info_cache,
+ data->u.gpa, sizeof(struct vcpu_info));
+ } else {
+ if (data->u.hva == 0) {
+ kvm_gpc_deactivate(&vcpu->arch.xen.vcpu_info_cache);
+ r = 0;
+ break;
+ }
+
+ r = kvm_gpc_activate_hva(&vcpu->arch.xen.vcpu_info_cache,
+ data->u.hva, sizeof(struct vcpu_info));
}

- r = kvm_gpc_activate(&vcpu->arch.xen.vcpu_info_cache,
- data->u.gpa, sizeof(struct vcpu_info));
if (!r)
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);

@@ -1026,6 +1039,15 @@ int kvm_xen_vcpu_get_attr(struct kvm_vcpu *vcpu, struct kvm_xen_vcpu_attr *data)
r = 0;
break;

+ case KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA:
+ if (vcpu->arch.xen.vcpu_info_cache.active &&
+ vcpu->arch.xen.vcpu_info_cache.gpa == KVM_XEN_INVALID_GPA)
+ data->u.hva = vcpu->arch.xen.vcpu_info_cache.uhva;
+ else
+ data->u.hva = 0;
+ r = 0;
+ break;
+
case KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO:
if (vcpu->arch.xen.vcpu_time_info_cache.active)
data->u.gpa = vcpu->arch.xen.vcpu_time_info_cache.gpa;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ac5caba313d1..d2665319db6e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1809,6 +1809,7 @@ struct kvm_xen_vcpu_attr {
union {
__u64 gpa;
#define KVM_XEN_INVALID_GPA ((__u64)-1)
+ __u64 hva;
__u64 pad[8];
struct {
__u64 state;
@@ -1839,6 +1840,8 @@ struct kvm_xen_vcpu_attr {
#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID 0x6
#define KVM_XEN_VCPU_ATTR_TYPE_TIMER 0x7
#define KVM_XEN_VCPU_ATTR_TYPE_UPCALL_VECTOR 0x8
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA */
+#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA 0x9

/* Secure Encrypted Virtualization command */
enum sev_cmd_id {
--
2.39.2


2024-02-15 15:46:20

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v13 18/21] KVM: x86/xen: don't block on pfncache locks in kvm_xen_set_evtchn_fast()

From: Paul Durrant <[email protected]>

As described in [1] compiling with CONFIG_PROVE_RAW_LOCK_NESTING shows that
kvm_xen_set_evtchn_fast() is blocking on pfncache locks in IRQ context.
There is only actually blocking with PREEMPT_RT because the locks will
turned into mutexes. There is no 'raw' version of rwlock_t that can be used
to avoid that, so use read_trylock() and treat failure to lock the same as
an invalid cache.

[1] https://lore.kernel.org/lkml/[email protected]/T/#mbd06e5a04534ce9c0ee94bd8f1e8d942b2d45bd6

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: [email protected]

v13:
- Patch title change.

v11:
- Amended the commit comment.

v10:
- New in this version.
---
arch/x86/kvm/xen.c | 30 ++++++++++++++++++++----------
1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 59073642c078..8650141b266e 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1678,10 +1678,13 @@ static int set_shinfo_evtchn_pending(struct kvm_vcpu *vcpu, u32 port)
unsigned long flags;
int rc = -EWOULDBLOCK;

- read_lock_irqsave(&gpc->lock, flags);
- if (!kvm_gpc_check(gpc, PAGE_SIZE))
+ local_irq_save(flags);
+ if (!read_trylock(&gpc->lock))
goto out;

+ if (!kvm_gpc_check(gpc, PAGE_SIZE))
+ goto out_unlock;
+
if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
struct shared_info *shinfo = gpc->khva;

@@ -1703,8 +1706,10 @@ static int set_shinfo_evtchn_pending(struct kvm_vcpu *vcpu, u32 port)
rc = 1; /* It is newly raised */
}

+ out_unlock:
+ read_unlock(&gpc->lock);
out:
- read_unlock_irqrestore(&gpc->lock, flags);
+ local_irq_restore(flags);
return rc;
}

@@ -1714,21 +1719,23 @@ static bool set_vcpu_info_evtchn_pending(struct kvm_vcpu *vcpu, u32 port)
struct gfn_to_pfn_cache *gpc = &vcpu->arch.xen.vcpu_info_cache;
unsigned long flags;
bool kick_vcpu = false;
+ bool locked;

- read_lock_irqsave(&gpc->lock, flags);
+ local_irq_save(flags);
+ locked = read_trylock(&gpc->lock);

/*
* Try to deliver the event directly to the vcpu_info. If successful and
* the guest is using upcall_vector delivery, send the MSI.
- * If the pfncache is invalid, set the shadow. In this case, or if the
- * guest is using another form of event delivery, the vCPU must be
- * kicked to complete the delivery.
+ * If the pfncache lock is contended or the cache is invalid, set the
+ * shadow. In this case, or if the guest is using another form of event
+ * delivery, the vCPU must be kicked to complete the delivery.
*/
if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
struct vcpu_info *vcpu_info = gpc->khva;
int port_word_bit = port / 64;

- if (!kvm_gpc_check(gpc, sizeof(*vcpu_info))) {
+ if ((!locked || !kvm_gpc_check(gpc, sizeof(*vcpu_info)))) {
if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel))
kick_vcpu = true;
goto out;
@@ -1742,7 +1749,7 @@ static bool set_vcpu_info_evtchn_pending(struct kvm_vcpu *vcpu, u32 port)
struct compat_vcpu_info *vcpu_info = gpc->khva;
int port_word_bit = port / 32;

- if (!kvm_gpc_check(gpc, sizeof(*vcpu_info))) {
+ if ((!locked || !kvm_gpc_check(gpc, sizeof(*vcpu_info)))) {
if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel))
kick_vcpu = true;
goto out;
@@ -1761,7 +1768,10 @@ static bool set_vcpu_info_evtchn_pending(struct kvm_vcpu *vcpu, u32 port)
}

out:
- read_unlock_irqrestore(&gpc->lock, flags);
+ if (locked)
+ read_unlock(&gpc->lock);
+
+ local_irq_restore(flags);
return kick_vcpu;
}

--
2.39.2


2024-02-15 15:46:34

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v13 19/21] KVM: pfncache: check the need for invalidation under read lock first

From: Paul Durrant <[email protected]>

When processing mmu_notifier invalidations for gpc caches, pre-check for
overlap with the invalidation event while holding gpc->lock for read, and
only take gpc->lock for write if the cache needs to be invalidated. Doing
a pre-check without taking gpc->lock for write avoids unnecessarily
contending the lock for unrelated invalidations, which is very beneficial
for caches that are heavily used (but rarely subjected to mmu_notifier
invalidations).

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: David Woodhouse <[email protected]>

v13:
- Use Sean's preferred wording for the commit comment.

v10:
- New in this version.
---
virt/kvm/pfncache.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 4e64d349b2f7..a60d8f906896 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -29,14 +29,30 @@ void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm, unsigned long start,

spin_lock(&kvm->gpc_lock);
list_for_each_entry(gpc, &kvm->gpc_list, list) {
- write_lock_irq(&gpc->lock);
+ read_lock_irq(&gpc->lock);

/* Only a single page so no need to care about length */
if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) &&
gpc->uhva >= start && gpc->uhva < end) {
- gpc->valid = false;
+ read_unlock_irq(&gpc->lock);
+
+ /*
+ * There is a small window here where the cache could
+ * be modified, and invalidation would no longer be
+ * necessary. Hence check again whether invalidation
+ * is still necessary once the write lock has been
+ * acquired.
+ */
+
+ write_lock_irq(&gpc->lock);
+ if (gpc->valid && !is_error_noslot_pfn(gpc->pfn) &&
+ gpc->uhva >= start && gpc->uhva < end)
+ gpc->valid = false;
+ write_unlock_irq(&gpc->lock);
+ continue;
}
- write_unlock_irq(&gpc->lock);
+
+ read_unlock_irq(&gpc->lock);
}
spin_unlock(&kvm->gpc_lock);
}
--
2.39.2


2024-02-15 15:49:12

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v13 10/21] KVM: x86/xen: separate initialization of shared_info cache and content

From: Paul Durrant <[email protected]>

A subsequent patch will allow shared_info to be initialized using either a
GPA or a user-space (i.e. VMM) HVA. To make that patch cleaner, separate
the initialization of the shared_info content from the activation of the
pfncache.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: [email protected]

v13:
- Patch title change.

v11:
- Fix accidental regression from commit 5d6d6a7d7e66a ("KVM: x86: Refine
calculation of guest wall clock to use a single TSC read").

v10:
- New in this version.
---
arch/x86/kvm/xen.c | 55 +++++++++++++++++++++++++++-------------------
1 file changed, 32 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index e90464225467..031e98d88ba2 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -34,41 +34,32 @@ static bool kvm_xen_hcall_evtchn_send(struct kvm_vcpu *vcpu, u64 param, u64 *r);

DEFINE_STATIC_KEY_DEFERRED_FALSE(kvm_xen_enabled, HZ);

-static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
+static int kvm_xen_shared_info_init(struct kvm *kvm)
{
struct gfn_to_pfn_cache *gpc = &kvm->arch.xen.shinfo_cache;
struct pvclock_wall_clock *wc;
- gpa_t gpa = gfn_to_gpa(gfn);
u32 *wc_sec_hi;
u32 wc_version;
u64 wall_nsec;
int ret = 0;
int idx = srcu_read_lock(&kvm->srcu);

- if (gfn == KVM_XEN_INVALID_GFN) {
- kvm_gpc_deactivate(gpc);
- goto out;
- }
+ read_lock_irq(&gpc->lock);
+ while (!kvm_gpc_check(gpc, PAGE_SIZE)) {
+ read_unlock_irq(&gpc->lock);

- do {
- ret = kvm_gpc_activate(gpc, gpa, PAGE_SIZE);
+ ret = kvm_gpc_refresh(gpc, PAGE_SIZE);
if (ret)
goto out;

- /*
- * This code mirrors kvm_write_wall_clock() except that it writes
- * directly through the pfn cache and doesn't mark the page dirty.
- */
- wall_nsec = kvm_get_wall_clock_epoch(kvm);
-
- /* It could be invalid again already, so we need to check */
read_lock_irq(&gpc->lock);
+ }

- if (gpc->valid)
- break;
-
- read_unlock_irq(&gpc->lock);
- } while (1);
+ /*
+ * This code mirrors kvm_write_wall_clock() except that it writes
+ * directly through the pfn cache and doesn't mark the page dirty.
+ */
+ wall_nsec = kvm_get_wall_clock_epoch(kvm);

/* Paranoia checks on the 32-bit struct layout */
BUILD_BUG_ON(offsetof(struct compat_shared_info, wc) != 0x900);
@@ -639,12 +630,30 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
}
break;

- case KVM_XEN_ATTR_TYPE_SHARED_INFO:
+ case KVM_XEN_ATTR_TYPE_SHARED_INFO: {
+ int idx;
+
mutex_lock(&kvm->arch.xen.xen_lock);
- r = kvm_xen_shared_info_init(kvm, data->u.shared_info.gfn);
+
+ idx = srcu_read_lock(&kvm->srcu);
+
+ if (data->u.shared_info.gfn == KVM_XEN_INVALID_GFN) {
+ kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
+ r = 0;
+ } else {
+ r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
+ gfn_to_gpa(data->u.shared_info.gfn),
+ PAGE_SIZE);
+ }
+
+ srcu_read_unlock(&kvm->srcu, idx);
+
+ if (!r && kvm->arch.xen.shinfo_cache.active)
+ r = kvm_xen_shared_info_init(kvm);
+
mutex_unlock(&kvm->arch.xen.xen_lock);
break;
-
+ }
case KVM_XEN_ATTR_TYPE_UPCALL_VECTOR:
if (data->u.vector && data->u.vector < 0x10)
r = -EINVAL;
--
2.39.2


2024-02-15 15:55:23

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v13 16/21] KVM: x86/xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA capability

From: Paul Durrant <[email protected]>

Now that all relevant kernel changes and selftests are in place, enable the
new capability.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: [email protected]

v13:
- Patch title change.

v2:
- New in this version.
---
arch/x86/kvm/x86.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 31cd5d803dae..9f98c5075629 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4673,7 +4673,8 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
KVM_XEN_HVM_CONFIG_SHARED_INFO |
KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL |
KVM_XEN_HVM_CONFIG_EVTCHN_SEND |
- KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE;
+ KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE |
+ KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA;
if (sched_info_on())
r |= KVM_XEN_HVM_CONFIG_RUNSTATE |
KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG;
--
2.39.2


2024-02-15 15:56:39

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v13 21/21] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues

From: David Woodhouse <[email protected]>

This function can race with kvm_gpc_deactivate(), which does not take
the ->refresh_lock. This means kvm_gpc_deactivate() can wipe the ->pfn
and ->khva fields, and unmap the latter, while hva_to_pfn_retry() has
temporarily dropped its write lock on gpc->lock.

Then if hva_to_pfn_retry() determines that the PFN hasn't changed and
that the original pfn and khva can be reused, they get assigned back to
gpc->pfn and gpc->khva even though the khva was already unmapped by
kvm_gpc_deactivate(). This leaves the cache in an apparently valid state
but with ->khva pointing to an address which has been unmapped. Which in
turn leads to oopses in e.g. __kvm_xen_has_interrupt() and
set_shinfo_evtchn_pending() when they dereference said khva.

It may be possible to fix this just by making kvm_gpc_deactivate() take
the ->refresh_lock, but that still leaves ->refresh_lock being basically
redundant with the write lock on ->lock, which frankly makes my skin
itch, with the way that pfn_to_hva_retry() operates on fields in the gpc
without holding a write lock on ->lock.

Instead, fix it by cleaning up the semantics of hva_to_pfn_retry(). It
now no longer does locking gymnastics because it no longer operates on
the gpc object at all. I's called with a uhva and simply returns the
corresponding pfn (pinned), and a mapped khva for it.

Its caller __kvm_gpc_refresh() now sets gpc->uhva and clears gpc->valid
before dropping ->lock, calling hva_to_pfn_retry() and retaking ->lock
for write.

If hva_to_pfn_retry() fails, *or* if the ->uhva or ->active fields in
the gpc changed while the lock was dropped, the new mapping is discarded
and the gpc is not modified. On success with an unchanged gpc, the new
mapping is installed and the current ->pfn and ->uhva are taken into the
local old_pfn and old_khva variables to be unmapped once the locks are
all released.

This simplification means that ->refresh_lock is no longer needed for
correctness, but it does still provide a minor optimisation because it
will prevent two concurrent __kvm_gpc_refresh() calls from mapping a
given PFN, only for one of them to lose the race and discard its
mapping.

The optimisation in hva_to_pfn_retry() where it attempts to use the old
mapping if the pfn doesn't change is dropped, since it makes the pinning
more complex. It's a pointless optimisation anyway, since the odds of
the pfn ending up the same when the uhva has changed (i.e. the odds of
the two userspace addresses both pointing to the same underlying
physical page) are negligible,

The 'hva_changed' local variable in __kvm_gpc_refresh() is also removed,
since it's simpler just to clear gpc->valid if the uhva changed.
Likewise the unmap_old variable is dropped because it's just as easy to
check the old_pfn variable for KVM_PFN_ERR_FAULT.

I remain slightly confused because although this is clearly a race in
the gfn_to_pfn_cache code, I don't quite know how the Xen support code
actually managed to trigger it. We've seen oopses from dereferencing a
valid-looking ->khva in both __kvm_xen_has_interrupt() (the vcpu_info)
and in set_shinfo_evtchn_pending() (the shared_info). But surely the
race shouldn't happen for the vcpu_info gpc because all calls to both
refresh and deactivate hold the vcpu mutex, and it shouldn't happen
for the shared_info gpc because all calls to both will hold the
kvm->arch.xen.xen_lock mutex.

Signed-off-by: David Woodhouse <[email protected]>
Reviewed-by: Paul Durrant <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>

v12:
- New in this version.
---
virt/kvm/pfncache.c | 184 +++++++++++++++++++++-----------------------
1 file changed, 88 insertions(+), 96 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index a60d8f906896..d1c73b452570 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -139,108 +139,65 @@ static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_s
return kvm->mmu_invalidate_seq != mmu_seq;
}

-static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
+/*
+ * Given a user virtual address, obtain a pinned host PFN and kernel mapping
+ * for it. The caller will release the PFN after installing it into the GPC
+ * so that the MMU notifier invalidation mechanism is active.
+ */
+static kvm_pfn_t hva_to_pfn_retry(struct kvm *kvm, unsigned long uhva,
+ kvm_pfn_t *pfn, void **khva)
{
/* Note, the new page offset may be different than the old! */
- void *old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
void *new_khva = NULL;
unsigned long mmu_seq;

- lockdep_assert_held(&gpc->refresh_lock);
-
- lockdep_assert_held_write(&gpc->lock);
-
- /*
- * Invalidate the cache prior to dropping gpc->lock, the gpa=>uhva
- * assets have already been updated and so a concurrent check() from a
- * different task may not fail the gpa/uhva/generation checks.
- */
- gpc->valid = false;
-
- do {
- mmu_seq = gpc->kvm->mmu_invalidate_seq;
+ for (;;) {
+ mmu_seq = kvm->mmu_invalidate_seq;
smp_rmb();

- write_unlock_irq(&gpc->lock);
-
- /*
- * If the previous iteration "failed" due to an mmu_notifier
- * event, release the pfn and unmap the kernel virtual address
- * from the previous attempt. Unmapping might sleep, so this
- * needs to be done after dropping the lock. Opportunistically
- * check for resched while the lock isn't held.
- */
- if (new_pfn != KVM_PFN_ERR_FAULT) {
- /*
- * Keep the mapping if the previous iteration reused
- * the existing mapping and didn't create a new one.
- */
- if (new_khva != old_khva)
- gpc_unmap(new_pfn, new_khva);
-
- kvm_release_pfn_clean(new_pfn);
-
- cond_resched();
- }
-
/* We always request a writeable mapping */
- new_pfn = hva_to_pfn(gpc->uhva, false, false, NULL, true, NULL);
+ new_pfn = hva_to_pfn(uhva, false, false, NULL, true, NULL);
if (is_error_noslot_pfn(new_pfn))
- goto out_error;
+ return -EFAULT;

/*
- * Obtain a new kernel mapping if KVM itself will access the
- * pfn. Note, kmap() and memremap() can both sleep, so this
- * too must be done outside of gpc->lock!
+ * Always obtain a new kernel mapping. Trying to reuse an
+ * existing one is more complex than it's worth.
*/
- if (new_pfn == gpc->pfn)
- new_khva = old_khva;
- else
- new_khva = gpc_map(new_pfn);
-
+ new_khva = gpc_map(new_pfn);
if (!new_khva) {
kvm_release_pfn_clean(new_pfn);
- goto out_error;
+ return -EFAULT;
}

- write_lock_irq(&gpc->lock);
+ if (!mmu_notifier_retry_cache(kvm, mmu_seq))
+ break;

/*
- * Other tasks must wait for _this_ refresh to complete before
- * attempting to refresh.
+ * If this iteration "failed" due to an mmu_notifier event,
+ * release the pfn and unmap the kernel virtual address, and
+ * loop around again.
*/
- WARN_ON_ONCE(gpc->valid);
- } while (mmu_notifier_retry_cache(gpc->kvm, mmu_seq));
-
- gpc->valid = true;
- gpc->pfn = new_pfn;
- gpc->khva = new_khva + offset_in_page(gpc->uhva);
+ if (new_pfn != KVM_PFN_ERR_FAULT) {
+ gpc_unmap(new_pfn, new_khva);
+ kvm_release_pfn_clean(new_pfn);
+ }
+ }

- /*
- * Put the reference to the _new_ pfn. The pfn is now tracked by the
- * cache and can be safely migrated, swapped, etc... as the cache will
- * invalidate any mappings in response to relevant mmu_notifier events.
- */
- kvm_release_pfn_clean(new_pfn);
+ *pfn = new_pfn;
+ *khva = new_khva;

return 0;
-
-out_error:
- write_lock_irq(&gpc->lock);
-
- return -EFAULT;
}

-static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned long uhva,
- unsigned long len)
+static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
+ unsigned long uhva, unsigned long len)
{
unsigned long page_offset = kvm_is_error_gpa(gpa) ?
offset_in_page(uhva) : offset_in_page(gpa);
- bool unmap_old = false;
unsigned long old_uhva;
- kvm_pfn_t old_pfn;
- bool hva_change = false;
+ kvm_pfn_t old_pfn = KVM_PFN_ERR_FAULT;
void *old_khva;
int ret;

@@ -257,8 +214,9 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l

/*
* If another task is refreshing the cache, wait for it to complete.
- * There is no guarantee that concurrent refreshes will see the same
- * gpa, memslots generation, etc..., so they must be fully serialized.
+ * This is purely an optimisation, to avoid concurrent mappings from
+ * hva_to_pfn_retry(), all but one of which will be discarded after
+ * losing a race to install them in the GPC.
*/
mutex_lock(&gpc->refresh_lock);

@@ -279,7 +237,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
gpc->uhva = PAGE_ALIGN_DOWN(uhva);

if (gpc->uhva != old_uhva)
- hva_change = true;
+ gpc->valid = false;
} else {
struct kvm_memslots *slots = kvm_memslots(gpc->kvm);

@@ -294,7 +252,11 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l

if (kvm_is_error_hva(gpc->uhva)) {
ret = -EFAULT;
- goto out;
+
+ gpc->valid = false;
+ gpc->pfn = KVM_PFN_ERR_FAULT;
+ gpc->khva = NULL;
+ goto out_unlock;
}

/*
@@ -302,7 +264,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
* HVA may still be the same.
*/
if (gpc->uhva != old_uhva)
- hva_change = true;
+ gpc->valid = false;
} else {
gpc->uhva = old_uhva;
}
@@ -315,9 +277,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
* If the userspace HVA changed or the PFN was already invalid,
* drop the lock and do the HVA to PFN lookup again.
*/
- if (!gpc->valid || hva_change) {
- ret = hva_to_pfn_retry(gpc);
- } else {
+ if (gpc->valid) {
/*
* If the HVA→PFN mapping was already valid, don't unmap it.
* But do update gpc->khva because the offset within the page
@@ -325,30 +285,62 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa, unsigned l
*/
gpc->khva = old_khva + page_offset;
ret = 0;
- goto out_unlock;
- }

- out:
- /*
- * Invalidate the cache and purge the pfn/khva if the refresh failed.
- * Some/all of the uhva, gpa, and memslot generation info may still be
- * valid, leave it as is.
- */
- if (ret) {
+ /* old_pfn must not be unmapped because it was reused. */
+ old_pfn = KVM_PFN_ERR_FAULT;
+ } else {
+ kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
+ unsigned long new_uhva = gpc->uhva;
+ void *new_khva = NULL;
+
+ /*
+ * Invalidate the cache prior to dropping gpc->lock; the
+ * gpa=>uhva assets have already been updated and so a
+ * concurrent check() from a different task may not fail
+ * the gpa/uhva/generation checks as it should.
+ */
gpc->valid = false;
- gpc->pfn = KVM_PFN_ERR_FAULT;
- gpc->khva = NULL;
- }

- /* Detect a pfn change before dropping the lock! */
- unmap_old = (old_pfn != gpc->pfn);
+ write_unlock_irq(&gpc->lock);
+
+ ret = hva_to_pfn_retry(gpc->kvm, new_uhva, &new_pfn, &new_khva);
+
+ write_lock_irq(&gpc->lock);
+
+ WARN_ON_ONCE(gpc->valid);
+
+ if (ret || !gpc->active || gpc->uhva != new_uhva) {
+ /*
+ * On failure or if another change occurred while the
+ * lock was dropped, just purge the new mapping.
+ */
+ old_pfn = new_pfn;
+ old_khva = new_khva;
+ } else {
+ old_pfn = gpc->pfn;
+ old_khva = gpc->khva;
+
+ gpc->pfn = new_pfn;
+ gpc->khva = new_khva + offset_in_page(gpc->uhva);
+ gpc->valid = true;
+ }
+
+ /*
+ * Put the reference to the _new_ pfn. On success, the
+ * pfn is now tracked by the cache and can safely be
+ * migrated, swapped, etc. as the cache will invalidate
+ * any mappings in response to relevant mmu_notifier
+ * events.
+ */
+ kvm_release_pfn_clean(new_pfn);
+ }

out_unlock:
write_unlock_irq(&gpc->lock);

mutex_unlock(&gpc->refresh_lock);

- if (unmap_old)
+ if (old_pfn != KVM_PFN_ERR_FAULT)
gpc_unmap(old_pfn, old_khva);

return ret;
--
2.39.2


2024-02-15 16:02:16

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v13 07/21] KVM: pfncache: include page offset in uhva and use it consistently

From: Paul Durrant <[email protected]>

Currently the pfncache page offset is sometimes determined using the gpa
and sometimes the khva, whilst the uhva is always page-aligned. After a
subsequent patch is applied the gpa will not always be valid so adjust
the code to include the page offset in the uhva and use it consistently
as the source of truth.

Also, where a page-aligned address is required, use PAGE_ALIGN_DOWN()
for clarity.

No functional change intended.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: David Woodhouse <[email protected]>

v8:
- New in this version.
---
virt/kvm/pfncache.c | 29 +++++++++++++++++++++--------
1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 0eeb034d0674..97eec8ee3449 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -48,10 +48,10 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)
if (!gpc->active)
return false;

- if (offset_in_page(gpc->gpa) + len > PAGE_SIZE)
+ if (gpc->generation != slots->generation || kvm_is_error_hva(gpc->uhva))
return false;

- if (gpc->generation != slots->generation || kvm_is_error_hva(gpc->uhva))
+ if (offset_in_page(gpc->uhva) + len > PAGE_SIZE)
return false;

if (!gpc->valid)
@@ -119,7 +119,7 @@ static inline bool mmu_notifier_retry_cache(struct kvm *kvm, unsigned long mmu_s
static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)
{
/* Note, the new page offset may be different than the old! */
- void *old_khva = gpc->khva - offset_in_page(gpc->khva);
+ void *old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
kvm_pfn_t new_pfn = KVM_PFN_ERR_FAULT;
void *new_khva = NULL;
unsigned long mmu_seq;
@@ -192,7 +192,7 @@ static kvm_pfn_t hva_to_pfn_retry(struct gfn_to_pfn_cache *gpc)

gpc->valid = true;
gpc->pfn = new_pfn;
- gpc->khva = new_khva + offset_in_page(gpc->gpa);
+ gpc->khva = new_khva + offset_in_page(gpc->uhva);

/*
* Put the reference to the _new_ pfn. The pfn is now tracked by the
@@ -217,6 +217,7 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
bool unmap_old = false;
unsigned long old_uhva;
kvm_pfn_t old_pfn;
+ bool hva_change = false;
void *old_khva;
int ret;

@@ -242,10 +243,10 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
}

old_pfn = gpc->pfn;
- old_khva = gpc->khva - offset_in_page(gpc->khva);
- old_uhva = gpc->uhva;
+ old_khva = (void *)PAGE_ALIGN_DOWN((uintptr_t)gpc->khva);
+ old_uhva = PAGE_ALIGN_DOWN(gpc->uhva);

- /* If the userspace HVA is invalid, refresh that first */
+ /* Refresh the userspace HVA if necessary */
if (gpc->gpa != gpa || gpc->generation != slots->generation ||
kvm_is_error_hva(gpc->uhva)) {
gfn_t gfn = gpa_to_gfn(gpa);
@@ -259,13 +260,25 @@ static int __kvm_gpc_refresh(struct gfn_to_pfn_cache *gpc, gpa_t gpa,
ret = -EFAULT;
goto out;
}
+
+ /*
+ * Even if the GPA and/or the memslot generation changed, the
+ * HVA may still be the same.
+ */
+ if (gpc->uhva != old_uhva)
+ hva_change = true;
+ } else {
+ gpc->uhva = old_uhva;
}

+ /* Note: the offset must be correct before calling hva_to_pfn_retry() */
+ gpc->uhva += page_offset;
+
/*
* If the userspace HVA changed or the PFN was already invalid,
* drop the lock and do the HVA to PFN lookup again.
*/
- if (!gpc->valid || old_uhva != gpc->uhva) {
+ if (!gpc->valid || hva_change) {
ret = hva_to_pfn_retry(gpc);
} else {
/*
--
2.39.2


2024-02-15 16:09:29

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v13 14/21] KVM: selftests: map Xen's shared_info page using HVA rather than GFN

From: Paul Durrant <[email protected]>

Using the HVA of the shared_info page is more efficient, so if the
capability (KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA) is present use that method
to do the mapping.

NOTE: Have the juggle_shinfo_state() thread map and unmap using both
GFN and HVA, to make sure the older mechanism is not broken.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: David Woodhouse <[email protected]>

v13:
- Patch title change.

v3:
- Re-work the juggle_shinfo_state() thread.

v2:
- New in this version.
---
.../selftests/kvm/x86_64/xen_shinfo_test.c | 44 +++++++++++++++----
1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
index 9ec9ab60b63e..a61500ff0822 100644
--- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
@@ -389,6 +389,7 @@ static int cmp_timespec(struct timespec *a, struct timespec *b)
return 0;
}

+static struct shared_info *shinfo;
static struct vcpu_info *vinfo;
static struct kvm_vcpu *vcpu;

@@ -404,20 +405,38 @@ static void *juggle_shinfo_state(void *arg)
{
struct kvm_vm *vm = (struct kvm_vm *)arg;

- struct kvm_xen_hvm_attr cache_activate = {
+ struct kvm_xen_hvm_attr cache_activate_gfn = {
.type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
.u.shared_info.gfn = SHINFO_REGION_GPA / PAGE_SIZE
};

- struct kvm_xen_hvm_attr cache_deactivate = {
+ struct kvm_xen_hvm_attr cache_deactivate_gfn = {
.type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
.u.shared_info.gfn = KVM_XEN_INVALID_GFN
};

+ struct kvm_xen_hvm_attr cache_activate_hva = {
+ .type = KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA,
+ .u.shared_info.hva = (unsigned long)shinfo
+ };
+
+ struct kvm_xen_hvm_attr cache_deactivate_hva = {
+ .type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
+ .u.shared_info.hva = 0
+ };
+
+ int xen_caps = kvm_check_cap(KVM_CAP_XEN_HVM);
+
for (;;) {
- __vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_activate);
- __vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_deactivate);
+ __vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_activate_gfn);
pthread_testcancel();
+ __vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_deactivate_gfn);
+
+ if (xen_caps & KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA) {
+ __vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_activate_hva);
+ pthread_testcancel();
+ __vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &cache_deactivate_hva);
+ }
}

return NULL;
@@ -442,6 +461,7 @@ int main(int argc, char *argv[])
bool do_runstate_flag = !!(xen_caps & KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG);
bool do_eventfd_tests = !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL);
bool do_evtchn_tests = do_eventfd_tests && !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_SEND);
+ bool has_shinfo_hva = !!(xen_caps & KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA);

clock_gettime(CLOCK_REALTIME, &min_ts);

@@ -452,7 +472,7 @@ int main(int argc, char *argv[])
SHINFO_REGION_GPA, SHINFO_REGION_SLOT, 3, 0);
virt_map(vm, SHINFO_REGION_GVA, SHINFO_REGION_GPA, 3);

- struct shared_info *shinfo = addr_gpa2hva(vm, SHINFO_VADDR);
+ shinfo = addr_gpa2hva(vm, SHINFO_VADDR);

int zero_fd = open("/dev/zero", O_RDONLY);
TEST_ASSERT(zero_fd != -1, "Failed to open /dev/zero");
@@ -488,10 +508,16 @@ int main(int argc, char *argv[])
"Failed to read back RUNSTATE_UPDATE_FLAG attr");
}

- struct kvm_xen_hvm_attr ha = {
- .type = KVM_XEN_ATTR_TYPE_SHARED_INFO,
- .u.shared_info.gfn = SHINFO_REGION_GPA / PAGE_SIZE,
- };
+ struct kvm_xen_hvm_attr ha = {};
+
+ if (has_shinfo_hva) {
+ ha.type = KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA;
+ ha.u.shared_info.hva = (unsigned long)shinfo;
+ } else {
+ ha.type = KVM_XEN_ATTR_TYPE_SHARED_INFO;
+ ha.u.shared_info.gfn = SHINFO_ADDR / PAGE_SIZE;
+ }
+
vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &ha);

/*
--
2.39.2


2024-02-15 16:11:44

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v13 12/21] KVM: x86/xen: allow shared_info to be mapped by fixed HVA

From: Paul Durrant <[email protected]>

The shared_info page is not guest memory as such. It is a dedicated page
allocated by the VMM and overlaid onto guest memory in a GFN chosen by the
guest and specified in the XENMEM_add_to_physmap hypercall. The guest may
even request that shared_info be moved from one GFN to another by
re-issuing that hypercall, but the HVA is never going to change.

Because the shared_info page is an overlay the memory slots need to be
updated in response to the hypercall. However, memory slot adjustment is
not atomic and, whilst all vCPUs are paused, there is still the possibility
that events may be delivered (which requires the shared_info page to be
updated) whilst the shared_info GPA is absent. The HVA is never absent
though, so it makes much more sense to use that as the basis for the
kernel's mapping.

Hence add a new KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA attribute type for this
purpose and a KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA flag to advertize its
availability. Don't actually advertize it yet though. That will be done in
a subsequent patch, which will also add tests for the new attribute type.

Also update the KVM API documentation with the new attribute and also fix
it up to consistently refer to 'shared_info' (with the underscore).

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: David Woodhouse <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: [email protected]

v13:
- Patch title change.
- Properly validate HVAs.

v8:
- Re-base.

v2:
- Define the new attribute and capability but don't advertize the
capability yet.
- Add API documentation.
---
Documentation/virt/kvm/api.rst | 25 ++++++++++++++-----
arch/x86/kvm/xen.c | 44 +++++++++++++++++++++++++++-------
include/uapi/linux/kvm.h | 6 ++++-
3 files changed, 59 insertions(+), 16 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 3ec0b7a455a0..3372be85b335 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -372,7 +372,7 @@ The bits in the dirty bitmap are cleared before the ioctl returns, unless
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is enabled. For more information,
see the description of the capability.

-Note that the Xen shared info page, if configured, shall always be assumed
+Note that the Xen shared_info page, if configured, shall always be assumed
to be dirty. KVM will not explicitly mark it such.


@@ -5487,8 +5487,9 @@ KVM_PV_ASYNC_CLEANUP_PERFORM
__u8 long_mode;
__u8 vector;
__u8 runstate_update_flag;
- struct {
+ union {
__u64 gfn;
+ __u64 hva;
} shared_info;
struct {
__u32 send_port;
@@ -5516,10 +5517,10 @@ type values:

KVM_XEN_ATTR_TYPE_LONG_MODE
Sets the ABI mode of the VM to 32-bit or 64-bit (long mode). This
- determines the layout of the shared info pages exposed to the VM.
+ determines the layout of the shared_info page exposed to the VM.

KVM_XEN_ATTR_TYPE_SHARED_INFO
- Sets the guest physical frame number at which the Xen "shared info"
+ Sets the guest physical frame number at which the Xen shared_info
page resides. Note that although Xen places vcpu_info for the first
32 vCPUs in the shared_info page, KVM does not automatically do so
and instead requires that KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO be used
@@ -5528,7 +5529,7 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO
not be aware of the Xen CPU id which is used as the index into the
vcpu_info[] array, so may know the correct default location.

- Note that the shared info page may be constantly written to by KVM;
+ Note that the shared_info page may be constantly written to by KVM;
it contains the event channel bitmap used to deliver interrupts to
a Xen guest, amongst other things. It is exempt from dirty tracking
mechanisms — KVM will not explicitly mark the page as dirty each
@@ -5537,9 +5538,21 @@ KVM_XEN_ATTR_TYPE_SHARED_INFO
any vCPU has been running or any event channel interrupts can be
routed to the guest.

- Setting the gfn to KVM_XEN_INVALID_GFN will disable the shared info
+ Setting the gfn to KVM_XEN_INVALID_GFN will disable the shared_info
page.

+KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA
+ If the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA flag is also set in the
+ Xen capabilities, then this attribute may be used to set the
+ userspace address at which the shared_info page resides, which
+ will always be fixed in the VMM regardless of where it is mapped
+ in guest physical address space. This attribute should be used in
+ preference to KVM_XEN_ATTR_TYPE_SHARED_INFO as it avoids
+ unnecessary invalidation of an internal cache when the page is
+ re-mapped in guest physcial address space.
+
+ Setting the hva to zero will disable the shared_info page.
+
KVM_XEN_ATTR_TYPE_UPCALL_VECTOR
Sets the exception vector used to deliver Xen event channel upcalls.
This is the HVM-wide vector injected directly by the hypervisor
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 52edf676c471..6fb268e424fa 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -617,7 +617,6 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
{
int r = -ENOENT;

-
switch (data->type) {
case KVM_XEN_ATTR_TYPE_LONG_MODE:
if (!IS_ENABLED(CONFIG_64BIT) && data->u.long_mode) {
@@ -638,20 +637,37 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
}
break;

- case KVM_XEN_ATTR_TYPE_SHARED_INFO: {
+ case KVM_XEN_ATTR_TYPE_SHARED_INFO:
+ case KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA: {
int idx;

mutex_lock(&kvm->arch.xen.xen_lock);

idx = srcu_read_lock(&kvm->srcu);

- if (data->u.shared_info.gfn == KVM_XEN_INVALID_GFN) {
- kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
- r = 0;
+ if (data->type == KVM_XEN_ATTR_TYPE_SHARED_INFO) {
+ gfn_t gfn = data->u.shared_info.gfn;
+
+ if (gfn == KVM_XEN_INVALID_GFN) {
+ kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
+ r = 0;
+ } else {
+ r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
+ gfn_to_gpa(gfn), PAGE_SIZE);
+ }
} else {
- r = kvm_gpc_activate(&kvm->arch.xen.shinfo_cache,
- gfn_to_gpa(data->u.shared_info.gfn),
- PAGE_SIZE);
+ unsigned long hva = data->u.shared_info.hva;
+
+ if (!PAGE_ALIGNED(hva) ||
+ !access_ok((void __user *)hva, PAGE_SIZE)) {
+ r = -EINVAL;
+ } else if (!hva) {
+ kvm_gpc_deactivate(&kvm->arch.xen.shinfo_cache);
+ r = 0;
+ } else {
+ r = kvm_gpc_activate_hva(&kvm->arch.xen.shinfo_cache,
+ hva, PAGE_SIZE);
+ }
}

srcu_read_unlock(&kvm->srcu, idx);
@@ -715,13 +731,23 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
break;

case KVM_XEN_ATTR_TYPE_SHARED_INFO:
- if (kvm->arch.xen.shinfo_cache.active)
+ if (kvm->arch.xen.shinfo_cache.active &&
+ kvm->arch.xen.shinfo_cache.gpa != KVM_XEN_INVALID_GPA)
data->u.shared_info.gfn = gpa_to_gfn(kvm->arch.xen.shinfo_cache.gpa);
else
data->u.shared_info.gfn = KVM_XEN_INVALID_GFN;
r = 0;
break;

+ case KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA:
+ if (kvm->arch.xen.shinfo_cache.active &&
+ kvm->arch.xen.shinfo_cache.gpa == KVM_XEN_INVALID_GPA)
+ data->u.shared_info.hva = kvm->arch.xen.shinfo_cache.uhva;
+ else
+ data->u.shared_info.hva = 0;
+ r = 0;
+ break;
+
case KVM_XEN_ATTR_TYPE_UPCALL_VECTOR:
data->u.vector = kvm->arch.xen.upcall_vector;
r = 0;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index c3308536482b..ac5caba313d1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1246,6 +1246,7 @@ struct kvm_x86_mce {
#define KVM_XEN_HVM_CONFIG_EVTCHN_SEND (1 << 5)
#define KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG (1 << 6)
#define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE (1 << 7)
+#define KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA (1 << 8)

struct kvm_xen_hvm_config {
__u32 flags;
@@ -1744,9 +1745,10 @@ struct kvm_xen_hvm_attr {
__u8 long_mode;
__u8 vector;
__u8 runstate_update_flag;
- struct {
+ union {
__u64 gfn;
#define KVM_XEN_INVALID_GFN ((__u64)-1)
+ __u64 hva;
} shared_info;
struct {
__u32 send_port;
@@ -1788,6 +1790,8 @@ struct kvm_xen_hvm_attr {
#define KVM_XEN_ATTR_TYPE_XEN_VERSION 0x4
/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_RUNSTATE_UPDATE_FLAG */
#define KVM_XEN_ATTR_TYPE_RUNSTATE_UPDATE_FLAG 0x5
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA */
+#define KVM_XEN_ATTR_TYPE_SHARED_INFO_HVA 0x6

/* Per-vCPU Xen attributes */
#define KVM_XEN_VCPU_GET_ATTR _IOWR(KVMIO, 0xca, struct kvm_xen_vcpu_attr)
--
2.39.2


2024-02-15 16:14:27

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v13 11/21] KVM: x86/xen: re-initialize shared_info if guest (32/64-bit) mode is set

From: Paul Durrant <[email protected]>

If the shared_info PFN cache has already been initialized then the content
of the shared_info page needs to be re-initialized whenever the guest
mode is (re)set.
Setting the guest mode is either done explicitly by the VMM via the
KVM_XEN_ATTR_TYPE_LONG_MODE attribute, or implicitly when the guest writes
the MSR to set up the hypercall page.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: [email protected]

v13:
- Patch title change.

v12:
- Fix missing update of return value if mode is not actually changed.

v11:
- Drop the hunk removing the call to kvm_xen_shared_info_init() when
KVM_XEN_ATTR_TYPE_SHARED_INFO is set; it was a mistake and causes self-
test failures.

v10:
- New in this version.
---
arch/x86/kvm/xen.c | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 031e98d88ba2..52edf676c471 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -625,8 +625,16 @@ int kvm_xen_hvm_set_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
} else {
mutex_lock(&kvm->arch.xen.xen_lock);
kvm->arch.xen.long_mode = !!data->u.long_mode;
+
+ /*
+ * Re-initialize shared_info to put the wallclock in the
+ * correct place. Whilst it's not necessary to do this
+ * unless the mode is actually changed, it does no harm
+ * to make the call anyway.
+ */
+ r = kvm->arch.xen.shinfo_cache.active ?
+ kvm_xen_shared_info_init(kvm) : 0;
mutex_unlock(&kvm->arch.xen.xen_lock);
- r = 0;
}
break;

@@ -1101,9 +1109,24 @@ int kvm_xen_write_hypercall_page(struct kvm_vcpu *vcpu, u64 data)
u32 page_num = data & ~PAGE_MASK;
u64 page_addr = data & PAGE_MASK;
bool lm = is_long_mode(vcpu);
+ int r = 0;
+
+ mutex_lock(&kvm->arch.xen.xen_lock);
+ if (kvm->arch.xen.long_mode != lm) {
+ kvm->arch.xen.long_mode = lm;
+
+ /*
+ * Re-initialize shared_info to put the wallclock in the
+ * correct place.
+ */
+ if (kvm->arch.xen.shinfo_cache.active &&
+ kvm_xen_shared_info_init(kvm))
+ r = 1;
+ }
+ mutex_unlock(&kvm->arch.xen.xen_lock);

- /* Latch long_mode for shared_info pages etc. */
- vcpu->kvm->arch.xen.long_mode = lm;
+ if (r)
+ return r;

/*
* If Xen hypercall intercept is enabled, fill the hypercall
--
2.39.2


2024-02-15 16:14:48

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v13 17/21] KVM: x86/xen: split up kvm_xen_set_evtchn_fast()

From: Paul Durrant <[email protected]>

The implementation of kvm_xen_set_evtchn_fast() is a rather lengthy piece
of code that performs two operations: updating of the shared_info
evtchn_pending mask, and updating of the vcpu_info evtchn_pending_sel
mask. Introduce a separate function to perform each of those operations and
re-work kvm_xen_set_evtchn_fast() to use them.

No functional change intended.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: [email protected]

v13:
- Patch title change.

v11:
- Fixed /64 vs /32 switcheroo and changed type of port_word_bit back to
int.

v10:
- Updated in this version. Dropped David'd R-b since the updates are
non-trivial.

v8:
- New in this version.
---
arch/x86/kvm/xen.c | 175 ++++++++++++++++++++++++++-------------------
1 file changed, 100 insertions(+), 75 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 4e79cc68e0a9..59073642c078 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -1670,6 +1670,101 @@ static void kvm_xen_check_poller(struct kvm_vcpu *vcpu, int port)
}
}

+static int set_shinfo_evtchn_pending(struct kvm_vcpu *vcpu, u32 port)
+{
+ struct kvm *kvm = vcpu->kvm;
+ struct gfn_to_pfn_cache *gpc = &kvm->arch.xen.shinfo_cache;
+ unsigned long *pending_bits, *mask_bits;
+ unsigned long flags;
+ int rc = -EWOULDBLOCK;
+
+ read_lock_irqsave(&gpc->lock, flags);
+ if (!kvm_gpc_check(gpc, PAGE_SIZE))
+ goto out;
+
+ if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
+ struct shared_info *shinfo = gpc->khva;
+
+ pending_bits = (unsigned long *)&shinfo->evtchn_pending;
+ mask_bits = (unsigned long *)&shinfo->evtchn_mask;
+ } else {
+ struct compat_shared_info *shinfo = gpc->khva;
+
+ pending_bits = (unsigned long *)&shinfo->evtchn_pending;
+ mask_bits = (unsigned long *)&shinfo->evtchn_mask;
+ }
+
+ if (test_and_set_bit(port, pending_bits)) {
+ rc = 0; /* It was already raised */
+ } else if (test_bit(port, mask_bits)) {
+ rc = -ENOTCONN; /* It is masked */
+ kvm_xen_check_poller(vcpu, port);
+ } else {
+ rc = 1; /* It is newly raised */
+ }
+
+ out:
+ read_unlock_irqrestore(&gpc->lock, flags);
+ return rc;
+}
+
+static bool set_vcpu_info_evtchn_pending(struct kvm_vcpu *vcpu, u32 port)
+{
+ struct kvm *kvm = vcpu->kvm;
+ struct gfn_to_pfn_cache *gpc = &vcpu->arch.xen.vcpu_info_cache;
+ unsigned long flags;
+ bool kick_vcpu = false;
+
+ read_lock_irqsave(&gpc->lock, flags);
+
+ /*
+ * Try to deliver the event directly to the vcpu_info. If successful and
+ * the guest is using upcall_vector delivery, send the MSI.
+ * If the pfncache is invalid, set the shadow. In this case, or if the
+ * guest is using another form of event delivery, the vCPU must be
+ * kicked to complete the delivery.
+ */
+ if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
+ struct vcpu_info *vcpu_info = gpc->khva;
+ int port_word_bit = port / 64;
+
+ if (!kvm_gpc_check(gpc, sizeof(*vcpu_info))) {
+ if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel))
+ kick_vcpu = true;
+ goto out;
+ }
+
+ if (!test_and_set_bit(port_word_bit, &vcpu_info->evtchn_pending_sel)) {
+ WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1);
+ kick_vcpu = true;
+ }
+ } else {
+ struct compat_vcpu_info *vcpu_info = gpc->khva;
+ int port_word_bit = port / 32;
+
+ if (!kvm_gpc_check(gpc, sizeof(*vcpu_info))) {
+ if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel))
+ kick_vcpu = true;
+ goto out;
+ }
+
+ if (!test_and_set_bit(port_word_bit,
+ (unsigned long *)&vcpu_info->evtchn_pending_sel)) {
+ WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1);
+ kick_vcpu = true;
+ }
+ }
+
+ if (kick_vcpu && vcpu->arch.xen.upcall_vector) {
+ kvm_xen_inject_vcpu_vector(vcpu);
+ kick_vcpu = false;
+ }
+
+ out:
+ read_unlock_irqrestore(&gpc->lock, flags);
+ return kick_vcpu;
+}
+
/*
* The return value from this function is propagated to kvm_set_irq() API,
* so it returns:
@@ -1678,15 +1773,12 @@ static void kvm_xen_check_poller(struct kvm_vcpu *vcpu, int port)
* > 0 Number of CPUs interrupt was delivered to
*
* It is also called directly from kvm_arch_set_irq_inatomic(), where the
- * only check on its return value is a comparison with -EWOULDBLOCK'.
+ * only check on its return value is a comparison with -EWOULDBLOCK
+ * (which may be returned by set_shinfo_evtchn_pending()).
*/
int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
{
- struct gfn_to_pfn_cache *gpc = &kvm->arch.xen.shinfo_cache;
struct kvm_vcpu *vcpu;
- unsigned long *pending_bits, *mask_bits;
- unsigned long flags;
- int port_word_bit;
bool kick_vcpu = false;
int vcpu_idx, idx, rc;

@@ -1706,79 +1798,12 @@ int kvm_xen_set_evtchn_fast(struct kvm_xen_evtchn *xe, struct kvm *kvm)
if (xe->port >= max_evtchn_port(kvm))
return -EINVAL;

- rc = -EWOULDBLOCK;
-
idx = srcu_read_lock(&kvm->srcu);

- read_lock_irqsave(&gpc->lock, flags);
- if (!kvm_gpc_check(gpc, PAGE_SIZE))
- goto out_rcu;
-
- if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
- struct shared_info *shinfo = gpc->khva;
- pending_bits = (unsigned long *)&shinfo->evtchn_pending;
- mask_bits = (unsigned long *)&shinfo->evtchn_mask;
- port_word_bit = xe->port / 64;
- } else {
- struct compat_shared_info *shinfo = gpc->khva;
- pending_bits = (unsigned long *)&shinfo->evtchn_pending;
- mask_bits = (unsigned long *)&shinfo->evtchn_mask;
- port_word_bit = xe->port / 32;
- }
+ rc = set_shinfo_evtchn_pending(vcpu, xe->port);
+ if (rc == 1) /* Delivered to the bitmap in shared_info */
+ kick_vcpu = set_vcpu_info_evtchn_pending(vcpu, xe->port);

- /*
- * If this port wasn't already set, and if it isn't masked, then
- * we try to set the corresponding bit in the in-kernel shadow of
- * evtchn_pending_sel for the target vCPU. And if *that* wasn't
- * already set, then we kick the vCPU in question to write to the
- * *real* evtchn_pending_sel in its own guest vcpu_info struct.
- */
- if (test_and_set_bit(xe->port, pending_bits)) {
- rc = 0; /* It was already raised */
- } else if (test_bit(xe->port, mask_bits)) {
- rc = -ENOTCONN; /* Masked */
- kvm_xen_check_poller(vcpu, xe->port);
- } else {
- rc = 1; /* Delivered to the bitmap in shared_info. */
- /* Now switch to the vCPU's vcpu_info to set the index and pending_sel */
- read_unlock_irqrestore(&gpc->lock, flags);
- gpc = &vcpu->arch.xen.vcpu_info_cache;
-
- read_lock_irqsave(&gpc->lock, flags);
- if (!kvm_gpc_check(gpc, sizeof(struct vcpu_info))) {
- /*
- * Could not access the vcpu_info. Set the bit in-kernel
- * and prod the vCPU to deliver it for itself.
- */
- if (!test_and_set_bit(port_word_bit, &vcpu->arch.xen.evtchn_pending_sel))
- kick_vcpu = true;
- goto out_rcu;
- }
-
- if (IS_ENABLED(CONFIG_64BIT) && kvm->arch.xen.long_mode) {
- struct vcpu_info *vcpu_info = gpc->khva;
- if (!test_and_set_bit(port_word_bit, &vcpu_info->evtchn_pending_sel)) {
- WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1);
- kick_vcpu = true;
- }
- } else {
- struct compat_vcpu_info *vcpu_info = gpc->khva;
- if (!test_and_set_bit(port_word_bit,
- (unsigned long *)&vcpu_info->evtchn_pending_sel)) {
- WRITE_ONCE(vcpu_info->evtchn_upcall_pending, 1);
- kick_vcpu = true;
- }
- }
-
- /* For the per-vCPU lapic vector, deliver it as MSI. */
- if (kick_vcpu && vcpu->arch.xen.upcall_vector) {
- kvm_xen_inject_vcpu_vector(vcpu);
- kick_vcpu = false;
- }
- }
-
- out_rcu:
- read_unlock_irqrestore(&gpc->lock, flags);
srcu_read_unlock(&kvm->srcu, idx);

if (kick_vcpu) {
--
2.39.2


2024-02-15 16:15:53

by Paul Durrant

[permalink] [raw]
Subject: [PATCH v13 15/21] KVM: selftests: re-map Xen's vcpu_info using HVA rather than GPA

From: Paul Durrant <[email protected]>

If the relevant capability (KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA) is present
then re-map vcpu_info using the HVA part way through the tests to make sure
then there is no functional change.

Signed-off-by: Paul Durrant <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
---
Cc: Sean Christopherson <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: David Woodhouse <[email protected]>

v13:
- Patch title change.

v5:
- New in this version.
---
.../selftests/kvm/x86_64/xen_shinfo_test.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
index a61500ff0822..d2ea0435f4f7 100644
--- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
+++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c
@@ -62,6 +62,7 @@ enum {
TEST_POLL_TIMEOUT,
TEST_POLL_MASKED,
TEST_POLL_WAKE,
+ SET_VCPU_INFO,
TEST_TIMER_PAST,
TEST_LOCKING_SEND_RACE,
TEST_LOCKING_POLL_RACE,
@@ -321,6 +322,10 @@ static void guest_code(void)

GUEST_SYNC(TEST_POLL_WAKE);

+ /* Set the vcpu_info to point at exactly the place it already is to
+ * make sure the attribute is functional. */
+ GUEST_SYNC(SET_VCPU_INFO);
+
/* A timer wake an *unmasked* port which should wake us with an
* actual interrupt, while we're polling on a different port. */
ports[0]++;
@@ -888,6 +893,16 @@ int main(int argc, char *argv[])
alarm(1);
break;

+ case SET_VCPU_INFO:
+ if (has_shinfo_hva) {
+ struct kvm_xen_vcpu_attr vih = {
+ .type = KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO_HVA,
+ .u.hva = (unsigned long)vinfo
+ };
+ vcpu_ioctl(vcpu, KVM_XEN_VCPU_SET_ATTR, &vih);
+ }
+ break;
+
case TEST_TIMER_PAST:
TEST_ASSERT(!evtchn_irq_expected,
"Expected event channel IRQ but it didn't happen");
--
2.39.2


2024-02-16 13:06:23

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v13 21/21] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues

On Thu, 2024-02-15 at 15:29 +0000, Paul Durrant wrote:
> From: David Woodhouse <[email protected]>
>
> This function can race with kvm_gpc_deactivate(), which does not take
> the ->refresh_lock. This means kvm_gpc_deactivate() can wipe the ->pfn
> and ->khva fields, and unmap the latter, while hva_to_pfn_retry() has
> temporarily dropped its write lock on gpc->lock.

Let's drop this from your series for now, as it's contentious.

Sean didn't like calling it a 'fix', which I had conceded and reworked
the commit message. It was on the list somewhere, and also in
https://git.infradead.org/users/dwmw2/linux.git/commitdiff/f19755000a7

I *also* think we should do this simpler one:
https://git.infradead.org/users/dwmw2/linux.git/commitdiff/cc69506d19a
.. which almost makes the first one unnecessary, but I think we should
do it *anyway* because the rwlock abuse it fixes is kind of awful.

And while we still can't actually *identify* the race condition that
led to a dereference of a NULL gpc->khva while holding the read lock
and gpc->valid and gpc->active both being true... I'll eat my hat if
cleaning up and simplifying the locking (and making it self-contained)
*doesn't* fix it.

But either way, it isn't really part of your series. The only reason it
was tacked on the end was because it would have merge conflicts with
your series, which had been outstanding for months already.

So drop this one, and I'll work this bit out with Sean afterwards.


Attachments:
smime.p7s (5.83 kB)

2024-02-16 14:05:27

by Paul Durrant

[permalink] [raw]
Subject: Re: [PATCH v13 21/21] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues

On 16/02/2024 13:04, David Woodhouse wrote:
> On Thu, 2024-02-15 at 15:29 +0000, Paul Durrant wrote:
>> From: David Woodhouse <[email protected]>
>>
>> This function can race with kvm_gpc_deactivate(), which does not take
>> the ->refresh_lock. This means kvm_gpc_deactivate() can wipe the ->pfn
>> and ->khva fields, and unmap the latter, while hva_to_pfn_retry() has
>> temporarily dropped its write lock on gpc->lock.
>
> Let's drop this from your series for now, as it's contentious.
>
> Sean didn't like calling it a 'fix', which I had conceded and reworked
> the commit message. It was on the list somewhere, and also in
> https://git.infradead.org/users/dwmw2/linux.git/commitdiff/f19755000a7
>
> I *also* think we should do this simpler one:
> https://git.infradead.org/users/dwmw2/linux.git/commitdiff/cc69506d19a
> ... which almost makes the first one unnecessary, but I think we should
> do it *anyway* because the rwlock abuse it fixes is kind of awful.
>
> And while we still can't actually *identify* the race condition that
> led to a dereference of a NULL gpc->khva while holding the read lock
> and gpc->valid and gpc->active both being true... I'll eat my hat if
> cleaning up and simplifying the locking (and making it self-contained)
> *doesn't* fix it.
>
> But either way, it isn't really part of your series. The only reason it
> was tacked on the end was because it would have merge conflicts with
> your series, which had been outstanding for months already.
>
> So drop this one, and I'll work this bit out with Sean afterwards.

Ok. Sean, I assume that since this is the last patch in the series it's
superfluous for me to post a v14 just for this?

2024-02-16 15:53:42

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v13 21/21] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues

On Fri, Feb 16, 2024, Paul Durrant wrote:
> On 16/02/2024 13:04, David Woodhouse wrote:
> > On Thu, 2024-02-15 at 15:29 +0000, Paul Durrant wrote:
> > > From: David Woodhouse <[email protected]>
> > >
> > > This function can race with kvm_gpc_deactivate(), which does not take
> > > the ->refresh_lock. This means kvm_gpc_deactivate() can wipe the ->pfn
> > > and ->khva fields, and unmap the latter, while hva_to_pfn_retry() has
> > > temporarily dropped its write lock on gpc->lock.
> >
> > Let's drop this from your series for now, as it's contentious.
> >
> > Sean didn't like calling it a 'fix', which I had conceded and reworked
> > the commit message. It was on the list somewhere, and also in
> > https://git.infradead.org/users/dwmw2/linux.git/commitdiff/f19755000a7
> >
> > I *also* think we should do this simpler one:
> > https://git.infradead.org/users/dwmw2/linux.git/commitdiff/cc69506d19a
> > ... which almost makes the first one unnecessary, but I think we should
> > do it *anyway* because the rwlock abuse it fixes is kind of awful.
> >
> > And while we still can't actually *identify* the race condition that
> > led to a dereference of a NULL gpc->khva while holding the read lock
> > and gpc->valid and gpc->active both being true... I'll eat my hat if
> > cleaning up and simplifying the locking (and making it self-contained)
> > *doesn't* fix it.

Heh, I'm not taking that bet.

> > But either way, it isn't really part of your series. The only reason it
> > was tacked on the end was because it would have merge conflicts with
> > your series, which had been outstanding for months already.
> >
> > So drop this one, and I'll work this bit out with Sean afterwards.

FWIW, I'm not opposed to overhauling the gpc locking, I agree it's a mess. I just
want to proceed slower than I would for a fix, it's a lot to digest.

> Ok. Sean, I assume that since this is the last patch in the series it's
> superfluous for me to post a v14 just for this?

Correct, definitely no need for a new version.

2024-02-17 10:53:26

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v13 21/21] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues

On Fri, 2024-02-16 at 07:52 -0800, Sean Christopherson wrote:
>
> FWIW, I'm not opposed to overhauling the gpc locking, I agree it's a mess  I just
> want to proceed slower than I would for a fix, it's a lot to digest.

Agreed. I'll do a *really* simple expansion of ->refresh_lock for the fix, and
then cleaning up the rwlock abuse can come later (at the very end of
the series I'm about to post, so it can be ignored for now).


Attachments:
smime.p7s (5.83 kB)

2024-02-19 21:53:23

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v13 12/21] KVM: x86/xen: allow shared_info to be mapped by fixed HVA

On Thu, Feb 15, 2024, Paul Durrant wrote:
> @@ -715,13 +731,23 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
> break;
>
> case KVM_XEN_ATTR_TYPE_SHARED_INFO:
> - if (kvm->arch.xen.shinfo_cache.active)
> + if (kvm->arch.xen.shinfo_cache.active &&
> + kvm->arch.xen.shinfo_cache.gpa != KVM_XEN_INVALID_GPA)

This should really use INVALID_GPA when checking internal gpc state. Mostly to
help clarify what is/isn't KVM Xen ABI, but also because I don't like the subtle
assumption that KVM_XEN_INVALID_GPA == INVALID_GPA.

Even better, if we slot in two helpers when the HVA-based GPC support is added,
then the Xen code doesn't need to to make assumptions about how the GPC code
manages HVA vs. GPA internally. E.g. if we ever refactor the code again and use
a dedicated flag instead of gpc->gpa as the implicit flag.

static inline bool kvm_gpc_is_gpa_active(struct gfn_to_pfn_cache *gpc)
{
return gpc->active && !kvm_is_error_gpa(gpc->gpa);
}

static inline bool kvm_gpc_is_hva_active(struct gfn_to_pfn_cache *gpc)
{
return gpc->active && kvm_is_error_gpa(gpc->gpa);
}

2024-02-19 22:04:14

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v13 18/21] KVM: x86/xen: don't block on pfncache locks in kvm_xen_set_evtchn_fast()

On Thu, Feb 15, 2024, Paul Durrant wrote:
> From: Paul Durrant <[email protected]>
>
> As described in [1] compiling with CONFIG_PROVE_RAW_LOCK_NESTING shows that
> kvm_xen_set_evtchn_fast() is blocking on pfncache locks in IRQ context.
> There is only actually blocking with PREEMPT_RT because the locks will
> turned into mutexes. There is no 'raw' version of rwlock_t that can be used
> to avoid that, so use read_trylock() and treat failure to lock the same as
> an invalid cache.
>
> [1] https://lore.kernel.org/lkml/[email protected]/T/#mbd06e5a04534ce9c0ee94bd8f1e8d942b2d45bd6
>
> Signed-off-by: Paul Durrant <[email protected]>
> Reviewed-by: David Woodhouse <[email protected]>
> ---
> Cc: Sean Christopherson <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: David Woodhouse <[email protected]>
> Cc: [email protected]
>
> v13:
> - Patch title change.
>
> v11:
> - Amended the commit comment.
>
> v10:
> - New in this version.
> ---
> arch/x86/kvm/xen.c | 30 ++++++++++++++++++++----------
> 1 file changed, 20 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
> index 59073642c078..8650141b266e 100644
> --- a/arch/x86/kvm/xen.c
> +++ b/arch/x86/kvm/xen.c
> @@ -1678,10 +1678,13 @@ static int set_shinfo_evtchn_pending(struct kvm_vcpu *vcpu, u32 port)
> unsigned long flags;
> int rc = -EWOULDBLOCK;
>
> - read_lock_irqsave(&gpc->lock, flags);
> - if (!kvm_gpc_check(gpc, PAGE_SIZE))
> + local_irq_save(flags);
> + if (!read_trylock(&gpc->lock))
> goto out;

I am not comfortable applying this patch. As shown by the need for the next patch
to optimize unrelated invalidations, switching to read_trylock() is more subtle
than it seems at first glance. Specifically, there are no fairness guarantees.

I am not dead set against this change, but I don't want to put my SoB on what I
consider to be a hack.

I've zero objections if you can convince Paolo to take this directly, i.e. this
isn't a NAK. I just don't want to take it through my tree.

2024-02-19 22:07:09

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v13 00/21] KVM: xen: update shared_info and vcpu_info handling

On Thu, Feb 15, 2024, Paul Durrant wrote:
> David Woodhouse (1):
> KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues
>
> Paul Durrant (19):
> KVM: pfncache: Add a map helper function
> KVM: pfncache: remove unnecessary exports
> KVM: x86/xen: mark guest pages dirty with the pfncache lock held
> KVM: pfncache: add a mark-dirty helper
> KVM: pfncache: remove KVM_GUEST_USES_PFN usage
> KVM: pfncache: stop open-coding offset_in_page()
> KVM: pfncache: include page offset in uhva and use it consistently
> KVM: pfncache: allow a cache to be activated with a fixed (userspace)
> HVA
> KVM: x86/xen: separate initialization of shared_info cache and content
> KVM: x86/xen: re-initialize shared_info if guest (32/64-bit) mode is
> set
> KVM: x86/xen: allow shared_info to be mapped by fixed HVA
> KVM: x86/xen: allow vcpu_info to be mapped by fixed HVA
> KVM: selftests: map Xen's shared_info page using HVA rather than GFN
> KVM: selftests: re-map Xen's vcpu_info using HVA rather than GPA
> KVM: x86/xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA
> capability
> KVM: x86/xen: split up kvm_xen_set_evtchn_fast()
> KVM: x86/xen: don't block on pfncache locks in
> kvm_xen_set_evtchn_fast()
> KVM: pfncache: check the need for invalidation under read lock first
> KVM: x86/xen: allow vcpu_info content to be 'safely' copied
>
> Sean Christopherson (1):
> KVM: s390: Refactor kvm_is_error_gpa() into kvm_is_gpa_in_memslot()
>
> Documentation/virt/kvm/api.rst | 53 ++-
> arch/s390/kvm/diag.c | 2 +-
> arch/s390/kvm/gaccess.c | 14 +-
> arch/s390/kvm/kvm-s390.c | 4 +-
> arch/s390/kvm/priv.c | 4 +-
> arch/s390/kvm/sigp.c | 2 +-
> arch/x86/kvm/x86.c | 7 +-
> arch/x86/kvm/xen.c | 361 +++++++++++------
> include/linux/kvm_host.h | 49 ++-
> include/linux/kvm_types.h | 8 -
> include/uapi/linux/kvm.h | 9 +-
> .../selftests/kvm/x86_64/xen_shinfo_test.c | 59 ++-
> virt/kvm/pfncache.c | 382 ++++++++++--------
> 13 files changed, 591 insertions(+), 363 deletions(-)

Except for the read_trylock() patch, just a few nits that I can fixup when
applying, though I'll defeinitely want your eyeballs on the end result as they
tweaks aren't _that_ trivial.

Running tests now, if all goes well I'll push to kvm-x86 within the hour.

2024-02-20 09:05:50

by Paul Durrant

[permalink] [raw]
Subject: Re: [PATCH v13 12/21] KVM: x86/xen: allow shared_info to be mapped by fixed HVA

On 19/02/2024 21:53, Sean Christopherson wrote:
> On Thu, Feb 15, 2024, Paul Durrant wrote:
>> @@ -715,13 +731,23 @@ int kvm_xen_hvm_get_attr(struct kvm *kvm, struct kvm_xen_hvm_attr *data)
>> break;
>>
>> case KVM_XEN_ATTR_TYPE_SHARED_INFO:
>> - if (kvm->arch.xen.shinfo_cache.active)
>> + if (kvm->arch.xen.shinfo_cache.active &&
>> + kvm->arch.xen.shinfo_cache.gpa != KVM_XEN_INVALID_GPA)
>
> This should really use INVALID_GPA when checking internal gpc state. Mostly to
> help clarify what is/isn't KVM Xen ABI, but also because I don't like the subtle
> assumption that KVM_XEN_INVALID_GPA == INVALID_GPA.
>

Sorry, yes, that was miss on my part.

> Even better, if we slot in two helpers when the HVA-based GPC support is added,
> then the Xen code doesn't need to to make assumptions about how the GPC code
> manages HVA vs. GPA internally. E.g. if we ever refactor the code again and use
> a dedicated flag instead of gpc->gpa as the implicit flag.
>
> static inline bool kvm_gpc_is_gpa_active(struct gfn_to_pfn_cache *gpc)
> {
> return gpc->active && !kvm_is_error_gpa(gpc->gpa);
> }
>
> static inline bool kvm_gpc_is_hva_active(struct gfn_to_pfn_cache *gpc)
> {
> return gpc->active && kvm_is_error_gpa(gpc->gpa);
> }

Sure, sounds good to me. I'll add those.

2024-02-20 09:16:24

by Paul Durrant

[permalink] [raw]
Subject: Re: [PATCH v13 18/21] KVM: x86/xen: don't block on pfncache locks in kvm_xen_set_evtchn_fast()

On 19/02/2024 22:04, Sean Christopherson wrote:
> On Thu, Feb 15, 2024, Paul Durrant wrote:
>> From: Paul Durrant <[email protected]>
>>
>> As described in [1] compiling with CONFIG_PROVE_RAW_LOCK_NESTING shows that
>> kvm_xen_set_evtchn_fast() is blocking on pfncache locks in IRQ context.
>> There is only actually blocking with PREEMPT_RT because the locks will
>> turned into mutexes. There is no 'raw' version of rwlock_t that can be used
>> to avoid that, so use read_trylock() and treat failure to lock the same as
>> an invalid cache.
>>
>> [1] https://lore.kernel.org/lkml/[email protected]/T/#mbd06e5a04534ce9c0ee94bd8f1e8d942b2d45bd6
>>
>> Signed-off-by: Paul Durrant <[email protected]>
>> Reviewed-by: David Woodhouse <[email protected]>
>> ---
>> Cc: Sean Christopherson <[email protected]>
>> Cc: Paolo Bonzini <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: Borislav Petkov <[email protected]>
>> Cc: Dave Hansen <[email protected]>
>> Cc: "H. Peter Anvin" <[email protected]>
>> Cc: David Woodhouse <[email protected]>
>> Cc: [email protected]
>>
>> v13:
>> - Patch title change.
>>
>> v11:
>> - Amended the commit comment.
>>
>> v10:
>> - New in this version.
>> ---
>> arch/x86/kvm/xen.c | 30 ++++++++++++++++++++----------
>> 1 file changed, 20 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
>> index 59073642c078..8650141b266e 100644
>> --- a/arch/x86/kvm/xen.c
>> +++ b/arch/x86/kvm/xen.c
>> @@ -1678,10 +1678,13 @@ static int set_shinfo_evtchn_pending(struct kvm_vcpu *vcpu, u32 port)
>> unsigned long flags;
>> int rc = -EWOULDBLOCK;
>>
>> - read_lock_irqsave(&gpc->lock, flags);
>> - if (!kvm_gpc_check(gpc, PAGE_SIZE))
>> + local_irq_save(flags);
>> + if (!read_trylock(&gpc->lock))
>> goto out;
>
> I am not comfortable applying this patch. As shown by the need for the next patch
> to optimize unrelated invalidations, switching to read_trylock() is more subtle
> than it seems at first glance. Specifically, there are no fairness guarantees.
>
> I am not dead set against this change, but I don't want to put my SoB on what I
> consider to be a hack.
>
> I've zero objections if you can convince Paolo to take this directly, i.e. this
> isn't a NAK. I just don't want to take it through my tree.

Ok. I'll drop this from v14 then. It can go separately, assuming there
is no move to add the raw lock which would negate it.

2024-02-20 09:17:25

by Paul Durrant

[permalink] [raw]
Subject: Re: [PATCH v13 00/21] KVM: xen: update shared_info and vcpu_info handling

On 19/02/2024 22:06, Sean Christopherson wrote:
> On Thu, Feb 15, 2024, Paul Durrant wrote:
>> David Woodhouse (1):
>> KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues
>>
>> Paul Durrant (19):
>> KVM: pfncache: Add a map helper function
>> KVM: pfncache: remove unnecessary exports
>> KVM: x86/xen: mark guest pages dirty with the pfncache lock held
>> KVM: pfncache: add a mark-dirty helper
>> KVM: pfncache: remove KVM_GUEST_USES_PFN usage
>> KVM: pfncache: stop open-coding offset_in_page()
>> KVM: pfncache: include page offset in uhva and use it consistently
>> KVM: pfncache: allow a cache to be activated with a fixed (userspace)
>> HVA
>> KVM: x86/xen: separate initialization of shared_info cache and content
>> KVM: x86/xen: re-initialize shared_info if guest (32/64-bit) mode is
>> set
>> KVM: x86/xen: allow shared_info to be mapped by fixed HVA
>> KVM: x86/xen: allow vcpu_info to be mapped by fixed HVA
>> KVM: selftests: map Xen's shared_info page using HVA rather than GFN
>> KVM: selftests: re-map Xen's vcpu_info using HVA rather than GPA
>> KVM: x86/xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA
>> capability
>> KVM: x86/xen: split up kvm_xen_set_evtchn_fast()
>> KVM: x86/xen: don't block on pfncache locks in
>> kvm_xen_set_evtchn_fast()
>> KVM: pfncache: check the need for invalidation under read lock first
>> KVM: x86/xen: allow vcpu_info content to be 'safely' copied
>>
>> Sean Christopherson (1):
>> KVM: s390: Refactor kvm_is_error_gpa() into kvm_is_gpa_in_memslot()
>>
>> Documentation/virt/kvm/api.rst | 53 ++-
>> arch/s390/kvm/diag.c | 2 +-
>> arch/s390/kvm/gaccess.c | 14 +-
>> arch/s390/kvm/kvm-s390.c | 4 +-
>> arch/s390/kvm/priv.c | 4 +-
>> arch/s390/kvm/sigp.c | 2 +-
>> arch/x86/kvm/x86.c | 7 +-
>> arch/x86/kvm/xen.c | 361 +++++++++++------
>> include/linux/kvm_host.h | 49 ++-
>> include/linux/kvm_types.h | 8 -
>> include/uapi/linux/kvm.h | 9 +-
>> .../selftests/kvm/x86_64/xen_shinfo_test.c | 59 ++-
>> virt/kvm/pfncache.c | 382 ++++++++++--------
>> 13 files changed, 591 insertions(+), 363 deletions(-)
>
> Except for the read_trylock() patch, just a few nits that I can fixup when
> applying, though I'll defeinitely want your eyeballs on the end result as they
> tweaks aren't _that_ trivial.
>
> Running tests now, if all goes well I'll push to kvm-x86 within the hour.

Oh, I read this last and you already made the changes :-) I'll check
kvm-x86. Thanks.


2024-02-20 10:53:44

by Paul Durrant

[permalink] [raw]
Subject: Re: [PATCH v13 00/21] KVM: xen: update shared_info and vcpu_info handling

On 20/02/2024 09:14, Paul Durrant wrote:
> On 19/02/2024 22:06, Sean Christopherson wrote:
>> On Thu, Feb 15, 2024, Paul Durrant wrote:
>>> David Woodhouse (1):
>>>    KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues
>>>
>>> Paul Durrant (19):
>>>    KVM: pfncache: Add a map helper function
>>>    KVM: pfncache: remove unnecessary exports
>>>    KVM: x86/xen: mark guest pages dirty with the pfncache lock held
>>>    KVM: pfncache: add a mark-dirty helper
>>>    KVM: pfncache: remove KVM_GUEST_USES_PFN usage
>>>    KVM: pfncache: stop open-coding offset_in_page()
>>>    KVM: pfncache: include page offset in uhva and use it consistently
>>>    KVM: pfncache: allow a cache to be activated with a fixed (userspace)
>>>      HVA
>>>    KVM: x86/xen: separate initialization of shared_info cache and
>>> content
>>>    KVM: x86/xen: re-initialize shared_info if guest (32/64-bit) mode is
>>>      set
>>>    KVM: x86/xen: allow shared_info to be mapped by fixed HVA
>>>    KVM: x86/xen: allow vcpu_info to be mapped by fixed HVA
>>>    KVM: selftests: map Xen's shared_info page using HVA rather than GFN
>>>    KVM: selftests: re-map Xen's vcpu_info using HVA rather than GPA
>>>    KVM: x86/xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA
>>>      capability
>>>    KVM: x86/xen: split up kvm_xen_set_evtchn_fast()
>>>    KVM: x86/xen: don't block on pfncache locks in
>>>      kvm_xen_set_evtchn_fast()
>>>    KVM: pfncache: check the need for invalidation under read lock first
>>>    KVM: x86/xen: allow vcpu_info content to be 'safely' copied
>>>
>>> Sean Christopherson (1):
>>>    KVM: s390: Refactor kvm_is_error_gpa() into kvm_is_gpa_in_memslot()
>>>
>>>   Documentation/virt/kvm/api.rst                |  53 ++-
>>>   arch/s390/kvm/diag.c                          |   2 +-
>>>   arch/s390/kvm/gaccess.c                       |  14 +-
>>>   arch/s390/kvm/kvm-s390.c                      |   4 +-
>>>   arch/s390/kvm/priv.c                          |   4 +-
>>>   arch/s390/kvm/sigp.c                          |   2 +-
>>>   arch/x86/kvm/x86.c                            |   7 +-
>>>   arch/x86/kvm/xen.c                            | 361 +++++++++++------
>>>   include/linux/kvm_host.h                      |  49 ++-
>>>   include/linux/kvm_types.h                     |   8 -
>>>   include/uapi/linux/kvm.h                      |   9 +-
>>>   .../selftests/kvm/x86_64/xen_shinfo_test.c    |  59 ++-
>>>   virt/kvm/pfncache.c                           | 382 ++++++++++--------
>>>   13 files changed, 591 insertions(+), 363 deletions(-)
>>
>> Except for the read_trylock() patch, just a few nits that I can fixup
>> when
>> applying, though I'll defeinitely want your eyeballs on the end result
>> as they
>> tweaks aren't _that_ trivial.
>>
>> Running tests now, if all goes well I'll push to kvm-x86 within the hour.
>
> Oh, I read this last and you already made the changes :-) I'll check
> kvm-x86. Thanks.
>

I checked the patches you amended. All LGTM and my tests are fine too.

2024-02-20 15:58:12

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v13 00/21] KVM: xen: update shared_info and vcpu_info handling

On Thu, 15 Feb 2024 15:28:55 +0000, Paul Durrant wrote:
> From: Paul Durrant <[email protected]>
>
> This series contains a new patch from Sean added since v12 [1]:
>
> * KVM: s390: Refactor kvm_is_error_gpa() into kvm_is_gpa_in_memslot()
>
> This frees up the function name kvm_is_error_gpa() such that it can then be
> re-defined in:
>
> [...]

*sigh*

I forgot to hit "send" on this yesterday. But lucky for me, that worked out in
my favor as I needed to rebase on top of kvm/kvm-uapi to avoid pointless conflicts
in the uapi headeres.

So....

Applied to kvm-x86 xen, minus 18 and 19 (trylock stuff) and 21 (locking cleanup
that we're doing elsewhere).

Paul and David, please take (another) look at the end result to make sure you don't
object to any of my tweaks and that I didn't botch anything.

s390 folks, I'm applying/pushing now to get it into -next asap, but I'll make
sure to get acks/reviews on patch 08/21 before I do anything else with this
branch/series.

Thanks!

[01/21] KVM: pfncache: Add a map helper function
https://github.com/kvm-x86/linux/commit/f39b80e3ff12
[02/21] KVM: pfncache: remove unnecessary exports
https://github.com/kvm-x86/linux/commit/41496fffc0e1
[03/21] KVM: x86/xen: mark guest pages dirty with the pfncache lock held
https://github.com/kvm-x86/linux/commit/4438355ec6e1
[04/21] KVM: pfncache: add a mark-dirty helper
https://github.com/kvm-x86/linux/commit/78b74638eb6d
[05/21] KVM: pfncache: remove KVM_GUEST_USES_PFN usage
https://github.com/kvm-x86/linux/commit/a4bff3df5147
[06/21] KVM: pfncache: stop open-coding offset_in_page()
https://github.com/kvm-x86/linux/commit/53e63e953e14
[07/21] KVM: pfncache: include page offset in uhva and use it consistently
https://github.com/kvm-x86/linux/commit/406c10962a4c
[08/21] KVM: s390: Refactor kvm_is_error_gpa() into kvm_is_gpa_in_memslot()
https://github.com/kvm-x86/linux/commit/9e7325acb3dc
[09/21] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA
https://github.com/kvm-x86/linux/commit/721f5b0dda78
[10/21] KVM: x86/xen: separate initialization of shared_info cache and content
https://github.com/kvm-x86/linux/commit/c01c55a34f28
[11/21] KVM: x86/xen: re-initialize shared_info if guest (32/64-bit) mode is set
https://github.com/kvm-x86/linux/commit/21b99e4d6db6
[12/21] KVM: x86/xen: allow shared_info to be mapped by fixed HVA
https://github.com/kvm-x86/linux/commit/10dcbfc46724
[13/21] KVM: x86/xen: allow vcpu_info to be mapped by fixed HVA
https://github.com/kvm-x86/linux/commit/16877dd45f98
[14/21] KVM: selftests: map Xen's shared_info page using HVA rather than GFN
https://github.com/kvm-x86/linux/commit/95c27ed8619b
[15/21] KVM: selftests: re-map Xen's vcpu_info using HVA rather than GPA
https://github.com/kvm-x86/linux/commit/5359bf19a3f0
[16/21] KVM: x86/xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA capability
https://github.com/kvm-x86/linux/commit/49668ce7e1ae
[17/21] KVM: x86/xen: split up kvm_xen_set_evtchn_fast()
(not applied)
[18/21] KVM: x86/xen: don't block on pfncache locks in kvm_xen_set_evtchn_fast()
(not applied)
[19/21] KVM: pfncache: check the need for invalidation under read lock first
https://github.com/kvm-x86/linux/commit/21dadfcd665e
[20/21] KVM: x86/xen: allow vcpu_info content to be 'safely' copied
https://github.com/kvm-x86/linux/commit/dadeabc3b6fa
[21/21] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues
(not applied)

--
https://github.com/kvm-x86/linux/tree/next

2024-02-20 16:06:44

by Paul Durrant

[permalink] [raw]
Subject: Re: [PATCH v13 00/21] KVM: xen: update shared_info and vcpu_info handling

On 20/02/2024 15:55, Sean Christopherson wrote:
> On Thu, 15 Feb 2024 15:28:55 +0000, Paul Durrant wrote:
>> From: Paul Durrant <[email protected]>
>>
>> This series contains a new patch from Sean added since v12 [1]:
>>
>> * KVM: s390: Refactor kvm_is_error_gpa() into kvm_is_gpa_in_memslot()
>>
>> This frees up the function name kvm_is_error_gpa() such that it can then be
>> re-defined in:
>>
>> [...]
>
> *sigh*
>
> I forgot to hit "send" on this yesterday. But lucky for me, that worked out in
> my favor as I needed to rebase on top of kvm/kvm-uapi to avoid pointless conflicts
> in the uapi headeres.
>
> So....
>
> Applied to kvm-x86 xen, minus 18 and 19 (trylock stuff) and 21 (locking cleanup
> that we're doing elsewhere).
>

Looks like you meant 17 & 18?

> Paul and David, please take (another) look at the end result to make sure you don't
> object to any of my tweaks and that I didn't botch anything.
>

What was the issue with 17? It was reasonable clean-up and I'd like to
keep it even without 18 being applied (and I totally understand your
reasons for that).

> s390 folks, I'm applying/pushing now to get it into -next asap, but I'll make
> sure to get acks/reviews on patch 08/21 before I do anything else with this
> branch/series.
>
> Thanks!
>
> [01/21] KVM: pfncache: Add a map helper function
> https://github.com/kvm-x86/linux/commit/f39b80e3ff12
> [02/21] KVM: pfncache: remove unnecessary exports
> https://github.com/kvm-x86/linux/commit/41496fffc0e1
> [03/21] KVM: x86/xen: mark guest pages dirty with the pfncache lock held
> https://github.com/kvm-x86/linux/commit/4438355ec6e1
> [04/21] KVM: pfncache: add a mark-dirty helper
> https://github.com/kvm-x86/linux/commit/78b74638eb6d
> [05/21] KVM: pfncache: remove KVM_GUEST_USES_PFN usage
> https://github.com/kvm-x86/linux/commit/a4bff3df5147
> [06/21] KVM: pfncache: stop open-coding offset_in_page()
> https://github.com/kvm-x86/linux/commit/53e63e953e14
> [07/21] KVM: pfncache: include page offset in uhva and use it consistently
> https://github.com/kvm-x86/linux/commit/406c10962a4c
> [08/21] KVM: s390: Refactor kvm_is_error_gpa() into kvm_is_gpa_in_memslot()
> https://github.com/kvm-x86/linux/commit/9e7325acb3dc
> [09/21] KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA
> https://github.com/kvm-x86/linux/commit/721f5b0dda78
> [10/21] KVM: x86/xen: separate initialization of shared_info cache and content
> https://github.com/kvm-x86/linux/commit/c01c55a34f28
> [11/21] KVM: x86/xen: re-initialize shared_info if guest (32/64-bit) mode is set
> https://github.com/kvm-x86/linux/commit/21b99e4d6db6
> [12/21] KVM: x86/xen: allow shared_info to be mapped by fixed HVA
> https://github.com/kvm-x86/linux/commit/10dcbfc46724
> [13/21] KVM: x86/xen: allow vcpu_info to be mapped by fixed HVA
> https://github.com/kvm-x86/linux/commit/16877dd45f98
> [14/21] KVM: selftests: map Xen's shared_info page using HVA rather than GFN
> https://github.com/kvm-x86/linux/commit/95c27ed8619b
> [15/21] KVM: selftests: re-map Xen's vcpu_info using HVA rather than GPA
> https://github.com/kvm-x86/linux/commit/5359bf19a3f0
> [16/21] KVM: x86/xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA capability
> https://github.com/kvm-x86/linux/commit/49668ce7e1ae
> [17/21] KVM: x86/xen: split up kvm_xen_set_evtchn_fast()
> (not applied)
> [18/21] KVM: x86/xen: don't block on pfncache locks in kvm_xen_set_evtchn_fast()
> (not applied)
> [19/21] KVM: pfncache: check the need for invalidation under read lock first
> https://github.com/kvm-x86/linux/commit/21dadfcd665e
> [20/21] KVM: x86/xen: allow vcpu_info content to be 'safely' copied
> https://github.com/kvm-x86/linux/commit/dadeabc3b6fa
> [21/21] KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues
> (not applied)
>
> --
> https://github.com/kvm-x86/linux/tree/next


2024-02-20 16:16:06

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v13 00/21] KVM: xen: update shared_info and vcpu_info handling

On Tue, Feb 20, 2024, Paul Durrant wrote:
> On 20/02/2024 15:55, Sean Christopherson wrote:
> > On Thu, 15 Feb 2024 15:28:55 +0000, Paul Durrant wrote:
> > > From: Paul Durrant <[email protected]>
> > >
> > > This series contains a new patch from Sean added since v12 [1]:
> > >
> > > * KVM: s390: Refactor kvm_is_error_gpa() into kvm_is_gpa_in_memslot()
> > >
> > > This frees up the function name kvm_is_error_gpa() such that it can then be
> > > re-defined in:
> > >
> > > [...]
> >
> > *sigh*
> >
> > I forgot to hit "send" on this yesterday. But lucky for me, that worked out in
> > my favor as I needed to rebase on top of kvm/kvm-uapi to avoid pointless conflicts
> > in the uapi headeres.
> >
> > So....
> >
> > Applied to kvm-x86 xen, minus 18 and 19 (trylock stuff) and 21 (locking cleanup
> > that we're doing elsewhere).
> >
>
> Looks like you meant 17 & 18?

Doh, yes.

> > Paul and David, please take (another) look at the end result to make sure you don't
> > object to any of my tweaks and that I didn't botch anything.
> >
>
> What was the issue with 17? It was reasonable clean-up and I'd like to keep
> it even without 18 being applied (and I totally understand your reasons for
> that).

I omitted it purely to avoid creating an unnecessary dependency for the trylock
patch. That way the trylock patch (or whatever it morphs into) can be applied on
any branch (along with the cleanup), i.e. doesn't need to be taken through kvm-x86/xen.

2024-02-20 17:07:46

by Paul Durrant

[permalink] [raw]
Subject: Re: [PATCH v13 00/21] KVM: xen: update shared_info and vcpu_info handling

On 20/02/2024 16:15, Sean Christopherson wrote:
> On Tue, Feb 20, 2024, Paul Durrant wrote:
>> On 20/02/2024 15:55, Sean Christopherson wrote:
>>> On Thu, 15 Feb 2024 15:28:55 +0000, Paul Durrant wrote:
>>>> From: Paul Durrant <[email protected]>
>>>>
>>>> This series contains a new patch from Sean added since v12 [1]:
>>>>
>>>> * KVM: s390: Refactor kvm_is_error_gpa() into kvm_is_gpa_in_memslot()
>>>>
>>>> This frees up the function name kvm_is_error_gpa() such that it can then be
>>>> re-defined in:
>>>>
>>>> [...]
>>>
>>> *sigh*
>>>
>>> I forgot to hit "send" on this yesterday. But lucky for me, that worked out in
>>> my favor as I needed to rebase on top of kvm/kvm-uapi to avoid pointless conflicts
>>> in the uapi headeres.
>>>
>>> So....
>>>
>>> Applied to kvm-x86 xen, minus 18 and 19 (trylock stuff) and 21 (locking cleanup
>>> that we're doing elsewhere).
>>>
>>
>> Looks like you meant 17 & 18?
>
> Doh, yes.
>
>>> Paul and David, please take (another) look at the end result to make sure you don't
>>> object to any of my tweaks and that I didn't botch anything.
>>>
>>
>> What was the issue with 17? It was reasonable clean-up and I'd like to keep
>> it even without 18 being applied (and I totally understand your reasons for
>> that).
>
> I omitted it purely to avoid creating an unnecessary dependency for the trylock
> patch. That way the trylock patch (or whatever it morphs into) can be applied on
> any branch (along with the cleanup), i.e. doesn't need to be taken through kvm-x86/xen.

Ok, personally I don't see the dependency being an issue. I suspect it
will be a while before we decide what to do about the locking issue...
particularly since David is out this week, as he says.

2024-02-20 17:28:11

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH v13 00/21] KVM: xen: update shared_info and vcpu_info handling

On 20 February 2024 17:15:06 CET, Sean Christopherson <[email protected]> wrote:
>On Tue, Feb 20, 2024, Paul Durrant wrote:
>> On 20/02/2024 15:55, Sean Christopherson wrote:
>> > On Thu, 15 Feb 2024 15:28:55 +0000, Paul Durrant wrote:
>> > > From: Paul Durrant <[email protected]>
>> > >
>> > > This series contains a new patch from Sean added since v12 [1]:
>> > >
>> > > * KVM: s390: Refactor kvm_is_error_gpa() into kvm_is_gpa_in_memslot()
>> > >
>> > > This frees up the function name kvm_is_error_gpa() such that it can then be
>> > > re-defined in:
>> > >
>> > > [...]
>> >
>> > *sigh*
>> >
>> > I forgot to hit "send" on this yesterday. But lucky for me, that worked out in
>> > my favor as I needed to rebase on top of kvm/kvm-uapi to avoid pointless conflicts
>> > in the uapi headeres.
>> >
>> > So....
>> >
>> > Applied to kvm-x86 xen, minus 18 and 19 (trylock stuff) and 21 (locking cleanup
>> > that we're doing elsewhere).
>> >
>>
>> Looks like you meant 17 & 18?
>
>Doh, yes.
>
>> > Paul and David, please take (another) look at the end result to make sure you don't
>> > object to any of my tweaks and that I didn't botch anything.
>> >
>>
>> What was the issue with 17? It was reasonable clean-up and I'd like to keep
>> it even without 18 being applied (and I totally understand your reasons for
>> that).
>
>I omitted it purely to avoid creating an unnecessary dependency for the trylock
>patch. That way the trylock patch (or whatever it morphs into) can be applied on
>any branch (along with the cleanup), i.e. doesn't need to be taken through kvm-x86/xen.

What about if (in_atomic() && read_trylock()) return -EAGAIN; else read_lock();

That way we don't have any even theoretical fairness issues because the trylock can fail just *once* which kicks us to the slow path and that'll take the lock normally now.

The condition might not actually be in_atomic() but I'm not working this week and you get the idea.