2021-06-23 11:31:05

by Maxim Levitsky

Subject: [PATCH 00/10] My AVIC patch queue

Hi!

This is a series of bugfixes for the AVIC dynamic inhibition, made
while trying to fix as many bugs as possible in this area, and to make
the conditional AVIC+SynIC enablement work.

The first 4 patches in this series fix a race condition that can cause
a guest APIC write to be lost when the write races with AVIC
un-inhibition.

The next four patches hopefully correctly address a possible mismatch
between the AVIC inhibit state and the AVIC enable state on all vCPUs.
Since the AVIC state is changed via a request, there is a window during
which the two states differ, which can lead to various warnings and
errors.
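
To make that window concrete, here is a minimal illustrative sketch
(not code from this series; the two helpers are real and are used by
the patches below) of the invariant that can temporarily break:

/*
 * Illustrative sketch only: kvm->arch.apicv_inhibit_reasons changes as
 * soon as kvm_request_apicv_update runs, while each vCPU's
 * vcpu->arch.apicv_active catches up only when that vCPU processes
 * KVM_REQ_APICV_UPDATE, so this check can fail in between.
 */
static bool avic_state_consistent(struct kvm_vcpu *vcpu)
{
	return kvm_apicv_activated(vcpu->kvm) ==  /* VM-wide, updated at once */
	       kvm_vcpu_apicv_active(vcpu);       /* per-vCPU, updated lazily */
}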

There was an earlier attempt to fix this by changing the AVIC enable
state on the current vCPU immediately when the AVIC inhibit request is
created. While this fixes the common case, it really just buries the
issue deeper: on all vCPUs other than the current one, the two states
can still mismatch until KVM_REQ_APICV_UPDATE is processed on each of
them.

My take on this is to instead fix the places where the mismatch causes
issues, and then drop the special case of toggling the AVIC right away
in kvm_request_apicv_update.

Patch 9 is an attempt to fix yet another issue I found in the AVIC
inhibit code: currently avic_vcpu_load/avic_vcpu_put are called on
userspace entry/exit from KVM (aka kvm_vcpu_get/kvm_vcpu_put), and
these functions update the "is running" bit in the AVIC physical ID
remap table and update the target vCPU in the iommu code.

However, neither of these functions does anything when AVIC is
inhibited, so the "is running" bit is left set during the exit to
userspace. This shouldn't be a big issue, as the caller doesn't use
the AVIC while it is inhibited, but it is still inconsistent and can
trigger the warning about it in avic_vcpu_load.

To be on the safe side, I think it makes sense to call
avic_vcpu_put/avic_vcpu_load when inhibiting/uninhibiting the AVIC.
This will ensure that the work these two functions do is matched.

Patch 10 is the patch from Vitaly about allowing AVIC with SynIC as
long as the guest doesn't use the AutoEOI feature. I only slightly
changed it, to drop the SRCU lock around the call to
kvm_request_apicv_update and to also expose the AutoEOI cpuid bit
regardless of AVIC enablement.

Despite being the last patch in this series, it doesn't depend on the
other fixes.

Lastly, I should note that I spent quite some time last week trying to
avoid dropping the SRCU lock around the call to
kvm_request_apicv_update; dropping it is needed because this function
changes memslots and has to do SRCU synchronization.

I tried to make this function only raise KVM_REQ_APICV_UPDATE and let
each vCPU toggle the memory slot itself while processing the request,
but that approach was doomed to fail due to various races.

Using delayed work for this, as was suggested, doesn't work either,
since the delayed work can't update the VM's memory slots (that has to
be done from the VM's own process).

It is possible to brute force this by raising a new request, say
KVM_REQUEST_AVIC_INHIBITION, on say VCPU0, and letting the request
processing code drop the SRCU lock and then do the things that
kvm_request_apicv_update does. I don't know if this would be better
than the current state of things.
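
A minimal sketch of that idea (hypothetical: KVM_REQUEST_AVIC_INHIBITION
does not exist, and the 'activate'/'bit' arguments would have to be
stashed somewhere; the SRCU unlock/lock pattern is the same one
synic_toggle_avic uses in patch 10):

	if (kvm_check_request(KVM_REQUEST_AVIC_INHIBITION, vcpu)) {
		/*
		 * kvm_request_apicv_update changes memslots and must do
		 * SRCU synchronization, so drop the SRCU read lock
		 * around it.
		 */
		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
		kvm_request_apicv_update(vcpu->kvm, activate, bit);
		vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
	}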

Best regards,
Maxim Levitsky

Maxim Levitsky (9):
  KVM: x86: extract block/allow guest entries
  KVM: x86: APICv: fix race in kvm_request_apicv_update on SVM
  KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled
  KVM: SVM: add warning for mismatch between AVIC state and AVIC access
    page state
  KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to
    be deactivated
  KVM: SVM: tweak warning about enabled AVIC on nested entry
  KVM: SVM: use vmcb01 in svm_refresh_apicv_exec_ctrl
  KVM: x86: APICv: drop immediate APICv disablement on current vCPU
  KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling
    AVIC

Vitaly Kuznetsov (1):
  KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in
    use

 arch/x86/include/asm/kvm_host.h | 10 +++++--
 arch/x86/kvm/hyperv.c           | 34 +++++++++++++++++++----
 arch/x86/kvm/svm/avic.c         | 49 +++++++++++++++++----------------
 arch/x86/kvm/svm/nested.c       |  2 +-
 arch/x86/kvm/svm/svm.c          | 18 +++++++++---
 arch/x86/kvm/vmx/vmx.c          |  4 +--
 arch/x86/kvm/x86.c              | 49 ++++++++++++++++++---------------
 7 files changed, 105 insertions(+), 61 deletions(-)

--
2.26.3





2021-06-23 11:31:12

by Maxim Levitsky

Subject: [PATCH 01/10] KVM: x86: extract block/allow guest entries

The master clock update sets and then clears a request that is never
handled by KVM, as a means to block and then re-allow guest entries.
Extract this into kvm_block_guest_entries/kvm_allow_guest_entries.

No functional change intended.

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 5 +++--
arch/x86/kvm/x86.c | 25 +++++++++++++++----------
2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e11d64aa0bcd..cadb09c6fb0e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -68,7 +68,7 @@
#define KVM_REQ_PMI KVM_ARCH_REQ(11)
#define KVM_REQ_SMI KVM_ARCH_REQ(12)
#define KVM_REQ_MASTERCLOCK_UPDATE KVM_ARCH_REQ(13)
-#define KVM_REQ_MCLOCK_INPROGRESS \
+#define KVM_REQ_BLOCK_GUEST_ENTRIES \
KVM_ARCH_REQ_FLAGS(14, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_SCAN_IOAPIC \
KVM_ARCH_REQ_FLAGS(15, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
@@ -1831,7 +1831,8 @@ u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier);
unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);

-void kvm_make_mclock_inprogress_request(struct kvm *kvm);
+void kvm_block_guest_entries(struct kvm *kvm);
+void kvm_allow_guest_entries(struct kvm *kvm);
void kvm_make_scan_ioapic_request(struct kvm *kvm);
void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
unsigned long *vcpu_bitmap);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 76dae88cf524..9af2fbbe0521 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2735,9 +2735,18 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
#endif
}

-void kvm_make_mclock_inprogress_request(struct kvm *kvm)
+void kvm_block_guest_entries(struct kvm *kvm)
{
- kvm_make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS);
+ kvm_make_all_cpus_request(kvm, KVM_REQ_BLOCK_GUEST_ENTRIES);
+}
+
+void kvm_allow_guest_entries(struct kvm *kvm)
+{
+ struct kvm_vcpu *vcpu;
+ int i;
+
+ kvm_for_each_vcpu(i, vcpu, kvm)
+ kvm_clear_request(KVM_REQ_BLOCK_GUEST_ENTRIES, vcpu);
}

static void kvm_gen_update_masterclock(struct kvm *kvm)
@@ -2750,9 +2759,8 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)

kvm_hv_invalidate_tsc_page(kvm);

- kvm_make_mclock_inprogress_request(kvm);
+ kvm_block_guest_entries(kvm);

- /* no guest entries from this point */
spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
pvclock_update_vm_gtod_copy(kvm);
spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);
@@ -2760,9 +2768,7 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
kvm_for_each_vcpu(i, vcpu, kvm)
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);

- /* guest entries allowed */
- kvm_for_each_vcpu(i, vcpu, kvm)
- kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu);
+ kvm_allow_guest_entries(kvm);
#endif
}

@@ -8051,7 +8057,7 @@ static void kvm_hyperv_tsc_notifier(void)

mutex_lock(&kvm_lock);
list_for_each_entry(kvm, &vm_list, vm_list)
- kvm_make_mclock_inprogress_request(kvm);
+ kvm_block_guest_entries(kvm);

hyperv_stop_tsc_emulation();

@@ -8070,8 +8076,7 @@ static void kvm_hyperv_tsc_notifier(void)
kvm_for_each_vcpu(cpu, vcpu, kvm)
kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);

- kvm_for_each_vcpu(cpu, vcpu, kvm)
- kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu);
+ kvm_allow_guest_entries(kvm);
}
mutex_unlock(&kvm_lock);
}
--
2.26.3

2021-06-23 11:31:20

by Maxim Levitsky

Subject: [PATCH 02/10] KVM: x86: APICv: fix race in kvm_request_apicv_update on SVM

Currently on SVM, kvm_request_apicv_update calls
'pre_update_apicv_exec_ctrl' without doing any synchronization,
and that callback toggles the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT.

If there is a mismatch between that memslot state and the AVIC state
while a vCPU is in guest mode, an APIC MMIO write can be lost.

For example:

VCPU0: enable the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT
VCPU1: write to an APIC MMIO register.

Since AVIC is still disabled on VCPU1, the access is not intercepted
by it, nor does it cause an MMIO fault; it just updates the dummy page
mapped into the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, so the write is lost.

Fix that by blocking guest entries while we update the memslot.

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/x86.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9af2fbbe0521..6f0d9c231249 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9231,6 +9231,8 @@ void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
if (!!old == !!new)
return;

+ kvm_block_guest_entries(kvm);
+
trace_kvm_apicv_update_request(activate, bit);
if (kvm_x86_ops.pre_update_apicv_exec_ctrl)
static_call(kvm_x86_pre_update_apicv_exec_ctrl)(kvm, activate);
@@ -9243,6 +9245,8 @@ void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
except = kvm_get_running_vcpu();
kvm_make_all_cpus_request_except(kvm, KVM_REQ_APICV_UPDATE,
except);
+
+ kvm_allow_guest_entries(kvm);
if (except)
kvm_vcpu_update_apicv(except);
}
--
2.26.3

2021-06-23 11:31:31

by Maxim Levitsky

Subject: [PATCH 03/10] KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled

This better reflects the purpose of this variable on AMD, where
the AVIC memory slot can be enabled and disabled dynamically.

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm/avic.c | 4 ++--
arch/x86/kvm/vmx/vmx.c | 4 ++--
3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cadb09c6fb0e..9ed5c55be352 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1022,7 +1022,7 @@ struct kvm_arch {
struct kvm_apic_map __rcu *apic_map;
atomic_t apic_map_dirty;

- bool apic_access_page_done;
+ bool apic_access_memslot_enabled;
unsigned long apicv_inhibit_reasons;

gpa_t wall_clock;
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index a9abed054cd5..1d01da64c333 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -236,7 +236,7 @@ static int avic_update_access_page(struct kvm *kvm, bool activate)
* APICv mode change, which update APIC_ACCESS_PAGE_PRIVATE_MEMSLOT
* memory region. So, we need to ensure that kvm->mm == current->mm.
*/
- if ((kvm->arch.apic_access_page_done == activate) ||
+ if ((kvm->arch.apic_access_memslot_enabled == activate) ||
(kvm->mm != current->mm))
goto out;

@@ -249,7 +249,7 @@ static int avic_update_access_page(struct kvm *kvm, bool activate)
goto out;
}

- kvm->arch.apic_access_page_done = activate;
+ kvm->arch.apic_access_memslot_enabled = activate;
out:
mutex_unlock(&kvm->slots_lock);
return r;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ab6f682645d7..e4491e6a7f89 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3621,7 +3621,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
int ret = 0;

mutex_lock(&kvm->slots_lock);
- if (kvm->arch.apic_access_page_done)
+ if (kvm->arch.apic_access_memslot_enabled)
goto out;
hva = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
APIC_DEFAULT_PHYS_BASE, PAGE_SIZE);
@@ -3641,7 +3641,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
* is able to migrate it.
*/
put_page(page);
- kvm->arch.apic_access_page_done = true;
+ kvm->arch.apic_access_memslot_enabled = true;
out:
mutex_unlock(&kvm->slots_lock);
return ret;
--
2.26.3

2021-06-23 11:31:35

by Maxim Levitsky

Subject: [PATCH 04/10] KVM: SVM: add warning for mismatch between AVIC state and AVIC access page state

It is never a good idea to enter a guest when the AVIC state doesn't match
the state of the AVIC MMIO memory slot.

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/svm.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 12c06ea28f5c..50405c561394 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3780,6 +3780,9 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)

pre_svm_run(vcpu);

+ WARN_ON_ONCE(vcpu->kvm->arch.apic_access_memslot_enabled !=
+ kvm_vcpu_apicv_active(vcpu));
+
sync_lapic_to_cr8(vcpu);

if (unlikely(svm->asid != svm->vmcb->control.asid)) {
--
2.26.3

2021-06-23 11:31:54

by Maxim Levitsky

Subject: [PATCH 06/10] KVM: SVM: tweak warning about enabled AVIC on nested entry

It is possible that AVIC was requested to be disabled but is
not yet disabled, e.g. if the nested entry is done right
after svm_vcpu_after_set_cpuid.

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/nested.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index dca20f949b63..253847f7d9aa 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -505,7 +505,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
* Also covers avic_vapic_bar, avic_backing_page, avic_logical_id,
* avic_physical_id.
*/
- WARN_ON(svm->vmcb01.ptr->control.int_ctl & AVIC_ENABLE_MASK);
+ WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));

/* Copied from vmcb01. msrpm_base can be overwritten later. */
svm->vmcb->control.nested_ctl = svm->vmcb01.ptr->control.nested_ctl;
--
2.26.3

2021-06-23 11:32:07

by Maxim Levitsky

Subject: [PATCH 07/10] KVM: SVM: use vmcb01 in svm_refresh_apicv_exec_ctrl

AVIC is not supported for nested guests, but in some corner
cases it is possible for it to still be enabled after we
have entered nesting, and thus to use vmcb02.

Fix this by always using vmcb01 in svm_refresh_apicv_exec_ctrl.

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/avic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 1d01da64c333..a8ad78a2faa1 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -646,7 +646,7 @@ static int svm_set_pi_irte_mode(struct kvm_vcpu *vcpu, bool activate)
void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- struct vmcb *vmcb = svm->vmcb;
+ struct vmcb *vmcb = svm->vmcb01.ptr;
bool activated = kvm_vcpu_apicv_active(vcpu);

if (!enable_apicv)
--
2.26.3

2021-06-23 11:32:19

by Maxim Levitsky

Subject: [PATCH 05/10] KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to be deactivated

It is possible for the AVIC inhibit state and the AVIC active state to
be mismatched. Currently we disable AVIC right away on the vCPU that
started the AVIC inhibit request, so this warning doesn't trigger, but
at least in theory, if svm_set_vintr is called at the same time on
multiple vCPUs, the warning can fire.

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/svm.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 50405c561394..d14be8aba2d7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1560,8 +1560,11 @@ static void svm_set_vintr(struct vcpu_svm *svm)
{
struct vmcb_control_area *control;

- /* The following fields are ignored when AVIC is enabled */
- WARN_ON(kvm_vcpu_apicv_active(&svm->vcpu));
+ /*
+ * The following fields are ignored when AVIC is enabled
+ */
+ WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
+
svm_set_intercept(svm, INTERCEPT_VINTR);

/*
--
2.26.3

2021-06-23 11:32:26

by Maxim Levitsky

Subject: [PATCH 09/10] KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling AVIC

Currently it is possible to have the following scenario:

1. AVIC is disabled by svm_refresh_apicv_exec_ctrl
2. svm_vcpu_blocking calls avic_vcpu_put which does nothing
3. svm_vcpu_unblocking enables the AVIC (due to KVM_REQ_APICV_UPDATE)
   and then calls avic_vcpu_load
4. a warning is triggered in avic_vcpu_load since
   AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK was never cleared

While it is possible to just remove the warning, it seems more robust
to fully disable/enable AVIC in svm_refresh_apicv_exec_ctrl by calling
avic_vcpu_load/avic_vcpu_put.

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/avic.c | 43 ++++++++++++++++++++++-------------------
arch/x86/kvm/svm/svm.c | 8 ++++++--
arch/x86/kvm/x86.c | 8 +++++++-
3 files changed, 36 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index a8ad78a2faa1..29d9d767b56c 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -80,6 +80,28 @@ enum avic_ipi_failure_cause {
AVIC_IPI_FAILURE_INVALID_BACKING_PAGE,
};

+
+static void __avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
+{
+ if (is_run)
+ avic_vcpu_load(vcpu, vcpu->cpu);
+ else
+ avic_vcpu_put(vcpu);
+}
+
+/*
+ * This function is called during VCPU halt/unhalt.
+ */
+static void avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
+{
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ svm->avic_is_running = is_run;
+
+ if (kvm_vcpu_apicv_active(vcpu))
+ __avic_set_running(vcpu, is_run);
+}
+
/* Note:
* This function is called from IOMMU driver to notify
* SVM to schedule in a particular vCPU of a particular VM.
@@ -667,6 +689,7 @@ void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
}
vmcb_mark_dirty(vmcb, VMCB_AVIC);

+ __avic_set_running(vcpu, activated);
svm_set_pi_irte_mode(vcpu, activated);
}

@@ -960,9 +983,6 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
int h_physical_id = kvm_cpu_get_apicid(cpu);
struct vcpu_svm *svm = to_svm(vcpu);

- if (!kvm_vcpu_apicv_active(vcpu))
- return;
-
/*
* Since the host physical APIC id is 8 bits,
* we can support host APIC ID upto 255.
@@ -990,9 +1010,6 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
u64 entry;
struct vcpu_svm *svm = to_svm(vcpu);

- if (!kvm_vcpu_apicv_active(vcpu))
- return;
-
entry = READ_ONCE(*(svm->avic_physical_id_cache));
if (entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)
avic_update_iommu_vcpu_affinity(vcpu, -1, 0);
@@ -1001,20 +1018,6 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
}

-/*
- * This function is called during VCPU halt/unhalt.
- */
-static void avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
-{
- struct vcpu_svm *svm = to_svm(vcpu);
-
- svm->avic_is_running = is_run;
- if (is_run)
- avic_vcpu_load(vcpu, vcpu->cpu);
- else
- avic_vcpu_put(vcpu);
-}
-
void svm_vcpu_blocking(struct kvm_vcpu *vcpu)
{
avic_set_running(vcpu, false);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d14be8aba2d7..2c291dfa0e04 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1505,12 +1505,16 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
sd->current_vmcb = svm->vmcb;
indirect_branch_prediction_barrier();
}
- avic_vcpu_load(vcpu, cpu);
+
+ if (kvm_vcpu_apicv_active(vcpu))
+ avic_vcpu_load(vcpu, cpu);
}

static void svm_vcpu_put(struct kvm_vcpu *vcpu)
{
- avic_vcpu_put(vcpu);
+ if (kvm_vcpu_apicv_active(vcpu))
+ avic_vcpu_put(vcpu);
+
svm_prepare_host_switch(vcpu);

++vcpu->stat.host_state_reload;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dcf06acdbf52..f5d4b7f66bcd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9182,10 +9182,16 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)

void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
{
+ bool activate = kvm_apicv_activated(vcpu->kvm);
+
if (!lapic_in_kernel(vcpu))
return;

- vcpu->arch.apicv_active = kvm_apicv_activated(vcpu->kvm);
+ if (vcpu->arch.apicv_active == activate)
+ return;
+
+ vcpu->arch.apicv_active = activate;
+
kvm_apic_update_apicv(vcpu);
static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);

--
2.26.3

2021-06-23 11:32:30

by Maxim Levitsky

Subject: [PATCH 10/10] KVM: x86: hyper-v: Deactivate APICv only when AutoEOI feature is in use

From: Vitaly Kuznetsov <[email protected]>

APICV_INHIBIT_REASON_HYPERV is currently unconditionally forced upon
SynIC activation as SynIC's AutoEOI is incompatible with APICv/AVIC. It is,
however, possible to track whether the feature was actually used by the
guest and only inhibit APICv/AVIC when needed.

TLFS suggests a dedicated 'HV_DEPRECATING_AEOI_RECOMMENDED' flag to let
Windows know that the AutoEOI feature should be avoided. While it's up to
KVM userspace to set the flag, KVM can help a bit by exposing global
APICv/AVIC enablement: in case APICv/AVIC usage is impossible, AutoEOI
is still preferred.

Maxim:
- Drop SRCU lock around the call to kvm_request_apicv_update
  to avoid deadlock.
- Always set HV_DEPRECATING_AEOI_RECOMMENDED in kvm_get_hv_cpuid,
  since this feature can be used regardless of APICv.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/hyperv.c | 34 +++++++++++++++++++++++++++------
2 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9ed5c55be352..9eb78ec6a290 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -956,6 +956,9 @@ struct kvm_hv {
/* How many vCPUs have VP index != vCPU index */
atomic_t num_mismatched_vp_indexes;

+ /* How many SynICs use 'AutoEOI' feature */
+ atomic_t synic_auto_eoi_used;
+
struct hv_partition_assist_pg *hv_pa_pg;
struct kvm_hv_syndbg hv_syndbg;
};
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index b07592ca92f0..6bf47a583d0e 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -85,9 +85,22 @@ static bool synic_has_vector_auto_eoi(struct kvm_vcpu_hv_synic *synic,
return false;
}

+
+static void synic_toggle_avic(struct kvm_vcpu *vcpu, bool activate)
+{
+ srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
+ kvm_request_apicv_update(vcpu->kvm, activate,
+ APICV_INHIBIT_REASON_HYPERV);
+ vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+}
+
static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
int vector)
{
+ struct kvm_vcpu *vcpu = hv_synic_to_vcpu(synic);
+ struct kvm_hv *hv = to_kvm_hv(vcpu->kvm);
+ int auto_eoi_old, auto_eoi_new;
+
if (vector < HV_SYNIC_FIRST_VALID_VECTOR)
return;

@@ -96,10 +109,23 @@ static void synic_update_vector(struct kvm_vcpu_hv_synic *synic,
else
__clear_bit(vector, synic->vec_bitmap);

+ auto_eoi_old = bitmap_weight(synic->auto_eoi_bitmap, 256);
+
if (synic_has_vector_auto_eoi(synic, vector))
__set_bit(vector, synic->auto_eoi_bitmap);
else
__clear_bit(vector, synic->auto_eoi_bitmap);
+
+ auto_eoi_new = bitmap_weight(synic->auto_eoi_bitmap, 256);
+
+ /* Hyper-V SynIC auto EOI SINTs are not compatible with APICV */
+ if (!auto_eoi_old && auto_eoi_new) {
+ if (atomic_inc_return(&hv->synic_auto_eoi_used) == 1)
+ synic_toggle_avic(vcpu, false);
+ } else if (!auto_eoi_new && auto_eoi_old) {
+ if (atomic_dec_return(&hv->synic_auto_eoi_used) == 0)
+ synic_toggle_avic(vcpu, true);
+ }
}

static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint,
@@ -933,12 +959,6 @@ int kvm_hv_activate_synic(struct kvm_vcpu *vcpu, bool dont_zero_synic_pages)

synic = to_hv_synic(vcpu);

- /*
- * Hyper-V SynIC auto EOI SINT's are
- * not compatible with APICV, so request
- * to deactivate APICV permanently.
- */
- kvm_request_apicv_update(vcpu->kvm, false, APICV_INHIBIT_REASON_HYPERV);
synic->active = true;
synic->dont_zero_synic_pages = dont_zero_synic_pages;
synic->control = HV_SYNIC_CONTROL_ENABLE;
@@ -2466,6 +2486,8 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
ent->eax |= HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
if (!cpu_smt_possible())
ent->eax |= HV_X64_NO_NONARCH_CORESHARING;
+
+ ent->eax |= HV_DEPRECATING_AEOI_RECOMMENDED;
/*
* Default number of spinlock retry attempts, matches
* HyperV 2016.
--
2.26.3

2021-06-23 11:33:28

by Maxim Levitsky

Subject: [PATCH 08/10] KVM: x86: APICv: drop immediate APICv disablement on current vCPU

The special case of disabling APICv on the current vCPU right away in
kvm_request_apicv_update doesn't bring much benefit over raising
KVM_REQ_APICV_UPDATE on it instead, since the request will be processed
on the next entry to the guest
(the existing comment about avoiding another #VMEXIT is wrong).

It also hides various assumptions that the APICv enable state matches
the APICv inhibit state, as this special case only makes those states
match on the current vCPU.

Previous patches fixed a few such assumptions, so now it should be safe
to drop this special case.

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/x86.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6f0d9c231249..dcf06acdbf52 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9209,7 +9209,6 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);
*/
void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
{
- struct kvm_vcpu *except;
unsigned long old, new, expected;

if (!kvm_x86_ops.check_apicv_inhibit_reasons ||
@@ -9237,18 +9236,9 @@ void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
if (kvm_x86_ops.pre_update_apicv_exec_ctrl)
static_call(kvm_x86_pre_update_apicv_exec_ctrl)(kvm, activate);

- /*
- * Sending request to update APICV for all other vcpus,
- * while update the calling vcpu immediately instead of
- * waiting for another #VMEXIT to handle the request.
- */
- except = kvm_get_running_vcpu();
- kvm_make_all_cpus_request_except(kvm, KVM_REQ_APICV_UPDATE,
- except);
+ kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);

kvm_allow_guest_entries(kvm);
- if (except)
- kvm_vcpu_update_apicv(except);
}
EXPORT_SYMBOL_GPL(kvm_request_apicv_update);

--
2.26.3

2021-06-23 21:51:30

by Paolo Bonzini

Subject: Re: [PATCH 02/10] KVM: x86: APICv: fix race in kvm_request_apicv_update on SVM

On 23/06/21 13:29, Maxim Levitsky wrote:
> + kvm_block_guest_entries(kvm);
> +
> trace_kvm_apicv_update_request(activate, bit);
> if (kvm_x86_ops.pre_update_apicv_exec_ctrl)
> static_call(kvm_x86_pre_update_apicv_exec_ctrl)(kvm, activate);
> @@ -9243,6 +9245,8 @@ void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
> except = kvm_get_running_vcpu();
> kvm_make_all_cpus_request_except(kvm, KVM_REQ_APICV_UPDATE,
> except);
> +
> + kvm_allow_guest_entries(kvm);

Doesn't this cause a busy loop during synchronize_rcu? It should be
possible to request the vmexit of other CPUs from
avic_update_access_page, and do a lock/unlock of kvm->slots_lock to wait
for the memslot to be updated.

(As an aside, I'd like to get rid of KVM_REQ_MCLOCK_IN_PROGRESS in 5.15...).

Paolo

2021-06-23 21:52:48

by Paolo Bonzini

Subject: Re: [PATCH 03/10] KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled

On 23/06/21 13:29, Maxim Levitsky wrote:
> This better reflects the purpose of this variable on AMD, where
> the AVIC memory slot can be enabled and disabled dynamically.

Queued, thanks.

Paolo

2021-06-23 21:54:45

by Paolo Bonzini

Subject: Re: [PATCH 06/10] KVM: SVM: tweak warning about enabled AVIC on nested entry

On 23/06/21 13:29, Maxim Levitsky wrote:
> It is possible that AVIC was requested to be disabled but is
> not yet disabled, e.g. if the nested entry is done right
> after svm_vcpu_after_set_cpuid.
>
> Signed-off-by: Maxim Levitsky <[email protected]>
> ---
> arch/x86/kvm/svm/nested.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
> index dca20f949b63..253847f7d9aa 100644
> --- a/arch/x86/kvm/svm/nested.c
> +++ b/arch/x86/kvm/svm/nested.c
> @@ -505,7 +505,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm)
> * Also covers avic_vapic_bar, avic_backing_page, avic_logical_id,
> * avic_physical_id.
> */
> - WARN_ON(svm->vmcb01.ptr->control.int_ctl & AVIC_ENABLE_MASK);
> + WARN_ON(kvm_apicv_activated(svm->vcpu.kvm));
>
> /* Copied from vmcb01. msrpm_base can be overwritten later. */
> svm->vmcb->control.nested_ctl = svm->vmcb01.ptr->control.nested_ctl;
>

Queued, thanks.

Paolo

2021-06-23 21:57:00

by Paolo Bonzini

Subject: Re: [PATCH 07/10] KVM: SVM: use vmcb01 in svm_refresh_apicv_exec_ctrl

On 23/06/21 13:29, Maxim Levitsky wrote:
> AVIC is not supported for nested guests, but in some corner
> cases it is possible for it to still be enabled after we
> have entered nesting, and thus to use vmcb02.
>
> Fix this by always using vmcb01 in svm_refresh_apicv_exec_ctrl.

Please be more verbose about the corner case (and then the second
paragraph should not be necessary anymore).

Paolo

> Signed-off-by: Maxim Levitsky <[email protected]>
> ---
> arch/x86/kvm/svm/avic.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 1d01da64c333..a8ad78a2faa1 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -646,7 +646,7 @@ static int svm_set_pi_irte_mode(struct kvm_vcpu *vcpu, bool activate)
> void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> - struct vmcb *vmcb = svm->vmcb;
> + struct vmcb *vmcb = svm->vmcb01.ptr;
> bool activated = kvm_vcpu_apicv_active(vcpu);
>
> if (!enable_apicv)
>

2021-06-23 21:57:22

by Paolo Bonzini

Subject: Re: [PATCH 04/10] KVM: SVM: add warning for mismatch between AVIC state and AVIC access page state

On 23/06/21 13:29, Maxim Levitsky wrote:
> It is never a good idea to enter a guest when the AVIC state doesn't match
> the state of the AVIC MMIO memory slot.
>
> Signed-off-by: Maxim Levitsky <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 12c06ea28f5c..50405c561394 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -3780,6 +3780,9 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>
> pre_svm_run(vcpu);
>
> + WARN_ON_ONCE(vcpu->kvm->arch.apic_access_memslot_enabled !=
> + kvm_vcpu_apicv_active(vcpu));
> +
> sync_lapic_to_cr8(vcpu);
>
> if (unlikely(svm->asid != svm->vmcb->control.asid)) {
>

For patches 4-6, can the warnings actually fire without the fix in patch 2?

Paolo

2021-06-24 08:08:37

by Maxim Levitsky

Subject: Re: [PATCH 02/10] KVM: x86: APICv: fix race in kvm_request_apicv_update on SVM

On Wed, 2021-06-23 at 23:50 +0200, Paolo Bonzini wrote:
> On 23/06/21 13:29, Maxim Levitsky wrote:
> > + kvm_block_guest_entries(kvm);
> > +
> > trace_kvm_apicv_update_request(activate, bit);
> > if (kvm_x86_ops.pre_update_apicv_exec_ctrl)
> > static_call(kvm_x86_pre_update_apicv_exec_ctrl)(kvm, activate);
> > @@ -9243,6 +9245,8 @@ void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
> > except = kvm_get_running_vcpu();
> > kvm_make_all_cpus_request_except(kvm, KVM_REQ_APICV_UPDATE,
> > except);
> > +
> > + kvm_allow_guest_entries(kvm);
>
> Doesn't this cause a busy loop during synchronize_rcu?

Hi,

If you mean a busy loop on the other vCPUs, then the answer is sadly yes.
The other option is to use a mutex, which is what I did in a former
version of this patch, but at the last minute I decided that the way
it is done in this patch would be simpler.
AVIC updates don't happen often.
Also, with a request, KVM_REQ_APICV_UPDATE can be handled in parallel,
while a mutex enforces unneeded mutual exclusion.
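
To illustrate (a heavily condensed sketch assuming the structure of
vcpu_run/vcpu_enter_guest, not the exact code): a pending request that
no handler consumes keeps bouncing the vCPU out of guest entry until it
is cleared, so blocked vCPUs spin for as long as the memslot update
(and its SRCU synchronization) takes:

	for (;;) {
		/*
		 * vcpu_enter_guest consumes the requests it knows how to
		 * handle here; KVM_REQ_BLOCK_GUEST_ENTRIES deliberately
		 * is not one of them...
		 */
		if (kvm_request_pending(vcpu))
			continue;	/* ...so the vCPU spins here until
					 * kvm_allow_guest_entries clears it */

		/* ... actual guest entry ... */
	}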


> It should be
> possible to request the vmexit of other CPUs from
> avic_update_access_page, and do a lock/unlock of kvm->slots_lock to wait
> for the memslot to be updated.

This would still keep the race. The other vCPUs must not enter guest
mode from the moment the memslot update is started until
KVM_REQ_APICV_UPDATE is raised.

If I were to do any kind of synchronization in avic_update_access_page,
then I would have to drop the lock/request there, and from that point
until the common code raises KVM_REQ_APICV_UPDATE there is a
possibility of a vCPU re-entering guest mode without updating its AVIC.
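
To make the window explicit, a hypothetical sketch of the suggested
flow (invented code, not anything from the tree):

	static int avic_update_access_page(struct kvm *kvm, bool activate)
	{
		mutex_lock(&kvm->slots_lock);
		/* ... toggle APIC_ACCESS_PAGE_PRIVATE_MEMSLOT ... */
		mutex_unlock(&kvm->slots_lock);
		/*
		 * Window: from here until the common code raises
		 * KVM_REQ_APICV_UPDATE, a vCPU can re-enter guest mode
		 * with a stale AVIC configuration.
		 */
		return 0;
	}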


Here is an older version of this patch that does use mutex instead.
Please let me know if you prefer it.

I copy-pasted it here, so it's likely not to apply as-is, since my
email client probably destroys whitespace.

Thanks for the review,
Best regards,
Maxim Levitsky


--

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fdc6b8a4348f..b7dc7fd7b63d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9183,11 +9183,8 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
}

-void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
+void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
{
- if (!lapic_in_kernel(vcpu))
- return;
-
vcpu->arch.apicv_active = kvm_apicv_activated(vcpu->kvm);
kvm_apic_update_apicv(vcpu);
static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
@@ -9201,6 +9198,16 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
if (!vcpu->arch.apicv_active)
kvm_make_request(KVM_REQ_EVENT, vcpu);
}
+
+void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
+{
+ if (!lapic_in_kernel(vcpu))
+ return;
+
+ mutex_lock(&vcpu->kvm->apicv_update_lock);
+ __kvm_vcpu_update_apicv(vcpu);
+ mutex_unlock(&vcpu->kvm->apicv_update_lock);
+}
EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);

/*
@@ -9213,30 +9220,26 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);
void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
{
struct kvm_vcpu *except;
- unsigned long old, new, expected;
+ unsigned long old, new;

if (!kvm_x86_ops.check_apicv_inhibit_reasons ||
!static_call(kvm_x86_check_apicv_inhibit_reasons)(bit))
return;

- old = READ_ONCE(kvm->arch.apicv_inhibit_reasons);
- do {
- expected = new = old;
- if (activate)
- __clear_bit(bit, &new);
- else
- __set_bit(bit, &new);
- if (new == old)
- break;
- old = cmpxchg(&kvm->arch.apicv_inhibit_reasons, expected, new);
- } while (old != expected);
+ mutex_lock(&kvm->apicv_update_lock);
+
+ old = new = kvm->arch.apicv_inhibit_reasons;
+ if (activate)
+ __clear_bit(bit, &new);
+ else
+ __set_bit(bit, &new);
+
+ WRITE_ONCE(kvm->arch.apicv_inhibit_reasons, new);

if (!!old == !!new)
- return;
+ goto out;

trace_kvm_apicv_update_request(activate, bit);
- if (kvm_x86_ops.pre_update_apicv_exec_ctrl)
- static_call(kvm_x86_pre_update_apicv_exec_ctrl)(kvm, activate);

/*
* Sending request to update APICV for all other vcpus,
@@ -9244,10 +9247,24 @@ void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
* waiting for another #VMEXIT to handle the request.
*/
except = kvm_get_running_vcpu();
+
+ /*
+ * on SVM, raising the KVM_REQ_APICV_UPDATE request while holding the
+ * apicv_update_lock ensures that we kick all vCPUs out of the
+ * guest mode and let them wait until the AVIC memslot update
+ * has completed.
+ */
+
kvm_make_all_cpus_request_except(kvm, KVM_REQ_APICV_UPDATE,
except);
+
+ if (kvm_x86_ops.pre_update_apicv_exec_ctrl)
+ static_call(kvm_x86_pre_update_apicv_exec_ctrl)(kvm, activate);
+
if (except)
- kvm_vcpu_update_apicv(except);
+ __kvm_vcpu_update_apicv(except);
+out:
+ mutex_unlock(&kvm->apicv_update_lock);
}
EXPORT_SYMBOL_GPL(kvm_request_apicv_update);

@@ -9454,8 +9471,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
*/
if (kvm_check_request(KVM_REQ_HV_STIMER, vcpu))
kvm_hv_process_stimers(vcpu);
- if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu))
+ if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu)) {
+ srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
kvm_vcpu_update_apicv(vcpu);
+ vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+ }
if (kvm_check_request(KVM_REQ_APF_READY, vcpu))
kvm_check_async_pf_completion(vcpu);
if (kvm_check_request(KVM_REQ_MSR_FILTER_CHANGED, vcpu))
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 37cbb56ccd09..0364d35d43dc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -524,6 +524,7 @@ struct kvm {
#endif /* KVM_HAVE_MMU_RWLOCK */

struct mutex slots_lock;
+ struct mutex apicv_update_lock;

/*
* Protects the arch-specific fields of struct kvm_memory_slots in
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ed4d1581d502..ba5d5d9ebc64 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -943,6 +943,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
mutex_init(&kvm->irq_lock);
mutex_init(&kvm->slots_lock);
mutex_init(&kvm->slots_arch_lock);
+ mutex_init(&kvm->apicv_update_lock);
INIT_LIST_HEAD(&kvm->devices);

BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX);



>
> (As an aside, I'd like to get rid of KVM_REQ_MCLOCK_IN_PROGRESS in 5.15...).
>
> Paolo
>


2021-06-24 08:15:05

by Maxim Levitsky

Subject: Re: [PATCH 04/10] KVM: SVM: add warning for mismatch between AVIC state and AVIC access page state

On Wed, 2021-06-23 at 23:53 +0200, Paolo Bonzini wrote:
> On 23/06/21 13:29, Maxim Levitsky wrote:
> > It is never a good idea to enter a guest when the AVIC state doesn't match
> > the state of the AVIC MMIO memory slot.
> >
> > Signed-off-by: Maxim Levitsky <[email protected]>
> > ---
> > arch/x86/kvm/svm/svm.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 12c06ea28f5c..50405c561394 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -3780,6 +3780,9 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
> >
> > pre_svm_run(vcpu);
> >
> > + WARN_ON_ONCE(vcpu->kvm->arch.apic_access_memslot_enabled !=
> > + kvm_vcpu_apicv_active(vcpu));
> > +
> > sync_lapic_to_cr8(vcpu);
> >
> > if (unlikely(svm->asid != svm->vmcb->control.asid)) {
> >
>
> For patches 4-6, can the warnings actually fire without the fix in patch 2?
>
> Paolo
>

Hi!

The warning in patch 4 does fire, not often but it does. Patch 2 fixes it.
The guest usually boots anyway, since a few lost APIC writes don't
always cause it to hang.

Plus, the warning is also triggered when the AVIC state is mismatched
the other way around, that is when AVIC is enabled but the memslot is
disabled, which probably doesn't cause issues.

The warning in patch 5 is mostly theoretical until patch 8 is applied.
It can fire if AVIC is toggled on one vCPU for some reason while
another vCPU asks for an interrupt window.

Patch 6 doesn't fix a warning but rather a case which most likely
can't happen until patch 8 is applied; the fix is still correct though.

Best regards,
Maxim Levitsky

2021-06-24 08:18:43

by Maxim Levitsky

Subject: Re: [PATCH 07/10] KVM: SVM: use vmcb01 in svm_refresh_apicv_exec_ctrl

On Wed, 2021-06-23 at 23:54 +0200, Paolo Bonzini wrote:
> On 23/06/21 13:29, Maxim Levitsky wrote:
> > AVIC is not supported for nested guests, but in some corner
> > cases it is possible for it to still be enabled after we
> > have entered nesting, and thus to use vmcb02.
> >
> > Fix this by always using vmcb01 in svm_refresh_apicv_exec_ctrl.
>
> Please be more verbose about the corner case (and then the second
> paragraph should not be necessary anymore).

I will do it.
The issue can happen only after patch 8 is applied, because then the
AVIC disable on the current vCPU is always deferred.

I think that currently the problem in this patch can't happen, because
kvm_request_apicv_update(..., APICV_INHIBIT_REASON_NESTED) is called on
each vCPU from svm_vcpu_after_set_cpuid, and since it disables AVIC on
the current vCPU, AVIC is fully disabled on all vCPUs by the time we
get to the first guest entry, even a nested one (after a migration the
first guest entry can already be nested).

After patch 8, the AVIC disable is done at guest entry, where we may
already be in L2; thus we should toggle it in vmcb01, while vmcb02
shouldn't have AVIC enabled in the first place.
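
Illustrative snippet of the above (assuming the usual vmcb01/vmcb02
layout of the nested SVM code, where svm->vmcb aliases vmcb02 while L2
runs and vmcb02's controls are rebuilt from vmcb01 on nested entry):

	/*
	 * Persistent AVIC controls live in vmcb01, not in whatever
	 * svm->vmcb currently points at.
	 */
	struct vmcb *vmcb = svm->vmcb01.ptr;	/* not svm->vmcb */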

Best regards,
Maxim Levitsky


>
> Paolo
>
> > Signed-off-by: Maxim Levitsky <[email protected]>
> > ---
> > arch/x86/kvm/svm/avic.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index 1d01da64c333..a8ad78a2faa1 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -646,7 +646,7 @@ static int svm_set_pi_irte_mode(struct kvm_vcpu *vcpu, bool activate)
> > void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
> > {
> > struct vcpu_svm *svm = to_svm(vcpu);
> > - struct vmcb *vmcb = svm->vmcb;
> > + struct vmcb *vmcb = svm->vmcb01.ptr;
> > bool activated = kvm_vcpu_apicv_active(vcpu);
> >
> > if (!enable_apicv)
> >


2021-07-07 12:58:55

by Maxim Levitsky

Subject: Re: [PATCH 02/10] KVM: x86: APICv: fix race in kvm_request_apicv_update on SVM

On Thu, 2021-06-24 at 11:07 +0300, Maxim Levitsky wrote:
> On Wed, 2021-06-23 at 23:50 +0200, Paolo Bonzini wrote:
> > On 23/06/21 13:29, Maxim Levitsky wrote:
> > > + kvm_block_guest_entries(kvm);
> > > +
> > > trace_kvm_apicv_update_request(activate, bit);
> > > if (kvm_x86_ops.pre_update_apicv_exec_ctrl)
> > > static_call(kvm_x86_pre_update_apicv_exec_ctrl)(kvm, activate);
> > > @@ -9243,6 +9245,8 @@ void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
> > > except = kvm_get_running_vcpu();
> > > kvm_make_all_cpus_request_except(kvm, KVM_REQ_APICV_UPDATE,
> > > except);
> > > +
> > > + kvm_allow_guest_entries(kvm);
> >
> > Doesn't this cause a busy loop during synchronize_rcu?
>
> Hi,
>
> If you mean a busy loop on the other vCPUs, then the answer is sadly yes.
> The other option is to use a mutex, which is what I did in a former
> version of this patch, but at the last minute I decided that the way
> it is done in this patch would be simpler.
> AVIC updates don't happen often.
> Also, with a request, KVM_REQ_APICV_UPDATE can be handled in parallel,
> while a mutex enforces unneeded mutual exclusion.
>
>
> > It should be
> > possible to request the vmexit of other CPUs from
> > avic_update_access_page, and do a lock/unlock of kvm->slots_lock to wait
> > for the memslot to be updated.
>
> This would still keep the race. The other vCPUs must not enter guest
> mode from the moment the memslot update is started until
> KVM_REQ_APICV_UPDATE is raised.
>
> If I were to do any kind of synchronization in avic_update_access_page,
> then I would have to drop the lock/request there, and from that point
> until the common code raises KVM_REQ_APICV_UPDATE there is a
> possibility of a vCPU re-entering guest mode without updating its AVIC.
>
>
> Here is an older version of this patch that does use mutex instead.
> Please let me know if you prefer it.
>
> I copy-pasted it here, so it's likely not to apply as-is, since my
> email client probably destroys whitespace.
>
> Thanks for the review,
> Best regards,
> Maxim Levitsky
>
>
> --
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fdc6b8a4348f..b7dc7fd7b63d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9183,11 +9183,8 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
> kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
> }
>
> -void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
> +void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
> {
> - if (!lapic_in_kernel(vcpu))
> - return;
> -
> vcpu->arch.apicv_active = kvm_apicv_activated(vcpu->kvm);
> kvm_apic_update_apicv(vcpu);
> static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
> @@ -9201,6 +9198,16 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
> if (!vcpu->arch.apicv_active)
> kvm_make_request(KVM_REQ_EVENT, vcpu);
> }
> +
> +void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
> +{
> + if (!lapic_in_kernel(vcpu))
> + return;
> +
> + mutex_lock(&vcpu->kvm->apicv_update_lock);
> + __kvm_vcpu_update_apicv(vcpu);
> + mutex_unlock(&vcpu->kvm->apicv_update_lock);
> +}
> EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);
>
> /*
> @@ -9213,30 +9220,26 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);
> void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
> {
> struct kvm_vcpu *except;
> - unsigned long old, new, expected;
> + unsigned long old, new;
>
> if (!kvm_x86_ops.check_apicv_inhibit_reasons ||
> !static_call(kvm_x86_check_apicv_inhibit_reasons)(bit))
> return;
>
> - old = READ_ONCE(kvm->arch.apicv_inhibit_reasons);
> - do {
> - expected = new = old;
> - if (activate)
> - __clear_bit(bit, &new);
> - else
> - __set_bit(bit, &new);
> - if (new == old)
> - break;
> - old = cmpxchg(&kvm->arch.apicv_inhibit_reasons, expected, new);
> - } while (old != expected);
> + mutex_lock(&kvm->apicv_update_lock);
> +
> + old = new = kvm->arch.apicv_inhibit_reasons;
> + if (activate)
> + __clear_bit(bit, &new);
> + else
> + __set_bit(bit, &new);
> +
> + WRITE_ONCE(kvm->arch.apicv_inhibit_reasons, new);
>
> if (!!old == !!new)
> - return;
> + goto out;
>
> trace_kvm_apicv_update_request(activate, bit);
> - if (kvm_x86_ops.pre_update_apicv_exec_ctrl)
> - static_call(kvm_x86_pre_update_apicv_exec_ctrl)(kvm, activate);
>
> /*
> * Sending request to update APICV for all other vcpus,
> @@ -9244,10 +9247,24 @@ void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
> * waiting for another #VMEXIT to handle the request.
> */
> except = kvm_get_running_vcpu();
> +
> + /*
> + * on SVM, raising the KVM_REQ_APICV_UPDATE request while holding the
> + * apicv_update_lock ensures that we kick all vCPUs out of the
> + * guest mode and let them wait until the AVIC memslot update
> + * has completed.
> + */
> +
> kvm_make_all_cpus_request_except(kvm, KVM_REQ_APICV_UPDATE,
> except);
> +
> + if (kvm_x86_ops.pre_update_apicv_exec_ctrl)
> + static_call(kvm_x86_pre_update_apicv_exec_ctrl)(kvm, activate);
> +
> if (except)
> - kvm_vcpu_update_apicv(except);
> + __kvm_vcpu_update_apicv(except);
> +out:
> + mutex_unlock(&kvm->apicv_update_lock);
> }
> EXPORT_SYMBOL_GPL(kvm_request_apicv_update);
>
> @@ -9454,8 +9471,11 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> */
> if (kvm_check_request(KVM_REQ_HV_STIMER, vcpu))
> kvm_hv_process_stimers(vcpu);
> - if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu))
> + if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu)) {
> + srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
> kvm_vcpu_update_apicv(vcpu);
> + vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
> + }
> if (kvm_check_request(KVM_REQ_APF_READY, vcpu))
> kvm_check_async_pf_completion(vcpu);
> if (kvm_check_request(KVM_REQ_MSR_FILTER_CHANGED, vcpu))
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 37cbb56ccd09..0364d35d43dc 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -524,6 +524,7 @@ struct kvm {
> #endif /* KVM_HAVE_MMU_RWLOCK */
>
> struct mutex slots_lock;
> + struct mutex apicv_update_lock;
>
> /*
> * Protects the arch-specific fields of struct kvm_memory_slots in
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ed4d1581d502..ba5d5d9ebc64 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -943,6 +943,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
> mutex_init(&kvm->irq_lock);
> mutex_init(&kvm->slots_lock);
> mutex_init(&kvm->slots_arch_lock);
> + mutex_init(&kvm->apicv_update_lock);
> INIT_LIST_HEAD(&kvm->devices);
>
> BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX);
>
>
>
> > (As an aside, I'd like to get rid of KVM_REQ_MCLOCK_IN_PROGRESS in 5.15...).
> >
> > Paolo
> >


Hi!
Any update? Should I use a lock for this?


Best regards,
Maxim Levitsky


2021-07-07 14:05:35

by Paolo Bonzini

Subject: Re: [PATCH 02/10] KVM: x86: APICv: fix race in kvm_request_apicv_update on SVM

On 07/07/21 14:57, Maxim Levitsky wrote:
>
> Hi!
> Any update? Should I use a lock for this?

Yes please, even irq_lock can do.

Paolo