2019-09-14 05:44:25

by Suthikulpanit, Suravee

Subject: [PATCH v3 00/16] kvm: x86: Support AMD SVM AVIC w/ in-kernel irqchip mode

The 'commit 67034bb9dd5e ("KVM: SVM: Add irqchip_split() checks before
enabling AVIC")' was introduced to fix miscellaneous boot-hang issues
when enabling AVIC. This is mainly because the AVIC hardware does not
#vmexit on writes to the LAPIC EOI register, causing the in-kernel PIC
and IOAPIC to wait and not inject new interrupts (e.g. PIT, RTC).

This limits AVIC to working only with kernel_irqchip=split mode, which
is not currently enabled by default, and also requires user-space to
support the split irqchip model, which might not be the case.

The goal of this series is to enable AVIC to work in both irqchip modes,
by allowing AVIC to be deactivated temporarily at runtime, falling back
to legacy interrupt injection mode (w/ vINTR and interrupt windows)
when needed, and re-enabling it subsequently.

A similar approach is also used to handle Hyper-V SynIC in the
'commit 5c919412fe61 ("kvm/x86: Hyper-V synthetic interrupt controller")',
where APICv is permanently disabled at runtime (this is currently broken
for AVIC, and fixed by this series).

This series contains three parts:
* Part 1: patch 1-2
Introduce the APICv state enum and logic for keeping track of the
state for each vm (see the sketch after this list).

* Part 2: patch 3-11
Add support for activating/deactivating APICv at runtime.

* Part 3: patch 12-16
Provide a workaround for AVIC EOI and allow enabling AVIC w/
kernel_irqchip=on.
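
For reference, the per-VM APICv state introduced in Part 1 (patch 2,
not included in this excerpt) can be sketched roughly as follows; these
three states are the ones referenced by the later patches, though the
exact definition may differ:

	enum kvm_apicv_state {
		APICV_DISABLED,		/* deactivated permanently */
		APICV_SUSPENDED,	/* deactivated temporarily */
		APICV_ACTIVATED,	/* currently active */
	};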

Pre-requisite Patch:
* commit b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC
(de-)activation code")
(https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/
?h=next&id=b9c6ff94e43a0ee053e0c1d983fba1ac4953b762)

This series has been tested against v5.3-rc5 as follows:
* Booting Linux and Windows Server 2019 VMs with up to 240 vcpus
and FreeBSD with up to 128 vcpus w/ the qemu options "kernel-irqchip=on"
and "-no-hpet".
* Passing through an Intel 10GbE NIC and running netperf in the VM.

Changes from V2: (https://lkml.org/lkml/2019/8/15/672)
* Rebase to v5.3-rc5
* Misc changes recommended by Alex and Vitaly.
* Rename APICV_DEACTIVATED to APICV_SUSPENDED
* Disable AVIC when the guest boots w/ SVM support, since AVIC
does not currently support guests w/ nested virt.
* Add tracepoints for APICv activate/deactivate requests. (per Alex)
* Consolidate changes for handling EOI for kvm PIT emulation and
IOAPIC RTC handling in V2 into ioapic_lazy_update_eoi() in
patch 17/18 of the v3 series.
* Remove patches for providing per-vm apicv_state debug information.

Changes from V1: (https://lkml.org/lkml/2019/3/22/1042)
* Introduce APICv state enumeration
* Introduce KVM debugfs for APICv state
* Add synchronization logic for APICv state to prevent potential
race condition (per Jan's suggestion)
* Add support for activate/deactivate posted interrupt
(per Jan's suggestion)
* Remove callback functions for handling APIC ID, DFR and LDR update
(per Paolo's suggestion)
* Add workaround for handling EOI for in-kernel PIT and IOAPIC.

Suravee Suthikulpanit (16):
kvm: x86: Modify kvm_x86_ops.get_enable_apicv() to use struct kvm
parameter
kvm: x86: Introduce KVM APICv state
kvm: lapic: Introduce APICv update helper function
kvm: x86: Add support for activate/de-activate APICv at runtime
kvm: x86: Add APICv activate/deactivate request trace points
kvm: x86: svm: Add support to activate/deactivate posted interrupts
svm: Add support for setup/destroy virtual APIC backing page for AVIC
svm: Add support for activate/deactivate AVIC at runtime
kvm: x86: hyperv: Use APICv deactivate request interface
svm: Disable AVIC when launching guest with SVM support
svm: Temporarily deactivate AVIC during ExtINT handling
kvm: x86: Introduce struct kvm_x86_ops.apicv_eoi_accelerate
kvm: lapic: Clean up APIC predefined macros
kvm: ioapic: Refactor kvm_ioapic_update_eoi()
kvm: x86: ioapic: Lazy update IOAPIC EOI
svm: Allow AVIC with in-kernel irqchip mode

arch/x86/include/asm/kvm_host.h | 28 +++++-
arch/x86/kvm/hyperv.c | 12 ++-
arch/x86/kvm/ioapic.c | 149 +++++++++++++++++++-----------
arch/x86/kvm/lapic.c | 35 ++++---
arch/x86/kvm/lapic.h | 2 +
arch/x86/kvm/svm.c | 198 +++++++++++++++++++++++++++++++++++++---
arch/x86/kvm/trace.h | 30 ++++++
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/kvm/x86.c | 136 ++++++++++++++++++++++++++-
9 files changed, 506 insertions(+), 86 deletions(-)

--
1.8.3.1


2019-09-14 05:46:29

by Suthikulpanit, Suravee

Subject: [PATCH v3 04/16] kvm: x86: Add support for activate/de-activate APICv at runtime

Certain runtime conditions require APICv to be temporarily deactivated.
However, the current implementation only supports permanently
deactivating APICv at runtime (mainly used when running a Hyper-V
guest).

In addition, on AMD, when activating/deactivating APICv at runtime,
all vcpus in the VM have to be operating in the same APICv mode, which
requires the requesting (main) vcpu to notify the others.

So, introduce interfaces to request that all vcpus activate/deactivate
APICv.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 9 ++++
arch/x86/kvm/x86.c | 95 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 104 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 562bfbd..a50fca1b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -78,6 +78,10 @@
#define KVM_REQ_HV_STIMER KVM_ARCH_REQ(22)
#define KVM_REQ_LOAD_EOI_EXITMAP KVM_ARCH_REQ(23)
#define KVM_REQ_GET_VMCS12_PAGES KVM_ARCH_REQ(24)
+#define KVM_REQ_APICV_ACTIVATE \
+ KVM_ARCH_REQ_FLAGS(25, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_APICV_DEACTIVATE \
+ KVM_ARCH_REQ_FLAGS(26, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)

#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
@@ -1098,6 +1102,8 @@ struct kvm_x86_ops {
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
bool (*get_enable_apicv)(struct kvm *kvm);
+ void (*pre_update_apicv_exec_ctrl)(struct kvm_vcpu *vcpu,
+ bool activate);
void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
void (*hwapic_isr_update)(struct kvm_vcpu *vcpu, int isr);
@@ -1425,6 +1431,9 @@ gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
struct x86_exception *exception);

void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu);
+void kvm_vcpu_activate_apicv(struct kvm_vcpu *vcpu);
+void kvm_make_apicv_activate_request(struct kvm_vcpu *vcpu);
+void kvm_make_apicv_deactivate_request(struct kvm_vcpu *vcpu, bool disable);

int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 64d275e..446df2b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -26,6 +26,7 @@
#include "cpuid.h"
#include "pmu.h"
#include "hyperv.h"
+#include "lapic.h"

#include <linux/clocksource.h>
#include <linux/interrupt.h>
@@ -7184,6 +7185,22 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int apicid)
kvm_irq_delivery_to_apic(kvm, NULL, &lapic_irq, NULL);
}

+void kvm_vcpu_activate_apicv(struct kvm_vcpu *vcpu)
+{
+ if (!lapic_in_kernel(vcpu)) {
+ WARN_ON_ONCE(!vcpu->arch.apicv_active);
+ return;
+ }
+ if (vcpu->arch.apicv_active)
+ return;
+
+ vcpu->arch.apicv_active = true;
+ kvm_apic_update_apicv(vcpu);
+
+ kvm_x86_ops->refresh_apicv_exec_ctrl(vcpu);
+}
+EXPORT_SYMBOL_GPL(kvm_vcpu_activate_apicv);
+
void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu)
{
if (!lapic_in_kernel(vcpu)) {
@@ -7194,8 +7211,10 @@ void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu)
return;

vcpu->arch.apicv_active = false;
+ kvm_apic_update_apicv(vcpu);
kvm_x86_ops->refresh_apicv_exec_ctrl(vcpu);
}
+EXPORT_SYMBOL_GPL(kvm_vcpu_deactivate_apicv);

static void kvm_sched_yield(struct kvm *kvm, unsigned long dest_id)
{
@@ -7711,6 +7730,78 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
}

+/*
+ * NOTE:
+ * With kvm_make_apicv_[activate|deactivate]_request(), we could run into
+ * a situation where a vcpu tries to send a new request to a target vcpu before
+ * it can handle the prior request.
+ *
+ * For example:
+ *
+ * vcpu0 vcpu1
+ *
+ * kvm_make_apicv_activate_request() ...
+ * Handle KVM_REQ_APICV_ACTIVATE KVM_REQ_APICV_ACTIVATE==1
+ * vcpu_run() ...
+ * ...
+ * ...
+ * kvm_make_apicv_deactivate_request() ...
+ *
+ * Here, vcpu0 would have to clear the stale apicv activate/deactivate
+ * request before handling the new one.
+ */
+void kvm_make_apicv_activate_request(struct kvm_vcpu *vcpu)
+{
+ int i;
+ struct kvm_vcpu *v;
+ struct kvm *kvm = vcpu->kvm;
+
+ mutex_lock(&kvm->arch.apicv_lock);
+ if (kvm->arch.apicv_state != APICV_SUSPENDED) {
+ mutex_unlock(&kvm->arch.apicv_lock);
+ return;
+ }
+
+ kvm_for_each_vcpu(i, v, kvm)
+ kvm_clear_request(KVM_REQ_APICV_DEACTIVATE, v);
+
+ if (kvm_x86_ops->pre_update_apicv_exec_ctrl)
+ kvm_x86_ops->pre_update_apicv_exec_ctrl(vcpu, true);
+
+ kvm->arch.apicv_state = APICV_ACTIVATED;
+
+ kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_ACTIVATE);
+
+ mutex_unlock(&kvm->arch.apicv_lock);
+}
+EXPORT_SYMBOL_GPL(kvm_make_apicv_activate_request);
+
+void kvm_make_apicv_deactivate_request(struct kvm_vcpu *vcpu, bool disable)
+{
+ int i;
+ struct kvm_vcpu *v;
+ struct kvm *kvm = vcpu->kvm;
+
+ mutex_lock(&kvm->arch.apicv_lock);
+ if (kvm->arch.apicv_state != APICV_ACTIVATED) {
+ mutex_unlock(&kvm->arch.apicv_lock);
+ return;
+ }
+
+ kvm_for_each_vcpu(i, v, kvm)
+ kvm_clear_request(KVM_REQ_APICV_ACTIVATE, v);
+
+ if (kvm_x86_ops->pre_update_apicv_exec_ctrl)
+ kvm_x86_ops->pre_update_apicv_exec_ctrl(vcpu, false);
+
+ kvm->arch.apicv_state = disable ? APICV_DISABLED : APICV_SUSPENDED;
+
+ kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_DEACTIVATE);
+
+ mutex_unlock(&kvm->arch.apicv_lock);
+}
+EXPORT_SYMBOL_GPL(kvm_make_apicv_deactivate_request);
+
static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
{
if (!kvm_apic_present(vcpu))
@@ -7897,6 +7988,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
*/
if (kvm_check_request(KVM_REQ_HV_STIMER, vcpu))
kvm_hv_process_stimers(vcpu);
+ if (kvm_check_request(KVM_REQ_APICV_ACTIVATE, vcpu))
+ kvm_vcpu_activate_apicv(vcpu);
+ if (kvm_check_request(KVM_REQ_APICV_DEACTIVATE, vcpu))
+ kvm_vcpu_deactivate_apicv(vcpu);
}

if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
--
1.8.3.1

2019-09-14 06:45:00

by Suthikulpanit, Suravee

Subject: [PATCH v3 13/16] kvm: lapic: Clean up APIC predefined macros

Move these duplicated predefined macros to the header file so that
they can be re-used in other places.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/lapic.c | 13 +++++--------
arch/x86/kvm/lapic.h | 1 +
2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6453273..32967dc7 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -56,9 +56,6 @@
#define APIC_VERSION (0x14UL | ((KVM_APIC_LVT_NUM - 1) << 16))
#define LAPIC_MMIO_LENGTH (1 << 12)
/* followed define is not in apicdef.h */
-#define APIC_SHORT_MASK 0xc0000
-#define APIC_DEST_NOSHORT 0x0
-#define APIC_DEST_MASK 0x800
#define MAX_APIC_VECTOR 256
#define APIC_VECTORS_PER_REG 32

@@ -570,9 +567,9 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
irq.level = (icr & APIC_INT_ASSERT) != 0;
irq.trig_mode = icr & APIC_INT_LEVELTRIG;

- if (icr & APIC_DEST_MASK)
+ if (icr & KVM_APIC_DEST_MASK)
return -KVM_EINVAL;
- if (icr & APIC_SHORT_MASK)
+ if (icr & KVM_APIC_SHORT_MASK)
return -KVM_EINVAL;

rcu_read_lock();
@@ -803,7 +800,7 @@ bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,

ASSERT(target);
switch (short_hand) {
- case APIC_DEST_NOSHORT:
+ case KVM_APIC_DEST_NOSHORT:
if (dest_mode == APIC_DEST_PHYSICAL)
return kvm_apic_match_physical_addr(target, mda);
else
@@ -1201,10 +1198,10 @@ static void apic_send_ipi(struct kvm_lapic *apic)

irq.vector = icr_low & APIC_VECTOR_MASK;
irq.delivery_mode = icr_low & APIC_MODE_MASK;
- irq.dest_mode = icr_low & APIC_DEST_MASK;
+ irq.dest_mode = icr_low & KVM_APIC_DEST_MASK;
irq.level = (icr_low & APIC_INT_ASSERT) != 0;
irq.trig_mode = icr_low & APIC_INT_LEVELTRIG;
- irq.shorthand = icr_low & APIC_SHORT_MASK;
+ irq.shorthand = icr_low & KVM_APIC_SHORT_MASK;
irq.msi_redir_hint = false;
if (apic_x2apic_mode(apic))
irq.dest_id = icr_high;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 36a5271..ba13c98e 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -10,6 +10,7 @@
#define KVM_APIC_SIPI 1
#define KVM_APIC_LVT_NUM 6

+#define KVM_APIC_DEST_NOSHORT 0x0
#define KVM_APIC_SHORT_MASK 0xc0000
#define KVM_APIC_DEST_MASK 0x800

--
1.8.3.1

2019-09-14 06:45:59

by Suthikulpanit, Suravee

Subject: [PATCH v3 14/16] kvm: ioapic: Refactor kvm_ioapic_update_eoi()

Refactor the IOAPIC EOI handling code in preparation for a subsequent
patch. There is no functional change.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/ioapic.c | 110 +++++++++++++++++++++++++-------------------------
1 file changed, 56 insertions(+), 54 deletions(-)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index d859ae8..c57b7bb 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -151,10 +151,16 @@ static void kvm_rtc_eoi_tracking_restore_all(struct kvm_ioapic *ioapic)
__rtc_irq_eoi_tracking_restore_one(vcpu);
}

-static void rtc_irq_eoi(struct kvm_ioapic *ioapic, struct kvm_vcpu *vcpu)
+static void rtc_irq_eoi(struct kvm_ioapic *ioapic, struct kvm_vcpu *vcpu,
+ int vector)
{
- if (test_and_clear_bit(vcpu->vcpu_id,
- ioapic->rtc_status.dest_map.map)) {
+ struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
+
+ /* RTC special handling */
+ if (test_bit(vcpu->vcpu_id, dest_map->map) &&
+ (vector == dest_map->vectors[vcpu->vcpu_id]) &&
+ (test_and_clear_bit(vcpu->vcpu_id,
+ ioapic->rtc_status.dest_map.map))) {
--ioapic->rtc_status.pending_eoi;
rtc_status_pending_eoi_check_valid(ioapic);
}
@@ -415,72 +421,68 @@ static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
}

#define IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT 10000
-
-static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
- struct kvm_ioapic *ioapic, int vector, int trigger_mode)
+static void kvm_ioapic_update_eoi_one(struct kvm_vcpu *vcpu,
+ struct kvm_ioapic *ioapic,
+ int trigger_mode,
+ int pin)
{
- struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
struct kvm_lapic *apic = vcpu->arch.apic;
- int i;
-
- /* RTC special handling */
- if (test_bit(vcpu->vcpu_id, dest_map->map) &&
- vector == dest_map->vectors[vcpu->vcpu_id])
- rtc_irq_eoi(ioapic, vcpu);
-
- for (i = 0; i < IOAPIC_NUM_PINS; i++) {
- union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
-
- if (ent->fields.vector != vector)
- continue;
+ union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[pin];

- /*
- * We are dropping lock while calling ack notifiers because ack
- * notifier callbacks for assigned devices call into IOAPIC
- * recursively. Since remote_irr is cleared only after call
- * to notifiers if the same vector will be delivered while lock
- * is dropped it will be put into irr and will be delivered
- * after ack notifier returns.
- */
- spin_unlock(&ioapic->lock);
- kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
- spin_lock(&ioapic->lock);
+ /*
+ * We are dropping lock while calling ack notifiers because ack
+ * notifier callbacks for assigned devices call into IOAPIC
+ * recursively. Since remote_irr is cleared only after call
+ * to notifiers if the same vector will be delivered while lock
+ * is dropped it will be put into irr and will be delivered
+ * after ack notifier returns.
+ */
+ spin_unlock(&ioapic->lock);
+ kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, pin);
+ spin_lock(&ioapic->lock);

- if (trigger_mode != IOAPIC_LEVEL_TRIG ||
- kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
- continue;
+ if (trigger_mode != IOAPIC_LEVEL_TRIG ||
+ kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+ return;

- ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
- ent->fields.remote_irr = 0;
- if (!ent->fields.mask && (ioapic->irr & (1 << i))) {
- ++ioapic->irq_eoi[i];
- if (ioapic->irq_eoi[i] == IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT) {
- /*
- * Real hardware does not deliver the interrupt
- * immediately during eoi broadcast, and this
- * lets a buggy guest make slow progress
- * even if it does not correctly handle a
- * level-triggered interrupt. Emulate this
- * behavior if we detect an interrupt storm.
- */
- schedule_delayed_work(&ioapic->eoi_inject, HZ / 100);
- ioapic->irq_eoi[i] = 0;
- trace_kvm_ioapic_delayed_eoi_inj(ent->bits);
- } else {
- ioapic_service(ioapic, i, false);
- }
+ ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
+ ent->fields.remote_irr = 0;
+ if (!ent->fields.mask && (ioapic->irr & (1 << pin))) {
+ ++ioapic->irq_eoi[pin];
+ if (ioapic->irq_eoi[pin] == IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT) {
+ /*
+ * Real hardware does not deliver the interrupt
+ * immediately during eoi broadcast, and this
+ * lets a buggy guest make slow progress
+ * even if it does not correctly handle a
+ * level-triggered interrupt. Emulate this
+ * behavior if we detect an interrupt storm.
+ */
+ schedule_delayed_work(&ioapic->eoi_inject, HZ / 100);
+ ioapic->irq_eoi[pin] = 0;
+ trace_kvm_ioapic_delayed_eoi_inj(ent->bits);
} else {
- ioapic->irq_eoi[i] = 0;
+ ioapic_service(ioapic, pin, false);
}
+ } else {
+ ioapic->irq_eoi[pin] = 0;
}
}

void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
{
+ int i;
struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;

spin_lock(&ioapic->lock);
- __kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode);
+ rtc_irq_eoi(ioapic, vcpu, vector);
+ for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+ union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
+
+ if (ent->fields.vector != vector)
+ continue;
+ kvm_ioapic_update_eoi_one(vcpu, ioapic, trigger_mode, i);
+ }
spin_unlock(&ioapic->lock);
}

--
1.8.3.1

2019-09-14 06:46:27

by Suthikulpanit, Suravee

Subject: [PATCH v3 08/16] svm: Add support for activate/deactivate AVIC at runtime

Add the necessary logic to support activating/deactivating AVIC at runtime.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/svm.c | 27 +++++++++++++++++++++++++--
1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2e06ee2..a01bc6a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -388,6 +388,7 @@ struct amd_svm_iommu_ir {
static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
static void svm_complete_interrupts(struct vcpu_svm *svm);
+static inline void avic_post_state_restore(struct kvm_vcpu *vcpu);

static int nested_svm_exit_handled(struct vcpu_svm *svm);
static int nested_svm_intercept(struct vcpu_svm *svm);
@@ -2365,6 +2366,10 @@ static void svm_vcpu_blocking(struct kvm_vcpu *vcpu)

static void svm_vcpu_unblocking(struct kvm_vcpu *vcpu)
{
+ if (kvm_check_request(KVM_REQ_APICV_ACTIVATE, vcpu))
+ kvm_vcpu_activate_apicv(vcpu);
+ if (kvm_check_request(KVM_REQ_APICV_DEACTIVATE, vcpu))
+ kvm_vcpu_deactivate_apicv(vcpu);
avic_set_running(vcpu, true);
}

@@ -5204,10 +5209,19 @@ static void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb;

- if (kvm_vcpu_apicv_active(vcpu))
+ if (kvm_vcpu_apicv_active(vcpu)) {
+ /*
+ * During AVIC temporary deactivation, the guest could update
+ * APIC ID, DFR and LDR registers, which would not be trapped
+ * by avic_unaccelerated_access_interception(). In this case,
+ * we need to check and update the AVIC logical APIC ID table
+ * accordingly before re-activating.
+ */
+ avic_post_state_restore(vcpu);
vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
- else
+ } else {
vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
+ }
mark_dirty(vmcb, VMCB_AVIC);
}

@@ -7283,6 +7297,14 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
return false;
}

+static void svm_pre_update_apicv_exec_ctrl(struct kvm_vcpu *vcpu, bool activate)
+{
+ if (activate)
+ avic_setup_access_page(vcpu, false);
+ else
+ avic_destroy_access_page(vcpu);
+}
+
static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
.cpu_has_kvm_support = has_svm,
.disabled_by_bios = is_disabled,
@@ -7360,6 +7382,7 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
.set_virtual_apic_mode = svm_set_virtual_apic_mode,
.get_enable_apicv = svm_get_enable_apicv,
.refresh_apicv_exec_ctrl = svm_refresh_apicv_exec_ctrl,
+ .pre_update_apicv_exec_ctrl = svm_pre_update_apicv_exec_ctrl,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_irr_update = svm_hwapic_irr_update,
.hwapic_isr_update = svm_hwapic_isr_update,
--
1.8.3.1

2019-09-14 06:47:57

by Suthikulpanit, Suravee

Subject: [PATCH v3 06/16] kvm: x86: svm: Add support to activate/deactivate posted interrupts

Introduce an interface to activate/deactivate posted interrupts, and
implement the SVM hooks to toggle AMD IOMMU guest virtual APIC mode.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 4 ++++
arch/x86/kvm/svm.c | 44 +++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 5 +++++
3 files changed, 53 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a50fca1b..624e883 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1193,6 +1193,10 @@ struct kvm_x86_ops {

int (*update_pi_irte)(struct kvm *kvm, unsigned int host_irq,
uint32_t guest_irq, bool set);
+
+ int (*activate_pi_irte)(struct kvm_vcpu *vcpu);
+ int (*deactivate_pi_irte)(struct kvm_vcpu *vcpu);
+
void (*apicv_post_state_restore)(struct kvm_vcpu *vcpu);
bool (*dy_apicv_has_pending_interrupt)(struct kvm_vcpu *vcpu);

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 245bde0..8673617 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5398,6 +5398,48 @@ static int svm_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
return ret;
}

+static int svm_set_pi_irte_mode(struct kvm_vcpu *vcpu, bool activate)
+{
+ int ret = 0;
+ unsigned long flags;
+ struct amd_svm_iommu_ir *ir;
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ if (!kvm_arch_has_assigned_device(vcpu->kvm))
+ return 0;
+
+ /*
+ * Here, we go through the per-vcpu ir_list to update all existing
+ * interrupt remapping table entries targeting this vcpu.
+ */
+ spin_lock_irqsave(&svm->ir_list_lock, flags);
+
+ if (list_empty(&svm->ir_list))
+ goto out;
+
+ list_for_each_entry(ir, &svm->ir_list, node) {
+ if (activate)
+ ret = amd_iommu_activate_guest_mode(ir->data);
+ else
+ ret = amd_iommu_deactivate_guest_mode(ir->data);
+ if (ret)
+ break;
+ }
+out:
+ spin_unlock_irqrestore(&svm->ir_list_lock, flags);
+ return ret;
+}
+
+static int svm_activate_pi_irte(struct kvm_vcpu *vcpu)
+{
+ return svm_set_pi_irte_mode(vcpu, true);
+}
+
+static int svm_deactivate_pi_irte(struct kvm_vcpu *vcpu)
+{
+ return svm_set_pi_irte_mode(vcpu, false);
+}
+
static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -7321,6 +7363,8 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
.deliver_posted_interrupt = svm_deliver_avic_intr,
.dy_apicv_has_pending_interrupt = svm_dy_apicv_has_pending_interrupt,
.update_pi_irte = svm_update_pi_irte,
+ .activate_pi_irte = svm_activate_pi_irte,
+ .deactivate_pi_irte = svm_deactivate_pi_irte,
.setup_mce = svm_setup_mce,

.smi_allowed = svm_smi_allowed,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bc74876..1540629 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7198,6 +7198,9 @@ void kvm_vcpu_activate_apicv(struct kvm_vcpu *vcpu)
kvm_apic_update_apicv(vcpu);

kvm_x86_ops->refresh_apicv_exec_ctrl(vcpu);
+
+ if (kvm_x86_ops->activate_pi_irte)
+ kvm_x86_ops->activate_pi_irte(vcpu);
}
EXPORT_SYMBOL_GPL(kvm_vcpu_activate_apicv);

@@ -7212,6 +7215,8 @@ void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu)

vcpu->arch.apicv_active = false;
kvm_apic_update_apicv(vcpu);
+ if (kvm_x86_ops->deactivate_pi_irte)
+ kvm_x86_ops->deactivate_pi_irte(vcpu);
kvm_x86_ops->refresh_apicv_exec_ctrl(vcpu);
}
EXPORT_SYMBOL_GPL(kvm_vcpu_deactivate_apicv);
--
1.8.3.1

2019-09-14 07:07:09

by Suthikulpanit, Suravee

Subject: [PATCH v3 05/16] kvm: x86: Add APICv activate/deactivate request trace points

Add trace points for sending requests to activate/deactivate APICv.

Suggested-by: Alexander Graf <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/trace.h | 30 ++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 7 +++++++
2 files changed, 37 insertions(+)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index b5c831e..e3f745a 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1297,6 +1297,36 @@
__entry->vcpu_id, __entry->timer_index)
);

+TRACE_EVENT(kvm_apicv_activate_request,
+ TP_PROTO(u32 vcpu),
+ TP_ARGS(vcpu),
+
+ TP_STRUCT__entry(__field(u32, vcpu)),
+
+ TP_fast_assign(__entry->vcpu = vcpu;),
+
+ TP_printk("vcpu=%u", __entry->vcpu)
+);
+
+TRACE_EVENT(kvm_apicv_deactivate_request,
+ TP_PROTO(u32 vcpu, bool disable),
+ TP_ARGS(vcpu, disable),
+
+ TP_STRUCT__entry(
+ __field(u32, vcpu)
+ __field(bool, disable)
+ ),
+
+ TP_fast_assign(
+ __entry->vcpu = vcpu;
+ __entry->disable = disable;
+ ),
+
+ TP_printk("vcpu=%u, disable=%s",
+ __entry->vcpu,
+ __entry->disable ? "disabled" : "suspended")
+);
+
/*
* Tracepoint for AMD AVIC
*/
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 446df2b..bc74876 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7756,6 +7756,8 @@ void kvm_make_apicv_activate_request(struct kvm_vcpu *vcpu)
struct kvm_vcpu *v;
struct kvm *kvm = vcpu->kvm;

+ trace_kvm_apicv_activate_request(vcpu->vcpu_id);
+
mutex_lock(&kvm->arch.apicv_lock);
if (kvm->arch.apicv_state != APICV_SUSPENDED) {
mutex_unlock(&kvm->arch.apicv_lock);
@@ -7782,6 +7784,8 @@ void kvm_make_apicv_deactivate_request(struct kvm_vcpu *vcpu, bool disable)
struct kvm_vcpu *v;
struct kvm *kvm = vcpu->kvm;

+ trace_kvm_apicv_deactivate_request(vcpu->vcpu_id, disable);
+
mutex_lock(&kvm->arch.apicv_lock);
if (kvm->arch.apicv_state != APICV_ACTIVATED) {
mutex_unlock(&kvm->arch.apicv_lock);
@@ -10194,3 +10198,6 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_activate_request);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_deactivate_request);
+
--
1.8.3.1

2019-09-14 07:07:35

by Suthikulpanit, Suravee

Subject: [PATCH v3 07/16] svm: Add support for setup/destroy virtual APIC backing page for AVIC

Activating/deactivating AVIC requires setting/unsetting the memory
region used for the virtual APIC backing page
(APIC_ACCESS_PAGE_PRIVATE_MEMSLOT). So, re-factor
avic_init_access_page() into avic_setup_access_page() and add
srcu_read_lock/unlock, which are needed to allow this function to be
called at run-time.

Also, introduce avic_destroy_access_page() to unset the page when
deactivating AVIC.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/svm.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 47 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 8673617..2e06ee2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1476,7 +1476,9 @@ static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
static void avic_init_vmcb(struct vcpu_svm *svm)
{
struct vmcb *vmcb = svm->vmcb;
- struct kvm_svm *kvm_svm = to_kvm_svm(svm->vcpu.kvm);
+ struct kvm *kvm = svm->vcpu.kvm;
+ struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
+
phys_addr_t bpa = __sme_set(page_to_phys(svm->avic_backing_page));
phys_addr_t lpa = __sme_set(page_to_phys(kvm_svm->avic_logical_id_table_page));
phys_addr_t ppa = __sme_set(page_to_phys(kvm_svm->avic_physical_id_table_page));
@@ -1485,7 +1487,13 @@ static void avic_init_vmcb(struct vcpu_svm *svm)
vmcb->control.avic_logical_id = lpa & AVIC_HPA_MASK;
vmcb->control.avic_physical_id = ppa & AVIC_HPA_MASK;
vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID_COUNT;
- vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
+
+ mutex_lock(&kvm->arch.apicv_lock);
+ if (kvm->arch.apicv_state == APICV_ACTIVATED)
+ vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
+ else
+ vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
+ mutex_unlock(&kvm->arch.apicv_lock);
}

static void init_vmcb(struct vcpu_svm *svm)
@@ -1668,19 +1676,24 @@ static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu,
* field of the VMCB. Therefore, we set up the
* APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (4KB) here.
*/
-static int avic_init_access_page(struct kvm_vcpu *vcpu)
+static int avic_setup_access_page(struct kvm_vcpu *vcpu, bool init)
{
struct kvm *kvm = vcpu->kvm;
int ret = 0;

mutex_lock(&kvm->slots_lock);
+
if (kvm->arch.apic_access_page_done)
goto out;

+ if (!init)
+ srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
ret = __x86_set_memory_region(kvm,
APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
APIC_DEFAULT_PHYS_BASE,
PAGE_SIZE);
+ if (!init)
+ vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
if (ret)
goto out;

@@ -1690,14 +1703,39 @@ static int avic_init_access_page(struct kvm_vcpu *vcpu)
return ret;
}

+static void avic_destroy_access_page(struct kvm_vcpu *vcpu)
+{
+ struct kvm *kvm = vcpu->kvm;
+
+ mutex_lock(&kvm->slots_lock);
+
+ if (!kvm->arch.apic_access_page_done)
+ goto out;
+
+ srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx);
+ __x86_set_memory_region(kvm,
+ APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
+ APIC_DEFAULT_PHYS_BASE,
+ 0);
+ vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
+ kvm->arch.apic_access_page_done = false;
+out:
+ mutex_unlock(&kvm->slots_lock);
+}
+
static int avic_init_backing_page(struct kvm_vcpu *vcpu)
{
- int ret;
+ int ret = 0;
u64 *entry, new_entry;
int id = vcpu->vcpu_id;
+ struct kvm *kvm = vcpu->kvm;
struct vcpu_svm *svm = to_svm(vcpu);

- ret = avic_init_access_page(vcpu);
+ mutex_lock(&kvm->arch.apicv_lock);
+ if (kvm->arch.apicv_state == APICV_ACTIVATED)
+ ret = avic_setup_access_page(vcpu, true);
+ mutex_unlock(&kvm->arch.apicv_lock);
+
if (ret)
return ret;

@@ -2187,7 +2225,10 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
/* We initialize this flag to true to make sure that the is_running
* bit would be set the first time the vcpu is loaded.
*/
- svm->avic_is_running = true;
+ mutex_lock(&kvm->arch.apicv_lock);
+ if (irqchip_in_kernel(kvm) && kvm->arch.apicv_state == APICV_ACTIVATED)
+ svm->avic_is_running = true;
+ mutex_unlock(&kvm->arch.apicv_lock);

svm->nested.hsave = page_address(hsave_page);

--
1.8.3.1

2019-09-14 07:07:35

by Suthikulpanit, Suravee

Subject: [PATCH v3 15/16] kvm: x86: ioapic: Lazy update IOAPIC EOI

The in-kernel IOAPIC does not receive EOIs with AMD SVM AVIC,
since the processor accelerates writes to the APIC EOI register and
does not trap if the interrupt is edge-triggered.

Work around this by lazily checking for a pending APIC EOI when
setting a new IOAPIC irq, and updating the IOAPIC EOI if there is no
pending APIC EOI.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/ioapic.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index c57b7bb..ddb41ee 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -48,6 +48,11 @@
static int ioapic_service(struct kvm_ioapic *vioapic, int irq,
bool line_status);

+static void kvm_ioapic_update_eoi_one(struct kvm_vcpu *vcpu,
+ struct kvm_ioapic *ioapic,
+ int trigger_mode,
+ int pin);
+
static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
unsigned long addr,
unsigned long length)
@@ -174,6 +179,31 @@ static bool rtc_irq_check_coalesced(struct kvm_ioapic *ioapic)
return false;
}

+static void ioapic_lazy_update_eoi(struct kvm_ioapic *ioapic, int irq)
+{
+ int i;
+ struct kvm_vcpu *vcpu;
+ union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
+
+ kvm_for_each_vcpu(i, vcpu, ioapic->kvm) {
+ if (!kvm_apic_match_dest(vcpu, NULL, KVM_APIC_DEST_NOSHORT,
+ entry->fields.dest_id,
+ entry->fields.dest_mode) ||
+ kvm_apic_pending_eoi(vcpu, entry->fields.vector))
+ continue;
+
+ /*
+ * If there is no longer a pending EOI in the LAPICs,
+ * update the EOI for this vector.
+ */
+ rtc_irq_eoi(ioapic, vcpu, entry->fields.vector);
+ kvm_ioapic_update_eoi_one(vcpu, ioapic,
+ entry->fields.trig_mode,
+ irq);
+ break;
+ }
+}
+
static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
int irq_level, bool line_status)
{
@@ -192,6 +222,15 @@ static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
}

/*
+ * In case APICv accelerates the EOI write and does not trap it,
+ * the in-kernel IOAPIC will not be able to receive the EOI.
+ * In this case, we do a lazy update of the pending EOI when
+ * trying to set an IOAPIC irq.
+ */
+ if (kvm_apicv_eoi_accelerate(ioapic->kvm, edge))
+ ioapic_lazy_update_eoi(ioapic, irq);
+
+ /*
* Return 0 for coalesced interrupts; for edge-triggered interrupts,
* this only happens if a previous edge has not been delivered due
* do masking. For level interrupts, the remote_irr field tells
--
1.8.3.1

2019-09-14 07:52:45

by Suthikulpanit, Suravee

Subject: [PATCH v3 16/16] svm: Allow AVIC with in-kernel irqchip mode

Once run-time AVIC activation/deactivation is supported, and the EOI
workaround for AVIC is implemented, we can remove the kernel irqchip
split mode requirement for AVIC.

Hence, remove the check for irqchip split mode when enabling AVIC.

Cc: Radim Krčmář <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/svm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 457ffe1..4c649c0 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5203,7 +5203,7 @@ static void svm_set_virtual_apic_mode(struct kvm_vcpu *vcpu)

static bool svm_get_enable_apicv(struct kvm *kvm)
{
- return avic && irqchip_split(kvm);
+ return avic;
}

static void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
--
1.8.3.1

2019-10-08 18:47:21

by Suthikulpanit, Suravee

Subject: Re: [PATCH v3 00/16] kvm: x86: Support AMD SVM AVIC w/ in-kernel irqchip mode

Ping:

Hi All,

Are there other concerns or suggestions for this series?

Thanks,
Suravee

On 9/13/19 2:00 PM, Suthikulpanit, Suravee wrote:
> [...]

2019-10-09 08:26:11

by Paolo Bonzini

Subject: Re: [PATCH v3 06/16] kvm: x86: svm: Add support to activate/deactivate posted interrupts

On 13/09/19 21:00, Suthikulpanit, Suravee wrote:
> +++ b/arch/x86/kvm/x86.c
> @@ -7198,6 +7198,9 @@ void kvm_vcpu_activate_apicv(struct kvm_vcpu *vcpu)
> kvm_apic_update_apicv(vcpu);
>
> kvm_x86_ops->refresh_apicv_exec_ctrl(vcpu);
> +
> + if (kvm_x86_ops->activate_pi_irte)
> + kvm_x86_ops->activate_pi_irte(vcpu);
> }
> EXPORT_SYMBOL_GPL(kvm_vcpu_activate_apicv);
>
> @@ -7212,6 +7215,8 @@ void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu)
>
> vcpu->arch.apicv_active = false;
> kvm_apic_update_apicv(vcpu);
> + if (kvm_x86_ops->deactivate_pi_irte)
> + kvm_x86_ops->deactivate_pi_irte(vcpu);
> kvm_x86_ops->refresh_apicv_exec_ctrl(vcpu);
> }
> EXPORT_SYMBOL_GPL(kvm_vcpu_deactivate_apicv);

This can be done in refresh_apicv_exec_ctrl.
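
For instance, roughly (just a sketch, reusing svm_set_pi_irte_mode
from this patch):

	static void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
	{
		struct vcpu_svm *svm = to_svm(vcpu);
		struct vmcb *vmcb = svm->vmcb;
		bool activated = kvm_vcpu_apicv_active(vcpu);

		if (activated)
			vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
		else
			vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
		mark_dirty(vmcb, VMCB_AVIC);

		/* Toggle the IOMMU IRTEs here instead of in x86.c. */
		svm_set_pi_irte_mode(vcpu, activated);
	}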

Paolo

2019-10-09 08:39:15

by Paolo Bonzini

Subject: Re: [PATCH v3 04/16] kvm: x86: Add support for activate/de-activate APICv at runtime

On 13/09/19 21:00, Suthikulpanit, Suravee wrote:
> + kvm_for_each_vcpu(i, v, kvm)
> + kvm_clear_request(KVM_REQ_APICV_DEACTIVATE, v);
> +
> + if (kvm_x86_ops->pre_update_apicv_exec_ctrl)
> + kvm_x86_ops->pre_update_apicv_exec_ctrl(vcpu, true);
> +
> + kvm->arch.apicv_state = APICV_ACTIVATED;
> +
> + kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_ACTIVATE);
> +
> + mutex_unlock(&kvm->arch.apicv_lock);

I think a lot of this logic can be simplified. In particular, I would have:

- a single KVM_REQ_APICV_TOGGLE request that unifies
kvm_vcpu_activate_apicv and kvm_vcpu_deactivate_apicv (see the sketch
below). The invariant is that, at the end of the function,
(vcpu->arch.apicv_active == (kvm->arch.apicv_state == APICV_ACTIVATED)).
The apicv_lock then becomes an rwsem, so that kvm_activate_apic and
kvm_deactivate_apic will down_write it, while the processing of
KVM_REQ_APICV_TOGGLE can down_read it.

- the srcu_read_unlock/srcu_read_lock should be hidden in
svm_request_activate_avic/svm_request_deactivate_avic. Everything else
should only take struct kvm*, following what you've started with patch
1. In particular kvm_vcpu_deactivate_apicv should be changed to take a
struct kvm*, so that Hyper-V can do kvm_deactivate_apic(kvm,
APIC_DISABLED). Hyper-V should not care about
srcu_read_lock/srcu_read_unlock. avic_setup_access_page and
avic_destroy_access_page also can be changed to take struct kvm*.
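
Roughly, the KVM_REQ_APICV_TOGGLE processing could look like this
(a sketch with illustrative names):

	/* Runs on each vcpu when KVM_REQ_APICV_TOGGLE is processed. */
	static void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
	{
		struct kvm *kvm = vcpu->kvm;
		bool activate;

		down_read(&kvm->arch.apicv_rwsem);
		activate = kvm->arch.apicv_state == APICV_ACTIVATED;
		if (vcpu->arch.apicv_active != activate) {
			vcpu->arch.apicv_active = activate;
			kvm_apic_update_apicv(vcpu);
			kvm_x86_ops->refresh_apicv_exec_ctrl(vcpu);
		}
		up_read(&kvm->arch.apicv_rwsem);
	}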

Paolo

2019-10-09 09:01:55

by Paolo Bonzini

Subject: Re: [PATCH v3 00/16] kvm: x86: Support AMD SVM AVIC w/ in-kernel irqchip mode

On 08/10/19 20:44, Suthikulpanit, Suravee wrote:
> Ping:
>
> Hi All,
>
> Are there other concerns or suggestions for this series?

I've now reviewed it---sorry for the delay.

Paolo

2019-10-09 09:22:42

by Paolo Bonzini

Subject: Re: [PATCH v3 15/16] kvm: x86: ioapic: Lazy update IOAPIC EOI

On 13/09/19 21:01, Suthikulpanit, Suravee wrote:
> /*
> + * In case APICv accelerates the EOI write and does not trap it,
> + * the in-kernel IOAPIC will not be able to receive the EOI.
> + * In this case, we do a lazy update of the pending EOI when
> + * trying to set an IOAPIC irq.
> + */
> + if (kvm_apicv_eoi_accelerate(ioapic->kvm, edge))
> + ioapic_lazy_update_eoi(ioapic, irq);
> +

This is okay for the RTC, and in fact I suggest that you make it work
like this even for Intel. This will get rid of kvm_apicv_eoi_accelerate
and be nicer overall.

However, it cannot work for the in-kernel PIT, because it is currently
checking ps->irq_ack before kvm_set_irq. Unfortunately, the in-kernel
PIT is relying on the ack notifier to timely queue the pt->worker work
item and reinject the missed tick.

Thus, you cannot enable APICv if ps->reinject is true.

Perhaps you can make kvm->arch.apicv_state a disabled counter? Then
Hyper-V can increment it when enabled, PIT can increment it when
ps->reinject becomes true and decrement it when it becomes false;
finally, svm.c can increment it when an SVM guest is launched and
increment/decrement it around ExtINT handling?
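
Something like this, as a sketch (names invented):

	void kvm_inc_apicv_disabled(struct kvm *kvm)
	{
		mutex_lock(&kvm->arch.apicv_lock);
		if (kvm->arch.apicv_disabled++ == 0)
			kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_DEACTIVATE);
		mutex_unlock(&kvm->arch.apicv_lock);
	}

	void kvm_dec_apicv_disabled(struct kvm *kvm)
	{
		mutex_lock(&kvm->arch.apicv_lock);
		if (--kvm->arch.apicv_disabled == 0)
			kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_ACTIVATE);
		mutex_unlock(&kvm->arch.apicv_lock);
	}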

(This conflicts with some of the suggestions I made earlier, which
implied the existence of apicv_state, but everything should if anything
become easier).

Paolo

2019-10-09 09:56:20

by Roman Kagan

Subject: Re: [PATCH v3 15/16] kvm: x86: ioapic: Lazy update IOAPIC EOI

On Wed, Oct 09, 2019 at 11:21:41AM +0200, Paolo Bonzini wrote:
> On 13/09/19 21:01, Suthikulpanit, Suravee wrote:
> > /*
> > + * In case APICv accelerates the EOI write and does not trap it,
> > + * the in-kernel IOAPIC will not be able to receive the EOI.
> > + * In this case, we do a lazy update of the pending EOI when
> > + * trying to set an IOAPIC irq.
> > + */
> > + if (kvm_apicv_eoi_accelerate(ioapic->kvm, edge))
> > + ioapic_lazy_update_eoi(ioapic, irq);
> > +
>
> This is okay for the RTC, and in fact I suggest that you make it work
> like this even for Intel. This will get rid of kvm_apicv_eoi_accelerate
> and be nicer overall.
>
> However, it cannot work for the in-kernel PIT, because it is currently
> checking ps->irq_ack before kvm_set_irq. Unfortunately, the in-kernel
> PIT is relying on the ack notifier to timely queue the pt->worker work
> item and reinject the missed tick.
>
> Thus, you cannot enable APICv if ps->reinject is true.
>
> Perhaps you can make kvm->arch.apicv_state a disabled counter? Then
> Hyper-V can increment it when enabled, PIT can increment it when
> ps->reinject becomes true and decrement it when it becomes false;
> finally, svm.c can increment it when an SVM guest is launched and
> increment/decrement it around ExtINT handling?

This can benefit Hyper-V emulation too. The point is that it's only
the AutoEOI feature in SynIC that is incompatible with APICv. So the
VM can use APICv until the guest configures its first AutoEOI SINT.
If the hypervisor sets HV_DEPRECATING_AEOI_RECOMMENDED (bit 9) in the
HYPERV_CPUID_ENLIGHTMENT_INFO (0x40000004) cpuid leaf, this may never
happen, so we will not be pessimizing guests on modern hardware by
merely enabling SynIC. I started looking into this recently and would
be happy to piggy-back on this series.
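
For example, something like this in the SINT MSR handler (a sketch;
kvm_inc_apicv_disabled is the counter helper sketched above, and
auto_eoi_sints is an invented field):

	/* On a write to an HV_X64_MSR_SINTx MSR: */
	if ((data & HV_SYNIC_SINT_AUTO_EOI) &&
	    synic->auto_eoi_sints++ == 0)
		kvm_inc_apicv_disabled(vcpu->kvm);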

Roman.

> (This conflicts with some of the suggestions I made earlier, which
> implied the existence of apicv_state, but everything should if anything
> become easier).
>
> Paolo

2019-10-31 15:18:56

by Suthikulpanit, Suravee

Subject: Re: [PATCH v3 15/16] kvm: x86: ioapic: Lazy update IOAPIC EOI

Paolo,

On 10/9/19 4:21 AM, Paolo Bonzini wrote:
> On 13/09/19 21:01, Suthikulpanit, Suravee wrote:
>> /*
>> + * In case APICv accelerates the EOI write and does not trap it,
>> + * the in-kernel IOAPIC will not be able to receive the EOI.
>> + * In this case, we do a lazy update of the pending EOI when
>> + * trying to set an IOAPIC irq.
>> + */
>> + if (kvm_apicv_eoi_accelerate(ioapic->kvm, edge))
>> + ioapic_lazy_update_eoi(ioapic, irq);
>> +
>
> This is okay for the RTC, and in fact I suggest that you make it work
> like this even for Intel. This will get rid of kvm_apicv_eoi_accelerate
> and be nicer overall.
>
> However, it cannot work for the in-kernel PIT, because it is currently
> checking ps->irq_ack before kvm_set_irq. Unfortunately, the in-kernel
> PIT is relying on the ack notifier to timely queue the pt->worker work
> item and reinject the missed tick.
>
> Thus, you cannot enable APICv if ps->reinject is true.
>
> Perhaps you can make kvm->arch.apicv_state a disabled counter? Then
> Hyper-V can increment it when enabled, PIT can increment it when
> ps->reinject becomes true and decrement it when it becomes false;
> finally, svm.c can increment it when an SVM guest is launched and
> increment/decrement it around ExtINT handling?

I have been looking into the disabled counter idea and found a couple
of issues:

* I am seeing more calls to enable_irq_window() than to
interrupt_window_interception(). This results in imbalanced
increments/decrements of the counter.

* APICv can be deactivated for several reasons. Currently, it is
difficult to figure out why, which makes debugging APICv difficult.

What if we change kvm->arch.apicv_state to kvm->arch.apicv_disable_mask
and have each bit represent a reason for deactivating APICv?

For example:
#define APICV_DISABLE_MASK_IRQWIN 0
#define APICV_DISABLE_MASK_HYPERV 1
#define APICV_DISABLE_MASK_PIT_REINJ 2
#define APICV_DISABLE_MASK_NESTED 3

In this case, we activate APICv only if kvm->arch.apicv_disable_mask ==
0. This way, we can find out why APICv is deactivated on a particular VM
at a particular point in time.
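
A sketch of the helpers (actual names TBD, assuming
kvm->arch.apicv_disable_mask is an unsigned long):

	void kvm_request_apicv_deactivate(struct kvm *kvm, int bit)
	{
		mutex_lock(&kvm->arch.apicv_lock);
		if (!kvm->arch.apicv_disable_mask)
			kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_DEACTIVATE);
		__set_bit(bit, &kvm->arch.apicv_disable_mask);
		mutex_unlock(&kvm->arch.apicv_lock);
	}

	void kvm_request_apicv_activate(struct kvm *kvm, int bit)
	{
		mutex_lock(&kvm->arch.apicv_lock);
		__clear_bit(bit, &kvm->arch.apicv_disable_mask);
		if (!kvm->arch.apicv_disable_mask)
			kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_ACTIVATE);
		mutex_unlock(&kvm->arch.apicv_lock);
	}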

Thanks,
Suravee

2019-10-31 23:24:28

by Paolo Bonzini

Subject: Re: [PATCH v3 15/16] kvm: x86: ioapic: Lazy update IOAPIC EOI

On 31/10/19 16:17, Suthikulpanit, Suravee wrote:
> What if we change kvm->arch.apicv_state to kvm->arch.apicv_disable_mask
> and have each bit representing the reason for deactivating APICv.
>
> For example:
> #define APICV_DISABLE_MASK_IRQWIN 0
> #define APICV_DISABLE_MASK_HYPERV 1
> #define APICV_DISABLE_MASK_PIT_REINJ 2
> #define APICV_DISABLE_MASK_NESTED 3
>
> In this case, we activate APICv only if kvm->arch.apicv_disable_mask ==
> 0. This way, we can find out why APICv is deactivated on a particular VM
> at a particular point in time.

Yes, that also works for me if it makes for an easier implementation.

Paolo