Commit 67034bb9dd5e ("KVM: SVM: Add irqchip_split() checks before
enabling AVIC") was introduced to fix miscellaneous boot-hang issues
when enabling AVIC. This is mainly because the AVIC hardware does not
#vmexit on writes to the LAPIC EOI register, which leaves the in-kernel
PIC and IOAPIC waiting for an EOI and unable to inject new interrupts
(e.g. PIT, RTC).
This limits AVIC to kernel_irqchip=split mode, which is not currently
enabled by default, and also requires user space to support the split
irqchip model, which might not be the case.
The goal of this series is to enable AVIC to work in both irqchip modes
by allowing AVIC to be deactivated temporarily at runtime, falling back
to legacy interrupt injection mode (w/ vINTR and interrupt windows)
when needed, and then re-enabling it subsequently.
A similar approach is also used to handle Hyper-V SynIC in
commit 5c919412fe61 ("kvm/x86: Hyper-V synthetic interrupt controller"),
where APICv is permanently disabled at runtime (this is currently broken
for AVIC, and fixed by this series).
This series contains several parts:
* Part 1: patch 1,2
Code clean up, refactor, and introduce helper functions
* Part 2: patch 3
Introduce APICv deactivate bits to keep track of APICv state
for each vm.
* Part 3: patch 4-9
Add support for activate/deactivate APICv at runtime
* Part 4: patch 10-13:
Add support for various cases where APICv needs to be deactivated
* Part 5: patch 14-16:
Introduce in-kernel IOAPIC workaround for AVIC EOI
* Part 6: patch 17
Allow enabling AVIC w/ kernel_irqchip=on
Pre-requisite Patch:
* commit b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code")
(https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/
?h=next&id=b9c6ff94e43a0ee053e0c1d983fba1ac4953b762)
This series has been tested against v5.3 as follows:
* Booting Linux and Windows Server 2019 VMs with up to 240 vcpus
and FreeBSD with up to 128 vcpus, w/ the qemu options "kernel-irqchip=on"
and "-no-hpet".
* Pass-through Intel 10GbE NIC and run netperf in the VM.
Changes from V3: (https://lkml.org/lkml/2019/9/13/871)
(Per Paolo's comments)
* Replace struct kvm_vcpu with struct kvm in various interfaces
* Replace KVM_REQ_APICV_ACTIVATE/DEACTIVATE with KVM_REQ_APICV_UPDATE request
* Replace APICv state enum (introduced in V3) w/ deactivate bits to track APICv state
* Remove kvm_apicv_eoi_accelerate() (introduced in V3)
* Deactivate APICv when using PIT re-inject mode
* Consolidate srcu_read_unlock/lock into svm_request_update_avic()
Suravee Suthikulpanit (17):
kvm: x86: Modify kvm_x86_ops.get_enable_apicv() to use struct kvm parameter
kvm: lapic: Introduce APICv update helper function
kvm: x86: Introduce APICv deactivate bits
kvm: x86: Add support for activate/de-activate APICv at runtime
kvm: x86: Add APICv activate/deactivate request trace points
kvm: x86: svm: Add support to activate/deactivate posted interrupts
svm: Add support for setup/destroy virtual APIC backing page for AVIC
kvm: x86: Introduce APICv pre-update hook
svm: Add support for activate/deactivate AVIC at runtime
kvm: x86: hyperv: Use APICv update request interface
svm: Deactivate AVIC when launching guest with nested SVM support
svm: Temporarily deactivate AVIC during ExtINT handling
kvm: i8254: Deactivate APICv when using in-kernel PIT re-injection mode.
kvm: lapic: Clean up APIC predefined macros
kvm: ioapic: Refactor kvm_ioapic_update_eoi()
kvm: ioapic: Lazy update IOAPIC EOI
svm: Allow AVIC with in-kernel irqchip mode
arch/x86/include/asm/kvm_host.h | 18 ++++-
arch/x86/kvm/hyperv.c | 5 +-
arch/x86/kvm/i8254.c | 10 +++
arch/x86/kvm/ioapic.c | 149 +++++++++++++++++++++++++---------------
arch/x86/kvm/lapic.c | 35 ++++++----
arch/x86/kvm/lapic.h | 2 +
arch/x86/kvm/svm.c | 136 +++++++++++++++++++++++++++++++-----
arch/x86/kvm/trace.h | 19 +++++
arch/x86/kvm/vmx/vmx.c | 6 +-
arch/x86/kvm/x86.c | 61 +++++++++++++---
10 files changed, 343 insertions(+), 98 deletions(-)
--
1.8.3.1
Generally, APICv is enabled/disabled for all vcpus in the VM in the same
manner. So, get_enable_apicv() should represent the APICv status of the VM
instead of each vcpu.
Modify kvm_x86_ops.get_enable_apicv() to take struct kvm as a parameter
instead of struct kvm_vcpu.
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/svm.c | 4 ++--
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/kvm/x86.c | 2 +-
4 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bdc16b0..843799b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1088,7 +1088,7 @@ struct kvm_x86_ops {
void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
- bool (*get_enable_apicv)(struct kvm_vcpu *vcpu);
+ bool (*get_enable_apicv)(struct kvm *kvm);
void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
void (*hwapic_isr_update)(struct kvm_vcpu *vcpu, int isr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index e036807..7090306 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5143,9 +5143,9 @@ static void svm_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
return;
}
-static bool svm_get_enable_apicv(struct kvm_vcpu *vcpu)
+static bool svm_get_enable_apicv(struct kvm *kvm)
{
- return avic && irqchip_split(vcpu->kvm);
+ return avic && irqchip_split(kvm);
}
static void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c030c96..e4faa00 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3644,7 +3644,7 @@ void pt_update_intercept_for_msr(struct vcpu_vmx *vmx)
}
}
-static bool vmx_get_enable_apicv(struct kvm_vcpu *vcpu)
+static bool vmx_get_enable_apicv(struct kvm *kvm)
{
return enable_apicv;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 91602d3..2341f48 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9217,7 +9217,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
goto fail_free_pio_data;
if (irqchip_in_kernel(vcpu->kvm)) {
- vcpu->arch.apicv_active = kvm_x86_ops->get_enable_apicv(vcpu);
+ vcpu->arch.apicv_active = kvm_x86_ops->get_enable_apicv(vcpu->kvm);
r = kvm_create_lapic(vcpu, lapic_timer_advance_ns);
if (r < 0)
goto fail_mmu_destroy;
--
1.8.3.1
Currently, after a VM boots with APICv enabled, APICv could still be
deactivated for various reasons (e.g. Hyper-V SynIC).
Introduce KVM APICv deactivate bits along with a new variable
struct kvm_arch.apicv_deact_msk to help keep track of why APICv is
deactivated for each VM.
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 5 +++++
arch/x86/kvm/svm.c | 3 +++
arch/x86/kvm/vmx/vmx.c | 4 ++++
arch/x86/kvm/x86.c | 22 +++++++++++++++++++++-
4 files changed, 33 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 843799b..1c05363 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -852,6 +852,8 @@ enum kvm_irqchip_mode {
KVM_IRQCHIP_SPLIT, /* created with KVM_CAP_SPLIT_IRQCHIP */
};
+#define APICV_DEACT_BIT_DISABLE 0
+
struct kvm_arch {
unsigned long n_used_mmu_pages;
unsigned long n_requested_mmu_pages;
@@ -881,6 +883,7 @@ struct kvm_arch {
struct kvm_apic_map *apic_map;
bool apic_access_page_done;
+ unsigned long apicv_deact_msk;
gpa_t wall_clock;
@@ -1416,6 +1419,8 @@ gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
struct x86_exception *exception);
void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu);
+bool kvm_apicv_activated(struct kvm *kvm);
+void kvm_apicv_init(struct kvm *kvm, bool enable);
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7090306..a0caf66 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1985,6 +1985,9 @@ static int avic_vm_init(struct kvm *kvm)
hash_add(svm_vm_data_hash, &kvm_svm->hnode, kvm_svm->avic_vm_id);
spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags);
+ /* Enable KVM APICv support */
+ kvm_apicv_init(kvm, true);
+
return 0;
free_avic:
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e4faa00..28b97fb 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6775,6 +6775,10 @@ static int vmx_vm_init(struct kvm *kvm)
break;
}
}
+
+ /* Enable KVM APICv support */
+ kvm_apicv_init(kvm, vmx_get_enable_apicv(kvm));
+
return 0;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2341f48..70a70a1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7201,6 +7201,21 @@ void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu)
kvm_x86_ops->refresh_apicv_exec_ctrl(vcpu);
}
+bool kvm_apicv_activated(struct kvm *kvm)
+{
+ return (READ_ONCE(kvm->arch.apicv_deact_msk) == 0);
+}
+EXPORT_SYMBOL_GPL(kvm_apicv_activated);
+
+void kvm_apicv_init(struct kvm *kvm, bool enable)
+{
+ if (enable)
+ clear_bit(APICV_DEACT_BIT_DISABLE, &kvm->arch.apicv_deact_msk);
+ else
+ set_bit(APICV_DEACT_BIT_DISABLE, &kvm->arch.apicv_deact_msk);
+}
+EXPORT_SYMBOL_GPL(kvm_apicv_init);
+
static void kvm_sched_yield(struct kvm *kvm, unsigned long dest_id)
{
struct kvm_vcpu *target = NULL;
@@ -9217,13 +9232,15 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
goto fail_free_pio_data;
if (irqchip_in_kernel(vcpu->kvm)) {
- vcpu->arch.apicv_active = kvm_x86_ops->get_enable_apicv(vcpu->kvm);
r = kvm_create_lapic(vcpu, lapic_timer_advance_ns);
if (r < 0)
goto fail_mmu_destroy;
} else
static_key_slow_inc(&kvm_no_apic_vcpu);
+ if (irqchip_in_kernel(vcpu->kvm) && kvm_apicv_activated(vcpu->kvm))
+ vcpu->arch.apicv_active = kvm_x86_ops->get_enable_apicv(vcpu->kvm);
+
vcpu->arch.mce_banks = kzalloc(KVM_MAX_MCE_BANKS * sizeof(u64) * 4,
GFP_KERNEL_ACCOUNT);
if (!vcpu->arch.mce_banks) {
@@ -9322,6 +9339,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
kvm_page_track_init(kvm);
kvm_mmu_init_vm(kvm);
+ /* Default to APICv disable */
+ kvm_apicv_init(kvm, false);
+
if (kvm_x86_ops->vm_init)
return kvm_x86_ops->vm_init(kvm);
--
1.8.3.1
Certain runtime conditions require APICv to be temporarily deactivated.
However, the current implementation only supports permanently deactivating
APICv at runtime (mainly used when running Hyper-V SynIC).
In addition, on AMD, when activating/deactivating APICv at runtime,
all vcpus in the VM have to operate in the same APICv mode, which
requires the requesting (main) vcpu to notify the others.
So, introduce the following:
* A new KVM_REQ_APICV_UPDATE request bit
* Interfaces to request all vcpus to update (activate/deactivate) APICv
* Interface to update APICV-related parameters for each vcpu
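As a rough usage sketch (illustrative only; the reason bit below is a
placeholder, not one defined by this series), a caller that needs APICv
off while some condition holds would do:

	/* deactivate APICv VM-wide for a (hypothetical) reason bit */
	kvm_request_apicv_update(kvm, false, APICV_DEACT_BIT_EXAMPLE);
	...
	/* condition cleared: request VM-wide re-activation */
	kvm_request_apicv_update(kvm, true, APICV_DEACT_BIT_EXAMPLE);

Each vcpu then observes KVM_REQ_APICV_UPDATE in vcpu_enter_guest() and
calls kvm_vcpu_update_apicv() to refresh its own APICv state.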
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 5 +++++
arch/x86/kvm/x86.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 35 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1c05363..3b94f42 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -78,6 +78,8 @@
#define KVM_REQ_HV_STIMER KVM_ARCH_REQ(22)
#define KVM_REQ_LOAD_EOI_EXITMAP KVM_ARCH_REQ(23)
#define KVM_REQ_GET_VMCS12_PAGES KVM_ARCH_REQ(24)
+#define KVM_REQ_APICV_UPDATE \
+ KVM_ARCH_REQ_FLAGS(25, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define CR0_RESERVED_BITS \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
@@ -1421,6 +1423,9 @@ gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu);
bool kvm_apicv_activated(struct kvm *kvm);
void kvm_apicv_init(struct kvm *kvm, bool enable);
+void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
+void kvm_request_apicv_update(struct kvm *kvm, bool activate,
+ unsigned long bit);
int kvm_emulate_hypercall(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 70a70a1..394695a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -26,6 +26,7 @@
#include "cpuid.h"
#include "pmu.h"
#include "hyperv.h"
+#include "lapic.h"
#include <linux/clocksource.h>
#include <linux/interrupt.h>
@@ -7730,6 +7731,33 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
kvm_make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
}
+void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
+{
+ if (!lapic_in_kernel(vcpu))
+ return;
+
+ vcpu->arch.apicv_active = kvm_apicv_activated(vcpu->kvm);
+ kvm_apic_update_apicv(vcpu);
+ kvm_x86_ops->refresh_apicv_exec_ctrl(vcpu);
+}
+EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);
+
+void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
+{
+ if (activate) {
+ if (!test_and_clear_bit(bit, &kvm->arch.apicv_deact_msk) ||
+ !kvm_apicv_activated(kvm))
+ return;
+ } else {
+ if (test_and_set_bit(bit, &kvm->arch.apicv_deact_msk) ||
+ kvm_apicv_activated(kvm))
+ return;
+ }
+
+ kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
+}
+EXPORT_SYMBOL_GPL(kvm_request_apicv_update);
+
static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
{
if (!kvm_apic_present(vcpu))
@@ -7916,6 +7944,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
*/
if (kvm_check_request(KVM_REQ_HV_STIMER, vcpu))
kvm_hv_process_stimers(vcpu);
+ if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu))
+ kvm_vcpu_update_apicv(vcpu);
}
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
--
1.8.3.1
AMD SVM AVIC accelerates EOI writes and does not trap them. This causes
in-kernel PIT re-injection mode to fail since it relies on the irq-ack
notifier mechanism. So, APICv is activated only when the in-kernel PIT
is in discard mode, e.g. w/ the qemu option:
-global kvm-pit.lost_tick_policy=discard
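For illustration, a complete invocation using this policy might look
like the following (a minimal sketch; every flag other than the
lost_tick_policy is just an assumed example setup, not a requirement):

	qemu-system-x86_64 -enable-kvm \
		-machine q35,kernel-irqchip=on \
		-global kvm-pit.lost_tick_policy=discard \
		-cpu host -smp 8 -m 4096 guest-disk.img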
Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/i8254.c | 10 ++++++++++
arch/x86/kvm/svm.c | 8 +++++++-
3 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fe61269..460f7a4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -858,6 +858,7 @@ enum kvm_irqchip_mode {
#define APICV_DEACT_BIT_HYPERV 1
#define APICV_DEACT_BIT_NESTED 2
#define APICV_DEACT_BIT_IRQWIN 3
+#define APICV_DEACT_BIT_PIT_REINJ 4
struct kvm_arch {
unsigned long n_used_mmu_pages;
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 4a6dc54..3f77fda 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -295,12 +295,22 @@ void kvm_pit_set_reinject(struct kvm_pit *pit, bool reinject)
if (atomic_read(&ps->reinject) == reinject)
return;
+ /*
+ * AMD SVM AVIC accelerates EOI write and does not trap.
+	 * This causes in-kernel PIT re-inject mode to fail
+ * since it checks ps->irq_ack before kvm_set_irq()
+ * and relies on the ack notifier to timely queue
+	 * the pt->worker work item and reinject the missed tick.
+ * So, deactivate APICv when PIT is in reinject mode.
+ */
if (reinject) {
+ kvm_request_apicv_update(kvm, false, APICV_DEACT_BIT_PIT_REINJ);
/* The initial state is preserved while ps->reinject == 0. */
kvm_pit_reset_reinject(pit);
kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
} else {
+ kvm_request_apicv_update(kvm, true, APICV_DEACT_BIT_PIT_REINJ);
kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
}
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0e7ff04..9812feb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1679,7 +1679,13 @@ static int avic_update_access_page(struct kvm *kvm, bool activate)
int ret = 0;
mutex_lock(&kvm->slots_lock);
- if (kvm->arch.apic_access_page_done == activate)
+ /*
+ * During kvm_destroy_vm(), kvm_pit_set_reinject() could trigger
+	 * APICv mode change, which updates APIC_ACCESS_PAGE_PRIVATE_MEMSLOT
+ * memory region. So, we need to ensure that kvm->mm == current->mm.
+ */
+ if ((kvm->arch.apic_access_page_done == activate) ||
+ (kvm->mm != current->mm))
goto out;
ret = __x86_set_memory_region(kvm,
--
1.8.3.1
Refactor the IOAPIC EOI handling code in preparation for a subsequent
patch. There is no functional change.
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/ioapic.c | 110 +++++++++++++++++++++++++-------------------------
1 file changed, 56 insertions(+), 54 deletions(-)
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index d859ae8..c57b7bb 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -151,10 +151,16 @@ static void kvm_rtc_eoi_tracking_restore_all(struct kvm_ioapic *ioapic)
__rtc_irq_eoi_tracking_restore_one(vcpu);
}
-static void rtc_irq_eoi(struct kvm_ioapic *ioapic, struct kvm_vcpu *vcpu)
+static void rtc_irq_eoi(struct kvm_ioapic *ioapic, struct kvm_vcpu *vcpu,
+ int vector)
{
- if (test_and_clear_bit(vcpu->vcpu_id,
- ioapic->rtc_status.dest_map.map)) {
+ struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
+
+ /* RTC special handling */
+ if (test_bit(vcpu->vcpu_id, dest_map->map) &&
+ (vector == dest_map->vectors[vcpu->vcpu_id]) &&
+ (test_and_clear_bit(vcpu->vcpu_id,
+ ioapic->rtc_status.dest_map.map))) {
--ioapic->rtc_status.pending_eoi;
rtc_status_pending_eoi_check_valid(ioapic);
}
@@ -415,72 +421,68 @@ static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
}
#define IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT 10000
-
-static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
- struct kvm_ioapic *ioapic, int vector, int trigger_mode)
+static void kvm_ioapic_update_eoi_one(struct kvm_vcpu *vcpu,
+ struct kvm_ioapic *ioapic,
+ int trigger_mode,
+ int pin)
{
- struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
struct kvm_lapic *apic = vcpu->arch.apic;
- int i;
-
- /* RTC special handling */
- if (test_bit(vcpu->vcpu_id, dest_map->map) &&
- vector == dest_map->vectors[vcpu->vcpu_id])
- rtc_irq_eoi(ioapic, vcpu);
-
- for (i = 0; i < IOAPIC_NUM_PINS; i++) {
- union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
-
- if (ent->fields.vector != vector)
- continue;
+ union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[pin];
- /*
- * We are dropping lock while calling ack notifiers because ack
- * notifier callbacks for assigned devices call into IOAPIC
- * recursively. Since remote_irr is cleared only after call
- * to notifiers if the same vector will be delivered while lock
- * is dropped it will be put into irr and will be delivered
- * after ack notifier returns.
- */
- spin_unlock(&ioapic->lock);
- kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
- spin_lock(&ioapic->lock);
+ /*
+ * We are dropping lock while calling ack notifiers because ack
+ * notifier callbacks for assigned devices call into IOAPIC
+ * recursively. Since remote_irr is cleared only after call
+ * to notifiers if the same vector will be delivered while lock
+ * is dropped it will be put into irr and will be delivered
+ * after ack notifier returns.
+ */
+ spin_unlock(&ioapic->lock);
+ kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, pin);
+ spin_lock(&ioapic->lock);
- if (trigger_mode != IOAPIC_LEVEL_TRIG ||
- kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
- continue;
+ if (trigger_mode != IOAPIC_LEVEL_TRIG ||
+ kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+ return;
- ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
- ent->fields.remote_irr = 0;
- if (!ent->fields.mask && (ioapic->irr & (1 << i))) {
- ++ioapic->irq_eoi[i];
- if (ioapic->irq_eoi[i] == IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT) {
- /*
- * Real hardware does not deliver the interrupt
- * immediately during eoi broadcast, and this
- * lets a buggy guest make slow progress
- * even if it does not correctly handle a
- * level-triggered interrupt. Emulate this
- * behavior if we detect an interrupt storm.
- */
- schedule_delayed_work(&ioapic->eoi_inject, HZ / 100);
- ioapic->irq_eoi[i] = 0;
- trace_kvm_ioapic_delayed_eoi_inj(ent->bits);
- } else {
- ioapic_service(ioapic, i, false);
- }
+ ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
+ ent->fields.remote_irr = 0;
+ if (!ent->fields.mask && (ioapic->irr & (1 << pin))) {
+ ++ioapic->irq_eoi[pin];
+ if (ioapic->irq_eoi[pin] == IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT) {
+ /*
+ * Real hardware does not deliver the interrupt
+ * immediately during eoi broadcast, and this
+ * lets a buggy guest make slow progress
+ * even if it does not correctly handle a
+ * level-triggered interrupt. Emulate this
+ * behavior if we detect an interrupt storm.
+ */
+ schedule_delayed_work(&ioapic->eoi_inject, HZ / 100);
+ ioapic->irq_eoi[pin] = 0;
+ trace_kvm_ioapic_delayed_eoi_inj(ent->bits);
} else {
- ioapic->irq_eoi[i] = 0;
+ ioapic_service(ioapic, pin, false);
}
+ } else {
+ ioapic->irq_eoi[pin] = 0;
}
}
void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
{
+ int i;
struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
spin_lock(&ioapic->lock);
- __kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode);
+ rtc_irq_eoi(ioapic, vcpu, vector);
+ for (i = 0; i < IOAPIC_NUM_PINS; i++) {
+ union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
+
+ if (ent->fields.vector != vector)
+ continue;
+ kvm_ioapic_update_eoi_one(vcpu, ioapic, trigger_mode, i);
+ }
spin_unlock(&ioapic->lock);
}
--
1.8.3.1
AMD AVIC does not support ExtINT. Therefore, AVIC must be temporarily
deactivated so that we fall back to using legacy interrupt injection via
vINTR and the interrupt window.
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm.c | 37 ++++++++++++++++++++++++++++++++++---
2 files changed, 35 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55d6476..fe61269 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -857,6 +857,7 @@ enum kvm_irqchip_mode {
#define APICV_DEACT_BIT_DISABLE 0
#define APICV_DEACT_BIT_HYPERV 1
#define APICV_DEACT_BIT_NESTED 2
+#define APICV_DEACT_BIT_IRQWIN 3
struct kvm_arch {
unsigned long n_used_mmu_pages;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 7f59b1a..0e7ff04 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -388,6 +388,8 @@ struct amd_svm_iommu_ir {
static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
static void svm_complete_interrupts(struct vcpu_svm *svm);
+static void svm_request_update_avic(struct kvm_vcpu *vcpu, bool activate);
+static bool svm_get_enable_apicv(struct kvm *kvm);
static inline void avic_post_state_restore(struct kvm_vcpu *vcpu);
static int nested_svm_exit_handled(struct vcpu_svm *svm);
@@ -4479,6 +4481,15 @@ static int interrupt_window_interception(struct vcpu_svm *svm)
{
kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
svm_clear_vintr(svm);
+
+ /*
+ * For AVIC, the only reason to end up here is ExtINTs.
+ * In this case AVIC was temporarily disabled for
+ * requesting the IRQ window and we have to re-enable it.
+ */
+ if (svm_get_enable_apicv(svm->vcpu.kvm))
+ svm_request_update_avic(&svm->vcpu, true);
+
svm->vmcb->control.int_ctl &= ~V_IRQ_MASK;
mark_dirty(svm->vmcb, VMCB_INTR);
++svm->vcpu.stat.irq_window_exits;
@@ -5166,6 +5177,21 @@ static void svm_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
{
}
+static void svm_request_update_avic(struct kvm_vcpu *vcpu, bool activate)
+{
+ if (!lapic_in_kernel(vcpu))
+ return;
+ /*
+ * kvm_request_apicv_update() expects a prior read unlock
+	 * on the kvm->srcu since it subsequently calls read lock
+ * and re-unlock in __x86_set_memory_region()
+ * when updating APIC_ACCESS_PAGE_PRIVATE_MEMSLOT.
+ */
+ srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
+ kvm_request_apicv_update(vcpu->kvm, activate, APICV_DEACT_BIT_IRQWIN);
+ vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+}
+
static int svm_set_pi_irte_mode(struct kvm_vcpu *vcpu, bool activate)
{
int ret = 0;
@@ -5504,9 +5530,6 @@ static void enable_irq_window(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
- if (kvm_vcpu_apicv_active(vcpu))
- return;
-
/*
* In case GIF=0 we can't rely on the CPU to tell us when GIF becomes
* 1, because that's a separate STGI/VMRUN intercept. The next time we
@@ -5516,6 +5539,14 @@ static void enable_irq_window(struct kvm_vcpu *vcpu)
* window under the assumption that the hardware will set the GIF.
*/
if ((vgif_enabled(svm) || gif_set(svm)) && nested_svm_intr(svm)) {
+ /*
+ * IRQ window is not needed when AVIC is enabled,
+ * unless we have pending ExtINT since it cannot be injected
+ * via AVIC. In such case, we need to temporarily disable AVIC,
+ * and fallback to injecting IRQ via V_IRQ.
+ */
+ if (kvm_vcpu_apicv_active(vcpu))
+ svm_request_update_avic(vcpu, false);
svm_set_vintr(svm);
svm_inject_irq(svm, 0x0);
}
--
1.8.3.1
Move these duplicated predefined macros to the header file so that
they can be re-used in other places.
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/lapic.c | 13 +++++--------
arch/x86/kvm/lapic.h | 1 +
2 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4654230..07e154e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -56,9 +56,6 @@
#define APIC_VERSION (0x14UL | ((KVM_APIC_LVT_NUM - 1) << 16))
#define LAPIC_MMIO_LENGTH (1 << 12)
/* followed define is not in apicdef.h */
-#define APIC_SHORT_MASK 0xc0000
-#define APIC_DEST_NOSHORT 0x0
-#define APIC_DEST_MASK 0x800
#define MAX_APIC_VECTOR 256
#define APIC_VECTORS_PER_REG 32
@@ -575,9 +572,9 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
irq.level = (icr & APIC_INT_ASSERT) != 0;
irq.trig_mode = icr & APIC_INT_LEVELTRIG;
- if (icr & APIC_DEST_MASK)
+ if (icr & KVM_APIC_DEST_MASK)
return -KVM_EINVAL;
- if (icr & APIC_SHORT_MASK)
+ if (icr & KVM_APIC_SHORT_MASK)
return -KVM_EINVAL;
rcu_read_lock();
@@ -808,7 +805,7 @@ bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
ASSERT(target);
switch (short_hand) {
- case APIC_DEST_NOSHORT:
+ case KVM_APIC_DEST_NOSHORT:
if (dest_mode == APIC_DEST_PHYSICAL)
return kvm_apic_match_physical_addr(target, mda);
else
@@ -1206,10 +1203,10 @@ static void apic_send_ipi(struct kvm_lapic *apic)
irq.vector = icr_low & APIC_VECTOR_MASK;
irq.delivery_mode = icr_low & APIC_MODE_MASK;
- irq.dest_mode = icr_low & APIC_DEST_MASK;
+ irq.dest_mode = icr_low & KVM_APIC_DEST_MASK;
irq.level = (icr_low & APIC_INT_ASSERT) != 0;
irq.trig_mode = icr_low & APIC_INT_LEVELTRIG;
- irq.shorthand = icr_low & APIC_SHORT_MASK;
+ irq.shorthand = icr_low & KVM_APIC_SHORT_MASK;
irq.msi_redir_hint = false;
if (apic_x2apic_mode(apic))
irq.dest_id = icr_high;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 36a5271..ba13c98e 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -10,6 +10,7 @@
#define KVM_APIC_SIPI 1
#define KVM_APIC_LVT_NUM 6
+#define KVM_APIC_DEST_NOSHORT 0x0
#define KVM_APIC_SHORT_MASK 0xc0000
#define KVM_APIC_DEST_MASK 0x800
--
1.8.3.1
The in-kernel IOAPIC does not receive EOIs with AMD SVM AVIC
since the processor accelerates writes to the APIC EOI register and
does not trap them if the interrupt is edge-triggered.
Work around this by lazily checking for a pending APIC EOI at the time
a new IOAPIC irq is set, and updating the IOAPIC EOI if no APIC EOI
is pending.
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/ioapic.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index c57b7bb..6fdd88f 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -48,6 +48,11 @@
static int ioapic_service(struct kvm_ioapic *vioapic, int irq,
bool line_status);
+static void kvm_ioapic_update_eoi_one(struct kvm_vcpu *vcpu,
+ struct kvm_ioapic *ioapic,
+ int trigger_mode,
+ int pin);
+
static unsigned long ioapic_read_indirect(struct kvm_ioapic *ioapic,
unsigned long addr,
unsigned long length)
@@ -174,6 +179,31 @@ static bool rtc_irq_check_coalesced(struct kvm_ioapic *ioapic)
return false;
}
+static void ioapic_lazy_update_eoi(struct kvm_ioapic *ioapic, int irq)
+{
+ int i;
+ struct kvm_vcpu *vcpu;
+ union kvm_ioapic_redirect_entry *entry = &ioapic->redirtbl[irq];
+
+ kvm_for_each_vcpu(i, vcpu, ioapic->kvm) {
+ if (!kvm_apic_match_dest(vcpu, NULL, KVM_APIC_DEST_NOSHORT,
+ entry->fields.dest_id,
+ entry->fields.dest_mode) ||
+ kvm_apic_pending_eoi(vcpu, entry->fields.vector))
+ continue;
+
+ /*
+	 * If the LAPIC no longer has a pending EOI, update
+	 * the EOI for this vector.
+ */
+ rtc_irq_eoi(ioapic, vcpu, entry->fields.vector);
+ kvm_ioapic_update_eoi_one(vcpu, ioapic,
+ entry->fields.trig_mode,
+ irq);
+ break;
+ }
+}
+
static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
int irq_level, bool line_status)
{
@@ -192,6 +222,15 @@ static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
}
/*
+	 * AMD SVM AVIC accelerates EOI writes and does not trap them, so
+	 * the in-kernel IOAPIC will not be able to receive the EOI.
+ * In this case, we do lazy update of the pending EOI when
+ * trying to set IOAPIC irq.
+ */
+ if (kvm_apicv_activated(ioapic->kvm))
+ ioapic_lazy_update_eoi(ioapic, irq);
+
+ /*
* Return 0 for coalesced interrupts; for edge-triggered interrupts,
* this only happens if a previous edge has not been delivered due
* do masking. For level interrupts, the remote_irr field tells
--
1.8.3.1
Once runtime AVIC activation/deactivation is supported, and the EOI
workaround for AVIC is implemented, we can remove the kernel irqchip split
mode requirement for AVIC.
requirement for AVIC.
Hence, remove the check for irqchip split mode when enabling AVIC.
Cc: Radim Krčmář <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/svm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9812feb..be9c1d3 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5172,7 +5172,7 @@ static void svm_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
static bool svm_get_enable_apicv(struct kvm *kvm)
{
- return avic && irqchip_split(kvm);
+ return avic;
}
static void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
--
1.8.3.1
Since disabling APICv has to be done for all vcpus on AMD-based
systems, adopt the newly introduced kvm_request_apicv_update()
interface, and introduce a new APICv deactivate bit for Hyper-V.
Also, remove kvm_vcpu_deactivate_apicv() since it is no longer used.
Cc: Roman Kagan <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/hyperv.c | 5 +++--
arch/x86/kvm/x86.c | 13 -------------
3 files changed, 4 insertions(+), 16 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f93d347..a6475fd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -855,6 +855,7 @@ enum kvm_irqchip_mode {
};
#define APICV_DEACT_BIT_DISABLE 0
+#define APICV_DEACT_BIT_HYPERV 1
struct kvm_arch {
unsigned long n_used_mmu_pages;
@@ -1421,7 +1422,6 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
struct x86_exception *exception);
-void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu);
bool kvm_apicv_activated(struct kvm *kvm);
void kvm_apicv_init(struct kvm *kvm, bool enable);
void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index fff790a..aa93b46 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -772,9 +772,10 @@ int kvm_hv_activate_synic(struct kvm_vcpu *vcpu, bool dont_zero_synic_pages)
/*
* Hyper-V SynIC auto EOI SINT's are
- * not compatible with APICV, so deactivate APICV
+ * not compatible with APICV, so request
+ * to deactivate APICV permanently.
*/
- kvm_vcpu_deactivate_apicv(vcpu);
+ kvm_request_apicv_update(vcpu->kvm, false, APICV_DEACT_BIT_HYPERV);
synic->active = true;
synic->dont_zero_synic_pages = dont_zero_synic_pages;
return 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c09ff78..0aa2833 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7189,19 +7189,6 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int apicid)
kvm_irq_delivery_to_apic(kvm, NULL, &lapic_irq, NULL);
}
-void kvm_vcpu_deactivate_apicv(struct kvm_vcpu *vcpu)
-{
- if (!lapic_in_kernel(vcpu)) {
- WARN_ON_ONCE(vcpu->arch.apicv_active);
- return;
- }
- if (!vcpu->arch.apicv_active)
- return;
-
- vcpu->arch.apicv_active = false;
- kvm_x86_ops->refresh_apicv_exec_ctrl(vcpu);
-}
-
bool kvm_apicv_activated(struct kvm *kvm)
{
return (READ_ONCE(kvm->arch.apicv_deact_msk) == 0);
--
1.8.3.1
Add the necessary logic to support activating/deactivating AVIC at runtime.
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/svm.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 21203a6..5b90458 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -388,6 +388,7 @@ struct amd_svm_iommu_ir {
static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
static void svm_flush_tlb(struct kvm_vcpu *vcpu, bool invalidate_gpa);
static void svm_complete_interrupts(struct vcpu_svm *svm);
+static inline void avic_post_state_restore(struct kvm_vcpu *vcpu);
static int nested_svm_exit_handled(struct vcpu_svm *svm);
static int nested_svm_intercept(struct vcpu_svm *svm);
@@ -1485,7 +1486,10 @@ static void avic_init_vmcb(struct vcpu_svm *svm)
vmcb->control.avic_logical_id = lpa & AVIC_HPA_MASK;
vmcb->control.avic_physical_id = ppa & AVIC_HPA_MASK;
vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID_COUNT;
- vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
+ if (kvm_apicv_activated(svm->vcpu.kvm))
+ vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
+ else
+ vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
}
static void init_vmcb(struct vcpu_svm *svm)
@@ -1696,7 +1700,8 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
int id = vcpu->vcpu_id;
struct vcpu_svm *svm = to_svm(vcpu);
- ret = avic_update_access_page(vcpu->kvm, true);
+ if (kvm_apicv_activated(vcpu->kvm))
+ ret = avic_update_access_page(vcpu->kvm, true);
if (ret)
return ret;
@@ -2188,7 +2193,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
/* We initialize this flag to true to make sure that the is_running
* bit would be set the first time the vcpu is loaded.
*/
- svm->avic_is_running = true;
+ if (irqchip_in_kernel(kvm) && kvm_apicv_activated(kvm))
+ svm->avic_is_running = true;
svm->nested.hsave = page_address(hsave_page);
@@ -2325,6 +2331,8 @@ static void svm_vcpu_blocking(struct kvm_vcpu *vcpu)
static void svm_vcpu_unblocking(struct kvm_vcpu *vcpu)
{
+ if (kvm_check_request(KVM_REQ_APICV_UPDATE, vcpu))
+ kvm_vcpu_update_apicv(vcpu);
avic_set_running(vcpu, true);
}
@@ -5190,17 +5198,25 @@ static int svm_set_pi_irte_mode(struct kvm_vcpu *vcpu, bool activate)
return ret;
}
-/* Note: Currently only used by Hyper-V. */
static void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb;
bool activated = kvm_vcpu_apicv_active(vcpu);
- if (activated)
+ if (activated) {
+ /**
+ * During AVIC temporary deactivation, guest could update
+ * APIC ID, DFR and LDR registers, which would not be trapped
+ * by avic_unaccelerated_access_interception(). In this case,
+ * we need to check and update the AVIC logical APIC ID table
+ * accordingly before re-activating.
+ */
+ avic_post_state_restore(vcpu);
vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
- else
+ } else {
vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
+ }
mark_dirty(vmcb, VMCB_AVIC);
svm_set_pi_irte_mode(vcpu, activated);
--
1.8.3.1
Introduce an interface for activating/deactivating posted interrupts, and
implement the SVM hooks to toggle AMD IOMMU guest virtual APIC mode.
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/svm.c | 37 ++++++++++++++++++++++++++++++++++++-
1 file changed, 36 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index a0caf66..b7d0adc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5159,17 +5159,52 @@ static void svm_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
{
}
+static int svm_set_pi_irte_mode(struct kvm_vcpu *vcpu, bool activate)
+{
+ int ret = 0;
+ unsigned long flags;
+ struct amd_svm_iommu_ir *ir;
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ if (!kvm_arch_has_assigned_device(vcpu->kvm))
+ return 0;
+
+ /*
+ * Here, we go through the per-vcpu ir_list to update all existing
+ * interrupt remapping table entry targeting this vcpu.
+ */
+ spin_lock_irqsave(&svm->ir_list_lock, flags);
+
+ if (list_empty(&svm->ir_list))
+ goto out;
+
+ list_for_each_entry(ir, &svm->ir_list, node) {
+ if (activate)
+ ret = amd_iommu_activate_guest_mode(ir->data);
+ else
+ ret = amd_iommu_deactivate_guest_mode(ir->data);
+ if (ret)
+ break;
+ }
+out:
+ spin_unlock_irqrestore(&svm->ir_list_lock, flags);
+ return ret;
+}
+
/* Note: Currently only used by Hyper-V. */
static void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb;
+ bool activated = kvm_vcpu_apicv_active(vcpu);
- if (kvm_vcpu_apicv_active(vcpu))
+ if (activated)
vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
else
vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
mark_dirty(vmcb, VMCB_AVIC);
+
+ svm_set_pi_irte_mode(vcpu, activated);
}
static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
--
1.8.3.1
Re-factor avic_init_access_page() into avic_update_access_page(), since
activating/deactivating AVIC requires setting/unsetting the memory region
used for the virtual APIC backing page (APIC_ACCESS_PAGE_PRIVATE_MEMSLOT).
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/svm.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index b7d0adc..46842a2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1668,23 +1668,22 @@ static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu,
* field of the VMCB. Therefore, we set up the
* APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (4KB) here.
*/
-static int avic_init_access_page(struct kvm_vcpu *vcpu)
+static int avic_update_access_page(struct kvm *kvm, bool activate)
{
- struct kvm *kvm = vcpu->kvm;
int ret = 0;
mutex_lock(&kvm->slots_lock);
- if (kvm->arch.apic_access_page_done)
+ if (kvm->arch.apic_access_page_done == activate)
goto out;
ret = __x86_set_memory_region(kvm,
APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
APIC_DEFAULT_PHYS_BASE,
- PAGE_SIZE);
+ activate ? PAGE_SIZE : 0);
if (ret)
goto out;
- kvm->arch.apic_access_page_done = true;
+ kvm->arch.apic_access_page_done = activate;
out:
mutex_unlock(&kvm->slots_lock);
return ret;
@@ -1697,7 +1696,7 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
int id = vcpu->vcpu_id;
struct vcpu_svm *svm = to_svm(vcpu);
- ret = avic_init_access_page(vcpu);
+ ret = avic_update_access_page(vcpu->kvm, true);
if (ret)
return ret;
--
1.8.3.1
AMD SVM AVIC needs to update the APIC backing page mapping before changing
the APICv mode. Introduce a struct kvm_x86_ops.pre_update_apicv_exec_ctrl
function hook to be called prior to the KVM APICv update request to each vcpu.
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm.c | 6 ++++++
arch/x86/kvm/x86.c | 2 ++
3 files changed, 9 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3b94f42..f93d347 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1094,6 +1094,7 @@ struct kvm_x86_ops {
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
bool (*get_enable_apicv)(struct kvm *kvm);
+ void (*pre_update_apicv_exec_ctrl)(struct kvm *kvm, bool activate);
void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
void (*hwapic_isr_update)(struct kvm_vcpu *vcpu, int isr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 46842a2..21203a6 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -7230,6 +7230,11 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
return false;
}
+static void svm_pre_update_apicv_exec_ctrl(struct kvm *kvm, bool activate)
+{
+ avic_update_access_page(kvm, activate);
+}
+
static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
.cpu_has_kvm_support = has_svm,
.disabled_by_bios = is_disabled,
@@ -7307,6 +7312,7 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
.set_virtual_apic_mode = svm_set_virtual_apic_mode,
.get_enable_apicv = svm_get_enable_apicv,
.refresh_apicv_exec_ctrl = svm_refresh_apicv_exec_ctrl,
+ .pre_update_apicv_exec_ctrl = svm_pre_update_apicv_exec_ctrl,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_irr_update = svm_hwapic_irr_update,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4fab93e..c09ff78 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7755,6 +7755,8 @@ void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
}
trace_kvm_apicv_update_request(activate, bit);
+ if (kvm_x86_ops->pre_update_apicv_exec_ctrl)
+ kvm_x86_ops->pre_update_apicv_exec_ctrl(kvm, activate);
kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
}
EXPORT_SYMBOL_GPL(kvm_request_apicv_update);
--
1.8.3.1
Since AVIC does not currently work w/ nested virtualization, deactivate
AVIC for the guest when CPUID Fn80000001_ECX[SVM] is set (i.e. the guest
is advertised SVM support, which is needed for nested virtualization).
Suggested-by: Alexander Graf <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm.c | 8 ++++++++
2 files changed, 9 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a6475fd..55d6476 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -856,6 +856,7 @@ enum kvm_irqchip_mode {
#define APICV_DEACT_BIT_DISABLE 0
#define APICV_DEACT_BIT_HYPERV 1
+#define APICV_DEACT_BIT_NESTED 2
struct kvm_arch {
unsigned long n_used_mmu_pages;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5b90458..7f59b1a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -5984,6 +5984,14 @@ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
return;
guest_cpuid_clear(vcpu, X86_FEATURE_X2APIC);
+
+ /*
+ * Currently, AVIC does not work with nested virtualization.
+ * So, we disable AVIC when cpuid for SVM is set in the L1 guest.
+ */
+ if (nested && guest_cpuid_has(vcpu, X86_FEATURE_SVM))
+ kvm_request_apicv_update(vcpu->kvm, false,
+ APICV_DEACT_BIT_NESTED);
}
static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
--
1.8.3.1
Add trace points when sending requests to activate/deactivate APICv.
Suggested-by: Alexander Graf <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/trace.h | 19 +++++++++++++++++++
arch/x86/kvm/x86.c | 2 ++
2 files changed, 21 insertions(+)
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index b5c831e..3bfc6b5 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -1297,6 +1297,25 @@
__entry->vcpu_id, __entry->timer_index)
);
+TRACE_EVENT(kvm_apicv_update_request,
+ TP_PROTO(bool activate, unsigned long bit),
+ TP_ARGS(activate, bit),
+
+ TP_STRUCT__entry(
+ __field(bool, activate)
+ __field(unsigned long, bit)
+ ),
+
+ TP_fast_assign(
+ __entry->activate = activate;
+ __entry->bit = bit;
+ ),
+
+ TP_printk("%s bit=%lu",
+ __entry->activate ? "activate" : "deactivate",
+ __entry->bit)
+);
+
/*
* Tracepoint for AMD AVIC
*/
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 394695a..4fab93e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7754,6 +7754,7 @@ void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
return;
}
+ trace_kvm_apicv_update_request(activate, bit);
kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
}
EXPORT_SYMBOL_GPL(kvm_request_apicv_update);
@@ -10145,3 +10146,4 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_update_request);
--
1.8.3.1
Re-factor code into a helper function that sets the lapic parameters when
activating/deactivating APICv, and export the function for subsequent use.
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/lapic.c | 22 +++++++++++++++++-----
arch/x86/kvm/lapic.h | 1 +
2 files changed, 18 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e904ff0..4654230 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2152,6 +2152,21 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
pr_warn_once("APIC base relocation is unsupported by KVM");
}
+void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
+{
+ struct kvm_lapic *apic = vcpu->arch.apic;
+
+ if (vcpu->arch.apicv_active) {
+ /* irr_pending is always true when apicv is activated. */
+ apic->irr_pending = true;
+ apic->isr_count = 1;
+ } else {
+ apic->irr_pending = (apic_search_irr(apic) != -1);
+ apic->isr_count = count_vectors(apic->regs + APIC_ISR);
+ }
+}
+EXPORT_SYMBOL_GPL(kvm_apic_update_apicv);
+
void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
{
struct kvm_lapic *apic = vcpu->arch.apic;
@@ -2194,8 +2209,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
kvm_lapic_set_reg(apic, APIC_ISR + 0x10 * i, 0);
kvm_lapic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
}
- apic->irr_pending = vcpu->arch.apicv_active;
- apic->isr_count = vcpu->arch.apicv_active ? 1 : 0;
+ kvm_apic_update_apicv(vcpu);
apic->highest_isr_cache = -1;
update_divide_count(apic);
atomic_set(&apic->lapic_timer.pending, 0);
@@ -2454,9 +2468,7 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
apic_manage_nmi_watchdog(apic, kvm_lapic_get_reg(apic, APIC_LVT0));
update_divide_count(apic);
start_apic_timer(apic);
- apic->irr_pending = true;
- apic->isr_count = vcpu->arch.apicv_active ?
- 1 : count_vectors(apic->regs + APIC_ISR);
+ kvm_apic_update_apicv(vcpu);
apic->highest_isr_cache = -1;
if (vcpu->arch.apicv_active) {
kvm_x86_ops->apicv_post_state_restore(vcpu);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 50053d2..36a5271 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -91,6 +91,7 @@ bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
struct dest_map *dest_map);
int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
+void kvm_apic_update_apicv(struct kvm_vcpu *vcpu);
bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map);
--
1.8.3.1
On 01/11/19 23:41, Suthikulpanit, Suravee wrote:
> + unsigned long apicv_deact_msk;
No abbrev fld names. :) I can change this to something like
apicv_inhibit_reasons if there are no other issues (and likewise
s/APICV_DEACT_BIT_/APICV_INHIBIT_REASON_/).
>
> +bool kvm_apicv_activated(struct kvm *kvm)
> +{
> + return (READ_ONCE(kvm->arch.apicv_deact_msk) == 0);
> +}
Using READ_ONCE introduces a risk of races. I'll check during a more
thorough review if it's worth introducing separate kvm_apicv_active and
kvm_apicv_active_nolock functions.
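Roughly, I am thinking of something like this (just a sketch; names and
placement are not final):

	/* for callers that cannot race with updates of the mask */
	static bool kvm_apicv_active_nolock(struct kvm *kvm)
	{
		return kvm->arch.apicv_deact_msk == 0;
	}

	/* tolerates concurrent set_bit/clear_bit on the mask */
	bool kvm_apicv_active(struct kvm *kvm)
	{
		return READ_ONCE(kvm->arch.apicv_deact_msk) == 0;
	}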
Paolo
On 01/11/19 23:41, Suthikulpanit, Suravee wrote:
> +void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
> +{
> + if (activate) {
> + if (!test_and_clear_bit(bit, &kvm->arch.apicv_deact_msk) ||
> + !kvm_apicv_activated(kvm))
> + return;
> + } else {
> + if (test_and_set_bit(bit, &kvm->arch.apicv_deact_msk) ||
> + kvm_apicv_activated(kvm))
> + return;
> + }
> +
> + kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
> +}
> +EXPORT_SYMBOL_GPL(kvm_request_apicv_update);
> +
It's worth documenting the locking requirements of
kvm_request_apicv_update (it can also be negative requirements, such as
"don't hold any lock"), because kvm_make_all_cpus_request is a somewhat
deadlock-prone API.
Again, something I'll check after a more thorough review.
Paolo
On 01/11/19 23:41, Suthikulpanit, Suravee wrote:
> + /*
> + * AMD SVM AVIC accelerates EOI write and does not trap.
> + * This cause in-kernel PIT re-inject mode to fail
> + * since it checks ps->irq_ack before kvm_set_irq()
> + * and relies on the ack notifier to timely queue
> + * the pt->worker work iterm and reinject the missed tick.
> + * So, deactivate APICv when PIT is in reinject mode.
> + */
> if (reinject) {
> + kvm_request_apicv_update(kvm, false, APICV_DEACT_BIT_PIT_REINJ);
> /* The initial state is preserved while ps->reinject == 0. */
> kvm_pit_reset_reinject(pit);
> kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
> kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
> } else {
> + kvm_request_apicv_update(kvm, true, APICV_DEACT_BIT_PIT_REINJ);
> kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
> kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
This is not too nice for Intel, which does support (through the EOI exit
mask) APICv even if PIT reinjection is active.
We can work around it by adding a global mask of inhibit reasons that
apply to the vendor, and initializing it as soon as possible in vmx.c/svm.c.
Then kvm_request_apicv_update can ignore reasons that the vendor doesn't
care about.
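Sketching it out (illustrative names only, not a final design):

	/* set early by vmx.c/svm.c: the reasons this vendor actually honors */
	static unsigned long apicv_vendor_inhibit_mask __read_mostly;

	void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
	{
		/* e.g. Intel would not set the PIT reinject bit here */
		if (!test_bit(bit, &apicv_vendor_inhibit_mask))
			return;
		...
	}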
Paolo
On 01/11/19 23:41, Suthikulpanit, Suravee wrote:
> + /*
> + * IRQ window is not needed when AVIC is enabled,
> + * unless we have pending ExtINT since it cannot be injected
> + * via AVIC. In such case, we need to temporarily disable AVIC,
> + * and fallback to injecting IRQ via V_IRQ.
> + */
> + if (kvm_vcpu_apicv_active(vcpu))
> + svm_request_update_avic(vcpu, false);
This must be pretty heavy-weight on SMP VMs, even if most ExtINT guests
do not need SMP in the guest. One alternative is to enable/disable
APICv when LVT or IOAPIC registers are written with ExtINT mode. Not a
blocker, just an idea to consider.
Paolo
Paolo,
Thanks for quick response.
On 11/2/19 4:57 AM, Paolo Bonzini wrote:
> On 01/11/19 23:41, Suthikulpanit, Suravee wrote:
>> + /*
>> + * AMD SVM AVIC accelerates EOI write and does not trap.
>> + * This cause in-kernel PIT re-inject mode to fail
>> + * since it checks ps->irq_ack before kvm_set_irq()
>> + * and relies on the ack notifier to timely queue
>> + * the pt->worker work iterm and reinject the missed tick.
>> + * So, deactivate APICv when PIT is in reinject mode.
>> + */
>> if (reinject) {
>> + kvm_request_apicv_update(kvm, false, APICV_DEACT_BIT_PIT_REINJ);
>> /* The initial state is preserved while ps->reinject == 0. */
>> kvm_pit_reset_reinject(pit);
>> kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
>> kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
>> } else {
>> + kvm_request_apicv_update(kvm, true, APICV_DEACT_BIT_PIT_REINJ);
>> kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
>> kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
>
> This is not too nice for Intel which does support (through the EOI exit
> mask) APICv even if PIT reinjection active.
I see your point.
> We can work around it by adding a global mask of inhibit reasons that
> apply to the vendor, and initializing it as soon as possible in vmx.c/svm.c.
>
> Then kvm_request_apicv_update can ignore reasons that the vendor doesn't
> care about.
>
> Paolo
>
What if we enhance pre_update_apicv_exec_ctrl() to also return
success/fail? Then the vendor-specific code can decide whether to
update the APICv state or not.
Thanks,
Suravee
Paolo,
On 11/2/19 4:52 AM, Paolo Bonzini wrote:
> On 01/11/19 23:41, Suthikulpanit, Suravee wrote:
>> +void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
>> +{
>> + if (activate) {
>> + if (!test_and_clear_bit(bit, &kvm->arch.apicv_deact_msk) ||
>> + !kvm_apicv_activated(kvm))
>> + return;
>> + } else {
>> + if (test_and_set_bit(bit, &kvm->arch.apicv_deact_msk) ||
>> + kvm_apicv_activated(kvm))
>> + return;
>> + }
>> +
>> + kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_request_apicv_update);
>> +
>
> It's worth documenting the locking requirements of
> kvm_request_apicv_update (it can also be negative requirements, such as
> "don't hold any lock"), because kvm_make_all_cpus_request is a somewhat
> deadlock-prone API.
Currently, I have a comment in svm_request_update_avic(), where it
calls kvm_request_apicv_update(). I'll move it here instead and enhance
the comment.
Thanks,
Suravee
On Fri, Nov 01, 2019 at 10:41:31PM +0000, Suthikulpanit, Suravee wrote:
> AMD SVM AVIC needs to update APIC backing page mapping before changing
> APICv mode. Introduce struct kvm_x86_ops.pre_update_apicv_exec_ctrl
> function hook to be called prior KVM APICv update request to each vcpu.
This again seems to mix up APIC backing page and APIC access page.
And I must be missing something obvious, but why is it necessary to
unmap the APIC access page while AVIC is disabled? Does keeping it
around stand in the way when working with AVIC disabled?
Thanks,
Roman.
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/svm.c | 6 ++++++
> arch/x86/kvm/x86.c | 2 ++
> 3 files changed, 9 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3b94f42..f93d347 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1094,6 +1094,7 @@ struct kvm_x86_ops {
> void (*enable_irq_window)(struct kvm_vcpu *vcpu);
> void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
> bool (*get_enable_apicv)(struct kvm *kvm);
> + void (*pre_update_apicv_exec_ctrl)(struct kvm *kvm, bool activate);
> void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
> void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
> void (*hwapic_isr_update)(struct kvm_vcpu *vcpu, int isr);
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 46842a2..21203a6 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -7230,6 +7230,11 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
> return false;
> }
>
> +static void svm_pre_update_apicv_exec_ctrl(struct kvm *kvm, bool activate)
> +{
> + avic_update_access_page(kvm, activate);
> +}
> +
> static struct kvm_x86_ops svm_x86_ops __ro_after_init = {
> .cpu_has_kvm_support = has_svm,
> .disabled_by_bios = is_disabled,
> @@ -7307,6 +7312,7 @@ static bool svm_need_emulation_on_page_fault(struct kvm_vcpu *vcpu)
> .set_virtual_apic_mode = svm_set_virtual_apic_mode,
> .get_enable_apicv = svm_get_enable_apicv,
> .refresh_apicv_exec_ctrl = svm_refresh_apicv_exec_ctrl,
> + .pre_update_apicv_exec_ctrl = svm_pre_update_apicv_exec_ctrl,
> .load_eoi_exitmap = svm_load_eoi_exitmap,
> .hwapic_irr_update = svm_hwapic_irr_update,
> .hwapic_isr_update = svm_hwapic_isr_update,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4fab93e..c09ff78 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7755,6 +7755,8 @@ void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
> }
>
> trace_kvm_apicv_update_request(activate, bit);
> + if (kvm_x86_ops->pre_update_apicv_exec_ctrl)
> + kvm_x86_ops->pre_update_apicv_exec_ctrl(kvm, activate);
> kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
> }
> EXPORT_SYMBOL_GPL(kvm_request_apicv_update);
> --
> 1.8.3.1
>
On Fri, Nov 01, 2019 at 10:41:30PM +0000, Suthikulpanit, Suravee wrote:
> Re-factor avic_init_access_page() to avic_update_access_page() since
> activate/deactivate AVIC requires setting/unsetting the memory region used
> for virtual APIC backing page (APIC_ACCESS_PAGE_PRIVATE_MEMSLOT).
AFAICT the patch actually touches the (de)allocation of the APIC access
page rather than the APIC backing page (or I'm confused in the
nomenclature).
Thanks,
Roman.
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> ---
> arch/x86/kvm/svm.c | 11 +++++------
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index b7d0adc..46842a2 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1668,23 +1668,22 @@ static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu,
> * field of the VMCB. Therefore, we set up the
> * APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (4KB) here.
> */
> -static int avic_init_access_page(struct kvm_vcpu *vcpu)
> +static int avic_update_access_page(struct kvm *kvm, bool activate)
> {
> - struct kvm *kvm = vcpu->kvm;
> int ret = 0;
>
> mutex_lock(&kvm->slots_lock);
> - if (kvm->arch.apic_access_page_done)
> + if (kvm->arch.apic_access_page_done == activate)
> goto out;
>
> ret = __x86_set_memory_region(kvm,
> APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
> APIC_DEFAULT_PHYS_BASE,
> - PAGE_SIZE);
> + activate ? PAGE_SIZE : 0);
> if (ret)
> goto out;
>
> - kvm->arch.apic_access_page_done = true;
> + kvm->arch.apic_access_page_done = activate;
> out:
> mutex_unlock(&kvm->slots_lock);
> return ret;
> @@ -1697,7 +1696,7 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
> int id = vcpu->vcpu_id;
> struct vcpu_svm *svm = to_svm(vcpu);
>
> - ret = avic_init_access_page(vcpu);
> + ret = avic_update_access_page(vcpu->kvm, true);
> if (ret)
> return ret;
>
> --
> 1.8.3.1
>
On 04/11/19 19:54, Suthikulpanit, Suravee wrote:
> I see your point.
>
>> We can work around it by adding a global mask of inhibit reasons that
>> apply to the vendor, and initializing it as soon as possible in vmx.c/svm.c.
>>
>> Then kvm_request_apicv_update can ignore reasons that the vendor doesn't
>> care about.
>
> What about enhancing pre_update_apicv_exec_ctrl() to also return
> success/fail? Here, the vendor-specific code can decide whether to
> update the APICv state or not.
That works for me, too. Something like return false for deactivate and
true for activate.
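A rough sketch of what I have in mind (untested, reusing the names from
this series):

static bool svm_pre_update_apicv_exec_ctrl(struct kvm *kvm, bool activate)
{
	/*
	 * (De)allocate the APIC access page memslot first and veto the
	 * APICv state change if that fails.
	 */
	return !avic_update_access_page(kvm, activate);
}

and then in kvm_request_apicv_update():

	if (kvm_x86_ops->pre_update_apicv_exec_ctrl &&
	    !kvm_x86_ops->pre_update_apicv_exec_ctrl(kvm, activate))
		return;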
Paolo
On Sat, Nov 02, 2019 at 10:57:47AM +0100, Paolo Bonzini wrote:
> On 01/11/19 23:41, Suthikulpanit, Suravee wrote:
> > + /*
> > + * AMD SVM AVIC accelerates EOI write and does not trap it.
> > + * This causes in-kernel PIT re-inject mode to fail,
> > + * since it checks ps->irq_ack before kvm_set_irq()
> > + * and relies on the ack notifier to timely queue
> > + * the pt->worker work item and reinject the missed tick.
> > + * So, deactivate APICv when PIT is in reinject mode.
> > + */
> > if (reinject) {
> > + kvm_request_apicv_update(kvm, false, APICV_DEACT_BIT_PIT_REINJ);
> > /* The initial state is preserved while ps->reinject == 0. */
> > kvm_pit_reset_reinject(pit);
> > kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
> > kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
> > } else {
> > + kvm_request_apicv_update(kvm, true, APICV_DEACT_BIT_PIT_REINJ);
> > kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier);
> > kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier);
>
> This is not too nice for Intel, which does support APICv (through the EOI
> exit mask) even if PIT reinjection is active.
Hmm, it's tempting to just make svm_load_eoi_exitmap() disable AVIC when
given a non-empty eoi_exit_bitmap, and enable it back on a clear
eoi_exit_bitmap. This may remove the need to add special treatment to
PIT etc.
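Something along these lines (just a sketch on top of this series; the
APICV_DEACT_BIT_EOI inhibit bit is made up):

static void svm_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
{
	bool empty = !(eoi_exit_bitmap[0] | eoi_exit_bitmap[1] |
		       eoi_exit_bitmap[2] | eoi_exit_bitmap[3]);

	/*
	 * AVIC can't trap EOI writes selectively, so fall back to legacy
	 * injection whenever any vector needs an EOI intercept, and
	 * re-activate once the bitmap is clear again.
	 */
	kvm_request_apicv_update(vcpu->kvm, empty, APICV_DEACT_BIT_EOI);
}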
Thanks,
Roman.
> On 04.11.2019 at 22:50, Paolo Bonzini <[email protected]> wrote:
>
> On 04/11/19 19:54, Suthikulpanit, Suravee wrote:
>> I see your point.
>>
>>> We can work around it by adding a global mask of inhibit reasons that
>>> apply to the vendor, and initializing it as soon as possible in vmx.c/svm.c.
>>>
>>> Then kvm_request_apicv_update can ignore reasons that the vendor doesn't
>>> care about.
>>
>> What about enhancing pre_update_apicv_exec_ctrl() to also return
>> success/fail? Here, the vendor-specific code can decide whether to
>> update the APICv state or not.
>
> That works for me, too. Something like return false for deactivate and
> true for activate.
I'm trying to wrap my head around how that will work with live migration. Do we also need to save/restore the deactivate reasons?
Alex
>
> Paolo
On Tue, Nov 05, 2019 at 07:05:57AM +0000, Graf (AWS), Alexander wrote:
>
>
> > On 04.11.2019 at 22:50, Paolo Bonzini <[email protected]> wrote:
> >
> > On 04/11/19 19:54, Suthikulpanit, Suravee wrote:
> >> I see your point.
> >>
> >>> We can work around it by adding a global mask of inhibit reasons that
> >>> apply to the vendor, and initializing it as soon as possible in vmx.c/svm.c.
> >>>
> >>> Then kvm_request_apicv_update can ignore reasons that the vendor doesn't
> >>> care about.
> >>
> >> What about enhancing pre_update_apicv_exec_ctrl() to also return
> >> success/fail? Here, the vendor-specific code can decide whether to
> >> update the APICv state or not.
> >
> > That works for me, too. Something like return false for deactivate and
> > true for activate.
>
> I'm trying to wrap my head around how that will work with live
> migration. Do we also need to save/restore the deactivate reasons?
Nope, this is all invisible to userland. The target will deduce the
deactivation reasons on its own from the user-visible setup like PIT
configuration, Hyper-V SynIC, etc.
Roman.
On 05/11/19 00:17, Roman Kagan wrote:
>> This is not too nice for Intel, which does support APICv (through the EOI
>> exit mask) even if PIT reinjection is active.
> Hmm, it's tempting to just make svm_load_eoi_exitmap() disable AVIC when
> given a non-empty eoi_exit_bitmap, and enable it back on a clear
> eoi_exit_bitmap. This may remove the need to add special treatment to
> PIT etc.
That is a very nice idea---we can make that a single disable reason,
like APICV_DEACTIVATE_REASON_EOI, and Intel can simply never use it.
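For illustration, the Intel side would keep doing what it does today and
simply never pass that reason (sketch from memory, modulo details):

static void vmx_load_eoi_exitmap(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
{
	if (!kvm_vcpu_apicv_active(vcpu))
		return;

	/* Program the bitmap; APICv stays active on Intel. */
	vmcs_write64(EOI_EXIT_BITMAP0, eoi_exit_bitmap[0]);
	vmcs_write64(EOI_EXIT_BITMAP1, eoi_exit_bitmap[1]);
	vmcs_write64(EOI_EXIT_BITMAP2, eoi_exit_bitmap[2]);
	vmcs_write64(EOI_EXIT_BITMAP3, eoi_exit_bitmap[3]);
}

svm.c would then be the only place ever requesting
APICV_DEACTIVATE_REASON_EOI.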
Paolo
Roman/Paolo
On 11/5/2019 4:47 PM, Paolo Bonzini wrote:
> On 05/11/19 00:17, Roman Kagan wrote:
>>> This is not too nice for Intel which does support (through the EOI exit
>>> mask) APICv even if PIT reinjection active.
>> Hmm, it's tempting to just make svm_load_eoi_exitmap() disable AVIC when
>> given a non-empty eoi_exit_bitmap, and enable it back on a clear
>> eoi_exit_bitmap. This may remove the need to add special treatment to
>> PIT etc.
>
> That is a very nice idea---we can make that a single disable reason,
> like APICV_DEACTIVATE_REASON_EOI, and Intel can simply never use it.
I took a look at svm_load_eoi_exitmap() and it is called via:
kvm_make_scan_ioapic_request() ->
KVM_REQ_SCAN_IOAPIC -> vcpu_scan_ioapic() ->
KVM_REQ_LOAD_EOI_EXITMAP -> vcpu_load_eoi_exitmap()
The kvm_make_scan_ioapic_request() is called from multiple places:
arch/x86/kvm/irq_comm.c:
* kvm_arch_post_irq_routing_update() : Called from kvm_set_irq_routing()
arch/x86/kvm/ioapic.c:
* kvm_arch_post_irq_ack_notifier_list_update() : (Un)registering irq ack notifier
* kvm_set_ioapic() : Setting ioapic irqchip
* ioapic_mmio_write() -> ioapic_write_indirect()
arch/x86/kvm/lapic.c:
* recalculate_apic_map()
Most calls would be from ioapic_mmio_write()->ioapic_write_indirect().
In the case of AMD AVIC, svm_load_eoi_exitmap() is called several times, and
requesting APICv (de)activation from here whenever the eoi_exit_bitmap is
set/cleared would introduce a large overhead, especially on SMP machines. So,
for now, let's just deactivate APICv when the in-kernel PIT is in reinject
(delay) mode.
I'll also add the logic to avoid unnecessary overhead for Intel.
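For example, via the per-vendor mask of inhibit reasons suggested earlier
in this thread (sketch only; the mask variable and its initialization are
made up, the function body is abridged from this series):

/* Filled in once by svm.c/vmx.c at hardware setup time. */
static unsigned long apicv_relevant_inhibits __read_mostly;

void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit)
{
	/*
	 * Intel would leave e.g. APICV_DEACT_BIT_PIT_REINJ out of its
	 * mask, so PIT re-inject requests become no-ops there.
	 */
	if (!test_bit(bit, &apicv_relevant_inhibits))
		return;

	trace_kvm_apicv_update_request(activate, bit);
	if (kvm_x86_ops->pre_update_apicv_exec_ctrl)
		kvm_x86_ops->pre_update_apicv_exec_ctrl(kvm, activate);
	kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
}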
Thanks,
Suravee
Roman,
On 11/4/19 3:53 PM, Roman Kagan wrote:
> On Fri, Nov 01, 2019 at 10:41:30PM +0000, Suthikulpanit, Suravee wrote:
>> Re-factor avic_init_access_page() to avic_update_access_page() since
>> activating/deactivating AVIC requires setting/unsetting the memory region
>> used for the virtual APIC backing page (APIC_ACCESS_PAGE_PRIVATE_MEMSLOT).
>
> AFAICT the patch actually touches the (de)allocation of the APIC access
> page rather than the APIC backing page (or I'm confused in the
> nomenclature).
The APIC backing page is allocated during vcpu initialization, while the
APIC_ACCESS_PAGE_PRIVATE_MEMSLOT is initialized per-VM and is used mainly
to control access permissions for the APIC backing page.
There is a comment in arch/x86/kvm/svm.c:
/**
* Note:
* AVIC hardware walks the nested page table to check permissions,
* but does not use the SPA address specified in the leaf page
* table entry since it uses address in the AVIC_BACKING_PAGE pointer
* field of the VMCB. Therefore, we set up the
* APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (4KB) here.
*/
When deactivating APICv, we do not destroy the APIC backing page, but
we do need to de-allocate the APIC_ACCESS_PAGE_PRIVATE_MEMSLOT.
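So the deactivate path boils down to deleting the per-VM memslot while
leaving the per-vcpu backing page alone, i.e. the activate==false case of
avic_update_access_page() (excerpted here for clarity):

	mutex_lock(&kvm->slots_lock);
	/*
	 * Passing size 0 deletes APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
	 * svm->avic_backing_page itself stays allocated.
	 */
	__x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
				APIC_DEFAULT_PHYS_BASE, 0);
	mutex_unlock(&kvm->slots_lock);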
Thanks,
Suravee
> Thanks,
> Roman.
>
Roman,
On 11/4/19 4:05 PM, Roman Kagan wrote:
> On Fri, Nov 01, 2019 at 10:41:31PM +0000, Suthikulpanit, Suravee wrote:
> > AMD SVM AVIC needs to update the APIC backing page mapping before changing
> > APICv mode. Introduce struct kvm_x86_ops.pre_update_apicv_exec_ctrl
> > function hook to be called prior to the KVM APICv update request to each vcpu.
> This again seems to mix up APIC backing page and APIC access page.
>
> And I must be missing something obvious, but why is it necessary to
> unmap the APIC access page while AVIC is disabled? Does keeping it
> around stand in the way when working with AVIC disabled?
I have replied to patch 07/17 with an explanation.
Yes, keeping the APIC access page around while AVIC is disabled would
cause SVM to not function properly.
Thanks,
Suravee
> Thanks,
> Roman.
>
On Mon, Nov 11, 2019 at 06:08:05PM -0600, Suravee Suthikulpanit wrote:
> On 11/4/19 4:05 PM, Roman Kagan wrote:
> > On Fri, Nov 01, 2019 at 10:41:31PM +0000, Suthikulpanit, Suravee wrote:
> > > AMD SVM AVIC needs to update the APIC backing page mapping before changing
> > > APICv mode. Introduce struct kvm_x86_ops.pre_update_apicv_exec_ctrl
> > > function hook to be called prior to the KVM APICv update request to each vcpu.
> > This again seems to mix up APIC backing page and APIC access page.
> >
> > And I must be missing something obvious, but why is it necessary to
> > unmap the APIC access page while AVIC is disabled? Does keeping it
> > around stand in the way when working with AVIC disabled?
>
> I have replied to patch 07/17 with an explanation.
>
> Yes, keeping the APIC access page around while AVIC is disabled would
> cause SVM to not function properly.
I wonder why? Once AVIC is disabled, guest access to this page would
trigger a regular NPT fault vmexit, just as it would with the NPT entry
for this page destroyed, wouldn't it? So there would be no difference
from the host's POV. Am I missing something?
Thanks,
Roman.
On Mon, Nov 11, 2019 at 11:37:53AM -0600, Suravee Suthikulpanit wrote:
> On 11/5/2019 4:47 PM, Paolo Bonzini wrote:
> > On 05/11/19 00:17, Roman Kagan wrote:
> > > > This is not too nice for Intel, which does support APICv (through the
> > > > EOI exit mask) even if PIT reinjection is active.
> > > Hmm, it's tempting to just make svm_load_eoi_exitmap() disable AVIC when
> > > given a non-empty eoi_exit_bitmap, and enable it back on a clear
> > > eoi_exit_bitmap. This may remove the need to add special treatment to
> > > PIT etc.
> >
> > That is a very nice idea---we can make that a single disable reason,
> > like APICV_DEACTIVATE_REASON_EOI, and Intel can simply never use it.
>
> I took a look at svm_load_eoi_exitmap() and it is called via:
> kvm_make_scan_ioapic_request() ->
> KVM_REQ_SCAN_IOAPIC -> vcpu_scan_ioapic() ->
> KVM_REQ_LOAD_EOI_EXITMAP -> vcpu_load_eoi_exitmap()
>
> The kvm_make_scan_ioapic_request() is called from multiple places:
>
> arch/x86/kvm/irq_comm.c:
> * kvm_arch_post_irq_routing_update() : Called from kvm_set_irq_routing()
>
> arch/x86/kvm/ioapic.c:
> * kvm_arch_post_irq_ack_notifier_list_update() : (Un)registering irq ack notifier
> * kvm_set_ioapic() : Setting ioapic irqchip
> * ioapic_mmio_write() -> ioapic_write_indirect()
>
> arch/x86/kvm/lapic.c:
> * recalculate_apic_map()
>
> Most calls would be from ioapic_mmio_write()->ioapic_write_indirect().
>
> In the case of AMD AVIC, svm_load_eoi_exitmap() is called several times, and
> requesting APICv (de)activation from here whenever the eoi_exit_bitmap is
> set/cleared would introduce a large overhead, especially on SMP machines.
This doesn't look like a hot path, so I'm not sure it needs to be
optimized for performance. Especially so since
kvm_make_scan_ioapic_request() calls kvm_make_all_cpus_request(), which isn't
particularly fast by definition, and I guess the extra overhead there
won't be noticeable.
OTOH introducing extra code paths has its maintenance costs, so sticking
the simple logic in svm_load_eoi_exitmap looks attractive.
Just my 2c,
Roman.