2022-09-03 00:24:52

by Sean Christopherson

Subject: [PATCH v2 00/23] KVM: x86: AVIC and local APIC fixes+cleanups

Bugs for everyone! Two new notable bug fixes:

- Purge vCPU's "highest ISR" cache when toggling APICv
- Flush TLB when activating AVIC

TL;DR: KVM's AVIC and optimized APIC map code doesn't correctly handle
various edge cases that are architecturally legal(ish), but are unlikely
to occur in most real world scenarios.

I have tested this heavily with KUT, but I haven't booted Windows and
don't have access to x2AVIC, so additional testing would be much
appreciated.

v2:
- Collect reviews. [Li, Maxim]
- Disable only MMIO access when x2APIC is enabled (instead of disabling
all of AVIC). [Maxim]
- Inhibit AVIC when logical IDs are aliased. [Maxim]
- Tweak name of set_virtual_apic_mode() hook. [Maxim]
- Straight up revert logical ID fastpath mess. [Maxim]
- Reword changelog about skipping vCPU during logical setup. [Maxim]
- Fix LDR updates on AVIC. [Maxim?]
- Fix a nasty ISR caching bug.
- Flush TLB when activating AVIC.

v1: https://lore.kernel.org/all/[email protected]

Sean Christopherson (22):
KVM: x86: Purge "highest ISR" cache when updating APICv state
KVM: SVM: Flush the "current" TLB when activating AVIC
KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid
target
KVM: x86: Inhibit AVIC SPTEs if any vCPU enables x2APIC
KVM: SVM: Don't put/load AVIC when setting virtual APIC mode
KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" boolean
KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kick
Revert "KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when
possible"
KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpath
KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU
KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs
KVM: x86: Disable APIC logical map if vCPUs are aliased in logical
mode
KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs
KVM: x86: Explicitly skip adding vCPU to optimized logical map if
LDR==0
KVM: x86: Explicitly track all possibilities for APIC map's logical
modes
KVM: SVM: Inhibit AVIC if vCPUs are aliased in logical mode
KVM: SVM: Always update local APIC on writes to logical dest register
KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad"
KVM: SVM: Require logical ID to be power-of-2 for AVIC entry
KVM: SVM: Handle multiple logical targets in AVIC kick fastpath
KVM: SVM: Ignore writes to Remote Read Data on AVIC write traps
Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on
a running vcpu"

Suravee Suthikulpanit (1):
KVM: SVM: Fix x2APIC Logical ID calculation for
avic_kick_target_vcpus_fast

Documentation/virt/kvm/x86/errata.rst | 11 +
arch/x86/include/asm/kvm_host.h | 37 ++-
arch/x86/kvm/lapic.c | 112 +++++++--
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/svm/avic.c | 321 +++++++++++++-------------
arch/x86/kvm/svm/svm.c | 4 +-
arch/x86/kvm/svm/svm.h | 11 +-
arch/x86/kvm/x86.c | 35 ++-
8 files changed, 329 insertions(+), 204 deletions(-)


base-commit: 372d07084593dc7a399bf9bee815711b1fb1bcf2
--
2.37.2.789.g6183377224-goog


2022-09-03 00:25:04

by Sean Christopherson

Subject: [PATCH v2 01/23] KVM: x86: Purge "highest ISR" cache when updating APICv state

Purge the "highest ISR" cache when updating APICv state on a vCPU. The
cache must not be used when APICv is active as hardware may emulate EOIs
(and other operations) without exiting to KVM.

This fixes a bug where KVM will effectively block IRQs in perpetuity due
to the "highest ISR" never getting reset if APICv is activated on a vCPU
while an IRQ is in-service. Hardware emulates the EOI and KVM never gets
a chance to update its cache.

Fixes: b26a695a1d78 ("kvm: lapic: Introduce APICv update helper function")
Cc: [email protected]
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/lapic.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9dda989a1cf0..38e9b8e5278c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2429,6 +2429,7 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
*/
apic->isr_count = count_vectors(apic->regs + APIC_ISR);
}
+ apic->highest_isr_cache = -1;
}
EXPORT_SYMBOL_GPL(kvm_apic_update_apicv);

@@ -2485,7 +2486,6 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
kvm_lapic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
}
kvm_apic_update_apicv(vcpu);
- apic->highest_isr_cache = -1;
update_divide_count(apic);
atomic_set(&apic->lapic_timer.pending, 0);

@@ -2772,7 +2772,6 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
__start_apic_timer(apic, APIC_TMCCT);
kvm_lapic_set_reg(apic, APIC_TMCCT, 0);
kvm_apic_update_apicv(vcpu);
- apic->highest_isr_cache = -1;
if (apic->apicv_active) {
static_call_cond(kvm_x86_apicv_post_state_restore)(vcpu);
static_call_cond(kvm_x86_hwapic_irr_update)(vcpu, apic_find_highest_irr(apic));
--
2.37.2.789.g6183377224-goog
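To illustrate the stale-cache hazard this patch fixes, here is a toy model (purely illustrative, not KVM code; the struct and helper names are invented): while APICv/AVIC is active, hardware can clear ISR bits without exiting, so a cached "highest ISR" keeps reporting a retired vector until the cache is purged.

```c
#include <assert.h>
#include <string.h>

/* Toy model of the lapic ISR cache; -1 means "invalid, rescan the bitmap". */
struct toy_lapic {
	unsigned char isr[32];		/* 256-bit in-service register */
	int highest_isr_cache;
};

static int find_highest_isr(struct toy_lapic *apic)
{
	int i;

	if (apic->highest_isr_cache != -1)
		return apic->highest_isr_cache;

	for (i = 255; i >= 0; i--) {
		if (apic->isr[i / 8] & (1 << (i % 8)))
			return apic->highest_isr_cache = i;
	}
	return -1;
}
```

If hardware emulates the EOI (clears the ISR bit) while the cache holds a vector, the cache never observes the clear, which is why the patch moves the purge into kvm_apic_update_apicv() so it happens on every APICv toggle.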

2022-09-03 00:25:08

by Sean Christopherson

Subject: [PATCH v2 02/23] KVM: SVM: Flush the "current" TLB when activating AVIC

Flush the TLB when activating AVIC as the CPU can insert into the TLB
while AVIC is "locally" disabled. KVM doesn't treat "APIC hardware
disabled" as VM-wide AVIC inhibition, and so when a vCPU has its APIC
hardware disabled, AVIC is not guaranteed to be inhibited. As a result,
KVM may create a valid NPT mapping for the APIC base, which the CPU can
cache as a non-AVIC translation.

Note, Intel handles this in vmx_set_virtual_apic_mode().

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 6919dee69f18..4fbef2af1efc 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -86,6 +86,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
/* Disabling MSR intercept for x2APIC registers */
svm_set_x2apic_msr_interception(svm, false);
} else {
+ /*
+ * Flush the TLB, the guest may have inserted non-APIC
+ * mappings into the TLB while AVIC was disabled.
+ */
+ kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, &svm->vcpu);
+
/* For xAVIC and hybrid-xAVIC modes */
vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID;
/* Enabling MSR intercept for x2APIC registers */
--
2.37.2.789.g6183377224-goog
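The TLB hazard above can be sketched with a toy model (illustrative only, no KVM types): zapping the page-table entry alone does not evict an already-cached translation; an explicit flush is required.

```c
#include <assert.h>

struct toy_mmu {
	int pte_valid;	/* NPT mapping for the APIC base exists */
	int tlb_valid;	/* CPU has cached the translation */
};

static void access_page(struct toy_mmu *m)
{
	if (m->pte_valid)
		m->tlb_valid = 1;	/* hardware may cache the walk */
}

static void zap_pte(struct toy_mmu *m)   { m->pte_valid = 0; }
static void flush_tlb(struct toy_mmu *m) { m->tlb_valid = 0; }

/* A (possibly stale) translation is reachable if either copy survives. */
static int translation_reachable(struct toy_mmu *m)
{
	return m->pte_valid || m->tlb_valid;
}
```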

2022-09-03 00:25:16

by Sean Christopherson

Subject: [PATCH v2 03/23] KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid target

Emulate ICR writes on AVIC IPI failures due to invalid targets using the
same logic as failures due to invalid types. AVIC acceleration fails if
_any_ of the targets are invalid, and crucially VM-Exits before sending
IPIs to targets that _are_ valid. In logical mode, the destination is a
bitmap, i.e. a single IPI can target multiple logical IDs. Doing nothing
causes KVM to drop IPIs if at least one target is valid and at least one
target is invalid.

Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/avic.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 4fbef2af1efc..6a3d225eb02c 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -502,14 +502,18 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
trace_kvm_avic_incomplete_ipi(vcpu->vcpu_id, icrh, icrl, id, index);

switch (id) {
+ case AVIC_IPI_FAILURE_INVALID_TARGET:
case AVIC_IPI_FAILURE_INVALID_INT_TYPE:
/*
* Emulate IPIs that are not handled by AVIC hardware, which
- * only virtualizes Fixed, Edge-Triggered INTRs. The exit is
- * a trap, e.g. ICR holds the correct value and RIP has been
- * advanced, KVM is responsible only for emulating the IPI.
- * Sadly, hardware may sometimes leave the BUSY flag set, in
- * which case KVM needs to emulate the ICR write as well in
+ * only virtualizes Fixed, Edge-Triggered INTRs, and falls over
+ * if _any_ targets are invalid, e.g. if the logical mode mask
+ * is a superset of running vCPUs.
+ *
+ * The exit is a trap, e.g. ICR holds the correct value and RIP
+ * has been advanced, KVM is responsible only for emulating the
+ * IPI. Sadly, hardware may sometimes leave the BUSY flag set,
+ * in which case KVM needs to emulate the ICR write as well in
* order to clear the BUSY flag.
*/
if (icrl & APIC_ICR_BUSY)
@@ -525,8 +529,6 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
*/
avic_kick_target_vcpus(vcpu->kvm, apic, icrl, icrh, index);
break;
- case AVIC_IPI_FAILURE_INVALID_TARGET:
- break;
case AVIC_IPI_FAILURE_INVALID_BACKING_PAGE:
WARN_ONCE(1, "Invalid backing page\n");
break;
--
2.37.2.789.g6183377224-goog

2022-09-03 00:25:27

by Sean Christopherson

Subject: [PATCH v2 05/23] KVM: SVM: Don't put/load AVIC when setting virtual APIC mode

Move the VMCB updates from avic_refresh_apicv_exec_ctrl() into
avic_set_virtual_apic_mode() and invert the dependency between said
functions to avoid calling avic_vcpu_{load,put}() and
avic_set_pi_irte_mode() when "only" setting the virtual APIC mode.

avic_set_virtual_apic_mode() is invoked from common x86 with preemption
enabled, which makes avic_vcpu_{load,put}() unhappy. Luckily, calling
those and updating IRTE stuff is unnecessary as the only reason
avic_set_virtual_apic_mode() is called is to handle transitions between
xAPIC and x2APIC that don't also toggle APICv activation. And if
activation doesn't change, there's no need to fiddle with the physical
APIC ID table or update IRTE.

The "full" refresh is guaranteed to be called if activation changes in
this case as the only call to the "set" path is:

kvm_vcpu_update_apicv(vcpu);
static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);

and kvm_vcpu_update_apicv() invokes the refresh if activation changes:

if (apic->apicv_active == activate)
goto out;

apic->apicv_active = activate;
kvm_apic_update_apicv(vcpu);
static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);

Rename the helper to reflect that it is also called during "refresh".

WARNING: CPU: 183 PID: 49186 at arch/x86/kvm/svm/avic.c:1081 avic_vcpu_put+0xde/0xf0 [kvm_amd]
CPU: 183 PID: 49186 Comm: stable Tainted: G O 6.0.0-smp--fcddbca45f0a-sink #34
Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 10.48.0 01/27/2022
RIP: 0010:avic_vcpu_put+0xde/0xf0 [kvm_amd]
avic_refresh_apicv_exec_ctrl+0x142/0x1c0 [kvm_amd]
avic_set_virtual_apic_mode+0x5a/0x70 [kvm_amd]
kvm_lapic_set_base+0x149/0x1a0 [kvm]
kvm_set_apic_base+0x8f/0xd0 [kvm]
kvm_set_msr_common+0xa3a/0xdc0 [kvm]
svm_set_msr+0x364/0x6b0 [kvm_amd]
__kvm_set_msr+0xb8/0x1c0 [kvm]
kvm_emulate_wrmsr+0x58/0x1d0 [kvm]
msr_interception+0x1c/0x30 [kvm_amd]
svm_invoke_exit_handler+0x31/0x100 [kvm_amd]
svm_handle_exit+0xfc/0x160 [kvm_amd]
vcpu_enter_guest+0x21bb/0x23e0 [kvm]
vcpu_run+0x92/0x450 [kvm]
kvm_arch_vcpu_ioctl_run+0x43e/0x6e0 [kvm]
kvm_vcpu_ioctl+0x559/0x620 [kvm]

Fixes: 05c4fe8c1bd9 ("KVM: SVM: Refresh AVIC configuration when changing APIC mode")
Cc: [email protected]
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 31 +++++++++++++++----------------
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/svm/svm.h | 2 +-
3 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 19be5f1afaac..de7fcb3a544b 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -747,18 +747,6 @@ void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
avic_handle_ldr_update(vcpu);
}

-void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
-{
- if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
- return;
-
- if (kvm_get_apic_mode(vcpu) == LAPIC_MODE_INVALID) {
- WARN_ONCE(true, "Invalid local APIC state (vcpu_id=%d)", vcpu->vcpu_id);
- return;
- }
- avic_refresh_apicv_exec_ctrl(vcpu);
-}
-
static int avic_set_pi_irte_mode(struct kvm_vcpu *vcpu, bool activate)
{
int ret = 0;
@@ -1101,17 +1089,18 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
}

-
-void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
+void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb01.ptr;
- bool activated = kvm_vcpu_apicv_active(vcpu);
+
+ if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
+ return;

if (!enable_apicv)
return;

- if (activated) {
+ if (kvm_vcpu_apicv_active(vcpu)) {
/**
* During AVIC temporary deactivation, guest could update
* APIC ID, DFR and LDR registers, which would not be trapped
@@ -1125,6 +1114,16 @@ void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
avic_deactivate_vmcb(svm);
}
vmcb_mark_dirty(vmcb, VMCB_AVIC);
+}
+
+void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
+{
+ bool activated = kvm_vcpu_apicv_active(vcpu);
+
+ if (!enable_apicv)
+ return;
+
+ avic_refresh_virtual_apic_mode(vcpu);

if (activated)
avic_vcpu_load(vcpu, vcpu->cpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f3813dbacb9f..2aa5069bafb2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4807,7 +4807,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.enable_nmi_window = svm_enable_nmi_window,
.enable_irq_window = svm_enable_irq_window,
.update_cr8_intercept = svm_update_cr8_intercept,
- .set_virtual_apic_mode = avic_set_virtual_apic_mode,
+ .set_virtual_apic_mode = avic_refresh_virtual_apic_mode,
.refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl,
.check_apicv_inhibit_reasons = avic_check_apicv_inhibit_reasons,
.apicv_post_state_restore = avic_apicv_post_state_restore,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6a7686bf6900..7a95f50e80e7 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -646,7 +646,7 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu);
void avic_vcpu_unblocking(struct kvm_vcpu *vcpu);
void avic_ring_doorbell(struct kvm_vcpu *vcpu);
unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu);
-void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
+void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);


/* sev.c */
--
2.37.2.789.g6183377224-goog

2022-09-03 00:26:16

by Sean Christopherson

Subject: [PATCH v2 12/23] KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs

Disable the optimized APIC logical map if a logical ID covers multiple
MDAs, i.e. if a vCPU has multiple bits set in its ID. In logical mode,
events match if "ID & MDA != 0", i.e. creating an entry for only the
first bit can cause interrupts to be missed.

Note, creating an entry for every bit is also wrong as KVM would generate
IPIs for every matching bit. It would be possible to teach KVM to play
nice with this edge case, but it is very much an edge case and probably
not used in any real world OS, i.e. it's not worth optimizing.

Use an impossible value for the "mode" to effectively designate that it's
disabled. Don't bother adding a dedicated "invalid" value, the mode
handling will be cleaned up in the future and it would take just as much
effort to explain what value is "safe" at this time.

Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/lapic.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index d956cd37908e..6b2f538b8fd0 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -300,8 +300,15 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
if (!kvm_apic_map_get_logical_dest(new, ldr, &cluster, &mask))
continue;

- if (mask)
- cluster[ffs(mask) - 1] = apic;
+ if (!mask)
+ continue;
+
+ if (!is_power_of_2(mask)) {
+ new->mode = KVM_APIC_MODE_XAPIC_FLAT |
+ KVM_APIC_MODE_XAPIC_CLUSTER;
+ continue;
+ }
+ cluster[ffs(mask) - 1] = apic;
}
out:
old = rcu_dereference_protected(kvm->arch.apic_map,
--
2.37.2.789.g6183377224-goog
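The matching rule cited above can be sketched as follows (illustrative only; logical_match() is not a KVM function): in xAPIC logical mode, delivery matches on a non-empty intersection of the message destination and the vCPU's logical ID, so a vCPU with multiple ID bits matches multiple MDAs, and an optimized map keyed on only the first bit silently drops the rest.

```c
#include <assert.h>
#include <stdint.h>

/* xAPIC logical mode: match iff the destination bitmap intersects the ID. */
static int logical_match(uint8_t logical_id, uint8_t mda)
{
	return (logical_id & mda) != 0;
}
```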

2022-09-03 00:27:19

by Sean Christopherson

Subject: [PATCH v2 11/23] KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU

Add a helper to perform the final kick, two instances of the ICR decoding
is one too many.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/avic.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index b4b5f1422db7..3400046ad0b4 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -344,6 +344,16 @@ void avic_ring_doorbell(struct kvm_vcpu *vcpu)
put_cpu();
}

+
+static void avic_kick_vcpu(struct kvm_vcpu *vcpu, u32 icrl)
+{
+ vcpu->arch.apic->irr_pending = true;
+ svm_complete_interrupt_delivery(vcpu,
+ icrl & APIC_MODE_MASK,
+ icrl & APIC_INT_LEVELTRIG,
+ icrl & APIC_VECTOR_MASK);
+}
+
/*
* A fast-path version of avic_kick_target_vcpus(), which attempts to match
* destination APIC ID to vCPU without looping through all vCPUs.
@@ -442,11 +452,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
if (unlikely(!target_vcpu))
return 0;

- target_vcpu->arch.apic->irr_pending = true;
- svm_complete_interrupt_delivery(target_vcpu,
- icrl & APIC_MODE_MASK,
- icrl & APIC_INT_LEVELTRIG,
- icrl & APIC_VECTOR_MASK);
+ avic_kick_vcpu(target_vcpu, icrl);
return 0;
}

@@ -470,13 +476,8 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
*/
kvm_for_each_vcpu(i, vcpu, kvm) {
if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
- dest, icrl & APIC_DEST_MASK)) {
- vcpu->arch.apic->irr_pending = true;
- svm_complete_interrupt_delivery(vcpu,
- icrl & APIC_MODE_MASK,
- icrl & APIC_INT_LEVELTRIG,
- icrl & APIC_VECTOR_MASK);
- }
+ dest, icrl & APIC_DEST_MASK))
+ avic_kick_vcpu(vcpu, icrl);
}
}

--
2.37.2.789.g6183377224-goog

2022-09-03 00:31:22

by Sean Christopherson

Subject: [PATCH v2 20/23] KVM: SVM: Require logical ID to be power-of-2 for AVIC entry

Do not modify AVIC's logical ID table if the logical ID portion of the
LDR is not a power-of-2, i.e. if the LDR has multiple bits set. Taking
only the first bit means that KVM will fail to match MDAs that intersect
with "higher" bits in the "ID".

The "ID" acts as a bitmap, but is referred to as an ID because there's an
implicit, unenforced "requirement" that software only set one bit. This
edge case is arguably out-of-spec behavior, but KVM cleanly handles it
in all other cases, e.g. the optimized logical map (and AVIC!) is also
disabled in this scenario.

Refactor the code to consolidate the checks, and so that the code looks
more like avic_kick_target_vcpus_fast().

Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 894d0afd761b..e34b9baa9ee0 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -540,26 +540,26 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu)
static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
{
struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
- int index;
u32 *logical_apic_id_table;
- int dlid = GET_APIC_LOGICAL_ID(ldr);
+ u32 cluster, index;

- if (!dlid)
- return NULL;
+ ldr = GET_APIC_LOGICAL_ID(ldr);

- if (flat) { /* flat */
- index = ffs(dlid) - 1;
- if (index > 7)
+ if (flat) {
+ cluster = 0;
+ } else {
+ cluster = (ldr >> 4);
+ if (cluster >= 0xf)
return NULL;
- } else { /* cluster */
- int cluster = (dlid & 0xf0) >> 4;
- int apic = ffs(dlid & 0x0f) - 1;
-
- if ((apic < 0) || (apic > 7) ||
- (cluster >= 0xf))
- return NULL;
- index = (cluster << 2) + apic;
+ ldr &= 0xf;
}
+ if (!ldr || !is_power_of_2(ldr))
+ return NULL;
+
+ index = __ffs(ldr);
+ if (WARN_ON_ONCE(index > 7))
+ return NULL;
+ index += (cluster << 2);

logical_apic_id_table = (u32 *) page_address(kvm_svm->avic_logical_id_table_page);

--
2.37.2.789.g6183377224-goog
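The refactored lookup above boils down to the following decode (a standalone sketch using a hypothetical helper, not the kernel function; -1 stands in for the NULL return):

```c
#include <assert.h>
#include <stdint.h>

/* Returns the logical ID table index for an 8-bit LDR, or -1 for no entry. */
static int toy_logical_index(uint32_t ldr, int flat)
{
	uint32_t cluster, id;

	ldr &= 0xff;			/* GET_APIC_LOGICAL_ID() */
	if (flat) {
		cluster = 0;
		id = ldr;
	} else {
		cluster = ldr >> 4;
		if (cluster >= 0xf)	/* cluster 0xf is reserved */
			return -1;
		id = ldr & 0xf;
	}
	/* Require exactly one bit set, per the power-of-2 rule in the patch. */
	if (!id || (id & (id - 1)))
		return -1;

	return (int)((cluster << 2) + __builtin_ctz(id));
}
```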

2022-09-03 00:33:20

by Sean Christopherson

Subject: [PATCH v2 23/23] Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on a running vcpu"

Turns out that some warnings exist for good reasons. Restore the warning
in avic_vcpu_load() that guards against calling avic_vcpu_load() on a
running vCPU now that KVM avoids doing so when switching between x2APIC
and xAPIC. The entire point of the WARN is to highlight that KVM should
not be reloading an AVIC.

Opportunistically convert the WARN_ON() to WARN_ON_ONCE() to avoid
spamming the kernel if it does fire.

This reverts commit c0caeee65af3944b7b8abbf566e7cc1fae15c775.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 17c78051f3ea..a13279205df3 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -1064,6 +1064,7 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
return;

entry = READ_ONCE(*(svm->avic_physical_id_cache));
+ WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);

entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
--
2.37.2.789.g6183377224-goog

2022-09-03 00:37:06

by Sean Christopherson

Subject: [PATCH v2 18/23] KVM: SVM: Always update local APIC on writes to logical dest register

Update the vCPU's local (virtual) APIC on LDR writes even if the write
"fails". The APIC needs to recalc the optimized logical map even if the
LDR is invalid or zero; e.g. if the guest clears its LDR, the optimized
map will be left as is and the vCPU will receive interrupts using its
old LDR.

Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index efb0632d7457..456f24378961 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -600,7 +600,7 @@ static void avic_invalidate_logical_id_entry(struct kvm_vcpu *vcpu)
clear_bit(AVIC_LOGICAL_ID_ENTRY_VALID_BIT, (unsigned long *)entry);
}

-static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)
+static void avic_handle_ldr_update(struct kvm_vcpu *vcpu)
{
int ret = 0;
struct vcpu_svm *svm = to_svm(vcpu);
@@ -609,10 +609,10 @@ static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)

/* AVIC does not support LDR update for x2APIC */
if (apic_x2apic_mode(vcpu->arch.apic))
- return 0;
+ return;

if (ldr == svm->ldr_reg)
- return 0;
+ return;

avic_invalidate_logical_id_entry(vcpu);

@@ -621,8 +621,6 @@ static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)

if (!ret)
svm->ldr_reg = ldr;
-
- return ret;
}

static void avic_handle_dfr_update(struct kvm_vcpu *vcpu)
@@ -644,8 +642,7 @@ static int avic_unaccel_trap_write(struct kvm_vcpu *vcpu)

switch (offset) {
case APIC_LDR:
- if (avic_handle_ldr_update(vcpu))
- return 0;
+ avic_handle_ldr_update(vcpu);
break;
case APIC_DFR:
avic_handle_dfr_update(vcpu);
--
2.37.2.789.g6183377224-goog

2022-09-03 00:38:07

by Sean Christopherson

Subject: [PATCH v2 04/23] KVM: x86: Inhibit AVIC SPTEs if any vCPU enables x2APIC

Reintroduce APICV_INHIBIT_REASON_X2APIC as a "partial" inhibit for AMD
to fix a bug where the APIC access page is visible to vCPUs that have
x2APIC enabled, i.e. shouldn't be able to "see" the xAPIC MMIO region.

On AMD, due to its "hybrid" mode where AVIC is enabled when x2APIC is
enabled even without x2AVIC support, the bug occurs any time AVIC is
enabled as x2APIC is fully emulated by KVM. I.e. hardware isn't aware
that the guest is operating in x2APIC mode.

Opportunistically drop the "can" while updating avic_activate_vmcb()'s
comment, i.e. to state that KVM _does_ support the hybrid mode. Move
the "Note:" down a line to conform to preferred kernel/KVM multi-line
comment style.

Leave Intel as-is for now to avoid a subtle performance regression, even
though Intel likely suffers from the same bug. On Intel, in theory the
bug rears its head only when vCPUs share host page tables (extremely
likely) and x2APIC enabling is not consistent within the guest, i.e. if
some vCPUs have x2APIC enabled and others do not (unlikely to occur
except in certain situations, e.g. bringing up APs).

Fixes: 0e311d33bfbe ("KVM: SVM: Introduce hybrid-AVIC mode")
Cc: [email protected]
Suggested-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 10 ++++++++++
arch/x86/kvm/lapic.c | 4 +++-
arch/x86/kvm/mmu/mmu.c | 2 +-
arch/x86/kvm/svm/avic.c | 15 +++++++-------
arch/x86/kvm/x86.c | 35 +++++++++++++++++++++++++++++----
5 files changed, 53 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2c96c43c313a..1fd1b66ceeb6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1132,6 +1132,15 @@ enum kvm_apicv_inhibit {
* AVIC is disabled because SEV doesn't support it.
*/
APICV_INHIBIT_REASON_SEV,
+
+ /*
+ * Due to sharing page tables across vCPUs, the xAPIC memslot must be
+ * inhibited if any vCPU has x2APIC enabled. Note, this is a "partial"
+ * inhibit; APICv can still be activated, but KVM mustn't retain/create
+ * SPTEs for the APIC access page. Like the APIC ID and APIC base
+ * inhibits, this is sticky for simplicity.
+ */
+ APICV_INHIBIT_REASON_X2APIC,
};

struct kvm_arch {
@@ -1903,6 +1912,7 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
struct x86_exception *exception);

+bool kvm_apicv_memslot_activated(struct kvm *kvm);
bool kvm_apicv_activated(struct kvm *kvm);
bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu);
void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 38e9b8e5278c..d956cd37908e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2394,8 +2394,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
}
}

- if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
+ if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
+ kvm_set_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_X2APIC);
+ }

if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) {
kvm_vcpu_update_apicv(vcpu);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e418ef3ecfcb..cea25552869f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4150,7 +4150,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
* when the AVIC is re-enabled.
*/
if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
- !kvm_apicv_activated(vcpu->kvm))
+ !kvm_apicv_memslot_activated(vcpu->kvm))
return RET_PF_EMULATE;
}

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 6a3d225eb02c..19be5f1afaac 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -72,12 +72,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)

vmcb->control.int_ctl |= AVIC_ENABLE_MASK;

- /* Note:
- * KVM can support hybrid-AVIC mode, where KVM emulates x2APIC
- * MSR accesses, while interrupt injection to a running vCPU
- * can be achieved using AVIC doorbell. The AVIC hardware still
- * accelerate MMIO accesses, but this does not cause any harm
- * as the guest is not supposed to access xAPIC mmio when uses x2APIC.
+ /*
+ * Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
+ * accesses, while interrupt injection to a running vCPU can be
+ * achieved using AVIC doorbell. KVM disables the APIC access page
+ * (prevents mapping it into the guest) if any vCPU has x2APIC enabled,
+ * thus enabling AVIC activates only the doorbell mechanism.
*/
if (apic_x2apic_mode(svm->vcpu.arch.apic) &&
avic_mode == AVIC_MODE_X2) {
@@ -1014,7 +1014,8 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
BIT(APICV_INHIBIT_REASON_SEV) |
BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
- BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED);
+ BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
+ BIT(APICV_INHIBIT_REASON_X2APIC);

return supported & BIT(reason);
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d7374d768296..6ab9088c2531 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9379,15 +9379,29 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
kvm_irq_delivery_to_apic(kvm, NULL, &lapic_irq, NULL);
}

-bool kvm_apicv_activated(struct kvm *kvm)
+bool kvm_apicv_memslot_activated(struct kvm *kvm)
{
return (READ_ONCE(kvm->arch.apicv_inhibit_reasons) == 0);
}
+
+static unsigned long kvm_apicv_get_inhibit_reasons(struct kvm *kvm)
+{
+ /*
+ * x2APIC only needs to "inhibit" the MMIO region, all other aspects of
+ * APICv can continue to be utilized.
+ */
+ return READ_ONCE(kvm->arch.apicv_inhibit_reasons) & ~BIT(APICV_INHIBIT_REASON_X2APIC);
+}
+
+bool kvm_apicv_activated(struct kvm *kvm)
+{
+ return !kvm_apicv_get_inhibit_reasons(kvm);
+}
EXPORT_SYMBOL_GPL(kvm_apicv_activated);

bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu)
{
- ulong vm_reasons = READ_ONCE(vcpu->kvm->arch.apicv_inhibit_reasons);
+ ulong vm_reasons = kvm_apicv_get_inhibit_reasons(vcpu->kvm);
ulong vcpu_reasons = static_call(kvm_x86_vcpu_get_apicv_inhibit_reasons)(vcpu);

return (vm_reasons | vcpu_reasons) == 0;
@@ -10122,7 +10136,15 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,

set_or_clear_apicv_inhibit(&new, reason, set);

- if (!!old != !!new) {
+ /*
+ * If the overall "is APICv activated" status is unchanged, simply add
+ * or remove the inhibit from the pile. x2APIC is an exception, as it
+ * is a partial inhibit (only blocks SPTEs for the APIC access page).
+ * If x2APIC is the only inhibit in either the old or the new set, then
+ * vCPUs need to be kicked to transition between partially-inhibited
+ * and fully-inhibited.
+ */
+ if ((!!old != !!new) || old == BIT(APICV_INHIBIT_REASON_X2APIC) || new == BIT(APICV_INHIBIT_REASON_X2APIC)) {
/*
* Kick all vCPUs before setting apicv_inhibit_reasons to avoid
* false positives in the sanity check WARN in svm_vcpu_run().
@@ -10137,7 +10159,12 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
*/
kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
kvm->arch.apicv_inhibit_reasons = new;
- if (new) {
+
+ /*
+ * Zap SPTEs for the APIC access page if APICv is newly
+ * inhibited (partially or fully).
+ */
+ if (new && !old) {
unsigned long gfn = gpa_to_gfn(APIC_DEFAULT_PHYS_BASE);
kvm_zap_gfn_range(kvm, gfn, gfn+1);
}
--
2.37.2.789.g6183377224-goog
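The "partial inhibit" split introduced above can be sketched as two predicates over the inhibit bitmask (illustrative names and bit indices, not the kernel's): full APICv activation ignores the x2APIC inhibit, while the xAPIC MMIO memslot requires no inhibits whatsoever.

```c
#include <assert.h>

#define BIT(n)	(1UL << (n))

enum { TOY_INHIBIT_OTHER = 0, TOY_INHIBIT_X2APIC = 1 };

/* Full APICv activation ignores the partial x2APIC inhibit... */
static int toy_apicv_activated(unsigned long inhibits)
{
	return !(inhibits & ~BIT(TOY_INHIBIT_X2APIC));
}

/* ...but the xAPIC MMIO memslot is usable only with no inhibits at all. */
static int toy_memslot_activated(unsigned long inhibits)
{
	return !inhibits;
}
```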

2022-09-03 00:38:14

by Sean Christopherson

Subject: [PATCH v2 07/23] KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kick

Compute the destination from ICRH using the sender's x2APIC status, not
each (potential) target's x2APIC status.

Fixes: c514d3a348ac ("KVM: SVM: Update avic_kick_target_vcpus to support 32-bit APIC ID")
Cc: Li RongQing <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Li RongQing <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/avic.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 3022a135c060..50721c9167c4 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -456,6 +456,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
u32 icrl, u32 icrh, u32 index)
{
+ u32 dest = apic_x2apic_mode(source) ? icrh : GET_XAPIC_DEST_FIELD(icrh);
unsigned long i;
struct kvm_vcpu *vcpu;

@@ -471,13 +472,6 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
* since entered the guest will have processed pending IRQs at VMRUN.
*/
kvm_for_each_vcpu(i, vcpu, kvm) {
- u32 dest;
-
- if (apic_x2apic_mode(vcpu->arch.apic))
- dest = icrh;
- else
- dest = GET_XAPIC_DEST_FIELD(icrh);
-
if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
dest, icrl & APIC_DEST_MASK)) {
vcpu->arch.apic->irr_pending = true;
--
2.37.2.789.g6183377224-goog

2022-09-03 00:40:04

by Sean Christopherson

Subject: [PATCH v2 22/23] KVM: SVM: Ignore writes to Remote Read Data on AVIC write traps

Drop writes to APIC_RRR, a.k.a. Remote Read Data Register, on AVIC
unaccelerated write traps. The register is read-only and isn't emulated
by KVM. Sending the register through kvm_apic_write_nodecode() will
result in screaming when x2APIC is enabled due to the unexpected failure
to retrieve the MSR (KVM expects that only "legal" accesses will trap).

Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/avic.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 39b367a14a8c..17c78051f3ea 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -658,6 +658,9 @@ static int avic_unaccel_trap_write(struct kvm_vcpu *vcpu)
case APIC_DFR:
avic_handle_dfr_update(vcpu);
break;
+ case APIC_RRR:
+ /* Ignore writes to Remote Read Data, it's read-only. */
+ return 1;
default:
break;
}
--
2.37.2.789.g6183377224-goog

2022-09-03 00:40:08

by Sean Christopherson

Subject: [PATCH v2 06/23] KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" boolean

Replace the "avic_mode" enum with a single bool to track whether or not
x2AVIC is enabled. KVM already has "apicv_enabled" that tracks if any
flavor of AVIC is enabled, i.e. AVIC_MODE_NONE and AVIC_MODE_X1 are
redundant and unnecessary noise.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/avic.c | 46 +++++++++++++++++++----------------------
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/svm/svm.h | 9 +-------
3 files changed, 23 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index de7fcb3a544b..3022a135c060 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -53,7 +53,7 @@ static DEFINE_HASHTABLE(svm_vm_data_hash, SVM_VM_DATA_HASH_BITS);
static u32 next_vm_id = 0;
static bool next_vm_id_wrapped = 0;
static DEFINE_SPINLOCK(svm_vm_data_hash_lock);
-enum avic_modes avic_mode;
+bool x2avic_enabled;

/*
* This is a wrapper of struct amd_iommu_ir_data.
@@ -79,8 +79,7 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
* (prevents mapping it into the guest) if any vCPU has x2APIC enabled,
* thus enabling AVIC activates only the doorbell mechanism.
*/
- if (apic_x2apic_mode(svm->vcpu.arch.apic) &&
- avic_mode == AVIC_MODE_X2) {
+ if (x2avic_enabled && apic_x2apic_mode(svm->vcpu.arch.apic)) {
vmcb->control.int_ctl |= X2APIC_MODE_MASK;
vmcb->control.avic_physical_id |= X2AVIC_MAX_PHYSICAL_ID;
/* Disabling MSR intercept for x2APIC registers */
@@ -247,8 +246,8 @@ static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu,
u64 *avic_physical_id_table;
struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);

- if ((avic_mode == AVIC_MODE_X1 && index > AVIC_MAX_PHYSICAL_ID) ||
- (avic_mode == AVIC_MODE_X2 && index > X2AVIC_MAX_PHYSICAL_ID))
+ if ((!x2avic_enabled && index > AVIC_MAX_PHYSICAL_ID) ||
+ (index > X2AVIC_MAX_PHYSICAL_ID))
return NULL;

avic_physical_id_table = page_address(kvm_svm->avic_physical_id_table_page);
@@ -295,8 +294,8 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
int id = vcpu->vcpu_id;
struct vcpu_svm *svm = to_svm(vcpu);

- if ((avic_mode == AVIC_MODE_X1 && id > AVIC_MAX_PHYSICAL_ID) ||
- (avic_mode == AVIC_MODE_X2 && id > X2AVIC_MAX_PHYSICAL_ID))
+ if ((!x2avic_enabled && id > AVIC_MAX_PHYSICAL_ID) ||
+ (id > X2AVIC_MAX_PHYSICAL_ID))
return -EINVAL;

if (!vcpu->arch.apic->regs)
@@ -1094,10 +1093,7 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb01.ptr;

- if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
- return;
-
- if (!enable_apicv)
+ if (!lapic_in_kernel(vcpu) || !enable_apicv)
return;

if (kvm_vcpu_apicv_active(vcpu)) {
@@ -1173,32 +1169,32 @@ bool avic_hardware_setup(struct kvm_x86_ops *x86_ops)
if (!npt_enabled)
return false;

+ /* AVIC is a prerequisite for x2AVIC. */
+ if (!boot_cpu_has(X86_FEATURE_AVIC) && !force_avic) {
+ if (boot_cpu_has(X86_FEATURE_X2AVIC)) {
+ pr_warn(FW_BUG "Cannot support x2AVIC due to AVIC is disabled");
+ pr_warn(FW_BUG "Try enable AVIC using force_avic option");
+ }
+ return false;
+ }
+
if (boot_cpu_has(X86_FEATURE_AVIC)) {
- avic_mode = AVIC_MODE_X1;
pr_info("AVIC enabled\n");
} else if (force_avic) {
/*
* Some older systems does not advertise AVIC support.
* See Revision Guide for specific AMD processor for more detail.
*/
- avic_mode = AVIC_MODE_X1;
pr_warn("AVIC is not supported in CPUID but force enabled");
pr_warn("Your system might crash and burn");
}

/* AVIC is a prerequisite for x2AVIC. */
- if (boot_cpu_has(X86_FEATURE_X2AVIC)) {
- if (avic_mode == AVIC_MODE_X1) {
- avic_mode = AVIC_MODE_X2;
- pr_info("x2AVIC enabled\n");
- } else {
- pr_warn(FW_BUG "Cannot support x2AVIC due to AVIC is disabled");
- pr_warn(FW_BUG "Try enable AVIC using force_avic option");
- }
- }
+ x2avic_enabled = boot_cpu_has(X86_FEATURE_X2AVIC);
+ if (x2avic_enabled)
+ pr_info("x2AVIC enabled\n");

- if (avic_mode != AVIC_MODE_NONE)
- amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
+ amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);

- return !!avic_mode;
+ return true;
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2aa5069bafb2..709f0b3e7a48 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -821,7 +821,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
if (intercept == svm->x2avic_msrs_intercepted)
return;

- if (avic_mode != AVIC_MODE_X2 ||
+ if (!x2avic_enabled ||
!apic_x2apic_mode(svm->vcpu.arch.apic))
return;

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7a95f50e80e7..29c334a932c3 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -35,14 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
extern bool npt_enabled;
extern int vgif;
extern bool intercept_smi;
-
-enum avic_modes {
- AVIC_MODE_NONE = 0,
- AVIC_MODE_X1,
- AVIC_MODE_X2,
-};
-
-extern enum avic_modes avic_mode;
+extern bool x2avic_enabled;

/*
* Clean bits in VMCB.
--
2.37.2.789.g6183377224-goog

2022-09-03 00:40:53

by Sean Christopherson

Subject: [PATCH v2 14/23] KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs

Apply KVM's hotplug hack if and only if userspace has enabled 32-bit IDs
for x2APIC. If 32-bit IDs are not enabled, disable the optimized map to
honor x86 architectural behavior when multiple vCPUs share a physical APIC
ID. As called out in the changelog that added the hack, all CPUs whose
(possibly truncated) APIC ID matches the target are supposed to receive
the IPI.

KVM intentionally differs from real hardware, because real hardware
(Knights Landing) does just "x2apic_id & 0xff" to decide whether to
accept the interrupt in xAPIC mode and it can deliver one interrupt to
more than one physical destination, e.g. 0x123 to 0x123 and 0x23.

Applying the hack even when x2APIC is not fully enabled means KVM doesn't
correctly handle scenarios where the guest has aliased xAPIC IDs across
multiple vCPUs, as only the vCPU with the lowest vCPU ID will receive any
interrupts. It's extremely unlikely any real world guest aliases APIC IDs,
or even modifies APIC IDs, but KVM's behavior is arbitrary, e.g. the
lowest vCPU ID "wins" regardless of which vCPU is "aliasing" and which
vCPU is "normal".

Furthermore, the hack is _not_ guaranteed to work! The hack works if and
only if the optimized APIC map is successfully allocated. If the map
allocation fails (unlikely), KVM will fall back to its unoptimized
behavior, which _does_ honor the architectural behavior.

Pivot on 32-bit x2APIC IDs being enabled as that is required to take
advantage of the hotplug hack (see kvm_apic_state_fixup()), i.e. won't
break existing setups unless they are way, way off in the weeds.

Add an entry to KVM's errata to document the hack. Alternatively, KVM
could provide an actual x2APIC quirk and document the hack that way, but
there's unlikely to ever be a use case for disabling the quirk. Go the
errata route to avoid having to validate a quirk no one cares about.

Fixes: 5bd5db385b3e ("KVM: x86: allow hotplug of VCPU with APIC ID over 0xff")
Signed-off-by: Sean Christopherson <[email protected]>
---
Documentation/virt/kvm/x86/errata.rst | 11 ++++++
arch/x86/kvm/lapic.c | 50 ++++++++++++++++++++++-----
2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/x86/errata.rst b/Documentation/virt/kvm/x86/errata.rst
index 410e0aa63493..49a05f24747b 100644
--- a/Documentation/virt/kvm/x86/errata.rst
+++ b/Documentation/virt/kvm/x86/errata.rst
@@ -37,3 +37,14 @@ Nested virtualization features
------------------------------

TBD
+
+x2APIC
+------
+When KVM_X2APIC_API_USE_32BIT_IDS is enabled, KVM activates a hack/quirk that
+allows sending events to a single vCPU using its x2APIC ID even if the target
+vCPU has legacy xAPIC enabled, e.g. to bring up hotplugged vCPUs via INIT-SIPI
+on VMs with > 255 vCPUs. A side effect of the quirk is that, if multiple vCPUs
+have the same physical APIC ID, KVM will deliver events targeting that APIC ID
+only to the vCPU with the lowest vCPU ID. If KVM_X2APIC_API_USE_32BIT_IDS is
+not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs
+matching the target APIC ID receive the interrupt).
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 75748c380ceb..4c5f49c4d4f1 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -260,10 +260,10 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
kvm_for_each_vcpu(i, vcpu, kvm) {
struct kvm_lapic *apic = vcpu->arch.apic;
struct kvm_lapic **cluster;
+ u32 x2apic_id, physical_id;
u16 mask;
u32 ldr;
u8 xapic_id;
- u32 x2apic_id;

if (!kvm_apic_present(vcpu))
continue;
@@ -271,16 +271,48 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
xapic_id = kvm_xapic_id(apic);
x2apic_id = kvm_x2apic_id(apic);

- /* Hotplug hack: see kvm_apic_match_physical_addr(), ... */
- if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
- x2apic_id <= new->max_apic_id)
- new->phys_map[x2apic_id] = apic;
/*
- * ... xAPIC ID of VCPUs with APIC ID > 0xff will wrap-around,
- * prevent them from masking VCPUs with APIC ID <= 0xff.
+ * Apply KVM's hotplug hack if userspace has enabled 32-bit APIC
+ * IDs. Allow sending events to vCPUs by their x2APIC ID even
+ * if the target vCPU is in legacy xAPIC mode, and silently
+ * ignore aliased xAPIC IDs (the x2APIC ID is truncated to 8
+ * bits, causing IDs > 0xff to wrap and collide).
+ *
+ * Honor the architectural (and KVM's non-optimized) behavior
+ * if userspace has not enabled 32-bit x2APIC IDs. Each APIC
+ * is supposed to process messages independently. If multiple
+ * vCPUs have the same effective APIC ID, e.g. due to the
+ * x2APIC wrap or because the guest manually modified its xAPIC
+ * IDs, events targeting that ID are supposed to be recognized
+ * by all vCPUs with said ID.
*/
- if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
- new->phys_map[xapic_id] = apic;
+ if (kvm->arch.x2apic_format) {
+ /* See also kvm_apic_match_physical_addr(). */
+ if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
+ x2apic_id <= new->max_apic_id)
+ new->phys_map[x2apic_id] = apic;
+
+ if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
+ new->phys_map[xapic_id] = apic;
+ } else {
+ /*
+ * Disable the optimized map if the physical APIC ID is
+ * already mapped, i.e. is aliased to multiple vCPUs.
+ * The optimized map requires a strict 1:1 mapping
+ * between IDs and vCPUs.
+ */
+ if (apic_x2apic_mode(apic))
+ physical_id = x2apic_id;
+ else
+ physical_id = xapic_id;
+
+ if (new->phys_map[physical_id]) {
+ kvfree(new);
+ new = NULL;
+ goto out;
+ }
+ new->phys_map[physical_id] = apic;
+ }

if (!kvm_apic_sw_enabled(apic))
continue;
--
2.37.2.789.g6183377224-goog

2022-09-03 00:41:25

by Sean Christopherson

Subject: [PATCH v2 15/23] KVM: x86: Explicitly skip adding vCPU to optimized logical map if LDR==0

Explicitly skip adding a vCPU to the optimized map of logical IDs if the
vCPU's LDR is '0', i.e. if the vCPU will never respond to logical mode
interrupts. KVM already skips the vCPU in this case, but relies on
kvm_apic_map_get_logical_dest() to generate mask==0. KVM still needs the
mask==0 check, as a non-zero LDR can yield mask==0 depending on the mode,
but explicitly handling the LDR will make it simpler to clean up the
logical mode tracking in the future.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/lapic.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4c5f49c4d4f1..80528d86f010 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -318,10 +318,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
continue;

ldr = kvm_lapic_get_reg(apic, APIC_LDR);
+ if (!ldr)
+ continue;

if (apic_x2apic_mode(apic)) {
new->mode |= KVM_APIC_MODE_X2APIC;
- } else if (ldr) {
+ } else {
ldr = GET_APIC_LOGICAL_ID(ldr);
if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
--
2.37.2.789.g6183377224-goog

2022-09-03 00:41:25

by Sean Christopherson

Subject: [PATCH v2 16/23] KVM: x86: Explicitly track all possibilities for APIC map's logical modes

Track all possibilities for the optimized APIC map's logical modes
instead of overloading the pseudo-bitmap and treating any "unknown" value
as "invalid".

As documented by the now-stale comment above the mode values, the values
did have meaning when the optimized map was originally added. That
dependent logical was removed by commit e45115b62f9a ("KVM: x86: use
physical LAPIC array for logical x2APIC"), but the obfuscated behavior
and its comment were left behind.

Opportunistically rename "mode" to "logical_mode", partly to make it
clear that the "disabled" case applies only to the logical map, but also
to prove that there is no lurking code that expects "mode" to be a bitmap.

Functionally, this is a glorified nop.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 21 ++++++++++--------
arch/x86/kvm/lapic.c | 38 ++++++++++++++++++++++++---------
2 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1fd1b66ceeb6..13dadc96d9ac 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -955,19 +955,22 @@ struct kvm_arch_memory_slot {
};

/*
- * We use as the mode the number of bits allocated in the LDR for the
- * logical processor ID. It happens that these are all powers of two.
- * This makes it is very easy to detect cases where the APICs are
- * configured for multiple modes; in that case, we cannot use the map and
- * hence cannot use kvm_irq_delivery_to_apic_fast either.
+ * Track the mode of the optimized logical map, as the rules for decoding the
+ * destination vary per mode. Enabling the optimized logical map requires all
+ * software-enabled local APICs to be in the same mode, each addressable APIC to
+ * be mapped to only one MDA, and each MDA to map to at most one APIC.
*/
-#define KVM_APIC_MODE_XAPIC_CLUSTER 4
-#define KVM_APIC_MODE_XAPIC_FLAT 8
-#define KVM_APIC_MODE_X2APIC 16
+enum kvm_apic_logical_mode {
+ KVM_APIC_MODE_SW_DISABLED,
+ KVM_APIC_MODE_XAPIC_CLUSTER,
+ KVM_APIC_MODE_XAPIC_FLAT,
+ KVM_APIC_MODE_X2APIC,
+ KVM_APIC_MODE_MAP_DISABLED,
+};

struct kvm_apic_map {
struct rcu_head rcu;
- u8 mode;
+ enum kvm_apic_logical_mode logical_mode;
u32 max_apic_id;
union {
struct kvm_lapic *xapic_flat_map[8];
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 80528d86f010..407e933eb073 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -168,7 +168,12 @@ static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)

static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
- switch (map->mode) {
+ switch (map->logical_mode) {
+ case KVM_APIC_MODE_SW_DISABLED:
+ /* Arbitrarily use the flat map so that @cluster isn't NULL. */
+ *cluster = map->xapic_flat_map;
+ *mask = 0;
+ return true;
case KVM_APIC_MODE_X2APIC: {
u32 offset = (dest_id >> 16) * 16;
u32 max_apic_id = map->max_apic_id;
@@ -193,8 +198,10 @@ static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
*cluster = map->xapic_cluster_map[(dest_id >> 4) & 0xf];
*mask = dest_id & 0xf;
return true;
+ case KVM_APIC_MODE_MAP_DISABLED:
+ return false;
default:
- /* Not optimized. */
+ WARN_ON_ONCE(1);
return false;
}
}
@@ -256,10 +263,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
goto out;

new->max_apic_id = max_id;
+ new->logical_mode = KVM_APIC_MODE_SW_DISABLED;

kvm_for_each_vcpu(i, vcpu, kvm) {
struct kvm_lapic *apic = vcpu->arch.apic;
struct kvm_lapic **cluster;
+ enum kvm_apic_logical_mode logical_mode;
u32 x2apic_id, physical_id;
u16 mask;
u32 ldr;
@@ -314,7 +323,8 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
new->phys_map[physical_id] = apic;
}

- if (!kvm_apic_sw_enabled(apic))
+ if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
+ !kvm_apic_sw_enabled(apic))
continue;

ldr = kvm_lapic_get_reg(apic, APIC_LDR);
@@ -322,25 +332,33 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
continue;

if (apic_x2apic_mode(apic)) {
- new->mode |= KVM_APIC_MODE_X2APIC;
+ logical_mode = KVM_APIC_MODE_X2APIC;
} else {
ldr = GET_APIC_LOGICAL_ID(ldr);
if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
- new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
+ logical_mode = KVM_APIC_MODE_XAPIC_FLAT;
else
- new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
+ logical_mode = KVM_APIC_MODE_XAPIC_CLUSTER;
}
+ if (new->logical_mode != KVM_APIC_MODE_SW_DISABLED &&
+ new->logical_mode != logical_mode) {
+ new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
+ continue;
+ }
+ new->logical_mode = logical_mode;

- if (!kvm_apic_map_get_logical_dest(new, ldr, &cluster, &mask))
+ if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
+ &cluster, &mask))) {
+ new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
continue;
+ }

if (!mask)
continue;

ldr = ffs(mask) - 1;
if (!is_power_of_2(mask) || cluster[ldr]) {
- new->mode = KVM_APIC_MODE_XAPIC_FLAT |
- KVM_APIC_MODE_XAPIC_CLUSTER;
+ new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
continue;
}
cluster[ldr] = apic;
@@ -993,7 +1011,7 @@ static bool kvm_apic_is_broadcast_dest(struct kvm *kvm, struct kvm_lapic **src,
{
if (kvm->arch.x2apic_broadcast_quirk_disabled) {
if ((irq->dest_id == APIC_BROADCAST &&
- map->mode != KVM_APIC_MODE_X2APIC))
+ map->logical_mode != KVM_APIC_MODE_X2APIC))
return true;
if (irq->dest_id == X2APIC_BROADCAST)
return true;
--
2.37.2.789.g6183377224-goog

2022-09-03 00:41:53

by Sean Christopherson

Subject: [PATCH v2 09/23] Revert "KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible"

Due to a likely mismerge of patches, KVM ended up with a superfluous
commit to "enable" AVIC's fast path for x2AVIC mode. Even worse, the
superfluous commit has several bugs and creates a nasty local shadow
variable.

Rather than fix the bugs piece-by-piece[*] to achieve the same end
result, revert the patch wholesale.

Opportunistically add a comment documenting the x2AVIC dependencies.

This reverts commit 8c9e639da435874fb845c4d296ce55664071ea7a.

[*] https://lore.kernel.org/all/[email protected]

Fixes: 8c9e639da435 ("KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible")
Suggested-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 29 +++++++++++------------------
1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 163edc42f979..8259a64c99d6 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -405,7 +405,17 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source

logid_index = cluster + __ffs(bitmap);

- if (!apic_x2apic_mode(source)) {
+ if (apic_x2apic_mode(source)) {
+ /*
+ * For x2APIC, the logical APIC ID is a read-only value
+ * that is derived from the x2APIC ID, thus the x2APIC
+ * ID can be found by reversing the calculation (done
+ * above). Note, bits 31:20 of the x2APIC ID are not
+ * propagated to the logical ID, but KVM limits the
+ * x2APIC ID to KVM_MAX_VCPU_IDS.
+ */
+ l1_physical_id = logid_index;
+ } else {
u32 *avic_logical_id_table =
page_address(kvm_svm->avic_logical_id_table_page);

@@ -420,23 +430,6 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source

l1_physical_id = logid_entry &
AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
- } else {
- /*
- * For x2APIC logical mode, cannot leverage the index.
- * Instead, calculate physical ID from logical ID in ICRH.
- */
- int cluster = (icrh & 0xffff0000) >> 16;
- int apic = ffs(icrh & 0xffff) - 1;
-
- /*
- * If the x2APIC logical ID sub-field (i.e. icrh[15:0])
- * contains anything but a single bit, we cannot use the
- * fast path, because it is limited to a single vCPU.
- */
- if (apic < 0 || icrh != (1 << apic))
- return -EINVAL;
-
- l1_physical_id = (cluster << 4) + apic;
}
}

--
2.37.2.789.g6183377224-goog

2022-09-03 00:42:53

by Sean Christopherson

Subject: [PATCH v2 10/23] KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpath

Document that AVIC is inhibited if any vCPU's APIC ID diverges from its
vCPU ID, i.e. that there's no need to check for a destination match in
the AVIC kick fast path.

Opportunistically tweak comments to remove "guest bug", as that suggests
KVM is punting on error handling, which is not the case. Targeting a
non-existent vCPU or no vCPUs _may_ be a guest software bug, but whether
or not it's a guest bug is irrelevant. Such behavior is architecturally
legal and thus needs to be faithfully emulated by KVM (and it is).

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/avic.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 8259a64c99d6..b4b5f1422db7 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -395,8 +395,8 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
cluster = (dest >> 4) << 2;
}

+ /* Nothing to do if there are no destinations in the cluster. */
if (unlikely(!bitmap))
- /* guest bug: nobody to send the logical interrupt to */
return 0;

if (!is_power_of_2(bitmap))
@@ -424,7 +424,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
if (WARN_ON_ONCE(index != logid_index))
return -EINVAL;

- /* guest bug: non existing/reserved logical destination */
+ /* Nothing to do if the logical destination is invalid. */
if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
return 0;

@@ -433,9 +433,13 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
}
}

+ /*
+ * KVM inhibits AVIC if any vCPU ID diverges from the vCPU's APIC ID,
+ * i.e. APIC ID == vCPU ID. Once again, nothing to do if the target
+ * vCPU doesn't exist.
+ */
target_vcpu = kvm_get_vcpu_by_id(kvm, l1_physical_id);
if (unlikely(!target_vcpu))
- /* guest bug: non existing vCPU is a target of this IPI*/
return 0;

target_vcpu->arch.apic->irr_pending = true;
--
2.37.2.789.g6183377224-goog

2022-09-03 00:42:56

by Sean Christopherson

Subject: [PATCH v2 13/23] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode

Disable the optimized APIC logical map if multiple vCPUs are aliased to
the same logical ID. Architecturally, all CPUs whose logical ID matches
the MDA are supposed to receive the interrupt; overwriting existing map
entries can result in missed IPIs.

Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/lapic.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6b2f538b8fd0..75748c380ceb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -303,12 +303,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
if (!mask)
continue;

- if (!is_power_of_2(mask)) {
+ ldr = ffs(mask) - 1;
+ if (!is_power_of_2(mask) || cluster[ldr]) {
new->mode = KVM_APIC_MODE_XAPIC_FLAT |
KVM_APIC_MODE_XAPIC_CLUSTER;
continue;
}
- cluster[ffs(mask) - 1] = apic;
+ cluster[ldr] = apic;
}
out:
old = rcu_dereference_protected(kvm->arch.apic_map,
--
2.37.2.789.g6183377224-goog

2022-09-03 00:53:51

by Sean Christopherson

Subject: [PATCH v2 08/23] KVM: SVM: Fix x2APIC Logical ID calculation for avic_kick_target_vcpus_fast

From: Suravee Suthikulpanit <[email protected]>

For X2APIC ID in cluster mode, the logical ID is bits [15:0].

Fixes: 603ccef42ce9 ("KVM: x86: SVM: fix avic_kick_target_vcpus_fast")
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 50721c9167c4..163edc42f979 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -383,7 +383,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source

if (apic_x2apic_mode(source)) {
/* 16 bit dest mask, 16 bit cluster id */
- bitmap = dest & 0xFFFF0000;
+ bitmap = dest & 0xFFFF;
cluster = (dest >> 16) << 4;
} else if (kvm_lapic_get_reg(source, APIC_DFR) == APIC_DFR_FLAT) {
/* 8 bit dest mask*/
--
2.37.2.789.g6183377224-goog

2022-09-03 00:56:43

by Sean Christopherson

Subject: [PATCH v2 21/23] KVM: SVM: Handle multiple logical targets in AVIC kick fastpath

Iterate over all target logical IDs in the AVIC kick fastpath instead of
bailing if there is more than one target. Now that KVM inhibits AVIC if
vCPUs aren't mapped 1:1 with logical IDs, each bit in the destination is
guaranteed to match to at most one vCPU, i.e. iterating over the bitmap
is guaranteed to kick each valid target exactly once.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 112 ++++++++++++++++++++++------------------
1 file changed, 63 insertions(+), 49 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index e34b9baa9ee0..39b367a14a8c 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -354,6 +354,50 @@ static void avic_kick_vcpu(struct kvm_vcpu *vcpu, u32 icrl)
icrl & APIC_VECTOR_MASK);
}

+static void avic_kick_vcpu_by_physical_id(struct kvm *kvm, u32 physical_id,
+ u32 icrl)
+{
+ /*
+ * KVM inhibits AVIC if any vCPU ID diverges from the vCPU's APIC ID,
+ * i.e. APIC ID == vCPU ID.
+ */
+ struct kvm_vcpu *target_vcpu = kvm_get_vcpu_by_id(kvm, physical_id);
+
+ /* Once again, nothing to do if the target vCPU doesn't exist. */
+ if (unlikely(!target_vcpu))
+ return;
+
+ avic_kick_vcpu(target_vcpu, icrl);
+}
+
+static void avic_kick_vcpu_by_logical_id(struct kvm *kvm, u32 *avic_logical_id_table,
+ u32 logid_index, u32 icrl)
+{
+ u32 physical_id;
+
+ if (avic_logical_id_table) {
+ u32 logid_entry = avic_logical_id_table[logid_index];
+
+ /* Nothing to do if the logical destination is invalid. */
+ if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
+ return;
+
+ physical_id = logid_entry &
+ AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
+ } else {
+ /*
+ * For x2APIC, the logical APIC ID is a read-only value that is
+ * derived from the x2APIC ID, thus the x2APIC ID can be found
+ * by reversing the calculation (stored in logid_index). Note,
+ * bits 31:20 of the x2APIC ID aren't propagated to the logical
+ * ID, but KVM limits the x2APIC ID to KVM_MAX_VCPU_IDS.
+ */
+ physical_id = logid_index;
+ }
+
+ avic_kick_vcpu_by_physical_id(kvm, physical_id, icrl);
+}
+
/*
* A fast-path version of avic_kick_target_vcpus(), which attempts to match
* destination APIC ID to vCPU without looping through all vCPUs.
@@ -361,11 +405,10 @@ static void avic_kick_vcpu(struct kvm_vcpu *vcpu, u32 icrl)
static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source,
u32 icrl, u32 icrh, u32 index)
{
- u32 l1_physical_id, dest;
- struct kvm_vcpu *target_vcpu;
int dest_mode = icrl & APIC_DEST_MASK;
int shorthand = icrl & APIC_SHORT_MASK;
struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
+ u32 dest;

if (shorthand != APIC_DEST_NOSHORT)
return -EINVAL;
@@ -382,14 +425,14 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
if (!apic_x2apic_mode(source) && dest == APIC_BROADCAST)
return -EINVAL;

- l1_physical_id = dest;
-
- if (WARN_ON_ONCE(l1_physical_id != index))
+ if (WARN_ON_ONCE(dest != index))
return -EINVAL;

+ avic_kick_vcpu_by_physical_id(kvm, dest, icrl);
} else {
- u32 bitmap, cluster;
- int logid_index;
+ u32 *avic_logical_id_table;
+ unsigned long bitmap, i;
+ u32 cluster;

if (apic_x2apic_mode(source)) {
/* 16 bit dest mask, 16 bit cluster id */
@@ -409,50 +452,21 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
if (unlikely(!bitmap))
return 0;

- if (!is_power_of_2(bitmap))
- /* multiple logical destinations, use slow path */
- return -EINVAL;
-
- logid_index = cluster + __ffs(bitmap);
-
- if (apic_x2apic_mode(source)) {
- /*
- * For x2APIC, the logical APIC ID is a read-only value
- * that is derived from the x2APIC ID, thus the x2APIC
- * ID can be found by reversing the calculation (done
- * above). Note, bits 31:20 of the x2APIC ID are not
- * propagated to the logical ID, but KVM limits the
- * x2APIC ID limited to KVM_MAX_VCPU_IDS.
- */
- l1_physical_id = logid_index;
- } else {
- u32 *avic_logical_id_table =
- page_address(kvm_svm->avic_logical_id_table_page);
-
- u32 logid_entry = avic_logical_id_table[logid_index];
-
- if (WARN_ON_ONCE(index != logid_index))
- return -EINVAL;
-
- /* Nothing to do if the logical destination is invalid. */
- if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
- return 0;
-
- l1_physical_id = logid_entry &
- AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
- }
+ if (apic_x2apic_mode(source))
+ avic_logical_id_table = NULL;
+ else
+ avic_logical_id_table = page_address(kvm_svm->avic_logical_id_table_page);
+
+ /*
+ * AVIC is inhibited if vCPUs aren't mapped 1:1 with logical
+ * IDs, thus each bit in the destination is guaranteed to map
+ * to at most one vCPU.
+ */
+ for_each_set_bit(i, &bitmap, 16)
+ avic_kick_vcpu_by_logical_id(kvm, avic_logical_id_table,
+ cluster + i, icrl);
}

- /*
- * KVM inhibits AVIC if any vCPU ID diverges from the vCPUs APIC ID,
- * i.e. APIC ID == vCPU ID. Once again, nothing to do if the target
- * vCPU doesn't exist.
- */
- target_vcpu = kvm_get_vcpu_by_id(kvm, l1_physical_id);
- if (unlikely(!target_vcpu))
- return 0;
-
- avic_kick_vcpu(target_vcpu, icrl);
return 0;
}

--
2.37.2.789.g6183377224-goog

2022-09-03 00:56:49

by Sean Christopherson

Subject: [PATCH v2 19/23] KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad"

Update SVM's cache of the LDR even if the new value is "bad". Leaving
stale information in the cache can result in KVM missing updates and/or
invalidating the wrong entry, e.g. if avic_invalidate_logical_id_entry()
is triggered after a different vCPU has "claimed" the old LDR.

Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 456f24378961..894d0afd761b 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -566,23 +566,24 @@ static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
return &logical_apic_id_table[index];
}

-static int avic_ldr_write(struct kvm_vcpu *vcpu, u8 g_physical_id, u32 ldr)
+static void avic_ldr_write(struct kvm_vcpu *vcpu, u8 g_physical_id, u32 ldr)
{
bool flat;
u32 *entry, new_entry;

+ if (!ldr)
+ return;
+
flat = kvm_lapic_get_reg(vcpu->arch.apic, APIC_DFR) == APIC_DFR_FLAT;
entry = avic_get_logical_id_entry(vcpu, ldr, flat);
if (!entry)
- return -EINVAL;
+ return;

new_entry = READ_ONCE(*entry);
new_entry &= ~AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
new_entry |= (g_physical_id & AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK);
new_entry |= AVIC_LOGICAL_ID_ENTRY_VALID_MASK;
WRITE_ONCE(*entry, new_entry);
-
- return 0;
}

static void avic_invalidate_logical_id_entry(struct kvm_vcpu *vcpu)
@@ -602,7 +603,6 @@ static void avic_invalidate_logical_id_entry(struct kvm_vcpu *vcpu)

static void avic_handle_ldr_update(struct kvm_vcpu *vcpu)
{
- int ret = 0;
struct vcpu_svm *svm = to_svm(vcpu);
u32 ldr = kvm_lapic_get_reg(vcpu->arch.apic, APIC_LDR);
u32 id = kvm_xapic_id(vcpu->arch.apic);
@@ -616,11 +616,8 @@ static void avic_handle_ldr_update(struct kvm_vcpu *vcpu)

avic_invalidate_logical_id_entry(vcpu);

- if (ldr)
- ret = avic_ldr_write(vcpu, id, ldr);
-
- if (!ret)
- svm->ldr_reg = ldr;
+ svm->ldr_reg = ldr;
+ avic_ldr_write(vcpu, id, ldr);
}

static void avic_handle_dfr_update(struct kvm_vcpu *vcpu)
--
2.37.2.789.g6183377224-goog

2022-09-03 00:56:50

by Sean Christopherson

Subject: [PATCH v2 17/23] KVM: SVM: Inhibit AVIC if vCPUs are aliased in logical mode

Inhibit SVM's AVIC if multiple vCPUs are aliased to the same logical ID.
Architecturally, all CPUs whose logical ID matches the MDA are supposed
to receive the interrupt; overwriting existing entries in AVIC's
logical=>physical map can result in missed IPIs.

Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 6 ++++++
arch/x86/kvm/lapic.c | 5 +++++
arch/x86/kvm/svm/avic.c | 3 ++-
3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 13dadc96d9ac..042dcdf987d2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1144,6 +1144,12 @@ enum kvm_apicv_inhibit {
* inhibits, this is sticky for simplicity.
*/
APICV_INHIBIT_REASON_X2APIC,
+
+ /*
+ * AVIC is disabled because not all vCPUs with a valid LDR have a 1:1
+ * mapping between logical ID and vCPU.
+ */
+ APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED,
};

struct kvm_arch {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 407e933eb073..4cebbdd3431b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -364,6 +364,11 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
cluster[ldr] = apic;
}
out:
+ if (!new || new->logical_mode == KVM_APIC_MODE_MAP_DISABLED)
+ kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
+ else
+ kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
+
old = rcu_dereference_protected(kvm->arch.apic_map,
lockdep_is_held(&kvm->arch.apic_map_lock));
rcu_assign_pointer(kvm->arch.apic_map, new);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 3400046ad0b4..efb0632d7457 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -994,7 +994,8 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
BIT(APICV_INHIBIT_REASON_SEV) |
BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
- BIT(APICV_INHIBIT_REASON_X2APIC);
+ BIT(APICV_INHIBIT_REASON_X2APIC) |
+ BIT(APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);

return supported & BIT(reason);
}
--
2.37.2.789.g6183377224-goog

2022-09-05 22:08:56

by Paolo Bonzini

Subject: Re: [PATCH v2 02/23] KVM: SVM: Flush the "current" TLB when activating AVIC

On 9/3/22 02:22, Sean Christopherson wrote:
> Flush the TLB when activating AVIC as the CPU can insert into the TLB
> while AVIC is "locally" disabled. KVM doesn't treat "APIC hardware
> disabled" as VM-wide AVIC inhibition, and so when a vCPU has its APIC
> hardware disabled, AVIC is not guaranteed to be inhibited. As a result,
> KVM may create a valid NPT mapping for the APIC base, which the CPU can
> cache as a non-AVIC translation.
>
> Note, Intel handles this in vmx_set_virtual_apic_mode().
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/avic.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 6919dee69f18..4fbef2af1efc 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -86,6 +86,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
> /* Disabling MSR intercept for x2APIC registers */
> svm_set_x2apic_msr_interception(svm, false);
> } else {
> + /*
> + * Flush the TLB, the guest may have inserted a non-APIC
> + * mappings into the TLB while AVIC was disabled.

mapping

Otherwise,

Reviewed-by: Paolo Bonzini <[email protected]>

Paolo

> + */
> + kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, &svm->vcpu);
> +
> /* For xAVIC and hybrid-xAVIC modes */
> vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID;
> /* Enabling MSR intercept for x2APIC registers */

2022-09-05 22:09:21

by Paolo Bonzini

Subject: Re: [PATCH v2 04/23] KVM: x86: Inhibit AVIC SPTEs if any vCPU enables x2APIC

On 9/3/22 02:22, Sean Christopherson wrote:
> Reintroduce APICV_INHIBIT_REASON_X2APIC as a "partial" inhibit for AMD
> to fix a bug where the APIC access page is visible to vCPUs that have
> x2APIC enabled, i.e. shouldn't be able to "see" the xAPIC MMIO region.
>
> On AMD, due to its "hybrid" mode where AVIC is enabled when x2APIC is
> enabled even without x2AVIC support, the bug occurs any time AVIC is
> enabled as x2APIC is fully emulated by KVM. I.e. hardware isn't aware
> that the guest is operating in x2APIC mode.
>
> Opportunistically drop the "can" while updating avic_activate_vmcb()'s
> comment, i.e. to state that KVM _does_ support the hybrid mode. Move
> the "Note:" down a line to conform to preferred kernel/KVM multi-line
> comment style.
>
> Leave Intel as-is for now to avoid a subtle performance regression, even
> though Intel likely suffers from the same bug. On Intel, in theory the
> bug rears its head only when vCPUs share host page tables (extremely
> likely) and x2APIC enabling is not consistent within the guest, i.e. if
> some vCPUs have x2APIC enabled and others do not (unlikely to occur
> except in certain situations, e.g. bringing up APs).
>
> Fixes: 0e311d33bfbe ("KVM: SVM: Introduce hybrid-AVIC mode")
> Cc: [email protected]
> Suggested-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 10 ++++++++++
> arch/x86/kvm/lapic.c | 4 +++-
> arch/x86/kvm/mmu/mmu.c | 2 +-
> arch/x86/kvm/svm/avic.c | 15 +++++++-------
> arch/x86/kvm/x86.c | 35 +++++++++++++++++++++++++++++----
> 5 files changed, 53 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 2c96c43c313a..1fd1b66ceeb6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1132,6 +1132,15 @@ enum kvm_apicv_inhibit {
> * AVIC is disabled because SEV doesn't support it.
> */
> APICV_INHIBIT_REASON_SEV,
> +
> + /*
> + * Due to sharing page tables across vCPUs, the xAPIC memslot must be
> + * inhibited if any vCPU has x2APIC enabled. Note, this is a "partial"
> + * inhibit; APICv can still be activated, but KVM mustn't retain/create
> + * SPTEs for the APIC access page. Like the APIC ID and APIC base
> + * inhibits, this is sticky for simplicity.
> + */
> + APICV_INHIBIT_REASON_X2APIC,
> };
>
> struct kvm_arch {
> @@ -1903,6 +1912,7 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
> gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
> struct x86_exception *exception);
>
> +bool kvm_apicv_memslot_activated(struct kvm *kvm);
> bool kvm_apicv_activated(struct kvm *kvm);
> bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu);
> void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 38e9b8e5278c..d956cd37908e 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2394,8 +2394,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
> }
> }
>
> - if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
> + if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
> kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
> + kvm_set_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_X2APIC);
> + }
>
> if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) {
> kvm_vcpu_update_apicv(vcpu);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e418ef3ecfcb..cea25552869f 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4150,7 +4150,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> * when the AVIC is re-enabled.
> */
> if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
> - !kvm_apicv_activated(vcpu->kvm))
> + !kvm_apicv_memslot_activated(vcpu->kvm))
> return RET_PF_EMULATE;
> }
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 6a3d225eb02c..19be5f1afaac 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -72,12 +72,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
>
> vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
>
> - /* Note:
> - * KVM can support hybrid-AVIC mode, where KVM emulates x2APIC
> - * MSR accesses, while interrupt injection to a running vCPU
> - * can be achieved using AVIC doorbell. The AVIC hardware still
> - * accelerate MMIO accesses, but this does not cause any harm
> - * as the guest is not supposed to access xAPIC mmio when uses x2APIC.
> + /*
> + * Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
> + * accesses, while interrupt injection to a running vCPU can be
> + * achieved using AVIC doorbell. KVM disables the APIC access page
> + * (prevents mapping it into the guest) if any vCPU has x2APIC enabled,
> + * thus enabling AVIC activates only the doorbell mechanism.
> */
> if (apic_x2apic_mode(svm->vcpu.arch.apic) &&
> avic_mode == AVIC_MODE_X2) {
> @@ -1014,7 +1014,8 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
> BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
> BIT(APICV_INHIBIT_REASON_SEV) |
> BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
> - BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED);
> + BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
> + BIT(APICV_INHIBIT_REASON_X2APIC);
>
> return supported & BIT(reason);
> }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d7374d768296..6ab9088c2531 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9379,15 +9379,29 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
> kvm_irq_delivery_to_apic(kvm, NULL, &lapic_irq, NULL);
> }
>
> -bool kvm_apicv_activated(struct kvm *kvm)
> +bool kvm_apicv_memslot_activated(struct kvm *kvm)
> {
> return (READ_ONCE(kvm->arch.apicv_inhibit_reasons) == 0);
> }
> +
> +static unsigned long kvm_apicv_get_inhibit_reasons(struct kvm *kvm)
> +{
> + /*
> + * x2APIC only needs to "inhibit" the MMIO region, all other aspects of
> + * APICv can continue to be utilized.
> + */
> + return READ_ONCE(kvm->arch.apicv_inhibit_reasons) & ~APICV_INHIBIT_REASON_X2APIC;
> +}
> +
> +bool kvm_apicv_activated(struct kvm *kvm)
> +{
> + return !kvm_apicv_get_inhibit_reasons(kvm);
> +}
> EXPORT_SYMBOL_GPL(kvm_apicv_activated);
>
> bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu)
> {
> - ulong vm_reasons = READ_ONCE(vcpu->kvm->arch.apicv_inhibit_reasons);
> + ulong vm_reasons = kvm_apicv_get_inhibit_reasons(vcpu->kvm);
> ulong vcpu_reasons = static_call(kvm_x86_vcpu_get_apicv_inhibit_reasons)(vcpu);
>
> return (vm_reasons | vcpu_reasons) == 0;
> @@ -10122,7 +10136,15 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
>
> set_or_clear_apicv_inhibit(&new, reason, set);
>
> - if (!!old != !!new) {
> + /*
> + * If the overall "is APICv activated" status is unchanged, simply add
> + * or remove the inihbit from the pile. x2APIC is an exception, as it
> + * is a partial inhibit (only blocks SPTEs for the APIC access page).
> + * If x2APIC is the only inhibit in either the old or the new set, then
> + * vCPUs need to be kicked to transition between partially-inhibited
> + * and fully-inhibited.
> + */
> + if ((!!old != !!new) || old == X2APIC_ENABLE || new == X2APIC_ENABLE) {
> /*
> * Kick all vCPUs before setting apicv_inhibit_reasons to avoid
> * false positives in the sanity check WARN in svm_vcpu_run().
> @@ -10137,7 +10159,12 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
> */
> kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
> kvm->arch.apicv_inhibit_reasons = new;
> - if (new) {
> +
> + /*
> + * Zap SPTEs for the APIC access page if APICv is newly
> + * inhibited (partially or fully).
> + */
> + if (new && !old) {
> unsigned long gfn = gpa_to_gfn(APIC_DEFAULT_PHYS_BASE);
> kvm_zap_gfn_range(kvm, gfn, gfn+1);
> }

Reviewed-by: Paolo Bonzini <[email protected]>

2022-09-05 22:16:53

by Paolo Bonzini

Subject: Re: [PATCH v2 03/23] KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid target

On 9/3/22 02:22, Sean Christopherson wrote:
> Emulate ICR writes on AVIC IPI failures due to invalid targets using the
> same logic as failures due to invalid types. AVIC acceleration fails if
> _any_ of the targets are invalid, and crucially VM-Exits before sending
> IPIs to targets that _are_ valid. In logical mode, the destination is a
> bitmap, i.e. a single IPI can target multiple logical IDs. Doing nothing
> causes KVM to drop IPIs if at least one target is valid and at least one
> target is invalid.
>
> Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
> Reviewed-by: Maxim Levitsky <[email protected]>
> ---
> arch/x86/kvm/svm/avic.c | 16 +++++++++-------
> 1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 4fbef2af1efc..6a3d225eb02c 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -502,14 +502,18 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
> trace_kvm_avic_incomplete_ipi(vcpu->vcpu_id, icrh, icrl, id, index);
>
> switch (id) {
> + case AVIC_IPI_FAILURE_INVALID_TARGET:
> case AVIC_IPI_FAILURE_INVALID_INT_TYPE:
> /*
> * Emulate IPIs that are not handled by AVIC hardware, which
> - * only virtualizes Fixed, Edge-Triggered INTRs. The exit is
> - * a trap, e.g. ICR holds the correct value and RIP has been
> - * advanced, KVM is responsible only for emulating the IPI.
> - * Sadly, hardware may sometimes leave the BUSY flag set, in
> - * which case KVM needs to emulate the ICR write as well in
> + * only virtualizes Fixed, Edge-Triggered INTRs, and falls over
> + * if _any_ targets are invalid, e.g. if the logical mode mask
> + * is a superset of running vCPUs.
> + *
> + * The exit is a trap, e.g. ICR holds the correct value and RIP
> + * has been advanced, KVM is responsible only for emulating the
> + * IPI. Sadly, hardware may sometimes leave the BUSY flag set,
> + * in which case KVM needs to emulate the ICR write as well in
> * order to clear the BUSY flag.
> */
> if (icrl & APIC_ICR_BUSY)
> @@ -525,8 +529,6 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
> */
> avic_kick_target_vcpus(vcpu->kvm, apic, icrl, icrh, index);
> break;
> - case AVIC_IPI_FAILURE_INVALID_TARGET:
> - break;
> case AVIC_IPI_FAILURE_INVALID_BACKING_PAGE:
> WARN_ONCE(1, "Invalid backing page\n");
> break;

Reviewed-by: Paolo Bonzini <[email protected]>

2022-09-05 22:20:56

by Paolo Bonzini

Subject: Re: [PATCH v2 01/23] KVM: x86: Purge "highest ISR" cache when updating APICv state

On 9/3/22 02:22, Sean Christopherson wrote:
> Purge the "highest ISR" cache when updating APICv state on a vCPU. The
> cache must not be used when APICv is active as hardware may emulate EOIs
> (and other operations) without exiting to KVM.
>
> This fixes a bug where KVM will effectively block IRQs in perpetuity due
> to the "highest ISR" never getting reset if APICv is activated on a vCPU
> while an IRQ is in-service. Hardware emulates the EOI and KVM never gets
> a chance to update its cache.
>
> Fixes: b26a695a1d78 ("kvm: lapic: Introduce APICv update helper function")
> Cc: [email protected]
> Cc: Suravee Suthikulpanit <[email protected]>
> Cc: Maxim Levitsky <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/lapic.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9dda989a1cf0..38e9b8e5278c 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2429,6 +2429,7 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
> */
> apic->isr_count = count_vectors(apic->regs + APIC_ISR);
> }
> + apic->highest_isr_cache = -1;
> }
> EXPORT_SYMBOL_GPL(kvm_apic_update_apicv);
>
> @@ -2485,7 +2486,6 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
> kvm_lapic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
> }
> kvm_apic_update_apicv(vcpu);
> - apic->highest_isr_cache = -1;
> update_divide_count(apic);
> atomic_set(&apic->lapic_timer.pending, 0);
>
> @@ -2772,7 +2772,6 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
> __start_apic_timer(apic, APIC_TMCCT);
> kvm_lapic_set_reg(apic, APIC_TMCCT, 0);
> kvm_apic_update_apicv(vcpu);
> - apic->highest_isr_cache = -1;
> if (apic->apicv_active) {
> static_call_cond(kvm_x86_apicv_post_state_restore)(vcpu);
> static_call_cond(kvm_x86_hwapic_irr_update)(vcpu, apic_find_highest_irr(apic));

Reviewed-by: Paolo Bonzini <[email protected]>

2022-09-13 19:59:17

by Suthikulpanit, Suravee

Subject: Re: [PATCH v2 04/23] KVM: x86: Inhibit AVIC SPTEs if any vCPU enables x2APIC

Sean,

This patch inhibits AVIC for a VM running in x2APIC mode on a system w/ x2AVIC support.

On 9/2/2022 7:22 PM, Sean Christopherson wrote:
> Reintroduce APICV_INHIBIT_REASON_X2APIC as a "partial" inhibit for AMD
> to fix a bug where the APIC access page is visible to vCPUs that have
> x2APIC enabled, i.e. shouldn't be able to "see" the xAPIC MMIO region.
>
> On AMD, due to its "hybrid" mode where AVIC is enabled when x2APIC is
> enabled even without x2AVIC support, the bug occurs any time AVIC is
> enabled as x2APIC is fully emulated by KVM. I.e. hardware isn't aware
> that the guest is operating in x2APIC mode.
>
> Opportunistically drop the "can" while updating avic_activate_vmcb()'s
> comment, i.e. to state that KVM _does_ support the hybrid mode. Move
> the "Note:" down a line to conform to preferred kernel/KVM multi-line
> comment style.
>
> Leave Intel as-is for now to avoid a subtle performance regression, even
> though Intel likely suffers from the same bug. On Intel, in theory the
> bug rears its head only when vCPUs share host page tables (extremely
> likely) and x2APIC enabling is not consistent within the guest, i.e. if
> some vCPUs have x2APIC enabled and others do not (unlikely to occur
> except in certain situations, e.g. bringing up APs).
>
> Fixes: 0e311d33bfbe ("KVM: SVM: Introduce hybrid-AVIC mode")
> Cc: [email protected]
> Suggested-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 10 ++++++++++
> arch/x86/kvm/lapic.c | 4 +++-
> arch/x86/kvm/mmu/mmu.c | 2 +-
> arch/x86/kvm/svm/avic.c | 15 +++++++-------
> arch/x86/kvm/x86.c | 35 +++++++++++++++++++++++++++++----
> 5 files changed, 53 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 2c96c43c313a..1fd1b66ceeb6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1132,6 +1132,15 @@ enum kvm_apicv_inhibit {
> * AVIC is disabled because SEV doesn't support it.
> */
> APICV_INHIBIT_REASON_SEV,
> +
> + /*
> + * Due to sharing page tables across vCPUs, the xAPIC memslot must be
> + * inhibited if any vCPU has x2APIC enabled. Note, this is a "partial"
> + * inhibit; APICv can still be activated, but KVM mustn't retain/create
> + * SPTEs for the APIC access page. Like the APIC ID and APIC base
> + * inhibits, this is sticky for simplicity.
> + */
> + APICV_INHIBIT_REASON_X2APIC,

Actually, shouldn't APICV_INHIBIT_REASON_X2APIC be set only when a
vCPU has x2APIC enabled on a system with _NO x2AVIC support_? For
example, .....

> };
>
> struct kvm_arch {
> @@ -1903,6 +1912,7 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
> gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
> struct x86_exception *exception);
>
> +bool kvm_apicv_memslot_activated(struct kvm *kvm);
> bool kvm_apicv_activated(struct kvm *kvm);
> bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu);
> void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 38e9b8e5278c..d956cd37908e 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2394,8 +2394,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
> }
> }
>
> - if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
> + if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
> kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
> + kvm_set_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_X2APIC);
> + }

.... Here, since we do not want to inhibit APICV/AVIC on a system that
supports x2AVIC, this should be set in the vendor-specific callback
function, where appropriate checks can be made.

>
> if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) {
> kvm_vcpu_update_apicv(vcpu);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e418ef3ecfcb..cea25552869f 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4150,7 +4150,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> * when the AVIC is re-enabled.
> */
> if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
> - !kvm_apicv_activated(vcpu->kvm))
> + !kvm_apicv_memslot_activated(vcpu->kvm))
> return RET_PF_EMULATE;
> }
>
> ....
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d7374d768296..6ab9088c2531 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9379,15 +9379,29 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
> kvm_irq_delivery_to_apic(kvm, NULL, &lapic_irq, NULL);
> }
>
> -bool kvm_apicv_activated(struct kvm *kvm)
> +bool kvm_apicv_memslot_activated(struct kvm *kvm)
> {
> return (READ_ONCE(kvm->arch.apicv_inhibit_reasons) == 0);
> }
> +
> +static unsigned long kvm_apicv_get_inhibit_reasons(struct kvm *kvm)
> +{
> + /*
> + * x2APIC only needs to "inhibit" the MMIO region, all other aspects of
> + * APICv can continue to be utilized.
> + */
> + return READ_ONCE(kvm->arch.apicv_inhibit_reasons) & ~APICV_INHIBIT_REASON_X2APIC;
> +}
> +
> +bool kvm_apicv_activated(struct kvm *kvm)
> +{
> + return !kvm_apicv_get_inhibit_reasons(kvm);
> +}
> EXPORT_SYMBOL_GPL(kvm_apicv_activated);
>
> bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu)
> {
> - ulong vm_reasons = READ_ONCE(vcpu->kvm->arch.apicv_inhibit_reasons);
> + ulong vm_reasons = kvm_apicv_get_inhibit_reasons(vcpu->kvm);
> ulong vcpu_reasons = static_call(kvm_x86_vcpu_get_apicv_inhibit_reasons)(vcpu);
>
> return (vm_reasons | vcpu_reasons) == 0;
> @@ -10122,7 +10136,15 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
>
> set_or_clear_apicv_inhibit(&new, reason, set);
>
> - if (!!old != !!new) {
> + /*
> + * If the overall "is APICv activated" status is unchanged, simply add
> + * or remove the inihbit from the pile. x2APIC is an exception, as it
> + * is a partial inhibit (only blocks SPTEs for the APIC access page).
> + * If x2APIC is the only inhibit in either the old or the new set, then
> + * vCPUs need to be kicked to transition between partially-inhibited
> + * and fully-inhibited.
> + */
> + if ((!!old != !!new) || old == X2APIC_ENABLE || new == X2APIC_ENABLE) {

Why are we comparing APICV inhibit reasons (old, new) with X2APIC_ENABLE
here? Do you mean to compare with APICV_INHIBIT_REASON_X2APIC?

Thanks,
Suravee

> /*
> * Kick all vCPUs before setting apicv_inhibit_reasons to avoid
> * false positives in the sanity check WARN in svm_vcpu_run().
> @@ -10137,7 +10159,12 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
> */
> kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
> kvm->arch.apicv_inhibit_reasons = new;
> - if (new) {
> +
> + /*
> + * Zap SPTEs for the APIC access page if APICv is newly
> + * inhibited (partially or fully).
> + */
> + if (new && !old) {
> unsigned long gfn = gpa_to_gfn(APIC_DEFAULT_PHYS_BASE);
> kvm_zap_gfn_range(kvm, gfn, gfn+1);
> }

2022-09-13 23:42:19

by Suthikulpanit, Suravee

Subject: Re: [PATCH v2 13/23] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode

Hi Sean

On 9/2/2022 7:22 PM, Sean Christopherson wrote:
> Disable the optimized APIC logical map if multiple vCPUs are aliased to
> the same logical ID. Architecturally, all CPUs whose logical ID matches
> the MDA are supposed to receive the interrupt; overwriting existing map
> entries can result in missed IPIs.
>
> Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
> Signed-off-by: Sean Christopherson <[email protected]>
> Reviewed-by: Maxim Levitsky <[email protected]>
> ---
> arch/x86/kvm/lapic.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 6b2f538b8fd0..75748c380ceb 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -303,12 +303,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> if (!mask)
> continue;
>
> - if (!is_power_of_2(mask)) {
> + ldr = ffs(mask) - 1;
> + if (!is_power_of_2(mask) || cluster[ldr]) {

Should this be checking if the cluster[ldr] is pointing to the same
struct apic instead? For example:

if (!is_power_of_2(mask) || cluster[ldr] != apic)

From my observation, the kvm_recalculate_apic_map() can be called many
times, and the cluster[ldr] could have already been assigned from the
previous invocation. So, as long as it is the same, it should be okay.

Best Regards,
Suravee

> new->mode = KVM_APIC_MODE_XAPIC_FLAT |
> KVM_APIC_MODE_XAPIC_CLUSTER;
> continue;
> }
> - cluster[ffs(mask) - 1] = apic;
> + cluster[ldr] = apic;
> }
> out:
> old = rcu_dereference_protected(kvm->arch.apic_map,

2022-09-14 07:51:53

by Sean Christopherson

Subject: Re: [PATCH v2 13/23] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode

On Tue, Sep 13, 2022, Suthikulpanit, Suravee wrote:
> Hi Sean
>
> On 9/2/2022 7:22 PM, Sean Christopherson wrote:
> > Disable the optimized APIC logical map if multiple vCPUs are aliased to
> > the same logical ID. Architecturally, all CPUs whose logical ID matches
> > the MDA are supposed to receive the interrupt; overwriting existing map
> > entries can result in missed IPIs.
> >
> > Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
> > Signed-off-by: Sean Christopherson <[email protected]>
> > Reviewed-by: Maxim Levitsky <[email protected]>
> > ---
> > arch/x86/kvm/lapic.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 6b2f538b8fd0..75748c380ceb 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -303,12 +303,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> > if (!mask)
> > continue;
> > - if (!is_power_of_2(mask)) {
> > + ldr = ffs(mask) - 1;
> > + if (!is_power_of_2(mask) || cluster[ldr]) {
>
> Should this be checking if the cluster[ldr] is pointing to the same struct
> apic instead? For example:
>
> if (!is_power_of_2(mask) || cluster[ldr] != apic)
>
> From my observation, the kvm_recalculate_apic_map() can be called many
> times, and the cluster[ldr] could have already been assigned from the
> previous invocation. So, as long as it is the same, it should be okay.

No, because cluster[ldr] can never match "apic". kvm_recalculate_apic_map()
creates and populates a _new_ kvm_apic_map every time, it doesn't do an in-place
update of the current map.

The loop containing this code is:

kvm_for_each_vcpu(i, vcpu, kvm) {
struct kvm_lapic *apic = vcpu->arch.apic;

...
}

so it's impossible for cluster[ldr] to hold the current "apic", because this is
the first and only iteration that processes the current "apic".

2022-09-14 08:16:22

by Sean Christopherson

Subject: Re: [PATCH v2 04/23] KVM: x86: Inhibit AVIC SPTEs if any vCPU enables x2APIC

On Tue, Sep 13, 2022, Suthikulpanit, Suravee wrote:
> Sean,
>
> This patch inhibits AVIC for a VM running in x2APIC mode on a system w/ x2AVIC support.

The intent is that it only inhibits the MMIO page.

> On 9/2/2022 7:22 PM, Sean Christopherson wrote:
> > Reintroduce APICV_INHIBIT_REASON_X2APIC as a "partial" inhibit for AMD
> > to fix a bug where the APIC access page is visible to vCPUs that have
> > x2APIC enabled, i.e. shouldn't be able to "see" the xAPIC MMIO region.
> >
> > On AMD, due to its "hybrid" mode where AVIC is enabled when x2APIC is
> > enabled even without x2AVIC support, the bug occurs any time AVIC is
> > enabled as x2APIC is fully emulated by KVM. I.e. hardware isn't aware
> > that the guest is operating in x2APIC mode.
> >
> > Opportunistically drop the "can" while updating avic_activate_vmcb()'s
> > comment, i.e. to state that KVM _does_ support the hybrid mode. Move
> > the "Note:" down a line to conform to preferred kernel/KVM multi-line
> > comment style.
> >
> > Leave Intel as-is for now to avoid a subtle performance regression, even
> > though Intel likely suffers from the same bug. On Intel, in theory the
> > bug rears its head only when vCPUs share host page tables (extremely
> > likely) and x2APIC enabling is not consistent within the guest, i.e. if
> > some vCPUs have x2APIC enabled and others do not (unlikely to occur
> > except in certain situations, e.g. bringing up APs).
> >
> > Fixes: 0e311d33bfbe ("KVM: SVM: Introduce hybrid-AVIC mode")
> > Cc: [email protected]
> > Suggested-by: Maxim Levitsky <[email protected]>
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 10 ++++++++++
> > arch/x86/kvm/lapic.c | 4 +++-
> > arch/x86/kvm/mmu/mmu.c | 2 +-
> > arch/x86/kvm/svm/avic.c | 15 +++++++-------
> > arch/x86/kvm/x86.c | 35 +++++++++++++++++++++++++++++----
> > 5 files changed, 53 insertions(+), 13 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 2c96c43c313a..1fd1b66ceeb6 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1132,6 +1132,15 @@ enum kvm_apicv_inhibit {
> > * AVIC is disabled because SEV doesn't support it.
> > */
> > APICV_INHIBIT_REASON_SEV,
> > +
> > + /*
> > + * Due to sharing page tables across vCPUs, the xAPIC memslot must be
> > + * inhibited if any vCPU has x2APIC enabled. Note, this is a "partial"
> > + * inhibit; APICv can still be activated, but KVM mustn't retain/create
> > + * SPTEs for the APIC access page. Like the APIC ID and APIC base
> > + * inhibits, this is sticky for simplicity.
> > + */
> > + APICV_INHIBIT_REASON_X2APIC,
>
> Actually, shouldn't APICV_INHIBIT_REASON_X2APIC be set only when a vCPU
> has x2APIC enabled on a system with _NO x2AVIC support_? For example,
> .....
>
> > };
> > struct kvm_arch {
> > @@ -1903,6 +1912,7 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
> > gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
> > struct x86_exception *exception);
> > +bool kvm_apicv_memslot_activated(struct kvm *kvm);
> > bool kvm_apicv_activated(struct kvm *kvm);
> > bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu);
> > void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 38e9b8e5278c..d956cd37908e 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -2394,8 +2394,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
> > }
> > }
> > - if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
> > + if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
> > kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
> > + kvm_set_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_X2APIC);
> > + }
>
> .... Here, since we do not want to inhibit APICv/AVIC on a system that can
> support x2AVIC, this should be set in the vendor-specific callback
> function, where appropriate checks can be made.

No, again the intent is to inhibit only the MMIO page. The x2APIC inhibit is
ignored when determining whether or not APICv is inhibited, but is included when
checking if the memslot is inhibited.

bool kvm_apicv_memslot_activated(struct kvm *kvm)
{
return (READ_ONCE(kvm->arch.apicv_inhibit_reasons) == 0);
}

static unsigned long kvm_apicv_get_inhibit_reasons(struct kvm *kvm)
{
/*
* x2APIC only needs to "inhibit" the MMIO region, all other aspects of
* APICv can continue to be utilized.
*/
return READ_ONCE(kvm->arch.apicv_inhibit_reasons) & ~APICV_INHIBIT_REASON_X2APIC;
}

> > @@ -10122,7 +10136,15 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
> > set_or_clear_apicv_inhibit(&new, reason, set);
> > - if (!!old != !!new) {
> > + /*
> > + * If the overall "is APICv activated" status is unchanged, simply add
> > + * or remove the inhibit from the pile. x2APIC is an exception, as it
> > + * is a partial inhibit (only blocks SPTEs for the APIC access page).
> > + * If x2APIC is the only inhibit in either the old or the new set, then
> > + * vCPUs need to be kicked to transition between partially-inhibited
> > + * and fully-inhibited.
> > + */
> > + if ((!!old != !!new) || old == X2APIC_ENABLE || new == X2APIC_ENABLE) {
>
> Why are we comparing APICV inhibit reasons (old, new) with X2APIC_ENABLE
> here? Do you mean to compare with APICV_INHIBIT_REASON_X2APIC?

Yeaaaaah.

2022-09-14 18:17:37

by Suthikulpanit, Suravee

Subject: Re: [PATCH v2 04/23] KVM: x86: Inhibit AVIC SPTEs if any vCPU enables x2APIC

Sean

On 9/14/2022 2:39 AM, Sean Christopherson wrote:
>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>> index 38e9b8e5278c..d956cd37908e 100644
>>> --- a/arch/x86/kvm/lapic.c
>>> +++ b/arch/x86/kvm/lapic.c
>>> @@ -2394,8 +2394,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
>>> }
>>> }
>>> - if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
>>> + if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
>>> kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
>>> + kvm_set_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_X2APIC);
>>> + }
>> .... Here, since we do not want to inhibit APICv/AVIC on a system that can
>> support x2AVIC, this should be set in the vendor-specific callback
>> function, where appropriate checks can be made.
> No, again the intent is to inhibit only the MMIO page. The x2APIC inhibit is
> ignored when determining whether or not APICv is inhibited, but is included when
> checking if the memslot is inhibited.
>
> bool kvm_apicv_memslot_activated(struct kvm *kvm)
> {
> return (READ_ONCE(kvm->arch.apicv_inhibit_reasons) == 0);
> }
>
> static unsigned long kvm_apicv_get_inhibit_reasons(struct kvm *kvm)
> {
> /*
> * x2APIC only needs to "inhibit" the MMIO region, all other aspects of
> * APICv can continue to be utilized.
> */
> return READ_ONCE(kvm->arch.apicv_inhibit_reasons) & ~APICV_INHIBIT_REASON_X2APIC;

Also, this should be:

return READ_ONCE(kvm->arch.apicv_inhibit_reasons) & ~(1UL << APICV_INHIBIT_REASON_X2APIC);

Suravee

2022-09-15 02:50:57

by Suthikulpanit, Suravee

Subject: Re: [PATCH v2 13/23] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode

Sean,

On 9/14/2022 2:42 AM, Sean Christopherson wrote:
> On Tue, Sep 13, 2022, Suthikulpanit, Suravee wrote:
>> Hi Sean
>>
>> On 9/2/2022 7:22 PM, Sean Christopherson wrote:
>>> Disable the optimized APIC logical map if multiple vCPUs are aliased to
>>> the same logical ID. Architecturally, all CPUs whose logical ID matches
>>> the MDA are supposed to receive the interrupt; overwriting existing map
>>> entries can result in missed IPIs.
>>>
>>> Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
>>> Signed-off-by: Sean Christopherson <[email protected]>
>>> Reviewed-by: Maxim Levitsky <[email protected]>
>>> ---
>>> arch/x86/kvm/lapic.c | 5 +++--
>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>> index 6b2f538b8fd0..75748c380ceb 100644
>>> --- a/arch/x86/kvm/lapic.c
>>> +++ b/arch/x86/kvm/lapic.c
>>> @@ -303,12 +303,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
>>> if (!mask)
>>> continue;
>>> - if (!is_power_of_2(mask)) {
>>> + ldr = ffs(mask) - 1;
>>> + if (!is_power_of_2(mask) || cluster[ldr]) {
>>
>> Should this be checking if the cluster[ldr] is pointing to the same struct
>> apic instead? For example:
>>
>> if (!is_power_of_2(mask) || cluster[ldr] != apic)
>>
>> From my observation, the kvm_recalculate_apic_map() can be called many
>> times, and the cluster[ldr] could have already been assigned from the
>> previous invocation. So, as long as it is the same, it should be okay.
>
> No, because cluster[ldr] can never match "apic". kvm_recalculate_apic_map()
> creates and populates a _new_ kvm_apic_map every time, it doesn't do an in-place
> update of the current map.

Yes, the _new_ map is created and initialized every time
kvm_recalculate_apic_map() is called, and is then passed into
kvm_apic_map_get_logical_dest() along with references to cluster and
mask, which get populated based on the provided ldr. Please note that
new->phys_map[x2apic_id] is already assigned before
kvm_apic_map_get_logical_dest() is called. Therefore, this check:

if (!is_power_of_2(mask) || cluster[ldr]) {
^here
will always trip, and we end up setting new->logical_mode =
KVM_APIC_MODE_MAP_DISABLED, which causes APICv/AVIC to always be inhibited.

I do not think this logic is correct. I also added a debug printk to
check, and I never see cluster[ldr] being NULL. Here is example output
from the debug printk placed just before the check above.

printk("DEBUG: vcpu_id=%u, logical_mode=%#x, mask=%#x, ldr=%#x, cluster[ldr]=%#llx, apic=%#llx\n",
       vcpu->vcpu_id, new->logical_mode, mask, ldr,
       (unsigned long long)cluster[ldr], (unsigned long long)apic);

DEBUG: vcpu_id=0, logical_mode=0x3, mask=0x1, ldr=0x0, cluster[ldr]=0xffff8f54fe7e8000, apic=0xffff8f54fe7e8000
DEBUG: vcpu_id=1, logical_mode=0x3, mask=0x2, ldr=0x1, cluster[ldr]=0xffff8f5475c28000, apic=0xffff8f5475c28000
DEBUG: vcpu_id=2, logical_mode=0x3, mask=0x4, ldr=0x2, cluster[ldr]=0xffff8f54ac488000, apic=0xffff8f54ac488000
DEBUG: vcpu_id=3, logical_mode=0x3, mask=0x8, ldr=0x3, cluster[ldr]=0xffff8f550dc34200, apic=0xffff8f550dc34200
DEBUG: vcpu_id=4, logical_mode=0x3, mask=0x10, ldr=0x4, cluster[ldr]=0xffff8f550dd08000, apic=0xffff8f550dd08000
.....
DEBUG: vcpu_id=15, logical_mode=0x3, mask=0x8000, ldr=0xf, cluster[ldr]=0xffff8f54ac488200, apic=0xffff8f54ac488200
DEBUG: vcpu_id=16, logical_mode=0x3, mask=0x1, ldr=0x0, cluster[ldr]=0xffff8f5531ff8000, apic=0xffff8f5531ff8000
DEBUG: vcpu_id=17, logical_mode=0x3, mask=0x2, ldr=0x1, cluster[ldr]=0xffff8f54e87c8200, apic=0xffff8f54e87c8200
DEBUG: vcpu_id=18, logical_mode=0x3, mask=0x4, ldr=0x2, cluster[ldr]=0xffff8f551c870200, apic=0xffff8f551c870200

Please, lemme know if I am missing something.

Best Regards,
Suravee

> The loop containing this code is:
>
> kvm_for_each_vcpu(i, vcpu, kvm) {
> struct kvm_lapic *apic = vcpu->arch.apic;
>
> ...
> }
>
> so it's impossible for cluster[ldr] to hold the current "apic", because this is
> the first and only iteration that processes the current "apic".

2022-09-16 18:00:04

by Sean Christopherson

Subject: Re: [PATCH v2 04/23] KVM: x86: Inhibit AVIC SPTEs if any vCPU enables x2APIC

On Wed, Sep 14, 2022, Suthikulpanit, Suravee wrote:
> Sean
>
> On 9/14/2022 2:39 AM, Sean Christopherson wrote:
> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index 38e9b8e5278c..d956cd37908e 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -2394,8 +2394,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
> > > > }
> > > > }
> > > > - if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
> > > > + if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
> > > > kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
> > > > + kvm_set_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_X2APIC);
> > > > + }
> > > .... Here, since we do not want to inhibit APICv/AVIC on a system that can
> > > support x2AVIC, this should be set in the vendor-specific callback
> > > function, where appropriate checks can be made.
> > No, again the intent is to inhibit only the MMIO page. The x2APIC inhibit is
> > ignored when determining whether or not APICv is inhibited, but is included when
> > checking if the memslot is inhibited.
> >
> > bool kvm_apicv_memslot_activated(struct kvm *kvm)
> > {
> > return (READ_ONCE(kvm->arch.apicv_inhibit_reasons) == 0);
> > }
> >
> > static unsigned long kvm_apicv_get_inhibit_reasons(struct kvm *kvm)
> > {
> > /*
> > * x2APIC only needs to "inhibit" the MMIO region, all other aspects of
> > * APICv can continue to be utilized.
> > */
> > return READ_ONCE(kvm->arch.apicv_inhibit_reasons) & ~APICV_INHIBIT_REASON_X2APIC;
>
> Also, this should be:
>
> return READ_ONCE(kvm->arch.apicv_inhibit_reasons) & ~(1UL << APICV_INHIBIT_REASON_X2APIC);

Ugh, I found and fixed this locally when testing, but lost the change when shuffling
code between systems.

Thanks!

2022-09-16 19:28:37

by Sean Christopherson

Subject: Re: [PATCH v2 13/23] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode

On Wed, Sep 14, 2022, Suthikulpanit, Suravee wrote:
> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index 6b2f538b8fd0..75748c380ceb 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -303,12 +303,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> > > > if (!mask)
> > > > continue;
> > > > - if (!is_power_of_2(mask)) {
> > > > + ldr = ffs(mask) - 1;
> > > > + if (!is_power_of_2(mask) || cluster[ldr]) {
> > >
> > > Should this be checking if the cluster[ldr] is pointing to the same struct
> > > apic instead? For example:
> > >
> > > if (!is_power_of_2(mask) || cluster[ldr] != apic)
> > >
> > > From my observation, the kvm_recalculate_apic_map() can be called many
> > > times, and the cluster[ldr] could have already been assigned from the
> > > previous invocation. So, as long as it is the same, it should be okay.
> >
> > No, because cluster[ldr] can never match "apic". kvm_recalculate_apic_map()
> > creates and populates a _new_ kvm_apic_map every time, it doesn't do an in-place
> > update of the current map.
>
> Yes, the _new_ is getting created and initialized every time we call
> kvm_recalculate_apic_map(), and then passed into
> kvm_apic_map_get_logical_dest() along with the reference of cluster and mask
> to get populated based on the provided ldr. Please note that the
> new->phys_map[x2apic_id] is already assigned before the calling of

Ooh, this is what I was missing. LDR is read-only for x2APIC, and so KVM simply
uses the phys_map and computes the phys_map indices by reversing the x2APIC=>LDR
calculation.

So it's not so much that "apic" _can_ match "cluster[ldr]", it's actually that
"apic" _must_ match "cluster[ldr]" for this flow. Overwriting the cluster entry
is all kinds of pointless. It's either unnecessary (no bugs) or it breaks things
(bugs in either the LDR calculation or the logical dest math).

Rather than add an exception to the cluster[] check, which is very confusing for
xAPIC, the whole flow can be skipped for x2APIC, with a sanity check that the LDR
does indeed align with the x2APIC ID.

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 76a19bf1eb55..e9d7c647e8a7 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -347,6 +347,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
}
new->logical_mode = logical_mode;

+ /* I'll add a comment here. */
+ if (apic_x2apic_mode(apic)) {
+ WARN_ON_ONCE(ldr != kvm_apic_calc_x2apic_ldr(x2apic_id));
+ continue;
+ }
+
if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
&cluster, &mask))) {
new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;

Alternatively, the x2APIC handling could be done after kvm_apic_map_get_logical_dest(),
e.g. so that KVM can sanity check mask+cluster[ldr], but that's annoying to implement
and IMO it's overkill since we can just as easily verify the math via tests on top
of the LDR sanity check.

I'll do a better job of verifying that APICv + x2APIC yields the correct inhibits.
I verified x2APIC functional correctness, and that APICv + xAPIC yielded the correct
inhibits, but I obviously didn't verify APICv + x2APIC yielded the correct inhibits.

Thanks much!

2022-09-16 20:00:07

by Sean Christopherson

Subject: Re: [PATCH v2 04/23] KVM: x86: Inhibit AVIC SPTEs if any vCPU enables x2APIC

On Tue, Sep 13, 2022, Suthikulpanit, Suravee wrote:
> > @@ -10122,7 +10136,15 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
> > set_or_clear_apicv_inhibit(&new, reason, set);
> > - if (!!old != !!new) {
> > + /*
> > + * If the overall "is APICv activated" status is unchanged, simply add
> > + * or remove the inhibit from the pile. x2APIC is an exception, as it
> > + * is a partial inhibit (only blocks SPTEs for the APIC access page).
> > + * If x2APIC is the only inhibit in either the old or the new set, then
> > + * vCPUs need to be kicked to transition between partially-inhibited
> > + * and fully-inhibited.
> > + */
> > + if ((!!old != !!new) || old == X2APIC_ENABLE || new == X2APIC_ENABLE) {
>
> Why are we comparing APICV inhibit reasons (old, new) with X2APIC_ENABLE
> here? Do you mean to compare with APICV_INHIBIT_REASON_X2APIC?

Heh, the truly hilarious part about this is that the code actually works, because
by pure coincidence, X2APIC_ENABLE == BIT(APICV_INHIBIT_REASON_X2APIC). Obviously
still needs to be changed, just found it amusing.

2022-09-20 13:32:32

by Maxim Levitsky

Subject: Re: [PATCH v2 04/23] KVM: x86: Inhibit AVIC SPTEs if any vCPU enables x2APIC

On Sat, 2022-09-03 at 00:22 +0000, Sean Christopherson wrote:
> Reintroduce APICV_INHIBIT_REASON_X2APIC as a "partial" inhibit for AMD
> to fix a bug where the APIC access page is visible to vCPUs that have
> x2APIC enabled, i.e. shouldn't be able to "see" the xAPIC MMIO region.
>
> On AMD, due to its "hybrid" mode where AVIC is enabled when x2APIC is
> enabled even without x2AVIC support, the bug occurs any time AVIC is
> enabled as x2APIC is fully emulated by KVM.  I.e. hardware isn't aware
> that the guest is operating in x2APIC mode.
>
> Opportunistically drop the "can" while updating avic_activate_vmcb()'s
> comment, i.e. to state that KVM _does_ support the hybrid mode.  Move
> the "Note:" down a line to conform to preferred kernel/KVM multi-line
> comment style.
>
> Leave Intel as-is for now to avoid a subtle performance regression, even
> though Intel likely suffers from the same bug.  On Intel, in theory the
> bug rears its head only when vCPUs share host page tables (extremely
> likely) and x2APIC enabling is not consistent within the guest, i.e. if
> some vCPUs have x2APIC enabled and others do not (unlikely to occur
> except in certain situations, e.g. bringing up APs).

Are you sure about this?

This is what I think will happen (I might be wrong):

Due to KVM's internal memory slot at 0xfee00000, userspace is very unlikely
to put a memslot over this range - let's assume that userspace didn't map
RAM over the 0xfee00000 page.

In fact, if userspace did try to put a memslot there, either the first vCPU
creation will fail because this is where we create the private memslot,
or adding the userspace memslot later will fail.

So assuming no guest RAM in the area, the correct x86 behavior should be:

xAPIC enabled - 0xfee00xxx acts as the APIC
x2APIC enabled - 0xfee00xxx is unmapped MMIO, thus writes are dropped and reads return 0xFFFFFFFF.

However, here is what will currently happen with APICv/AVIC (regardless of the hybrid mode):

1. We start with xAPIC; the guest accesses it once, and that populates our SPTE entry
for 0xfee00000, but since xAPIC is enabled, reads/writes to it are redirected to the APIC
backing page (the SPTE is ignored).

2. The guest changes mode to x2APIC (even on all vCPUs).
That doesn't currently change any inhibits or the like,
thus I *think* only vmx_set_virtual_apic_mode is called, and it doesn't flush the SPTE
for 0xfee00000, so it remains present and writable.

3. The guest accesses 0xfee00xxx; assuming APICv/x2AVIC, the hardware won't redirect
the access to the APIC backing page, but will instead just use that SPTE and let the guest
read/write the private page that we mapped in the range, which is wrong.

Am I missing something?

Also, I somewhat don't like the partial inhibit - it is to some extent misleading.
I don't have a very strong opinion on using the inhibit, but its meaning just feels a bit overloaded.

So why not to do it this way:

1. zap the SPTE always when switching APIC mode (e.g. move the code in
__kvm_set_or_clear_apicv_inhibit to a common function)

2. make kvm_faultin_pfn check a flag 'any vcpu enabled x2apic' and refuse
to re-install that spte?

Something like that (only compile tested, and likely misses memory barriers):


commit 0d00c3716d5073d58d018fab9e5a261600483e25
Author: Maxim Levitsky <[email protected]>
Date: Tue Sep 20 14:39:57 2022 +0300

KVM: x86: hide APICv/AVIC mmio when guest enables x2apic


TBD

Signed-off-by: Maxim Levitsky <[email protected]>

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bd4b8c27129cc9..fe8ba17481fe1d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1356,6 +1356,9 @@ struct kvm_arch {
*/
#define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
struct kvm_mmu_memory_cache split_desc_cache;
+
+
+ bool apicv_mmio_hidden;
};

struct kvm_vm_stat {
@@ -1914,6 +1917,8 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
enum kvm_apicv_inhibit reason, bool set);

+void kvm_update_virtual_apic_mode(struct kvm_vcpu *vcpu);
+
static inline void kvm_set_apicv_inhibit(struct kvm *kvm,
enum kvm_apicv_inhibit reason)
{
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c448b6e4b3ad33..c18094b95c4ba9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2442,10 +2442,8 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);

- if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) {
- kvm_vcpu_update_apicv(vcpu);
- static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);
- }
+ if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE))
+ kvm_update_virtual_apic_mode(vcpu);

apic->base_address = apic->vcpu->arch.apic_base &
MSR_IA32_APICBASE_BASE;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 86fe9a7ced1700..4ef3c87662742d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4153,9 +4153,9 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
* MMIO SPTE. That way the cache doesn't need to be purged
* when the AVIC is re-enabled.
*/
- if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
- !kvm_apicv_activated(vcpu->kvm))
- return RET_PF_EMULATE;
+ if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT)
+ if (!kvm_apicv_activated(vcpu->kvm) || vcpu->kvm->arch.apicv_mmio_hidden)
+ return RET_PF_EMULATE;
}

async = false;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0bc974cdcced10..fc0c6ec79ad194 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10127,6 +10127,13 @@ void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv);

+static void kvm_zap_apicv_memslot(struct kvm *kvm)
+{
+ unsigned long gfn = gpa_to_gfn(APIC_DEFAULT_PHYS_BASE);
+
+ kvm_zap_gfn_range(kvm, gfn, gfn+1);
+}
+
void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
enum kvm_apicv_inhibit reason, bool set)
{
@@ -10156,10 +10163,8 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
*/
kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
kvm->arch.apicv_inhibit_reasons = new;
- if (new) {
- unsigned long gfn = gpa_to_gfn(APIC_DEFAULT_PHYS_BASE);
- kvm_zap_gfn_range(kvm, gfn, gfn+1);
- }
+ if (new)
+ kvm_zap_apicv_memslot(kvm);
} else {
kvm->arch.apicv_inhibit_reasons = new;
}
@@ -10177,6 +10182,31 @@ void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
}
EXPORT_SYMBOL_GPL(kvm_set_or_clear_apicv_inhibit);

+void kvm_update_virtual_apic_mode(struct kvm_vcpu *vcpu)
+{
+ struct kvm *kvm = vcpu->kvm;
+
+ if (!enable_apicv)
+ return;
+
+ if (apic_x2apic_mode(vcpu->arch.apic) && !READ_ONCE(kvm->arch.apicv_mmio_hidden)) {
+ /*
+ * Permanently hide APICv/AVIC private KVM memory slot when one of
+ * vCPUs enables the X2APIC for the first time.
+ *
+ * That ensures that no vCPU could access this memory slot
+ * in case of hardware supported x2apic acceleration
+ * (apicv/x2avic), and also that in case of hybrid AVIC mode,
+ * the AVIC doesn't continue to accelerate this MMIO.
+ */
+ WRITE_ONCE(kvm->arch.apicv_mmio_hidden, true);
+ kvm_zap_apicv_memslot(kvm);
+ }
+
+ kvm_vcpu_update_apicv(vcpu);
+ static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);
+}
+
static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
{
if (!kvm_apic_present(vcpu))



Best regards,
Maxim Levitsky


>
> Fixes: 0e311d33bfbe ("KVM: SVM: Introduce hybrid-AVIC mode")
> Cc: [email protected]
> Suggested-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
>  arch/x86/include/asm/kvm_host.h | 10 ++++++++++
>  arch/x86/kvm/lapic.c            |  4 +++-
>  arch/x86/kvm/mmu/mmu.c          |  2 +-
>  arch/x86/kvm/svm/avic.c         | 15 +++++++-------
>  arch/x86/kvm/x86.c              | 35 +++++++++++++++++++++++++++++----
>  5 files changed, 53 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 2c96c43c313a..1fd1b66ceeb6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1132,6 +1132,15 @@ enum kvm_apicv_inhibit {
>          * AVIC is disabled because SEV doesn't support it.
>          */
>         APICV_INHIBIT_REASON_SEV,
> +
> +       /*
> +        * Due to sharing page tables across vCPUs, the xAPIC memslot must be
> +        * inhibited if any vCPU has x2APIC enabled.  Note, this is a "partial"
> +        * inhibit; APICv can still be activated, but KVM mustn't retain/create
> +        * SPTEs for the APIC access page.  Like the APIC ID and APIC base
> +        * inhibits, this is sticky for simplicity.
> +        */
> +       APICV_INHIBIT_REASON_X2APIC,
>  };
>  
>  struct kvm_arch {
> @@ -1903,6 +1912,7 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
>  gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
>                                 struct x86_exception *exception);
>  
> +bool kvm_apicv_memslot_activated(struct kvm *kvm);
>  bool kvm_apicv_activated(struct kvm *kvm);
>  bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu);
>  void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 38e9b8e5278c..d956cd37908e 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2394,8 +2394,10 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
>                 }
>         }
>  
> -       if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
> +       if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
>                 kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
> +               kvm_set_apicv_inhibit(vcpu->kvm, APICV_INHIBIT_REASON_X2APIC);
> +       }
>  
>         if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) {
>                 kvm_vcpu_update_apicv(vcpu);
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index e418ef3ecfcb..cea25552869f 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -4150,7 +4150,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
>                  * when the AVIC is re-enabled.
>                  */
>                 if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
> -                   !kvm_apicv_activated(vcpu->kvm))
> +                   !kvm_apicv_memslot_activated(vcpu->kvm))
>                         return RET_PF_EMULATE;
>         }
>  
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 6a3d225eb02c..19be5f1afaac 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -72,12 +72,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
>  
>         vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
>  
> -       /* Note:
> -        * KVM can support hybrid-AVIC mode, where KVM emulates x2APIC
> -        * MSR accesses, while interrupt injection to a running vCPU
> -        * can be achieved using AVIC doorbell. The AVIC hardware still
> -        * accelerate MMIO accesses, but this does not cause any harm
> -        * as the guest is not supposed to access xAPIC mmio when uses x2APIC.
> +       /*
> +        * Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
> +        * accesses, while interrupt injection to a running vCPU can be
> +        * achieved using AVIC doorbell.  KVM disables the APIC access page
> +        * (prevents mapping it into the guest) if any vCPU has x2APIC enabled,
> +        * thus enabling AVIC activates only the doorbell mechanism.
>          */
>         if (apic_x2apic_mode(svm->vcpu.arch.apic) &&
>             avic_mode == AVIC_MODE_X2) {
> @@ -1014,7 +1014,8 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
>                           BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
>                           BIT(APICV_INHIBIT_REASON_SEV)      |
>                           BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
> -                         BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED);
> +                         BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
> +                         BIT(APICV_INHIBIT_REASON_X2APIC);
>  
>         return supported & BIT(reason);
>  }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d7374d768296..6ab9088c2531 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9379,15 +9379,29 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
>         kvm_irq_delivery_to_apic(kvm, NULL, &lapic_irq, NULL);
>  }
>  
> -bool kvm_apicv_activated(struct kvm *kvm)
> +bool kvm_apicv_memslot_activated(struct kvm *kvm)
>  {
>         return (READ_ONCE(kvm->arch.apicv_inhibit_reasons) == 0);
>  }
> +
> +static unsigned long kvm_apicv_get_inhibit_reasons(struct kvm *kvm)
> +{
> +       /*
> +        * x2APIC only needs to "inhibit" the MMIO region, all other aspects of
> +        * APICv can continue to be utilized.
> +        */
> +       return READ_ONCE(kvm->arch.apicv_inhibit_reasons) & ~APICV_INHIBIT_REASON_X2APIC;
> +}
> +
> +bool kvm_apicv_activated(struct kvm *kvm)
> +{
> +       return !kvm_apicv_get_inhibit_reasons(kvm);
> +}
>  EXPORT_SYMBOL_GPL(kvm_apicv_activated);
>  
>  bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu)
>  {
> -       ulong vm_reasons = READ_ONCE(vcpu->kvm->arch.apicv_inhibit_reasons);
> +       ulong vm_reasons = kvm_apicv_get_inhibit_reasons(vcpu->kvm);
>         ulong vcpu_reasons = static_call(kvm_x86_vcpu_get_apicv_inhibit_reasons)(vcpu);
>  
>         return (vm_reasons | vcpu_reasons) == 0;
> @@ -10122,7 +10136,15 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
>  
>         set_or_clear_apicv_inhibit(&new, reason, set);
>  
> -       if (!!old != !!new) {
> +       /*
> +        * If the overall "is APICv activated" status is unchanged, simply add
> +        * or remove the inhibit from the pile.  x2APIC is an exception, as it
> +        * is a partial inhibit (only blocks SPTEs for the APIC access page).
> +        * If x2APIC is the only inhibit in either the old or the new set, then
> +        * vCPUs need to be kicked to transition between partially-inhibited
> +        * and fully-inhibited.
> +        */
> +       if ((!!old != !!new) || old == X2APIC_ENABLE || new == X2APIC_ENABLE) {
>                 /*
>                  * Kick all vCPUs before setting apicv_inhibit_reasons to avoid
>                  * false positives in the sanity check WARN in svm_vcpu_run().
> @@ -10137,7 +10159,12 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
>                  */
>                 kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE);
>                 kvm->arch.apicv_inhibit_reasons = new;
> -               if (new) {
> +
> +               /*
> +                * Zap SPTEs for the APIC access page if APICv is newly
> +                * inhibited (partially or fully).
> +                */
> +               if (new && !old) {
>                         unsigned long gfn = gpa_to_gfn(APIC_DEFAULT_PHYS_BASE);
>                         kvm_zap_gfn_range(kvm, gfn, gfn+1);
>                 }


2022-09-20 16:01:59

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 04/23] KVM: x86: Inhibit AVIC SPTEs if any vCPU enables x2APIC

On Tue, Sep 20, 2022, Maxim Levitsky wrote:
> On Sat, 2022-09-03 at 00:22 +0000, Sean Christopherson wrote:
> > Reintroduce APICV_INHIBIT_REASON_X2APIC as a "partial" inhibit for AMD
> > to fix a bug where the APIC access page is visible to vCPUs that have
> > x2APIC enabled, i.e. shouldn't be able to "see" the xAPIC MMIO region.
> >
> > On AMD, due to its "hybrid" mode where AVIC is enabled when x2APIC is
> > enabled even without x2AVIC support, the bug occurs any time AVIC is
> > enabled as x2APIC is fully emulated by KVM. I.e. hardware isn't aware
> > that the guest is operating in x2APIC mode.
> >
> > Opportunistically drop the "can" while updating avic_activate_vmcb()'s
> > comment, i.e. to state that KVM _does_ support the hybrid mode. Move
> > the "Note:" down a line to conform to preferred kernel/KVM multi-line
> > comment style.
> >
> > Leave Intel as-is for now to avoid a subtle performance regression, even
> > though Intel likely suffers from the same bug. On Intel, in theory the
> > bug rears its head only when vCPUs share host page tables (extremely
> > likely) and x2APIC enabling is not consistent within the guest, i.e. if
> > some vCPUs have x2APIC enabled and others do not (unlikely to occur
> > except in certain situations, e.g. bringing up APs).
>
> Are you sure about this?

Ah, no. The key on Intel is the separate VMCS control for virtualizing xAPIC
accesses. As you note below, KVM will provide memory semantics, which is technically
wrong.

> This is what I am thinking will happen, I might be wrong but I am not sure:

...

> 3. guest accesses the 0xfee00xxx, assuming APICv/x2avic, the hardware won't redirect
> the access to apic backing page, but will instead just use that SPTE and let the guest
> read/write the private page that we mapped in the range, which is wrong.
>
> Am I missing something?

No, I don't believe so. I'm still hesitant to add the effective inhibit to Intel,
though that's probably just pure paranoia at this point. Probably makes sense to
just do it and verify that x2APIC virtualization still works.

> Also I somewhat don't like the partial inhibit - it is to some extent
> misleading. I don't have a very strong opinion on using the inhibit, but its
> meaning just feels a bit overloaded.
>
> So why not to do it this way:
>
> 1. zap the SPTE always when switching apic mode (e.g. move the code in
> __kvm_set_or_clear_apicv_inhibit to a common function)
>
> 2. make kvm_faultin_pfn check a flag 'any vcpu enabled x2apic' and refuse
> to re-install that spte?
>
> Something like that (only compile tested, and likely misses memory barriers):

Actually, since this is "sticky", we can go even further and just delete the
memslot. Deleting the memslot is slightly complicated by the need to drop SRCU
if kvm_lapic_set_base() enables x2APIC during KVM_RUN, but that's easy enough
to handle by putting the disabling logic in vcpu_run() and using KVM_REQ_UNBLOCK
to ensure the memslot is deleted before the vCPU re-enters the guest.

Testing now...

2022-09-20 17:30:34

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 04/23] KVM: x86: Inhibit AVIC SPTEs if any vCPU enables x2APIC

On Tue, Sep 20, 2022, Sean Christopherson wrote:
> On Tue, Sep 20, 2022, Maxim Levitsky wrote:
> > On Sat, 2022-09-03 at 00:22 +0000, Sean Christopherson wrote:
> > > Reintroduce APICV_INHIBIT_REASON_X2APIC as a "partial" inhibit for AMD
> > > to fix a bug where the APIC access page is visible to vCPUs that have
> > > x2APIC enabled, i.e. shouldn't be able to "see" the xAPIC MMIO region.
> > >
> > > On AMD, due to its "hybrid" mode where AVIC is enabled when x2APIC is
> > > enabled even without x2AVIC support, the bug occurs any time AVIC is
> > > enabled as x2APIC is fully emulated by KVM.? I.e. hardware isn't aware
> > > that the guest is operating in x2APIC mode.
> > >
> > > Opportunistically drop the "can" while updating avic_activate_vmcb()'s
> > > comment, i.e. to state that KVM _does_ support the hybrid mode.? Move
> > > the "Note:" down a line to conform to preferred kernel/KVM multi-line
> > > comment style.
> > >
> > > Leave Intel as-is for now to avoid a subtle performance regression, even
> > > though Intel likely suffers from the same bug.? On Intel, in theory the
> > > bug rears its head only when vCPUs share host page tables (extremely
> > > likely) and x2APIC enabling is not consistent within the guest, i.e. if
> > > some vCPUs have x2APIC enabled and others do not (unlikely to occur
> > > except in certain situations, e.g. bringing up APs).
> >
> > Are you sure about this?
>
> Ah, no. The key on Intel is the separate VMCS control for virtualizing xAPIC
> accesses. As you note below, KVM will provide memory semantics, which is technically
> wrong.
>
> > This is what I am thinking will happen, I might be wrong but I am not sure:
>
> ...
>
> > 3. guest accesses the 0xfee00xxx, assuming APICv/x2avic, the hardware won't redirect
> > the access to apic backing page, but will instead just use that SPTE and let the guest
> > read/write the private page that we mapped in the range, which is wrong.
> >
> > Am I missing something?
>
> No, I don't believe so. I'm still hesitant to add the effective inhibit to Intel,
> though that's probably just pure paranoia at this point. Probably makes sense to
> just do it and verify that x2APIC virtualization still works.

Scratch that, Intel can end up with memory semantics irrespective of x2APIC, e.g.
if APICv is enabled but a vCPU disables its APIC. We could force a bus error by
inhibiting APICv if any vCPU has its APIC hardware disabled, but IMO that's a bad
tradeoff as there are legitimate reasons to disable the APIC on select vCPUs.

So, I'll omit Intel from the x2APIC pseudo-inhibit, and maybe add a KVM erratum
to document that KVM may provide memory semantics for APIC_BASE when APICv/AVIC
is enabled.

2022-09-23 11:01:22

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v2 04/23] KVM: x86: Inhibit AVIC SPTEs if any vCPU enables x2APIC

On Tue, 2022-09-20 at 16:50 +0000, Sean Christopherson wrote:
> On Tue, Sep 20, 2022, Sean Christopherson wrote:
> > On Tue, Sep 20, 2022, Maxim Levitsky wrote:
> > > On Sat, 2022-09-03 at 00:22 +0000, Sean Christopherson wrote:
> > > > Reintroduce APICV_INHIBIT_REASON_X2APIC as a "partial" inhibit for AMD
> > > > to fix a bug where the APIC access page is visible to vCPUs that have
> > > > x2APIC enabled, i.e. shouldn't be able to "see" the xAPIC MMIO region.
> > > >
> > > > On AMD, due to its "hybrid" mode where AVIC is enabled when x2APIC is
> > > > enabled even without x2AVIC support, the bug occurs any time AVIC is
> > > > enabled as x2APIC is fully emulated by KVM. I.e. hardware isn't aware
> > > > that the guest is operating in x2APIC mode.
> > > >
> > > > Opportunistically drop the "can" while updating avic_activate_vmcb()'s
> > > > comment, i.e. to state that KVM _does_ support the hybrid mode. Move
> > > > the "Note:" down a line to conform to preferred kernel/KVM multi-line
> > > > comment style.
> > > >
> > > > Leave Intel as-is for now to avoid a subtle performance regression, even
> > > > though Intel likely suffers from the same bug. On Intel, in theory the
> > > > bug rears its head only when vCPUs share host page tables (extremely
> > > > likely) and x2APIC enabling is not consistent within the guest, i.e. if
> > > > some vCPUs have x2APIC enabled and others do not (unlikely to occur
> > > > except in certain situations, e.g. bringing up APs).
> > >
> > > Are you sure about this?
> >
> > Ah, no. The key on Intel is the separate VMCS control for virtualizing xAPIC
> > accesses. As you note below, KVM will provide memory semantics, which is technically
> > wrong.
> >
> > > This is what I am thinking will happen, I might be wrong but I am not sure:
> >
> > ...
> >
> > > 3. guest accesses the 0xfee00xxx, assuming APICv/x2avic, the hardware won't redirect
> > > the access to apic backing page, but will instead just use that SPTE and let the guest
> > > read/write the private page that we mapped in the range, which is wrong.
> > >
> > > Am I missing something?
> >
> > No, I don't believe so. I'm still hesitant to add the effective inhibit to Intel,
> > though that's probably just pure paranoia at this point. Probably makes sense to
> > just do it and verify that x2APIC virtualization still works.
>
> Scratch that, Intel can end up with memory semantics irrespective of x2APIC, e.g.
> if APICv is enabled but a vCPU disables its APIC. We could force a bus error by
> inhibiting APICv if any vCPU has its APIC hardware disabled, but IMO that's a bad
> tradeoff as there are legitimate reasons to disable the APIC on select vCPUs.
>
> So, I'll omit Intel from the x2APIC pseudo-inhibit, and maybe add a KVM erratum
> to document that KVM may provide memory semantics for APIC_BASE when APICv/AVIC
> is enabled.
>

I can agree with this - disabled APIC is indeed a valid use case, if a VM doesn't
use all the vCPUs it was assigned, and hotplugs them later.

Thus I can agree to leave it as is for Intel, although if our goal is to be as
close to the x86 spec as possible, as you said, then it is more correct to do it
as I suggested, because it is still better to be compliant with the x86 spec in
one of the two cases than to be compliant in neither.

I am not going to argue about this though.


Best regards,
Maxim Levitsky

2022-09-23 11:08:33

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v2 04/23] KVM: x86: Inhibit AVIC SPTEs if any vCPU enables x2APIC

On Tue, 2022-09-20 at 15:46 +0000, Sean Christopherson wrote:
> On Tue, Sep 20, 2022, Maxim Levitsky wrote:
> > On Sat, 2022-09-03 at 00:22 +0000, Sean Christopherson wrote:
> > > Reintroduce APICV_INHIBIT_REASON_X2APIC as a "partial" inhibit for AMD
> > > to fix a bug where the APIC access page is visible to vCPUs that have
> > > x2APIC enabled, i.e. shouldn't be able to "see" the xAPIC MMIO region.
> > >
> > > On AMD, due to its "hybrid" mode where AVIC is enabled when x2APIC is
> > > enabled even without x2AVIC support, the bug occurs any time AVIC is
> > > enabled as x2APIC is fully emulated by KVM. I.e. hardware isn't aware
> > > that the guest is operating in x2APIC mode.
> > >
> > > Opportunistically drop the "can" while updating avic_activate_vmcb()'s
> > > comment, i.e. to state that KVM _does_ support the hybrid mode. Move
> > > the "Note:" down a line to conform to preferred kernel/KVM multi-line
> > > comment style.
> > >
> > > Leave Intel as-is for now to avoid a subtle performance regression, even
> > > though Intel likely suffers from the same bug. On Intel, in theory the
> > > bug rears its head only when vCPUs share host page tables (extremely
> > > likely) and x2APIC enabling is not consistent within the guest, i.e. if
> > > some vCPUs have x2APIC enabled and others do not (unlikely to occur
> > > except in certain situations, e.g. bringing up APs).
> >
> > Are you sure about this?
>
> Ah, no. The key on Intel is the separate VMCS control for virtualizing xAPIC
> accesses. As you note below, KVM will provide memory semantics, which is technically
> wrong.
>
> > This is what I am thinking will happen, I might be wrong but I am not sure:
>
> ...
>
> > 3. guest accesses the 0xfee00xxx, assuming APICv/x2avic, the hardware won't redirect
> > the access to apic backing page, but will instead just use that SPTE and let the guest
> > read/write the private page that we mapped in the range, which is wrong.
> >
> > Am I missing something?
>
> No, I don't believe so. I'm still hesitant to add the effective inhibit to Intel,
> though that's probably just pure paranoia at this point. Probably makes sense to
> just do it and verify that x2APIC virtualization still works.
>
> > Also I somewhat don't like the partial inhibit - it is to some extent
> > misleading. I don't have a very strong opinion on using the inhibit, but its
> > meaning just feels a bit overloaded.
> >
> > So why not to do it this way:
> >
> > 1. zap the SPTE always when switching apic mode (e.g. move the code in
> > __kvm_set_or_clear_apicv_inhibit to a common function)
> >
> > 2. make kvm_faultin_pfn check a flag 'any vcpu enabled x2apic' and refuse
> > to re-install that spte?
> >
> > Something like that (only compile tested, and likely misses memory barriers):
>
> Actually, since this is "sticky", we can go even further and just delete the
> memslot. Deleting the memslot is slightly complicated by the need to drop SRCU
> if kvm_lapic_set_base() enables x2APIC during KVM_RUN, but that's easy enough
> to handle by putting the disabling logic in vcpu_run() and using KVM_REQ_UNBLOCK
> to ensure the memslot is deleted before the vCPU re-enters the guest.

Yes, that is the elephant in the room - deleting the memslot makes all the sense
in the world, and I thought about doing it, except that it has a chance of
letting the genie out of its bottle again - remember the mess we had with the
fact that the memslots are RCU protected?

If it works, I 100% support the idea.

Also, I think you want to remove KVM_REQ_UNBLOCK in the other patch series
you just posted?

Best regards,
Maxim Levitsky

>
> Testing now...
>