2023-01-06 01:31:53

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v5 00/33] KVM: x86: AVIC and local APIC fixes+cleanups

Paolo,

This is a wholesale (and hopefully a drop-in) replacement for the patches
in `kvm-lapic-fix-and-cleanup`. The changes for " KVM: x86: Inhibit APIC
memslot if x2APIC and AVIC are enabled" in v5 are relatively minor, but it
led to multiple conflicts in later patches, i.e. a patch-to-be-squashed
wasn't going to work. Let me know if you've already done a lot of
massaging on your side, shouldn't be too difficult to generate patches to
go on top.

The first half or so patches fix semi-urgent, real-world relevant APICv
and AVIC bugs.

The second half fixes a variety of AVIC and optimized APIC map bugs
where KVM doesn't play nice with various edge cases that are
architecturally legal(ish), but are unlikely to occur in most real world
scenarios

v5:
- Fix even more bugs! [Greg]
- Collect reviews. [Maxim]
- Don't use an inhibit flag for hybrid AVIC. [Maxim]
- Fix an LDR cluster calc goof in the AVIC code. [Maxim]
- Drop a redundant "ldr == 0" check. [Maxim]
- Add helpers for logical vs. physical optimized map calcs. [Maxim]

v4:
- https://lore.kernel.org/all/[email protected]
- Fix more bugs! [Alejandro]
- Delete APIC memslot to inhibit xAVIC acceleration when x2APIC is
enabled on AMD/SVM instead of using a "partial" inihbit. [Maxim]

v3:
- https://lore.kernel.org/all/[email protected]
- Collect reviews. [Paolo]
- Drop "partial" x2APIC inhibit and instead delete the memslot.
[Maxim, Suravee]
- Skip logical mode updates for x2APIC, which just reuses the
phys_map with some clever logic. [Suravee]
- Add a fix for "nodecode write" traps. [Alejandro]

v2:
- https://lore.kernel.org/all/[email protected]
- Collect reviews. [Li, Maxim]
- Disable only MMIO access when x2APIC is enabled (instead of disabling
all of AVIC). [Maxim]
- Inhibit AVIC when logical IDs are aliased. [Maxim]
- Tweak name of set_virtual_apic_mode() hook. [Maxim]
- Straight up revert logical ID fastpath mess. [Maxim]
- Reword changelog about skipping vCPU during logical setup. [Maxim]
- Fix LDR updates on AVIC. [Maxim?]
- Fix a nasty ISR caching bug.
- Flush TLB when activating AVIC.

v1: https://lore.kernel.org/all/[email protected]

Greg Edwards (1):
KVM: x86: Allow APICv APIC ID inhibit to be cleared

Sean Christopherson (31):
KVM: x86: Blindly get current x2APIC reg value on "nodecode write"
traps
KVM: x86: Purge "highest ISR" cache when updating APICv state
KVM: SVM: Flush the "current" TLB when activating AVIC
KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid
target
KVM: x86: Don't inhibit APICv/AVIC on xAPIC ID "change" if APIC is
disabled
KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to
32-bit ID
KVM: SVM: Don't put/load AVIC when setting virtual APIC mode
KVM: x86: Handle APICv updates for APIC "mode" changes via request
KVM: x86: Move APIC access page helper to common x86 code
KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" boolean
KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kick
Revert "KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when
possible"
KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatch
KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU
KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0
KVM: x86: Explicitly track all possibilities for APIC map's logical
modes
KVM: x86: Skip redundant x2APIC logical mode optimized cluster setup
KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs
KVM: x86: Disable APIC logical map if vCPUs are aliased in logical
mode
KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs
KVM: x86: Inhibit APICv/AVIC if the optimized physical map is disabled
KVM: SVM: Inhibit AVIC if vCPUs are aliased in logical mode
KVM: SVM: Always update local APIC on writes to logical dest register
KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad"
KVM: SVM: Require logical ID to be power-of-2 for AVIC entry
KVM: SVM: Handle multiple logical targets in AVIC kick fastpath
KVM: SVM: Ignore writes to Remote Read Data on AVIC write traps
Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on
a running vcpu"
KVM: x86: Track required APICv inhibits with variable, not callback
KVM: x86: Add helpers to recalc physical vs. logical optimized APIC
maps

Suravee Suthikulpanit (1):
KVM: SVM: Fix x2APIC Logical ID calculation for
avic_kick_target_vcpus_fast

Documentation/virt/kvm/x86/errata.rst | 11 +
arch/x86/include/asm/kvm-x86-ops.h | 1 -
arch/x86/include/asm/kvm_host.h | 52 +++-
arch/x86/kvm/lapic.c | 319 +++++++++++++++++-----
arch/x86/kvm/lapic.h | 2 +
arch/x86/kvm/svm/avic.c | 372 ++++++++++++--------------
arch/x86/kvm/svm/nested.c | 2 +-
arch/x86/kvm/svm/svm.c | 8 +-
arch/x86/kvm/svm/svm.h | 27 +-
arch/x86/kvm/vmx/vmx.c | 58 +---
arch/x86/kvm/x86.c | 29 +-
11 files changed, 524 insertions(+), 357 deletions(-)


base-commit: c04ec04c0d15a51aa33660be175ed978beb8de0c
--
2.39.0.314.g84b9a713c41-goog


2023-01-06 01:32:01

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v5 01/33] KVM: x86: Blindly get current x2APIC reg value on "nodecode write" traps

When emulating a x2APIC write in response to an APICv/AVIC trap, get the
the written value from the vAPIC page without checking that reads are
allowed for the target register. AVIC can generate trap-like VM-Exits on
writes to EOI, and so KVM needs to get the written value from the backing
page without running afoul of EOI's write-only behavior.

Alternatively, EOI could be special cased to always write '0', e.g. so
that the sanity check could be preserved, but x2APIC on AMD is actually
supposed to disallow non-zero writes (not emulated by KVM), and the
sanity check was a byproduct of how the KVM code was written, i.e. wasn't
added to guard against anything in particular.

Fixes: 70c8327c11c6 ("KVM: x86: Bug the VM if an accelerated x2APIC trap occurs on a "bad" reg")
Fixes: 1bd9dfec9fd4 ("KVM: x86: Do not block APIC write for non ICR registers")
Reported-by: Alejandro Jimenez <[email protected]>
Cc: [email protected]
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/lapic.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4efdb4a4d72c..5c0f93fc073a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2284,23 +2284,18 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
struct kvm_lapic *apic = vcpu->arch.apic;
u64 val;

- if (apic_x2apic_mode(apic)) {
- if (KVM_BUG_ON(kvm_lapic_msr_read(apic, offset, &val), vcpu->kvm))
- return;
- } else {
- val = kvm_lapic_get_reg(apic, offset);
- }
-
/*
* ICR is a single 64-bit register when x2APIC is enabled. For legacy
* xAPIC, ICR writes need to go down the common (slightly slower) path
* to get the upper half from ICR2.
*/
if (apic_x2apic_mode(apic) && offset == APIC_ICR) {
+ val = kvm_lapic_get_reg64(apic, APIC_ICR);
kvm_apic_send_ipi(apic, (u32)val, (u32)(val >> 32));
trace_kvm_apic_write(APIC_ICR, val);
} else {
/* TODO: optimize to just emulate side effect w/o one more write */
+ val = kvm_lapic_get_reg(apic, offset);
kvm_lapic_reg_write(apic, offset, (u32)val);
}
}
--
2.39.0.314.g84b9a713c41-goog

2023-01-06 01:32:26

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v5 25/33] KVM: SVM: Always update local APIC on writes to logical dest register

Update the vCPU's local (virtual) APIC on LDR writes even if the write
"fails". The APIC needs to recalc the optimized logical map even if the
LDR is invalid or zero, e.g. if the guest clears its LDR, the optimized
map will be left as is and the vCPU will receive interrupts using its
old LDR.

Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 1fd473a57159..144e383a4c5d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -573,7 +573,7 @@ static void avic_invalidate_logical_id_entry(struct kvm_vcpu *vcpu)
clear_bit(AVIC_LOGICAL_ID_ENTRY_VALID_BIT, (unsigned long *)entry);
}

-static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)
+static void avic_handle_ldr_update(struct kvm_vcpu *vcpu)
{
int ret = 0;
struct vcpu_svm *svm = to_svm(vcpu);
@@ -582,10 +582,10 @@ static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)

/* AVIC does not support LDR update for x2APIC */
if (apic_x2apic_mode(vcpu->arch.apic))
- return 0;
+ return;

if (ldr == svm->ldr_reg)
- return 0;
+ return;

avic_invalidate_logical_id_entry(vcpu);

@@ -594,8 +594,6 @@ static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)

if (!ret)
svm->ldr_reg = ldr;
-
- return ret;
}

static void avic_handle_dfr_update(struct kvm_vcpu *vcpu)
@@ -617,8 +615,7 @@ static int avic_unaccel_trap_write(struct kvm_vcpu *vcpu)

switch (offset) {
case APIC_LDR:
- if (avic_handle_ldr_update(vcpu))
- return 0;
+ avic_handle_ldr_update(vcpu);
break;
case APIC_DFR:
avic_handle_dfr_update(vcpu);
--
2.39.0.314.g84b9a713c41-goog

2023-01-06 01:32:57

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v5 02/33] KVM: x86: Purge "highest ISR" cache when updating APICv state

Purge the "highest ISR" cache when updating APICv state on a vCPU. The
cache must not be used when APICv is active as hardware may emulate EOIs
(and other operations) without exiting to KVM.

This fixes a bug where KVM will effectively block IRQs in perpetuity due
to the "highest ISR" never getting reset if APICv is activated on a vCPU
while an IRQ is in-service. Hardware emulates the EOI and KVM never gets
a chance to update its cache.

Fixes: b26a695a1d78 ("kvm: lapic: Introduce APICv update helper function")
Cc: [email protected]
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/lapic.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5c0f93fc073a..33a661d82da7 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2424,6 +2424,7 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
*/
apic->isr_count = count_vectors(apic->regs + APIC_ISR);
}
+ apic->highest_isr_cache = -1;
}

void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -2479,7 +2480,6 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
kvm_lapic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
}
kvm_apic_update_apicv(vcpu);
- apic->highest_isr_cache = -1;
update_divide_count(apic);
atomic_set(&apic->lapic_timer.pending, 0);

@@ -2767,7 +2767,6 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
__start_apic_timer(apic, APIC_TMCCT);
kvm_lapic_set_reg(apic, APIC_TMCCT, 0);
kvm_apic_update_apicv(vcpu);
- apic->highest_isr_cache = -1;
if (apic->apicv_active) {
static_call_cond(kvm_x86_apicv_post_state_restore)(vcpu);
static_call_cond(kvm_x86_hwapic_irr_update)(vcpu, apic_find_highest_irr(apic));
--
2.39.0.314.g84b9a713c41-goog

2023-01-06 01:40:18

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v5 07/33] KVM: SVM: Don't put/load AVIC when setting virtual APIC mode

Move the VMCB updates from avic_refresh_apicv_exec_ctrl() into
avic_set_virtual_apic_mode() and invert the dependency being said
functions to avoid calling avic_vcpu_{load,put}() and
avic_set_pi_irte_mode() when "only" setting the virtual APIC mode.

avic_set_virtual_apic_mode() is invoked from common x86 with preemption
enabled, which makes avic_vcpu_{load,put}() unhappy. Luckily, calling
those and updating IRTE stuff is unnecessary as the only reason
avic_set_virtual_apic_mode() is called is to handle transitions between
xAPIC and x2APIC that don't also toggle APICv activation. And if
activation doesn't change, there's no need to fiddle with the physical
APIC ID table or update IRTE.

The "full" refresh is guaranteed to be called if activation changes in
this case as the only call to the "set" path is:

kvm_vcpu_update_apicv(vcpu);
static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);

and kvm_vcpu_update_apicv() invokes the refresh if activation changes:

if (apic->apicv_active == activate)
goto out;

apic->apicv_active = activate;
kvm_apic_update_apicv(vcpu);
static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);

Rename the helper to reflect that it is also called during "refresh".

WARNING: CPU: 183 PID: 49186 at arch/x86/kvm/svm/avic.c:1081 avic_vcpu_put+0xde/0xf0 [kvm_amd]
CPU: 183 PID: 49186 Comm: stable Tainted: G O 6.0.0-smp--fcddbca45f0a-sink #34
Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 10.48.0 01/27/2022
RIP: 0010:avic_vcpu_put+0xde/0xf0 [kvm_amd]
avic_refresh_apicv_exec_ctrl+0x142/0x1c0 [kvm_amd]
avic_set_virtual_apic_mode+0x5a/0x70 [kvm_amd]
kvm_lapic_set_base+0x149/0x1a0 [kvm]
kvm_set_apic_base+0x8f/0xd0 [kvm]
kvm_set_msr_common+0xa3a/0xdc0 [kvm]
svm_set_msr+0x364/0x6b0 [kvm_amd]
__kvm_set_msr+0xb8/0x1c0 [kvm]
kvm_emulate_wrmsr+0x58/0x1d0 [kvm]
msr_interception+0x1c/0x30 [kvm_amd]
svm_invoke_exit_handler+0x31/0x100 [kvm_amd]
svm_handle_exit+0xfc/0x160 [kvm_amd]
vcpu_enter_guest+0x21bb/0x23e0 [kvm]
vcpu_run+0x92/0x450 [kvm]
kvm_arch_vcpu_ioctl_run+0x43e/0x6e0 [kvm]
kvm_vcpu_ioctl+0x559/0x620 [kvm]

Fixes: 05c4fe8c1bd9 ("KVM: SVM: Refresh AVIC configuration when changing APIC mode")
Cc: [email protected]
Cc: Suravee Suthikulpanit <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 31 +++++++++++++++----------------
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/svm/svm.h | 2 +-
3 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 3b2c88b168ba..97ad0661f963 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -747,18 +747,6 @@ void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
avic_handle_ldr_update(vcpu);
}

-void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
-{
- if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
- return;
-
- if (kvm_get_apic_mode(vcpu) == LAPIC_MODE_INVALID) {
- WARN_ONCE(true, "Invalid local APIC state (vcpu_id=%d)", vcpu->vcpu_id);
- return;
- }
- avic_refresh_apicv_exec_ctrl(vcpu);
-}
-
static int avic_set_pi_irte_mode(struct kvm_vcpu *vcpu, bool activate)
{
int ret = 0;
@@ -1100,17 +1088,18 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
}

-
-void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
+void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb01.ptr;
- bool activated = kvm_vcpu_apicv_active(vcpu);
+
+ if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
+ return;

if (!enable_apicv)
return;

- if (activated) {
+ if (kvm_vcpu_apicv_active(vcpu)) {
/**
* During AVIC temporary deactivation, guest could update
* APIC ID, DFR and LDR registers, which would not be trapped
@@ -1124,6 +1113,16 @@ void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
avic_deactivate_vmcb(svm);
}
vmcb_mark_dirty(vmcb, VMCB_AVIC);
+}
+
+void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
+{
+ bool activated = kvm_vcpu_apicv_active(vcpu);
+
+ if (!enable_apicv)
+ return;
+
+ avic_refresh_virtual_apic_mode(vcpu);

if (activated)
avic_vcpu_load(vcpu, vcpu->cpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6ffadbd57744..26044e1d2422 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4771,7 +4771,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.enable_nmi_window = svm_enable_nmi_window,
.enable_irq_window = svm_enable_irq_window,
.update_cr8_intercept = svm_update_cr8_intercept,
- .set_virtual_apic_mode = avic_set_virtual_apic_mode,
+ .set_virtual_apic_mode = avic_refresh_virtual_apic_mode,
.refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl,
.check_apicv_inhibit_reasons = avic_check_apicv_inhibit_reasons,
.apicv_post_state_restore = avic_apicv_post_state_restore,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 4826e6cc611b..d0ed3f595229 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -648,7 +648,7 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu);
void avic_vcpu_unblocking(struct kvm_vcpu *vcpu);
void avic_ring_doorbell(struct kvm_vcpu *vcpu);
unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu);
-void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
+void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);


/* sev.c */
--
2.39.0.314.g84b9a713c41-goog

2023-01-06 01:40:36

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v5 18/33] KVM: x86: Explicitly track all possibilities for APIC map's logical modes

Track all possibilities for the optimized APIC map's logical modes
instead of overloading the pseudo-bitmap and treating any "unknown" value
as "invalid".

As documented by the now-stale comment above the mode values, the values
did have meaning when the optimized map was originally added. That
dependent logical was removed by commit e45115b62f9a ("KVM: x86: use
physical LAPIC array for logical x2APIC"), but the obfuscated behavior
and its comment were left behind.

Opportunistically rename "mode" to "logical_mode", partly to make it
clear that the "disabled" case applies only to the logical map, but also
to prove that there is no lurking code that expects "mode" to be a bitmap.

Functionally, this is a glorified nop.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 29 ++++++++++++++++--------
arch/x86/kvm/lapic.c | 40 ++++++++++++++++++++++++++-------
2 files changed, 52 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1d92c148e799..732421a0ad02 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1022,19 +1022,30 @@ struct kvm_arch_memory_slot {
};

/*
- * We use as the mode the number of bits allocated in the LDR for the
- * logical processor ID. It happens that these are all powers of two.
- * This makes it is very easy to detect cases where the APICs are
- * configured for multiple modes; in that case, we cannot use the map and
- * hence cannot use kvm_irq_delivery_to_apic_fast either.
+ * Track the mode of the optimized logical map, as the rules for decoding the
+ * destination vary per mode. Enabling the optimized logical map requires all
+ * software-enabled local APIs to be in the same mode, each addressable APIC to
+ * be mapped to only one MDA, and each MDA to map to at most one APIC.
*/
-#define KVM_APIC_MODE_XAPIC_CLUSTER 4
-#define KVM_APIC_MODE_XAPIC_FLAT 8
-#define KVM_APIC_MODE_X2APIC 16
+enum kvm_apic_logical_mode {
+ /* All local APICs are software disabled. */
+ KVM_APIC_MODE_SW_DISABLED,
+ /* All software enabled local APICs in xAPIC cluster addressing mode. */
+ KVM_APIC_MODE_XAPIC_CLUSTER,
+ /* All software enabled local APICs in xAPIC flat addressing mode. */
+ KVM_APIC_MODE_XAPIC_FLAT,
+ /* All software enabled local APICs in x2APIC mode. */
+ KVM_APIC_MODE_X2APIC,
+ /*
+ * Optimized map disabled, e.g. not all local APICs in the same logical
+ * mode, same logical ID assigned to multiple APICs, etc.
+ */
+ KVM_APIC_MODE_MAP_DISABLED,
+};

struct kvm_apic_map {
struct rcu_head rcu;
- u8 mode;
+ enum kvm_apic_logical_mode logical_mode;
u32 max_apic_id;
union {
struct kvm_lapic *xapic_flat_map[8];
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2aee712e42bb..2567998b692c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -168,7 +168,12 @@ static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)

static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
- switch (map->mode) {
+ switch (map->logical_mode) {
+ case KVM_APIC_MODE_SW_DISABLED:
+ /* Arbitrarily use the flat map so that @cluster isn't NULL. */
+ *cluster = map->xapic_flat_map;
+ *mask = 0;
+ return true;
case KVM_APIC_MODE_X2APIC: {
u32 offset = (dest_id >> 16) * 16;
u32 max_apic_id = map->max_apic_id;
@@ -193,8 +198,10 @@ static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
*cluster = map->xapic_cluster_map[(dest_id >> 4) & 0xf];
*mask = dest_id & 0xf;
return true;
+ case KVM_APIC_MODE_MAP_DISABLED:
+ return false;
default:
- /* Not optimized. */
+ WARN_ON_ONCE(1);
return false;
}
}
@@ -256,10 +263,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
goto out;

new->max_apic_id = max_id;
+ new->logical_mode = KVM_APIC_MODE_SW_DISABLED;

kvm_for_each_vcpu(i, vcpu, kvm) {
struct kvm_lapic *apic = vcpu->arch.apic;
struct kvm_lapic **cluster;
+ enum kvm_apic_logical_mode logical_mode;
u16 mask;
u32 ldr;
u8 xapic_id;
@@ -282,7 +291,8 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
new->phys_map[xapic_id] = apic;

- if (!kvm_apic_sw_enabled(apic))
+ if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
+ !kvm_apic_sw_enabled(apic))
continue;

ldr = kvm_lapic_get_reg(apic, APIC_LDR);
@@ -290,17 +300,31 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
continue;

if (apic_x2apic_mode(apic)) {
- new->mode |= KVM_APIC_MODE_X2APIC;
+ logical_mode = KVM_APIC_MODE_X2APIC;
} else {
ldr = GET_APIC_LOGICAL_ID(ldr);
if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
- new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
+ logical_mode = KVM_APIC_MODE_XAPIC_FLAT;
else
- new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
+ logical_mode = KVM_APIC_MODE_XAPIC_CLUSTER;
}

- if (!kvm_apic_map_get_logical_dest(new, ldr, &cluster, &mask))
+ /*
+ * To optimize logical mode delivery, all software-enabled APICs must
+ * be configured for the same mode.
+ */
+ if (new->logical_mode == KVM_APIC_MODE_SW_DISABLED) {
+ new->logical_mode = logical_mode;
+ } else if (new->logical_mode != logical_mode) {
+ new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
continue;
+ }
+
+ if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
+ &cluster, &mask))) {
+ new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
+ continue;
+ }

if (mask)
cluster[ffs(mask) - 1] = apic;
@@ -953,7 +977,7 @@ static bool kvm_apic_is_broadcast_dest(struct kvm *kvm, struct kvm_lapic **src,
{
if (kvm->arch.x2apic_broadcast_quirk_disabled) {
if ((irq->dest_id == APIC_BROADCAST &&
- map->mode != KVM_APIC_MODE_X2APIC))
+ map->logical_mode != KVM_APIC_MODE_X2APIC))
return true;
if (irq->dest_id == X2APIC_BROADCAST)
return true;
--
2.39.0.314.g84b9a713c41-goog

2023-01-06 01:42:22

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v5 28/33] KVM: SVM: Handle multiple logical targets in AVIC kick fastpath

Iterate over all target logical IDs in the AVIC kick fastpath instead of
bailing if there is more than one target. Now that KVM inhibits AVIC if
vCPUs aren't mapped 1:1 with logical IDs, each bit in the destination is
guaranteed to match to at most one vCPU, i.e. iterating over the bitmap
is guaranteed to kick each valid target exactly once.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 112 ++++++++++++++++++++++------------------
1 file changed, 63 insertions(+), 49 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index eb979695e7d8..2c6737f72bd4 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -327,6 +327,50 @@ static void avic_kick_vcpu(struct kvm_vcpu *vcpu, u32 icrl)
icrl & APIC_VECTOR_MASK);
}

+static void avic_kick_vcpu_by_physical_id(struct kvm *kvm, u32 physical_id,
+ u32 icrl)
+{
+ /*
+ * KVM inhibits AVIC if any vCPU ID diverges from the vCPUs APIC ID,
+ * i.e. APIC ID == vCPU ID.
+ */
+ struct kvm_vcpu *target_vcpu = kvm_get_vcpu_by_id(kvm, physical_id);
+
+ /* Once again, nothing to do if the target vCPU doesn't exist. */
+ if (unlikely(!target_vcpu))
+ return;
+
+ avic_kick_vcpu(target_vcpu, icrl);
+}
+
+static void avic_kick_vcpu_by_logical_id(struct kvm *kvm, u32 *avic_logical_id_table,
+ u32 logid_index, u32 icrl)
+{
+ u32 physical_id;
+
+ if (avic_logical_id_table) {
+ u32 logid_entry = avic_logical_id_table[logid_index];
+
+ /* Nothing to do if the logical destination is invalid. */
+ if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
+ return;
+
+ physical_id = logid_entry &
+ AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
+ } else {
+ /*
+ * For x2APIC, the logical APIC ID is a read-only value that is
+ * derived from the x2APIC ID, thus the x2APIC ID can be found
+ * by reversing the calculation (stored in logid_index). Note,
+ * bits 31:20 of the x2APIC ID aren't propagated to the logical
+ * ID, but KVM limits the x2APIC ID limited to KVM_MAX_VCPU_IDS.
+ */
+ physical_id = logid_index;
+ }
+
+ avic_kick_vcpu_by_physical_id(kvm, physical_id, icrl);
+}
+
/*
* A fast-path version of avic_kick_target_vcpus(), which attempts to match
* destination APIC ID to vCPU without looping through all vCPUs.
@@ -334,11 +378,10 @@ static void avic_kick_vcpu(struct kvm_vcpu *vcpu, u32 icrl)
static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source,
u32 icrl, u32 icrh, u32 index)
{
- u32 l1_physical_id, dest;
- struct kvm_vcpu *target_vcpu;
int dest_mode = icrl & APIC_DEST_MASK;
int shorthand = icrl & APIC_SHORT_MASK;
struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
+ u32 dest;

if (shorthand != APIC_DEST_NOSHORT)
return -EINVAL;
@@ -355,14 +398,14 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
if (!apic_x2apic_mode(source) && dest == APIC_BROADCAST)
return -EINVAL;

- l1_physical_id = dest;
-
- if (WARN_ON_ONCE(l1_physical_id != index))
+ if (WARN_ON_ONCE(dest != index))
return -EINVAL;

+ avic_kick_vcpu_by_physical_id(kvm, dest, icrl);
} else {
- u32 bitmap, cluster;
- int logid_index;
+ u32 *avic_logical_id_table;
+ unsigned long bitmap, i;
+ u32 cluster;

if (apic_x2apic_mode(source)) {
/* 16 bit dest mask, 16 bit cluster id */
@@ -382,50 +425,21 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
if (unlikely(!bitmap))
return 0;

- if (!is_power_of_2(bitmap))
- /* multiple logical destinations, use slow path */
- return -EINVAL;
-
- logid_index = cluster + __ffs(bitmap);
-
- if (apic_x2apic_mode(source)) {
- /*
- * For x2APIC, the logical APIC ID is a read-only value
- * that is derived from the x2APIC ID, thus the x2APIC
- * ID can be found by reversing the calculation (done
- * above). Note, bits 31:20 of the x2APIC ID are not
- * propagated to the logical ID, but KVM limits the
- * x2APIC ID limited to KVM_MAX_VCPU_IDS.
- */
- l1_physical_id = logid_index;
- } else {
- u32 *avic_logical_id_table =
- page_address(kvm_svm->avic_logical_id_table_page);
-
- u32 logid_entry = avic_logical_id_table[logid_index];
-
- if (WARN_ON_ONCE(index != logid_index))
- return -EINVAL;
-
- /* Nothing to do if the logical destination is invalid. */
- if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
- return 0;
-
- l1_physical_id = logid_entry &
- AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
- }
+ if (apic_x2apic_mode(source))
+ avic_logical_id_table = NULL;
+ else
+ avic_logical_id_table = page_address(kvm_svm->avic_logical_id_table_page);
+
+ /*
+ * AVIC is inhibited if vCPUs aren't mapped 1:1 with logical
+ * IDs, thus each bit in the destination is guaranteed to map
+ * to at most one vCPU.
+ */
+ for_each_set_bit(i, &bitmap, 16)
+ avic_kick_vcpu_by_logical_id(kvm, avic_logical_id_table,
+ cluster + i, icrl);
}

- /*
- * KVM inhibits AVIC if any vCPU ID diverges from the vCPUs APIC ID,
- * i.e. APIC ID == vCPU ID. Once again, nothing to do if the target
- * vCPU doesn't exist.
- */
- target_vcpu = kvm_get_vcpu_by_id(kvm, l1_physical_id);
- if (unlikely(!target_vcpu))
- return 0;
-
- avic_kick_vcpu(target_vcpu, icrl);
return 0;
}

--
2.39.0.314.g84b9a713c41-goog

2023-01-06 01:42:40

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v5 33/33] KVM: x86: Add helpers to recalc physical vs. logical optimized APIC maps

Move the guts of kvm_recalculate_apic_map()'s main loop to two separate
helpers to handle recalculating the physical and logical pieces of the
optimized map. Having 100+ lines of code in the for-loop makes it hard
to understand what is being calculated where.

No functional change intended.

Suggested-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/lapic.c | 250 +++++++++++++++++++++++--------------------
1 file changed, 133 insertions(+), 117 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 856e81a2dc64..669ea125b7e2 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -218,6 +218,134 @@ static void kvm_apic_map_free(struct rcu_head *rcu)
kvfree(map);
}

+static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
+ struct kvm_vcpu *vcpu,
+ bool *xapic_id_mismatch)
+{
+ struct kvm_lapic *apic = vcpu->arch.apic;
+ u32 x2apic_id = kvm_x2apic_id(apic);
+ u32 xapic_id = kvm_xapic_id(apic);
+ u32 physical_id;
+
+ /*
+ * Deliberately truncate the vCPU ID when detecting a mismatched APIC
+ * ID to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a
+ * 32-bit value. Any unwanted aliasing due to truncation results will
+ * be detected below.
+ */
+ if (!apic_x2apic_mode(apic) && xapic_id != (u8)vcpu->vcpu_id)
+ *xapic_id_mismatch = true;
+
+ /*
+ * Apply KVM's hotplug hack if userspace has enable 32-bit APIC IDs.
+ * Allow sending events to vCPUs by their x2APIC ID even if the target
+ * vCPU is in legacy xAPIC mode, and silently ignore aliased xAPIC IDs
+ * (the x2APIC ID is truncated to 8 bits, causing IDs > 0xff to wrap
+ * and collide).
+ *
+ * Honor the architectural (and KVM's non-optimized) behavior if
+ * userspace has not enabled 32-bit x2APIC IDs. Each APIC is supposed
+ * to process messages independently. If multiple vCPUs have the same
+ * effective APIC ID, e.g. due to the x2APIC wrap or because the guest
+ * manually modified its xAPIC IDs, events targeting that ID are
+ * supposed to be recognized by all vCPUs with said ID.
+ */
+ if (vcpu->kvm->arch.x2apic_format) {
+ /* See also kvm_apic_match_physical_addr(). */
+ if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
+ x2apic_id <= new->max_apic_id)
+ new->phys_map[x2apic_id] = apic;
+
+ if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
+ new->phys_map[xapic_id] = apic;
+ } else {
+ /*
+ * Disable the optimized map if the physical APIC ID is already
+ * mapped, i.e. is aliased to multiple vCPUs. The optimized
+ * map requires a strict 1:1 mapping between IDs and vCPUs.
+ */
+ if (apic_x2apic_mode(apic))
+ physical_id = x2apic_id;
+ else
+ physical_id = xapic_id;
+
+ if (new->phys_map[physical_id])
+ return -EINVAL;
+
+ new->phys_map[physical_id] = apic;
+ }
+
+ return 0;
+}
+
+static void kvm_recalculate_logical_map(struct kvm_apic_map *new,
+ struct kvm_vcpu *vcpu)
+{
+ struct kvm_lapic *apic = vcpu->arch.apic;
+ enum kvm_apic_logical_mode logical_mode;
+ struct kvm_lapic **cluster;
+ u16 mask;
+ u32 ldr;
+
+ if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED)
+ return;
+
+ if (!kvm_apic_sw_enabled(apic))
+ return;
+
+ ldr = kvm_lapic_get_reg(apic, APIC_LDR);
+ if (!ldr)
+ return;
+
+ if (apic_x2apic_mode(apic)) {
+ logical_mode = KVM_APIC_MODE_X2APIC;
+ } else {
+ ldr = GET_APIC_LOGICAL_ID(ldr);
+ if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
+ logical_mode = KVM_APIC_MODE_XAPIC_FLAT;
+ else
+ logical_mode = KVM_APIC_MODE_XAPIC_CLUSTER;
+ }
+
+ /*
+ * To optimize logical mode delivery, all software-enabled APICs must
+ * be configured for the same mode.
+ */
+ if (new->logical_mode == KVM_APIC_MODE_SW_DISABLED) {
+ new->logical_mode = logical_mode;
+ } else if (new->logical_mode != logical_mode) {
+ new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
+ return;
+ }
+
+ /*
+ * In x2APIC mode, the LDR is read-only and derived directly from the
+ * x2APIC ID, thus is guaranteed to be addressable. KVM reuses
+ * kvm_apic_map.phys_map to optimize logical mode x2APIC interrupts by
+ * reversing the LDR calculation to get cluster of APICs, i.e. no
+ * additional work is required.
+ */
+ if (apic_x2apic_mode(apic)) {
+ WARN_ON_ONCE(ldr != kvm_apic_calc_x2apic_ldr(kvm_x2apic_id(apic)));
+ return;
+ }
+
+ if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
+ &cluster, &mask))) {
+ new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
+ return;
+ }
+
+ if (!mask)
+ return;
+
+ ldr = ffs(mask) - 1;
+ if (!is_power_of_2(mask) || cluster[ldr])
+ new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
+ else
+ cluster[ldr] = apic;
+}
+
/*
* CLEAN -> DIRTY and UPDATE_IN_PROGRESS -> DIRTY changes happen without a lock.
*
@@ -272,128 +400,16 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
new->logical_mode = KVM_APIC_MODE_SW_DISABLED;

kvm_for_each_vcpu(i, vcpu, kvm) {
- struct kvm_lapic *apic = vcpu->arch.apic;
- struct kvm_lapic **cluster;
- enum kvm_apic_logical_mode logical_mode;
- u32 x2apic_id, physical_id;
- u16 mask;
- u32 ldr;
- u8 xapic_id;
-
if (!kvm_apic_present(vcpu))
continue;

- xapic_id = kvm_xapic_id(apic);
- x2apic_id = kvm_x2apic_id(apic);
-
- /*
- * Deliberately truncate the vCPU ID when detecting a mismatched
- * APIC ID to avoid false positives if the vCPU ID, i.e. x2APIC
- * ID, is a 32-bit value. Any unwanted aliasing due to
- * truncation results will be detected below.
- */
- if (!apic_x2apic_mode(apic) && xapic_id != (u8)vcpu->vcpu_id)
- xapic_id_mismatch = true;
-
- /*
- * Apply KVM's hotplug hack if userspace has enable 32-bit APIC
- * IDs. Allow sending events to vCPUs by their x2APIC ID even
- * if the target vCPU is in legacy xAPIC mode, and silently
- * ignore aliased xAPIC IDs (the x2APIC ID is truncated to 8
- * bits, causing IDs > 0xff to wrap and collide).
- *
- * Honor the architectural (and KVM's non-optimized) behavior
- * if userspace has not enabled 32-bit x2APIC IDs. Each APIC
- * is supposed to process messages independently. If multiple
- * vCPUs have the same effective APIC ID, e.g. due to the
- * x2APIC wrap or because the guest manually modified its xAPIC
- * IDs, events targeting that ID are supposed to be recognized
- * by all vCPUs with said ID.
- */
- if (kvm->arch.x2apic_format) {
- /* See also kvm_apic_match_physical_addr(). */
- if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
- x2apic_id <= new->max_apic_id)
- new->phys_map[x2apic_id] = apic;
-
- if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
- new->phys_map[xapic_id] = apic;
- } else {
- /*
- * Disable the optimized map if the physical APIC ID is
- * already mapped, i.e. is aliased to multiple vCPUs.
- * The optimized map requires a strict 1:1 mapping
- * between IDs and vCPUs.
- */
- if (apic_x2apic_mode(apic))
- physical_id = x2apic_id;
- else
- physical_id = xapic_id;
-
- if (new->phys_map[physical_id]) {
- kvfree(new);
- new = NULL;
- goto out;
- }
- new->phys_map[physical_id] = apic;
- }
-
- if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
- !kvm_apic_sw_enabled(apic))
- continue;
-
- ldr = kvm_lapic_get_reg(apic, APIC_LDR);
- if (!ldr)
- continue;
-
- if (apic_x2apic_mode(apic)) {
- logical_mode = KVM_APIC_MODE_X2APIC;
- } else {
- ldr = GET_APIC_LOGICAL_ID(ldr);
- if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
- logical_mode = KVM_APIC_MODE_XAPIC_FLAT;
- else
- logical_mode = KVM_APIC_MODE_XAPIC_CLUSTER;
+ if (kvm_recalculate_phys_map(new, vcpu, &xapic_id_mismatch)) {
+ kvfree(new);
+ new = NULL;
+ goto out;
}

- /*
- * To optimize logical mode delivery, all software-enabled APICs must
- * be configured for the same mode.
- */
- if (new->logical_mode == KVM_APIC_MODE_SW_DISABLED) {
- new->logical_mode = logical_mode;
- } else if (new->logical_mode != logical_mode) {
- new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
- continue;
- }
-
- /*
- * In x2APIC mode, the LDR is read-only and derived directly
- * from the x2APIC ID, thus is guaranteed to be addressable.
- * KVM reuses kvm_apic_map.phys_map to optimize logical mode
- * x2APIC interrupts by reversing the LDR calculation to get
- * cluster of APICs, i.e. no additional work is required.
- */
- if (apic_x2apic_mode(apic)) {
- WARN_ON_ONCE(ldr != kvm_apic_calc_x2apic_ldr(x2apic_id));
- continue;
- }
-
- if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
- &cluster, &mask))) {
- new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
- continue;
- }
-
- if (!mask)
- continue;
-
- ldr = ffs(mask) - 1;
- if (!is_power_of_2(mask) || cluster[ldr]) {
- new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
- continue;
- }
- cluster[ldr] = apic;
+ kvm_recalculate_logical_map(new, vcpu);
}
out:
/*
--
2.39.0.314.g84b9a713c41-goog

2023-01-06 01:42:41

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v5 31/33] KVM: x86: Track required APICv inhibits with variable, not callback

Track the per-vendor required APICv inhibits with a variable instead of
calling into vendor code every time KVM wants to query the set of
required inhibits. The required inhibits are a property of the vendor's
virtualization architecture, i.e. are 100% static.

Using a variable allows the compiler to inline the check, e.g. generate
a single-uop TEST+Jcc, and thus eliminates any desire to avoid checking
inhibits for performance reasons.

No functional change intended.

Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm-x86-ops.h | 1 -
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm/avic.c | 19 -------------------
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/svm/svm.h | 16 +++++++++++++++-
arch/x86/kvm/vmx/vmx.c | 24 +++++++++++-------------
arch/x86/kvm/x86.c | 2 +-
7 files changed, 29 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index abccd51dcfca..84f43caef9b7 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -76,7 +76,6 @@ KVM_X86_OP(set_nmi_mask)
KVM_X86_OP(enable_nmi_window)
KVM_X86_OP(enable_irq_window)
KVM_X86_OP_OPTIONAL(update_cr8_intercept)
-KVM_X86_OP(check_apicv_inhibit_reasons)
KVM_X86_OP(refresh_apicv_exec_ctrl)
KVM_X86_OP_OPTIONAL(hwapic_irr_update)
KVM_X86_OP_OPTIONAL(hwapic_isr_update)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b41a01b3a4af..7ca854714ccd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1623,6 +1623,7 @@ struct kvm_x86_ops {
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
bool (*check_apicv_inhibit_reasons)(enum kvm_apicv_inhibit reason);
+ const unsigned long required_apicv_inhibits;
bool allow_apicv_in_x2apic_without_x2apic_virtualization;
void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 80f346b79c62..14677bc31b83 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -963,25 +963,6 @@ int avic_pi_update_irte(struct kvm *kvm, unsigned int host_irq,
return ret;
}

-bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
-{
- ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
- BIT(APICV_INHIBIT_REASON_ABSENT) |
- BIT(APICV_INHIBIT_REASON_HYPERV) |
- BIT(APICV_INHIBIT_REASON_NESTED) |
- BIT(APICV_INHIBIT_REASON_IRQWIN) |
- BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
- BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
- BIT(APICV_INHIBIT_REASON_SEV) |
- BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |
- BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
- BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
- BIT(APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
-
- return supported & BIT(reason);
-}
-
-
static inline int
avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)
{
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9f172f2de195..f2453df77727 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4773,8 +4773,8 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.update_cr8_intercept = svm_update_cr8_intercept,
.set_virtual_apic_mode = avic_refresh_virtual_apic_mode,
.refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl,
- .check_apicv_inhibit_reasons = avic_check_apicv_inhibit_reasons,
.apicv_post_state_restore = avic_apicv_post_state_restore,
+ .required_apicv_inhibits = AVIC_REQUIRED_APICV_INHIBITS,

.get_exit_info = svm_get_exit_info,

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 546825c82490..41eabb098b13 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -621,6 +621,21 @@ void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb);
extern struct kvm_x86_nested_ops svm_nested_ops;

/* avic.c */
+#define AVIC_REQUIRED_APICV_INHIBITS \
+( \
+ BIT(APICV_INHIBIT_REASON_DISABLE) | \
+ BIT(APICV_INHIBIT_REASON_ABSENT) | \
+ BIT(APICV_INHIBIT_REASON_HYPERV) | \
+ BIT(APICV_INHIBIT_REASON_NESTED) | \
+ BIT(APICV_INHIBIT_REASON_IRQWIN) | \
+ BIT(APICV_INHIBIT_REASON_PIT_REINJ) | \
+ BIT(APICV_INHIBIT_REASON_BLOCKIRQ) | \
+ BIT(APICV_INHIBIT_REASON_SEV) | \
+ BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) | \
+ BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) | \
+ BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) | \
+ BIT(APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED) \
+)

bool avic_hardware_setup(struct kvm_x86_ops *ops);
int avic_ga_log_notifier(u32 ga_tag);
@@ -634,7 +649,6 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
void avic_vcpu_put(struct kvm_vcpu *vcpu);
void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu);
void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu);
-bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason);
int avic_pi_update_irte(struct kvm *kvm, unsigned int host_irq,
uint32_t guest_irq, bool set);
void avic_vcpu_blocking(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ef84d11a0fe4..ad2ac66ef32e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8023,18 +8023,16 @@ static void vmx_hardware_unsetup(void)
free_kvm_area();
}

-static bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
-{
- ulong supported = BIT(APICV_INHIBIT_REASON_DISABLE) |
- BIT(APICV_INHIBIT_REASON_ABSENT) |
- BIT(APICV_INHIBIT_REASON_HYPERV) |
- BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
- BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |
- BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
- BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED);
-
- return supported & BIT(reason);
-}
+#define VMX_REQUIRED_APICV_INHIBITS \
+( \
+ BIT(APICV_INHIBIT_REASON_DISABLE)| \
+ BIT(APICV_INHIBIT_REASON_ABSENT) | \
+ BIT(APICV_INHIBIT_REASON_HYPERV) | \
+ BIT(APICV_INHIBIT_REASON_BLOCKIRQ) | \
+ BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) | \
+ BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) | \
+ BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) \
+)

static void vmx_vm_destroy(struct kvm *kvm)
{
@@ -8118,7 +8116,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
.refresh_apicv_exec_ctrl = vmx_refresh_apicv_exec_ctrl,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.apicv_post_state_restore = vmx_apicv_post_state_restore,
- .check_apicv_inhibit_reasons = vmx_check_apicv_inhibit_reasons,
+ .required_apicv_inhibits = VMX_REQUIRED_APICV_INHIBITS,
.hwapic_irr_update = vmx_hwapic_irr_update,
.hwapic_isr_update = vmx_hwapic_isr_update,
.guest_apic_has_interrupt = vmx_guest_apic_has_interrupt,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1abe3f1e821c..5becce5bd45a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10112,7 +10112,7 @@ void __kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,

lockdep_assert_held_write(&kvm->arch.apicv_update_lock);

- if (!static_call(kvm_x86_check_apicv_inhibit_reasons)(reason))
+ if (!(kvm_x86_ops.required_apicv_inhibits & BIT(reason)))
return;

old = new = kvm->arch.apicv_inhibit_reasons;
--
2.39.0.314.g84b9a713c41-goog

2023-01-06 01:57:17

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v5 20/33] KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs

Disable the optimized APIC logical map if a logical ID covers multiple
MDAs, i.e. if a vCPU has multiple bits set in its ID. In logical mode,
events match if "ID & MDA != 0", i.e. creating an entry for only the
first bit can cause interrupts to be missed.

Note, creating an entry for every bit is also wrong as KVM would generate
IPIs for every matching bit. It would be possible to teach KVM to play
nice with this edge case, but it is very much an edge case and probably
not used in any real world OS, i.e. it's not worth optimizing.

Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/lapic.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index fd7726ff95c6..dca87bb6dd1a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -343,8 +343,14 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
continue;
}

- if (mask)
- cluster[ffs(mask) - 1] = apic;
+ if (!mask)
+ continue;
+
+ if (!is_power_of_2(mask)) {
+ new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
+ continue;
+ }
+ cluster[ffs(mask) - 1] = apic;
}
out:
old = rcu_dereference_protected(kvm->arch.apic_map,
--
2.39.0.314.g84b9a713c41-goog

2023-01-06 01:59:25

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v5 27/33] KVM: SVM: Require logical ID to be power-of-2 for AVIC entry

Do not modify AVIC's logical ID table if the logical ID portion of the
LDR is not a power-of-2, i.e. if the LDR has multiple bits set. Taking
only the first bit means that KVM will fail to match MDAs that intersect
with "higher" bits in the "ID"

The "ID" acts as a bitmap, but is referred to as an ID because there's an
implicit, unenforced "requirement" that software only set one bit. This
edge case is arguably out-of-spec behavior, but KVM cleanly handles it
in all other cases, e.g. the optimized logical map (and AVIC!) is also
disabled in this scenario.

Refactor the code to consolidate the checks, and so that the code looks
more like avic_kick_target_vcpus_fast().

Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 5b33f91b701c..eb979695e7d8 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -513,26 +513,26 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu)
static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
{
struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
- int index;
u32 *logical_apic_id_table;
- int dlid = GET_APIC_LOGICAL_ID(ldr);
+ u32 cluster, index;

- if (!dlid)
- return NULL;
+ ldr = GET_APIC_LOGICAL_ID(ldr);

- if (flat) { /* flat */
- index = ffs(dlid) - 1;
- if (index > 7)
+ if (flat) {
+ cluster = 0;
+ } else {
+ cluster = (ldr >> 4);
+ if (cluster >= 0xf)
return NULL;
- } else { /* cluster */
- int cluster = (dlid & 0xf0) >> 4;
- int apic = ffs(dlid & 0x0f) - 1;
-
- if ((apic < 0) || (apic > 7) ||
- (cluster >= 0xf))
- return NULL;
- index = (cluster << 2) + apic;
+ ldr &= 0xf;
}
+ if (!ldr || !is_power_of_2(ldr))
+ return NULL;
+
+ index = __ffs(ldr);
+ if (WARN_ON_ONCE(index > 7))
+ return NULL;
+ index += (cluster << 2);

logical_apic_id_table = (u32 *) page_address(kvm_svm->avic_logical_id_table_page);

--
2.39.0.314.g84b9a713c41-goog

2023-01-08 15:37:10

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v5 18/33] KVM: x86: Explicitly track all possibilities for APIC map's logical modes

On Fri, 2023-01-06 at 01:12 +0000, Sean Christopherson wrote:
> Track all possibilities for the optimized APIC map's logical modes
> instead of overloading the pseudo-bitmap and treating any "unknown" value
> as "invalid".
>
> As documented by the now-stale comment above the mode values, the values
> did have meaning when the optimized map was originally added. That
> dependent logical was removed by commit e45115b62f9a ("KVM: x86: use
> physical LAPIC array for logical x2APIC"), but the obfuscated behavior
> and its comment were left behind.
>
> Opportunistically rename "mode" to "logical_mode", partly to make it
> clear that the "disabled" case applies only to the logical map, but also
> to prove that there is no lurking code that expects "mode" to be a bitmap.
>
> Functionally, this is a glorified nop.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 29 ++++++++++++++++--------
> arch/x86/kvm/lapic.c | 40 ++++++++++++++++++++++++++-------
> 2 files changed, 52 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1d92c148e799..732421a0ad02 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1022,19 +1022,30 @@ struct kvm_arch_memory_slot {
> };
>
> /*
> - * We use as the mode the number of bits allocated in the LDR for the
> - * logical processor ID. It happens that these are all powers of two.
> - * This makes it is very easy to detect cases where the APICs are
> - * configured for multiple modes; in that case, we cannot use the map and
> - * hence cannot use kvm_irq_delivery_to_apic_fast either.
> + * Track the mode of the optimized logical map, as the rules for decoding the
> + * destination vary per mode. Enabling the optimized logical map requires all
> + * software-enabled local APIs to be in the same mode, each addressable APIC to
> + * be mapped to only one MDA, and each MDA to map to at most one APIC.
> */
> -#define KVM_APIC_MODE_XAPIC_CLUSTER 4
> -#define KVM_APIC_MODE_XAPIC_FLAT 8
> -#define KVM_APIC_MODE_X2APIC 16
> +enum kvm_apic_logical_mode {
> + /* All local APICs are software disabled. */
> + KVM_APIC_MODE_SW_DISABLED,
> + /* All software enabled local APICs in xAPIC cluster addressing mode. */
> + KVM_APIC_MODE_XAPIC_CLUSTER,
> + /* All software enabled local APICs in xAPIC flat addressing mode. */
> + KVM_APIC_MODE_XAPIC_FLAT,
> + /* All software enabled local APICs in x2APIC mode. */
> + KVM_APIC_MODE_X2APIC,
> + /*
> + * Optimized map disabled, e.g. not all local APICs in the same logical
> + * mode, same logical ID assigned to multiple APICs, etc.
> + */
> + KVM_APIC_MODE_MAP_DISABLED,
> +};
>
> struct kvm_apic_map {
> struct rcu_head rcu;
> - u8 mode;
> + enum kvm_apic_logical_mode logical_mode;
> u32 max_apic_id;
> union {
> struct kvm_lapic *xapic_flat_map[8];
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 2aee712e42bb..2567998b692c 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -168,7 +168,12 @@ static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)
>
> static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
> u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
> - switch (map->mode) {
> + switch (map->logical_mode) {
> + case KVM_APIC_MODE_SW_DISABLED:
> + /* Arbitrarily use the flat map so that @cluster isn't NULL. */
> + *cluster = map->xapic_flat_map;
> + *mask = 0;
> + return true;
> case KVM_APIC_MODE_X2APIC: {
> u32 offset = (dest_id >> 16) * 16;
> u32 max_apic_id = map->max_apic_id;
> @@ -193,8 +198,10 @@ static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
> *cluster = map->xapic_cluster_map[(dest_id >> 4) & 0xf];
> *mask = dest_id & 0xf;
> return true;
> + case KVM_APIC_MODE_MAP_DISABLED:
> + return false;
> default:
> - /* Not optimized. */
> + WARN_ON_ONCE(1);
> return false;
> }
> }
> @@ -256,10 +263,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> goto out;
>
> new->max_apic_id = max_id;
> + new->logical_mode = KVM_APIC_MODE_SW_DISABLED;
>
> kvm_for_each_vcpu(i, vcpu, kvm) {
> struct kvm_lapic *apic = vcpu->arch.apic;
> struct kvm_lapic **cluster;
> + enum kvm_apic_logical_mode logical_mode;
> u16 mask;
> u32 ldr;
> u8 xapic_id;
> @@ -282,7 +291,8 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
> new->phys_map[xapic_id] = apic;
>
> - if (!kvm_apic_sw_enabled(apic))
> + if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
> + !kvm_apic_sw_enabled(apic))
> continue;
>
> ldr = kvm_lapic_get_reg(apic, APIC_LDR);
> @@ -290,17 +300,31 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> continue;
>
> if (apic_x2apic_mode(apic)) {
> - new->mode |= KVM_APIC_MODE_X2APIC;
> + logical_mode = KVM_APIC_MODE_X2APIC;
> } else {
> ldr = GET_APIC_LOGICAL_ID(ldr);
> if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
> - new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
> + logical_mode = KVM_APIC_MODE_XAPIC_FLAT;
> else
> - new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
> + logical_mode = KVM_APIC_MODE_XAPIC_CLUSTER;
> }
>
> - if (!kvm_apic_map_get_logical_dest(new, ldr, &cluster, &mask))
> + /*
> + * To optimize logical mode delivery, all software-enabled APICs must
> + * be configured for the same mode.
> + */
> + if (new->logical_mode == KVM_APIC_MODE_SW_DISABLED) {
> + new->logical_mode = logical_mode;
> + } else if (new->logical_mode != logical_mode) {
> + new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> continue;
> + }
> +
> + if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
> + &cluster, &mask))) {
> + new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> + continue;
> + }
>
> if (mask)
> cluster[ffs(mask) - 1] = apic;
> @@ -953,7 +977,7 @@ static bool kvm_apic_is_broadcast_dest(struct kvm *kvm, struct kvm_lapic **src,
> {
> if (kvm->arch.x2apic_broadcast_quirk_disabled) {
> if ((irq->dest_id == APIC_BROADCAST &&
> - map->mode != KVM_APIC_MODE_X2APIC))
> + map->logical_mode != KVM_APIC_MODE_X2APIC))
> return true;
> if (irq->dest_id == X2APIC_BROADCAST)
> return true;


Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2023-01-08 15:38:47

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v5 27/33] KVM: SVM: Require logical ID to be power-of-2 for AVIC entry

On Fri, 2023-01-06 at 01:13 +0000, Sean Christopherson wrote:
> Do not modify AVIC's logical ID table if the logical ID portion of the
> LDR is not a power-of-2, i.e. if the LDR has multiple bits set. Taking
> only the first bit means that KVM will fail to match MDAs that intersect
> with "higher" bits in the "ID"
>
> The "ID" acts as a bitmap, but is referred to as an ID because there's an
> implicit, unenforced "requirement" that software only set one bit. This
> edge case is arguably out-of-spec behavior, but KVM cleanly handles it
> in all other cases, e.g. the optimized logical map (and AVIC!) is also
> disabled in this scenario.
>
> Refactor the code to consolidate the checks, and so that the code looks
> more like avic_kick_target_vcpus_fast().


Very minor nitpick about the subject, it feels like it just changes something
minor but it pretty much rewrites the avic_get_logical_id_entry.

When I bisected for the bug, this did confuse me a bit.

I don't have a good idea on how to call this patch so I won't object to the current
subject to stay as well.

Best regards,
Maxim Levitsky


>
> Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
> Cc: Suravee Suthikulpanit <[email protected]>
> Cc: Maxim Levitsky <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/avic.c | 30 +++++++++++++++---------------
> 1 file changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 5b33f91b701c..eb979695e7d8 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -513,26 +513,26 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu)
> static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
> {
> struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
> - int index;
> u32 *logical_apic_id_table;
> - int dlid = GET_APIC_LOGICAL_ID(ldr);
> + u32 cluster, index;
>
> - if (!dlid)
> - return NULL;
> + ldr = GET_APIC_LOGICAL_ID(ldr);
>
> - if (flat) { /* flat */
> - index = ffs(dlid) - 1;
> - if (index > 7)
> + if (flat) {
> + cluster = 0;
> + } else {
> + cluster = (ldr >> 4);
> + if (cluster >= 0xf)
> return NULL;
> - } else { /* cluster */
> - int cluster = (dlid & 0xf0) >> 4;
> - int apic = ffs(dlid & 0x0f) - 1;
> -
> - if ((apic < 0) || (apic > 7) ||
> - (cluster >= 0xf))
> - return NULL;
> - index = (cluster << 2) + apic;
> + ldr &= 0xf;
> }
> + if (!ldr || !is_power_of_2(ldr))
> + return NULL;
> +
> + index = __ffs(ldr);
> + if (WARN_ON_ONCE(index > 7))
> + return NULL;
> + index += (cluster << 2);
>
> logical_apic_id_table = (u32 *) page_address(kvm_svm->avic_logical_id_table_page);
>

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2023-01-08 15:45:38

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v5 33/33] KVM: x86: Add helpers to recalc physical vs. logical optimized APIC maps

On Fri, 2023-01-06 at 01:13 +0000, Sean Christopherson wrote:
> Move the guts of kvm_recalculate_apic_map()'s main loop to two separate
> helpers to handle recalculating the physical and logical pieces of the
> optimized map. Having 100+ lines of code in the for-loop makes it hard
> to understand what is being calculated where.
>
> No functional change intended.
>
> Suggested-by: Maxim Levitsky <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/lapic.c | 250 +++++++++++++++++++++++--------------------
> 1 file changed, 133 insertions(+), 117 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 856e81a2dc64..669ea125b7e2 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -218,6 +218,134 @@ static void kvm_apic_map_free(struct rcu_head *rcu)
> kvfree(map);
> }
>
> +static int kvm_recalculate_phys_map(struct kvm_apic_map *new,
> + struct kvm_vcpu *vcpu,
> + bool *xapic_id_mismatch)
> +{
> + struct kvm_lapic *apic = vcpu->arch.apic;
> + u32 x2apic_id = kvm_x2apic_id(apic);
> + u32 xapic_id = kvm_xapic_id(apic);
> + u32 physical_id;
> +
> + /*
> + * Deliberately truncate the vCPU ID when detecting a mismatched APIC
> + * ID to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a
> + * 32-bit value. Any unwanted aliasing due to truncation results will
> + * be detected below.
> + */
> + if (!apic_x2apic_mode(apic) && xapic_id != (u8)vcpu->vcpu_id)
> + *xapic_id_mismatch = true;
> +
> + /*
> + * Apply KVM's hotplug hack if userspace has enable 32-bit APIC IDs.
> + * Allow sending events to vCPUs by their x2APIC ID even if the target
> + * vCPU is in legacy xAPIC mode, and silently ignore aliased xAPIC IDs
> + * (the x2APIC ID is truncated to 8 bits, causing IDs > 0xff to wrap
> + * and collide).
> + *
> + * Honor the architectural (and KVM's non-optimized) behavior if
> + * userspace has not enabled 32-bit x2APIC IDs. Each APIC is supposed
> + * to process messages independently. If multiple vCPUs have the same
> + * effective APIC ID, e.g. due to the x2APIC wrap or because the guest
> + * manually modified its xAPIC IDs, events targeting that ID are
> + * supposed to be recognized by all vCPUs with said ID.
> + */
> + if (vcpu->kvm->arch.x2apic_format) {
> + /* See also kvm_apic_match_physical_addr(). */
> + if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
> + x2apic_id <= new->max_apic_id)
> + new->phys_map[x2apic_id] = apic;
> +
> + if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
> + new->phys_map[xapic_id] = apic;
> + } else {
> + /*
> + * Disable the optimized map if the physical APIC ID is already
> + * mapped, i.e. is aliased to multiple vCPUs. The optimized
> + * map requires a strict 1:1 mapping between IDs and vCPUs.
> + */
> + if (apic_x2apic_mode(apic))
> + physical_id = x2apic_id;
> + else
> + physical_id = xapic_id;
> +
> + if (new->phys_map[physical_id])
> + return -EINVAL;
Very small nitpick: -EINVAL feels like redundant here,
I'll say just return -1 or returning boolean would be a tiny bit better.
But this really doesn't matter much.

> +
> + new->phys_map[physical_id] = apic;
> + }
> +
> + return 0;
> +}
> +
> +static void kvm_recalculate_logical_map(struct kvm_apic_map *new,
> + struct kvm_vcpu *vcpu)
> +{
> + struct kvm_lapic *apic = vcpu->arch.apic;
> + enum kvm_apic_logical_mode logical_mode;
> + struct kvm_lapic **cluster;
> + u16 mask;
> + u32 ldr;
> +
> + if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED)
> + return;
> +
> + if (!kvm_apic_sw_enabled(apic))
> + return;
> +
> + ldr = kvm_lapic_get_reg(apic, APIC_LDR);
> + if (!ldr)
> + return;
> +
> + if (apic_x2apic_mode(apic)) {
> + logical_mode = KVM_APIC_MODE_X2APIC;
> + } else {
> + ldr = GET_APIC_LOGICAL_ID(ldr);
> + if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
> + logical_mode = KVM_APIC_MODE_XAPIC_FLAT;
> + else
> + logical_mode = KVM_APIC_MODE_XAPIC_CLUSTER;
> + }
> +
> + /*
> + * To optimize logical mode delivery, all software-enabled APICs must
> + * be configured for the same mode.
> + */
> + if (new->logical_mode == KVM_APIC_MODE_SW_DISABLED) {
> + new->logical_mode = logical_mode;
> + } else if (new->logical_mode != logical_mode) {
> + new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> + return;
> + }
> +
> + /*
> + * In x2APIC mode, the LDR is read-only and derived directly from the
> + * x2APIC ID, thus is guaranteed to be addressable. KVM reuses
> + * kvm_apic_map.phys_map to optimize logical mode x2APIC interrupts by
> + * reversing the LDR calculation to get cluster of APICs, i.e. no
> + * additional work is required.
> + */
> + if (apic_x2apic_mode(apic)) {
> + WARN_ON_ONCE(ldr != kvm_apic_calc_x2apic_ldr(kvm_x2apic_id(apic)));
> + return;
> + }
> +
> + if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
> + &cluster, &mask))) {
> + new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> + return;
> + }
> +
> + if (!mask)
> + return;
> +
> + ldr = ffs(mask) - 1;
> + if (!is_power_of_2(mask) || cluster[ldr])
> + new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> + else
> + cluster[ldr] = apic;
> +}
> +
> /*
> * CLEAN -> DIRTY and UPDATE_IN_PROGRESS -> DIRTY changes happen without a lock.
> *
> @@ -272,128 +400,16 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> new->logical_mode = KVM_APIC_MODE_SW_DISABLED;
>
> kvm_for_each_vcpu(i, vcpu, kvm) {
> - struct kvm_lapic *apic = vcpu->arch.apic;
> - struct kvm_lapic **cluster;
> - enum kvm_apic_logical_mode logical_mode;
> - u32 x2apic_id, physical_id;
> - u16 mask;
> - u32 ldr;
> - u8 xapic_id;
> -
> if (!kvm_apic_present(vcpu))
> continue;
>
> - xapic_id = kvm_xapic_id(apic);
> - x2apic_id = kvm_x2apic_id(apic);
> -
> - /*
> - * Deliberately truncate the vCPU ID when detecting a mismatched
> - * APIC ID to avoid false positives if the vCPU ID, i.e. x2APIC
> - * ID, is a 32-bit value. Any unwanted aliasing due to
> - * truncation results will be detected below.
> - */
> - if (!apic_x2apic_mode(apic) && xapic_id != (u8)vcpu->vcpu_id)
> - xapic_id_mismatch = true;
> -
> - /*
> - * Apply KVM's hotplug hack if userspace has enable 32-bit APIC
> - * IDs. Allow sending events to vCPUs by their x2APIC ID even
> - * if the target vCPU is in legacy xAPIC mode, and silently
> - * ignore aliased xAPIC IDs (the x2APIC ID is truncated to 8
> - * bits, causing IDs > 0xff to wrap and collide).
> - *
> - * Honor the architectural (and KVM's non-optimized) behavior
> - * if userspace has not enabled 32-bit x2APIC IDs. Each APIC
> - * is supposed to process messages independently. If multiple
> - * vCPUs have the same effective APIC ID, e.g. due to the
> - * x2APIC wrap or because the guest manually modified its xAPIC
> - * IDs, events targeting that ID are supposed to be recognized
> - * by all vCPUs with said ID.
> - */
> - if (kvm->arch.x2apic_format) {
> - /* See also kvm_apic_match_physical_addr(). */
> - if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
> - x2apic_id <= new->max_apic_id)
> - new->phys_map[x2apic_id] = apic;
> -
> - if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
> - new->phys_map[xapic_id] = apic;
> - } else {
> - /*
> - * Disable the optimized map if the physical APIC ID is
> - * already mapped, i.e. is aliased to multiple vCPUs.
> - * The optimized map requires a strict 1:1 mapping
> - * between IDs and vCPUs.
> - */
> - if (apic_x2apic_mode(apic))
> - physical_id = x2apic_id;
> - else
> - physical_id = xapic_id;
> -
> - if (new->phys_map[physical_id]) {
> - kvfree(new);
> - new = NULL;
> - goto out;
> - }
> - new->phys_map[physical_id] = apic;
> - }
> -
> - if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
> - !kvm_apic_sw_enabled(apic))
> - continue;
> -
> - ldr = kvm_lapic_get_reg(apic, APIC_LDR);
> - if (!ldr)
> - continue;
> -
> - if (apic_x2apic_mode(apic)) {
> - logical_mode = KVM_APIC_MODE_X2APIC;
> - } else {
> - ldr = GET_APIC_LOGICAL_ID(ldr);
> - if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
> - logical_mode = KVM_APIC_MODE_XAPIC_FLAT;
> - else
> - logical_mode = KVM_APIC_MODE_XAPIC_CLUSTER;
> + if (kvm_recalculate_phys_map(new, vcpu, &xapic_id_mismatch)) {
> + kvfree(new);
> + new = NULL;
> + goto out;
> }
>
> - /*
> - * To optimize logical mode delivery, all software-enabled APICs must
> - * be configured for the same mode.
> - */
> - if (new->logical_mode == KVM_APIC_MODE_SW_DISABLED) {
> - new->logical_mode = logical_mode;
> - } else if (new->logical_mode != logical_mode) {
> - new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> - continue;
> - }
> -
> - /*
> - * In x2APIC mode, the LDR is read-only and derived directly
> - * from the x2APIC ID, thus is guaranteed to be addressable.
> - * KVM reuses kvm_apic_map.phys_map to optimize logical mode
> - * x2APIC interrupts by reversing the LDR calculation to get
> - * cluster of APICs, i.e. no additional work is required.
> - */
> - if (apic_x2apic_mode(apic)) {
> - WARN_ON_ONCE(ldr != kvm_apic_calc_x2apic_ldr(x2apic_id));
> - continue;
> - }
> -
> - if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
> - &cluster, &mask))) {
> - new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> - continue;
> - }
> -
> - if (!mask)
> - continue;
> -
> - ldr = ffs(mask) - 1;
> - if (!is_power_of_2(mask) || cluster[ldr]) {
> - new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> - continue;
> - }
> - cluster[ldr] = apic;
> + kvm_recalculate_logical_map(new, vcpu);
> }
> out:
> /*

Looks good, much cleaner now.

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky


2023-02-15 20:25:37

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: [PATCH v5 00/33] KVM: x86: AVIC and local APIC fixes+cleanups



On 1/6/2023 8:12 AM, Sean Christopherson wrote:
> Paolo,
>
> This is a wholesale (and hopefully a drop-in) replacement for the patches
> in `kvm-lapic-fix-and-cleanup`. The changes for " KVM: x86: Inhibit APIC
> memslot if x2APIC and AVIC are enabled" in v5 are relatively minor, but it
> led to multiple conflicts in later patches, i.e. a patch-to-be-squashed
> wasn't going to work. Let me know if you've already done a lot of
> massaging on your side, shouldn't be too difficult to generate patches to
> go on top.
>
> The first half or so patches fix semi-urgent, real-world relevant APICv
> and AVIC bugs.
>
> The second half fixes a variety of AVIC and optimized APIC map bugs
> where KVM doesn't play nice with various edge cases that are
> architecturally legal(ish), but are unlikely to occur in most real world
> scenarios
>
> v5:
> - Fix even more bugs! [Greg]
> - Collect reviews. [Maxim]
> - Don't use an inhibit flag for hybrid AVIC. [Maxim]
> - Fix an LDR cluster calc goof in the AVIC code. [Maxim]
> - Drop a redundant "ldr == 0" check. [Maxim]
> - Add helpers for logical vs. physical optimized map calcs. [Maxim]
>
> v4:
> - https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2F20221001005915.2041642-1-seanjc%40google.com&data=05%7C01%7Csuravee.suthikulpanit%40amd.com%7C266a9838dbb74b5ac6fb08daef832f65%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638085643969977723%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=%2BUOZ4e1Ov%2BwPkKTSe81P1tPUtxpEr2uddkhMn73kLhQ%3D&reserved=0
> - Fix more bugs! [Alejandro]
> - Delete APIC memslot to inhibit xAVIC acceleration when x2APIC is
> enabled on AMD/SVM instead of using a "partial" inihbit. [Maxim]
>
> v3:
> - https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2F20220920233134.940511-1-seanjc%40google.com&data=05%7C01%7Csuravee.suthikulpanit%40amd.com%7C266a9838dbb74b5ac6fb08daef832f65%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638085643969977723%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=usZ0dheeVyhBJxLP2ahA1hUlN64gwAPooWR9jnQcRyQ%3D&reserved=0
> - Collect reviews. [Paolo]
> - Drop "partial" x2APIC inhibit and instead delete the memslot.
> [Maxim, Suravee]
> - Skip logical mode updates for x2APIC, which just reuses the
> phys_map with some clever logic. [Suravee]
> - Add a fix for "nodecode write" traps. [Alejandro]
>
> v2:
> - https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2F20220903002254.2411750-1-seanjc%40google.com&data=05%7C01%7Csuravee.suthikulpanit%40amd.com%7C266a9838dbb74b5ac6fb08daef832f65%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638085643969977723%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=olaieXGtEfg8K6wwNHjHRa0aRit0npXnoxTmnfR%2FhqQ%3D&reserved=0
> - Collect reviews. [Li, Maxim]
> - Disable only MMIO access when x2APIC is enabled (instead of disabling
> all of AVIC). [Maxim]
> - Inhibit AVIC when logical IDs are aliased. [Maxim]
> - Tweak name of set_virtual_apic_mode() hook. [Maxim]
> - Straight up revert logical ID fastpath mess. [Maxim]
> - Reword changelog about skipping vCPU during logical setup. [Maxim]
> - Fix LDR updates on AVIC. [Maxim?]
> - Fix a nasty ISR caching bug.
> - Flush TLB when activating AVIC.
>
> v1: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2F20220831003506.4117148-1-seanjc%40google.com&data=05%7C01%7Csuravee.suthikulpanit%40amd.com%7C266a9838dbb74b5ac6fb08daef832f65%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638085643969977723%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=d9my%2BQJgnq3673JLC1YFgVSNz1zlSDi7x537abArwS0%3D&reserved=0
>
> Greg Edwards (1):
> KVM: x86: Allow APICv APIC ID inhibit to be cleared
>
> Sean Christopherson (31):
> KVM: x86: Blindly get current x2APIC reg value on "nodecode write"
> traps
> KVM: x86: Purge "highest ISR" cache when updating APICv state
> KVM: SVM: Flush the "current" TLB when activating AVIC
> KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid
> target
> KVM: x86: Don't inhibit APICv/AVIC on xAPIC ID "change" if APIC is
> disabled
> KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to
> 32-bit ID
> KVM: SVM: Don't put/load AVIC when setting virtual APIC mode
> KVM: x86: Handle APICv updates for APIC "mode" changes via request
> KVM: x86: Move APIC access page helper to common x86 code
> KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
> KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" boolean
> KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kick
> Revert "KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when
> possible"
> KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatch
> KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU
> KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0
> KVM: x86: Explicitly track all possibilities for APIC map's logical
> modes
> KVM: x86: Skip redundant x2APIC logical mode optimized cluster setup
> KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs
> KVM: x86: Disable APIC logical map if vCPUs are aliased in logical
> mode
> KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs
> KVM: x86: Inhibit APICv/AVIC if the optimized physical map is disabled
> KVM: SVM: Inhibit AVIC if vCPUs are aliased in logical mode
> KVM: SVM: Always update local APIC on writes to logical dest register
> KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad"
> KVM: SVM: Require logical ID to be power-of-2 for AVIC entry
> KVM: SVM: Handle multiple logical targets in AVIC kick fastpath
> KVM: SVM: Ignore writes to Remote Read Data on AVIC write traps
> Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on
> a running vcpu"
> KVM: x86: Track required APICv inhibits with variable, not callback
> KVM: x86: Add helpers to recalc physical vs. logical optimized APIC
> maps
>
> Suravee Suthikulpanit (1):
> KVM: SVM: Fix x2APIC Logical ID calculation for
> avic_kick_target_vcpus_fast
>
> Documentation/virt/kvm/x86/errata.rst | 11 +
> arch/x86/include/asm/kvm-x86-ops.h | 1 -
> arch/x86/include/asm/kvm_host.h | 52 +++-
> arch/x86/kvm/lapic.c | 319 +++++++++++++++++-----
> arch/x86/kvm/lapic.h | 2 +
> arch/x86/kvm/svm/avic.c | 372 ++++++++++++--------------
> arch/x86/kvm/svm/nested.c | 2 +-
> arch/x86/kvm/svm/svm.c | 8 +-
> arch/x86/kvm/svm/svm.h | 27 +-
> arch/x86/kvm/vmx/vmx.c | 58 +---
> arch/x86/kvm/x86.c | 29 +-
> 11 files changed, 524 insertions(+), 357 deletions(-)
>
>
> base-commit: c04ec04c0d15a51aa33660be175ed978beb8de0c

For AMD AVIC / x2AVIC:

Tested-by: Suravee Suthikulpanit <[email protected]>
Tested-by: Kishon VijayAbraham <[email protected]>

Thanks,
Suravee