2022-08-31 00:40:58

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 00/19] KVM: x86: AVIC and local APIC fixes+cleanups

This started as a simple cleanup, and then I made the mistake of writing
a test to verify my changes to AVIC's handling of logical mode interrupts.

TL;DR: KVM's AVIC and optimized APIC map code doesn't correctly handle
various edge cases that are 100% architecturally legal, but are unlikely
to occur in most real world scenarios.

There are a variety of other fixes, but most of them are non-fatal.

I have tested this heavily with KUT, but I haven't booted Windows and
don't have access to x2AVIC, so additional testing would be much
appreciated.

I'll post my KVM-Unit-Tests later this week (need to write quite a few
changelogs). The gist of the tests is to target multiple and non-existent
vCPUs in logical mode, and to target multiple vCPUs in physical mode by
aliasing vCPU0 and vCPU1 to the same physical ID.

Sean Christopherson (19):
KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid
target
KVM: SVM: Don't put/load AVIC when setting virtual APIC mode
Revert "KVM: SVM: Introduce hybrid-AVIC mode"
KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" boolean
KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kick
KVM: SVM: Get x2APIC logical dest bitmap from ICRH[15:0], not
ICHR[31:16]
KVM: SVM: Drop buggy and redundant AVIC "single logical dest" check
KVM: SVM: Remove redundant cluster calculation that also creates a
shadow
KVM: SVM: Drop duplicate calcuation of AVIC/x2AVIC "logical index"
KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatch
KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU
KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs
KVM: x86: Disable APIC logical map if vCPUs are aliased in logical
mode
KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs
KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0
KVM: x86: Explicitly track all possibilities for APIC map's logical
modes
KVM: SVM: Handle multiple logical targets in AVIC kick fastpath
KVM: SVM: Ignore writes to Remote Read Data on AVIC write traps
Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on
a running vcpu"

Documentation/virt/kvm/x86/errata.rst | 11 ++
arch/x86/include/asm/kvm_host.h | 27 ++-
arch/x86/kvm/lapic.c | 100 ++++++++--
arch/x86/kvm/svm/avic.c | 273 ++++++++++++++------------
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/svm/svm.h | 9 +-
6 files changed, 260 insertions(+), 162 deletions(-)


base-commit: 372d07084593dc7a399bf9bee815711b1fb1bcf2
--
2.37.2.672.g94769d06f0-goog


2022-08-31 00:41:30

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 16/19] KVM: x86: Explicitly track all possibilities for APIC map's logical modes

Track all possibilities for the optimized APIC map's logical modes
instead of overloading the pseudo-bitmap and treating any "unknown" value
as "invalid".

As documented by the now-stale comment above the mode values, the values
did have meaning when the optimized map was originally added. That
dependent logical was removed by commit e45115b62f9a ("KVM: x86: use
physical LAPIC array for logical x2APIC"), but the obfuscated behavior
and its comment were left behind.

Opportunistically rename "mode" to "logical_mode", partly to make it
clear that the "disabled" case applies only to the logical map, but also
to prove that there is no lurking code that expects "mode" to be a bitmap.

Functionally, this is a glorified nop.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 21 ++++++++++--------
arch/x86/kvm/lapic.c | 38 ++++++++++++++++++++++++---------
2 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1f51411f3112..0184e64ab555 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -955,19 +955,22 @@ struct kvm_arch_memory_slot {
};

/*
- * We use as the mode the number of bits allocated in the LDR for the
- * logical processor ID. It happens that these are all powers of two.
- * This makes it is very easy to detect cases where the APICs are
- * configured for multiple modes; in that case, we cannot use the map and
- * hence cannot use kvm_irq_delivery_to_apic_fast either.
+ * Track the mode of the optimized logical map, as the rules for decoding the
+ * destination vary per mode. Enabling the optimized logical map requires all
+ * software-enabled local APIs to be in the same mode, each addressable APIC to
+ * be mapped to only one MDA, and each MDA to map to at most one APIC.
*/
-#define KVM_APIC_MODE_XAPIC_CLUSTER 4
-#define KVM_APIC_MODE_XAPIC_FLAT 8
-#define KVM_APIC_MODE_X2APIC 16
+enum kvm_apic_logical_mode {
+ KVM_APIC_MODE_SW_DISABLED,
+ KVM_APIC_MODE_XAPIC_CLUSTER,
+ KVM_APIC_MODE_XAPIC_FLAT,
+ KVM_APIC_MODE_X2APIC,
+ KVM_APIC_MODE_MAP_DISABLED,
+};

struct kvm_apic_map {
struct rcu_head rcu;
- u8 mode;
+ enum kvm_apic_logical_mode logical_mode;
u32 max_apic_id;
union {
struct kvm_lapic *xapic_flat_map[8];
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 8209caffe3ab..3b6ef36b3963 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -168,7 +168,12 @@ static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)

static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
- switch (map->mode) {
+ switch (map->logical_mode) {
+ case KVM_APIC_MODE_SW_DISABLED:
+ /* Arbitrarily use the flat map so that @cluster isn't NULL. */
+ *cluster = map->xapic_flat_map;
+ *mask = 0;
+ return true;
case KVM_APIC_MODE_X2APIC: {
u32 offset = (dest_id >> 16) * 16;
u32 max_apic_id = map->max_apic_id;
@@ -193,8 +198,10 @@ static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
*cluster = map->xapic_cluster_map[(dest_id >> 4) & 0xf];
*mask = dest_id & 0xf;
return true;
+ case KVM_APIC_MODE_MAP_DISABLED:
+ return false;
default:
- /* Not optimized. */
+ WARN_ON_ONCE(1);
return false;
}
}
@@ -256,10 +263,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
goto out;

new->max_apic_id = max_id;
+ new->logical_mode = KVM_APIC_MODE_SW_DISABLED;

kvm_for_each_vcpu(i, vcpu, kvm) {
struct kvm_lapic *apic = vcpu->arch.apic;
struct kvm_lapic **cluster;
+ enum kvm_apic_logical_mode logical_mode;
u32 x2apic_id, physical_id;
u16 mask;
u32 ldr;
@@ -314,7 +323,8 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
new->phys_map[physical_id] = apic;
}

- if (!kvm_apic_sw_enabled(apic))
+ if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
+ !kvm_apic_sw_enabled(apic))
continue;

ldr = kvm_lapic_get_reg(apic, APIC_LDR);
@@ -322,25 +332,33 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
continue;

if (apic_x2apic_mode(apic)) {
- new->mode |= KVM_APIC_MODE_X2APIC;
+ logical_mode = KVM_APIC_MODE_X2APIC;
} else {
ldr = GET_APIC_LOGICAL_ID(ldr);
if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
- new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
+ logical_mode = KVM_APIC_MODE_XAPIC_FLAT;
else
- new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
+ logical_mode = KVM_APIC_MODE_XAPIC_CLUSTER;
}
+ if (new->logical_mode != KVM_APIC_MODE_SW_DISABLED &&
+ new->logical_mode != logical_mode) {
+ new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
+ continue;
+ }
+ new->logical_mode = logical_mode;

- if (!kvm_apic_map_get_logical_dest(new, ldr, &cluster, &mask))
+ if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
+ &cluster, &mask))) {
+ new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
continue;
+ }

if (!mask)
continue;

ldr = ffs(mask) - 1;
if (!is_power_of_2(mask) || cluster[ldr]) {
- new->mode = KVM_APIC_MODE_XAPIC_FLAT |
- KVM_APIC_MODE_XAPIC_CLUSTER;
+ new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
continue;
}
cluster[ldr] = apic;
@@ -993,7 +1011,7 @@ static bool kvm_apic_is_broadcast_dest(struct kvm *kvm, struct kvm_lapic **src,
{
if (kvm->arch.x2apic_broadcast_quirk_disabled) {
if ((irq->dest_id == APIC_BROADCAST &&
- map->mode != KVM_APIC_MODE_X2APIC))
+ map->logical_mode != KVM_APIC_MODE_X2APIC))
return true;
if (irq->dest_id == X2APIC_BROADCAST)
return true;
--
2.37.2.672.g94769d06f0-goog

2022-08-31 00:43:07

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 18/19] KVM: SVM: Ignore writes to Remote Read Data on AVIC write traps

Drop writes to APIC_RRR, a.k.a. Remote Read Data Register, on AVIC
unaccelerated write traps. The register is read-only and isn't emulated
by KVM. Sending the register through kvm_apic_write_nodecode() will
result in screaming when x2APIC is enabled due to the unexpected failure
to retrieve the MSR (KVM expects that only "legal" accesses will trap).

Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index dad5affe44c1..b2033a56010c 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -675,6 +675,9 @@ static int avic_unaccel_trap_write(struct kvm_vcpu *vcpu)
case APIC_DFR:
avic_handle_dfr_update(vcpu);
break;
+ case APIC_RRR:
+ /* Ignore writes to Read Remote Data, it's read-only. */
+ return 1;
default:
break;
}
--
2.37.2.672.g94769d06f0-goog

2022-08-31 00:50:10

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 02/19] KVM: SVM: Don't put/load AVIC when setting virtual APIC mode

Move the VMCB updates from avic_refresh_apicv_exec_ctrl() into
avic_set_virtual_apic_mode() and invert the dependency being said
functions to avoid calling avic_vcpu_{load,put}() and
avic_set_pi_irte_mode() when "only" setting the virtual APIC mode.

avic_set_virtual_apic_mode() is invoked from common x86 with preemption
enabled, which makes avic_vcpu_{load,put}() unhappy. Luckily, calling
those and updating IRTE stuff is unnecessary as the only reason
avic_set_virtual_apic_mode() is called is to handle transitions between
xAPIC and x2APIC that don't also toggle APICv activation. And if
activation doesn't change, there's no need to fiddle with the physical
APIC ID table or update IRTE.

The "full" refresh is guaranteed to be called if activation changes in
this case as the only call to the "set" path is:

kvm_vcpu_update_apicv(vcpu);
static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);

and kvm_vcpu_update_apicv() invokes the refresh if activation changes:

if (apic->apicv_active == activate)
goto out;

apic->apicv_active = activate;
kvm_apic_update_apicv(vcpu);
static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);

WARNING: CPU: 183 PID: 49186 at arch/x86/kvm/svm/avic.c:1081 avic_vcpu_put+0xde/0xf0 [kvm_amd]
CPU: 183 PID: 49186 Comm: stable Tainted: G O 6.0.0-smp--fcddbca45f0a-sink #34
Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 10.48.0 01/27/2022
RIP: 0010:avic_vcpu_put+0xde/0xf0 [kvm_amd]
avic_refresh_apicv_exec_ctrl+0x142/0x1c0 [kvm_amd]
avic_set_virtual_apic_mode+0x5a/0x70 [kvm_amd]
kvm_lapic_set_base+0x149/0x1a0 [kvm]
kvm_set_apic_base+0x8f/0xd0 [kvm]
kvm_set_msr_common+0xa3a/0xdc0 [kvm]
svm_set_msr+0x364/0x6b0 [kvm_amd]
__kvm_set_msr+0xb8/0x1c0 [kvm]
kvm_emulate_wrmsr+0x58/0x1d0 [kvm]
msr_interception+0x1c/0x30 [kvm_amd]
svm_invoke_exit_handler+0x31/0x100 [kvm_amd]
svm_handle_exit+0xfc/0x160 [kvm_amd]
vcpu_enter_guest+0x21bb/0x23e0 [kvm]
vcpu_run+0x92/0x450 [kvm]
kvm_arch_vcpu_ioctl_run+0x43e/0x6e0 [kvm]
kvm_vcpu_ioctl+0x559/0x620 [kvm]

Fixes: 05c4fe8c1bd9 ("KVM: SVM: Refresh AVIC configuration when changing APIC mode")
Cc: [email protected]
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 31 +++++++++++++++----------------
1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index b1ade555e8d0..f3a74c8284cb 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -741,18 +741,6 @@ void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
avic_handle_ldr_update(vcpu);
}

-void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
-{
- if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
- return;
-
- if (kvm_get_apic_mode(vcpu) == LAPIC_MODE_INVALID) {
- WARN_ONCE(true, "Invalid local APIC state (vcpu_id=%d)", vcpu->vcpu_id);
- return;
- }
- avic_refresh_apicv_exec_ctrl(vcpu);
-}
-
static int avic_set_pi_irte_mode(struct kvm_vcpu *vcpu, bool activate)
{
int ret = 0;
@@ -1094,17 +1082,18 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
}

-
-void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
+void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb01.ptr;
- bool activated = kvm_vcpu_apicv_active(vcpu);
+
+ if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
+ return;

if (!enable_apicv)
return;

- if (activated) {
+ if (kvm_vcpu_apicv_active(vcpu)) {
/**
* During AVIC temporary deactivation, guest could update
* APIC ID, DFR and LDR registers, which would not be trapped
@@ -1118,6 +1107,16 @@ void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
avic_deactivate_vmcb(svm);
}
vmcb_mark_dirty(vmcb, VMCB_AVIC);
+}
+
+void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
+{
+ bool activated = kvm_vcpu_apicv_active(vcpu);
+
+ if (!enable_apicv)
+ return;
+
+ avic_set_virtual_apic_mode(vcpu);

if (activated)
avic_vcpu_load(vcpu, vcpu->cpu);
--
2.37.2.672.g94769d06f0-goog

2022-08-31 00:51:38

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 11/19] KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU

Add a helper to perform the final kick, two instances of the ICR decoding
is one too many.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 3959d4766911..2095ece70712 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -329,6 +329,16 @@ void avic_ring_doorbell(struct kvm_vcpu *vcpu)
put_cpu();
}

+
+static void avic_kick_vcpu(struct kvm_vcpu *vcpu, u32 icrl)
+{
+ vcpu->arch.apic->irr_pending = true;
+ svm_complete_interrupt_delivery(vcpu,
+ icrl & APIC_MODE_MASK,
+ icrl & APIC_INT_LEVELTRIG,
+ icrl & APIC_VECTOR_MASK);
+}
+
/*
* A fast-path version of avic_kick_target_vcpus(), which attempts to match
* destination APIC ID to vCPU without looping through all vCPUs.
@@ -427,11 +437,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
if (unlikely(!target_vcpu))
return 0;

- target_vcpu->arch.apic->irr_pending = true;
- svm_complete_interrupt_delivery(target_vcpu,
- icrl & APIC_MODE_MASK,
- icrl & APIC_INT_LEVELTRIG,
- icrl & APIC_VECTOR_MASK);
+ avic_kick_vcpu(target_vcpu, icrl);
return 0;
}

@@ -455,13 +461,8 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
*/
kvm_for_each_vcpu(i, vcpu, kvm) {
if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
- dest, icrl & APIC_DEST_MASK)) {
- vcpu->arch.apic->irr_pending = true;
- svm_complete_interrupt_delivery(vcpu,
- icrl & APIC_MODE_MASK,
- icrl & APIC_INT_LEVELTRIG,
- icrl & APIC_VECTOR_MASK);
- }
+ dest, icrl & APIC_DEST_MASK))
+ avic_kick_vcpu(vcpu, icrl);
}
}

--
2.37.2.672.g94769d06f0-goog

2022-08-31 00:53:20

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 14/19] KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs

Apply KVM's hotplug hack if and only if userspace has enabled 32-bit IDs
for x2APIC. If 32-bit IDs are not enabled, disable the optimized map to
honor x86 architectural behavior if multiple vCPUs shared a physical APIC
ID. As called out in the changelog that added the hack, all CPUs whose
(possibly truncated) APIC ID matches the target are supposed to receive
the IPI.

KVM intentionally differs from real hardware, because real hardware
(Knights Landing) does just "x2apic_id & 0xff" to decide whether to
accept the interrupt in xAPIC mode and it can deliver one interrupt to
more than one physical destination, e.g. 0x123 to 0x123 and 0x23.

Applying the hack even when x2APIC is not fully enabled means KVM doesn't
correctly handle scenarios where the guest has aliased xAPIC IDs across
multiple vCPUs, as only the vCPU with the lowest vCPU ID will receive any
interrupts. It's extremely unlikely any real world guest aliase APIC IDs,
or even modifies APIC IDs, but KVM's behavior is arbitrary, e.g. the
lowest vCPU ID "wins" regardless of which vCPU is "aliasing" and which
vCPU is "normal".

Furthermore, the hack is _not_ guaranteed to work! The hack works if and
only if the optimized APIC map is successfully allocated. If the map
allocation fails (unlikely), KVM will fall back to its unoptimized
behavior, which _does_ honor the architectural behavior.

Pivot on 32-bit x2APIC IDs being enabled as that is required to take
advantage of the hotplug hack (see kvm_apic_state_fixup()), i.e. won't
break existing setups unless they are way, way off in the weeds.

And an entry in KVM's errata to document the hack. Alternatively, KVM
could provide an actual x2APIC quirk and document the hack that way, but
there's unlikely to ever be a use case for disabling the quirk. Go the
errata route to avoid having to validate a quirk no one cares about.

Fixes: 5bd5db385b3e ("KVM: x86: allow hotplug of VCPU with APIC ID over 0xff")
Signed-off-by: Sean Christopherson <[email protected]>
---
Documentation/virt/kvm/x86/errata.rst | 11 ++++++
arch/x86/kvm/lapic.c | 50 ++++++++++++++++++++++-----
2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/x86/errata.rst b/Documentation/virt/kvm/x86/errata.rst
index 410e0aa63493..49a05f24747b 100644
--- a/Documentation/virt/kvm/x86/errata.rst
+++ b/Documentation/virt/kvm/x86/errata.rst
@@ -37,3 +37,14 @@ Nested virtualization features
------------------------------

TBD
+
+x2APIC
+------
+When KVM_X2APIC_API_USE_32BIT_IDS is enabled, KVM activates a hack/quirk that
+allows sending events to a single vCPU using its x2APIC ID even if the target
+vCPU has legacy xAPIC enabled, e.g. to bring up hotplugged vCPUs via INIT-SIPI
+on VMs with > 255 vCPUs. A side effect of the quirk is that, if multiple vCPUs
+have the same physical APIC ID, KVM will deliver events targeting that APIC ID
+only to the vCPU with the lowest vCPU ID. If KVM_X2APIC_API_USE_32BIT_IDS is
+not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs
+matching the target APIC ID receive the interrupt).
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index d537b51295d6..c224b5c7cd92 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -260,10 +260,10 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
kvm_for_each_vcpu(i, vcpu, kvm) {
struct kvm_lapic *apic = vcpu->arch.apic;
struct kvm_lapic **cluster;
+ u32 x2apic_id, physical_id;
u16 mask;
u32 ldr;
u8 xapic_id;
- u32 x2apic_id;

if (!kvm_apic_present(vcpu))
continue;
@@ -271,16 +271,48 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
xapic_id = kvm_xapic_id(apic);
x2apic_id = kvm_x2apic_id(apic);

- /* Hotplug hack: see kvm_apic_match_physical_addr(), ... */
- if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
- x2apic_id <= new->max_apic_id)
- new->phys_map[x2apic_id] = apic;
/*
- * ... xAPIC ID of VCPUs with APIC ID > 0xff will wrap-around,
- * prevent them from masking VCPUs with APIC ID <= 0xff.
+ * Apply KVM's hotplug hack if userspace has enable 32-bit APIC
+ * IDs. Allow sending events to vCPUs by their x2APIC ID even
+ * if the target vCPU is in legacy xAPIC mode, and silently
+ * ignore aliased xAPIC IDs (the x2APIC ID is truncated to 8
+ * bits, causing IDs > 0xff to wrap and collide).
+ *
+ * Honor the architectural (and KVM's non-optimized) behavior
+ * if userspace has not enabled 32-bit x2APIC IDs. Each APIC
+ * is supposed to process messages independently. If multiple
+ * vCPUs have the same effective APIC ID, e.g. due to the
+ * x2APIC wrap or because the guest manually modified its xAPIC
+ * IDs, events targeting that ID are supposed to be recognized
+ * by all vCPUs with said ID.
*/
- if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
- new->phys_map[xapic_id] = apic;
+ if (kvm->arch.x2apic_format) {
+ /* See also kvm_apic_match_physical_addr(). */
+ if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
+ x2apic_id <= new->max_apic_id)
+ new->phys_map[x2apic_id] = apic;
+
+ if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
+ new->phys_map[xapic_id] = apic;
+ } else {
+ /*
+ * Disable the optimized map if the physical APIC ID is
+ * already mapped, i.e. is aliased to multiple vCPUs.
+ * The optimized map requires a strict 1:1 mapping
+ * between IDs and vCPUs.
+ */
+ if (apic_x2apic_mode(apic))
+ physical_id = x2apic_id;
+ else
+ physical_id = xapic_id;
+
+ if (new->phys_map[physical_id]) {
+ kvfree(new);
+ new = NULL;
+ goto out;
+ }
+ new->phys_map[physical_id] = apic;
+ }

if (!kvm_apic_sw_enabled(apic))
continue;
--
2.37.2.672.g94769d06f0-goog

2022-08-31 00:53:36

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 06/19] KVM: SVM: Get x2APIC logical dest bitmap from ICRH[15:0], not ICHR[31:16]

When attempting a fast kick for x2AVIC, get the destination bitmap from
ICR[15:0], not ICHR[31:16]. The upper 16 bits contain the cluster, the
lower 16 bits hold the bitmap.

Fixes: 603ccef42ce9 ("KVM: x86: SVM: fix avic_kick_target_vcpus_fast")
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 3ace0f2f52f0..3c333cd2e752 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -368,7 +368,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source

if (apic_x2apic_mode(source)) {
/* 16 bit dest mask, 16 bit cluster id */
- bitmap = dest & 0xFFFF0000;
+ bitmap = dest & 0xFFFF;
cluster = (dest >> 16) << 4;
} else if (kvm_lapic_get_reg(source, APIC_DFR) == APIC_DFR_FLAT) {
/* 8 bit dest mask*/
--
2.37.2.672.g94769d06f0-goog

2022-08-31 00:59:35

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 04/19] KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" boolean

Replace the "avic_mode" enum with a single bool to track whether or not
x2AVIC is enabled. KVM already has "apicv_enabled" that tracks if any
flavor of AVIC is enabled, i.e. AVIC_MODE_NONE and AVIC_MODE_X1 are
redundant and unnecessary noise.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 45 +++++++++++++++++++----------------------
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/svm/svm.h | 9 +--------
3 files changed, 23 insertions(+), 33 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 1d516d658f9a..b59f8ee2671f 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -53,7 +53,7 @@ static DEFINE_HASHTABLE(svm_vm_data_hash, SVM_VM_DATA_HASH_BITS);
static u32 next_vm_id = 0;
static bool next_vm_id_wrapped = 0;
static DEFINE_SPINLOCK(svm_vm_data_hash_lock);
-enum avic_modes avic_mode;
+bool x2avic_enabled;

/*
* This is a wrapper of struct amd_iommu_ir_data.
@@ -231,8 +231,8 @@ static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu,
u64 *avic_physical_id_table;
struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);

- if ((avic_mode == AVIC_MODE_X1 && index > AVIC_MAX_PHYSICAL_ID) ||
- (avic_mode == AVIC_MODE_X2 && index > X2AVIC_MAX_PHYSICAL_ID))
+ if ((!x2avic_enabled && index > AVIC_MAX_PHYSICAL_ID) ||
+ (index > X2AVIC_MAX_PHYSICAL_ID))
return NULL;

avic_physical_id_table = page_address(kvm_svm->avic_physical_id_table_page);
@@ -279,8 +279,8 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
int id = vcpu->vcpu_id;
struct vcpu_svm *svm = to_svm(vcpu);

- if ((avic_mode == AVIC_MODE_X1 && id > AVIC_MAX_PHYSICAL_ID) ||
- (avic_mode == AVIC_MODE_X2 && id > X2AVIC_MAX_PHYSICAL_ID))
+ if ((!x2avic_enabled && id > AVIC_MAX_PHYSICAL_ID) ||
+ (id > X2AVIC_MAX_PHYSICAL_ID))
return -EINVAL;

if (!vcpu->arch.apic->regs)
@@ -532,7 +532,7 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu)
* AVIC must be disabled if x2AVIC isn't supported and the guest has
* x2APIC enabled.
*/
- if (avic_mode != AVIC_MODE_X2 && apic_x2apic_mode(vcpu->arch.apic))
+ if (!x2avic_enabled && apic_x2apic_mode(vcpu->arch.apic))
return APICV_INHIBIT_REASON_X2APIC;

return 0;
@@ -1086,10 +1086,7 @@ void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb01.ptr;

- if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
- return;
-
- if (!enable_apicv)
+ if (!lapic_in_kernel(vcpu) || !enable_apicv)
return;

if (kvm_vcpu_apicv_active(vcpu)) {
@@ -1165,32 +1162,32 @@ bool avic_hardware_setup(struct kvm_x86_ops *x86_ops)
if (!npt_enabled)
return false;

+ /* AVIC is a prerequisite for x2AVIC. */
+ if (!boot_cpu_has(X86_FEATURE_AVIC) && !force_avic) {
+ if (boot_cpu_has(X86_FEATURE_X2AVIC)) {
+ pr_warn(FW_BUG "Cannot support x2AVIC due to AVIC is disabled");
+ pr_warn(FW_BUG "Try enable AVIC using force_avic option");
+ }
+ return false;
+ }
+
if (boot_cpu_has(X86_FEATURE_AVIC)) {
- avic_mode = AVIC_MODE_X1;
pr_info("AVIC enabled\n");
} else if (force_avic) {
/*
* Some older systems does not advertise AVIC support.
* See Revision Guide for specific AMD processor for more detail.
*/
- avic_mode = AVIC_MODE_X1;
pr_warn("AVIC is not supported in CPUID but force enabled");
pr_warn("Your system might crash and burn");
}

/* AVIC is a prerequisite for x2AVIC. */
- if (boot_cpu_has(X86_FEATURE_X2AVIC)) {
- if (avic_mode == AVIC_MODE_X1) {
- avic_mode = AVIC_MODE_X2;
- pr_info("x2AVIC enabled\n");
- } else {
- pr_warn(FW_BUG "Cannot support x2AVIC due to AVIC is disabled");
- pr_warn(FW_BUG "Try enable AVIC using force_avic option");
- }
- }
+ x2avic_enabled = boot_cpu_has(X86_FEATURE_X2AVIC);
+ if (x2avic_enabled)
+ pr_info("x2AVIC enabled\n");

- if (avic_mode != AVIC_MODE_NONE)
- amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
+ amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);

- return !!avic_mode;
+ return true;
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f3813dbacb9f..b25c89069128 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -821,7 +821,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
if (intercept == svm->x2avic_msrs_intercepted)
return;

- if (avic_mode != AVIC_MODE_X2 ||
+ if (!x2avic_enabled ||
!apic_x2apic_mode(svm->vcpu.arch.apic))
return;

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6a7686bf6900..cee79ade400a 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -35,14 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
extern bool npt_enabled;
extern int vgif;
extern bool intercept_smi;
-
-enum avic_modes {
- AVIC_MODE_NONE = 0,
- AVIC_MODE_X1,
- AVIC_MODE_X2,
-};
-
-extern enum avic_modes avic_mode;
+extern bool x2avic_enabled;

/*
* Clean bits in VMCB.
--
2.37.2.672.g94769d06f0-goog

2022-08-31 01:00:52

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 13/19] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode

Disable the optimized APIC logical map if multiple vCPUs are aliased to
the same logical ID. Architecturally, all CPUs whose logical ID matches
the MDA are supposed to receive the interrupt; overwriting existing map
entries can result in missed IPIs.

Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/lapic.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 82278acae95b..d537b51295d6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -303,12 +303,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
if (!mask)
continue;

- if (!is_power_of_2(mask)) {
+ ldr = ffs(mask) - 1;
+ if (!is_power_of_2(mask) || cluster[ldr]) {
new->mode = KVM_APIC_MODE_XAPIC_FLAT |
KVM_APIC_MODE_XAPIC_CLUSTER;
continue;
}
- cluster[ffs(mask) - 1] = apic;
+ cluster[ldr] = apic;
}
out:
old = rcu_dereference_protected(kvm->arch.apic_map,
--
2.37.2.672.g94769d06f0-goog

2022-08-31 01:20:08

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 17/19] KVM: SVM: Handle multiple logical targets in AVIC kick fastpath

Iterate over all target logical IDs in the AVIC kick fastpath instead of
bailing if there is more than one target and KVM's optimized APIC map is
enabled for logical mode. If the optimized map is enabled, all vCPUs are
guaranteed to be mapped 1:1 to a logical ID or effectively have logical
mode disabled, i.e. iterating over the bitmap is guaranteed to kick each
target exactly once.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 126 +++++++++++++++++++++++++---------------
1 file changed, 79 insertions(+), 47 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 2095ece70712..dad5affe44c1 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -339,6 +339,62 @@ static void avic_kick_vcpu(struct kvm_vcpu *vcpu, u32 icrl)
icrl & APIC_VECTOR_MASK);
}

+static void avic_kick_vcpu_by_physical_id(struct kvm *kvm, u32 physical_id,
+ u32 icrl)
+{
+ /*
+ * KVM inhibits AVIC if any vCPU ID diverges from the vCPUs APIC ID,
+ * i.e. APIC ID == vCPU ID.
+ */
+ struct kvm_vcpu *target_vcpu = kvm_get_vcpu_by_id(kvm, physical_id);
+
+ /* Once again, nothing to do if the target vCPU doesn't exist. */
+ if (unlikely(!target_vcpu))
+ return;
+
+ avic_kick_vcpu(target_vcpu, icrl);
+}
+
+static void avic_kick_vcpu_by_logical_id(struct kvm *kvm, u32 *avic_logical_id_table,
+ u32 logid_index, u32 icrl)
+{
+ u32 physical_id;
+
+ if (!avic_logical_id_table) {
+ u32 logid_entry = avic_logical_id_table[logid_index];
+
+ /* Nothing to do if the logical destination is invalid. */
+ if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
+ return;
+
+ physical_id = logid_entry &
+ AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
+ } else {
+ /*
+ * For x2APIC, the logical APIC ID is a read-only value that is
+ * derived from the x2APIC ID, thus the x2APIC ID can be found
+ * by reversing the calculation (stored in logid_index). Note,
+ * bits 31:20 of the x2APIC ID aren't propagated to the logical
+ * ID, but KVM limits the x2APIC ID limited to KVM_MAX_VCPU_IDS.
+ */
+ physical_id = logid_index;
+ }
+
+ avic_kick_vcpu_by_physical_id(kvm, physical_id, icrl);
+}
+
+static bool is_optimized_logical_map_enabled(struct kvm *kvm)
+{
+ struct kvm_apic_map *map;
+ bool enabled;
+
+ rcu_read_lock();
+ map = rcu_dereference(kvm->arch.apic_map);
+ enabled = map && map->logical_mode != KVM_APIC_MODE_MAP_DISABLED;
+ rcu_read_unlock();
+ return enabled;
+}
+
/*
* A fast-path version of avic_kick_target_vcpus(), which attempts to match
* destination APIC ID to vCPU without looping through all vCPUs.
@@ -346,11 +402,10 @@ static void avic_kick_vcpu(struct kvm_vcpu *vcpu, u32 icrl)
static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source,
u32 icrl, u32 icrh, u32 index)
{
- u32 l1_physical_id, dest;
- struct kvm_vcpu *target_vcpu;
int dest_mode = icrl & APIC_DEST_MASK;
int shorthand = icrl & APIC_SHORT_MASK;
struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
+ u32 dest;

if (shorthand != APIC_DEST_NOSHORT)
return -EINVAL;
@@ -367,14 +422,14 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
if (!apic_x2apic_mode(source) && dest == APIC_BROADCAST)
return -EINVAL;

- l1_physical_id = dest;
-
- if (WARN_ON_ONCE(l1_physical_id != index))
+ if (WARN_ON_ONCE(dest != index))
return -EINVAL;

+ avic_kick_vcpu_by_physical_id(kvm, dest, icrl);
} else {
- u32 bitmap, cluster;
- int logid_index;
+ u32 *avic_logical_id_table;
+ unsigned long bitmap, i;
+ u32 cluster;

if (apic_x2apic_mode(source)) {
/* 16 bit dest mask, 16 bit cluster id */
@@ -394,50 +449,27 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
if (unlikely(!bitmap))
return 0;

- if (!is_power_of_2(bitmap))
- /* multiple logical destinations, use slow path */
+ /*
+ * Use the slow path if more than one bit is set in the bitmap
+ * and KVM's optimized logical map is disabled to avoid kicking
+ * a vCPU multiple times. If the optimized map is disabled, a
+ * vCPU _may_ have multiple bits set in its logical ID, i.e.
+ * may have multiple entries in the logical table.
+ */
+ if (!is_power_of_2(bitmap) &&
+ !is_optimized_logical_map_enabled(kvm))
return -EINVAL;

- logid_index = cluster + __ffs(bitmap);
-
- if (!apic_x2apic_mode(source)) {
- u32 *avic_logical_id_table =
- page_address(kvm_svm->avic_logical_id_table_page);
-
- u32 logid_entry = avic_logical_id_table[logid_index];
-
- if (WARN_ON_ONCE(index != logid_index))
- return -EINVAL;
-
- /* Nothing to do if the logical destination is invalid. */
- if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
- return 0;
-
- l1_physical_id = logid_entry &
- AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
- } else {
- /*
- * For x2APIC, the logical APIC ID is a read-only value
- * that is derived from the x2APIC ID, thus the x2APIC
- * ID can be found by reversing the calculation (done
- * above). Note, bits 31:20 of the x2APIC ID are not
- * propagated to the logical ID, but KVM limits the
- * x2APIC ID limited to KVM_MAX_VCPU_IDS.
- */
- l1_physical_id = logid_index;
- }
+ if (apic_x2apic_mode(source))
+ avic_logical_id_table = NULL;
+ else
+ avic_logical_id_table = page_address(kvm_svm->avic_logical_id_table_page);
+
+ for_each_set_bit(i, &bitmap, 16)
+ avic_kick_vcpu_by_logical_id(kvm, avic_logical_id_table,
+ cluster + i, icrl);
}

- /*
- * KVM inhibits AVIC if any vCPU ID diverges from the vCPUs APIC ID,
- * i.e. APIC ID == vCPU ID. Once again, nothing to do if the target
- * vCPU doesn't exist.
- */
- target_vcpu = kvm_get_vcpu_by_id(kvm, l1_physical_id);
- if (unlikely(!target_vcpu))
- return 0;
-
- avic_kick_vcpu(target_vcpu, icrl);
return 0;
}

--
2.37.2.672.g94769d06f0-goog

2022-08-31 01:20:55

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 08/19] KVM: SVM: Remove redundant cluster calculation that also creates a shadow

Drop the redundant "cluster" calculation and its horrific shadowing in
the x2avic logical mode path. The "cluster" that is declared at an outer
scope is derived using the exact same calculation and has performed the
left-shift.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 14f567550a1e..8c9cad96008e 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -410,10 +410,9 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
* For x2APIC logical mode, cannot leverage the index.
* Instead, calculate physical ID from logical ID in ICRH.
*/
- int cluster = (icrh & 0xffff0000) >> 16;
int apic = ffs(bitmap) - 1;

- l1_physical_id = (cluster << 4) + apic;
+ l1_physical_id = cluster + apic;
}
}

--
2.37.2.672.g94769d06f0-goog

2022-08-31 01:22:22

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 15/19] KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0

Explicitly skip the optimized map setup if the vCPU's LDR is '0', i.e. if
the vCPU will never response to logical mode interrupts. KVM already
skips setup in this case, but relies on kvm_apic_map_get_logical_dest()
to generate mask==0. KVM still needs the mask=0 check as a non-zero LDR
can yield mask==0 depending on the mode, but explicitly handling the LDR
will make it simpler to clean up the logical mode tracking in the future.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/lapic.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c224b5c7cd92..8209caffe3ab 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -318,10 +318,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
continue;

ldr = kvm_lapic_get_reg(apic, APIC_LDR);
+ if (!ldr)
+ continue;

if (apic_x2apic_mode(apic)) {
new->mode |= KVM_APIC_MODE_X2APIC;
- } else if (ldr) {
+ } else {
ldr = GET_APIC_LOGICAL_ID(ldr);
if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
--
2.37.2.672.g94769d06f0-goog

2022-08-31 01:24:07

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 07/19] KVM: SVM: Drop buggy and redundant AVIC "single logical dest" check

Use the already-calculated-and-sanity-checked destination bitmap when
processing a fast AVIC kick in logical mode, and drop the logical path's
flawed logic. The intent of the check is to ensure the bitmap is a power
of two, whereas "icrh != (1 << avic)" effectively checks that the bitmap
is a power of two _and_ the target cluster is '0'.

Note, the flawed check isn't a functional issue, it simply means that KVM
will go down the slow path if the target cluster is non-zero.

Fixes: 8c9e639da435 ("KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 3c333cd2e752..14f567550a1e 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -411,15 +411,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
* Instead, calculate physical ID from logical ID in ICRH.
*/
int cluster = (icrh & 0xffff0000) >> 16;
- int apic = ffs(icrh & 0xffff) - 1;
-
- /*
- * If the x2APIC logical ID sub-field (i.e. icrh[15:0])
- * contains anything but a single bit, we cannot use the
- * fast path, because it is limited to a single vCPU.
- */
- if (apic < 0 || icrh != (1 << apic))
- return -EINVAL;
+ int apic = ffs(bitmap) - 1;

l1_physical_id = (cluster << 4) + apic;
}
--
2.37.2.672.g94769d06f0-goog

2022-08-31 01:24:09

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 10/19] KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatch

Document that AVIC is inhibited if any vCPU's APIC ID diverges from its
vCPU ID, i.e. that there's no need to check for a destination match in
the AVIC kick fast path.

Opportunistically tweak comments to remove "guest bug", as that suggests
KVM is punting on error handling, which is not the case. Targeting a
non-existent vCPU or no vCPUs _may_ be a guest software bug, but whether
or not it's a guest bug is irrelevant. Such behavior is architecturally
legal and thus needs to faithfully emulated by KVM (and it is).

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 05a1cde8175c..3959d4766911 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -380,8 +380,8 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
cluster = (dest >> 4) << 2;
}

+ /* Nothing to do if there are no destinations in the cluster. */
if (unlikely(!bitmap))
- /* guest bug: nobody to send the logical interrupt to */
return 0;

if (!is_power_of_2(bitmap))
@@ -399,7 +399,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
if (WARN_ON_ONCE(index != logid_index))
return -EINVAL;

- /* guest bug: non existing/reserved logical destination */
+ /* Nothing to do if the logical destination is invalid. */
if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
return 0;

@@ -418,9 +418,13 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
}
}

+ /*
+ * KVM inhibits AVIC if any vCPU ID diverges from the vCPUs APIC ID,
+ * i.e. APIC ID == vCPU ID. Once again, nothing to do if the target
+ * vCPU doesn't exist.
+ */
target_vcpu = kvm_get_vcpu_by_id(kvm, l1_physical_id);
if (unlikely(!target_vcpu))
- /* guest bug: non existing vCPU is a target of this IPI*/
return 0;

target_vcpu->arch.apic->irr_pending = true;
--
2.37.2.672.g94769d06f0-goog

2022-08-31 01:25:02

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 09/19] KVM: SVM: Drop duplicate calcuation of AVIC/x2AVIC "logical index"

Drop the duplicate calculation of the logical "index", which ends up
being the same for x2APIC vs. xAPIC: cluster + bit number.

Note, the existing code is a mess and uses ffs(), which is 1-based, and
__ffs(), which is 0-based, for the exact same calculation, i.e. replacing
"ffs(bitmap) - 1" with "__ffs(bitmap)" is intentional.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 8c9cad96008e..05a1cde8175c 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -407,12 +407,14 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
} else {
/*
- * For x2APIC logical mode, cannot leverage the index.
- * Instead, calculate physical ID from logical ID in ICRH.
+ * For x2APIC, the logical APIC ID is a read-only value
+ * that is derived from the x2APIC ID, thus the x2APIC
+ * ID can be found by reversing the calculation (done
+ * above). Note, bits 31:20 of the x2APIC ID are not
+ * propagated to the logical ID, but KVM limits the
+ * x2APIC ID limited to KVM_MAX_VCPU_IDS.
*/
- int apic = ffs(bitmap) - 1;
-
- l1_physical_id = cluster + apic;
+ l1_physical_id = logid_index;
}
}

--
2.37.2.672.g94769d06f0-goog

2022-08-31 01:27:59

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 19/19] Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on a running vcpu"

Turns out that some warnings exist for good reasons. Restore the warning
in avic_vcpu_load() that guards against calling avic_vcpu_load() on a
running vCPU now that KVM avoids doing so when switching between x2APIC
and xAPIC. The entire point of the WARN is to highlight that KVM should
not be reloading an AVIC.

Opportunistically convert the WARN_ON() to WARN_ON_ONCE() to avoid
spamming the kernel if it does fire.

This reverts commit c0caeee65af3944b7b8abbf566e7cc1fae15c775.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index b2033a56010c..3c300113d40b 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -1080,6 +1080,7 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
return;

entry = READ_ONCE(*(svm->avic_physical_id_cache));
+ WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);

entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
--
2.37.2.672.g94769d06f0-goog

2022-08-31 01:31:06

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 12/19] KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs

Disable the optimized APIC logical map if a logical ID covers multiple
MDAs, i.e. if a vCPU has multiple bits set in its ID. In logical mode,
events match if "ID & MDA != 0", i.e. creating an entry for only the
first bit can cause interrupts to be missed.

Note, creating an entry for every bit is also wrong as KVM would generate
IPIs for every matching bit. It would be possible to teach KVM to play
nice with this edge case, but it is very much an edge case and probably
not used in any real world OS, i.e. it's not worth optimizing.

Use an impossible value for the "mode" to effectively designate that it's
disabled. Don't bother adding a dedicated "invalid" value, the mode
handling will be cleaned up in the future and it would take just as much
effort to explain what value is "safe" at this time.

Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/lapic.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9dda989a1cf0..82278acae95b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -300,8 +300,15 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
if (!kvm_apic_map_get_logical_dest(new, ldr, &cluster, &mask))
continue;

- if (mask)
- cluster[ffs(mask) - 1] = apic;
+ if (!mask)
+ continue;
+
+ if (!is_power_of_2(mask)) {
+ new->mode = KVM_APIC_MODE_XAPIC_FLAT |
+ KVM_APIC_MODE_XAPIC_CLUSTER;
+ continue;
+ }
+ cluster[ffs(mask) - 1] = apic;
}
out:
old = rcu_dereference_protected(kvm->arch.apic_map,
--
2.37.2.672.g94769d06f0-goog

2022-08-31 01:32:36

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 05/19] KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kick

Compute the destination from ICRH using the sender's x2APIC status, not
each (potential) target's x2APIC status.

Fixes: c514d3a348ac ("KVM: SVM: Update avic_kick_target_vcpus to support 32-bit APIC ID")
Cc: Li RongQing <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index b59f8ee2671f..3ace0f2f52f0 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -441,6 +441,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
u32 icrl, u32 icrh, u32 index)
{
+ u32 dest = apic_x2apic_mode(source) ? icrh : GET_XAPIC_DEST_FIELD(icrh);
unsigned long i;
struct kvm_vcpu *vcpu;

@@ -456,13 +457,6 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
* since entered the guest will have processed pending IRQs at VMRUN.
*/
kvm_for_each_vcpu(i, vcpu, kvm) {
- u32 dest;
-
- if (apic_x2apic_mode(vcpu->arch.apic))
- dest = icrh;
- else
- dest = GET_XAPIC_DEST_FIELD(icrh);
-
if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
dest, icrl & APIC_DEST_MASK)) {
vcpu->arch.apic->irr_pending = true;
--
2.37.2.672.g94769d06f0-goog

2022-08-31 06:32:17

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 19/19] Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on a running vcpu"

On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote:
> Turns out that some warnings exist for good reasons. Restore the warning
> in avic_vcpu_load() that guards against calling avic_vcpu_load() on a
> running vCPU now that KVM avoids doing so when switching between x2APIC
> and xAPIC. The entire point of the WARN is to highlight that KVM should
> not be reloading an AVIC.
>
> Opportunistically convert the WARN_ON() to WARN_ON_ONCE() to avoid
> spamming the kernel if it does fire.
>
> This reverts commit c0caeee65af3944b7b8abbf566e7cc1fae15c775.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/avic.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index b2033a56010c..3c300113d40b 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -1080,6 +1080,7 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> return;
>
> entry = READ_ONCE(*(svm->avic_physical_id_cache));
> + WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
>
> entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
> entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);

Note that this warning was removed because it would trigger wheh x2avic code would switch
between xapic and x2apic.

I do agree 100% that this warning is useful.

Best regards,
Maxim Levitsky

2022-08-31 06:33:32

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 07/19] KVM: SVM: Drop buggy and redundant AVIC "single logical dest" check

On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> Use the already-calculated-and-sanity-checked destination bitmap when
> processing a fast AVIC kick in logical mode, and drop the logical path's
> flawed logic. The intent of the check is to ensure the bitmap is a power
> of two, whereas "icrh != (1 << avic)" effectively checks that the bitmap
> is a power of two _and_ the target cluster is '0'.
>
> Note, the flawed check isn't a functional issue, it simply means that KVM
> will go down the slow path if the target cluster is non-zero.
>
> Fixes: 8c9e639da435 ("KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible")
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/avic.c | 10 +---------
> 1 file changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 3c333cd2e752..14f567550a1e 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -411,15 +411,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
> * Instead, calculate physical ID from logical ID in ICRH.
> */
> int cluster = (icrh & 0xffff0000) >> 16;
> - int apic = ffs(icrh & 0xffff) - 1;
> -
> - /*
> - * If the x2APIC logical ID sub-field (i.e. icrh[15:0])
> - * contains anything but a single bit, we cannot use the
> - * fast path, because it is limited to a single vCPU.
> - */
> - if (apic < 0 || icrh != (1 << apic))
> - return -EINVAL;
> + int apic = ffs(bitmap) - 1;
>
> l1_physical_id = (cluster << 4) + apic;
> }

Oh, I didn't notice this bug. However isn't removing the check is wrong as well?

What if we do have multiple bits set in the bitmap? After you remove this code,
we will set IPI only to APIC which matches the 1st bit, no?
(The fast code only sends IPI to one vCPU)

Best regards,
Maxim Levitsky

2022-08-31 07:08:28

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 19/19] Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on a running vcpu"

On Wed, 2022-08-31 at 09:07 +0300, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote:
> > Turns out that some warnings exist for good reasons. Restore the warning
> > in avic_vcpu_load() that guards against calling avic_vcpu_load() on a
> > running vCPU now that KVM avoids doing so when switching between x2APIC
> > and xAPIC. The entire point of the WARN is to highlight that KVM should
> > not be reloading an AVIC.
> >
> > Opportunistically convert the WARN_ON() to WARN_ON_ONCE() to avoid
> > spamming the kernel if it does fire.
> >
> > This reverts commit c0caeee65af3944b7b8abbf566e7cc1fae15c775.
> >
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/kvm/svm/avic.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index b2033a56010c..3c300113d40b 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -1080,6 +1080,7 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> > return;
> >
> > entry = READ_ONCE(*(svm->avic_physical_id_cache));
> > + WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);
> >
> > entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
> > entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
>
> Note that this warning was removed because it would trigger wheh x2avic code would switch
> between xapic and x2apic.
>
> I do agree 100% that this warning is useful.

Ah I see that you fixed the apic reloading to not trigger this code,
need more coffee - in this case this patch makes lot of sense.

I'll review it again when I review rest of the patches in the patch series.

Best regards,
Maxim Levitsky
>
> Best regards,
> Maxim Levitsky


2022-08-31 07:13:05

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 06/19] KVM: SVM: Get x2APIC logical dest bitmap from ICRH[15:0], not ICHR[31:16]

On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> When attempting a fast kick for x2AVIC, get the destination bitmap from
> ICR[15:0], not ICHR[31:16]. The upper 16 bits contain the cluster, the
> lower 16 bits hold the bitmap.
>
> Fixes: 603ccef42ce9 ("KVM: x86: SVM: fix avic_kick_target_vcpus_fast")
> Cc: Maxim Levitsky <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/avic.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 3ace0f2f52f0..3c333cd2e752 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -368,7 +368,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
>
> if (apic_x2apic_mode(source)) {
> /* 16 bit dest mask, 16 bit cluster id */
> - bitmap = dest & 0xFFFF0000;
> + bitmap = dest & 0xFFFF;
> cluster = (dest >> 16) << 4;
> } else if (kvm_lapic_get_reg(source, APIC_DFR) == APIC_DFR_FLAT) {
> /* 8 bit dest mask*/

I swear I have seen a patch from Suravee Suthikulpanit fixing this my mistake, I don't know why it was not
accepted upstream.

Best regards,
Maxim Levitsky

2022-08-31 09:53:58

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 05/19] KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kick

On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> Compute the destination from ICRH using the sender's x2APIC status, not
> each (potential) target's x2APIC status.
>
> Fixes: c514d3a348ac ("KVM: SVM: Update avic_kick_target_vcpus to support 32-bit APIC ID")
> Cc: Li RongQing <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/avic.c | 8 +-------
> 1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index b59f8ee2671f..3ace0f2f52f0 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -441,6 +441,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
> static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
> u32 icrl, u32 icrh, u32 index)
> {
> + u32 dest = apic_x2apic_mode(source) ? icrh : GET_XAPIC_DEST_FIELD(icrh);
> unsigned long i;
> struct kvm_vcpu *vcpu;
>
> @@ -456,13 +457,6 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
> * since entered the guest will have processed pending IRQs at VMRUN.
> */
> kvm_for_each_vcpu(i, vcpu, kvm) {
> - u32 dest;
> -
> - if (apic_x2apic_mode(vcpu->arch.apic))
> - dest = icrh;
> - else
> - dest = GET_XAPIC_DEST_FIELD(icrh);
> -
> if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
> dest, icrl & APIC_DEST_MASK)) {
> vcpu->arch.apic->irr_pending = true;

I didn't notice this in a review, makes sense.

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2022-08-31 09:55:24

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 06/19] KVM: SVM: Get x2APIC logical dest bitmap from ICRH[15:0], not ICHR[31:16]

On Wed, 2022-08-31 at 09:09 +0300, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> > When attempting a fast kick for x2AVIC, get the destination bitmap from
> > ICR[15:0], not ICHR[31:16]. The upper 16 bits contain the cluster, the
> > lower 16 bits hold the bitmap.
> >
> > Fixes: 603ccef42ce9 ("KVM: x86: SVM: fix avic_kick_target_vcpus_fast")
> > Cc: Maxim Levitsky <[email protected]>
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/kvm/svm/avic.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index 3ace0f2f52f0..3c333cd2e752 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -368,7 +368,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
> >
> > if (apic_x2apic_mode(source)) {
> > /* 16 bit dest mask, 16 bit cluster id */
> > - bitmap = dest & 0xFFFF0000;
> > + bitmap = dest & 0xFFFF;
> > cluster = (dest >> 16) << 4;
> > } else if (kvm_lapic_get_reg(source, APIC_DFR) == APIC_DFR_FLAT) {
> > /* 8 bit dest mask*/
>
> I swear I have seen a patch from Suravee Suthikulpanit fixing this my mistake, I don't know why it was not
> accepted upstream.

This is the patch, which I guess got forgotten.

https://www.spinics.net/lists/kernel/msg4417427.html

Since it is literaly the same patch, you can just add credit to Suravee Suthikulpanit.

So with the credit added:

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2022-08-31 10:17:01

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 04/19] KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" boolean

On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> Replace the "avic_mode" enum with a single bool to track whether or not
> x2AVIC is enabled. KVM already has "apicv_enabled" that tracks if any
> flavor of AVIC is enabled, i.e. AVIC_MODE_NONE and AVIC_MODE_X1 are
> redundant and unnecessary noise.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/avic.c | 45 +++++++++++++++++++----------------------
> arch/x86/kvm/svm/svm.c | 2 +-
> arch/x86/kvm/svm/svm.h | 9 +--------
> 3 files changed, 23 insertions(+), 33 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 1d516d658f9a..b59f8ee2671f 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -53,7 +53,7 @@ static DEFINE_HASHTABLE(svm_vm_data_hash, SVM_VM_DATA_HASH_BITS);
> static u32 next_vm_id = 0;
> static bool next_vm_id_wrapped = 0;
> static DEFINE_SPINLOCK(svm_vm_data_hash_lock);
> -enum avic_modes avic_mode;
> +bool x2avic_enabled;
>
> /*
> * This is a wrapper of struct amd_iommu_ir_data.
> @@ -231,8 +231,8 @@ static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu,
> u64 *avic_physical_id_table;
> struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
>
> - if ((avic_mode == AVIC_MODE_X1 && index > AVIC_MAX_PHYSICAL_ID) ||
> - (avic_mode == AVIC_MODE_X2 && index > X2AVIC_MAX_PHYSICAL_ID))
> + if ((!x2avic_enabled && index > AVIC_MAX_PHYSICAL_ID) ||
> + (index > X2AVIC_MAX_PHYSICAL_ID))
> return NULL;
>
> avic_physical_id_table = page_address(kvm_svm->avic_physical_id_table_page);
> @@ -279,8 +279,8 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
> int id = vcpu->vcpu_id;
> struct vcpu_svm *svm = to_svm(vcpu);
>
> - if ((avic_mode == AVIC_MODE_X1 && id > AVIC_MAX_PHYSICAL_ID) ||
> - (avic_mode == AVIC_MODE_X2 && id > X2AVIC_MAX_PHYSICAL_ID))
> + if ((!x2avic_enabled && id > AVIC_MAX_PHYSICAL_ID) ||
> + (id > X2AVIC_MAX_PHYSICAL_ID))
> return -EINVAL;
>
> if (!vcpu->arch.apic->regs)
> @@ -532,7 +532,7 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu)
> * AVIC must be disabled if x2AVIC isn't supported and the guest has
> * x2APIC enabled.
> */
> - if (avic_mode != AVIC_MODE_X2 && apic_x2apic_mode(vcpu->arch.apic))
> + if (!x2avic_enabled && apic_x2apic_mode(vcpu->arch.apic))
> return APICV_INHIBIT_REASON_X2APIC;
>
> return 0;
> @@ -1086,10 +1086,7 @@ void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
> struct vcpu_svm *svm = to_svm(vcpu);
> struct vmcb *vmcb = svm->vmcb01.ptr;
>
> - if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
> - return;
> -
> - if (!enable_apicv)
> + if (!lapic_in_kernel(vcpu) || !enable_apicv)
> return;
>
> if (kvm_vcpu_apicv_active(vcpu)) {
> @@ -1165,32 +1162,32 @@ bool avic_hardware_setup(struct kvm_x86_ops *x86_ops)
> if (!npt_enabled)
> return false;
>
> + /* AVIC is a prerequisite for x2AVIC. */
> + if (!boot_cpu_has(X86_FEATURE_AVIC) && !force_avic) {
> + if (boot_cpu_has(X86_FEATURE_X2AVIC)) {
> + pr_warn(FW_BUG "Cannot support x2AVIC due to AVIC is disabled");
> + pr_warn(FW_BUG "Try enable AVIC using force_avic option");
> + }
> + return false;
> + }
> +
> if (boot_cpu_has(X86_FEATURE_AVIC)) {
> - avic_mode = AVIC_MODE_X1;
> pr_info("AVIC enabled\n");
> } else if (force_avic) {
> /*
> * Some older systems does not advertise AVIC support.
> * See Revision Guide for specific AMD processor for more detail.
> */
> - avic_mode = AVIC_MODE_X1;
> pr_warn("AVIC is not supported in CPUID but force enabled");
> pr_warn("Your system might crash and burn");
> }
>
> /* AVIC is a prerequisite for x2AVIC. */
> - if (boot_cpu_has(X86_FEATURE_X2AVIC)) {
> - if (avic_mode == AVIC_MODE_X1) {
> - avic_mode = AVIC_MODE_X2;
> - pr_info("x2AVIC enabled\n");
> - } else {
> - pr_warn(FW_BUG "Cannot support x2AVIC due to AVIC is disabled");
> - pr_warn(FW_BUG "Try enable AVIC using force_avic option");
> - }
> - }
> + x2avic_enabled = boot_cpu_has(X86_FEATURE_X2AVIC);
> + if (x2avic_enabled)
> + pr_info("x2AVIC enabled\n");
>
> - if (avic_mode != AVIC_MODE_NONE)
> - amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
> + amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
>
> - return !!avic_mode;
> + return true;
> }
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index f3813dbacb9f..b25c89069128 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -821,7 +821,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
> if (intercept == svm->x2avic_msrs_intercepted)
> return;
>
> - if (avic_mode != AVIC_MODE_X2 ||
> + if (!x2avic_enabled ||
> !apic_x2apic_mode(svm->vcpu.arch.apic))
> return;
>
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 6a7686bf6900..cee79ade400a 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -35,14 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
> extern bool npt_enabled;
> extern int vgif;
> extern bool intercept_smi;
> -
> -enum avic_modes {
> - AVIC_MODE_NONE = 0,
> - AVIC_MODE_X1,
> - AVIC_MODE_X2,
> -};
> -
> -extern enum avic_modes avic_mode;
> +extern bool x2avic_enabled;
>
> /*
> * Clean bits in VMCB.

Overall looks like an improvement, I would probably do it this way if I were to
implement this code, but the original code is alright as well.


Reviewed-by: Maxim Levitsky <[email protected]>


Best regards,
Maxim Levitsky

2022-08-31 10:23:45

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 08/19] KVM: SVM: Remove redundant cluster calculation that also creates a shadow

On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> Drop the redundant "cluster" calculation and its horrific shadowing in
> the x2avic logical mode path. The "cluster" that is declared at an outer
> scope is derived using the exact same calculation and has performed the
> left-shift.

Actually I think we should just revert the commit
'KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible'


I know that the patch that intially introduced the the avic_kick_target_vcpus_fast had
x2apic/x2avic code, and then it was split to avoid adding x2avic code before it was merged,
resulting in this patch to add the x2apic specific code.

But when I fixed most issues of avic_kick_target_vcpus_fast in my
'KVM: x86: SVM: fix avic_kick_target_vcpus_fast', I added back x2apic code because
it was just natural to do since I had to calculate cluster/bitmap masks anyway.

I expected this patch to be dropped because of this but I guess it was not noticed,
or patches were merged in the wrong order.

This is the reason of shadowing, duplicate calculations/etc.

Patch 9 and 7 of your series can be dropped as well then.

Best regards,
Maxim Levitsky


>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/avic.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 14f567550a1e..8c9cad96008e 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -410,10 +410,9 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
> * For x2APIC logical mode, cannot leverage the index.
> * Instead, calculate physical ID from logical ID in ICRH.
> */
> - int cluster = (icrh & 0xffff0000) >> 16;
> int apic = ffs(bitmap) - 1;
>
> - l1_physical_id = (cluster << 4) + apic;
> + l1_physical_id = cluster + apic;
> }
> }
>


2022-08-31 10:43:30

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 10/19] KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatch

On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> Document that AVIC is inhibited if any vCPU's APIC ID diverges from its
> vCPU ID, i.e. that there's no need to check for a destination match in
> the AVIC kick fast path.
>
> Opportunistically tweak comments to remove "guest bug", as that suggests
> KVM is punting on error handling, which is not the case. Targeting a
> non-existent vCPU or no vCPUs _may_ be a guest software bug, but whether
> or not it's a guest bug is irrelevant. Such behavior is architecturally
> legal and thus needs to faithfully emulated by KVM (and it is).

I don't want to pick a fight, but personally these things *are* guest bugs / improper usage of APIC,
and I don't think it is wrong to document them as such.

Reviewed-by: Maxim Levitsky <[email protected]>


>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/avic.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 05a1cde8175c..3959d4766911 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -380,8 +380,8 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
> cluster = (dest >> 4) << 2;
> }
>
> + /* Nothing to do if there are no destinations in the cluster. */
> if (unlikely(!bitmap))
> - /* guest bug: nobody to send the logical interrupt to */
> return 0;
>
> if (!is_power_of_2(bitmap))
> @@ -399,7 +399,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
> if (WARN_ON_ONCE(index != logid_index))
> return -EINVAL;
>
> - /* guest bug: non existing/reserved logical destination */
> + /* Nothing to do if the logical destination is invalid. */
> if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
> return 0;
>
> @@ -418,9 +418,13 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
> }
> }
>
> + /*
> + * KVM inhibits AVIC if any vCPU ID diverges from the vCPUs APIC ID,
> + * i.e. APIC ID == vCPU ID. Once again, nothing to do if the target
> + * vCPU doesn't exist.
> + */
> target_vcpu = kvm_get_vcpu_by_id(kvm, l1_physical_id);
> if (unlikely(!target_vcpu))
> - /* guest bug: non existing vCPU is a target of this IPI*/
> return 0;
>
> target_vcpu->arch.apic->irr_pending = true;


2022-08-31 10:44:17

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 02/19] KVM: SVM: Don't put/load AVIC when setting virtual APIC mode

On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> Move the VMCB updates from avic_refresh_apicv_exec_ctrl() into
> avic_set_virtual_apic_mode() and invert the dependency being said
> functions to avoid calling avic_vcpu_{load,put}() and
> avic_set_pi_irte_mode() when "only" setting the virtual APIC mode.
>
> avic_set_virtual_apic_mode() is invoked from common x86 with preemption
> enabled, which makes avic_vcpu_{load,put}() unhappy. Luckily, calling
> those and updating IRTE stuff is unnecessary as the only reason
> avic_set_virtual_apic_mode() is called is to handle transitions between
> xAPIC and x2APIC that don't also toggle APICv activation. And if
> activation doesn't change, there's no need to fiddle with the physical
> APIC ID table or update IRTE.
>
> The "full" refresh is guaranteed to be called if activation changes in
> this case as the only call to the "set" path is:
>
> kvm_vcpu_update_apicv(vcpu);
> static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);
>
> and kvm_vcpu_update_apicv() invokes the refresh if activation changes:
>
> if (apic->apicv_active == activate)
> goto out;
>
> apic->apicv_active = activate;
> kvm_apic_update_apicv(vcpu);
> static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);
>
> WARNING: CPU: 183 PID: 49186 at arch/x86/kvm/svm/avic.c:1081 avic_vcpu_put+0xde/0xf0 [kvm_amd]
> CPU: 183 PID: 49186 Comm: stable Tainted: G O 6.0.0-smp--fcddbca45f0a-sink #34
> Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 10.48.0 01/27/2022
> RIP: 0010:avic_vcpu_put+0xde/0xf0 [kvm_amd]
> avic_refresh_apicv_exec_ctrl+0x142/0x1c0 [kvm_amd]
> avic_set_virtual_apic_mode+0x5a/0x70 [kvm_amd]
> kvm_lapic_set_base+0x149/0x1a0 [kvm]
> kvm_set_apic_base+0x8f/0xd0 [kvm]
> kvm_set_msr_common+0xa3a/0xdc0 [kvm]
> svm_set_msr+0x364/0x6b0 [kvm_amd]
> __kvm_set_msr+0xb8/0x1c0 [kvm]
> kvm_emulate_wrmsr+0x58/0x1d0 [kvm]
> msr_interception+0x1c/0x30 [kvm_amd]
> svm_invoke_exit_handler+0x31/0x100 [kvm_amd]
> svm_handle_exit+0xfc/0x160 [kvm_amd]
> vcpu_enter_guest+0x21bb/0x23e0 [kvm]
> vcpu_run+0x92/0x450 [kvm]
> kvm_arch_vcpu_ioctl_run+0x43e/0x6e0 [kvm]
> kvm_vcpu_ioctl+0x559/0x620 [kvm]
>
> Fixes: 05c4fe8c1bd9 ("KVM: SVM: Refresh AVIC configuration when changing APIC mode")
> Cc: [email protected]
> Cc: Suravee Suthikulpanit <[email protected]>
> Cc: Maxim Levitsky <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/avic.c | 31 +++++++++++++++----------------
> 1 file changed, 15 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index b1ade555e8d0..f3a74c8284cb 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -741,18 +741,6 @@ void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
> avic_handle_ldr_update(vcpu);
> }
>
> -void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
> -{
> - if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
> - return;
> -
> - if (kvm_get_apic_mode(vcpu) == LAPIC_MODE_INVALID) {
> - WARN_ONCE(true, "Invalid local APIC state (vcpu_id=%d)", vcpu->vcpu_id);
> - return;
> - }
> - avic_refresh_apicv_exec_ctrl(vcpu);
> -}
> -
> static int avic_set_pi_irte_mode(struct kvm_vcpu *vcpu, bool activate)
> {
> int ret = 0;
> @@ -1094,17 +1082,18 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
> WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
> }
>
> -
> -void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
> +void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
> {
> struct vcpu_svm *svm = to_svm(vcpu);
> struct vmcb *vmcb = svm->vmcb01.ptr;
> - bool activated = kvm_vcpu_apicv_active(vcpu);
> +
> + if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
> + return;
>
> if (!enable_apicv)
> return;
>
> - if (activated) {
> + if (kvm_vcpu_apicv_active(vcpu)) {
> /**
> * During AVIC temporary deactivation, guest could update
> * APIC ID, DFR and LDR registers, which would not be trapped
> @@ -1118,6 +1107,16 @@ void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
> avic_deactivate_vmcb(svm);
> }
> vmcb_mark_dirty(vmcb, VMCB_AVIC);
> +}
> +
> +void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
> +{
> + bool activated = kvm_vcpu_apicv_active(vcpu);
> +
> + if (!enable_apicv)
> + return;
> +
> + avic_set_virtual_apic_mode(vcpu);

This call is misleading - this will usually be called
when avic mode didn't change - I think we need a better name for
avic_set_virtual_apic_mode.

Other than that this makes sense.

Best regards,
Maxim Levitsky

>
> if (activated)
> avic_vcpu_load(vcpu, vcpu->cpu);


2022-08-31 11:04:18

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 18/19] KVM: SVM: Ignore writes to Remote Read Data on AVIC write traps

On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote:
> Drop writes to APIC_RRR, a.k.a. Remote Read Data Register, on AVIC
> unaccelerated write traps. The register is read-only and isn't emulated
> by KVM. Sending the register through kvm_apic_write_nodecode() will
> result in screaming when x2APIC is enabled due to the unexpected failure
> to retrieve the MSR (KVM expects that only "legal" accesses will trap).

I wonder about ESR register as well (280H), KVM doesn't seem to support it either,
but allows 0 writes. Anyway:

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

>
> Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/avic.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index dad5affe44c1..b2033a56010c 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -675,6 +675,9 @@ static int avic_unaccel_trap_write(struct kvm_vcpu *vcpu)
> case APIC_DFR:
> avic_handle_dfr_update(vcpu);
> break;
> + case APIC_RRR:
> + /* Ignore writes to Read Remote Data, it's read-only. */
> + return 1;
> default:
> break;
> }


2022-08-31 11:34:36

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 11/19] KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU

On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> Add a helper to perform the final kick, two instances of the ICR decoding
> is one too many.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/avic.c | 25 +++++++++++++------------
> 1 file changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 3959d4766911..2095ece70712 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -329,6 +329,16 @@ void avic_ring_doorbell(struct kvm_vcpu *vcpu)
> put_cpu();
> }
>
> +
> +static void avic_kick_vcpu(struct kvm_vcpu *vcpu, u32 icrl)
> +{
> + vcpu->arch.apic->irr_pending = true;
> + svm_complete_interrupt_delivery(vcpu,
> + icrl & APIC_MODE_MASK,
> + icrl & APIC_INT_LEVELTRIG,
> + icrl & APIC_VECTOR_MASK);
> +}
> +
> /*
> * A fast-path version of avic_kick_target_vcpus(), which attempts to match
> * destination APIC ID to vCPU without looping through all vCPUs.
> @@ -427,11 +437,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
> if (unlikely(!target_vcpu))
> return 0;
>
> - target_vcpu->arch.apic->irr_pending = true;
> - svm_complete_interrupt_delivery(target_vcpu,
> - icrl & APIC_MODE_MASK,
> - icrl & APIC_INT_LEVELTRIG,
> - icrl & APIC_VECTOR_MASK);
> + avic_kick_vcpu(target_vcpu, icrl);
> return 0;
> }
>
> @@ -455,13 +461,8 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
> */
> kvm_for_each_vcpu(i, vcpu, kvm) {
> if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
> - dest, icrl & APIC_DEST_MASK)) {
> - vcpu->arch.apic->irr_pending = true;
> - svm_complete_interrupt_delivery(vcpu,
> - icrl & APIC_MODE_MASK,
> - icrl & APIC_INT_LEVELTRIG,
> - icrl & APIC_VECTOR_MASK);
> - }
> + dest, icrl & APIC_DEST_MASK))
> + avic_kick_vcpu(vcpu, icrl);
> }
> }
>

I don't know what I think about this, sometimes *minor* code duplication might actually
be a good thing, as it is easier to read the code, but I don't have much against this
as well.

I am not sure if before or after this code is more readable.

But anyway,

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2022-08-31 13:36:14

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 12/19] KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs

On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> Disable the optimized APIC logical map if a logical ID covers multiple
> MDAs, i.e. if a vCPU has multiple bits set in its ID. In logical mode,
> events match if "ID & MDA != 0", i.e. creating an entry for only the
> first bit can cause interrupts to be missed.
>
> Note, creating an entry for every bit is also wrong as KVM would generate
> IPIs for every matching bit. It would be possible to teach KVM to play
> nice with this edge case, but it is very much an edge case and probably
> not used in any real world OS, i.e. it's not worth optimizing.
>
> Use an impossible value for the "mode" to effectively designate that it's
> disabled. Don't bother adding a dedicated "invalid" value, the mode
> handling will be cleaned up in the future and it would take just as much
> effort to explain what value is "safe" at this time.
>
> Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/lapic.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9dda989a1cf0..82278acae95b 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -300,8 +300,15 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> if (!kvm_apic_map_get_logical_dest(new, ldr, &cluster, &mask))
> continue;
>
> - if (mask)
> - cluster[ffs(mask) - 1] = apic;
> + if (!mask)
> + continue;
> +
> + if (!is_power_of_2(mask)) {
> + new->mode = KVM_APIC_MODE_XAPIC_FLAT |
> + KVM_APIC_MODE_XAPIC_CLUSTER;
> + continue;
> + }
> + cluster[ffs(mask) - 1] = apic;
> }
> out:
> old = rcu_dereference_protected(kvm->arch.apic_map,


I was about to complain about the abuse of the invalid mode,
but I see that this is fixed in later patch as it is said in the commit
description, so no complains.

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2022-08-31 13:52:10

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 13/19] KVM: x86: Disable APIC logical map if vCPUs are aliased in logical mode

On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote:
> Disable the optimized APIC logical map if multiple vCPUs are aliased to
> the same logical ID. Architecturally, all CPUs whose logical ID matches
> the MDA are supposed to receive the interrupt; overwriting existing map
> entries can result in missed IPIs.
>
> Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/lapic.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 82278acae95b..d537b51295d6 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -303,12 +303,13 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> if (!mask)
> continue;
>
> - if (!is_power_of_2(mask)) {
> + ldr = ffs(mask) - 1;
> + if (!is_power_of_2(mask) || cluster[ldr]) {
> new->mode = KVM_APIC_MODE_XAPIC_FLAT |
> KVM_APIC_MODE_XAPIC_CLUSTER;
> continue;
> }
> - cluster[ffs(mask) - 1] = apic;
> + cluster[ldr] = apic;
> }
> out:
> old = rcu_dereference_protected(kvm->arch.apic_map,

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

2022-08-31 13:53:53

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 14/19] KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs

On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote:
> Apply KVM's hotplug hack if and only if userspace has enabled 32-bit IDs
> for x2APIC. If 32-bit IDs are not enabled, disable the optimized map to
> honor x86 architectural behavior if multiple vCPUs shared a physical APIC
> ID. As called out in the changelog that added the hack, all CPUs whose
> (possibly truncated) APIC ID matches the target are supposed to receive
> the IPI.
>
> KVM intentionally differs from real hardware, because real hardware
> (Knights Landing) does just "x2apic_id & 0xff" to decide whether to
> accept the interrupt in xAPIC mode and it can deliver one interrupt to
> more than one physical destination, e.g. 0x123 to 0x123 and 0x23.
>
> Applying the hack even when x2APIC is not fully enabled means KVM doesn't
> correctly handle scenarios where the guest has aliased xAPIC IDs across
> multiple vCPUs, as only the vCPU with the lowest vCPU ID will receive any
> interrupts. It's extremely unlikely any real world guest aliase APIC IDs,
> or even modifies APIC IDs, but KVM's behavior is arbitrary, e.g. the
> lowest vCPU ID "wins" regardless of which vCPU is "aliasing" and which
> vCPU is "normal".
>
> Furthermore, the hack is _not_ guaranteed to work! The hack works if and
> only if the optimized APIC map is successfully allocated. If the map
> allocation fails (unlikely), KVM will fall back to its unoptimized
> behavior, which _does_ honor the architectural behavior.
>
> Pivot on 32-bit x2APIC IDs being enabled as that is required to take
> advantage of the hotplug hack (see kvm_apic_state_fixup()), i.e. won't
> break existing setups unless they are way, way off in the weeds.
>
> And an entry in KVM's errata to document the hack. Alternatively, KVM
> could provide an actual x2APIC quirk and document the hack that way, but
> there's unlikely to ever be a use case for disabling the quirk. Go the
> errata route to avoid having to validate a quirk no one cares about.
>
> Fixes: 5bd5db385b3e ("KVM: x86: allow hotplug of VCPU with APIC ID over 0xff")
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> Documentation/virt/kvm/x86/errata.rst | 11 ++++++
> arch/x86/kvm/lapic.c | 50 ++++++++++++++++++++++-----
> 2 files changed, 52 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/virt/kvm/x86/errata.rst b/Documentation/virt/kvm/x86/errata.rst
> index 410e0aa63493..49a05f24747b 100644
> --- a/Documentation/virt/kvm/x86/errata.rst
> +++ b/Documentation/virt/kvm/x86/errata.rst
> @@ -37,3 +37,14 @@ Nested virtualization features
> ------------------------------
>
> TBD
> +
> +x2APIC
> +------
> +When KVM_X2APIC_API_USE_32BIT_IDS is enabled, KVM activates a hack/quirk that
> +allows sending events to a single vCPU using its x2APIC ID even if the target
> +vCPU has legacy xAPIC enabled, e.g. to bring up hotplugged vCPUs via INIT-SIPI
> +on VMs with > 255 vCPUs. A side effect of the quirk is that, if multiple vCPUs
> +have the same physical APIC ID, KVM will deliver events targeting that APIC ID
> +only to the vCPU with the lowest vCPU ID. If KVM_X2APIC_API_USE_32BIT_IDS is
> +not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs
> +matching the target APIC ID receive the interrupt).
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index d537b51295d6..c224b5c7cd92 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -260,10 +260,10 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> kvm_for_each_vcpu(i, vcpu, kvm) {
> struct kvm_lapic *apic = vcpu->arch.apic;
> struct kvm_lapic **cluster;
> + u32 x2apic_id, physical_id;
> u16 mask;
> u32 ldr;
> u8 xapic_id;
> - u32 x2apic_id;
>
> if (!kvm_apic_present(vcpu))
> continue;
> @@ -271,16 +271,48 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> xapic_id = kvm_xapic_id(apic);
> x2apic_id = kvm_x2apic_id(apic);
>
> - /* Hotplug hack: see kvm_apic_match_physical_addr(), ... */
> - if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
> - x2apic_id <= new->max_apic_id)
> - new->phys_map[x2apic_id] = apic;
> /*
> - * ... xAPIC ID of VCPUs with APIC ID > 0xff will wrap-around,
> - * prevent them from masking VCPUs with APIC ID <= 0xff.
> + * Apply KVM's hotplug hack if userspace has enable 32-bit APIC
> + * IDs. Allow sending events to vCPUs by their x2APIC ID even
> + * if the target vCPU is in legacy xAPIC mode, and silently
> + * ignore aliased xAPIC IDs (the x2APIC ID is truncated to 8
> + * bits, causing IDs > 0xff to wrap and collide).
> + *
> + * Honor the architectural (and KVM's non-optimized) behavior
> + * if userspace has not enabled 32-bit x2APIC IDs. Each APIC
> + * is supposed to process messages independently. If multiple
> + * vCPUs have the same effective APIC ID, e.g. due to the
> + * x2APIC wrap or because the guest manually modified its xAPIC
> + * IDs, events targeting that ID are supposed to be recognized
> + * by all vCPUs with said ID.
> */
> - if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
> - new->phys_map[xapic_id] = apic;
> + if (kvm->arch.x2apic_format) {
> + /* See also kvm_apic_match_physical_addr(). */
> + if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
> + x2apic_id <= new->max_apic_id)
> + new->phys_map[x2apic_id] = apic;
> +
> + if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
> + new->phys_map[xapic_id] = apic;
> + } else {
> + /*
> + * Disable the optimized map if the physical APIC ID is
> + * already mapped, i.e. is aliased to multiple vCPUs.
> + * The optimized map requires a strict 1:1 mapping
> + * between IDs and vCPUs.
> + */
> + if (apic_x2apic_mode(apic))
> + physical_id = x2apic_id;
> + else
> + physical_id = xapic_id;
> +
> + if (new->phys_map[physical_id]) {
> + kvfree(new);
> + new = NULL;
> + goto out;
Why not to use the same KVM_APIC_MODE_XAPIC_FLAT | KVM_APIC_MODE_XAPIC_CLUSTER
hack here?

Other than that looks correct but I don't know this area well enough to be 100% sure.

Best regards,
Maxim Levitsky


> + }
> + new->phys_map[physical_id] = apic;
> + }
>
> if (!kvm_apic_sw_enabled(apic))
> continue;


2022-08-31 13:57:11

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 15/19] KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0

On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote:
> Explicitly skip the optimized map setup if the vCPU's LDR is '0', i.e. if
> the vCPU will never response to logical mode interrupts. KVM already
> skips setup in this case, but relies on kvm_apic_map_get_logical_dest()
> to generate mask==0. KVM still needs the mask=0 check as a non-zero LDR
> can yield mask==0 depending on the mode, but explicitly handling the LDR
> will make it simpler to clean up the logical mode tracking in the future.

If I am not mistaken, the commit description is a bit misleading - in this case we just don't add
the vCPU to the map since it is unreachable, but we do continue creating the map.

Best regards,
MaxiM Levitsky

>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/lapic.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index c224b5c7cd92..8209caffe3ab 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -318,10 +318,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> continue;
>
> ldr = kvm_lapic_get_reg(apic, APIC_LDR);
> + if (!ldr)
> + continue;
>
> if (apic_x2apic_mode(apic)) {
> new->mode |= KVM_APIC_MODE_X2APIC;
> - } else if (ldr) {
> + } else {
> ldr = GET_APIC_LOGICAL_ID(ldr);
> if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
> new->mode |= KVM_APIC_MODE_XAPIC_FLAT;


2022-08-31 14:28:00

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 16/19] KVM: x86: Explicitly track all possibilities for APIC map's logical modes

On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote:
> Track all possibilities for the optimized APIC map's logical modes
> instead of overloading the pseudo-bitmap and treating any "unknown" value
> as "invalid".
>
> As documented by the now-stale comment above the mode values, the values
> did have meaning when the optimized map was originally added. That
> dependent logical was removed by commit e45115b62f9a ("KVM: x86: use
> physical LAPIC array for logical x2APIC"), but the obfuscated behavior
> and its comment were left behind.
>
> Opportunistically rename "mode" to "logical_mode", partly to make it
> clear that the "disabled" case applies only to the logical map, but also
> to prove that there is no lurking code that expects "mode" to be a bitmap.
>
> Functionally, this is a glorified nop.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 21 ++++++++++--------
> arch/x86/kvm/lapic.c | 38 ++++++++++++++++++++++++---------
> 2 files changed, 40 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1f51411f3112..0184e64ab555 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -955,19 +955,22 @@ struct kvm_arch_memory_slot {
> };
>
> /*
> - * We use as the mode the number of bits allocated in the LDR for the
> - * logical processor ID. It happens that these are all powers of two.
> - * This makes it is very easy to detect cases where the APICs are
> - * configured for multiple modes; in that case, we cannot use the map and
> - * hence cannot use kvm_irq_delivery_to_apic_fast either.
> + * Track the mode of the optimized logical map, as the rules for decoding the
> + * destination vary per mode. Enabling the optimized logical map requires all
> + * software-enabled local APIs to be in the same mode, each addressable APIC to
> + * be mapped to only one MDA, and each MDA to map to at most one APIC.
> */
> -#define KVM_APIC_MODE_XAPIC_CLUSTER 4
> -#define KVM_APIC_MODE_XAPIC_FLAT 8
> -#define KVM_APIC_MODE_X2APIC 16
> +enum kvm_apic_logical_mode {
> + KVM_APIC_MODE_SW_DISABLED,
> + KVM_APIC_MODE_XAPIC_CLUSTER,
> + KVM_APIC_MODE_XAPIC_FLAT,
> + KVM_APIC_MODE_X2APIC,
> + KVM_APIC_MODE_MAP_DISABLED,
> +};
>
> struct kvm_apic_map {
> struct rcu_head rcu;
> - u8 mode;
> + enum kvm_apic_logical_mode logical_mode;
> u32 max_apic_id;
> union {
> struct kvm_lapic *xapic_flat_map[8];
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 8209caffe3ab..3b6ef36b3963 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -168,7 +168,12 @@ static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)
>
> static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
> u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
> - switch (map->mode) {
> + switch (map->logical_mode) {
> + case KVM_APIC_MODE_SW_DISABLED:
> + /* Arbitrarily use the flat map so that @cluster isn't NULL. */
> + *cluster = map->xapic_flat_map;
> + *mask = 0;
> + return true;
Could you explain why this is needed? I probably missed something.

> case KVM_APIC_MODE_X2APIC: {
> u32 offset = (dest_id >> 16) * 16;
> u32 max_apic_id = map->max_apic_id;
> @@ -193,8 +198,10 @@ static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
> *cluster = map->xapic_cluster_map[(dest_id >> 4) & 0xf];
> *mask = dest_id & 0xf;
> return true;
> + case KVM_APIC_MODE_MAP_DISABLED:
> + return false;
> default:
> - /* Not optimized. */
> + WARN_ON_ONCE(1);
> return false;
> }
> }
> @@ -256,10 +263,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> goto out;
>
> new->max_apic_id = max_id;
> + new->logical_mode = KVM_APIC_MODE_SW_DISABLED;
>
> kvm_for_each_vcpu(i, vcpu, kvm) {
> struct kvm_lapic *apic = vcpu->arch.apic;
> struct kvm_lapic **cluster;
> + enum kvm_apic_logical_mode logical_mode;
> u32 x2apic_id, physical_id;
> u16 mask;
> u32 ldr;
> @@ -314,7 +323,8 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> new->phys_map[physical_id] = apic;
> }
>
> - if (!kvm_apic_sw_enabled(apic))
> + if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
> + !kvm_apic_sw_enabled(apic))
> continue;
>
> ldr = kvm_lapic_get_reg(apic, APIC_LDR);
> @@ -322,25 +332,33 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> continue;
>
> if (apic_x2apic_mode(apic)) {
> - new->mode |= KVM_APIC_MODE_X2APIC;
> + logical_mode = KVM_APIC_MODE_X2APIC;
> } else {
> ldr = GET_APIC_LOGICAL_ID(ldr);
> if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
> - new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
> + logical_mode = KVM_APIC_MODE_XAPIC_FLAT;
> else
> - new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
> + logical_mode = KVM_APIC_MODE_XAPIC_CLUSTER;
> }
> + if (new->logical_mode != KVM_APIC_MODE_SW_DISABLED &&
> + new->logical_mode != logical_mode) {
> + new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> + continue;
> + }
> + new->logical_mode = logical_mode;
>
> - if (!kvm_apic_map_get_logical_dest(new, ldr, &cluster, &mask))
> + if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
> + &cluster, &mask))) {
> + new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> continue;
> + }
>
> if (!mask)
> continue;
>
> ldr = ffs(mask) - 1;
> if (!is_power_of_2(mask) || cluster[ldr]) {
> - new->mode = KVM_APIC_MODE_XAPIC_FLAT |
> - KVM_APIC_MODE_XAPIC_CLUSTER;
> + new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> continue;
> }
> cluster[ldr] = apic;
> @@ -993,7 +1011,7 @@ static bool kvm_apic_is_broadcast_dest(struct kvm *kvm, struct kvm_lapic **src,
> {
> if (kvm->arch.x2apic_broadcast_quirk_disabled) {
> if ((irq->dest_id == APIC_BROADCAST &&
> - map->mode != KVM_APIC_MODE_X2APIC))
> + map->logical_mode != KVM_APIC_MODE_X2APIC))
> return true;
> if (irq->dest_id == X2APIC_BROADCAST)
> return true;

To be honest I would put that patch first, and then do all the other patches, this
way you would avoid all of the hacks they do and removed here.


Other than that this looks like a very good cleanup.

Best regards,
Maxim Levitsky



2022-08-31 14:40:36

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 17/19] KVM: SVM: Handle multiple logical targets in AVIC kick fastpath

On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote:
> Iterate over all target logical IDs in the AVIC kick fastpath instead of
> bailing if there is more than one target and KVM's optimized APIC map is
> enabled for logical mode. If the optimized map is enabled, all vCPUs are
> guaranteed to be mapped 1:1 to a logical ID or effectively have logical
> mode disabled, i.e. iterating over the bitmap is guaranteed to kick each
> target exactly once.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/avic.c | 126 +++++++++++++++++++++++++---------------
> 1 file changed, 79 insertions(+), 47 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 2095ece70712..dad5affe44c1 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -339,6 +339,62 @@ static void avic_kick_vcpu(struct kvm_vcpu *vcpu, u32 icrl)
> icrl & APIC_VECTOR_MASK);
> }
>
> +static void avic_kick_vcpu_by_physical_id(struct kvm *kvm, u32 physical_id,
> + u32 icrl)
> +{
> + /*
> + * KVM inhibits AVIC if any vCPU ID diverges from the vCPUs APIC ID,
> + * i.e. APIC ID == vCPU ID.
> + */
> + struct kvm_vcpu *target_vcpu = kvm_get_vcpu_by_id(kvm, physical_id);
> +
> + /* Once again, nothing to do if the target vCPU doesn't exist. */
> + if (unlikely(!target_vcpu))
> + return;
> +
> + avic_kick_vcpu(target_vcpu, icrl);
> +}
> +
> +static void avic_kick_vcpu_by_logical_id(struct kvm *kvm, u32 *avic_logical_id_table,
> + u32 logid_index, u32 icrl)
> +{
> + u32 physical_id;
> +
> + if (!avic_logical_id_table) {
^ Typo, the '!' shoudn't be there.

> + u32 logid_entry = avic_logical_id_table[logid_index];
> +
> + /* Nothing to do if the logical destination is invalid. */
> + if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
> + return;
> +
> + physical_id = logid_entry &
> + AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
> + } else {
> + /*
> + * For x2APIC, the logical APIC ID is a read-only value that is
> + * derived from the x2APIC ID, thus the x2APIC ID can be found
> + * by reversing the calculation (stored in logid_index). Note,
> + * bits 31:20 of the x2APIC ID aren't propagated to the logical
> + * ID, but KVM limits the x2APIC ID limited to KVM_MAX_VCPU_IDS.
> + */
> + physical_id = logid_index;
> + }
> +
> + avic_kick_vcpu_by_physical_id(kvm, physical_id, icrl);
> +}

These two functions are a very good cleanup IMHO.

> +
> +static bool is_optimized_logical_map_enabled(struct kvm *kvm)
> +{
> + struct kvm_apic_map *map;
> + bool enabled;
> +
> + rcu_read_lock();
> + map = rcu_dereference(kvm->arch.apic_map);
> + enabled = map && map->logical_mode != KVM_APIC_MODE_MAP_DISABLED;
> + rcu_read_unlock();
> + return enabled;
> +}

This function doesn't belong to avic, it should be in common KVM code.


> +
> /*
> * A fast-path version of avic_kick_target_vcpus(), which attempts to match
> * destination APIC ID to vCPU without looping through all vCPUs.
> @@ -346,11 +402,10 @@ static void avic_kick_vcpu(struct kvm_vcpu *vcpu, u32 icrl)
> static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source,
> u32 icrl, u32 icrh, u32 index)
> {
> - u32 l1_physical_id, dest;
> - struct kvm_vcpu *target_vcpu;
> int dest_mode = icrl & APIC_DEST_MASK;
> int shorthand = icrl & APIC_SHORT_MASK;
> struct kvm_svm *kvm_svm = to_kvm_svm(kvm);
> + u32 dest;
>
> if (shorthand != APIC_DEST_NOSHORT)
> return -EINVAL;
> @@ -367,14 +422,14 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
> if (!apic_x2apic_mode(source) && dest == APIC_BROADCAST)
> return -EINVAL;
>
> - l1_physical_id = dest;
> -
> - if (WARN_ON_ONCE(l1_physical_id != index))
> + if (WARN_ON_ONCE(dest != index))
> return -EINVAL;
>
> + avic_kick_vcpu_by_physical_id(kvm, dest, icrl);
> } else {
> - u32 bitmap, cluster;
> - int logid_index;
> + u32 *avic_logical_id_table;
> + unsigned long bitmap, i;
> + u32 cluster;
>
> if (apic_x2apic_mode(source)) {
> /* 16 bit dest mask, 16 bit cluster id */
> @@ -394,50 +449,27 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
> if (unlikely(!bitmap))
> return 0;
>
> - if (!is_power_of_2(bitmap))
> - /* multiple logical destinations, use slow path */
> + /*
> + * Use the slow path if more than one bit is set in the bitmap
> + * and KVM's optimized logical map is disabled to avoid kicking
> + * a vCPU multiple times. If the optimized map is disabled, a
> + * vCPU _may_ have multiple bits set in its logical ID, i.e.
> + * may have multiple entries in the logical table.
> + */
> + if (!is_power_of_2(bitmap) &&
> + !is_optimized_logical_map_enabled(kvm))
> return -EINVAL;


I hate to say it but there is another issue here, which I know about for a while
but haven't gotten yet to fix.

The issue is that AVIC's logical to physical map can't cover all the corner cases
that you discovered - it only supports the sane subset: for each cluster, and for each bit
in the mask, it has a physical apic id - so things like logical ids with multiple bits,
having same logical id for multiple vcpus and so on can't work.

In this case we need to either inhibit AVIC (I support this 100%), or clear
its logical ID map, so all logicical IPIs VM exit, and then they can be emulated.

I haven't studied it formally but the code which rebuilds the AVIC's logical ID map
starts at 'avic_handle_ldr_update'.


Besides that this patch makes sense, and it explains why you removed the logic which
was incorrectly checking for having a single bit in the bitmap, but I still
prefer to revert the patch as I explained there.

Best regards,
Maxim Levitsky

>
> - logid_index = cluster + __ffs(bitmap);
> -
> - if (!apic_x2apic_mode(source)) {
> - u32 *avic_logical_id_table =
> - page_address(kvm_svm->avic_logical_id_table_page);
> -
> - u32 logid_entry = avic_logical_id_table[logid_index];
> -
> - if (WARN_ON_ONCE(index != logid_index))
> - return -EINVAL;
> -
> - /* Nothing to do if the logical destination is invalid. */
> - if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
> - return 0;
> -
> - l1_physical_id = logid_entry &
> - AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
> - } else {
> - /*
> - * For x2APIC, the logical APIC ID is a read-only value
> - * that is derived from the x2APIC ID, thus the x2APIC
> - * ID can be found by reversing the calculation (done
> - * above). Note, bits 31:20 of the x2APIC ID are not
> - * propagated to the logical ID, but KVM limits the
> - * x2APIC ID limited to KVM_MAX_VCPU_IDS.
> - */
> - l1_physical_id = logid_index;
> - }
> + if (apic_x2apic_mode(source))
> + avic_logical_id_table = NULL;
> + else
> + avic_logical_id_table = page_address(kvm_svm->avic_logical_id_table_page);
> +
> + for_each_set_bit(i, &bitmap, 16)
> + avic_kick_vcpu_by_logical_id(kvm, avic_logical_id_table,
> + cluster + i, icrl);
> }
>
> - /*
> - * KVM inhibits AVIC if any vCPU ID diverges from the vCPUs APIC ID,
> - * i.e. APIC ID == vCPU ID. Once again, nothing to do if the target
> - * vCPU doesn't exist.
> - */
> - target_vcpu = kvm_get_vcpu_by_id(kvm, l1_physical_id);
> - if (unlikely(!target_vcpu))
> - return 0;
> -
> - avic_kick_vcpu(target_vcpu, icrl);
> return 0;
> }
>


2022-08-31 15:28:03

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 11/19] KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU

On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> > @@ -455,13 +461,8 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
> > */
> > kvm_for_each_vcpu(i, vcpu, kvm) {
> > if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
> > - dest, icrl & APIC_DEST_MASK)) {
> > - vcpu->arch.apic->irr_pending = true;
> > - svm_complete_interrupt_delivery(vcpu,
> > - icrl & APIC_MODE_MASK,
> > - icrl & APIC_INT_LEVELTRIG,
> > - icrl & APIC_VECTOR_MASK);
> > - }
> > + dest, icrl & APIC_DEST_MASK))
> > + avic_kick_vcpu(vcpu, icrl);
> > }
> > }
> >
>
> I don't know what I think about this, sometimes *minor* code duplication
> might actually be a good thing, as it is easier to read the code, but I don't
> have much against this as well.
>
> I am not sure if before or after this code is more readable.

I don't have a strong opinion either. I think I prefer having the helper, but
have no objection to leaving things as is. Originally I was thinking there was
going to be a third call site, but that didn't happen.

2022-08-31 16:47:11

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 02/19] KVM: SVM: Don't put/load AVIC when setting virtual APIC mode

On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> > @@ -1118,6 +1107,16 @@ void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
> > avic_deactivate_vmcb(svm);
> > }
> > vmcb_mark_dirty(vmcb, VMCB_AVIC);
> > +}
> > +
> > +void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
> > +{
> > + bool activated = kvm_vcpu_apicv_active(vcpu);
> > +
> > + if (!enable_apicv)
> > + return;
> > +
> > + avic_set_virtual_apic_mode(vcpu);
>
> This call is misleading - this will usually be called
> when avic mode didn't change - I think we need a better name for
> avic_set_virtual_apic_mode.

I don't disagree, but I'm having trouble coming up with a succinct alternative.
The helper primarily configures the VMCB, but the call to
avic_apicv_post_state_restore() makes avic_refresh_vmcb_controls() undesirable.

Maybe avic_refresh_virtual_apic_mode()?

2022-08-31 16:49:34

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 07/19] KVM: SVM: Drop buggy and redundant AVIC "single logical dest" check

On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> > Use the already-calculated-and-sanity-checked destination bitmap when
> > processing a fast AVIC kick in logical mode, and drop the logical path's
> > flawed logic. The intent of the check is to ensure the bitmap is a power
> > of two, whereas "icrh != (1 << avic)" effectively checks that the bitmap
> > is a power of two _and_ the target cluster is '0'.
> >
> > Note, the flawed check isn't a functional issue, it simply means that KVM
> > will go down the slow path if the target cluster is non-zero.
> >
> > Fixes: 8c9e639da435 ("KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible")
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/kvm/svm/avic.c | 10 +---------
> > 1 file changed, 1 insertion(+), 9 deletions(-)
> >
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index 3c333cd2e752..14f567550a1e 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -411,15 +411,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
> > * Instead, calculate physical ID from logical ID in ICRH.
> > */
> > int cluster = (icrh & 0xffff0000) >> 16;
> > - int apic = ffs(icrh & 0xffff) - 1;
> > -
> > - /*
> > - * If the x2APIC logical ID sub-field (i.e. icrh[15:0])
> > - * contains anything but a single bit, we cannot use the
> > - * fast path, because it is limited to a single vCPU.
> > - */
> > - if (apic < 0 || icrh != (1 << apic))
> > - return -EINVAL;
> > + int apic = ffs(bitmap) - 1;
> >
> > l1_physical_id = (cluster << 4) + apic;
> > }
>
> Oh, I didn't notice this bug. However isn't removing the check is wrong as well?
>
> What if we do have multiple bits set in the bitmap? After you remove this code,
> we will set IPI only to APIC which matches the 1st bit, no?

The common code (out of sight here) already ensures the bitmap has exactly one bit set:

if (unlikely(!bitmap))
/* guest bug: nobody to send the logical interrupt to */
return 0;

if (!is_power_of_2(bitmap))
/* multiple logical destinations, use slow path */
return -EINVAL;

logid_index = cluster + __ffs(bitmap);

2022-08-31 17:00:46

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 14/19] KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs

On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote:
> > - if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
> > - new->phys_map[xapic_id] = apic;
> > + if (kvm->arch.x2apic_format) {
> > + /* See also kvm_apic_match_physical_addr(). */
> > + if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
> > + x2apic_id <= new->max_apic_id)
> > + new->phys_map[x2apic_id] = apic;
> > +
> > + if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
> > + new->phys_map[xapic_id] = apic;
> > + } else {
> > + /*
> > + * Disable the optimized map if the physical APIC ID is
> > + * already mapped, i.e. is aliased to multiple vCPUs.
> > + * The optimized map requires a strict 1:1 mapping
> > + * between IDs and vCPUs.
> > + */
> > + if (apic_x2apic_mode(apic))
> > + physical_id = x2apic_id;
> > + else
> > + physical_id = xapic_id;
> > +
> > + if (new->phys_map[physical_id]) {
> > + kvfree(new);
> > + new = NULL;
> > + goto out;
> Why not to use the same KVM_APIC_MODE_XAPIC_FLAT | KVM_APIC_MODE_XAPIC_CLUSTER
> hack here?

The map's "mode" only covers logical mode (the cleanup patch renames "mode" to
"logical_mode" to make this more clear). There is no equivalent for dealing with
the physical IDs. Alternatively, a flag to say "physical map is disabled" could
be added, but KVM already has to cleanly handle a NULL map and in all likelihood
the logical map is also going to be disabled anyways.

Not to mention that APIC performance is unlikely to be a priority for any guest
that triggers this code :-)

2022-08-31 17:11:04

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 10/19] KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatch

On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> > Document that AVIC is inhibited if any vCPU's APIC ID diverges from its
> > vCPU ID, i.e. that there's no need to check for a destination match in
> > the AVIC kick fast path.
> >
> > Opportunistically tweak comments to remove "guest bug", as that suggests
> > KVM is punting on error handling, which is not the case. Targeting a
> > non-existent vCPU or no vCPUs _may_ be a guest software bug, but whether
> > or not it's a guest bug is irrelevant. Such behavior is architecturally
> > legal and thus needs to faithfully emulated by KVM (and it is).
>
> I don't want to pick a fight,

Please don't hesitate to push back, I would much rather discuss points of contention
than have an ongoing, silent battle through patches.

> but personally these things *are* guest bugs / improper usage of APIC, and I
> don't think it is wrong to document them as such.

If the guest is intentionally exercising an edge case, e.g. KUT or selftests, then
from the guest's perspective its code/behavior isn't buggy.

I completely agree that abusing/aliasing logical IDs is improper usage and arguably
out of spec, but the scenarios here are very much in spec, e.g. a bitmap of '0'
isn't expressly forbidden and both Intel and AMD specs very clearly state that
APICs discard interrupt messages if they are not the destination.

But that's somewhat beside the point, as I have no objection to documenting scenarios
that are out-of-spec or undefined. What I object to is documenting such scenarios as
"guest bugs". If the CPU/APIC/whatever behavior is undefined, then document it
as undefined. Saying "guest bug" doesn't help future readers understand what is
architecturally supposed to happen, whereas a comment like

/*
* Logical IDs are architecturally "required" to be unique, i.e. this is
* technically undefined behavior. Disable the optimized logical map so
* that KVM is consistent with itself, as the non-optimized matching
* logic with accept interrupts on all CPUs with the logical ID.
*/

anchors KVM's behavior to the specs and explains why KVM does XYZ in response to
undefined behavior.

I feel very strongly about "guest bug" because KVM has a history of disregarding
architectural correctness and using a "good enough" approach. Simply stating
"guest bug" makes it all the more difficult to differentiate between KVM handling
architecturally undefined behavior, versus KVM deviating from the spec because
someone decided that KVM's partial emulation/virtualziation was "good enough".

2022-08-31 17:20:27

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 06/19] KVM: SVM: Get x2APIC logical dest bitmap from ICRH[15:0], not ICHR[31:16]

On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 09:09 +0300, Maxim Levitsky wrote:
> > On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> > > When attempting a fast kick for x2AVIC, get the destination bitmap from
> > > ICR[15:0], not ICHR[31:16]. The upper 16 bits contain the cluster, the
> > > lower 16 bits hold the bitmap.
> > >
> > > Fixes: 603ccef42ce9 ("KVM: x86: SVM: fix avic_kick_target_vcpus_fast")
> > > Cc: Maxim Levitsky <[email protected]>
> > > Signed-off-by: Sean Christopherson <[email protected]>
> > > ---
> > > arch/x86/kvm/svm/avic.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > index 3ace0f2f52f0..3c333cd2e752 100644
> > > --- a/arch/x86/kvm/svm/avic.c
> > > +++ b/arch/x86/kvm/svm/avic.c
> > > @@ -368,7 +368,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
> > >
> > > if (apic_x2apic_mode(source)) {
> > > /* 16 bit dest mask, 16 bit cluster id */
> > > - bitmap = dest & 0xFFFF0000;
> > > + bitmap = dest & 0xFFFF;
> > > cluster = (dest >> 16) << 4;
> > > } else if (kvm_lapic_get_reg(source, APIC_DFR) == APIC_DFR_FLAT) {
> > > /* 8 bit dest mask*/
> >
> > I swear I have seen a patch from Suravee Suthikulpanit fixing this my mistake, I don't know why it was not
> > accepted upstream.
>
> This is the patch, which I guess got forgotten.
>
> https://www.spinics.net/lists/kernel/msg4417427.html

Ah, we just missed it, doubt there's anything more than that to the story.

> Since it is literaly the same patch, you can just add credit to Suravee Suthikulpanit.
>
> So with the credit added:
>
> Reviewed-by: Maxim Levitsky <[email protected]>

I'll grab Suravee's patch and added your review. Thanks!

2022-08-31 17:20:56

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 16/19] KVM: x86: Explicitly track all possibilities for APIC map's logical modes

On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 8209caffe3ab..3b6ef36b3963 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -168,7 +168,12 @@ static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)
> >
> > static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
> > u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
> > - switch (map->mode) {
> > + switch (map->logical_mode) {
> > + case KVM_APIC_MODE_SW_DISABLED:
> > + /* Arbitrarily use the flat map so that @cluster isn't NULL. */
> > + *cluster = map->xapic_flat_map;
> > + *mask = 0;
> > + return true;
> Could you explain why this is needed? I probably missed something.

If all vCPUs leave their APIC software disabled, or leave LDR=0, then the overall
mode will be KVM_APIC_MODE_SW_DISABLED. In this case, the effective "mask" is '0'
because there are no targets. And this returns %true because there are no targets,
i.e. there's no need to go down the slow path after kvm_apic_map_get_dest_lapic().

> > @@ -993,7 +1011,7 @@ static bool kvm_apic_is_broadcast_dest(struct kvm *kvm, struct kvm_lapic **src,
> > {
> > if (kvm->arch.x2apic_broadcast_quirk_disabled) {
> > if ((irq->dest_id == APIC_BROADCAST &&
> > - map->mode != KVM_APIC_MODE_X2APIC))
> > + map->logical_mode != KVM_APIC_MODE_X2APIC))
> > return true;
> > if (irq->dest_id == X2APIC_BROADCAST)
> > return true;
>
> To be honest I would put that patch first, and then do all the other patches,
> this way you would avoid all of the hacks they do and removed here.

I did it this way so that I could test this patch for correctness. Without the
bug fixes in place it's not really possible to verify this patch is 100% correct.

I completely agree that it would be a lot easier to read/understand/review if
this came first, but I'd rather not sacrifice the ability to easily test this patch.

2022-08-31 17:21:39

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 15/19] KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0

On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote:
> > Explicitly skip the optimized map setup if the vCPU's LDR is '0', i.e. if
> > the vCPU will never response to logical mode interrupts. KVM already
> > skips setup in this case, but relies on kvm_apic_map_get_logical_dest()
> > to generate mask==0. KVM still needs the mask=0 check as a non-zero LDR
> > can yield mask==0 depending on the mode, but explicitly handling the LDR
> > will make it simpler to clean up the logical mode tracking in the future.
>
> If I am not mistaken, the commit description is a bit misleading - in this
> case we just don't add the vCPU to the map since it is unreachable, but we do
> continue creating the map.

I'll reword. I intended that to mean "skip the optimized setup for the vCPU", but
even that is unnecessarily confusing.

2022-08-31 17:52:21

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 10/19] KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatch

On Wed, 2022-08-31 at 16:16 +0000, Sean Christopherson wrote:
> On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> > On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> > > Document that AVIC is inhibited if any vCPU's APIC ID diverges from its
> > > vCPU ID, i.e. that there's no need to check for a destination match in
> > > the AVIC kick fast path.
> > >
> > > Opportunistically tweak comments to remove "guest bug", as that suggests
> > > KVM is punting on error handling, which is not the case. Targeting a
> > > non-existent vCPU or no vCPUs _may_ be a guest software bug, but whether
> > > or not it's a guest bug is irrelevant. Such behavior is architecturally
> > > legal and thus needs to faithfully emulated by KVM (and it is).
> >
> > I don't want to pick a fight,
>
> Please don't hesitate to push back, I would much rather discuss points of contention
> than have an ongoing, silent battle through patches.
>
> > but personally these things *are* guest bugs / improper usage of APIC, and I
> > don't think it is wrong to document them as such.
>
> If the guest is intentionally exercising an edge case, e.g. KUT or selftests, then
> from the guest's perspective its code/behavior isn't buggy.
>
> I completely agree that abusing/aliasing logical IDs is improper usage and arguably
> out of spec, but the scenarios here are very much in spec, e.g. a bitmap of '0'
> isn't expressly forbidden and both Intel and AMD specs very clearly state that
> APICs discard interrupt messages if they are not the destination.
>
> But that's somewhat beside the point, as I have no objection to documenting scenarios
> that are out-of-spec or undefined. What I object to is documenting such scenarios as
> "guest bugs". If the CPU/APIC/whatever behavior is undefined, then document it
> as undefined. Saying "guest bug" doesn't help future readers understand what is
> architecturally supposed to happen, whereas a comment like
>
> /*
> * Logical IDs are architecturally "required" to be unique, i.e. this is
> * technically undefined behavior. Disable the optimized logical map so
> * that KVM is consistent with itself, as the non-optimized matching
> * logic with accept interrupts on all CPUs with the logical ID.
> */
>
> anchors KVM's behavior to the specs and explains why KVM does XYZ in response to
> undefined behavior.
>
> I feel very strongly about "guest bug" because KVM has a history of disregarding
> architectural correctness and using a "good enough" approach. Simply stating
> "guest bug" makes it all the more difficult to differentiate between KVM handling
> architecturally undefined behavior, versus KVM deviating from the spec because
> someone decided that KVM's partial emulation/virtualziation was "good enough".
>

All right, I agree with you.

Best regards,
Maxim Levitsky

2022-08-31 18:07:22

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 16/19] KVM: x86: Explicitly track all possibilities for APIC map's logical modes

On Wed, 2022-08-31 at 16:56 +0000, Sean Christopherson wrote:
> On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 8209caffe3ab..3b6ef36b3963 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -168,7 +168,12 @@ static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)
> > >
> > > static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
> > > u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
> > > - switch (map->mode) {
> > > + switch (map->logical_mode) {
> > > + case KVM_APIC_MODE_SW_DISABLED:
> > > + /* Arbitrarily use the flat map so that @cluster isn't NULL. */
> > > + *cluster = map->xapic_flat_map;
> > > + *mask = 0;
> > > + return true;
> > Could you explain why this is needed? I probably missed something.
>
> If all vCPUs leave their APIC software disabled, or leave LDR=0, then the overall
> mode will be KVM_APIC_MODE_SW_DISABLED. In this case, the effective "mask" is '0'
> because there are no targets. And this returns %true because there are no targets,
> i.e. there's no need to go down the slow path after kvm_apic_map_get_dest_lapic().

I guess this case doesn't need optimization (although maybe some OSes do leave all LDRs to 0,
if they don't use logical addressing, don't know)

Anyway thanks, that makes sense.

>
> > > @@ -993,7 +1011,7 @@ static bool kvm_apic_is_broadcast_dest(struct kvm *kvm, struct kvm_lapic **src,
> > > {
> > > if (kvm->arch.x2apic_broadcast_quirk_disabled) {
> > > if ((irq->dest_id == APIC_BROADCAST &&
> > > - map->mode != KVM_APIC_MODE_X2APIC))
> > > + map->logical_mode != KVM_APIC_MODE_X2APIC))
> > > return true;
> > > if (irq->dest_id == X2APIC_BROADCAST)
> > > return true;
> >
> > To be honest I would put that patch first, and then do all the other patches,
> > this way you would avoid all of the hacks they do and removed here.
>
> I did it this way so that I could test this patch for correctness. Without the
> bug fixes in place it's not really possible to verify this patch is 100% correct.
>
> I completely agree that it would be a lot easier to read/understand/review if
> this came first, but I'd rather not sacrifice the ability to easily test this patch.
>

I am not 100% sure about this, but I won't argue about it, let it be.

Best regards,
Maxim Levitsky

2022-08-31 18:27:26

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 14/19] KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs

On Wed, 2022-08-31 at 16:41 +0000, Sean Christopherson wrote:
> On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> > On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote:
> > > - if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
> > > - new->phys_map[xapic_id] = apic;
> > > + if (kvm->arch.x2apic_format) {
> > > + /* See also kvm_apic_match_physical_addr(). */
> > > + if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
> > > + x2apic_id <= new->max_apic_id)
> > > + new->phys_map[x2apic_id] = apic;
> > > +
> > > + if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
> > > + new->phys_map[xapic_id] = apic;
> > > + } else {
> > > + /*
> > > + * Disable the optimized map if the physical APIC ID is
> > > + * already mapped, i.e. is aliased to multiple vCPUs.
> > > + * The optimized map requires a strict 1:1 mapping
> > > + * between IDs and vCPUs.
> > > + */
> > > + if (apic_x2apic_mode(apic))
> > > + physical_id = x2apic_id;
> > > + else
> > > + physical_id = xapic_id;
> > > +
> > > + if (new->phys_map[physical_id]) {
> > > + kvfree(new);
> > > + new = NULL;
> > > + goto out;
> > Why not to use the same KVM_APIC_MODE_XAPIC_FLAT | KVM_APIC_MODE_XAPIC_CLUSTER
> > hack here?
>
> The map's "mode" only covers logical mode (the cleanup patch renames "mode" to
> "logical_mode" to make this more clear). There is no equivalent for dealing with
> the physical IDs. Alternatively, a flag to say "physical map is disabled" could
> be added, but KVM already has to cleanly handle a NULL map and in all likelihood
> the logical map is also going to be disabled anyways.
>
> Not to mention that APIC performance is unlikely to be a priority for any guest
> that triggers this code :-)
>

Thanks for the explanation, that makes sense!

Best regards,
Maxim Levitsky

2022-08-31 18:28:38

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 11/19] KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU

On Wed, 2022-08-31 at 15:08 +0000, Sean Christopherson wrote:
> On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> > On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> > > @@ -455,13 +461,8 @@ static void avic_kick_target_vcpus(struct kvm *kvm, struct kvm_lapic *source,
> > > */
> > > kvm_for_each_vcpu(i, vcpu, kvm) {
> > > if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
> > > - dest, icrl & APIC_DEST_MASK)) {
> > > - vcpu->arch.apic->irr_pending = true;
> > > - svm_complete_interrupt_delivery(vcpu,
> > > - icrl & APIC_MODE_MASK,
> > > - icrl & APIC_INT_LEVELTRIG,
> > > - icrl & APIC_VECTOR_MASK);
> > > - }
> > > + dest, icrl & APIC_DEST_MASK))
> > > + avic_kick_vcpu(vcpu, icrl);
> > > }
> > > }
> > >
> >
> > I don't know what I think about this, sometimes *minor* code duplication
> > might actually be a good thing, as it is easier to read the code, but I don't
> > have much against this as well.
> >
> > I am not sure if before or after this code is more readable.
>
> I don't have a strong opinion either. I think I prefer having the helper, but
> have no objection to leaving things as is. Originally I was thinking there was
> going to be a third call site, but that didn't happen.
>

Yep - when something is duplicated 3 times, it is really rare to not want to have a helper,
Anyway I don't have a strong opinion about this either.

I mostly was unsure about the fact that helper receives icrl and not icrh, kind of wierd,
but anyway let it be.

Best regards,
Maxim Levitsky

2022-08-31 18:34:55

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 17/19] KVM: SVM: Handle multiple logical targets in AVIC kick fastpath

On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote:
> > +static void avic_kick_vcpu_by_logical_id(struct kvm *kvm, u32 *avic_logical_id_table,
> > + u32 logid_index, u32 icrl)
> > +{
> > + u32 physical_id;
> > +
> > + if (!avic_logical_id_table) {
> ^ Typo, the '!' shoudn't be there.

Ouch. I suspect the tests pass because this just ends up routing events through
the slow path. I try to concoct a testcase to expose this bug.

> > +static bool is_optimized_logical_map_enabled(struct kvm *kvm)
> > +{
> > + struct kvm_apic_map *map;
> > + bool enabled;
> > +
> > + rcu_read_lock();
> > + map = rcu_dereference(kvm->arch.apic_map);
> > + enabled = map && map->logical_mode != KVM_APIC_MODE_MAP_DISABLED;
> > + rcu_read_unlock();
> > + return enabled;
> > +}
>
> This function doesn't belong to avic, it should be in common KVM code.

I'll move it. I'm not expecting any additional users, but I agree it belongs
elsewhere. Actually, might be a moot point (see below).

> > @@ -394,50 +449,27 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
> > if (unlikely(!bitmap))
> > return 0;
> >
> > - if (!is_power_of_2(bitmap))
> > - /* multiple logical destinations, use slow path */
> > + /*
> > + * Use the slow path if more than one bit is set in the bitmap
> > + * and KVM's optimized logical map is disabled to avoid kicking
> > + * a vCPU multiple times. If the optimized map is disabled, a
> > + * vCPU _may_ have multiple bits set in its logical ID, i.e.
> > + * may have multiple entries in the logical table.
> > + */
> > + if (!is_power_of_2(bitmap) &&
> > + !is_optimized_logical_map_enabled(kvm))
> > return -EINVAL;
>
>
> I hate to say it but there is another issue here, which I know about for a while
> but haven't gotten yet to fix.
>
> The issue is that AVIC's logical to physical map can't cover all the corner cases
> that you discovered - it only supports the sane subset: for each cluster, and for each bit
> in the mask, it has a physical apic id - so things like logical ids with multiple bits,
> having same logical id for multiple vcpus and so on can't work.
>
> In this case we need to either inhibit AVIC (I support this 100%),

I like the idea of inhibiting.

> or clear its logical ID map, so all logicical IPIs VM exit, and then they
> can be emulated.
>
> I haven't studied it formally but the code which rebuilds the AVIC's logical ID map
> starts at 'avic_handle_ldr_update'.

I suspected there are issues here, but the new tests passed (somewhat surprisingly)
so I stopped trying to decipher the AVIC LDR handling.

Eww. And the VM-Exit trap logic is broken too. If the guest updates and disables
its LDR, SVM returns immediately and doesn't call into common APIC code, i.e. doesn't
recalc the optimized map. E.g. if the guest clears its LDR, the optimized map will
be left as is and the vCPU will receive interrupts using its old LDR.

case APIC_LDR:
if (avic_handle_ldr_update(vcpu))
return 0;
break;

Rather than handling this purely in AVIC code, what if we a key off of
the optimized map being enabled? E.g. drop the return from avic_handle_ldr_update()
and in the kvm_recalculate_apic_map() do:

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 3b6ef36b3963..6e188010b614 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -364,6 +364,11 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
cluster[ldr] = apic;
}
out:
+ if (!new || new->logical_mode == KVM_APIC_MODE_MAP_DISABLED)
+ kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_LOGICAL_MAP_DISABLED);
+ else
+ kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_LOGICAL_MAP_DISABLED);
+
old = rcu_dereference_protected(kvm->arch.apic_map,
lockdep_is_held(&kvm->arch.apic_map_lock));
rcu_assign_pointer(kvm->arch.apic_map, new);

2022-08-31 18:35:05

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 06/19] KVM: SVM: Get x2APIC logical dest bitmap from ICRH[15:0], not ICHR[31:16]

On Wed, 2022-08-31 at 16:35 +0000, Sean Christopherson wrote:
> On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> > On Wed, 2022-08-31 at 09:09 +0300, Maxim Levitsky wrote:
> > > On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> > > > When attempting a fast kick for x2AVIC, get the destination bitmap from
> > > > ICR[15:0], not ICHR[31:16]. The upper 16 bits contain the cluster, the
> > > > lower 16 bits hold the bitmap.
> > > >
> > > > Fixes: 603ccef42ce9 ("KVM: x86: SVM: fix avic_kick_target_vcpus_fast")
> > > > Cc: Maxim Levitsky <[email protected]>
> > > > Signed-off-by: Sean Christopherson <[email protected]>
> > > > ---
> > > > arch/x86/kvm/svm/avic.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > > index 3ace0f2f52f0..3c333cd2e752 100644
> > > > --- a/arch/x86/kvm/svm/avic.c
> > > > +++ b/arch/x86/kvm/svm/avic.c
> > > > @@ -368,7 +368,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
> > > >
> > > > if (apic_x2apic_mode(source)) {
> > > > /* 16 bit dest mask, 16 bit cluster id */
> > > > - bitmap = dest & 0xFFFF0000;
> > > > + bitmap = dest & 0xFFFF;
> > > > cluster = (dest >> 16) << 4;
> > > > } else if (kvm_lapic_get_reg(source, APIC_DFR) == APIC_DFR_FLAT) {
> > > > /* 8 bit dest mask*/
> > >
> > > I swear I have seen a patch from Suravee Suthikulpanit fixing this my mistake, I don't know why it was not
> > > accepted upstream.
> >
> > This is the patch, which I guess got forgotten.
> >
> > https://www.spinics.net/lists/kernel/msg4417427.html
>
> Ah, we just missed it, doubt there's anything more than that to the story.

100% sure about it.

BTW there another minor bug I need to fix which was pointed to me by Suravee Suthikulpanit,
just so that I don't forget about it:

My code which inhibits APIC when APIC_ID != vcpu_id has a bug in regard that
when we have more that 255 vCPUs, this condition becames always true, but it is
to some extent wrong to inhibit the AVIC always in this case - there is a use case
when the guest uses only 255 vCPUs, then AVIC will work.

The relaxed condition for inhibit should be that APIC_ID == (vcpu_id & 0xFF), and it AVIC
is actually used on vCPU > 255, then it will be inhibited ( I need to check if the code
for this exists).

Best regards,
Maxim Levitsky



>
> > Since it is literaly the same patch, you can just add credit to Suravee Suthikulpanit.
> >
> > So with the credit added:
> >
> > Reviewed-by: Maxim Levitsky <[email protected]>
>
> I'll grab Suravee's patch and added your review. Thanks!
>


2022-08-31 19:01:35

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 16/19] KVM: x86: Explicitly track all possibilities for APIC map's logical modes

On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote:
> Track all possibilities for the optimized APIC map's logical modes
> instead of overloading the pseudo-bitmap and treating any "unknown" value
> as "invalid".
>
> As documented by the now-stale comment above the mode values, the values
> did have meaning when the optimized map was originally added. That
> dependent logical was removed by commit e45115b62f9a ("KVM: x86: use
> physical LAPIC array for logical x2APIC"), but the obfuscated behavior
> and its comment were left behind.
>
> Opportunistically rename "mode" to "logical_mode", partly to make it
> clear that the "disabled" case applies only to the logical map, but also
> to prove that there is no lurking code that expects "mode" to be a bitmap.
>
> Functionally, this is a glorified nop.
>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 21 ++++++++++--------
> arch/x86/kvm/lapic.c | 38 ++++++++++++++++++++++++---------
> 2 files changed, 40 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1f51411f3112..0184e64ab555 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -955,19 +955,22 @@ struct kvm_arch_memory_slot {
> };
>
> /*
> - * We use as the mode the number of bits allocated in the LDR for the
> - * logical processor ID. It happens that these are all powers of two.
> - * This makes it is very easy to detect cases where the APICs are
> - * configured for multiple modes; in that case, we cannot use the map and
> - * hence cannot use kvm_irq_delivery_to_apic_fast either.
> + * Track the mode of the optimized logical map, as the rules for decoding the
> + * destination vary per mode. Enabling the optimized logical map requires all
> + * software-enabled local APIs to be in the same mode, each addressable APIC to
> + * be mapped to only one MDA, and each MDA to map to at most one APIC.
> */
> -#define KVM_APIC_MODE_XAPIC_CLUSTER 4
> -#define KVM_APIC_MODE_XAPIC_FLAT 8
> -#define KVM_APIC_MODE_X2APIC 16
> +enum kvm_apic_logical_mode {
> + KVM_APIC_MODE_SW_DISABLED,
> + KVM_APIC_MODE_XAPIC_CLUSTER,
> + KVM_APIC_MODE_XAPIC_FLAT,
> + KVM_APIC_MODE_X2APIC,
> + KVM_APIC_MODE_MAP_DISABLED,
> +};
>
> struct kvm_apic_map {
> struct rcu_head rcu;
> - u8 mode;
> + enum kvm_apic_logical_mode logical_mode;
> u32 max_apic_id;
> union {
> struct kvm_lapic *xapic_flat_map[8];
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 8209caffe3ab..3b6ef36b3963 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -168,7 +168,12 @@ static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)
>
> static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
> u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
> - switch (map->mode) {
> + switch (map->logical_mode) {
> + case KVM_APIC_MODE_SW_DISABLED:
> + /* Arbitrarily use the flat map so that @cluster isn't NULL. */
> + *cluster = map->xapic_flat_map;
> + *mask = 0;
> + return true;
> case KVM_APIC_MODE_X2APIC: {
> u32 offset = (dest_id >> 16) * 16;
> u32 max_apic_id = map->max_apic_id;
> @@ -193,8 +198,10 @@ static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
> *cluster = map->xapic_cluster_map[(dest_id >> 4) & 0xf];
> *mask = dest_id & 0xf;
> return true;
> + case KVM_APIC_MODE_MAP_DISABLED:
> + return false;
> default:
> - /* Not optimized. */
> + WARN_ON_ONCE(1);

BTW unless I am mistaken, this warning is guest triggerable, and thus as you say when
'panic_on_warn=1', this will panic the host kernel.


Best regards,
Maxim Levitsky


> return false;
> }
> }
> @@ -256,10 +263,12 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> goto out;
>
> new->max_apic_id = max_id;
> + new->logical_mode = KVM_APIC_MODE_SW_DISABLED;
>
> kvm_for_each_vcpu(i, vcpu, kvm) {
> struct kvm_lapic *apic = vcpu->arch.apic;
> struct kvm_lapic **cluster;
> + enum kvm_apic_logical_mode logical_mode;
> u32 x2apic_id, physical_id;
> u16 mask;
> u32 ldr;
> @@ -314,7 +323,8 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> new->phys_map[physical_id] = apic;
> }
>
> - if (!kvm_apic_sw_enabled(apic))
> + if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
> + !kvm_apic_sw_enabled(apic))
> continue;
>
> ldr = kvm_lapic_get_reg(apic, APIC_LDR);
> @@ -322,25 +332,33 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> continue;
>
> if (apic_x2apic_mode(apic)) {
> - new->mode |= KVM_APIC_MODE_X2APIC;
> + logical_mode = KVM_APIC_MODE_X2APIC;
> } else {
> ldr = GET_APIC_LOGICAL_ID(ldr);
> if (kvm_lapic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT)
> - new->mode |= KVM_APIC_MODE_XAPIC_FLAT;
> + logical_mode = KVM_APIC_MODE_XAPIC_FLAT;
> else
> - new->mode |= KVM_APIC_MODE_XAPIC_CLUSTER;
> + logical_mode = KVM_APIC_MODE_XAPIC_CLUSTER;
> }
> + if (new->logical_mode != KVM_APIC_MODE_SW_DISABLED &&
> + new->logical_mode != logical_mode) {
> + new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> + continue;
> + }
> + new->logical_mode = logical_mode;
>
> - if (!kvm_apic_map_get_logical_dest(new, ldr, &cluster, &mask))
> + if (WARN_ON_ONCE(!kvm_apic_map_get_logical_dest(new, ldr,
> + &cluster, &mask))) {
> + new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> continue;
> + }
>
> if (!mask)
> continue;
>
> ldr = ffs(mask) - 1;
> if (!is_power_of_2(mask) || cluster[ldr]) {
> - new->mode = KVM_APIC_MODE_XAPIC_FLAT |
> - KVM_APIC_MODE_XAPIC_CLUSTER;
> + new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
> continue;
> }
> cluster[ldr] = apic;
> @@ -993,7 +1011,7 @@ static bool kvm_apic_is_broadcast_dest(struct kvm *kvm, struct kvm_lapic **src,
> {
> if (kvm->arch.x2apic_broadcast_quirk_disabled) {
> if ((irq->dest_id == APIC_BROADCAST &&
> - map->mode != KVM_APIC_MODE_X2APIC))
> + map->logical_mode != KVM_APIC_MODE_X2APIC))
> return true;
> if (irq->dest_id == X2APIC_BROADCAST)
> return true;


2022-08-31 19:07:22

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 17/19] KVM: SVM: Handle multiple logical targets in AVIC kick fastpath

On Wed, 2022-08-31 at 18:19 +0000, Sean Christopherson wrote:
> On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> > On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote:
> > > +static void avic_kick_vcpu_by_logical_id(struct kvm *kvm, u32 *avic_logical_id_table,
> > > + u32 logid_index, u32 icrl)
> > > +{
> > > + u32 physical_id;
> > > +
> > > + if (!avic_logical_id_table) {
> > ^ Typo, the '!' shoudn't be there.
>
> Ouch. I suspect the tests pass because this just ends up routing events through
> the slow path. I try to concoct a testcase to expose this bug.
>
> > > +static bool is_optimized_logical_map_enabled(struct kvm *kvm)
> > > +{
> > > + struct kvm_apic_map *map;
> > > + bool enabled;
> > > +
> > > + rcu_read_lock();
> > > + map = rcu_dereference(kvm->arch.apic_map);
> > > + enabled = map && map->logical_mode != KVM_APIC_MODE_MAP_DISABLED;
> > > + rcu_read_unlock();
> > > + return enabled;
> > > +}
> >
> > This function doesn't belong to avic, it should be in common KVM code.
>
> I'll move it. I'm not expecting any additional users, but I agree it belongs
> elsewhere. Actually, might be a moot point (see below).
>
> > > @@ -394,50 +449,27 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
> > > if (unlikely(!bitmap))
> > > return 0;
> > >
> > > - if (!is_power_of_2(bitmap))
> > > - /* multiple logical destinations, use slow path */
> > > + /*
> > > + * Use the slow path if more than one bit is set in the bitmap
> > > + * and KVM's optimized logical map is disabled to avoid kicking
> > > + * a vCPU multiple times. If the optimized map is disabled, a
> > > + * vCPU _may_ have multiple bits set in its logical ID, i.e.
> > > + * may have multiple entries in the logical table.
> > > + */
> > > + if (!is_power_of_2(bitmap) &&
> > > + !is_optimized_logical_map_enabled(kvm))
> > > return -EINVAL;
> >
> > I hate to say it but there is another issue here, which I know about for a while
> > but haven't gotten yet to fix.
> >
> > The issue is that AVIC's logical to physical map can't cover all the corner cases
> > that you discovered - it only supports the sane subset: for each cluster, and for each bit
> > in the mask, it has a physical apic id - so things like logical ids with multiple bits,
> > having same logical id for multiple vcpus and so on can't work.
> >
> > In this case we need to either inhibit AVIC (I support this 100%),
>
> I like the idea of inhibiting.
>
> > or clear its logical ID map, so all logicical IPIs VM exit, and then they
> > can be emulated.
> >
> > I haven't studied it formally but the code which rebuilds the AVIC's logical ID map
> > starts at 'avic_handle_ldr_update'.
>
> I suspected there are issues here, but the new tests passed (somewhat surprisingly)
> so I stopped trying to decipher the AVIC LDR handling.
>
> Eww. And the VM-Exit trap logic is broken too. If the guest updates and disables
> its LDR, SVM returns immediately and doesn't call into common APIC code, i.e. doesn't
> recalc the optimized map. E.g. if the guest clears its LDR, the optimized map will
> be left as is and the vCPU will receive interrupts using its old LDR.
>
> case APIC_LDR:
> if (avic_handle_ldr_update(vcpu))
> return 0;
> break;
>
> Rather than handling this purely in AVIC code, what if we a key off of
> the optimized map being enabled? E.g. drop the return from avic_handle_ldr_update()
> and in the kvm_recalculate_apic_map() do:
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 3b6ef36b3963..6e188010b614 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -364,6 +364,11 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
> cluster[ldr] = apic;
> }
> out:
> + if (!new || new->logical_mode == KVM_APIC_MODE_MAP_DISABLED)
> + kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_LOGICAL_MAP_DISABLED);
> + else
> + kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_LOGICAL_MAP_DISABLED);
> +

This looks very good, it will even work on APICv, because the 'check_apicv_inhibit_reasons'
will not return true for this new reason (APICv IPIv I think doesn't deal with logical destination at all);

Best regards,
Maxim Levitsky

> old = rcu_dereference_protected(kvm->arch.apic_map,
> lockdep_is_held(&kvm->arch.apic_map_lock));
> rcu_assign_pointer(kvm->arch.apic_map, new);
>


2022-08-31 19:09:02

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 07/19] KVM: SVM: Drop buggy and redundant AVIC "single logical dest" check

On Wed, 2022-08-31 at 16:37 +0000, Sean Christopherson wrote:
> On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> > On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> > > Use the already-calculated-and-sanity-checked destination bitmap when
> > > processing a fast AVIC kick in logical mode, and drop the logical path's
> > > flawed logic. The intent of the check is to ensure the bitmap is a power
> > > of two, whereas "icrh != (1 << avic)" effectively checks that the bitmap
> > > is a power of two _and_ the target cluster is '0'.
> > >
> > > Note, the flawed check isn't a functional issue, it simply means that KVM
> > > will go down the slow path if the target cluster is non-zero.
> > >
> > > Fixes: 8c9e639da435 ("KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible")
> > > Signed-off-by: Sean Christopherson <[email protected]>
> > > ---
> > > arch/x86/kvm/svm/avic.c | 10 +---------
> > > 1 file changed, 1 insertion(+), 9 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > index 3c333cd2e752..14f567550a1e 100644
> > > --- a/arch/x86/kvm/svm/avic.c
> > > +++ b/arch/x86/kvm/svm/avic.c
> > > @@ -411,15 +411,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
> > > * Instead, calculate physical ID from logical ID in ICRH.
> > > */
> > > int cluster = (icrh & 0xffff0000) >> 16;
> > > - int apic = ffs(icrh & 0xffff) - 1;
> > > -
> > > - /*
> > > - * If the x2APIC logical ID sub-field (i.e. icrh[15:0])
> > > - * contains anything but a single bit, we cannot use the
> > > - * fast path, because it is limited to a single vCPU.
> > > - */
> > > - if (apic < 0 || icrh != (1 << apic))
> > > - return -EINVAL;
> > > + int apic = ffs(bitmap) - 1;
> > >
> > > l1_physical_id = (cluster << 4) + apic;
> > > }
> >
> > Oh, I didn't notice this bug. However isn't removing the check is wrong as well?
> >
> > What if we do have multiple bits set in the bitmap? After you remove this code,
> > we will set IPI only to APIC which matches the 1st bit, no?
>
> The common code (out of sight here) already ensures the bitmap has exactly one bit set:
>
> if (unlikely(!bitmap))
> /* guest bug: nobody to send the logical interrupt to */
> return 0;
>
> if (!is_power_of_2(bitmap))
> /* multiple logical destinations, use slow path */
> return -EINVAL;
>
> logid_index = cluster + __ffs(bitmap);
>
Ah right, but again because the patch that added x2apic logic after I already added should not
be added. I vote again to just revert it.

Best regards,
Maxim Levitsky

2022-08-31 19:57:45

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 16/19] KVM: x86: Explicitly track all possibilities for APIC map's logical modes

On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 00:35 +0000, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 8209caffe3ab..3b6ef36b3963 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -168,7 +168,12 @@ static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)
> >
> > static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
> > u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
> > - switch (map->mode) {
> > + switch (map->logical_mode) {
> > + case KVM_APIC_MODE_SW_DISABLED:
> > + /* Arbitrarily use the flat map so that @cluster isn't NULL. */
> > + *cluster = map->xapic_flat_map;
> > + *mask = 0;
> > + return true;
> > case KVM_APIC_MODE_X2APIC: {
> > u32 offset = (dest_id >> 16) * 16;
> > u32 max_apic_id = map->max_apic_id;
> > @@ -193,8 +198,10 @@ static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
> > *cluster = map->xapic_cluster_map[(dest_id >> 4) & 0xf];
> > *mask = dest_id & 0xf;
> > return true;
> > + case KVM_APIC_MODE_MAP_DISABLED:
> > + return false;
> > default:
> > - /* Not optimized. */
> > + WARN_ON_ONCE(1);
>
> BTW unless I am mistaken, this warning is guest triggerable, and thus as you
> say when 'panic_on_warn=1', this will panic the host kernel.

If it's guest triggerable then it's a bug in this patch. The "default" case was
reachable with the old approach of OR-ing in bits, but the intent of this patch
is to fully enumerate all values of map->logical_mode and make the "default" case
impossible.

And I don't think it's reachable. The case statements are:

case KVM_APIC_MODE_SW_DISABLED:
case KVM_APIC_MODE_X2APIC:
case KVM_APIC_MODE_XAPIC_FLAT:
case KVM_APIC_MODE_XAPIC_CLUSTER:
case KVM_APIC_MODE_MAP_DISABLED:
default:

which covers all of the possible enum values.

enum kvm_apic_logical_mode {
KVM_APIC_MODE_SW_DISABLED,
KVM_APIC_MODE_XAPIC_CLUSTER,
KVM_APIC_MODE_XAPIC_FLAT,
KVM_APIC_MODE_X2APIC,
KVM_APIC_MODE_MAP_DISABLED,
};

The map is explicitly initialized to KVM_APIC_MODE_SW_DISABLED (to avoid relying
on KVM_APIC_MODE_SW_DISABLED==0)

new->logical_mode = KVM_APIC_MODE_SW_DISABLED;

so unless I've missed a "logical_mode |= ..." somewhere, reaching "default" should
be impossible.

2022-09-01 03:39:50

by Li RongQing

[permalink] [raw]
Subject: RE: [PATCH 05/19] KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kick

> -----Original Message-----
> From: Sean Christopherson <[email protected]>
> Sent: Wednesday, August 31, 2022 8:35 AM
> To: Sean Christopherson <[email protected]>; Paolo Bonzini
> <[email protected]>
> Cc: [email protected]; [email protected]; Suravee Suthikulpanit
> <[email protected]>; Maxim Levitsky <[email protected]>;
> Li,Rongqing <[email protected]>
> Subject: [PATCH 05/19] KVM: SVM: Compute dest based on sender's x2APIC
> status for AVIC kick
>
> Compute the destination from ICRH using the sender's x2APIC status, not each
> (potential) target's x2APIC status.
>
> Fixes: c514d3a348ac ("KVM: SVM: Update avic_kick_target_vcpus to support
> 32-bit APIC ID")
> Cc: Li RongQing <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/svm/avic.c | 8 +-------
> 1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c index
> b59f8ee2671f..3ace0f2f52f0 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -441,6 +441,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm,
> struct kvm_lapic *source static void avic_kick_target_vcpus(struct kvm *kvm,
> struct kvm_lapic *source,
> u32 icrl, u32 icrh, u32 index)
> {
> + u32 dest = apic_x2apic_mode(source) ? icrh :
> +GET_XAPIC_DEST_FIELD(icrh);
> unsigned long i;
> struct kvm_vcpu *vcpu;
>
> @@ -456,13 +457,6 @@ static void avic_kick_target_vcpus(struct kvm *kvm,
> struct kvm_lapic *source,
> * since entered the guest will have processed pending IRQs at VMRUN.
> */
> kvm_for_each_vcpu(i, vcpu, kvm) {
> - u32 dest;
> -
> - if (apic_x2apic_mode(vcpu->arch.apic))
> - dest = icrh;
> - else
> - dest = GET_XAPIC_DEST_FIELD(icrh);
> -
> if (kvm_apic_match_dest(vcpu, source, icrl & APIC_SHORT_MASK,
> dest, icrl & APIC_DEST_MASK)) {
> vcpu->arch.apic->irr_pending = true;
> --
> 2.37.2.672.g94769d06f0-goog

Reviewed-by: Li RongQing <[email protected]>

Thanks.

-Li

2022-09-01 21:14:06

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 08/19] KVM: SVM: Remove redundant cluster calculation that also creates a shadow

On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> > Drop the redundant "cluster" calculation and its horrific shadowing in
> > the x2avic logical mode path. The "cluster" that is declared at an outer
> > scope is derived using the exact same calculation and has performed the
> > left-shift.
>
> Actually I think we should just revert the commit
> 'KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible'
>
>
> I know that the patch that intially introduced the the avic_kick_target_vcpus_fast had
> x2apic/x2avic code, and then it was split to avoid adding x2avic code before it was merged,
> resulting in this patch to add the x2apic specific code.
>
> But when I fixed most issues of avic_kick_target_vcpus_fast in my
> 'KVM: x86: SVM: fix avic_kick_target_vcpus_fast', I added back x2apic code because
> it was just natural to do since I had to calculate cluster/bitmap masks anyway.
>
> I expected this patch to be dropped because of this but I guess it was not noticed,
> or patches were merged in the wrong order.
>
> This is the reason of shadowing, duplicate calculations/etc.

Ooooh, I completely missed that x2AVIC was already supported. I saw the code, but
between the fixing the first bug and unwinding everything everything else I didn't
see that the end result ended up being a full revert.

So yeah, a full revert is definitely in order.

Thanks!

2022-09-16 19:33:56

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 16/19] KVM: x86: Explicitly track all possibilities for APIC map's logical modes

On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 16:56 +0000, Sean Christopherson wrote:
> > On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> > > > @@ -993,7 +1011,7 @@ static bool kvm_apic_is_broadcast_dest(struct kvm *kvm, struct kvm_lapic **src,
> > > > {
> > > > if (kvm->arch.x2apic_broadcast_quirk_disabled) {
> > > > if ((irq->dest_id == APIC_BROADCAST &&
> > > > - map->mode != KVM_APIC_MODE_X2APIC))
> > > > + map->logical_mode != KVM_APIC_MODE_X2APIC))
> > > > return true;
> > > > if (irq->dest_id == X2APIC_BROADCAST)
> > > > return true;
> > >
> > > To be honest I would put that patch first, and then do all the other patches,
> > > this way you would avoid all of the hacks they do and removed here.
> >
> > I did it this way so that I could test this patch for correctness. Without the
> > bug fixes in place it's not really possible to verify this patch is 100% correct.
> >
> > I completely agree that it would be a lot easier to read/understand/review if
> > this came first, but I'd rather not sacrifice the ability to easily test this patch.
> >
>
> I am not 100% sure about this, but I won't argue about it, let it be.

Whelp, so much for my argument. I'm going to bite the bullet and move this patch
first so that the fix for logical x2APIC mode[*] doesn't need to pile on the hacks.

[*] https://lore.kernel.org/all/[email protected]