2022-09-20 23:47:30

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 00/28] KVM: x86: AVIC and local APIC fixes+cleanups

TL;DR: KVM's AVIC and optimized APIC map code doesn't correctly handle
various edge cases that are architecturally legal(ish), but are unlikely
to occur in most real world scenarios.

I have tested this heavily with KUT, but I haven't booted Windows and
don't have access to x2AVIC, so additional testing would be much
appreciated.

v3:
- Collect reviews. [Paolo]
- Drop "partial" x2APIC inhibit and instead delete the memslot.
[Maxim, Suravee]
- Skip logical mode updates for x2APIC, which just reuses the
phys_map with some clever logic. [Suravee]
- Add a fix for "nodecode write" traps. [Alejandro]

v2:
- https://lore.kernel.org/all/[email protected]
- Collect reviews. [Li, Maxim]
- Disable only MMIO access when x2APIC is enabled (instead of disabling
all of AVIC). [Maxim]
- Inhibit AVIC when logical IDs are aliased. [Maxim]
- Tweak name of set_virtual_apic_mode() hook. [Maxim]
- Straight up revert logical ID fastpath mess. [Maxim]
- Reword changelog about skipping vCPU during logical setup. [Maxim]
- Fix LDR updates on AVIC. [Maxim?]
- Fix a nasty ISR caching bug.
- Flush TLB when activating AVIC.

v1: https://lore.kernel.org/all/[email protected]

Sean Christopherson (27):
KVM: x86: Blindly get current x2APIC reg value on "nodecode write"
traps
KVM: x86: Purge "highest ISR" cache when updating APICv state
KVM: SVM: Flush the "current" TLB when activating AVIC
KVM: SVM: Process ICR on AVIC IPI delivery failure due to invalid
target
KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to
32-bit ID
KVM: x86: Move APIC access page helper to common x86 code
KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled
KVM: SVM: Don't put/load AVIC when setting virtual APIC mode
KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" boolean
KVM: SVM: Compute dest based on sender's x2APIC status for AVIC kick
Revert "KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when
possible"
KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatch
KVM: SVM: Add helper to perform final AVIC "kick" of single vCPU
KVM: x86: Explicitly skip optimized logical map setup if vCPU's LDR==0
KVM: x86: Explicitly track all possibilities for APIC map's logical
modes
KVM: x86: Skip redundant x2APIC logical mode optimized cluster setup
KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs
KVM: x86: Disable APIC logical map if vCPUs are aliased in logical
mode
KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs
KVM: x86: Inhibit APICv/AVIC if the optimized physical map is disabled
KVM: SVM: Inhibit AVIC if vCPUs are aliased in logical mode
KVM: SVM: Always update local APIC on writes to logical dest register
KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad"
KVM: SVM: Require logical ID to be power-of-2 for AVIC entry
KVM: SVM: Handle multiple logical targets in AVIC kick fastpath
KVM: SVM: Ignore writes to Remote Read Data on AVIC write traps
Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on
a running vcpu"

Suravee Suthikulpanit (1):
KVM: SVM: Fix x2APIC Logical ID calculation for
avic_kick_target_vcpus_fast

Documentation/virt/kvm/x86/errata.rst | 11 +
arch/x86/include/asm/kvm_host.h | 51 +++-
arch/x86/kvm/lapic.c | 224 +++++++++++++---
arch/x86/kvm/lapic.h | 2 +
arch/x86/kvm/svm/avic.c | 363 ++++++++++++--------------
arch/x86/kvm/svm/svm.c | 4 +-
arch/x86/kvm/svm/svm.h | 11 +-
arch/x86/kvm/vmx/vmx.c | 36 +--
arch/x86/kvm/x86.c | 7 +
9 files changed, 426 insertions(+), 283 deletions(-)


base-commit: 372d07084593dc7a399bf9bee815711b1fb1bcf2
--
2.37.3.968.ga6b4b080e4-goog


2022-09-20 23:49:12

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 24/28] KVM: SVM: Update svm->ldr_reg cache even if LDR is "bad"

Update SVM's cache of the LDR even if the new value is "bad". Leaving
stale information in the cache can result in KVM missing updates and/or
invalidating the wrong entry, e.g. if avic_invalidate_logical_id_entry()
is triggered after a different vCPU has "claimed" the old LDR.

Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 2b640c73f447..4b6fc9d64f4d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -539,23 +539,24 @@ static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
return &logical_apic_id_table[index];
}

-static int avic_ldr_write(struct kvm_vcpu *vcpu, u8 g_physical_id, u32 ldr)
+static void avic_ldr_write(struct kvm_vcpu *vcpu, u8 g_physical_id, u32 ldr)
{
bool flat;
u32 *entry, new_entry;

+ if (!ldr)
+ return;
+
flat = kvm_lapic_get_reg(vcpu->arch.apic, APIC_DFR) == APIC_DFR_FLAT;
entry = avic_get_logical_id_entry(vcpu, ldr, flat);
if (!entry)
- return -EINVAL;
+ return;

new_entry = READ_ONCE(*entry);
new_entry &= ~AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
new_entry |= (g_physical_id & AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK);
new_entry |= AVIC_LOGICAL_ID_ENTRY_VALID_MASK;
WRITE_ONCE(*entry, new_entry);
-
- return 0;
}

static void avic_invalidate_logical_id_entry(struct kvm_vcpu *vcpu)
@@ -575,7 +576,6 @@ static void avic_invalidate_logical_id_entry(struct kvm_vcpu *vcpu)

static void avic_handle_ldr_update(struct kvm_vcpu *vcpu)
{
- int ret = 0;
struct vcpu_svm *svm = to_svm(vcpu);
u32 ldr = kvm_lapic_get_reg(vcpu->arch.apic, APIC_LDR);
u32 id = kvm_xapic_id(vcpu->arch.apic);
@@ -589,11 +589,8 @@ static void avic_handle_ldr_update(struct kvm_vcpu *vcpu)

avic_invalidate_logical_id_entry(vcpu);

- if (ldr)
- ret = avic_ldr_write(vcpu, id, ldr);
-
- if (!ret)
- svm->ldr_reg = ldr;
+ svm->ldr_reg = ldr;
+ avic_ldr_write(vcpu, id, ldr);
}

static void avic_handle_dfr_update(struct kvm_vcpu *vcpu)
--
2.37.3.968.ga6b4b080e4-goog

2022-09-20 23:50:22

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 08/28] KVM: SVM: Don't put/load AVIC when setting virtual APIC mode

Move the VMCB updates from avic_refresh_apicv_exec_ctrl() into
avic_set_virtual_apic_mode() and invert the dependency being said
functions to avoid calling avic_vcpu_{load,put}() and
avic_set_pi_irte_mode() when "only" setting the virtual APIC mode.

avic_set_virtual_apic_mode() is invoked from common x86 with preemption
enabled, which makes avic_vcpu_{load,put}() unhappy. Luckily, calling
those and updating IRTE stuff is unnecessary as the only reason
avic_set_virtual_apic_mode() is called is to handle transitions between
xAPIC and x2APIC that don't also toggle APICv activation. And if
activation doesn't change, there's no need to fiddle with the physical
APIC ID table or update IRTE.

The "full" refresh is guaranteed to be called if activation changes in
this case as the only call to the "set" path is:

kvm_vcpu_update_apicv(vcpu);
static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);

and kvm_vcpu_update_apicv() invokes the refresh if activation changes:

if (apic->apicv_active == activate)
goto out;

apic->apicv_active = activate;
kvm_apic_update_apicv(vcpu);
static_call(kvm_x86_refresh_apicv_exec_ctrl)(vcpu);

Rename the helper to reflect that it is also called during "refresh".

WARNING: CPU: 183 PID: 49186 at arch/x86/kvm/svm/avic.c:1081 avic_vcpu_put+0xde/0xf0 [kvm_amd]
CPU: 183 PID: 49186 Comm: stable Tainted: G O 6.0.0-smp--fcddbca45f0a-sink #34
Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 10.48.0 01/27/2022
RIP: 0010:avic_vcpu_put+0xde/0xf0 [kvm_amd]
avic_refresh_apicv_exec_ctrl+0x142/0x1c0 [kvm_amd]
avic_set_virtual_apic_mode+0x5a/0x70 [kvm_amd]
kvm_lapic_set_base+0x149/0x1a0 [kvm]
kvm_set_apic_base+0x8f/0xd0 [kvm]
kvm_set_msr_common+0xa3a/0xdc0 [kvm]
svm_set_msr+0x364/0x6b0 [kvm_amd]
__kvm_set_msr+0xb8/0x1c0 [kvm]
kvm_emulate_wrmsr+0x58/0x1d0 [kvm]
msr_interception+0x1c/0x30 [kvm_amd]
svm_invoke_exit_handler+0x31/0x100 [kvm_amd]
svm_handle_exit+0xfc/0x160 [kvm_amd]
vcpu_enter_guest+0x21bb/0x23e0 [kvm]
vcpu_run+0x92/0x450 [kvm]
kvm_arch_vcpu_ioctl_run+0x43e/0x6e0 [kvm]
kvm_vcpu_ioctl+0x559/0x620 [kvm]

Fixes: 05c4fe8c1bd9 ("KVM: SVM: Refresh AVIC configuration when changing APIC mode")
Cc: [email protected]
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 31 +++++++++++++++----------------
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/svm/svm.h | 2 +-
3 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 8f9426f21bbf..535e35edce1d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -720,18 +720,6 @@ void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
avic_handle_ldr_update(vcpu);
}

-void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu)
-{
- if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
- return;
-
- if (kvm_get_apic_mode(vcpu) == LAPIC_MODE_INVALID) {
- WARN_ONCE(true, "Invalid local APIC state (vcpu_id=%d)", vcpu->vcpu_id);
- return;
- }
- avic_refresh_apicv_exec_ctrl(vcpu);
-}
-
static int avic_set_pi_irte_mode(struct kvm_vcpu *vcpu, bool activate)
{
int ret = 0;
@@ -1074,17 +1062,18 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
}

-
-void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
+void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb01.ptr;
- bool activated = kvm_vcpu_apicv_active(vcpu);
+
+ if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
+ return;

if (!enable_apicv)
return;

- if (activated) {
+ if (kvm_vcpu_apicv_active(vcpu)) {
/**
* During AVIC temporary deactivation, guest could update
* APIC ID, DFR and LDR registers, which would not be trapped
@@ -1098,6 +1087,16 @@ void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
avic_deactivate_vmcb(svm);
}
vmcb_mark_dirty(vmcb, VMCB_AVIC);
+}
+
+void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
+{
+ bool activated = kvm_vcpu_apicv_active(vcpu);
+
+ if (!enable_apicv)
+ return;
+
+ avic_refresh_virtual_apic_mode(vcpu);

if (activated)
avic_vcpu_load(vcpu, vcpu->cpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f3813dbacb9f..2aa5069bafb2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4807,7 +4807,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.enable_nmi_window = svm_enable_nmi_window,
.enable_irq_window = svm_enable_irq_window,
.update_cr8_intercept = svm_update_cr8_intercept,
- .set_virtual_apic_mode = avic_set_virtual_apic_mode,
+ .set_virtual_apic_mode = avic_refresh_virtual_apic_mode,
.refresh_apicv_exec_ctrl = avic_refresh_apicv_exec_ctrl,
.check_apicv_inhibit_reasons = avic_check_apicv_inhibit_reasons,
.apicv_post_state_restore = avic_apicv_post_state_restore,
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 6a7686bf6900..7a95f50e80e7 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -646,7 +646,7 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu);
void avic_vcpu_unblocking(struct kvm_vcpu *vcpu);
void avic_ring_doorbell(struct kvm_vcpu *vcpu);
unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu);
-void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
+void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);


/* sev.c */
--
2.37.3.968.ga6b4b080e4-goog

2022-09-20 23:50:58

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 20/28] KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs

Apply KVM's hotplug hack if and only if userspace has enabled 32-bit IDs
for x2APIC. If 32-bit IDs are not enabled, disable the optimized map to
honor x86 architectural behavior if multiple vCPUs shared a physical APIC
ID. As called out in the changelog that added the hack, all CPUs whose
(possibly truncated) APIC ID matches the target are supposed to receive
the IPI.

KVM intentionally differs from real hardware, because real hardware
(Knights Landing) does just "x2apic_id & 0xff" to decide whether to
accept the interrupt in xAPIC mode and it can deliver one interrupt to
more than one physical destination, e.g. 0x123 to 0x123 and 0x23.

Applying the hack even when x2APIC is not fully enabled means KVM doesn't
correctly handle scenarios where the guest has aliased xAPIC IDs across
multiple vCPUs, as only the vCPU with the lowest vCPU ID will receive any
interrupts. It's extremely unlikely any real world guest aliase APIC IDs,
or even modifies APIC IDs, but KVM's behavior is arbitrary, e.g. the
lowest vCPU ID "wins" regardless of which vCPU is "aliasing" and which
vCPU is "normal".

Furthermore, the hack is _not_ guaranteed to work! The hack works if and
only if the optimized APIC map is successfully allocated. If the map
allocation fails (unlikely), KVM will fall back to its unoptimized
behavior, which _does_ honor the architectural behavior.

Pivot on 32-bit x2APIC IDs being enabled as that is required to take
advantage of the hotplug hack (see kvm_apic_state_fixup()), i.e. won't
break existing setups unless they are way, way off in the weeds.

And an entry in KVM's errata to document the hack. Alternatively, KVM
could provide an actual x2APIC quirk and document the hack that way, but
there's unlikely to ever be a use case for disabling the quirk. Go the
errata route to avoid having to validate a quirk no one cares about.

Fixes: 5bd5db385b3e ("KVM: x86: allow hotplug of VCPU with APIC ID over 0xff")
Signed-off-by: Sean Christopherson <[email protected]>
---
Documentation/virt/kvm/x86/errata.rst | 11 ++++++
arch/x86/kvm/lapic.c | 50 ++++++++++++++++++++++-----
2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/x86/errata.rst b/Documentation/virt/kvm/x86/errata.rst
index 410e0aa63493..49a05f24747b 100644
--- a/Documentation/virt/kvm/x86/errata.rst
+++ b/Documentation/virt/kvm/x86/errata.rst
@@ -37,3 +37,14 @@ Nested virtualization features
------------------------------

TBD
+
+x2APIC
+------
+When KVM_X2APIC_API_USE_32BIT_IDS is enabled, KVM activates a hack/quirk that
+allows sending events to a single vCPU using its x2APIC ID even if the target
+vCPU has legacy xAPIC enabled, e.g. to bring up hotplugged vCPUs via INIT-SIPI
+on VMs with > 255 vCPUs. A side effect of the quirk is that, if multiple vCPUs
+have the same physical APIC ID, KVM will deliver events targeting that APIC ID
+only to the vCPU with the lowest vCPU ID. If KVM_X2APIC_API_USE_32BIT_IDS is
+not enabled, KVM follows x86 architecture when processing interrupts (all vCPUs
+matching the target APIC ID receive the interrupt).
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e447278d1986..b344ab52556e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -274,10 +274,10 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
struct kvm_lapic *apic = vcpu->arch.apic;
struct kvm_lapic **cluster;
enum kvm_apic_logical_mode logical_mode;
+ u32 x2apic_id, physical_id;
u16 mask;
u32 ldr;
u8 xapic_id;
- u32 x2apic_id;

if (!kvm_apic_present(vcpu))
continue;
@@ -285,16 +285,48 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
xapic_id = kvm_xapic_id(apic);
x2apic_id = kvm_x2apic_id(apic);

- /* Hotplug hack: see kvm_apic_match_physical_addr(), ... */
- if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
- x2apic_id <= new->max_apic_id)
- new->phys_map[x2apic_id] = apic;
/*
- * ... xAPIC ID of VCPUs with APIC ID > 0xff will wrap-around,
- * prevent them from masking VCPUs with APIC ID <= 0xff.
+ * Apply KVM's hotplug hack if userspace has enable 32-bit APIC
+ * IDs. Allow sending events to vCPUs by their x2APIC ID even
+ * if the target vCPU is in legacy xAPIC mode, and silently
+ * ignore aliased xAPIC IDs (the x2APIC ID is truncated to 8
+ * bits, causing IDs > 0xff to wrap and collide).
+ *
+ * Honor the architectural (and KVM's non-optimized) behavior
+ * if userspace has not enabled 32-bit x2APIC IDs. Each APIC
+ * is supposed to process messages independently. If multiple
+ * vCPUs have the same effective APIC ID, e.g. due to the
+ * x2APIC wrap or because the guest manually modified its xAPIC
+ * IDs, events targeting that ID are supposed to be recognized
+ * by all vCPUs with said ID.
*/
- if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
- new->phys_map[xapic_id] = apic;
+ if (kvm->arch.x2apic_format) {
+ /* See also kvm_apic_match_physical_addr(). */
+ if ((apic_x2apic_mode(apic) || x2apic_id > 0xff) &&
+ x2apic_id <= new->max_apic_id)
+ new->phys_map[x2apic_id] = apic;
+
+ if (!apic_x2apic_mode(apic) && !new->phys_map[xapic_id])
+ new->phys_map[xapic_id] = apic;
+ } else {
+ /*
+ * Disable the optimized map if the physical APIC ID is
+ * already mapped, i.e. is aliased to multiple vCPUs.
+ * The optimized map requires a strict 1:1 mapping
+ * between IDs and vCPUs.
+ */
+ if (apic_x2apic_mode(apic))
+ physical_id = x2apic_id;
+ else
+ physical_id = xapic_id;
+
+ if (new->phys_map[physical_id]) {
+ kvfree(new);
+ new = NULL;
+ goto out;
+ }
+ new->phys_map[physical_id] = apic;
+ }

if (new->logical_mode == KVM_APIC_MODE_MAP_DISABLED ||
!kvm_apic_sw_enabled(apic))
--
2.37.3.968.ga6b4b080e4-goog

2022-09-20 23:51:19

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 25/28] KVM: SVM: Require logical ID to be power-of-2 for AVIC entry

Do not modify AVIC's logical ID table if the logical ID portion of the
LDR is not a power-of-2, i.e. if the LDR has multiple bits set. Taking
only the first bit means that KVM will fail to match MDAs that intersect
with "higher" bits in the "ID"

The "ID" acts as a bitmap, but is referred to as an ID because theres an
implicit, unenforced "requirement" that software only set one bit. This
edge case is arguably out-of-spec behavior, but KVM cleanly handles it
in all other cases, e.g. the optimized logical map (and AVIC!) is also
disabled in this scenario.

Refactor the code to consolidate the checks, and so that the code looks
more like avic_kick_target_vcpus_fast().

Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 4b6fc9d64f4d..a9e4e09f83fc 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -513,26 +513,26 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu)
static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u32 ldr, bool flat)
{
struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);
- int index;
u32 *logical_apic_id_table;
- int dlid = GET_APIC_LOGICAL_ID(ldr);
+ u32 cluster, index;

- if (!dlid)
- return NULL;
+ ldr = GET_APIC_LOGICAL_ID(ldr);

- if (flat) { /* flat */
- index = ffs(dlid) - 1;
- if (index > 7)
+ if (flat) {
+ cluster = 0;
+ } else {
+ cluster = (ldr >> 4) << 2;
+ if (cluster >= 0xf)
return NULL;
- } else { /* cluster */
- int cluster = (dlid & 0xf0) >> 4;
- int apic = ffs(dlid & 0x0f) - 1;
-
- if ((apic < 0) || (apic > 7) ||
- (cluster >= 0xf))
- return NULL;
- index = (cluster << 2) + apic;
+ ldr &= 0xf;
}
+ if (!ldr || !is_power_of_2(ldr))
+ return NULL;
+
+ index = __ffs(ldr);
+ if (WARN_ON_ONCE(index > 7))
+ return NULL;
+ index += (cluster << 2);

logical_apic_id_table = (u32 *) page_address(kvm_svm->avic_logical_id_table_page);

--
2.37.3.968.ga6b4b080e4-goog

2022-09-20 23:52:15

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 13/28] KVM: SVM: Document that vCPU ID == APIC ID in AVIC kick fastpatch

Document that AVIC is inhibited if any vCPU's APIC ID diverges from its
vCPU ID, i.e. that there's no need to check for a destination match in
the AVIC kick fast path.

Opportunistically tweak comments to remove "guest bug", as that suggests
KVM is punting on error handling, which is not the case. Targeting a
non-existent vCPU or no vCPUs _may_ be a guest software bug, but whether
or not it's a guest bug is irrelevant. Such behavior is architecturally
legal and thus needs to faithfully emulated by KVM (and it is).

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/avic.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 605c36569ddf..40a1ea21074d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -368,8 +368,8 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
cluster = (dest >> 4) << 2;
}

+ /* Nothing to do if there are no destinations in the cluster. */
if (unlikely(!bitmap))
- /* guest bug: nobody to send the logical interrupt to */
return 0;

if (!is_power_of_2(bitmap))
@@ -397,7 +397,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
if (WARN_ON_ONCE(index != logid_index))
return -EINVAL;

- /* guest bug: non existing/reserved logical destination */
+ /* Nothing to do if the logical destination is invalid. */
if (unlikely(!(logid_entry & AVIC_LOGICAL_ID_ENTRY_VALID_MASK)))
return 0;

@@ -406,9 +406,13 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source
}
}

+ /*
+ * KVM inhibits AVIC if any vCPU ID diverges from the vCPUs APIC ID,
+ * i.e. APIC ID == vCPU ID. Once again, nothing to do if the target
+ * vCPU doesn't exist.
+ */
target_vcpu = kvm_get_vcpu_by_id(kvm, l1_physical_id);
if (unlikely(!target_vcpu))
- /* guest bug: non existing vCPU is a target of this IPI*/
return 0;

target_vcpu->arch.apic->irr_pending = true;
--
2.37.3.968.ga6b4b080e4-goog

2022-09-20 23:52:50

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 09/28] KVM: SVM: Replace "avic_mode" enum with "x2avic_enabled" boolean

Replace the "avic_mode" enum with a single bool to track whether or not
x2AVIC is enabled. KVM already has "apicv_enabled" that tracks if any
flavor of AVIC is enabled, i.e. AVIC_MODE_NONE and AVIC_MODE_X1 are
redundant and unnecessary noise.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/avic.c | 46 +++++++++++++++++++----------------------
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/svm/svm.h | 9 +-------
3 files changed, 23 insertions(+), 34 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 535e35edce1d..84beef0edae3 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -53,7 +53,7 @@ static DEFINE_HASHTABLE(svm_vm_data_hash, SVM_VM_DATA_HASH_BITS);
static u32 next_vm_id = 0;
static bool next_vm_id_wrapped = 0;
static DEFINE_SPINLOCK(svm_vm_data_hash_lock);
-enum avic_modes avic_mode;
+bool x2avic_enabled;

/*
* This is a wrapper of struct amd_iommu_ir_data.
@@ -79,8 +79,7 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
* (deletes the memslot) if any vCPU has x2APIC enabled, thus enabling
* AVIC in hybrid mode activates only the doorbell mechanism.
*/
- if (apic_x2apic_mode(svm->vcpu.arch.apic) &&
- avic_mode == AVIC_MODE_X2) {
+ if (x2avic_enabled && apic_x2apic_mode(svm->vcpu.arch.apic)) {
vmcb->control.int_ctl |= X2APIC_MODE_MASK;
vmcb->control.avic_physical_id |= X2AVIC_MAX_PHYSICAL_ID;
/* Disabling MSR intercept for x2APIC registers */
@@ -247,8 +246,8 @@ static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu,
u64 *avic_physical_id_table;
struct kvm_svm *kvm_svm = to_kvm_svm(vcpu->kvm);

- if ((avic_mode == AVIC_MODE_X1 && index > AVIC_MAX_PHYSICAL_ID) ||
- (avic_mode == AVIC_MODE_X2 && index > X2AVIC_MAX_PHYSICAL_ID))
+ if ((!x2avic_enabled && index > AVIC_MAX_PHYSICAL_ID) ||
+ (index > X2AVIC_MAX_PHYSICAL_ID))
return NULL;

avic_physical_id_table = page_address(kvm_svm->avic_physical_id_table_page);
@@ -262,8 +261,8 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
int id = vcpu->vcpu_id;
struct vcpu_svm *svm = to_svm(vcpu);

- if ((avic_mode == AVIC_MODE_X1 && id > AVIC_MAX_PHYSICAL_ID) ||
- (avic_mode == AVIC_MODE_X2 && id > X2AVIC_MAX_PHYSICAL_ID))
+ if ((!x2avic_enabled && id > AVIC_MAX_PHYSICAL_ID) ||
+ (id > X2AVIC_MAX_PHYSICAL_ID))
return -EINVAL;

if (!vcpu->arch.apic->regs)
@@ -1067,10 +1066,7 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb01.ptr;

- if (!lapic_in_kernel(vcpu) || avic_mode == AVIC_MODE_NONE)
- return;
-
- if (!enable_apicv)
+ if (!lapic_in_kernel(vcpu) || !enable_apicv)
return;

if (kvm_vcpu_apicv_active(vcpu)) {
@@ -1146,32 +1142,32 @@ bool avic_hardware_setup(struct kvm_x86_ops *x86_ops)
if (!npt_enabled)
return false;

+ /* AVIC is a prerequisite for x2AVIC. */
+ if (!boot_cpu_has(X86_FEATURE_AVIC) && !force_avic) {
+ if (boot_cpu_has(X86_FEATURE_X2AVIC)) {
+ pr_warn(FW_BUG "Cannot support x2AVIC due to AVIC is disabled");
+ pr_warn(FW_BUG "Try enable AVIC using force_avic option");
+ }
+ return false;
+ }
+
if (boot_cpu_has(X86_FEATURE_AVIC)) {
- avic_mode = AVIC_MODE_X1;
pr_info("AVIC enabled\n");
} else if (force_avic) {
/*
* Some older systems does not advertise AVIC support.
* See Revision Guide for specific AMD processor for more detail.
*/
- avic_mode = AVIC_MODE_X1;
pr_warn("AVIC is not supported in CPUID but force enabled");
pr_warn("Your system might crash and burn");
}

/* AVIC is a prerequisite for x2AVIC. */
- if (boot_cpu_has(X86_FEATURE_X2AVIC)) {
- if (avic_mode == AVIC_MODE_X1) {
- avic_mode = AVIC_MODE_X2;
- pr_info("x2AVIC enabled\n");
- } else {
- pr_warn(FW_BUG "Cannot support x2AVIC due to AVIC is disabled");
- pr_warn(FW_BUG "Try enable AVIC using force_avic option");
- }
- }
+ x2avic_enabled = boot_cpu_has(X86_FEATURE_X2AVIC);
+ if (x2avic_enabled)
+ pr_info("x2AVIC enabled\n");

- if (avic_mode != AVIC_MODE_NONE)
- amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);
+ amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier);

- return !!avic_mode;
+ return true;
}
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 2aa5069bafb2..709f0b3e7a48 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -821,7 +821,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
if (intercept == svm->x2avic_msrs_intercepted)
return;

- if (avic_mode != AVIC_MODE_X2 ||
+ if (!x2avic_enabled ||
!apic_x2apic_mode(svm->vcpu.arch.apic))
return;

diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7a95f50e80e7..29c334a932c3 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -35,14 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
extern bool npt_enabled;
extern int vgif;
extern bool intercept_smi;
-
-enum avic_modes {
- AVIC_MODE_NONE = 0,
- AVIC_MODE_X1,
- AVIC_MODE_X2,
-};
-
-extern enum avic_modes avic_mode;
+extern bool x2avic_enabled;

/*
* Clean bits in VMCB.
--
2.37.3.968.ga6b4b080e4-goog

2022-09-20 23:53:00

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 07/28] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled

Free the APIC access page memslot if any vCPU enables x2APIC and SVM's
AVIC is enabled to prevent accesses to the virtual APIC on vCPUs with
x2APIC enabled. On AMD, due to its "hybrid" mode where AVIC is enabled
when x2APIC is enabled even without x2AVIC support, keeping the APIC
access page memslot results in the guest being able to access the virtual
APIC page as x2APIC is fully emulated by KVM. I.e. hardware isn't aware
that the guest is operating in x2APIC mode.

Intel doesn't suffer from the same issue as APICv has fully independent
VMCS controls for xAPIC vs. x2APIC virtualization. Technically, KVM
should provide bus error semantics and not memory semantics for the APIC
page when x2APIC is enabled, but KVM already provides memory semantics in
other scenarios, e.g. if APICv/AVIC is enabled and the APIC is hardware
disabled (via APIC_BASE MSR).

Reserve an inhibit bit so that common code can detect whether or not the
"x2APIC inhibit" applies, but use a dedicated flag to track the inhibit
so that it doesn't need to be stripped from apicv_inhibit_reasons (since
it's not a "full" inhibit).

Note, setting apic_access_memslot_inhibited without taking locks relies
on it being sticky, and also relies on apic_access_memslot_enabled being
set during vCPU creation (before kvm_vcpu_reset()). vCPUs can race to
set the inhibit and delete the memslot, i.e. can get false positives, but
can't false negatives as apic_access_memslot_enabled can't be toggle "on"
once any vCPU reaches kvm_lapic_set_base().

Opportunistically drop the "can" while updating avic_activate_vmcb()'s
comment, i.e. to state that KVM _does_ support the hybrid mode. Move
the "Note:" down a line to conform to preferred kernel/KVM multi-line
comment style.

Opportunistically update the apicv_update_lock comment, as it isn't
actually used to protect apic_access_memslot_enabled (it's protected by
slots_lock).

Fixes: 0e311d33bfbe ("KVM: SVM: Introduce hybrid-AVIC mode")
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 18 ++++++++++++++---
arch/x86/kvm/lapic.c | 34 +++++++++++++++++++++++++++++++--
arch/x86/kvm/lapic.h | 1 +
arch/x86/kvm/svm/avic.c | 15 ++++++++-------
arch/x86/kvm/x86.c | 7 +++++++
5 files changed, 63 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2c96c43c313a..6475c882b359 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1132,6 +1132,17 @@ enum kvm_apicv_inhibit {
* AVIC is disabled because SEV doesn't support it.
*/
APICV_INHIBIT_REASON_SEV,
+
+ /*
+ * Due to sharing page tables across vCPUs, the xAPIC memslot must be
+ * deleted if any vCPU has x2APIC enabled as SVM doesn't provide fully
+ * independent controls for AVIC vs. x2AVIC, and also because SVM
+ * supports a "hybrid" AVIC mode for CPUs that support AVIC but not
+ * x2AVIC. Note, this isn't a "full" inhibit and is tracked separately.
+ * AVIC can still be activated, but KVM must not create SPTEs for the
+ * APIC base. For simplicity, this is sticky.
+ */
+ APICV_INHIBIT_REASON_X2APIC,
};

struct kvm_arch {
@@ -1169,10 +1180,11 @@ struct kvm_arch {
struct kvm_apic_map __rcu *apic_map;
atomic_t apic_map_dirty;

- /* Protects apic_access_memslot_enabled and apicv_inhibit_reasons */
- struct rw_semaphore apicv_update_lock;
-
bool apic_access_memslot_enabled;
+ bool apic_access_memslot_inhibited;
+
+ /* Protects apicv_inhibit_reasons */
+ struct rw_semaphore apicv_update_lock;
unsigned long apicv_inhibit_reasons;

gpa_t wall_clock;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 99994d2470a2..70f00eda75b2 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2394,9 +2394,26 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
}
}

- if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
+ if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);

+ /*
+ * Mark the APIC memslot as inhibited if x2APIC is enabled and
+ * the x2APIC inhibit is required. The actual deletion of the
+ * memslot is handled by vcpu_run() as SRCU may or may not be
+ * held at this time, i.e. updating memslots isn't safe. Don't
+ * check apic_access_memslot_inhibited, this vCPU needs to
+ * ensure the memslot is deleted before re-entering the guest,
+ * i.e. needs to make the request even if the inhibit flag was
+ * already set by a different vCPU.
+ */
+ if (vcpu->kvm->arch.apic_access_memslot_enabled &&
+ static_call(kvm_x86_check_apicv_inhibit_reasons)(APICV_INHIBIT_REASON_X2APIC)) {
+ vcpu->kvm->arch.apic_access_memslot_inhibited = true;
+ kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
+ }
+ }
+
if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) {
kvm_vcpu_update_apicv(vcpu);
static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);
@@ -2440,7 +2457,8 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
int ret = 0;

mutex_lock(&kvm->slots_lock);
- if (kvm->arch.apic_access_memslot_enabled)
+ if (kvm->arch.apic_access_memslot_enabled ||
+ kvm->arch.apic_access_memslot_inhibited)
goto out;

hva = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
@@ -2468,6 +2486,18 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
}
EXPORT_SYMBOL_GPL(kvm_alloc_apic_access_page);

+void kvm_free_apic_access_page(struct kvm *kvm)
+{
+ mutex_lock(&kvm->slots_lock);
+
+ if (kvm->arch.apic_access_memslot_enabled) {
+ __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, 0, 0);
+ kvm->arch.apic_access_memslot_enabled = false;
+ }
+
+ mutex_unlock(&kvm->slots_lock);
+}
+
void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
{
struct kvm_lapic *apic = vcpu->arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 6d06397683d0..e2271ffa7ac0 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -112,6 +112,7 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
void kvm_apic_update_apicv(struct kvm_vcpu *vcpu);
int kvm_alloc_apic_access_page(struct kvm *kvm);
+void kvm_free_apic_access_page(struct kvm *kvm);

bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 0424a5e664bb..8f9426f21bbf 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -72,12 +72,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)

vmcb->control.int_ctl |= AVIC_ENABLE_MASK;

- /* Note:
- * KVM can support hybrid-AVIC mode, where KVM emulates x2APIC
- * MSR accesses, while interrupt injection to a running vCPU
- * can be achieved using AVIC doorbell. The AVIC hardware still
- * accelerate MMIO accesses, but this does not cause any harm
- * as the guest is not supposed to access xAPIC mmio when uses x2APIC.
+ /*
+ * Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
+ * accesses, while interrupt injection to a running vCPU can be
+ * achieved using AVIC doorbell. KVM disables the APIC access page
+ * (deletes the memslot) if any vCPU has x2APIC enabled, thus enabling
+ * AVIC in hybrid mode activates only the doorbell mechanism.
*/
if (apic_x2apic_mode(svm->vcpu.arch.apic) &&
avic_mode == AVIC_MODE_X2) {
@@ -987,7 +987,8 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
BIT(APICV_INHIBIT_REASON_SEV) |
BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
- BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED);
+ BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
+ BIT(APICV_INHIBIT_REASON_X2APIC);

return supported & BIT(reason);
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d7374d768296..aa5ab0c620de 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10705,6 +10705,13 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
break;
}

+ if (vcpu->kvm->arch.apic_access_memslot_inhibited &&
+ vcpu->kvm->arch.apic_access_memslot_enabled) {
+ kvm_vcpu_srcu_read_unlock(vcpu);
+ kvm_free_apic_access_page(vcpu->kvm);
+ kvm_vcpu_srcu_read_lock(vcpu);
+ }
+
if (__xfer_to_guest_mode_work_pending()) {
kvm_vcpu_srcu_read_unlock(vcpu);
r = xfer_to_guest_mode_handle_work(vcpu);
--
2.37.3.968.ga6b4b080e4-goog

2022-09-20 23:53:08

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 27/28] KVM: SVM: Ignore writes to Remote Read Data on AVIC write traps

Drop writes to APIC_RRR, a.k.a. Remote Read Data Register, on AVIC
unaccelerated write traps. The register is read-only and isn't emulated
by KVM. Sending the register through kvm_apic_write_nodecode() will
result in screaming when x2APIC is enabled due to the unexpected failure
to retrieve the MSR (KVM expects that only "legal" accesses will trap).

Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/avic.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 17e64b056e4e..953b1fd14b6d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -631,6 +631,9 @@ static int avic_unaccel_trap_write(struct kvm_vcpu *vcpu)
case APIC_DFR:
avic_handle_dfr_update(vcpu);
break;
+ case APIC_RRR:
+ /* Ignore writes to Read Remote Data, it's read-only. */
+ return 1;
default:
break;
}
--
2.37.3.968.ga6b4b080e4-goog

2022-09-20 23:53:12

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 02/28] KVM: x86: Purge "highest ISR" cache when updating APICv state

Purge the "highest ISR" cache when updating APICv state on a vCPU. The
cache must not be used when APICv is active as hardware may emulate EOIs
(and other operations) without exiting to KVM.

This fixes a bug where KVM will effectively block IRQs in perpetuity due
to the "highest ISR" never getting reset if APICv is activated on a vCPU
while an IRQ is in-service. Hardware emulates the EOI and KVM never gets
a chance to update its cache.

Fixes: b26a695a1d78 ("kvm: lapic: Introduce APICv update helper function")
Cc: [email protected]
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Reviewed-by: Paolo Bonzini <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/lapic.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 8004c4d0a8e5..adac6ca9b7dc 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2424,6 +2424,7 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
*/
apic->isr_count = count_vectors(apic->regs + APIC_ISR);
}
+ apic->highest_isr_cache = -1;
}
EXPORT_SYMBOL_GPL(kvm_apic_update_apicv);

@@ -2480,7 +2481,6 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
kvm_lapic_set_reg(apic, APIC_TMR + 0x10 * i, 0);
}
kvm_apic_update_apicv(vcpu);
- apic->highest_isr_cache = -1;
update_divide_count(apic);
atomic_set(&apic->lapic_timer.pending, 0);

@@ -2767,7 +2767,6 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
__start_apic_timer(apic, APIC_TMCCT);
kvm_lapic_set_reg(apic, APIC_TMCCT, 0);
kvm_apic_update_apicv(vcpu);
- apic->highest_isr_cache = -1;
if (apic->apicv_active) {
static_call_cond(kvm_x86_apicv_post_state_restore)(vcpu);
static_call_cond(kvm_x86_hwapic_irr_update)(vcpu, apic_find_highest_irr(apic));
--
2.37.3.968.ga6b4b080e4-goog

2022-09-20 23:53:19

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 23/28] KVM: SVM: Always update local APIC on writes to logical dest register

Update the vCPU's local (virtual) APIC on LDR writes even if the write
"fails". The APIC needs to recalc the optimized logical map even if the
LDR is invalid or zero, e.g. if the guest clears its LDR, the optimized
map will be left as is and the vCPU will receive interrupts using its
old LDR.

Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 27d5abc15a91..2b640c73f447 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -573,7 +573,7 @@ static void avic_invalidate_logical_id_entry(struct kvm_vcpu *vcpu)
clear_bit(AVIC_LOGICAL_ID_ENTRY_VALID_BIT, (unsigned long *)entry);
}

-static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)
+static void avic_handle_ldr_update(struct kvm_vcpu *vcpu)
{
int ret = 0;
struct vcpu_svm *svm = to_svm(vcpu);
@@ -582,10 +582,10 @@ static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)

/* AVIC does not support LDR update for x2APIC */
if (apic_x2apic_mode(vcpu->arch.apic))
- return 0;
+ return;

if (ldr == svm->ldr_reg)
- return 0;
+ return;

avic_invalidate_logical_id_entry(vcpu);

@@ -594,8 +594,6 @@ static int avic_handle_ldr_update(struct kvm_vcpu *vcpu)

if (!ret)
svm->ldr_reg = ldr;
-
- return ret;
}

static void avic_handle_dfr_update(struct kvm_vcpu *vcpu)
@@ -617,8 +615,7 @@ static int avic_unaccel_trap_write(struct kvm_vcpu *vcpu)

switch (offset) {
case APIC_LDR:
- if (avic_handle_ldr_update(vcpu))
- return 0;
+ avic_handle_ldr_update(vcpu);
break;
case APIC_DFR:
avic_handle_dfr_update(vcpu);
--
2.37.3.968.ga6b4b080e4-goog

2022-09-21 00:10:48

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 18/28] KVM: x86: Disable APIC logical map if logical ID covers multiple MDAs

Disable the optimized APIC logical map if a logical ID covers multiple
MDAs, i.e. if a vCPU has multiple bits set in its ID. In logical mode,
events match if "ID & MDA != 0", i.e. creating an entry for only the
first bit can cause interrupts to be missed.

Note, creating an entry for every bit is also wrong as KVM would generate
IPIs for every matching bit. It would be possible to teach KVM to play
nice with this edge case, but it is very much an edge case and probably
not used in any real world OS, i.e. it's not worth optimizing.

Fixes: 1e08ec4a130e ("KVM: optimize apic interrupt delivery")
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/lapic.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 7a39d7be4cc9..a12360fd4df6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -338,8 +338,14 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
continue;
}

- if (mask)
- cluster[ffs(mask) - 1] = apic;
+ if (!mask)
+ continue;
+
+ if (!is_power_of_2(mask)) {
+ new->logical_mode = KVM_APIC_MODE_MAP_DISABLED;
+ continue;
+ }
+ cluster[ffs(mask) - 1] = apic;
}
out:
old = rcu_dereference_protected(kvm->arch.apic_map,
--
2.37.3.968.ga6b4b080e4-goog

2022-09-21 00:11:12

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 21/28] KVM: x86: Inhibit APICv/AVIC if the optimized physical map is disabled

Inhibit APICv/AVIC if the optimized physical map is disabled so that KVM
KVM provides consistent APIC behavior if xAPIC IDs are aliased due to
vcpu_id being truncated and the x2APIC hotplug hack isn't enabled. If
the hotplug hack is disabled, events that are emulated by KVM will follow
architectural behavior (all matching vCPUs receive events, even if the
"match" is due to truncation), whereas APICv and AVIC will deliver events
only to the first matching vCPU, i.e. the vCPU that matches without
truncation.

Note, the "extra" inhibit is needed because KVM deliberately ignores
mismatches due to truncation when applying the APIC_ID_MODIFIED inhibit
so that large VMs (>255 vCPUs) can run with APICv/AVIC.

Fixes: TDB
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 6 ++++++
arch/x86/kvm/lapic.c | 13 ++++++++++++-
arch/x86/kvm/svm/avic.c | 1 +
arch/x86/kvm/vmx/vmx.c | 1 +
4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6570b5d728ef..594674eefe59 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1097,6 +1097,12 @@ enum kvm_apicv_inhibit {
*/
APICV_INHIBIT_REASON_BLOCKIRQ,

+ /*
+ * APICv is disabled because not all vCPUs have a 1:1 mapping between
+ * APIC ID and vCPU, _and_ KVM is not applying its x2APIC hotplug hack.
+ */
+ APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED,
+
/*
* For simplicity, the APIC acceleration is inhibited
* first time either APIC ID or APIC base are changed by the guest
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b344ab52556e..4db162b1f0b1 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -381,6 +381,16 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
cluster[ldr] = apic;
}
out:
+ /*
+ * The optimized map is effectively KVM's internal version of APICv,
+ * and all unwanted aliasing that results in disabling the optimized
+ * map also applies to APICv.
+ */
+ if (!new)
+ kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED);
+ else
+ kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED);
+
old = rcu_dereference_protected(kvm->arch.apic_map,
lockdep_is_held(&kvm->arch.apic_map_lock));
rcu_assign_pointer(kvm->arch.apic_map, new);
@@ -2150,7 +2160,8 @@ static void kvm_lapic_xapic_id_updated(struct kvm_lapic *apic)
/*
* Deliberately truncate the vCPU ID when detecting a modified APIC ID
* to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a 32-bit
- * value.
+ * value. If the wrap/truncation results in unwatned aliasing, APICv
+ * will be inhibited as part of updating KVM's optimized APIC maps.
*/
if (kvm_xapic_id(apic) == (u8)apic->vcpu->vcpu_id)
return;
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index dd0e41d454a7..2908adc79ea6 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -965,6 +965,7 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
BIT(APICV_INHIBIT_REASON_SEV) |
+ BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |
BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
BIT(APICV_INHIBIT_REASON_X2APIC);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b39095ef9bd7..0f9f8ae59f85 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7963,6 +7963,7 @@ static bool vmx_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
BIT(APICV_INHIBIT_REASON_ABSENT) |
BIT(APICV_INHIBIT_REASON_HYPERV) |
BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
+ BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |
BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED);

--
2.37.3.968.ga6b4b080e4-goog

2022-09-21 00:14:53

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 06/28] KVM: x86: Move APIC access page helper to common x86 code

Move the APIC access page allocation helper function to common x86 code,
the allocation routine is virtually identical between APICv (VMX) and
AVIC (SVM). Keep APICv's gfn_to_page() + put_page() sequence, which
verifies that a backing page can be allocated, i.e. that the system isn't
under heavy memory pressure. Forcing the backing page to be populated
isn't strictly necessary, but skipping the effective prefetch only delays
the inevitable.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/lapic.c | 35 +++++++++++++++++++++++++++++++++++
arch/x86/kvm/lapic.h | 1 +
arch/x86/kvm/svm/avic.c | 41 +++++++----------------------------------
arch/x86/kvm/vmx/vmx.c | 35 +----------------------------------
4 files changed, 44 insertions(+), 68 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a02defa3f7b5..99994d2470a2 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2433,6 +2433,41 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_apic_update_apicv);

+int kvm_alloc_apic_access_page(struct kvm *kvm)
+{
+ struct page *page;
+ void __user *hva;
+ int ret = 0;
+
+ mutex_lock(&kvm->slots_lock);
+ if (kvm->arch.apic_access_memslot_enabled)
+ goto out;
+
+ hva = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
+ APIC_DEFAULT_PHYS_BASE, PAGE_SIZE);
+ if (IS_ERR(hva)) {
+ ret = PTR_ERR(hva);
+ goto out;
+ }
+
+ page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+ if (is_error_page(page)) {
+ ret = -EFAULT;
+ goto out;
+ }
+
+ /*
+ * Do not pin the page in memory, so that memory hot-unplug
+ * is able to migrate it.
+ */
+ put_page(page);
+ kvm->arch.apic_access_memslot_enabled = true;
+out:
+ mutex_unlock(&kvm->slots_lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_alloc_apic_access_page);
+
void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
{
struct kvm_lapic *apic = vcpu->arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 117a46df5cc1..6d06397683d0 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -111,6 +111,7 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
struct dest_map *dest_map);
int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
void kvm_apic_update_apicv(struct kvm_vcpu *vcpu);
+int kvm_alloc_apic_access_page(struct kvm *kvm);

bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 3b2c88b168ba..0424a5e664bb 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -256,39 +256,6 @@ static u64 *avic_get_physical_id_entry(struct kvm_vcpu *vcpu,
return &avic_physical_id_table[index];
}

-/*
- * Note:
- * AVIC hardware walks the nested page table to check permissions,
- * but does not use the SPA address specified in the leaf page
- * table entry since it uses address in the AVIC_BACKING_PAGE pointer
- * field of the VMCB. Therefore, we set up the
- * APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (4KB) here.
- */
-static int avic_alloc_access_page(struct kvm *kvm)
-{
- void __user *ret;
- int r = 0;
-
- mutex_lock(&kvm->slots_lock);
-
- if (kvm->arch.apic_access_memslot_enabled)
- goto out;
-
- ret = __x86_set_memory_region(kvm,
- APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
- APIC_DEFAULT_PHYS_BASE,
- PAGE_SIZE);
- if (IS_ERR(ret)) {
- r = PTR_ERR(ret);
- goto out;
- }
-
- kvm->arch.apic_access_memslot_enabled = true;
-out:
- mutex_unlock(&kvm->slots_lock);
- return r;
-}
-
static int avic_init_backing_page(struct kvm_vcpu *vcpu)
{
u64 *entry, new_entry;
@@ -305,7 +272,13 @@ static int avic_init_backing_page(struct kvm_vcpu *vcpu)
if (kvm_apicv_activated(vcpu->kvm)) {
int ret;

- ret = avic_alloc_access_page(vcpu->kvm);
+ /*
+ * Note, AVIC hardware walks the nested page table to check
+ * permissions, but does not use the SPA address specified in
+ * the leaf SPTE since it uses address in the AVIC_BACKING_PAGE
+ * pointer field of the VMCB.
+ */
+ ret = kvm_alloc_apic_access_page(vcpu->kvm);
if (ret)
return ret;
}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c9b49a09e6b5..b39095ef9bd7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3818,39 +3818,6 @@ static void seg_setup(int seg)
vmcs_write32(sf->ar_bytes, ar);
}

-static int alloc_apic_access_page(struct kvm *kvm)
-{
- struct page *page;
- void __user *hva;
- int ret = 0;
-
- mutex_lock(&kvm->slots_lock);
- if (kvm->arch.apic_access_memslot_enabled)
- goto out;
- hva = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
- APIC_DEFAULT_PHYS_BASE, PAGE_SIZE);
- if (IS_ERR(hva)) {
- ret = PTR_ERR(hva);
- goto out;
- }
-
- page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
- if (is_error_page(page)) {
- ret = -EFAULT;
- goto out;
- }
-
- /*
- * Do not pin the page in memory, so that memory hot-unplug
- * is able to migrate it.
- */
- put_page(page);
- kvm->arch.apic_access_memslot_enabled = true;
-out:
- mutex_unlock(&kvm->slots_lock);
- return ret;
-}
-
int allocate_vpid(void)
{
int vpid;
@@ -7356,7 +7323,7 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
vmx->loaded_vmcs = &vmx->vmcs01;

if (cpu_need_virtualize_apic_accesses(vcpu)) {
- err = alloc_apic_access_page(vcpu->kvm);
+ err = kvm_alloc_apic_access_page(vcpu->kvm);
if (err)
goto free_vmcs;
}
--
2.37.3.968.ga6b4b080e4-goog

2022-09-21 00:15:17

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 22/28] KVM: SVM: Inhibit AVIC if vCPUs are aliased in logical mode

Inhibit SVM's AVIC if multiple vCPUs are aliased to the same logical ID.
Architecturally, all CPUs whose logical ID matches the MDA are supposed
to receive the interrupt; overwriting existing entries in AVIC's
logical=>physical map can result in missed IPIs.

Fixes: 18f40c53e10f ("svm: Add VMEXIT handlers for AVIC")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 6 ++++++
arch/x86/kvm/lapic.c | 5 +++++
arch/x86/kvm/svm/avic.c | 3 ++-
3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 594674eefe59..32c0bca052e3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1152,6 +1152,12 @@ enum kvm_apicv_inhibit {
* APIC base. For simplicity, this is sticky.
*/
APICV_INHIBIT_REASON_X2APIC,
+
+ /*
+ * AVIC is disabled because not all vCPUs with a valid LDR have a 1:1
+ * mapping between logical ID and vCPU.
+ */
+ APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED,
};

struct kvm_arch {
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 4db162b1f0b1..804d529d9bfb 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -391,6 +391,11 @@ void kvm_recalculate_apic_map(struct kvm *kvm)
else
kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED);

+ if (!new || new->logical_mode == KVM_APIC_MODE_MAP_DISABLED)
+ kvm_set_apicv_inhibit(kvm, APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
+ else
+ kvm_clear_apicv_inhibit(kvm, APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);
+
old = rcu_dereference_protected(kvm->arch.apic_map,
lockdep_is_held(&kvm->arch.apic_map_lock));
rcu_assign_pointer(kvm->arch.apic_map, new);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 2908adc79ea6..27d5abc15a91 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -968,7 +968,8 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
BIT(APICV_INHIBIT_REASON_PHYSICAL_ID_ALIASED) |
BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
- BIT(APICV_INHIBIT_REASON_X2APIC);
+ BIT(APICV_INHIBIT_REASON_X2APIC) |
+ BIT(APICV_INHIBIT_REASON_LOGICAL_ID_ALIASED);

return supported & BIT(reason);
}
--
2.37.3.968.ga6b4b080e4-goog

2022-09-21 00:17:02

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 11/28] KVM: SVM: Fix x2APIC Logical ID calculation for avic_kick_target_vcpus_fast

From: Suravee Suthikulpanit <[email protected]>

For X2APIC ID in cluster mode, the logical ID is bit [15:0].

Fixes: 603ccef42ce9 ("KVM: x86: SVM: fix avic_kick_target_vcpus_fast")
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index e9aab8ecce83..e35e9363e7ff 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -356,7 +356,7 @@ static int avic_kick_target_vcpus_fast(struct kvm *kvm, struct kvm_lapic *source

if (apic_x2apic_mode(source)) {
/* 16 bit dest mask, 16 bit cluster id */
- bitmap = dest & 0xFFFF0000;
+ bitmap = dest & 0xFFFF;
cluster = (dest >> 16) << 4;
} else if (kvm_lapic_get_reg(source, APIC_DFR) == APIC_DFR_FLAT) {
/* 8 bit dest mask*/
--
2.37.3.968.ga6b4b080e4-goog

2022-09-21 00:51:10

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH v3 28/28] Revert "KVM: SVM: Do not throw warning when calling avic_vcpu_load on a running vcpu"

Turns out that some warnings exist for good reasons. Restore the warning
in avic_vcpu_load() that guards against calling avic_vcpu_load() on a
running vCPU now that KVM avoids doing so when switching between x2APIC
and xAPIC. The entire point of the WARN is to highlight that KVM should
not be reloading an AVIC.

Opportunistically convert the WARN_ON() to WARN_ON_ONCE() to avoid
spamming the kernel if it does fire.

This reverts commit c0caeee65af3944b7b8abbf566e7cc1fae15c775.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/avic.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index 953b1fd14b6d..35b0ef877e53 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -1038,6 +1038,7 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
return;

entry = READ_ONCE(*(svm->avic_physical_id_cache));
+ WARN_ON_ONCE(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK);

entry &= ~AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK;
entry |= (h_physical_id & AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK);
--
2.37.3.968.ga6b4b080e4-goog

2022-09-23 10:51:38

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v3 07/28] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled

On Tue, 2022-09-20 at 23:31 +0000, Sean Christopherson wrote:
> Free the APIC access page memslot if any vCPU enables x2APIC and SVM's
> AVIC is enabled to prevent accesses to the virtual APIC on vCPUs with
> x2APIC enabled. On AMD, due to its "hybrid" mode where AVIC is enabled
> when x2APIC is enabled even without x2AVIC support, keeping the APIC
> access page memslot results in the guest being able to access the virtual
> APIC page as x2APIC is fully emulated by KVM. I.e. hardware isn't aware
> that the guest is operating in x2APIC mode.
>
> Intel doesn't suffer from the same issue as APICv has fully independent
> VMCS controls for xAPIC vs. x2APIC virtualization. Technically, KVM
> should provide bus error semantics and not memory semantics for the APIC
> page when x2APIC is enabled, but KVM already provides memory semantics in
> other scenarios, e.g. if APICv/AVIC is enabled and the APIC is hardware
> disabled (via APIC_BASE MSR).
>
> Reserve an inhibit bit so that common code can detect whether or not the
> "x2APIC inhibit" applies, but use a dedicated flag to track the inhibit
> so that it doesn't need to be stripped from apicv_inhibit_reasons (since
> it's not a "full" inhibit).
>
> Note, setting apic_access_memslot_inhibited without taking locks relies
> on it being sticky, and also relies on apic_access_memslot_enabled being
> set during vCPU creation (before kvm_vcpu_reset()). vCPUs can race to
> set the inhibit and delete the memslot, i.e. can get false positives, but
> can't false negatives as apic_access_memslot_enabled can't be toggle "on"
> once any vCPU reaches kvm_lapic_set_base().
>
> Opportunistically drop the "can" while updating avic_activate_vmcb()'s
> comment, i.e. to state that KVM _does_ support the hybrid mode. Move
> the "Note:" down a line to conform to preferred kernel/KVM multi-line
> comment style.
>
> Opportunistically update the apicv_update_lock comment, as it isn't
> actually used to protect apic_access_memslot_enabled (it's protected by
> slots_lock).
>
> Fixes: 0e311d33bfbe ("KVM: SVM: Introduce hybrid-AVIC mode")
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 18 ++++++++++++++---
> arch/x86/kvm/lapic.c | 34 +++++++++++++++++++++++++++++++--
> arch/x86/kvm/lapic.h | 1 +
> arch/x86/kvm/svm/avic.c | 15 ++++++++-------
> arch/x86/kvm/x86.c | 7 +++++++
> 5 files changed, 63 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 2c96c43c313a..6475c882b359 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1132,6 +1132,17 @@ enum kvm_apicv_inhibit {
> * AVIC is disabled because SEV doesn't support it.
> */
> APICV_INHIBIT_REASON_SEV,
> +
> + /*
> + * Due to sharing page tables across vCPUs, the xAPIC memslot must be
> + * deleted if any vCPU has x2APIC enabled as SVM doesn't provide fully
> + * independent controls for AVIC vs. x2AVIC, and also because SVM
> + * supports a "hybrid" AVIC mode for CPUs that support AVIC but not
> + * x2AVIC. Note, this isn't a "full" inhibit and is tracked separately.
> + * AVIC can still be activated, but KVM must not create SPTEs for the
> + * APIC base. For simplicity, this is sticky.
> + */
> + APICV_INHIBIT_REASON_X2APIC,

Hi Sean!

So assuming that I won't object to making it SVM specific (I still think
that VMX should also inhibit this memslot because this is closer to x86 spec,
but if you really want it this way, I won't fight over it):

I somewhat don't like this inhibit, because now it is used just to say
'I am AVIC'.

What do you think if you just move the code that removes the memslot to SVM,
to avic_set_virtual_apic_mode?


> };
>
> struct kvm_arch {
> @@ -1169,10 +1180,11 @@ struct kvm_arch {
> struct kvm_apic_map __rcu *apic_map;
> atomic_t apic_map_dirty;
>
> - /* Protects apic_access_memslot_enabled and apicv_inhibit_reasons */
> - struct rw_semaphore apicv_update_lock;
> -
> bool apic_access_memslot_enabled;
> + bool apic_access_memslot_inhibited;

So the apic_access_memslot_enabled currently tracks if the memslot is enabled.
As I see later in the patch when you free the memslot, you set it to false,
which means that if a vCPU is created after that (it can happen in theory),
the memslot will be created again :(

I say we need 'enabled', and 'allocated' booleans instead. Inhibit will set
enabled to false, and then on next vcpu run, that will free the memslot.

when enabled == false, the code needs to be changed to not allocate it again.

> +
> + /* Protects apicv_inhibit_reasons */
> + struct rw_semaphore apicv_update_lock;
> unsigned long apicv_inhibit_reasons;
>
> gpa_t wall_clock;
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 99994d2470a2..70f00eda75b2 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2394,9 +2394,26 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
> }
> }
>
> - if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
> + if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
> kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
>
> + /*
> + * Mark the APIC memslot as inhibited if x2APIC is enabled and
> + * the x2APIC inhibit is required. The actual deletion of the
> + * memslot is handled by vcpu_run() as SRCU may or may not be
> + * held at this time, i.e. updating memslots isn't safe. Don't
> + * check apic_access_memslot_inhibited, this vCPU needs to
> + * ensure the memslot is deleted before re-entering the guest,
> + * i.e. needs to make the request even if the inhibit flag was
> + * already set by a different vCPU.
> + */
> + if (vcpu->kvm->arch.apic_access_memslot_enabled &&
> + static_call(kvm_x86_check_apicv_inhibit_reasons)(APICV_INHIBIT_REASON_X2APIC)) {
> + vcpu->kvm->arch.apic_access_memslot_inhibited = true;
> + kvm_make_request(KVM_REQ_UNBLOCK, vcpu);

You are about to remove the KVM_REQ_UNBLOCK in other patch series.

How about just raising KVM_REQ_APICV_UPDATE on current vCPU
and having a special case in kvm_vcpu_update_apicv of

if (apic_access_memslot_enabled == false && apic_access_memslot_allocaed == true) {
drop srcu lock
free the memslot
take srcu lock
}

That wasn't possible to do with regular AVIC inhibit as I tried, because it has to be done
before any vCPU re-enters the guest, so the KVM_REQ_APICV_UPDATE has to be raised
on all vCPUs, and then a 'winning' vCPU would toggle the memslot, but that
would cause a lot of impossible to solve races, thus I abandoned that approach
back then.

Here the accuracy is not that critical so we can raise the request on current vCPU,
as you do in your patch anyway, so it should work.


> + }
> + }
> +
> if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) {
> kvm_vcpu_update_apicv(vcpu);
> static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);
> @@ -2440,7 +2457,8 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
> int ret = 0;
>
> mutex_lock(&kvm->slots_lock);
> - if (kvm->arch.apic_access_memslot_enabled)
> + if (kvm->arch.apic_access_memslot_enabled ||
> + kvm->arch.apic_access_memslot_inhibited)
> goto out;
>
> hva = __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT,
> @@ -2468,6 +2486,18 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
> }
> EXPORT_SYMBOL_GPL(kvm_alloc_apic_access_page);
>
> +void kvm_free_apic_access_page(struct kvm *kvm)
> +{
> + mutex_lock(&kvm->slots_lock);
> +
> + if (kvm->arch.apic_access_memslot_enabled) {
> + __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, 0, 0);
> + kvm->arch.apic_access_memslot_enabled = false;
> + }
> +
> + mutex_unlock(&kvm->slots_lock);
> +}
> +
> void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
> {
> struct kvm_lapic *apic = vcpu->arch.apic;
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 6d06397683d0..e2271ffa7ac0 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -112,6 +112,7 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
> int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
> void kvm_apic_update_apicv(struct kvm_vcpu *vcpu);
> int kvm_alloc_apic_access_page(struct kvm *kvm);
> +void kvm_free_apic_access_page(struct kvm *kvm);
>
> bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
> struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map);
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index 0424a5e664bb..8f9426f21bbf 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -72,12 +72,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
>
> vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
>
> - /* Note:
> - * KVM can support hybrid-AVIC mode, where KVM emulates x2APIC
> - * MSR accesses, while interrupt injection to a running vCPU
> - * can be achieved using AVIC doorbell. The AVIC hardware still
> - * accelerate MMIO accesses, but this does not cause any harm
> - * as the guest is not supposed to access xAPIC mmio when uses x2APIC.
> + /*
> + * Note: KVM supports hybrid-AVIC mode, where KVM emulates x2APIC MSR
> + * accesses, while interrupt injection to a running vCPU can be
> + * achieved using AVIC doorbell. KVM disables the APIC access page
> + * (deletes the memslot) if any vCPU has x2APIC enabled, thus enabling
> + * AVIC in hybrid mode activates only the doorbell mechanism.
> */
> if (apic_x2apic_mode(svm->vcpu.arch.apic) &&
> avic_mode == AVIC_MODE_X2) {
> @@ -987,7 +987,8 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
> BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
> BIT(APICV_INHIBIT_REASON_SEV) |
> BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
> - BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED);
> + BIT(APICV_INHIBIT_REASON_APIC_BASE_MODIFIED) |
> + BIT(APICV_INHIBIT_REASON_X2APIC);
>
> return supported & BIT(reason);
> }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d7374d768296..aa5ab0c620de 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10705,6 +10705,13 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
> break;
> }
>
> + if (vcpu->kvm->arch.apic_access_memslot_inhibited &&
> + vcpu->kvm->arch.apic_access_memslot_enabled) {
> + kvm_vcpu_srcu_read_unlock(vcpu);
> + kvm_free_apic_access_page(vcpu->kvm);
> + kvm_vcpu_srcu_read_lock(vcpu);
> + }
> +
> if (__xfer_to_guest_mode_work_pending()) {
> kvm_vcpu_srcu_read_unlock(vcpu);
> r = xfer_to_guest_mode_handle_work(vcpu);

I will review the rest of the patch series on next Wednesday (after
our holidays).
From a quick glance they look good to me.


Thanks,
Best regards,
Maxim Levitsky


2022-09-26 18:22:58

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 07/28] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled

On Fri, Sep 23, 2022, Maxim Levitsky wrote:
> On Tue, 2022-09-20 at 23:31 +0000, Sean Christopherson wrote:
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 2c96c43c313a..6475c882b359 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1132,6 +1132,17 @@ enum kvm_apicv_inhibit {
> > * AVIC is disabled because SEV doesn't support it.
> > */
> > APICV_INHIBIT_REASON_SEV,
> > +
> > + /*
> > + * Due to sharing page tables across vCPUs, the xAPIC memslot must be
> > + * deleted if any vCPU has x2APIC enabled as SVM doesn't provide fully
> > + * independent controls for AVIC vs. x2AVIC, and also because SVM
> > + * supports a "hybrid" AVIC mode for CPUs that support AVIC but not
> > + * x2AVIC. Note, this isn't a "full" inhibit and is tracked separately.
> > + * AVIC can still be activated, but KVM must not create SPTEs for the
> > + * APIC base. For simplicity, this is sticky.
> > + */
> > + APICV_INHIBIT_REASON_X2APIC,
>
> Hi Sean!
>
> So assuming that I won't object to making it SVM specific (I still think
> that VMX should also inhibit this memslot because this is closer to x86 spec,
> but if you really want it this way, I won't fight over it):

Heh, I don't necessarily "want" it this way, it's more that I don't see a compelling
reason to change KVM's behavior and risk silently causing a performance regression.
If KVM didn't already have the "APIC base may have RAM semantics" quirk, and/or if
this were the initial APICv implementation and thus no possible users, then I would
probably also vote to give APICv the same treatment.

> I somewhat don't like this inhibit, because now it is used just to say
> 'I am AVIC'.
>
> What do you think if you just move the code that removes the memslot to SVM,
> to avic_set_virtual_apic_mode?

Suffers the same SRCU issue (see below) :-/

Given the SRCU problem, I'd prefer to keep the management of the memslot in common
code, even though I agree it's a bit silly. And KVM_REQ_UNBLOCK is a perfect fit
for dealing with the SRCU issue, i.e. handling this in AVIC code would require
another hook on top of spreading the memslot management across x86 and SVM code.

> > @@ -1169,10 +1180,11 @@ struct kvm_arch {
> > struct kvm_apic_map __rcu *apic_map;
> > atomic_t apic_map_dirty;
> >
> > - /* Protects apic_access_memslot_enabled and apicv_inhibit_reasons */
> > - struct rw_semaphore apicv_update_lock;
> > -
> > bool apic_access_memslot_enabled;
> > + bool apic_access_memslot_inhibited;
>
> So the apic_access_memslot_enabled currently tracks if the memslot is enabled.
> As I see later in the patch when you free the memslot, you set it to false,
> which means that if a vCPU is created after that (it can happen in theory),
> the memslot will be created again :(
>
> I say we need 'enabled', and 'allocated' booleans instead. Inhibit will set
> enabled to false, and then on next vcpu run, that will free the memslot.
>
> when enabled == false, the code needs to be changed to not allocate it again.

This should be handled already. apic_access_memslot_enabled is toggled from
true=>false if and only if apic_access_memslot_inhibited is set, and the "enabled"
flag is protected by slots_lock. Thus, newly created vCPUs are guaranteed to
either see apic_access_memslot_enabled==true or apic_access_memslot_inhibited==true.

int kvm_alloc_apic_access_page(struct kvm *kvm)
{
struct page *page;
void __user *hva;
int ret = 0;

mutex_lock(&kvm->slots_lock);
if (kvm->arch.apic_access_memslot_enabled ||
kvm->arch.apic_access_memslot_inhibited) <=== prevents reallocation
goto out;

out:
mutex_unlock(&kvm->slots_lock);
return ret;
}

That could be made more obvious by adding a WARN in kvm_free_apic_access_page(), i.e.

void kvm_free_apic_access_page(struct kvm *kvm)
{
WARN_ON_ONCE(!kvm->arch.apic_access_memslot_inhibited);

mutex_lock(&kvm->slots_lock);

if (kvm->arch.apic_access_memslot_enabled) {
__x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, 0, 0);
kvm->arch.apic_access_memslot_enabled = false;
}

mutex_unlock(&kvm->slots_lock);
}

> > +
> > + /* Protects apicv_inhibit_reasons */
> > + struct rw_semaphore apicv_update_lock;
> > unsigned long apicv_inhibit_reasons;
> >
> > gpa_t wall_clock;
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 99994d2470a2..70f00eda75b2 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -2394,9 +2394,26 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
> > }
> > }
> >
> > - if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
> > + if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
> > kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
> >
> > + /*
> > + * Mark the APIC memslot as inhibited if x2APIC is enabled and
> > + * the x2APIC inhibit is required. The actual deletion of the
> > + * memslot is handled by vcpu_run() as SRCU may or may not be
> > + * held at this time, i.e. updating memslots isn't safe. Don't
> > + * check apic_access_memslot_inhibited, this vCPU needs to
> > + * ensure the memslot is deleted before re-entering the guest,
> > + * i.e. needs to make the request even if the inhibit flag was
> > + * already set by a different vCPU.
> > + */
> > + if (vcpu->kvm->arch.apic_access_memslot_enabled &&
> > + static_call(kvm_x86_check_apicv_inhibit_reasons)(APICV_INHIBIT_REASON_X2APIC)) {
> > + vcpu->kvm->arch.apic_access_memslot_inhibited = true;
> > + kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
>
> You are about to remove the KVM_REQ_UNBLOCK in other patch series.

No, KVM_REQ_UNHALT is being removed. KVM_REQ_UNBLOCK needs to stay, although it
has a rather weird name, e.g. KVM_REQ_WORK would probably be better.

> How about just raising KVM_REQ_APICV_UPDATE on current vCPU
> and having a special case in kvm_vcpu_update_apicv of
>
> if (apic_access_memslot_enabled == false && apic_access_memslot_allocaed == true) {
> drop srcu lock

This was my initial thought as well, but the issue is that SRCU may or may not be
held, and so the unlock+lock would need to be conditional. That's technically a
solvable problem, as it's possible to detect if SRCU is held, but I really don't
want to rely on kvm_vcpu.srcu_depth for anything other than proving that KVM doesn't
screw up SRCU.

> free the memslot
> take srcu lock
> }

2022-09-28 06:40:35

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v3 07/28] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled

On Mon, 2022-09-26 at 17:00 +0000, Sean Christopherson wrote:
> On Fri, Sep 23, 2022, Maxim Levitsky wrote:
> > On Tue, 2022-09-20 at 23:31 +0000, Sean Christopherson wrote:
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 2c96c43c313a..6475c882b359 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1132,6 +1132,17 @@ enum kvm_apicv_inhibit {
> > > * AVIC is disabled because SEV doesn't support it.
> > > */
> > > APICV_INHIBIT_REASON_SEV,
> > > +
> > > + /*
> > > + * Due to sharing page tables across vCPUs, the xAPIC memslot must be
> > > + * deleted if any vCPU has x2APIC enabled as SVM doesn't provide fully
> > > + * independent controls for AVIC vs. x2AVIC, and also because SVM
> > > + * supports a "hybrid" AVIC mode for CPUs that support AVIC but not
> > > + * x2AVIC. Note, this isn't a "full" inhibit and is tracked separately.
> > > + * AVIC can still be activated, but KVM must not create SPTEs for the
> > > + * APIC base. For simplicity, this is sticky.
> > > + */
> > > + APICV_INHIBIT_REASON_X2APIC,
> >
> > Hi Sean!
> >
> > So assuming that I won't object to making it SVM specific (I still think
> > that VMX should also inhibit this memslot because this is closer to x86 spec,
> > but if you really want it this way, I won't fight over it):
>
> Heh, I don't necessarily "want" it this way, it's more that I don't see a compelling
> reason to change KVM's behavior and risk silently causing a performance regression.
> If KVM didn't already have the "APIC base may have RAM semantics" quirk, and/or if
> this were the initial APICv implementation and thus no possible users, then I would
> probably also vote to give APICv the same treatment.
>
> > I somewhat don't like this inhibit, because now it is used just to say
> > 'I am AVIC'.
> >
> > What do you think if you just move the code that removes the memslot to SVM,
> > to avic_set_virtual_apic_mode?
>
> Suffers the same SRCU issue (see below) :-/
>
> Given the SRCU problem, I'd prefer to keep the management of the memslot in common
> code, even though I agree it's a bit silly. And KVM_REQ_UNBLOCK is a perfect fit
> for dealing with the SRCU issue, i.e. handling this in AVIC code would require
> another hook on top of spreading the memslot management across x86 and SVM code.

OK, I am not going to argue about this. But what about at least not using an inhibit
bit for that but something else like a boolean, or maybe really add 'I am AVIC bit'
or rather something like vcpu->arch.apicv_type enum?


Or we can make SVM code just call a common function - just put these in a function and call it
from avic_set_virtual_apic_mode?


void kvm_disable_apicv_memslot(struct kvm_vcpu *vcpu)
{
if (!vcpu->kvm->arch.apic_access_memslot_inhibited) {
vcpu->kvm->arch.apic_access_memslot_inhibited = true;
kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
}
}

>
> > > @@ -1169,10 +1180,11 @@ struct kvm_arch {
> > > struct kvm_apic_map __rcu *apic_map;
> > > atomic_t apic_map_dirty;
> > >
> > > - /* Protects apic_access_memslot_enabled and apicv_inhibit_reasons */
> > > - struct rw_semaphore apicv_update_lock;
> > > -
> > > bool apic_access_memslot_enabled;
> > > + bool apic_access_memslot_inhibited;
> >
> > So the apic_access_memslot_enabled currently tracks if the memslot is enabled.
> > As I see later in the patch when you free the memslot, you set it to false,
> > which means that if a vCPU is created after that (it can happen in theory),
> > the memslot will be created again :(
> >
> > I say we need 'enabled', and 'allocated' booleans instead. Inhibit will set
> > enabled to false, and then on next vcpu run, that will free the memslot.
> >
> > when enabled == false, the code needs to be changed to not allocate it again.
>
> This should be handled already. apic_access_memslot_enabled is toggled from
> true=>false if and only if apic_access_memslot_inhibited is set, and the "enabled"
> flag is protected by slots_lock. Thus, newly created vCPUs are guaranteed to
> either see apic_access_memslot_enabled==true or apic_access_memslot_inhibited==true.
>
> int kvm_alloc_apic_access_page(struct kvm *kvm)
> {
> struct page *page;
> void __user *hva;
> int ret = 0;
>
> mutex_lock(&kvm->slots_lock);
> if (kvm->arch.apic_access_memslot_enabled ||
> kvm->arch.apic_access_memslot_inhibited) <=== prevents reallocation
> goto out;
>
> out:
> mutex_unlock(&kvm->slots_lock);
> return ret;
> }

Ah, you added this in previous patch which I didn't see, makes sense.

>
> That could be made more obvious by adding a WARN in kvm_free_apic_access_page(), i.e.
Yep, a WARN_ON_ONCE unless in hot path, is almost always a good idea, so lets add it.
>
> void kvm_free_apic_access_page(struct kvm *kvm)
> {
> WARN_ON_ONCE(!kvm->arch.apic_access_memslot_inhibited);
>
> mutex_lock(&kvm->slots_lock);
>
> if (kvm->arch.apic_access_memslot_enabled) {
> __x86_set_memory_region(kvm, APIC_ACCESS_PAGE_PRIVATE_MEMSLOT, 0, 0);
> kvm->arch.apic_access_memslot_enabled = false;
> }
>
> mutex_unlock(&kvm->slots_lock);
> }
>
> > > +
> > > + /* Protects apicv_inhibit_reasons */
> > > + struct rw_semaphore apicv_update_lock;
> > > unsigned long apicv_inhibit_reasons;
> > >
> > > gpa_t wall_clock;
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 99994d2470a2..70f00eda75b2 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -2394,9 +2394,26 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
> > > }
> > > }
> > >
> > > - if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
> > > + if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE)) {
> > > kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);
> > >
> > > + /*
> > > + * Mark the APIC memslot as inhibited if x2APIC is enabled and
> > > + * the x2APIC inhibit is required. The actual deletion of the
> > > + * memslot is handled by vcpu_run() as SRCU may or may not be
> > > + * held at this time, i.e. updating memslots isn't safe. Don't
> > > + * check apic_access_memslot_inhibited, this vCPU needs to
> > > + * ensure the memslot is deleted before re-entering the guest,
> > > + * i.e. needs to make the request even if the inhibit flag was
> > > + * already set by a different vCPU.
> > > + */
> > > + if (vcpu->kvm->arch.apic_access_memslot_enabled &&
> > > + static_call(kvm_x86_check_apicv_inhibit_reasons)(APICV_INHIBIT_REASON_X2APIC)) {
> > > + vcpu->kvm->arch.apic_access_memslot_inhibited = true;
> > > + kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
> >
> > You are about to remove the KVM_REQ_UNBLOCK in other patch series.
>
> No, KVM_REQ_UNHALT is being removed. KVM_REQ_UNBLOCK needs to stay, although it
> has a rather weird name, e.g. KVM_REQ_WORK would probably be better.

Roger that!
And I guess lets rename it while we are at it.

>
> > How about just raising KVM_REQ_APICV_UPDATE on current vCPU
> > and having a special case in kvm_vcpu_update_apicv of
> >
> > if (apic_access_memslot_enabled == false && apic_access_memslot_allocaed == true) {
> > drop srcu lock
>
> This was my initial thought as well, but the issue is that SRCU may or may not be
> held, and so the unlock+lock would need to be conditional. That's technically a
> solvable problem, as it's possible to detect if SRCU is held, but I really don't
> want to rely on kvm_vcpu.srcu_depth for anything other than proving that KVM doesn't
> screw up SRCU.

Why though? the KVM_REQ_APICV_UPDATE is only handled AFAIK in vcpu_enter_guest
which drops the srcu lock few lines afterwards, and therefore the
kvm_vcpu_update_apicv is always called with the lock held and it means that it
can drop it for the duration of slot update.

The original issue we had was that we tried to drop the srcu lock in
'kvm_set_apicv_inhibit' which indeed is called from various places,
with, or without the lock held.

Moving the memslot disable code to kvm_vcpu_update_apicv would actually solve that,
but it was not possible because kvm_vcpu_update_apicv is called simultaneously on all vCPUs,
and created various races, including toggling the memslot twice.


So if possible please take another look at using KVM_REQ_APICV_UPDATE instead of KVM_REQ_UNBLOCK.

Best regards,
Maxim Levitsky

>
> > free the memslot
> > take srcu lock
> > }


2022-09-28 16:55:07

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 07/28] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled

On Wed, Sep 28, 2022, Maxim Levitsky wrote:
> On Mon, 2022-09-26 at 17:00 +0000, Sean Christopherson wrote:
> > Given the SRCU problem, I'd prefer to keep the management of the memslot in common
> > code, even though I agree it's a bit silly. And KVM_REQ_UNBLOCK is a perfect fit
> > for dealing with the SRCU issue, i.e. handling this in AVIC code would require
> > another hook on top of spreading the memslot management across x86 and SVM code.
>
> OK, I am not going to argue about this. But what about at least not using an inhibit
> bit for that but something else like a boolean, or maybe really add 'I am AVIC bit'
> or rather something like vcpu->arch.apicv_type enum?
>
> Or we can make SVM code just call a common function - just put these in a
> function and call it from avic_set_virtual_apic_mode?

The issue is that kvm_vcpu_update_apicv() is called from kvm_lapic_set_base(),
which is the one that may or may not hold SRCU.

> > > You are about to remove the KVM_REQ_UNBLOCK in other patch series.
> >
> > No, KVM_REQ_UNHALT is being removed. KVM_REQ_UNBLOCK needs to stay, although it
> > has a rather weird name, e.g. KVM_REQ_WORK would probably be better.
>
> Roger that!
> And I guess lets rename it while we are at it.

I'll prep a patch.

> > > How about just raising KVM_REQ_APICV_UPDATE on current vCPU
> > > and having a special case in kvm_vcpu_update_apicv of
> > >
> > > if (apic_access_memslot_enabled == false && apic_access_memslot_allocaed == true) {
> > > drop srcu lock
> >
> > This was my initial thought as well, but the issue is that SRCU may or may not be
> > held, and so the unlock+lock would need to be conditional. That's technically a
> > solvable problem, as it's possible to detect if SRCU is held, but I really don't
> > want to rely on kvm_vcpu.srcu_depth for anything other than proving that KVM doesn't
> > screw up SRCU.
>
> Why though? the KVM_REQ_APICV_UPDATE is only handled AFAIK in vcpu_enter_guest
> which drops the srcu lock few lines afterwards, and therefore the
> kvm_vcpu_update_apicv is always called with the lock held and it means that it
> can drop it for the duration of slot update.
>
> The original issue we had was that we tried to drop the srcu lock in
> 'kvm_set_apicv_inhibit' which indeed is called from various places,
> with, or without the lock held.
>
> Moving the memslot disable code to kvm_vcpu_update_apicv would actually solve
> that, but it was not possible because kvm_vcpu_update_apicv is called
> simultaneously on all vCPUs, and created various races, including toggling
> the memslot twice.

As above, kvm_vcpu_update_apicv() can be called without SRCU held. Oh, but that
was a recent addition, commit 8fc9c7a3079e ("KVM: x86: Deactivate APICv on vCPU
with APIC disabled"). I was wary of using KVM_REQ_APICV_UPDATE in kvm_lapic_set_base(),
e.g. in case there was some dependency on updating _immediately, but since that's
such a new addition I have no objection to switching to the request.

Similarly, is there a good reason for having nested_svm_vmexit() invoke
kvm_vcpu_update_apicv() directly? I'm confused by the "so that other vCPUs can
start to benefit from it right away". The nested inhibit is per-vCPU and so
should only affect the current vCPU, no? I.e. for all intents and purposes, using
a request should be functionally equivalent.

/*
* Un-inhibit the AVIC right away, so that other vCPUs can start
* to benefit from it right away.
*/
if (kvm_apicv_activated(vcpu->kvm))
kvm_vcpu_update_apicv(vcpu);

2022-09-28 18:16:01

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v3 07/28] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled

On Wed, 2022-09-28 at 16:33 +0000, Sean Christopherson wrote:
> On Wed, Sep 28, 2022, Maxim Levitsky wrote:
> > On Mon, 2022-09-26 at 17:00 +0000, Sean Christopherson wrote:
> > > Given the SRCU problem, I'd prefer to keep the management of the memslot in common
> > > code, even though I agree it's a bit silly. And KVM_REQ_UNBLOCK is a perfect fit
> > > for dealing with the SRCU issue, i.e. handling this in AVIC code would require
> > > another hook on top of spreading the memslot management across x86 and SVM code.
> >
> > OK, I am not going to argue about this. But what about at least not using an inhibit
> > bit for that but something else like a boolean, or maybe really add 'I am AVIC bit'
> > or rather something like vcpu->arch.apicv_type enum?
> >
> > Or we can make SVM code just call a common function - just put these in a
> > function and call it from avic_set_virtual_apic_mode?
>
> The issue is that kvm_vcpu_update_apicv() is called from kvm_lapic_set_base(),
> which is the one that may or may not hold SRCU.

Makes sense now.

>
> > > > You are about to remove the KVM_REQ_UNBLOCK in other patch series.
> > >
> > > No, KVM_REQ_UNHALT is being removed. KVM_REQ_UNBLOCK needs to stay, although it
> > > has a rather weird name, e.g. KVM_REQ_WORK would probably be better.
> >
> > Roger that!
> > And I guess lets rename it while we are at it.
>
> I'll prep a patch.
>
> > > > How about just raising KVM_REQ_APICV_UPDATE on current vCPU
> > > > and having a special case in kvm_vcpu_update_apicv of
> > > >
> > > > if (apic_access_memslot_enabled == false && apic_access_memslot_allocaed == true) {
> > > > drop srcu lock
> > >
> > > This was my initial thought as well, but the issue is that SRCU may or may not be
> > > held, and so the unlock+lock would need to be conditional. That's technically a
> > > solvable problem, as it's possible to detect if SRCU is held, but I really don't
> > > want to rely on kvm_vcpu.srcu_depth for anything other than proving that KVM doesn't
> > > screw up SRCU.
> >
> > Why though? the KVM_REQ_APICV_UPDATE is only handled AFAIK in vcpu_enter_guest
> > which drops the srcu lock few lines afterwards, and therefore the
> > kvm_vcpu_update_apicv is always called with the lock held and it means that it
> > can drop it for the duration of slot update.
> >
> > The original issue we had was that we tried to drop the srcu lock in
> > 'kvm_set_apicv_inhibit' which indeed is called from various places,
> > with, or without the lock held.
> >
> > Moving the memslot disable code to kvm_vcpu_update_apicv would actually solve
> > that, but it was not possible because kvm_vcpu_update_apicv is called
> > simultaneously on all vCPUs, and created various races, including toggling
> > the memslot twice.
>
> As above, kvm_vcpu_update_apicv() can be called without SRCU held. Oh, but that
> was a recent addition, commit 8fc9c7a3079e ("KVM: x86: Deactivate APICv on vCPU
> with APIC disabled"). I was wary of using KVM_REQ_APICV_UPDATE in kvm_lapic_set_base(),
> e.g. in case there was some dependency on updating _immediately, but since that's
> such a new addition I have no objection to switching to the request.
>
> Similarly, is there a good reason for having nested_svm_vmexit() invoke
> kvm_vcpu_update_apicv() directly? I'm confused by the "so that other vCPUs can
> start to benefit from it right away". The nested inhibit is per-vCPU and so
> should only affect the current vCPU, no? I.e. for all intents and purposes, using
> a request should be functionally equivalent.

It is kind of the other way around:

The mere fact of switching to vmcb02 *inhibits* the AVIC on the current vCPU,
but the AVIC inhibit is there only to set the is_running bits in the physid table
and in IOMMU to prevent its *peers* to try and send interrupts to it via AVIC.

It is the reason why APICv doesn't need it - the posted interrupts still work
just fine when a vCPU doens't use APICv, or uses a different posted interrupt vector
when it uses the nested APICv.

So it makes sense to remove that inhibit as soon as possible that the peers
could stop getting 'unaccellerated IPI' vmexits for nothing.


However back to the discussion, I don't think this is a problem.

We can just call both the kvm_vcpu_update_apicv() and a new function that
does the memslot disable from KVM_REQ_APICV_UPDATE, then
plain kvm_vcpu_update_apicv() won't need to drop the srcu lock.

It is pretty much the same that you proposed, just instead of piggybacking on
KVM_REQ_UNBLOCK, I proposed to piggyback on KVM_REQ_APICV_UPDATE.


Best regards,
Maxim Levitsky


>
> /*
> * Un-inhibit the AVIC right away, so that other vCPUs can start
> * to benefit from it right away.
> */
> if (kvm_apicv_activated(vcpu->kvm))
> kvm_vcpu_update_apicv(vcpu);
>


2022-09-28 22:59:17

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 07/28] KVM: x86: Inhibit APIC memslot if x2APIC and AVIC are enabled

On Wed, Sep 28, 2022, Maxim Levitsky wrote:
> On Wed, 2022-09-28 at 16:33 +0000, Sean Christopherson wrote:
> > Similarly, is there a good reason for having nested_svm_vmexit() invoke
> > kvm_vcpu_update_apicv() directly? I'm confused by the "so that other vCPUs can
> > start to benefit from it right away". The nested inhibit is per-vCPU and so
> > should only affect the current vCPU, no? I.e. for all intents and purposes, using
> > a request should be functionally equivalent.
>
> It is kind of the other way around:
>
> The mere fact of switching to vmcb02 *inhibits* the AVIC on the current vCPU,
> but the AVIC inhibit is there only to set the is_running bits in the physid table
> and in IOMMU to prevent its *peers* to try and send interrupts to it via AVIC.
>
> It is the reason why APICv doesn't need it - the posted interrupts still work
> just fine when a vCPU doens't use APICv, or uses a different posted interrupt vector
> when it uses the nested APICv.

Gotcha, the "other vCPUs" part is where I got confused.

> So it makes sense to remove that inhibit as soon as possible that the peers
> could stop getting 'unaccellerated IPI' vmexits for nothing.

But practically speaking, the delay between the nested VM-Exit and servicing the
request is minimal. Might be a moot point if nested AVIC is supported, i.e. an
inline update may be "required" at that point.

Not a sticking point by any means, but if possible, it would be nice to have a
single call site for the per-vCPU APICv update.

> However back to the discussion, I don't think this is a problem.
>
> We can just call both the kvm_vcpu_update_apicv() and a new function that
> does the memslot disable from KVM_REQ_APICV_UPDATE, then
> plain kvm_vcpu_update_apicv() won't need to drop the srcu lock.
>
> It is pretty much the same that you proposed, just instead of piggybacking on
> KVM_REQ_UNBLOCK, I proposed to piggyback on KVM_REQ_APICV_UPDATE.

Yep, easy to do after converting the x2APIC toggling to use a request.

Thanks!