2022-09-20 23:42:37

by Sean Christopherson

Subject: [PATCH v3 05/28] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID

Truncate the vcpu_id, a.k.a. x2APIC ID, to an 8-bit value when comparing
it against the xAPIC ID to avoid false positives (sort of) on systems
with >255 CPUs, i.e. with IDs that don't fit into a u8. The intent of
APIC_ID_MODIFIED is to inhibit APICv/AVIC when the xAPIC ID is changed from
its original value.

The mismatch isn't technically a false positive, as architecturally the
xAPIC IDs do end up being aliased in this scenario, and neither APICv
nor AVIC correctly handles IPI virtualization when there is aliasing.
However, KVM already deliberately does not honor the aliasing behavior
that results when an x2APIC ID gets truncated to an xAPIC ID. I.e. the
resulting APICv/AVIC behavior is aligned with KVM's existing behavior
when KVM's x2APIC hotplug hack is effectively enabled.

If/when KVM provides a way to disable the hotplug hack, APICv/AVIC can
piggyback whatever logic disables the optimized APIC map (which is what
provides the hotplug hack), i.e. so that KVM's optimized map and APIC
virtualization yield the same behavior.

For now, fix the immediate problem of APIC virtualization being disabled
for large VMs, which is a much more pressing issue than ensuring KVM
honors architectural behavior for APIC ID aliasing.

Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base")
Reported-by: Suravee Suthikulpanit <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/lapic.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index adac6ca9b7dc..a02defa3f7b5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2075,7 +2075,12 @@ static void kvm_lapic_xapic_id_updated(struct kvm_lapic *apic)
 	if (KVM_BUG_ON(apic_x2apic_mode(apic), kvm))
 		return;

-	if (kvm_xapic_id(apic) == apic->vcpu->vcpu_id)
+	/*
+	 * Deliberately truncate the vCPU ID when detecting a modified APIC ID
+	 * to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a 32-bit
+	 * value.
+	 */
+	if (kvm_xapic_id(apic) == (u8)apic->vcpu->vcpu_id)
 		return;

 	kvm_set_apicv_inhibit(apic->vcpu->kvm, APICV_INHIBIT_REASON_APIC_ID_MODIFIED);
--
2.37.3.968.ga6b4b080e4-goog
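
To illustrate the comparison change in the patch above, here is a minimal standalone sketch. The function names are hypothetical stand-ins and not KVM code; it only assumes, as the changelog states, that the xAPIC ID is an 8-bit value and the vCPU ID may be wider.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the pre-patch check: compare against the full vCPU ID. */
static bool xapic_id_matches_full(uint8_t xapic_id, uint32_t vcpu_id)
{
	return xapic_id == vcpu_id;
}

/* Hypothetical stand-in for the patched check: truncate the vCPU ID to 8 bits first. */
static bool xapic_id_matches_truncated(uint8_t xapic_id, uint32_t vcpu_id)
{
	return xapic_id == (uint8_t)vcpu_id;
}
```

For a vCPU ID of 300, the low 8 bits are 44, so an xAPIC ID of 44 fails the full comparison but passes the truncated one. For vCPU IDs below 256 the two checks agree.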


2022-09-28 03:22:12

by Alejandro Jimenez

Subject: Re: [PATCH v3 05/28] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID



On 9/20/2022 7:31 PM, Sean Christopherson wrote:
> Truncate the vcpu_id, a.k.a. x2APIC ID, to an 8-bit value when comparing
> it against the xAPIC ID to avoid false positives (sort of) on systems
> with >255 CPUs, i.e. with IDs that don't fit into a u8. The intent of
> APIC_ID_MODIFIED is to inhibit APICv/AVIC when the xAPIC is changed from
> it's original value,
>
> The mismatch isn't technically a false positive, as architecturally the
> xAPIC IDs do end up being aliased in this scenario, and neither APICv
> nor AVIC correctly handles IPI virtualization when there is aliasing.
> However, KVM already deliberately does not honor the aliasing behavior
> that results when an x2APIC ID gets truncated to an xAPIC ID. I.e. the
> resulting APICv/AVIC behavior is aligned with KVM's existing behavior
> when KVM's x2APIC hotplug hack is effectively enabled.
>
> If/when KVM provides a way to disable the hotplug hack, APICv/AVIC can
> piggyback whatever logic disables the optimized APIC map (which is what
> provides the hotplug hack), i.e. so that KVM's optimized map and APIC
> virtualization yield the same behavior.
>
> For now, fix the immediate problem of APIC virtualization being disabled
> for large VMs, which is a much more pressing issue than ensuring KVM
> honors architectural behavior for APIC ID aliasing.

I built a host kernel with this entire series on top of mainline
v6.0-rc6, and booting a guest with AVIC enabled works as expected on the
initial boot. The issue is that during the first reboot AVIC is
inhibited due to APICV_INHIBIT_REASON_APIC_ID_MODIFIED, and I see
constant inhibition events due to APICV_INHIBIT_REASON_IRQWIN as seen in
the traces:

qemu-system-x86-10147 [222] ..... 1116.519052: kvm_apicv_inhibit_changed: set reason=8, inhibits=0x120
qemu-system-x86-10147 [222] ..... 1116.519063: kvm_apicv_inhibit_changed: cleared reason=8, inhibits=0x20
qemu-system-x86-10147 [222] ..... 1117.934222: kvm_apicv_inhibit_changed: set reason=8, inhibits=0x120
qemu-system-x86-10147 [222] ..... 1117.934233: kvm_apicv_inhibit_changed: cleared reason=8, inhibits=0x20
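
One way to read the traces above, as a sketch: it assumes the inhibits field is a bitmask indexed by reason number, which is consistent with the values shown (setting reason 8 yields 0x120, clearing it yields 0x20). The helper names are hypothetical. Which reason name each bit number maps to depends on the enum in the kernel being tested, so no names are hard-coded here.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if inhibit reason number 'reason' is set in the 'inhibits' bitmask. */
static bool inhibit_is_set(uint64_t inhibits, unsigned int reason)
{
	return (inhibits & (UINT64_C(1) << reason)) != 0;
}

/* Return 'inhibits' with reason number 'reason' cleared. */
static uint64_t inhibit_clear(uint64_t inhibits, unsigned int reason)
{
	return inhibits & ~(UINT64_C(1) << reason);
}
```

Under that assumption, 0x120 has bits 5 and 8 set, and the bit that never clears is bit 5. That matches the description of one reason toggling while another stays set across reboots.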

It happens regardless of vCPU count (tested with 2, 32, 255, 380, and
512 vCPUs). This state persists for all subsequent reboots, until the VM
is terminated. For vCPU counts < 256, when x2apic is disabled the
problem does not occur, and AVIC continues to work properly after reboots.

I did not see this issue when testing a similar host kernel that did not
include this current patchset, but instead applied the earlier:
https://lore.kernel.org/lkml/[email protected]/
which inspired this [05/28] patch and the follow up [22/28] in this series.

I am using QEMU built from v7.1.0 upstream tag, plus the patch at:
https://lore.kernel.org/qemu-devel/[email protected]/

Please feel free to request any other data points that might be relevant
and I'll try to collect them.

Alejandro
>
> Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base")
> Reported-by: Suravee Suthikulpanit <[email protected]>
> Cc: Maxim Levitsky <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/lapic.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index adac6ca9b7dc..a02defa3f7b5 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2075,7 +2075,12 @@ static void kvm_lapic_xapic_id_updated(struct kvm_lapic *apic)
> if (KVM_BUG_ON(apic_x2apic_mode(apic), kvm))
> return;
>
> - if (kvm_xapic_id(apic) == apic->vcpu->vcpu_id)
> + /*
> + * Deliberately truncate the vCPU ID when detecting a modified APIC ID
> + * to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a 32-bit
> + * value.
> + */
> + if (kvm_xapic_id(apic) == (u8)apic->vcpu->vcpu_id)
> return;
>
> kvm_set_apicv_inhibit(apic->vcpu->kvm, APICV_INHIBIT_REASON_APIC_ID_MODIFIED);

2022-09-28 06:08:59

by Maxim Levitsky

Subject: Re: [PATCH v3 05/28] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID

On Tue, 2022-09-27 at 23:15 -0400, Alejandro Jimenez wrote:
>
> On 9/20/2022 7:31 PM, Sean Christopherson wrote:
> > Truncate the vcpu_id, a.k.a. x2APIC ID, to an 8-bit value when comparing
> > it against the xAPIC ID to avoid false positives (sort of) on systems
> > with >255 CPUs, i.e. with IDs that don't fit into a u8. The intent of
> > APIC_ID_MODIFIED is to inhibit APICv/AVIC when the xAPIC is changed from
> > it's original value,
> >
> > The mismatch isn't technically a false positive, as architecturally the
> > xAPIC IDs do end up being aliased in this scenario, and neither APICv
> > nor AVIC correctly handles IPI virtualization when there is aliasing.
> > However, KVM already deliberately does not honor the aliasing behavior
> > that results when an x2APIC ID gets truncated to an xAPIC ID. I.e. the
> > resulting APICv/AVIC behavior is aligned with KVM's existing behavior
> > when KVM's x2APIC hotplug hack is effectively enabled.
> >
> > If/when KVM provides a way to disable the hotplug hack, APICv/AVIC can
> > piggyback whatever logic disables the optimized APIC map (which is what
> > provides the hotplug hack), i.e. so that KVM's optimized map and APIC
> > virtualization yield the same behavior.
> >
> > For now, fix the immediate problem of APIC virtualization being disabled
> > for large VMs, which is a much more pressing issue than ensuring KVM
> > honors architectural behavior for APIC ID aliasing.
>
> I built a host kernel with this entire series on top of mainline
> v6.0-rc6, and booting a guest with AVIC enabled works as expected on the
> initial boot. The issue is that during the first reboot AVIC is
> inhibited due to APICV_INHIBIT_REASON_APIC_ID_MODIFIED, and I see
> constant inhibition events due to APICV_INHIBIT_REASON_IRQWIN as seen in


APICV_INHIBIT_REASON_IRQWIN is OK, because that happens about every time
the good old PIT timer fires which happens on reboot.

APICV_INHIBIT_REASON_APIC_ID_MODIFIED should not happen as you noted,
this needs investigation.

Best regards,
Maxim Levitsky

> the traces:
>
> qemu-system-x86-10147 [222] ..... 1116.519052:
> kvm_apicv_inhibit_changed: set reason=8, inhibits=0x120
> qemu-system-x86-10147 [222] ..... 1116.519063:
> kvm_apicv_inhibit_changed: cleared reason=8, inhibits=0x20
> qemu-system-x86-10147 [222] ..... 1117.934222:
> kvm_apicv_inhibit_changed: set reason=8, inhibits=0x120
> qemu-system-x86-10147 [222] ..... 1117.934233:
> kvm_apicv_inhibit_changed: cleared reason=8, inhibits=0x20
>
> It happens regardless of vCPU count (tested with 2, 32, 255, 380, and
> 512 vCPUs). This state persists for all subsequent reboots, until the VM
> is terminated. For vCPU counts < 256, when x2apic is disabled the
> problem does not occur, and AVIC continues to work properly after reboots.
>
> I did not see this issue when testing a similar host kernel that did not
> include this current patchset, but instead applied the earlier:
> https://lore.kernel.org/lkml/[email protected]/
> which inspired this [05/23] patch and the follow up [22/28] in this series.
>
> I am using QEMU built from v7.1.0 upstream tag, plus the patch at:
> https://lore.kernel.org/qemu-devel/[email protected]/
>
> Please feel free to request any other data points that might be relevant
> and I'll try to collect them.
>
> Alejandro
> > Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base")
> > Reported-by: Suravee Suthikulpanit <[email protected]>
> > Cc: Maxim Levitsky <[email protected]>
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/kvm/lapic.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index adac6ca9b7dc..a02defa3f7b5 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -2075,7 +2075,12 @@ static void kvm_lapic_xapic_id_updated(struct kvm_lapic *apic)
> > if (KVM_BUG_ON(apic_x2apic_mode(apic), kvm))
> > return;
> >
> > - if (kvm_xapic_id(apic) == apic->vcpu->vcpu_id)
> > + /*
> > + * Deliberately truncate the vCPU ID when detecting a modified APIC ID
> > + * to avoid false positives if the vCPU ID, i.e. x2APIC ID, is a 32-bit
> > + * value.
> > + */
> > + if (kvm_xapic_id(apic) == (u8)apic->vcpu->vcpu_id)
> > return;
> >
> > kvm_set_apicv_inhibit(apic->vcpu->kvm, APICV_INHIBIT_REASON_APIC_ID_MODIFIED);


2022-09-28 17:29:05

by Sean Christopherson

Subject: Re: [PATCH v3 05/28] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID

On Wed, Sep 28, 2022, Maxim Levitsky wrote:
> On Tue, 2022-09-27 at 23:15 -0400, Alejandro Jimenez wrote:
> >
> > On 9/20/2022 7:31 PM, Sean Christopherson wrote:
> > > Truncate the vcpu_id, a.k.a. x2APIC ID, to an 8-bit value when comparing
> > > it against the xAPIC ID to avoid false positives (sort of) on systems
> > > with >255 CPUs, i.e. with IDs that don't fit into a u8. The intent of
> > > APIC_ID_MODIFIED is to inhibit APICv/AVIC when the xAPIC is changed from
> > > it's original value,
> > >
> > > The mismatch isn't technically a false positive, as architecturally the
> > > xAPIC IDs do end up being aliased in this scenario, and neither APICv
> > > nor AVIC correctly handles IPI virtualization when there is aliasing.
> > > However, KVM already deliberately does not honor the aliasing behavior
> > > that results when an x2APIC ID gets truncated to an xAPIC ID. I.e. the
> > > resulting APICv/AVIC behavior is aligned with KVM's existing behavior
> > > when KVM's x2APIC hotplug hack is effectively enabled.
> > >
> > > If/when KVM provides a way to disable the hotplug hack, APICv/AVIC can
> > > piggyback whatever logic disables the optimized APIC map (which is what
> > > provides the hotplug hack), i.e. so that KVM's optimized map and APIC
> > > virtualization yield the same behavior.
> > >
> > > For now, fix the immediate problem of APIC virtualization being disabled
> > > for large VMs, which is a much more pressing issue than ensuring KVM
> > > honors architectural behavior for APIC ID aliasing.
> >
> > I built a host kernel with this entire series on top of mainline
> > v6.0-rc6, and booting a guest with AVIC enabled works as expected on the
> > initial boot. The issue is that during the first reboot AVIC is
> > inhibited due to APICV_INHIBIT_REASON_APIC_ID_MODIFIED, and I see
> > constant inhibition events due to APICV_INHIBIT_REASON_IRQWIN as seen in
>
>
> APICV_INHIBIT_REASON_IRQWIN is OK, because that happens about every time
> the good old PIT timer fires which happens on reboot.
>
> APICV_INHIBIT_REASON_APIC_ID_MODIFIED should not happen as you noted,
> this needs investigation.

Ya, I'll take a look.

> > It happens regardless of vCPU count (tested with 2, 32, 255, 380, and
> > 512 vCPUs). This state persists for all subsequent reboots, until the VM
> > is terminated. For vCPU counts < 256, when x2apic is disabled the
> > problem does not occur, and AVIC continues to work properly after reboots.

Bit of a shot in the dark, but does the below fix the issue? There are two
issues with calling kvm_lapic_xapic_id_updated() from kvm_apic_state_fixup():

1. The xAPIC ID should only be refreshed on "set".

2. The refresh needs to be noted after memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));

and a third bug in the helper itself, as changes to the ID should be ignored if
the APIC is hardware disabled since the ID is reset to the vcpu_id when the APIC
is hardware enabled (architectural behavior).

---
arch/x86/kvm/lapic.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 804d529d9bfb..b8b2faf5abc7 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2159,6 +2159,9 @@ static void kvm_lapic_xapic_id_updated(struct kvm_lapic *apic)
 {
 	struct kvm *kvm = apic->vcpu->kvm;

+	if (!kvm_apic_hw_enabled(apic))
+		return;
+
 	if (KVM_BUG_ON(apic_x2apic_mode(apic), kvm))
 		return;

@@ -2875,8 +2878,6 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
 			icr = __kvm_lapic_get_reg64(s->regs, APIC_ICR);
 			__kvm_lapic_set_reg(s->regs, APIC_ICR2, icr >> 32);
 		}
-	} else {
-		kvm_lapic_xapic_id_updated(vcpu->arch.apic);
 	}

 	return 0;
@@ -2912,6 +2913,9 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 	}
 	memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));

+	if (!apic_x2apic_mode(vcpu->arch.apic))
+		kvm_lapic_xapic_id_updated(vcpu->arch.apic);
+
 	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
 	kvm_recalculate_apic_map(vcpu->kvm);
 	kvm_apic_set_version(vcpu);

base-commit: 0b502152c0b8523f399bdb53096e2d620c5795b5
--
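
The order of checks in the helper after this diff can be sketched in isolation as follows. This is a toy model, not KVM code: the function and macro names are hypothetical, and it relies only on the architectural IA32_APIC_BASE layout (bit 11 is the global enable, bit 10 is the x2APIC enable) and on the xAPIC ID living in bits 31:24 of the ID register. In the kernel the x2APIC case is a KVM_BUG_ON() rather than a silent return.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_APICBASE_X2APIC	(UINT64_C(1) << 10)	/* x2APIC mode enable */
#define TOY_APICBASE_ENABLE	(UINT64_C(1) << 11)	/* APIC global enable */

/* Returns true if the toy model would flag the xAPIC ID as modified. */
static bool toy_xapic_id_modified(uint64_t apic_base, uint32_t id_reg,
				  uint32_t vcpu_id)
{
	/* ID changes are ignored while the APIC is hardware disabled. */
	if (!(apic_base & TOY_APICBASE_ENABLE))
		return false;

	/* The xAPIC ID check is not meaningful in x2APIC mode. */
	if (apic_base & TOY_APICBASE_X2APIC)
		return false;

	/* xAPIC ID lives in bits 31:24; compare against the truncated vCPU ID. */
	return (uint8_t)(id_reg >> 24) != (uint8_t)vcpu_id;
}
```

With the enable bit clear, a mismatching ID register is not reported. With the APIC enabled in xAPIC mode, the same register value is.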

2022-09-28 18:27:50

by Maxim Levitsky

Subject: Re: [PATCH v3 05/28] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID

On Wed, 2022-09-28 at 18:03 +0000, Sean Christopherson wrote:
> On Wed, Sep 28, 2022, Maxim Levitsky wrote:
> > On Wed, 2022-09-28 at 16:51 +0000, Sean Christopherson wrote:
> > > > > It happens regardless of vCPU count (tested with 2, 32, 255, 380, and
> > > > > 512 vCPUs). This state persists for all subsequent reboots, until the VM
> > > > > is terminated. For vCPU counts < 256, when x2apic is disabled the
> > > > > problem does not occur, and AVIC continues to work properly after reboots.
> > >
> > > Bit of a shot in the dark, but does the below fix the issue? There are two
> > > issues with calling kvm_lapic_xapic_id_updated() from kvm_apic_state_fixup():
> > >
> > > 1. The xAPIC ID should only be refreshed on "set".
> > True - I didn't bother to fix it yet because it shouldn't cause harm, but
> > sure this needs to be fixed.
>
> It's probably benign on its own, but with the missing "hardware enabled" check,
> it could be problematic if userspace does KVM_GET_LAPIC while the APIC is hardware
> disabled, after the APIC was previously in x2APIC mode. I'm guessing QEMU does
> KVM_GET_LAPIC state when emulating reboot, hence the potential for being involved
> in the bug Alejandro is seeing.
>
> > > 2. The refresh needs to be noted after memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
> > Are you sure? The check is first because if it fails, then error is returned to userspace
> > and the KVM's state left unchanged.
> >
> > I assume you are talking about
> >
> > ....
> > r = kvm_apic_state_fixup(vcpu, s, true);
> > if (r) {
> > kvm_recalculate_apic_map(vcpu->kvm);
> > return r;
> > }
> > memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
>
> This isn't a failure path though, it's purely a "take note of the update", and
> KVM needs to do that processing _after_ the actual update. Specifically,
> kvm_lapic_xapic_id_updated() consumes the internal APIC state:

Yes, I somehow blindly assumed that kvm_apic_state_fixup actually checks
the new state and not the existing state.

Probably because my original code did that, I think it just checked the 'id'
variable.. Oh well.
Thanks for catching this bug!

Best regards,
Maxim Levitsky

>
> if (kvm_xapic_id(apic) == apic->vcpu->vcpu_id)
> return;
>
> Calling that before the internal state has been set with the incoming state from
> userspace is simply wrong.
>
> The check that the x2APIC ID is "correct" stays where it is, this is purely the
> "is the xAPIC ID different" path.
>


2022-09-28 18:44:05

by Maxim Levitsky

Subject: Re: [PATCH v3 05/28] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID

On Wed, 2022-09-28 at 16:51 +0000, Sean Christopherson wrote:
> On Wed, Sep 28, 2022, Maxim Levitsky wrote:
> > On Tue, 2022-09-27 at 23:15 -0400, Alejandro Jimenez wrote:
> > > On 9/20/2022 7:31 PM, Sean Christopherson wrote:
> > > > Truncate the vcpu_id, a.k.a. x2APIC ID, to an 8-bit value when comparing
> > > > it against the xAPIC ID to avoid false positives (sort of) on systems
> > > > with >255 CPUs, i.e. with IDs that don't fit into a u8. The intent of
> > > > APIC_ID_MODIFIED is to inhibit APICv/AVIC when the xAPIC is changed from
> > > > it's original value,
> > > >
> > > > The mismatch isn't technically a false positive, as architecturally the
> > > > xAPIC IDs do end up being aliased in this scenario, and neither APICv
> > > > nor AVIC correctly handles IPI virtualization when there is aliasing.
> > > > However, KVM already deliberately does not honor the aliasing behavior
> > > > that results when an x2APIC ID gets truncated to an xAPIC ID. I.e. the
> > > > resulting APICv/AVIC behavior is aligned with KVM's existing behavior
> > > > when KVM's x2APIC hotplug hack is effectively enabled.
> > > >
> > > > If/when KVM provides a way to disable the hotplug hack, APICv/AVIC can
> > > > piggyback whatever logic disables the optimized APIC map (which is what
> > > > provides the hotplug hack), i.e. so that KVM's optimized map and APIC
> > > > virtualization yield the same behavior.
> > > >
> > > > For now, fix the immediate problem of APIC virtualization being disabled
> > > > for large VMs, which is a much more pressing issue than ensuring KVM
> > > > honors architectural behavior for APIC ID aliasing.
> > >
> > > I built a host kernel with this entire series on top of mainline
> > > v6.0-rc6, and booting a guest with AVIC enabled works as expected on the
> > > initial boot. The issue is that during the first reboot AVIC is
> > > inhibited due to APICV_INHIBIT_REASON_APIC_ID_MODIFIED, and I see
> > > constant inhibition events due to APICV_INHIBIT_REASON_IRQWIN as seen in
> >
> > APICV_INHIBIT_REASON_IRQWIN is OK, because that happens about every time
> > the good old PIT timer fires which happens on reboot.
> >
> > APICV_INHIBIT_REASON_APIC_ID_MODIFIED should not happen as you noted,
> > this needs investigation.
>
> Ya, I'll take a look.
>
> > > It happens regardless of vCPU count (tested with 2, 32, 255, 380, and
> > > 512 vCPUs). This state persists for all subsequent reboots, until the VM
> > > is terminated. For vCPU counts < 256, when x2apic is disabled the
> > > problem does not occur, and AVIC continues to work properly after reboots.
>
> Bit of a shot in the dark, but does the below fix the issue? There are two
> issues with calling kvm_lapic_xapic_id_updated() from kvm_apic_state_fixup():
>
> 1. The xAPIC ID should only be refreshed on "set".
True - I didn't bother to fix it yet because it shouldn't cause harm, but
sure this needs to be fixed.

>
> 2. The refresh needs to be noted after memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
Are you sure? The check is first because if it fails, then error is returned to userspace
and the KVM's state left unchanged.

I assume you are talking about

....
r = kvm_apic_state_fixup(vcpu, s, true);
if (r) {
kvm_recalculate_apic_map(vcpu->kvm);
return r;
}
memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
....


>
> and a third bug in the helper itself, as changes to the ID should be ignored if
> the APIC is hardward disabled since the ID is reset to the vcpu_id when the APIC
> is hardware enabled (architecturally behavior).

That is true, and something I haven't thought about.

Best regards,
Maxim Levitsky



>
> ---
> arch/x86/kvm/lapic.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 804d529d9bfb..b8b2faf5abc7 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2159,6 +2159,9 @@ static void kvm_lapic_xapic_id_updated(struct kvm_lapic *apic)
> {
> struct kvm *kvm = apic->vcpu->kvm;
>
> + if (!kvm_apic_hw_enabled(apic))
> + return;
> +
> if (KVM_BUG_ON(apic_x2apic_mode(apic), kvm))
> return;
>
> @@ -2875,8 +2878,6 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
> icr = __kvm_lapic_get_reg64(s->regs, APIC_ICR);
> __kvm_lapic_set_reg(s->regs, APIC_ICR2, icr >> 32);
> }
> - } else {
> - kvm_lapic_xapic_id_updated(vcpu->arch.apic);
> }
>
> return 0;
> @@ -2912,6 +2913,9 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
> }
> memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
>
> + if (!apic_x2apic_mode(vcpu->arch.apic))
> + kvm_lapic_xapic_id_updated(vcpu->arch.apic);
> +
> atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
> kvm_recalculate_apic_map(vcpu->kvm);
> kvm_apic_set_version(vcpu);
>
> base-commit: 0b502152c0b8523f399bdb53096e2d620c5795b5


2022-09-28 18:45:52

by Sean Christopherson

Subject: Re: [PATCH v3 05/28] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID

On Wed, Sep 28, 2022, Maxim Levitsky wrote:
> On Wed, 2022-09-28 at 16:51 +0000, Sean Christopherson wrote:
> > > > It happens regardless of vCPU count (tested with 2, 32, 255, 380, and
> > > > 512 vCPUs). This state persists for all subsequent reboots, until the VM
> > > > is terminated. For vCPU counts < 256, when x2apic is disabled the
> > > > problem does not occur, and AVIC continues to work properly after reboots.
> >
> > Bit of a shot in the dark, but does the below fix the issue? There are two
> > issues with calling kvm_lapic_xapic_id_updated() from kvm_apic_state_fixup():
> >
> > 1. The xAPIC ID should only be refreshed on "set".
> True - I didn't bother to fix it yet because it shouldn't cause harm, but
> sure this needs to be fixed.

It's probably benign on its own, but with the missing "hardware enabled" check,
it could be problematic if userspace does KVM_GET_LAPIC while the APIC is hardware
disabled, after the APIC was previously in x2APIC mode. I'm guessing QEMU does
KVM_GET_LAPIC state when emulating reboot, hence the potential for being involved
in the bug Alejandro is seeing.

> > 2. The refresh needs to be noted after memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
> Are you sure? The check is first because if it fails, then error is returned to userspace
> and the KVM's state left unchanged.
>
> I assume you are talking about
>
> ....
> r = kvm_apic_state_fixup(vcpu, s, true);
> if (r) {
> kvm_recalculate_apic_map(vcpu->kvm);
> return r;
> }
> memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));

This isn't a failure path though, it's purely a "take note of the update", and
KVM needs to do that processing _after_ the actual update. Specifically,
kvm_lapic_xapic_id_updated() consumes the internal APIC state:

if (kvm_xapic_id(apic) == apic->vcpu->vcpu_id)
return;

Calling that before the internal state has been set with the incoming state from
userspace is simply wrong.

The check that the x2APIC ID is "correct" stays where it is, this is purely the
"is the xAPIC ID different" path.
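
The ordering problem described here can be reproduced in a toy model. Everything below is a hypothetical sketch, not the kernel's code: the struct and function names are made up, and the register page is reduced to a single ID register with the xAPIC ID in bits 31:24.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct toy_apic {
	uint32_t vcpu_id;
	uint32_t regs[1];	/* regs[0]: ID register, xAPIC ID in bits 31:24 */
	bool id_modified;	/* stands in for setting the inhibit */
};

/* Consumes the *internal* register state, like the helper quoted above. */
static void toy_note_xapic_id(struct toy_apic *apic)
{
	if ((uint8_t)(apic->regs[0] >> 24) == (uint8_t)apic->vcpu_id)
		return;
	apic->id_modified = true;
}

/* Problematic order: the check runs against the old registers. */
static void toy_set_state_check_first(struct toy_apic *apic,
				      const uint32_t *incoming)
{
	toy_note_xapic_id(apic);
	memcpy(apic->regs, incoming, sizeof(apic->regs));
}

/* Order used by the proposed fix: copy, then take note of the update. */
static void toy_set_state_copy_first(struct toy_apic *apic,
				     const uint32_t *incoming)
{
	memcpy(apic->regs, incoming, sizeof(apic->regs));
	toy_note_xapic_id(apic);
}
```

If the old register contents do not decode to the vCPU ID but the incoming state does, the check-first variant flags a modification that the incoming state does not contain, while the copy-first variant does not.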

2022-09-28 20:53:12

by Sean Christopherson

Subject: Re: [PATCH v3 05/28] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID

On Wed, Sep 28, 2022, Maxim Levitsky wrote:
> On Wed, 2022-09-28 at 18:03 +0000, Sean Christopherson wrote:
> > On Wed, Sep 28, 2022, Maxim Levitsky wrote:
> > > On Wed, 2022-09-28 at 16:51 +0000, Sean Christopherson wrote:
> > > > > > It happens regardless of vCPU count (tested with 2, 32, 255, 380, and
> > > > > > 512 vCPUs). This state persists for all subsequent reboots, until the VM
> > > > > > is terminated. For vCPU counts < 256, when x2apic is disabled the
> > > > > > problem does not occur, and AVIC continues to work properly after reboots.
> > > >
> > > > Bit of a shot in the dark, but does the below fix the issue? There are two
> > > > issues with calling kvm_lapic_xapic_id_updated() from kvm_apic_state_fixup():
> > > >
> > > > 1. The xAPIC ID should only be refreshed on "set".
> > > True - I didn't bother to fix it yet because it shouldn't cause harm, but
> > > sure this needs to be fixed.
> >
> > It's probably benign on its own, but with the missing "hardware enabled" check,
> > it could be problematic if userspace does KVM_GET_LAPIC while the APIC is hardware
> > disabled, after the APIC was previously in x2APIC mode. I'm guessing QEMU does
> > KVM_GET_LAPIC state when emulating reboot, hence the potential for being involved
> > in the bug Alejandro is seeing.
> >
> > > > 2. The refresh needs to be noted after memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
> > > Are you sure? The check is first because if it fails, then error is returned to userspace
> > > and the KVM's state left unchanged.
> > >
> > > I assume you are talking about
> > >
> > > ....
> > > r = kvm_apic_state_fixup(vcpu, s, true);
> > > if (r) {
> > > kvm_recalculate_apic_map(vcpu->kvm);
> > > return r;
> > > }
> > > memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
> >
> > This isn't a failure path though, it's purely a "take note of the update", and
> > KVM needs to do that processing _after_ the actual update. Specifically,
> > kvm_lapic_xapic_id_updated() consumes the internal APIC state:
>
> Yes, I somehow blindly assumed that kvm_apic_state_fixup actually checks
> the new state and not the existing state.
>
> Probably because my original code did that, I think it just checked the 'id'
> variable.. Oh well.
> Thanks for catching this bug!

This was indeed the bug Alejandro hit. QEMU does KVM_SET_LAPIC and KVM checks
the stale vAPIC state. Because the vCPU was in x2APIC mode, the 32-bit ID is
in bits 31:0, not bits 31:24, and so kvm_xapic_id() returned '0' instead of the
correct ID.
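
The register-format mismatch can be shown with plain arithmetic. The helpers below are hypothetical and only model the two layouts named above: the xAPIC ID in bits 31:24 of the ID register, and the x2APIC ID occupying all of bits 31:0.

```c
#include <stdint.h>

/* Decode an ID register value as xAPIC: the ID is in bits 31:24. */
static uint8_t toy_xapic_id(uint32_t id_reg)
{
	return (uint8_t)(id_reg >> 24);
}

/* Decode an ID register value as x2APIC: the ID is the whole register. */
static uint32_t toy_x2apic_id(uint32_t id_reg)
{
	return id_reg;
}

/* Encode an 8-bit ID into xAPIC register format. */
static uint32_t toy_xapic_id_reg(uint8_t id)
{
	return (uint32_t)id << 24;
}
```

A register left in x2APIC format for vCPU 1 holds the value 1, which decodes to 0 when read as xAPIC. The same holds for any x2APIC ID below 2^24, so in this model only a vCPU whose ID truncates to 0 would pass the comparison by accident.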

2022-09-28 20:53:56

by Alejandro Jimenez

Subject: Re: [PATCH v3 05/28] KVM: x86: Don't inhibit APICv/AVIC if xAPIC ID mismatch is due to 32-bit ID



On 9/28/2022 12:51 PM, Sean Christopherson wrote:
> On Wed, Sep 28, 2022, Maxim Levitsky wrote:
>> On Tue, 2022-09-27 at 23:15 -0400, Alejandro Jimenez wrote:
>>>
>>> On 9/20/2022 7:31 PM, Sean Christopherson wrote:
>>>> Truncate the vcpu_id, a.k.a. x2APIC ID, to an 8-bit value when comparing
>>>> it against the xAPIC ID to avoid false positives (sort of) on systems
>>>> with >255 CPUs, i.e. with IDs that don't fit into a u8. The intent of
>>>> APIC_ID_MODIFIED is to inhibit APICv/AVIC when the xAPIC is changed from
>>>> it's original value,
>>>>
>>>> The mismatch isn't technically a false positive, as architecturally the
>>>> xAPIC IDs do end up being aliased in this scenario, and neither APICv
>>>> nor AVIC correctly handles IPI virtualization when there is aliasing.
>>>> However, KVM already deliberately does not honor the aliasing behavior
>>>> that results when an x2APIC ID gets truncated to an xAPIC ID. I.e. the
>>>> resulting APICv/AVIC behavior is aligned with KVM's existing behavior
>>>> when KVM's x2APIC hotplug hack is effectively enabled.
>>>>
>>>> If/when KVM provides a way to disable the hotplug hack, APICv/AVIC can
>>>> piggyback whatever logic disables the optimized APIC map (which is what
>>>> provides the hotplug hack), i.e. so that KVM's optimized map and APIC
>>>> virtualization yield the same behavior.
>>>>
>>>> For now, fix the immediate problem of APIC virtualization being disabled
>>>> for large VMs, which is a much more pressing issue than ensuring KVM
>>>> honors architectural behavior for APIC ID aliasing.
>>>
>>> I built a host kernel with this entire series on top of mainline
>>> v6.0-rc6, and booting a guest with AVIC enabled works as expected on the
>>> initial boot. The issue is that during the first reboot AVIC is
>>> inhibited due to APICV_INHIBIT_REASON_APIC_ID_MODIFIED, and I see
>>> constant inhibition events due to APICV_INHIBIT_REASON_IRQWIN as seen in
>>
>>
>> APICV_INHIBIT_REASON_IRQWIN is OK, because that happens about every time
>> the good old PIT timer fires which happens on reboot.
>>
>> APICV_INHIBIT_REASON_APIC_ID_MODIFIED should not happen as you noted,
>> this needs investigation.
>
> Ya, I'll take a look.
>
>>> It happens regardless of vCPU count (tested with 2, 32, 255, 380, and
>>> 512 vCPUs). This state persists for all subsequent reboots, until the VM
>>> is terminated. For vCPU counts < 256, when x2apic is disabled the
>>> problem does not occur, and AVIC continues to work properly after reboots.
>
> Bit of a shot in the dark, but does the below fix the issue?
The patch below fixes the problems for all the scenarios I have tested
so far.

Thank you,
Alejandro

> There are two
> issues with calling kvm_lapic_xapic_id_updated() from kvm_apic_state_fixup():
>
> 1. The xAPIC ID should only be refreshed on "set".
>
> 2. The refresh needs to be noted after memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
>
> and a third bug in the helper itself, as changes to the ID should be ignored if
> the APIC is hardward disabled since the ID is reset to the vcpu_id when the APIC
> is hardware enabled (architecturally behavior).
>
> ---
> arch/x86/kvm/lapic.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 804d529d9bfb..b8b2faf5abc7 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2159,6 +2159,9 @@ static void kvm_lapic_xapic_id_updated(struct kvm_lapic *apic)
> {
> struct kvm *kvm = apic->vcpu->kvm;
>
> + if (!kvm_apic_hw_enabled(apic))
> + return;
> +
> if (KVM_BUG_ON(apic_x2apic_mode(apic), kvm))
> return;
>
> @@ -2875,8 +2878,6 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
> icr = __kvm_lapic_get_reg64(s->regs, APIC_ICR);
> __kvm_lapic_set_reg(s->regs, APIC_ICR2, icr >> 32);
> }
> - } else {
> - kvm_lapic_xapic_id_updated(vcpu->arch.apic);
> }
>
> return 0;
> @@ -2912,6 +2913,9 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
> }
> memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
>
> + if (!apic_x2apic_mode(vcpu->arch.apic))
> + kvm_lapic_xapic_id_updated(vcpu->arch.apic);
> +
> atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
> kvm_recalculate_apic_map(vcpu->kvm);
> kvm_apic_set_version(vcpu);
>
> base-commit: 0b502152c0b8523f399bdb53096e2d620c5795b5