2022-08-31 00:39:17

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 03/19] Revert "KVM: SVM: Introduce hybrid-AVIC mode"

Remove SVM's so called "hybrid-AVIC mode" and reinstate the restriction
where AVIC is disabled if x2APIC is enabled. The argument that the
"guest is not supposed to access xAPIC mmio when uses x2APIC" is flat out
wrong. Activating x2APIC completely disables the xAPIC MMIO region,
there is nothing that says the guest must not access that address.

Concretely, KVM-Unit-Test's existing "apic" test fails the subtests that
expect accesses to the APIC base region to not be emulated when x2APIC is
enabled.

Furthermore, allowing the guest to trigger MMIO emulation in a mode where
KVM doesn't expect such emulation to occur is all kinds of dangerous.

Tweak the restriction so that it only inhibits AVIC if x2APIC is actually
enabled instead of inhibiting AVIC is x2APIC is exposed to the guest.

This reverts commit 0e311d33bfbef86da130674e8528cc23e6acfe16.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 6 ++++++
arch/x86/kvm/svm/avic.c | 21 ++++++++++-----------
2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2c96c43c313a..1f51411f3112 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1128,6 +1128,12 @@ enum kvm_apicv_inhibit {
*/
APICV_INHIBIT_REASON_PIT_REINJ,

+ /*
+ * AVIC is inhibited because the vCPU has x2apic enabled and x2AVIC is
+ * not supported.
+ */
+ APICV_INHIBIT_REASON_X2APIC,
+
/*
* AVIC is disabled because SEV doesn't support it.
*/
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index f3a74c8284cb..1d516d658f9a 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -71,22 +71,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;

vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
-
- /* Note:
- * KVM can support hybrid-AVIC mode, where KVM emulates x2APIC
- * MSR accesses, while interrupt injection to a running vCPU
- * can be achieved using AVIC doorbell. The AVIC hardware still
- * accelerate MMIO accesses, but this does not cause any harm
- * as the guest is not supposed to access xAPIC mmio when uses x2APIC.
- */
- if (apic_x2apic_mode(svm->vcpu.arch.apic) &&
- avic_mode == AVIC_MODE_X2) {
+ if (apic_x2apic_mode(svm->vcpu.arch.apic)) {
vmcb->control.int_ctl |= X2APIC_MODE_MASK;
vmcb->control.avic_physical_id |= X2AVIC_MAX_PHYSICAL_ID;
/* Disabling MSR intercept for x2APIC registers */
svm_set_x2apic_msr_interception(svm, false);
} else {
- /* For xAVIC and hybrid-xAVIC modes */
vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID;
/* Enabling MSR intercept for x2APIC registers */
svm_set_x2apic_msr_interception(svm, true);
@@ -537,6 +527,14 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu)
{
if (is_guest_mode(vcpu))
return APICV_INHIBIT_REASON_NESTED;
+
+ /*
+ * AVIC must be disabled if x2AVIC isn't supported and the guest has
+ * x2APIC enabled.
+ */
+ if (avic_mode != AVIC_MODE_X2 && apic_x2apic_mode(vcpu->arch.apic))
+ return APICV_INHIBIT_REASON_X2APIC;
+
return 0;
}

@@ -993,6 +991,7 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
BIT(APICV_INHIBIT_REASON_NESTED) |
BIT(APICV_INHIBIT_REASON_IRQWIN) |
BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
+ BIT(APICV_INHIBIT_REASON_X2APIC) |
BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
BIT(APICV_INHIBIT_REASON_SEV) |
BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |
--
2.37.2.672.g94769d06f0-goog


2022-08-31 06:33:01

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 03/19] Revert "KVM: SVM: Introduce hybrid-AVIC mode"

On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> Remove SVM's so called "hybrid-AVIC mode" and reinstate the restriction
> where AVIC is disabled if x2APIC is enabled. The argument that the
> "guest is not supposed to access xAPIC mmio when uses x2APIC" is flat out
> wrong. Activating x2APIC completely disables the xAPIC MMIO region,
> there is nothing that says the guest must not access that address.
>
> Concretely, KVM-Unit-Test's existing "apic" test fails the subtests that
> expect accesses to the APIC base region to not be emulated when x2APIC is
> enabled.
>
> Furthermore, allowing the guest to trigger MMIO emulation in a mode where
> KVM doesn't expect such emulation to occur is all kinds of dangerous.
>
> Tweak the restriction so that it only inhibits AVIC if x2APIC is actually
> enabled instead of inhibiting AVIC is x2APIC is exposed to the guest.
>
> This reverts commit 0e311d33bfbef86da130674e8528cc23e6acfe16.

I don't agree with this patch.

When reviewing this code I did note that MMIO is left enabled which is kind of errata on KVM
side, and nobody objected to this.

Keeping AVIC enabled allows to have performance benefits with guests that have to use x2apic
(e.g after migration, or OS requirements) - aside from emulated msr access they get all the
AVIC benefits like doorbell, IOMMU posted interrupts, etc.

What we should do, and I even suggested doing it
(and what I somewhat assumed that you will ask when the patch was in review),
is to disable the AVIC mmio hole when one of the guest vCPUs is in x2apic mode
which can be done by disabling our APIC private memslot.

Best regards,
Maxim Levitsky

>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 6 ++++++
> arch/x86/kvm/svm/avic.c | 21 ++++++++++-----------
> 2 files changed, 16 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 2c96c43c313a..1f51411f3112 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1128,6 +1128,12 @@ enum kvm_apicv_inhibit {
> */
> APICV_INHIBIT_REASON_PIT_REINJ,
>
> + /*
> + * AVIC is inhibited because the vCPU has x2apic enabled and x2AVIC is
> + * not supported.
> + */
> + APICV_INHIBIT_REASON_X2APIC,
> +
> /*
> * AVIC is disabled because SEV doesn't support it.
> */
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index f3a74c8284cb..1d516d658f9a 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -71,22 +71,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
> vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
>
> vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
> -
> - /* Note:
> - * KVM can support hybrid-AVIC mode, where KVM emulates x2APIC
> - * MSR accesses, while interrupt injection to a running vCPU
> - * can be achieved using AVIC doorbell. The AVIC hardware still
> - * accelerate MMIO accesses, but this does not cause any harm
> - * as the guest is not supposed to access xAPIC mmio when uses x2APIC.
> - */
> - if (apic_x2apic_mode(svm->vcpu.arch.apic) &&
> - avic_mode == AVIC_MODE_X2) {
> + if (apic_x2apic_mode(svm->vcpu.arch.apic)) {
> vmcb->control.int_ctl |= X2APIC_MODE_MASK;
> vmcb->control.avic_physical_id |= X2AVIC_MAX_PHYSICAL_ID;
> /* Disabling MSR intercept for x2APIC registers */
> svm_set_x2apic_msr_interception(svm, false);
> } else {
> - /* For xAVIC and hybrid-xAVIC modes */
> vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID;
> /* Enabling MSR intercept for x2APIC registers */
> svm_set_x2apic_msr_interception(svm, true);
> @@ -537,6 +527,14 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu)
> {
> if (is_guest_mode(vcpu))
> return APICV_INHIBIT_REASON_NESTED;
> +
> + /*
> + * AVIC must be disabled if x2AVIC isn't supported and the guest has
> + * x2APIC enabled.
> + */
> + if (avic_mode != AVIC_MODE_X2 && apic_x2apic_mode(vcpu->arch.apic))
> + return APICV_INHIBIT_REASON_X2APIC;
> +
> return 0;
> }
>
> @@ -993,6 +991,7 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
> BIT(APICV_INHIBIT_REASON_NESTED) |
> BIT(APICV_INHIBIT_REASON_IRQWIN) |
> BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
> + BIT(APICV_INHIBIT_REASON_X2APIC) |
> BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
> BIT(APICV_INHIBIT_REASON_SEV) |
> BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |


2022-08-31 07:12:11

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 03/19] Revert "KVM: SVM: Introduce hybrid-AVIC mode"

On Wed, 2022-08-31 at 08:59 +0300, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> > Remove SVM's so called "hybrid-AVIC mode" and reinstate the restriction
> > where AVIC is disabled if x2APIC is enabled. The argument that the
> > "guest is not supposed to access xAPIC mmio when uses x2APIC" is flat out
> > wrong. Activating x2APIC completely disables the xAPIC MMIO region,
> > there is nothing that says the guest must not access that address.
> >
> > Concretely, KVM-Unit-Test's existing "apic" test fails the subtests that
> > expect accesses to the APIC base region to not be emulated when x2APIC is
> > enabled.
> >
> > Furthermore, allowing the guest to trigger MMIO emulation in a mode where
> > KVM doesn't expect such emulation to occur is all kinds of dangerous.

Also, unless I misunderstood you, the above statement is wrong.

Leaving AVIC on, when vCPU is in x2apic mode cannot trigger extra MMIO emulation,
in fact the opposite - because AVIC is on, writes to 0xFEE00xxx might *not* trigger
MMIO emulation and instead be emulated by AVIC.

Yes, some of these writes can trigger AVIC specific emulation vm exits, but they
are literaly the same as those used by x2avic, and it is really hard to see
why this would be dangerous (assuming that x2avic code works, and avic code
is aware of this 'hybrid' mode).

From the guest point of view, unless the guest pokes at random MMIO area,
the only case when this matters is if the guest maps RAM over the 0xFEE00xxx
(which it of course can, the spec explictly state as you say that when x2apic
is enabled, the mmio is disabled), and then instead of functioning as RAM,
the range will still function as APIC.

However after *our hack* that we did to avoid removing the APIC private memslot,
regardless of APIC inhibit, userspace likely can't even place RAM memslot over the area, regarless
if x2apic is enabled or not, because it will overlap over our private memslot.

Best regards,
Maxim Levitsky

> >
> > Tweak the restriction so that it only inhibits AVIC if x2APIC is actually
> > enabled instead of inhibiting AVIC is x2APIC is exposed to the guest.
> >
> > This reverts commit 0e311d33bfbef86da130674e8528cc23e6acfe16.
>
> I don't agree with this patch.
>
> When reviewing this code I did note that MMIO is left enabled which is kind of errata on KVM
> side, and nobody objected to this.
>
> Keeping AVIC enabled allows to have performance benefits with guests that have to use x2apic
> (e.g after migration, or OS requirements) - aside from emulated msr access they get all the
> AVIC benefits like doorbell, IOMMU posted interrupts, etc.
>
> What we should do, and I even suggested doing it
> (and what I somewhat assumed that you will ask when the patch was in review),
> is to disable the AVIC mmio hole when one of the guest vCPUs is in x2apic mode
> which can be done by disabling our APIC private memslot.
>
> Best regards,
> Maxim Levitsky
>
> > Signed-off-by: Sean Christopherson <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 6 ++++++
> > arch/x86/kvm/svm/avic.c | 21 ++++++++++-----------
> > 2 files changed, 16 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 2c96c43c313a..1f51411f3112 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1128,6 +1128,12 @@ enum kvm_apicv_inhibit {
> > */
> > APICV_INHIBIT_REASON_PIT_REINJ,
> >
> > + /*
> > + * AVIC is inhibited because the vCPU has x2apic enabled and x2AVIC is
> > + * not supported.
> > + */
> > + APICV_INHIBIT_REASON_X2APIC,
> > +
> > /*
> > * AVIC is disabled because SEV doesn't support it.
> > */
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index f3a74c8284cb..1d516d658f9a 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -71,22 +71,12 @@ static void avic_activate_vmcb(struct vcpu_svm *svm)
> > vmcb->control.avic_physical_id &= ~AVIC_PHYSICAL_MAX_INDEX_MASK;
> >
> > vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
> > -
> > - /* Note:
> > - * KVM can support hybrid-AVIC mode, where KVM emulates x2APIC
> > - * MSR accesses, while interrupt injection to a running vCPU
> > - * can be achieved using AVIC doorbell. The AVIC hardware still
> > - * accelerate MMIO accesses, but this does not cause any harm
> > - * as the guest is not supposed to access xAPIC mmio when uses x2APIC.
> > - */
> > - if (apic_x2apic_mode(svm->vcpu.arch.apic) &&
> > - avic_mode == AVIC_MODE_X2) {
> > + if (apic_x2apic_mode(svm->vcpu.arch.apic)) {
> > vmcb->control.int_ctl |= X2APIC_MODE_MASK;
> > vmcb->control.avic_physical_id |= X2AVIC_MAX_PHYSICAL_ID;
> > /* Disabling MSR intercept for x2APIC registers */
> > svm_set_x2apic_msr_interception(svm, false);
> > } else {
> > - /* For xAVIC and hybrid-xAVIC modes */
> > vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID;
> > /* Enabling MSR intercept for x2APIC registers */
> > svm_set_x2apic_msr_interception(svm, true);
> > @@ -537,6 +527,14 @@ unsigned long avic_vcpu_get_apicv_inhibit_reasons(struct kvm_vcpu *vcpu)
> > {
> > if (is_guest_mode(vcpu))
> > return APICV_INHIBIT_REASON_NESTED;
> > +
> > + /*
> > + * AVIC must be disabled if x2AVIC isn't supported and the guest has
> > + * x2APIC enabled.
> > + */
> > + if (avic_mode != AVIC_MODE_X2 && apic_x2apic_mode(vcpu->arch.apic))
> > + return APICV_INHIBIT_REASON_X2APIC;
> > +
> > return 0;
> > }
> >
> > @@ -993,6 +991,7 @@ bool avic_check_apicv_inhibit_reasons(enum kvm_apicv_inhibit reason)
> > BIT(APICV_INHIBIT_REASON_NESTED) |
> > BIT(APICV_INHIBIT_REASON_IRQWIN) |
> > BIT(APICV_INHIBIT_REASON_PIT_REINJ) |
> > + BIT(APICV_INHIBIT_REASON_X2APIC) |
> > BIT(APICV_INHIBIT_REASON_BLOCKIRQ) |
> > BIT(APICV_INHIBIT_REASON_SEV) |
> > BIT(APICV_INHIBIT_REASON_APIC_ID_MODIFIED) |


2022-08-31 16:34:23

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 03/19] Revert "KVM: SVM: Introduce hybrid-AVIC mode"

On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> > Remove SVM's so called "hybrid-AVIC mode" and reinstate the restriction
> > where AVIC is disabled if x2APIC is enabled. The argument that the
> > "guest is not supposed to access xAPIC mmio when uses x2APIC" is flat out
> > wrong. Activating x2APIC completely disables the xAPIC MMIO region,
> > there is nothing that says the guest must not access that address.
> >
> > Concretely, KVM-Unit-Test's existing "apic" test fails the subtests that
> > expect accesses to the APIC base region to not be emulated when x2APIC is
> > enabled.
> >
> > Furthermore, allowing the guest to trigger MMIO emulation in a mode where
> > KVM doesn't expect such emulation to occur is all kinds of dangerous.
> >
> > Tweak the restriction so that it only inhibits AVIC if x2APIC is actually
> > enabled instead of inhibiting AVIC is x2APIC is exposed to the guest.
> >
> > This reverts commit 0e311d33bfbef86da130674e8528cc23e6acfe16.
>
> I don't agree with this patch.
>
> When reviewing this code I did note that MMIO is left enabled which is kind
> of errata on KVM side, and nobody objected to this.

I didn't object because I didn't read the patch. I'm very much objecting now.

2022-08-31 17:20:59

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 03/19] Revert "KVM: SVM: Introduce hybrid-AVIC mode"

On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 08:59 +0300, Maxim Levitsky wrote:
> > On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> > > Remove SVM's so called "hybrid-AVIC mode" and reinstate the restriction
> > > where AVIC is disabled if x2APIC is enabled. The argument that the
> > > "guest is not supposed to access xAPIC mmio when uses x2APIC" is flat out
> > > wrong. Activating x2APIC completely disables the xAPIC MMIO region,
> > > there is nothing that says the guest must not access that address.
> > >
> > > Concretely, KVM-Unit-Test's existing "apic" test fails the subtests that
> > > expect accesses to the APIC base region to not be emulated when x2APIC is
> > > enabled.
> > >
> > > Furthermore, allowing the guest to trigger MMIO emulation in a mode where
> > > KVM doesn't expect such emulation to occur is all kinds of dangerous.
>
> Also, unless I misunderstood you, the above statement is wrong.
>
> Leaving AVIC on, when vCPU is in x2apic mode cannot trigger extra MMIO emulation,
> in fact the opposite - because AVIC is on, writes to 0xFEE00xxx might *not* trigger
> MMIO emulation and instead be emulated by AVIC.

That's even worse, because KVM is allowing the guest to exercise hardware logic
that I highly doubt AMD has thoroughly tested.

> Yes, some of these writes can trigger AVIC specific emulation vm exits, but they
> are literaly the same as those used by x2avic, and it is really hard to see
> why this would be dangerous (assuming that x2avic code works, and avic code
> is aware of this 'hybrid' mode).

The APIC_RRR thing triggered the KVM_BUG_ON() in kvm_apic_write_nodecode()
precisely because of the AVIC trap. At best, this gives a way for the guest to
trigger a WARN_ON_ONCE() and thus panic the host if panic_on_warn=1. I fixed
the APIC_RRR case because that will be problematic for x2AVIC, but there are
other APIC registers that are unsupported in x2APIC that can trigger the KVM_BUG_ON().

> From the guest point of view, unless the guest pokes at random MMIO area,
> the only case when this matters is if the guest maps RAM over the 0xFEE00xxx
> (which it of course can, the spec explictly state as you say that when x2apic
> is enabled, the mmio is disabled), and then instead of functioning as RAM,
> the range will still function as APIC.

There is no wiggle room here though, KVM is blatantly breaking the architectural
specification. When x2APIC is enabled, the xAPIC MMIO does not exist.

2022-08-31 17:52:14

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 03/19] Revert "KVM: SVM: Introduce hybrid-AVIC mode"

On Wed, 2022-08-31 at 16:19 +0000, Sean Christopherson wrote:
> On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> > On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> > > Remove SVM's so called "hybrid-AVIC mode" and reinstate the restriction
> > > where AVIC is disabled if x2APIC is enabled. The argument that the
> > > "guest is not supposed to access xAPIC mmio when uses x2APIC" is flat out
> > > wrong. Activating x2APIC completely disables the xAPIC MMIO region,
> > > there is nothing that says the guest must not access that address.
> > >
> > > Concretely, KVM-Unit-Test's existing "apic" test fails the subtests that
> > > expect accesses to the APIC base region to not be emulated when x2APIC is
> > > enabled.
> > >
> > > Furthermore, allowing the guest to trigger MMIO emulation in a mode where
> > > KVM doesn't expect such emulation to occur is all kinds of dangerous.
> > >
> > > Tweak the restriction so that it only inhibits AVIC if x2APIC is actually
> > > enabled instead of inhibiting AVIC is x2APIC is exposed to the guest.
> > >
> > > This reverts commit 0e311d33bfbef86da130674e8528cc23e6acfe16.
> >
> > I don't agree with this patch.
> >
> > When reviewing this code I did note that MMIO is left enabled which is kind
> > of errata on KVM side, and nobody objected to this.
>
> I didn't object because I didn't read the patch. I'm very much objecting now.
>

And I am *very* much objecting to reverting this patch.

Best regards,
Maxim Levitsky

2022-08-31 18:07:36

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 03/19] Revert "KVM: SVM: Introduce hybrid-AVIC mode"

On Wed, Aug 31, 2022 at 10:49 AM Maxim Levitsky <[email protected]> wrote:

> In this case I say that there is no wiggle room for KVM to not allow different APIC bases
> on each CPU - the spec 100% allows it, but in KVM it is broken.

This would actually be my first candidate for
Documentation/virt/kvm/x86/errata.rst!

2022-08-31 18:13:49

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 03/19] Revert "KVM: SVM: Introduce hybrid-AVIC mode"

On Wed, 2022-08-31 at 16:29 +0000, Sean Christopherson wrote:
> On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> > On Wed, 2022-08-31 at 08:59 +0300, Maxim Levitsky wrote:
> > > On Wed, 2022-08-31 at 00:34 +0000, Sean Christopherson wrote:
> > > > Remove SVM's so called "hybrid-AVIC mode" and reinstate the restriction
> > > > where AVIC is disabled if x2APIC is enabled. The argument that the
> > > > "guest is not supposed to access xAPIC mmio when uses x2APIC" is flat out
> > > > wrong. Activating x2APIC completely disables the xAPIC MMIO region,
> > > > there is nothing that says the guest must not access that address.
> > > >
> > > > Concretely, KVM-Unit-Test's existing "apic" test fails the subtests that
> > > > expect accesses to the APIC base region to not be emulated when x2APIC is
> > > > enabled.
> > > >
> > > > Furthermore, allowing the guest to trigger MMIO emulation in a mode where
> > > > KVM doesn't expect such emulation to occur is all kinds of dangerous.
> >
> > Also, unless I misunderstood you, the above statement is wrong.
> >
> > Leaving AVIC on, when vCPU is in x2apic mode cannot trigger extra MMIO emulation,
> > in fact the opposite - because AVIC is on, writes to 0xFEE00xxx might *not* trigger
> > MMIO emulation and instead be emulated by AVIC.
>
> That's even worse, because KVM is allowing the guest to exercise hardware logic
> that I highly doubt AMD has thoroughly tested.

Harware logic is exactly the same regarless of if KVM uses x2apic mode or not,
and it is better to be prepared for all kind of garbage coming from the guest.

Software logic, I can understand you, there could be registers that trap differently
in avic and x2avic mode, but it should be *very* easy to deal with it, the list
of registers that trap is very short.

>
> > Yes, some of these writes can trigger AVIC specific emulation vm exits, but they
> > are literaly the same as those used by x2avic, and it is really hard to see
> > why this would be dangerous (assuming that x2avic code works, and avic code
> > is aware of this 'hybrid' mode).
>
> The APIC_RRR thing triggered the KVM_BUG_ON() in kvm_apic_write_nodecode()
> precisely because of the AVIC trap. At best, this gives a way for the guest to
> trigger a WARN_ON_ONCE() and thus panic the host if panic_on_warn=1. I fixed
> the APIC_RRR case because that will be problematic for x2AVIC, but there are
> other APIC registers that are unsupported in x2APIC that can trigger the KVM_BUG_ON().
>
> > From the guest point of view, unless the guest pokes at random MMIO area,
> > the only case when this matters is if the guest maps RAM over the 0xFEE00xxx
> > (which it of course can, the spec explictly state as you say that when x2apic
> > is enabled, the mmio is disabled), and then instead of functioning as RAM,
> > the range will still function as APIC.
>
> There is no wiggle room here though, KVM is blatantly breaking the architectural
> specification. When x2APIC is enabled, the xAPIC MMIO does not exist.

In this case I say that there is no wiggle room for KVM to not allow different APIC bases
on each CPU - the spec 100% allows it, but in KVM it is broken.

If you are really hell bent on not having that MMIO exposed,
then I say we can just disable the AVIC memslot, and keep AVIC enabled in this case -
this should make us both happy.


This discussion really makes me feel that you want just to force your opinion on others,
and its not the first time this happens. It is really frustrating to work like that.
It might sound harsh but this is how I feel. Hopefully I am wrong.


Best regards,
Maxim Levisky


>


2022-08-31 18:21:48

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 03/19] Revert "KVM: SVM: Introduce hybrid-AVIC mode"

On Wed, 2022-08-31 at 10:58 -0700, Jim Mattson wrote:
> On Wed, Aug 31, 2022 at 10:49 AM Maxim Levitsky <[email protected]> wrote:
>
> > In this case I say that there is no wiggle room for KVM to not allow different APIC bases
> > on each CPU - the spec 100% allows it, but in KVM it is broken.
>
> This would actually be my first candidate for
> Documentation/virt/kvm/x86/errata.rst!
>
100% agree.

Best regards,
Maxim Levitsky

2022-08-31 19:37:27

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 03/19] Revert "KVM: SVM: Introduce hybrid-AVIC mode"

On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 16:29 +0000, Sean Christopherson wrote:
> > On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> > > Leaving AVIC on, when vCPU is in x2apic mode cannot trigger extra MMIO emulation,
> > > in fact the opposite - because AVIC is on, writes to 0xFEE00xxx might *not* trigger
> > > MMIO emulation and instead be emulated by AVIC.
> >
> > That's even worse, because KVM is allowing the guest to exercise hardware logic
> > that I highly doubt AMD has thoroughly tested.
>
> Harware logic is exactly the same regarless of if KVM uses x2apic mode or not,
> and it is better to be prepared for all kind of garbage coming from the guest.

Right, I got twisted around and was thinking x2APIC enabling of the guest was
reflected in hardware, but it's purely emulated in this mode.

> Software logic, I can understand you, there could be registers that trap differently
> in avic and x2avic mode, but it should be *very* easy to deal with it, the list
> of registers that trap is very short.
>
> >
> > > Yes, some of these writes can trigger AVIC specific emulation vm exits, but they
> > > are literaly the same as those used by x2avic, and it is really hard to see
> > > why this would be dangerous (assuming that x2avic code works, and avic code
> > > is aware of this 'hybrid' mode).
> >
> > The APIC_RRR thing triggered the KVM_BUG_ON() in kvm_apic_write_nodecode()
> > precisely because of the AVIC trap. At best, this gives a way for the guest to
> > trigger a WARN_ON_ONCE() and thus panic the host if panic_on_warn=1. I fixed
> > the APIC_RRR case because that will be problematic for x2AVIC, but there are
> > other APIC registers that are unsupported in x2APIC that can trigger the KVM_BUG_ON().

> > > From the guest point of view, unless the guest pokes at random MMIO area,
> > > the only case when this matters is if the guest maps RAM over the 0xFEE00xxx
> > > (which it of course can, the spec explictly state as you say that when x2apic
> > > is enabled, the mmio is disabled), and then instead of functioning as RAM,
> > > the range will still function as APIC.
> >
> > There is no wiggle room here though, KVM is blatantly breaking the architectural
> > specification. When x2APIC is enabled, the xAPIC MMIO does not exist.
>
> In this case I say that there is no wiggle room for KVM to not allow
> different APIC bases on each CPU - the spec 100% allows it, but in KVM it is
> broken.

The difference is that KVM is consistent with respect to itself in that case.
KVM doesn't support APIC base relocation, and never has supported APIC base
relocation.

This hybrid AVIC mode means that the resulting KVM behavior will vary depending
on whether or not AVIC is supported and enabled, whether or not x2AVIC is supported,
and also on internal KVM state. And we even have tests for this! E.g. run the APIC
unit test with AVIC disabled and it passes, run it with AVIC enabled and it fails.

> If you are really hell bent on not having that MMIO exposed, then I say we
> can just disable the AVIC memslot, and keep AVIC enabled in this case - this
> should make us both happy.

I don't think that will work though, as I don't think it's possible to tell hardware
not to accelerate AVIC accesses. I.e. KVM can squash the unaccelerated traps, but
anything that is handled by hardware will still go through.

> This discussion really makes me feel that you want just to force your opinion
> on others, and its not the first time this happens. It is really frustrating
> to work like that. It might sound harsh but this is how I feel. Hopefully I
> am wrong.

Yes and no. I am most definitely trying to steer KVM in a direction that I feel
will allow KVM to scale in terms of developers, users, and workloads. Part of
that direction is to change people's mindset from "good enough" to "implement to
the spec".

KVM's historic "good enough" approach largely worked when KVM had a relatively
small and stable development community, and when KVM was being used to run a
relatively limited set of workloads. But KVM is being used to run an ever increasing
variety of workloads, and the development community is larger (and I hope that we
can grow it even further). And there is inevitably attrition, which means that
unless we are extremely diligent in documenting KVM's quirks/errata, knowledge of
KVM's "good enough" shortcuts will eventually be lost.

E.g. it took me literally days to understand KVM's hack for forcing #PF=>#PF=>VM-Exit
instead of #PF=>#PF=>#DF. That mess would have been avoided if KVM had implemented
to the spec and actually done the right thing back when the bug was first encountered.
Ditto for the x2APIC ID hotplug hack; I stared at that code on at least three
different occassions before I finally understood the full impact of the hack.

And getting KVM to implement to the spec means not deviating from that path when
it's inconvenient to follow, thus the hard line I am drawing. I am sure we'll
encounter scenarios/features where it's simply impossible for KVM to do the right
thing, e.g. I believe virtualizing VMX's posted interrupts falls into this category.
But IMO those should be very, very rare exceptions.

So, "yes" in the sense that I am openly trying to drag people into alignment with
my "implement to the spec" vision. But "no" in the sense that I don't think it's
fair to say I am forcing my opinion on others. I am not saying "NAK" and ending
the discussion.

2022-09-01 10:35:20

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH 03/19] Revert "KVM: SVM: Introduce hybrid-AVIC mode"

On Wed, 2022-08-31 at 19:12 +0000, Sean Christopherson wrote:
> On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> > On Wed, 2022-08-31 at 16:29 +0000, Sean Christopherson wrote:
> > > On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> > > > Leaving AVIC on, when vCPU is in x2apic mode cannot trigger extra MMIO emulation,
> > > > in fact the opposite - because AVIC is on, writes to 0xFEE00xxx might *not* trigger
> > > > MMIO emulation and instead be emulated by AVIC.
> > >
> > > That's even worse, because KVM is allowing the guest to exercise hardware logic
> > > that I highly doubt AMD has thoroughly tested.
> >
> > Harware logic is exactly the same regarless of if KVM uses x2apic mode or not,
> > and it is better to be prepared for all kind of garbage coming from the guest.
>
> Right, I got twisted around and was thinking x2APIC enabling of the guest was
> reflected in hardware, but it's purely emulated in this mode.

BTW, I don't know why I didn't thought about it before - it is trivial to plug the possible
emulation bugs in hybrid AVIC mode, assuming that AVIC mmio is still exposed to the guest -
all we need to do is to refuse to do anything on each of the two AVIC vmexits if the
AVIC is in this mode, because in this mode, the guest should never touch the MMIO.

So basically just stick teh below code in start of

'avic_unaccelerated_access_interception()' and in 'avic_incomplete_ipi_interception()'


if (apic_x2apic_mode(vcpu->arch.apic) && avic_mode != AVIC_MODE_X2)
return 1;

>
> > Software logic, I can understand you, there could be registers that trap differently
> > in avic and x2avic mode, but it should be *very* easy to deal with it, the list
> > of registers that trap is very short.
> >
> > > > Yes, some of these writes can trigger AVIC specific emulation vm exits, but they
> > > > are literaly the same as those used by x2avic, and it is really hard to see
> > > > why this would be dangerous (assuming that x2avic code works, and avic code
> > > > is aware of this 'hybrid' mode).
> > >
> > > The APIC_RRR thing triggered the KVM_BUG_ON() in kvm_apic_write_nodecode()
> > > precisely because of the AVIC trap. At best, this gives a way for the guest to
> > > trigger a WARN_ON_ONCE() and thus panic the host if panic_on_warn=1. I fixed
> > > the APIC_RRR case because that will be problematic for x2AVIC, but there are
> > > other APIC registers that are unsupported in x2APIC that can trigger the KVM_BUG_ON().
> > > > From the guest point of view, unless the guest pokes at random MMIO area,
> > > > the only case when this matters is if the guest maps RAM over the 0xFEE00xxx
> > > > (which it of course can, the spec explictly state as you say that when x2apic
> > > > is enabled, the mmio is disabled), and then instead of functioning as RAM,
> > > > the range will still function as APIC.
> > >
> > > There is no wiggle room here though, KVM is blatantly breaking the architectural
> > > specification. When x2APIC is enabled, the xAPIC MMIO does not exist.
> >
> > In this case I say that there is no wiggle room for KVM to not allow
> > different APIC bases on each CPU - the spec 100% allows it, but in KVM it is
> > broken.
>
> The difference is that KVM is consistent with respect to itself in that case.
> KVM doesn't support APIC base relocation, and never has supported APIC base
> relocation.
>
> This hybrid AVIC mode means that the resulting KVM behavior will vary depending
> on whether or not AVIC is supported and enabled, whether or not x2AVIC is supported,
> and also on internal KVM state. And we even have tests for this! E.g. run the APIC
> unit test with AVIC disabled and it passes, run it with AVIC enabled and it fails.
>
> > If you are really hell bent on not having that MMIO exposed, then I say we
> > can just disable the AVIC memslot, and keep AVIC enabled in this case - this
> > should make us both happy.
>
> I don't think that will work though, as I don't think it's possible to tell hardware
> not to accelerate AVIC accesses. I.e. KVM can squash the unaccelerated traps, but
> anything that is handled by hardware will still go through.

Nope! Remember why do we have that APIC private memslot? It is because APIC's mmio
is accelerated by AVIC/APICv only when both conditions are true
1. AVIC/APICv is enabled and is in xapic mode
2. the MMIO is mapped as read/write in the NPT/EPT (and yes, AVIC/APICv requires NPT/EPT).

For the (2) we have a private memslot that we map there and we enable it when
AVIC/APICv is enabled.



To refresh our memory, we did a hack (I have nothing against it), that we
no longer remove that memslot when AVIC/APICv is inhibited, but instead we zap
its SPTE, and next time the guest tries to access it, despite having a memslot there
we don't re-install the SPTE and instead jump to emulation.

So technically the memslot is always enabled, but the guest can't use it when AVIC/APICv is
inhibited.

However when APIC mode changes to x2apic, I don't think that we zap that SPTE.

This means that when the hardware APIC acceleration is in x2apic mode (e.g x2avic is active),
the hardware doesn't intercept the MMIO anymore, and thus if guest accesses it,
it will access read/write our private APIC memslot, since its SPTE is there, which is wrong
because the userspace didn't map memory there, not to mention that this 'extra' memory won't survive migration.


Thus regardless of avic hybrid mode we need to ensure that if one of vCPUs is in x2apic mode,
that our private memslot SPTE is zapped, and that also the page fault handler won't re-install it.


A side efect of this will be that AVIC in hybrid mode will just work and be 100% to the spec.

>
> > This discussion really makes me feel that you want just to force your opinion
> > on others, and its not the first time this happens. It is really frustrating
> > to work like that. It might sound harsh but this is how I feel. Hopefully I
> > am wrong.
>
> Yes and no. I am most definitely trying to steer KVM in a direction that I feel
> will allow KVM to scale in terms of developers, users, and workloads. Part of
> that direction is to change people's mindset from "good enough" to "implement to
> the spec".

Sean, don't understand me wrong - I agree with you mostly.

In fact I implemented support for PDPTR migration to be 100% to the spec,
although I kind of regret it, since PDPTRs looks like not preserved over SMM entries
in hardware thus, it is really not possible to rely on them to be unsync.
Also NPT doesn't support them either.

I also was actually first to notice the exception merging issue, and I actually
worked on fixing it and posted patches upstream.

It is also a very corner case, and yet I wanted to fix it, just to be 100% up the spec.

I also reviewed the EVENTINJ injection of software interrupts (INTn) when there
is not INTn instruction in the guest, also something that nobody in reality uses,
but I warmely welcomed it and helped with review.

I also fixed many bugs in !NPT, and !EPT cases, also just to make KVM work to the spec.

Etc, etc, etc.

I just know that nothing is absolute, in some rare cases (like different APIC base per cpu),
it is just not feasable to support the spec.

In fact remember that patch series about maximum VMCS field number?
https://lore.kernel.org/kvm/[email protected]/

There was actually a patch series that was fixing it, but you said, just like me,
that it is not worth it, better to have an errata in KVM, since guest should not use
this info anyway. I didn't object to it, and neither I do now, but as you see,
you also sometimes agree that going 100% to the spec is not worth it.


I hope you understand me.

TL;DR - in this case actually the most correct solution is to disable the private apic
memslot that we install over the APIC mmio, when one of the APICs is in x2apic mode.


That will not only make AVIC hybrid mode just work, but also make sure that if the
guest accesses the APIC MMIO anyway it will see no memory there.


Best regards,
Maxim Levitsky


>
> KVM's historic "good enough" approach largely worked when KVM had a relatively
> small and stable development community, and when KVM was being used to run a
> relatively limited set of workloads. But KVM is being used to run an ever increasing
> variety of workloads, and the development community is larger (and I hope that we
> can grow it even further). And there is inevitably attrition, which means that
> unless we are extremely diligent in documenting KVM's quirks/errata, knowledge of
> KVM's "good enough" shortcuts will eventually be lost.
>
> E.g. it took me literally days to understand KVM's hack for forcing #PF=>#PF=>VM-Exit
> instead of #PF=>#PF=>#DF. That mess would have been avoided if KVM had implemented
> to the spec and actually done the right thing back when the bug was first encountered.
> Ditto for the x2APIC ID hotplug hack; I stared at that code on at least three
> different occassions before I finally understood the full impact of the hack.
>
> And getting KVM to implement to the spec means not deviating from that path when
> it's inconvenient to follow, thus the hard line I am drawing. I am sure we'll
> encounter scenarios/features where it's simply impossible for KVM to do the right
> thing, e.g. I believe virtualizing VMX's posted interrupts falls into this category.
> But IMO those should be very, very rare exceptions.


>
> So, "yes" in the sense that I am openly trying to drag people into alignment with
> my "implement to the spec" vision. But "no" in the sense that I don't think it's
> fair to say I am forcing my opinion on others. I am not saying "NAK" and ending
> the discussion.
>


2022-09-01 11:57:12

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 03/19] Revert "KVM: SVM: Introduce hybrid-AVIC mode"

On Thu, Sep 1, 2022 at 3:26 AM Maxim Levitsky <[email protected]> wrote:

> I just know that nothing is absolute, in some rare cases (like different APIC base per cpu),
> it is just not feasable to support the spec.

Why isn't this feasible? We supported it when I was at VMware.

2022-09-01 14:18:25

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 03/19] Revert "KVM: SVM: Introduce hybrid-AVIC mode"

On Thu, Sep 01, 2022, Maxim Levitsky wrote:
> There was actually a patch series that was fixing it, but you said, just like me,
> that it is not worth it, better to have an errata in KVM, since guest should not use
> this info anyway. I didn't object to it, and neither I do now, but as you see,
> you also sometimes agree that going 100% to the spec is not worth it.
>
>
> I hope you understand me.

Yep.

And rereading what I wrote... I didn't intend to imply that you personally aren't
operating in "good faith" or whatever is the right terminology. What I was trying
to explain is why I sometimes speak in absolutes. When there is a bug/regression
and KVM is clearly violating spec, it's not a matter of opinion; KVM is broken and
needs to be fixed.

Of course, one could argue that that's an opinion in and of itself...

2022-09-01 14:43:43

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 03/19] Revert "KVM: SVM: Introduce hybrid-AVIC mode"

On Thu, Sep 01, 2022, Maxim Levitsky wrote:
> On Wed, 2022-08-31 at 19:12 +0000, Sean Christopherson wrote:
> > On Wed, Aug 31, 2022, Maxim Levitsky wrote:
> > > If you are really hell bent on not having that MMIO exposed, then I say we
> > > can just disable the AVIC memslot, and keep AVIC enabled in this case - this
> > > should make us both happy.
> >
> > I don't think that will work though, as I don't think it's possible to tell hardware
> > not to accelerate AVIC accesses. I.e. KVM can squash the unaccelerated traps, but
> > anything that is handled by hardware will still go through.
>
> Nope! Remember why do we have that APIC private memslot?

Heh, I finally understood what you were suggesting two minutes after I woke up
this morning. Apparently I needed to sleep on it?

2022-09-01 16:48:16

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 03/19] Revert "KVM: SVM: Introduce hybrid-AVIC mode"

Sorry for forking a bunch of different threads, wanted to respond to the other
stuff before you signed off for the day.

On Thu, Sep 01, 2022, Maxim Levitsky wrote:
> However when APIC mode changes to x2apic, I don't think that we zap that
> SPTE.

Hmm, yes.

> A side efect of this will be that AVIC in hybrid mode will just work and be
> 100% to the spec.

And I believe we can solve the avic_set_virtual_apic_mode() issue as well. If
x2APIC is treated as an inhibit for xAPIC but not APICv at large, then setting the
inhibit will zap the SPTEs, and toggling the inhibit will force AVIC to re-assess
its VMCB controls without needing to hook ->set_virtual_apic_mode().

Roughly what I'm thinking. I'll try to give this a shot today.

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 164b002777cf..0b69acef2bee 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2381,7 +2381,11 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
if (((old_value ^ value) & X2APIC_ENABLE) && (value & X2APIC_ENABLE))
kvm_apic_set_x2apic_id(apic, vcpu->vcpu_id);

- if ((old_value ^ value) & (MSR_IA32_APICBASE_ENABLE | X2APIC_ENABLE)) {
+ if ((old_value ^ value) & X2APIC_ENABLE) {
+ kvm_set_or_clear_apicv_inhibit(vcpu->kvm,
+ APICV_INHIBIT_REASON_X2APIC,
+ value & X2APIC_ENABLE;
+ } else if ((old_value ^ value) & MSR_IA32_APICBASE_ENABLE) {
kvm_vcpu_update_apicv(vcpu);
static_call_cond(kvm_x86_set_virtual_apic_mode)(vcpu);
}
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 270403610185..76d93f87071f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4156,7 +4156,7 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
* when the AVIC is re-enabled.
*/
if (slot && slot->id == APIC_ACCESS_PAGE_PRIVATE_MEMSLOT &&
- !kvm_apicv_activated(vcpu->kvm))
+ !kvm_xapic_apicv_activated(vcpu->kvm))
return RET_PF_EMULATE;
}

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1328326acfae..ee381d155adc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9297,11 +9297,24 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, int apicid)
kvm_irq_delivery_to_apic(kvm, NULL, &lapic_irq, NULL);
}

-bool kvm_apicv_activated(struct kvm *kvm)
+bool kvm_xapic_apicv_activated(struct kvm *kvm)
{
return (READ_ONCE(kvm->arch.apicv_inhibit_reasons) == 0);
}

+bool kvm_apicv_activated(struct kvm *kvm)
+{
+ unsigned long inhibits = READ_ONCE(kvm->arch.apicv_inhibit_reasons);
+
+ /*
+ * x2APIC only needs to "inhibit" the MMIO region, all other aspects of
+ * APICv can continue to be utilized.
+ */
+ inhibits &= ~APICV_INHIBIT_REASON_X2APIC;
+
+ return !inhibits;
+}
+
bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu)
{
ulong vm_reasons = READ_ONCE(vcpu->kvm->arch.apicv_inhibit_reasons);