2023-01-07 01:38:10

by Sean Christopherson

Subject: [PATCH 0/6] KVM: x86: x2APIC reserved bits/regs fixes

Fixes for edge cases where KVM mishandles reserved bits/regs checks when
the vCPU is in x2APIC mode.

The first two patches were previously posted[*], but both patches were
broken (as posted against upstream), hence I took full credit for doing
the work and changed Marc to a reporter.

The VMX APICv fixes are for bugs found when writing tests. *sigh*
I didn't Cc those to stable as the odds of breaking something when touching
the MSR bitmaps seemed higher than someone caring about a 10 year old bug.

AMD x2AVIC support may or may not suffer similar interception bugs, but I
don't have hardware to test and this already snowballed further than
expected...

[*] https://lore.kernel.org/kvm/[email protected]

Sean Christopherson (6):
  KVM: x86: Inject #GP if WRMSR sets reserved bits in APIC Self-IPI
  KVM: x86: Inject #GP on x2APIC WRMSR that sets reserved bits 63:32
  KVM: x86: Mark x2APIC DFR reg as non-existent for x2APIC
  KVM: x86: Split out logic to generate "readable" APIC regs mask to
    helper
  KVM: VMX: Always intercept accesses to unsupported "extended" x2APIC
    regs
  KVM: VMX: Intercept reads to invalid and write-only x2APIC registers

 arch/x86/kvm/lapic.c   | 55 ++++++++++++++++++++++++++----------------
 arch/x86/kvm/lapic.h   |  2 ++
 arch/x86/kvm/vmx/vmx.c | 40 +++++++++++++++---------------
 3 files changed, 57 insertions(+), 40 deletions(-)


base-commit: 91dc252b0dbb6879e4067f614df1e397fec532a1
--
2.39.0.314.g84b9a713c41-goog


2023-01-07 01:42:14

by Sean Christopherson

Subject: [PATCH 5/6] KVM: VMX: Always intercept accesses to unsupported "extended" x2APIC regs

Don't clear the "read" bits for x2APIC registers above SELF_IPI (APIC regs
0x400 - 0xff0, MSRs 0x840 - 0x8ff). KVM doesn't emulate registers in that
space (there are a smattering of AMD-only extensions) and so should
intercept reads in order to inject #GP. When APICv is fully enabled,
Intel hardware doesn't validate the registers on RDMSR and instead blindly
retrieves data from the vAPIC page, i.e. it's software's responsibility to
intercept reads to non-existent MSRs.

Fixes: 8d14695f9542 ("x86, apicv: add virtual x2apic support")
Signed-off-by: Sean Christopherson <[email protected]>
---
 arch/x86/kvm/vmx/vmx.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c788aa382611..82c61c16f8f5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4018,26 +4018,17 @@ void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
 		vmx_set_msr_bitmap_write(msr_bitmap, msr);
 }

-static void vmx_reset_x2apic_msrs(struct kvm_vcpu *vcpu, u8 mode)
-{
-	unsigned long *msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
-	unsigned long read_intercept;
-	int msr;
-
-	read_intercept = (mode & MSR_BITMAP_MODE_X2APIC_APICV) ? 0 : ~0;
-
-	for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) {
-		unsigned int read_idx = msr / BITS_PER_LONG;
-		unsigned int write_idx = read_idx + (0x800 / sizeof(long));
-
-		msr_bitmap[read_idx] = read_intercept;
-		msr_bitmap[write_idx] = ~0ul;
-	}
-}
-
 static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * x2APIC indices for 64-bit accesses into the RDMSR and WRMSR halves
+	 * of the MSR bitmap.  KVM emulates APIC registers up through 0x3f0,
+	 * i.e. MSR 0x83f, and so only needs to dynamically manipulate 64 bits.
+	 */
+	const int read_idx = APIC_BASE_MSR / BITS_PER_LONG_LONG;
+	const int write_idx = read_idx + (0x800 / sizeof(u64));
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	u64 *msr_bitmap = (u64 *)vmx->vmcs01.msr_bitmap;
 	u8 mode;

 	if (!cpu_has_vmx_msr_bitmap())
@@ -4058,7 +4049,18 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu)

 	vmx->x2apic_msr_bitmap_mode = mode;

-	vmx_reset_x2apic_msrs(vcpu, mode);
+	/*
+	 * Reset the bitmap for MSRs 0x800 - 0x83f.  Leave AMD's uber-extended
+	 * registers (0x840 and above) intercepted, KVM doesn't support them.
+	 * Intercept all writes by default and poke holes as needed.  Pass
+	 * through all reads by default in x2APIC+APICv mode, as all registers
+	 * except the current timer count are passed through for read.
+	 */
+	if (mode & MSR_BITMAP_MODE_X2APIC_APICV)
+		msr_bitmap[read_idx] = 0;
+	else
+		msr_bitmap[read_idx] = ~0ull;
+	msr_bitmap[write_idx] = ~0ull;

 	/*
 	 * TPR reads and writes can be virtualized even if virtual interrupt
--
2.39.0.314.g84b9a713c41-goog
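
Note on the index arithmetic in the patch above: the following is a minimal user-space sketch, not kernel code, of how the quoted ranges and indices fit together. It assumes the VMX MSR bitmap layout in which the read bitmap for MSRs 0x0 - 0x1fff starts at byte 0 of the page and the matching write bitmap starts at byte 0x800. Apart from APIC_BASE_MSR, every name below is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* APIC_BASE_MSR matches the patch; every other name here is hypothetical. */
#define APIC_BASE_MSR		0x800u
/* Assumed layout: the low-MSR write bitmap starts 0x800 bytes into the page. */
#define WRITE_BITMAP_BYTE_OFF	0x800u

/* MSR index for an APIC register offset; registers sit on a 16-byte stride. */
static inline uint32_t apic_reg_to_msr(uint32_t reg)
{
	assert((reg & 0xf) == 0);
	return APIC_BASE_MSR + (reg >> 4);
}

/* Index of the 64-bit word in the read half that holds the bit for 'msr'. */
static inline unsigned int read_word_idx(uint32_t msr)
{
	return msr / 64;
}

/* The corresponding word in the write half, counted in 64-bit words. */
static inline unsigned int write_word_idx(uint32_t msr)
{
	return read_word_idx(msr) +
	       (unsigned int)(WRITE_BITMAP_BYTE_OFF / sizeof(uint64_t));
}
```

With these definitions, apic_reg_to_msr(0x3f0) is 0x83f and apic_reg_to_msr(0x400) is 0x840, read_word_idx() returns 32 for both 0x800 and 0x83f, and write_word_idx(0x800) is 288. Under the stated layout assumption, a single 64-bit store per half therefore covers exactly MSRs 0x800 - 0x83f.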

2023-01-08 18:32:35

by Maxim Levitsky

Subject: Re: [PATCH 5/6] KVM: VMX: Always intercept accesses to unsupported "extended" x2APIC regs

On Sat, 2023-01-07 at 01:10 +0000, Sean Christopherson wrote:
> Don't clear the "read" bits for x2APIC registers above SELF_IPI (APIC regs
> 0x400 - 0xff0, MSRs 0x840 - 0x8ff). KVM doesn't emulate registers in that
> space (there are a smattering of AMD-only extensions) and so should
> intercept reads in order to inject #GP. When APICv is fully enabled,
> Intel hardware doesn't validate the registers on RDMSR and instead blindly
> retrieves data from the vAPIC page, i.e. it's software's responsibility to
> intercept reads to non-existent MSRs.
>
> Fixes: 8d14695f9542 ("x86, apicv: add virtual x2apic support")
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 38 ++++++++++++++++++++------------------
> 1 file changed, 20 insertions(+), 18 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index c788aa382611..82c61c16f8f5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4018,26 +4018,17 @@ void vmx_enable_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, int type)
> vmx_set_msr_bitmap_write(msr_bitmap, msr);
> }
>
> -static void vmx_reset_x2apic_msrs(struct kvm_vcpu *vcpu, u8 mode)
> -{
> - unsigned long *msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
> - unsigned long read_intercept;
> - int msr;
> -
> - read_intercept = (mode & MSR_BITMAP_MODE_X2APIC_APICV) ? 0 : ~0;
> -
> - for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) {
> - unsigned int read_idx = msr / BITS_PER_LONG;
> - unsigned int write_idx = read_idx + (0x800 / sizeof(long));
> -
> - msr_bitmap[read_idx] = read_intercept;
> - msr_bitmap[write_idx] = ~0ul;
> - }
> -}
> -
> static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu)
> {
> + /*
> + * x2APIC indices for 64-bit accesses into the RDMSR and WRMSR halves
> + * of the MSR bitmap. KVM emulates APIC registers up through 0x3f0,
> + * i.e. MSR 0x83f, and so only needs to dynamically manipulate 64 bits.
> + */
The above comment is better to be placed down below, near the actual write,
otherwise it is confusing.

> + const int read_idx = APIC_BASE_MSR / BITS_PER_LONG_LONG;
> + const int write_idx = read_idx + (0x800 / sizeof(u64));
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> + u64 *msr_bitmap = (u64 *)vmx->vmcs01.msr_bitmap;
> u8 mode;
>
> if (!cpu_has_vmx_msr_bitmap())
> @@ -4058,7 +4049,18 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu)
>
> vmx->x2apic_msr_bitmap_mode = mode;
>
> - vmx_reset_x2apic_msrs(vcpu, mode);
> + /*
> + * Reset the bitmap for MSRs 0x800 - 0x83f. Leave AMD's uber-extended
> + * registers (0x840 and above) intercepted, KVM doesn't support them.

I don't think AMD calls them uber-extended. Just extended.

From a quick glance, these could have been very useful for VFIO passthrough of INT-X interrupts,
removing the need to mask the interrupt on a per PCI device basis - instead you can just leave
the IRQ pending in ISR, while using SEOI and IER to ignore this pending bit for the host.

I understand that the days of INT-X are long gone (and especially days of shared IRQ lines...)
and every sane device uses MSI/-X instead, but still.


> + * Intercept all writes by default and poke holes as needed. Pass
> + * through all reads by default in x2APIC+APICv mode, as all registers
> + * except the current timer count are passed through for read.
> + */
> + if (mode & MSR_BITMAP_MODE_X2APIC_APICV)
> + msr_bitmap[read_idx] = 0;
> + else
> + msr_bitmap[read_idx] = ~0ull;
> + msr_bitmap[write_idx] = ~0ull;
>
> /*
> * TPR reads and writes can be virtualized even if virtual interrupt

Other than the note about the comment,

Reviewed-by: Maxim Levitsky <[email protected]>


Best regards,
Maxim Levitsky

2023-01-09 16:45:32

by Sean Christopherson

Subject: Re: [PATCH 5/6] KVM: VMX: Always intercept accesses to unsupported "extended" x2APIC regs

On Sun, Jan 08, 2023, Maxim Levitsky wrote:
> On Sat, 2023-01-07 at 01:10 +0000, Sean Christopherson wrote:
> > static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu)
> > {
> > + /*
> > + * x2APIC indices for 64-bit accesses into the RDMSR and WRMSR halves
> > + * of the MSR bitmap. KVM emulates APIC registers up through 0x3f0,
> > + * i.e. MSR 0x83f, and so only needs to dynamically manipulate 64 bits.
> > + */
> The above comment is better to be placed down below, near the actual write,
> otherwise it is confusing.

Can you elaborate on why it's confusing? The intent of this specific comment is
to capture why the index calculations use BITS_PER_LONG_LONG and sizeof(u64).

> > + const int read_idx = APIC_BASE_MSR / BITS_PER_LONG_LONG;
> > + const int write_idx = read_idx + (0x800 / sizeof(u64));
> > struct vcpu_vmx *vmx = to_vmx(vcpu);
> > + u64 *msr_bitmap = (u64 *)vmx->vmcs01.msr_bitmap;
> > u8 mode;
> >
> > if (!cpu_has_vmx_msr_bitmap())
> > @@ -4058,7 +4049,18 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu)
> >
> > vmx->x2apic_msr_bitmap_mode = mode;
> >
> > - vmx_reset_x2apic_msrs(vcpu, mode);
> > + /*
> > + * Reset the bitmap for MSRs 0x800 - 0x83f. Leave AMD's uber-extended
> > + * registers (0x840 and above) intercepted, KVM doesn't support them.
>
> I don't think AMD calls them uber-extended. Just extended.

Yeah, I took some creative liberties.  I want to avoid confusion with the more
common use of Extended APIC (x2APIC).
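
To make the behavioral change under discussion concrete, here is a rough user-space model, not the kernel implementation, of the removed loop versus the new single-word reset. It assumes a 64-bit build, the same read/write bitmap layout as the patch (write half 0x800 bytes after the read half), and that the bitmap starts with every MSR intercepted. The function names are hypothetical, and the later "poke holes" steps (e.g. re-intercepting the current timer count) are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BITMAP_WORDS	(4096 / sizeof(uint64_t))	/* one 4 KiB page */
#define WRITE_HALF	(0x800 / sizeof(uint64_t))	/* assumed write-half offset */

/* Model of the removed loop on a 64-bit build: rewrites MSRs 0x800 - 0x8ff. */
static inline void reset_old(uint64_t *bitmap, int apicv)
{
	uint64_t read_intercept = apicv ? 0 : ~0ull;
	uint32_t msr;

	for (msr = 0x800; msr <= 0x8ff; msr += 64) {
		bitmap[msr / 64] = read_intercept;
		bitmap[msr / 64 + WRITE_HALF] = ~0ull;
	}
}

/* Model of the new code: rewrites only MSRs 0x800 - 0x83f. */
static inline void reset_new(uint64_t *bitmap, int apicv)
{
	bitmap[0x800 / 64] = apicv ? 0 : ~0ull;
	bitmap[0x800 / 64 + WRITE_HALF] = ~0ull;
}

/* Returns 1 if a RDMSR of 'msr' would still be intercepted after the reset. */
static inline int read_intercepted_after_reset(int use_new, int apicv,
					       uint32_t msr)
{
	uint64_t bitmap[BITMAP_WORDS];

	assert(msr < 0x2000);	/* only the low MSR range is modeled */

	/* Assumption: the bitmap starts out with every MSR intercepted. */
	memset(bitmap, 0xff, sizeof(bitmap));
	if (use_new)
		reset_new(bitmap, apicv);
	else
		reset_old(bitmap, apicv);

	return (int)((bitmap[msr / 64] >> (msr % 64)) & 1);
}
```

In this model, with APICv enabled, a read of MSR 0x840 is passed through by the old loop (the function returns 0) but stays intercepted with the new code (returns 1), while a read of MSR 0x802 is passed through in both cases.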

2023-01-09 17:31:14

by Jim Mattson

Subject: Re: [PATCH 5/6] KVM: VMX: Always intercept accesses to unsupported "extended" x2APIC regs

On Fri, Jan 6, 2023 at 5:10 PM Sean Christopherson <[email protected]> wrote:
>
> Don't clear the "read" bits for x2APIC registers above SELF_IPI (APIC regs

Odd use of quotation marks in the shortlog and here.

> 0x400 - 0xff0, MSRs 0x840 - 0x8ff). KVM doesn't emulate registers in that
> space (there are a smattering of AMD-only extensions) and so should
> intercept reads in order to inject #GP. When APICv is fully enabled,
> Intel hardware doesn't validate the registers on RDMSR and instead blindly
> retrieves data from the vAPIC page, i.e. it's software's responsibility to
> intercept reads to non-existent MSRs.
>
> Fixes: 8d14695f9542 ("x86, apicv: add virtual x2apic support")
> Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>

2023-01-13 18:37:21

by Paolo Bonzini

Subject: Re: [PATCH 0/6] KVM: x86: x2APIC reserved bits/regs fixes

On 1/7/23 02:10, Sean Christopherson wrote:
> Fixes for edge cases where KVM mishandles reserved bits/regs checks when
> the vCPU is in x2APIC mode.
>
> The first two patches were previously posted[*], but both patches were
> broken (as posted against upstream), hence I took full credit for doing
> the work and changed Marc to a reporter.
>
> The VMX APICv fixes are for bugs found when writing tests. *sigh*
> I didn't Cc those to stable as the odds of breaking something when touching
> the MSR bitmaps seemed higher than someone caring about a 10 year old bug.
>
> AMD x2AVIC support may or may not suffer similar interception bugs, but I
> don't have hardware to test and this already snowballed further than
> expected...
>
> [*] https://lore.kernel.org/kvm/[email protected]

Looks good; please feel free to start gathering this in your tree for 6.3.

Next week I'll go through Ben's series as well as Aaron's "Clean up the
supported xfeatures" and others.

Let me know if you would like me to queue anything of these instead, and
please remember to set up the tree in linux-next. :)

Thanks,

Paolo

> Sean Christopherson (6):
> KVM: x86: Inject #GP if WRMSR sets reserved bits in APIC Self-IPI
> KVM: x86: Inject #GP on x2APIC WRMSR that sets reserved bits 63:32
> KVM: x86: Mark x2APIC DFR reg as non-existent for x2APIC
> KVM: x86: Split out logic to generate "readable" APIC regs mask to
> helper
> KVM: VMX: Always intercept accesses to unsupported "extended" x2APIC
> regs
> KVM: VMX: Intercept reads to invalid and write-only x2APIC registers
>
> arch/x86/kvm/lapic.c | 55 ++++++++++++++++++++++++++----------------
> arch/x86/kvm/lapic.h | 2 ++
> arch/x86/kvm/vmx/vmx.c | 40 +++++++++++++++---------------
> 3 files changed, 57 insertions(+), 40 deletions(-)
>
>
> base-commit: 91dc252b0dbb6879e4067f614df1e397fec532a1

2023-01-13 19:06:21

by Sean Christopherson

Subject: Re: [PATCH 0/6] KVM: x86: x2APIC reserved bits/regs fixes

On Fri, Jan 13, 2023, Paolo Bonzini wrote:
> On 1/7/23 02:10, Sean Christopherson wrote:
> > Fixes for edge cases where KVM mishandles reserved bits/regs checks when
> > the vCPU is in x2APIC mode.
> >
> > The first two patches were previously posted[*], but both patches were
> > broken (as posted against upstream), hence I took full credit for doing
> > the work and changed Marc to a reporter.
> >
> > The VMX APICv fixes are for bugs found when writing tests. *sigh*
> > I didn't Cc those to stable as the odds of breaking something when touching
> > the MSR bitmaps seemed higher than someone caring about a 10 year old bug.
> >
> > AMD x2AVIC support may or may not suffer similar interception bugs, but I
> > don't have hardware to test and this already snowballed further than
> > expected...
> >
> > [*] https://lore.kernel.org/kvm/[email protected]
>
> Looks good; please feel free to start gathering this in your tree for 6.3.

Thanks!

> Next week I'll go through Ben's series as well as Aaron's "Clean up the
> supported xfeatures" and others.
>
> Let me know if you would like me to queue anything of these instead, and
> please remember to set up the tree in linux-next. :)

Ya, next week is going to be dedicated to sorting out maintenance mechanics.

2023-01-20 00:35:24

by Sean Christopherson

Subject: Re: [PATCH 0/6] KVM: x86: x2APIC reserved bits/regs fixes

On Sat, 07 Jan 2023 01:10:19 +0000, Sean Christopherson wrote:
> Fixes for edge cases where KVM mishandles reserved bits/regs checks when
> the vCPU is in x2APIC mode.
>
> The first two patches were previously posted[*], but both patches were
> broken (as posted against upstream), hence I took full credit for doing
> the work and changed Marc to a reporter.
>
> [...]

Applied to kvm-x86 apic, thanks past me!

[1/6] KVM: x86: Inject #GP if WRMSR sets reserved bits in APIC Self-IPI
      https://github.com/kvm-x86/linux/commit/aeee623ea411
[2/6] KVM: x86: Inject #GP on x2APIC WRMSR that sets reserved bits 63:32
      https://github.com/kvm-x86/linux/commit/a927a2508121
[3/6] KVM: x86: Mark x2APIC DFR reg as non-existent for x2APIC
      https://github.com/kvm-x86/linux/commit/6d4719e1b5a2
[4/6] KVM: x86: Split out logic to generate "readable" APIC regs mask to helper
      https://github.com/kvm-x86/linux/commit/1088d5e5cf70
[5/6] KVM: VMX: Always intercept accesses to unsupported "extended" x2APIC regs
      https://github.com/kvm-x86/linux/commit/cbb3f75487a9
[6/6] KVM: VMX: Intercept reads to invalid and write-only x2APIC registers
      https://github.com/kvm-x86/linux/commit/7b205379c53d

--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes