Patch 1 is an unofficial patch from Peter to fix x2APIC MSR interception
on non-APICV systems. As Peter suggested, it really should be squashed
with commit 3eb900173c71 ("KVM: x86: VMX: Prevent MSR passthrough when MSR
access is denied"). Without the fix, KVM is completely busted on
non-APICV systems.

Patch 2 is a cleanup of sorts to revert back to the pre-filtering approach
of initializing the x2APIC MSR bitmaps for APICV.

Note, I haven't tested on an APICV system. My APICV system appears to
have crashed over the weekend and I haven't yet journeyed back to the
lab to kick it.

Peter Xu (1):
  KVM: VMX: Fix x2APIC MSR intercept handling on !APICV platforms

Sean Christopherson (1):
  KVM: VMX: Ignore userspace MSR filters for x2APIC when APICV is
    enabled

 arch/x86/kvm/vmx/vmx.c | 45 ++++++++++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 15 deletions(-)

--
2.28.0

Rework the resetting of the MSR bitmap for x2APIC MSRs to ignore
userspace filtering when APICV is enabled. Allowing userspace to
intercept reads to x2APIC MSRs when APICV is fully enabled for the guest
simply can't work. The LAPIC and thus virtual APIC is in-kernel and
cannot be directly accessed by userspace. If userspace wants to
intercept x2APIC MSRs, then it should first disable APICV.

Opportunistically change the behavior to reset the full range of MSRs if
and only if APICV is enabled for KVM. The MSR bitmaps are initialized
to intercept all reads and writes by default, and enable_apicv cannot be
toggled after KVM is loaded. I.e. if APICV is disabled, simply toggle
the TPR MSR accordingly.

Note, this still allows userspace to intercept reads and writes to TPR,
and writes to EOI and SELF_IPI. It is at least plausible userspace
interception could work for those registers, though it is still silly.

Cc: Alexander Graf <[email protected]>
Cc: Aaron Lewis <[email protected]>
Cc: Peter Xu <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
 arch/x86/kvm/vmx/vmx.c | 46 +++++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 25ef0b22ac9e..e23c41ccfac9 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3782,28 +3782,42 @@ static u8 vmx_msr_bitmap_mode(struct kvm_vcpu *vcpu)
 	return mode;
 }

-static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu, u8 mode)
+static void vmx_reset_x2apic_msrs_for_apicv(struct kvm_vcpu *vcpu, u8 mode)
 {
+	unsigned long *msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
+	unsigned long read_intercept;
 	int msr;

-	for (msr = 0x800; msr <= 0x8ff; msr++) {
-		bool apicv = !!(mode & MSR_BITMAP_MODE_X2APIC_APICV);
+	read_intercept = (mode & MSR_BITMAP_MODE_X2APIC_APICV) ? 0 : ~0;

-		vmx_set_intercept_for_msr(vcpu, msr, MSR_TYPE_R, !apicv);
-		vmx_set_intercept_for_msr(vcpu, msr, MSR_TYPE_W, true);
+	for (msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) {
+		unsigned int read_idx = msr / BITS_PER_LONG;
+		unsigned int write_idx = read_idx + (0x800 / sizeof(long));
+
+		msr_bitmap[read_idx] = read_intercept;
+		msr_bitmap[write_idx] = ~0ul;
 	}
+}

-	if (mode & MSR_BITMAP_MODE_X2APIC) {
-		/*
-		 * TPR reads and writes can be virtualized even if virtual interrupt
-		 * delivery is not in use.
-		 */
-		vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_RW);
-		if (mode & MSR_BITMAP_MODE_X2APIC_APICV) {
-			vmx_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_RW);
-			vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
-			vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
-		}
+static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu, u8 mode)
+{
+	if (!cpu_has_vmx_msr_bitmap())
+		return;
+
+	if (enable_apicv)
+		vmx_reset_x2apic_msrs_for_apicv(vcpu, mode);
+
+	/*
+	 * TPR reads and writes can be virtualized even if virtual interrupt
+	 * delivery is not in use.
+	 */
+	vmx_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TASKPRI), MSR_TYPE_RW,
+				  !(mode & MSR_BITMAP_MODE_X2APIC));
+
+	if (mode & MSR_BITMAP_MODE_X2APIC_APICV) {
+		vmx_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_RW);
+		vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
+		vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
 	}
 }

--
2.28.0
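
For readers following the index arithmetic in the patch, here is a rough user-space sketch of why writing whole words gives the same result as flipping one bit per MSR. It is not KVM code. It assumes the usual VMX MSR bitmap layout (a 4 KiB page with the read bitmap for low MSRs at byte offset 0 and the write bitmap for low MSRs at byte offset 0x800), and the helper names set_intercept(), reset_per_msr(), reset_per_word() and resets_match() are made up for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BITMAP_BYTES	4096
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS	(BITMAP_BYTES / sizeof(unsigned long))
#define WRITE_LOW_OFF	0x800	/* assumed byte offset of the write bitmap for low MSRs */

/* Hypothetical helper: set or clear a single read or write intercept bit. */
static void set_intercept(unsigned long *bitmap, unsigned int msr,
			  bool write, bool on)
{
	size_t idx = msr / BITS_PER_LONG +
		     (write ? WRITE_LOW_OFF / sizeof(unsigned long) : 0);
	unsigned long bit = 1ul << (msr % BITS_PER_LONG);

	if (on)
		bitmap[idx] |= bit;
	else
		bitmap[idx] &= ~bit;
}

/* Per-MSR approach: two bit updates for each of the 256 x2APIC MSRs. */
static void reset_per_msr(unsigned long *bitmap, bool apicv)
{
	for (unsigned int msr = 0x800; msr <= 0x8ff; msr++) {
		set_intercept(bitmap, msr, false, !apicv);
		set_intercept(bitmap, msr, true, true);
	}
}

/* Per-word approach: one read word and one write word per iteration. */
static void reset_per_word(unsigned long *bitmap, bool apicv)
{
	unsigned long read_intercept = apicv ? 0 : ~0ul;

	/* Whole-word stores are only valid if the range is word aligned. */
	assert(0x800 % BITS_PER_LONG == 0 && 0x100 % BITS_PER_LONG == 0);

	for (unsigned int msr = 0x800; msr <= 0x8ff; msr += BITS_PER_LONG) {
		size_t read_idx = msr / BITS_PER_LONG;
		size_t write_idx = read_idx + WRITE_LOW_OFF / sizeof(unsigned long);

		bitmap[read_idx] = read_intercept;
		bitmap[write_idx] = ~0ul;
	}
}

/* Returns true if both approaches leave identical bitmaps behind. */
static bool resets_match(bool apicv)
{
	unsigned long a[BITMAP_LONGS], b[BITMAP_LONGS];

	/* Arbitrary starting pattern so stale bits would show up. */
	memset(a, 0xa5, sizeof(a));
	memset(b, 0xa5, sizeof(b));

	reset_per_msr(a, apicv);
	reset_per_word(b, apicv);
	return memcmp(a, b, sizeof(a)) == 0;
}
```

Calling resets_match(true) and resets_match(false) should both report a match. On a 64-bit build the per-word loop runs 4 iterations (8 stores) instead of 512 single-bit updates, which is the point of bypassing the per-MSR helpers when no filter needs to be consulted.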
On 05.10.20 21:55, Sean Christopherson wrote:
>
> Rework the resetting of the MSR bitmap for x2APIC MSRs to ignore
> userspace filtering when APICV is enabled. Allowing userspace to
> intercept reads to x2APIC MSRs when APICV is fully enabled for the guest
> simply can't work. The LAPIC and thus virtual APIC is in-kernel and
> cannot be directly accessed by userspace. If userspace wants to
> intercept x2APIC MSRs, then it should first disable APICV.
>
> Opportunistically change the behavior to reset the full range of MSRs if
> and only if APICV is enabled for KVM. The MSR bitmaps are initialized
> to intercept all reads and writes by default, and enable_apicv cannot be
> toggled after KVM is loaded. I.e. if APICV is disabled, simply toggle
> the TPR MSR accordingly.
>
> Note, this still allows userspace to intercept reads and writes to TPR,
> and writes to EOI and SELF_IPI. It is at least plausible userspace
> interception could work for those registers, though it is still silly.
>
> Cc: Alexander Graf <[email protected]>
> Cc: Aaron Lewis <[email protected]>
> Cc: Peter Xu <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
I'm not opposed in general to leaving APICV handled registers out of the
filtering logic. However, this really needs a note in the documentation
then, no?
Alex
On Wed, Oct 07, 2020 at 04:01:59PM +0200, Alexander Graf wrote:
> I'm not opposed in general to leaving APICV handled registers out of the
> filtering logic. However, this really needs a note in the documentation
> then, no?

If we want to forbid apicv msrs, should we even fail KVM_X86_SET_MSR_FILTER
directly then?

I have no strong opinion on whether these MSRs should be restricted. I'm not
sure my understanding is correct here, but to me KVM should always depend on
userspace to do the right thing to make the VM work. As long as the error is
self-contained and does not affect KVM as a whole or the host, it still seems
fine.

However, I agree; I was also worried about vmx_update_msr_bitmap_x2apic()
being slower. I mostly see calls from vmx_refresh_apicv_exec_ctrl() or the
nested code, so I'm not sure whether that could matter for some workloads.
Btw, that seems to be another change that corresponds to the idea of
restricting MSR filtering on APICV regs.

Thanks,

--
Peter Xu
On 07/10/20 18:44, Peter Xu wrote:
> If we want to forbid apicv msrs, should we even fail KVM_X86_SET_MSR_FILTER
> directly then?
Yes, probably it should. I'll send a patch shortly.
Paolo
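
As a rough illustration of the check discussed above, the following user-space sketch tests whether a filter range overlaps the x2APIC MSR range 0x800-0x8ff. It is not the patch Paolo refers to. The helper name filter_range_hits_x2apic() is hypothetical, and its base and nmsrs parameters are only assumed to mirror the fields of the same name in struct kvm_msr_filter_range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X2APIC_MSR_FIRST	0x800u
#define X2APIC_MSR_LAST		0x8ffu

/*
 * Hypothetical helper: true if the range [base, base + nmsrs) contains
 * at least one x2APIC MSR. An empty range never overlaps.
 */
static bool filter_range_hits_x2apic(uint32_t base, uint32_t nmsrs)
{
	uint64_t last;

	if (nmsrs == 0)
		return false;

	/* Compute the last MSR in 64 bits so base + nmsrs cannot wrap. */
	last = (uint64_t)base + nmsrs - 1;

	return base <= X2APIC_MSR_LAST && last >= X2APIC_MSR_FIRST;
}

/* A few boundary cases, written as assertions. */
static void check_examples(void)
{
	assert(filter_range_hits_x2apic(0x800, 1));	/* first x2APIC MSR */
	assert(filter_range_hits_x2apic(0x7ff, 2));	/* straddles the start */
	assert(!filter_range_hits_x2apic(0x7ff, 1));	/* ends just before */
	assert(!filter_range_hits_x2apic(0x900, 0x100));	/* starts just after */
}
```

A filter-setup path could use such a predicate to reject the request with an error (for example -EINVAL) before installing the filter. Whether to fail the ioctl or to silently ignore the range is exactly the policy question raised in this thread.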