2022-07-25 04:07:55

by Suthikulpanit, Suravee

Subject: [PATCH] KVM: SVM: Do not virtualize MSR accesses for APIC LVTT register

AMD does not support APIC TSC-deadline timer mode. AVIC hardware
will generate GP fault when guest kernel writes 1 to bits [18]
of the APIC LVTT register (offset 0x32) to set the timer mode.
(Note: bit 18 is reserved on AMD system).

Therefore, always intercept and let KVM emulate the MSR accesses.

Fixes: f3d7c8aa6882 ("KVM: SVM: Fix x2APIC MSRs interception")
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kvm/svm/svm.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index aef63aae922d..3e0639a68385 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -118,7 +118,14 @@ static const struct svm_direct_access_msrs {
{ .index = X2APIC_MSR(APIC_ESR), .always = false },
{ .index = X2APIC_MSR(APIC_ICR), .always = false },
{ .index = X2APIC_MSR(APIC_ICR2), .always = false },
- { .index = X2APIC_MSR(APIC_LVTT), .always = false },
+
+ /*
+ * Note:
+ * AMD does not virtualize APIC TSC-deadline timer mode, but it is
+ * emulated by KVM. When setting APIC LVTT (0x832) register bit 18,
+ * the AVIC hardware would generate GP fault. Therefore, always
+ * intercept the MSR 0x832, and do not setup direct_access_msr.
+ */
{ .index = X2APIC_MSR(APIC_LVTTHMR), .always = false },
{ .index = X2APIC_MSR(APIC_LVTPC), .always = false },
{ .index = X2APIC_MSR(APIC_LVT0), .always = false },
--
2.34.1
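
For readers following the register numbers in the patch, the relationship between the xAPIC offset of LVTT (0x320), the x2APIC MSR index (0x832), and the timer-mode bit can be sketched roughly as below. The constants mirror the kernel's APIC definitions as I understand them; the helper name lvtt_requests_tsc_deadline() is hypothetical and exists only for this illustration, it is not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to match the kernel's APIC definitions. */
#define APIC_BASE_MSR   0x800u  /* first x2APIC MSR */
#define APIC_LVTT       0x320u  /* xAPIC MMIO offset of the LVT Timer register */

/* x2APIC MSR index for an xAPIC register offset: one MSR per 16-byte slot. */
#define X2APIC_MSR(r)   (APIC_BASE_MSR + ((r) >> 4))

/* Bit 18 of LVTT selects TSC-deadline mode; per the patch it is reserved on AMD. */
#define LVTT_TSC_DEADLINE_BIT   (1u << 18)

/*
 * Hypothetical helper: true if a value written to LVTT sets bit 18,
 * i.e. the write that the patch says raises #GP when it reaches
 * AVIC hardware instead of being intercepted and emulated by KVM.
 */
bool lvtt_requests_tsc_deadline(uint32_t value)
{
	return (value & LVTT_TSC_DEADLINE_BIT) != 0;
}
```

With these definitions, X2APIC_MSR(APIC_LVTT) evaluates to 0x832, the MSR named in the patch comment, and 0x32 is the offset of that MSR from APIC_BASE_MSR.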


2022-07-25 09:55:25

by Paolo Bonzini

Subject: Re: [PATCH] KVM: SVM: Do not virtualize MSR accesses for APIC LVTT register

On 7/25/22 05:34, Suravee Suthikulpanit wrote:
> AMD does not support APIC TSC-deadline timer mode. AVIC hardware
> will generate GP fault when guest kernel writes 1 to bits [18]
> of the APIC LVTT register (offset 0x32) to set the timer mode.
> (Note: bit 18 is reserved on AMD system).
>
> Therefore, always intercept and let KVM emulate the MSR accesses.
>
> Fixes: f3d7c8aa6882 ("KVM: SVM: Fix x2APIC MSRs interception")
> Signed-off-by: Suravee Suthikulpanit <[email protected]>

Does this fix some kvm-unit-tests testcase?

Anyway, I queued the patch, thanks!

Paolo

2022-07-25 13:31:53

by Suthikulpanit, Suravee

Subject: Re: [PATCH] KVM: SVM: Do not virtualize MSR accesses for APIC LVTT register



On 7/25/22 4:46 PM, Paolo Bonzini wrote:
> On 7/25/22 05:34, Suravee Suthikulpanit wrote:
>> AMD does not support APIC TSC-deadline timer mode. AVIC hardware
>> will generate GP fault when guest kernel writes 1 to bits [18]
>> of the APIC LVTT register (offset 0x32) to set the timer mode.
>> (Note: bit 18 is reserved on AMD system).
>>
>> Therefore, always intercept and let KVM emulate the MSR accesses.
>>
>> Fixes: f3d7c8aa6882 ("KVM: SVM: Fix x2APIC MSRs interception")
>> Signed-off-by: Suravee Suthikulpanit <[email protected]>
>
> Does this fix some kvm-unit-tests testcase?

I am not sure if we have kvm-unit-tests testcases for this.
I found this when enabling tsc-deadline option in QEMU causing
the vm to fail to boot.

> Anyway, I queued the patch, thanks!
>
> Paolo

Thank you,
Suravee

2022-07-28 07:48:30

by Maxim Levitsky

Subject: Re: [PATCH] KVM: SVM: Do not virtualize MSR accesses for APIC LVTT register

On Sun, 2022-07-24 at 22:34 -0500, Suravee Suthikulpanit wrote:
> AMD does not support APIC TSC-deadline timer mode. AVIC hardware
> will generate GP fault when guest kernel writes 1 to bits [18]
> of the APIC LVTT register (offset 0x32) to set the timer mode.
> (Note: bit 18 is reserved on AMD system).
>
> Therefore, always intercept and let KVM emulate the MSR accesses.
>
> Fixes: f3d7c8aa6882 ("KVM: SVM: Fix x2APIC MSRs interception")
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index aef63aae922d..3e0639a68385 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -118,7 +118,14 @@ static const struct svm_direct_access_msrs {
> { .index = X2APIC_MSR(APIC_ESR), .always = false },
> { .index = X2APIC_MSR(APIC_ICR), .always = false },
> { .index = X2APIC_MSR(APIC_ICR2), .always = false },
> - { .index = X2APIC_MSR(APIC_LVTT), .always = false },
> +
> + /*
> + * Note:
> + * AMD does not virtualize APIC TSC-deadline timer mode, but it is
> + * emulated by KVM. When setting APIC LVTT (0x832) register bit 18,
> + * the AVIC hardware would generate GP fault. Therefore, always
> + * intercept the MSR 0x832, and do not setup direct_access_msr.
> + */
> { .index = X2APIC_MSR(APIC_LVTTHMR), .always = false },
> { .index = X2APIC_MSR(APIC_LVTPC), .always = false },
> { .index = X2APIC_MSR(APIC_LVT0), .always = false },


LVT is not something I would expect x2avic to even try to emulate, I would expect
it to dumbly forward the write to apic backing page (garbage in, garbage out) and then
signal trap vmexit?

I also think that regular AVIC works like that (just forwards the write to the page).

I am asking because there is a remote possibility that due to some bug the guest got
direct access to x2apic registers of the host, and this is how you got that #GP.
Could you double check it?

We really need x2avic (and vNMI) spec to be published to know exactly how all of this
is supposed to work.

Best regards,
Maxim Levitsky



2022-07-28 09:17:59

by Suthikulpanit, Suravee

Subject: Re: [PATCH] KVM: SVM: Do not virtualize MSR accesses for APIC LVTT register

Maxim,

On 7/28/22 2:38 PM, Maxim Levitsky wrote:
> On Sun, 2022-07-24 at 22:34 -0500, Suravee Suthikulpanit wrote:
>> AMD does not support APIC TSC-deadline timer mode. AVIC hardware
>> will generate GP fault when guest kernel writes 1 to bits [18]
>> of the APIC LVTT register (offset 0x32) to set the timer mode.
>> (Note: bit 18 is reserved on AMD system).
>>
>> Therefore, always intercept and let KVM emulate the MSR accesses.
>>
>> Fixes: f3d7c8aa6882 ("KVM: SVM: Fix x2APIC MSRs interception")
>> Signed-off-by: Suravee Suthikulpanit <[email protected]>
>> ---
>> arch/x86/kvm/svm/svm.c | 9 ++++++++-
>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index aef63aae922d..3e0639a68385 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -118,7 +118,14 @@ static const struct svm_direct_access_msrs {
>> { .index = X2APIC_MSR(APIC_ESR), .always = false },
>> { .index = X2APIC_MSR(APIC_ICR), .always = false },
>> { .index = X2APIC_MSR(APIC_ICR2), .always = false },
>> - { .index = X2APIC_MSR(APIC_LVTT), .always = false },
>> +
>> + /*
>> + * Note:
>> + * AMD does not virtualize APIC TSC-deadline timer mode, but it is
>> + * emulated by KVM. When setting APIC LVTT (0x832) register bit 18,
>> + * the AVIC hardware would generate GP fault. Therefore, always
>> + * intercept the MSR 0x832, and do not setup direct_access_msr.
>> + */
>> { .index = X2APIC_MSR(APIC_LVTTHMR), .always = false },
>> { .index = X2APIC_MSR(APIC_LVTPC), .always = false },
>> { .index = X2APIC_MSR(APIC_LVT0), .always = false },
>
>
> LVT is not something I would expect x2avic to even try to emulate, I would expect
> it to dumbly forward the write to apic backing page (garbage in, garbage out) and then
> signal trap vmexit?
>
> I also think that regular AVIC works like that (just forwards the write to the page).

The main difference b/w AVIC and x2AVIC is the MSR interception control, which needs to
not-intercept x2APIC MSRs for x2AVIC (allowing HW to virtualize MSR accesses).
However, the hypervisor can decide which x2APIC MSR to intercept and emulate.
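
As a rough illustration of that last point, the sketch below models the per-MSR decision as a lookup in a list of pass-through candidates, with LVTT left out so that it is always intercepted. This is a simplified model and not KVM's actual implementation: the array passthrough_candidates and the function x2apic_msr_may_pass_through() are hypothetical names, and the register offsets are assumed to match the kernel's APIC definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define APIC_BASE_MSR   0x800u
#define X2APIC_MSR(r)   (APIC_BASE_MSR + ((r) >> 4))

/* xAPIC register offsets, assumed to match the kernel's definitions. */
#define APIC_ESR        0x280u
#define APIC_ICR        0x300u
#define APIC_ICR2       0x310u
#define APIC_LVTT       0x320u
#define APIC_LVTTHMR    0x330u
#define APIC_LVTPC      0x340u
#define APIC_LVT0       0x350u

/*
 * Hypothetical list of MSRs the hypervisor is willing to stop
 * intercepting when x2AVIC is active. LVTT is deliberately absent,
 * so accesses to MSR 0x832 always exit to the hypervisor.
 */
static const uint32_t passthrough_candidates[] = {
	X2APIC_MSR(APIC_ESR),
	X2APIC_MSR(APIC_ICR),
	X2APIC_MSR(APIC_ICR2),
	X2APIC_MSR(APIC_LVTTHMR),
	X2APIC_MSR(APIC_LVTPC),
	X2APIC_MSR(APIC_LVT0),
};

/* True if the MSR is in the candidate list, false if it must be intercepted. */
bool x2apic_msr_may_pass_through(uint32_t msr)
{
	size_t i;

	for (i = 0; i < sizeof(passthrough_candidates) / sizeof(passthrough_candidates[0]); i++) {
		if (passthrough_candidates[i] == msr)
			return true;
	}
	return false;
}
```

In this model, dropping an entry from the list is all it takes to force interception of that register, which is the shape of the change the patch makes for LVTT.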

> I am asking because there is a remote possibility that due to some bug the guest got
> direct access to x2apic registers of the host, and this is how you got that #GP.
> Could you double check it?

I have verified this behavior with the HW designer and requested them to document
this in the next AMD programmers manual that will include x2AVIC details.

> We really need x2avic (and vNMI) spec to be published to know exactly how all of this
> is supposed to work.

I have raised the concern to the team responsible for publishing the doc.

Best Regards,
Suravee

2022-07-28 11:04:43

by Maxim Levitsky

Subject: Re: [PATCH] KVM: SVM: Do not virtualize MSR accesses for APIC LVTT register

On Thu, 2022-07-28 at 15:55 +0700, Suravee Suthikulpanit wrote:
> Maxim,
>
> On 7/28/22 2:38 PM, Maxim Levitsky wrote:
> > On Sun, 2022-07-24 at 22:34 -0500, Suravee Suthikulpanit wrote:
> > > AMD does not support APIC TSC-deadline timer mode. AVIC hardware
> > > will generate GP fault when guest kernel writes 1 to bits [18]
> > > of the APIC LVTT register (offset 0x32) to set the timer mode.
> > > (Note: bit 18 is reserved on AMD system).
> > >
> > > Therefore, always intercept and let KVM emulate the MSR accesses.
> > >
> > > Fixes: f3d7c8aa6882 ("KVM: SVM: Fix x2APIC MSRs interception")
> > > Signed-off-by: Suravee Suthikulpanit <[email protected]>
> > > ---
> > > arch/x86/kvm/svm/svm.c | 9 ++++++++-
> > > 1 file changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index aef63aae922d..3e0639a68385 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -118,7 +118,14 @@ static const struct svm_direct_access_msrs {
> > > { .index = X2APIC_MSR(APIC_ESR), .always = false },
> > > { .index = X2APIC_MSR(APIC_ICR), .always = false },
> > > { .index = X2APIC_MSR(APIC_ICR2), .always = false },
> > > - { .index = X2APIC_MSR(APIC_LVTT), .always = false },
> > > +
> > > + /*
> > > + * Note:
> > > + * AMD does not virtualize APIC TSC-deadline timer mode, but it is
> > > + * emulated by KVM. When setting APIC LVTT (0x832) register bit 18,
> > > + * the AVIC hardware would generate GP fault. Therefore, always
> > > + * intercept the MSR 0x832, and do not setup direct_access_msr.
> > > + */
> > > { .index = X2APIC_MSR(APIC_LVTTHMR), .always = false },
> > > { .index = X2APIC_MSR(APIC_LVTPC), .always = false },
> > > { .index = X2APIC_MSR(APIC_LVT0), .always = false },
> >
> > LVT is not something I would expect x2avic to even try to emulate, I would expect
> > it to dumbly forward the write to apic backing page (garbage in, garbage out) and then
> > signal trap vmexit?
> >
> > I also think that regular AVIC works like that (just forwards the write to the page).
>
> The main difference b/w AVIC and x2AVIC is the MSR interception control, which needs to
> not-intercept x2APIC MSRs for x2AVIC (allowing HW to virtualize MSR accesses).
> However, the hypervisor can decide which x2APIC MSR to intercept and emulate.


>
> > I am asking because there is a remote possibility that due to some bug the guest got
> > direct access to x2apic registers of the host, and this is how you got that #GP.
> > Could you double check it?
>
> I have verified this behavior with the HW designer and requested them to document
> this in the next AMD programmers manual that will include x2AVIC details.

I guess this implies that when guest has direct access to LVTT msr, x2avic redirection
happens after microcode already checked some things, like reserved bits.

You are also welcome to check with the hardware team how all other APIC MSRs behave; there could be similar
cases, maybe even some MSRs which don't go through the x2AVIC flow.

Assuming that this is really the case (I am just very afraid of CVEs),
then this patch is all right.

So with all that said:

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky

>
> > We really need x2avic (and vNMI) spec to be published to know exactly how all of this
> > is supposed to work.
>
> I have raised the concern to the team responsible for publishing the doc.
>
> Best Regards,
> Suravee
>