2024-04-16 05:04:33

by Ravi Bangoria

[permalink] [raw]
Subject: [PATCH v2] KVM: SEV-ES: Don't intercept MSR_IA32_DEBUGCTLMSR for SEV-ES guests

Currently, LBR Virtualization is dynamically enabled and disabled for
a vcpu by intercepting writes to MSR_IA32_DEBUGCTLMSR. This helps by
avoiding unnecessary save/restore of LBR MSRs when nobody is using it
in the guest. However, SEV-ES guest mandates LBR Virtualization to be
_always_ ON[1] and thus this dynamic toggling doesn't work for SEV-ES
guest, in fact it results into fatal error:

SEV-ES guest on Zen3, kvm-amd.ko loaded with lbrv=1

[guest ~]# wrmsr 0x1d9 0x4
KVM: entry failed, hardware error 0xffffffff
EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000
...

Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests.
No additional save/restore logic is required since MSR_IA32_DEBUGCTLMSR
is of swap type A.

[1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June
2023, Vol 2, 15.35.2 Enabling SEV-ES.
https://bugzilla.kernel.org/attachment.cgi?id=304653

Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
Signed-off-by: Ravi Bangoria <[email protected]>
Reviewed-by: Nikunj A Dadhania <[email protected]>
Reviewed-by: Tom Lendacky <[email protected]>
---
v1: https://lore.kernel.org/r/[email protected]
v1->v2:
- Add MSR swap type detail in the patch description. No code changes.

arch/x86/kvm/svm/sev.c | 1 +
arch/x86/kvm/svm/svm.c | 1 +
arch/x86/kvm/svm/svm.h | 2 +-
3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a8ce5226b3b5..ef932a7ff9bd 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3073,6 +3073,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
/* Clear intercepts on selected MSRs */
set_msr_interception(vcpu, svm->msrpm, MSR_EFER, 1, 1);
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_CR_PAT, 1, 1);
+ set_msr_interception(vcpu, svm->msrpm, MSR_IA32_DEBUGCTLMSR, 1, 1);
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHFROMIP, 1, 1);
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1);
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e90b429c84f1..5a82135ae84e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -99,6 +99,7 @@ static const struct svm_direct_access_msrs {
{ .index = MSR_IA32_SPEC_CTRL, .always = false },
{ .index = MSR_IA32_PRED_CMD, .always = false },
{ .index = MSR_IA32_FLUSH_CMD, .always = false },
+ { .index = MSR_IA32_DEBUGCTLMSR, .always = false },
{ .index = MSR_IA32_LASTBRANCHFROMIP, .always = false },
{ .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
{ .index = MSR_IA32_LASTINTFROMIP, .always = false },
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 8ef95139cd24..7a1b60bcebff 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -30,7 +30,7 @@
#define IOPM_SIZE PAGE_SIZE * 3
#define MSRPM_SIZE PAGE_SIZE * 2

-#define MAX_DIRECT_ACCESS_MSRS 47
+#define MAX_DIRECT_ACCESS_MSRS 48
#define MSRPM_OFFSETS 32
extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
extern bool npt_enabled;
--
2.44.0



2024-04-16 08:49:43

by Gupta, Pankaj

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: SEV-ES: Don't intercept MSR_IA32_DEBUGCTLMSR for SEV-ES guests

On 4/16/2024 7:03 AM, Ravi Bangoria wrote:
> Currently, LBR Virtualization is dynamically enabled and disabled for
> a vcpu by intercepting writes to MSR_IA32_DEBUGCTLMSR. This helps by
> avoiding unnecessary save/restore of LBR MSRs when nobody is using it
> in the guest. However, SEV-ES guest mandates LBR Virtualization to be
> _always_ ON[1] and thus this dynamic toggling doesn't work for SEV-ES
> guest, in fact it results into fatal error:
>
> SEV-ES guest on Zen3, kvm-amd.ko loaded with lbrv=1
>
> [guest ~]# wrmsr 0x1d9 0x4
> KVM: entry failed, hardware error 0xffffffff
> EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000
> ...
>
> Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests.
> No additional save/restore logic is required since MSR_IA32_DEBUGCTLMSR
> is of swap type A.
>
> [1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June
> 2023, Vol 2, 15.35.2 Enabling SEV-ES.
> https://bugzilla.kernel.org/attachment.cgi?id=304653
>
> Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading")
> Signed-off-by: Ravi Bangoria <[email protected]>
> Reviewed-by: Nikunj A Dadhania <[email protected]>
> Reviewed-by: Tom Lendacky <[email protected]>

Reviewed-by: Pankaj Gupta <[email protected]>

> ---
> v1: https://lore.kernel.org/r/[email protected]
> v1->v2:
> - Add MSR swap type detail in the patch description. No code changes.
>
> arch/x86/kvm/svm/sev.c | 1 +
> arch/x86/kvm/svm/svm.c | 1 +
> arch/x86/kvm/svm/svm.h | 2 +-
> 3 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index a8ce5226b3b5..ef932a7ff9bd 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3073,6 +3073,7 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
> /* Clear intercepts on selected MSRs */
> set_msr_interception(vcpu, svm->msrpm, MSR_EFER, 1, 1);
> set_msr_interception(vcpu, svm->msrpm, MSR_IA32_CR_PAT, 1, 1);
> + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_DEBUGCTLMSR, 1, 1);
> set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHFROMIP, 1, 1);
> set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1);
> set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index e90b429c84f1..5a82135ae84e 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -99,6 +99,7 @@ static const struct svm_direct_access_msrs {
> { .index = MSR_IA32_SPEC_CTRL, .always = false },
> { .index = MSR_IA32_PRED_CMD, .always = false },
> { .index = MSR_IA32_FLUSH_CMD, .always = false },
> + { .index = MSR_IA32_DEBUGCTLMSR, .always = false },
> { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false },
> { .index = MSR_IA32_LASTBRANCHTOIP, .always = false },
> { .index = MSR_IA32_LASTINTFROMIP, .always = false },
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 8ef95139cd24..7a1b60bcebff 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -30,7 +30,7 @@
> #define IOPM_SIZE PAGE_SIZE * 3
> #define MSRPM_SIZE PAGE_SIZE * 2
>
> -#define MAX_DIRECT_ACCESS_MSRS 47
> +#define MAX_DIRECT_ACCESS_MSRS 48
> #define MSRPM_OFFSETS 32
> extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
> extern bool npt_enabled;


2024-05-02 23:52:00

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: SEV-ES: Don't intercept MSR_IA32_DEBUGCTLMSR for SEV-ES guests

On Tue, Apr 16, 2024, Ravi Bangoria wrote:
> Currently, LBR Virtualization is dynamically enabled and disabled for
> a vcpu by intercepting writes to MSR_IA32_DEBUGCTLMSR. This helps by
> avoiding unnecessary save/restore of LBR MSRs when nobody is using it
> in the guest. However, SEV-ES guest mandates LBR Virtualization to be
> _always_ ON[1] and thus this dynamic toggling doesn't work for SEV-ES
> guest, in fact it results into fatal error:
>
> SEV-ES guest on Zen3, kvm-amd.ko loaded with lbrv=1
>
> [guest ~]# wrmsr 0x1d9 0x4
> KVM: entry failed, hardware error 0xffffffff
> EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000
> ...
>
> Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests.

Uh, what? I mean, sure, it works, maybe, I dunno. But there's a _massive_
disconnect between the first paragraph and this statement.

Oh, good gravy, it "works" because SEV already forces LBR virtualization.

svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;

(a) the changelog needs to call that out. (b) KVM needs to disallow SEV-ES if
LBR virtualization is disabled by the admin, i.e. if lbrv=false.

Alternatively, I would be a-ok simply deleting lbrv, e.g. to avoid yet more
printks about why SEV-ES couldn't be enabled.

Hmm, I'd probably be more than ok. Because AMD (thankfully, blessedly) uses CPUID
bits for SVM features, the admin can disable LBRV via clear_cpuid (or whatever it's
called now). And there are hardly any checks on the feature, so it's not like
having a boolean saves anything. AMD is clearly committed to making sure LBRV
works, so the odds of KVM really getting much value out of a module param is low.

And then when you delete lbrv, please add a WARN_ON_ONCE() sanity check in
sev_hardware_setup() (if SEV-ES is supported), because like the DECODEASSISTS
and FLUSHBYASID requirements, it's not super obvious that LBRV is a hard
requirement for SEV-ES (that's an understatment; I'm curious how some decided
that LBR virtualization is where the line go drawn for "yeah, _this_ is mandatory").

2024-05-06 04:49:35

by Ravi Bangoria

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: SEV-ES: Don't intercept MSR_IA32_DEBUGCTLMSR for SEV-ES guests

On 03-May-24 5:21 AM, Sean Christopherson wrote:
> On Tue, Apr 16, 2024, Ravi Bangoria wrote:
>> Currently, LBR Virtualization is dynamically enabled and disabled for
>> a vcpu by intercepting writes to MSR_IA32_DEBUGCTLMSR. This helps by
>> avoiding unnecessary save/restore of LBR MSRs when nobody is using it
>> in the guest. However, SEV-ES guest mandates LBR Virtualization to be
>> _always_ ON[1] and thus this dynamic toggling doesn't work for SEV-ES
>> guest, in fact it results into fatal error:
>>
>> SEV-ES guest on Zen3, kvm-amd.ko loaded with lbrv=1
>>
>> [guest ~]# wrmsr 0x1d9 0x4
>> KVM: entry failed, hardware error 0xffffffff
>> EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000
>> ...
>>
>> Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests.
>
> Uh, what? I mean, sure, it works, maybe, I dunno. But there's a _massive_
> disconnect between the first paragraph and this statement.
>
> Oh, good gravy, it "works" because SEV already forces LBR virtualization.
>
> svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
>
> (a) the changelog needs to call that out.

Sorry, I should have called that out explicitly.

> (b) KVM needs to disallow SEV-ES if
> LBR virtualization is disabled by the admin, i.e. if lbrv=false.

That's what I initially thought. But since KVM currently allows booting SEV-ES
guests even when lbrv=0 (by silently ignoring lbrv value), erroring out would
be a behavior change.

> Alternatively, I would be a-ok simply deleting lbrv, e.g. to avoid yet more
> printks about why SEV-ES couldn't be enabled.
>
> Hmm, I'd probably be more than ok. Because AMD (thankfully, blessedly) uses CPUID
> bits for SVM features, the admin can disable LBRV via clear_cpuid (or whatever it's
> called now). And there are hardly any checks on the feature, so it's not like
> having a boolean saves anything. AMD is clearly committed to making sure LBRV
> works, so the odds of KVM really getting much value out of a module param is low.

Currently, lbrv is not enabled by default with model specific -cpu profiles in
qemu. So I guess this is not backward compatible?

> And then when you delete lbrv, please add a WARN_ON_ONCE() sanity check in
> sev_hardware_setup() (if SEV-ES is supported), because like the DECODEASSISTS
> and FLUSHBYASID requirements, it's not super obvious that LBRV is a hard
> requirement for SEV-ES (that's an understatment; I'm curious how some decided
> that LBR virtualization is where the line go drawn for "yeah, _this_ is mandatory").

I'm not sure. Some ES internal dependency.

In any case, the patch simply fixes 'missed clearing MSR Interception' for
SEV-ES guests. So, would it be okay to apply this patch as is and do lbrv
cleanup as a followup series?

PS: AMD Bus Lock Detect virtualization also dependents on LBR Virtualization:
https://lore.kernel.org/r/[email protected]

Thanks for the review,
Ravi

2024-05-07 19:09:27

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: SEV-ES: Don't intercept MSR_IA32_DEBUGCTLMSR for SEV-ES guests

On Mon, May 06, 2024, Ravi Bangoria wrote:
> On 03-May-24 5:21 AM, Sean Christopherson wrote:
> > On Tue, Apr 16, 2024, Ravi Bangoria wrote:
> >> Currently, LBR Virtualization is dynamically enabled and disabled for
> >> a vcpu by intercepting writes to MSR_IA32_DEBUGCTLMSR. This helps by
> >> avoiding unnecessary save/restore of LBR MSRs when nobody is using it
> >> in the guest. However, SEV-ES guest mandates LBR Virtualization to be
> >> _always_ ON[1] and thus this dynamic toggling doesn't work for SEV-ES
> >> guest, in fact it results into fatal error:
> >>
> >> SEV-ES guest on Zen3, kvm-amd.ko loaded with lbrv=1
> >>
> >> [guest ~]# wrmsr 0x1d9 0x4
> >> KVM: entry failed, hardware error 0xffffffff
> >> EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000
> >> ...
> >>
> >> Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests.
> >
> > Uh, what? I mean, sure, it works, maybe, I dunno. But there's a _massive_
> > disconnect between the first paragraph and this statement.
> >
> > Oh, good gravy, it "works" because SEV already forces LBR virtualization.
> >
> > svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
> >
> > (a) the changelog needs to call that out.
>
> Sorry, I should have called that out explicitly.
>
> > (b) KVM needs to disallow SEV-ES if
> > LBR virtualization is disabled by the admin, i.e. if lbrv=false.
>
> That's what I initially thought. But since KVM currently allows booting SEV-ES
> guests even when lbrv=0 (by silently ignoring lbrv value), erroring out would
> be a behavior change.

IMO, that's totally fine. There are no hard guarantees regarding module params,

> > Alternatively, I would be a-ok simply deleting lbrv, e.g. to avoid yet more
> > printks about why SEV-ES couldn't be enabled.
> >
> > Hmm, I'd probably be more than ok. Because AMD (thankfully, blessedly) uses CPUID
> > bits for SVM features, the admin can disable LBRV via clear_cpuid (or whatever it's
> > called now). And there are hardly any checks on the feature, so it's not like
> > having a boolean saves anything. AMD is clearly committed to making sure LBRV
> > works, so the odds of KVM really getting much value out of a module param is low.
>
> Currently, lbrv is not enabled by default with model specific -cpu profiles in
> qemu. So I guess this is not backward compatible?

I am talking about LBRV being disabled in the _host_ kernel, not guest CPUID.
QEMU enabling LBRV only affects nested SVM, which is out of scope for SEV-ES.

> > And then when you delete lbrv, please add a WARN_ON_ONCE() sanity check in
> > sev_hardware_setup() (if SEV-ES is supported), because like the DECODEASSISTS
> > and FLUSHBYASID requirements, it's not super obvious that LBRV is a hard
> > requirement for SEV-ES (that's an understatment; I'm curious how some decided
> > that LBR virtualization is where the line go drawn for "yeah, _this_ is mandatory").
>
> I'm not sure. Some ES internal dependency.
>
> In any case, the patch simply fixes 'missed clearing MSR Interception' for
> SEV-ES guests. So, would it be okay to apply this patch as is and do lbrv
> cleanup as a followup series?

No.

(a) the lbrv module param mess needs to be sorted out.
(b) this is not a complete fix.
(c) I'm not convinced it's the right way to fix this, at all.
(d) there's a big gaping hole in KVM's handling of MSRs that are passed through
to SEV-ES guests.
(e) it's not clear to me that KVM needs to dynamically toggle LBRV for _any_ guest.
(f) I don't like that sev_es_init_vmcb() mucks with the LBRV intercepts without
using svm_enable_lbrv().

Unless I'm missing something, KVM allows userspace to get/set MSRs for SEV-ES
guests, even after the VMSA is encrypted. E.g. a naive userspace could attempt
to migrate MSR_IA32_DEBUGCTLMSR and end up unintentionally disabling LBRV on the
target. The proper fix for VMSA being encrypted is to likely to disallow
KVM_{G,S}ET_MSR on MSRs that are contexted switched via the VMSA.

But that doesn't address the issue where KVM will disable LBRV if userspace
sets MSR_IA32_DEBUGCTLMSR before the VMSA is encrypted. The easiest fix for
that is to have svm_disable_lbrv() do nothing for SEV-ES guests, but I'm not
convinced that's the best fix.

AFAICT, host perf doesn't use the relevant MSRs, and even if host perf did use
the MSRs, IIUC there is no "stack", and #VMEXIT retains the guest values for
non-SEV-ES guests. I.e. functionally, running with and without LBRV would be
largely equivalent as far as perf is concerned. The guest could scribble an MSR
with garbage, but overall, host perf wouldn't be meaningfully affected by LBRV.

So unless I'm missing something, the only reason to ever disable LBRV would be
for performance reasons. Indeed the original commits more or less says as much:

commit 24e09cbf480a72f9c952af4ca77b159503dca44b
Author: Joerg Roedel <[email protected]>
AuthorDate: Wed Feb 13 18:58:47 2008 +0100

KVM: SVM: enable LBR virtualization

This patch implements the Last Branch Record Virtualization (LBRV) feature of
the AMD Barcelona and Phenom processors into the kvm-amd module. It will only
be enabled if the guest enables last branch recording in the DEBUG_CTL MSR. So
there is no increased world switch overhead when the guest doesn't use these
MSRs.

but what it _doesn't_ say is what the world switch overhead is when LBRV is
enabled. If the overhead is small, e.g. 20 cycles?, then I see no reason to
keep the dynamically toggling.

And if we ditch the dynamic toggling, then this patch is unnecessary to fix the
LBRV issue. It _is_ necessary to actually let the guest use the LBRs, but that's
a wildly different changelog and justification.

And if we _don't_ ditch the dynamic toggling, then sev_es_init_vmcb() should be
using svm_enable_lbrv(), not open coding the exact same thing.