Use the new MSR feature framework to expose the ARCH_CAPABILITIES MSR to
userspace. This way, userspace can access the capabilities even if it
does not have the permissions to read MSRs.
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/vmx.c | 12 +++++++++++-
arch/x86/kvm/x86.c | 1 +
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 130fca0ea1bf..99689061e11e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3228,7 +3228,17 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
{
- return 1;
+ switch (msr->index) {
+ case MSR_IA32_ARCH_CAPABILITIES:
+ if (!boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
+ return 1;
+ rdmsrl(msr->index, msr->data);
+ break;
+ default:
+ return 1;
+ }
+
+ return 0;
}
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 54b4ed55945b..e9a8cc9e3b2b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1054,6 +1054,7 @@ bool kvm_rdpmc(struct kvm_vcpu *vcpu)
* can be used by a hypervisor to validate requested CPU features.
*/
static u32 msr_based_features[] = {
+ MSR_IA32_ARCH_CAPABILITIES,
MSR_F10H_DECFG,
};
--
1.8.3.1
On Sat, Feb 24, 2018 at 01:52:26AM +0100, Paolo Bonzini wrote:
> Use the new MSR feature framework to expose the ARCH_CAPABILITIES MSR to
> userspace. This way, userspace can access the capabilities even if it
> does not have the permissions to read MSRs.
... That is good but could you expand a bit of why it would want this?
I am 99% sure it is due to the lovely spectre_v2 mitigation but
could you include that in the commit message so that in say a year
folks would know what this is?
Also what branch is this based on? I am not seeing this vmx_get_msr_feature
in kvm/master or kvm/linux-next ?
>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 12 +++++++++++-
> arch/x86/kvm/x86.c | 1 +
> 2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 130fca0ea1bf..99689061e11e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3228,7 +3228,17 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
>
> static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> {
> - return 1;
> + switch (msr->index) {
> + case MSR_IA32_ARCH_CAPABILITIES:
> + if (!boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
> + return 1;
> + rdmsrl(msr->index, msr->data);
> + break;
> + default:
> + return 1;
> + }
> +
> + return 0;
> }
>
> /*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 54b4ed55945b..e9a8cc9e3b2b 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1054,6 +1054,7 @@ bool kvm_rdpmc(struct kvm_vcpu *vcpu)
> * can be used by a hypervisor to validate requested CPU features.
> */
> static u32 msr_based_features[] = {
> + MSR_IA32_ARCH_CAPABILITIES,
> MSR_F10H_DECFG,
> };
>
> --
> 1.8.3.1
>
On Mon, Feb 26, 2018 at 05:13:00PM -0500, Konrad Rzeszutek Wilk wrote:
> On Sat, Feb 24, 2018 at 01:52:26AM +0100, Paolo Bonzini wrote:
> > Use the new MSR feature framework to expose the ARCH_CAPABILITIES MSR to
> > userspace. This way, userspace can access the capabilities even if it
> > does not have the permissions to read MSRs.
>
> ... That is good but could you expand a bit of why it would want this?
>
> I am 99% sure it is due to the lovely spectre_v2 mitigation but
> could you include that in the commit message so that in say a year
> folks would know what this is?
>
> Also what branch is this based on? I am not seeing this vmx_get_msr_feature
> in kvm/master or kvm/linux-next ?
Ah I see you posted them! Sorry for that particular noise.
> >
> > Signed-off-by: Paolo Bonzini <[email protected]>
> > ---
> > arch/x86/kvm/vmx.c | 12 +++++++++++-
> > arch/x86/kvm/x86.c | 1 +
> > 2 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 130fca0ea1bf..99689061e11e 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -3228,7 +3228,17 @@ static inline bool vmx_feature_control_msr_valid(struct kvm_vcpu *vcpu,
> >
> > static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
> > {
> > - return 1;
> > + switch (msr->index) {
> > + case MSR_IA32_ARCH_CAPABILITIES:
> > + if (!boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
> > + return 1;
> > + rdmsrl(msr->index, msr->data);
> > + break;
> > + default:
> > + return 1;
> > + }
> > +
> > + return 0;
> > }
> >
> > /*
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 54b4ed55945b..e9a8cc9e3b2b 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1054,6 +1054,7 @@ bool kvm_rdpmc(struct kvm_vcpu *vcpu)
> > * can be used by a hypervisor to validate requested CPU features.
> > */
> > static u32 msr_based_features[] = {
> > + MSR_IA32_ARCH_CAPABILITIES,
> > MSR_F10H_DECFG,
> > };
> >
> > --
> > 1.8.3.1
> >
[Resent after removing [email protected].]
2018-02-26 17:13-0500, Konrad Rzeszutek Wilk:
> On Sat, Feb 24, 2018 at 01:52:26AM +0100, Paolo Bonzini wrote:
> > Use the new MSR feature framework to expose the ARCH_CAPABILITIES MSR to
> > userspace. This way, userspace can access the capabilities even if it
> > does not have the permissions to read MSRs.
>
> ... That is good but could you expand a bit of why it would want this?
>
> I am 99% sure it is due to the lovely spectre_v2 mitigation but
> could you include that in the commit message so that in say a year
> folks would know what this is?
Userspace can currently get the MSR by creating a VCPU and reading its
MSR_IA32_ARCH_CAPABILITIES, because it is set from the hardware MSR.
I thought that "permissions to read MSRs" talked about hardware MSRs, so
the purpose of this patch would be a better interface, but I don't see
how if we keep the auto-setting on VCPU creation.
Is this aimed towards userspaces that want nothing else from KVM than
the MSR value?
Thanks.
On 01/03/2018 22:39, Radim Krčmář wrote:
> [Resent after removing [email protected].]
>
> 2018-02-26 17:13-0500, Konrad Rzeszutek Wilk:
>> On Sat, Feb 24, 2018 at 01:52:26AM +0100, Paolo Bonzini wrote:
>>> Use the new MSR feature framework to expose the ARCH_CAPABILITIES MSR to
>>> userspace. This way, userspace can access the capabilities even if it
>>> does not have the permissions to read MSRs.
>>
>> ... That is good but could you expand a bit of why it would want this?
>>
>> I am 99% sure it is due to the lovely spectre_v2 mitigation but
>> could you include that in the commit message so that in say a year
>> folks would know what this is?
>
> Userspace can currently get the MSR by creating a VCPU and reading its
> MSR_IA32_ARCH_CAPABILITIES, because it is set from the hardware MSR.
>
> I thought that "permissions to read MSRs" talked about hardware MSRs, so
> the purpose of this patch would be a better interface, but I don't see
> how if we keep the auto-setting on VCPU creation.
Yeah, it's mostly about a better interface and being able to do checks
before creating the VCPU. The commit message was written before I
noticed the auto-setting on VCPU creation, and I failed to update it.
Thanks,
Paolo
> Is this aimed towards userspaces that want nothing else from KVM than
> the MSR value?
>
> Thanks.
>
2018-03-02 10:36+0100, Paolo Bonzini:
> On 01/03/2018 22:39, Radim Krčmář wrote:
> > [Resent after removing [email protected].]
> >
> > 2018-02-26 17:13-0500, Konrad Rzeszutek Wilk:
> >> On Sat, Feb 24, 2018 at 01:52:26AM +0100, Paolo Bonzini wrote:
> >>> Use the new MSR feature framework to expose the ARCH_CAPABILITIES MSR to
> >>> userspace. This way, userspace can access the capabilities even if it
> >>> does not have the permissions to read MSRs.
> >>
> >> ... That is good but could you expand a bit of why it would want this?
> >>
> >> I am 99% sure it is due to the lovely spectre_v2 mitigation but
> >> could you include that in the commit message so that in say a year
> >> folks would know what this is?
> >
> > Userspace can currently get the MSR by creating a VCPU and reading its
> > MSR_IA32_ARCH_CAPABILITIES, because it is set from the hardware MSR.
> >
> > I thought that "permissions to read MSRs" talked about hardware MSRs, so
> > the purpose of this patch would be a better interface, but I don't see
> > how if we keep the auto-setting on VCPU creation.
>
> Yeah, it's mostly about a better interface and being able to do checks
> before creating the VCPU. The commit message was written before I
> noticed the auto-setting on VCPU creation, and I failed to update it.
Ok, sounds good. I've deferred it to rc5 as I think we'll want to use
this to replace the auto setting: I would not bet that it is going to
be safe to expose future bits, so having the userspace always sanitize
the capabilities would be safer (and more in line with what we do with
other MSRs). i.e. this patch would also
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 051dab74e4e9..86ea4a83af1f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5740,9 +5740,6 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
++vmx->nmsrs;
}
- if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
- rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
-
vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
/* 22.2.1, 20.8.1 */
On 02/03/2018 22:42, Radim Krčmář wrote:
> Ok, sounds good. I've deferred it to rc5 as I think we'll want to use
> this to replace the auto setting: I would not bet that it is going to
> be safe to expose future bits, so having the userspace always sanitize
> the capabilities would be safer (and more in line with what we do with
> other MSRs). i.e. this patch would also
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 051dab74e4e9..86ea4a83af1f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5740,9 +5740,6 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
> ++vmx->nmsrs;
> }
>
> - if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
> - rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
> -
> vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
>
> /* 22.2.1, 20.8.1 */
I don't know. There are good reasons for both behaviors, and especially
the following two for _not_ removing the rdmsr:
1) so far you could just pass the result of KVM_GET_SUPPORTED_CPUID to
KVM_SET_CPUID2, and expect the result to be "as close as possible to the
host";
2) having different behavior for VMX and ARCH_CAPABILITIES MSRs would be
confusing.
I think I'm leaning more towards the following direction: whitelist
ARCH_CAPABILITIES, like we do for the AMD LFENCE MSR already, and
default the AMD LFENCE MSR to the host value.
Thanks,
Paolo
2018-03-07 12:53+0100, Paolo Bonzini:
> On 02/03/2018 22:42, Radim Krčmář wrote:
> > Ok, sounds good. I've deferred it to rc5 as I think we'll want to use
> > this to replace the auto setting: I would not bet that it is going to
> > be safe to expose future bits, so having the userspace always sanitize
> > the capabilities would be safer (and more in line with what we do with
> > other MSRs). i.e. this patch would also
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 051dab74e4e9..86ea4a83af1f 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -5740,9 +5740,6 @@ static void vmx_vcpu_setup(struct vcpu_vmx *vmx)
> > ++vmx->nmsrs;
> > }
> >
> > - if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
> > - rdmsrl(MSR_IA32_ARCH_CAPABILITIES, vmx->arch_capabilities);
> > -
> > vm_exit_controls_init(vmx, vmcs_config.vmexit_ctrl);
> >
> > /* 22.2.1, 20.8.1 */
>
> I don't know. There are good reasons for both behaviors, and especially
> the following two for _not_ removing the rdmsr:
>
> 1) so far you could just pass the result of KVM_GET_SUPPORTED_CPUID to
> KVM_SET_CPUID2, and expect the result to be "as close as possible to the
> host";
>
> 2) having different behavior for VMX and ARCH_CAPABILITIES MSRs would be
> confusing.
Right, we can't just stop setting them by default ...
(I am in favor of forcing the userspace to configure everything and I'd
accept this exception as a mistake of the past.)
> I think I'm leaning more towards the following direction: whitelist
> ARCH_CAPABILITIES, like we do for the AMD LFENCE MSR already, and
> default the AMD LFENCE MSR to the host value.
The whitelisting is a good idea and I'm ok with just that, thanks.
The MSR_F10H_DECFG default is questionable -- MSR_F10H_DECFG is an
architectural MSR, so we'd be changing the guest under the sight of
existing userspaces.
A potential security risk if they migrate the guest to a CPU that
doesn't serialize LFENCE. ARCH_CAPABILITIES are at least hidden by a
new CPUID bit.
The feature MSR defaults are going to be a mess anyway: we have
MSR_IA32_UCODE_REV that is tightly coupled with CPUID. Not a good
candidate for passing by default and currently also has a default value.
On 07/03/2018 15:56, Radim Krčmář wrote:
> The MSR_F10H_DECFG default is questionable -- MSR_F10H_DECFG is an
> architectural MSR, so we'd be changing the guest under the sight of
> existing userspaces.
> A potential security risk if they migrate the guest to a CPU that
> doesn't serialize LFENCE. ARCH_CAPABILITIES are at least hidden by a
> new CPUID bit.
Good point. Perhaps we should add a KVM-specific CPUID bit for
serializing LFENCE.
> The feature MSR defaults are going to be a mess anyway: we have
> MSR_IA32_UCODE_REV that is tightly coupled with CPUID. Not a good
> candidate for passing by default and currently also has a default value.
Yeah, ucode revision is out of question.
Paolo
2018-03-07 16:10+0100, Paolo Bonzini:
> On 07/03/2018 15:56, Radim Krčmář wrote:
> > The MSR_F10H_DECFG default is questionable -- MSR_F10H_DECFG is an
> > architectural MSR, so we'd be changing the guest under the sight of
> > existing userspaces.
> > A potential security risk if they migrate the guest to a CPU that
> > doesn't serialize LFENCE. ARCH_CAPABILITIES are at least hidden by a
> > new CPUID bit.
>
> Good point. Perhaps we should add a KVM-specific CPUID bit for
> serializing LFENCE.
I reckon it wouldn't help much in the wild: we'd need userspace changes
(at least for QEMU) and at that point, userspace can as well just
implement MSR_F10H_DECFG and use an unmodified guest.
We can't easily intercept LFENCE to emulate the feature either, so it
seems like a waste of effort to me.