2013-10-02 14:06:26

by Paolo Bonzini

[permalink] [raw]
Subject: [PATCH v3 0/3] KVM: prepare for future XSAVE extensions

As soon as the kernel will support the XSAVE extensions in Skylake
processors, we will want both userspace and the hypervisor to run
guests without showing any trace of the new features (because
support for them in the hypervisor will come later).

This series does exactly this. Patches 1 and 3 ensures that userspace
does not get access to features not supported in the hypervisor.
Patch 2 does the same for the guest.

The effect of these patches is already visible together with
the corresponding QEMU patches and a command line such as "-cpu
SandyBridge,-avx". Previously, the guest could still enable AVX via
xsetbv, and state would be sent by QEMU during migration. These patches
make sure that disabling AVX will really prevent the guest from using it,
which helps if you have to deal with migration from newer versions of
QEMU and the kernel to older versions. The QEMU patches set up the
guest's 0xd CPUID leaf, which also ensures that the kernel has the
necessary information.

Paolo

v2->v3:
rename supported_xcr0 field to guest_supported_xcr0
rename xstate_size field to guest_xstate_size
do not hardcode XSTATE_FPSSE in guest_supported_xcr0
handle missing 0xd leaf

Paolo Bonzini (3):
KVM: x86: mask unsupported XSAVE entries from leaf 0Dh index 0
KVM: x86: prevent setting unsupported XSAVE states
KVM: x86: only copy XSAVE state for the supported features

arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/cpuid.c | 36 +++++++++++++++++++++++++++++++++++-
arch/x86/kvm/x86.c | 29 ++++++++++++++++++++++-------
arch/x86/kvm/x86.h | 1 +
4 files changed, 60 insertions(+), 8 deletions(-)

--
1.8.3.1


2013-10-02 14:06:30

by Paolo Bonzini

[permalink] [raw]
Subject: [PATCH 1/3] KVM: x86: mask unsupported XSAVE entries from leaf 0Dh index 0

XSAVE entries that KVM does not support are reported by
KVM_GET_SUPPORTED_CPUID for leaf 0Dh index 0 if the host supports them;
they should be left out unless there is also hypervisor support for them.

Sub-leafs are correctly handled in supported_xcr0_bit, fix index 0
to match.

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/cpuid.c | 4 +++-
arch/x86/kvm/x86.h | 1 +
2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b110fe6..a03a9fa 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -182,7 +182,7 @@ static bool supported_xcr0_bit(unsigned bit)
{
u64 mask = ((u64)1 << bit);

- return mask & (XSTATE_FP | XSTATE_SSE | XSTATE_YMM) & host_xcr0;
+ return mask & KVM_SUPPORTED_XCR0 & host_xcr0;
}

#define F(x) bit(X86_FEATURE_##x)
@@ -383,6 +383,8 @@ static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
case 0xd: {
int idx, i;

+ entry->eax &= host_xcr0 & KVM_SUPPORTED_XCR0;
+ entry->edx &= (host_xcr0 & KVM_SUPPORTED_XCR0) >> 32;
entry->flags |= KVM_CPUID_FLAG_SIGNIFCANT_INDEX;
for (idx = 1, i = 1; idx < 64; ++idx) {
if (*nent >= maxnent)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index e224f7a..587fb9e 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -122,6 +122,7 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
gva_t addr, void *val, unsigned int bytes,
struct x86_exception *exception);

+#define KVM_SUPPORTED_XCR0 (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
extern u64 host_xcr0;

extern struct static_key kvm_no_apic_vcpu;
--
1.8.3.1

2013-10-02 14:06:35

by Paolo Bonzini

[permalink] [raw]
Subject: [PATCH 3/3] KVM: x86: only copy XSAVE state for the supported features

This makes the interface more deterministic for userspace, which can expect
(after configuring only the features it supports) to get exactly the same
state from the kernel, independent of the host CPU and kernel version.

Suggested-by: Gleb Natapov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/cpuid.c | 28 ++++++++++++++++++++++++++--
arch/x86/kvm/x86.c | 11 +++++++----
3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35d10d1..52110d0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -390,6 +390,7 @@ struct kvm_vcpu_arch {
struct fpu guest_fpu;
u64 xcr0;
u64 guest_supported_xcr0;
+ u32 guest_xstate_size;

struct kvm_pio_request pio;
void *pio_data;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 5a5ff94..0a1e3b8 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -23,6 +23,26 @@
#include "mmu.h"
#include "trace.h"

+static u32 xstate_required_size(u64 xstate_bv)
+{
+ int feature_bit = 0;
+ u32 ret = XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET;
+
+ xstate_bv &= ~XSTATE_FPSSE;
+ while (xstate_bv) {
+ if (xstate_bv & 0x1) {
+ u32 eax, ebx, ecx, edx;
+ cpuid_count(0xD, feature_bit, &eax, &ebx, &ecx, &edx);
+ ret = max(ret, eax + ebx);
+ }
+
+ xstate_bv >>= 1;
+ feature_bit++;
+ }
+
+ return ret;
+}
+
void kvm_update_cpuid(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
@@ -47,12 +67,16 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu)
}

best = kvm_find_cpuid_entry(vcpu, 0xD, 0);
- if (!best)
+ if (!best) {
vcpu->arch.guest_supported_xcr0 = 0;
- else
+ vcpu->arch.guest_xstate_size = XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET;
+ } else {
vcpu->arch.guest_supported_xcr0 =
(best->eax | ((u64)best->edx << 32)) &
host_xcr0 & KVM_SUPPORTED_XCR0;
+ vcpu->arch.guest_xstate_size =
+ xstate_required_size(vcpu->arch.guest_supported_xcr0);
+ }

kvm_pmu_cpuid_update(vcpu);
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7d5c520..e8e2d09 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2984,11 +2984,13 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
struct kvm_xsave *guest_xsave)
{
- if (cpu_has_xsave)
+ if (cpu_has_xsave) {
memcpy(guest_xsave->region,
&vcpu->arch.guest_fpu.state->xsave,
- xstate_size);
- else {
+ vcpu->arch.guest_xstate_size);
+ *(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)] &=
+ vcpu->arch.guest_supported_xcr0 | XSTATE_FPSSE;
+ } else {
memcpy(guest_xsave->region,
&vcpu->arch.guest_fpu.state->fxsave,
sizeof(struct i387_fxsave_struct));
@@ -3014,7 +3016,7 @@ static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
if (xstate_bv & ~host_xcr0)
return -EINVAL;
memcpy(&vcpu->arch.guest_fpu.state->xsave,
- guest_xsave->region, xstate_size);
+ guest_xsave->region, vcpu->arch.guest_xstate_size);
} else {
if (xstate_bv & ~XSTATE_FPSSE)
return -EINVAL;
@@ -6951,6 +6953,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
vcpu->arch.pv_time_enabled = false;

vcpu->arch.guest_supported_xcr0 = 0;
+ vcpu->arch.guest_xstate_size = XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET;

kvm_async_pf_hash_reset(vcpu);
kvm_pmu_init(vcpu);
--
1.8.3.1

2013-10-02 14:07:01

by Paolo Bonzini

[permalink] [raw]
Subject: [PATCH 2/3] KVM: x86: prevent setting unsupported XSAVE states

A guest can still attempt to save and restore XSAVE states even if they
have been masked in CPUID leaf 0Dh. This usually is not visible to
the guest, but is still wrong: "Any attempt to set a reserved bit (as
determined by the contents of EAX and EDX after executing CPUID with
EAX=0DH, ECX= 0H) in XCR0 for a given processor will result in a #GP
exception".

The patch also performs the same checks as __kvm_set_xcr in KVM_SET_XSAVE.
This catches migration from newer to older kernel/processor before the
guest starts running.

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/cpuid.c | 8 ++++++++
arch/x86/kvm/x86.c | 18 +++++++++++++++---
3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c76ff74..35d10d1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -389,6 +389,7 @@ struct kvm_vcpu_arch {

struct fpu guest_fpu;
u64 xcr0;
+ u64 guest_supported_xcr0;

struct kvm_pio_request pio;
void *pio_data;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a03a9fa..5a5ff94 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -46,6 +46,14 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu)
apic->lapic_timer.timer_mode_mask = 1 << 17;
}

+ best = kvm_find_cpuid_entry(vcpu, 0xD, 0);
+ if (!best)
+ vcpu->arch.guest_supported_xcr0 = 0;
+ else
+ vcpu->arch.guest_supported_xcr0 =
+ (best->eax | ((u64)best->edx << 32)) &
+ host_xcr0 & KVM_SUPPORTED_XCR0;
+
kvm_pmu_cpuid_update(vcpu);
}

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 187f824..7d5c520 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -586,7 +586,7 @@ int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
return 1;
if ((xcr0 & XSTATE_YMM) && !(xcr0 & XSTATE_SSE))
return 1;
- if (xcr0 & ~host_xcr0)
+ if (xcr0 & ~vcpu->arch.guest_supported_xcr0)
return 1;
kvm_put_guest_xcr0(vcpu);
vcpu->arch.xcr0 = xcr0;
@@ -3003,10 +3003,19 @@ static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
u64 xstate_bv =
*(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)];

- if (cpu_has_xsave)
+ if (cpu_has_xsave) {
+ /*
+ * Here we allow setting states that are not present in
+ * CPUID leaf 0xD, index 0, EDX:EAX. This is for compatibility
+ * with old userspace.
+ */
+ if (xstate_bv & ~KVM_SUPPORTED_XCR0)
+ return -EINVAL;
+ if (xstate_bv & ~host_xcr0)
+ return -EINVAL;
memcpy(&vcpu->arch.guest_fpu.state->xsave,
guest_xsave->region, xstate_size);
- else {
+ } else {
if (xstate_bv & ~XSTATE_FPSSE)
return -EINVAL;
memcpy(&vcpu->arch.guest_fpu.state->fxsave,
@@ -6940,6 +6949,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)

vcpu->arch.ia32_tsc_adjust_msr = 0x0;
vcpu->arch.pv_time_enabled = false;
+
+ vcpu->arch.guest_supported_xcr0 = 0;
+
kvm_async_pf_hash_reset(vcpu);
kvm_pmu_init(vcpu);

--
1.8.3.1

2013-10-03 09:40:53

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] KVM: prepare for future XSAVE extensions

On Wed, Oct 02, 2013 at 04:06:13PM +0200, Paolo Bonzini wrote:
> As soon as the kernel will support the XSAVE extensions in Skylake
> processors, we will want both userspace and the hypervisor to run
> guests without showing any trace of the new features (because
> support for them in the hypervisor will come later).
>
> This series does exactly this. Patches 1 and 3 ensures that userspace
> does not get access to features not supported in the hypervisor.
> Patch 2 does the same for the guest.
>
> The effect of these patches is already visible together with
> the corresponding QEMU patches and a command line such as "-cpu
> SandyBridge,-avx". Previously, the guest could still enable AVX via
> xsetbv, and state would be sent by QEMU during migration. These patches
> make sure that disabling AVX will really prevent the guest from using it,
> which helps if you have to deal with migration from newer versions of
> QEMU and the kernel to older versions. The QEMU patches set up the
> guest's 0xd CPUID leaf, which also ensures that the kernel has the
> necessary information.
>
> Paolo
>
Applied, thanks.

> v2->v3:
> rename supported_xcr0 field to guest_supported_xcr0
> rename xstate_size field to guest_xstate_size
> do not hardcode XSTATE_FPSSE in guest_supported_xcr0
> handle missing 0xd leaf
>
> Paolo Bonzini (3):
> KVM: x86: mask unsupported XSAVE entries from leaf 0Dh index 0
> KVM: x86: prevent setting unsupported XSAVE states
> KVM: x86: only copy XSAVE state for the supported features
>
> arch/x86/include/asm/kvm_host.h | 2 ++
> arch/x86/kvm/cpuid.c | 36 +++++++++++++++++++++++++++++++++++-
> arch/x86/kvm/x86.c | 29 ++++++++++++++++++++++-------
> arch/x86/kvm/x86.h | 1 +
> 4 files changed, 60 insertions(+), 8 deletions(-)
>
> --
> 1.8.3.1

--
Gleb.