2021-12-14 02:50:23

by Thomas Gleixner

Subject: [patch 0/6] x86/fpu: Preparatory changes for guest AMX support

Folks,

this is a follow up to the initial sketch of patches which got picked up by
Jing and have been posted in combination with the KVM parts:

https://lore.kernel.org/r/[email protected]

This update is only touching the x86/fpu code and not changing anything on
the KVM side.

BIG FAT WARNING: This is compile tested only!

In the course of the discussion of the above patchset it turned out that there
are a few conceptual issues vs. hardware and software state and also
vs. guest restore.

This series addresses this with the following changes vs. the original
approach:

  1) fpstate reallocation is now independent of fpu_swap_kvm_fpstate()

     It is triggered directly via XSETBV and XFD MSR write emulation which
     are used both for runtime and restore purposes.

     For this it provides two wrappers around a common update function, one
     for XCR0 and one for XFD.

     Both check the validity of the arguments and the correct sizing of the
     guest FPU fpstate. If the size is not sufficient, fpstate is
     reallocated.

     The functions can fail.

  2) XFD synchronization

     KVM must neither touch the XFD MSR nor the fpstate->xfd software state
     in order to guarantee state consistency.

     In the MSR write emulation case the XFD specific update handler has to
     be invoked. See #1

     If MSR write emulation is disabled because the buffer size is
     sufficient for all use cases, i.e.:

         guest_fpu::xfeatures == guest_fpu::perm

     then there is no guarantee that the XFD software state on VMEXIT is
     the same as the state on VMENTER.

     A separate synchronization function is provided which reads the XFD
     MSR and updates the relevant software state. This function has to be
     invoked after a VMEXIT before reenabling interrupts.

With that the KVM logic looks like this:

     xsetbv_emulate()
         ret = fpu_update_guest_xcr0(&vcpu->arch.guest_fpu, xcr0);
         if (ret)
             handle_fail()
         ....


     kvm_emulate_wrmsr()
         ....
         case MSR_IA32_XFD:
             ret = fpu_update_guest_xfd(&vcpu->arch.guest_fpu, vcpu->arch.xcr0, msrval);
             if (ret)
                 handle_fail()
             ....

This covers both the case of a running vCPU and the case of restore.
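
To make the "two wrappers around a common update function" shape concrete,
here is a small userspace model. It is only a sketch and not code from the
series: struct model_guest_fpu, the model_*() names, the sizing rule and the
use of -EPERM are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct fpu_guest. */
struct model_guest_fpu {
	uint64_t xfeatures;	/* features the current buffer can hold */
	uint64_t perm;		/* features the guest is permitted to use */
	uint64_t xfd;		/* software copy of the XFD value */
	size_t   size;		/* current buffer size in bytes */
	void    *buf;		/* stand-in for the fpstate buffer */
};

/* Illustrative sizing only: fixed base plus 64 bytes per feature bit. */
static size_t model_required_size(uint64_t features)
{
	size_t size = 576;

	for (; features; features &= features - 1)
		size += 64;
	return size;
}

/* Common update function: validate first, then grow the buffer if needed. */
static int model_update_guest(struct model_guest_fpu *gfpu,
			      uint64_t xcr0, uint64_t xfd)
{
	uint64_t armed;	/* enabled in XCR0 and not disabled via XFD */
	size_t need;
	void *newbuf;

	assert(gfpu != NULL);

	if (xcr0 & ~gfpu->perm)
		return -EPERM;	/* assumption: some error for "not permitted" */

	armed = xcr0 & ~xfd;
	need = model_required_size(gfpu->xfeatures | armed);
	if (need > gfpu->size) {
		newbuf = realloc(gfpu->buf, need);
		if (!newbuf)
			return -ENOMEM;	/* the update can fail */
		gfpu->buf = newbuf;
		gfpu->size = need;
	}
	gfpu->xfeatures |= armed;
	gfpu->xfd = xfd;
	return 0;
}

/* Wrapper for the XSETBV emulation path: XFD stays as it is. */
static int model_update_guest_xcr0(struct model_guest_fpu *gfpu, uint64_t xcr0)
{
	return model_update_guest(gfpu, xcr0, gfpu->xfd);
}

/* Wrapper for the XFD MSR write emulation path. */
static int model_update_guest_xfd(struct model_guest_fpu *gfpu,
				  uint64_t xcr0, uint64_t xfd)
{
	return model_update_guest(gfpu, xcr0, xfd);
}
```

The point of the shape is that both emulation paths funnel into one place
which validates and resizes, so the caller only has to propagate the return
value, at runtime and on restore alike.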

The XFD synchronization mechanism is only relevant for a running vCPU after
VMEXIT when XFD MSR write emulation is disabled:

     vcpu_run()
       vcpu_enter_guest()
         for (;;) {
             ...
             vmenter();
             ...
         };
         ...

         if (!xfd_write_emulated(vcpu))
             fpu_sync_guest_vmexit_xfd_state();

         local_irq_enable();

It has no relevance for the guest restore case.
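
The ordering requirement can be modelled in a few lines of userspace C. This
is a sketch under assumptions: struct model_vcpu and the model_*() helpers are
made up for illustration, and the "MSR" is just a struct member that a
passed-through guest write would change behind the host's back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state involved in the exit path. */
struct model_vcpu {
	uint64_t hw_xfd_msr;		/* value currently in the "hardware" MSR */
	uint64_t sw_xfd;		/* stand-in for fpstate->xfd */
	bool     xfd_write_emulated;	/* true: XFD writes are intercepted */
	bool     irqs_enabled;
};

/* Stand-in for the sync function: read the MSR, refresh software state. */
static void model_sync_vmexit_xfd(struct model_vcpu *v)
{
	/* Has to run while interrupts are still disabled. */
	assert(!v->irqs_enabled);
	v->sw_xfd = v->hw_xfd_msr;
}

/* Exit path: sync only if the guest could have written XFD directly. */
static void model_vmexit_path(struct model_vcpu *v)
{
	v->irqs_enabled = false;	/* VMEXIT arrives with interrupts off */

	if (!v->xfd_write_emulated)
		model_sync_vmexit_xfd(v);

	v->irqs_enabled = true;		/* models local_irq_enable() */
}
```

When writes are emulated the software copy is already maintained by the XFD
update handler, which is why the model skips the sync in that case.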

With that all XFD/fpstate related issues should be covered in a consistent
way.

CPUID validation can be done without exporting yet more FPU functions:

     if (requested_xfeatures & ~vcpu->arch.guest_fpu.perm)
         return -ENOPONY;

That has been the purpose of fpu_guest::perm from the beginning, along with
fpu_guest::xfeatures for other validation purposes.
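
Spelled out as a standalone helper, the check is a plain subset test. The
function name is hypothetical and -EPERM is only an assumed stand-in for the
placeholder error name used above:

```c
#include <errno.h>
#include <stdint.h>

/*
 * Returns 0 if every requested feature bit is also set in perm,
 * otherwise a negative errno. -EPERM is an assumption for this sketch.
 */
static int model_check_cpuid_xfeatures(uint64_t requested_xfeatures,
				       uint64_t perm)
{
	if (requested_xfeatures & ~perm)
		return -EPERM;
	return 0;
}
```

For example, with perm = 0x7 a request for 0x3 passes, while a request that
additionally contains bit 18 is rejected.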

XFD_ERR MSR handling is completely separate and as discussed a KVM only
issue for now. KVM has to ensure that the MSR is 0 before interrupts are
enabled. So this is not touched here.

The only remaining issue is the KVM XSTATE save/restore size checking which
probably requires some FPU core assistance. But that requires some more
thoughts vs. the IOCTL interface extension and once that is settled it
needs to be solved in one go. But that's an orthogonal issue to the above.

The series is also available from git:

git://git.kernel.org/pub/scm/linux/kernel/git/people/tglx/devel.git x86/fpu-kvm

Thanks,

tglx
---
include/asm/fpu/api.h | 63 ++++++++++++++++++++++++
include/asm/fpu/types.h | 22 ++++++++
include/uapi/asm/prctl.h | 26 +++++----
kernel/fpu/core.c | 123 ++++++++++++++++++++++++++++++++++++++++++++---
kernel/fpu/xstate.c | 118 +++++++++++++++++++++++++++------------------
kernel/fpu/xstate.h | 20 ++++++-
kernel/process.c | 2
7 files changed, 307 insertions(+), 67 deletions(-)


2021-12-14 06:53:47

by Liu, Jing2

Subject: Re: [patch 0/6] x86/fpu: Preparatory changes for guest AMX support

Hi Thomas,

On 12/14/2021 10:50 AM, Thomas Gleixner wrote:
> Folks,
>
> this is a follow up to the initial sketch of patches which got picked up by
> Jing and have been posted in combination with the KVM parts:
>
> https://lore.kernel.org/r/[email protected]
>
> This update is only touching the x86/fpu code and not changing anything on
> the KVM side.
>
> BIG FAT WARNING: This is compile tested only!
>
> In the course of the discussion of the above patchset it turned out that there
> are a few conceptual issues vs. hardware and software state and also
> vs. guest restore.
>
> This series addresses this with the following changes vs. the original
> approach:
>
> 1) fpstate reallocation is now independent of fpu_swap_kvm_fpstate()
>
> It is triggered directly via XSETBV and XFD MSR write emulation which
> are used both for runtime and restore purposes.
>
> For this it provides two wrappers around a common update function, one
> for XCR0 and one for XFD.
>
> Both check the validity of the arguments and the correct sizing of the
> guest FPU fpstate. If the size is not sufficient, fpstate is
> reallocated.
>
> The functions can fail.
>
> 2) XFD synchronization
>
> KVM must neither touch the XFD MSR nor the fpstate->xfd software state
> in order to guarantee state consistency.
>
> In the MSR write emulation case the XFD specific update handler has to
> be invoked. See #1
>
> If MSR write emulation is disabled because the buffer size is
> sufficient for all use cases, i.e.:
>
> guest_fpu::xfeatures == guest_fpu::perm
>
The buffer size can already be sufficient once one of the features is
requested, since the kernel FPU code reallocates to the full (permitted)
size. And I think we don't want to disable interception until all the
features are detected, e.g. one by one.

Thus it can be guest_fpu::xfeatures != guest_fpu::perm.


Thanks,
Jing


2021-12-14 06:54:02

by Tian, Kevin

Subject: RE: [patch 0/6] x86/fpu: Preparatory changes for guest AMX support

> From: Thomas Gleixner <[email protected]>
> Sent: Tuesday, December 14, 2021 10:50 AM
>
> Folks,
>
> this is a follow up to the initial sketch of patches which got picked up by
> Jing and have been posted in combination with the KVM parts:
>
> https://lore.kernel.org/r/20211208000359.2853257-1-
> [email protected]
>
> This update is only touching the x86/fpu code and not changing anything on
> the KVM side.
>
> BIG FAT WARNING: This is compile tested only!
>
> In the course of the discussion of the above patchset it turned out that there
> are a few conceptual issues vs. hardware and software state and also
> vs. guest restore.

Overall this is definitely a good move and also helps simplify the
KVM side logic.

Thanks
Kevin

2021-12-14 07:54:13

by Tian, Kevin

Subject: RE: [patch 0/6] x86/fpu: Preparatory changes for guest AMX support

> From: Liu, Jing2 <[email protected]>
> Sent: Tuesday, December 14, 2021 2:52 PM
>
> On 12/14/2021 10:50 AM, Thomas Gleixner wrote:
> > If MSR write emulation is disabled because the buffer size is
> > sufficient for all use cases, i.e.:
> >
> > guest_fpu::xfeatures == guest_fpu::perm
> >
> The buffer size can be sufficient once one of the features is requested
> since kernel fpu realloc full size (permitted). And I think we don't want
> to disable interception until all the features are detected e.g., one by
> one.
>
> Thus it can be guest_fpu::xfeatures != guest_fpu::perm.
>

There are two options to handle multiple xfd features.

a) a conservative approach as Thomas suggested, i.e. don't disable emulation
until all the features in guest_fpu::perm are requested by the guest. This
definitely has poor performance if the guest only wants to use a subset of
the perm features. But from a functional p.o.v. it just works.

Given we only have one xfeature today, let's just use this simple check which
has ZERO negative impact.

b) an optimized approach by dynamically enabling/disabling emulation. e.g.
we can disable emulation after the 1st xfd feature is enabled and then
reenable it in #NM vmexit handler when XFD_ERR includes a bit which is
not in guest_fpu::xfeatures, sort of like:

--xfd trapped, perm has two xfd features--
(G) access xfd_feature1;
(H) trap #NM (XFD_ERR = xfd_feature1) and inject #NM;
(G) WRMSR(IA32_XFD, (-1ULL) & ~xfd_feature1);
(H) reallocate fpstate and disable write emulation for XFD;

--xfd passed through--
(G) do something...
(G) access xfd_feature2;
(H) trap #NM (XFD_ERR = xfd_feature2), enable emulation, inject #NM;

--xfd trapped--
(G) WRMSR(IA32_XFD, (-1ULL) & ~(xfd_feature1 | xfd_feature2));
(H) reallocate fpstate and disable write emulation for XFD;

--xfd passed through--
(G) do something...
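
The host side of the sequence in b) boils down to a small state machine. A
sketch of it in plain C follows; struct model_xfd_state and both handlers are
hypothetical, the fpstate reallocation is reduced to setting bits in
xfeatures, and #NM injection is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

struct model_xfd_state {
	uint64_t perm;		/* permitted XFD features */
	uint64_t xfeatures;	/* features the fpstate already covers */
	bool     xfd_trapped;	/* true: WRMSR(IA32_XFD) is emulated */
};

/* (H) #NM vmexit: XFD_ERR tells which feature the guest touched. */
static void model_handle_nm(struct model_xfd_state *s, uint64_t xfd_err)
{
	/* A feature the buffer does not cover yet: trap XFD writes again. */
	if (xfd_err & ~s->xfeatures)
		s->xfd_trapped = true;
}

/* (G) WRMSR(IA32_XFD, xfd) as seen by the host. */
static void model_handle_xfd_write(struct model_xfd_state *s, uint64_t xfd)
{
	if (!s->xfd_trapped)
		return;		/* passed through, the host never sees it */

	/* Features with a cleared XFD bit are now enabled by the guest. */
	s->xfeatures |= s->perm & ~xfd;	/* stands in for the reallocation */
	s->xfd_trapped = false;		/* back to pass-through */
}
```

Running the four phases above through it gives trapped, passed through,
trapped, passed through, with xfeatures growing by one feature per trapped
write.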

Thanks
Kevin

2021-12-14 10:42:46

by Paolo Bonzini

Subject: Re: [patch 0/6] x86/fpu: Preparatory changes for guest AMX support

On 12/14/21 03:50, Thomas Gleixner wrote:
> The only remaining issue is the KVM XSTATE save/restore size checking which
> probably requires some FPU core assistance. But that requires some more
> thoughts vs. the IOCTL interface extension and once that is settled it
> needs to be solved in one go. But that's an orthogonal issue to the above.

That's not a big deal because KVM uses the uncompacted format. So
KVM_CHECK_EXTENSION and KVM_GET_XSAVE can just use CPUID to retrieve the
size and uncompacted offset of the largest bit that is set in
kvm_supported_xcr0, while KVM_SET_XSAVE can do the same with the largest
bit that is set in the xstate_bv.
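
Roughly, as a sketch: in the uncompacted format every component sits at a
fixed offset, so the required size is offset plus size of the highest set
bit. The table type and function name below are made up for illustration; on
real hardware the table would be filled from CPUID leaf 0xD, sub-leaf i
(EAX = size, EBX = offset).

```c
#include <stdint.h>

/* Hypothetical table entry: one per xfeature bit, indexed by bit number. */
struct model_xcomp {
	uint32_t offset;	/* offset in the uncompacted XSAVE area */
	uint32_t size;		/* size of the component in bytes */
};

/*
 * Size of an uncompacted buffer that can hold all features in mask.
 * tbl must have 64 entries; entries 0 and 1 are not consulted because
 * x87 and SSE live in the 512 byte legacy region, which together with
 * the 64 byte XSAVE header is always present.
 */
static uint32_t model_uncompacted_size(uint64_t mask,
				       const struct model_xcomp *tbl)
{
	int bit;

	for (bit = 63; bit >= 2; bit--) {
		if (mask & (1ULL << bit))
			return tbl[bit].offset + tbl[bit].size;
	}
	return 512 + 64;
}
```

KVM_GET_XSAVE would feed it kvm_supported_xcr0 and KVM_SET_XSAVE the
xstate_bv from the user supplied buffer.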

Paolo



> The series is also available from git:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/people/tglx/devel.git x86/fpu-kvm


2021-12-14 13:24:56

by Thomas Gleixner

Subject: Re: [patch 0/6] x86/fpu: Preparatory changes for guest AMX support

On Tue, Dec 14 2021 at 11:42, Paolo Bonzini wrote:
> On 12/14/21 03:50, Thomas Gleixner wrote:
>> The only remaining issue is the KVM XSTATE save/restore size checking which
>> probably requires some FPU core assistance. But that requires some more
>> thoughts vs. the IOCTL interface extension and once that is settled it
>> needs to be solved in one go. But that's an orthogonal issue to the above.
>
> That's not a big deal because KVM uses the uncompacted format. So
> KVM_CHECK_EXTENSION and KVM_GET_XSAVE can just use CPUID to retrieve the
> size and uncompacted offset of the largest bit that is set in
> kvm_supported_xcr0, while KVM_SET_XSAVE can do the same with the largest
> bit that is set in the xstate_bv.

For simplicity you can just get that information from guest_fpu. See
below.

Thanks,

tglx
---
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -518,6 +518,11 @@ struct fpu_guest {
 	u64 perm;
 
 	/*
+	 * @uabi_size: Size required for save/restore
+	 */
+	unsigned int uabi_size;
+
+	/*
 	 * @fpstate: Pointer to the allocated guest fpstate
 	 */
 	struct fpstate *fpstate;
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -240,6 +240,7 @@ bool fpu_alloc_guest_fpstate(struct fpu_
 	gfpu->fpstate = fpstate;
 	gfpu->xfeatures = fpu_user_cfg.default_features;
 	gfpu->perm = fpu_user_cfg.default_features;
+	gfpu->uabi_size = fpu_user_cfg.default_size;
 	fpu_init_guest_permissions(gfpu);
 
 	return true;
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1545,6 +1545,7 @@ static int fpstate_realloc(u64 xfeatures
 		newfps->is_confidential = curfps->is_confidential;
 		newfps->in_use = curfps->in_use;
 		guest_fpu->xfeatures |= xfeatures;
+		guest_fpu->uabi_size = usize;
 	}
 
 	fpregs_lock();