2021-06-18 18:05:30

by Thomas Gleixner

[permalink] [raw]
Subject: [patch V3 00/66] x86/fpu: Spring cleaning and PKRU sanitizing

The main parts of this series are:

- Yet more bug fixes

- Simplification and removal/replacement of redundant and/or
overengineered code.

- Name space cleanup as the existing names were just a permanent source
of confusion.

- Clear seperation of user ABI and kernel internal state handling.

- Removal of PKRU from being XSTATE managed in the kernel because PKRU
has to be eagerly restored on context switch and keeping it in sync
in the xstate buffer is just pointless overhead and fragile.

The kernel still XSAVEs PKRU on context switch but the value in the
buffer is not longer used and never restored from the buffer.

This still needs to be cleaned up, but the series is already 40+
patches large and the cleanup of this is not a functional problem.

The functional issues of PKRU management are fully addressed with the
series as is.

- Cleanup of fpu signal restore

- Make the fast path self contained. Handle #PF directly and skip
the slow path on any other exception as that will just end up
with the same result that the frame is invalid. This allows
the compiler to optimize the slow path out for 64bit kernels
w/o ia32 emulation.

- Reduce code duplication and unnecessary operations


It applies on top of

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master

and is also available via git:

git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/fpu

This is a follow up to V2 which can be found here:

https://lore.kernel.org/r/[email protected]

Changes vs. V2:

- Fixed the testing fallout (Dave, Kan)

- Fixed a few issues found by myself when going through the lot
with a fine comb, especially MXCSR handling

- Drop the FNSAVE optimizations

- Cleanup of signal restore

- Addressed review comments, mostly comments and a hopefully better
naming scheme which now just uses the instruction names and
consolidates everything else on save/restore so it's close to the way
how the hardware works.

- A few cleanups and simplifications on the way (mostly regset related).

- Picked up tags

With the above I'm not intending to do any further surgery on that
code at the moment, though there is still room for improvement which
can and has to be worked on when new bits are added.

Thanks,

tglx
---
arch/x86/events/intel/lbr.c | 6
arch/x86/include/asm/fpu/internal.h | 211 +++-------
arch/x86/include/asm/fpu/xstate.h | 70 ++-
arch/x86/include/asm/pgtable.h | 57 --
arch/x86/include/asm/pkeys.h | 9
arch/x86/include/asm/pkru.h | 62 +++
arch/x86/include/asm/processor.h | 9
arch/x86/include/asm/special_insns.h | 14
arch/x86/kernel/cpu/common.c | 34 -
arch/x86/kernel/fpu/core.c | 276 +++++++------
arch/x86/kernel/fpu/init.c | 15
arch/x86/kernel/fpu/regset.c | 220 ++++++-----
arch/x86/kernel/fpu/signal.c | 423 +++++++++------------
arch/x86/kernel/fpu/xstate.c | 693 ++++++++++++++---------------------
arch/x86/kernel/process.c | 22 -
arch/x86/kernel/process_64.c | 28 +
arch/x86/kernel/traps.c | 5
arch/x86/kvm/svm/sev.c | 1
arch/x86/kvm/x86.c | 56 +-
arch/x86/mm/extable.c | 2
arch/x86/mm/fault.c | 2
arch/x86/mm/pkeys.c | 22 -
include/linux/pkeys.h | 4
23 files changed, 1060 insertions(+), 1181 deletions(-)



2021-06-21 16:19:00

by Yu-cheng Yu

[permalink] [raw]
Subject: Re: [patch V3 00/66] x86/fpu: Spring cleaning and PKRU sanitizing

On 6/18/2021 7:18 AM, Thomas Gleixner wrote:
> The main parts of this series are:
>
> - Yet more bug fixes
>
> - Simplification and removal/replacement of redundant and/or
> overengineered code.
>
> - Name space cleanup as the existing names were just a permanent source
> of confusion.
>
> - Clear seperation of user ABI and kernel internal state handling.
>
> - Removal of PKRU from being XSTATE managed in the kernel because PKRU
> has to be eagerly restored on context switch and keeping it in sync
> in the xstate buffer is just pointless overhead and fragile.
>
> The kernel still XSAVEs PKRU on context switch but the value in the
> buffer is not longer used and never restored from the buffer.
>
> This still needs to be cleaned up, but the series is already 40+
> patches large and the cleanup of this is not a functional problem.
>
> The functional issues of PKRU management are fully addressed with the
> series as is.
>
> - Cleanup of fpu signal restore
>
> - Make the fast path self contained. Handle #PF directly and skip
> the slow path on any other exception as that will just end up
> with the same result that the frame is invalid. This allows
> the compiler to optimize the slow path out for 64bit kernels
> w/o ia32 emulation.
>
> - Reduce code duplication and unnecessary operations
>
>
> It applies on top of
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master
>
> and is also available via git:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/fpu
>
> This is a follow up to V2 which can be found here:
>
> https://lore.kernel.org/r/[email protected]
>
> Changes vs. V2:
>
> - Fixed the testing fallout (Dave, Kan)
>
> - Fixed a few issues found by myself when going through the lot
> with a fine comb, especially MXCSR handling
>
> - Drop the FNSAVE optimizations
>
> - Cleanup of signal restore
>
> - Addressed review comments, mostly comments and a hopefully better
> naming scheme which now just uses the instruction names and
> consolidates everything else on save/restore so it's close to the way
> how the hardware works.
>
> - A few cleanups and simplifications on the way (mostly regset related).
>
> - Picked up tags
>
> With the above I'm not intending to do any further surgery on that
> code at the moment, though there is still room for improvement which
> can and has to be worked on when new bits are added.
>
> Thanks,

Run all my tests again, and all pass.

Thanks,
Yu-cheng

>
> tglx
> ---
> arch/x86/events/intel/lbr.c | 6
> arch/x86/include/asm/fpu/internal.h | 211 +++-------
> arch/x86/include/asm/fpu/xstate.h | 70 ++-
> arch/x86/include/asm/pgtable.h | 57 --
> arch/x86/include/asm/pkeys.h | 9
> arch/x86/include/asm/pkru.h | 62 +++
> arch/x86/include/asm/processor.h | 9
> arch/x86/include/asm/special_insns.h | 14
> arch/x86/kernel/cpu/common.c | 34 -
> arch/x86/kernel/fpu/core.c | 276 +++++++------
> arch/x86/kernel/fpu/init.c | 15
> arch/x86/kernel/fpu/regset.c | 220 ++++++-----
> arch/x86/kernel/fpu/signal.c | 423 +++++++++------------
> arch/x86/kernel/fpu/xstate.c | 693 ++++++++++++++---------------------
> arch/x86/kernel/process.c | 22 -
> arch/x86/kernel/process_64.c | 28 +
> arch/x86/kernel/traps.c | 5
> arch/x86/kvm/svm/sev.c | 1
> arch/x86/kvm/x86.c | 56 +-
> arch/x86/mm/extable.c | 2
> arch/x86/mm/fault.c | 2
> arch/x86/mm/pkeys.c | 22 -
> include/linux/pkeys.h | 4
> 23 files changed, 1060 insertions(+), 1181 deletions(-)
>
>

2021-06-21 21:55:01

by Fenghua Yu

[permalink] [raw]
Subject: Re: [patch V3 00/66] x86/fpu: Spring cleaning and PKRU sanitizing

Hi, X86 maintainers,

On Fri, Jun 18, 2021 at 04:18:23PM +0200, Thomas Gleixner wrote:
> The main parts of this series are:
>
> - Yet more bug fixes
...
> and is also available via git:
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/fpu
...
> Changes vs. V2:
...

After reverting the disabling PASID patch, resolving one PKRU conflict, and
porting the latest internal IDXD patches to this series, I can run stress
PASID context switch tests on this series (and v2 as well). I don't see any
issue for PASID context switch.

Also thank you very much for moving the PASID feature forward.

-Fenghua

2021-06-21 22:23:58

by Chang S. Bae

[permalink] [raw]
Subject: Re: [patch V3 00/66] x86/fpu: Spring cleaning and PKRU sanitizing

On Jun 18, 2021, at 07:18, Thomas Gleixner <[email protected]> wrote:
>
> It applies on top of
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master
>
> and is also available via git:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/fpu

I tried to apply AMX patches on top of this. The test looks to be okay by far.
I will also give an update here if I find anything.

Thanks,
Chang

2021-06-22 01:45:57

by kernel test robot

[permalink] [raw]
Subject: Re: [patch V3 00/66] x86/fpu: Spring cleaning and PKRU sanitizing

Hi Thomas,

On Fri, Jun 18, 2021 at 10:18:23PM +0800, Thomas Gleixner wrote:
> The main parts of this series are:
>
> - Yet more bug fixes
>
> - Simplification and removal/replacement of redundant and/or
> overengineered code.
>
> - Name space cleanup as the existing names were just a permanent source
> of confusion.
>
> - Clear seperation of user ABI and kernel internal state handling.
>
> - Removal of PKRU from being XSTATE managed in the kernel because PKRU
> has to be eagerly restored on context switch and keeping it in sync
> in the xstate buffer is just pointless overhead and fragile.
>
> The kernel still XSAVEs PKRU on context switch but the value in the
> buffer is not longer used and never restored from the buffer.
>
> This still needs to be cleaned up, but the series is already 40+
> patches large and the cleanup of this is not a functional problem.
>
> The functional issues of PKRU management are fully addressed with the
> series as is.
>
> - Cleanup of fpu signal restore
>
> - Make the fast path self contained. Handle #PF directly and skip
> the slow path on any other exception as that will just end up
> with the same result that the frame is invalid. This allows
> the compiler to optimize the slow path out for 64bit kernels
> w/o ia32 emulation.
>
> - Reduce code duplication and unnecessary operations
>
>
> It applies on top of
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master
>
> and is also available via git:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/fpu

0-Day kernel CI tested this branch from performance view,
choosing some sub-tests from will-it-scale (detail as below), since we
thought if the branch has the impact of fpu ops, will-it-scale should be
able to catch it.
we also plan to add stress-ng for new round test.
could you suggest if any other suitable test suites? and what's the most
proper sub-tests in will-it-scale and stress-ng?

Test Summary
============
no obvious will-it-scale performance changes found so far

Test Environment
================
https://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git/log/?h=x86/fpu
* 0619677ee36c3 (tglx-devel/x86/fpu) x86/fpu/signal: Let xrstor handle the features to init <----- the tip we tested
* a114fd9946c28 x86/fpu/signal: Handle #PF in the direct restore path
* 73e26fdd0cf1c x86/fpu: Return proper error codes from user access functions
...
* 63bf804bfa6b0 x86/fpu: Make init_fpstate correct with optimized XSAVE
* 6db8e02d5e932 x86/fpu: x86/fpu: Preserve supervisor states in sanitize_restored_user_xstate()
* 4fe93c2272dbb Merge branch 'x86/fpu' of ../tip into x86/fpu <----- the base we compared
|\
| * b7c11876d24bd (tip/x86/fpu, peterz-queue/x86/fpu) selftests/x86: Test signal frame XSTATE header corruption handling

64bit kernel testing, upon below platform:
model: Cascade Lake
Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz
nr_node: 2
nr_cpu: 88
memory: 128G


32bit kernel testing, upon below platform:
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
model: Ivy Bridge
nr_node: 1
nr_cpu: 8
memory: 16G


tested below test suites:
will-it-scale-performance-context_switch1
will-it-scale-performance-page_fault1
will-it-scale-performance-poll1
will-it-scale-performance-pthread_mutex1
will-it-scale-performance-writeseek1



>
> This is a follow up to V2 which can be found here:
>
> https://lore.kernel.org/r/[email protected]
>
> Changes vs. V2:
>
> - Fixed the testing fallout (Dave, Kan)
>
> - Fixed a few issues found by myself when going through the lot
> with a fine comb, especially MXCSR handling
>
> - Drop the FNSAVE optimizations
>
> - Cleanup of signal restore
>
> - Addressed review comments, mostly comments and a hopefully better
> naming scheme which now just uses the instruction names and
> consolidates everything else on save/restore so it's close to the way
> how the hardware works.
>
> - A few cleanups and simplifications on the way (mostly regset related).
>
> - Picked up tags
>
> With the above I'm not intending to do any further surgery on that
> code at the moment, though there is still room for improvement which
> can and has to be worked on when new bits are added.
>
> Thanks,
>
> tglx
> ---
> arch/x86/events/intel/lbr.c | 6
> arch/x86/include/asm/fpu/internal.h | 211 +++-------
> arch/x86/include/asm/fpu/xstate.h | 70 ++-
> arch/x86/include/asm/pgtable.h | 57 --
> arch/x86/include/asm/pkeys.h | 9
> arch/x86/include/asm/pkru.h | 62 +++
> arch/x86/include/asm/processor.h | 9
> arch/x86/include/asm/special_insns.h | 14
> arch/x86/kernel/cpu/common.c | 34 -
> arch/x86/kernel/fpu/core.c | 276 +++++++------
> arch/x86/kernel/fpu/init.c | 15
> arch/x86/kernel/fpu/regset.c | 220 ++++++-----
> arch/x86/kernel/fpu/signal.c | 423 +++++++++------------
> arch/x86/kernel/fpu/xstate.c | 693 ++++++++++++++---------------------
> arch/x86/kernel/process.c | 22 -
> arch/x86/kernel/process_64.c | 28 +
> arch/x86/kernel/traps.c | 5
> arch/x86/kvm/svm/sev.c | 1
> arch/x86/kvm/x86.c | 56 +-
> arch/x86/mm/extable.c | 2
> arch/x86/mm/fault.c | 2
> arch/x86/mm/pkeys.c | 22 -
> include/linux/pkeys.h | 4
> 23 files changed, 1060 insertions(+), 1181 deletions(-)
>
>

2021-06-22 09:12:33

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch V3 00/66] x86/fpu: Spring cleaning and PKRU sanitizing

Oliver,

On Tue, Jun 22 2021 at 09:59, Oliver Sang wrote:
> On Fri, Jun 18, 2021 at 10:18:23PM +0800, Thomas Gleixner wrote:
>> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/fpu
>
> 0-Day kernel CI tested this branch from performance view,
> choosing some sub-tests from will-it-scale (detail as below), since we
> thought if the branch has the impact of fpu ops, will-it-scale should be
> able to catch it.
> we also plan to add stress-ng for new round test.
> could you suggest if any other suitable test suites? and what's the most
> proper sub-tests in will-it-scale and stress-ng?

Hard to tell. Anything scheduling heavy will exercise these code paths.

Thanks,

tglx


2021-06-22 18:57:36

by Megha Dey

[permalink] [raw]
Subject: Re: [patch V3 00/66] x86/fpu: Spring cleaning and PKRU sanitizing

Hi Thomas,

On 6/18/2021 7:48 PM, Thomas Gleixner wrote:
> The main parts of this series are:
>
> - Yet more bug fixes
>
> - Simplification and removal/replacement of redundant and/or
> overengineered code.
>
> - Name space cleanup as the existing names were just a permanent source
> of confusion.
>
> - Clear seperation of user ABI and kernel internal state handling.
>
> - Removal of PKRU from being XSTATE managed in the kernel because PKRU
> has to be eagerly restored on context switch and keeping it in sync
> in the xstate buffer is just pointless overhead and fragile.
>
> The kernel still XSAVEs PKRU on context switch but the value in the
> buffer is not longer used and never restored from the buffer.
>
> This still needs to be cleaned up, but the series is already 40+
> patches large and the cleanup of this is not a functional problem.
>
> The functional issues of PKRU management are fully addressed with the
> series as is.
>
> - Cleanup of fpu signal restore
>
> - Make the fast path self contained. Handle #PF directly and skip
> the slow path on any other exception as that will just end up
> with the same result that the frame is invalid. This allows
> the compiler to optimize the slow path out for 64bit kernels
> w/o ia32 emulation.
>
> - Reduce code duplication and unnecessary operations
>
>
> It applies on top of
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master
>
> and is also available via git:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/fpu
>
> This is a follow up to V2 which can be found here:
>
> https://lore.kernel.org/r/[email protected]

I tested the x86/fpu branch using both AVX2(intree) and AVX512(out of
tree) crypto code.

I used the tcrypt test module in the kernel and ran the prime95 workload
(which finds prime numbers using AVX2 or AVX512) as a background process
to make sure that the AVX states don't get screwed up after we run the
crypto in kernel.

I did not see any issues with this branch and all tcrypt tests run as
expected. I tested using the SHA1/256, AES-CTR, AES-GCM, camelia and
crc32t10diff crypto algorithms.

Thanks,
Megha

>
> Changes vs. V2:
>
> - Fixed the testing fallout (Dave, Kan)
>
> - Fixed a few issues found by myself when going through the lot
> with a fine comb, especially MXCSR handling
>
> - Drop the FNSAVE optimizations
>
> - Cleanup of signal restore
>
> - Addressed review comments, mostly comments and a hopefully better
> naming scheme which now just uses the instruction names and
> consolidates everything else on save/restore so it's close to the way
> how the hardware works.
>
> - A few cleanups and simplifications on the way (mostly regset related).
>
> - Picked up tags
>
> With the above I'm not intending to do any further surgery on that
> code at the moment, though there is still room for improvement which
> can and has to be worked on when new bits are added.
>
> Thanks,
>
> tglx
> ---
> arch/x86/events/intel/lbr.c | 6
> arch/x86/include/asm/fpu/internal.h | 211 +++-------
> arch/x86/include/asm/fpu/xstate.h | 70 ++-
> arch/x86/include/asm/pgtable.h | 57 --
> arch/x86/include/asm/pkeys.h | 9
> arch/x86/include/asm/pkru.h | 62 +++
> arch/x86/include/asm/processor.h | 9
> arch/x86/include/asm/special_insns.h | 14
> arch/x86/kernel/cpu/common.c | 34 -
> arch/x86/kernel/fpu/core.c | 276 +++++++------
> arch/x86/kernel/fpu/init.c | 15
> arch/x86/kernel/fpu/regset.c | 220 ++++++-----
> arch/x86/kernel/fpu/signal.c | 423 +++++++++------------
> arch/x86/kernel/fpu/xstate.c | 693 ++++++++++++++---------------------
> arch/x86/kernel/process.c | 22 -
> arch/x86/kernel/process_64.c | 28 +
> arch/x86/kernel/traps.c | 5
> arch/x86/kvm/svm/sev.c | 1
> arch/x86/kvm/x86.c | 56 +-
> arch/x86/mm/extable.c | 2
> arch/x86/mm/fault.c | 2
> arch/x86/mm/pkeys.c | 22 -
> include/linux/pkeys.h | 4
> 23 files changed, 1060 insertions(+), 1181 deletions(-)
>
>

2021-06-22 20:05:28

by Chang S. Bae

[permalink] [raw]
Subject: Re: [patch V3 00/66] x86/fpu: Spring cleaning and PKRU sanitizing

On Jun 21, 2021, at 15:22, Bae, Chang Seok <[email protected]> wrote:
> I tried to apply AMX patches on top of this. The test looks to be okay by far.
> I will also give an update here if I find anything.

This looks to be vague about the test. I took cases shown in AMX v5 like this:
https://lore.kernel.org/lkml/[email protected]/
It validates AMX state with context switches, signal delivery/return, and
context injection via ptrace.

Thanks,
Chang

2021-06-24 00:09:07

by Chang S. Bae

[permalink] [raw]
Subject: Re: [patch V3 00/66] x86/fpu: Spring cleaning and PKRU sanitizing

On Jun 22, 2021, at 13:02, Bae, Chang Seok <[email protected]> wrote:
> On Jun 21, 2021, at 15:22, Bae, Chang Seok <[email protected]> wrote:
>> I tried to apply AMX patches on top of this. The test looks to be okay by far.
>> I will also give an update here if I find anything.
>
> This looks to be vague about the test. I took cases shown in AMX v5 like this:
> https://lore.kernel.org/lkml/[email protected]/
> It validates AMX state with context switches, signal delivery/return, and
> context injection via ptrace.

Also, just for the record, as the next AMX version is imminent:

Lots of AMX-related code are mostly function name changes and I had to
readjust my assumption. (Maybe this is not conclusive yet as still digesting
the code.) But I think the rebase went well with the test results, and this
change improves the mainline in many ways that affect adjusting AMX code in
the right way.

Thanks,
Chang