[Second part of the series - Gmail didn't like me sending so many mails.]
Over the past 10 years the x86 FPU has organically grown into
somewhat of a spaghetti monster that few (if any) kernel
developers understand, and whose code few people enjoy hacking.
Many people suggested over the years that it needs a major cleanup,
and some time ago I went "what the heck" and started doing it step
by step to see where it leads - it cannot be that hard!
Three weeks and 200+ patches later I think I have to admit that I
seriously underestimated the magnitude of the project! ;-)
This work-in-progress series is large, but I think it makes the
code maintainable and hackable again. It's pretty complete, as
per the 9 high level goals laid out further below. Individual
patches are all fine-grained, so they should be easy to review -
Boris Petkov already reviewed most of the patches, so they are
not entirely raw.
Individual patches have been tested heavily for bisectability:
they were both build and boot tested on a relatively wide range
of x86 hardware that I have access to. Nevertheless the changes
are pretty invasive, so I'd expect there to be test failures.
This is the only time I intend to post them to lkml in their entirety,
to not spam lkml too much. (Future additions will be posted as delta
series.)
I'd like to ask interested people to test this tree, and to comment
on the patches. The changes can be found in the following Git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git tmp.fpu
(The tree might be rebased, depending on feedback.)
Here are the main themes that motivated most of the changes:
1)
I collected all FPU code into arch/x86/kernel/fpu/*.c and split it
all up into the following, topically organized source code files:
-rw-rw-r-- 1 mingo mingo 1423 May 5 16:36 arch/x86/kernel/fpu/bugs.c
-rw-rw-r-- 1 mingo mingo 12206 May 5 16:36 arch/x86/kernel/fpu/core.c
-rw-rw-r-- 1 mingo mingo 7342 May 5 16:36 arch/x86/kernel/fpu/init.c
-rw-rw-r-- 1 mingo mingo 10909 May 5 16:36 arch/x86/kernel/fpu/measure.c
-rw-rw-r-- 1 mingo mingo 9012 May 5 16:36 arch/x86/kernel/fpu/regset.c
-rw-rw-r-- 1 mingo mingo 11188 May 5 16:36 arch/x86/kernel/fpu/signal.c
-rw-rw-r-- 1 mingo mingo 10140 May 5 16:36 arch/x86/kernel/fpu/xstate.c
Similarly, I've collected and split up all FPU-related header files
and organized them topically:
-rw-rw-r-- 1 mingo mingo 1690 May 5 16:35 arch/x86/include/asm/fpu/api.h
-rw-rw-r-- 1 mingo mingo 12937 May 5 16:36 arch/x86/include/asm/fpu/internal.h
-rw-rw-r-- 1 mingo mingo 278 May 5 16:36 arch/x86/include/asm/fpu/measure.h
-rw-rw-r-- 1 mingo mingo 596 May 5 16:35 arch/x86/include/asm/fpu/regset.h
-rw-rw-r-- 1 mingo mingo 1013 May 5 16:35 arch/x86/include/asm/fpu/signal.h
-rw-rw-r-- 1 mingo mingo 8137 May 5 16:36 arch/x86/include/asm/fpu/types.h
-rw-rw-r-- 1 mingo mingo 5691 May 5 16:36 arch/x86/include/asm/fpu/xstate.h
<fpu/api.h> is the only 'public' API left, used in various drivers.
I decoupled drivers and non-FPU x86 code from various FPU internals.
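For driver code the usage pattern remains the simple begin/end
bracket - a minimal sketch (irq_fpu_usable(), kernel_fpu_begin()
and kernel_fpu_end() are what <asm/fpu/api.h> exports;
do_simd_work() is a made-up example function):

  #include <asm/fpu/api.h>

  static void do_simd_work(void)
  {
          /* FPU/SIMD registers may only be touched inside this bracket: */
          if (irq_fpu_usable()) {
                  kernel_fpu_begin();
                  /* ... use MMX/SSE/AVX registers here ... */
                  kernel_fpu_end();
          }
  }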
2)
I renamed various internal data types, APIs and helpers, and
organized their support functions accordingly.
For example, all functions that deal with copying FPU registers
into and out of the FPU are now named consistently:
copy_fxregs_to_kernel() # was: fpu_fxsave()
copy_xregs_to_kernel() # was: xsave_state()
copy_kernel_to_fregs() # was: frstor_checking()
copy_kernel_to_fxregs() # was: fxrstor_checking()
copy_kernel_to_xregs() # was: fpu_xrstor_checking()
copy_kernel_to_xregs_booting() # was: xrstor_state_booting()
copy_fregs_to_user() # was: fsave_user()
copy_fxregs_to_user() # was: fxsave_user()
copy_xregs_to_user() # was: xsave_user()
copy_user_to_fregs() # was: frstor_user()
copy_user_to_fxregs() # was: fxrstor_user()
copy_user_to_xregs() # was: xrestore_user()
copy_user_to_fpregs_zeroing() # was: restore_user_xstate()
'xregs' stands for registers supported by XSAVE
'fxregs' stands for registers supported by FXSAVE
'fregs' stands for registers supported by FSAVE
'fpregs' stands for generic FPU registers.
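For illustration, here is a minimal sketch of what the 64-bit
FXSAVE-based save helper boils down to (assuming the embedded
fpstate of theme 4) below - the real fpu/internal.h version also
handles 32-bit kernels):

  static inline void copy_fxregs_to_kernel(struct fpu *fpu)
  {
          /* FXSAVE dumps the x87/MMX/SSE register state into memory: */
          asm volatile("fxsaveq %[fx]" : [fx] "=m" (fpu->state.fxsave));
  }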
Similarly, the high level FPU functions got reorganized as well:
extern void fpu__activate_curr(struct fpu *fpu);
extern void fpu__activate_stopped(struct fpu *fpu);
extern void fpu__save(struct fpu *fpu);
extern void fpu__restore(struct fpu *fpu);
extern int fpu__restore_sig(void __user *buf, int ia32_frame);
extern void fpu__drop(struct fpu *fpu);
extern int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu);
extern void fpu__clear(struct fpu *fpu);
extern int fpu__exception_code(struct fpu *fpu, int trap_nr);
Those functions that used to take a task_struct argument now take
the more limited 'struct fpu' argument, and their naming is consistent
and logical as well.
Likewise, the FP state data types are now consistently named:
struct fregs_state;
struct fxregs_state;
struct swregs_state;
struct xregs_state;
union fpregs_state;
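These slot together as the members of the summary union - a
sketch of the fpu/types.h layout (the union works because these
formats are mutually exclusive ways to store the same registers):

  union fpregs_state {
          struct fregs_state      fsave;
          struct fxregs_state     fxsave;
          struct swregs_state     soft;
          struct xregs_state      xsave;
  };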
3)
Various core data types got streamlined around four small fields in 'struct fpu':
fpu->fpstate_active # was: tsk->flags & PF_USED_MATH
fpu->fpregs_active # was: fpu->has_fpu
fpu->last_cpu
fpu->counter
which now fit into a single word.
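In sketch form (the exact field types are my reading of the
code, see fpu/types.h for the real thing):

  struct fpu {
          unsigned int            last_cpu;       /* CPU that last used these fpregs */
          unsigned char           fpstate_active; /* was: tsk->flags & PF_USED_MATH */
          unsigned char           fpregs_active;  /* was: fpu->has_fpu */
          unsigned char           counter;        /* lazy-restore eagerness heuristic */

          /* ... plus the register save area itself, see 4) below: */
          union fpregs_state      state;
  };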
4)
task->thread.fpu->state got embedded again, as task->thread.fpu.state. This
eliminated a lot of awkward late dynamic memory allocation of FPU state
and the problematic handling of failures.
Note that while the allocation is static right now, this is a WIP interim
state: we can still do dynamic allocation of FPU state, by moving the FPU
state last in task_struct and then allocating task_struct accordingly.
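A sketch of that future step (the 'arch_task_struct_size' name is
hypothetical here): with the FPU state as the last member of
task_struct, the boot code can size the allocation once
xstate_size is known:

  /* Hypothetical sketch - not part of this series: */
  arch_task_struct_size = offsetof(struct task_struct, thread.fpu.state) +
                          xstate_size;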
5)
The amazingly convoluted init dependencies got sorted out into two
cleanly separated families of initialization functions: the
fpu__init_system_*() functions and the fpu__init_cpu_*() functions.
This allowed the removal of various __init annotation hacks and
obscure boot time checks.
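In sketch form the call structure now looks like this (the exact
ordering is my reading of the series, not gospel):

  /* Once per bootup, on the boot CPU: */
  void fpu__init_system(struct cpuinfo_x86 *c)
  {
          fpu__init_system_early_generic(c);
          fpu__init_cpu();                /* the boot CPU's per-CPU init */
          fpu__init_system_generic();
          fpu__init_system_xstate();
          fpu__init_system_ctx_switch();
  }

  /* On every CPU, including during hotplug and resume: */
  void fpu__init_cpu(void)
  {
          fpu__init_cpu_generic();
          fpu__init_cpu_xstate();
          fpu__init_cpu_ctx_switch();
  }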
6)
Decoupled the FPU core from the xsave code. xsave.c and xsave.h got
shrunk quite a bit and now host only XSAVE/etc. related
functionality, not generic FPU handling functions.
7)
Added a ton of comments explaining how things work and why, hopefully
making this code accessible to everyone interested.
8)
Added FPU debugging code (CONFIG_X86_DEBUG_FPU=y) and an FPU hw
benchmarking subsystem (CONFIG_X86_DEBUG_FPU_MEASUREMENTS=y), which
performs boot time measurements like:
x86/fpu:##################################################################
x86/fpu: Running FPU performance measurement suite (cache hot):
x86/fpu: Cost of: null : 108 cycles
x86/fpu:######## CPU instructions: ############################
x86/fpu: Cost of: NOP insn : 0 cycles
x86/fpu: Cost of: RDTSC insn : 12 cycles
x86/fpu: Cost of: RDMSR insn : 100 cycles
x86/fpu: Cost of: WRMSR insn : 396 cycles
x86/fpu: Cost of: CLI insn same-IF : 0 cycles
x86/fpu: Cost of: CLI insn flip-IF : 0 cycles
x86/fpu: Cost of: STI insn same-IF : 0 cycles
x86/fpu: Cost of: STI insn flip-IF : 0 cycles
x86/fpu: Cost of: PUSHF insn : 0 cycles
x86/fpu: Cost of: POPF insn same-IF : 20 cycles
x86/fpu: Cost of: POPF insn flip-IF : 28 cycles
x86/fpu:######## IRQ save/restore APIs: ############################
x86/fpu: Cost of: local_irq_save() fn : 20 cycles
x86/fpu: Cost of: local_irq_restore() fn same-IF : 24 cycles
x86/fpu: Cost of: local_irq_restore() fn flip-IF : 28 cycles
x86/fpu: Cost of: irq_save()+restore() fn same-IF : 48 cycles
x86/fpu: Cost of: irq_save()+restore() fn flip-IF : 48 cycles
x86/fpu:######## locking APIs: ############################
x86/fpu: Cost of: smp_mb() fn : 40 cycles
x86/fpu: Cost of: cpu_relax() fn : 8 cycles
x86/fpu: Cost of: spin_lock()+unlock() fn : 64 cycles
x86/fpu: Cost of: read_lock()+unlock() fn : 76 cycles
x86/fpu: Cost of: write_lock()+unlock() fn : 52 cycles
x86/fpu: Cost of: rcu_read_lock()+unlock() fn : 16 cycles
x86/fpu: Cost of: preempt_disable()+enable() fn : 20 cycles
x86/fpu: Cost of: mutex_lock()+unlock() fn : 56 cycles
x86/fpu:######## MM instructions: ############################
x86/fpu: Cost of: __flush_tlb() fn : 132 cycles
x86/fpu: Cost of: __flush_tlb_global() fn : 920 cycles
x86/fpu: Cost of: __flush_tlb_one() fn : 288 cycles
x86/fpu: Cost of: __flush_tlb_range() fn : 412 cycles
x86/fpu:######## FPU instructions: ############################
x86/fpu: Cost of: CR0 read : 4 cycles
x86/fpu: Cost of: CR0 write : 208 cycles
x86/fpu: Cost of: CR0::TS fault : 1156 cycles
x86/fpu: Cost of: FNINIT insn : 76 cycles
x86/fpu: Cost of: FWAIT insn : 0 cycles
x86/fpu: Cost of: FSAVE insn : 168 cycles
x86/fpu: Cost of: FRSTOR insn : 160 cycles
x86/fpu: Cost of: FXSAVE insn : 84 cycles
x86/fpu: Cost of: FXRSTOR insn : 44 cycles
x86/fpu: Cost of: FXRSTOR fault : 688 cycles
x86/fpu: Cost of: XSAVE insn : 104 cycles
x86/fpu: Cost of: XRSTOR insn : 80 cycles
x86/fpu: Cost of: XRSTOR fault : 884 cycles
x86/fpu:##################################################################
Based on such measurements we'll be able to do performance tuning,
set default policies and do optimizations in a more informed fashion,
as the speed of various x86 hardware varies a lot.
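The measurement technique itself is simple - a minimal sketch of
the idea (helper names are made up; the real fpu/measure.c runs
many iterations, serializes more carefully and subtracts the
'null' baseline cost shown above):

  /* Serialize with CPUID, then read the time-stamp counter: */
  static inline u64 read_tsc_serialized(void)
  {
          u32 lo, hi;

          asm volatile("cpuid\n\t"
                       "rdtsc"
                       : "=a" (lo), "=d" (hi)
                       : "a" (0)
                       : "rbx", "rcx", "memory");
          return ((u64)hi << 32) | lo;
  }

  static u64 fpu_measure_cost(void (*fn)(void), unsigned int loops)
  {
          u64 t0, t1;
          unsigned int i;

          t0 = read_tsc_serialized();
          for (i = 0; i < loops; i++)
                  fn();
          t1 = read_tsc_serialized();

          return (t1 - t0) / loops;
  }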
9)
Reworked many ancient inlining and uninlining decisions based on
modern principles.
Any feedback is welcome!
Thanks,
Ingo
=====
Ingo Molnar (208):
x86/fpu: Rename unlazy_fpu() to fpu__save()
x86/fpu: Add comments to fpu__save() and restrict its export
x86/fpu: Add debugging check to fpu__save()
x86/fpu: Rename fpu_detect() to fpu__detect()
x86/fpu: Remove stale init_fpu() prototype
x86/fpu: Split an fpstate_alloc_init() function out of init_fpu()
x86/fpu: Make init_fpu() static
x86/fpu: Rename init_fpu() to fpu__unlazy_stopped() and add debugging check
x86/fpu: Optimize fpu__unlazy_stopped()
x86/fpu: Simplify fpu__unlazy_stopped()
x86/fpu: Remove fpu_allocated()
x86/fpu: Move fpu_alloc() out of line
x86/fpu: Rename fpu_alloc() to fpstate_alloc()
x86/fpu: Rename fpu_free() to fpstate_free()
x86/fpu: Rename fpu_finit() to fpstate_init()
x86/fpu: Rename fpu_init() to fpu__cpu_init()
x86/fpu: Rename init_thread_xstate() to fpstate_xstate_init_size()
x86/fpu: Move thread_info::fpu_counter into thread_info::fpu.counter
x86/fpu: Improve the comment for the fpu::counter field
x86/fpu: Move FPU data structures to asm/fpu_types.h
x86/fpu: Clean up asm/fpu/types.h
x86/fpu: Move i387.c and xsave.c to arch/x86/kernel/fpu/
x86/fpu: Fix header file dependencies of fpu-internal.h
x86/fpu: Split out the boot time FPU init code into fpu/init.c
x86/fpu: Remove unnecessary includes from core.c
x86/fpu: Move the no_387 handling and FPU detection code into init.c
x86/fpu: Remove the free_thread_xstate() complication
x86/fpu: Factor out fpu__flush_thread() from flush_thread()
x86/fpu: Move math_state_restore() to fpu/core.c
x86/fpu: Rename math_state_restore() to fpu__restore()
x86/fpu: Factor out the FPU bug detection code into fpu__init_check_bugs()
x86/fpu: Simplify the xsave_state*() methods
x86/fpu: Remove fpu_xsave()
x86/fpu: Move task_xstate_cachep handling to core.c
x86/fpu: Factor out fpu__copy()
x86/fpu: Uninline fpstate_free() and move it next to the allocation function
x86/fpu: Make task_xstate_cachep static
x86/fpu: Make kernel_fpu_disable/enable() static
x86/fpu: Add debug check to kernel_fpu_disable()
x86/fpu: Add kernel_fpu_disabled()
x86/fpu: Remove __save_init_fpu()
x86/fpu: Move fpu_copy() to fpu/core.c
x86/fpu: Add debugging check to fpu_copy()
x86/fpu: Print out whether we are doing lazy/eager FPU context switches
x86/fpu: Eliminate the __thread_has_fpu() wrapper
x86/fpu: Change __thread_clear_has_fpu() to 'struct fpu' parameter
x86/fpu: Move 'PER_CPU(fpu_owner_task)' to fpu/core.c
x86/fpu: Change fpu_owner_task to fpu_fpregs_owner_ctx
x86/fpu: Remove 'struct task_struct' usage from __thread_set_has_fpu()
x86/fpu: Remove 'struct task_struct' usage from __thread_fpu_end()
x86/fpu: Remove 'struct task_struct' usage from __thread_fpu_begin()
x86/fpu: Open code PF_USED_MATH usages
x86/fpu: Document fpu__unlazy_stopped()
x86/fpu: Get rid of PF_USED_MATH usage, convert it to fpu->fpstate_active
x86/fpu: Remove 'struct task_struct' usage from drop_fpu()
x86/fpu: Remove task_disable_lazy_fpu_restore()
x86/fpu: Use 'struct fpu' in fpu_lazy_restore()
x86/fpu: Use 'struct fpu' in restore_fpu_checking()
x86/fpu: Use 'struct fpu' in fpu_reset_state()
x86/fpu: Use 'struct fpu' in switch_fpu_prepare()
x86/fpu: Use 'struct fpu' in switch_fpu_finish()
x86/fpu: Move __save_fpu() into fpu/core.c
x86/fpu: Use 'struct fpu' in __fpu_save()
x86/fpu: Use 'struct fpu' in fpu__save()
x86/fpu: Use 'struct fpu' in fpu_copy()
x86/fpu: Use 'struct fpu' in fpu__copy()
x86/fpu: Use 'struct fpu' in fpstate_alloc_init()
x86/fpu: Use 'struct fpu' in fpu__unlazy_stopped()
x86/fpu: Rename fpu__flush_thread() to fpu__clear()
x86/fpu: Clean up fpu__clear() a bit
x86/fpu: Rename i387.h to fpu/api.h
x86/fpu: Move xsave.h to fpu/xsave.h
x86/fpu: Rename fpu-internal.h to fpu/internal.h
x86/fpu: Move MXCSR_DEFAULT to fpu/internal.h
x86/fpu: Remove xsave_init() __init obfuscation
x86/fpu: Remove assembly guard from asm/fpu/api.h
x86/fpu: Improve FPU detection kernel messages
x86/fpu: Print supported xstate features in human readable way
x86/fpu: Rename 'pcntxt_mask' to 'xfeatures_mask'
x86/fpu: Rename 'xstate_features' to 'xfeatures_nr'
x86/fpu: Move XCR0 manipulation to the FPU code proper
x86/fpu: Clean up regset functions
x86/fpu: Rename 'xsave_hdr' to 'header'
x86/fpu: Rename xsave.header::xstate_bv to 'xfeatures'
x86/fpu: Clean up and fix MXCSR handling
x86/fpu: Rename regset FPU register accessors
x86/fpu: Explain the AVX register layout in the xsave area
x86/fpu: Improve the __sanitize_i387_state() documentation
x86/fpu: Rename fpu->has_fpu to fpu->fpregs_active
x86/fpu: Rename __thread_set_has_fpu() to __fpregs_activate()
x86/fpu: Rename __thread_clear_has_fpu() to __fpregs_deactivate()
x86/fpu: Rename __thread_fpu_begin() to fpregs_activate()
x86/fpu: Rename __thread_fpu_end() to fpregs_deactivate()
x86/fpu: Remove fpstate_xstate_init_size() boot quirk
x86/fpu: Remove xsave_init() bootmem allocations
x86/fpu: Make setup_init_fpu_buf() run-once explicitly
x86/fpu: Remove 'init_xstate_buf' bootmem allocation
x86/fpu: Split fpu__cpu_init() into early-boot and cpu-boot parts
x86/fpu: Make the system/cpu init distinction clear in the xstate code as well
x86/fpu: Move CPU capability check into fpu__init_cpu_xstate()
x86/fpu: Move legacy check to fpu__init_system_xstate()
x86/fpu: Propagate once per boot quirk into fpu__init_system_xstate()
x86/fpu: Remove xsave_init()
x86/fpu: Do fpu__init_system_xstate only from fpu__init_system()
x86/fpu: Set up the legacy FPU init image from fpu__init_system()
x86/fpu: Remove setup_init_fpu_buf() call from eager_fpu_init()
x86/fpu: Move all eager-fpu setup code to eager_fpu_init()
x86/fpu: Move eager_fpu_init() to fpu/init.c
x86/fpu: Clean up eager_fpu_init() and rename it to fpu__ctx_switch_init()
x86/fpu: Split fpu__ctx_switch_init() into _cpu() and _system() portions
x86/fpu: Do CLTS in fpu__init_system()
x86/fpu: Move the fpstate_xstate_init_size() call into fpu__init_system()
x86/fpu: Call fpu__init_cpu_ctx_switch() from fpu__init_cpu()
x86/fpu: Do system-wide setup from fpu__detect()
x86/fpu: Remove fpu__init_cpu_ctx_switch() call from fpu__init_system()
x86/fpu: Simplify fpu__cpu_init()
x86/fpu: Factor out fpu__init_cpu_generic()
x86/fpu: Factor out fpu__init_system_generic()
x86/fpu: Factor out fpu__init_system_early_generic()
x86/fpu: Move !FPU check into fpu__init_system_early_generic()
x86/fpu: Factor out FPU bug checks into fpu/bugs.c
x86/fpu: Make check_fpu() init ordering independent
x86/fpu: Move fpu__init_system_early_generic() out of fpu__detect()
x86/fpu: Remove the extra fpu__detect() layer
x86/fpu: Rename fpstate_xstate_init_size() to fpu__init_system_xstate_size_legacy()
x86/fpu: Reorder init methods
x86/fpu: Add more comments to the FPU init code
x86/fpu: Move fpu__save() to fpu/internals.h
x86/fpu: Uninline kernel_fpu_begin()/end()
x86/fpu: Move various internal function prototypes to fpu/internal.h
x86/fpu: Uninline the irq_ts_save()/restore() functions
x86/fpu: Rename fpu_save_init() to copy_fpregs_to_fpstate()
x86/fpu: Optimize copy_fpregs_to_fpstate() by removing the FNCLEX synchronization with FP exceptions
x86/fpu: Simplify FPU handling by embedding the fpstate in task_struct (again)
x86/fpu: Remove failure paths from fpstate-alloc low level functions
x86/fpu: Remove failure return from fpstate_alloc_init()
x86/fpu: Rename fpstate_alloc_init() to fpstate_init_curr()
x86/fpu: Simplify fpu__unlazy_stopped() error handling
x86/fpu, kvm: Simplify fx_init()
x86/fpu: Simplify fpstate_init_curr() usage
x86/fpu: Rename fpu__unlazy_stopped() to fpu__activate_stopped()
x86/fpu: Factor out FPU hw activation/deactivation
x86/fpu: Simplify __save_fpu()
x86/fpu: Eliminate __save_fpu()
x86/fpu: Simplify fpu__save()
x86/fpu: Optimize fpu__save()
x86/fpu: Optimize fpu_copy()
x86/fpu: Optimize fpu_copy() some more on lazy switching systems
x86/fpu: Rename fpu/xsave.h to fpu/xstate.h
x86/fpu: Rename fpu/xsave.c to fpu/xstate.c
x86/fpu: Introduce cpu_has_xfeatures(xfeatures_mask, feature_name)
x86/fpu: Simplify print_xstate_features()
x86/fpu: Enumerate xfeature bits
x86/fpu: Move xfeature type enumeration to fpu/types.h
x86/fpu, crypto x86/camellia_aesni_avx: Simplify the camellia_aesni_init() xfeature checks
x86/fpu, crypto x86/sha256_ssse3: Simplify the sha256_ssse3_mod_init() xfeature checks
x86/fpu, crypto x86/camellia_aesni_avx2: Simplify the camellia_aesni_init() xfeature checks
x86/fpu, crypto x86/twofish_avx: Simplify the twofish_init() xfeature checks
x86/fpu, crypto x86/serpent_avx: Simplify the serpent_init() xfeature checks
x86/fpu, crypto x86/cast5_avx: Simplify the cast5_init() xfeature checks
x86/fpu, crypto x86/sha512_ssse3: Simplify the sha512_ssse3_mod_init() xfeature checks
x86/fpu, crypto x86/cast6_avx: Simplify the cast6_init() xfeature checks
x86/fpu, crypto x86/sha1_ssse3: Simplify the sha1_ssse3_mod_init() xfeature checks
x86/fpu, crypto x86/serpent_avx2: Simplify the init() xfeature checks
x86/fpu, crypto x86/sha1_mb: Remove FPU internal headers from sha1_mb.c
x86/fpu: Move asm/xcr.h to asm/fpu/internal.h
x86/fpu: Rename sanitize_i387_state() to fpstate_sanitize_xstate()
x86/fpu: Simplify fpstate_sanitize_xstate() calls
x86/fpu: Pass 'struct fpu' to fpstate_sanitize_xstate()
x86/fpu: Rename save_xstate_sig() to copy_fpstate_to_sigframe()
x86/fpu: Rename save_user_xstate() to copy_fpregs_to_sigframe()
x86/fpu: Clarify ancient comments in fpu__restore()
x86/fpu: Rename user_has_fpu() to fpregs_active()
x86/fpu: Initialize fpregs in fpu__init_cpu_generic()
x86/fpu: Clean up fpu__clear() state handling
x86/alternatives, x86/fpu: Add 'alternatives_patched' debug flag and use it in xsave_state()
x86/fpu: Synchronize the naming of drop_fpu() and fpu_reset_state()
x86/fpu: Rename restore_fpu_checking() to copy_fpstate_to_fpregs()
x86/fpu: Move all the fpu__*() high level methods closer to each other
x86/fpu: Move fpu__clear() to 'struct fpu *' parameter passing
x86/fpu: Rename restore_xstate_sig() to fpu__restore_sig()
x86/fpu: Move the signal frame handling code closer to each other
x86/fpu: Merge fpu__reset() and fpu__clear()
x86/fpu: Move is_ia32*frame() helpers out of fpu/internal.h
x86/fpu: Split out fpu/signal.h from fpu/internal.h for signal frame handling functions
x86/fpu: Factor out fpu/regset.h from fpu/internal.h
x86/fpu: Remove run-once init quirks
x86/fpu: Factor out the exception error code handling code
x86/fpu: Harmonize the names of the fpstate_init() helper functions
x86/fpu: Create 'union thread_xstate' helper for fpstate_init()
x86/fpu: Generalize 'init_xstate_ctx'
x86/fpu: Move restore_init_xstate() out of fpu/internal.h
x86/fpu: Rename all the fpregs, xregs, fxregs and fregs handling functions
x86/fpu: Factor out fpu/signal.c
x86/fpu: Factor out the FPU regset code into fpu/regset.c
x86/fpu: Harmonize FPU register state types
x86/fpu: Change fpu->fpregs_active from 'int' to 'char', add lazy switching comments
x86/fpu: Document the various fpregs state formats
x86/fpu: Move debugging check from kernel_fpu_begin() to __kernel_fpu_begin()
x86/fpu/xstate: Don't assume the first zero xfeatures zero bit means the end
x86/fpu: Clean up xstate feature reservation
x86/fpu/xstate: Clean up setup_xstate_comp() call
x86/fpu/init: Propagate __init annotations
x86/fpu: Pass 'struct fpu' to fpu__restore()
x86/fpu: Fix the 'nofxsr' boot parameter to also clear X86_FEATURE_FXSR_OPT
x86/fpu: Add CONFIG_X86_DEBUG_FPU=y FPU debugging code
x86/fpu: Add FPU performance measurement subsystem
x86/fpu: Reorganize fpu/internal.h
Documentation/preempt-locking.txt | 2 +-
arch/x86/Kconfig.debug | 27 ++
arch/x86/crypto/aesni-intel_glue.c | 2 +-
arch/x86/crypto/camellia_aesni_avx2_glue.c | 15 +-
arch/x86/crypto/camellia_aesni_avx_glue.c | 15 +-
arch/x86/crypto/cast5_avx_glue.c | 15 +-
arch/x86/crypto/cast6_avx_glue.c | 15 +-
arch/x86/crypto/crc32-pclmul_glue.c | 2 +-
arch/x86/crypto/crc32c-intel_glue.c | 3 +-
arch/x86/crypto/crct10dif-pclmul_glue.c | 2 +-
arch/x86/crypto/fpu.c | 2 +-
arch/x86/crypto/ghash-clmulni-intel_glue.c | 2 +-
arch/x86/crypto/serpent_avx2_glue.c | 15 +-
arch/x86/crypto/serpent_avx_glue.c | 15 +-
arch/x86/crypto/sha-mb/sha1_mb.c | 5 +-
arch/x86/crypto/sha1_ssse3_glue.c | 16 +-
arch/x86/crypto/sha256_ssse3_glue.c | 16 +-
arch/x86/crypto/sha512_ssse3_glue.c | 16 +-
arch/x86/crypto/twofish_avx_glue.c | 16 +-
arch/x86/ia32/ia32_signal.c | 13 +-
arch/x86/include/asm/alternative.h | 6 +
arch/x86/include/asm/crypto/glue_helper.h | 2 +-
arch/x86/include/asm/efi.h | 2 +-
arch/x86/include/asm/fpu-internal.h | 626 ---------------------------------------
arch/x86/include/asm/fpu/api.h | 48 +++
arch/x86/include/asm/fpu/internal.h | 488 ++++++++++++++++++++++++++++++
arch/x86/include/asm/fpu/measure.h | 13 +
arch/x86/include/asm/fpu/regset.h | 21 ++
arch/x86/include/asm/fpu/signal.h | 33 +++
arch/x86/include/asm/fpu/types.h | 293 ++++++++++++++++++
arch/x86/include/asm/{xsave.h => fpu/xstate.h} | 60 ++--
arch/x86/include/asm/i387.h | 108 -------
arch/x86/include/asm/kvm_host.h | 2 -
arch/x86/include/asm/mpx.h | 8 +-
arch/x86/include/asm/processor.h | 141 +--------
arch/x86/include/asm/simd.h | 2 +-
arch/x86/include/asm/stackprotector.h | 2 +
arch/x86/include/asm/suspend_32.h | 2 +-
arch/x86/include/asm/suspend_64.h | 2 +-
arch/x86/include/asm/user.h | 12 +-
arch/x86/include/asm/xcr.h | 49 ---
arch/x86/include/asm/xor.h | 2 +-
arch/x86/include/asm/xor_32.h | 2 +-
arch/x86/include/asm/xor_avx.h | 2 +-
arch/x86/include/uapi/asm/sigcontext.h | 8 +-
arch/x86/kernel/Makefile | 2 +-
arch/x86/kernel/alternative.c | 5 +
arch/x86/kernel/cpu/bugs.c | 57 +---
arch/x86/kernel/cpu/bugs_64.c | 2 +
arch/x86/kernel/cpu/common.c | 29 +-
arch/x86/kernel/fpu/Makefile | 11 +
arch/x86/kernel/fpu/bugs.c | 71 +++++
arch/x86/kernel/fpu/core.c | 509 +++++++++++++++++++++++++++++++
arch/x86/kernel/fpu/init.c | 288 ++++++++++++++++++
arch/x86/kernel/fpu/measure.c | 509 +++++++++++++++++++++++++++++++
arch/x86/kernel/fpu/regset.c | 356 ++++++++++++++++++++++
arch/x86/kernel/fpu/signal.c | 404 +++++++++++++++++++++++++
arch/x86/kernel/fpu/xstate.c | 406 +++++++++++++++++++++++++
arch/x86/kernel/i387.c | 656 ----------------------------------------
arch/x86/kernel/process.c | 52 +---
arch/x86/kernel/process_32.c | 15 +-
arch/x86/kernel/process_64.c | 13 +-
arch/x86/kernel/ptrace.c | 12 +-
arch/x86/kernel/signal.c | 38 ++-
arch/x86/kernel/smpboot.c | 3 +-
arch/x86/kernel/traps.c | 120 ++------
arch/x86/kernel/xsave.c | 724 ---------------------------------------------
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/vmx.c | 5 +-
arch/x86/kvm/x86.c | 68 ++---
arch/x86/lguest/boot.c | 2 +-
arch/x86/lib/mmx_32.c | 2 +-
arch/x86/math-emu/fpu_aux.c | 4 +-
arch/x86/math-emu/fpu_entry.c | 20 +-
arch/x86/math-emu/fpu_system.h | 2 +-
arch/x86/mm/mpx.c | 15 +-
arch/x86/power/cpu.c | 11 +-
arch/x86/xen/enlighten.c | 2 +-
drivers/char/hw_random/via-rng.c | 2 +-
drivers/crypto/padlock-aes.c | 2 +-
drivers/crypto/padlock-sha.c | 2 +-
drivers/lguest/x86/core.c | 12 +-
lib/raid6/x86.h | 2 +-
83 files changed, 3742 insertions(+), 2841 deletions(-)
delete mode 100644 arch/x86/include/asm/fpu-internal.h
create mode 100644 arch/x86/include/asm/fpu/api.h
create mode 100644 arch/x86/include/asm/fpu/internal.h
create mode 100644 arch/x86/include/asm/fpu/measure.h
create mode 100644 arch/x86/include/asm/fpu/regset.h
create mode 100644 arch/x86/include/asm/fpu/signal.h
create mode 100644 arch/x86/include/asm/fpu/types.h
rename arch/x86/include/asm/{xsave.h => fpu/xstate.h} (77%)
delete mode 100644 arch/x86/include/asm/i387.h
delete mode 100644 arch/x86/include/asm/xcr.h
create mode 100644 arch/x86/kernel/fpu/Makefile
create mode 100644 arch/x86/kernel/fpu/bugs.c
create mode 100644 arch/x86/kernel/fpu/core.c
create mode 100644 arch/x86/kernel/fpu/init.c
create mode 100644 arch/x86/kernel/fpu/measure.c
create mode 100644 arch/x86/kernel/fpu/regset.c
create mode 100644 arch/x86/kernel/fpu/signal.c
create mode 100644 arch/x86/kernel/fpu/xstate.c
delete mode 100644 arch/x86/kernel/i387.c
delete mode 100644 arch/x86/kernel/xsave.c
--
2.1.0
The name 'xstate_features' does not tell us whether it's a bitmap
or any other value. That it's a count of features is only obvious
if you read the code that calculates it.
Rename it to the more descriptive 'xfeatures_nr' name.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/xsave.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index c0e95538d689..1d0e27128f18 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -25,7 +25,9 @@ struct xsave_struct *init_xstate_buf;
static struct _fpx_sw_bytes fx_sw_reserved, fx_sw_reserved_ia32;
static unsigned int *xstate_offsets, *xstate_sizes;
static unsigned int xstate_comp_offsets[sizeof(xfeatures_mask)*8];
-static unsigned int xstate_features;
+
+/* The number of supported xfeatures in xfeatures_mask: */
+static unsigned int xfeatures_nr;
/*
* If a processor implementation discern that a processor state component is
@@ -465,9 +467,9 @@ static void __init setup_xstate_features(void)
{
int eax, ebx, ecx, edx, leaf = 0x2;
- xstate_features = fls64(xfeatures_mask);
- xstate_offsets = alloc_bootmem(xstate_features * sizeof(int));
- xstate_sizes = alloc_bootmem(xstate_features * sizeof(int));
+ xfeatures_nr = fls64(xfeatures_mask);
+ xstate_offsets = alloc_bootmem(xfeatures_nr * sizeof(int));
+ xstate_sizes = alloc_bootmem(xfeatures_nr * sizeof(int));
do {
cpuid_count(XSTATE_CPUID, leaf, &eax, &ebx, &ecx, &edx);
@@ -528,7 +530,7 @@ void setup_xstate_comp(void)
xstate_comp_offsets[1] = offsetof(struct i387_fxsave_struct, xmm_space);
if (!cpu_has_xsaves) {
- for (i = 2; i < xstate_features; i++) {
+ for (i = 2; i < xfeatures_nr; i++) {
if (test_bit(i, (unsigned long *)&xfeatures_mask)) {
xstate_comp_offsets[i] = xstate_offsets[i];
xstate_comp_sizes[i] = xstate_sizes[i];
@@ -539,7 +541,7 @@ void setup_xstate_comp(void)
xstate_comp_offsets[2] = FXSAVE_SIZE + XSAVE_HDR_SIZE;
- for (i = 2; i < xstate_features; i++) {
+ for (i = 2; i < xfeatures_nr; i++) {
if (test_bit(i, (unsigned long *)&xfeatures_mask))
xstate_comp_sizes[i] = xstate_sizes[i];
else
--
2.1.0
The suspend code accesses FPU state internals; add a helper for
it and isolate it.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/api.h | 1 +
arch/x86/kernel/fpu/xsave.c | 12 ++++++++++++
arch/x86/power/cpu.c | 10 ++--------
3 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index f1eddcccba16..5bdde8ca87bc 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -23,6 +23,7 @@ extern void fpu__clear(struct task_struct *tsk);
extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);
extern void fpu__restore(void);
extern void fpu__init_check_bugs(void);
+extern void fpu__resume_cpu(void);
extern bool irq_fpu_usable(void);
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index 1d0e27128f18..a485180ebc32 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -735,6 +735,18 @@ void __init_refok eager_fpu_init(void)
}
/*
+ * Restore minimal FPU state after suspend:
+ */
+void fpu__resume_cpu(void)
+{
+ /*
+ * Restore XCR0 on xsave capable CPUs:
+ */
+ if (cpu_has_xsave)
+ xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures_mask);
+}
+
+/*
* Given the xsave area and a state inside, this function returns the
* address of the state.
*
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 62054acbd0d8..ad0ce6b70fac 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -18,10 +18,8 @@
#include <asm/mtrr.h>
#include <asm/page.h>
#include <asm/mce.h>
-#include <asm/xcr.h>
#include <asm/suspend.h>
#include <asm/debugreg.h>
-#include <asm/fpu/internal.h> /* xfeatures_mask */
#include <asm/cpu.h>
#ifdef CONFIG_X86_32
@@ -155,6 +153,8 @@ static void fix_processor_context(void)
#endif
load_TR_desc(); /* This does ltr */
load_LDT(&current->active_mm->context); /* This does lldt */
+
+ fpu__resume_cpu();
}
/**
@@ -221,12 +221,6 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
wrmsrl(MSR_KERNEL_GS_BASE, ctxt->gs_kernel_base);
#endif
- /*
- * restore XCR0 for xsave capable cpu's.
- */
- if (cpu_has_xsave)
- xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures_mask);
-
fix_processor_context();
do_fpu_end();
--
2.1.0
Clean up various regset handlers: use the 'fpu' pointer which
is available in most cases.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/core.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 7b98da7e1b55..54070d817960 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -438,7 +438,7 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
sanitize_i387_state(target);
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpu.state->fxsave, 0, -1);
+ &fpu->state->fxsave, 0, -1);
}
int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
@@ -458,19 +458,19 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
sanitize_i387_state(target);
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpu.state->fxsave, 0, -1);
+ &fpu->state->fxsave, 0, -1);
/*
* mxcsr reserved bits must be masked to zero for security reasons.
*/
- target->thread.fpu.state->fxsave.mxcsr &= mxcsr_feature_mask;
+ fpu->state->fxsave.mxcsr &= mxcsr_feature_mask;
/*
* update the header bits in the xsave header, indicating the
* presence of FP and SSE state.
*/
if (cpu_has_xsave)
- target->thread.fpu.state->xsave.xsave_hdr.xstate_bv |= XSTATE_FPSSE;
+ fpu->state->xsave.xsave_hdr.xstate_bv |= XSTATE_FPSSE;
return ret;
}
@@ -490,7 +490,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;
- xsave = &target->thread.fpu.state->xsave;
+ xsave = &fpu->state->xsave;
/*
* Copy the 48bytes defined by the software first into the xstate
@@ -521,7 +521,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;
- xsave = &target->thread.fpu.state->xsave;
+ xsave = &fpu->state->xsave;
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, xsave, 0, -1);
/*
@@ -533,6 +533,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
* These bits must be zero.
*/
memset(&xsave->xsave_hdr.reserved, 0, 48);
+
return ret;
}
@@ -690,7 +691,7 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_fxsr)
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpu.state->fsave, 0,
+ &fpu->state->fsave, 0,
-1);
sanitize_i387_state(target);
@@ -724,7 +725,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_fxsr)
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
- &target->thread.fpu.state->fsave, 0,
+ &fpu->state->fsave, 0,
-1);
if (pos > 0 || count < sizeof(env))
@@ -739,7 +740,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
* presence of FP.
*/
if (cpu_has_xsave)
- target->thread.fpu.state->xsave.xsave_hdr.xstate_bv |= XSTATE_FP;
+ fpu->state->xsave.xsave_hdr.xstate_bv |= XSTATE_FP;
return ret;
}
--
2.1.0
Code like:
fpu->state->xsave.xsave_hdr.xstate_bv |= XSTATE_FP;
is an eyesore: not only are the words 'xsave' and 'state'
repeated twice (!), but the 'hdr' and 'bv' abbreviations are
also pretty meaningless at first glance.
Start cleaning this up by renaming 'xsave_hdr' to 'header'.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 2 +-
arch/x86/include/asm/fpu/types.h | 4 ++--
arch/x86/include/asm/fpu/xsave.h | 2 +-
arch/x86/include/asm/user.h | 10 +++++-----
arch/x86/include/uapi/asm/sigcontext.h | 4 ++--
arch/x86/kernel/fpu/core.c | 8 ++++----
arch/x86/kernel/fpu/xsave.c | 22 +++++++++++-----------
arch/x86/kvm/x86.c | 8 ++++----
8 files changed, 30 insertions(+), 30 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 0e9a7a37801a..3007df99833e 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -261,7 +261,7 @@ static inline int fpu_save_init(struct fpu *fpu)
/*
* xsave header may indicate the init state of the FP.
*/
- if (!(fpu->state->xsave.xsave_hdr.xstate_bv & XSTATE_FP))
+ if (!(fpu->state->xsave.header.xstate_bv & XSTATE_FP))
return 1;
} else if (use_fxsr()) {
fpu_fxsave(fpu);
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 917d2e56426a..33c0c7b782db 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -99,7 +99,7 @@ struct bndcsr {
u64 bndstatus;
} __packed;
-struct xsave_hdr_struct {
+struct xstate_header {
u64 xstate_bv;
u64 xcomp_bv;
u64 reserved[6];
@@ -107,7 +107,7 @@ struct xsave_hdr_struct {
struct xsave_struct {
struct i387_fxsave_struct i387;
- struct xsave_hdr_struct xsave_hdr;
+ struct xstate_header header;
struct ymmh_struct ymmh;
struct lwp_struct lwp;
struct bndreg bndreg[4];
diff --git a/arch/x86/include/asm/fpu/xsave.h b/arch/x86/include/asm/fpu/xsave.h
index 400d5b2e42eb..b27b4466f88d 100644
--- a/arch/x86/include/asm/fpu/xsave.h
+++ b/arch/x86/include/asm/fpu/xsave.h
@@ -212,7 +212,7 @@ static inline int xsave_user(struct xsave_struct __user *buf)
* Clear the xsave header first, so that reserved fields are
* initialized to zero.
*/
- err = __clear_user(&buf->xsave_hdr, sizeof(buf->xsave_hdr));
+ err = __clear_user(&buf->header, sizeof(buf->header));
if (unlikely(err))
return -EFAULT;
diff --git a/arch/x86/include/asm/user.h b/arch/x86/include/asm/user.h
index ccab4af1646d..fa042410c42c 100644
--- a/arch/x86/include/asm/user.h
+++ b/arch/x86/include/asm/user.h
@@ -14,7 +14,7 @@ struct user_ymmh_regs {
__u32 ymmh_space[64];
};
-struct user_xsave_hdr {
+struct user_xstate_header {
__u64 xstate_bv;
__u64 reserved1[2];
__u64 reserved2[5];
@@ -41,11 +41,11 @@ struct user_xsave_hdr {
* particular process/thread.
*
* Also when the user modifies certain state FP/SSE/etc through the
- * ptrace interface, they must ensure that the xsave_hdr.xstate_bv
+ * ptrace interface, they must ensure that the header.xstate_bv
* bytes[512..519] of the memory layout are updated correspondingly.
* i.e., for example when FP state is modified to a non-init state,
- * xsave_hdr.xstate_bv's bit 0 must be set to '1', when SSE is modified to
- * non-init state, xsave_hdr.xstate_bv's bit 1 must to be set to '1', etc.
+ * header.xstate_bv's bit 0 must be set to '1', when SSE is modified to
+ * non-init state, header.xstate_bv's bit 1 must to be set to '1', etc.
*/
#define USER_XSTATE_FX_SW_WORDS 6
#define USER_XSTATE_XCR0_WORD 0
@@ -55,7 +55,7 @@ struct user_xstateregs {
__u64 fpx_space[58];
__u64 xstate_fx_sw[USER_XSTATE_FX_SW_WORDS];
} i387;
- struct user_xsave_hdr xsave_hdr;
+ struct user_xstate_header header;
struct user_ymmh_regs ymmh;
/* further processor state extensions go here */
};
diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
index 16dc4e8a2cd3..7f850f7b5c45 100644
--- a/arch/x86/include/uapi/asm/sigcontext.h
+++ b/arch/x86/include/uapi/asm/sigcontext.h
@@ -209,7 +209,7 @@ struct sigcontext {
#endif /* !__i386__ */
-struct _xsave_hdr {
+struct _header {
__u64 xstate_bv;
__u64 reserved1[2];
__u64 reserved2[5];
@@ -228,7 +228,7 @@ struct _ymmh_state {
*/
struct _xstate {
struct _fpstate fpstate;
- struct _xsave_hdr xstate_hdr;
+ struct _header xstate_hdr;
struct _ymmh_state ymmh;
/* new processor state extensions go here */
};
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 54070d817960..eabf4380366a 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -470,7 +470,7 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
* presence of FP and SSE state.
*/
if (cpu_has_xsave)
- fpu->state->xsave.xsave_hdr.xstate_bv |= XSTATE_FPSSE;
+ fpu->state->xsave.header.xstate_bv |= XSTATE_FPSSE;
return ret;
}
@@ -528,11 +528,11 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
* mxcsr reserved bits must be masked to zero for security reasons.
*/
xsave->i387.mxcsr &= mxcsr_feature_mask;
- xsave->xsave_hdr.xstate_bv &= xfeatures_mask;
+ xsave->header.xstate_bv &= xfeatures_mask;
/*
* These bits must be zero.
*/
- memset(&xsave->xsave_hdr.reserved, 0, 48);
+ memset(&xsave->header.reserved, 0, 48);
return ret;
}
@@ -740,7 +740,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
* presence of FP.
*/
if (cpu_has_xsave)
- fpu->state->xsave.xsave_hdr.xstate_bv |= XSTATE_FP;
+ fpu->state->xsave.header.xstate_bv |= XSTATE_FP;
return ret;
}
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index a485180ebc32..03639fa079b0 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -32,7 +32,7 @@ static unsigned int xfeatures_nr;
/*
* If a processor implementation discern that a processor state component is
* in its initialized state it may modify the corresponding bit in the
- * xsave_hdr.xstate_bv as '0', with out modifying the corresponding memory
+ * header.xstate_bv as '0', with out modifying the corresponding memory
* layout in the case of xsaveopt. While presenting the xstate information to
* the user, we always ensure that the memory layout of a feature will be in
* the init state if the corresponding header bit is zero. This is to ensure
@@ -48,7 +48,7 @@ void __sanitize_i387_state(struct task_struct *tsk)
if (!fx)
return;
- xstate_bv = tsk->thread.fpu.state->xsave.xsave_hdr.xstate_bv;
+ xstate_bv = tsk->thread.fpu.state->xsave.header.xstate_bv;
/*
* None of the feature bits are in init state. So nothing else
@@ -106,7 +106,7 @@ static inline int check_for_xstate(struct i387_fxsave_struct __user *buf,
struct _fpx_sw_bytes *fx_sw)
{
int min_xstate_size = sizeof(struct i387_fxsave_struct) +
- sizeof(struct xsave_hdr_struct);
+ sizeof(struct xstate_header);
unsigned int magic2;
if (__copy_from_user(fx_sw, &buf->sw_reserved[0], sizeof(*fx_sw)))
@@ -178,7 +178,7 @@ static inline int save_xstate_epilog(void __user *buf, int ia32_frame)
* Read the xstate_bv which we copied (directly from the cpu or
* from the state in task struct) to the user buffers.
*/
- err |= __get_user(xstate_bv, (__u32 *)&x->xsave_hdr.xstate_bv);
+ err |= __get_user(xstate_bv, (__u32 *)&x->header.xstate_bv);
/*
* For legacy compatible, we always set FP/SSE bits in the bit
@@ -193,7 +193,7 @@ static inline int save_xstate_epilog(void __user *buf, int ia32_frame)
*/
xstate_bv |= XSTATE_FPSSE;
- err |= __put_user(xstate_bv, (__u32 *)&x->xsave_hdr.xstate_bv);
+ err |= __put_user(xstate_bv, (__u32 *)&x->header.xstate_bv);
return err;
}
@@ -280,20 +280,20 @@ sanitize_restored_xstate(struct task_struct *tsk,
u64 xstate_bv, int fx_only)
{
struct xsave_struct *xsave = &tsk->thread.fpu.state->xsave;
- struct xsave_hdr_struct *xsave_hdr = &xsave->xsave_hdr;
+ struct xstate_header *header = &xsave->header;
if (use_xsave()) {
/* These bits must be zero. */
- memset(xsave_hdr->reserved, 0, 48);
+ memset(header->reserved, 0, 48);
/*
* Init the state that is not present in the memory
* layout and not enabled by the OS.
*/
if (fx_only)
- xsave_hdr->xstate_bv = XSTATE_FPSSE;
+ header->xstate_bv = XSTATE_FPSSE;
else
- xsave_hdr->xstate_bv &= (xfeatures_mask & xstate_bv);
+ header->xstate_bv &= (xfeatures_mask & xstate_bv);
}
if (use_fxsr()) {
@@ -574,9 +574,9 @@ static void __init setup_init_fpu_buf(void)
print_xstate_features();
if (cpu_has_xsaves) {
- init_xstate_buf->xsave_hdr.xcomp_bv =
+ init_xstate_buf->header.xcomp_bv =
(u64)1 << 63 | xfeatures_mask;
- init_xstate_buf->xsave_hdr.xstate_bv = xfeatures_mask;
+ init_xstate_buf->header.xstate_bv = xfeatures_mask;
}
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 91d7f3b1e50c..ac24889c8bc3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3197,7 +3197,7 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
{
struct xsave_struct *xsave = &vcpu->arch.guest_fpu.state->xsave;
- u64 xstate_bv = xsave->xsave_hdr.xstate_bv;
+ u64 xstate_bv = xsave->header.xstate_bv;
u64 valid;
/*
@@ -3243,9 +3243,9 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
memcpy(xsave, src, XSAVE_HDR_OFFSET);
/* Set XSTATE_BV and possibly XCOMP_BV. */
- xsave->xsave_hdr.xstate_bv = xstate_bv;
+ xsave->header.xstate_bv = xstate_bv;
if (cpu_has_xsaves)
- xsave->xsave_hdr.xcomp_bv = host_xcr0 | XSTATE_COMPACTION_ENABLED;
+ xsave->header.xcomp_bv = host_xcr0 | XSTATE_COMPACTION_ENABLED;
/*
* Copy each region from the non-compacted offset to the
@@ -7014,7 +7014,7 @@ int fx_init(struct kvm_vcpu *vcpu)
fpstate_init(&vcpu->arch.guest_fpu);
if (cpu_has_xsaves)
- vcpu->arch.guest_fpu.state->xsave.xsave_hdr.xcomp_bv =
+ vcpu->arch.guest_fpu.state->xsave.header.xcomp_bv =
host_xcr0 | XSTATE_COMPACTION_ENABLED;
/*
--
2.1.0
'xsave.header::xstate_bv' is a misnomer - what does 'bv' stand for?
It probably comes from the 'XGETBV' instruction name, but I could
not find in the Intel documentation where that abbreviation comes
from. It could mean 'bit vector' - or something else?
But how about - instead of guessing about a weird name - we name
the field in an obvious and descriptive way that tells us exactly
what it does?
So rename it to 'xfeatures', which is a bitmask of the xfeatures
that are in use in that context structure.
An eyesore like:
fpu->state->xsave.xsave_hdr.xstate_bv |= XSTATE_FP;
is now much more readable:
fpu->state->xsave.header.xfeatures |= XSTATE_FP;
The new form is not just infinitely more readable, it's also
shorter.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 2 +-
arch/x86/include/asm/fpu/types.h | 2 +-
arch/x86/include/asm/user.h | 8 ++++----
arch/x86/include/uapi/asm/sigcontext.h | 4 ++--
arch/x86/kernel/fpu/core.c | 6 +++---
arch/x86/kernel/fpu/xsave.c | 52 ++++++++++++++++++++++++++--------------------------
arch/x86/kvm/x86.c | 4 ++--
7 files changed, 39 insertions(+), 39 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 3007df99833e..07c6adc02f68 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -261,7 +261,7 @@ static inline int fpu_save_init(struct fpu *fpu)
/*
* xsave header may indicate the init state of the FP.
*/
- if (!(fpu->state->xsave.header.xstate_bv & XSTATE_FP))
+ if (!(fpu->state->xsave.header.xfeatures & XSTATE_FP))
return 1;
} else if (use_fxsr()) {
fpu_fxsave(fpu);
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 33c0c7b782db..9bd2cd1a19fd 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -100,7 +100,7 @@ struct bndcsr {
} __packed;
struct xstate_header {
- u64 xstate_bv;
+ u64 xfeatures;
u64 xcomp_bv;
u64 reserved[6];
} __attribute__((packed));
diff --git a/arch/x86/include/asm/user.h b/arch/x86/include/asm/user.h
index fa042410c42c..59a54e869f15 100644
--- a/arch/x86/include/asm/user.h
+++ b/arch/x86/include/asm/user.h
@@ -15,7 +15,7 @@ struct user_ymmh_regs {
};
struct user_xstate_header {
- __u64 xstate_bv;
+ __u64 xfeatures;
__u64 reserved1[2];
__u64 reserved2[5];
};
@@ -41,11 +41,11 @@ struct user_xstate_header {
* particular process/thread.
*
* Also when the user modifies certain state FP/SSE/etc through the
- * ptrace interface, they must ensure that the header.xstate_bv
+ * ptrace interface, they must ensure that the header.xfeatures
* bytes[512..519] of the memory layout are updated correspondingly.
* i.e., for example when FP state is modified to a non-init state,
- * header.xstate_bv's bit 0 must be set to '1', when SSE is modified to
- * non-init state, header.xstate_bv's bit 1 must to be set to '1', etc.
+ * header.xfeatures's bit 0 must be set to '1', when SSE is modified to
+ * non-init state, header.xfeatures's bit 1 must to be set to '1', etc.
*/
#define USER_XSTATE_FX_SW_WORDS 6
#define USER_XSTATE_XCR0_WORD 0
diff --git a/arch/x86/include/uapi/asm/sigcontext.h b/arch/x86/include/uapi/asm/sigcontext.h
index 7f850f7b5c45..0e8a973de9ee 100644
--- a/arch/x86/include/uapi/asm/sigcontext.h
+++ b/arch/x86/include/uapi/asm/sigcontext.h
@@ -25,7 +25,7 @@ struct _fpx_sw_bytes {
__u32 extended_size; /* total size of the layout referred by
* fpstate pointer in the sigcontext.
*/
- __u64 xstate_bv;
+ __u64 xfeatures;
/* feature bit mask (including fp/sse/extended
* state) that is present in the memory
* layout.
@@ -210,7 +210,7 @@ struct sigcontext {
#endif /* !__i386__ */
struct _header {
- __u64 xstate_bv;
+ __u64 xfeatures;
__u64 reserved1[2];
__u64 reserved2[5];
};
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index eabf4380366a..c12dd3c0aabb 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -470,7 +470,7 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
* presence of FP and SSE state.
*/
if (cpu_has_xsave)
- fpu->state->xsave.header.xstate_bv |= XSTATE_FPSSE;
+ fpu->state->xsave.header.xfeatures |= XSTATE_FPSSE;
return ret;
}
@@ -528,7 +528,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
* mxcsr reserved bits must be masked to zero for security reasons.
*/
xsave->i387.mxcsr &= mxcsr_feature_mask;
- xsave->header.xstate_bv &= xfeatures_mask;
+ xsave->header.xfeatures &= xfeatures_mask;
/*
* These bits must be zero.
*/
@@ -740,7 +740,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
* presence of FP.
*/
if (cpu_has_xsave)
- fpu->state->xsave.header.xstate_bv |= XSTATE_FP;
+ fpu->state->xsave.header.xfeatures |= XSTATE_FP;
return ret;
}
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index 03639fa079b0..467e4635bd29 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -32,7 +32,7 @@ static unsigned int xfeatures_nr;
/*
* If a processor implementation discern that a processor state component is
* in its initialized state it may modify the corresponding bit in the
- * header.xstate_bv as '0', with out modifying the corresponding memory
+ * header.xfeatures as '0', with out modifying the corresponding memory
* layout in the case of xsaveopt. While presenting the xstate information to
* the user, we always ensure that the memory layout of a feature will be in
* the init state if the corresponding header bit is zero. This is to ensure
@@ -43,24 +43,24 @@ void __sanitize_i387_state(struct task_struct *tsk)
{
struct i387_fxsave_struct *fx = &tsk->thread.fpu.state->fxsave;
int feature_bit = 0x2;
- u64 xstate_bv;
+ u64 xfeatures;
if (!fx)
return;
- xstate_bv = tsk->thread.fpu.state->xsave.header.xstate_bv;
+ xfeatures = tsk->thread.fpu.state->xsave.header.xfeatures;
/*
* None of the feature bits are in init state. So nothing else
* to do for us, as the memory layout is up to date.
*/
- if ((xstate_bv & xfeatures_mask) == xfeatures_mask)
+ if ((xfeatures & xfeatures_mask) == xfeatures_mask)
return;
/*
* FP is in init state
*/
- if (!(xstate_bv & XSTATE_FP)) {
+ if (!(xfeatures & XSTATE_FP)) {
fx->cwd = 0x37f;
fx->swd = 0;
fx->twd = 0;
@@ -73,17 +73,17 @@ void __sanitize_i387_state(struct task_struct *tsk)
/*
* SSE is in init state
*/
- if (!(xstate_bv & XSTATE_SSE))
+ if (!(xfeatures & XSTATE_SSE))
memset(&fx->xmm_space[0], 0, 256);
- xstate_bv = (xfeatures_mask & ~xstate_bv) >> 2;
+ xfeatures = (xfeatures_mask & ~xfeatures) >> 2;
/*
* Update all the other memory layouts for which the corresponding
* header bit is in the init state.
*/
- while (xstate_bv) {
- if (xstate_bv & 0x1) {
+ while (xfeatures) {
+ if (xfeatures & 0x1) {
int offset = xstate_offsets[feature_bit];
int size = xstate_sizes[feature_bit];
@@ -92,7 +92,7 @@ void __sanitize_i387_state(struct task_struct *tsk)
size);
}
- xstate_bv >>= 1;
+ xfeatures >>= 1;
feature_bit++;
}
}
@@ -162,7 +162,7 @@ static inline int save_xstate_epilog(void __user *buf, int ia32_frame)
{
struct xsave_struct __user *x = buf;
struct _fpx_sw_bytes *sw_bytes;
- u32 xstate_bv;
+ u32 xfeatures;
int err;
/* Setup the bytes not touched by the [f]xsave and reserved for SW. */
@@ -175,25 +175,25 @@ static inline int save_xstate_epilog(void __user *buf, int ia32_frame)
err |= __put_user(FP_XSTATE_MAGIC2, (__u32 *)(buf + xstate_size));
/*
- * Read the xstate_bv which we copied (directly from the cpu or
+ * Read the xfeatures which we copied (directly from the cpu or
* from the state in task struct) to the user buffers.
*/
- err |= __get_user(xstate_bv, (__u32 *)&x->header.xstate_bv);
+ err |= __get_user(xfeatures, (__u32 *)&x->header.xfeatures);
/*
* For legacy compatible, we always set FP/SSE bits in the bit
* vector while saving the state to the user context. This will
* enable us capturing any changes(during sigreturn) to
* the FP/SSE bits by the legacy applications which don't touch
- * xstate_bv in the xsave header.
+ * xfeatures in the xsave header.
*
- * xsave aware apps can change the xstate_bv in the xsave
+ * xsave aware apps can change the xfeatures in the xsave
* header as well as change any contents in the memory layout.
* xrestore as part of sigreturn will capture all the changes.
*/
- xstate_bv |= XSTATE_FPSSE;
+ xfeatures |= XSTATE_FPSSE;
- err |= __put_user(xstate_bv, (__u32 *)&x->header.xstate_bv);
+ err |= __put_user(xfeatures, (__u32 *)&x->header.xfeatures);
return err;
}
@@ -277,7 +277,7 @@ int save_xstate_sig(void __user *buf, void __user *buf_fx, int size)
static inline void
sanitize_restored_xstate(struct task_struct *tsk,
struct user_i387_ia32_struct *ia32_env,
- u64 xstate_bv, int fx_only)
+ u64 xfeatures, int fx_only)
{
struct xsave_struct *xsave = &tsk->thread.fpu.state->xsave;
struct xstate_header *header = &xsave->header;
@@ -291,9 +291,9 @@ sanitize_restored_xstate(struct task_struct *tsk,
* layout and not enabled by the OS.
*/
if (fx_only)
- header->xstate_bv = XSTATE_FPSSE;
+ header->xfeatures = XSTATE_FPSSE;
else
- header->xstate_bv &= (xfeatures_mask & xstate_bv);
+ header->xfeatures &= (xfeatures_mask & xfeatures);
}
if (use_fxsr()) {
@@ -335,7 +335,7 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size)
struct task_struct *tsk = current;
struct fpu *fpu = &tsk->thread.fpu;
int state_size = xstate_size;
- u64 xstate_bv = 0;
+ u64 xfeatures = 0;
int fx_only = 0;
ia32_fxstate &= (config_enabled(CONFIG_X86_32) ||
@@ -369,7 +369,7 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size)
fx_only = 1;
} else {
state_size = fx_sw_user.xstate_size;
- xstate_bv = fx_sw_user.xstate_bv;
+ xfeatures = fx_sw_user.xfeatures;
}
}
@@ -398,7 +398,7 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size)
fpstate_init(fpu);
err = -1;
} else {
- sanitize_restored_xstate(tsk, &env, xstate_bv, fx_only);
+ sanitize_restored_xstate(tsk, &env, xfeatures, fx_only);
}
fpu->fpstate_active = 1;
@@ -415,7 +415,7 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size)
* state to the registers directly (with exceptions handled).
*/
user_fpu_begin();
- if (restore_user_xstate(buf_fx, xstate_bv, fx_only)) {
+ if (restore_user_xstate(buf_fx, xfeatures, fx_only)) {
fpu_reset_state(fpu);
return -1;
}
@@ -441,7 +441,7 @@ static void prepare_fx_sw_frame(void)
fx_sw_reserved.magic1 = FP_XSTATE_MAGIC1;
fx_sw_reserved.extended_size = size;
- fx_sw_reserved.xstate_bv = xfeatures_mask;
+ fx_sw_reserved.xfeatures = xfeatures_mask;
fx_sw_reserved.xstate_size = xstate_size;
if (config_enabled(CONFIG_IA32_EMULATION)) {
@@ -576,7 +576,7 @@ static void __init setup_init_fpu_buf(void)
if (cpu_has_xsaves) {
init_xstate_buf->header.xcomp_bv =
(u64)1 << 63 | xfeatures_mask;
- init_xstate_buf->header.xstate_bv = xfeatures_mask;
+ init_xstate_buf->header.xfeatures = xfeatures_mask;
}
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ac24889c8bc3..0b58b9397098 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3197,7 +3197,7 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
{
struct xsave_struct *xsave = &vcpu->arch.guest_fpu.state->xsave;
- u64 xstate_bv = xsave->header.xstate_bv;
+ u64 xstate_bv = xsave->header.xfeatures;
u64 valid;
/*
@@ -3243,7 +3243,7 @@ static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
memcpy(xsave, src, XSAVE_HDR_OFFSET);
/* Set XSTATE_BV and possibly XCOMP_BV. */
- xsave->header.xstate_bv = xstate_bv;
+ xsave->header.xfeatures = xstate_bv;
if (cpu_has_xsaves)
xsave->header.xcomp_bv = host_xcr0 | XSTATE_COMPACTION_ENABLED;
--
2.1.0
The code has the following problems:
- it uses a single global 'fx_scratch' area that multiple CPUs could
write into simultaneously, in theory.
- it wastes 512 bytes of .data for something that is only rarely used.
Fix this by moving the state buffer to the stack. Note that while
this is 512 bytes, we don't ever call this function in very deep
callchains, so its stack usage should not be a problem.
Also add comments to explain the magic 0x0000ffbf default value.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 33df056b1624..0b16f61cb2a4 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -68,18 +68,26 @@ void fpu__init_check_bugs(void)
* Boot time FPU feature detection code:
*/
unsigned int mxcsr_feature_mask __read_mostly = 0xffffffffu;
+
unsigned int xstate_size;
EXPORT_SYMBOL_GPL(xstate_size);
-static struct i387_fxsave_struct fx_scratch;
static void mxcsr_feature_mask_init(void)
{
- unsigned long mask = 0;
+ unsigned int mask = 0;
if (cpu_has_fxsr) {
- memset(&fx_scratch, 0, sizeof(struct i387_fxsave_struct));
- asm volatile("fxsave %0" : "+m" (fx_scratch));
- mask = fx_scratch.mxcsr_mask;
+ struct i387_fxsave_struct fx_tmp __aligned(32) = { };
+
+ asm volatile("fxsave %0" : "+m" (fx_tmp));
+
+ mask = fx_tmp.mxcsr_mask;
+
+ /*
+ * If zero then use the default features mask,
+ * which has all features set, except the
+ * denormals-are-zero feature bit:
+ */
if (mask == 0)
mask = 0x0000ffbf;
}
--
2.1.0
Rename regset accessors to prefix them with 'regset_', because we
want to start using the 'fpregs_active' name elsewhere.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 8 ++++----
arch/x86/kernel/fpu/core.c | 6 +++---
arch/x86/kernel/ptrace.c | 6 +++---
3 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 07c6adc02f68..6eea81c068fb 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -46,17 +46,17 @@ extern void convert_from_fxsr(struct user_i387_ia32_struct *env,
extern void convert_to_fxsr(struct task_struct *tsk,
const struct user_i387_ia32_struct *env);
-extern user_regset_active_fn fpregs_active, xfpregs_active;
+extern user_regset_active_fn regset_fpregs_active, regset_xregset_fpregs_active;
extern user_regset_get_fn fpregs_get, xfpregs_get, fpregs_soft_get,
xstateregs_get;
extern user_regset_set_fn fpregs_set, xfpregs_set, fpregs_soft_set,
xstateregs_set;
/*
- * xstateregs_active == fpregs_active. Please refer to the comment
- * at the definition of fpregs_active.
+ * xstateregs_active == regset_fpregs_active. Please refer to the comment
+ * at the definition of regset_fpregs_active.
*/
-#define xstateregs_active fpregs_active
+#define xstateregs_active regset_fpregs_active
#ifdef CONFIG_MATH_EMULATION
extern void finit_soft_fpu(struct i387_soft_struct *soft);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index c12dd3c0aabb..35ef1f9b56b3 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -403,18 +403,18 @@ void fpu__clear(struct task_struct *tsk)
}
/*
- * The xstateregs_active() routine is the same as the fpregs_active() routine,
+ * The xstateregs_active() routine is the same as the regset_fpregs_active() routine,
* as the "regset->n" for the xstate regset will be updated based on the feature
* capabilites supported by the xsave.
*/
-int fpregs_active(struct task_struct *target, const struct user_regset *regset)
+int regset_fpregs_active(struct task_struct *target, const struct user_regset *regset)
{
struct fpu *target_fpu = &target->thread.fpu;
return target_fpu->fpstate_active ? regset->n : 0;
}
-int xfpregs_active(struct task_struct *target, const struct user_regset *regset)
+int regset_xregset_fpregs_active(struct task_struct *target, const struct user_regset *regset)
{
struct fpu *target_fpu = &target->thread.fpu;
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index c14a00f54b61..4c615661ec72 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1296,7 +1296,7 @@ static struct user_regset x86_64_regsets[] __read_mostly = {
.core_note_type = NT_PRFPREG,
.n = sizeof(struct user_i387_struct) / sizeof(long),
.size = sizeof(long), .align = sizeof(long),
- .active = xfpregs_active, .get = xfpregs_get, .set = xfpregs_set
+ .active = regset_xregset_fpregs_active, .get = xfpregs_get, .set = xfpregs_set
},
[REGSET_XSTATE] = {
.core_note_type = NT_X86_XSTATE,
@@ -1337,13 +1337,13 @@ static struct user_regset x86_32_regsets[] __read_mostly = {
.core_note_type = NT_PRFPREG,
.n = sizeof(struct user_i387_ia32_struct) / sizeof(u32),
.size = sizeof(u32), .align = sizeof(u32),
- .active = fpregs_active, .get = fpregs_get, .set = fpregs_set
+ .active = regset_fpregs_active, .get = fpregs_get, .set = fpregs_set
},
[REGSET_XFP] = {
.core_note_type = NT_PRXFPREG,
.n = sizeof(struct user32_fxsr_struct) / sizeof(u32),
.size = sizeof(u32), .align = sizeof(u32),
- .active = xfpregs_active, .get = xfpregs_get, .set = xfpregs_set
+ .active = regset_xregset_fpregs_active, .get = xfpregs_get, .set = xfpregs_set
},
[REGSET_XSTATE] = {
.core_note_type = NT_X86_XSTATE,
--
2.1.0
The previous explanation of the ymmh_space[] layout was rather cryptic.
Also transform the "u32[64]" definition into the more readable "u8[256]" form.
No change in implementation.
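( Size sanity check: 64 x sizeof(u32) == 64 x 4 == 256 bytes, the same
  as 256 x sizeof(u8) - which is why there's no change in implementation. )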
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/types.h | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 9bd2cd1a19fd..8a5120a3b48b 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -78,9 +78,16 @@ struct i387_soft_struct {
u32 entry_eip;
};
+/*
+ * There are 16x 256-bit AVX registers named YMM0-YMM15.
+ * The low 128 bits are aliased to the 16 SSE registers (XMM0-XMM15)
+ * and are stored in 'struct i387_fxsave_struct::xmm_space[]'.
+ *
+ * The high 128 bits are stored here:
+ * 16x 128 bits == 256 bytes.
+ */
struct ymmh_struct {
- /* 16 * 16 bytes for each YMMH-reg = 256 bytes */
- u32 ymmh_space[64];
+ u8 ymmh_space[256];
};
/* We don't support LWP yet: */
--
2.1.0
Improve the comments and add new ones, as this code isn't very obvious.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/xsave.c | 36 +++++++++++++++++++++++-------------
1 file changed, 23 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index 467e4635bd29..f3d30f0c50f9 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -30,19 +30,23 @@ static unsigned int xstate_comp_offsets[sizeof(xfeatures_mask)*8];
static unsigned int xfeatures_nr;
/*
- * If a processor implementation discern that a processor state component is
- * in its initialized state it may modify the corresponding bit in the
- * header.xfeatures as '0', with out modifying the corresponding memory
- * layout in the case of xsaveopt. While presenting the xstate information to
- * the user, we always ensure that the memory layout of a feature will be in
- * the init state if the corresponding header bit is zero. This is to ensure
- * that the user doesn't see some stale state in the memory layout during
- * signal handling, debugging etc.
+ * When executing XSAVEOPT (optimized XSAVE), if a processor implementation
+ * detects that an FPU state component is still (or is again) in its
+ * initialized state, it may clear the corresponding bit in the header.xfeatures
+ * field, and can skip the writeout of registers to the corresponding memory layout.
+ *
+ * This means that when the bit is zero, the state component might still contain
+ * some previous - non-initialized register state.
+ *
+ * Before writing xstate information to user-space we sanitize those components,
+ * to always ensure that the memory layout of a feature will be in the init state
+ * if the corresponding header bit is zero. This is to ensure that user-space doesn't
+ * see some stale state in the memory layout during signal handling, debugging etc.
*/
void __sanitize_i387_state(struct task_struct *tsk)
{
struct i387_fxsave_struct *fx = &tsk->thread.fpu.state->fxsave;
- int feature_bit = 0x2;
+ int feature_bit;
u64 xfeatures;
if (!fx)
@@ -76,19 +80,25 @@ void __sanitize_i387_state(struct task_struct *tsk)
if (!(xfeatures & XSTATE_SSE))
memset(&fx->xmm_space[0], 0, 256);
+ /*
+ * First two features are FPU and SSE, which above we handled
+ * in a special way already:
+ */
+ feature_bit = 0x2;
xfeatures = (xfeatures_mask & ~xfeatures) >> 2;
/*
- * Update all the other memory layouts for which the corresponding
- * header bit is in the init state.
+ * Update all the remaining memory layouts according to their
+ * standard xstate layout, if their header bit is in the init
+ * state:
*/
while (xfeatures) {
if (xfeatures & 0x1) {
int offset = xstate_offsets[feature_bit];
int size = xstate_sizes[feature_bit];
- memcpy(((void *) fx) + offset,
- ((void *) init_xstate_buf) + offset,
+ memcpy((void *)fx + offset,
+ (void *)init_xstate_buf + offset,
size);
}
--
2.1.0
So the current code uses fpu->has_fpu to determine whether a given
user FPU context is actively loaded into the FPU's registers [*] and
whether those registers represent the task's current FPU state.
But this name is ambiguous: in particular, the distinction between
fpu->has_fpu, PF_USED_MATH and fpu_fpregs_owner_ctx is not clear.
Increase clarity by unambiguously signalling that it's about
hardware registers being active right now, by renaming it to
fpu->fpregs_active.
( In later patches we'll use more of the 'fpregs' naming, which will
make it easier to grep for as well. )
[*] There's the kernel_fpu_begin()/end() primitive that also
activates FPU hw registers and uses them, without
touching the fpu->fpregs_active flag.
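( To illustrate how these three pieces of state relate, here's a
  minimal sketch - the helper name is made up, this is not code from
  the series:

	/*
	 * A task's FPU context is live in the hardware registers iff
	 * it owns this CPU's FPU and this CPU was the last to load it:
	 */
	static inline bool fpu_owns_cpu_regs(struct fpu *fpu, int cpu)
	{
		return fpu->fpregs_active &&
		       fpu == this_cpu_read(fpu_fpregs_owner_ctx) &&
		       fpu->last_cpu == cpu;
	}
)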
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/api.h | 2 +-
arch/x86/include/asm/fpu/internal.h | 12 ++++++------
arch/x86/include/asm/fpu/types.h | 2 +-
arch/x86/kernel/fpu/core.c | 10 +++++-----
4 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 5bdde8ca87bc..4ca745c0d92e 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -96,7 +96,7 @@ static inline void irq_ts_restore(int TS_state)
*/
static inline int user_has_fpu(void)
{
- return current->thread.fpu.has_fpu;
+ return current->thread.fpu.fpregs_active;
}
extern void fpu__save(struct fpu *fpu);
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 6eea81c068fb..b546ec816fd6 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -308,7 +308,7 @@ static inline int restore_fpu_checking(struct fpu *fpu)
"fnclex\n\t"
"emms\n\t"
"fildl %P[addr]" /* set F?P to defined value */
- : : [addr] "m" (fpu->has_fpu));
+ : : [addr] "m" (fpu->fpregs_active));
}
return fpu_restore_checking(fpu);
@@ -317,14 +317,14 @@ static inline int restore_fpu_checking(struct fpu *fpu)
/* Must be paired with an 'stts' after! */
static inline void __thread_clear_has_fpu(struct fpu *fpu)
{
- fpu->has_fpu = 0;
+ fpu->fpregs_active = 0;
this_cpu_write(fpu_fpregs_owner_ctx, NULL);
}
/* Must be paired with a 'clts' before! */
static inline void __thread_set_has_fpu(struct fpu *fpu)
{
- fpu->has_fpu = 1;
+ fpu->fpregs_active = 1;
this_cpu_write(fpu_fpregs_owner_ctx, fpu);
}
@@ -357,7 +357,7 @@ static inline void drop_fpu(struct fpu *fpu)
preempt_disable();
fpu->counter = 0;
- if (fpu->has_fpu) {
+ if (fpu->fpregs_active) {
/* Ignore delayed exceptions from user space */
asm volatile("1: fwait\n"
"2:\n"
@@ -416,14 +416,14 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
fpu.preload = new_fpu->fpstate_active &&
(use_eager_fpu() || new_fpu->counter > 5);
- if (old_fpu->has_fpu) {
+ if (old_fpu->fpregs_active) {
if (!fpu_save_init(old_fpu))
old_fpu->last_cpu = -1;
else
old_fpu->last_cpu = cpu;
/* But leave fpu_fpregs_owner_ctx! */
- old_fpu->has_fpu = 0;
+ old_fpu->fpregs_active = 0;
/* Don't change CR0.TS if we just switch! */
if (fpu.preload) {
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 8a5120a3b48b..231a8f53b2f8 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -142,7 +142,7 @@ struct fpu {
*/
unsigned int last_cpu;
- unsigned int has_fpu;
+ unsigned int fpregs_active;
union thread_xstate *state;
/*
* This counter contains the number of consecutive context switches
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 35ef1f9b56b3..a7fb56266a2d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -62,7 +62,7 @@ static bool interrupted_kernel_fpu_idle(void)
if (use_eager_fpu())
return true;
- return !current->thread.fpu.has_fpu && (read_cr0() & X86_CR0_TS);
+ return !current->thread.fpu.fpregs_active && (read_cr0() & X86_CR0_TS);
}
/*
@@ -100,7 +100,7 @@ void __kernel_fpu_begin(void)
kernel_fpu_disable();
- if (fpu->has_fpu) {
+ if (fpu->fpregs_active) {
fpu_save_init(fpu);
} else {
this_cpu_write(fpu_fpregs_owner_ctx, NULL);
@@ -114,7 +114,7 @@ void __kernel_fpu_end(void)
{
struct fpu *fpu = &current->thread.fpu;
- if (fpu->has_fpu) {
+ if (fpu->fpregs_active) {
if (WARN_ON(restore_fpu_checking(fpu)))
fpu_reset_state(fpu);
} else if (!use_eager_fpu()) {
@@ -147,7 +147,7 @@ void fpu__save(struct fpu *fpu)
WARN_ON(fpu != &current->thread.fpu);
preempt_disable();
- if (fpu->has_fpu) {
+ if (fpu->fpregs_active) {
if (use_eager_fpu()) {
__save_fpu(fpu);
} else {
@@ -243,7 +243,7 @@ static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
{
dst_fpu->counter = 0;
- dst_fpu->has_fpu = 0;
+ dst_fpu->fpregs_active = 0;
dst_fpu->state = NULL;
dst_fpu->last_cpu = -1;
--
2.1.0
Propagate the 'fpu->fpregs_active' naming to the function that
sets it.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index b546ec816fd6..3554a8cdaece 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -322,7 +322,7 @@ static inline void __thread_clear_has_fpu(struct fpu *fpu)
}
/* Must be paired with a 'clts' before! */
-static inline void __thread_set_has_fpu(struct fpu *fpu)
+static inline void __fpregs_activate(struct fpu *fpu)
{
fpu->fpregs_active = 1;
this_cpu_write(fpu_fpregs_owner_ctx, fpu);
@@ -346,7 +346,7 @@ static inline void __thread_fpu_begin(struct fpu *fpu)
{
if (!use_eager_fpu())
clts();
- __thread_set_has_fpu(fpu);
+ __fpregs_activate(fpu);
}
static inline void drop_fpu(struct fpu *fpu)
@@ -428,7 +428,7 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
/* Don't change CR0.TS if we just switch! */
if (fpu.preload) {
new_fpu->counter++;
- __thread_set_has_fpu(new_fpu);
+ __fpregs_activate(new_fpu);
prefetch(new_fpu->state);
} else if (!use_eager_fpu())
stts();
--
2.1.0
Propagate the 'fpu->fpregs_active' naming to the function that
clears it.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 3554a8cdaece..7a235171be6c 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -315,7 +315,7 @@ static inline int restore_fpu_checking(struct fpu *fpu)
}
/* Must be paired with an 'stts' after! */
-static inline void __thread_clear_has_fpu(struct fpu *fpu)
+static inline void __fpregs_deactivate(struct fpu *fpu)
{
fpu->fpregs_active = 0;
this_cpu_write(fpu_fpregs_owner_ctx, NULL);
@@ -337,7 +337,7 @@ static inline void __fpregs_activate(struct fpu *fpu)
*/
static inline void __thread_fpu_end(struct fpu *fpu)
{
- __thread_clear_has_fpu(fpu);
+ __fpregs_deactivate(fpu);
if (!use_eager_fpu())
stts();
}
--
2.1.0
Propagate the 'fpu->fpregs_active' naming to the high level
function that sets it.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 6 +++---
arch/x86/kernel/fpu/core.c | 4 ++--
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 7a235171be6c..18a62239c73d 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -342,7 +342,7 @@ static inline void __thread_fpu_end(struct fpu *fpu)
stts();
}
-static inline void __thread_fpu_begin(struct fpu *fpu)
+static inline void fpregs_activate(struct fpu *fpu)
{
if (!use_eager_fpu())
clts();
@@ -441,7 +441,7 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
fpu.preload = 0;
else
prefetch(new_fpu->state);
- __thread_fpu_begin(new_fpu);
+ fpregs_activate(new_fpu);
}
}
return fpu;
@@ -499,7 +499,7 @@ static inline void user_fpu_begin(void)
preempt_disable();
if (!user_has_fpu())
- __thread_fpu_begin(fpu);
+ fpregs_activate(fpu);
preempt_enable();
}
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index a7fb56266a2d..75b94985c82e 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -368,9 +368,9 @@ void fpu__restore(void)
local_irq_disable();
}
- /* Avoid __kernel_fpu_begin() right after __thread_fpu_begin() */
+ /* Avoid __kernel_fpu_begin() right after fpregs_activate() */
kernel_fpu_disable();
- __thread_fpu_begin(fpu);
+ fpregs_activate(fpu);
if (unlikely(restore_fpu_checking(fpu))) {
fpu_reset_state(fpu);
force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
--
2.1.0
Propagate the 'fpu->fpregs_active' naming to the high level function that
clears it.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 14 +++++++-------
arch/x86/kernel/fpu/core.c | 2 +-
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 18a62239c73d..0292fcc4d441 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -335,18 +335,18 @@ static inline void __fpregs_activate(struct fpu *fpu)
* These generally need preemption protection to work,
* do try to avoid using these on their own.
*/
-static inline void __thread_fpu_end(struct fpu *fpu)
+static inline void fpregs_activate(struct fpu *fpu)
{
- __fpregs_deactivate(fpu);
if (!use_eager_fpu())
- stts();
+ clts();
+ __fpregs_activate(fpu);
}
-static inline void fpregs_activate(struct fpu *fpu)
+static inline void fpregs_deactivate(struct fpu *fpu)
{
+ __fpregs_deactivate(fpu);
if (!use_eager_fpu())
- clts();
- __fpregs_activate(fpu);
+ stts();
}
static inline void drop_fpu(struct fpu *fpu)
@@ -362,7 +362,7 @@ static inline void drop_fpu(struct fpu *fpu)
asm volatile("1: fwait\n"
"2:\n"
_ASM_EXTABLE(1b, 2b));
- __thread_fpu_end(fpu);
+ fpregs_deactivate(fpu);
}
fpu->fpstate_active = 0;
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 75b94985c82e..2c47bcf63e1e 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -152,7 +152,7 @@ void fpu__save(struct fpu *fpu)
__save_fpu(fpu);
} else {
fpu_save_init(fpu);
- __thread_fpu_end(fpu);
+ fpregs_deactivate(fpu);
}
}
preempt_enable();
--
2.1.0
fpstate_xstate_init_size() is called in fpu__cpu_init(), which is
run on every CPU, every time one is brought online.
But we want to call fpstate_xstate_init_size() only once. Move it to
fpu__detect(), which only runs once, on the boot CPU.
Also clean up the flow of fpstate_xstate_init_size() a bit, by
removing a 'return' from the middle of the function.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 0b16f61cb2a4..3e0fee5bc2e7 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -109,13 +109,12 @@ static void fpstate_xstate_init_size(void)
setup_clear_cpu_cap(X86_FEATURE_XSAVE);
setup_clear_cpu_cap(X86_FEATURE_XSAVEOPT);
xstate_size = sizeof(struct i387_soft_struct);
- return;
+ } else {
+ if (cpu_has_fxsr)
+ xstate_size = sizeof(struct i387_fxsave_struct);
+ else
+ xstate_size = sizeof(struct i387_fsave_struct);
}
-
- if (cpu_has_fxsr)
- xstate_size = sizeof(struct i387_fxsave_struct);
- else
- xstate_size = sizeof(struct i387_fsave_struct);
}
/*
@@ -151,12 +150,6 @@ void fpu__cpu_init(void)
cr0 |= X86_CR0_EM;
write_cr0(cr0);
- /*
- * fpstate_xstate_init_size() is only called once, to avoid overriding
- * 'xstate_size' during (secondary CPU) bootup or during CPU hotplug.
- */
- if (xstate_size == 0)
- fpstate_xstate_init_size();
mxcsr_feature_mask_init();
xsave_init();
@@ -194,5 +187,7 @@ void fpu__detect(struct cpuinfo_x86 *c)
else
clear_cpu_cap(c, X86_FEATURE_FPU);
- /* The final cr0 value is set in fpu_init() */
+ /* The final cr0 value is set later, in fpu_init() */
+
+ fpstate_xstate_init_size();
}
--
2.1.0
There are only 8 xstate bits at the moment, and it's not like we
can support unknown bits - so put xstate_offsets[] and
xstate_sizes[] into static allocation.
This is in preparation to be able to call the FPU init code
earlier, when there's no bootmem available yet.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/xsave.h | 3 +++
arch/x86/kernel/fpu/xsave.c | 4 +---
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/fpu/xsave.h b/arch/x86/include/asm/fpu/xsave.h
index b27b4466f88d..fd564344783e 100644
--- a/arch/x86/include/asm/fpu/xsave.h
+++ b/arch/x86/include/asm/fpu/xsave.h
@@ -15,6 +15,9 @@
#define XSTATE_ZMM_Hi256 0x40
#define XSTATE_Hi16_ZMM 0x80
+/* The highest xstate bit above (of XSTATE_Hi16_ZMM): */
+#define XFEATURES_NR_MAX 8
+
#define XSTATE_FPSSE (XSTATE_FP | XSTATE_SSE)
#define XSTATE_AVX512 (XSTATE_OPMASK | XSTATE_ZMM_Hi256 | XSTATE_Hi16_ZMM)
/* Bit 63 of XCR0 is reserved for future expansion */
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index f3d30f0c50f9..adeab16655ae 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -23,7 +23,7 @@ u64 xfeatures_mask;
struct xsave_struct *init_xstate_buf;
static struct _fpx_sw_bytes fx_sw_reserved, fx_sw_reserved_ia32;
-static unsigned int *xstate_offsets, *xstate_sizes;
+static unsigned int xstate_offsets[XFEATURES_NR_MAX], xstate_sizes[XFEATURES_NR_MAX];
static unsigned int xstate_comp_offsets[sizeof(xfeatures_mask)*8];
/* The number of supported xfeatures in xfeatures_mask: */
@@ -478,8 +478,6 @@ static void __init setup_xstate_features(void)
int eax, ebx, ecx, edx, leaf = 0x2;
xfeatures_nr = fls64(xfeatures_mask);
- xstate_offsets = alloc_bootmem(xfeatures_nr * sizeof(int));
- xstate_sizes = alloc_bootmem(xfeatures_nr * sizeof(int));
do {
cpuid_count(XSTATE_CPUID, leaf, &eax, &ebx, &ecx, &edx);
--
2.1.0
Remove the dependency on the init_xstate_buf == NULL check to
implement once-per-bootup logic in eager_fpu_init(), by making
setup_init_fpu_buf() run once per bootup explicitly.
This is in preparation to make init_xstate_buf statically
allocated.
The various boot-once quirks in the FPU init code will be removed
in a later cleanup stage.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/xsave.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index adeab16655ae..d11b33514130 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -565,8 +565,14 @@ void setup_xstate_comp(void)
/*
* setup the xstate image representing the init state
*/
-static void __init setup_init_fpu_buf(void)
+static void setup_init_fpu_buf(void)
{
+ static int on_boot_cpu = 1;
+
+ if (!on_boot_cpu)
+ return;
+ on_boot_cpu = 0;
+
/*
* Setup init_xstate_buf to represent the init state of
* all the features managed by the xsave
@@ -738,8 +744,7 @@ void __init_refok eager_fpu_init(void)
return;
}
- if (!init_xstate_buf)
- setup_init_fpu_buf();
+ setup_init_fpu_buf();
}
/*
--
2.1.0
Make init_xstate_buf allocated statically at build time.
This structure's maximum size is around 1KB - and it's allocated even on
most modern embedded x86 CPUs which strive for FPU instruction set parity
with desktop and server CPUs, so it's not like we can save much on smaller
systems.
This removes the last bootmem allocation from the FPU init path, allowing
it to be called earlier in the boot sequence.
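( Rough size math behind the 'around 1KB' figure - my estimate, the
  exact value depends on the enabled feature set:

	  512 bytes	(legacy FXSAVE area)
	+  64 bytes	(xstate header)
	+ 256 bytes	(YMM upper halves)
	= 832 bytes	on a typical AVX-capable CPU
)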
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 4 ++--
arch/x86/include/asm/fpu/xsave.h | 2 +-
arch/x86/kernel/fpu/xsave.c | 26 +++++++++++---------------
3 files changed, 14 insertions(+), 18 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 0292fcc4d441..19b7cdf73efd 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -373,9 +373,9 @@ static inline void drop_fpu(struct fpu *fpu)
static inline void restore_init_xstate(void)
{
if (use_xsave())
- xrstor_state(init_xstate_buf, -1);
+ xrstor_state(&init_xstate_ctx, -1);
else
- fxrstor_checking(&init_xstate_buf->i387);
+ fxrstor_checking(&init_xstate_ctx.i387);
}
/*
diff --git a/arch/x86/include/asm/fpu/xsave.h b/arch/x86/include/asm/fpu/xsave.h
index fd564344783e..5c3ab4e17aea 100644
--- a/arch/x86/include/asm/fpu/xsave.h
+++ b/arch/x86/include/asm/fpu/xsave.h
@@ -50,7 +50,7 @@
extern unsigned int xstate_size;
extern u64 xfeatures_mask;
extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
-extern struct xsave_struct *init_xstate_buf;
+extern struct xsave_struct init_xstate_ctx;
extern void xsave_init(void);
extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index d11b33514130..45130ba6f328 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -3,7 +3,6 @@
*
* Author: Suresh Siddha <[email protected]>
*/
-#include <linux/bootmem.h>
#include <linux/compat.h>
#include <linux/cpu.h>
#include <asm/fpu/api.h>
@@ -20,7 +19,7 @@ u64 xfeatures_mask;
/*
* Represents init state for the supported extended state.
*/
-struct xsave_struct *init_xstate_buf;
+struct xsave_struct init_xstate_ctx;
static struct _fpx_sw_bytes fx_sw_reserved, fx_sw_reserved_ia32;
static unsigned int xstate_offsets[XFEATURES_NR_MAX], xstate_sizes[XFEATURES_NR_MAX];
@@ -98,7 +97,7 @@ void __sanitize_i387_state(struct task_struct *tsk)
int size = xstate_sizes[feature_bit];
memcpy((void *)fx + offset,
- (void *)init_xstate_buf + offset,
+ (void *)&init_xstate_ctx + offset,
size);
}
@@ -325,12 +324,12 @@ static inline int restore_user_xstate(void __user *buf, u64 xbv, int fx_only)
if (use_xsave()) {
if ((unsigned long)buf % 64 || fx_only) {
u64 init_bv = xfeatures_mask & ~XSTATE_FPSSE;
- xrstor_state(init_xstate_buf, init_bv);
+ xrstor_state(&init_xstate_ctx, init_bv);
return fxrstor_user(buf);
} else {
u64 init_bv = xfeatures_mask & ~xbv;
if (unlikely(init_bv))
- xrstor_state(init_xstate_buf, init_bv);
+ xrstor_state(&init_xstate_ctx, init_bv);
return xrestore_user(buf, xbv);
}
} else if (use_fxsr()) {
@@ -574,12 +573,10 @@ static void setup_init_fpu_buf(void)
on_boot_cpu = 0;
/*
- * Setup init_xstate_buf to represent the init state of
+ * Setup init_xstate_ctx to represent the init state of
* all the features managed by the xsave
*/
- init_xstate_buf = alloc_bootmem_align(xstate_size,
- __alignof__(struct xsave_struct));
- fx_finit(&init_xstate_buf->i387);
+ fx_finit(&init_xstate_ctx.i387);
if (!cpu_has_xsave)
return;
@@ -588,21 +585,20 @@ static void setup_init_fpu_buf(void)
print_xstate_features();
if (cpu_has_xsaves) {
- init_xstate_buf->header.xcomp_bv =
- (u64)1 << 63 | xfeatures_mask;
- init_xstate_buf->header.xfeatures = xfeatures_mask;
+ init_xstate_ctx.header.xcomp_bv = (u64)1 << 63 | xfeatures_mask;
+ init_xstate_ctx.header.xfeatures = xfeatures_mask;
}
/*
* Init all the features state with header_bv being 0x0
*/
- xrstor_state_booting(init_xstate_buf, -1);
+ xrstor_state_booting(&init_xstate_ctx, -1);
/*
* Dump the init state again. This is to identify the init state
* of any feature which is not represented by all zero's.
*/
- xsave_state_booting(init_xstate_buf);
+ xsave_state_booting(&init_xstate_ctx);
}
static enum { AUTO, ENABLE, DISABLE } eagerfpu = AUTO;
@@ -727,7 +723,7 @@ void xsave_init(void)
/*
* setup_init_fpu_buf() is __init and it is OK to call it here because
- * init_xstate_buf will be unset only once during boot.
+ * init_xstate_ctx will be unset only once during boot.
*/
void __init_refok eager_fpu_init(void)
{
--
2.1.0
There are two kinds of FPU initialization sequences necessary to bring FPU
functionality up: once per system bootup activities, such as detection and
feature initialization of attributes that are shared by all CPUs in the
system - and per CPU initialization sequences, run when a CPU is brought
online (either during bootup or during CPU hotplug onlining), such as
CR0/CR4 register setup.
The FPU code is mixing these roles together, with no clear distinction.
Start sorting this out by splitting the main FPU detection routine
(fpu__cpu_init()) into two parts: fpu__init_system() for the
once per system init activities, and fpu__init_cpu() for the
per CPU onlining init activities.
Note that xstate_init() is called from both variants for the time being,
because it has a dual nature as well. We'll fix that in upcoming patches.
Just do the split and call it as we used to before, don't introduce any
change in initialization behavior yet, beyond duplicate (and harmless)
fpu__init_cpu() and xstate_init() calls - which we'll fix in later
patches.
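( Condensed from the diff below, the resulting call structure is:

	/* Once per system bootup, on the boot CPU: */
	void fpu__init_system(void)
	{
		fpu__init_cpu();	/* the FPU has to be operational first */
		mxcsr_feature_mask_init();
		xsave_init();
		eager_fpu_init();
	}

	/* On every CPU onlining: */
	void fpu__init_cpu(void)
	{
		/* CR0/CR4 register setup, then: */
		xsave_init();
	}
)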
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 3e0fee5bc2e7..d6234adc8ba0 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -118,13 +118,9 @@ static void fpstate_xstate_init_size(void)
}
/*
- * Called on the boot CPU at bootup to set up the initial FPU state that
- * is later cloned into all processes.
- *
- * Also called on secondary CPUs to set up the FPU state of their
- * idle threads.
+ * Enable all supported FPU features. Called when a CPU is brought online.
*/
-void fpu__cpu_init(void)
+void fpu__init_cpu(void)
{
unsigned long cr0;
unsigned long cr4_mask = 0;
@@ -150,12 +146,29 @@ void fpu__cpu_init(void)
cr0 |= X86_CR0_EM;
write_cr0(cr0);
+ xsave_init();
+}
+
+/*
+ * Called on the boot CPU once per system bootup, to set up the initial FPU state that
+ * is later cloned into all processes.
+ */
+void fpu__init_system(void)
+{
+ /* The FPU has to be operational for some of the later FPU init activities: */
+ fpu__init_cpu();
mxcsr_feature_mask_init();
xsave_init();
eager_fpu_init();
}
+void fpu__cpu_init(void)
+{
+ fpu__init_cpu();
+ fpu__init_system();
+}
+
static int __init no_387(char *s)
{
setup_clear_cpu_cap(X86_FEATURE_FPU);
--
2.1.0
Rename existing xstate init functions along the system/cpu init principles:
fpu__init_system_xstate(): called once per system bootup
fpu__init_cpu_xstate(): called per CPU onlining
Also make the fpu__init_cpu_xstate() code robust against early calls:
if xfeatures_mask is not set yet then don't crash, just return.
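( The crash being avoided, as I read it from the SDM rather than the
  changelog: XSETBV requires bit 0 (x87) of XCR0 to be set, so executing
  it with a not-yet-computed xfeatures_mask of 0 would raise #GP - hence
  the new guard:

	if (!xfeatures_mask)
		return;
)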
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 3 +++
arch/x86/kernel/fpu/xsave.c | 22 ++++++++++++++--------
2 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 19b7cdf73efd..71d44be5acb1 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -39,6 +39,9 @@ extern unsigned int mxcsr_feature_mask;
extern void fpu__cpu_init(void);
extern void eager_fpu_init(void);
+extern void fpu__init_system_xstate(void);
+extern void fpu__init_cpu_xstate(void);
+
DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
extern void convert_from_fxsr(struct user_i387_ia32_struct *env,
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index 45130ba6f328..961c25850c7f 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -460,10 +460,14 @@ static void prepare_fx_sw_frame(void)
}
/*
- * Enable the extended processor state save/restore feature
+ * Enable the extended processor state save/restore feature.
+ * Called once per CPU onlining.
*/
-static inline void xstate_enable(void)
+void fpu__init_cpu_xstate(void)
{
+ if (!xfeatures_mask)
+ return;
+
cr4_set_bits(X86_CR4_OSXSAVE);
xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures_mask);
}
@@ -640,11 +644,12 @@ static void __init init_xstate_size(void)
/*
* Enable and initialize the xsave feature.
+ * Called once per system bootup.
*
* ( Not marked __init because of false positive section warnings
* generated by xsave_init(). )
*/
-static void /* __init */ xstate_enable_boot_cpu(void)
+void fpu__init_system_xstate(void)
{
unsigned int eax, ebx, ecx, edx;
@@ -666,7 +671,8 @@ static void /* __init */ xstate_enable_boot_cpu(void)
*/
xfeatures_mask = xfeatures_mask & XCNTXT_MASK;
- xstate_enable();
+ /* Enable xstate instructions to be able to continue with initialization: */
+ fpu__init_cpu_xstate();
/*
* Recompute the context size for enabled features
@@ -698,8 +704,8 @@ static void /* __init */ xstate_enable_boot_cpu(void)
}
/*
- * For the very first instance, this calls xstate_enable_boot_cpu();
- * for all subsequent instances, this calls xstate_enable().
+ * For the very first instance, this calls fpu__init_system_xstate();
+ * for all subsequent instances, this calls fpu__init_cpu_xstate().
*/
void xsave_init(void)
{
@@ -715,9 +721,9 @@ void xsave_init(void)
if (on_boot_cpu) {
on_boot_cpu = 0;
- xstate_enable_boot_cpu();
+ fpu__init_system_xstate();
} else {
- xstate_enable();
+ fpu__init_cpu_xstate();
}
}
--
2.1.0
fpu__init_system_xstate() does an FPU capability check that is better
done in fpu__init_cpu_xstate(). This will allow us to call
fpu__init_cpu_xstate() directly on legacy CPUs as well.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/xsave.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index 961c25850c7f..0610f431f77f 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -465,7 +465,7 @@ static void prepare_fx_sw_frame(void)
*/
void fpu__init_cpu_xstate(void)
{
- if (!xfeatures_mask)
+ if (!cpu_has_xsave || !xfeatures_mask)
return;
cr4_set_bits(X86_CR4_OSXSAVE);
--
2.1.0
Now that legacy code can execute fpu__init_cpu_xstate() in
xsave_init(), we can move the once per boot legacy check into
fpu__init_system_xstate(), where it belongs.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/xsave.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index 0610f431f77f..fd656cbdd315 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -653,6 +653,11 @@ void fpu__init_system_xstate(void)
{
unsigned int eax, ebx, ecx, edx;
+ if (!cpu_has_xsave) {
+ pr_info("x86/fpu: Legacy x87 FPU detected.\n");
+ return;
+ }
+
if (boot_cpu_data.cpuid_level < XSTATE_CPUID) {
WARN(1, "x86/fpu: XSTATE_CPUID missing!\n");
return;
@@ -711,14 +716,6 @@ void xsave_init(void)
{
static char on_boot_cpu = 1;
- if (!cpu_has_xsave) {
- if (on_boot_cpu) {
- on_boot_cpu = 0;
- pr_info("x86/fpu: Legacy x87 FPU detected.\n");
- }
- return;
- }
-
if (on_boot_cpu) {
on_boot_cpu = 0;
fpu__init_system_xstate();
--
2.1.0
Linearize the call sequence in xsave_init():
fpu__init_system_xstate();
fpu__init_cpu_xstate();
We do this by propagating the boot-once quirk into
fpu__init_system_xstate(). fpu__init_cpu_xstate() is
safe to be called multiple times.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/xsave.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index fd656cbdd315..9d5ff90916b1 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -652,6 +652,11 @@ static void __init init_xstate_size(void)
void fpu__init_system_xstate(void)
{
unsigned int eax, ebx, ecx, edx;
+ static bool on_boot_cpu = 1;
+
+ if (!on_boot_cpu)
+ return;
+ on_boot_cpu = 0;
if (!cpu_has_xsave) {
pr_info("x86/fpu: Legacy x87 FPU detected.\n");
@@ -714,14 +719,8 @@ void fpu__init_system_xstate(void)
*/
void xsave_init(void)
{
- static char on_boot_cpu = 1;
-
- if (on_boot_cpu) {
- on_boot_cpu = 0;
- fpu__init_system_xstate();
- } else {
- fpu__init_cpu_xstate();
- }
+ fpu__init_system_xstate();
+ fpu__init_cpu_xstate();
}
/*
--
2.1.0
Expand the xsave_init() calls into fpu__init_system_xstate() and
fpu__init_cpu_xstate() calls.
(This will allow us to call the proper versions in higher level FPU init code
later on.)
No change in functionality.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/xsave.h | 1 -
arch/x86/kernel/fpu/init.c | 8 +++++---
arch/x86/kernel/fpu/xsave.c | 13 +------------
3 files changed, 6 insertions(+), 16 deletions(-)
diff --git a/arch/x86/include/asm/fpu/xsave.h b/arch/x86/include/asm/fpu/xsave.h
index 5c3ab4e17aea..a10e66582c1b 100644
--- a/arch/x86/include/asm/fpu/xsave.h
+++ b/arch/x86/include/asm/fpu/xsave.h
@@ -52,7 +52,6 @@ extern u64 xfeatures_mask;
extern u64 xstate_fx_sw_bytes[USER_XSTATE_FX_SW_WORDS];
extern struct xsave_struct init_xstate_ctx;
-extern void xsave_init(void);
extern void update_regset_xstate_info(unsigned int size, u64 xstate_mask);
/* These macros all use (%edi)/(%rdi) as the single memory argument. */
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index d6234adc8ba0..77599fe8af56 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -98,7 +98,7 @@ static void fpstate_xstate_init_size(void)
{
/*
* Note that xstate_size might be overwriten later during
- * xsave_init().
+ * fpu__init_system_xstate().
*/
if (!cpu_has_fpu) {
@@ -146,7 +146,8 @@ void fpu__init_cpu(void)
cr0 |= X86_CR0_EM;
write_cr0(cr0);
- xsave_init();
+ fpu__init_system_xstate();
+ fpu__init_cpu_xstate();
}
/*
@@ -159,7 +160,8 @@ void fpu__init_system(void)
fpu__init_cpu();
mxcsr_feature_mask_init();
- xsave_init();
+ fpu__init_system_xstate();
+ fpu__init_cpu_xstate();
eager_fpu_init();
}
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index 9d5ff90916b1..fa9b954eb23a 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -646,8 +646,7 @@ static void __init init_xstate_size(void)
* Enable and initialize the xsave feature.
* Called once per system bootup.
*
- * ( Not marked __init because of false positive section warnings
- * generated by xsave_init(). )
+ * ( Not marked __init because of false positive section warnings. )
*/
void fpu__init_system_xstate(void)
{
@@ -714,16 +713,6 @@ void fpu__init_system_xstate(void)
}
/*
- * For the very first instance, this calls fpu__init_system_xstate();
- * for all subsequent instances, this calls fpu__init_cpu_xstate().
- */
-void xsave_init(void)
-{
- fpu__init_system_xstate();
- fpu__init_cpu_xstate();
-}
-
-/*
* setup_init_fpu_buf() is __init and it is OK to call it here because
* init_xstate_ctx will be unset only once during boot.
*/
--
2.1.0
Only call xstate system setup routines from fpu__init_system().
Likewise, don't call fpu__init_cpu_xstate() from fpu__init_system().
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 77599fe8af56..c1b2d1cfe745 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -146,7 +146,6 @@ void fpu__init_cpu(void)
cr0 |= X86_CR0_EM;
write_cr0(cr0);
- fpu__init_system_xstate();
fpu__init_cpu_xstate();
}
@@ -161,7 +160,6 @@ void fpu__init_system(void)
mxcsr_feature_mask_init();
fpu__init_system_xstate();
- fpu__init_cpu_xstate();
eager_fpu_init();
}
--
2.1.0
The legacy FPU init image is used on older CPUs which don't run the xstate init sequence.
But the init code is called within setup_init_fpu_buf(), an xstate method.
Move this legacy init out of the xstate code and put it into fpu/init.c.
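( For reference, fx_finit() at this stage of the series looks roughly
  like this - quoted from memory, so treat the details as approximate:

	static inline void fx_finit(struct i387_fxsave_struct *fx)
	{
		memset(fx, 0, xstate_size);
		fx->cwd = 0x37f;		/* x87 CW: all exceptions masked */
		fx->mxcsr = MXCSR_DEFAULT;	/* 0x1f80 */
	}
)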
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 6 ++++++
arch/x86/kernel/fpu/xsave.c | 6 ------
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index c1b2d1cfe745..30d2d5d03cb0 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -158,6 +158,12 @@ void fpu__init_system(void)
/* The FPU has to be operational for some of the later FPU init activities: */
fpu__init_cpu();
+ /*
+ * Set up the legacy init FPU context. (xstate init might overwrite this
+ * with a more modern format, if the CPU supports it.)
+ */
+ fx_finit(&init_xstate_ctx.i387);
+
mxcsr_feature_mask_init();
fpu__init_system_xstate();
eager_fpu_init();
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index fa9b954eb23a..6be0a98238f6 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -576,12 +576,6 @@ static void setup_init_fpu_buf(void)
return;
on_boot_cpu = 0;
- /*
- * Setup init_xstate_ctx to represent the init state of
- * all the features managed by the xsave
- */
- fx_finit(&init_xstate_ctx.i387);
-
if (!cpu_has_xsave)
return;
--
2.1.0
setup_init_fpu_buf() is a pure xstate method now, so there's no need for this duplicate call in eager_fpu_init().
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/xsave.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index 6be0a98238f6..097f03e209a6 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -724,8 +724,6 @@ void __init_refok eager_fpu_init(void)
stts();
return;
}
-
- setup_init_fpu_buf();
}
/*
--
2.1.0
The FPU context switch type (lazy or eager) setup code is split into
two places currently - move it all to eager_fpu_init().
Note that the code we move will now be executed on non-xstate CPUs
as well, but this should be safe: both xfeatures_mask and
cpu_has_xsaveopt are 0 there.
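( Spelling the safety argument out - my annotation: on a non-xstate CPU
  the moved block reduces to:

	if (cpu_has_xsaveopt && eagerfpu != DISABLE)	/* false: cpu_has_xsaveopt == 0 */
		...
	if (xfeatures_mask & XSTATE_EAGER)		/* false: xfeatures_mask == 0 */
		...

  i.e. both branches are dead and eagerfpu keeps its boot-parameter value. )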
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/xsave.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index 097f03e209a6..1b920a170576 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -686,20 +686,6 @@ void fpu__init_system_xstate(void)
prepare_fx_sw_frame();
setup_init_fpu_buf();
- /* Auto enable eagerfpu for xsaveopt */
- if (cpu_has_xsaveopt && eagerfpu != DISABLE)
- eagerfpu = ENABLE;
-
- if (xfeatures_mask & XSTATE_EAGER) {
- if (eagerfpu == DISABLE) {
- pr_err("x86/fpu: eagerfpu switching disabled, disabling the following xstate features: 0x%llx.\n",
- xfeatures_mask & XSTATE_EAGER);
- xfeatures_mask &= ~XSTATE_EAGER;
- } else {
- eagerfpu = ENABLE;
- }
- }
-
pr_info("x86/fpu: Enabled xstate features 0x%llx, context size is 0x%x bytes, using '%s' format.\n",
xfeatures_mask,
xstate_size,
@@ -715,6 +701,20 @@ void __init_refok eager_fpu_init(void)
WARN_ON(current->thread.fpu.fpstate_active);
current_thread_info()->status = 0;
+ /* Auto enable eagerfpu for xsaveopt */
+ if (cpu_has_xsaveopt && eagerfpu != DISABLE)
+ eagerfpu = ENABLE;
+
+ if (xfeatures_mask & XSTATE_EAGER) {
+ if (eagerfpu == DISABLE) {
+ pr_err("x86/fpu: eagerfpu switching disabled, disabling the following xstate features: 0x%llx.\n",
+ xfeatures_mask & XSTATE_EAGER);
+ xfeatures_mask &= ~XSTATE_EAGER;
+ } else {
+ eagerfpu = ENABLE;
+ }
+ }
+
if (eagerfpu == ENABLE)
setup_force_cpu_cap(X86_FEATURE_EAGER_FPU);
--
2.1.0
Move eager_fpu_init() and the 'eagerfpu' boot parameter handling function
to the generic FPU init file: it's generic FPU functionality.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/fpu/xsave.c | 48 ------------------------------------------------
2 files changed, 48 insertions(+), 48 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 30d2d5d03cb0..fa9678f13630 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -149,6 +149,54 @@ void fpu__init_cpu(void)
fpu__init_cpu_xstate();
}
+static enum { AUTO, ENABLE, DISABLE } eagerfpu = AUTO;
+
+static int __init eager_fpu_setup(char *s)
+{
+ if (!strcmp(s, "on"))
+ eagerfpu = ENABLE;
+ else if (!strcmp(s, "off"))
+ eagerfpu = DISABLE;
+ else if (!strcmp(s, "auto"))
+ eagerfpu = AUTO;
+ return 1;
+}
+__setup("eagerfpu=", eager_fpu_setup);
+
+/*
+ * setup_init_fpu_buf() is __init and it is OK to call it here because
+ * init_xstate_ctx will be unset only once during boot.
+ */
+void __init_refok eager_fpu_init(void)
+{
+ WARN_ON(current->thread.fpu.fpstate_active);
+ current_thread_info()->status = 0;
+
+ /* Auto enable eagerfpu for xsaveopt */
+ if (cpu_has_xsaveopt && eagerfpu != DISABLE)
+ eagerfpu = ENABLE;
+
+ if (xfeatures_mask & XSTATE_EAGER) {
+ if (eagerfpu == DISABLE) {
+ pr_err("x86/fpu: eagerfpu switching disabled, disabling the following xstate features: 0x%llx.\n",
+ xfeatures_mask & XSTATE_EAGER);
+ xfeatures_mask &= ~XSTATE_EAGER;
+ } else {
+ eagerfpu = ENABLE;
+ }
+ }
+
+ if (eagerfpu == ENABLE)
+ setup_force_cpu_cap(X86_FEATURE_EAGER_FPU);
+
+ printk_once(KERN_INFO "x86/fpu: Using '%s' FPU context switches.\n", eagerfpu == ENABLE ? "eager" : "lazy");
+
+ if (!cpu_has_eager_fpu) {
+ stts();
+ return;
+ }
+}
+
/*
* Called on the boot CPU once per system bootup, to set up the initial FPU state that
* is later cloned into all processes.
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index 1b920a170576..a23236358fb0 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -599,20 +599,6 @@ static void setup_init_fpu_buf(void)
xsave_state_booting(&init_xstate_ctx);
}
-static enum { AUTO, ENABLE, DISABLE } eagerfpu = AUTO;
-static int __init eager_fpu_setup(char *s)
-{
- if (!strcmp(s, "on"))
- eagerfpu = ENABLE;
- else if (!strcmp(s, "off"))
- eagerfpu = DISABLE;
- else if (!strcmp(s, "auto"))
- eagerfpu = AUTO;
- return 1;
-}
-__setup("eagerfpu=", eager_fpu_setup);
-
-
/*
* Calculate total size of enabled xstates in XCR0/xfeatures_mask.
*/
@@ -693,40 +679,6 @@ void fpu__init_system_xstate(void)
}
/*
- * setup_init_fpu_buf() is __init and it is OK to call it here because
- * init_xstate_ctx will be unset only once during boot.
- */
-void __init_refok eager_fpu_init(void)
-{
- WARN_ON(current->thread.fpu.fpstate_active);
- current_thread_info()->status = 0;
-
- /* Auto enable eagerfpu for xsaveopt */
- if (cpu_has_xsaveopt && eagerfpu != DISABLE)
- eagerfpu = ENABLE;
-
- if (xfeatures_mask & XSTATE_EAGER) {
- if (eagerfpu == DISABLE) {
- pr_err("x86/fpu: eagerfpu switching disabled, disabling the following xstate features: 0x%llx.\n",
- xfeatures_mask & XSTATE_EAGER);
- xfeatures_mask &= ~XSTATE_EAGER;
- } else {
- eagerfpu = ENABLE;
- }
- }
-
- if (eagerfpu == ENABLE)
- setup_force_cpu_cap(X86_FEATURE_EAGER_FPU);
-
- printk_once(KERN_INFO "x86/fpu: Using '%s' FPU context switches.\n", eagerfpu == ENABLE ? "eager" : "lazy");
-
- if (!cpu_has_eager_fpu) {
- stts();
- return;
- }
-}
-
-/*
* Restore minimal FPU state after suspend:
*/
void fpu__resume_cpu(void)
--
2.1.0
It's not an xsave specific function anymore, so rename it accordingly
and also clean it up a bit:
- remove the obsolete __init_refok, as the code paths are not
mixed anymore
- rename it from eager_fpu_init() to fpu__ctx_switch_init()
- remove stray 'return;'
- make it static to its only user
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index fa9678f13630..d6d582080c3b 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -167,7 +167,7 @@ __setup("eagerfpu=", eager_fpu_setup);
* setup_init_fpu_buf() is __init and it is OK to call it here because
* init_xstate_ctx will be unset only once during boot.
*/
-void __init_refok eager_fpu_init(void)
+static void fpu__ctx_switch_init(void)
{
WARN_ON(current->thread.fpu.fpstate_active);
current_thread_info()->status = 0;
@@ -191,10 +191,8 @@ void __init_refok eager_fpu_init(void)
printk_once(KERN_INFO "x86/fpu: Using '%s' FPU context switches.\n", eagerfpu == ENABLE ? "eager" : "lazy");
- if (!cpu_has_eager_fpu) {
+ if (!cpu_has_eager_fpu)
stts();
- return;
- }
}
/*
@@ -214,7 +212,7 @@ void fpu__init_system(void)
mxcsr_feature_mask_init();
fpu__init_system_xstate();
- eager_fpu_init();
+ fpu__ctx_switch_init();
}
void fpu__cpu_init(void)
--
2.1.0
So fpu__ctx_switch_init() has two aspects: a once per bootup part
that sets up a capability flag, and a per CPU part that sets CR0::TS.
Split the function.
Note that at this stage we still have duplicate calls into these methods, as
both the _system() and the _cpu() methods are run on all CPUs, with lower
level on_boot_cpu flags filtering out the duplicates where needed. So add
TS flag clearing as well, to handle the aftermath of early CPU init sequences
that might call in without having eager-fpu set - don't assume the TS flag
is cleared.
Calling each from its respective init level will happen later on.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index d6d582080c3b..2752b4bae854 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -118,6 +118,18 @@ static void fpstate_xstate_init_size(void)
}
/*
+ * Initialize the TS bit in CR0 according to the style of context-switches
+ * we are using:
+ */
+static void fpu__init_cpu_ctx_switch(void)
+{
+ if (!cpu_has_eager_fpu)
+ stts();
+ else
+ clts();
+}
+
+/*
* Enable all supported FPU features. Called when a CPU is brought online.
*/
void fpu__init_cpu(void)
@@ -167,7 +179,7 @@ __setup("eagerfpu=", eager_fpu_setup);
* setup_init_fpu_buf() is __init and it is OK to call it here because
* init_xstate_ctx will be unset only once during boot.
*/
-static void fpu__ctx_switch_init(void)
+static void fpu__init_system_ctx_switch(void)
{
WARN_ON(current->thread.fpu.fpstate_active);
current_thread_info()->status = 0;
@@ -190,9 +202,6 @@ static void fpu__ctx_switch_init(void)
setup_force_cpu_cap(X86_FEATURE_EAGER_FPU);
printk_once(KERN_INFO "x86/fpu: Using '%s' FPU context switches.\n", eagerfpu == ENABLE ? "eager" : "lazy");
-
- if (!cpu_has_eager_fpu)
- stts();
}
/*
@@ -212,7 +221,8 @@ void fpu__init_system(void)
mxcsr_feature_mask_init();
fpu__init_system_xstate();
- fpu__ctx_switch_init();
+ fpu__init_system_ctx_switch();
+ fpu__init_cpu_ctx_switch();
}
void fpu__cpu_init(void)
--
2.1.0
mxcsr_feature_mask_init() depends on TS being cleared, as it executes
an FXSAVE instruction.
After later changes we will move the TS setup into fpu__init_cpu(),
which will interact with this - so clear the TS flag explicitly.
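In other words, the ordering constraint is roughly this (sketch, mirroring
the hunk below):

    clts();                         /* CR0::TS must be clear ... */
    mxcsr_feature_mask_init();      /* ... because this executes FXSAVE, which traps on a set TS */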
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 2752b4bae854..567e7e6cdc6b 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -214,6 +214,13 @@ void fpu__init_system(void)
fpu__init_cpu();
/*
+ * But don't leave CR0::TS set yet, as some of the FPU setup methods depend
+ * on being able to execute FPU instructions that will fault on a set TS,
+ * such as the FXSAVE in mxcsr_feature_mask_init().
+ */
+ clts();
+
+ /*
* Set up the legacy init FPU context. (xstate init might overwrite this
* with a more modern format, if the CPU supports it.)
*/
--
2.1.0
The fpstate_xstate_init_size() function sets up a basic xstate_size; it is
currently called during fpu__detect().
Its real dependency is to be called before fpu__init_system_xstate().
So move the function call site into fpu__init_system(), to right before the
fpu__init_system_xstate() call.
Also add a once-per-boot flag to fpstate_xstate_init_size(); we'll remove
this quirk later, once we've cleaned up the init dependencies.
This moves the two related functions closer to each other and makes them
both part of the _init_system() functionality.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 567e7e6cdc6b..ca3468d8bc31 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -96,6 +96,12 @@ static void mxcsr_feature_mask_init(void)
static void fpstate_xstate_init_size(void)
{
+ static bool on_boot_cpu = 1;
+
+ if (!on_boot_cpu)
+ return;
+ on_boot_cpu = 0;
+
/*
* Note that xstate_size might be overwriten later during
* fpu__init_system_xstate().
@@ -227,7 +233,10 @@ void fpu__init_system(void)
fx_finit(&init_xstate_ctx.i387);
mxcsr_feature_mask_init();
+
+ fpstate_xstate_init_size();
fpu__init_system_xstate();
+
fpu__init_system_ctx_switch();
fpu__init_cpu_ctx_switch();
}
@@ -270,6 +279,4 @@ void fpu__detect(struct cpuinfo_x86 *c)
clear_cpu_cap(c, X86_FEATURE_FPU);
/* The final cr0 value is set later, in fpu_init() */
-
- fpstate_xstate_init_size();
}
--
2.1.0
fpu__init_cpu_ctx_switch() is currently called from fpu__init_system(),
which is the wrong place for it: call it from the proper high level
per-CPU init function, fpu__init_cpu().
Note that we still keep the old call site as well, because it depends
on proper CR0::TS setup. We'll fix this in the next patch.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index ca3468d8bc31..b3ea4f86d643 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -165,6 +165,7 @@ void fpu__init_cpu(void)
write_cr0(cr0);
fpu__init_cpu_xstate();
+ fpu__init_cpu_ctx_switch();
}
static enum { AUTO, ENABLE, DISABLE } eagerfpu = AUTO;
--
2.1.0
fpu__cpu_init() is called on every CPU, so it is the wrong place
to call fpu__init_system() from. Call it from fpu__detect():
this is early CPU init code, but we already have CPU features detected,
so we can call the system-wide FPU init code from here.
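The resulting init flow, roughly:

    fpu__detect()               /* early boot CPU init, CPU features already known */
        fpu__init_system()      /* once per bootup */

    fpu__cpu_init()             /* runs on every CPU */
        fpu__init_cpu()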
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index b3ea4f86d643..6e422cf1e197 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -245,7 +245,6 @@ void fpu__init_system(void)
void fpu__cpu_init(void)
{
fpu__init_cpu();
- fpu__init_system();
}
static int __init no_387(char *s)
@@ -279,5 +278,6 @@ void fpu__detect(struct cpuinfo_x86 *c)
else
clear_cpu_cap(c, X86_FEATURE_FPU);
+ fpu__init_system();
/* The final cr0 value is set later, in fpu_init() */
}
--
2.1.0
We are now doing the fpu__init_cpu_ctx_switch() call from fpu__init_cpu(),
so there's no need to call it from fpu__init_system() anymore.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 6e422cf1e197..0c9c1069fba8 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -239,7 +239,6 @@ void fpu__init_system(void)
fpu__init_system_xstate();
fpu__init_system_ctx_switch();
- fpu__init_cpu_ctx_switch();
}
void fpu__cpu_init(void)
--
2.1.0
After the latest round of cleanups, fpu__cpu_init() has become
a simple call to fpu__init_cpu().
Remove this extra layer and switch all call sites over to
fpu__init_cpu() directly.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 2 +-
arch/x86/kernel/cpu/common.c | 4 ++--
arch/x86/kernel/fpu/init.c | 5 -----
arch/x86/xen/enlighten.c | 2 +-
4 files changed, 4 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 71d44be5acb1..4617eeb57004 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -36,7 +36,7 @@ int ia32_setup_frame(int sig, struct ksignal *ksig,
#define MXCSR_DEFAULT 0x1f80
extern unsigned int mxcsr_feature_mask;
-extern void fpu__cpu_init(void);
+extern void fpu__init_cpu(void);
extern void eager_fpu_init(void);
extern void fpu__init_system_xstate(void);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8f6a4ea39657..d28f8ebc506d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1435,7 +1435,7 @@ void cpu_init(void)
clear_all_debug_regs();
dbg_restore_debug_regs();
- fpu__cpu_init();
+ fpu__init_cpu();
if (is_uv_system())
uv_cpu_init();
@@ -1491,7 +1491,7 @@ void cpu_init(void)
clear_all_debug_regs();
dbg_restore_debug_regs();
- fpu__cpu_init();
+ fpu__init_cpu();
}
#endif
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 0c9c1069fba8..cf27bbed1ba1 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -241,11 +241,6 @@ void fpu__init_system(void)
fpu__init_system_ctx_switch();
}
-void fpu__cpu_init(void)
-{
- fpu__init_cpu();
-}
-
static int __init no_387(char *s)
{
setup_clear_cpu_cap(X86_FEATURE_FPU);
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 64715168b2b6..de3a669190d1 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1423,7 +1423,7 @@ static void xen_pvh_set_cr_flags(int cpu)
return;
/*
* For BSP, PSE PGE are set in probe_page_size_mask(), for APs
- * set them here. For all, OSFXSR OSXMMEXCPT are set in fpu__cpu_init().
+ * set them here. For all, OSFXSR OSXMMEXCPT are set in fpu__init_cpu().
*/
if (cpu_has_pse)
cr4_set_bits_and_update_boot(X86_CR4_PSE);
--
2.1.0
Factor out the generic bits from fpu__init_cpu(), to create
a flat sequence of per CPU initialization function calls:
fpu__init_cpu_generic();
fpu__init_cpu_xstate();
fpu__init_cpu_ctx_switch();
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index cf27bbed1ba1..37e8b139dc31 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -136,9 +136,9 @@ static void fpu__init_cpu_ctx_switch(void)
}
/*
- * Enable all supported FPU features. Called when a CPU is brought online.
+ * Initialize the registers found in all CPUs, CR0 and CR4:
*/
-void fpu__init_cpu(void)
+static void fpu__init_cpu_generic(void)
{
unsigned long cr0;
unsigned long cr4_mask = 0;
@@ -163,7 +163,14 @@ void fpu__init_cpu(void)
if (!cpu_has_fpu)
cr0 |= X86_CR0_EM;
write_cr0(cr0);
+}
+/*
+ * Enable all supported FPU features. Called when a CPU is brought online.
+ */
+void fpu__init_cpu(void)
+{
+ fpu__init_cpu_generic();
fpu__init_cpu_xstate();
fpu__init_cpu_ctx_switch();
}
--
2.1.0
Factor out the generic bits from fpu__init_system().
Rename mxcsr_feature_mask_init() to fpu__init_system_mxcsr()
to bring it in line with the rest of the nomenclature.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 37e8b139dc31..c3f3a89cbbf6 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -72,7 +72,7 @@ unsigned int mxcsr_feature_mask __read_mostly = 0xffffffffu;
unsigned int xstate_size;
EXPORT_SYMBOL_GPL(xstate_size);
-static void mxcsr_feature_mask_init(void)
+static void fpu__init_system_mxcsr(void)
{
unsigned int mask = 0;
@@ -94,6 +94,20 @@ static void mxcsr_feature_mask_init(void)
mxcsr_feature_mask &= mask;
}
+/*
+ * Once per bootup FPU initialization sequences that will run on most x86 CPUs:
+ */
+static void fpu__init_system_generic(void)
+{
+ /*
+ * Set up the legacy init FPU context. (xstate init might overwrite this
+ * with a more modern format, if the CPU supports it.)
+ */
+ fx_finit(&init_xstate_ctx.i387);
+
+ fpu__init_system_mxcsr();
+}
+
static void fpstate_xstate_init_size(void)
{
static bool on_boot_cpu = 1;
@@ -230,18 +244,11 @@ void fpu__init_system(void)
/*
* But don't leave CR0::TS set yet, as some of the FPU setup methods depend
* on being able to execute FPU instructions that will fault on a set TS,
- * such as the FXSAVE in mxcsr_feature_mask_init().
+ * such as the FXSAVE in fpu__init_system_mxcsr().
*/
clts();
- /*
- * Set up the legacy init FPU context. (xstate init might overwrite this
- * with a more modern format, if the CPU supports it.)
- */
- fx_finit(&init_xstate_ctx.i387);
-
- mxcsr_feature_mask_init();
-
+ fpu__init_system_generic();
fpstate_xstate_init_size();
fpu__init_system_xstate();
--
2.1.0
Move the generic bits of fpu__detect() into fpu__init_system_early_generic().
We'll move some other code here too in a followup patch.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 41 ++++++++++++++++++++++++-----------------
1 file changed, 24 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index c3f3a89cbbf6..3637c509956d 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -65,6 +65,29 @@ void fpu__init_check_bugs(void)
}
/*
+ * The earliest FPU detection code:
+ */
+static void fpu__init_system_early_generic(struct cpuinfo_x86 *c)
+{
+ unsigned long cr0;
+ u16 fsw, fcw;
+
+ fsw = fcw = 0xffff;
+
+ cr0 = read_cr0();
+ cr0 &= ~(X86_CR0_TS | X86_CR0_EM);
+ write_cr0(cr0);
+
+ asm volatile("fninit ; fnstsw %0 ; fnstcw %1"
+ : "+m" (fsw), "+m" (fcw));
+
+ if (fsw == 0 && (fcw & 0x103f) == 0x003f)
+ set_cpu_cap(c, X86_FEATURE_FPU);
+ else
+ clear_cpu_cap(c, X86_FEATURE_FPU);
+}
+
+/*
* Boot time FPU feature detection code:
*/
unsigned int mxcsr_feature_mask __read_mostly = 0xffffffffu;
@@ -269,23 +292,7 @@ __setup("no387", no_387);
*/
void fpu__detect(struct cpuinfo_x86 *c)
{
- unsigned long cr0;
- u16 fsw, fcw;
-
- fsw = fcw = 0xffff;
-
- cr0 = read_cr0();
- cr0 &= ~(X86_CR0_TS | X86_CR0_EM);
- write_cr0(cr0);
-
- asm volatile("fninit ; fnstsw %0 ; fnstcw %1"
- : "+m" (fsw), "+m" (fcw));
-
- if (fsw == 0 && (fcw & 0x103f) == 0x003f)
- set_cpu_cap(c, X86_FEATURE_FPU);
- else
- clear_cpu_cap(c, X86_FEATURE_FPU);
-
+ fpu__init_system_early_generic(c);
fpu__init_system();
/* The final cr0 value is set later, in fpu_init() */
}
--
2.1.0
There's a !FPU-related sanity check in fpu__init_cpu_generic(),
which is executed on every CPU onlining - even though we should do
this only once, during system init.
Move this check to fpu__init_system_early_generic().
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 3637c509956d..69cdadd49ddf 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -85,6 +85,15 @@ static void fpu__init_system_early_generic(struct cpuinfo_x86 *c)
set_cpu_cap(c, X86_FEATURE_FPU);
else
clear_cpu_cap(c, X86_FEATURE_FPU);
+
+#ifndef CONFIG_MATH_EMULATION
+ if (!cpu_has_fpu) {
+ pr_emerg("No FPU found and no math emulation present\n");
+ pr_emerg("Giving up\n");
+ for (;;)
+ asm volatile("hlt");
+ }
+#endif
}
/*
@@ -180,14 +189,6 @@ static void fpu__init_cpu_generic(void)
unsigned long cr0;
unsigned long cr4_mask = 0;
-#ifndef CONFIG_MATH_EMULATION
- if (!cpu_has_fpu) {
- pr_emerg("No FPU found and no math emulation present\n");
- pr_emerg("Giving up\n");
- for (;;)
- asm volatile("hlt");
- }
-#endif
if (cpu_has_fxsr)
cr4_mask |= X86_CR4_OSFXSR;
if (cpu_has_xmm)
--
2.1.0
Create a separate fpu/bugs.c file, so that when reading generic FPU code
we don't have to wade through all the bugcheck-related code first.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/Makefile | 2 +-
arch/x86/kernel/fpu/bugs.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
arch/x86/kernel/fpu/init.c | 60 ------------------------------------------------------------
3 files changed, 65 insertions(+), 61 deletions(-)
diff --git a/arch/x86/kernel/fpu/Makefile b/arch/x86/kernel/fpu/Makefile
index 50464a716b87..2020a2b7a597 100644
--- a/arch/x86/kernel/fpu/Makefile
+++ b/arch/x86/kernel/fpu/Makefile
@@ -2,4 +2,4 @@
# Build rules for the FPU support code:
#
-obj-y += init.o core.o xsave.o
+obj-y += init.o bugs.o core.o xsave.o
diff --git a/arch/x86/kernel/fpu/bugs.c b/arch/x86/kernel/fpu/bugs.c
new file mode 100644
index 000000000000..400a3d713fb2
--- /dev/null
+++ b/arch/x86/kernel/fpu/bugs.c
@@ -0,0 +1,64 @@
+/*
+ * x86 FPU bug checks:
+ */
+#include <asm/fpu/internal.h>
+
+/*
+ * Boot time CPU/FPU FDIV bug detection code:
+ */
+
+static double __initdata x = 4195835.0;
+static double __initdata y = 3145727.0;
+
+/*
+ * This used to check for exceptions..
+ * However, it turns out that to support that,
+ * the XMM trap handlers basically had to
+ * be buggy. So let's have a correct XMM trap
+ * handler, and forget about printing out
+ * some status at boot.
+ *
+ * We should really only care about bugs here
+ * anyway. Not features.
+ */
+static void __init check_fpu(void)
+{
+ s32 fdiv_bug;
+
+ kernel_fpu_begin();
+
+ /*
+ * trap_init() enabled FXSR and company _before_ testing for FP
+ * problems here.
+ *
+ * Test for the divl bug: http://en.wikipedia.org/wiki/Fdiv_bug
+ */
+ __asm__("fninit\n\t"
+ "fldl %1\n\t"
+ "fdivl %2\n\t"
+ "fmull %2\n\t"
+ "fldl %1\n\t"
+ "fsubp %%st,%%st(1)\n\t"
+ "fistpl %0\n\t"
+ "fwait\n\t"
+ "fninit"
+ : "=m" (*&fdiv_bug)
+ : "m" (*&x), "m" (*&y));
+
+ kernel_fpu_end();
+
+ if (fdiv_bug) {
+ set_cpu_bug(&boot_cpu_data, X86_BUG_FDIV);
+ pr_warn("Hmm, FPU with FDIV bug\n");
+ }
+}
+
+void fpu__init_check_bugs(void)
+{
+ /*
+ * kernel_fpu_begin/end() in check_fpu() relies on the patched
+ * alternative instructions.
+ */
+ if (cpu_has_fpu)
+ check_fpu();
+}
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 69cdadd49ddf..63cd1703d25c 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -5,66 +5,6 @@
#include <asm/tlbflush.h>
/*
- * Boot time CPU/FPU FDIV bug detection code:
- */
-
-static double __initdata x = 4195835.0;
-static double __initdata y = 3145727.0;
-
-/*
- * This used to check for exceptions..
- * However, it turns out that to support that,
- * the XMM trap handlers basically had to
- * be buggy. So let's have a correct XMM trap
- * handler, and forget about printing out
- * some status at boot.
- *
- * We should really only care about bugs here
- * anyway. Not features.
- */
-static void __init check_fpu(void)
-{
- s32 fdiv_bug;
-
- kernel_fpu_begin();
-
- /*
- * trap_init() enabled FXSR and company _before_ testing for FP
- * problems here.
- *
- * Test for the divl bug: http://en.wikipedia.org/wiki/Fdiv_bug
- */
- __asm__("fninit\n\t"
- "fldl %1\n\t"
- "fdivl %2\n\t"
- "fmull %2\n\t"
- "fldl %1\n\t"
- "fsubp %%st,%%st(1)\n\t"
- "fistpl %0\n\t"
- "fwait\n\t"
- "fninit"
- : "=m" (*&fdiv_bug)
- : "m" (*&x), "m" (*&y));
-
- kernel_fpu_end();
-
- if (fdiv_bug) {
- set_cpu_bug(&boot_cpu_data, X86_BUG_FDIV);
- pr_warn("Hmm, FPU with FDIV bug\n");
- }
-}
-
-void fpu__init_check_bugs(void)
-{
- /*
- * kernel_fpu_begin/end() in check_fpu() relies on the patched
- * alternative instructions.
- */
- if (cpu_has_fpu)
- check_fpu();
-}
-
-/*
* The earliest FPU detection code:
*/
static void fpu__init_system_early_generic(struct cpuinfo_x86 *c)
--
2.1.0
check_fpu() currently relies on being called early in the init sequence,
when CR0::TS has not been set up yet.
Save/restore CR0::TS across this function, to make it invariant to
init ordering. This way we'll be able to move the generic FPU setup
routines earlier in the init sequence.
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/bugs.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/kernel/fpu/bugs.c b/arch/x86/kernel/fpu/bugs.c
index 400a3d713fb2..449b5f3f4925 100644
--- a/arch/x86/kernel/fpu/bugs.c
+++ b/arch/x86/kernel/fpu/bugs.c
@@ -23,8 +23,13 @@ static double __initdata y = 3145727.0;
*/
static void __init check_fpu(void)
{
+ u32 cr0_saved;
s32 fdiv_bug;
+ /* We might have CR0::TS set already, clear it: */
+ cr0_saved = read_cr0();
+ write_cr0(cr0_saved & ~X86_CR0_TS);
+
kernel_fpu_begin();
/*
@@ -47,6 +52,8 @@ static void __init check_fpu(void)
kernel_fpu_end();
+ write_cr0(cr0_saved);
+
if (fdiv_bug) {
set_cpu_bug(&boot_cpu_data, X86_BUG_FDIV);
pr_warn("Hmm, FPU with FDIV bug\n");
--
2.1.0
Move the fpu__init_system_early_generic() call into fpu__init_system(),
which hosts all the system init calls.
Expose fpu__init_system() to other modules - this will be our main and only
system init function.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 1 +
arch/x86/kernel/fpu/init.c | 16 ++++++++--------
2 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 4617eeb57004..5a1fa5bc2c27 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -41,6 +41,7 @@ extern void eager_fpu_init(void);
extern void fpu__init_system_xstate(void);
extern void fpu__init_cpu_xstate(void);
+extern void fpu__init_system(struct cpuinfo_x86 *c);
DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 63cd1703d25c..1155a98d8c1e 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -5,7 +5,10 @@
#include <asm/tlbflush.h>
/*
- * The earliest FPU detection code:
+ * The earliest FPU detection code.
+ *
+ * Set the X86_FEATURE_FPU CPU-capability bit based on
+ * trying to execute an actual sequence of FPU instructions:
*/
static void fpu__init_system_early_generic(struct cpuinfo_x86 *c)
{
@@ -200,8 +203,10 @@ static void fpu__init_system_ctx_switch(void)
* Called on the boot CPU once per system bootup, to set up the initial FPU state that
* is later cloned into all processes.
*/
-void fpu__init_system(void)
+void fpu__init_system(struct cpuinfo_x86 *c)
{
+ fpu__init_system_early_generic(c);
+
/* The FPU has to be operational for some of the later FPU init activities: */
fpu__init_cpu();
@@ -227,13 +232,8 @@ static int __init no_387(char *s)
__setup("no387", no_387);
-/*
- * Set the X86_FEATURE_FPU CPU-capability bit based on
- * trying to execute an actual sequence of FPU instructions:
- */
void fpu__detect(struct cpuinfo_x86 *c)
{
- fpu__init_system_early_generic(c);
- fpu__init_system();
+ fpu__init_system(c);
/* The final cr0 value is set later, in fpu_init() */
}
--
2.1.0
Now that fpu__detect() has become an empty layer around
fpu__init_system(), eliminate it and make fpu__init_system()
the main system initialization routine.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/processor.h | 1 -
arch/x86/kernel/cpu/common.c | 2 +-
arch/x86/kernel/fpu/init.c | 6 ------
3 files changed, 1 insertion(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 0f4add462697..b9e487499ae2 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -167,7 +167,6 @@ extern const struct seq_operations cpuinfo_op;
#define cache_line_size() (boot_cpu_data.x86_cache_alignment)
extern void cpu_detect(struct cpuinfo_x86 *c);
-extern void fpu__detect(struct cpuinfo_x86 *c);
extern void early_cpu_init(void);
extern void identify_boot_cpu(void);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d28f8ebc506d..d15610b0a4cf 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -758,7 +758,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
cpu_detect(c);
get_cpu_vendor(c);
get_cpu_cap(c);
- fpu__detect(c);
+ fpu__init_system(c);
if (this_cpu->c_early_init)
this_cpu->c_early_init(c);
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 1155a98d8c1e..77b5d403de22 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -231,9 +231,3 @@ static int __init no_387(char *s)
}
__setup("no387", no_387);
-
-void fpu__detect(struct cpuinfo_x86 *c)
-{
- fpu__init_system(c);
- /* The final cr0 value is set later, in fpu_init() */
-}
--
2.1.0
Rename fpstate_xstate_init_size() to fpu__init_system_xstate_size_legacy(),
to bring it in line with the other init_system*() methods.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 77b5d403de22..a7ce5bcbcbab 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -83,7 +83,7 @@ static void fpu__init_system_generic(void)
fpu__init_system_mxcsr();
}
-static void fpstate_xstate_init_size(void)
+static void fpu__init_system_xstate_size_legacy(void)
{
static bool on_boot_cpu = 1;
@@ -218,7 +218,7 @@ void fpu__init_system(struct cpuinfo_x86 *c)
clts();
fpu__init_system_generic();
- fpstate_xstate_init_size();
+ fpu__init_system_xstate_size_legacy();
fpu__init_system_xstate();
fpu__init_system_ctx_switch();
--
2.1.0
Reorder the init methods by their relationship and usage, to
form coherent blocks throughout the whole file.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 96 ++++++++++++++++++++++++++++++++++--------------------------------
1 file changed, 49 insertions(+), 47 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index a7ce5bcbcbab..dbff1335229c 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -4,6 +4,46 @@
#include <asm/fpu/internal.h>
#include <asm/tlbflush.h>
+static void fpu__init_cpu_ctx_switch(void)
+{
+ if (!cpu_has_eager_fpu)
+ stts();
+ else
+ clts();
+}
+
+/*
+ * Initialize the registers found in all CPUs, CR0 and CR4:
+ */
+static void fpu__init_cpu_generic(void)
+{
+ unsigned long cr0;
+ unsigned long cr4_mask = 0;
+
+ if (cpu_has_fxsr)
+ cr4_mask |= X86_CR4_OSFXSR;
+ if (cpu_has_xmm)
+ cr4_mask |= X86_CR4_OSXMMEXCPT;
+ if (cr4_mask)
+ cr4_set_bits(cr4_mask);
+
+ cr0 = read_cr0();
+ cr0 &= ~(X86_CR0_TS|X86_CR0_EM); /* clear TS and EM */
+ if (!cpu_has_fpu)
+ cr0 |= X86_CR0_EM;
+ write_cr0(cr0);
+}
+
+/*
+ * Enable all supported FPU features. Called when a CPU is brought online.
+ */
+void fpu__init_cpu(void)
+{
+ fpu__init_cpu_generic();
+ fpu__init_cpu_xstate();
+ fpu__init_cpu_ctx_switch();
+}
+
/*
* The earliest FPU detection code.
*
@@ -44,9 +84,6 @@ static void fpu__init_system_early_generic(struct cpuinfo_x86 *c)
*/
unsigned int mxcsr_feature_mask __read_mostly = 0xffffffffu;
-unsigned int xstate_size;
-EXPORT_SYMBOL_GPL(xstate_size);
-
static void fpu__init_system_mxcsr(void)
{
unsigned int mask = 0;
@@ -83,6 +120,15 @@ static void fpu__init_system_generic(void)
fpu__init_system_mxcsr();
}
+unsigned int xstate_size;
+EXPORT_SYMBOL_GPL(xstate_size);
+
+/*
+ * Set up the xstate_size based on the legacy FPU context size.
+ *
+ * We set this up first, and later it will be overwritten by
+ * fpu__init_system_xstate() if the CPU knows about xstates.
+ */
static void fpu__init_system_xstate_size_legacy(void)
{
static bool on_boot_cpu = 1;
@@ -112,50 +158,6 @@ static void fpu__init_system_xstate_size_legacy(void)
}
}
-/*
- * Initialize the TS bit in CR0 according to the style of context-switches
- * we are using:
- */
-static void fpu__init_cpu_ctx_switch(void)
-{
- if (!cpu_has_eager_fpu)
- stts();
- else
- clts();
-}
-
-/*
- * Initialize the registers found in all CPUs, CR0 and CR4:
- */
-static void fpu__init_cpu_generic(void)
-{
- unsigned long cr0;
- unsigned long cr4_mask = 0;
-
- if (cpu_has_fxsr)
- cr4_mask |= X86_CR4_OSFXSR;
- if (cpu_has_xmm)
- cr4_mask |= X86_CR4_OSXMMEXCPT;
- if (cr4_mask)
- cr4_set_bits(cr4_mask);
-
- cr0 = read_cr0();
- cr0 &= ~(X86_CR0_TS|X86_CR0_EM); /* clear TS and EM */
- if (!cpu_has_fpu)
- cr0 |= X86_CR0_EM;
- write_cr0(cr0);
-}
-
-/*
- * Enable all supported FPU features. Called when a CPU is brought online.
- */
-void fpu__init_cpu(void)
-{
- fpu__init_cpu_generic();
- fpu__init_cpu_xstate();
- fpu__init_cpu_ctx_switch();
-}
-
static enum { AUTO, ENABLE, DISABLE } eagerfpu = AUTO;
static int __init eager_fpu_setup(char *s)
--
2.1.0
Extend the comments of the FPU init code, and fix old ones.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/init.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 58 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index dbff1335229c..7ae5a62918c7 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -1,9 +1,13 @@
/*
- * x86 FPU boot time init code
+ * x86 FPU boot time init code:
*/
#include <asm/fpu/internal.h>
#include <asm/tlbflush.h>
+/*
+ * Initialize the TS bit in CR0 according to the style of context-switches
+ * we are using:
+ */
static void fpu__init_cpu_ctx_switch(void)
{
if (!cpu_has_eager_fpu)
@@ -35,7 +39,7 @@ static void fpu__init_cpu_generic(void)
}
/*
- * Enable all supported FPU features. Called when a CPU is brought online.
+ * Enable all supported FPU features. Called when a CPU is brought online:
*/
void fpu__init_cpu(void)
{
@@ -71,8 +75,7 @@ static void fpu__init_system_early_generic(struct cpuinfo_x86 *c)
#ifndef CONFIG_MATH_EMULATION
if (!cpu_has_fpu) {
- pr_emerg("No FPU found and no math emulation present\n");
- pr_emerg("Giving up\n");
+ pr_emerg("x86/fpu: Giving up, no FPU found and no math emulation present\n");
for (;;)
asm volatile("hlt");
}
@@ -120,6 +123,12 @@ static void fpu__init_system_generic(void)
fpu__init_system_mxcsr();
}
+/*
+ * Size of the FPU context state. All tasks in the system use the
+ * same context size, regardless of what portion they use.
+ * This is inherent to the XSAVE architecture which puts all state
+ * components into a single, continuous memory block:
+ */
unsigned int xstate_size;
EXPORT_SYMBOL_GPL(xstate_size);
@@ -158,6 +167,37 @@ static void fpu__init_system_xstate_size_legacy(void)
}
}
+/*
+ * FPU context switching strategies:
+ *
+ * Against popular belief, we don't do lazy FPU saves, due to the
+ * task migration complications it brings on SMP - we only do
+ * lazy FPU restores.
+ *
+ * 'lazy' is the traditional strategy, which is based on setting
+ * CR0::TS to 1 during context-switch (instead of doing a full
+ * restore of the FPU state), which causes the first FPU instruction
+ * after the context switch (whenever it is executed) to fault - at
+ * which point we lazily restore the FPU state into FPU registers.
+ *
+ * Tasks are of course under no obligation to execute FPU instructions,
+ * so it can easily happen that another context-switch occurs without
+ * a single FPU instruction being executed. If we eventually switch
+ * back to the original task (that still owns the FPU) then we have
+ * not only saved the restores along the way, but we also have the
+ * FPU ready to be used for the original task.
+ *
+ * 'eager' switching is used on modern CPUs, there we switch the FPU
+ * state during every context switch, regardless of whether the task
+ * has used FPU instructions in that time slice or not. This is done
+ * because modern FPU context saving instructions are able to optimize
+ * state saving and restoration in hardware: they can detect both
+ * unused and untouched FPU state and optimize accordingly.
+ *
+ * [ Note that even in 'lazy' mode we might optimize context switches
+ * to use 'eager' restores, if we detect that a task is using the FPU
+ * frequently. See the fpu->counter logic in fpu/internal.h for that. ]
+ */
static enum { AUTO, ENABLE, DISABLE } eagerfpu = AUTO;
static int __init eager_fpu_setup(char *s)
@@ -173,8 +213,7 @@ static int __init eager_fpu_setup(char *s)
__setup("eagerfpu=", eager_fpu_setup);
/*
- * setup_init_fpu_buf() is __init and it is OK to call it here because
- * init_xstate_ctx will be unset only once during boot.
+ * Pick the FPU context switching strategy:
*/
static void fpu__init_system_ctx_switch(void)
{
@@ -202,20 +241,24 @@ static void fpu__init_system_ctx_switch(void)
}
/*
- * Called on the boot CPU once per system bootup, to set up the initial FPU state that
- * is later cloned into all processes.
+ * Called on the boot CPU once per system bootup, to set up the initial
+ * FPU state that is later cloned into all processes:
*/
void fpu__init_system(struct cpuinfo_x86 *c)
{
fpu__init_system_early_generic(c);
- /* The FPU has to be operational for some of the later FPU init activities: */
+ /*
+ * The FPU has to be operational for some of the
+ * later FPU init activities:
+ */
fpu__init_cpu();
/*
- * But don't leave CR0::TS set yet, as some of the FPU setup methods depend
- * on being able to execute FPU instructions that will fault on a set TS,
- * such as the FXSAVE in fpu__init_system_mxcsr().
+ * But don't leave CR0::TS set yet, as some of the FPU setup
+ * methods depend on being able to execute FPU instructions
+ * that will fault on a set TS, such as the FXSAVE in
+ * fpu__init_system_mxcsr().
*/
clts();
@@ -226,6 +269,9 @@ void fpu__init_system(struct cpuinfo_x86 *c)
fpu__init_system_ctx_switch();
}
+/*
+ * Boot parameter to turn off FPU support and fall back to math-emu:
+ */
static int __init no_387(char *s)
{
setup_clear_cpu_cap(X86_FEATURE_FPU);
--
2.1.0
fpu__save() is an internal method, not a driver API, so move it from
fpu/api.h to fpu/internal.h.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/api.h | 2 --
arch/x86/include/asm/fpu/internal.h | 2 ++
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 4ca745c0d92e..eeac3766d8e5 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -99,6 +99,4 @@ static inline int user_has_fpu(void)
return current->thread.fpu.fpregs_active;
}
-extern void fpu__save(struct fpu *fpu);
-
#endif /* _ASM_X86_FPU_API_H */
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 5a1fa5bc2c27..0c8c812d23b4 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -290,6 +290,8 @@ static inline int fpu_save_init(struct fpu *fpu)
return 1;
}
+extern void fpu__save(struct fpu *fpu);
+
static inline int fpu_restore_checking(struct fpu *fpu)
{
if (use_xsave())
--
2.1.0
Both kernel_fpu_begin() and kernel_fpu_end() call another function
unconditionally, so we already pay the register-clobbering cost of a
function call. Uninline them.
This saves quite a bit of code in various performance-sensitive
code paths:
text data bss dec hex filename
13321334 2569888 1634304 17525526 10b6b16 vmlinux.before
13320246 2569888 1634304 17524438 10b66d6 vmlinux.after
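The calling convention is unchanged; a typical (hypothetical) in-kernel
FPU user still looks like:

    kernel_fpu_begin();     /* disables preemption, saves the current FPU state if needed */
    /* ... use FPU/SIMD instructions here ... */
    kernel_fpu_end();       /* re-enables preemption */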
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/api.h | 15 ++-------------
arch/x86/kernel/fpu/core.c | 15 +++++++++++++++
2 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index eeac3766d8e5..d4ab9e3af234 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -39,19 +39,8 @@ extern bool irq_fpu_usable(void);
*/
extern void __kernel_fpu_begin(void);
extern void __kernel_fpu_end(void);
-
-static inline void kernel_fpu_begin(void)
-{
- preempt_disable();
- WARN_ON_ONCE(!irq_fpu_usable());
- __kernel_fpu_begin();
-}
-
-static inline void kernel_fpu_end(void)
-{
- __kernel_fpu_end();
- preempt_enable();
-}
+extern void kernel_fpu_begin(void);
+extern void kernel_fpu_end(void);
/*
* Some instructions like VIA's padlock instructions generate a spurious
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 2c47bcf63e1e..15fd714b6a83 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -125,6 +125,21 @@ void __kernel_fpu_end(void)
}
EXPORT_SYMBOL(__kernel_fpu_end);
+void kernel_fpu_begin(void)
+{
+ preempt_disable();
+ WARN_ON_ONCE(!irq_fpu_usable());
+ __kernel_fpu_begin();
+}
+EXPORT_SYMBOL_GPL(kernel_fpu_begin);
+
+void kernel_fpu_end(void)
+{
+ __kernel_fpu_end();
+ preempt_enable();
+}
+EXPORT_SYMBOL_GPL(kernel_fpu_end);
+
static void __save_fpu(struct fpu *fpu)
{
if (use_xsave()) {
--
2.1.0
There are a number of FPU internal function prototypes and an inline function
in fpu/api.h, mostly placed there historically as the code grew over the years.
Move them over into fpu/internal.h where they belong. (Add the sched.h include
to stackprotector.h, which incorrectly relied on getting it from fpu/api.h.)
fpu/api.h is now a pure header that only contains FPU APIs intended for driver
use.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/api.h | 31 +------------------------------
arch/x86/include/asm/fpu/internal.h | 25 +++++++++++++++++++++++++
arch/x86/include/asm/stackprotector.h | 2 ++
arch/x86/kernel/cpu/bugs.c | 2 +-
arch/x86/kvm/vmx.c | 2 +-
arch/x86/math-emu/fpu_entry.c | 2 +-
arch/x86/power/cpu.c | 1 +
drivers/lguest/x86/core.c | 2 +-
8 files changed, 33 insertions(+), 34 deletions(-)
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index d4ab9e3af234..0c713455fc63 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -10,23 +10,8 @@
#ifndef _ASM_X86_FPU_API_H
#define _ASM_X86_FPU_API_H
-#include <linux/sched.h>
#include <linux/hardirq.h>
-struct pt_regs;
-struct user_i387_struct;
-
-extern int fpstate_alloc_init(struct fpu *fpu);
-extern void fpstate_init(struct fpu *fpu);
-extern void fpu__clear(struct task_struct *tsk);
-
-extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);
-extern void fpu__restore(void);
-extern void fpu__init_check_bugs(void);
-extern void fpu__resume_cpu(void);
-
-extern bool irq_fpu_usable(void);
-
/*
* Careful: __kernel_fpu_begin/end() must be called with preempt disabled
* and they don't touch the preempt state on their own.
@@ -41,6 +26,7 @@ extern void __kernel_fpu_begin(void);
extern void __kernel_fpu_end(void);
extern void kernel_fpu_begin(void);
extern void kernel_fpu_end(void);
+extern bool irq_fpu_usable(void);
/*
* Some instructions like VIA's padlock instructions generate a spurious
@@ -73,19 +59,4 @@ static inline void irq_ts_restore(int TS_state)
stts();
}
-/*
- * The question "does this thread have fpu access?"
- * is slightly racy, since preemption could come in
- * and revoke it immediately after the test.
- *
- * However, even in that very unlikely scenario,
- * we can just assume we have FPU access - typically
- * to save the FP state - we'll just take a #NM
- * fault and get the FPU access back.
- */
-static inline int user_has_fpu(void)
-{
- return current->thread.fpu.fpregs_active;
-}
-
#endif /* _ASM_X86_FPU_API_H */
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 0c8c812d23b4..89c6ec80c1ac 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -12,6 +12,7 @@
#include <linux/regset.h>
#include <linux/compat.h>
+#include <linux/sched.h>
#include <linux/slab.h>
#include <asm/user.h>
@@ -43,6 +44,15 @@ extern void fpu__init_system_xstate(void);
extern void fpu__init_cpu_xstate(void);
extern void fpu__init_system(struct cpuinfo_x86 *c);
+extern int fpstate_alloc_init(struct fpu *fpu);
+extern void fpstate_init(struct fpu *fpu);
+extern void fpu__clear(struct task_struct *tsk);
+
+extern int dump_fpu(struct pt_regs *, struct user_i387_struct *);
+extern void fpu__restore(void);
+extern void fpu__init_check_bugs(void);
+extern void fpu__resume_cpu(void);
+
DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx);
extern void convert_from_fxsr(struct user_i387_ia32_struct *env,
@@ -335,6 +345,21 @@ static inline void __fpregs_activate(struct fpu *fpu)
}
/*
+ * The question "does this thread have fpu access?"
+ * is slightly racy, since preemption could come in
+ * and revoke it immediately after the test.
+ *
+ * However, even in that very unlikely scenario,
+ * we can just assume we have FPU access - typically
+ * to save the FP state - we'll just take a #NM
+ * fault and get the FPU access back.
+ */
+static inline int user_has_fpu(void)
+{
+ return current->thread.fpu.fpregs_active;
+}
+
+/*
* Encapsulate the CR0.TS handling together with the
* software flag.
*
diff --git a/arch/x86/include/asm/stackprotector.h b/arch/x86/include/asm/stackprotector.h
index 6a998598f172..c2e00bb2a136 100644
--- a/arch/x86/include/asm/stackprotector.h
+++ b/arch/x86/include/asm/stackprotector.h
@@ -39,7 +39,9 @@
#include <asm/processor.h>
#include <asm/percpu.h>
#include <asm/desc.h>
+
#include <linux/random.h>
+#include <linux/sched.h>
/*
* 24 byte read-only segment initializer for stack canary. Linker
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 29dd74318ec6..bd17db15a2c1 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -12,7 +12,7 @@
#include <asm/bugs.h>
#include <asm/processor.h>
#include <asm/processor-flags.h>
-#include <asm/fpu/api.h>
+#include <asm/fpu/internal.h>
#include <asm/msr.h>
#include <asm/paravirt.h>
#include <asm/alternative.h>
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5cb738a18ca3..f93ae71416e4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -40,7 +40,7 @@
#include <asm/vmx.h>
#include <asm/virtext.h>
#include <asm/mce.h>
-#include <asm/fpu/api.h>
+#include <asm/fpu/internal.h>
#include <asm/xcr.h>
#include <asm/perf_event.h>
#include <asm/debugreg.h>
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 3bb4c6a24ea5..cf843855e4f6 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -31,7 +31,7 @@
#include <asm/traps.h>
#include <asm/desc.h>
#include <asm/user.h>
-#include <asm/fpu/api.h>
+#include <asm/fpu/internal.h>
#include "fpu_system.h"
#include "fpu_emu.h"
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index ad0ce6b70fac..0d7dd1f5ac36 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -19,6 +19,7 @@
#include <asm/page.h>
#include <asm/mce.h>
#include <asm/suspend.h>
+#include <asm/fpu/internal.h>
#include <asm/debugreg.h>
#include <asm/cpu.h>
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index fce5989e66d9..b80e4b8c9b6e 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -46,7 +46,7 @@
#include <asm/setup.h>
#include <asm/lguest.h>
#include <asm/uaccess.h>
-#include <asm/fpu/api.h>
+#include <asm/fpu/internal.h>
#include <asm/tlbflush.h>
#include "../lg.h"
--
2.1.0
The irq_ts_save() function in particular is pretty bloaty, generating
over a dozen instructions, so uninline both it and irq_ts_restore().
Even though the API is used rarely, the space savings are measurable:
text data bss dec hex filename
13331995 2572920 1634304 17539219 10ba093 vmlinux.before
13331739 2572920 1634304 17538963 10b9f93 vmlinux.after
( This also allows the removal of an #include from fpu/api.h,
speeding up the kernel build slightly. )
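The usage pattern stays the same; a (hypothetical) caller bracketing a
TS-sensitive instruction sequence would still do:

    int ts_state = irq_ts_save();   /* clears CR0::TS if it was set and we are in atomic context */
    /* ... execute instructions that would trap on a set TS ... */
    irq_ts_restore(ts_state);       /* sets TS again, iff irq_ts_save() cleared it */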
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/api.h | 27 ++-------------------------
arch/x86/kernel/fpu/core.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 32 insertions(+), 25 deletions(-)
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 0c713455fc63..62035cc1d961 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -10,8 +10,6 @@
#ifndef _ASM_X86_FPU_API_H
#define _ASM_X86_FPU_API_H
-#include <linux/hardirq.h>
-
/*
* Careful: __kernel_fpu_begin/end() must be called with preempt disabled
* and they don't touch the preempt state on their own.
@@ -35,28 +33,7 @@ extern bool irq_fpu_usable(void);
* in interrupt context interacting wrongly with other user/kernel fpu usage, we
* should use them only in the context of irq_ts_save/restore()
*/
-static inline int irq_ts_save(void)
-{
- /*
- * If in process context and not atomic, we can take a spurious DNA fault.
- * Otherwise, doing clts() in process context requires disabling preemption
- * or some heavy lifting like kernel_fpu_begin()
- */
- if (!in_atomic())
- return 0;
-
- if (read_cr0() & X86_CR0_TS) {
- clts();
- return 1;
- }
-
- return 0;
-}
-
-static inline void irq_ts_restore(int TS_state)
-{
- if (TS_state)
- stts();
-}
+extern int irq_ts_save(void);
+extern void irq_ts_restore(int TS_state);
#endif /* _ASM_X86_FPU_API_H */
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 15fd714b6a83..34a4e1032424 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -6,6 +6,7 @@
* Gareth Hughes <[email protected]>, May 2000
*/
#include <asm/fpu/internal.h>
+#include <linux/hardirq.h>
/*
* Track whether the kernel is using the FPU state
@@ -140,6 +141,35 @@ void kernel_fpu_end(void)
}
EXPORT_SYMBOL_GPL(kernel_fpu_end);
+/*
+ * CR0::TS save/restore functions:
+ */
+int irq_ts_save(void)
+{
+ /*
+ * If in process context and not atomic, we can take a spurious DNA fault.
+ * Otherwise, doing clts() in process context requires disabling preemption
+ * or some heavy lifting like kernel_fpu_begin()
+ */
+ if (!in_atomic())
+ return 0;
+
+ if (read_cr0() & X86_CR0_TS) {
+ clts();
+ return 1;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(irq_ts_save);
+
+void irq_ts_restore(int TS_state)
+{
+ if (TS_state)
+ stts();
+}
+EXPORT_SYMBOL_GPL(irq_ts_restore);
+
static void __save_fpu(struct fpu *fpu)
{
if (use_xsave()) {
--
2.1.0
So fpu_save_init() is a historic name from the days when the only way
to save FPU state was FNSAVE, which cleared (well, destroyed) the FPU
state after saving it.
Nowadays the name is misleading, because ever since the introduction of
FXSAVE (and more modern FPU saving instructions) the 'we need to reload
the FPU state' part is only true if there's a pending FPU exception [*],
which is almost never the case.
So rename it to copy_fpregs_to_fpstate() to make it clear what's
happening. Also add a few comments about why we cannot keep registers
in certain cases.
Also clean up the control flow a bit, to make it more apparent when
we are dropping/keeping FP registers, and to optimize the common
case (of keeping fpregs) some more.
[*] Probably not true anymore: modern instructions always leave the FPU
state intact, even if exceptions are pending, because pending FP
exceptions are posted on the next FP instruction, not asynchronously.
They were truly asynchronous back in the IRQ13 days, and we had to
synchronize with them, but that code is gone: we don't have IRQ13
mapped in the IDT anymore.
But a cleanup patch is obviously not the place to change subtle behavior.
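To illustrate the (unchanged) caller contract, a hypothetical sketch:

    /* Preemption must be disabled around this sequence: */
    if (copy_fpregs_to_fpstate(fpu)) {
        /* registers are still intact, we can keep them active: */
        fpu->last_cpu = cpu;
    } else {
        /* registers were destroyed (FNSAVE) or must be dropped: */
        fpu->last_cpu = -1;
    }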
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 34 ++++++++++++++++++++++++----------
arch/x86/kernel/fpu/core.c | 4 ++--
arch/x86/kernel/traps.c | 2 +-
arch/x86/kvm/x86.c | 2 +-
arch/x86/mm/mpx.c | 2 +-
5 files changed, 29 insertions(+), 15 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 89c6ec80c1ac..11055f51e67a 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -265,9 +265,15 @@ static inline void fpu_fxsave(struct fpu *fpu)
/*
* These must be called with preempt disabled. Returns
- * 'true' if the FPU state is still intact.
+ * 'true' if the FPU state is still intact and we can
+ * keep registers active.
+ *
+ * The legacy FNSAVE instruction cleared all FPU state
+ * unconditionally, so registers are essentially destroyed.
+ * Modern FPU state can be kept in registers, if there are
+ * no pending FP exceptions. (Note the FIXME below.)
*/
-static inline int fpu_save_init(struct fpu *fpu)
+static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
{
if (use_xsave()) {
xsave_state(&fpu->state->xsave);
@@ -276,13 +282,16 @@ static inline int fpu_save_init(struct fpu *fpu)
* xsave header may indicate the init state of the FP.
*/
if (!(fpu->state->xsave.header.xfeatures & XSTATE_FP))
- return 1;
- } else if (use_fxsr()) {
- fpu_fxsave(fpu);
+ goto keep_fpregs;
} else {
- asm volatile("fnsave %[fx]; fwait"
- : [fx] "=m" (fpu->state->fsave));
- return 0;
+ if (use_fxsr()) {
+ fpu_fxsave(fpu);
+ } else {
+ /* FNSAVE always clears FPU registers: */
+ asm volatile("fnsave %[fx]; fwait"
+ : [fx] "=m" (fpu->state->fsave));
+ goto drop_fpregs;
+ }
}
/*
@@ -295,9 +304,14 @@ static inline int fpu_save_init(struct fpu *fpu)
*/
if (unlikely(fpu->state->fxsave.swd & X87_FSW_ES)) {
asm volatile("fnclex");
- return 0;
+ goto drop_fpregs;
}
+
+keep_fpregs:
return 1;
+
+drop_fpregs:
+ return 0;
}
extern void fpu__save(struct fpu *fpu);
@@ -448,7 +462,7 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
(use_eager_fpu() || new_fpu->counter > 5);
if (old_fpu->fpregs_active) {
- if (!fpu_save_init(old_fpu))
+ if (!copy_fpregs_to_fpstate(old_fpu))
old_fpu->last_cpu = -1;
else
old_fpu->last_cpu = cpu;
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 34a4e1032424..538f2541b7f7 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -102,7 +102,7 @@ void __kernel_fpu_begin(void)
kernel_fpu_disable();
if (fpu->fpregs_active) {
- fpu_save_init(fpu);
+ copy_fpregs_to_fpstate(fpu);
} else {
this_cpu_write(fpu_fpregs_owner_ctx, NULL);
if (!use_eager_fpu())
@@ -196,7 +196,7 @@ void fpu__save(struct fpu *fpu)
if (use_eager_fpu()) {
__save_fpu(fpu);
} else {
- fpu_save_init(fpu);
+ copy_fpregs_to_fpstate(fpu);
fpregs_deactivate(fpu);
}
}
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index a65586edbb57..f028f1da3480 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -395,7 +395,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
* It is not directly accessible, though, so we need to
* do an xsave and then pull it out of the xsave buffer.
*/
- fpu_save_init(&tsk->thread.fpu);
+ copy_fpregs_to_fpstate(&tsk->thread.fpu);
xsave_buf = &(tsk->thread.fpu.state->xsave);
bndcsr = get_xsave_addr(xsave_buf, XSTATE_BNDCSR);
if (!bndcsr)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0b58b9397098..d90bf4afa2b0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7058,7 +7058,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
return;
vcpu->guest_fpu_loaded = 0;
- fpu_save_init(&vcpu->arch.guest_fpu);
+ copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
__kernel_fpu_end();
++vcpu->stat.fpu_reload;
kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 5563be313fd6..3287215be60a 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -357,7 +357,7 @@ static __user void *task_get_bounds_dir(struct task_struct *tsk)
* The bounds directory pointer is stored in a register
* only accessible if we first do an xsave.
*/
- fpu_save_init(&tsk->thread.fpu);
+ copy_fpregs_to_fpstate(&tsk->thread.fpu);
bndcsr = get_xsave_addr(&tsk->thread.fpu.state->xsave, XSTATE_BNDCSR);
if (!bndcsr)
return MPX_INVALID_BOUNDS_DIR;
--
2.1.0
So we have the following ancient code in copy_fpregs_to_fpstate():
if (unlikely(fpu->state->fxsave.swd & X87_FSW_ES)) {
asm volatile("fnclex");
goto drop_fpregs;
}
which clears pending FPU exceptions and then drops the registers. This
causes the next FP instruction of the saved context to re-load the
saved FPU state, with all pending exceptions marked properly, and to
re-start the exception handling mechanism in the hardware.
Since FPU exceptions are always issued on instruction boundaries,
in particular on the next FP instruction following the
exception-generating instruction, there's no fear of getting an FP
exception asynchronously.
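( For reference: X87_FSW_ES, tested by the quirk above, is the x87
  'Exception Summary' bit in the FPU status word - a sketch of its
  usual definition:
	#define X87_FSW_ES	(1 << 7)	/* Exception Summary */
  The CPU sets it while any unmasked x87 exception is pending, so the
  quirk only ever triggered for contexts that had an exception
  outstanding. )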
They were truly asynchronous back in the IRQ13 days, when the FPU was
a weird and expensive co-processor that did its own processing and that
we had to synchronize with explicitly - but that is not the case
anymore: we don't have IRQ13 mapped in the IDT these days.
With the introduction of optimized XSAVE support there's a new
complication: if the xstate feature bit indicates that a particular
state component is unused (in 'init state'), then the hardware does
not guarantee that the XSAVE (et al) instruction keeps the underlying
FPU state image in memory valid and current. In practice this means
that the hardware won't write it, so the exception flags in the
memory image might be stale - possibly still set from an older save.
This meant that we had to check the xfeatures flags as well, adding
another memory load and branch to a critical hot path of the scheduler.
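( Concretely, this is the extra check the XSAVE path had to grow, as
  seen in the diff above - a sketch:
	/* x87 in init state: memory image (and its swd) may be stale: */
	if (!(fpu->state->xsave.header.xfeatures & XSTATE_FP))
		goto keep_fpregs;
  i.e. one more memory load and branch on every context switch. )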
So optimize all this by removing both the old quirk and the new check,
and straight-line optimizing the most common cases with likely()
hints. Quite a bit of code gets removed this way:
arch/x86/kernel/process_64.o:
text data bss dec filename
5484 8 0 5492 process_64.o.before
5416 8 0 5424 process_64.o.after
Now there's also a chance that some weird behavior or erratum was
masked by our IRQ13 handling quirk (or that I misunderstood the
nature of the quirk), and that this change triggers some badness.
There's no real good way to protect against that possibility other
than keeping this change well isolated, well commented and well
bisectable. If you bisect a weird (or not so weird) breakage to
this commit then please let us know!
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 40 ++++++++++------------------------------
1 file changed, 10 insertions(+), 30 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 11055f51e67a..10663b02ee22 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -271,46 +271,26 @@ static inline void fpu_fxsave(struct fpu *fpu)
* The legacy FNSAVE instruction cleared all FPU state
* unconditionally, so registers are essentially destroyed.
* Modern FPU state can be kept in registers, if there are
- * no pending FP exceptions. (Note the FIXME below.)
+ * no pending FP exceptions.
*/
static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
{
- if (use_xsave()) {
+ if (likely(use_xsave())) {
xsave_state(&fpu->state->xsave);
+ return 1;
+ }
- /*
- * xsave header may indicate the init state of the FP.
- */
- if (!(fpu->state->xsave.header.xfeatures & XSTATE_FP))
- goto keep_fpregs;
- } else {
- if (use_fxsr()) {
- fpu_fxsave(fpu);
- } else {
- /* FNSAVE always clears FPU registers: */
- asm volatile("fnsave %[fx]; fwait"
- : [fx] "=m" (fpu->state->fsave));
- goto drop_fpregs;
- }
+ if (likely(use_fxsr())) {
+ fpu_fxsave(fpu);
+ return 1;
}
/*
- * If exceptions are pending, we need to clear them so
- * that we don't randomly get exceptions later.
- *
- * FIXME! Is this perhaps only true for the old-style
- * irq13 case? Maybe we could leave the x87 state
- * intact otherwise?
+ * Legacy FPU register saving, FNSAVE always clears FPU registers,
+ * so we have to mark them inactive:
*/
- if (unlikely(fpu->state->fxsave.swd & X87_FSW_ES)) {
- asm volatile("fnclex");
- goto drop_fpregs;
- }
-
-keep_fpregs:
- return 1;
+ asm volatile("fnsave %[fx]; fwait" : [fx] "=m" (fpu->state->fsave));
-drop_fpregs:
return 0;
}
--
2.1.0
So 6 years ago we made the FPU fpstate dynamically allocated:
aa283f49276e ("x86, fpu: lazy allocation of FPU area - v5")
61c4628b5386 ("x86, fpu: split FPU state from task struct - v5")
In hindsight this was a mistake:
- it complicated context allocation failure handling, such as:
/* kthread execs. TODO: cleanup this horror. */
if (WARN_ON(fpstate_alloc_init(fpu)))
force_sig(SIGKILL, tsk);
- it caused us to enable irqs in fpu__restore():
local_irq_enable();
/*
* does a slab alloc which can sleep
*/
if (fpstate_alloc_init(fpu)) {
/*
* ran out of memory!
*/
do_group_exit(SIGKILL);
return;
}
local_irq_disable();
- it (slightly) slowed down task creation/destruction by adding
slab allocation/free patterns.
- it made access to context contents (slightly) slower by adding
one more pointer dereference.
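( The last point is visible throughout the conversions below, e.g.:
	tsk->thread.fpu.state->fxsave.cwd	/* before: extra pointer chase */
	tsk->thread.fpu.state.fxsave.cwd	/* after: direct access */
)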
The motivation for the dynamic allocation was two-fold:
- reduce memory consumption by non-FPU tasks
- allocate and handle only the necessary amount of context for
various XSAVE processors that have varying hardware frame
sizes.
These days, with glibc using SSE memcpy by default and GCC optimizing
for SSE/AVX by default, the scope of FPU-using apps on an x86 system is
much larger than it was 6 years ago.
For example on a freshly installed Fedora 21 desktop system, with a
recent kernel, all non-kthread tasks have used the FPU shortly after
bootup.
Also, even modern embedded x86 CPUs try to support the latest vector
instruction set - so they too will often use the larger xstate frame
sizes.
So remove the dynamic allocation complication by embedding the FPU
fpstate in task_struct again. This should make the FPU a lot more
accessible to all sorts of atomic contexts.
We could still optimize for the xstate frame size in the future,
by moving the state structure to the last element of task_struct,
and allocating only a part of that.
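( A rough sketch of that possible future layout - hypothetical, with
  the names invented for illustration:
	struct task_struct {
		...
		struct fpu	fpu;	/* must stay last! */
	};
	/* per-task allocation could then shrink to: */
	task_size = offsetof(struct task_struct, fpu.state) + xstate_size;
  so each task would only pay for the xstate frame size its CPU
  actually uses. )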
This change is kept minimal by still keeping the ctx_alloc()/free()
routines (that now do nothing substantial) - we'll remove them in
the following patches.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 34 +++++++++++++++++-----------------
arch/x86/include/asm/fpu/types.h | 2 +-
arch/x86/kernel/fpu/core.c | 52 ++++++++++++++++++++--------------------------------
arch/x86/kernel/fpu/xsave.c | 12 ++++++------
arch/x86/kernel/traps.c | 2 +-
arch/x86/kvm/x86.c | 14 +++++++-------
arch/x86/math-emu/fpu_aux.c | 2 +-
arch/x86/math-emu/fpu_entry.c | 4 ++--
arch/x86/math-emu/fpu_system.h | 2 +-
arch/x86/mm/mpx.c | 2 +-
10 files changed, 57 insertions(+), 69 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 10663b02ee22..4ce830fb3f31 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -232,9 +232,9 @@ static inline int frstor_user(struct i387_fsave_struct __user *fx)
static inline void fpu_fxsave(struct fpu *fpu)
{
if (config_enabled(CONFIG_X86_32))
- asm volatile( "fxsave %[fx]" : [fx] "=m" (fpu->state->fxsave));
+ asm volatile( "fxsave %[fx]" : [fx] "=m" (fpu->state.fxsave));
else if (config_enabled(CONFIG_AS_FXSAVEQ))
- asm volatile("fxsaveq %[fx]" : [fx] "=m" (fpu->state->fxsave));
+ asm volatile("fxsaveq %[fx]" : [fx] "=m" (fpu->state.fxsave));
else {
/* Using "rex64; fxsave %0" is broken because, if the memory
* operand uses any extended registers for addressing, a second
@@ -251,15 +251,15 @@ static inline void fpu_fxsave(struct fpu *fpu)
* an extended register is needed for addressing (fix submitted
* to mainline 2005-11-21).
*
- * asm volatile("rex64/fxsave %0" : "=m" (fpu->state->fxsave));
+ * asm volatile("rex64/fxsave %0" : "=m" (fpu->state.fxsave));
*
* This, however, we can work around by forcing the compiler to
* select an addressing mode that doesn't require extended
* registers.
*/
asm volatile( "rex64/fxsave (%[fx])"
- : "=m" (fpu->state->fxsave)
- : [fx] "R" (&fpu->state->fxsave));
+ : "=m" (fpu->state.fxsave)
+ : [fx] "R" (&fpu->state.fxsave));
}
}
@@ -276,7 +276,7 @@ static inline void fpu_fxsave(struct fpu *fpu)
static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
{
if (likely(use_xsave())) {
- xsave_state(&fpu->state->xsave);
+ xsave_state(&fpu->state.xsave);
return 1;
}
@@ -289,7 +289,7 @@ static inline int copy_fpregs_to_fpstate(struct fpu *fpu)
* Legacy FPU register saving, FNSAVE always clears FPU registers,
* so we have to mark them inactive:
*/
- asm volatile("fnsave %[fx]; fwait" : [fx] "=m" (fpu->state->fsave));
+ asm volatile("fnsave %[fx]; fwait" : [fx] "=m" (fpu->state.fsave));
return 0;
}
@@ -299,11 +299,11 @@ extern void fpu__save(struct fpu *fpu);
static inline int fpu_restore_checking(struct fpu *fpu)
{
if (use_xsave())
- return fpu_xrstor_checking(&fpu->state->xsave);
+ return fpu_xrstor_checking(&fpu->state.xsave);
else if (use_fxsr())
- return fxrstor_checking(&fpu->state->fxsave);
+ return fxrstor_checking(&fpu->state.fxsave);
else
- return frstor_checking(&fpu->state->fsave);
+ return frstor_checking(&fpu->state.fsave);
}
static inline int restore_fpu_checking(struct fpu *fpu)
@@ -454,7 +454,7 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
if (fpu.preload) {
new_fpu->counter++;
__fpregs_activate(new_fpu);
- prefetch(new_fpu->state);
+ prefetch(&new_fpu->state);
} else if (!use_eager_fpu())
stts();
} else {
@@ -465,7 +465,7 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
if (fpu_want_lazy_restore(new_fpu, cpu))
fpu.preload = 0;
else
- prefetch(new_fpu->state);
+ prefetch(&new_fpu->state);
fpregs_activate(new_fpu);
}
}
@@ -534,25 +534,25 @@ static inline void user_fpu_begin(void)
static inline unsigned short get_fpu_cwd(struct task_struct *tsk)
{
if (cpu_has_fxsr) {
- return tsk->thread.fpu.state->fxsave.cwd;
+ return tsk->thread.fpu.state.fxsave.cwd;
} else {
- return (unsigned short)tsk->thread.fpu.state->fsave.cwd;
+ return (unsigned short)tsk->thread.fpu.state.fsave.cwd;
}
}
static inline unsigned short get_fpu_swd(struct task_struct *tsk)
{
if (cpu_has_fxsr) {
- return tsk->thread.fpu.state->fxsave.swd;
+ return tsk->thread.fpu.state.fxsave.swd;
} else {
- return (unsigned short)tsk->thread.fpu.state->fsave.swd;
+ return (unsigned short)tsk->thread.fpu.state.fsave.swd;
}
}
static inline unsigned short get_fpu_mxcsr(struct task_struct *tsk)
{
if (cpu_has_xmm) {
- return tsk->thread.fpu.state->fxsave.mxcsr;
+ return tsk->thread.fpu.state.fxsave.mxcsr;
} else {
return MXCSR_DEFAULT;
}
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 231a8f53b2f8..3a15ac6032eb 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -143,7 +143,7 @@ struct fpu {
unsigned int last_cpu;
unsigned int fpregs_active;
- union thread_xstate *state;
+ union thread_xstate state;
/*
* This counter contains the number of consecutive context switches
* during which the FPU stays used. If this is over a threshold, the
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 538f2541b7f7..422cbb4bbe01 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -174,9 +174,9 @@ static void __save_fpu(struct fpu *fpu)
{
if (use_xsave()) {
if (unlikely(system_state == SYSTEM_BOOTING))
- xsave_state_booting(&fpu->state->xsave);
+ xsave_state_booting(&fpu->state.xsave);
else
- xsave_state(&fpu->state->xsave);
+ xsave_state(&fpu->state.xsave);
} else {
fpu_fxsave(fpu);
}
@@ -207,16 +207,16 @@ EXPORT_SYMBOL_GPL(fpu__save);
void fpstate_init(struct fpu *fpu)
{
if (!cpu_has_fpu) {
- finit_soft_fpu(&fpu->state->soft);
+ finit_soft_fpu(&fpu->state.soft);
return;
}
- memset(fpu->state, 0, xstate_size);
+ memset(&fpu->state, 0, xstate_size);
if (cpu_has_fxsr) {
- fx_finit(&fpu->state->fxsave);
+ fx_finit(&fpu->state.fxsave);
} else {
- struct i387_fsave_struct *fp = &fpu->state->fsave;
+ struct i387_fsave_struct *fp = &fpu->state.fsave;
fp->cwd = 0xffff037fu;
fp->swd = 0xffff0000u;
fp->twd = 0xffffffffu;
@@ -241,15 +241,8 @@ void fpstate_cache_init(void)
int fpstate_alloc(struct fpu *fpu)
{
- if (fpu->state)
- return 0;
-
- fpu->state = kmem_cache_alloc(task_xstate_cachep, GFP_KERNEL);
- if (!fpu->state)
- return -ENOMEM;
-
/* The CPU requires the FPU state to be aligned to 16 byte boundaries: */
- WARN_ON((unsigned long)fpu->state & 15);
+ WARN_ON((unsigned long)&fpu->state & 15);
return 0;
}
@@ -257,10 +250,6 @@ EXPORT_SYMBOL_GPL(fpstate_alloc);
void fpstate_free(struct fpu *fpu)
{
- if (fpu->state) {
- kmem_cache_free(task_xstate_cachep, fpu->state);
- fpu->state = NULL;
- }
}
EXPORT_SYMBOL_GPL(fpstate_free);
@@ -277,11 +266,11 @@ static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
WARN_ON(src_fpu != &current->thread.fpu);
if (use_eager_fpu()) {
- memset(&dst_fpu->state->xsave, 0, xstate_size);
+ memset(&dst_fpu->state.xsave, 0, xstate_size);
__save_fpu(dst_fpu);
} else {
fpu__save(src_fpu);
- memcpy(dst_fpu->state, src_fpu->state, xstate_size);
+ memcpy(&dst_fpu->state, &src_fpu->state, xstate_size);
}
}
@@ -289,7 +278,6 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
{
dst_fpu->counter = 0;
dst_fpu->fpregs_active = 0;
- dst_fpu->state = NULL;
dst_fpu->last_cpu = -1;
if (src_fpu->fpstate_active) {
@@ -483,7 +471,7 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
sanitize_i387_state(target);
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
- &fpu->state->fxsave, 0, -1);
+ &fpu->state.fxsave, 0, -1);
}
int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
@@ -503,19 +491,19 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
sanitize_i387_state(target);
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
- &fpu->state->fxsave, 0, -1);
+ &fpu->state.fxsave, 0, -1);
/*
* mxcsr reserved bits must be masked to zero for security reasons.
*/
- fpu->state->fxsave.mxcsr &= mxcsr_feature_mask;
+ fpu->state.fxsave.mxcsr &= mxcsr_feature_mask;
/*
* update the header bits in the xsave header, indicating the
* presence of FP and SSE state.
*/
if (cpu_has_xsave)
- fpu->state->xsave.header.xfeatures |= XSTATE_FPSSE;
+ fpu->state.xsave.header.xfeatures |= XSTATE_FPSSE;
return ret;
}
@@ -535,7 +523,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;
- xsave = &fpu->state->xsave;
+ xsave = &fpu->state.xsave;
/*
* Copy the 48bytes defined by the software first into the xstate
@@ -566,7 +554,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
if (ret)
return ret;
- xsave = &fpu->state->xsave;
+ xsave = &fpu->state.xsave;
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, xsave, 0, -1);
/*
@@ -657,7 +645,7 @@ static inline u32 twd_fxsr_to_i387(struct i387_fxsave_struct *fxsave)
void
convert_from_fxsr(struct user_i387_ia32_struct *env, struct task_struct *tsk)
{
- struct i387_fxsave_struct *fxsave = &tsk->thread.fpu.state->fxsave;
+ struct i387_fxsave_struct *fxsave = &tsk->thread.fpu.state.fxsave;
struct _fpreg *to = (struct _fpreg *) &env->st_space[0];
struct _fpxreg *from = (struct _fpxreg *) &fxsave->st_space[0];
int i;
@@ -695,7 +683,7 @@ void convert_to_fxsr(struct task_struct *tsk,
const struct user_i387_ia32_struct *env)
{
- struct i387_fxsave_struct *fxsave = &tsk->thread.fpu.state->fxsave;
+ struct i387_fxsave_struct *fxsave = &tsk->thread.fpu.state.fxsave;
struct _fpreg *from = (struct _fpreg *) &env->st_space[0];
struct _fpxreg *to = (struct _fpxreg *) &fxsave->st_space[0];
int i;
@@ -736,7 +724,7 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_fxsr)
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
- &fpu->state->fsave, 0,
+ &fpu->state.fsave, 0,
-1);
sanitize_i387_state(target);
@@ -770,7 +758,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_fxsr)
return user_regset_copyin(&pos, &count, &kbuf, &ubuf,
- &fpu->state->fsave, 0,
+ &fpu->state.fsave, 0,
-1);
if (pos > 0 || count < sizeof(env))
@@ -785,7 +773,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
* presence of FP.
*/
if (cpu_has_xsave)
- fpu->state->xsave.header.xfeatures |= XSTATE_FP;
+ fpu->state.xsave.header.xfeatures |= XSTATE_FP;
return ret;
}
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index a23236358fb0..c7d48eb0a194 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -44,14 +44,14 @@ static unsigned int xfeatures_nr;
*/
void __sanitize_i387_state(struct task_struct *tsk)
{
- struct i387_fxsave_struct *fx = &tsk->thread.fpu.state->fxsave;
+ struct i387_fxsave_struct *fx = &tsk->thread.fpu.state.fxsave;
int feature_bit;
u64 xfeatures;
if (!fx)
return;
- xfeatures = tsk->thread.fpu.state->xsave.header.xfeatures;
+ xfeatures = tsk->thread.fpu.state.xsave.header.xfeatures;
/*
* None of the feature bits are in init state. So nothing else
@@ -147,7 +147,7 @@ static inline int check_for_xstate(struct i387_fxsave_struct __user *buf,
static inline int save_fsave_header(struct task_struct *tsk, void __user *buf)
{
if (use_fxsr()) {
- struct xsave_struct *xsave = &tsk->thread.fpu.state->xsave;
+ struct xsave_struct *xsave = &tsk->thread.fpu.state.xsave;
struct user_i387_ia32_struct env;
struct _fpstate_ia32 __user *fp = buf;
@@ -245,7 +245,7 @@ static inline int save_user_xstate(struct xsave_struct __user *buf)
*/
int save_xstate_sig(void __user *buf, void __user *buf_fx, int size)
{
- struct xsave_struct *xsave = &current->thread.fpu.state->xsave;
+ struct xsave_struct *xsave = &current->thread.fpu.state.xsave;
struct task_struct *tsk = current;
int ia32_fxstate = (buf != buf_fx);
@@ -288,7 +288,7 @@ sanitize_restored_xstate(struct task_struct *tsk,
struct user_i387_ia32_struct *ia32_env,
u64 xfeatures, int fx_only)
{
- struct xsave_struct *xsave = &tsk->thread.fpu.state->xsave;
+ struct xsave_struct *xsave = &tsk->thread.fpu.state.xsave;
struct xstate_header *header = &xsave->header;
if (use_xsave()) {
@@ -402,7 +402,7 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size)
*/
drop_fpu(fpu);
- if (__copy_from_user(&fpu->state->xsave, buf_fx, state_size) ||
+ if (__copy_from_user(&fpu->state.xsave, buf_fx, state_size) ||
__copy_from_user(&env, buf, sizeof(env))) {
fpstate_init(fpu);
err = -1;
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index f028f1da3480..48dfcd9ed351 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -396,7 +396,7 @@ dotraplinkage void do_bounds(struct pt_regs *regs, long error_code)
* do an xsave and then pull it out of the xsave buffer.
*/
copy_fpregs_to_fpstate(&tsk->thread.fpu);
- xsave_buf = &(tsk->thread.fpu.state->xsave);
+ xsave_buf = &(tsk->thread.fpu.state.xsave);
bndcsr = get_xsave_addr(xsave_buf, XSTATE_BNDCSR);
if (!bndcsr)
goto exit_trap;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d90bf4afa2b0..8bb0de5bf9c0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3196,7 +3196,7 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu,
static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
{
- struct xsave_struct *xsave = &vcpu->arch.guest_fpu.state->xsave;
+ struct xsave_struct *xsave = &vcpu->arch.guest_fpu.state.xsave;
u64 xstate_bv = xsave->header.xfeatures;
u64 valid;
@@ -3232,7 +3232,7 @@ static void fill_xsave(u8 *dest, struct kvm_vcpu *vcpu)
static void load_xsave(struct kvm_vcpu *vcpu, u8 *src)
{
- struct xsave_struct *xsave = &vcpu->arch.guest_fpu.state->xsave;
+ struct xsave_struct *xsave = &vcpu->arch.guest_fpu.state.xsave;
u64 xstate_bv = *(u64 *)(src + XSAVE_HDR_OFFSET);
u64 valid;
@@ -3277,7 +3277,7 @@ static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
fill_xsave((u8 *) guest_xsave->region, vcpu);
} else {
memcpy(guest_xsave->region,
- &vcpu->arch.guest_fpu.state->fxsave,
+ &vcpu->arch.guest_fpu.state.fxsave,
sizeof(struct i387_fxsave_struct));
*(u64 *)&guest_xsave->region[XSAVE_HDR_OFFSET / sizeof(u32)] =
XSTATE_FPSSE;
@@ -3302,7 +3302,7 @@ static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
} else {
if (xstate_bv & ~XSTATE_FPSSE)
return -EINVAL;
- memcpy(&vcpu->arch.guest_fpu.state->fxsave,
+ memcpy(&vcpu->arch.guest_fpu.state.fxsave,
guest_xsave->region, sizeof(struct i387_fxsave_struct));
}
return 0;
@@ -6973,7 +6973,7 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
{
struct i387_fxsave_struct *fxsave =
- &vcpu->arch.guest_fpu.state->fxsave;
+ &vcpu->arch.guest_fpu.state.fxsave;
memcpy(fpu->fpr, fxsave->st_space, 128);
fpu->fcw = fxsave->cwd;
@@ -6990,7 +6990,7 @@ int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
{
struct i387_fxsave_struct *fxsave =
- &vcpu->arch.guest_fpu.state->fxsave;
+ &vcpu->arch.guest_fpu.state.fxsave;
memcpy(fxsave->st_space, fpu->fpr, 128);
fxsave->cwd = fpu->fcw;
@@ -7014,7 +7014,7 @@ int fx_init(struct kvm_vcpu *vcpu)
fpstate_init(&vcpu->arch.guest_fpu);
if (cpu_has_xsaves)
- vcpu->arch.guest_fpu.state->xsave.header.xcomp_bv =
+ vcpu->arch.guest_fpu.state.xsave.header.xcomp_bv =
host_xcr0 | XSTATE_COMPACTION_ENABLED;
/*
diff --git a/arch/x86/math-emu/fpu_aux.c b/arch/x86/math-emu/fpu_aux.c
index dc8adad10a2f..7562341ce299 100644
--- a/arch/x86/math-emu/fpu_aux.c
+++ b/arch/x86/math-emu/fpu_aux.c
@@ -52,7 +52,7 @@ void finit_soft_fpu(struct i387_soft_struct *soft)
void finit(void)
{
- finit_soft_fpu(&current->thread.fpu.state->soft);
+ finit_soft_fpu(&current->thread.fpu.state.soft);
}
/*
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index cf843855e4f6..5e003704ebfa 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -683,7 +683,7 @@ int fpregs_soft_set(struct task_struct *target,
unsigned int pos, unsigned int count,
const void *kbuf, const void __user *ubuf)
{
- struct i387_soft_struct *s387 = &target->thread.fpu.state->soft;
+ struct i387_soft_struct *s387 = &target->thread.fpu.state.soft;
void *space = s387->st_space;
int ret;
int offset, other, i, tags, regnr, tag, newtop;
@@ -735,7 +735,7 @@ int fpregs_soft_get(struct task_struct *target,
unsigned int pos, unsigned int count,
void *kbuf, void __user *ubuf)
{
- struct i387_soft_struct *s387 = &target->thread.fpu.state->soft;
+ struct i387_soft_struct *s387 = &target->thread.fpu.state.soft;
const void *space = s387->st_space;
int ret;
int offset = (S387->ftop & 7) * 10, other = 80 - offset;
diff --git a/arch/x86/math-emu/fpu_system.h b/arch/x86/math-emu/fpu_system.h
index 2c614410a5f3..9ccecb61a4fa 100644
--- a/arch/x86/math-emu/fpu_system.h
+++ b/arch/x86/math-emu/fpu_system.h
@@ -31,7 +31,7 @@
#define SEG_EXPAND_DOWN(s) (((s).b & ((1 << 11) | (1 << 10))) \
== (1 << 10))
-#define I387 (current->thread.fpu.state)
+#define I387 (&current->thread.fpu.state)
#define FPU_info (I387->soft.info)
#define FPU_CS (*(unsigned short *) &(FPU_info->regs->cs))
diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
index 3287215be60a..ea5b367b63a9 100644
--- a/arch/x86/mm/mpx.c
+++ b/arch/x86/mm/mpx.c
@@ -358,7 +358,7 @@ static __user void *task_get_bounds_dir(struct task_struct *tsk)
* only accessible if we first do an xsave.
*/
copy_fpregs_to_fpstate(&tsk->thread.fpu);
- bndcsr = get_xsave_addr(&tsk->thread.fpu.state->xsave, XSTATE_BNDCSR);
+ bndcsr = get_xsave_addr(&tsk->thread.fpu.state.xsave, XSTATE_BNDCSR);
if (!bndcsr)
return MPX_INVALID_BOUNDS_DIR;
--
2.1.0
Now that we always allocate the FPU context as part of task_struct,
there's no need for separate allocations - remove them and their
primary failure handling code.
( Note that there are still secondary error codes that have become
superfluous; those will be removed in separate patches. )
Move the somewhat misplaced setup_xstate_comp() call to the core.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 4 ----
arch/x86/kernel/fpu/core.c | 51 ++-------------------------------------------------
arch/x86/kernel/fpu/init.c | 1 +
arch/x86/kernel/process.c | 10 ----------
arch/x86/kvm/x86.c | 11 -----------
5 files changed, 3 insertions(+), 74 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 4ce830fb3f31..9454f21f0edf 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -558,10 +558,6 @@ static inline unsigned short get_fpu_mxcsr(struct task_struct *tsk)
}
}
-extern void fpstate_cache_init(void);
-
-extern int fpstate_alloc(struct fpu *fpu);
-extern void fpstate_free(struct fpu *fpu);
extern int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu);
static inline unsigned long
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 422cbb4bbe01..7d42a54b5f23 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -226,34 +226,6 @@ void fpstate_init(struct fpu *fpu)
EXPORT_SYMBOL_GPL(fpstate_init);
/*
- * FPU state allocation:
- */
-static struct kmem_cache *task_xstate_cachep;
-
-void fpstate_cache_init(void)
-{
- task_xstate_cachep =
- kmem_cache_create("task_xstate", xstate_size,
- __alignof__(union thread_xstate),
- SLAB_PANIC | SLAB_NOTRACK, NULL);
- setup_xstate_comp();
-}
-
-int fpstate_alloc(struct fpu *fpu)
-{
- /* The CPU requires the FPU state to be aligned to 16 byte boundaries: */
- WARN_ON((unsigned long)&fpu->state & 15);
-
- return 0;
-}
-EXPORT_SYMBOL_GPL(fpstate_alloc);
-
-void fpstate_free(struct fpu *fpu)
-{
-}
-EXPORT_SYMBOL_GPL(fpstate_free);
-
-/*
* Copy the current task's FPU state to a new task's FPU context.
*
* In the 'eager' case we just save to the destination context.
@@ -280,13 +252,9 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
dst_fpu->fpregs_active = 0;
dst_fpu->last_cpu = -1;
- if (src_fpu->fpstate_active) {
- int err = fpstate_alloc(dst_fpu);
-
- if (err)
- return err;
+ if (src_fpu->fpstate_active)
fpu_copy(dst_fpu, src_fpu);
- }
+
return 0;
}
@@ -305,13 +273,6 @@ int fpstate_alloc_init(struct fpu *fpu)
if (WARN_ON_ONCE(fpu->fpstate_active))
return -EINVAL;
- /*
- * Memory allocation at the first usage of the FPU and other state.
- */
- ret = fpstate_alloc(fpu);
- if (ret)
- return ret;
-
fpstate_init(fpu);
/* Safe to do for the current task: */
@@ -356,13 +317,6 @@ static int fpu__unlazy_stopped(struct fpu *child_fpu)
return 0;
}
- /*
- * Memory allocation at the first usage of the FPU and other state.
- */
- ret = fpstate_alloc(child_fpu);
- if (ret)
- return ret;
-
fpstate_init(child_fpu);
/* Safe to do for stopped child tasks: */
@@ -423,7 +377,6 @@ void fpu__clear(struct task_struct *tsk)
if (!use_eager_fpu()) {
/* FPU state will be reallocated lazily at the first use. */
drop_fpu(fpu);
- fpstate_free(fpu);
} else {
if (!fpu->fpstate_active) {
/* kthread execs. TODO: cleanup this horror. */
diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c
index 7ae5a62918c7..460e7e2c6186 100644
--- a/arch/x86/kernel/fpu/init.c
+++ b/arch/x86/kernel/fpu/init.c
@@ -265,6 +265,7 @@ void fpu__init_system(struct cpuinfo_x86 *c)
fpu__init_system_generic();
fpu__init_system_xstate_size_legacy();
fpu__init_system_xstate();
+ setup_xstate_comp();
fpu__init_system_ctx_switch();
}
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 2bd188501ac9..4b4b16c8e6ee 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -86,16 +86,6 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
return fpu__copy(&dst->thread.fpu, &src->thread.fpu);
}
-void arch_release_task_struct(struct task_struct *tsk)
-{
- fpstate_free(&tsk->thread.fpu);
-}
-
-void arch_task_cache_init(void)
-{
- fpstate_cache_init();
-}
-
/*
* Free current thread data structures etc..
*/
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8bb0de5bf9c0..68529251e897 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7008,10 +7008,6 @@ int fx_init(struct kvm_vcpu *vcpu)
{
int err;
- err = fpstate_alloc(&vcpu->arch.guest_fpu);
- if (err)
- return err;
-
fpstate_init(&vcpu->arch.guest_fpu);
if (cpu_has_xsaves)
vcpu->arch.guest_fpu.state.xsave.header.xcomp_bv =
@@ -7028,11 +7024,6 @@ int fx_init(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(fx_init);
-static void fx_free(struct kvm_vcpu *vcpu)
-{
- fpstate_free(&vcpu->arch.guest_fpu);
-}
-
void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
{
if (vcpu->guest_fpu_loaded)
@@ -7070,7 +7061,6 @@ void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
kvmclock_reset(vcpu);
free_cpumask_var(vcpu->arch.wbinvd_dirty_mask);
- fx_free(vcpu);
kvm_x86_ops->vcpu_free(vcpu);
}
@@ -7126,7 +7116,6 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
kvm_mmu_unload(vcpu);
vcpu_put(vcpu);
- fx_free(vcpu);
kvm_x86_ops->vcpu_free(vcpu);
}
--
2.1.0
Remove the failure code from fpstate_alloc_init() and propagate this
change down to its callers.
Note that this function still has an 'init' aspect, so it must still
be called.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 2 +-
arch/x86/kernel/fpu/core.c | 37 +++++++------------------------------
arch/x86/kernel/fpu/xsave.c | 4 ++--
arch/x86/kvm/x86.c | 4 ++--
arch/x86/math-emu/fpu_entry.c | 8 ++------
5 files changed, 14 insertions(+), 41 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 9454f21f0edf..1d0c5cee29eb 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -44,7 +44,7 @@ extern void fpu__init_system_xstate(void);
extern void fpu__init_cpu_xstate(void);
extern void fpu__init_system(struct cpuinfo_x86 *c);
-extern int fpstate_alloc_init(struct fpu *fpu);
+extern void fpstate_alloc_init(struct fpu *fpu);
extern void fpstate_init(struct fpu *fpu);
extern void fpu__clear(struct task_struct *tsk);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 7d42a54b5f23..567d789d7736 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -259,26 +259,17 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
}
/*
- * Allocate the backing store for the current task's FPU registers
- * and initialize the registers themselves as well.
- *
- * Can fail.
+ * Initialize the current task's in-memory FPU context:
*/
-int fpstate_alloc_init(struct fpu *fpu)
+void fpstate_alloc_init(struct fpu *fpu)
{
- int ret;
-
- if (WARN_ON_ONCE(fpu != &current->thread.fpu))
- return -EINVAL;
- if (WARN_ON_ONCE(fpu->fpstate_active))
- return -EINVAL;
+ WARN_ON_ONCE(fpu != &current->thread.fpu);
+ WARN_ON_ONCE(fpu->fpstate_active);
fpstate_init(fpu);
/* Safe to do for the current task: */
fpu->fpstate_active = 1;
-
- return 0;
}
EXPORT_SYMBOL_GPL(fpstate_alloc_init);
@@ -340,20 +331,8 @@ void fpu__restore(void)
struct task_struct *tsk = current;
struct fpu *fpu = &tsk->thread.fpu;
- if (!fpu->fpstate_active) {
- local_irq_enable();
- /*
- * does a slab alloc which can sleep
- */
- if (fpstate_alloc_init(fpu)) {
- /*
- * ran out of memory!
- */
- do_group_exit(SIGKILL);
- return;
- }
- local_irq_disable();
- }
+ if (!fpu->fpstate_active)
+ fpstate_alloc_init(fpu);
/* Avoid __kernel_fpu_begin() right after fpregs_activate() */
kernel_fpu_disable();
@@ -379,9 +358,7 @@ void fpu__clear(struct task_struct *tsk)
drop_fpu(fpu);
} else {
if (!fpu->fpstate_active) {
- /* kthread execs. TODO: cleanup this horror. */
- if (WARN_ON(fpstate_alloc_init(fpu)))
- force_sig(SIGKILL, tsk);
+ fpstate_alloc_init(fpu);
user_fpu_begin();
restore_init_xstate();
}
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index c7d48eb0a194..dd2cef08a1a4 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -358,8 +358,8 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size)
if (!access_ok(VERIFY_READ, buf, size))
return -EACCES;
- if (!fpu->fpstate_active && fpstate_alloc_init(fpu))
- return -1;
+ if (!fpu->fpstate_active)
+ fpstate_alloc_init(fpu);
if (!static_cpu_has(X86_FEATURE_FPU))
return fpregs_soft_set(current, NULL,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 68529251e897..707f4e27ee91 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6601,8 +6601,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
int r;
sigset_t sigsaved;
- if (!fpu->fpstate_active && fpstate_alloc_init(fpu))
- return -ENOMEM;
+ if (!fpu->fpstate_active)
+ fpstate_alloc_init(fpu);
if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 5e003704ebfa..99ddfc274df3 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -149,12 +149,8 @@ void math_emulate(struct math_emu_info *info)
struct desc_struct code_descriptor;
struct fpu *fpu = &current->thread.fpu;
- if (!fpu->fpstate_active) {
- if (fpstate_alloc_init(fpu)) {
- do_group_exit(SIGKILL);
- return;
- }
- }
+ if (!fpu->fpstate_active)
+ fpstate_alloc_init(fpu);
#ifdef RE_ENTRANT_CHECKING
if (emulating) {
--
2.1.0
Now that there are no FPU context allocations, rename fpstate_alloc_init()
to fpstate_init_curr(), to signal that it initializes the fpstate of
the current task and marks it active.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 2 +-
arch/x86/kernel/fpu/core.c | 8 ++++----
arch/x86/kernel/fpu/xsave.c | 2 +-
arch/x86/kvm/x86.c | 2 +-
arch/x86/math-emu/fpu_entry.c | 2 +-
5 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 1d0c5cee29eb..1345ab3dd273 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -44,7 +44,7 @@ extern void fpu__init_system_xstate(void);
extern void fpu__init_cpu_xstate(void);
extern void fpu__init_system(struct cpuinfo_x86 *c);
-extern void fpstate_alloc_init(struct fpu *fpu);
+extern void fpstate_init_curr(struct fpu *fpu);
extern void fpstate_init(struct fpu *fpu);
extern void fpu__clear(struct task_struct *tsk);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 567d789d7736..ca1b74831887 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -261,7 +261,7 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
/*
* Initialize the current task's in-memory FPU context:
*/
-void fpstate_alloc_init(struct fpu *fpu)
+void fpstate_init_curr(struct fpu *fpu)
{
WARN_ON_ONCE(fpu != &current->thread.fpu);
WARN_ON_ONCE(fpu->fpstate_active);
@@ -271,7 +271,7 @@ void fpstate_alloc_init(struct fpu *fpu)
/* Safe to do for the current task: */
fpu->fpstate_active = 1;
}
-EXPORT_SYMBOL_GPL(fpstate_alloc_init);
+EXPORT_SYMBOL_GPL(fpstate_init_curr);
/*
* This function is called before we modify a stopped child's
@@ -332,7 +332,7 @@ void fpu__restore(void)
struct fpu *fpu = &tsk->thread.fpu;
if (!fpu->fpstate_active)
- fpstate_alloc_init(fpu);
+ fpstate_init_curr(fpu);
/* Avoid __kernel_fpu_begin() right after fpregs_activate() */
kernel_fpu_disable();
@@ -358,7 +358,7 @@ void fpu__clear(struct task_struct *tsk)
drop_fpu(fpu);
} else {
if (!fpu->fpstate_active) {
- fpstate_alloc_init(fpu);
+ fpstate_init_curr(fpu);
user_fpu_begin();
restore_init_xstate();
}
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index dd2cef08a1a4..49d9f3dcc2ea 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -359,7 +359,7 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size)
return -EACCES;
if (!fpu->fpstate_active)
- fpstate_alloc_init(fpu);
+ fpstate_init_curr(fpu);
if (!static_cpu_has(X86_FEATURE_FPU))
return fpregs_soft_set(current, NULL,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 707f4e27ee91..74b53c314da0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6602,7 +6602,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
sigset_t sigsaved;
if (!fpu->fpstate_active)
- fpstate_alloc_init(fpu);
+ fpstate_init_curr(fpu);
if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 99ddfc274df3..4c6ab791d0e5 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -150,7 +150,7 @@ void math_emulate(struct math_emu_info *info)
struct fpu *fpu = &current->thread.fpu;
if (!fpu->fpstate_active)
- fpstate_alloc_init(fpu);
+ fpstate_init_curr(fpu);
#ifdef RE_ENTRANT_CHECKING
if (emulating) {
--
2.1.0
Now that the FPU context is always allocated as part of task_struct,
fpu__unlazy_stopped() cannot fail. Remove its error return and
propagate this change to its callers.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/core.c | 48 +++++++++++++-----------------------------------
1 file changed, 13 insertions(+), 35 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index ca1b74831887..b8b03547072d 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -296,24 +296,18 @@ EXPORT_SYMBOL_GPL(fpstate_init_curr);
* the read-only case, it's not strictly necessary for
* read-only access to the context.
*/
-static int fpu__unlazy_stopped(struct fpu *child_fpu)
+static void fpu__unlazy_stopped(struct fpu *child_fpu)
{
- int ret;
-
- if (WARN_ON_ONCE(child_fpu == &current->thread.fpu))
- return -EINVAL;
+ WARN_ON_ONCE(child_fpu == &current->thread.fpu);
if (child_fpu->fpstate_active) {
child_fpu->last_cpu = -1;
- return 0;
- }
-
- fpstate_init(child_fpu);
-
- /* Safe to do for stopped child tasks: */
- child_fpu->fpstate_active = 1;
+ } else {
+ fpstate_init(child_fpu);
- return 0;
+ /* Safe to do for stopped child tasks: */
+ child_fpu->fpstate_active = 1;
+ }
}
/*
@@ -389,15 +383,11 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
void *kbuf, void __user *ubuf)
{
struct fpu *fpu = &target->thread.fpu;
- int ret;
if (!cpu_has_fxsr)
return -ENODEV;
- ret = fpu__unlazy_stopped(fpu);
- if (ret)
- return ret;
-
+ fpu__unlazy_stopped(fpu);
sanitize_i387_state(target);
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
@@ -414,10 +404,7 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_fxsr)
return -ENODEV;
- ret = fpu__unlazy_stopped(fpu);
- if (ret)
- return ret;
-
+ fpu__unlazy_stopped(fpu);
sanitize_i387_state(target);
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
@@ -449,9 +436,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_xsave)
return -ENODEV;
- ret = fpu__unlazy_stopped(fpu);
- if (ret)
- return ret;
+ fpu__unlazy_stopped(fpu);
xsave = &fpu->state.xsave;
@@ -480,9 +465,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_xsave)
return -ENODEV;
- ret = fpu__unlazy_stopped(fpu);
- if (ret)
- return ret;
+ fpu__unlazy_stopped(fpu);
xsave = &fpu->state.xsave;
@@ -643,11 +626,8 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
{
struct fpu *fpu = &target->thread.fpu;
struct user_i387_ia32_struct env;
- int ret;
- ret = fpu__unlazy_stopped(fpu);
- if (ret)
- return ret;
+ fpu__unlazy_stopped(fpu);
if (!static_cpu_has(X86_FEATURE_FPU))
return fpregs_soft_get(target, regset, pos, count, kbuf, ubuf);
@@ -677,9 +657,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
struct user_i387_ia32_struct env;
int ret;
- ret = fpu__unlazy_stopped(fpu);
- if (ret)
- return ret;
+ fpu__unlazy_stopped(fpu);
sanitize_i387_state(target);
--
2.1.0
Now that fpstate_init() cannot fail, the error return of fx_init()
has lost its purpose. Eliminate the error return and propagate this
change to all callers.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 --
arch/x86/kvm/x86.c | 14 +++-----------
2 files changed, 3 insertions(+), 13 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dea2e7e962e3..c29e61a8d6d4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -999,8 +999,6 @@ void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);
void kvm_inject_nmi(struct kvm_vcpu *vcpu);
-int fx_init(struct kvm_vcpu *vcpu);
-
void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
const u8 *new, int bytes);
int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 74b53c314da0..92a8490cc69d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7004,10 +7004,8 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
return 0;
}
-int fx_init(struct kvm_vcpu *vcpu)
+static void fx_init(struct kvm_vcpu *vcpu)
{
- int err;
-
fpstate_init(&vcpu->arch.guest_fpu);
if (cpu_has_xsaves)
vcpu->arch.guest_fpu.state.xsave.header.xcomp_bv =
@@ -7019,10 +7017,7 @@ int fx_init(struct kvm_vcpu *vcpu)
vcpu->arch.xcr0 = XSTATE_FP;
vcpu->arch.cr0 |= X86_CR0_ET;
-
- return 0;
}
-EXPORT_SYMBOL_GPL(fx_init);
void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
{
@@ -7341,9 +7336,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
goto fail_free_mce_banks;
}
- r = fx_init(vcpu);
- if (r)
- goto fail_free_wbinvd_dirty_mask;
+ fx_init(vcpu);
vcpu->arch.ia32_tsc_adjust_msr = 0x0;
vcpu->arch.pv_time_enabled = false;
@@ -7357,8 +7350,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
kvm_pmu_init(vcpu);
return 0;
-fail_free_wbinvd_dirty_mask:
- free_cpumask_var(vcpu->arch.wbinvd_dirty_mask);
+
fail_free_mce_banks:
kfree(vcpu->arch.mce_banks);
fail_free_lapic:
--
2.1.0
Now that fpstate_init_curr() is not doing implicit allocations
anymore, almost all uses of it involve a very simple pattern:
if (!fpu->fpstate_active)
fpstate_init_curr(fpu);
which is basically activating the FPU fpstate if it was not active
before.
So propagate the check into the function itself, and rename the
function according to its new purpose:
fpu__activate_curr(fpu);
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 2 +-
arch/x86/kernel/fpu/core.c | 21 +++++++++++----------
arch/x86/kernel/fpu/xsave.c | 3 +--
arch/x86/kvm/x86.c | 3 +--
arch/x86/math-emu/fpu_entry.c | 3 +--
5 files changed, 15 insertions(+), 17 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 1345ab3dd273..de19fc53f54e 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -44,7 +44,7 @@ extern void fpu__init_system_xstate(void);
extern void fpu__init_cpu_xstate(void);
extern void fpu__init_system(struct cpuinfo_x86 *c);
-extern void fpstate_init_curr(struct fpu *fpu);
+extern void fpu__activate_curr(struct fpu *fpu);
extern void fpstate_init(struct fpu *fpu);
extern void fpu__clear(struct task_struct *tsk);
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index b8b03547072d..3221603d79bc 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -259,19 +259,21 @@ int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
}
/*
- * Initialize the current task's in-memory FPU context:
+ * Activate the current task's in-memory FPU context,
+ * if it has not been used before:
*/
-void fpstate_init_curr(struct fpu *fpu)
+void fpu__activate_curr(struct fpu *fpu)
{
WARN_ON_ONCE(fpu != &current->thread.fpu);
- WARN_ON_ONCE(fpu->fpstate_active);
- fpstate_init(fpu);
+ if (!fpu->fpstate_active) {
+ fpstate_init(fpu);
- /* Safe to do for the current task: */
- fpu->fpstate_active = 1;
+ /* Safe to do for the current task: */
+ fpu->fpstate_active = 1;
+ }
}
-EXPORT_SYMBOL_GPL(fpstate_init_curr);
+EXPORT_SYMBOL_GPL(fpu__activate_curr);
/*
* This function is called before we modify a stopped child's
@@ -325,8 +327,7 @@ void fpu__restore(void)
struct task_struct *tsk = current;
struct fpu *fpu = &tsk->thread.fpu;
- if (!fpu->fpstate_active)
- fpstate_init_curr(fpu);
+ fpu__activate_curr(fpu);
/* Avoid __kernel_fpu_begin() right after fpregs_activate() */
kernel_fpu_disable();
@@ -352,7 +353,7 @@ void fpu__clear(struct task_struct *tsk)
drop_fpu(fpu);
} else {
if (!fpu->fpstate_active) {
- fpstate_init_curr(fpu);
+ fpu__activate_curr(fpu);
user_fpu_begin();
restore_init_xstate();
}
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xsave.c
index 49d9f3dcc2ea..f549e2a44336 100644
--- a/arch/x86/kernel/fpu/xsave.c
+++ b/arch/x86/kernel/fpu/xsave.c
@@ -358,8 +358,7 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size)
if (!access_ok(VERIFY_READ, buf, size))
return -EACCES;
- if (!fpu->fpstate_active)
- fpstate_init_curr(fpu);
+ fpu__activate_curr(fpu);
if (!static_cpu_has(X86_FEATURE_FPU))
return fpregs_soft_set(current, NULL,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 92a8490cc69d..9ff4df77e069 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6601,8 +6601,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
int r;
sigset_t sigsaved;
- if (!fpu->fpstate_active)
- fpstate_init_curr(fpu);
+ fpu__activate_curr(fpu);
if (vcpu->sigset_active)
sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
diff --git a/arch/x86/math-emu/fpu_entry.c b/arch/x86/math-emu/fpu_entry.c
index 4c6ab791d0e5..5b850514eb68 100644
--- a/arch/x86/math-emu/fpu_entry.c
+++ b/arch/x86/math-emu/fpu_entry.c
@@ -149,8 +149,7 @@ void math_emulate(struct math_emu_info *info)
struct desc_struct code_descriptor;
struct fpu *fpu = &current->thread.fpu;
- if (!fpu->fpstate_active)
- fpstate_init_curr(fpu);
+ fpu__activate_curr(fpu);
#ifdef RE_ENTRANT_CHECKING
if (emulating) {
--
2.1.0
In line with the fpu__activate_curr() change, rename
fpu__unlazy_stopped() in a similar fashion as well: its purpose
is to make the fpstate of a stopped task the current and active FPU
context, which may require unlazying and initialization.
The unlazying is just part of the job; the main concept is to make
the fpstate active.
Also update the function's description to clarify its exact
usage and the background behind it all.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/core.c | 31 ++++++++++++++++---------------
1 file changed, 16 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3221603d79bc..2fb7a77872cd 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -276,29 +276,30 @@ void fpu__activate_curr(struct fpu *fpu)
EXPORT_SYMBOL_GPL(fpu__activate_curr);
/*
- * This function is called before we modify a stopped child's
- * FPU state context.
+ * This function must be called before we modify a stopped child's
+ * fpstate.
*
* If the child has not used the FPU before then initialize its
- * FPU context.
+ * fpstate.
*
* If the child has used the FPU before then unlazy it.
*
- * [ After this function call, after the context is modified and
- * the child task is woken up, the child task will restore
- * the modified FPU state from the modified context. If we
+ * [ After this function call, after registers in the fpstate are
+ * modified and the child task has woken up, the child task will
+ * restore the modified FPU state from the modified context. If we
* didn't clear its lazy status here then the lazy in-registers
- * state pending on its former CPU could be restored, losing
+ * state pending on its former CPU could be restored, corrupting
* the modifications. ]
*
* This function is also called before we read a stopped child's
- * FPU state - to make sure it's modified.
+ * FPU state - to make sure it's initialized if the child has
+ * no active FPU state.
*
* TODO: A future optimization would be to skip the unlazying in
* the read-only case, it's not strictly necessary for
* read-only access to the context.
*/
-static void fpu__unlazy_stopped(struct fpu *child_fpu)
+static void fpu__activate_stopped(struct fpu *child_fpu)
{
WARN_ON_ONCE(child_fpu == &current->thread.fpu);
@@ -388,7 +389,7 @@ int xfpregs_get(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_fxsr)
return -ENODEV;
- fpu__unlazy_stopped(fpu);
+ fpu__activate_stopped(fpu);
sanitize_i387_state(target);
return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
@@ -405,7 +406,7 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_fxsr)
return -ENODEV;
- fpu__unlazy_stopped(fpu);
+ fpu__activate_stopped(fpu);
sanitize_i387_state(target);
ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
@@ -437,7 +438,7 @@ int xstateregs_get(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_xsave)
return -ENODEV;
- fpu__unlazy_stopped(fpu);
+ fpu__activate_stopped(fpu);
xsave = &fpu->state.xsave;
@@ -466,7 +467,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
if (!cpu_has_xsave)
return -ENODEV;
- fpu__unlazy_stopped(fpu);
+ fpu__activate_stopped(fpu);
xsave = &fpu->state.xsave;
@@ -628,7 +629,7 @@ int fpregs_get(struct task_struct *target, const struct user_regset *regset,
struct fpu *fpu = &target->thread.fpu;
struct user_i387_ia32_struct env;
- fpu__unlazy_stopped(fpu);
+ fpu__activate_stopped(fpu);
if (!static_cpu_has(X86_FEATURE_FPU))
return fpregs_soft_get(target, regset, pos, count, kbuf, ubuf);
@@ -658,7 +659,7 @@ int fpregs_set(struct task_struct *target, const struct user_regset *regset,
struct user_i387_ia32_struct env;
int ret;
- fpu__unlazy_stopped(fpu);
+ fpu__activate_stopped(fpu);
sanitize_i387_state(target);
--
2.1.0
We have repeated patterns of:
if (!use_eager_fpu())
clts();
... to activate FPU registers, and:
if (!use_eager_fpu())
stts();
... to deactivate them.
Encapsulate these in:
__fpregs_activate_hw();
__fpregs_deactivate_hw();
and use them accordingly.
Doing this synchronizes the idiom with the fpu->fpregs_active
software-flag's handling functions, creating clear patterns of:
__fpregs_activate_hw();
__fpregs_activate(fpu);
etc., which improves readability.
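( E.g. the activation side then reads, as in fpregs_activate() below:
	__fpregs_activate_hw();		/* clts, on lazy-FPU systems */
	__fpregs_activate(fpu);		/* set the fpregs_active sw-flag */
)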
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/internal.h | 32 ++++++++++++++++++++++++--------
arch/x86/kernel/fpu/core.c | 7 +++----
2 files changed, 27 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index de19fc53f54e..28556c6671c3 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -324,14 +324,31 @@ static inline int restore_fpu_checking(struct fpu *fpu)
return fpu_restore_checking(fpu);
}
-/* Must be paired with an 'stts' after! */
+/*
+ * Wrap lazy FPU TS handling in a 'hw fpregs activation/deactivation'
+ * idiom, which is then paired with the sw-flag (fpregs_active) later on:
+ */
+
+static inline void __fpregs_activate_hw(void)
+{
+ if (!use_eager_fpu())
+ clts();
+}
+
+static inline void __fpregs_deactivate_hw(void)
+{
+ if (!use_eager_fpu())
+ stts();
+}
+
+/* Must be paired with an 'stts' (fpregs_deactivate_hw()) after! */
static inline void __fpregs_deactivate(struct fpu *fpu)
{
fpu->fpregs_active = 0;
this_cpu_write(fpu_fpregs_owner_ctx, NULL);
}
-/* Must be paired with a 'clts' before! */
+/* Must be paired with a 'clts' (fpregs_activate_hw()) before! */
static inline void __fpregs_activate(struct fpu *fpu)
{
fpu->fpregs_active = 1;
@@ -362,16 +379,14 @@ static inline int user_has_fpu(void)
*/
static inline void fpregs_activate(struct fpu *fpu)
{
- if (!use_eager_fpu())
- clts();
+ __fpregs_activate_hw();
__fpregs_activate(fpu);
}
static inline void fpregs_deactivate(struct fpu *fpu)
{
__fpregs_deactivate(fpu);
- if (!use_eager_fpu())
- stts();
+ __fpregs_deactivate_hw();
}
static inline void drop_fpu(struct fpu *fpu)
@@ -455,8 +470,9 @@ switch_fpu_prepare(struct fpu *old_fpu, struct fpu *new_fpu, int cpu)
new_fpu->counter++;
__fpregs_activate(new_fpu);
prefetch(&new_fpu->state);
- } else if (!use_eager_fpu())
- stts();
+ } else {
+ __fpregs_deactivate_hw();
+ }
} else {
old_fpu->counter = 0;
old_fpu->last_cpu = -1;
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 2fb7a77872cd..43689660d71c 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -105,8 +105,7 @@ void __kernel_fpu_begin(void)
copy_fpregs_to_fpstate(fpu);
} else {
this_cpu_write(fpu_fpregs_owner_ctx, NULL);
- if (!use_eager_fpu())
- clts();
+ __fpregs_activate_hw();
}
}
EXPORT_SYMBOL(__kernel_fpu_begin);
@@ -118,8 +117,8 @@ void __kernel_fpu_end(void)
if (fpu->fpregs_active) {
if (WARN_ON(restore_fpu_checking(fpu)))
fpu_reset_state(fpu);
- } else if (!use_eager_fpu()) {
- stts();
+ } else {
+ __fpregs_deactivate_hw();
}
kernel_fpu_enable();
--
2.1.0
__save_fpu() has this pattern:
if (unlikely(system_state == SYSTEM_BOOTING))
xsave_state_booting(&fpu->state.xsave);
else
xsave_state(&fpu->state.xsave);
... but it does not actually get called during system bootup.
So remove the complication and always call xsave_state().
To make sure this assumption is correct, add a WARN_ON() debug
check to xsave_state().
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/xsave.h | 2 ++
arch/x86/kernel/fpu/core.c | 5 +----
2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/fpu/xsave.h b/arch/x86/include/asm/fpu/xsave.h
index a10e66582c1b..2f2ed322263f 100644
--- a/arch/x86/include/asm/fpu/xsave.h
+++ b/arch/x86/include/asm/fpu/xsave.h
@@ -133,6 +133,8 @@ static inline int xsave_state(struct xsave_struct *fx)
u32 hmask = mask >> 32;
int err = 0;
+ WARN_ON(system_state == SYSTEM_BOOTING);
+
/*
* If xsaves is enabled, xsaves replaces xsaveopt because
* it supports compact format and supervisor states in addition to
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 43689660d71c..a0e2b65745da 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -172,10 +172,7 @@ EXPORT_SYMBOL_GPL(irq_ts_restore);
static void __save_fpu(struct fpu *fpu)
{
if (use_xsave()) {
- if (unlikely(system_state == SYSTEM_BOOTING))
- xsave_state_booting(&fpu->state.xsave);
- else
- xsave_state(&fpu->state.xsave);
+ xsave_state(&fpu->state.xsave);
} else {
fpu_fxsave(fpu);
}
--
2.1.0
The current implementation of __save_fpu():
if (use_xsave()) {
xsave_state(&fpu->state.xsave);
} else {
fpu_fxsave(fpu);
}
... is actually a simplified version of copy_fpregs_to_fpstate(),
if use_eager_fpu() is true.
But all call sites of __save_fpu() only call it when use_eager_fpu()
is true.
So we can eliminate __save_fpu() altogether and use the standard
copy_fpregs_to_fpstate() function. This cleans up the code
by making it use fewer variants of FPU register saving.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/core.c | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index a0e2b65745da..478e002ab122 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -169,15 +169,6 @@ void irq_ts_restore(int TS_state)
}
EXPORT_SYMBOL_GPL(irq_ts_restore);
-static void __save_fpu(struct fpu *fpu)
-{
- if (use_xsave()) {
- xsave_state(&fpu->state.xsave);
- } else {
- fpu_fxsave(fpu);
- }
-}
-
/*
* Save the FPU state (initialize it if necessary):
*
@@ -190,7 +181,7 @@ void fpu__save(struct fpu *fpu)
preempt_disable();
if (fpu->fpregs_active) {
if (use_eager_fpu()) {
- __save_fpu(fpu);
+ copy_fpregs_to_fpstate(fpu);
} else {
copy_fpregs_to_fpstate(fpu);
fpregs_deactivate(fpu);
@@ -235,7 +226,7 @@ static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
if (use_eager_fpu()) {
memset(&dst_fpu->state.xsave, 0, xstate_size);
- __save_fpu(dst_fpu);
+ copy_fpregs_to_fpstate(dst_fpu);
} else {
fpu__save(src_fpu);
memcpy(&dst_fpu->state, &src_fpu->state, xstate_size);
--
2.1.0
Factor out a common call.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/core.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 478e002ab122..1e79a6b3fc27 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -180,12 +180,9 @@ void fpu__save(struct fpu *fpu)
preempt_disable();
if (fpu->fpregs_active) {
- if (use_eager_fpu()) {
- copy_fpregs_to_fpstate(fpu);
- } else {
- copy_fpregs_to_fpstate(fpu);
+ copy_fpregs_to_fpstate(fpu);
+ if (!use_eager_fpu())
fpregs_deactivate(fpu);
- }
}
preempt_enable();
}
--
2.1.0
So fpu__save() does this currently:
copy_fpregs_to_fpstate(fpu);
if (!use_eager_fpu())
fpregs_deactivate(fpu);
... which deactivates the FPU on lazy switching systems unconditionally.
Both use cases of fpu__save() save the FPU state into the
fpstate: fork()/clone() and math error signal handling.
The unconditional disabling of FPU registers in the lazy switching
case is probably a mistaken conversion of old FNSAVE code (that had
to disable FPU registers).
So speed up this code by only disabling FPU registers when absolutely
necessary: when indicated by the copy_fpregs_to_fpstate() return
code:
if (!copy_fpregs_to_fpstate(fpu))
fpregs_deactivate(fpu);
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/core.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1e79a6b3fc27..8835b802aa16 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -170,7 +170,7 @@ void irq_ts_restore(int TS_state)
EXPORT_SYMBOL_GPL(irq_ts_restore);
/*
- * Save the FPU state (initialize it if necessary):
+ * Save the FPU state (mark it for reload if necessary):
*
* This only ever gets called for the current task.
*/
@@ -180,8 +180,7 @@ void fpu__save(struct fpu *fpu)
preempt_disable();
if (fpu->fpregs_active) {
- copy_fpregs_to_fpstate(fpu);
- if (!use_eager_fpu())
+ if (!copy_fpregs_to_fpstate(fpu))
fpregs_deactivate(fpu);
}
preempt_enable();
--
2.1.0
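( Side note: a compact model of the calling convention this relies on -
copy_fpregs_to_fpstate() returns nonzero when the register contents survive
the save (FXSAVE/XSAVE family) and zero when the save destroys them (legacy
FNSAVE). The sketch below is illustrative only; save_fpregs() and the
fnsave_only flag are hypothetical stand-ins: )

        #include <stdbool.h>
        #include <stdio.h>

        /* Stand-in for copy_fpregs_to_fpstate(): */
        static int save_fpregs(bool fnsave_only)
        {
                /* ... a real implementation saves the registers here ... */
                return fnsave_only ? 0 : 1;   /* 0 == save destroyed them */
        }

        int main(void)
        {
                bool fpregs_active = true;

                /* The fpu__save() pattern: deactivate only when told to: */
                if (!save_fpregs(true))
                        fpregs_active = false;  /* fpregs_deactivate()    */

                printf("fpregs_active=%d\n", fpregs_active);
                return 0;
        }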
Optimize fpu_copy() a bit by expanding the ->fpregs_active == 1
portion of fpu__save() into it.
( The main purpose of this change is to enable another, larger
optimization that will be done in the next patch. )
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/core.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 8835b802aa16..7fdeabc1f2af 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -224,7 +224,10 @@ static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
memset(&dst_fpu->state.xsave, 0, xstate_size);
copy_fpregs_to_fpstate(dst_fpu);
} else {
- fpu__save(src_fpu);
+ preempt_disable();
+ if (!copy_fpregs_to_fpstate(src_fpu))
+ fpregs_deactivate(src_fpu);
+ preempt_enable();
memcpy(&dst_fpu->state, &src_fpu->state, xstate_size);
}
}
--
2.1.0
The current fpu_copy() code on lazy switching CPUs always saves
into the current fpstate and then copies it over into the child
context:
preempt_disable();
if (!copy_fpregs_to_fpstate(src_fpu))
fpregs_deactivate(src_fpu);
preempt_enable();
memcpy(&dst_fpu->state, &src_fpu->state, xstate_size);
That memcpy() can be avoided on all lazy switching setups except
really old FNSAVE-only systems: change fpu_copy() to directly save
into the child context, for both the lazy and the eager context
switching case.
Note that we still have to do a memcpy() back into the parent
context in the FNSAVE case, but this won't be executed on the
majority of x86 systems that got built in the last 10 years or so.
Reviewed-by: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/core.c | 35 +++++++++++++++++++++++++++--------
1 file changed, 27 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 7fdeabc1f2af..8ae4c2450c2b 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -220,16 +220,35 @@ static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
{
WARN_ON(src_fpu != &current->thread.fpu);
- if (use_eager_fpu()) {
+ /*
+ * Don't let 'init optimized' areas of the XSAVE area
+ * leak into the child task:
+ */
+ if (use_eager_fpu())
memset(&dst_fpu->state.xsave, 0, xstate_size);
- copy_fpregs_to_fpstate(dst_fpu);
- } else {
- preempt_disable();
- if (!copy_fpregs_to_fpstate(src_fpu))
- fpregs_deactivate(src_fpu);
- preempt_enable();
- memcpy(&dst_fpu->state, &src_fpu->state, xstate_size);
+
+ /*
+ * Save current FPU registers directly into the child
+ * FPU context, without any memory-to-memory copying.
+ *
+ * If the FPU context got destroyed in the process (FNSAVE
+ * done on old CPUs) then copy it back into the source
+ * context and mark the current task for lazy restore.
+ *
+ * We have to do all this with preemption disabled,
+ * mostly because of the FNSAVE case, because in that
+ * case we must not allow preemption in the window
+ * between the FNSAVE and us marking the context lazy.
+ *
+ * It shouldn't be an issue as even FNSAVE is plenty
+ * fast in terms of critical section length.
+ */
+ preempt_disable();
+ if (!copy_fpregs_to_fpstate(dst_fpu)) {
+ memcpy(&src_fpu->state, &dst_fpu->state, xstate_size);
+ fpregs_deactivate(src_fpu);
}
+ preempt_enable();
}
int fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu)
--
2.1.0
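( Side note: the fork()-time flow is easy to model in userspace. The sketch
below mirrors the new fpu_copy() logic - save directly into the child, and
only memcpy() back into the parent when the save destroyed the registers.
All names here (save_fpregs_to(), fnsave_only, struct fpstate) are
hypothetical stand-ins for the kernel primitives: )

        #include <stdbool.h>
        #include <stdio.h>
        #include <string.h>

        struct fpstate { unsigned char regs[512]; };

        /*
         * Stand-in for copy_fpregs_to_fpstate(): returns 1 if the FPU
         * registers are still valid after the save, 0 if the save
         * destroyed them (legacy FNSAVE):
         */
        static int save_fpregs_to(struct fpstate *dst, bool fnsave_only)
        {
                memset(dst->regs, 0x5A, sizeof(dst->regs)); /* pretend save */
                return fnsave_only ? 0 : 1;
        }

        static void copy_for_fork(struct fpstate *parent,
                                  struct fpstate *child, bool fnsave_only)
        {
                /* Save current registers directly into the child context: */
                if (!save_fpregs_to(child, fnsave_only)) {
                        /*
                         * FNSAVE destroyed the registers: copy the state
                         * back into the parent, which would then be marked
                         * for lazy restore.
                         */
                        memcpy(parent->regs, child->regs,
                               sizeof(parent->regs));
                }
        }

        int main(void)
        {
                struct fpstate parent = { {0} }, child = { {0} };

                copy_for_fork(&parent, &child, false); /* modern: no memcpy */
                copy_for_fork(&parent, &child, true);  /* FNSAVE: copy back */
                printf("parent[0]=0x%02x child[0]=0x%02x\n",
                       parent.regs[0], child.regs[0]);
                return 0;
        }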
'xsave' is an x86 instruction name to most people - but xsave.h is
about a lot more than just the XSAVE instruction: it includes
definitions and support code, both internal and external, related
to xstate and xfeatures.
As a first step in cleaning up the various xstate uses, rename this
header to 'fpu/xstate.h' to better reflect what this header file
is about.
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/crypto/camellia_aesni_avx2_glue.c | 2 +-
arch/x86/crypto/camellia_aesni_avx_glue.c | 2 +-
arch/x86/crypto/cast5_avx_glue.c | 2 +-
arch/x86/crypto/cast6_avx_glue.c | 2 +-
arch/x86/crypto/serpent_avx2_glue.c | 2 +-
arch/x86/crypto/serpent_avx_glue.c | 2 +-
arch/x86/crypto/sha-mb/sha1_mb.c | 2 +-
arch/x86/crypto/sha1_ssse3_glue.c | 2 +-
arch/x86/crypto/sha256_ssse3_glue.c | 2 +-
arch/x86/crypto/sha512_ssse3_glue.c | 2 +-
arch/x86/crypto/twofish_avx_glue.c | 2 +-
arch/x86/include/asm/fpu/internal.h | 2 +-
arch/x86/include/asm/fpu/{xsave.h => xstate.h} | 0
arch/x86/kvm/cpuid.c | 2 +-
14 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c b/arch/x86/crypto/camellia_aesni_avx2_glue.c
index 004acd7bb4e0..46484482f267 100644
--- a/arch/x86/crypto/camellia_aesni_avx2_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c
@@ -20,7 +20,7 @@
#include <crypto/lrw.h>
#include <crypto/xts.h>
#include <asm/xcr.h>
-#include <asm/fpu/xsave.h>
+#include <asm/fpu/xstate.h>
#include <asm/crypto/camellia.h>
#include <asm/crypto/glue_helper.h>
diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
index 2f7ead8caf53..0122cd95563a 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -20,7 +20,7 @@
#include <crypto/lrw.h>
#include <crypto/xts.h>
#include <asm/xcr.h>
-#include <asm/fpu/xsave.h>
+#include <asm/fpu/xstate.h>
#include <asm/crypto/camellia.h>
#include <asm/crypto/glue_helper.h>
diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
index 2c3360be6fc8..ca4f32e7a423 100644
--- a/arch/x86/crypto/cast5_avx_glue.c
+++ b/arch/x86/crypto/cast5_avx_glue.c
@@ -32,7 +32,7 @@
#include <crypto/cryptd.h>
#include <crypto/ctr.h>
#include <asm/xcr.h>
-#include <asm/fpu/xsave.h>
+#include <asm/fpu/xstate.h>
#include <asm/crypto/glue_helper.h>
#define CAST5_PARALLEL_BLOCKS 16
diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index a2ec18a56e4f..21d0b845c8c4 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -37,7 +37,7 @@
#include <crypto/lrw.h>
#include <crypto/xts.h>
#include <asm/xcr.h>
-#include <asm/fpu/xsave.h>
+#include <asm/fpu/xstate.h>
#include <asm/crypto/glue_helper.h>
#define CAST6_PARALLEL_BLOCKS 8
diff --git a/arch/x86/crypto/serpent_avx2_glue.c b/arch/x86/crypto/serpent_avx2_glue.c
index 206ec57725a3..aa325fa5c7a6 100644
--- a/arch/x86/crypto/serpent_avx2_glue.c
+++ b/arch/x86/crypto/serpent_avx2_glue.c
@@ -21,7 +21,7 @@
#include <crypto/xts.h>
#include <crypto/serpent.h>
#include <asm/xcr.h>
-#include <asm/fpu/xsave.h>
+#include <asm/fpu/xstate.h>
#include <asm/crypto/serpent-avx.h>
#include <asm/crypto/glue_helper.h>
diff --git a/arch/x86/crypto/serpent_avx_glue.c b/arch/x86/crypto/serpent_avx_glue.c
index 4feb68c9a41f..f66ae85f58fe 100644
--- a/arch/x86/crypto/serpent_avx_glue.c
+++ b/arch/x86/crypto/serpent_avx_glue.c
@@ -37,7 +37,7 @@
#include <crypto/lrw.h>
#include <crypto/xts.h>
#include <asm/xcr.h>
-#include <asm/fpu/xsave.h>
+#include <asm/fpu/xstate.h>
#include <asm/crypto/serpent-avx.h>
#include <asm/crypto/glue_helper.h>
diff --git a/arch/x86/crypto/sha-mb/sha1_mb.c b/arch/x86/crypto/sha-mb/sha1_mb.c
index 03ffaf8c2244..6f3f76568bd5 100644
--- a/arch/x86/crypto/sha-mb/sha1_mb.c
+++ b/arch/x86/crypto/sha-mb/sha1_mb.c
@@ -66,7 +66,7 @@
#include <crypto/crypto_wq.h>
#include <asm/byteorder.h>
#include <asm/xcr.h>
-#include <asm/fpu/xsave.h>
+#include <asm/fpu/xstate.h>
#include <linux/hardirq.h>
#include <asm/fpu/internal.h>
#include "sha_mb_ctx.h"
diff --git a/arch/x86/crypto/sha1_ssse3_glue.c b/arch/x86/crypto/sha1_ssse3_glue.c
index 71ab2b35d5e0..84db12f052e8 100644
--- a/arch/x86/crypto/sha1_ssse3_glue.c
+++ b/arch/x86/crypto/sha1_ssse3_glue.c
@@ -31,7 +31,7 @@
#include <crypto/sha1_base.h>
#include <asm/fpu/api.h>
#include <asm/xcr.h>
-#include <asm/fpu/xsave.h>
+#include <asm/fpu/xstate.h>
asmlinkage void sha1_transform_ssse3(u32 *digest, const char *data,
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index dcbd8ea6eaaf..eb65522f02b8 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -39,7 +39,7 @@
#include <crypto/sha256_base.h>
#include <asm/fpu/api.h>
#include <asm/xcr.h>
-#include <asm/fpu/xsave.h>
+#include <asm/fpu/xstate.h>
#include <linux/string.h>
asmlinkage void sha256_transform_ssse3(u32 *digest, const char *data,
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index e8836e0c1098..78914641c72b 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -37,7 +37,7 @@
#include <crypto/sha512_base.h>
#include <asm/fpu/api.h>
#include <asm/xcr.h>
-#include <asm/fpu/xsave.h>
+#include <asm/fpu/xstate.h>
#include <linux/string.h>
diff --git a/arch/x86/crypto/twofish_avx_glue.c b/arch/x86/crypto/twofish_avx_glue.c
index 3b6c8ba64f81..95434f5b705a 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -38,7 +38,7 @@
#include <crypto/xts.h>
#include <asm/fpu/api.h>
#include <asm/xcr.h>
-#include <asm/fpu/xsave.h>
+#include <asm/fpu/xstate.h>
#include <asm/crypto/twofish.h>
#include <asm/crypto/glue_helper.h>
#include <crypto/scatterwalk.h>
diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
index 28556c6671c3..8ec785ecce81 100644
--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -17,7 +17,7 @@
#include <asm/user.h>
#include <asm/fpu/api.h>
-#include <asm/fpu/xsave.h>
+#include <asm/fpu/xstate.h>
#ifdef CONFIG_X86_64
# include <asm/sigcontext32.h>
diff --git a/arch/x86/include/asm/fpu/xsave.h b/arch/x86/include/asm/fpu/xstate.h
similarity index 100%
rename from arch/x86/include/asm/fpu/xsave.h
rename to arch/x86/include/asm/fpu/xstate.h
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0ce4c4f87332..2426e6530d3c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -17,7 +17,7 @@
#include <linux/vmalloc.h>
#include <linux/uaccess.h>
#include <asm/user.h>
-#include <asm/fpu/xsave.h>
+#include <asm/fpu/xstate.h>
#include "cpuid.h"
#include "lapic.h"
#include "mmu.h"
--
2.1.0
Rename the fpu/xsave.c source file to fpu/xstate.c, to match the
header rename.
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/Makefile | 2 +-
arch/x86/kernel/fpu/{xsave.c => xstate.c} | 0
2 files changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/fpu/Makefile b/arch/x86/kernel/fpu/Makefile
index 2020a2b7a597..6ae59bccdd2f 100644
--- a/arch/x86/kernel/fpu/Makefile
+++ b/arch/x86/kernel/fpu/Makefile
@@ -2,4 +2,4 @@
# Build rules for the FPU support code:
#
-obj-y += init.o bugs.o core.o xsave.o
+obj-y += init.o bugs.o core.o xstate.o
diff --git a/arch/x86/kernel/fpu/xsave.c b/arch/x86/kernel/fpu/xstate.c
similarity index 100%
rename from arch/x86/kernel/fpu/xsave.c
rename to arch/x86/kernel/fpu/xstate.c
--
2.1.0
A lot of FPU-using driver code queries complex CPU features to
figure out whether a given set of xstate features is supported
by the CPU or not.
Introduce a simplified API function that can be used on any CPU type
to get this information. Also add an error string return pointer,
so that the driver can print a meaningful error message with a
standardized feature name.
Also mark xfeatures_mask as __read_mostly.
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/api.h | 9 +++++++++
arch/x86/kernel/fpu/xstate.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 61 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index 62035cc1d961..1429a7c736db 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -36,4 +36,13 @@ extern bool irq_fpu_usable(void);
extern int irq_ts_save(void);
extern void irq_ts_restore(int TS_state);
+/*
+ * Query the presence of one or more xfeatures. Works on any legacy CPU as well.
+ *
+ * If 'feature_name' is set then put a human-readable description of
+ * the feature there as well - this can be used to print error (or success)
+ * messages.
+ */
+extern int cpu_has_xfeatures(u64 xfeatures_mask, const char **feature_name);
+
#endif /* _ASM_X86_FPU_API_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index f549e2a44336..2e52f01f4931 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -11,10 +11,23 @@
#include <asm/tlbflush.h>
#include <asm/xcr.h>
+static const char *xfeature_names[] =
+{
+ "x87 floating point registers" ,
+ "SSE registers" ,
+ "AVX registers" ,
+ "MPX bounds registers" ,
+ "MPX CSR" ,
+ "AVX-512 opmask" ,
+ "AVX-512 Hi256" ,
+ "AVX-512 ZMM_Hi256" ,
+ "unknown xstate feature" ,
+};
+
/*
* Mask of xstate features supported by the CPU and the kernel:
*/
-u64 xfeatures_mask;
+u64 xfeatures_mask __read_mostly;
/*
* Represents init state for the supported extended state.
@@ -29,6 +42,44 @@ static unsigned int xstate_comp_offsets[sizeof(xfeatures_mask)*8];
static unsigned int xfeatures_nr;
/*
+ * Return whether the system supports a given xfeature.
+ *
+ * Also return the name of the (most advanced) feature that the caller requested:
+ */
+int cpu_has_xfeatures(u64 xfeatures_needed, const char **feature_name)
+{
+ u64 xfeatures_missing = xfeatures_needed & ~xfeatures_mask;
+
+ if (unlikely(feature_name)) {
+ long xfeature_idx, max_idx;
+ u64 xfeatures_print;
+ /*
+ * So we use FLS here to be able to print the most advanced
+ * feature that was requested but is missing. So if a driver
+ * asks about "XSTATE_SSE | XSTATE_YMM" we'll print the
+ * missing AVX feature - this is the most informative message
+ * to users:
+ */
+ if (xfeatures_missing)
+ xfeatures_print = xfeatures_missing;
+ else
+ xfeatures_print = xfeatures_needed;
+
+ xfeature_idx = fls64(xfeatures_print)-1;
+ max_idx = ARRAY_SIZE(xfeature_names)-1;
+ xfeature_idx = min(xfeature_idx, max_idx);
+
+ *feature_name = xfeature_names[xfeature_idx];
+ }
+
+ if (xfeatures_missing)
+ return 0;
+
+ return 1;
+}
+EXPORT_SYMBOL_GPL(cpu_has_xfeatures);
+
+/*
* When executing XSAVEOPT (optimized XSAVE), if a processor implementation
* detects that an FPU state component is still (or is again) in its
* initialized state, it may clear the corresponding bit in the header.xfeatures
--
2.1.0
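( Side note: the name-selection logic is simple enough to replicate in
userspace. The sketch below reimplements the reporting rule with a
hypothetical three-entry feature table, using GCC/Clang __builtin_clzll()
in place of the kernel's fls64() - assumptions, not kernel code: )

        #include <stdio.h>

        static const char *names[] = {
                "x87 floating point registers",
                "SSE registers",
                "AVX registers",
        };

        static unsigned long long supported = 0x3;      /* FP | SSE only */

        static int has_xfeatures(unsigned long long needed,
                                 const char **name)
        {
                unsigned long long missing = needed & ~supported;
                unsigned long long print = missing ? missing : needed;
                /* fls64(print) - 1; 'needed' must be nonzero here: */
                int idx = 63 - __builtin_clzll(print);

                if (idx > 2)            /* kernel clamps to 'unknown' */
                        idx = 2;
                if (name)
                        *name = names[idx];
                return !missing;
        }

        int main(void)
        {
                const char *name;

                if (!has_xfeatures(0x2 | 0x4, &name))   /* SSE | YMM */
                        printf("missing: '%s'\n", name); /* -> AVX    */
                return 0;
        }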
We do a boot-time printout of xfeatures in print_xstate_features().
Simplify this code to make use of the recently introduced
cpu_has_xfeatures() function.
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/fpu/xstate.c | 25 ++++++++++++-------------
1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 2e52f01f4931..0f849229c93b 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -545,13 +545,12 @@ static void __init setup_xstate_features(void)
} while (1);
}
-static void print_xstate_feature(u64 xstate_mask, const char *desc)
+static void print_xstate_feature(u64 xstate_mask)
{
- if (xfeatures_mask & xstate_mask) {
- int xstate_feature = fls64(xstate_mask)-1;
+ const char *feature_name;
- pr_info("x86/fpu: Supporting XSAVE feature %2d: '%s'\n", xstate_feature, desc);
- }
+ if (cpu_has_xfeatures(xstate_mask, &feature_name))
+ pr_info("x86/fpu: Supporting XSAVE feature 0x%02Lx: '%s'\n", xstate_mask, feature_name);
}
/*
@@ -559,14 +558,14 @@ static void print_xstate_feature(u64 xstate_mask, const char *desc)
*/
static void print_xstate_features(void)
{
- print_xstate_feature(XSTATE_FP, "x87 floating point registers");
- print_xstate_feature(XSTATE_SSE, "SSE registers");
- print_xstate_feature(XSTATE_YMM, "AVX registers");
- print_xstate_feature(XSTATE_BNDREGS, "MPX bounds registers");
- print_xstate_feature(XSTATE_BNDCSR, "MPX CSR");
- print_xstate_feature(XSTATE_OPMASK, "AVX-512 opmask");
- print_xstate_feature(XSTATE_ZMM_Hi256, "AVX-512 Hi256");
- print_xstate_feature(XSTATE_Hi16_ZMM, "AVX-512 ZMM_Hi256");
+ print_xstate_feature(XSTATE_FP);
+ print_xstate_feature(XSTATE_SSE);
+ print_xstate_feature(XSTATE_YMM);
+ print_xstate_feature(XSTATE_BNDREGS);
+ print_xstate_feature(XSTATE_BNDCSR);
+ print_xstate_feature(XSTATE_OPMASK);
+ print_xstate_feature(XSTATE_ZMM_Hi256);
+ print_xstate_feature(XSTATE_Hi16_ZMM);
}
/*
--
2.1.0
Transform the xstate masks into an enumerated list of xfeature bits.
This removes the hard-coding of XFEATURES_NR_MAX.
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/xstate.h | 38 ++++++++++++++++++++++++++------------
1 file changed, 26 insertions(+), 12 deletions(-)
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 2f2ed322263f..9b2869dea490 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -4,25 +4,39 @@
#include <linux/types.h>
#include <asm/processor.h>
-#define XSTATE_CPUID 0x0000000d
-
-#define XSTATE_FP 0x1
-#define XSTATE_SSE 0x2
-#define XSTATE_YMM 0x4
-#define XSTATE_BNDREGS 0x8
-#define XSTATE_BNDCSR 0x10
-#define XSTATE_OPMASK 0x20
-#define XSTATE_ZMM_Hi256 0x40
-#define XSTATE_Hi16_ZMM 0x80
+/*
+ * List of XSAVE features Linux knows about:
+ */
+enum xfeature_bit {
+ XSTATE_BIT_FP,
+ XSTATE_BIT_SSE,
+ XSTATE_BIT_YMM,
+ XSTATE_BIT_BNDREGS,
+ XSTATE_BIT_BNDCSR,
+ XSTATE_BIT_OPMASK,
+ XSTATE_BIT_ZMM_Hi256,
+ XSTATE_BIT_Hi16_ZMM,
+
+ XFEATURES_NR_MAX,
+};
+
+#define XSTATE_FP (1 << XSTATE_BIT_FP)
+#define XSTATE_SSE (1 << XSTATE_BIT_SSE)
+#define XSTATE_YMM (1 << XSTATE_BIT_YMM)
+#define XSTATE_BNDREGS (1 << XSTATE_BIT_BNDREGS)
+#define XSTATE_BNDCSR (1 << XSTATE_BIT_BNDCSR)
+#define XSTATE_OPMASK (1 << XSTATE_BIT_OPMASK)
+#define XSTATE_ZMM_Hi256 (1 << XSTATE_BIT_ZMM_Hi256)
+#define XSTATE_Hi16_ZMM (1 << XSTATE_BIT_Hi16_ZMM)
-/* The highest xstate bit above (of XSTATE_Hi16_ZMM): */
-#define XFEATURES_NR_MAX 8
#define XSTATE_FPSSE (XSTATE_FP | XSTATE_SSE)
#define XSTATE_AVX512 (XSTATE_OPMASK | XSTATE_ZMM_Hi256 | XSTATE_Hi16_ZMM)
/* Bit 63 of XCR0 is reserved for future expansion */
#define XSTATE_EXTEND_MASK (~(XSTATE_FPSSE | (1ULL << 63)))
+#define XSTATE_CPUID 0x0000000d
+
#define FXSAVE_SIZE 512
#define XSAVE_HDR_SIZE 64
--
2.1.0
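( Side note: one property worth spelling out is that because
XFEATURES_NR_MAX is the last enumerator, it tracks the list automatically.
A tiny standalone C11 check, mirroring the shape of the enum above with a
shortened hypothetical feature list: )

        enum xfeature_bit {
                XSTATE_BIT_FP,
                XSTATE_BIT_SSE,
                XSTATE_BIT_YMM,
                /* ... further bits would go here ... */
                XFEATURES_NR_MAX,   /* always == number of bits above */
        };

        #define XSTATE_YMM (1 << XSTATE_BIT_YMM)

        /* Adding or removing an enumerator updates the count for free: */
        _Static_assert(XFEATURES_NR_MAX == 3, "count tracks the enum");
        _Static_assert(XSTATE_YMM == 0x4, "masks derive from positions");

        int main(void) { return 0; }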
So fpu/xstate.h is an internal header that FPU-using drivers commonly
include, to get access to the xstate feature names, amongst other things.
Move these type definitions to fpu/types.h to allow simplification
of FPU-using driver code.
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fpu/types.h | 28 ++++++++++++++++++++++++++++
arch/x86/include/asm/fpu/xstate.h | 28 ----------------------------
2 files changed, 28 insertions(+), 28 deletions(-)
diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index 3a15ac6032eb..006ec2975f6f 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -79,6 +79,34 @@ struct i387_soft_struct {
};
/*
+ * List of XSAVE features Linux knows about:
+ */
+enum xfeature_bit {
+ XSTATE_BIT_FP,
+ XSTATE_BIT_SSE,
+ XSTATE_BIT_YMM,
+ XSTATE_BIT_BNDREGS,
+ XSTATE_BIT_BNDCSR,
+ XSTATE_BIT_OPMASK,
+ XSTATE_BIT_ZMM_Hi256,
+ XSTATE_BIT_Hi16_ZMM,
+
+ XFEATURES_NR_MAX,
+};
+
+#define XSTATE_FP (1 << XSTATE_BIT_FP)
+#define XSTATE_SSE (1 << XSTATE_BIT_SSE)
+#define XSTATE_YMM (1 << XSTATE_BIT_YMM)
+#define XSTATE_BNDREGS (1 << XSTATE_BIT_BNDREGS)
+#define XSTATE_BNDCSR (1 << XSTATE_BIT_BNDCSR)
+#define XSTATE_OPMASK (1 << XSTATE_BIT_OPMASK)
+#define XSTATE_ZMM_Hi256 (1 << XSTATE_BIT_ZMM_Hi256)
+#define XSTATE_Hi16_ZMM (1 << XSTATE_BIT_Hi16_ZMM)
+
+#define XSTATE_FPSSE (XSTATE_FP | XSTATE_SSE)
+#define XSTATE_AVX512 (XSTATE_OPMASK | XSTATE_ZMM_Hi256 | XSTATE_Hi16_ZMM)
+
+/*
* There are 16x 256-bit AVX registers named YMM0-YMM15.
* The low 128 bits are aliased to the 16 SSE registers (XMM0-XMM15)
* and are stored in 'struct i387_fxsave_struct::xmm_space[]'.
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 9b2869dea490..31a002ad5aeb 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -4,34 +4,6 @@
#include <linux/types.h>
#include <asm/processor.h>
-/*
- * List of XSAVE features Linux knows about:
- */
-enum xfeature_bit {
- XSTATE_BIT_FP,
- XSTATE_BIT_SSE,
- XSTATE_BIT_YMM,
- XSTATE_BIT_BNDREGS,
- XSTATE_BIT_BNDCSR,
- XSTATE_BIT_OPMASK,
- XSTATE_BIT_ZMM_Hi256,
- XSTATE_BIT_Hi16_ZMM,
-
- XFEATURES_NR_MAX,
-};
-
-#define XSTATE_FP (1 << XSTATE_BIT_FP)
-#define XSTATE_SSE (1 << XSTATE_BIT_SSE)
-#define XSTATE_YMM (1 << XSTATE_BIT_YMM)
-#define XSTATE_BNDREGS (1 << XSTATE_BIT_BNDREGS)
-#define XSTATE_BNDCSR (1 << XSTATE_BIT_BNDCSR)
-#define XSTATE_OPMASK (1 << XSTATE_BIT_OPMASK)
-#define XSTATE_ZMM_Hi256 (1 << XSTATE_BIT_ZMM_Hi256)
-#define XSTATE_Hi16_ZMM (1 << XSTATE_BIT_Hi16_ZMM)
-
-
-#define XSTATE_FPSSE (XSTATE_FP | XSTATE_SSE)
-#define XSTATE_AVX512 (XSTATE_OPMASK | XSTATE_ZMM_Hi256 | XSTATE_Hi16_ZMM)
/* Bit 63 of XCR0 is reserved for future expansion */
#define XSTATE_EXTEND_MASK (~(XSTATE_FPSSE | (1ULL << 63)))
--
2.1.0
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.
This has the following advantages to the driver:
- Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.
- Removes detection complexity from the driver, no more raw XGETBV instruction
- Shrinks the code a bit:
text data bss dec hex filename
2128 2896 0 5024 13a0 camellia_aesni_avx_glue.o.before
2067 2896 0 4963 1363 camellia_aesni_avx_glue.o.after
- Standardizes feature name error message printouts across drivers
There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/crypto/camellia_aesni_avx_glue.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)
diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
index 0122cd95563a..80a0e4389c9a 100644
--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
@@ -19,8 +19,7 @@
#include <crypto/ctr.h>
#include <crypto/lrw.h>
#include <crypto/xts.h>
-#include <asm/xcr.h>
-#include <asm/fpu/xstate.h>
+#include <asm/fpu/api.h>
#include <asm/crypto/camellia.h>
#include <asm/crypto/glue_helper.h>
@@ -553,16 +552,10 @@ static struct crypto_alg cmll_algs[10] = { {
static int __init camellia_aesni_init(void)
{
- u64 xcr0;
+ const char *feature_name;
- if (!cpu_has_avx || !cpu_has_aes || !cpu_has_osxsave) {
- pr_info("AVX or AES-NI instructions are not detected.\n");
- return -ENODEV;
- }
-
- xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
- if ((xcr0 & (XSTATE_SSE | XSTATE_YMM)) != (XSTATE_SSE | XSTATE_YMM)) {
- pr_info("AVX detected but unusable.\n");
+ if (!cpu_has_xfeatures(XSTATE_SSE | XSTATE_YMM, &feature_name)) {
+ pr_info("CPU feature '%s' is not supported.\n", feature_name);
return -ENODEV;
}
--
2.1.0
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.
This has the following advantages to the driver:
- Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.
- Removes detection complexity from the driver, no more raw XGETBV instruction
- Shrinks the code a bit.
- Standardizes feature name error message printouts across drivers
There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/crypto/sha256_ssse3_glue.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index eb65522f02b8..f8097fc0d1d1 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -38,8 +38,6 @@
#include <crypto/sha.h>
#include <crypto/sha256_base.h>
#include <asm/fpu/api.h>
-#include <asm/xcr.h>
-#include <asm/fpu/xstate.h>
#include <linux/string.h>
asmlinkage void sha256_transform_ssse3(u32 *digest, const char *data,
@@ -132,15 +130,9 @@ static struct shash_alg algs[] = { {
#ifdef CONFIG_AS_AVX
static bool __init avx_usable(void)
{
- u64 xcr0;
-
- if (!cpu_has_avx || !cpu_has_osxsave)
- return false;
-
- xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
- if ((xcr0 & (XSTATE_SSE | XSTATE_YMM)) != (XSTATE_SSE | XSTATE_YMM)) {
- pr_info("AVX detected but unusable.\n");
-
+ if (!cpu_has_xfeatures(XSTATE_SSE | XSTATE_YMM, NULL)) {
+ if (cpu_has_avx)
+ pr_info("AVX detected but unusable.\n");
return false;
}
--
2.1.0
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.
This has the following advantages to the driver:
- Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.
- Removes detection complexity from the driver, no more raw XGETBV instruction
- Shrinks the code a bit.
- Standardizes feature name error message printouts across drivers
There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/crypto/camellia_aesni_avx2_glue.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)
diff --git a/arch/x86/crypto/camellia_aesni_avx2_glue.c b/arch/x86/crypto/camellia_aesni_avx2_glue.c
index 46484482f267..76ea7df217e6 100644
--- a/arch/x86/crypto/camellia_aesni_avx2_glue.c
+++ b/arch/x86/crypto/camellia_aesni_avx2_glue.c
@@ -19,8 +19,7 @@
#include <crypto/ctr.h>
#include <crypto/lrw.h>
#include <crypto/xts.h>
-#include <asm/xcr.h>
-#include <asm/fpu/xstate.h>
+#include <asm/fpu/api.h>
#include <asm/crypto/camellia.h>
#include <asm/crypto/glue_helper.h>
@@ -561,16 +560,10 @@ static struct crypto_alg cmll_algs[10] = { {
static int __init camellia_aesni_init(void)
{
- u64 xcr0;
+ const char *feature_name;
- if (!cpu_has_avx2 || !cpu_has_avx || !cpu_has_aes || !cpu_has_osxsave) {
- pr_info("AVX2 or AES-NI instructions are not detected.\n");
- return -ENODEV;
- }
-
- xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
- if ((xcr0 & (XSTATE_SSE | XSTATE_YMM)) != (XSTATE_SSE | XSTATE_YMM)) {
- pr_info("AVX2 detected but unusable.\n");
+ if (!cpu_has_xfeatures(XSTATE_SSE | XSTATE_YMM, &feature_name)) {
+ pr_info("CPU feature '%s' is not supported.\n", feature_name);
return -ENODEV;
}
--
2.1.0
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.
This has the following advantages to the driver:
- Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.
- Removes detection complexity from the driver, no more raw XGETBV instruction
- Shrinks the code a bit.
- Standardizes feature name error message printouts across drivers
There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/crypto/twofish_avx_glue.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/arch/x86/crypto/twofish_avx_glue.c b/arch/x86/crypto/twofish_avx_glue.c
index 95434f5b705a..c2bd0ce718ee 100644
--- a/arch/x86/crypto/twofish_avx_glue.c
+++ b/arch/x86/crypto/twofish_avx_glue.c
@@ -37,8 +37,6 @@
#include <crypto/lrw.h>
#include <crypto/xts.h>
#include <asm/fpu/api.h>
-#include <asm/xcr.h>
-#include <asm/fpu/xstate.h>
#include <asm/crypto/twofish.h>
#include <asm/crypto/glue_helper.h>
#include <crypto/scatterwalk.h>
@@ -558,16 +556,10 @@ static struct crypto_alg twofish_algs[10] = { {
static int __init twofish_init(void)
{
- u64 xcr0;
+ const char *feature_name;
- if (!cpu_has_avx || !cpu_has_osxsave) {
- printk(KERN_INFO "AVX instructions are not detected.\n");
- return -ENODEV;
- }
-
- xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
- if ((xcr0 & (XSTATE_SSE | XSTATE_YMM)) != (XSTATE_SSE | XSTATE_YMM)) {
- printk(KERN_INFO "AVX detected but unusable.\n");
+ if (!cpu_has_xfeatures(XSTATE_SSE | XSTATE_YMM, &feature_name)) {
+ pr_info("CPU feature '%s' is not supported.\n", feature_name);
return -ENODEV;
}
--
2.1.0
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.
This has the following advantages to the driver:
- Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.
- Removes detection complexity from the driver, no more raw XGETBV instruction
- Shrinks the code a bit.
- Standardizes feature name error message printouts across drivers
There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/crypto/serpent_avx_glue.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)
diff --git a/arch/x86/crypto/serpent_avx_glue.c b/arch/x86/crypto/serpent_avx_glue.c
index f66ae85f58fe..da7dafc9b16d 100644
--- a/arch/x86/crypto/serpent_avx_glue.c
+++ b/arch/x86/crypto/serpent_avx_glue.c
@@ -36,8 +36,7 @@
#include <crypto/ctr.h>
#include <crypto/lrw.h>
#include <crypto/xts.h>
-#include <asm/xcr.h>
-#include <asm/fpu/xstate.h>
+#include <asm/fpu/api.h>
#include <asm/crypto/serpent-avx.h>
#include <asm/crypto/glue_helper.h>
@@ -596,16 +595,10 @@ static struct crypto_alg serpent_algs[10] = { {
static int __init serpent_init(void)
{
- u64 xcr0;
+ const char *feature_name;
- if (!cpu_has_avx || !cpu_has_osxsave) {
- printk(KERN_INFO "AVX instructions are not detected.\n");
- return -ENODEV;
- }
-
- xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
- if ((xcr0 & (XSTATE_SSE | XSTATE_YMM)) != (XSTATE_SSE | XSTATE_YMM)) {
- printk(KERN_INFO "AVX detected but unusable.\n");
+ if (!cpu_has_xfeatures(XSTATE_SSE | XSTATE_YMM, &feature_name)) {
+ pr_info("CPU feature '%s' is not supported.\n", feature_name);
return -ENODEV;
}
--
2.1.0
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.
This has the following advantages to the driver:
- Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.
- Removes detection complexity from the driver, no more raw XGETBV instruction
- Shrinks the code a bit.
- Standardizes feature name error message printouts across drivers
There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/crypto/cast5_avx_glue.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)
diff --git a/arch/x86/crypto/cast5_avx_glue.c b/arch/x86/crypto/cast5_avx_glue.c
index ca4f32e7a423..be00aa48b2b5 100644
--- a/arch/x86/crypto/cast5_avx_glue.c
+++ b/arch/x86/crypto/cast5_avx_glue.c
@@ -31,8 +31,7 @@
#include <crypto/cast5.h>
#include <crypto/cryptd.h>
#include <crypto/ctr.h>
-#include <asm/xcr.h>
-#include <asm/fpu/xstate.h>
+#include <asm/fpu/api.h>
#include <asm/crypto/glue_helper.h>
#define CAST5_PARALLEL_BLOCKS 16
@@ -468,16 +467,10 @@ static struct crypto_alg cast5_algs[6] = { {
static int __init cast5_init(void)
{
- u64 xcr0;
+ const char *feature_name;
- if (!cpu_has_avx || !cpu_has_osxsave) {
- pr_info("AVX instructions are not detected.\n");
- return -ENODEV;
- }
-
- xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
- if ((xcr0 & (XSTATE_SSE | XSTATE_YMM)) != (XSTATE_SSE | XSTATE_YMM)) {
- pr_info("AVX detected but unusable.\n");
+ if (!cpu_has_xfeatures(XSTATE_SSE | XSTATE_YMM, &feature_name)) {
+ pr_info("CPU feature '%s' is not supported.\n", feature_name);
return -ENODEV;
}
--
2.1.0
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.
This has the following advantages to the driver:
- Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.
- Removes detection complexity from the driver, no more raw XGETBV instruction
- Shrinks the code a bit.
- Standardizes feature name error message printouts across drivers
There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/crypto/sha512_ssse3_glue.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/arch/x86/crypto/sha512_ssse3_glue.c b/arch/x86/crypto/sha512_ssse3_glue.c
index 78914641c72b..2edad7b81870 100644
--- a/arch/x86/crypto/sha512_ssse3_glue.c
+++ b/arch/x86/crypto/sha512_ssse3_glue.c
@@ -36,8 +36,6 @@
#include <crypto/sha.h>
#include <crypto/sha512_base.h>
#include <asm/fpu/api.h>
-#include <asm/xcr.h>
-#include <asm/fpu/xstate.h>
#include <linux/string.h>
@@ -131,15 +129,9 @@ static struct shash_alg algs[] = { {
#ifdef CONFIG_AS_AVX
static bool __init avx_usable(void)
{
- u64 xcr0;
-
- if (!cpu_has_avx || !cpu_has_osxsave)
- return false;
-
- xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
- if ((xcr0 & (XSTATE_SSE | XSTATE_YMM)) != (XSTATE_SSE | XSTATE_YMM)) {
- pr_info("AVX detected but unusable.\n");
-
+ if (!cpu_has_xfeatures(XSTATE_SSE | XSTATE_YMM, NULL)) {
+ if (cpu_has_avx)
+ pr_info("AVX detected but unusable.\n");
return false;
}
--
2.1.0
Use the new 'cpu_has_xfeatures()' function to query AVX CPU support.
This has the following advantages to the driver:
- Decouples the driver from FPU internals: it's now only using <asm/fpu/api.h>.
- Removes detection complexity from the driver, no more raw XGETBV instruction
- Shrinks the code a bit.
- Standardizes feature name error message printouts across drivers
There are also advantages to the x86 FPU code: once all drivers
are decoupled from internals we can move them out of common
headers and we'll also be able to remove xcr.h.
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/crypto/cast6_avx_glue.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)
diff --git a/arch/x86/crypto/cast6_avx_glue.c b/arch/x86/crypto/cast6_avx_glue.c
index 21d0b845c8c4..5dbba7224221 100644
--- a/arch/x86/crypto/cast6_avx_glue.c
+++ b/arch/x86/crypto/cast6_avx_glue.c
@@ -36,8 +36,7 @@
#include <crypto/ctr.h>
#include <crypto/lrw.h>
#include <crypto/xts.h>
-#include <asm/xcr.h>
-#include <asm/fpu/xstate.h>
+#include <asm/fpu/api.h>
#include <asm/crypto/glue_helper.h>
#define CAST6_PARALLEL_BLOCKS 8
@@ -590,16 +589,10 @@ static struct crypto_alg cast6_algs[10] = { {
static int __init cast6_init(void)
{
- u64 xcr0;
+ const char *feature_name;
- if (!cpu_has_avx || !cpu_has_osxsave) {
- pr_info("AVX instructions are not detected.\n");
- return -ENODEV;
- }
-
- xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
- if ((xcr0 & (XSTATE_SSE | XSTATE_YMM)) != (XSTATE_SSE | XSTATE_YMM)) {
- pr_info("AVX detected but unusable.\n");
+ if (!cpu_has_xfeatures(XSTATE_SSE | XSTATE_YMM, &feature_name)) {
+ pr_info("CPU feature '%s' is not supported.\n", feature_name);
return -ENODEV;
}
--
2.1.0
On 05/05/2015 10:49 AM, Ingo Molnar wrote:
> 'xsave.header::xstate_bv' is a misnomer - what does 'bv' stand for?
xstate_bv is what it is called in the SDM. I'd really like to see the
nomenclature match the SDM where it's sensible because it says lots of
things like:
XSAVES does not write to any parts of the XSAVE header other
than the XSTATE_BV and XCOMP_BV fields.
It's nice to have code that does:
...->xstate_bv
to match up with that documentation IMNHO.
* Dave Hansen <[email protected]> wrote:
> On 05/05/2015 10:49 AM, Ingo Molnar wrote:
> > 'xsave.header::xstate_bv' is a misnomer - what does 'bv' stand for?
>
> xstate_bv is what it is called in the SDM. [...]
So I'm not attached to the ::xfeatures name (we could name it
xstate_mask, etc.) - but xstate_bv? It's really nonsensical IMHO - and
I wanted it to be more obvious.
We could put the SDM name into a comment, next to the field
definition? Something like, if 'xfeatures' is too long:
struct xstate_header {
u64 xfeat; /* xstate components, SDM: XSTATE_BV */
u64 xfeat_comp; /* compacted xstate components, SDM: XCOMP_BV */
u64 reserved[6];
} __attribute__((packed));
or so? Then if you grep for 'XSTATE_BV', you'll immediately see that
it's called xfeat.
> [...] I'd really like to see the nomenclature match the SDM where
> it's sensible because it says lots of things like:
>
> XSAVES does not write to any parts of the XSAVE header other
> than the XSTATE_BV and XCOMP_BV fields.
>
> It's nice to have code that does:
>
> ...->xstate_bv
>
> to match up with that documentation IMNHO.
Where the SDM uses sensible names I'm all for that - but IMHO this is
not such a case.
Thanks,
Ingo
On 05/05/2015 11:16 AM, Ingo Molnar wrote:
> We could put the SDM name into a comment, next to the field
> definition? Something like, if 'xfeatures' is too long:
>
> struct xstate_header {
> u64 xfeat; /* xstate components, SDM: XSTATE_BV */
> u64 xfeat_comp; /* compacted xstate components, SDM: XCOMP_BV */
> u64 reserved[6];
> } __attribute__((packed));
When you're in the depths of the SDM and the kernel code, the fewer
context switches you have to make, the better. I say this from the
perspective of someone who's had a copy of the SDM open to xsave* for
about a month straight.
In any case, having "xfeat" and "xfeat_comp" is a bad idea. They're not
really related concepts other than their bits refer to the same states.
They should not have such similar names.
XSTATE_BV is the set of states written to the xsave area.
XCOMP_BV is essentially always XCR0 (aka pcntxt_mask, aka
xfeatures_mask) or'd with bit 63.
> From: Ingo Molnar [mailto:[email protected]] On Behalf Of Ingo
> Molnar
> Sent: Tuesday, May 05, 2015 10:51 AM
> To: [email protected]
> Cc: Andy Lutomirski; Borislav Petkov; Dave Hansen; Yu, Fenghua; H. Peter
> Anvin; Linus Torvalds; Oleg Nesterov; Thomas Gleixner
> Subject: [PATCH 151/208] x86/fpu: Introduce
> cpu_has_xfeatures(xfeatures_mask, feature_name)
>
> A lot of FPU using driver code is querying complex CPU features to be able to
> figure out whether a given set of xstate features is supported by the CPU or
> not.
>
> Introduce a simplified API function that can be used on any CPU type to get
> this information. Also add an error string return pointer, so that the driver
> can print a meaningful error message with a standardized feature name.
>
> Also mark xfeatures_mask as __read_only.
>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Fenghua Yu <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> arch/x86/include/asm/fpu/api.h | 9 +++++++++
> arch/x86/kernel/fpu/xstate.c | 53
> ++++++++++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 61 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/fpu/api.h
> b/arch/x86/include/asm/fpu/api.h index 62035cc1d961..1429a7c736db
> 100644
> --- a/arch/x86/include/asm/fpu/api.h
> +++ b/arch/x86/include/asm/fpu/api.h
> @@ -36,4 +36,13 @@ extern bool irq_fpu_usable(void); extern int
> irq_ts_save(void); extern void irq_ts_restore(int TS_state);
>
> +/*
> + * Query the presence of one or more xfeatures. Works on any legacy CPU
> as well.
> + *
> + * If 'feature_name' is set then put a human-readable description of
> + * the feature there as well - this can be used to print error (or
> +success)
> + * messages.
> + */
> +extern int cpu_has_xfeatures(u64 xfeatures_mask, const char
> +**feature_name);
> +
> #endif /* _ASM_X86_FPU_API_H */
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index f549e2a44336..2e52f01f4931 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -11,10 +11,23 @@
> #include <asm/tlbflush.h>
> #include <asm/xcr.h>
>
> +static const char *xfeature_names[] =
> +{
> + "x87 floating point registers" ,
> + "SSE registers" ,
> + "AVX registers" ,
> + "MPX bounds registers" ,
> + "MPX CSR" ,
> + "AVX-512 opmask" ,
> + "AVX-512 Hi256" ,
> + "AVX-512 ZMM_Hi256" ,
> + "unknown xstate feature" ,
> +};
> +
> /*
> * Mask of xstate features supported by the CPU and the kernel:
> */
> -u64 xfeatures_mask;
> +u64 xfeatures_mask __read_mostly;
>
> /*
> * Represents init state for the supported extended state.
> @@ -29,6 +42,44 @@ static unsigned int
> xstate_comp_offsets[sizeof(xfeatures_mask)*8];
> static unsigned int xfeatures_nr;
>
> /*
> + * Return whether the system supports a given xfeature.
> + *
> + * Also return the name of the (most advanced) feature that the caller
> requested:
> + */
> +int cpu_has_xfeatures(u64 xfeatures_needed, const char **feature_name)
> +{
> + u64 xfeatures_missing = xfeatures_needed & ~xfeatures_mask;
> +
> + if (unlikely(feature_name)) {
> + long xfeature_idx, max_idx;
> + u64 xfeatures_print;
> + /*
> + * So we use FLS here to be able to print the most advanced
> + * feature that was requested but is missing. So if a driver
> + * asks about "XSTATE_SSE | XSTATE_YMM" we'll print the
> + * missing AVX feature - this is the most informative message
> + * to users:
> + */
> + if (xfeatures_missing)
> + xfeatures_print = xfeatures_missing;
If the feature is missing (xfeatures_missing != 0), the returned feature name will point to the missing feature with the highest idx.
Is that intended?
I think the "if (xfeatures_missing)" and " xfeatures_print = xfeatures_missing;" are not needed, right?
> + else
> + xfeatures_print = xfeatures_needed;
> +
> + xfeature_idx = fls64(xfeatures_print)-1;
> + max_idx = ARRAY_SIZE(xfeature_names)-1;
> + xfeature_idx = min(xfeature_idx, max_idx);
> +
> + *feature_name = xfeature_names[xfeature_idx];
> + }
> +
> + if (xfeatures_missing)
> + return 0;
> +
> + return 1;
> +}
> +EXPORT_SYMBOL_GPL(cpu_has_xfeatures);
> +
> +/*
> * When executing XSAVEOPT (optimized XSAVE), if a processor
> implementation
> * detects that an FPU state component is still (or is again) in its
> * initialized state, it may clear the corresponding bit in the header.xfeatures
> --
> 2.1.0
* Yu, Fenghua <[email protected]> wrote:
> > From: Ingo Molnar [mailto:[email protected]] On Behalf Of Ingo
> > Molnar
> > Sent: Tuesday, May 05, 2015 10:51 AM
> > To: [email protected]
> > Cc: Andy Lutomirski; Borislav Petkov; Dave Hansen; Yu, Fenghua; H. Peter
> > Anvin; Linus Torvalds; Oleg Nesterov; Thomas Gleixner
> > Subject: [PATCH 151/208] x86/fpu: Introduce
> > cpu_has_xfeatures(xfeatures_mask, feature_name)
> >
> > A lot of FPU using driver code is querying complex CPU features to be able to
> > figure out whether a given set of xstate features is supported by the CPU or
> > not.
> >
> > Introduce a simplified API function that can be used on any CPU type to get
> > this information. Also add an error string return pointer, so that the driver
> > can print a meaningful error message with a standardized feature name.
> >
> > Also mark xfeatures_mask as __read_only.
> >
> > Cc: Andy Lutomirski <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Dave Hansen <[email protected]>
> > Cc: Fenghua Yu <[email protected]>
> > Cc: H. Peter Anvin <[email protected]>
> > Cc: Linus Torvalds <[email protected]>
> > Cc: Oleg Nesterov <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Signed-off-by: Ingo Molnar <[email protected]>
> > ---
> > arch/x86/include/asm/fpu/api.h | 9 +++++++++
> > arch/x86/kernel/fpu/xstate.c | 53
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++-
> > 2 files changed, 61 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/include/asm/fpu/api.h
> > b/arch/x86/include/asm/fpu/api.h index 62035cc1d961..1429a7c736db
> > 100644
> > --- a/arch/x86/include/asm/fpu/api.h
> > +++ b/arch/x86/include/asm/fpu/api.h
> > @@ -36,4 +36,13 @@ extern bool irq_fpu_usable(void); extern int
> > irq_ts_save(void); extern void irq_ts_restore(int TS_state);
> >
> > +/*
> > + * Query the presence of one or more xfeatures. Works on any legacy CPU
> > as well.
> > + *
> > + * If 'feature_name' is set then put a human-readable description of
> > + * the feature there as well - this can be used to print error (or
> > +success)
> > + * messages.
> > + */
> > +extern int cpu_has_xfeatures(u64 xfeatures_mask, const char
> > +**feature_name);
> > +
> > #endif /* _ASM_X86_FPU_API_H */
> > diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> > index f549e2a44336..2e52f01f4931 100644
> > --- a/arch/x86/kernel/fpu/xstate.c
> > +++ b/arch/x86/kernel/fpu/xstate.c
> > @@ -11,10 +11,23 @@
> > #include <asm/tlbflush.h>
> > #include <asm/xcr.h>
> >
> > +static const char *xfeature_names[] =
> > +{
> > + "x87 floating point registers" ,
> > + "SSE registers" ,
> > + "AVX registers" ,
> > + "MPX bounds registers" ,
> > + "MPX CSR" ,
> > + "AVX-512 opmask" ,
> > + "AVX-512 Hi256" ,
> > + "AVX-512 ZMM_Hi256" ,
> > + "unknown xstate feature" ,
> > +};
> > +
> > /*
> > * Mask of xstate features supported by the CPU and the kernel:
> > */
> > -u64 xfeatures_mask;
> > +u64 xfeatures_mask __read_mostly;
> >
> > /*
> > * Represents init state for the supported extended state.
> > @@ -29,6 +42,44 @@ static unsigned int
> > xstate_comp_offsets[sizeof(xfeatures_mask)*8];
> > static unsigned int xfeatures_nr;
> >
> > /*
> > + * Return whether the system supports a given xfeature.
> > + *
> > + * Also return the name of the (most advanced) feature that the caller
> > requested:
> > + */
> > +int cpu_has_xfeatures(u64 xfeatures_needed, const char **feature_name)
> > +{
> > + u64 xfeatures_missing = xfeatures_needed & ~xfeatures_mask;
> > +
> > + if (unlikely(feature_name)) {
> > + long xfeature_idx, max_idx;
> > + u64 xfeatures_print;
> > + /*
> > + * So we use FLS here to be able to print the most advanced
> > + * feature that was requested but is missing. So if a driver
> > + * asks about "XSTATE_SSE | XSTATE_YMM" we'll print the
> > + * missing AVX feature - this is the most informative message
> > + * to users:
> > + */
> > + if (xfeatures_missing)
> > + xfeatures_print = xfeatures_missing;
>
> If the feature is missing (xfeatures_missing != 0), the returned
> feature name will point to the missing feature with the highest idx.
> Is that intended?
Yes, so this is a reporting detail. The intention here is the
following: when a driver requests multiple features to be present, for
example:
if (!cpu_has_xfeatures(XSTATE_SSE | XSTATE_YMM, NULL)) {
then it makes sense to report the highest feature bit, not the first
one we find to not exist. Why? Because in the above example, on an old
CPU we could miss on XSTATE_SSE already, and report that to the user -
which creates the incorrect assumption that the minimum requirement is
for SSE. The user then tries the same driver on a more modern system,
which has SSE but not YMM, and gets a shifting goal post.
So I wanted to report the most modern feature that is missing, to
avoid that kind of reporting ambiguity on older systems.
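For illustration, the reporting step amounts to something like this (a
sketch only - the hunk above is truncated, so the exact code may
differ; fls64() finds the highest set bit, xfeature_names[] is the
table added in the patch):

	long xfeature_idx, max_idx;
	u64 xfeatures_print;

	if (xfeatures_missing)
		xfeatures_print = xfeatures_missing;
	else
		xfeatures_print = xfeatures_needed;

	/* Report the most advanced (highest) feature bit requested: */
	xfeature_idx = fls64(xfeatures_print) - 1;
	max_idx = ARRAY_SIZE(xfeature_names) - 1;
	if (xfeature_idx > max_idx)
		xfeature_idx = max_idx;

	*feature_name = xfeature_names[xfeature_idx];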
Alternatively we could iterate the mask in reverse order as well?
Thanks,
Ingo
* Dave Hansen <[email protected]> wrote:
> On 05/05/2015 11:16 AM, Ingo Molnar wrote:
> > We could put the SDM name into a comment, next to the field
> > definition? Something like, if 'xfeatures' is too long:
> >
> > struct xstate_header {
> > u64 xfeat; /* xstate components, SDM: XSTATE_BV */
> > u64 xfeat_comp; /* compacted xstate components, SDM: XCOMP_BV */
> > u64 reserved[6];
> > } __attribute__((packed));
>
> When you're in the depths of the SDM and the kernel code, the fewer
> context switches you have to make, the better. [...]
But that's not the only consideration. While in general I'm all for
following reference hardware documentation with names, there's a limit
for how far we'll go in following stupid vendor names, and I think
'XSTATE_BV' and 'XCOMP_BV' are well beyond any sane limit (see further
below my suggestion for better naming).
> [...] I say this from the perspective of someone who's had a copy
> of the SDM open to xsave* for about a month straight.
If only one of us worked at the company that invented those
nonsensical names and complex SDMs, and could complain to them? ;-)
> In any case, having "xfeat" and "xfeat_comp" is a bad idea. They're
> not really related concepts other than their bits refer to the same
> states. They should not have such similar names.
Agreed.
> XSTATE_BV is the set of states written to the xsave area.
>
> XCOMP_BV is essentially always XCR0 (aka pcntxt_mask, aka
> xfeatures_mask) or'd with bit 63.
So how about this naming:
/*
* Mask of xstate components currently not in init state,
* typically written to by XSAVE*.
*/
u64 xfeat_mask_used; /* SDM: XSTATE_BV */
/*
* Mask of all state components saved/restored, plus the
* compaction flag. (Note that the XRSTORS instruction caches
* this value, and the next XSAVES done for this same
* area expects it to match, before it can perform the 'were
* these registers modified' hardware optimization.)
*/
u64 xfeat_mask_all; /* SDM: XCOMP_BV */
(Note that I kept the SDM name easily greppable.)
The 'compaction' aspect of 'xfeat_mask_all' is just an additional
quirk that does not deserve to be represented in the primary naming:
bit 63 of 'xfeat_mask_all' is set to 1 if the format is compacted:
basically 'compaction' can be thought of as an additional, special
'xfeature' that modifies the offsets in the save area to eliminate
holes. [*]
Basically this naming tells us the biggest, most relevant
differentiation between these two fields:
- the 'xfeat_mask_used' field reflects the current, momentary,
optimized state of the area. This mask is content dependent,
and it is a subset of:
- the 'xfeat_mask_all' field which reflects all states supported by
that fpstate context. This mask is content independent.
The compaction aspect of 'xfeat_mask_all' is obviously important to
the hardware (and depending on its value the position of various
registers in the save area are different), but secondary to the big
picture.
Note that once you have a good name, a lot of code becomes a lot more
obvious - and I wish Intel did more than just googling for the first
available historic QuickBASIC variable name when picking new CPU
symbols.
Thanks,
Ingo
[*]
Btw., does Intel have any special plans with xstate compaction?
AFAICS in Linux we just want to enable xfeat_mask_all to the max,
including compaction, and never really modify it (in the task's
lifetime).
I'm also wondering whether there will be any real 'holes' in the
xfeatures capability masks of future CPUs: right now xfeatures tend to
be already 'compacted' (because new CPUs tend to support all
xfeatures), so compaction mostly appears to be an academic feature. Or
is there already hardware out there where it matters?
Maybe once we get AVX512 in addition to MPX we can use compaction
materially: as there will be lots of tasks without MPX state but with
AVX512 state - in fact I suspect that will be the common case.
OTOH MPX state is relatively small compared to AVX and AVX512 state,
so skipping the hole won't buy us much, and the question is, how
expensive is compaction, will save/restore be slower with compaction
enabled? Has to be measured I suspect.
* Ingo Molnar <[email protected]> wrote:
> > XSTATE_BV is the set of states written to the xsave area.
> >
> > XCOMP_BV is essentially always XCR0 (aka pcntxt_mask, aka
> > xfeatures_mask) or'd with bit 63.
>
> So how about this naming:
>
> /*
> * Mask of xstate components currently not in init state,
> * typically written to by XSAVE*.
> */
> u64 xfeat_mask_used; /* SDM: XSTATE_BV */
>
> /*
> * Mask of all state components saved/restored, plus the
> * compaction flag. (Note that the XRSTORS instruction caches
> * this value, and the next XSAVES done for this same
> * area expects it to match, before it can perform the 'were
> * these registers modified' hardware optimization.)
> */
> u64 xfeat_mask_all; /* SDM: XCOMP_BV */
>
> (Note that I kept the SDM name easily greppable.)
Hm, so the problem with this naming is that for non-compacted XRSTOR,
XCOMP_BV has to be zero. (This seems nonsensical btw., as there's a
separate 'compaction' flag at bit 63 already.)
So a better name would be:
/*
* Mask of xstate components currently not in init state,
* typically written to by XSAVE*.
*/
u64 xfeat_used_mask; /* SDM: XSTATE_BV */
/*
* This mask is non-zero if the CPU supports state compaction:
* it is the mask of all state components to be saved/restored,
* plus the compaction flag at bit 63.
* (Note that the XRSTORS instruction caches this value, and
* the next XSAVES done for this same area expects it to match,
* before it can perform the 'were these registers modified'
* hardware optimization.)
*/
u64 xfeat_comp_mask; /* SDM: XCOMP_BV */
?
Thanks,
Ingo
On 05/06/2015 05:46 AM, Ingo Molnar wrote:
> So a better name would be:
>
> /*
> * Mask of xstate components currently not in init state,
> * typically written to by XSAVE*.
> */
> u64 xfeat_used_mask; /* SDM: XSTATE_BV */
The comment and name make sense if we always call xsave* with an
"instruction mask" where it has at least as many bits as we have set in
'pcntxt_mask' (aka xfeatures_mask).
If we ever get to a point where we are saving only a subset of the
supported features (say if we only wanted to consult the MPX registers
and none of the other state), then this stops making sense.
I think 'xfeat_saved_mask' or 'xstate_saved_mask' makes more sense.
Maybe a comment like:
/*
* Mask of xstate components currently present in the buffer.
* A non-present bit can mean that the feature is in the init
* state or that we did not ask the instruction to save it.
* Typically written to by XSAVE*.
*/
> /*
> * This mask is non-zero if the CPU supports state compaction:
> * it is the mask of all state components to be saved/restored,
> * plus the compaction flag at bit 63.
That's not correct. It's non-zero if it supports compaction and it was
saved using an instruction that supports compaction. A CPU supporting
xsaves, but using xsave will receive an uncompacted version with xcomp_bv=0.
> * (Note that the XRSTORS instruction caches this value, and
> * the next XSAVES done for this same area expects it to match,
> * before it can perform the 'were these registers modified'
> * hardware optimization.)
> */
> u64 xfeat_comp_mask; /* SDM: XCOMP_BV */
That seems like a bit of a silly thing to mention in a comment since it
never changes.
How about something like this?
/*
* Must be 0 when compacted state is not supported. If compacted state is
* supported and the XRSTOR variant understands both formats, bit 63 tells
* the instruction which format is to be used.
*
* This tells you the format of the buffer when using compacted layout.
* The format is determined by the features enabled in XCR* along with
* the features requested at XSAVE* time (SDM: RFBM).
*
* Note that even if a feature is present in this mask, it may still be
* absent from 'xfeat_used_mask', which means that space was allocated
* in the layout, but that it was not actually written.
*/
On 05/05/2015 11:16 PM, Ingo Molnar wrote:
> Btw., does Intel have any special plans with xstate compaction?
>
> AFAICS in Linux we just want to enable xfeat_mask_all to the max,
> including compaction, and never really modify it (in the task's
> lifetime).
Special plans?
If we do an XRSTORS on it before we do an XSAVES, then we need to worry.
But, if we do an XSAVES, the CPU will set it up for us.
> I'm also wondering whether there will be any real 'holes' in the
> xfeatures capability masks of future CPUs: right now xfeatures tend to
> be already 'compacted' (because new CPUs tend to support all
> xfeatures), so compaction mostly appears to be an academic feature. Or
> > is there already hardware out there where it matters?
There is a hole in the SDM today. See section 2.6 in the currently
released 054 version. I also know of actual hardware platforms with
holes. *PLUS*, someone can always shoot down CPUID bits in their
hypervisor or with kernel command-line options.
> Maybe once we get AVX512 in addition to MPX we can use compaction
> materially: as there will be lots of tasks without MPX state but with
> AVX512 state - in fact I suspect that will be the common case.
Right.
But we'd need to get to a point where we are calling 'xsaves' with a
Requested Feature BitMask (aka RFBM[]) that had holes in it. As it
stands today, we always call it with RFBM=-1 and so we always have
XCOMP_BV = XCR0.
We'd need to determine which fields are in the init state before we do
an xsaves.
> OTOH MPX state is relatively small compared to AVX and AVX512 state,
> so skipping the hole won't buy us much, and the question is, how
> expensive is compaction, will save/restore be slower with compaction
> enabled? Has to be measured I suspect.
Yep.
On Wed, May 06, 2015 at 11:27:47AM -0700, Dave Hansen wrote:
> But we'd need to get to a point where we are calling 'xsaves' with a
> Requested Feature BitMask (aka RFBM[]) that had holes in it. As it
> stands today, we always call it with RFBM=-1 and so we always have
> XCOMP_BV = XCR0.
>
> We'd need to determine which fields are in the init state before we do
> an xsaves.
Btw, do we have any perf data as to the improvement the compacted
variant brings?
I mean, it means a bunch of jumping through hoops in SW but is it worth
it?
Thanks.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
--
* Dave Hansen <[email protected]> wrote:
> On 05/06/2015 05:46 AM, Ingo Molnar wrote:
> > So a better name would be:
> >
> > /*
> > * Mask of xstate components currently not in init state,
> > * typically written to by XSAVE*.
> > */
> > u64 xfeat_used_mask; /* SDM: XSTATE_BV */
>
> The comment and name make sense if we always call xsave* with an
> "instruction mask" where it has at least as many bits as we have set
> in 'pcntxt_mask' (aka xfeatures_mask).
>
> If we ever get to a point where we are saving only a subset of the
> supported features (say if we only wanted to consult the MPX
> registers and none of the other state), then this stops making
> sense.
>
> I think 'xfeat_saved_mask' or 'xstate_saved_mask' makes more sense.
Good, that name works for me too: it still expresses the dynamic,
content-dependent nature of this mask.
> Maybe a comment like:
>
> /*
> * Mask of xstate components currently present in the buffer.
> * A non-present bit can mean that the feature is in the init
> * state or that we did not ask the instruction to save it.
> * Typically written to by XSAVE*.
> */
>
> > /*
> > * This mask is non-zero if the CPU supports state compaction:
> > * it is the mask of all state components to be saved/restored,
> > * plus the compaction flag at bit 63.
>
> That's not correct. It's non-zero if it supports compaction and it
> was saved using an instruction that supports compaction. A CPU
> supporting xsaves, but using xsave will receive an uncompacted
> version with xcomp_bv=0.
That is what I meant: under Linux this field is non-zero if the CPU
supports state compaction.
> > * (Note that the XRSTORS instruction caches this value, and
> > * the next XSAVES done for this same area expects it to match,
> > * before it can perform the 'were these registers modified'
> > * hardware optimization.)
> > */
> > u64 xfeat_comp_mask; /* SDM: XCOMP_BV */
>
> That seems like a bit of a silly thing to mention in a comment since
> it never changes.
But that's the reason why in Linux we don't really change it - we
don't go about trying to slice&dice the xstate into multiple
components. We just save it all and let the 'init' and 'modified'
optimizations in the hardware take care of the optimizations.
> How about something like this?
>
> /*
> * Must be 0 when compacted state is not supported. If compacted state is
> * supported and the XRSTOR variant understands both formats, bit 63 tells
> * the instruction which format is to be used.
Yes.
Btw., as a side note, this is a silly hardware interface: not setting
bit 63 is not valid as per the SDM (will cause a #GP), so it might as
well have left the whole bit 63 complication out of it: if this mask
is nonzero then a compacted format is requested. If it's zero, then a
non-compacted format.
Right?
> *
> * This tells you the format of the buffer when using compacted layout.
> * The format is determined by the features enabled in XCR* along with
> * the features requested at XSAVE* time (SDM: RFBM).
It has not 2 but 3 inputs: what is being saved/restored is determined
by this very bitmask here. If a bit is missing from this mask, it
won't be saved (it's 'compacted'), even if it's otherwise set in XCR*
or is requested in the instruction mask.
That's the whole point of compaction.
> *
> * Note that even if a feature is present in this mask, it may still be
> * absent from 'xfeat_used_mask', which means that space was allocated
> * in the layout, but that it was not actually written.
So here I'd mention _why_ it can be left out from xfeat_used_mask:
when an xstate component is in 'init state', then only 0 is written to
the xfeat_used_mask but the component area itself is not touched.
This means that previous contents of the saved area are still there
and are stale, so the kernel has to be careful about not exposing
these to user-space indiscriminately. That's what the 'sanitization'
functions are about.
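A hypothetical sketch of such a sanitization pass (the field and array
names here are illustrative, following the naming discussed above):

	u64 stale = xfeatures_mask & ~xsave->header.xstate_bv;
	int i;

	/*
	 * FP and SSE (bits 0-1) live in the legacy area and are
	 * handled separately - for the other components, overwrite
	 * stale save area contents with the known init state before
	 * exposing them to user-space:
	 */
	for (i = 2; i < 64; i++) {
		if (stale & (1ULL << i))
			memcpy((void *)xsave + xstate_offsets[i],
			       (void *)&init_xstate_ctx + xstate_offsets[i],
			       xstate_sizes[i]);
	}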
Thanks,
Ingo
* Dave Hansen <[email protected]> wrote:
> On 05/05/2015 11:16 PM, Ingo Molnar wrote:
> > Btw., does Intel have any special plans with xstate compaction?
> >
> > AFAICS in Linux we just want to enable xfeat_mask_all to the max,
> > including compaction, and never really modify it (in the task's
> > lifetime).
>
> Special plans?
I.e. are there any plans beyond using it strictly for full state
save/restore.
> If we do an XRSTORS on it before we do an XSAVES, then we need to
> worry. But, if we do an XSAVES, the CPU will set it up for us.
>
> > I'm also wondering whether there will be any real 'holes' in the
> > xfeatures capability masks of future CPUs: right now xfeatures
> > tend to be already 'compacted' (because new CPUs tend to support
> > all xfeatures), so compaction mostly appears to be an academic
> > feature. Or is there already hardware out there where it matters?
>
> There is a hole in the SDM today. See section 2.6 in the currently
> released 054 version. I also know of actual hardware platforms with
> holes. *PLUS*, someone can always shoot down CPUID bits in their
> hypervisor or with kernel command-line options.
I see, so MPX (bits 3 and 4) aren't there yet.
Btw., there's a new xfeature it appears:
XCR0.PKRU (bit 9): If 1, the XSAVE feature set can be used to manage
the PKRU register (see Section 2.7).
and bit 8 is a hole again.
Btw., regarding XCR0.PKRU: that enables 'Protection Keys' in the PTE
format. What's the main purpose of these keys? They seem to duplicate
the read/write bits in the PTE, with the exception that they don't
impact instruction fetches. So is this used to allow user-space to
execute but otherwise not read instructions?
Or some other purpose I missed?
In any case, these holes are really minor at the moment, and the
question is, what is the performance difference between a 'compacted'
XSAVE*/XRSTOR* pair, versus a standard format one?
> > Maybe once we get AVX512 in addition to MPX we can use compaction
> > materially: as there will be lots of tasks without MPX state but
> > with AVX512 state - in fact I suspect that will be the common
> > case.
>
> Right.
>
> But we'd need to get to a point where we are calling 'xsaves' with a
> Requested Feature BitMask (aka RFBM[]) that had holes in it. As it
> stands today, we always call it with RFBM=-1 and so we always have
> XCOMP_BV = XCR0.
XCOMP_BV must also have bit 63 set.
13.8.1
Standard Form of XRSTOR
The standard form of XRSTOR performs additional fault checking.
Either of the following conditions causes a general-protection
exception (#GP):
The XSTATE_BV field of the XSAVE header sets a bit that is not set
in XCR0. Bytes 23:8 of the XSAVE header are not all 0 (this implies
^^^^^^^^^^^^
that all bits in XCOMP_BV are 0).
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note the part I underlined: all of XCOMP_BV has to be 0 for any
standard form of XRSTOR, and if we use a compacted form, bit 63 must
be set:
this is why bit 63 is a nonsensical interface: it being nonzero
already tells the hardware that we requested compaction ...
> We'd need to determine which fields are in the init state before we
> do an xsaves.
Why? I don't think that's necessary.
The way I read the SDM both the 'init' and the 'modified'
optimizations are mostly automatic: the CPU determines it
automatically when a state component is in (or returned to!) init state,
and signals that via the relevant bit in XSTATE_BV being zeroed out.
This is what the SDM says about XSAVES (section 13.11 in the 054 SDM):
— If state component i is in its initial configuration, XSTATE_BV[i]
may be written with either 0 or 1.
so XSAVES itself performs the first step of the 'init optimization',
automatically: it will opportunistically write 0 to the relevant bit
in XSTATE_BV and won't save the state.
Once there's 0 in XSTATE_BV, put there by XSAVES, the XRSTOR
instruction is able to perform the other half of the optimization: by
not restoring it but initializing it (if needed).
XSAVES will also set up XSTATE_BV and XCOMP_BV so that XRSTOR does not
have to worry about it: it will do a compacted restore.
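In sketch form (field names as discussed earlier in this thread):

	copy_xregs_to_kernel(xsave);		/* XSAVES */

	/* If a component returned to init state, XSAVES cleared its bit: */
	if (!(xsave->header.xstate_bv & XSTATE_YMM))
		pr_debug("AVX state is init, was not written\n");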
Thanks,
Ingo
On 05/07/2015 05:22 AM, Ingo Molnar wrote:
> I.e. are there any plans beyond using it strictly for full state
> save/restore.
None that I know of, but having two (relatively) tiny features
(protection keys and MPX) might change things.
> Btw., regarding XCR0.PKRU: that enables 'Protection Keys' in the PTE
> format. What's the main purpose of these keys? They seem to duplicate
> the read/write bits in the PTE, with the exception that they don't
> impact instruction fetches. So is this used to allow user-space to
> execute but otherwise not read instructions?
>
> Or some other purpose I missed?
You can change the permissions of a given key with writes to the
register without changing the PTE. No TLB shootdown, plus the
permission changes are local to the CPU thread.
I have patches today if you're interested.
> In any case, these holes are really minor at the moment, and the
> question is, what is the performance difference between a 'compacted'
> XSAVE*/XRSTOR* pair, versus a standard format one?
Yeah, that would be interesting to know.
>>> Maybe once we get AVX512 in addition to MPX we can use compaction
>>> materially: as there will be lots of tasks without MPX state but
>>> with AVX512 state - in fact I suspect that will be the common
>>> case.
>>
>> Right.
>>
>> But we'd need to get to a point where we are calling 'xsaves' with a
>> Requested Feature BitMask (aka RFBM[]) that had holes in it. As it
>> stands today, we always call it with RFBM=-1 and so we always have
>> XCOMP_BV = XCR0.
...
>> We'd need to determine which fields are in the init state before we
>> do an xsaves.
>
> Why? I don't think that's necessary.
"If RFBM[i] = 0, XSTATE_BV[i] is written as 0."
We need to pull XSTATE_BV into the instruction mask when doing an
XSAVE* if our RFBM has bits unset that *are* set in XSTATE_BV.
Otherwise, we'll destroy the bits at XSAVE* time.
It's not a problem today because the instruction mask is always -1, so
it always has every bit set that *MIGHT* be set in XSTATE_BV.
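In sketch form that would be (hypothetical names - the xsaves() wrapper
and the header field name are assumptions):

	/*
	 * Widen the instruction mask with what is already saved in
	 * the buffer, so that a narrowed RFBM cannot zero out (and
	 * thereby lose) previously saved components:
	 */
	u64 mask = requested_mask | xsave->header.xstate_bv;

	xsaves(xsave, mask);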
As for the whole bit 63 thing... It's possible and valid to have a
XCOMP_BV[62:0]=0 because the instruction mask only contained bits that
were unset in XCR0|IA32_XSS. You need bit 63 to tell you which format
you are using.
* Dave Hansen <[email protected]> wrote:
> > Btw., regarding XCR0.PKRU: that enables 'Protection Keys' in the
> > PTE format. What's the main purpose of these keys? They seem to
> > duplicate the read/write bits in the PTE, with the exception that
> > they don't impact instruction fetches. So is this used to allow
> > user-space to execute but otherwise not read instructions?
> >
> > Or some other purpose I missed?
>
> You can change the permissions of a given key with writes to the
> register without changing the PTE. No TLB shootdown, plus the
> permission changes are local to the CPU thread.
interesting ... where are we doing that? kmap_atomic() is the only
place I can think of in the kernel, but there we are already skipping
the TLB shootdown by doing an INVLPG.
> I have patches today if you're interested.
I'm always interested in new CPU features ;-)
> ...
> >> We'd need to determine which fields are in the init state before we
> >> do an xsaves.
> >
> > Why? I don't think that's necessary.
>
> "If RFBM[i] = 0, XSTATE_BV[i] is written as 0."
Yes, that's natural: RFBM is the combination of XCR0 (constant) and
the 'instruction mask' (constant as well) - i.e. it's a wide bitmask
including all our xfeatures - essentially 'xfeatures_mask' (in
tmp.fpu).
> We need to pull XSTATE_BV into the instruction mask when doing an
> XSAVE* if our RFBM has bits unset that *are* set in XSTATE_BV.
> Otherwise, we'll destroy the bits at XSAVE* time.
But why would our RFBM be narrower than any possible XSTATE_BV we
handle? Our XCR0 is at the max.
Also, the XSTATE_BV value of the save area is immaterial when we do an
XSAVES: all the state is in CPU registers, we want to save it to the
save area. XSAVES will write it for us.
> It's not a problem today because the instruction mask is always -1,
> so it always has every bit set that *MIGHT* be set in XSTATE_BV.
Yes. And why would we ever want to narrow it?
> As for the whole bit 63 thing... It's possible and valid to have a
> XCOMP_BV[62:0]=0 because the instruction mask only contained bits
> that were unset in XCR0|IA32_XSS. You need bit 63 to tell you which
> format you are using.
So basically if you request an XSAVES to ... write nothing (XCR0 AND
instruction mask is 0), then it will represent this as 0|1<<63 in
XCOMP_BV?
In that case it does not matter whether the area is compacted or
standard: nothing was saved and nothing will have to be restored, only
the xsave header area will be accessed. Am I missing something?
Thanks,
Ingo
On 05/07/2015 08:33 AM, Ingo Molnar wrote:
> * Dave Hansen <[email protected]> wrote:
>>> Btw., regarding XCR0.PKRU: that enables 'Protection Keys' in the
>>> PTE format. What's the main purpose of these keys? They seem to
>>> duplicate the read/write bits in the PTE, with the exception that
>>> they don't impact instruction fetches. So is this used to allow
>>> user-space to execute but otherwise not read instructions?
>>>
>>> Or some other purpose I missed?
>>
>> You can change the permissions of a given key with writes to the
>> register without changing the PTE. No TLB shootdown, plus the
>> permission changes are local to the CPU thread.
>
> interesting ... where are we doing that? kmap_atomic() is the only
> place I can think of in the kernel, but there we are already skipping
> the TLB shootdown by doing an INVLPG.
Userspace. :)
It's for userspace-only.
>>>> We'd need to determine which fields are in the init state before we
>>>> do an xsaves.
>>>
>>> Why? I don't think that's necessary.
>>
>> "If RFBM[i] = 0, XSTATE_BV[i] is written as 0."
>
> Yes, that's natural: RFBM is the combination of XCR0 (constant) and
> the 'instruction mask' (constant as well) - i.e. it's a wide bitmask
> including all our xfeatures - essentially 'xfeatures_mask' (in
> tmp.fpu).
>
>> We need to pull XSTATE_BV into the instruction mask when doing an
>> XSAVE* if our RFBM has bits unset that *are* set in XSTATE_BV.
>> Otherwise, we'll destroy the bits at XSAVE* time.
>
> But why would our RFBM be narrower than any possible XSTATE_BV we
> handle? Our XCR0 is at the max.
>
> Also, the XSTATE_BV value of the save area is immaterial when we do an
> XSAVES: all the state is in CPU registers, we want to save it to the
> save area. XSAVES will write it for us.
>
>> It's not a problem today because the instruction mask is always -1,
>> so it always has every bit set that *MIGHT* be set in XSTATE_BV.
>
> Yes. And why would we ever want to narrow it?
Because it actually allows us to take advantage of the compaction.
Think of the layout of a task using protection keys and MPX.
MPX = 8*4 + 8*2 = 48 bytes.
PKEYs = 4 bytes
They'll be spread out in the standard form *OR* the compacted form with
a RFBM=-1. But, with the compacted form with RFBM=PK|MPX_BITS, they'll
fit in a cacheline.
>> As for the whole bit 63 thing... It's possible and valid to have a
>> XCOMP_BV[62:0]=0 because the instruction mask only contained bits
>> that were unset in XCR0|IA32_XSS. You need bit 63 to tell you which
>> format you are using.
>
> So basically if you request an XSAVES to ... write nothing (XCR0 AND
> instruction mask is 0), then it will represent this as 0|1<<63 in
> XCOMP_BV?
>
> In that case it does not matter whether the area is compacted or
> standard: nothing was saved and nothing will have to be restored, only
> the xsave header area will be accessed. Am I missing something?
Take a look at the SDM. There are differences in the behavior when
restoring the compacted vs. standard format. I don't know the deep
reasons for *WHY*, just that there are some deltas clearly spelled out
there.
* Dave Hansen <[email protected]> wrote:
> >> It's not a problem today because the instruction mask is always
> >> -1, so it always has every bit set that *MIGHT* be set in
> >> XSTATE_BV.
> >
> > Yes. And why would we ever want to narrow it?
>
> Because it actually allows us to take advantage of the compaction.
> Think of the layout of a task using protection keys and MPX.
>
> MPX = 8*4 + 8*2 = 48 bytes.
> PKEYs = 4 bytes
>
> They'll be spread out in the standard form *OR* the compacted form
> with a RFBM=-1. But, with the compacted form with RFBM=PK|MPX_BITS,
> they'll fit in a cacheline.
but but ... if this is a normal userspace task then it will use AVX on
modern Linux distros and then we are up to a 1K FPU context size
already, 48 bytes won't make a visible dent - especially if compacted
form has some saving cost. With AVX512 we are at around 2K?
I certainly don't _object_ to using the compacted format, as it makes
sense and it's clearly getting quite a bit of attention from the
hardware folks, but we should run some timings and such.
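Even a crude loop would give us the ballpark (a sketch, hypothetical
buffer name, to be run with preemption and interrupts disabled):

	cycles_t t0, t1;
	int i;

	t0 = get_cycles();
	for (i = 0; i < 10000; i++) {
		copy_xregs_to_kernel(&xsave_buf);	/* XSAVE(S) */
		copy_kernel_to_xregs(&xsave_buf, -1);	/* XRSTOR(S) */
	}
	t1 = get_cycles();

	pr_info("x86/fpu: avg save+restore: %ld cycles\n",
		(long)(t1 - t0) / 10000);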
In any case I'm mostly just curious, not worried.
Thanks,
Ingo
On 05/05/2015 10:49 AM, Ingo Molnar wrote:
> @@ -574,12 +573,10 @@ static void setup_init_fpu_buf(void)
> on_boot_cpu = 0;
>
> /*
> - * Setup init_xstate_buf to represent the init state of
> + * Setup init_xstate_ctx to represent the init state of
> * all the features managed by the xsave
> */
> - init_xstate_buf = alloc_bootmem_align(xstate_size,
> - __alignof__(struct xsave_struct));
> - fx_finit(&init_xstate_buf->i387);
> + fx_finit(&init_xstate_ctx.i387);
This is causing memory corruption in 4.2-rc2.
We do not know the size of the 'init_xstate_buf' before we boot. It's
completely enumerated in CPUID leaves but it is not static by any means.
This commit (3e5e126774), when applied, replaces the dynamic
allocation with a static one. When we do the first 'xrstor' (in
copy_xregs_to_kernel_booting()) it overruns init_fpstate and corrupts
the next chunk of memory (which is xfeatures_mask in my case).
I'm seeing this on a system with states not represented in
XSTATE_RESERVE (XSTATE_ZMM_Hi256 / XSTATE_OPMASK / XSTATE_Hi16_ZMM).
The systems affected are not widely available, but this is something
that we absolutely do not want to see regress.
This bug could also occur if a future CPU decided to change the amount
of storage allocated for a given xstate feature (which would be
architecturally OK).
According to the commit:
> This removes the last bootmem allocation from the FPU init path, allowing
> it to be called earlier in the boot sequence.
so we can't easily just revert this, although I'm not 100% sure that
this is before bootmem is available.
This patch works around the problem, btw:
https://www.sr71.net/~dave/intel/bloat-xsave-gunk-2.patch
One curiosity here is that the bisect for this actually turned up the
patch that disables 'XSAVES' support. When we used 'XSAVES' and the
"compacted" format, we managed to fit in to the buffer and things worked
(accidentally).
Actually we could statically allocate a bigger buffer based on the XCR0
features we support. That would preclude dynamic enabling, which really
just adds complexity for no good reason anyway.
On July 14, 2015 12:46:17 PM PDT, Dave Hansen <[email protected]> wrote:
>On 05/05/2015 10:49 AM, Ingo Molnar wrote:
>> @@ -574,12 +573,10 @@ static void setup_init_fpu_buf(void)
>> on_boot_cpu = 0;
>>
>> /*
>> - * Setup init_xstate_buf to represent the init state of
>> + * Setup init_xstate_ctx to represent the init state of
>> * all the features managed by the xsave
>> */
>> - init_xstate_buf = alloc_bootmem_align(xstate_size,
>> - __alignof__(struct xsave_struct));
>> - fx_finit(&init_xstate_buf->i387);
>> + fx_finit(&init_xstate_ctx.i387);
>
>This is causing memory corruption in 4.2-rc2.
>
>We do not know the size of the 'init_xstate_buf' before we boot. It's
>completely enumerated in CPUID leaves but it is not static by any
>means.
> This commit (3e5e126774), when applied, replaces the dynamic
>allocation with a static one. When we do the first 'xrstor' (in
>copy_xregs_to_kernel_booting()) it overruns init_fpstate and corrupts
>the next chunk of memory (which is xfeatures_mask in my case).
>
>I'm seeing this on a system with states not represented in
>XSTATE_RESERVE (XSTATE_ZMM_Hi256 / XSTATE_OPMASK / XSTATE_Hi16_ZMM).
>The systems affected are not widely available, but this is something
>that we absolutely do not want to see regress.
>
>This bug could also occur if a future CPU decided to change the amount
>of storage allocated for a given xstate feature (which would be
>architecturally OK).
>
>According to the commit:
>
>> This removes the last bootmem allocation from the FPU init path,
>allowing
>> it to be called earlier in the boot sequence.
>
>so we can't easily just revert this, although I'm not 100% sure that
>this is before bootmem is available.
>
>This patch works around the problem, btw:
>
> https://www.sr71.net/~dave/intel/bloat-xsave-gunk-2.patch
>
>One curiosity here is that the bisect for this actually turned up the
>patch that disables 'XSAVES' support. When we used 'XSAVES' and the
>"compacted" format, we managed to fit in to the buffer and things
>worked
>(accidentally).
--
Sent from my mobile phone. Please pardon brevity and lack of formatting.
* Dave Hansen <[email protected]> wrote:
> On 05/05/2015 10:49 AM, Ingo Molnar wrote:
> > @@ -574,12 +573,10 @@ static void setup_init_fpu_buf(void)
> > on_boot_cpu = 0;
> >
> > /*
> > - * Setup init_xstate_buf to represent the init state of
> > + * Setup init_xstate_ctx to represent the init state of
> > * all the features managed by the xsave
> > */
> > - init_xstate_buf = alloc_bootmem_align(xstate_size,
> > - __alignof__(struct xsave_struct));
> > - fx_finit(&init_xstate_buf->i387);
> > + fx_finit(&init_xstate_ctx.i387);
>
> This is causing memory corruption in 4.2-rc2.
>
> We do not know the size of the 'init_xstate_buf' before we boot. It's
> completely enumerated in CPUID leaves but it is not static by any means.
> This commit (3e5e126774), when applied, replaces the dynamic
> allocation with a static one. When we do the first 'xrstor' (in
> copy_xregs_to_kernel_booting()) it overruns init_fpstate and corrupts
> the next chunk of memory (which is xfeatures_mask in my case).
>
> I'm seeing this on a system with states not represented in
> XSTATE_RESERVE (XSTATE_ZMM_Hi256 / XSTATE_OPMASK / XSTATE_Hi16_ZMM).
> The systems affected are not widely available, but this is something
> that we absolutely do not want to see regress.
>
> This bug could also occur if a future CPU decided to change the amount
> of storage allocated for a given xstate feature (which would be
> architecturally OK).
>
> According to the commit:
>
> > This removes the last bootmem allocation from the FPU init path, allowing
> > it to be called earlier in the boot sequence.
>
> so we can't easily just revert this, although I'm not 100% sure that
> this is before bootmem is available.
>
> This patch works around the problem, btw:
>
> https://www.sr71.net/~dave/intel/bloat-xsave-gunk-2.patch
Yeah, so I got this prototype hardware boot crash reported in private mail and
decoded it and after some debugging I suggested the +PAGE_SIZE hack - possibly you
got that hack from the same person?
My suggestion was to solve this properly: if we list xstate features as supported
then we should size their max size correctly. The AVX bits are currently not
properly enumerated and sized - and I refuse to add feature support to the kernel
where per task CPU state fields that the kernel saves/restores are opaque...
So please add proper AVX512 support structures to fpu/types.h and size
XSTATE_RESERVE correctly - or alternatively we can remove the current incomplete
AVX512 bits.
Thanks,
Ingo
On 07/15/2015 04:07 AM, Ingo Molnar wrote:
> * Dave Hansen <[email protected]> wrote:
>>> /*
>>> - * Setup init_xstate_buf to represent the init state of
>>> + * Setup init_xstate_ctx to represent the init state of
>>> * all the features managed by the xsave
>>> */
>>> - init_xstate_buf = alloc_bootmem_align(xstate_size,
>>> - __alignof__(struct xsave_struct));
>>> - fx_finit(&init_xstate_buf->i387);
>>> + fx_finit(&init_xstate_ctx.i387);
>>
>> This is causing memory corruption in 4.2-rc2.
...
>> This patch works around the problem, btw:
>>
>> https://www.sr71.net/~dave/intel/bloat-xsave-gunk-2.patch
>
> Yeah, so I got this prototype hardware boot crash reported in private mail and
> decoded it and after some debugging I suggested the +PAGE_SIZE hack - possibly you
> got that hack from the same person?
Nope, I came up with that gem of a patch all on my own.
I also wouldn't characterize this as prototype hardware. There are
obviously plenty of folks depending on mainline to boot and function on
hardware that has AVX-512 support. That's why two different Intel folks
came to you independently.
> My suggestion was to solve this properly: if we list xstate features as supported
> then we should size their max size correctly. The AVX bits are currently not
> properly enumerated and sized - and I refuse to add feature support to the kernel
> where per task CPU state fields that the kernel saves/restores are opaque...
We might know the size and composition of the individual components, but
we do not know the size of the buffer. Different implementations of a
given feature are quite free to have different data stored in the
buffer, or even to rearrange or pad it. That's why the sizes are not
explicitly called out by the architecture and why we enumerated them
before your patch that caused this regression.
The component itself may not be opaque, but the size of the *buffer* is
not a simple sum of the component sizes. Here's a real-world example:
[ 0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
[ 0.000000] x86/fpu: xstate_offset[3]: 03c0, xstate_sizes[3]: 0040
Notice that component 3 is not at 0x240+0x100. This is why our existing
init_xstate_size() asks the CPU for the size, and why any attempt to
statically size the buffer is broken.
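(For reference, the architectural sizing is a single CPUID query - a
sketch:)

	unsigned int eax, ebx, ecx, edx;

	/*
	 * CPUID leaf 0xD, subleaf 0: EBX returns the buffer size
	 * needed for the features currently enabled in XCR0, ECX
	 * the maximum size over all CPU-supported features:
	 */
	cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
	xstate_size = ebx;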
I understand why you were misled by it, but the old "xsave_hdr_struct"
was wrong. Fenghua even posted patches to remove it before the FPU
rework (you were cc'd):
https://lkml.org/lkml/2015/4/18/164
> So please add proper AVX512 support structures to fpu/types.h and size
> XSTATE_RESERVE correctly - or alternatively we can remove the current incomplete
> AVX512 bits.
The old code sized the buffer in a fully architectural way and it
worked. The CPU *tells* you how much memory the 'xsave' instruction is
going to scribble on. The new code just merrily calls it and lets it
scribble away. This is as clear-cut a regression as I've ever seen.
The least we can do is detect that the kernel undersized the buffer and
disable support for the features that do not fit. A very lightly tested
patch to do that is attached. I'm not super eager to put that into an
-rc2 kernel though.
On Wed, Jul 15, 2015 at 5:34 PM, Dave Hansen
<[email protected]> wrote:
>
> The old code sized the buffer in a fully architectural way and it
> worked. The CPU *tells* you how much memory the 'xsave' instruction is
> going to scribble on. The new code just merrily calls it and let it
> scribble away. This is as clear-cut a regression as I've ever seen.
Yes, I think we'll need to revert it, or do something else drastic
like make that initial fp state allocation *much* bigger and then have
a "disable xsaves if if it's still not big enough".
setup_xstate_features() should be able to easily just say "this was
the maximum offset+size we saw", and we can take that to either do a
proper allocation, or verify that the static allocation is indeed big
enough.
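Roughly like this, I'd expect (a sketch - the per-component CPUID
enumeration already exists, the variable names here are illustrative):

	unsigned int i, eax, ebx, ecx, edx, max_end = 0;

	for (i = 2; i < 64; i++) {
		if (!(xfeatures_mask & (1ULL << i)))
			continue;

		/* EAX: size of component i, EBX: its standard-format offset: */
		cpuid_count(XSTATE_CPUID, i, &eax, &ebx, &ecx, &edx);
		max_end = max(max_end, ebx + eax);
	}

	if (max_end > sizeof(init_xstate_ctx))
		pr_warn("x86/fpu: static xstate buffer is too small!\n");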
Apparently a straight revert doesn't work, if only because things in
that area have been renamed very aggressively (both files and
functions and variables). Ingo?
Linus
On Wed, Jul 15, 2015 at 5:34 PM, Dave Hansen
<[email protected]> wrote:
>
> I understand why you were misled by it, but the old "xsave_hdr_struct"
> was wrong. Fenghua even posted patches to remove it before the FPU
> rework (you were cc'd):
>
> https://lkml.org/lkml/2015/4/18/164
Oh, and that patch looks like a good idea.
I wish there was some way to make sure sizeof() fails on it so that
we'd enforce that nobody allocates that thing as-is. I had this dim
memory that an unsized array at the end would do that, but I was
clearly wrong. It's just the array itself you can't do sizeof on, not
the structure that contains it. Is there some magic trick that I'm
forgetting?
Linus
* Dave Hansen <[email protected]> wrote:
> On 07/15/2015 04:07 AM, Ingo Molnar wrote:
> > * Dave Hansen <[email protected]> wrote:
> >>> /*
> >>> - * Setup init_xstate_buf to represent the init state of
> >>> + * Setup init_xstate_ctx to represent the init state of
> >>> * all the features managed by the xsave
> >>> */
> >>> - init_xstate_buf = alloc_bootmem_align(xstate_size,
> >>> - __alignof__(struct xsave_struct));
> >>> - fx_finit(&init_xstate_buf->i387);
> >>> + fx_finit(&init_xstate_ctx.i387);
> >>
> >> This is causing memory corruption in 4.2-rc2.
> ...
> >> This patch works around the problem, btw:
> >>
> >> https://www.sr71.net/~dave/intel/bloat-xsave-gunk-2.patch
> >
> > Yeah, so I got this prototype hardware boot crash reported in private mail and
> > decoded it and after some debugging I suggested the +PAGE_SIZE hack - possibly you
> > got that hack from the same person?
>
> Nope, I came up with that gem of a patch all on my own.
:)
> I also wouldn't characterize this as prototype hardware. There are obviously
> plenty of folks depending on mainline to boot and function on hardware that has
> AVX-512 support. That's why two different Intel folks came to you
> independently.
Yeah, so I treat it as a regression even if it's unreleased hw: what
matters for regressions is the number of people affected, plus the
kernel should work for a reasonable set of future hardware as well,
without much trouble.
Just curious: does any released hardware have AVX-512? I went by Wikipedia, which
seems to list pre-release hw:
https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512
Intel
Xeon Phi Knights Landing: AVX-512 F, CDI, PFI and ERI[1] in 2015[6]
Xeon Skylake: AVX-512 F, CDI, VL, BW, and DQ[7] in 2015[8]
Cannonlake (speculation)
> > My suggestion was to solve this properly: if we list xstate features as
> > supported then we should size their max size correctly. The AVX bits are
> > currently not properly enumerated and sized - and I refuse to add feature
> > support to the kernel where per task CPU state fields that the kernel
> > saves/restores are opaque...
>
> We might know the size and composition of the individual components, but we do
> not know the size of the buffer. Different implementations of a given feature
> are quite free to have different data stored in the buffer, or even to rearrange
> or pad it. That's why the sizes are not explicitly called out by the
> architecture and why we enumerated them before your patch that caused this
> regression.
But we _have_ to know the structure and layout of the XSAVE context for any
reasonable ptrace and signal frame support. Can you set/get AVX-512 registers via
ptrace? MPX state?
That's one of the reasons why I absolutely hate how this 'opaque per task CPU
context blob' concept snuck into the x86 code via the XSAVE patches without proper
enumeration of the data structures, sorry...
It makes it way too easy to 'support' CPU features without actually doing a good
job of it - and in fact it makes certain reasonable things impossible or very,
very hard, which makes me nervous.
But we'll fix the boot regression, no argument about that!
> The component itself may not be opaque, but the size of the *buffer* is not a
> simple sum of the component sizes. Here's a real-world example:
>
> [ 0.000000] x86/fpu: xstate_offset[2]: 0240, xstate_sizes[2]: 0100
> [ 0.000000] x86/fpu: xstate_offset[3]: 03c0, xstate_sizes[3]: 0040
>
> Notice that component 3 is not at 0x240+0x100. This is why our existing
> init_xstate_size() asks the CPU for the size, and why any attempt to
> statically size the buffer is broken.
>
> I understand why you were misled by it, but the old "xsave_hdr_struct" was
> wrong. Fenghua even posted patches to remove it before the FPU rework (you were
> cc'd):
>
> https://lkml.org/lkml/2015/4/18/164
Yeah, so I thought the worst bugs were fixed and that these would not
re-emerge on top of the new code.
Whether we have a static limit or not is orthogonal to the issue of sizing it
properly - and the plan was to have a dynamic context area in any case.
> > So please add proper AVX512 support structures to fpu/types.h and size
> > XSTATE_RESERVE correctly - or alternatively we can remove the current
> > incomplete AVX512 bits.
>
> The old code sized the buffer in a fully architectural way and it worked. The
> CPU *tells* you how much memory the 'xsave' instruction is going to scribble on.
> The new code just merrily calls it and lets it scribble away. This is as
> clear-cut a regression as I've ever seen.
This is a regression which we'll fix, but the 'old' dynamic code clearly did not
work for a long time, I'm sure you still remember my attempt at addressing the
worst fallout in:
e88221c50cad ("x86/fpu: Disable XSAVES* support for now")
Those kinds of totally non-working aspects were what made me nervous about the
opaque data structure aspect.
Because we can have dynamic sizing of the context area and non-opaque data
structures.
> The least we can do is detect that the kernel undersized the buffer and disable
> support for the features that do not fit. A very lightly tested patch to do
> that is attached. I'm not super eager to put that into an -rc2 kernel though.
Ok, this approach looks good to me as an interim fix. I'll give it a whirl on
older hardware. I agree with you that it needs to be sized dynamically.
> This came out a lot more complicated than I would have liked.
>
> Instead of simply enabling all of the XSAVE features that we both know about and
> the CPU supports, we have to be careful to not overflow our buffer in
> 'init_fpstate.xsave'.
Yeah, and this can be fixed separately and on top of your fix: my plan during the
FPU rework was to move the context area to the end of task_struct and size it
dynamically.
This needs some (very minor) changes to kernel/fork.c to allow an architecture to
determine the full task_struct size dynamically - but looks very doable and clean.
Wanna try this, or should I?
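A sketch of what I have in mind for the kernel/fork.c side (illustrative,
not final):

	static struct kmem_cache *task_struct_cachep;

	/* Arch code sets this to sizeof(task_struct) + xstate size: */
	extern unsigned long arch_task_struct_size;

	void __init fork_init(void)
	{
		task_struct_cachep =
			kmem_cache_create("task_struct", arch_task_struct_size,
					  ARCH_MIN_TASKALIGN,
					  SLAB_PANIC | SLAB_NOTRACK, NULL);
	}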
> To do this, we enable each XSAVE feature and then ask the CPU how large that
> makes the buffer. If we would overflow the buffer we allocated, we turn off the
> feature.
>
> This means that no matter what the CPU does, we will not corrupt random memory
> like we do before this patch. It also means that we can fall back in a way
> which cripples the system the least.
Yes, agreed.
Thanks,
Ingo
* Ingo Molnar <[email protected]> wrote:
> > The least we can do is detect that the kernel undersized the buffer and
> > disable support for the features that do not fit. A very lightly tested patch
> > to do that is attached. I'm not super eager to put that into an -rc2 kernel
> > though.
>
> Ok, this approach looks good to me as an interim fix. I'll give it a whirl on
> older hardware. I agree with you that it needs to be sized dynamically.
Hm, so this patch crashed the boot of 2 out of 3 systems that I tried :-/
But it does not really matter, as I think the dynamic allocation is the right fix
in any case (your last patch), so this patch should be moot.
Thanks,
Ingo
On 07/17/2015 12:45 AM, Ingo Molnar wrote:
> Just curious: does any released hardware have AVX-512? I went by Wikipedia, which
> seems to list pre-release hw:
>> We might know the size and composition of the individual components, but we do
>> not know the size of the buffer. Different implementations of a given feature
>> are quite free to have different data stored in the buffer, or even to rearrange
>> or pad it. That's why the sizes are not explicitly called out by the
>> architecture and why we enumerated them before your patch that caused this
>> regression.
>
> But we _have_ to know the structure and layout of the XSAVE context for any
> reasonable ptrace and signal frame support.
There are two different things here. One is the structure and layout
inside of the state components. That obviously needs full kernel
knowledge and can not be opaque, especially when the kernel needs to go
looking at it (like with MPX's BNDCSR for instance).
But, the relative layout of the components is up for grabs. The CPU is
completely free (architecturally) to pad components or rearrange things.
It's not opaque (it's fully enumerated in CPUID), but it's far from
something which is static or which we can realistically represent in a
static structure.
> Can you set/get AVX-512 registers via ptrace? MPX state?
The xsave buffer is just copied out to userspace with REGSET_XSTATE.
Userspace needs to do the same song and dance with CPUID to parse it
that the kernel does.
>> This came out a lot more complicated than I would have liked.
>>
>> Instead of simply enabling all of the XSAVE features that we both know about and
>> the CPU supports, we have to be careful to not overflow our buffer in
>> 'init_fpstate.xsave'.
>
> Yeah, and this can be fixed separately and on top of your fix: my plan during the
> FPU rework was to move the context area to the end of task_struct and size it
> dynamically.
>
> This needs some (very minor) changes to kernel/fork.c to allow an architecture to
> determine the full task_struct size dynamically - but looks very doable and clean.
> Wanna try this, or should I?
I think you already did this later in the thread.
* Dave Hansen <[email protected]> wrote:
> On 07/17/2015 12:45 AM, Ingo Molnar wrote:
> > Just curious: does any released hardware have AVX-512? I went by Wikipedia, which
> > seems to list pre-release hw:
>
>
> >> We might know the size and composition of the individual components, but we do
> >> not know the size of the buffer. Different implementations of a given feature
> >> are quite free to have different data stored in the buffer, or even to rearrange
> >> or pad it. That's why the sizes are not explicitly called out by the
> >> architecture and why we enumerated them before your patch that caused this
> >> regression.
> >
> > But we _have_ to know the structure and layout of the XSAVE context for any
> > reasonable ptrace and signal frame support.
>
> There are two different things here. One is the structure and layout inside of
> the state components. That obviously needs full kernel knowledge and can not be
> opaque, especially when the kernel needs to go looking at it (like with MPX's
> BNDCSR for instance).
>
> But, the relative layout of the components is up for grabs. The CPU is
> completely free (architecturally) to pad components or rearrange things.
>
> It's not opaque (it's fully enumerated in CPUID), but it's far from something
> which is static or which we can realistically represent in a static structure.
Ok, agreed.
> > Can you set/get AVX-512 registers via ptrace? MPX state?
>
> The xsave buffer is just copied out to userspace with REGSET_XSTATE. Userspace
> needs to do the same song and dance with CPUID to parse it that the kernel does.
Indeed - I missed REGSET_XSTATE and its interaction with
update_regset_xstate_info().
Good - I have no other complaints.
> > This needs some (very minor) changes to kernel/fork.c to allow an architecture
> > to determine the full task_struct size dynamically - but looks very doable and
> > clean. Wanna try this, or should I?
>
> I think you already did this later in the thread.
Yeah, wanted to get a fix for the regression to Linus ASAP. If we go changing core
code in kernel/fork.c we better have it in -rc3.
So right now I have these two applied:
0f6df268588f x86/fpu, sched: Dynamically allocate 'struct fpu'
218d096a24b4 x86/fpu, sched: Introduce CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and use it on x86
... do we need any of the other patches you sent to get working AVX512 support?
I think we should be fine, but I don't have the hardware.
Thanks,
Ingo
On 07/17/2015 12:32 PM, Ingo Molnar wrote:
> ... do we need any of the other patches you sent to get working AVX512 support?
> I think we should be fine, but I don't have the hardware.
I don't think so, but I'll go verify by actually running with those
x86/fpu commits.