From: Ingo Molnar
To: linux-kernel@vger.kernel.org
Cc: Andy Lutomirski, Borislav Petkov, Dave Hansen, Fenghua Yu, "H. Peter Anvin", Linus Torvalds, Oleg Nesterov, Thomas Gleixner
Subject: [PATCH 000/208] big x86 FPU code rewrite
Date: Tue, 5 May 2015 19:57:45 +0200
Message-Id: <1430848712-28064-1-git-send-email-mingo@kernel.org>

[Second part of the series - Gmail didn't like me sending so many mails.]

Over the past 10 years the x86 FPU has organically grown into somewhat of a spaghetti monster that few (if any) kernel developers understand and whose code few people enjoy hacking on.

Many people have suggested over the years that it needs a major cleanup, and some time ago I went "what the heck" and started doing it step by step to see where it leads - it cannot be that hard!

Three weeks and 200+ patches later I think I have to admit that I seriously underestimated the magnitude of the project! ;-)

This work-in-progress series is large, but I think it makes the code maintainable and hackable again. It's pretty complete, as per the 9 high-level goals laid out further below. Individual patches are all fine-grained, so they should be easy to review - Boris Petkov has already reviewed most of the patches, so they are not entirely raw.
Individual patches have been tested heavily for bisectability: they were both built and booted on a relatively wide range of x86 hardware that I have access to. Nevertheless the changes are pretty invasive, so I'd expect there to be test failures.

This is the only time I intend to post them to lkml in their entirety, so as not to spam lkml too much. (Future additions will be posted as delta series.)

I'd like to ask interested people to test this tree and to comment on the patches. The changes can be found in the following Git tree:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git tmp.fpu

(The tree might be rebased, depending on feedback.)

Here are the main themes that motivated most of the changes:

1) I collected all FPU code into arch/x86/kernel/fpu/*.c and split it up into the following, topically organized source code files:

  -rw-rw-r-- 1 mingo mingo  1423 May 5 16:36 arch/x86/kernel/fpu/bugs.c
  -rw-rw-r-- 1 mingo mingo 12206 May 5 16:36 arch/x86/kernel/fpu/core.c
  -rw-rw-r-- 1 mingo mingo  7342 May 5 16:36 arch/x86/kernel/fpu/init.c
  -rw-rw-r-- 1 mingo mingo 10909 May 5 16:36 arch/x86/kernel/fpu/measure.c
  -rw-rw-r-- 1 mingo mingo  9012 May 5 16:36 arch/x86/kernel/fpu/regset.c
  -rw-rw-r-- 1 mingo mingo 11188 May 5 16:36 arch/x86/kernel/fpu/signal.c
  -rw-rw-r-- 1 mingo mingo 10140 May 5 16:36 arch/x86/kernel/fpu/xstate.c

Similarly, I've collected and split up all FPU-related header files and organized them topically:

  -rw-rw-r-- 1 mingo mingo  1690 May 5 16:35 arch/x86/include/asm/fpu/api.h
  -rw-rw-r-- 1 mingo mingo 12937 May 5 16:36 arch/x86/include/asm/fpu/internal.h
  -rw-rw-r-- 1 mingo mingo   278 May 5 16:36 arch/x86/include/asm/fpu/measure.h
  -rw-rw-r-- 1 mingo mingo   596 May 5 16:35 arch/x86/include/asm/fpu/regset.h
  -rw-rw-r-- 1 mingo mingo  1013 May 5 16:35 arch/x86/include/asm/fpu/signal.h
  -rw-rw-r-- 1 mingo mingo  8137 May 5 16:36 arch/x86/include/asm/fpu/types.h
  -rw-rw-r-- 1 mingo mingo  5691 May 5 16:36 arch/x86/include/asm/fpu/xstate.h

<asm/fpu/api.h> is the only 'public' API
left, used in various drivers. I decoupled drivers and non-FPU x86 code from various FPU internals.

2) I renamed various internal data types, APIs and helpers, and organized their support functions accordingly.

For example, all functions that copy registers in and out of the FPU are now named consistently:

  copy_fxregs_to_kernel()         # was: fpu_fxsave()
  copy_xregs_to_kernel()          # was: xsave_state()

  copy_kernel_to_fregs()          # was: frstor_checking()
  copy_kernel_to_fxregs()         # was: fxrstor_checking()
  copy_kernel_to_xregs()          # was: fpu_xrstor_checking()
  copy_kernel_to_xregs_booting()  # was: xrstor_state_booting()

  copy_fregs_to_user()            # was: fsave_user()
  copy_fxregs_to_user()           # was: fxsave_user()
  copy_xregs_to_user()            # was: xsave_user()

  copy_user_to_fregs()            # was: frstor_user()
  copy_user_to_fxregs()           # was: fxrstor_user()
  copy_user_to_xregs()            # was: xrestore_user()
  copy_user_to_fpregs_zeroing()   # was: restore_user_xstate()

  'xregs'  stands for registers supported by XSAVE
  'fxregs' stands for registers supported by FXSAVE
  'fregs'  stands for registers supported by FSAVE
  'fpregs' stands for generic FPU registers.

Similarly, the high-level FPU functions got reorganized as well:

  extern void fpu__activate_curr(struct fpu *fpu);
  extern void fpu__activate_stopped(struct fpu *fpu);
  extern void fpu__save(struct fpu *fpu);
  extern void fpu__restore(struct fpu *fpu);
  extern int  fpu__restore_sig(void __user *buf, int ia32_frame);
  extern void fpu__drop(struct fpu *fpu);
  extern int  fpu__copy(struct fpu *dst_fpu, struct fpu *src_fpu);
  extern void fpu__clear(struct fpu *fpu);
  extern int  fpu__exception_code(struct fpu *fpu, int trap_nr);

Those functions that used to take a task_struct argument now take the more limited 'struct fpu' argument, and their naming is consistent and logical as well.
Likewise, the FP state data types are now consistently named as well:

  struct fregs_state;
  struct fxregs_state;
  struct swregs_state;
  struct xregs_state;

  union fpregs_state;

3) Various core data types got streamlined around four byte-sized flags in 'struct fpu':

  fpu->fpstate_active     # was: tsk->flags & PF_USED_MATH
  fpu->fpregs_active      # was: fpu->has_fpu
  fpu->last_cpu
  fpu->counter

which now fit into a single word.

4) task->thread.fpu->state got embedded again, as task->thread.fpu.state. This eliminated a lot of awkward late dynamic memory allocation of FPU state and the problematic handling of allocation failures.

Note that while the allocation is static right now, this is a WIP interim state: we can still do dynamic allocation of FPU state by moving the FPU state last in task_struct and then allocating task_struct accordingly.

5) The amazingly convoluted init dependencies got sorted out into two cleanly separated families of initialization functions: the fpu__init_system_*() functions and the fpu__init_cpu_*() functions.

This allowed the removal of various __init annotation hacks and obscure boot-time checks.

6) Decoupled the FPU core from the xsave code. xsave.c and xsave.h got shrunk quite a bit, and they now host only XSAVE/etc. related functionality, not generic FPU handling functions.

7) Added a ton of comments explaining how things work and why, hopefully making this code accessible to everyone interested.
8) Added FPU debugging code (CONFIG_X86_DEBUG_FPU=y) and added an FPU hw benchmarking subsystem (CONFIG_X86_DEBUG_FPU_MEASUREMENTS=y), which performs boot time measurements like:

  x86/fpu:##################################################################
  x86/fpu: Running FPU performance measurement suite (cache hot):
  x86/fpu: Cost of: null : 108 cycles
  x86/fpu:######## CPU instructions: ############################
  x86/fpu: Cost of: NOP insn : 0 cycles
  x86/fpu: Cost of: RDTSC insn : 12 cycles
  x86/fpu: Cost of: RDMSR insn : 100 cycles
  x86/fpu: Cost of: WRMSR insn : 396 cycles
  x86/fpu: Cost of: CLI insn same-IF : 0 cycles
  x86/fpu: Cost of: CLI insn flip-IF : 0 cycles
  x86/fpu: Cost of: STI insn same-IF : 0 cycles
  x86/fpu: Cost of: STI insn flip-IF : 0 cycles
  x86/fpu: Cost of: PUSHF insn : 0 cycles
  x86/fpu: Cost of: POPF insn same-IF : 20 cycles
  x86/fpu: Cost of: POPF insn flip-IF : 28 cycles
  x86/fpu:######## IRQ save/restore APIs: ############################
  x86/fpu: Cost of: local_irq_save() fn : 20 cycles
  x86/fpu: Cost of: local_irq_restore() fn same-IF : 24 cycles
  x86/fpu: Cost of: local_irq_restore() fn flip-IF : 28 cycles
  x86/fpu: Cost of: irq_save()+restore() fn same-IF : 48 cycles
  x86/fpu: Cost of: irq_save()+restore() fn flip-IF : 48 cycles
  x86/fpu:######## locking APIs: ############################
  x86/fpu: Cost of: smp_mb() fn : 40 cycles
  x86/fpu: Cost of: cpu_relax() fn : 8 cycles
  x86/fpu: Cost of: spin_lock()+unlock() fn : 64 cycles
  x86/fpu: Cost of: read_lock()+unlock() fn : 76 cycles
  x86/fpu: Cost of: write_lock()+unlock() fn : 52 cycles
  x86/fpu: Cost of: rcu_read_lock()+unlock() fn : 16 cycles
  x86/fpu: Cost of: preempt_disable()+enable() fn : 20 cycles
  x86/fpu: Cost of: mutex_lock()+unlock() fn : 56 cycles
  x86/fpu:######## MM instructions: ############################
  x86/fpu: Cost of: __flush_tlb() fn : 132 cycles
  x86/fpu: Cost of: __flush_tlb_global() fn : 920 cycles
  x86/fpu: Cost of: __flush_tlb_one() fn : 288 cycles
  x86/fpu: Cost of: __flush_tlb_range() fn : 412 cycles
  x86/fpu:######## FPU instructions: ############################
  x86/fpu: Cost of: CR0 read : 4 cycles
  x86/fpu: Cost of: CR0 write : 208 cycles
  x86/fpu: Cost of: CR0::TS fault : 1156 cycles
  x86/fpu: Cost of: FNINIT insn : 76 cycles
  x86/fpu: Cost of: FWAIT insn : 0 cycles
  x86/fpu: Cost of: FSAVE insn : 168 cycles
  x86/fpu: Cost of: FRSTOR insn : 160 cycles
  x86/fpu: Cost of: FXSAVE insn : 84 cycles
  x86/fpu: Cost of: FXRSTOR insn : 44 cycles
  x86/fpu: Cost of: FXRSTOR fault : 688 cycles
  x86/fpu: Cost of: XSAVE insn : 104 cycles
  x86/fpu: Cost of: XRSTOR insn : 80 cycles
  x86/fpu: Cost of: XRSTOR fault : 884 cycles
  x86/fpu:##################################################################

Based on such measurements we'll be able to do performance tuning, set default policies and do optimizations in a more informed fashion, as the speed of various x86 hardware varies a lot.

9) Reworked many ancient inlining and uninlining decisions based on modern principles.

Any feedback is welcome!
Thanks,

	Ingo

=====

Ingo Molnar (208):
  x86/fpu: Rename unlazy_fpu() to fpu__save()
  x86/fpu: Add comments to fpu__save() and restrict its export
  x86/fpu: Add debugging check to fpu__save()
  x86/fpu: Rename fpu_detect() to fpu__detect()
  x86/fpu: Remove stale init_fpu() prototype
  x86/fpu: Split an fpstate_alloc_init() function out of init_fpu()
  x86/fpu: Make init_fpu() static
  x86/fpu: Rename init_fpu() to fpu__unlazy_stopped() and add debugging check
  x86/fpu: Optimize fpu__unlazy_stopped()
  x86/fpu: Simplify fpu__unlazy_stopped()
  x86/fpu: Remove fpu_allocated()
  x86/fpu: Move fpu_alloc() out of line
  x86/fpu: Rename fpu_alloc() to fpstate_alloc()
  x86/fpu: Rename fpu_free() to fpstate_free()
  x86/fpu: Rename fpu_finit() to fpstate_init()
  x86/fpu: Rename fpu_init() to fpu__cpu_init()
  x86/fpu: Rename init_thread_xstate() to fpstate_xstate_init_size()
  x86/fpu: Move thread_info::fpu_counter into thread_info::fpu.counter
  x86/fpu: Improve the comment for the fpu::counter field
  x86/fpu: Move FPU data structures to asm/fpu_types.h
  x86/fpu: Clean up asm/fpu/types.h
  x86/fpu: Move i387.c and xsave.c to arch/x86/kernel/fpu/
  x86/fpu: Fix header file dependencies of fpu-internal.h
  x86/fpu: Split out the boot time FPU init code into fpu/init.c
  x86/fpu: Remove unnecessary includes from core.c
  x86/fpu: Move the no_387 handling and FPU detection code into init.c
  x86/fpu: Remove the free_thread_xstate() complication
  x86/fpu: Factor out fpu__flush_thread() from flush_thread()
  x86/fpu: Move math_state_restore() to fpu/core.c
  x86/fpu: Rename math_state_restore() to fpu__restore()
  x86/fpu: Factor out the FPU bug detection code into fpu__init_check_bugs()
  x86/fpu: Simplify the xsave_state*() methods
  x86/fpu: Remove fpu_xsave()
  x86/fpu: Move task_xstate_cachep handling to core.c
  x86/fpu: Factor out fpu__copy()
  x86/fpu: Uninline fpstate_free() and move it next to the allocation function
  x86/fpu: Make task_xstate_cachep static
  x86/fpu: Make kernel_fpu_disable/enable() static
  x86/fpu: Add debug check to kernel_fpu_disable()
  x86/fpu: Add kernel_fpu_disabled()
  x86/fpu: Remove __save_init_fpu()
  x86/fpu: Move fpu_copy() to fpu/core.c
  x86/fpu: Add debugging check to fpu_copy()
  x86/fpu: Print out whether we are doing lazy/eager FPU context switches
  x86/fpu: Eliminate the __thread_has_fpu() wrapper
  x86/fpu: Change __thread_clear_has_fpu() to 'struct fpu' parameter
  x86/fpu: Move 'PER_CPU(fpu_owner_task)' to fpu/core.c
  x86/fpu: Change fpu_owner_task to fpu_fpregs_owner_ctx
  x86/fpu: Remove 'struct task_struct' usage from __thread_set_has_fpu()
  x86/fpu: Remove 'struct task_struct' usage from __thread_fpu_end()
  x86/fpu: Remove 'struct task_struct' usage from __thread_fpu_begin()
  x86/fpu: Open code PF_USED_MATH usages
  x86/fpu: Document fpu__unlazy_stopped()
  x86/fpu: Get rid of PF_USED_MATH usage, convert it to fpu->fpstate_active
  x86/fpu: Remove 'struct task_struct' usage from drop_fpu()
  x86/fpu: Remove task_disable_lazy_fpu_restore()
  x86/fpu: Use 'struct fpu' in fpu_lazy_restore()
  x86/fpu: Use 'struct fpu' in restore_fpu_checking()
  x86/fpu: Use 'struct fpu' in fpu_reset_state()
  x86/fpu: Use 'struct fpu' in switch_fpu_prepare()
  x86/fpu: Use 'struct fpu' in switch_fpu_finish()
  x86/fpu: Move __save_fpu() into fpu/core.c
  x86/fpu: Use 'struct fpu' in __fpu_save()
  x86/fpu: Use 'struct fpu' in fpu__save()
  x86/fpu: Use 'struct fpu' in fpu_copy()
  x86/fpu: Use 'struct fpu' in fpu__copy()
  x86/fpu: Use 'struct fpu' in fpstate_alloc_init()
  x86/fpu: Use 'struct fpu' in fpu__unlazy_stopped()
  x86/fpu: Rename fpu__flush_thread() to fpu__clear()
  x86/fpu: Clean up fpu__clear() a bit
  x86/fpu: Rename i387.h to fpu/api.h
  x86/fpu: Move xsave.h to fpu/xsave.h
  x86/fpu: Rename fpu-internal.h to fpu/internal.h
  x86/fpu: Move MXCSR_DEFAULT to fpu/internal.h
  x86/fpu: Remove xsave_init() __init obfuscation
  x86/fpu: Remove assembly guard from asm/fpu/api.h
  x86/fpu: Improve FPU detection kernel messages
  x86/fpu: Print supported xstate features in human readable way
  x86/fpu: Rename 'pcntxt_mask' to 'xfeatures_mask'
  x86/fpu: Rename 'xstate_features' to 'xfeatures_nr'
  x86/fpu: Move XCR0 manipulation to the FPU code proper
  x86/fpu: Clean up regset functions
  x86/fpu: Rename 'xsave_hdr' to 'header'
  x86/fpu: Rename xsave.header::xstate_bv to 'xfeatures'
  x86/fpu: Clean up and fix MXCSR handling
  x86/fpu: Rename regset FPU register accessors
  x86/fpu: Explain the AVX register layout in the xsave area
  x86/fpu: Improve the __sanitize_i387_state() documentation
  x86/fpu: Rename fpu->has_fpu to fpu->fpregs_active
  x86/fpu: Rename __thread_set_has_fpu() to __fpregs_activate()
  x86/fpu: Rename __thread_clear_has_fpu() to __fpregs_deactivate()
  x86/fpu: Rename __thread_fpu_begin() to fpregs_activate()
  x86/fpu: Rename __thread_fpu_end() to fpregs_deactivate()
  x86/fpu: Remove fpstate_xstate_init_size() boot quirk
  x86/fpu: Remove xsave_init() bootmem allocations
  x86/fpu: Make setup_init_fpu_buf() run-once explicitly
  x86/fpu: Remove 'init_xstate_buf' bootmem allocation
  x86/fpu: Split fpu__cpu_init() into early-boot and cpu-boot parts
  x86/fpu: Make the system/cpu init distinction clear in the xstate code as well
  x86/fpu: Move CPU capability check into fpu__init_cpu_xstate()
  x86/fpu: Move legacy check to fpu__init_system_xstate()
  x86/fpu: Propagate once per boot quirk into fpu__init_system_xstate()
  x86/fpu: Remove xsave_init()
  x86/fpu: Do fpu__init_system_xstate only from fpu__init_system()
  x86/fpu: Set up the legacy FPU init image from fpu__init_system()
  x86/fpu: Remove setup_init_fpu_buf() call from eager_fpu_init()
  x86/fpu: Move all eager-fpu setup code to eager_fpu_init()
  x86/fpu: Move eager_fpu_init() to fpu/init.c
  x86/fpu: Clean up eager_fpu_init() and rename it to fpu__ctx_switch_init()
  x86/fpu: Split fpu__ctx_switch_init() into _cpu() and _system() portions
  x86/fpu: Do CLTS fpu__init_system()
  x86/fpu: Move the fpstate_xstate_init_size() call into fpu__init_system()
  x86/fpu: Call fpu__init_cpu_ctx_switch() from fpu__init_cpu()
  x86/fpu: Do system-wide setup from fpu__detect()
  x86/fpu: Remove fpu__init_cpu_ctx_switch() call from fpu__init_system()
  x86/fpu: Simplify fpu__cpu_init()
  x86/fpu: Factor out fpu__init_cpu_generic()
  x86/fpu: Factor out fpu__init_system_generic()
  x86/fpu: Factor out fpu__init_system_early_generic()
  x86/fpu: Move !FPU check ingo fpu__init_system_early_generic()
  x86/fpu: Factor out FPU bug checks into fpu/bugs.c
  x86/fpu: Make check_fpu() init ordering independent
  x86/fpu: Move fpu__init_system_early_generic() out of fpu__detect()
  x86/fpu: Remove the extra fpu__detect() layer
  x86/fpu: Rename fpstate_xstate_init_size() to fpu__init_system_xstate_size_legacy()
  x86/fpu: Reorder init methods
  x86/fpu: Add more comments to the FPU init code
  x86/fpu: Move fpu__save() to fpu/internals.h
  x86/fpu: Uninline kernel_fpu_begin()/end()
  x86/fpu: Move various internal function prototypes to fpu/internal.h
  x86/fpu: Uninline the irq_ts_save()/restore() functions
  x86/fpu: Rename fpu_save_init() to copy_fpregs_to_fpstate()
  x86/fpu: Optimize copy_fpregs_to_fpstate() by removing the FNCLEX synchronization with FP exceptions
  x86/fpu: Simplify FPU handling by embedding the fpstate in task_struct (again)
  x86/fpu: Remove failure paths from fpstate-alloc low level functions
  x86/fpu: Remove failure return from fpstate_alloc_init()
  x86/fpu: Rename fpstate_alloc_init() to fpstate_init_curr()
  x86/fpu: Simplify fpu__unlazy_stopped() error handling
  x86/fpu, kvm: Simplify fx_init()
  x86/fpu: Simplify fpstate_init_curr() usage
  x86/fpu: Rename fpu__unlazy_stopped() to fpu__activate_stopped()
  x86/fpu: Factor out FPU hw activation/deactivation
  x86/fpu: Simplify __save_fpu()
  x86/fpu: Eliminate __save_fpu()
  x86/fpu: Simplify fpu__save()
  x86/fpu: Optimize fpu__save()
  x86/fpu: Optimize fpu_copy()
  x86/fpu: Optimize fpu_copy() some more on lazy switching systems
  x86/fpu: Rename fpu/xsave.h to fpu/xstate.h
  x86/fpu: Rename fpu/xsave.c to fpu/xstate.c
  x86/fpu: Introduce cpu_has_xfeatures(xfeatures_mask, feature_name)
  x86/fpu: Simplify print_xstate_features()
  x86/fpu: Enumerate xfeature bits
  x86/fpu: Move xfeature type enumeration to fpu/types.h
  x86/fpu, crypto x86/camellia_aesni_avx: Simplify the camellia_aesni_init() xfeature checks
  x86/fpu, crypto x86/sha256_ssse3: Simplify the sha256_ssse3_mod_init() xfeature checks
  x86/fpu, crypto x86/camellia_aesni_avx2: Simplify the camellia_aesni_init() xfeature checks
  x86/fpu, crypto x86/twofish_avx: Simplify the twofish_init() xfeature checks
  x86/fpu, crypto x86/serpent_avx: Simplify the serpent_init() xfeature checks
  x86/fpu, crypto x86/cast5_avx: Simplify the cast5_init() xfeature checks
  x86/fpu, crypto x86/sha512_ssse3: Simplify the sha512_ssse3_mod_init() xfeature checks
  x86/fpu, crypto x86/cast6_avx: Simplify the cast6_init() xfeature checks
  x86/fpu, crypto x86/sha1_ssse3: Simplify the sha1_ssse3_mod_init() xfeature checks
  x86/fpu, crypto x86/serpent_avx2: Simplify the init() xfeature checks
  x86/fpu, crypto x86/sha1_mb: Remove FPU internal headers from sha1_mb.c
  x86/fpu: Move asm/xcr.h to asm/fpu/internal.h
  x86/fpu: Rename sanitize_i387_state() to fpstate_sanitize_xstate()
  x86/fpu: Simplify fpstate_sanitize_xstate() calls
  x86/fpu: Pass 'struct fpu' to fpstate_sanitize_xstate()
  x86/fpu: Rename save_xstate_sig() to copy_fpstate_to_sigframe()
  x86/fpu: Rename save_user_xstate() to copy_fpregs_to_sigframe()
  x86/fpu: Clarify ancient comments in fpu__restore()
  x86/fpu: Rename user_has_fpu() to fpregs_active()
  x86/fpu: Initialize fpregs in fpu__init_cpu_generic()
  x86/fpu: Clean up fpu__clear() state handling
  x86/alternatives, x86/fpu: Add 'alternatives_patched' debug flag and use it in xsave_state()
  x86/fpu: Synchronize the naming of drop_fpu() and fpu_reset_state()
  x86/fpu: Rename restore_fpu_checking() to copy_fpstate_to_fpregs()
  x86/fpu: Move all the fpu__*() high level methods closer to each other
  x86/fpu: Move fpu__clear() to 'struct fpu *' parameter passing
  x86/fpu: Rename restore_xstate_sig() to fpu__restore_sig()
  x86/fpu: Move the signal frame handling code closer to each other
  x86/fpu: Merge fpu__reset() and fpu__clear()
  x86/fpu: Move is_ia32*frame() helpers out of fpu/internal.h
  x86/fpu: Split out fpu/signal.h from fpu/internal.h for signal frame handling functions
  x86/fpu: Factor out fpu/regset.h from fpu/internal.h
  x86/fpu: Remove run-once init quirks
  x86/fpu: Factor out the exception error code handling code
  x86/fpu: Harmonize the names of the fpstate_init() helper functions
  x86/fpu: Create 'union thread_xstate' helper for fpstate_init()
  x86/fpu: Generalize 'init_xstate_ctx'
  x86/fpu: Move restore_init_xstate() out of fpu/internal.h
  x86/fpu: Rename all the fpregs, xregs, fxregs and fregs handling functions
  x86/fpu: Factor out fpu/signal.c
  x86/fpu: Factor out the FPU regset code into fpu/regset.c
  x86/fpu: Harmonize FPU register state types
  x86/fpu: Change fpu->fpregs_active from 'int' to 'char', add lazy switching comments
  x86/fpu: Document the various fpregs state formats
  x86/fpu: Move debugging check from kernel_fpu_begin() to __kernel_fpu_begin()
  x86/fpu/xstate: Don't assume the first zero xfeatures zero bit means the end
  x86/fpu: Clean up xstate feature reservation
  x86/fpu/xstate: Clean up setup_xstate_comp() call
  x86/fpu/init: Propagate __init annotations
  x86/fpu: Pass 'struct fpu' to fpu__restore()
  x86/fpu: Fix the 'nofxsr' boot parameter to also clear X86_FEATURE_FXSR_OPT
  x86/fpu: Add CONFIG_X86_DEBUG_FPU=y FPU debugging code
  x86/fpu: Add FPU performance measurement subsystem
  x86/fpu: Reorganize fpu/internal.h

 Documentation/preempt-locking.txt | 2 +-
 arch/x86/Kconfig.debug | 27 ++
 arch/x86/crypto/aesni-intel_glue.c | 2 +-
 arch/x86/crypto/camellia_aesni_avx2_glue.c | 15 +-
 arch/x86/crypto/camellia_aesni_avx_glue.c | 15 +-
 arch/x86/crypto/cast5_avx_glue.c | 15 +-
 arch/x86/crypto/cast6_avx_glue.c | 15 +-
 arch/x86/crypto/crc32-pclmul_glue.c | 2 +-
 arch/x86/crypto/crc32c-intel_glue.c | 3 +-
 arch/x86/crypto/crct10dif-pclmul_glue.c | 2 +-
 arch/x86/crypto/fpu.c | 2 +-
 arch/x86/crypto/ghash-clmulni-intel_glue.c | 2 +-
 arch/x86/crypto/serpent_avx2_glue.c | 15 +-
 arch/x86/crypto/serpent_avx_glue.c | 15 +-
 arch/x86/crypto/sha-mb/sha1_mb.c | 5 +-
 arch/x86/crypto/sha1_ssse3_glue.c | 16 +-
 arch/x86/crypto/sha256_ssse3_glue.c | 16 +-
 arch/x86/crypto/sha512_ssse3_glue.c | 16 +-
 arch/x86/crypto/twofish_avx_glue.c | 16 +-
 arch/x86/ia32/ia32_signal.c | 13 +-
 arch/x86/include/asm/alternative.h | 6 +
 arch/x86/include/asm/crypto/glue_helper.h | 2 +-
 arch/x86/include/asm/efi.h | 2 +-
 arch/x86/include/asm/fpu-internal.h | 626 ---------------------------------------
 arch/x86/include/asm/fpu/api.h | 48 +++
 arch/x86/include/asm/fpu/internal.h | 488 ++++++++++++++++++++++++++++++
 arch/x86/include/asm/fpu/measure.h | 13 +
 arch/x86/include/asm/fpu/regset.h | 21 ++
 arch/x86/include/asm/fpu/signal.h | 33 +++
 arch/x86/include/asm/fpu/types.h | 293 ++++++++++++++++++
 arch/x86/include/asm/{xsave.h => fpu/xstate.h} | 60 ++--
 arch/x86/include/asm/i387.h | 108 -------
 arch/x86/include/asm/kvm_host.h | 2 -
 arch/x86/include/asm/mpx.h | 8 +-
 arch/x86/include/asm/processor.h | 141 +--------
 arch/x86/include/asm/simd.h | 2 +-
 arch/x86/include/asm/stackprotector.h | 2 +
 arch/x86/include/asm/suspend_32.h | 2 +-
 arch/x86/include/asm/suspend_64.h | 2 +-
 arch/x86/include/asm/user.h | 12 +-
 arch/x86/include/asm/xcr.h | 49 ---
 arch/x86/include/asm/xor.h | 2 +-
 arch/x86/include/asm/xor_32.h | 2 +-
 arch/x86/include/asm/xor_avx.h | 2 +-
 arch/x86/include/uapi/asm/sigcontext.h | 8 +-
 arch/x86/kernel/Makefile | 2 +-
 arch/x86/kernel/alternative.c | 5 +
 arch/x86/kernel/cpu/bugs.c | 57 +---
 arch/x86/kernel/cpu/bugs_64.c | 2 +
 arch/x86/kernel/cpu/common.c | 29 +-
 arch/x86/kernel/fpu/Makefile | 11 +
 arch/x86/kernel/fpu/bugs.c | 71 +++++
 arch/x86/kernel/fpu/core.c | 509 +++++++++++++++++++++++++++++++
 arch/x86/kernel/fpu/init.c | 288 ++++++++++++++++++
 arch/x86/kernel/fpu/measure.c | 509 +++++++++++++++++++++++++++++++
 arch/x86/kernel/fpu/regset.c | 356 ++++++++++++++++++++++
 arch/x86/kernel/fpu/signal.c | 404 +++++++++++++++++++++++++
 arch/x86/kernel/fpu/xstate.c | 406 +++++++++++++++++++++++++
 arch/x86/kernel/i387.c | 656 ----------------------------------------
 arch/x86/kernel/process.c | 52 +---
 arch/x86/kernel/process_32.c | 15 +-
 arch/x86/kernel/process_64.c | 13 +-
 arch/x86/kernel/ptrace.c | 12 +-
 arch/x86/kernel/signal.c | 38 ++-
 arch/x86/kernel/smpboot.c | 3 +-
 arch/x86/kernel/traps.c | 120 ++------
 arch/x86/kernel/xsave.c | 724 ---------------------------------------------
 arch/x86/kvm/cpuid.c | 2 +-
 arch/x86/kvm/vmx.c | 5 +-
 arch/x86/kvm/x86.c | 68 ++---
 arch/x86/lguest/boot.c | 2 +-
 arch/x86/lib/mmx_32.c | 2 +-
 arch/x86/math-emu/fpu_aux.c | 4 +-
 arch/x86/math-emu/fpu_entry.c | 20 +-
 arch/x86/math-emu/fpu_system.h | 2 +-
 arch/x86/mm/mpx.c | 15 +-
 arch/x86/power/cpu.c | 11 +-
 arch/x86/xen/enlighten.c | 2 +-
 drivers/char/hw_random/via-rng.c | 2 +-
 drivers/crypto/padlock-aes.c | 2 +-
 drivers/crypto/padlock-sha.c | 2 +-
 drivers/lguest/x86/core.c | 12 +-
 lib/raid6/x86.h | 2 +-
 83 files changed, 3742 insertions(+), 2841 deletions(-)
 delete mode 100644 arch/x86/include/asm/fpu-internal.h
 create mode 100644 arch/x86/include/asm/fpu/api.h
 create mode 100644 arch/x86/include/asm/fpu/internal.h
 create mode 100644 arch/x86/include/asm/fpu/measure.h
 create mode 100644 arch/x86/include/asm/fpu/regset.h
 create mode 100644 arch/x86/include/asm/fpu/signal.h
 create mode 100644 arch/x86/include/asm/fpu/types.h
 rename arch/x86/include/asm/{xsave.h => fpu/xstate.h} (77%)
 delete mode 100644 arch/x86/include/asm/i387.h
 delete mode 100644 arch/x86/include/asm/xcr.h
 create mode 100644 arch/x86/kernel/fpu/Makefile
 create mode 100644 arch/x86/kernel/fpu/bugs.c
 create mode 100644 arch/x86/kernel/fpu/core.c
 create mode 100644 arch/x86/kernel/fpu/init.c
 create mode 100644 arch/x86/kernel/fpu/measure.c
 create mode 100644 arch/x86/kernel/fpu/regset.c
 create mode 100644 arch/x86/kernel/fpu/signal.c
 create mode 100644 arch/x86/kernel/fpu/xstate.c
 delete mode 100644 arch/x86/kernel/i387.c
 delete mode 100644 arch/x86/kernel/xsave.c