From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, Andy Lutomirski, Paolo Bonzini, Radim Krčmář, kvm@vger.kernel.org, "Jason A. Donenfeld", Rik van Riel, Dave Hansen
Subject: [PATCH v4] x86: load FPU registers on return to userland
Date: Wed, 7 Nov 2018 20:48:35 +0100
Message-Id: <20181107194858.9380-1-bigeasy@linutronix.de>

This is a refurbished series originally started by Rik van Riel. The goal is
to load the FPU registers on return to userland and not on every context
switch.
By this optimisation we can:
- avoid loading the registers if the task stays in kernel and does not
  return to userland
- make kernel_fpu_begin() cheaper: it only saves the registers on the
  first invocation. The second invocation does not need to save them again.

To access the FPU registers in kernel we need to:
- disable preemption so that the scheduler cannot switch tasks. A task
  switch would set TIF_NEED_FPU_LOAD and the FPU registers would no longer
  be valid.
- disable BH because the softirq might use kernel_fpu_begin() and then
  set TIF_NEED_FPU_LOAD instead of loading the FPU registers on completion.

v3…v4:
It has been suggested to remove the `initialized' member of the struct fpu
because it should not be needed with lazy-FPU-restore and its removal would
make the review easier. This is the first part of the series, the second is
basically the rebase of the v3 queue. As a result, the diffstat became
negative (which wasn't the case in the previous version) :)
I tried to incorporate all the review comments that came up, some of them
were "outdated" after the removal of the `initialized' member. I'm sorry
should I have missed any.

v1…v3:
v2 was never posted. I followed the idea to completely decouple PKRU from
xstate. This didn't quite work and made a few things complicated. One
obvious required fixup is copy_fpstate_to_sigframe() where the PKRU state
needs to be fiddled into xstate. This required another xfeatures_mask so
that the sanity checks were performed and xstate_offsets would be computed.
Additionally ptrace also reads/sets xstate in order to get/set the
registers and PKRU is one of them. So this would need some fiddling, too.
In v3 I dropped that decouple idea. I also learned that the wrpkru
instruction is not privileged and so caching it in the kernel does not work.
Instead I keep PKRU in the xstate area and load it at context switch time
while the remaining registers are deferred (until return to userland). The
offset of PKRU within xstate is enumerated at boot time so why not use it?
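For readers new to the scheme, the deferred load described above can be
sketched as a toy userspace model in C. This is only an illustration under
stated assumptions, not code from the series: every name (toy_task, toy_cpu,
toy_switch_to, toy_return_to_user, toy_demo) is hypothetical, the FPU state
is reduced to a single int, and need_fpu_load merely stands in for
TIF_NEED_FPU_LOAD. The model assumes the registers are saved at context
switch only if they are still live, that PKRU is switched eagerly, and that
everything else is loaded on return to userland.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names are the kernel's real API. */
struct toy_task {
	int fpu_state;       /* stands in for the saved xstate (without PKRU) */
	int pkru;            /* stands in for the PKRU value kept in xstate */
	bool need_fpu_load;  /* models TIF_NEED_FPU_LOAD */
};

struct toy_cpu {
	int fpu_regs;   /* the "hardware" FPU registers */
	int pkru_reg;   /* the "hardware" PKRU register */
	int fpu_loads;  /* how often the FPU registers were loaded */
};

/* Context switch: save prev's registers only if they are still live,
 * defer the load for next, but switch PKRU right away. */
static void toy_switch_to(struct toy_cpu *cpu, struct toy_task *prev,
			  struct toy_task *next)
{
	if (!prev->need_fpu_load)
		prev->fpu_state = cpu->fpu_regs;
	next->need_fpu_load = true;
	cpu->pkru_reg = next->pkru;
}

/* Return to userland: load the registers only if a load is pending. */
static void toy_return_to_user(struct toy_cpu *cpu, struct toy_task *cur)
{
	if (cur->need_fpu_load) {
		cpu->fpu_regs = cur->fpu_state;
		cpu->fpu_loads++;
		cur->need_fpu_load = false;
	}
}

/* A user task is preempted three times by a kernel thread that clobbers
 * the FPU registers; returns the number of FPU register loads. */
int toy_demo(void)
{
	struct toy_cpu cpu = { .fpu_regs = 11, .pkru_reg = 1, .fpu_loads = 0 };
	struct toy_task user = { .fpu_state = 0, .pkru = 1, .need_fpu_load = false };
	struct toy_task kthread = { .fpu_state = 0, .pkru = 2, .need_fpu_load = true };

	for (int i = 0; i < 3; i++) {
		toy_switch_to(&cpu, &user, &kthread);
		cpu.fpu_regs = 99;  /* kernel-side FPU use clobbers the registers */
		toy_switch_to(&cpu, &kthread, &user);
	}
	toy_return_to_user(&cpu, &user);

	assert(cpu.fpu_regs == 11);  /* user state restored before userland */
	assert(cpu.pkru_reg == 1);   /* PKRU was switched eagerly */
	return cpu.fpu_loads;
}
```

In this model the user task is switched in three times but its registers
are loaded only once, at the final return to userland. The kernel thread
never reaches toy_return_to_user(), so nothing is ever loaded for it.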

Rik van Riel (5):
  x86/fpu: Add (__)make_fpregs_active helpers
  x86/fpu: Eager switch PKRU state
  x86/fpu: Always store the registers in copy_fpstate_to_sigframe()
  x86/fpu: Prepare copy_fpstate_to_sigframe() for TIF_NEED_FPU_LOAD
  x86/fpu: Defer FPU state load until return to userspace

Sebastian Andrzej Siewior (18):
  x86/fpu: Use ULL for shift in xfeature_uncompacted_offset()
  x86/fpu: Remove fpu->initialized usage in __fpu__restore_sig()
  x86/fpu: Remove fpu__restore()
  x86/entry/32: Remove asm/math_emu.h include
  x86/fpu: Remove preempt_disable() in fpu__clear()
  x86/fpu: Always init the `state' in fpu__clear()
  x86/fpu: Remove fpu->initialized usage in copy_fpstate_to_sigframe()
  x86/fpu: Remove fpu->initialized
  x86/fpu: Remove user_fpu_begin()
  x86/entry: Remove _TIF_ALLWORK_MASK
  x86/fpu: Make __raw_xsave_addr() use feature number instead of mask
  x86/fpu: Make get_xsave_field_ptr() and get_xsave_addr() use feature
    number instead of mask
  x86/pkeys: Make init_pkru_value static
  x86/fpu: Only write PKRU if it is different from current
  x86/pkeys: Don't check if PKRU is zero before writting it
  x86/entry: Add TIF_NEED_FPU_LOAD
  x86/fpu: Update xstate's PKRU value on write_pkru()
  x86/fpu: Don't restore the FPU state directly from userland in
    __fpu__restore_sig()

 Documentation/preempt-locking.txt    |   1 -
 arch/x86/entry/common.c              |   8 ++
 arch/x86/ia32/ia32_signal.c          |  17 +--
 arch/x86/include/asm/fpu/api.h       |  25 ++++
 arch/x86/include/asm/fpu/internal.h  | 149 ++++++----------------
 arch/x86/include/asm/fpu/signal.h    |   2 +-
 arch/x86/include/asm/fpu/types.h     |   9 --
 arch/x86/include/asm/fpu/xstate.h    |   6 +-
 arch/x86/include/asm/pgtable.h       |  19 ++-
 arch/x86/include/asm/special_insns.h |  13 +-
 arch/x86/include/asm/thread_info.h   |  10 +-
 arch/x86/include/asm/trace/fpu.h     |   8 +-
 arch/x86/kernel/fpu/core.c           | 181 +++++++++++++--------------
 arch/x86/kernel/fpu/init.c           |   2 -
 arch/x86/kernel/fpu/regset.c         |  24 +---
 arch/x86/kernel/fpu/signal.c         | 133 +++++++++-----------
 arch/x86/kernel/fpu/xstate.c         |  45 ++++---
 arch/x86/kernel/process.c            |   2 +-
 arch/x86/kernel/process_32.c         |  14 +--
 arch/x86/kernel/process_64.c         |  11 +-
 arch/x86/kernel/signal.c             |  17 ++-
 arch/x86/kernel/traps.c              |   2 +-
 arch/x86/kvm/x86.c                   |  47 ++++---
 arch/x86/math-emu/fpu_entry.c        |   3 -
 arch/x86/mm/mpx.c                    |   6 +-
 arch/x86/mm/pkeys.c                  |  15 +--
 26 files changed, 343 insertions(+), 426 deletions(-)

git://git.kernel.org/pub/scm/linux/kernel/git/bigeasy/staging.git x86_fpu_rtu_v4

Sebastian