From: Rick Edgecombe
To: x86@kernel.org, "H. Peter Anvin", Thomas Gleixner, Ingo Molnar,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-mm@kvack.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org,
    Arnd Bergmann, Andy Lutomirski, Balbir Singh, Borislav Petkov,
    Cyrill Gorcunov, Dave Hansen, Eugene Syromiatnikov, Florian Weimer,
    "H. J. Lu", Jann Horn, Jonathan Corbet, Kees Cook, Mike Kravetz,
    Nadav Amit, Oleg Nesterov, Pavel Machek, Peter Zijlstra, Randy Dunlap,
    "Ravi V. Shankar", Dave Martin, Weijiang Yang, "Kirill A. Shutemov",
    joao.moreira@intel.com, John Allen, kcc@google.com, eranian@google.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH 23/35] x86/fpu: Add helpers for modifying supervisor xstate
Date: Sun, 30 Jan 2022 13:18:26 -0800
Message-Id: <20220130211838.8382-24-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20220130211838.8382-1-rick.p.edgecombe@intel.com>
References: <20220130211838.8382-1-rick.p.edgecombe@intel.com>

Add helpers that can be used to modify supervisor xstate safely for the
current task.

State for supervisor xstate-based features can be live and accessed via
MSRs, or saved in memory in an xsave buffer. When the kernel needs to
modify this state, it needs to be sure to operate on it in the right
place, so the modifications don't get clobbered.

In the past, supervisor xstate features have used get_xsave_addr()
directly and performed open-coded logic to handle operating on the saved
state correctly. This has posed two problems:

 1. It has logic that has been gotten wrong more than once.
 2. To reduce code, less common paths are not optimized.
    Determination of which paths are less common is based on
    assumptions about far away code that could change.

In addition, now that get_xsave_addr() is not available outside of the
core fpu code, there isn't even a way for these supervisor features to
modify the in-memory state.

To resolve these problems, add some helpers that encapsulate the correct
logic to operate on the correct copy of the state. Map the MSRs to their
struct field locations in a case statement in __get_xsave_member().

Use the helpers like this, to write to either the MSR or the saved state:

	void *xstate;

	xstate = start_update_xsave_msrs(XFEATURE_FOO);
	r = xsave_rdmsrl(xstate, MSR_IA32_FOO_1, &val);
	if (r)
		xsave_wrmsrl(xstate, MSR_IA32_FOO_2, FOO_ENABLE);
	end_update_xsave_msrs();

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

v1:
 - New patch.
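For illustration, here is a sketch of how a feature would later wire one
of its MSRs into __get_xsave_member() (XFEATURE_FOO, MSR_IA32_FOO_1 and
struct foo_state are hypothetical placeholder names, not identifiers from
this series):

	static u64 *__get_xsave_member(void *xstate, u32 msr)
	{
		/* Interpret the buffer as the feature's xstate layout */
		struct foo_state *state = xstate;

		switch (msr) {
		case MSR_IA32_FOO_1:
			/* Map the MSR to its field in the xsave buffer */
			return &state->foo_1;
		default:
			WARN_ONCE(1, "x86/fpu: unsupported xstate msr (%u)\n", msr);
			return NULL;
		}
	}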
 arch/x86/include/asm/fpu/api.h |   5 ++
 arch/x86/kernel/fpu/xstate.c   | 134 +++++++++++++++++++++++++++++++++
 2 files changed, 139 insertions(+)

diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h
index c83b3020350a..6aec27984b62 100644
--- a/arch/x86/include/asm/fpu/api.h
+++ b/arch/x86/include/asm/fpu/api.h
@@ -165,4 +165,9 @@ static inline bool fpstate_is_confidential(struct fpu_guest *gfpu)
 struct task_struct;
 extern long fpu_xstate_prctl(struct task_struct *tsk, int option, unsigned long arg2);
 
+void *start_update_xsave_msrs(int xfeature_nr);
+void end_update_xsave_msrs(void);
+int xsave_rdmsrl(void *state, unsigned int msr, unsigned long long *p);
+int xsave_wrmsrl(void *state, u32 msr, u64 val);
+int xsave_set_clear_bits_msrl(void *state, u32 msr, u64 set, u64 clear);
 #endif /* _ASM_X86_FPU_API_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 44397202762b..c5e20e0d0725 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1867,3 +1867,137 @@ int proc_pid_arch_status(struct seq_file *m, struct pid_namespace *ns,
 	return 0;
 }
 #endif /* CONFIG_PROC_PID_ARCH_STATUS */
+
+static u64 *__get_xsave_member(void *xstate, u32 msr)
+{
+	switch (msr) {
+	/* Currently there are no MSRs supported */
+	default:
+		WARN_ONCE(1, "x86/fpu: unsupported xstate msr (%u)\n", msr);
+		return NULL;
+	}
+}
+
+/*
+ * Return a pointer to the xstate for the feature if it should be used, or NULL
+ * if the MSRs should be written to directly. To do this safely, using the
+ * associated read/write helpers is required.
+ */
+void *start_update_xsave_msrs(int xfeature_nr)
+{
+	void *xstate;
+
+	/*
+	 * fpregs_lock() only disables preemption (mostly). So modifying state
+	 * in an interrupt could screw up some in-progress fpregs operation,
+	 * but appear to work. Warn about it.
+	 */
+	WARN_ON_ONCE(!in_task());
+	WARN_ON_ONCE(current->flags & PF_KTHREAD);
+
+	fpregs_lock();
+
+	fpregs_assert_state_consistent();
+
+	/*
+	 * If the registers don't need to be reloaded, go ahead and operate
+	 * on the registers.
+	 */
+	if (!test_thread_flag(TIF_NEED_FPU_LOAD))
+		return NULL;
+
+	xstate = get_xsave_addr(&current->thread.fpu.fpstate->regs.xsave, xfeature_nr);
+
+	/*
+	 * If regs are in the init state, they can't be retrieved from
+	 * init_fpstate due to the init optimization, but are not necessarily
+	 * zero. The only option is to restore to make everything live and
+	 * operate on registers. This will clear TIF_NEED_FPU_LOAD.
+	 *
+	 * Otherwise, if not in the init state but TIF_NEED_FPU_LOAD is set,
+	 * operate on the buffer. The registers will be restored before going
+	 * to userspace in any case, but the task might get preempted before
+	 * then, so this possibly saves an xsave.
+	 */
+	if (!xstate)
+		fpregs_restore_userregs();
+	return xstate;
+}
+
+void end_update_xsave_msrs(void)
+{
+	fpregs_unlock();
+}
+
+/*
+ * When TIF_NEED_FPU_LOAD is set and fpregs_state_valid() is true, the saved
+ * state and fp state match. In this case, the kernel has some good options -
+ * it can skip the restore before returning to userspace or it could skip
+ * an xsave if preempted before then.
+ *
+ * But if this correspondence is broken by either a write to the in-memory
+ * buffer or the registers, the kernel needs to be notified so it doesn't miss
+ * an xsave or restore. __xsave_msrl_prepare_write() performs this check and
+ * notifies the kernel if needed. Use before writes only, to not take away
+ * the kernel's options when not required.
+ *
+ * If TIF_NEED_FPU_LOAD is set, then the logic in start_update_xsave_msrs()
+ * must have resulted in targeting the in-memory state, so invalidating the
+ * registers is the right thing to do.
+ */
+static void __xsave_msrl_prepare_write(void)
+{
+	if (test_thread_flag(TIF_NEED_FPU_LOAD) &&
+	    fpregs_state_valid(&current->thread.fpu, smp_processor_id()))
+		__fpu_invalidate_fpregs_state(&current->thread.fpu);
+}
+
+int xsave_rdmsrl(void *xstate, unsigned int msr, unsigned long long *p)
+{
+	u64 *member_ptr;
+
+	if (!xstate)
+		return rdmsrl_safe(msr, p);
+
+	member_ptr = __get_xsave_member(xstate, msr);
+	if (!member_ptr)
+		return 1;
+
+	*p = *member_ptr;
+
+	return 0;
+}
+
+int xsave_wrmsrl(void *xstate, u32 msr, u64 val)
+{
+	u64 *member_ptr;
+
+	__xsave_msrl_prepare_write();
+	if (!xstate)
+		return wrmsrl_safe(msr, val);
+
+	member_ptr = __get_xsave_member(xstate, msr);
+	if (!member_ptr)
+		return 1;
+
+	*member_ptr = val;
+
+	return 0;
+}
+
+int xsave_set_clear_bits_msrl(void *xstate, u32 msr, u64 set, u64 clear)
+{
+	u64 val, new_val;
+	int ret;
+
+	ret = xsave_rdmsrl(xstate, msr, &val);
+	if (ret)
+		return ret;
+
+	new_val = (val & ~clear) | set;
+
+	if (new_val != val)
+		return xsave_wrmsrl(xstate, msr, new_val);
+
+	return 0;
+}
-- 
2.17.1
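As a usage sketch, a feature that wants to set or clear bits in one of its
MSRs would combine the helpers like this (again, XFEATURE_FOO,
MSR_IA32_FOO_1 and the FOO_* flags are hypothetical placeholders):

	void *xstate;
	int r;

	xstate = start_update_xsave_msrs(XFEATURE_FOO);
	/* One read-modify-write: set FOO_ENABLE, clear FOO_SUPPRESS */
	r = xsave_set_clear_bits_msrl(xstate, MSR_IA32_FOO_1, FOO_ENABLE,
				      FOO_SUPPRESS);
	end_update_xsave_msrs();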