From: "Chang S. Bae"
To: Andy Lutomirski, "H . Peter Anvin", Thomas Gleixner, Ingo Molnar
Cc: Andi Kleen, Dave Hansen, Markus T Metzger, "Ravi V . Shankar", "Chang S .
 Bae", linux-kernel@vger.kernel.org
Subject: [PATCH V2 01/15] x86/fsgsbase/64: Introduce FS/GS base helper functions
Date: Thu, 31 May 2018 10:58:31 -0700
Message-Id: <1527789525-8857-2-git-send-email-chang.seok.bae@intel.com>
In-Reply-To: <1527789525-8857-1-git-send-email-chang.seok.bae@intel.com>
References: <1527789525-8857-1-git-send-email-chang.seok.bae@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

With the new helpers, FS/GS base access is centralized.
Eventually, when the FSGSBASE instructions are enabled, it
will also be faster.

The helpers are used by the ptrace APIs (PTRACE_ARCH_PRCTL,
PTRACE_SETREG, PTRACE_GETREG, etc.). The idea is to keep
the FS/GS update mechanism organized.

The notions of "active" and "inactive" are used to distinguish
the "kernel" and "user" GS bases. The "inactive" GS base is the
GS base of the inactive (user) task, backed up at kernel entry.

Based-on-code-from: Andy Lutomirski
Signed-off-by: Chang S. Bae
Reviewed-by: Andi Kleen
Cc: H. Peter Anvin
Cc: Dave Hansen
Cc: Thomas Gleixner
Cc: Ingo Molnar
---
 arch/x86/include/asm/fsgsbase.h |  47 +++++++++++++++
 arch/x86/kernel/process_64.c    | 128 +++++++++++++++++++++++++++++-----------
 arch/x86/kernel/ptrace.c        |  28 +++------
 3 files changed, 149 insertions(+), 54 deletions(-)
 create mode 100644 arch/x86/include/asm/fsgsbase.h

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
new file mode 100644
index 0000000..0d4fbef
--- /dev/null
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_FSGSBASE_H
+#define _ASM_FSGSBASE_H 1
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_X86_64
+
+#include
+
+/*
+ * Read/write an (inactive) task's fsbase or gsbase. This returns
+ * the value that the FS/GS base would have (if the task were to be
+ * resumed). The current task is also supported.
+ */
+unsigned long read_task_fsbase(struct task_struct *task);
+unsigned long read_task_gsbase(struct task_struct *task);
+int write_task_fsbase(struct task_struct *task, unsigned long fsbase);
+int write_task_gsbase(struct task_struct *task, unsigned long gsbase);
+
+/* Helper functions for reading/writing FS/GS base */
+
+static inline unsigned long read_fsbase(void)
+{
+	unsigned long fsbase;
+
+	rdmsrl(MSR_FS_BASE, fsbase);
+	return fsbase;
+}
+
+void write_fsbase(unsigned long fsbase);
+
+static inline unsigned long read_inactive_gsbase(void)
+{
+	unsigned long gsbase;
+
+	rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+	return gsbase;
+}
+
+void write_inactive_gsbase(unsigned long gsbase);
+
+#endif /* CONFIG_X86_64 */
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_FSGSBASE_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 12bb445..5506e1b 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -54,6 +54,7 @@
 #include
 #include
 #include
+#include
 #ifdef CONFIG_IA32_EMULATION
 /* Not included via unistd.h */
 #include
@@ -278,6 +279,90 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
 	}
 }
 
+void write_fsbase(unsigned long fsbase)
+{
+	/* set the selector to 0 to not confuse __switch_to */
+	loadseg(FS, 0);
+	wrmsrl(MSR_FS_BASE, fsbase);
+}
+
+void write_inactive_gsbase(unsigned long gsbase)
+{
+	/* set the selector to 0 to not confuse __switch_to */
+	loadseg(GS, 0);
+	wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+}
+
+unsigned long read_task_fsbase(struct task_struct *task)
+{
+	unsigned long fsbase;
+
+	if (task == current)
+		fsbase = read_fsbase();
+	else
+		/*
+		 * XXX: This will not behave as expected if called
+		 * if fsindex != 0
+		 */
+		fsbase = task->thread.fsbase;
+
+	return fsbase;
+}
+
+unsigned long read_task_gsbase(struct task_struct *task)
+{
+	unsigned long gsbase;
+
+	if (task == current)
+		gsbase = read_inactive_gsbase();
+	else
+		/*
+		 * XXX: This will not behave as expected if called
+		 * if gsindex != 0
+		 */
+		gsbase = task->thread.gsbase;
+
+	return gsbase;
+}
+
+int write_task_fsbase(struct task_struct *task, unsigned long fsbase)
+{
+	int cpu;
+
+	/*
+	 * Not strictly needed for fs, but do it for symmetry
+	 * with gs
+	 */
+	if (unlikely(fsbase >= TASK_SIZE_MAX))
+		return -EPERM;
+
+	cpu = get_cpu();
+	task->thread.fsbase = fsbase;
+	if (task == current)
+		write_fsbase(fsbase);
+	task->thread.fsindex = 0;
+	put_cpu();
+
+	return 0;
+}
+
+int write_task_gsbase(struct task_struct *task, unsigned long gsbase)
+{
+	int cpu;
+
+	if (unlikely(gsbase >= TASK_SIZE_MAX))
+		return -EPERM;
+
+	cpu = get_cpu();
+	task->thread.gsbase = gsbase;
+	if (task == current)
+		write_inactive_gsbase(gsbase);
+	task->thread.gsindex = 0;
+	put_cpu();
+
+	return 0;
+}
+
 int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 		unsigned long arg, struct task_struct *p, unsigned long tls)
 {
@@ -618,54 +703,27 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
 long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
 {
 	int ret = 0;
-	int doit = task == current;
-	int cpu;
 
 	switch (option) {
-	case ARCH_SET_GS:
-		if (arg2 >= TASK_SIZE_MAX)
-			return -EPERM;
-		cpu = get_cpu();
-		task->thread.gsindex = 0;
-		task->thread.gsbase = arg2;
-		if (doit) {
-			load_gs_index(0);
-			ret = wrmsrl_safe(MSR_KERNEL_GS_BASE, arg2);
-		}
-		put_cpu();
+	case ARCH_SET_GS: {
+		ret = write_task_gsbase(task, arg2);
 		break;
-	case ARCH_SET_FS:
-		/* Not strictly needed for fs, but do it for symmetry
-		   with gs */
-		if (arg2 >= TASK_SIZE_MAX)
-			return -EPERM;
-		cpu = get_cpu();
-		task->thread.fsindex = 0;
-		task->thread.fsbase = arg2;
-		if (doit) {
-			/* set the selector to 0 to not confuse __switch_to */
-			loadsegment(fs, 0);
-			ret = wrmsrl_safe(MSR_FS_BASE, arg2);
-		}
-		put_cpu();
+	}
+	case ARCH_SET_FS: {
+		ret = write_task_fsbase(task, arg2);
 		break;
+	}
 	case ARCH_GET_FS: {
 		unsigned long base;
 
-		if (doit)
-			rdmsrl(MSR_FS_BASE, base);
-		else
-			base = task->thread.fsbase;
+		base = read_task_fsbase(task);
 		ret = put_user(base, (unsigned long __user *)arg2);
 		break;
 	}
 	case ARCH_GET_GS: {
 		unsigned long base;
 
-		if (doit)
-			rdmsrl(MSR_KERNEL_GS_BASE, base);
-		else
-			base = task->thread.gsbase;
+		base = read_task_gsbase(task);
 		ret = put_user(base, (unsigned long __user *)arg2);
 		break;
 	}
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index ed5c4cd..b2f0beb 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -39,6 +39,7 @@
 #include
 #include
 #include
+#include
 
 #include "tls.h"
 
@@ -396,12 +397,11 @@ static int putreg(struct task_struct *child,
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
 		/*
-		 * When changing the segment base, use do_arch_prctl_64
-		 * to set either thread.fs or thread.fsindex and the
-		 * corresponding GDT slot.
+		 * When changing the FS base, use the same
+		 * mechanism as for do_arch_prctl_64
 		 */
 		if (child->thread.fsbase != value)
-			return do_arch_prctl_64(child, ARCH_SET_FS, value);
+			return write_task_fsbase(child, value);
 		return 0;
 	case offsetof(struct user_regs_struct,gs_base):
 		/*
@@ -410,7 +410,7 @@ static int putreg(struct task_struct *child,
 		if (value >= TASK_SIZE_MAX)
 			return -EIO;
 		if (child->thread.gsbase != value)
-			return do_arch_prctl_64(child, ARCH_SET_GS, value);
+			return write_task_gsbase(child, value);
 		return 0;
 #endif
 	}
@@ -434,20 +434,10 @@ static unsigned long getreg(struct task_struct *task, unsigned long offset)
 		return get_flags(task);
 
 #ifdef CONFIG_X86_64
-	case offsetof(struct user_regs_struct, fs_base): {
-		/*
-		 * XXX: This will not behave as expected if called on
-		 * current or if fsindex != 0.
-		 */
-		return task->thread.fsbase;
-	}
-	case offsetof(struct user_regs_struct, gs_base): {
-		/*
-		 * XXX: This will not behave as expected if called on
-		 * current or if fsindex != 0.
-		 */
-		return task->thread.gsbase;
-	}
+	case offsetof(struct user_regs_struct, fs_base):
+		return read_task_fsbase(task);
+	case offsetof(struct user_regs_struct, gs_base):
+		return read_task_gsbase(task);
 #endif
 	}
 
-- 
2.7.4
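
For readers who want to experiment with the dispatch rule outside the kernel, the following is a minimal userspace sketch of what read_task_fsbase() and write_task_fsbase() do in this patch. It is a model under stated assumptions, not kernel code: every model_* name and MODEL_TASK_SIZE_MAX are hypothetical, a plain variable stands in for MSR_FS_BASE, and the get_cpu()/put_cpu() pair and the loadseg() selector reset on the live CPU are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for TASK_SIZE_MAX; the value is an assumption. */
#define MODEL_TASK_SIZE_MAX UINT64_C(0x00007ffffffff000)

/* Hypothetical stand-in for the fields used from task->thread. */
struct model_task {
	uint64_t fsbase;   /* saved base, like task->thread.fsbase */
	uint16_t fsindex;  /* saved selector, like task->thread.fsindex */
};

static struct model_task *model_current; /* stands in for "current" */
static uint64_t model_msr_fs_base;       /* stands in for MSR_FS_BASE */

/* Current task: read the live register. Any other task: read the saved copy. */
static uint64_t model_read_task_fsbase(const struct model_task *task)
{
	if (task == model_current)
		return model_msr_fs_base;
	return task->fsbase;
}

/*
 * Reject bases at or above the user address limit, update the saved copy,
 * write the live register only for the current task, and clear the saved
 * selector so the base is what takes effect.
 */
static int model_write_task_fsbase(struct model_task *task, uint64_t fsbase)
{
	if (fsbase >= MODEL_TASK_SIZE_MAX)
		return -EPERM;

	task->fsbase = fsbase;
	if (task == model_current)
		model_msr_fs_base = fsbase;
	task->fsindex = 0;
	return 0;
}

/* Usage: a ptrace-like caller updates a stopped tracee, then itself. */
static void model_demo(void)
{
	struct model_task tracer = { 0, 0 };
	struct model_task tracee = { 0x1000, 0x2b };

	model_current = &tracer;
	model_msr_fs_base = 0;

	/* Non-current task: only the saved copy changes. */
	assert(model_write_task_fsbase(&tracee, 0x2000) == 0);
	assert(model_read_task_fsbase(&tracee) == 0x2000);
	assert(tracee.fsindex == 0);
	assert(model_msr_fs_base == 0);

	/* Current task: the live register is updated and read back. */
	assert(model_write_task_fsbase(&tracer, 0x3000) == 0);
	assert(model_read_task_fsbase(&tracer) == 0x3000);

	/* Out-of-range base: rejected, nothing changes. */
	assert(model_write_task_fsbase(&tracer, MODEL_TASK_SIZE_MAX) == -EPERM);
	assert(model_read_task_fsbase(&tracer) == 0x3000);
}
```

The GS variants in the patch follow the same shape, with MSR_KERNEL_GS_BASE (the "inactive" GS base) taking the place of MSR_FS_BASE.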