2020-05-09 17:38:50

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 00/18] Enable FSGSBASE instructions

Benefits:
Currently a user process that wishes to read or write the FS/GS base must
make a system call. But recent X86 processors have added new instructions
for use in 64-bit mode that allow direct access to the FS and GS segment
base addresses. The operating system controls whether applications can
use these instructions with a %cr4 control bit.

In addition to benefits to applications, performance improvements to the
OS context switch code are possible by making use of these instructions. A
third party reported out promising performance numbers out of their
initial benchmarking of the previous version of this patch series [9].

Enablement check:
The kernel provides information about the enabled state of FSGSBASE to
applications using the ELF_AUX vector. If the HWCAP2_FSGSBASE bit is set in
the AUX vector, the kernel has FSGSBASE instructions enabled and
applications can use them.

Kernel changes:
Major changes made in the kernel are in context switch, paranoid path, and
ptrace. In a context switch, a task's FS/GS base will be secured regardless
of its selector. In the paranoid path, GS base is unconditionally
overwritten to the kernel GS base on entry and the original GS base is
restored on exit. Ptrace includes divergence of FS/GS index and base
values.

Security:
For mitigating the Spectre v1 SWAPGS issue, LFENCE instructions were added
on most kernel entries. Those patches are dependent on previous behaviors
that users couldn't load a kernel address into the GS base. These patches
change that assumption since the user can load any address into GS base.
The changes to the kernel entry path in this patch series take account of
the SWAPGS issue.

Changes from v10:

- Rewrite the commit message for patch #1.
- Document communication/acks from userspace projects that are
potentially affected by this.

Andi Kleen (2):
x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2

Andy Lutomirski (4):
x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
x86/entry/64: Clean up paranoid exit
x86/fsgsbase/64: Use FSGSBASE in switch_to() if available
x86/fsgsbase/64: Enable FSGSBASE on 64bit by default and add a chicken
bit

Chang S. Bae (9):
x86/ptrace: Prevent ptrace from clearing the FS/GS selector
selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base
write
x86/entry/64: Switch CR3 before SWAPGS in paranoid entry
x86/entry/64: Introduce the FIND_PERCPU_BASE macro
x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit
x86/entry/64: Document GSBASE handling in the paranoid path
x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions
x86/fsgsbase/64: Use FSGSBASE instructions on thread copy and ptrace
selftests/x86/fsgsbase: Test ptracer-induced GS base write with
FSGSBASE

Sasha Levin (1):
x86/fsgsbase/64: move save_fsgs to header file

Thomas Gleixner (1):
Documentation/x86/64: Add documentation for GS/FS addressing mode

Tony Luck (1):
x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation

.../admin-guide/kernel-parameters.txt | 2 +
Documentation/x86/entry_64.rst | 9 +
Documentation/x86/x86_64/fsgs.rst | 199 ++++++++++++++++++
Documentation/x86/x86_64/index.rst | 1 +
arch/x86/entry/calling.h | 40 ++++
arch/x86/entry/entry_64.S | 131 +++++++++---
arch/x86/include/asm/fsgsbase.h | 45 +++-
arch/x86/include/asm/inst.h | 15 ++
arch/x86/include/uapi/asm/hwcap2.h | 3 +
arch/x86/kernel/cpu/bugs.c | 6 +-
arch/x86/kernel/cpu/common.c | 22 ++
arch/x86/kernel/process.c | 10 +-
arch/x86/kernel/process.h | 68 ++++++
arch/x86/kernel/process_64.c | 142 +++++++------
arch/x86/kernel/ptrace.c | 17 +-
tools/testing/selftests/x86/fsgsbase.c | 24 ++-
16 files changed, 605 insertions(+), 129 deletions(-)
create mode 100644 Documentation/x86/x86_64/fsgs.rst

--
2.20.1


2020-05-09 17:38:53

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 02/18] selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base write

From: "Chang S. Bae" <[email protected]>

The test validates that the selector is not changed when a ptracer writes
the ptracee's GS base.

Originally-by: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andi Kleen <[email protected]>
---
tools/testing/selftests/x86/fsgsbase.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
index 15a329da59fa3..950a48b2e3662 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -465,7 +465,7 @@ static void test_ptrace_write_gsbase(void)
wait(&status);

if (WSTOPSIG(status) == SIGTRAP) {
- unsigned long gs, base;
+ unsigned long gs;
unsigned long gs_offset = USER_REGS_OFFSET(gs);
unsigned long base_offset = USER_REGS_OFFSET(gs_base);

@@ -481,7 +481,6 @@ static void test_ptrace_write_gsbase(void)
err(1, "PTRACE_POKEUSER");

gs = ptrace(PTRACE_PEEKUSER, child, gs_offset, NULL);
- base = ptrace(PTRACE_PEEKUSER, child, base_offset, NULL);

/*
* In a non-FSGSBASE system, the nonzero selector will load
@@ -489,11 +488,21 @@ static void test_ptrace_write_gsbase(void)
* selector value is changed or not by the GSBASE write in
* a ptracer.
*/
- if (gs == 0 && base == 0xFF) {
- printf("[OK]\tGS was reset as expected\n");
- } else {
+ if (gs != *shared_scratch) {
nerrs++;
- printf("[FAIL]\tGS=0x%lx, GSBASE=0x%lx (should be 0, 0xFF)\n", gs, base);
+ printf("[FAIL]\tGS changed to %lx\n", gs);
+
+ /*
+ * On older kernels, poking a nonzero value into the
+ * base would zero the selector. On newer kernels,
+ * this behavior has changed -- poking the base
+ * changes only the base and, if FSGSBASE is not
+ * available, this may not effect.
+ */
+ if (gs == 0)
+ printf("\tNote: this is expected behavior on older kernels.\n");
+ } else {
+ printf("[OK]\tGS remained 0x%hx\n", *shared_scratch);
}
}

--
2.20.1

2020-05-09 17:38:59

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 05/18] x86/entry/64: Switch CR3 before SWAPGS in paranoid entry

From: "Chang S. Bae" <[email protected]>

When FSGSBASE is enabled, the GS base handling in paranoid entry will need
to retrieve the kernel GS base which requires that the kernel page table is
active.

As the CR3 switch to the kernel page tables (PTI is active) does not depend
on kernel GS base, move the CR3 switch in front of the GS base handling.

Comment the EBX content while at it.

No functional change.

Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Vegard Nossum <[email protected]>
---
arch/x86/entry/entry_64.S | 31 +++++++++++++++++++------------
1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3adb3c8e2409b..7f27626f8426f 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1220,15 +1220,7 @@ SYM_CODE_START_LOCAL(paranoid_entry)
cld
PUSH_AND_CLEAR_REGS save_ret=1
ENCODE_FRAME_POINTER 8
- movl $1, %ebx
- movl $MSR_GS_BASE, %ecx
- rdmsr
- testl %edx, %edx
- js 1f /* negative -> in kernel */
- SWAPGS
- xorl %ebx, %ebx

-1:
/*
* Always stash CR3 in %r14. This value will be restored,
* verbatim, at exit. Needed if paranoid_entry interrupted
@@ -1238,16 +1230,31 @@ SYM_CODE_START_LOCAL(paranoid_entry)
* This is also why CS (stashed in the "iret frame" by the
* hardware at entry) can not be used: this may be a return
* to kernel code, but with a user CR3 value.
+ *
+ * Switching CR3 does not depend on kernel GS base so it can
+ * be done before switching to the kernel GS base. This is
+ * required for FSGSBASE because the kernel GS base has to
+ * be retrieved from a kernel internal table.
*/
SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14

+ /* EBX = 1 -> kernel GSBASE active, no restore required */
+ movl $1, %ebx
/*
- * The above SAVE_AND_SWITCH_TO_KERNEL_CR3 macro doesn't do an
- * unconditional CR3 write, even in the PTI case. So do an lfence
- * to prevent GS speculation, regardless of whether PTI is enabled.
+ * The kernel-enforced convention is a negative GS base indicates
+ * a kernel value. No SWAPGS needed on entry and exit.
*/
- FENCE_SWAPGS_KERNEL_ENTRY
+ movl $MSR_GS_BASE, %ecx
+ rdmsr
+ testl %edx, %edx
+ jns .Lparanoid_entry_swapgs
+ ret

+.Lparanoid_entry_swapgs:
+ SWAPGS
+ FENCE_SWAPGS_KERNEL_ENTRY
+ /* EBX = 0 -> SWAPGS required on exit */
+ xorl %ebx, %ebx
ret
SYM_CODE_END(paranoid_entry)

--
2.20.1

2020-05-09 17:39:06

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 08/18] x86/entry/64: Document GSBASE handling in the paranoid path

From: "Chang S. Bae" <[email protected]>

On FSGSBASE systems, the way to handle GS base in the paranoid path is
different from the existing SWAPGS-based entry/exit path handling. Document
the reason and what has to be done for FSGSBASE enabled systems.

Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andi Kleen <[email protected]>
---
Documentation/x86/entry_64.rst | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/Documentation/x86/entry_64.rst b/Documentation/x86/entry_64.rst
index a48b3f6ebbe87..0499a40723af3 100644
--- a/Documentation/x86/entry_64.rst
+++ b/Documentation/x86/entry_64.rst
@@ -108,3 +108,12 @@ We try to only use IST entries and the paranoid entry code for vectors
that absolutely need the more expensive check for the GS base - and we
generate all 'normal' entry points with the regular (faster) paranoid=0
variant.
+
+On FSGSBASE systems, however, user space can set GS without kernel
+interaction. It means the value of GS base itself does not imply anything,
+whether a kernel value or a user space value. So, there is no longer a safe
+way to check whether the exception is entering from user mode or kernel
+mode in the paranoid entry code path. So the GS base value needs to be read
+out, saved and the kernel GS base value written. On exit, the saved GS base
+value needs to be restored unconditionally. The non-paranoid entry/exit
+code still uses SWAPGS unconditionally as the state is known.
--
2.20.1

2020-05-09 17:39:10

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 10/18] x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions

From: "Chang S. Bae" <[email protected]>

Add CPU feature conditional FS/GS base access to the relevant helper
functions. That allows accelerating certain FS/GS base operations in
subsequent changes.

Note, that while possible, the user space entry/exit GS base operations are
not going to use the new FSGSBASE instructions. The reason is that it would
require additional storage for the user space value which adds more
complexity to the low level code and experiments have shown marginal
benefit. This may be revisited later but for now the SWAPGS based handling
in the entry code is preserved except for the paranoid entry/exit code.

Suggested-by: Tony Luck <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrew Cooper <[email protected]>
---
arch/x86/include/asm/fsgsbase.h | 27 +++++++--------
arch/x86/kernel/process_64.c | 58 +++++++++++++++++++++++++++++++++
2 files changed, 70 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index fdd1177499b40..aefd53767a5d4 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -49,35 +49,32 @@ static __always_inline void wrgsbase(unsigned long gsbase)
asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
}

+#include <asm/cpufeature.h>
+
/* Helper functions for reading/writing FS/GS base */

static inline unsigned long x86_fsbase_read_cpu(void)
{
unsigned long fsbase;

- rdmsrl(MSR_FS_BASE, fsbase);
+ if (static_cpu_has(X86_FEATURE_FSGSBASE))
+ fsbase = rdfsbase();
+ else
+ rdmsrl(MSR_FS_BASE, fsbase);

return fsbase;
}

-static inline unsigned long x86_gsbase_read_cpu_inactive(void)
-{
- unsigned long gsbase;
-
- rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
-
- return gsbase;
-}
-
static inline void x86_fsbase_write_cpu(unsigned long fsbase)
{
- wrmsrl(MSR_FS_BASE, fsbase);
+ if (static_cpu_has(X86_FEATURE_FSGSBASE))
+ wrfsbase(fsbase);
+ else
+ wrmsrl(MSR_FS_BASE, fsbase);
}

-static inline void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
-{
- wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
-}
+extern unsigned long x86_gsbase_read_cpu_inactive(void);
+extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);

#endif /* CONFIG_X86_64 */

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 5ef9d8f25b0e8..aaa65f284b9b9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -328,6 +328,64 @@ static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
return base;
}

+unsigned long x86_gsbase_read_cpu_inactive(void)
+{
+ unsigned long gsbase;
+
+ if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+ bool need_restore = false;
+ unsigned long flags;
+
+ /*
+ * We read the inactive GS base value by swapping
+ * to make it the active one. But we cannot allow
+ * an interrupt while we switch to and from.
+ */
+ if (!irqs_disabled()) {
+ local_irq_save(flags);
+ need_restore = true;
+ }
+
+ native_swapgs();
+ gsbase = rdgsbase();
+ native_swapgs();
+
+ if (need_restore)
+ local_irq_restore(flags);
+ } else {
+ rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+ }
+
+ return gsbase;
+}
+
+void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
+{
+ if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+ bool need_restore = false;
+ unsigned long flags;
+
+ /*
+ * We write the inactive GS base value by swapping
+ * to make it the active one. But we cannot allow
+ * an interrupt while we switch to and from.
+ */
+ if (!irqs_disabled()) {
+ local_irq_save(flags);
+ need_restore = true;
+ }
+
+ native_swapgs();
+ wrgsbase(gsbase);
+ native_swapgs();
+
+ if (need_restore)
+ local_irq_restore(flags);
+ } else {
+ wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+ }
+}
+
unsigned long x86_fsbase_read_task(struct task_struct *task)
{
unsigned long fsbase;
--
2.20.1

2020-05-09 17:39:11

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 11/18] x86/fsgsbase/64: Use FSGSBASE in switch_to() if available

From: Andy Lutomirski <[email protected]>

With the new FSGSBASE instructions, FS/GS base can be efficiently read
and written in __switch_to(). Use that capability to preserve the full
state.

This will enable user code to do whatever it wants with the new
instructions without any kernel-induced gotchas. (There can still be
architectural gotchas: movl %gs,%eax; movl %eax,%gs may change GS base
if WRGSBASE was used, but users are expected to read the CPU manual
before doing things like that.)

This is a considerable speedup. It seems to save about 100 cycles per
context switch compared to the baseline 4.6-rc1 behavior on a Skylake
laptop.

[ chang: 5~10% performance improvements were seen by a context switch
benchmark that ran threads with different FS/GS base values (to the
baseline 4.16). ]

Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/kernel/process_64.c | 34 ++++++++++++++++++++++++++++------
1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index aaa65f284b9b9..e066750be89a0 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -199,8 +199,18 @@ static __always_inline void save_fsgs(struct task_struct *task)
{
savesegment(fs, task->thread.fsindex);
savesegment(gs, task->thread.gsindex);
- save_base_legacy(task, task->thread.fsindex, FS);
- save_base_legacy(task, task->thread.gsindex, GS);
+ if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+ /*
+ * If FSGSBASE is enabled, we can't make any useful guesses
+ * about the base, and user code expects us to save the current
+ * value. Fortunately, reading the base directly is efficient.
+ */
+ task->thread.fsbase = rdfsbase();
+ task->thread.gsbase = x86_gsbase_read_cpu_inactive();
+ } else {
+ save_base_legacy(task, task->thread.fsindex, FS);
+ save_base_legacy(task, task->thread.gsindex, GS);
+ }
}

#if IS_ENABLED(CONFIG_KVM)
@@ -279,10 +289,22 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
struct thread_struct *next)
{
- load_seg_legacy(prev->fsindex, prev->fsbase,
- next->fsindex, next->fsbase, FS);
- load_seg_legacy(prev->gsindex, prev->gsbase,
- next->gsindex, next->gsbase, GS);
+ if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+ /* Update the FS and GS selectors if they could have changed. */
+ if (unlikely(prev->fsindex || next->fsindex))
+ loadseg(FS, next->fsindex);
+ if (unlikely(prev->gsindex || next->gsindex))
+ loadseg(GS, next->gsindex);
+
+ /* Update the bases. */
+ wrfsbase(next->fsbase);
+ x86_gsbase_write_cpu_inactive(next->gsbase);
+ } else {
+ load_seg_legacy(prev->fsindex, prev->fsbase,
+ next->fsindex, next->fsbase, FS);
+ load_seg_legacy(prev->gsindex, prev->gsbase,
+ next->gsindex, next->gsbase, GS);
+ }
}

static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
--
2.20.1

2020-05-09 17:39:16

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 13/18] x86/fsgsbase/64: Use FSGSBASE instructions on thread copy and ptrace

From: "Chang S. Bae" <[email protected]>

When FSGSBASE is enabled, copying threads and reading FS/GS base using
ptrace must read the actual values.

When copying a thread, use fsgs_save() and copy the saved values. For
ptrace, the bases must be read from memory regardless of the selector
if FSGSBASE is enabled.

Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/kernel/process.c | 10 ++++++----
arch/x86/kernel/process_64.c | 6 ++++--
2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9da70b279dad8..3ebb56cc2cfee 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -127,6 +127,7 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
struct inactive_task_frame *frame;
struct fork_frame *fork_frame;
struct pt_regs *childregs;
+ struct task_struct *me = current;
int ret = 0;

childregs = task_pt_regs(p);
@@ -140,10 +141,11 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
memset(p->thread.ptrace_bps, 0, sizeof(p->thread.ptrace_bps));

#ifdef CONFIG_X86_64
- savesegment(gs, p->thread.gsindex);
- p->thread.gsbase = p->thread.gsindex ? 0 : current->thread.gsbase;
- savesegment(fs, p->thread.fsindex);
- p->thread.fsbase = p->thread.fsindex ? 0 : current->thread.fsbase;
+ save_fsgs(me);
+ p->thread.fsindex = me->thread.fsindex;
+ p->thread.fsbase = me->thread.fsbase;
+ p->thread.gsindex = me->thread.gsindex;
+ p->thread.gsbase = me->thread.gsbase;
savesegment(es, p->thread.es);
savesegment(ds, p->thread.ds);
#else
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 4be88124d81ea..57cdbbb0381ac 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -346,7 +346,8 @@ unsigned long x86_fsbase_read_task(struct task_struct *task)

if (task == current)
fsbase = x86_fsbase_read_cpu();
- else if (task->thread.fsindex == 0)
+ else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
+ (task->thread.fsindex == 0))
fsbase = task->thread.fsbase;
else
fsbase = x86_fsgsbase_read_task(task, task->thread.fsindex);
@@ -360,7 +361,8 @@ unsigned long x86_gsbase_read_task(struct task_struct *task)

if (task == current)
gsbase = x86_gsbase_read_cpu_inactive();
- else if (task->thread.gsindex == 0)
+ else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
+ (task->thread.gsindex == 0))
gsbase = task->thread.gsbase;
else
gsbase = x86_fsgsbase_read_task(task, task->thread.gsindex);
--
2.20.1

2020-05-09 17:39:27

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 17/18] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2

From: Andi Kleen <[email protected]>

The kernel needs to explicitly enable FSGSBASE. So, the application needs
to know if it can safely use these instructions. Just looking at the CPUID
bit is not enough because it may be running in a kernel that does not
enable the instructions.

One way for the application would be to just try and catch the SIGILL.
But that is difficult to do in libraries which may not want to overwrite
the signal handlers of the main application.

Enumerate the enabled FSGSBASE capability in bit 1 of AT_HWCAP2 in the ELF
aux vector. AT_HWCAP2 is already used by PPC for similar purposes.

The application can access it open coded or by using the getauxval()
function in newer versions of glibc.

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/include/uapi/asm/hwcap2.h | 3 +++
arch/x86/kernel/cpu/common.c | 4 +++-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/hwcap2.h b/arch/x86/include/uapi/asm/hwcap2.h
index 8b2effe6efb82..5fdfcb47000f9 100644
--- a/arch/x86/include/uapi/asm/hwcap2.h
+++ b/arch/x86/include/uapi/asm/hwcap2.h
@@ -5,4 +5,7 @@
/* MONITOR/MWAIT enabled in Ring 3 */
#define HWCAP2_RING3MWAIT (1 << 0)

+/* Kernel allows FSGSBASE instructions available in Ring 3 */
+#define HWCAP2_FSGSBASE BIT(1)
+
#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0d480cbadc7dc..b5a086ea34258 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1495,8 +1495,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
setup_umip(c);

/* Enable FSGSBASE instructions if available. */
- if (cpu_has(c, X86_FEATURE_FSGSBASE))
+ if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
cr4_set_bits(X86_CR4_FSGSBASE);
+ elf_hwcap2 |= HWCAP2_FSGSBASE;
+ }

/*
* The vendor-specific functions might have changed features.
--
2.20.1

2020-05-09 17:39:33

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 18/18] Documentation/x86/64: Add documentation for GS/FS addressing mode

From: Thomas Gleixner <[email protected]>

Explain how the GS/FS based addressing can be utilized in user space
applications along with the differences between the generic prctl() based
GS/FS base control and the FSGSBASE version available on newer CPUs.

Originally-by: Andi Kleen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Reviewed-by: Randy Dunlap <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Jonathan Corbet <[email protected]>
---
Documentation/x86/x86_64/fsgs.rst | 199 +++++++++++++++++++++++++++++
Documentation/x86/x86_64/index.rst | 1 +
2 files changed, 200 insertions(+)
create mode 100644 Documentation/x86/x86_64/fsgs.rst

diff --git a/Documentation/x86/x86_64/fsgs.rst b/Documentation/x86/x86_64/fsgs.rst
new file mode 100644
index 0000000000000..50960e09e1f66
--- /dev/null
+++ b/Documentation/x86/x86_64/fsgs.rst
@@ -0,0 +1,199 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Using FS and GS segments in user space applications
+===================================================
+
+The x86 architecture supports segmentation. Instructions which access
+memory can use segment register based addressing mode. The following
+notation is used to address a byte within a segment:
+
+ Segment-register:Byte-address
+
+The segment base address is added to the Byte-address to compute the
+resulting virtual address which is accessed. This allows to access multiple
+instances of data with the identical Byte-address, i.e. the same code. The
+selection of a particular instance is purely based on the base-address in
+the segment register.
+
+In 32-bit mode the CPU provides 6 segments, which also support segment
+limits. The limits can be used to enforce address space protections.
+
+In 64-bit mode the CS/SS/DS/ES segments are ignored and the base address is
+always 0 to provide a full 64bit address space. The FS and GS segments are
+still functional in 64-bit mode.
+
+Common FS and GS usage
+------------------------------
+
+The FS segment is commonly used to address Thread Local Storage (TLS). FS
+is usually managed by runtime code or a threading library. Variables
+declared with the '__thread' storage class specifier are instantiated per
+thread and the compiler emits the FS: address prefix for accesses to these
+variables. Each thread has its own FS base address so common code can be
+used without complex address offset calculations to access the per thread
+instances. Applications should not use FS for other purposes when they use
+runtimes or threading libraries which manage the per thread FS.
+
+The GS segment has no common use and can be used freely by
+applications. GCC and Clang support GS based addressing via address space
+identifiers.
+
+Reading and writing the FS/GS base address
+------------------------------------------
+
+There exist two mechanisms to read and write the FS/GS base address:
+
+ - the arch_prctl() system call
+
+ - the FSGSBASE instruction family
+
+Accessing FS/GS base with arch_prctl()
+--------------------------------------
+
+ The arch_prctl(2) based mechanism is available on all 64-bit CPUs and all
+ kernel versions.
+
+ Reading the base:
+
+ arch_prctl(ARCH_GET_FS, &fsbase);
+ arch_prctl(ARCH_GET_GS, &gsbase);
+
+ Writing the base:
+
+ arch_prctl(ARCH_SET_FS, fsbase);
+ arch_prctl(ARCH_SET_GS, gsbase);
+
+ The ARCH_SET_GS prctl may be disabled depending on kernel configuration
+ and security settings.
+
+Accessing FS/GS base with the FSGSBASE instructions
+---------------------------------------------------
+
+ With the Ivy Bridge CPU generation Intel introduced a new set of
+ instructions to access the FS and GS base registers directly from user
+ space. These instructions are also supported on AMD Family 17H CPUs. The
+ following instructions are available:
+
+ =============== ===========================
+ RDFSBASE %reg Read the FS base register
+ RDGSBASE %reg Read the GS base register
+ WRFSBASE %reg Write the FS base register
+ WRGSBASE %reg Write the GS base register
+ =============== ===========================
+
+ The instructions avoid the overhead of the arch_prctl() syscall and allow
+ more flexible usage of the FS/GS addressing modes in user space
+ applications. This does not prevent conflicts between threading libraries
+ and runtimes which utilize FS and applications which want to use it for
+ their own purpose.
+
+FSGSBASE instructions enablement
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ The instructions are enumerated in CPUID leaf 7, bit 0 of EBX. If
+ available /proc/cpuinfo shows 'fsgsbase' in the flag entry of the CPUs.
+
+ The availability of the instructions does not enable them
+ automatically. The kernel has to enable them explicitly in CR4. The
+ reason for this is that older kernels make assumptions about the values in
+ the GS register and enforce them when GS base is set via
+ arch_prctl(). Allowing user space to write arbitrary values to GS base
+ would violate these assumptions and cause malfunction.
+
+ On kernels which do not enable FSGSBASE the execution of the FSGSBASE
+ instructions will fault with a #UD exception.
+
+ The kernel provides reliable information about the enabled state in the
+ ELF AUX vector. If the HWCAP2_FSGSBASE bit is set in the AUX vector, the
+ kernel has FSGSBASE instructions enabled and applications can use them.
+ The following code example shows how this detection works::
+
+ #include <sys/auxv.h>
+ #include <elf.h>
+
+ /* Will be eventually in asm/hwcap.h */
+ #ifndef HWCAP2_FSGSBASE
+ #define HWCAP2_FSGSBASE (1 << 1)
+ #endif
+
+ ....
+
+ unsigned val = getauxval(AT_HWCAP2);
+
+ if (val & HWCAP2_FSGSBASE)
+ printf("FSGSBASE enabled\n");
+
+FSGSBASE instructions compiler support
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+GCC version 4.6.4 and newer provide instrinsics for the FSGSBASE
+instructions. Clang 5 supports them as well.
+
+ =================== ===========================
+ _readfsbase_u64() Read the FS base register
+ _readfsbase_u64() Read the GS base register
+ _writefsbase_u64() Write the FS base register
+ _writegsbase_u64() Write the GS base register
+ =================== ===========================
+
+To utilize these instrinsics <immintrin.h> must be included in the source
+code and the compiler option -mfsgsbase has to be added.
+
+Compiler support for FS/GS based addressing
+-------------------------------------------
+
+GCC version 6 and newer provide support for FS/GS based addressing via
+Named Address Spaces. GCC implements the following address space
+identifiers for x86:
+
+ ========= ====================================
+ __seg_fs Variable is addressed relative to FS
+ __seg_gs Variable is addressed relative to GS
+ ========= ====================================
+
+The preprocessor symbols __SEG_FS and __SEG_GS are defined when these
+address spaces are supported. Code which implements fallback modes should
+check whether these symbols are defined. Usage example::
+
+ #ifdef __SEG_GS
+
+ long data0 = 0;
+ long data1 = 1;
+
+ long __seg_gs *ptr;
+
+ /* Check whether FSGSBASE is enabled by the kernel (HWCAP2_FSGSBASE) */
+ ....
+
+ /* Set GS base to point to data0 */
+ _writegsbase_u64(&data0);
+
+ /* Access offset 0 of GS */
+ ptr = 0;
+ printf("data0 = %ld\n", *ptr);
+
+ /* Set GS base to point to data1 */
+ _writegsbase_u64(&data1);
+ /* ptr still addresses offset 0! */
+ printf("data1 = %ld\n", *ptr);
+
+
+Clang does not provide the GCC address space identifiers, but it provides
+address spaces via an attribute based mechanism in Clang 2.6 and newer
+versions:
+
+ ==================================== =====================================
+ __attribute__((address_space(256)) Variable is addressed relative to GS
+ __attribute__((address_space(257)) Variable is addressed relative to FS
+ ==================================== =====================================
+
+FS/GS based addressing with inline assembly
+-------------------------------------------
+
+In case the compiler does not support address spaces, inline assembly can
+be used for FS/GS based addressing mode::
+
+ mov %fs:offset, %reg
+ mov %gs:offset, %reg
+
+ mov %reg, %fs:offset
+ mov %reg, %gs:offset
diff --git a/Documentation/x86/x86_64/index.rst b/Documentation/x86/x86_64/index.rst
index d6eaaa5a35fcd..a56070fc8e77a 100644
--- a/Documentation/x86/x86_64/index.rst
+++ b/Documentation/x86/x86_64/index.rst
@@ -14,3 +14,4 @@ x86_64 Support
fake-numa-for-cpusets
cpu-hotplug-spec
machinecheck
+ fsgs
--
2.20.1

2020-05-09 17:39:47

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 16/18] x86/fsgsbase/64: Enable FSGSBASE on 64bit by default and add a chicken bit

From: Andy Lutomirski <[email protected]>

Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable
FSGSBASE by default, and add nofsgsbase to disable it.

While this changes userspace visible ABI, we could not find a project
that would be affected by this. Few projects were contacted for input
and ack:

- 5-level EPT: http://lkml.kernel.org/r/[email protected]
- rr: https://mail.mozilla.org/pipermail/rr-dev/2018-March/000616.html
- CRIU: https://lists.openvz.org/pipermail/criu/2018-March/040654.html

Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andi Kleen <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 3 +-
arch/x86/kernel/cpu/common.c | 32 ++++++++-----------
2 files changed, 15 insertions(+), 20 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index af3aaade195b8..1924845c879c2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3033,8 +3033,7 @@
no5lvl [X86-64] Disable 5-level paging mode. Forces
kernel to use 4-level paging instead.

- unsafe_fsgsbase [X86] Allow FSGSBASE instructions. This will be
- replaced with a nofsgsbase flag.
+ nofsgsbase [X86] Disables FSGSBASE instructions.

no_console_suspend
[HW] Never suspend the console
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 4224760c74e27..0d480cbadc7dc 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -418,21 +418,21 @@ static void __init setup_cr_pinning(void)
static_key_enable(&cr_pinning.key);
}

-/*
- * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are
- * updated. This allows us to get the kernel ready incrementally.
- *
- * Once all the pieces are in place, these will go away and be replaced with
- * a nofsgsbase chicken flag.
- */
-static bool unsafe_fsgsbase;
-
-static __init int setup_unsafe_fsgsbase(char *arg)
+static __init int x86_nofsgsbase_setup(char *arg)
{
- unsafe_fsgsbase = true;
+ /* Require an exact match without trailing characters. */
+ if (strlen(arg))
+ return 0;
+
+ /* Do not emit a message if the feature is not present. */
+ if (!boot_cpu_has(X86_FEATURE_FSGSBASE))
+ return 1;
+
+ setup_clear_cpu_cap(X86_FEATURE_FSGSBASE);
+ pr_info("FSGSBASE disabled via kernel command line\n");
return 1;
}
-__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase);
+__setup("nofsgsbase", x86_nofsgsbase_setup);

/*
* Protection Keys are not available in 32-bit mode.
@@ -1495,12 +1495,8 @@ static void identify_cpu(struct cpuinfo_x86 *c)
setup_umip(c);

/* Enable FSGSBASE instructions if available. */
- if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
- if (unsafe_fsgsbase)
- cr4_set_bits(X86_CR4_FSGSBASE);
- else
- clear_cpu_cap(c, X86_FEATURE_FSGSBASE);
- }
+ if (cpu_has(c, X86_FEATURE_FSGSBASE))
+ cr4_set_bits(X86_CR4_FSGSBASE);

/*
* The vendor-specific functions might have changed features.
--
2.20.1

2020-05-09 17:40:48

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 09/18] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions

From: Andi Kleen <[email protected]>

[ luto: Rename the variables from FS and GS to FSBASE and GSBASE and
make <asm/fsgsbase.h> safe to include on 32-bit kernels. ]

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/include/asm/fsgsbase.h | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index bca4c743de77c..fdd1177499b40 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -19,6 +19,36 @@ extern unsigned long x86_gsbase_read_task(struct task_struct *task);
extern void x86_fsbase_write_task(struct task_struct *task, unsigned long fsbase);
extern void x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase);

+/* Must be protected by X86_FEATURE_FSGSBASE check. */
+
+static __always_inline unsigned long rdfsbase(void)
+{
+ unsigned long fsbase;
+
+ asm volatile("rdfsbase %0" : "=r" (fsbase) :: "memory");
+
+ return fsbase;
+}
+
+static __always_inline unsigned long rdgsbase(void)
+{
+ unsigned long gsbase;
+
+ asm volatile("rdgsbase %0" : "=r" (gsbase) :: "memory");
+
+ return gsbase;
+}
+
+static __always_inline void wrfsbase(unsigned long fsbase)
+{
+ asm volatile("wrfsbase %0" :: "r" (fsbase) : "memory");
+}
+
+static __always_inline void wrgsbase(unsigned long gsbase)
+{
+ asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
+}
+
/* Helper functions for reading/writing FS/GS base */

static inline unsigned long x86_fsbase_read_cpu(void)
--
2.20.1

2020-05-09 17:41:00

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 04/18] x86/entry/64: Clean up paranoid exit

From: Andy Lutomirski <[email protected]>

All that paranoid exit needs to do is to disable IRQs, handle IRQ tracing,
then restore CR3, and restore GS base. Simply do those actions in that
order. Cleaning up the spaghetti code.

Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Vegard Nossum <[email protected]>
---
arch/x86/entry/entry_64.S | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 0e9504fabe526..3adb3c8e2409b 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1266,19 +1266,25 @@ SYM_CODE_END(paranoid_entry)
SYM_CODE_START_LOCAL(paranoid_exit)
UNWIND_HINT_REGS
DISABLE_INTERRUPTS(CLBR_ANY)
+
+ /*
+ * The order of operations is important. IRQ tracing requires
+ * kernel GS base and CR3. RESTORE_CR3 requires kernel GS base.
+ *
+ * NB to anyone to try to optimize this code: this code does
+ * not execute at all for exceptions from user mode. Those
+ * exceptions go through error_exit instead.
+ */
TRACE_IRQS_OFF_DEBUG
- testl %ebx, %ebx /* swapgs needed? */
- jnz .Lparanoid_exit_no_swapgs
- TRACE_IRQS_IRETQ
- /* Always restore stashed CR3 value (see paranoid_entry) */
- RESTORE_CR3 scratch_reg=%rbx save_reg=%r14
+ RESTORE_CR3 scratch_reg=%rax save_reg=%r14
+
+ /* If EBX is 0, SWAPGS is required */
+ testl %ebx, %ebx
+ jnz restore_regs_and_return_to_kernel
+
+ /* We are returning to a context with user GS base */
SWAPGS_UNSAFE_STACK
jmp restore_regs_and_return_to_kernel
-.Lparanoid_exit_no_swapgs:
- TRACE_IRQS_IRETQ_DEBUG
- /* Always restore stashed CR3 value (see paranoid_entry) */
- RESTORE_CR3 scratch_reg=%rbx save_reg=%r14
- jmp restore_regs_and_return_to_kernel
SYM_CODE_END(paranoid_exit)

/*
--
2.20.1

2020-05-09 17:41:12

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 15/18] selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE

From: "Chang S. Bae" <[email protected]>

This validates that GS selector and base are independently preserved in
ptrace commands.

Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andi Kleen <[email protected]>
---
tools/testing/selftests/x86/fsgsbase.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
index 950a48b2e3662..9a4349813a30a 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -465,7 +465,7 @@ static void test_ptrace_write_gsbase(void)
wait(&status);

if (WSTOPSIG(status) == SIGTRAP) {
- unsigned long gs;
+ unsigned long gs, base;
unsigned long gs_offset = USER_REGS_OFFSET(gs);
unsigned long base_offset = USER_REGS_OFFSET(gs_base);

@@ -481,6 +481,7 @@ static void test_ptrace_write_gsbase(void)
err(1, "PTRACE_POKEUSER");

gs = ptrace(PTRACE_PEEKUSER, child, gs_offset, NULL);
+ base = ptrace(PTRACE_PEEKUSER, child, base_offset, NULL);

/*
* In a non-FSGSBASE system, the nonzero selector will load
@@ -501,8 +502,14 @@ static void test_ptrace_write_gsbase(void)
*/
if (gs == 0)
printf("\tNote: this is expected behavior on older kernels.\n");
+ } else if (have_fsgsbase && (base != 0xFF)) {
+ nerrs++;
+ printf("[FAIL]\tGSBASE changed to %lx\n", base);
} else {
- printf("[OK]\tGS remained 0x%hx\n", *shared_scratch);
+ printf("[OK]\tGS remained 0x%hx", *shared_scratch);
+ if (have_fsgsbase)
+ printf(" and GSBASE changed to 0xFF");
+ printf("\n");
}
}

--
2.20.1

2020-05-09 17:41:46

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 14/18] x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation

From: Tony Luck <[email protected]>

Before enabling FSGSBASE the kernel could safely assume that the content
of GS base was a user address. Thus any speculative access as the result
of a mispredicted branch controlling the execution of SWAPGS would be to
a user address. So systems with speculation-proof SMAP did not need to
add additional LFENCE instructions to mitigate.

With FSGSBASE enabled a hostile user can set GS base to a kernel address.
So they can make the kernel speculatively access data they wish to leak
via a side channel. This means that SMAP provides no protection.

Add FSGSBASE as an additional condition to enable the fence-based SWAPGS
mitigation.

Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/kernel/cpu/bugs.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ed54b3b21c396..487603ea51cd1 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -450,14 +450,12 @@ static void __init spectre_v1_select_mitigation(void)
* If FSGSBASE is enabled, the user can put a kernel address in
* GS, in which case SMAP provides no protection.
*
- * [ NOTE: Don't check for X86_FEATURE_FSGSBASE until the
- * FSGSBASE enablement patches have been merged. ]
- *
* If FSGSBASE is disabled, the user can only put a user space
* address in GS. That makes an attack harder, but still
* possible if there's no SMAP protection.
*/
- if (!smap_works_speculatively()) {
+ if (boot_cpu_has(X86_FEATURE_FSGSBASE) ||
+ !smap_works_speculatively()) {
/*
* Mitigation can be provided from SWAPGS itself or
* PTI as the CR3 write in the Meltdown mitigation
--
2.20.1

2020-05-09 17:42:36

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 12/18] x86/fsgsbase/64: move save_fsgs to header file

Given copy_thread_tls() is now shared between 32 and 64 bit and we need
to use save_fsgs() there, move it to a header file.

Signed-off-by: Sasha Levin <[email protected]>
---
arch/x86/kernel/process.h | 68 ++++++++++++++++++++++++++++++++++++
arch/x86/kernel/process_64.c | 68 ------------------------------------
2 files changed, 68 insertions(+), 68 deletions(-)

diff --git a/arch/x86/kernel/process.h b/arch/x86/kernel/process.h
index 1d0797b2338a2..e21b6669a3851 100644
--- a/arch/x86/kernel/process.h
+++ b/arch/x86/kernel/process.h
@@ -37,3 +37,71 @@ static inline void switch_to_extra(struct task_struct *prev,
prev_tif & _TIF_WORK_CTXSW_PREV))
__switch_to_xtra(prev, next);
}
+
+enum which_selector {
+ FS,
+ GS
+};
+
+/*
+ * Saves the FS or GS base for an outgoing thread if FSGSBASE extensions are
+ * not available. The goal is to be reasonably fast on non-FSGSBASE systems.
+ * It's forcibly inlined because it'll generate better code and this function
+ * is hot.
+ */
+static __always_inline void save_base_legacy(struct task_struct *prev_p,
+ unsigned short selector,
+ enum which_selector which)
+{
+ if (likely(selector == 0)) {
+ /*
+ * On Intel (without X86_BUG_NULL_SEG), the segment base could
+ * be the pre-existing saved base or it could be zero. On AMD
+ * (with X86_BUG_NULL_SEG), the segment base could be almost
+ * anything.
+ *
+ * This branch is very hot (it's hit twice on almost every
+ * context switch between 64-bit programs), and avoiding
+ * the RDMSR helps a lot, so we just assume that whatever
+ * value is already saved is correct. This matches historical
+ * Linux behavior, so it won't break existing applications.
+ *
+ * To avoid leaking state, on non-X86_BUG_NULL_SEG CPUs, if we
+ * report that the base is zero, it needs to actually be zero:
+ * see the corresponding logic in load_seg_legacy.
+ */
+ } else {
+ /*
+ * If the selector is 1, 2, or 3, then the base is zero on
+ * !X86_BUG_NULL_SEG CPUs and could be anything on
+ * X86_BUG_NULL_SEG CPUs. In the latter case, Linux
+ * has never attempted to preserve the base across context
+ * switches.
+ *
+ * If selector > 3, then it refers to a real segment, and
+ * saving the base isn't necessary.
+ */
+ if (which == FS)
+ prev_p->thread.fsbase = 0;
+ else
+ prev_p->thread.gsbase = 0;
+ }
+}
+
+static __always_inline void save_fsgs(struct task_struct *task)
+{
+ savesegment(fs, task->thread.fsindex);
+ savesegment(gs, task->thread.gsindex);
+ if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+ /*
+ * If FSGSBASE is enabled, we can't make any useful guesses
+ * about the base, and user code expects us to save the current
+ * value. Fortunately, reading the base directly is efficient.
+ */
+ task->thread.fsbase = rdfsbase();
+ task->thread.gsbase = x86_gsbase_read_cpu_inactive();
+ } else {
+ save_base_legacy(task, task->thread.fsindex, FS);
+ save_base_legacy(task, task->thread.gsindex, GS);
+ }
+}
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index e066750be89a0..4be88124d81ea 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -145,74 +145,6 @@ void release_thread(struct task_struct *dead_task)
WARN_ON(dead_task->mm);
}

-enum which_selector {
- FS,
- GS
-};
-
-/*
- * Saves the FS or GS base for an outgoing thread if FSGSBASE extensions are
- * not available. The goal is to be reasonably fast on non-FSGSBASE systems.
- * It's forcibly inlined because it'll generate better code and this function
- * is hot.
- */
-static __always_inline void save_base_legacy(struct task_struct *prev_p,
- unsigned short selector,
- enum which_selector which)
-{
- if (likely(selector == 0)) {
- /*
- * On Intel (without X86_BUG_NULL_SEG), the segment base could
- * be the pre-existing saved base or it could be zero. On AMD
- * (with X86_BUG_NULL_SEG), the segment base could be almost
- * anything.
- *
- * This branch is very hot (it's hit twice on almost every
- * context switch between 64-bit programs), and avoiding
- * the RDMSR helps a lot, so we just assume that whatever
- * value is already saved is correct. This matches historical
- * Linux behavior, so it won't break existing applications.
- *
- * To avoid leaking state, on non-X86_BUG_NULL_SEG CPUs, if we
- * report that the base is zero, it needs to actually be zero:
- * see the corresponding logic in load_seg_legacy.
- */
- } else {
- /*
- * If the selector is 1, 2, or 3, then the base is zero on
- * !X86_BUG_NULL_SEG CPUs and could be anything on
- * X86_BUG_NULL_SEG CPUs. In the latter case, Linux
- * has never attempted to preserve the base across context
- * switches.
- *
- * If selector > 3, then it refers to a real segment, and
- * saving the base isn't necessary.
- */
- if (which == FS)
- prev_p->thread.fsbase = 0;
- else
- prev_p->thread.gsbase = 0;
- }
-}
-
-static __always_inline void save_fsgs(struct task_struct *task)
-{
- savesegment(fs, task->thread.fsindex);
- savesegment(gs, task->thread.gsindex);
- if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
- /*
- * If FSGSBASE is enabled, we can't make any useful guesses
- * about the base, and user code expects us to save the current
- * value. Fortunately, reading the base directly is efficient.
- */
- task->thread.fsbase = rdfsbase();
- task->thread.gsbase = x86_gsbase_read_cpu_inactive();
- } else {
- save_base_legacy(task, task->thread.fsindex, FS);
- save_base_legacy(task, task->thread.gsindex, GS);
- }
-}
-
#if IS_ENABLED(CONFIG_KVM)
/*
* While a process is running,current->thread.fsbase and current->thread.gsbase
--
2.20.1

2020-05-09 17:42:48

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 06/18] x86/entry/64: Introduce the FIND_PERCPU_BASE macro

From: "Chang S. Bae" <[email protected]>

GS base is used to find per-CPU data in the kernel. But when GS base is
unknown, the per-CPU base can be found from the per_cpu_offset table with a
CPU NR. The CPU NR is extracted from the limit field of the CPUNODE entry
in GDT, or by the RDPID instruction. This is a prerequisite for using
FSGSBASE in the low level entry code.

Also, add the GAS-compatible RDPID macro as binutils 2.21 does not support
it. Support is added in version 2.27.

Suggested-by: H. Peter Anvin <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Vegard Nossum <[email protected]>
---
arch/x86/entry/calling.h | 34 ++++++++++++++++++++++++++++++++++
arch/x86/include/asm/inst.h | 15 +++++++++++++++
2 files changed, 49 insertions(+)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 0789e13ece905..0eb134e18b7a9 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -6,6 +6,7 @@
#include <asm/percpu.h>
#include <asm/asm-offsets.h>
#include <asm/processor-flags.h>
+#include <asm/inst.h>

/*

@@ -347,6 +348,39 @@ For 32-bit we have the following conventions - kernel is built with
#endif
.endm

+#ifdef CONFIG_SMP
+
+/*
+ * CPU/node NR is loaded from the limit (size) field of a special segment
+ * descriptor entry in GDT.
+ */
+.macro LOAD_CPU_AND_NODE_SEG_LIMIT reg:req
+ movq $__CPUNODE_SEG, \reg
+ lsl \reg, \reg
+.endm
+
+/*
+ * Fetch the per-CPU GS base value for this processor and put it in @reg.
+ * We normally use %gs for accessing per-CPU data, but we are setting up
+ * %gs here and obviously can not use %gs itself to access per-CPU data.
+ */
+.macro GET_PERCPU_BASE reg:req
+ ALTERNATIVE \
+ "LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
+ "RDPID \reg", \
+ X86_FEATURE_RDPID
+ andq $VDSO_CPUNODE_MASK, \reg
+ movq __per_cpu_offset(, \reg, 8), \reg
+.endm
+
+#else
+
+.macro GET_PERCPU_BASE reg:req
+ movq pcpu_unit_offsets(%rip), \reg
+.endm
+
+#endif /* CONFIG_SMP */
+
/*
* This does 'call enter_from_user_mode' unless we can avoid it based on
* kernel config or using the static jump infrastructure.
diff --git a/arch/x86/include/asm/inst.h b/arch/x86/include/asm/inst.h
index f5a796da07f88..d063841a17e39 100644
--- a/arch/x86/include/asm/inst.h
+++ b/arch/x86/include/asm/inst.h
@@ -306,6 +306,21 @@
.endif
MODRM 0xc0 movq_r64_xmm_opd1 movq_r64_xmm_opd2
.endm
+
+.macro RDPID opd
+ REG_TYPE rdpid_opd_type \opd
+ .if rdpid_opd_type == REG_TYPE_R64
+ R64_NUM rdpid_opd \opd
+ .else
+ R32_NUM rdpid_opd \opd
+ .endif
+ .byte 0xf3
+ .if rdpid_opd > 7
+ PFX_REX rdpid_opd 0
+ .endif
+ .byte 0x0f, 0xc7
+ MODRM 0xc0 rdpid_opd 0x7
+.endm
#endif

#endif
--
2.20.1

2020-05-09 17:42:49

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 07/18] x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit

From: "Chang S. Bae" <[email protected]>

Without FSGSBASE, user space cannot change GS base other than through a
PRCTL. The kernel enforces that the user space GS base value is positive
as negative values are used for detecting the kernel space GS base value
in the paranoid entry code.

If FSGSBASE is enabled, user space can set arbitrary GS base values without
kernel intervention, including negative ones, which breaks the paranoid
entry assumptions.

To avoid this, paranoid entry needs to unconditionally save the current
GS base value independent of the interrupted context, retrieve and write
the kernel GS base and unconditionally restore the saved value on exit.
The restore happens either in paranoid exit or in the special exit path of
the NMI low level code.

All other entry code paths which use unconditional SWAPGS are not affected
as they do not depend on the actual content.

The new logic for paranoid entry, when FSGSBASE is enabled, removes SWAPGS
and replaces with unconditional WRGSBASE. Hence no fences are needed.

Suggested-by: H. Peter Anvin <[email protected]>
Suggested-by: Andy Lutomirski <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Acked-by: Tom Lendacky <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Tom Lendacky <[email protected]>
Cc: Vegard Nossum <[email protected]>
---
arch/x86/entry/calling.h | 6 +++
arch/x86/entry/entry_64.S | 78 ++++++++++++++++++++++++++++++++++-----
2 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 0eb134e18b7a9..5f3a8ecaddc2d 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -340,6 +340,12 @@ For 32-bit we have the following conventions - kernel is built with
#endif
.endm

+.macro SAVE_AND_SET_GSBASE scratch_reg:req save_reg:req
+ rdgsbase \save_reg
+ GET_PERCPU_BASE \scratch_reg
+ wrgsbase \scratch_reg
+.endm
+
#endif /* CONFIG_X86_64 */

.macro STACKLEAK_ERASE
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 7f27626f8426f..a4fd01c8f2970 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -38,6 +38,7 @@
#include <asm/export.h>
#include <asm/frame.h>
#include <asm/nospec-branch.h>
+#include <asm/fsgsbase.h>
#include <linux/err.h>

#include "calling.h"
@@ -1211,9 +1212,14 @@ idtentry machine_check do_mce has_error_code=0 paranoid=1
#endif

/*
- * Save all registers in pt_regs, and switch gs if needed.
- * Use slow, but surefire "are we in kernel?" check.
- * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
+ * Save all registers in pt_regs. Return GS base related information
+ * in EBX depending on the availability of the FSGSBASE instructions:
+ *
+ * FSGSBASE R/EBX
+ * N 0 -> SWAPGS on exit
+ * 1 -> no SWAPGS on exit
+ *
+ * Y GS base value at entry, must be restored in paranoid_exit
*/
SYM_CODE_START_LOCAL(paranoid_entry)
UNWIND_HINT_FUNC
@@ -1238,7 +1244,29 @@ SYM_CODE_START_LOCAL(paranoid_entry)
*/
SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14

- /* EBX = 1 -> kernel GSBASE active, no restore required */
+ /*
+ * Handling GS base depends on the availability of FSGSBASE.
+ *
+ * Without FSGSBASE the kernel enforces that negative GS base
+ * values indicate kernel GS base. With FSGSBASE no assumptions
+ * can be made about the GS base value when entering from user
+ * space.
+ */
+ ALTERNATIVE "jmp .Lparanoid_entry_checkgs", "", X86_FEATURE_FSGSBASE
+
+ /*
+ * Read the current GS base and store it in %rbx unconditionally,
+ * retrieve and set the current CPUs kernel GS base. The stored value
+ * has to be restored in paranoid_exit unconditionally.
+ *
+ * This unconditional write of GS base ensures no subsequent load
+ * based on a mispredicted GS base.
+ */
+ SAVE_AND_SET_GSBASE scratch_reg=%rax save_reg=%rbx
+ ret
+
+.Lparanoid_entry_checkgs:
+ /* EBX = 1 -> kernel GS base active, no restore required */
movl $1, %ebx
/*
* The kernel-enforced convention is a negative GS base indicates
@@ -1265,10 +1293,17 @@ SYM_CODE_END(paranoid_entry)
*
* We may be returning to very strange contexts (e.g. very early
* in syscall entry), so checking for preemption here would
- * be complicated. Fortunately, we there's no good reason
- * to try to handle preemption here.
+ * be complicated. Fortunately, there's no good reason to try
+ * to handle preemption here.
+ *
+ * R/EBX contains the GS base related information depending on the
+ * availability of the FSGSBASE instructions:
+ *
+ * FSGSBASE R/EBX
+ * N 0 -> SWAPGS on exit
+ * 1 -> no SWAPGS on exit
*
- * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
+ * Y User space GS base, must be restored unconditionally
*/
SYM_CODE_START_LOCAL(paranoid_exit)
UNWIND_HINT_REGS
@@ -1285,7 +1320,15 @@ SYM_CODE_START_LOCAL(paranoid_exit)
TRACE_IRQS_OFF_DEBUG
RESTORE_CR3 scratch_reg=%rax save_reg=%r14

- /* If EBX is 0, SWAPGS is required */
+ /* Handle the three GS base cases */
+ ALTERNATIVE "jmp .Lparanoid_exit_checkgs", "", X86_FEATURE_FSGSBASE
+
+ /* With FSGSBASE enabled, unconditionally resotre GS base */
+ wrgsbase %rbx
+ jmp restore_regs_and_return_to_kernel
+
+.Lparanoid_exit_checkgs:
+ /* On non-FSGSBASE systems, conditionally do SWAPGS */
testl %ebx, %ebx
jnz restore_regs_and_return_to_kernel

@@ -1699,10 +1742,27 @@ end_repeat_nmi:
/* Always restore stashed CR3 value (see paranoid_entry) */
RESTORE_CR3 scratch_reg=%r15 save_reg=%r14

- testl %ebx, %ebx /* swapgs needed? */
+ /*
+ * The above invocation of paranoid_entry stored the GS base
+ * related information in R/EBX depending on the availability
+ * of FSGSBASE.
+ *
+ * If FSGSBASE is enabled, restore the saved GS base value
+ * unconditionally, otherwise take the conditional SWAPGS path.
+ */
+ ALTERNATIVE "jmp nmi_no_fsgsbase", "", X86_FEATURE_FSGSBASE
+
+ wrgsbase %rbx
+ jmp nmi_restore
+
+nmi_no_fsgsbase:
+ /* EBX == 0 -> invoke SWAPGS */
+ testl %ebx, %ebx
jnz nmi_restore
+
nmi_swapgs:
SWAPGS_UNSAFE_STACK
+
nmi_restore:
POP_REGS

--
2.20.1

2020-05-09 17:42:51

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 03/18] x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE

From: Andy Lutomirski <[email protected]>

This is temporary. It will allow the next few patches to be tested
incrementally.

Setting unsafe_fsgsbase is a root hole. Don't do it.

Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andi Kleen <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 3 +++
arch/x86/kernel/cpu/common.c | 24 +++++++++++++++++++
2 files changed, 27 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7bc83f3d9bdfe..af3aaade195b8 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3033,6 +3033,9 @@
no5lvl [X86-64] Disable 5-level paging mode. Forces
kernel to use 4-level paging instead.

+ unsafe_fsgsbase [X86] Allow FSGSBASE instructions. This will be
+ replaced with a nofsgsbase flag.
+
no_console_suspend
[HW] Never suspend the console
Disable suspending of consoles during suspend and
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index bed0cb83fe245..4224760c74e27 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -418,6 +418,22 @@ static void __init setup_cr_pinning(void)
static_key_enable(&cr_pinning.key);
}

+/*
+ * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are
+ * updated. This allows us to get the kernel ready incrementally.
+ *
+ * Once all the pieces are in place, these will go away and be replaced with
+ * a nofsgsbase chicken flag.
+ */
+static bool unsafe_fsgsbase;
+
+static __init int setup_unsafe_fsgsbase(char *arg)
+{
+ unsafe_fsgsbase = true;
+ return 1;
+}
+__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase);
+
/*
* Protection Keys are not available in 32-bit mode.
*/
@@ -1478,6 +1494,14 @@ static void identify_cpu(struct cpuinfo_x86 *c)
setup_smap(c);
setup_umip(c);

+ /* Enable FSGSBASE instructions if available. */
+ if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
+ if (unsafe_fsgsbase)
+ cr4_set_bits(X86_CR4_FSGSBASE);
+ else
+ clear_cpu_cap(c, X86_FEATURE_FSGSBASE);
+ }
+
/*
* The vendor-specific functions might have changed features.
* Now we do "generic changes."
--
2.20.1

2020-05-09 17:42:54

by Sasha Levin

[permalink] [raw]
Subject: [PATCH v11 01/18] x86/ptrace: Prevent ptrace from clearing the FS/GS selector

From: "Chang S. Bae" <[email protected]>

When a ptracer writes a ptracee's FS/GS base with a different value, the
selector is also cleared. While this behavior is incorrect as the selector
should be preserved, most userspace applications did not notice that as
they do not use non-zero segments to begin with.

Instead, with this patch, when a tracee sets the base we will let it do
so without clearing the selector.

The change above means that a tracee that already has a selector set
will fail in an attempt to set the base - the change won't stick and the
value will be instead based on the value of the selector. As with the
above, we haven't found userspace that would be affected by this change.

Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
[sasha: rewrite commit message]
Signed-off-by: Sasha Levin <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/kernel/ptrace.c | 17 ++---------------
1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index f0e1ddbc2fd78..cc56efb75d275 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -380,25 +380,12 @@ static int putreg(struct task_struct *child,
case offsetof(struct user_regs_struct,fs_base):
if (value >= TASK_SIZE_MAX)
return -EIO;
- /*
- * When changing the FS base, use do_arch_prctl_64()
- * to set the index to zero and to set the base
- * as requested.
- *
- * NB: This behavior is nonsensical and likely needs to
- * change when FSGSBASE support is added.
- */
- if (child->thread.fsbase != value)
- return do_arch_prctl_64(child, ARCH_SET_FS, value);
+ x86_fsbase_write_task(child, value);
return 0;
case offsetof(struct user_regs_struct,gs_base):
- /*
- * Exactly the same here as the %fs handling above.
- */
if (value >= TASK_SIZE_MAX)
return -EIO;
- if (child->thread.gsbase != value)
- return do_arch_prctl_64(child, ARCH_SET_GS, value);
+ x86_gsbase_write_task(child, value);
return 0;
#endif
}
--
2.20.1

2020-05-10 00:45:39

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v11 12/18] x86/fsgsbase/64: move save_fsgs to header file

Hi Sasha,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/x86/asm]
[also build test ERROR on tip/auto-latest linus/master tip/x86/core v5.7-rc4 next-20200508]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Sasha-Levin/Enable-FSGSBASE-instructions/20200510-032805
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 2ce0d7f9766f0e49bb54f149c77bae89464932fb
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <[email protected]>

All error/warnings (new ones prefixed by >>):

In file included from arch/x86/include/uapi/asm/ptrace.h:6:0,
from arch/x86/include/asm/ptrace.h:7,
from arch/x86/include/asm/math_emu.h:5,
from arch/x86/include/asm/processor.h:13,
from arch/x86/include/asm/cpufeature.h:5,
from arch/x86/include/asm/thread_info.h:53,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:78,
from include/linux/spinlock.h:51,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:6,
from include/linux/mm.h:10,
from arch/x86/kernel/process.c:6:
>> arch/x86/include/uapi/asm/ptrace-abi.h:16:12: error: expected identifier before numeric constant
#define FS 9
^
>> arch/x86/kernel/process.h:42:2: note: in expansion of macro 'FS'
FS,
^~
In file included from arch/x86/kernel/process.c:46:0:
arch/x86/kernel/process.h: In function 'save_base_legacy':
>> arch/x86/kernel/process.h:85:18: error: 'struct thread_struct' has no member named 'fsbase'
prev_p->thread.fsbase = 0;
^
>> arch/x86/kernel/process.h:87:18: error: 'struct thread_struct' has no member named 'gsbase'
prev_p->thread.gsbase = 0;
^
In file included from arch/x86/include/asm/ptrace.h:5:0,
from arch/x86/include/asm/math_emu.h:5,
from arch/x86/include/asm/processor.h:13,
from arch/x86/include/asm/cpufeature.h:5,
from arch/x86/include/asm/thread_info.h:53,
from include/linux/thread_info.h:38,
from arch/x86/include/asm/preempt.h:7,
from include/linux/preempt.h:78,
from include/linux/spinlock.h:51,
from include/linux/mmzone.h:8,
from include/linux/gfp.h:6,
from include/linux/mm.h:10,
from arch/x86/kernel/process.c:6:
arch/x86/kernel/process.h: In function 'save_fsgs':
>> arch/x86/kernel/process.h:93:30: error: 'struct thread_struct' has no member named 'fsindex'
savesegment(fs, task->thread.fsindex);
^
arch/x86/include/asm/segment.h:368:32: note: in definition of macro 'savesegment'
asm("mov %%" #seg ",%0":"=r" (value) : : "memory")
^~~~~
>> arch/x86/kernel/process.h:94:30: error: 'struct thread_struct' has no member named 'gsindex'
savesegment(gs, task->thread.gsindex);
^
arch/x86/include/asm/segment.h:368:32: note: in definition of macro 'savesegment'
asm("mov %%" #seg ",%0":"=r" (value) : : "memory")
^~~~~
In file included from arch/x86/kernel/process.c:46:0:
arch/x86/kernel/process.h:101:15: error: 'struct thread_struct' has no member named 'fsbase'
task->thread.fsbase = rdfsbase();
^
>> arch/x86/kernel/process.h:101:25: error: implicit declaration of function 'rdfsbase'; did you mean 'rb_erase'? [-Werror=implicit-function-declaration]
task->thread.fsbase = rdfsbase();
^~~~~~~~
rb_erase
arch/x86/kernel/process.h:102:15: error: 'struct thread_struct' has no member named 'gsbase'
task->thread.gsbase = x86_gsbase_read_cpu_inactive();
^
>> arch/x86/kernel/process.h:102:25: error: implicit declaration of function 'x86_gsbase_read_cpu_inactive' [-Werror=implicit-function-declaration]
task->thread.gsbase = x86_gsbase_read_cpu_inactive();
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/x86/kernel/process.h:104:38: error: 'struct thread_struct' has no member named 'fsindex'
save_base_legacy(task, task->thread.fsindex, FS);
^
arch/x86/kernel/process.h:105:38: error: 'struct thread_struct' has no member named 'gsindex'
save_base_legacy(task, task->thread.gsindex, GS);
^
cc1: some warnings being treated as errors

vim +85 arch/x86/kernel/process.h

40
41 enum which_selector {
> 42 FS,
43 GS
44 };
45
46 /*
47 * Saves the FS or GS base for an outgoing thread if FSGSBASE extensions are
48 * not available. The goal is to be reasonably fast on non-FSGSBASE systems.
49 * It's forcibly inlined because it'll generate better code and this function
50 * is hot.
51 */
52 static __always_inline void save_base_legacy(struct task_struct *prev_p,
53 unsigned short selector,
54 enum which_selector which)
55 {
56 if (likely(selector == 0)) {
57 /*
58 * On Intel (without X86_BUG_NULL_SEG), the segment base could
59 * be the pre-existing saved base or it could be zero. On AMD
60 * (with X86_BUG_NULL_SEG), the segment base could be almost
61 * anything.
62 *
63 * This branch is very hot (it's hit twice on almost every
64 * context switch between 64-bit programs), and avoiding
65 * the RDMSR helps a lot, so we just assume that whatever
66 * value is already saved is correct. This matches historical
67 * Linux behavior, so it won't break existing applications.
68 *
69 * To avoid leaking state, on non-X86_BUG_NULL_SEG CPUs, if we
70 * report that the base is zero, it needs to actually be zero:
71 * see the corresponding logic in load_seg_legacy.
72 */
73 } else {
74 /*
75 * If the selector is 1, 2, or 3, then the base is zero on
76 * !X86_BUG_NULL_SEG CPUs and could be anything on
77 * X86_BUG_NULL_SEG CPUs. In the latter case, Linux
78 * has never attempted to preserve the base across context
79 * switches.
80 *
81 * If selector > 3, then it refers to a real segment, and
82 * saving the base isn't necessary.
83 */
84 if (which == FS)
> 85 prev_p->thread.fsbase = 0;
86 else
> 87 prev_p->thread.gsbase = 0;
88 }
89 }
90
91 static __always_inline void save_fsgs(struct task_struct *task)
92 {
> 93 savesegment(fs, task->thread.fsindex);
> 94 savesegment(gs, task->thread.gsindex);
95 if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
96 /*
97 * If FSGSBASE is enabled, we can't make any useful guesses
98 * about the base, and user code expects us to save the current
99 * value. Fortunately, reading the base directly is efficient.
100 */
> 101 task->thread.fsbase = rdfsbase();
> 102 task->thread.gsbase = x86_gsbase_read_cpu_inactive();

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (8.05 kB)
.config.gz (70.63 kB)
Download all attachments

2020-05-10 01:42:15

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH v11 00/18] Enable FSGSBASE instructions

On 5/9/20 10:36 AM, Sasha Levin wrote:
> Changes from v10:
>
> - Rewrite the commit message for patch #1.
> - Document communication/acks from userspace projects that are
> potentially affected by this.

I'm glad someone's pushing this forward. But, I'm also very curious how
you came to be submitting this series. Is this a team effort between
you, the And[iy]s and Chang? Or, were you just trying to help out?

I was hoping to see some acknowledgement of this situation in the cover
letter but didn't see anything.

2020-05-10 14:18:32

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH v11 00/18] Enable FSGSBASE instructions

On Sat, May 09, 2020 at 06:40:02PM -0700, Dave Hansen wrote:
>On 5/9/20 10:36 AM, Sasha Levin wrote:
>> Changes from v10:
>>
>> - Rewrite the commit message for patch #1.
>> - Document communication/acks from userspace projects that are
>> potentially affected by this.
>
>I'm glad someone's pushing this forward. But, I'm also very curious how
>you came to be submitting this series. Is this a team effort between
>you, the And[iy]s and Chang? Or, were you just trying to help out?
>
>I was hoping to see some acknowledgement of this situation in the cover
>letter but didn't see anything.

What happened here was that v9 needed to be rebased on top of v5.7 which
required some changes. I did the rebase and sent it to Andi and Chang
who have suggested that I'll just send it out myself. There was no
planning beyond that.

My interest in this is that we have a few workloads that value the
ability to access FS/GS base directly and show nice performance
improvement with this patchset. I'm not a fan of carrying stuff out of
tree :)

--
Thanks,
Sasha

2020-05-11 00:55:58

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v11 00/18] Enable FSGSBASE instructions

> My interest in this is that we have a few workloads that value the
> ability to access FS/GS base directly and show nice performance

Can you please share some rough numbers, Sasha?

I would expect everything that does a lot of context switches
to benefit automatically, apart from the new free register (which
requires enabling, but also has great potential)

Also of course NMIs will be faster, so perf will have somewhat
lower overhead when profiling.

-Andi

2020-05-11 04:53:04

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH v11 00/18] Enable FSGSBASE instructions

On Sun, May 10, 2020 at 05:53:19PM -0700, Andi Kleen wrote:
>> My interest in this is that we have a few workloads that value the
>> ability to access FS/GS base directly and show nice performance
>
>Can you please share some rough numbers, Sasha?

I don't have any recent numbers around these - this series effectively
enables certain workloads rather than just improve the performance
somewhat so benchmarking for exact numbers isn't too interesting here.

>I would expect everything that does a lot of context switches
>to benefit automatically, apart from the new free register (which
>requires enabling, but also has great potential)

And even more so when these registers are actually being used for the
purpose they were designed for (this is in the context of secure
computing/enclaves/etc).

--
Thanks,
Sasha