2019-02-01 20:58:01

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v5 00/13] x86: Enable FSGSBASE instructions

Update from v4 [4]:
* Remove the FSGSBASE assembly macros

Update from v3 [3]:
* Raise minimum binutils requirement to use the new instructions directly
* Optimize FIND_PERCPU_BASE macro
* Rename some helper functions, __{rd,wr}gsbase_inactive()
* Use NOKPROBE_SYMBOL instead of __kprobes
* Rebase on top of the helper function fix [6]

Update from v2 [2]:
* Separate out the preparatory patches [5] (merged as of now)
* Bisect the paranoid_entry update patch
* Edit minor nits

Update from v1 [1]:
* Update the GSBASE update mechanism on the paranoid entry/exit.
* Exclude ptracer backward compatibility patches.
* Include FSGSBASE documentation and enumerate the capability
for user space
* Add the TAINT_INSECURE flag.

[1] Version 1: https://lore.kernel.org/patchwork/cover/934843
[2] Version 2: https://lore.kernel.org/patchwork/cover/912063
[3] Version 3: https://lore.kernel.org/patchwork/cover/1002725
[4] Version 4: https://lore.kernel.org/patchwork/cover/1032799
[5] https://lore.kernel.org/patchwork/cover/988180
[6] https://lore.kernel.org/patchwork/patch/1017513

Andi Kleen (3):
x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions
x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2
x86/fsgsbase/64: Add documentation for FSGSBASE

Andy Lutomirski (4):
x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE
x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is
on
selftests/x86/fsgsbase: Test WRGSBASE
x86/fsgsbase/64: Enable FSGSBASE by default and add a chicken bit

Chang S. Bae (6):
taint: Introduce a new taint flag (insecure)
kbuild: Raise the minimum required binutils version to 2.21
x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions
x86/fsgsbase/64: When copying a thread, use the FSGSBASE instructions
if available
x86/fsgsbase/64: Introduce the FIND_PERCPU_BASE macro
x86/fsgsbase/64: Use the per-CPU base as GSBASE at the paranoid_entry

.../admin-guide/kernel-parameters.txt | 2 +
Documentation/process/changes.rst | 6 +-
Documentation/sysctl/kernel.txt | 1 +
Documentation/x86/fsgs.txt | 104 +++++++++++++++++
arch/x86/entry/entry_64.S | 71 +++++++++---
arch/x86/include/asm/fsgsbase.h | 100 ++++++++++++++--
arch/x86/include/asm/inst.h | 15 +++
arch/x86/include/uapi/asm/hwcap2.h | 3 +
arch/x86/kernel/cpu/common.c | 22 ++++
arch/x86/kernel/process_64.c | 108 ++++++++++++++++--
include/linux/kernel.h | 3 +-
kernel/panic.c | 1 +
tools/testing/selftests/x86/fsgsbase.c | 102 ++++++++++++++++-
13 files changed, 497 insertions(+), 41 deletions(-)
create mode 100644 Documentation/x86/fsgs.txt

--
2.19.1



2019-02-01 20:58:04

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v5 01/13] taint: Introduce a new taint flag (insecure)

For testing (or root-only) purposes, the new flag will serve to tag the
kernel taint accurately.

When support for a new feature is added, patches need to be applied and
tested incrementally with temporary parameters. Currently, there is no
taint flag for this usage.

Suggested-by: H. Peter Anvin <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andi Kleen <[email protected]>
---
Documentation/sysctl/kernel.txt | 1 +
include/linux/kernel.h | 3 ++-
kernel/panic.c | 1 +
3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 379063e58326..fb4244515314 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -1064,6 +1064,7 @@ ORed together. The letters are seen in "Tainted" line of Oops reports.
32768 (K): The kernel has been live patched.
65536 (X): Auxiliary taint, defined and used by distros.
131072 (T): The kernel was built with the struct randomization plugin.
+262144 (Z): The kernel is running in a known insecure configuration.

==============================================================

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 8f0e68e250a7..dc149ff8cc52 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -599,7 +599,8 @@ extern enum system_states {
#define TAINT_LIVEPATCH 15
#define TAINT_AUX 16
#define TAINT_RANDSTRUCT 17
-#define TAINT_FLAGS_COUNT 18
+#define TAINT_INSECURE 18
+#define TAINT_FLAGS_COUNT 19

struct taint_flag {
char c_true; /* character printed when tainted */
diff --git a/kernel/panic.c b/kernel/panic.c
index f121e6ba7e11..cb6b90538375 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -362,6 +362,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
[ TAINT_LIVEPATCH ] = { 'K', ' ', true },
[ TAINT_AUX ] = { 'X', ' ', true },
[ TAINT_RANDSTRUCT ] = { 'T', ' ', true },
+ [ TAINT_INSECURE ] = { 'Z', ' ', false },
};

/**
--
2.19.1


2019-02-01 20:58:08

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v5 03/13] kbuild: Raise the minimum required binutils version to 2.21

Binutils 2.21 supports the new FSGSBASE instructions, which allows using
them directly in inline assembly.

Suggested-by: Andi Kleen <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
---
Documentation/process/changes.rst | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst
index 18735dc460a0..0a18075c485e 100644
--- a/Documentation/process/changes.rst
+++ b/Documentation/process/changes.rst
@@ -31,7 +31,7 @@ you probably needn't concern yourself with isdn4k-utils.
====================== =============== ========================================
GNU C 4.6 gcc --version
GNU make 3.81 make --version
-binutils 2.20 ld -v
+binutils 2.21 ld -v
flex 2.5.35 flex --version
bison 2.0 bison --version
util-linux 2.10o fdformat --version
@@ -77,9 +77,7 @@ You will need GNU make 3.81 or later to build the kernel.
Binutils
--------

-The build system has, as of 4.13, switched to using thin archives (`ar T`)
-rather than incremental linking (`ld -r`) for built-in.a intermediate steps.
-This requires binutils 2.20 or newer.
+Binutils 2.21 or newer is needed to build the kernel.

pkg-config
----------
--
2.19.1


2019-02-01 20:58:20

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v5 08/13] x86/fsgsbase/64: Introduce the FIND_PERCPU_BASE macro

GSBASE is used to find per-CPU data in the kernel. When it is unknown,
the per-CPU base can instead be found from the per_cpu_offset table,
indexed by the CPU NR. The CPU NR is extracted from the limit field of
the CPUNODE entry in the GDT, or via the RDPID instruction.

Also, add a GAS-compatible RDPID macro.

The new macro will be used in a following patch.

Suggested-by: H. Peter Anvin <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/include/asm/fsgsbase.h | 46 +++++++++++++++++++++++++++++++++
arch/x86/include/asm/inst.h | 15 +++++++++++
2 files changed, 61 insertions(+)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index aefd53767a5d..eecca2250748 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -78,6 +78,52 @@ extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);

#endif /* CONFIG_X86_64 */

+#else /* __ASSEMBLY__ */
+
+#ifdef CONFIG_X86_64
+
+#include <asm/inst.h>
+
+#ifdef CONFIG_SMP
+
+/*
+ * CPU/node NR is loaded from the limit (size) field of a special segment
+ * descriptor entry in GDT.
+ */
+.macro LOAD_CPU_AND_NODE_SEG_LIMIT reg:req
+ movq $__CPUNODE_SEG, \reg
+ lsl \reg, \reg
+.endm
+
+/*
+ * Fetch the per-CPU GSBASE value for this processor and put it in @reg.
+ * We normally use %gs for accessing per-CPU data, but we are setting up
+ * %gs here and obviously can not use %gs itself to access per-CPU data.
+ */
+.macro FIND_PERCPU_BASE reg:req
+ /*
+ * The CPU/node NR is initialized earlier, directly in cpu_init().
+ * The CPU NR is extracted from it.
+ */
+ ALTERNATIVE \
+ "LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
+ "RDPID \reg", \
+ X86_FEATURE_RDPID
+ andq $VDSO_CPUNODE_MASK, \reg
+ movq __per_cpu_offset(, \reg, 8), \reg
+.endm
+
+#else
+
+.macro FIND_PERCPU_BASE reg:req
+ /* Tracking the base offset value */
+ movq pcpu_unit_offsets(%rip), \reg
+.endm
+
+#endif /* CONFIG_SMP */
+
+#endif /* CONFIG_X86_64 */
+
#endif /* __ASSEMBLY__ */

#endif /* _ASM_FSGSBASE_H */
diff --git a/arch/x86/include/asm/inst.h b/arch/x86/include/asm/inst.h
index f5a796da07f8..d063841a17e3 100644
--- a/arch/x86/include/asm/inst.h
+++ b/arch/x86/include/asm/inst.h
@@ -306,6 +306,21 @@
.endif
MODRM 0xc0 movq_r64_xmm_opd1 movq_r64_xmm_opd2
.endm
+
+.macro RDPID opd
+ REG_TYPE rdpid_opd_type \opd
+ .if rdpid_opd_type == REG_TYPE_R64
+ R64_NUM rdpid_opd \opd
+ .else
+ R32_NUM rdpid_opd \opd
+ .endif
+ .byte 0xf3
+ .if rdpid_opd > 7
+ PFX_REX rdpid_opd 0
+ .endif
+ .byte 0x0f, 0xc7
+ MODRM 0xc0 rdpid_opd 0x7
+.endm
#endif

#endif
--
2.19.1


2019-02-01 20:58:25

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v5 13/13] x86/fsgsbase/64: Add documentation for FSGSBASE

From: Andi Kleen <[email protected]>

v2: Minor updates to documentation requested in review.
v3: Update for new gcc and various improvements.

[ chang: Fix some typos. Fix the example code. ]

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
Documentation/x86/fsgs.txt | 104 +++++++++++++++++++++++++++++++++++++
1 file changed, 104 insertions(+)
create mode 100644 Documentation/x86/fsgs.txt

diff --git a/Documentation/x86/fsgs.txt b/Documentation/x86/fsgs.txt
new file mode 100644
index 000000000000..7a973a5c1767
--- /dev/null
+++ b/Documentation/x86/fsgs.txt
@@ -0,0 +1,104 @@
+
+Using FS and GS prefixes on 64bit x86 Linux
+
+The x86 architecture supports segment prefixes per instruction to add an
+offset to an address. On 64bit x86, these are mostly nops, except for FS
+and GS.
+
+This offers an efficient way to reference a global pointer.
+
+The compiler has to generate special code to use these base registers,
+or they can be accessed with inline assembler.
+
+ mov %gs:offset,%reg
+ mov %fs:offset,%reg
+
+In 64bit code, FS is used to address thread-local storage (TLS), declared using
+__thread. The compiler then automatically generates the correct prefixes and
+relocations to access these values.
+
+FS is normally managed by the runtime code or the threading library.
+Overwriting it can break a lot of things (including syscalls and gdb),
+but it can make sense to save/restore it for threading purposes.
+
+GS is freely available, but may need special (compiler or inline assembler)
+code to use.
+
+Traditionally 64bit FS and GS could be set by the arch_prctl system call
+
+ arch_prctl(ARCH_SET_GS, value)
+ arch_prctl(ARCH_SET_FS, value)
+
+[There was also an older method using modify_ldt(), inherited from 32bit,
+but this is not discussed here.]
+
+However using a syscall is problematic for user space threading libraries
+that want to context switch in user space. The whole point of them
+is avoiding the overhead of a syscall. It's also cleaner for compilers
+wanting to use the extra register to use instructions to write
+it, or read it directly to compute addresses and offsets.
+
+Newer Intel CPUs (Ivy Bridge and later) added new instructions to directly
+access these registers quickly from user context
+
+ RDFSBASE %reg read the FS base (or _readfsbase_u64)
+ RDGSBASE %reg read the GS base (or _readgsbase_u64)
+
+ WRFSBASE %reg write the FS base (or _writefsbase_u64)
+ WRGSBASE %reg write the GS base (or _writegsbase_u64)
+
+If you use the intrinsics, include <immintrin.h> and set the -mfsgsbase option.
+
+The instructions are supported by the CPU when the "fsgsbase" string is shown in
+/proc/cpuinfo (or directly retrieved through the CPUID instruction,
+7:0 (ebx), word 9, bit 0)
+
+The instructions are only available to 64bit binaries.
+
+In addition the kernel needs to explicitly enable these instructions, as it
+may otherwise not correctly context switch the state. Newer Linux
+kernels enable this. When the kernel does not enable the instructions,
+they fault with a #UD exception.
+
+An FSGSBASE enabled kernel can be detected by checking the AT_HWCAP2
+bitmask in the aux vector. When the HWCAP2_FSGSBASE bit is set the
+kernel supports FSGSBASE.
+
+ #include <sys/auxv.h>
+ #include <elf.h>
+
+ /* Will be eventually in asm/hwcap.h */
+ #define HWCAP2_FSGSBASE (1 << 1)
+
+ unsigned val = getauxval(AT_HWCAP2);
+ if (val & HWCAP2_FSGSBASE) {
+ asm("wrgsbase %0" :: "r" (ptr));
+ }
+
+No extra CPUID check is needed, as the kernel will not set this bit if the
+CPU does not support it.
+
+gcc 6 will have special support to directly access data relative
+to fs/gs using the __seg_fs and __seg_gs address space pointer
+modifiers.
+
+#ifndef __SEG_GS
+#error "Need gcc 6 or later"
+#endif
+
+struct gsdata {
+ int a;
+ int b;
+} gsdata = { 1, 2 };
+
+int __seg_gs *valp = 0; /* offset relative to GS */
+
+ /* Check if kernel supports FSGSBASE as above */
+
+ /* Set up new GS */
+ asm("wrgsbase %0" :: "r" (&gsdata));
+
+ /* Now the global pointer can be used normally */
+ printf("gsdata.a = %d\n", *valp);
+
+Andi Kleen
--
2.19.1


2019-02-01 20:58:30

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v5 10/13] selftests/x86/fsgsbase: Test WRGSBASE

From: Andy Lutomirski <[email protected]>

This validates that GS and GSBASE are independently preserved across
context switches.

[ chang: Use FSGSBASE instructions directly instead of .byte ]

Signed-off-by: Andy Lutomirski <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
tools/testing/selftests/x86/fsgsbase.c | 102 ++++++++++++++++++++++++-
1 file changed, 99 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/x86/fsgsbase.c b/tools/testing/selftests/x86/fsgsbase.c
index f249e042b3b5..5956475972f1 100644
--- a/tools/testing/selftests/x86/fsgsbase.c
+++ b/tools/testing/selftests/x86/fsgsbase.c
@@ -23,6 +23,7 @@
#include <pthread.h>
#include <asm/ldt.h>
#include <sys/mman.h>
+#include <setjmp.h>

#ifndef __x86_64__
# error This test is 64-bit only
@@ -71,6 +72,43 @@ static void sigsegv(int sig, siginfo_t *si, void *ctx_void)

}

+static jmp_buf jmpbuf;
+
+static void sigill(int sig, siginfo_t *si, void *ctx_void)
+{
+ siglongjmp(jmpbuf, 1);
+}
+
+static bool have_fsgsbase;
+
+static inline unsigned long rdgsbase(void)
+{
+ unsigned long gsbase;
+
+ asm volatile("rdgsbase %0" : "=r" (gsbase) :: "memory");
+
+ return gsbase;
+}
+
+static inline unsigned long rdfsbase(void)
+{
+ unsigned long fsbase;
+
+ asm volatile("rdfsbase %0" : "=r" (fsbase) :: "memory");
+
+ return fsbase;
+}
+
+static inline void wrgsbase(unsigned long gsbase)
+{
+ asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
+}
+
+static inline void wrfsbase(unsigned long fsbase)
+{
+ asm volatile("wrfsbase %0" :: "r" (fsbase) : "memory");
+}
+
enum which_base { FS, GS };

static unsigned long read_base(enum which_base which)
@@ -199,14 +237,16 @@ static void do_remote_base()
to_set, hard_zero ? " and clear gs" : "", sel);
}

-void do_unexpected_base(void)
+static __thread int set_thread_area_entry_number = -1;
+
+static void do_unexpected_base(void)
{
/*
* The goal here is to try to arrange for GS == 0, GSBASE !=
* 0, and for the kernel to think that GSBASE == 0.
*
* To make the test as reliable as possible, this uses
- * explicit descriptorss. (This is not the only way. This
+ * explicit descriptors. (This is not the only way. This
* could use ARCH_SET_GS with a low, nonzero base, but the
* relevant side effect of ARCH_SET_GS could change.)
*/
@@ -239,7 +279,7 @@ void do_unexpected_base(void)
MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
memcpy(low_desc, &desc, sizeof(desc));

- low_desc->entry_number = -1;
+ low_desc->entry_number = set_thread_area_entry_number;

/* 32-bit set_thread_area */
long ret;
@@ -254,6 +294,8 @@ void do_unexpected_base(void)
return;
}
printf("\tother thread: using GDT slot %d\n", desc.entry_number);
+ set_thread_area_entry_number = desc.entry_number;
+
asm volatile ("mov %0, %%gs" : : "rm" ((unsigned short)((desc.entry_number << 3) | 0x3)));
}

@@ -265,6 +307,34 @@ void do_unexpected_base(void)
asm volatile ("mov %0, %%gs" : : "rm" ((unsigned short)0));
}

+void test_wrbase(unsigned short index, unsigned long base)
+{
+ unsigned short newindex;
+ unsigned long newbase;
+
+ printf("[RUN]\tGS = 0x%hx, GSBASE = 0x%lx\n", index, base);
+
+ asm volatile ("mov %0, %%gs" : : "rm" (index));
+ wrgsbase(base);
+
+ remote_base = 0;
+ ftx = 1;
+ syscall(SYS_futex, &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);
+ while (ftx != 0)
+ syscall(SYS_futex, &ftx, FUTEX_WAIT, 1, NULL, NULL, 0);
+
+ asm volatile ("mov %%gs, %0" : "=rm" (newindex));
+ newbase = rdgsbase();
+
+ if (newindex == index && newbase == base) {
+ printf("[OK]\tIndex and base were preserved\n");
+ } else {
+ printf("[FAIL]\tAfter switch, GS = 0x%hx and GSBASE = 0x%lx\n",
+ newindex, newbase);
+ nerrs++;
+ }
+}
+
static void *threadproc(void *ctx)
{
while (1) {
@@ -371,6 +441,17 @@ int main()
{
pthread_t thread;

+ /* Probe FSGSBASE */
+ sethandler(SIGILL, sigill, 0);
+ if (sigsetjmp(jmpbuf, 1) == 0) {
+ rdfsbase();
+ have_fsgsbase = true;
+ printf("\tFSGSBASE instructions are enabled\n");
+ } else {
+ printf("\tFSGSBASE instructions are disabled\n");
+ }
+ clearhandler(SIGILL);
+
sethandler(SIGSEGV, sigsegv, 0);

check_gs_value(0);
@@ -417,6 +498,21 @@ int main()

test_unexpected_base();

+ if (have_fsgsbase) {
+ unsigned short ss;
+
+ asm volatile ("mov %%ss, %0" : "=rm" (ss));
+
+ test_wrbase(0, 0);
+ test_wrbase(0, 1);
+ test_wrbase(0, 0x200000000);
+ test_wrbase(0, 0xffffffffffffffff);
+ test_wrbase(ss, 0);
+ test_wrbase(ss, 1);
+ test_wrbase(ss, 0x200000000);
+ test_wrbase(ss, 0xffffffffffffffff);
+ }
+
ftx = 3; /* Kill the thread. */
syscall(SYS_futex, &ftx, FUTEX_WAKE, 0, NULL, NULL, 0);

--
2.19.1


2019-02-01 20:58:41

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v5 09/13] x86/fsgsbase/64: Use the per-CPU base as GSBASE at the paranoid_entry

The FSGSBASE instructions allow fast access to GSBASE. With them, at
paranoid_entry, the per-CPU base value can always be copied to GSBASE,
and the original GSBASE value restored at exit.

So far, GSBASE modification has not been directly allowed from userspace,
so swapping GSBASE has been done conditionally, relying on the
kernel-enforced convention that a negative GSBASE indicates a kernel
value. But once FSGSBASE is enabled, userspace can put an arbitrary value
in GSBASE. This change ensures a correct GSBASE value when FSGSBASE is in
use.

Also, factor out the RDMSR-based GSBASE read into a new macro,
READ_MSR_GSBASE.

Suggested-by: H. Peter Anvin <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andi Kleen <[email protected]>
---
arch/x86/entry/entry_64.S | 71 +++++++++++++++++++++++++++------
arch/x86/include/asm/fsgsbase.h | 9 +++++
2 files changed, 67 insertions(+), 13 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1f0efdb7b629..9df528565e40 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -38,6 +38,7 @@
#include <asm/export.h>
#include <asm/frame.h>
#include <asm/nospec-branch.h>
+#include <asm/fsgsbase.h>
#include <linux/err.h>

#include "calling.h"
@@ -934,10 +935,14 @@ ENTRY(\sym)
addq $EXCEPTION_STKSZ, CPU_TSS_IST(\shift_ist)
.endif

- /* these procedures expect "no swapgs" flag in ebx */
.if \paranoid
+ /*
+ * With FSGSBASE, original GSBASE is stored in %rbx
+ * Without FSGSBASE, expect "no swapgs" flag in %ebx
+ */
jmp paranoid_exit
.else
+ /* Expect "no swapgs" flag in %ebx */
jmp error_exit
.endif

@@ -1151,22 +1156,24 @@ idtentry machine_check do_mce has_error_code=0 paranoid=1
#endif

/*
- * Save all registers in pt_regs, and switch gs if needed.
- * Use slow, but surefire "are we in kernel?" check.
- * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
+ * Save all registers in pt_regs.
+ *
+ * When FSGSBASE is enabled, the current GSBASE is always copied to %rbx.
+ *
+ * Without FSGSBASE, SWAPGS is needed when entering from userspace.
+ * A positive GSBASE means it is a user value and a negative GSBASE
+ * means it is a kernel value.
+ *
+ * Return:
+ * With FSGSBASE, %rbx has current GSBASE.
+ * Without that,
+ * %ebx=0: need SWAPGS on exit, %ebx=1: otherwise
*/
ENTRY(paranoid_entry)
UNWIND_HINT_FUNC
cld
PUSH_AND_CLEAR_REGS save_ret=1
ENCODE_FRAME_POINTER 8
- movl $1, %ebx
- movl $MSR_GS_BASE, %ecx
- rdmsr
- testl %edx, %edx
- js 1f /* negative -> in kernel */
- SWAPGS
- xorl %ebx, %ebx

1:
/*
@@ -1178,9 +1185,38 @@ ENTRY(paranoid_entry)
* This is also why CS (stashed in the "iret frame" by the
* hardware at entry) can not be used: this may be a return
* to kernel code, but with a user CR3 value.
+ *
+ * As long as this PTI macro doesn't depend on kernel GSBASE,
+ * we can do it early. This is because FIND_PERCPU_BASE
+ * references data in kernel space.
*/
SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14

+ /*
+ * Read GSBASE by RDGSBASE. Kernel GSBASE is found
+ * from the per-CPU offset table with a CPU NR.
+ */
+ ALTERNATIVE "jmp .Lparanoid_entry_no_fsgsbase", "",\
+ X86_FEATURE_FSGSBASE
+ rdgsbase %rbx
+ FIND_PERCPU_BASE %rax
+ wrgsbase %rax
+ ret
+
+.Lparanoid_entry_no_fsgsbase:
+ movl $1, %ebx
+ /*
+ * FSGSBASE is not in use, so depend on the kernel-enforced
+ * convention that a negative GSBASE indicates a kernel value.
+ */
+ READ_MSR_GSBASE save_reg=%edx
+ testl %edx, %edx /* Negative -> in kernel */
+ jns .Lparanoid_entry_swapgs
+ ret
+
+.Lparanoid_entry_swapgs:
+ SWAPGS
+ xorl %ebx, %ebx
ret
END(paranoid_entry)

@@ -1194,12 +1230,21 @@ END(paranoid_entry)
* be complicated. Fortunately, there's no good reason
* to try to handle preemption here.
*
- * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
+ * On entry,
+ * With FSGSBASE,
+ * %rbx is original GSBASE that needs to be restored on the exit
+ * Without that,
+ * %ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
*/
ENTRY(paranoid_exit)
UNWIND_HINT_REGS
DISABLE_INTERRUPTS(CLBR_ANY)
TRACE_IRQS_OFF_DEBUG
+ ALTERNATIVE "jmp .Lparanoid_exit_no_fsgsbase", "nop",\
+ X86_FEATURE_FSGSBASE
+ wrgsbase %rbx
+ jmp .Lparanoid_exit_no_swapgs;
+.Lparanoid_exit_no_fsgsbase:
testl %ebx, %ebx /* swapgs needed? */
jnz .Lparanoid_exit_no_swapgs
TRACE_IRQS_IRETQ
@@ -1212,7 +1257,7 @@ ENTRY(paranoid_exit)
/* Always restore stashed CR3 value (see paranoid_entry) */
RESTORE_CR3 scratch_reg=%rbx save_reg=%r14
.Lparanoid_exit_restore:
- jmp restore_regs_and_return_to_kernel
+ jmp restore_regs_and_return_to_kernel
END(paranoid_exit)

/*
diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index eecca2250748..1cb7b03c107a 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -122,6 +122,15 @@ extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);

#endif /* CONFIG_SMP */

+.macro READ_MSR_GSBASE save_reg:req
+ movl $MSR_GS_BASE, %ecx
+ /* Read MSR specified by %ecx into %edx:%eax */
+ rdmsr
+ .ifnc \save_reg, %edx
+ movl %edx, \save_reg
+ .endif
+.endm
+
#endif /* CONFIG_X86_64 */

#endif /* __ASSEMBLY__ */
--
2.19.1


2019-02-01 20:58:50

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v5 05/13] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions

The helper functions will use faster accesses to FSBASE and GSBASE
when the FSGSBASE feature is enabled.

Accessing the user GSBASE needs a pair of SWAPGS operations. This could
be avoided by saving the user GSBASE at kernel entry, updating it as it
changes, and restoring it at kernel exit. However, the saves and restores
appear to cost more cycles than they save; little or no benefit was
measured in experiments.

Also, introduce __{rd,wr}gsbase_inactive() as helpers to access user GSBASE
with SWAPGS. Note, for Xen PV, paravirt hooks can be added, since it may
allow a very efficient but different implementation.

[ Use NOKPROBE_SYMBOL instead of __kprobes ]

Signed-off-by: Chang S. Bae <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Cooper <[email protected]>
---
arch/x86/include/asm/fsgsbase.h | 27 +++++++-------
arch/x86/kernel/process_64.c | 62 +++++++++++++++++++++++++++++++--
2 files changed, 72 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index fdd1177499b4..aefd53767a5d 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -49,35 +49,32 @@ static __always_inline void wrgsbase(unsigned long gsbase)
asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
}

+#include <asm/cpufeature.h>
+
/* Helper functions for reading/writing FS/GS base */

static inline unsigned long x86_fsbase_read_cpu(void)
{
unsigned long fsbase;

- rdmsrl(MSR_FS_BASE, fsbase);
+ if (static_cpu_has(X86_FEATURE_FSGSBASE))
+ fsbase = rdfsbase();
+ else
+ rdmsrl(MSR_FS_BASE, fsbase);

return fsbase;
}

-static inline unsigned long x86_gsbase_read_cpu_inactive(void)
-{
- unsigned long gsbase;
-
- rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
-
- return gsbase;
-}
-
static inline void x86_fsbase_write_cpu(unsigned long fsbase)
{
- wrmsrl(MSR_FS_BASE, fsbase);
+ if (static_cpu_has(X86_FEATURE_FSGSBASE))
+ wrfsbase(fsbase);
+ else
+ wrmsrl(MSR_FS_BASE, fsbase);
}

-static inline void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
-{
- wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
-}
+extern unsigned long x86_gsbase_read_cpu_inactive(void);
+extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);

#endif /* CONFIG_X86_64 */

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 6a62f4af9fcf..ebc55ed31fe7 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -160,6 +160,42 @@ enum which_selector {
GS
};

+/*
+ * Interrupts are disabled here. Out of line to be protected
+ * from kprobes. It is not used on Xen paravirt. When paravirt
+ * support is needed, it needs to be renamed with native_ prefix.
+ */
+static noinline unsigned long __rdgsbase_inactive(void)
+{
+ unsigned long gsbase, flags;
+
+ local_irq_save(flags);
+ native_swapgs();
+ gsbase = rdgsbase();
+ native_swapgs();
+ local_irq_restore(flags);
+
+ return gsbase;
+}
+NOKPROBE_SYMBOL(__rdgsbase_inactive);
+
+/*
+ * Interrupts are disabled here. Out of line to be protected
+ * from kprobes. It is not used on Xen paravirt. When paravirt
+ * support is needed, it needs to be renamed with native_ prefix.
+ */
+static noinline void __wrgsbase_inactive(unsigned long gsbase)
+{
+ unsigned long flags;
+
+ local_irq_save(flags);
+ native_swapgs();
+ wrgsbase(gsbase);
+ native_swapgs();
+ local_irq_restore(flags);
+}
+NOKPROBE_SYMBOL(__wrgsbase_inactive);
+
/*
* Saves the FS or GS base for an outgoing thread if FSGSBASE extensions are
* not available. The goal is to be reasonably fast on non-FSGSBASE systems.
@@ -338,13 +374,34 @@ static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
return base;
}

+unsigned long x86_gsbase_read_cpu_inactive(void)
+{
+ unsigned long gsbase;
+
+ if (static_cpu_has(X86_FEATURE_FSGSBASE))
+ gsbase = __rdgsbase_inactive();
+ else
+ rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+
+ return gsbase;
+}
+
+void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
+{
+ if (static_cpu_has(X86_FEATURE_FSGSBASE))
+ __wrgsbase_inactive(gsbase);
+ else
+ wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+}
+
unsigned long x86_fsbase_read_task(struct task_struct *task)
{
unsigned long fsbase;

if (task == current)
fsbase = x86_fsbase_read_cpu();
- else if (task->thread.fsindex == 0)
+ else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
+ (task->thread.fsindex == 0))
fsbase = task->thread.fsbase;
else
fsbase = x86_fsgsbase_read_task(task, task->thread.fsindex);
@@ -358,7 +415,8 @@ unsigned long x86_gsbase_read_task(struct task_struct *task)

if (task == current)
gsbase = x86_gsbase_read_cpu_inactive();
- else if (task->thread.gsindex == 0)
+ else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
+ (task->thread.gsindex == 0))
gsbase = task->thread.gsbase;
else
gsbase = x86_fsgsbase_read_task(task, task->thread.gsindex);
--
2.19.1


2019-02-01 20:58:56

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v5 06/13] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on

From: Andy Lutomirski <[email protected]>

With the new FSGSBASE instructions, we can efficiently read and write
the FSBASE and GSBASE in __switch_to(). Use that capability to preserve
the full state.

This will enable user code to do whatever it wants with the new
instructions without any kernel-induced gotchas. (There can still be
architectural gotchas: movl %gs,%eax; movl %eax,%gs may change GSBASE
if WRGSBASE was used, but users are expected to read the CPU manual
before doing things like that.)

This is a considerable speedup. It seems to save about 100 cycles
per context switch compared to the baseline 4.6-rc1 behavior on my
Skylake laptop.

[ chang: A 5~10% performance improvement was seen with a context switch
benchmark that ran threads with different FS/GSBASE values (relative
to the 4.16 baseline). Minor edits to the changelog. ]

Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
arch/x86/kernel/process_64.c | 34 ++++++++++++++++++++++++++++------
1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ebc55ed31fe7..d8ade9530fdb 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -245,8 +245,18 @@ static __always_inline void save_fsgs(struct task_struct *task)
{
savesegment(fs, task->thread.fsindex);
savesegment(gs, task->thread.gsindex);
- save_base_legacy(task, task->thread.fsindex, FS);
- save_base_legacy(task, task->thread.gsindex, GS);
+ if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+ /*
+ * If FSGSBASE is enabled, we can't make any useful guesses
+ * about the base, and user code expects us to save the current
+ * value. Fortunately, reading the base directly is efficient.
+ */
+ task->thread.fsbase = rdfsbase();
+ task->thread.gsbase = __rdgsbase_inactive();
+ } else {
+ save_base_legacy(task, task->thread.fsindex, FS);
+ save_base_legacy(task, task->thread.gsindex, GS);
+ }
}

#if IS_ENABLED(CONFIG_KVM)
@@ -325,10 +335,22 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
static __always_inline void x86_fsgsbase_load(struct thread_struct *prev,
struct thread_struct *next)
{
- load_seg_legacy(prev->fsindex, prev->fsbase,
- next->fsindex, next->fsbase, FS);
- load_seg_legacy(prev->gsindex, prev->gsbase,
- next->gsindex, next->gsbase, GS);
+ if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
+ /* Update the FS and GS selectors if they could have changed. */
+ if (unlikely(prev->fsindex || next->fsindex))
+ loadseg(FS, next->fsindex);
+ if (unlikely(prev->gsindex || next->gsindex))
+ loadseg(GS, next->gsindex);
+
+ /* Update the bases. */
+ wrfsbase(next->fsbase);
+ __wrgsbase_inactive(next->gsbase);
+ } else {
+ load_seg_legacy(prev->fsindex, prev->fsbase,
+ next->fsindex, next->fsbase, FS);
+ load_seg_legacy(prev->gsindex, prev->gsbase,
+ next->gsindex, next->gsbase, GS);
+ }
}

static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
--
2.19.1


2019-02-01 20:59:19

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v5 02/13] x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE

From: Andy Lutomirski <[email protected]>

This is temporary. It will allow the next few patches to be tested
incrementally.

Setting unsafe_fsgsbase is a root hole. Don't do it.

[ chang: Minor fix. Add the TAINT_INSECURE flag. ]

Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 3 +++
arch/x86/kernel/cpu/common.c | 27 +++++++++++++++++++
2 files changed, 30 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index d59dff450614..871260e3e832 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2760,6 +2760,9 @@
no5lvl [X86-64] Disable 5-level paging mode. Forces
kernel to use 4-level paging instead.

+ unsafe_fsgsbase [X86] Allow FSGSBASE instructions. This will be
+ replaced with a nofsgsbase flag.
+
no_console_suspend
[HW] Never suspend the console
Disable suspending of consoles during suspend and
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index cb28e98a0659..6e2cba21328f 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -365,6 +365,25 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
cr4_clear_bits(X86_CR4_UMIP);
}

+/*
+ * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are
+ * updated. This allows us to get the kernel ready incrementally. Setting
+ * unsafe_fsgsbase and TAINT_INSECURE flags will allow the series to be
+ * bisected if necessary.
+ *
+ * Once all the pieces are in place, these will go away and be replaced with
+ * a nofsgsbase chicken flag.
+ */
+static bool unsafe_fsgsbase;
+
+static __init int setup_unsafe_fsgsbase(char *arg)
+{
+ unsafe_fsgsbase = true;
+ add_taint(TAINT_INSECURE, LOCKDEP_STILL_OK);
+ return 1;
+}
+__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase);
+
/*
* Protection Keys are not available in 32-bit mode.
*/
@@ -1352,6 +1371,14 @@ static void identify_cpu(struct cpuinfo_x86 *c)
setup_smap(c);
setup_umip(c);

+ /* Enable FSGSBASE instructions if available. */
+ if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
+ if (unsafe_fsgsbase)
+ cr4_set_bits(X86_CR4_FSGSBASE);
+ else
+ clear_cpu_cap(c, X86_FEATURE_FSGSBASE);
+ }
+
/*
* The vendor-specific functions might have changed features.
* Now we do "generic changes."
--
2.19.1


2019-02-01 20:59:25

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v5 04/13] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions

From: Andi Kleen <[email protected]>

Add C intrinsics for the new FSGSBASE instructions (RDFSBASE, WRFSBASE,
RDGSBASE and WRGSBASE).

Very straightforward. Used in follow-on patches.

[ luto: Rename the variables from FS and GS to FSBASE and GSBASE and
make <asm/fsgsbase.h> safe to include on 32-bit kernels. ]

v2: Use __always_inline

[ chang: Revise the changelog. Place them in <asm/fsgsbase.h>. Replace
the macros with GAS-compatible ones. ]

If GCC supports it, we can add -mfsgsbase to CFLAGS and use the builtins
here for extra performance.

[ chang: Use FSGSBASE instructions directly. Removed GAS-compatible
macros as the minimum required binutils (v2.21) supports the FSGSBASE
instructions. ]

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fsgsbase.h | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
index bca4c743de77..fdd1177499b4 100644
--- a/arch/x86/include/asm/fsgsbase.h
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -19,6 +19,36 @@ extern unsigned long x86_gsbase_read_task(struct task_struct *task);
extern void x86_fsbase_write_task(struct task_struct *task, unsigned long fsbase);
extern void x86_gsbase_write_task(struct task_struct *task, unsigned long gsbase);

+/* Must be protected by X86_FEATURE_FSGSBASE check. */
+
+static __always_inline unsigned long rdfsbase(void)
+{
+ unsigned long fsbase;
+
+ asm volatile("rdfsbase %0" : "=r" (fsbase) :: "memory");
+
+ return fsbase;
+}
+
+static __always_inline unsigned long rdgsbase(void)
+{
+ unsigned long gsbase;
+
+ asm volatile("rdgsbase %0" : "=r" (gsbase) :: "memory");
+
+ return gsbase;
+}
+
+static __always_inline void wrfsbase(unsigned long fsbase)
+{
+ asm volatile("wrfsbase %0" :: "r" (fsbase) : "memory");
+}
+
+static __always_inline void wrgsbase(unsigned long gsbase)
+{
+ asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
+}
+
/* Helper functions for reading/writing FS/GS base */

static inline unsigned long x86_fsbase_read_cpu(void)
--
2.19.1


2019-02-01 20:59:39

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v5 12/13] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2

From: Andi Kleen <[email protected]>

The kernel needs to explicitly enable FSGSBASE, so the application needs to
know whether it can safely use these instructions. Just looking at the CPUID
bit is not enough, because it may be running on a kernel that does not enable
the instructions.

One way for the application would be to just try the instructions and catch
the SIGILL. But that is difficult to do in libraries, which may not want to
override the signal handlers of the main application.

So we need to provide a way for the application to discover the kernel
capability.

I used AT_HWCAP2 in the ELF aux vector, which is already used by PPC for
similar things. A new Linux-defined bitmap is returned in AT_HWCAP2. Next to
MONITOR/MWAIT (bit 0), bit 1 is reserved for the FSGSBASE capability check.

The application can then access it manually or using the getauxval()
function in newer glibc.

[ chang: Rebase and edit the changelog accordingly. ]

Signed-off-by: Andi Kleen <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
arch/x86/include/uapi/asm/hwcap2.h | 3 +++
arch/x86/kernel/cpu/common.c | 4 +++-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/uapi/asm/hwcap2.h b/arch/x86/include/uapi/asm/hwcap2.h
index 6ebaae90e207..c5ce54e749f6 100644
--- a/arch/x86/include/uapi/asm/hwcap2.h
+++ b/arch/x86/include/uapi/asm/hwcap2.h
@@ -5,4 +5,7 @@
/* MONITOR/MWAIT enabled in Ring 3 */
#define HWCAP2_RING3MWAIT (1 << 0)

+/* FSGSBASE instructions enabled in Ring 3 */
+#define HWCAP2_FSGSBASE (1 << 1)
+
#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3d7d4ca1a29e..3bdac91316c9 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1369,8 +1369,10 @@ static void identify_cpu(struct cpuinfo_x86 *c)
setup_umip(c);

/* Enable FSGSBASE instructions if available. */
- if (cpu_has(c, X86_FEATURE_FSGSBASE))
+ if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
cr4_set_bits(X86_CR4_FSGSBASE);
+ elf_hwcap2 |= HWCAP2_FSGSBASE;
+ }

/*
* The vendor-specific functions might have changed features.
--
2.19.1


2019-02-01 20:59:42

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v5 11/13] x86/fsgsbase/64: Enable FSGSBASE by default and add a chicken bit

From: Andy Lutomirski <[email protected]>

Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable
FSGSBASE by default, and add nofsgsbase to disable it.

Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 3 +-
arch/x86/kernel/cpu/common.c | 35 ++++++++-----------
2 files changed, 15 insertions(+), 23 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 871260e3e832..20ab1ba22a3e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2760,8 +2760,7 @@
no5lvl [X86-64] Disable 5-level paging mode. Forces
kernel to use 4-level paging instead.

- unsafe_fsgsbase [X86] Allow FSGSBASE instructions. This will be
- replaced with a nofsgsbase flag.
+ nofsgsbase [X86] Disables FSGSBASE instructions.

no_console_suspend
[HW] Never suspend the console
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 6e2cba21328f..3d7d4ca1a29e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -365,24 +365,21 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c)
cr4_clear_bits(X86_CR4_UMIP);
}

-/*
- * Temporary hack: FSGSBASE is unsafe until a few kernel code paths are
- * updated. This allows us to get the kernel ready incrementally. Setting
- * unsafe_fsgsbase and TAINT_INSECURE flags will allow the series to be
- * bisected if necessary.
- *
- * Once all the pieces are in place, these will go away and be replaced with
- * a nofsgsbase chicken flag.
- */
-static bool unsafe_fsgsbase;
-
-static __init int setup_unsafe_fsgsbase(char *arg)
+static __init int x86_nofsgsbase_setup(char *arg)
{
- unsafe_fsgsbase = true;
- add_taint(TAINT_INSECURE, LOCKDEP_STILL_OK);
+ /* Require an exact match without trailing characters. */
+ if (strlen(arg))
+ return 0;
+
+ /* Do not emit a message if the feature is not present. */
+ if (!boot_cpu_has(X86_FEATURE_FSGSBASE))
+ return 1;
+
+ setup_clear_cpu_cap(X86_FEATURE_FSGSBASE);
+ pr_info("nofsgsbase: FSGSBASE disabled\n");
return 1;
}
-__setup("unsafe_fsgsbase", setup_unsafe_fsgsbase);
+__setup("nofsgsbase", x86_nofsgsbase_setup);

/*
* Protection Keys are not available in 32-bit mode.
@@ -1372,12 +1369,8 @@ static void identify_cpu(struct cpuinfo_x86 *c)
setup_umip(c);

/* Enable FSGSBASE instructions if available. */
- if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
- if (unsafe_fsgsbase)
- cr4_set_bits(X86_CR4_FSGSBASE);
- else
- clear_cpu_cap(c, X86_FEATURE_FSGSBASE);
- }
+ if (cpu_has(c, X86_FEATURE_FSGSBASE))
+ cr4_set_bits(X86_CR4_FSGSBASE);

/*
* The vendor-specific functions might have changed features.
--
2.19.1


2019-02-01 23:03:59

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH v5 00/13] x86: Enable FSGSBASE instructions


Patches all look good to me.

Reviewed-by: Andi Kleen <[email protected]>

-Andi

2019-02-02 02:45:39

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v5 01/13] taint: Introduce a new taint flag (insecure)

On Fri, Feb 1, 2019 at 12:54 PM Chang S. Bae <[email protected]> wrote:
>
> For testing (or root-only) purposes, the new flag will serve to tag the
> kernel taint accurately.
>
> When adding a new feature support, patches need to be incrementally
> applied and tested with temporal parameters. Currently, there is no flag
> for this usage.

I think this should be reviewed by someone like akpm. akpm, for
background, this is part of an x86 patch series. If only part of the
series is applied, the kernel will be blatantly insecure (but still
functional and useful for testing and bisection), and this taint flag
will be set if this kernel is booted. With the whole series applied,
there are no users of the taint flag in the kernel.

Do you think this is a good idea?

2019-02-02 02:46:16

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v5 00/13] x86: Enable FSGSBASE instructions

Hi hpa-

A while back, you were working on some patches to make modify_ldt()
play better with this series. What happened to them?

--Andy

2019-02-02 02:46:29

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v5 03/13] kbuild: Raise the minimum required binutils version to 2.21

On Fri, Feb 1, 2019 at 12:54 PM Chang S. Bae <[email protected]> wrote:
>
> It helps to use some new instructions directly in inline assembly.

akpm, can you ack this patch? AFAIK you are the only, or at least
most vocal, user of ancient userspace to build new kernels. Are you
okay with this?

>
> Suggested-by: Andi Kleen <[email protected]>
> Signed-off-by: Chang S. Bae <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> ---
> Documentation/process/changes.rst | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst
> index 18735dc460a0..0a18075c485e 100644
> --- a/Documentation/process/changes.rst
> +++ b/Documentation/process/changes.rst
> @@ -31,7 +31,7 @@ you probably needn't concern yourself with isdn4k-utils.
> ====================== =============== ========================================
> GNU C 4.6 gcc --version
> GNU make 3.81 make --version
> -binutils 2.20 ld -v
> +binutils 2.21 ld -v
> flex 2.5.35 flex --version
> bison 2.0 bison --version
> util-linux 2.10o fdformat --version
> @@ -77,9 +77,7 @@ You will need GNU make 3.81 or later to build the kernel.
> Binutils
> --------
>
> -The build system has, as of 4.13, switched to using thin archives (`ar T`)
> -rather than incremental linking (`ld -r`) for built-in.a intermediate steps.
> -This requires binutils 2.20 or newer.
> +Binutils 2.21 or newer is needed to build the kernel.
>
> pkg-config
> ----------
> --
> 2.19.1
>

2019-02-02 02:53:34

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v5 04/13] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions

On Fri, Feb 1, 2019 at 12:54 PM Chang S. Bae <[email protected]> wrote:
>
> From: Andi Kleen <[email protected]>
>
> Add C intrinsics and assembler macros for the new FSBASE and GSBASE
> instructions.
>
> Very straight forward. Used in followon patches.
>
> [ luto: Rename the variables from FS and GS to FSBASE and GSBASE and
> make <asm/fsgsbase.h> safe to include on 32-bit kernels. ]
>
> v2: Use __always_inline
>
> [ chang: Revise the changelog. Place them in <asm/fsgsbase.h>. Replace
> the macros with GAS-compatible ones. ]
>
> If GCC supports it, we can add -mfsgsbase to CFLAGS and use the builtins
> here for extra performance

Does it really get better performance? If so, let's do it. If not,
let's remove the comment. And, whatever you do, please put this above
the [luto] and [chang] parts.
>
> [ chang: Use FSGSBASE instructions directly. Removed GAS-compatible
> macros as the minimum required binutils (v2.21) supports the FSGSBASE
> instructions. ]

Can you stick the "v2" revision notes below the --- or even just
remove them? It makes the changelog a lot harder to review and it's
not really useful in the git tree.

2019-02-02 02:59:14

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v5 05/13] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions

On Fri, Feb 1, 2019 at 12:54 PM Chang S. Bae <[email protected]> wrote:
>
> The helper functions will switch on faster accesses to FSBASE and GSBASE
> when the FSGSBASE feature is enabled.
>
> Accessing user GSBASE needs a couple of SWAPGS operations. It is avoidable
> if the user GSBASE is saved at kernel entry, updated whenever it changes,
> and restored back at kernel exit. However, that approach spends more cycles
> on the saves and restores, and little or no benefit was measured in
> experiments.
>
> Also, introduce __{rd,wr}gsbase_inactive() as helpers to access user GSBASE
> with SWAPGS. Note, for Xen PV, paravirt hooks can be added, since it may
> allow a very efficient but different implementation.
>
> [ Use NOKPROBE_SYMBOL instead of __kprobes ]

^^^ This line looks like it shold be deleted.

>
> Signed-off-by: Chang S. Bae <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Andrew Cooper <[email protected]>
> ---
> arch/x86/include/asm/fsgsbase.h | 27 +++++++-------
> arch/x86/kernel/process_64.c | 62 +++++++++++++++++++++++++++++++--
> 2 files changed, 72 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
> index fdd1177499b4..aefd53767a5d 100644
> --- a/arch/x86/include/asm/fsgsbase.h
> +++ b/arch/x86/include/asm/fsgsbase.h
> @@ -49,35 +49,32 @@ static __always_inline void wrgsbase(unsigned long gsbase)
> asm volatile("wrgsbase %0" :: "r" (gsbase) : "memory");
> }
>
> +#include <asm/cpufeature.h>
> +
> /* Helper functions for reading/writing FS/GS base */
>
> static inline unsigned long x86_fsbase_read_cpu(void)
> {
> unsigned long fsbase;
>
> - rdmsrl(MSR_FS_BASE, fsbase);
> + if (static_cpu_has(X86_FEATURE_FSGSBASE))
> + fsbase = rdfsbase();
> + else
> + rdmsrl(MSR_FS_BASE, fsbase);
>
> return fsbase;
> }
>
> -static inline unsigned long x86_gsbase_read_cpu_inactive(void)
> -{
> - unsigned long gsbase;
> -
> - rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
> -
> - return gsbase;
> -}
> -
> static inline void x86_fsbase_write_cpu(unsigned long fsbase)
> {
> - wrmsrl(MSR_FS_BASE, fsbase);
> + if (static_cpu_has(X86_FEATURE_FSGSBASE))
> + wrfsbase(fsbase);
> + else
> + wrmsrl(MSR_FS_BASE, fsbase);
> }
>
> -static inline void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
> -{
> - wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
> -}
> +extern unsigned long x86_gsbase_read_cpu_inactive(void);
> +extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
>
> #endif /* CONFIG_X86_64 */
>
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index 6a62f4af9fcf..ebc55ed31fe7 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -160,6 +160,42 @@ enum which_selector {
> GS
> };
>
> +/*
> + * Interrupts are disabled here. Out of line to be protected
> + * from kprobes. It is not used on Xen paravirt. When paravirt
> + * support is needed, it needs to be renamed with native_ prefix.
> + */
> +static noinline unsigned long __rdgsbase_inactive(void)
> +{
> + unsigned long gsbase, flags;
> +
> + local_irq_save(flags);
> + native_swapgs();
> + gsbase = rdgsbase();
> + native_swapgs();
> + local_irq_restore(flags);
> +
> + return gsbase;
> +}
> +NOKPROBE_SYMBOL(__rdgsbase_inactive);
> +
> +/*
> + * Interrupts are disabled here. Out of line to be protected
> + * from kprobes. It is not used on Xen paravirt. When paravirt
> + * support is needed, it needs to be renamed with native_ prefix.
> + */
> +static noinline void __wrgsbase_inactive(unsigned long gsbase)
> +{
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + native_swapgs();
> + wrgsbase(gsbase);
> + native_swapgs();
> + local_irq_restore(flags);
> +}
> +NOKPROBE_SYMBOL(__wrgsbase_inactive);
> +
> /*
> * Saves the FS or GS base for an outgoing thread if FSGSBASE extensions are
> * not available. The goal is to be reasonably fast on non-FSGSBASE systems.
> @@ -338,13 +374,34 @@ static unsigned long x86_fsgsbase_read_task(struct task_struct *task,
> return base;
> }
>
> +unsigned long x86_gsbase_read_cpu_inactive(void)
> +{
> + unsigned long gsbase;
> +
> + if (static_cpu_has(X86_FEATURE_FSGSBASE))
> + gsbase = __rdgsbase_inactive();
> + else
> + rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
> +
> + return gsbase;
> +}
> +
> +void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
> +{
> + if (static_cpu_has(X86_FEATURE_FSGSBASE))
> + __wrgsbase_inactive(gsbase);
> + else
> + wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
> +}
> +
> unsigned long x86_fsbase_read_task(struct task_struct *task)
> {
> unsigned long fsbase;
>
> if (task == current)
> fsbase = x86_fsbase_read_cpu();
> - else if (task->thread.fsindex == 0)
> + else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
> + (task->thread.fsindex == 0))
> fsbase = task->thread.fsbase;
> else
> fsbase = x86_fsgsbase_read_task(task, task->thread.fsindex);
> @@ -358,7 +415,8 @@ unsigned long x86_gsbase_read_task(struct task_struct *task)
>
> if (task == current)
> gsbase = x86_gsbase_read_cpu_inactive();
> - else if (task->thread.gsindex == 0)
> + else if (static_cpu_has(X86_FEATURE_FSGSBASE) ||
> + (task->thread.gsindex == 0))
> gsbase = task->thread.gsbase;
> else
> gsbase = x86_fsgsbase_read_task(task, task->thread.gsindex);

These last two hunks changes do not belong in this patch. Presumably
they belong in patch 6.

--Andy



> --
> 2.19.1
>

2019-02-02 17:18:20

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v5 08/13] x86/fsgsbase/64: Introduce the FIND_PERCPU_BASE macro

On Fri, Feb 1, 2019 at 12:55 PM Chang S. Bae <[email protected]> wrote:
>
> GSBASE is used to find per-CPU data in the kernel. But when it is unknown,
> the per-CPU base can be found from the per_cpu_offset table with a CPU NR.
> The CPU NR is extracted from the limit field of the CPUNODE entry in GDT,
> or by the RDPID instruction.
>
> Also, add the GAS-compatible RDPID macro.
>
> The new macro will be used in a following patch.
>
> Suggested-by: H. Peter Anvin <[email protected]>
> Signed-off-by: Chang S. Bae <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Andi Kleen <[email protected]>
> ---
> arch/x86/include/asm/fsgsbase.h | 46 +++++++++++++++++++++++++++++++++
> arch/x86/include/asm/inst.h | 15 +++++++++++
> 2 files changed, 61 insertions(+)
>
> diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
> index aefd53767a5d..eecca2250748 100644
> --- a/arch/x86/include/asm/fsgsbase.h
> +++ b/arch/x86/include/asm/fsgsbase.h
> @@ -78,6 +78,52 @@ extern void x86_gsbase_write_cpu_inactive(unsigned long gsbase);
>
> #endif /* CONFIG_X86_64 */
>
> +#else /* __ASSEMBLY__ */
> +
> +#ifdef CONFIG_X86_64
> +
> +#include <asm/inst.h>
> +
> +#if CONFIG_SMP

ifdef?

> +
> +/*
> + * CPU/node NR is loaded from the limit (size) field of a special segment
> + * descriptor entry in GDT.
> + */
> +.macro LOAD_CPU_AND_NODE_SEG_LIMIT reg:req
> + movq $__CPUNODE_SEG, \reg
> + lsl \reg, \reg
> +.endm
> +

Please put the alternative in here instead of in FIND_PERCPU_BASE.

> +/*
> + * Fetch the per-CPU GSBASE value for this processor and put it in @reg.
> + * We normally use %gs for accessing per-CPU data, but we are setting up
> + * %gs here and obviously can not use %gs itself to access per-CPU data.
> + */
> +.macro FIND_PERCPU_BASE reg:req
> + /*
> + * The CPU/node NR is initialized earlier, directly in cpu_init().
> + * The CPU NR is extracted from it.
> + */

This comment is unnecessary.

> + ALTERNATIVE \
> + "LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
> + "RDPID \reg", \
> + X86_FEATURE_RDPID
> + andq $VDSO_CPUNODE_MASK, \reg
> + movq __per_cpu_offset(, \reg, 8), \reg
> +.endm
> +
> +#else
> +
> +.macro FIND_PERCPU_BASE reg:req
> + /* Tracking the base offset value */

I don't understand this comment at all. Please just remove it.

2019-02-05 00:08:26

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v5 03/13] kbuild: Raise the minimum required binutils version to 2.21

On Fri, 1 Feb 2019 18:45:44 -0800 Andy Lutomirski <[email protected]> wrote:

> On Fri, Feb 1, 2019 at 12:54 PM Chang S. Bae <[email protected]> wrote:
> >
> > It helps to use some new instructions directly in inline assembly.
>
> akpm, can you ack this patch? AFAIK you are the only, or at least
> most vocal, user of ancient userspace to build new kernels. Are you
> okay with this?

Acked-by: Andrew Morton <[email protected]>

That little initiative got beaten into submission :(

2019-02-05 06:39:09

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v5 00/13] x86: Enable FSGSBASE instructions

On February 1, 2019 6:43:25 PM PST, Andy Lutomirski <[email protected]> wrote:
>Hi hpa-
>
>A while back, you were working on some patches to make modify_ldt()
>play better with this series. What happened to them?
>
>--Andy

Looks like I need to dig them out...
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2019-02-05 21:22:37

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v5 01/13] taint: Introduce a new taint flag (insecure)

On Fri, 1 Feb 2019 18:42:29 -0800 Andy Lutomirski <[email protected]> wrote:

> On Fri, Feb 1, 2019 at 12:54 PM Chang S. Bae <[email protected]> wrote:
> >
> > For testing (or root-only) purposes, the new flag will serve to tag the
> > kernel taint accurately.
> >
> > When adding a new feature support, patches need to be incrementally
> > applied and tested with temporal parameters. Currently, there is no flag
> > for this usage.
>
> I think this should be reviewed by someone like akpm. akpm, for
> background, this is part of an x86 patch series. If only part of the
> series is applied, the kernel will be blatantly insecure (but still
> functional and useful for testing and bisection), and this taint flag
> will be set if this kernel is booted. With the whole series applied,
> there are no users of the taint flag in the kernel.
>
> Do you think this is a good idea?

What does "temporal parameters" mean? A complete description of this
testing process would help.

It sounds a bit strange. You mean it assumes that people will partially
apply the series to test its functionality? That would be inconvenient.

- Can the new and now-unused taint flag be removed again at
end-of-series?

- It would be a lot more convenient if we had some means of testing
after the whole series is applied, on a permanent basis - some
debugfs flag, perhaps?

2019-02-05 22:47:12

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH v5 01/13] taint: Introduce a new taint flag (insecure)

On 2/5/19 1:21 PM, Andrew Morton wrote:
> On Fri, 1 Feb 2019 18:42:29 -0800 Andy Lutomirski <[email protected]> wrote:
>
>> On Fri, Feb 1, 2019 at 12:54 PM Chang S. Bae <[email protected]> wrote:
>>>
>>> For testing (or root-only) purposes, the new flag will serve to tag the
>>> kernel taint accurately.
>>>
>>> When adding a new feature support, patches need to be incrementally
>>> applied and tested with temporal parameters. Currently, there is no flag
>>> for this usage.
>>
>> I think this should be reviewed by someone like akpm. akpm, for
>> background, this is part of an x86 patch series. If only part of the
>> series is applied, the kernel will be blatantly insecure (but still
>> functional and useful for testing and bisection), and this taint flag
>> will be set if this kernel is booted. With the whole series applied,
>> there are no users of the taint flag in the kernel.
>>
>> Do you think this is a good idea?
>
> What does "temporal parameters" mean? A complete description of this
> testing process would help.
>
> It sounds a bit strange. You mean it assumes that people will partially
> apply the series to test its functionality? That would be inconvenient.

Ack. I don't think we need to (or should) worry about that kind of
muckup.

> - Can the new and now-unused taint flag be removed again at
> end-of-series?
>
> - It would be a lot more convenient if we had some means of testing
> after the whole series is applied, on a permanent basis - some
> debugfs flag, perhaps?
>


--
~Randy

2019-02-05 23:46:14

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v5 01/13] taint: Introduce a new taint flag (insecure)

On February 5, 2019 2:46:11 PM PST, Randy Dunlap <[email protected]> wrote:
> On 2/5/19 1:21 PM, Andrew Morton wrote:
>> On Fri, 1 Feb 2019 18:42:29 -0800 Andy Lutomirski <[email protected]> wrote:
>>
>>> On Fri, Feb 1, 2019 at 12:54 PM Chang S. Bae <[email protected]> wrote:
>>>>
>>>> For testing (or root-only) purposes, the new flag will serve to tag the
>>>> kernel taint accurately.
>>>>
>>>> When adding a new feature support, patches need to be incrementally
>>>> applied and tested with temporal parameters. Currently, there is no flag
>>>> for this usage.
>>>
>>> I think this should be reviewed by someone like akpm. akpm, for
>>> background, this is part of an x86 patch series. If only part of the
>>> series is applied, the kernel will be blatantly insecure (but still
>>> functional and useful for testing and bisection), and this taint flag
>>> will be set if this kernel is booted. With the whole series applied,
>>> there are no users of the taint flag in the kernel.
>>>
>>> Do you think this is a good idea?
>>
>> What does "temporal parameters" mean? A complete description of this
>> testing process would help.
>>
>> It sounds a bit strange. You mean it assumes that people will partially
>> apply the series to test its functionality? That would be inconvenient.
>
> Ack. I don't think we need to (or should) worry about that kind of
> muckup.
>
>> - Can the new and now-unused taint flag be removed again at
>> end-of-series?
>>
>> - It would be a lot more convenient if we had some means of testing
>> after the whole series is applied, on a permanent basis - some
>> debugfs flag, perhaps?

I would like to see this taint flag, though, because sometimes it is useful
to write test modules (e.g. when I was testing SMAP) which are dangerous even
if out of tree.

In case an escape or pilot error gets such a module into the wrong kernel, it
is a very good thing to have the kernel flagged.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2019-02-14 00:21:23

by Chang S. Bae

[permalink] [raw]
Subject: Re: [PATCH v5 08/13] x86/fsgsbase/64: Introduce the FIND_PERCPU_BASE macro


> On Feb 2, 2019, at 09:17, Andy Lutomirski <[email protected]> wrote:
>
> On Fri, Feb 1, 2019 at 12:55 PM Chang S. Bae <[email protected]> wrote:
>>
>> +
>> +/*
>> + * CPU/node NR is loaded from the limit (size) field of a special segment
>> + * descriptor entry in GDT.
>> + */
>> +.macro LOAD_CPU_AND_NODE_SEG_LIMIT reg:req
>> + movq $__CPUNODE_SEG, \reg
>> + lsl \reg, \reg
>> +.endm
>> +
>
> Please put the alternative in here instead of in FIND_PERCPU_BASE.

I would like to apply your comment; however, build errors will come up due to
__CPUNODE_SEG. So, I am still hoping the ALTERNATIVE line below is
straightforward enough. I will send out a revision that reflects all the
other comments shortly.

Chang

>
>> +/*
>> + * Fetch the per-CPU GSBASE value for this processor and put it in @reg.
>> + * We normally use %gs for accessing per-CPU data, but we are setting up
>> + * %gs here and obviously can not use %gs itself to access per-CPU data.
>> + */
>> +.macro FIND_PERCPU_BASE reg:req
>>
>> + ALTERNATIVE \
>> + "LOAD_CPU_AND_NODE_SEG_LIMIT \reg", \
>> + "RDPID \reg", \
>> + X86_FEATURE_RDPID
>> + andq $VDSO_CPUNODE_MASK, \reg
>> + movq __per_cpu_offset(, \reg, 8), \reg
>> +.endm