2018-06-20 23:18:56

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v4 0/7] x86: infrastructure to enable FSGSBASE

Given feedbacks from [1], it was suggested to separate two parts
and to (re-)submit this patchset first.

To facilitate FSGSBASE, helper functions and refactoring work
are incorporated. Also, it includes Andy's fix for accurate
FS/GS base read and cleanup for the vDSO initialization.

Changes from V3 [4]:
* Unify CPU number initialization
* Rebase on 4.18-rc1

Changes from V2 [3]:
* Bisect the CPU number initialization patch
* Drop patches for introducing i386 CPU_NUMBER and switching
write_rdtscp_aux() to use wrmsr_safe()

Changes from V1 [2]:
* Rename the x86-64 CPU_NUMBER segment from PER_CPU
* Add i386 CPU_NUMBER equivalent to x86-64 at GDT entry 23
* Add additional helper function to store CPU number
* Switch write_rdtscp_aux() to use wrmsr_safe()

[1] FSGSBASE patch set V2: https://lkml.org/lkml/2018/5/31/686
[2] infrastructure for FSGSBASE V1: https://lkml.org/lkml/2018/6/4/887
[3] V2: https://lkml.org/lkml/2018/6/6/582
[4] V3: https://lkml.org/lkml/2018/6/7/975

Andy Lutomirski (1):
x86/fsgsbase/64: Make ptrace read FS/GS base accurately

Chang S. Bae (6):
x86/fsgsbase/64: Introduce FS/GS base helper functions
x86/fsgsbase/64: Use FS/GS base helpers in core dump
x86/fsgsbase/64: Factor out load FS/GS segments from __switch_to
x86/segments/64: Rename PER_CPU segment to CPU_NUMBER
x86/vdso: Introduce CPU number helper functions
x86/vdso: Move out the CPU number store

arch/x86/entry/vdso/vgetcpu.c | 4 +-
arch/x86/entry/vdso/vma.c | 38 +--------
arch/x86/include/asm/elf.h | 6 +-
arch/x86/include/asm/fsgsbase.h | 47 +++++++++++
arch/x86/include/asm/segment.h | 33 +++++++-
arch/x86/include/asm/vgtod.h | 4 +-
arch/x86/kernel/cpu/common.c | 28 +++++++
arch/x86/kernel/process_64.c | 181 +++++++++++++++++++++++++++++++---------
arch/x86/kernel/ptrace.c | 28 ++-----
9 files changed, 263 insertions(+), 106 deletions(-)
create mode 100644 arch/x86/include/asm/fsgsbase.h

--
2.7.4



2018-06-20 23:16:36

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v4 4/7] x86/fsgsbase/64: Factor out load FS/GS segments from __switch_to

Instead of open coding the calls to load_seg_legacy(), add a
load_fsgs() helper to handle fs and gs. When FSGSBASE is enabled,
load_fsgs() will be updated.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
---
arch/x86/kernel/process_64.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index e498671..cebf240 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -279,6 +279,15 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
}
}

+static __always_inline void load_fsgs(struct thread_struct *prev,
+ struct thread_struct *next)
+{
+ load_seg_legacy(prev->fsindex, prev->fsbase,
+ next->fsindex, next->fsbase, FS);
+ load_seg_legacy(prev->gsindex, prev->gsbase,
+ next->gsindex, next->gsbase, GS);
+}
+
static unsigned long task_seg_base(struct task_struct *task,
unsigned short selector)
{
@@ -588,10 +597,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
if (unlikely(next->ds | prev->ds))
loadsegment(ds, next->ds);

- load_seg_legacy(prev->fsindex, prev->fsbase,
- next->fsindex, next->fsbase, FS);
- load_seg_legacy(prev->gsindex, prev->gsbase,
- next->gsindex, next->gsbase, GS);
+ load_fsgs(prev, next);

switch_fpu_finish(next_fpu, cpu);

--
2.7.4


2018-06-20 23:16:52

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v4 2/7] x86/fsgsbase/64: Make ptrace read FS/GS base accurately

From: Andy Lutomirski <[email protected]>

ptrace can read FS/GS base using the register access API
(PTRACE_PEEKUSER, etc) or PTRACE_ARCH_PRCTL. Make both of these
mechanisms return the actual FS/GS base.

This will improve debuggability by providing the correct information
to ptracer (GDB and etc).

Signed-off-by: Andy Lutomirski <[email protected]>
[chang: Rebase and revise patch description]
Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
arch/x86/kernel/process_64.c | 67 +++++++++++++++++++++++++++++++++-----------
1 file changed, 51 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index ace0158..e498671 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -279,6 +279,49 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
}
}

+static unsigned long task_seg_base(struct task_struct *task,
+ unsigned short selector)
+{
+ unsigned short idx = selector >> 3;
+ unsigned long base;
+
+ if (likely((selector & SEGMENT_TI_MASK) == 0)) {
+ if (unlikely(idx >= GDT_ENTRIES))
+ return 0;
+
+ /*
+ * There are no user segments in the GDT with nonzero bases
+ * other than the TLS segments.
+ */
+ if (idx < GDT_ENTRY_TLS_MIN || idx > GDT_ENTRY_TLS_MAX)
+ return 0;
+
+ idx -= GDT_ENTRY_TLS_MIN;
+ base = get_desc_base(&task->thread.tls_array[idx]);
+ } else {
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+ struct ldt_struct *ldt;
+
+ /*
+ * If performance here mattered, we could protect the LDT
+ * with RCU. This is a slow path, though, so we can just
+ * take the mutex.
+ */
+ mutex_lock(&task->mm->context.lock);
+ ldt = task->mm->context.ldt;
+ if (unlikely(idx >= ldt->nr_entries))
+ base = 0;
+ else
+ base = get_desc_base(ldt->entries + idx);
+ mutex_unlock(&task->mm->context.lock);
+#else
+ base = 0;
+#endif
+ }
+
+ return base;
+}
+
void write_fsbase(unsigned long fsbase)
{
/* set the selector to 0 to not confuse __switch_to */
@@ -297,16 +340,12 @@ unsigned long read_task_fsbase(struct task_struct *task)
{
unsigned long fsbase;

- if (task == current) {
+ if (task == current)
fsbase = read_fsbase();
- } else {
- /*
- * XXX: This will not behave as expected if called
- * if fsindex != 0. This preserves an existing bug
- * that will be fixed.
- */
+ else if (task->thread.fsindex == 0)
fsbase = task->thread.fsbase;
- }
+ else
+ fsbase = task_seg_base(task, task->thread.fsindex);

return fsbase;
}
@@ -315,16 +354,12 @@ unsigned long read_task_gsbase(struct task_struct *task)
{
unsigned long gsbase;

- if (task == current) {
+ if (task == current)
gsbase = read_inactive_gsbase();
- } else {
- /*
- * XXX: This will not behave as expected if called
- * if gsindex != 0. Same bug preservation as above
- * read_task_fsbase.
- */
+ else if (task->thread.gsindex == 0)
gsbase = task->thread.gsbase;
- }
+ else
+ gsbase = task_seg_base(task, task->thread.gsindex);

return gsbase;
}
--
2.7.4


2018-06-20 23:17:11

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v4 1/7] x86/fsgsbase/64: Introduce FS/GS base helper functions

With new helpers, FS/GS base access is centralized.
Eventually, when FSGSBASE instruction enabled, it will
be faster.

The helpers are used on ptrace APIs (PTRACE_ARCH_PRCTL,
PTRACE_SETREG, PTRACE_GETREG, etc). Idea is to keep
the FS/GS-update mechanism organized.

"inactive" GS base refers to base backed up at kernel
entries and of inactive (user) task's.

The bug that returns stale FS/GS base value (when index
is nonzero) is preserved and will be fixed by next
patch.

Based-on-code-from: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fsgsbase.h | 47 ++++++++++++++
arch/x86/kernel/process_64.c | 132 +++++++++++++++++++++++++++++-----------
arch/x86/kernel/ptrace.c | 28 +++------
3 files changed, 153 insertions(+), 54 deletions(-)
create mode 100644 arch/x86/include/asm/fsgsbase.h

diff --git a/arch/x86/include/asm/fsgsbase.h b/arch/x86/include/asm/fsgsbase.h
new file mode 100644
index 0000000..f00a8a6
--- /dev/null
+++ b/arch/x86/include/asm/fsgsbase.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_FSGSBASE_H
+#define _ASM_FSGSBASE_H 1
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_X86_64
+
+#include <asm/msr-index.h>
+
+/*
+ * Read/write a task's fsbase or gsbase. This returns the value that
+ * the FS/GS base would have (if the task were to be resumed). These
+ * work on current or on a different non-running task.
+ */
+unsigned long read_task_fsbase(struct task_struct *task);
+unsigned long read_task_gsbase(struct task_struct *task);
+int write_task_fsbase(struct task_struct *task, unsigned long fsbase);
+int write_task_gsbase(struct task_struct *task, unsigned long gsbase);
+
+/* Helper functions for reading/writing FS/GS base */
+
+static inline unsigned long read_fsbase(void)
+{
+ unsigned long fsbase;
+
+ rdmsrl(MSR_FS_BASE, fsbase);
+ return fsbase;
+}
+
+void write_fsbase(unsigned long fsbase);
+
+static inline unsigned long read_inactive_gsbase(void)
+{
+ unsigned long gsbase;
+
+ rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
+ return gsbase;
+}
+
+void write_inactive_gsbase(unsigned long gsbase);
+
+#endif /* CONFIG_X86_64 */
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_FSGSBASE_H */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 12bb445..ace0158 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -54,6 +54,7 @@
#include <asm/vdso.h>
#include <asm/intel_rdt_sched.h>
#include <asm/unistd.h>
+#include <asm/fsgsbase.h>
#ifdef CONFIG_IA32_EMULATION
/* Not included via unistd.h */
#include <asm/unistd_32_ia32.h>
@@ -278,6 +279,94 @@ static __always_inline void load_seg_legacy(unsigned short prev_index,
}
}

+void write_fsbase(unsigned long fsbase)
+{
+ /* set the selector to 0 to not confuse __switch_to */
+ loadseg(FS, 0);
+ wrmsrl(MSR_FS_BASE, fsbase);
+}
+
+void write_inactive_gsbase(unsigned long gsbase)
+{
+ /* set the selector to 0 to not confuse __switch_to */
+ loadseg(GS, 0);
+ wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
+}
+
+unsigned long read_task_fsbase(struct task_struct *task)
+{
+ unsigned long fsbase;
+
+ if (task == current) {
+ fsbase = read_fsbase();
+ } else {
+ /*
+ * XXX: This will not behave as expected if called
+ * if fsindex != 0. This preserves an existing bug
+ * that will be fixed.
+ */
+ fsbase = task->thread.fsbase;
+ }
+
+ return fsbase;
+}
+
+unsigned long read_task_gsbase(struct task_struct *task)
+{
+ unsigned long gsbase;
+
+ if (task == current) {
+ gsbase = read_inactive_gsbase();
+ } else {
+ /*
+ * XXX: This will not behave as expected if called
+ * if gsindex != 0. Same bug preservation as above
+ * read_task_fsbase.
+ */
+ gsbase = task->thread.gsbase;
+ }
+
+ return gsbase;
+}
+
+int write_task_fsbase(struct task_struct *task, unsigned long fsbase)
+{
+ int cpu;
+
+ /*
+ * Not strictly needed for fs, but do it for symmetry
+ * with gs
+ */
+ if (unlikely(fsbase >= TASK_SIZE_MAX))
+ return -EPERM;
+
+ cpu = get_cpu();
+ task->thread.fsbase = fsbase;
+ if (task == current)
+ write_fsbase(fsbase);
+ task->thread.fsindex = 0;
+ put_cpu();
+
+ return 0;
+}
+
+int write_task_gsbase(struct task_struct *task, unsigned long gsbase)
+{
+ int cpu;
+
+ if (unlikely(gsbase >= TASK_SIZE_MAX))
+ return -EPERM;
+
+ cpu = get_cpu();
+ task->thread.gsbase = gsbase;
+ if (task == current)
+ write_inactive_gsbase(gsbase);
+ task->thread.gsindex = 0;
+ put_cpu();
+
+ return 0;
+}
+
int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
unsigned long arg, struct task_struct *p, unsigned long tls)
{
@@ -618,54 +707,27 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
{
int ret = 0;
- int doit = task == current;
- int cpu;

switch (option) {
- case ARCH_SET_GS:
- if (arg2 >= TASK_SIZE_MAX)
- return -EPERM;
- cpu = get_cpu();
- task->thread.gsindex = 0;
- task->thread.gsbase = arg2;
- if (doit) {
- load_gs_index(0);
- ret = wrmsrl_safe(MSR_KERNEL_GS_BASE, arg2);
- }
- put_cpu();
+ case ARCH_SET_GS: {
+ ret = write_task_gsbase(task, arg2);
break;
- case ARCH_SET_FS:
- /* Not strictly needed for fs, but do it for symmetry
- with gs */
- if (arg2 >= TASK_SIZE_MAX)
- return -EPERM;
- cpu = get_cpu();
- task->thread.fsindex = 0;
- task->thread.fsbase = arg2;
- if (doit) {
- /* set the selector to 0 to not confuse __switch_to */
- loadsegment(fs, 0);
- ret = wrmsrl_safe(MSR_FS_BASE, arg2);
- }
- put_cpu();
+ }
+ case ARCH_SET_FS: {
+ ret = write_task_fsbase(task, arg2);
break;
+ }
case ARCH_GET_FS: {
unsigned long base;

- if (doit)
- rdmsrl(MSR_FS_BASE, base);
- else
- base = task->thread.fsbase;
+ base = read_task_fsbase(task);
ret = put_user(base, (unsigned long __user *)arg2);
break;
}
case ARCH_GET_GS: {
unsigned long base;

- if (doit)
- rdmsrl(MSR_KERNEL_GS_BASE, base);
- else
- base = task->thread.gsbase;
+ base = read_task_gsbase(task);
ret = put_user(base, (unsigned long __user *)arg2);
break;
}
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index e2ee403..c53c2bcf6 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -39,6 +39,7 @@
#include <asm/hw_breakpoint.h>
#include <asm/traps.h>
#include <asm/syscall.h>
+#include <asm/fsgsbase.h>

#include "tls.h"

@@ -396,12 +397,11 @@ static int putreg(struct task_struct *child,
if (value >= TASK_SIZE_MAX)
return -EIO;
/*
- * When changing the segment base, use do_arch_prctl_64
- * to set either thread.fs or thread.fsindex and the
- * corresponding GDT slot.
+ * When changing the FS base, use the same
+ * mechanism as for do_arch_prctl_64
*/
if (child->thread.fsbase != value)
- return do_arch_prctl_64(child, ARCH_SET_FS, value);
+ return write_task_fsbase(child, value);
return 0;
case offsetof(struct user_regs_struct,gs_base):
/*
@@ -410,7 +410,7 @@ static int putreg(struct task_struct *child,
if (value >= TASK_SIZE_MAX)
return -EIO;
if (child->thread.gsbase != value)
- return do_arch_prctl_64(child, ARCH_SET_GS, value);
+ return write_task_gsbase(child, value);
return 0;
#endif
}
@@ -434,20 +434,10 @@ static unsigned long getreg(struct task_struct *task, unsigned long offset)
return get_flags(task);

#ifdef CONFIG_X86_64
- case offsetof(struct user_regs_struct, fs_base): {
- /*
- * XXX: This will not behave as expected if called on
- * current or if fsindex != 0.
- */
- return task->thread.fsbase;
- }
- case offsetof(struct user_regs_struct, gs_base): {
- /*
- * XXX: This will not behave as expected if called on
- * current or if fsindex != 0.
- */
- return task->thread.gsbase;
- }
+ case offsetof(struct user_regs_struct, fs_base):
+ return read_task_fsbase(task);
+ case offsetof(struct user_regs_struct, gs_base):
+ return read_task_gsbase(task);
#endif
}

--
2.7.4


2018-06-20 23:17:23

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v4 6/7] x86/vdso: Introduce CPU number helper functions

CPU number initialization in vDSO is now a bit cleaned up by
the new helper functions. The helper functions will take
care of combing CPU and node number and reading each from
the combined value.

Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
---
arch/x86/entry/vdso/vgetcpu.c | 4 ++--
arch/x86/entry/vdso/vma.c | 16 ++++++++--------
arch/x86/include/asm/segment.h | 28 ++++++++++++++++++++++++++++
arch/x86/include/asm/vgtod.h | 2 --
4 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/arch/x86/entry/vdso/vgetcpu.c b/arch/x86/entry/vdso/vgetcpu.c
index 8ec3d1f..3284069 100644
--- a/arch/x86/entry/vdso/vgetcpu.c
+++ b/arch/x86/entry/vdso/vgetcpu.c
@@ -18,9 +18,9 @@ __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
p = __getcpu();

if (cpu)
- *cpu = p & VGETCPU_CPU_MASK;
+ *cpu = lsl_tscp_to_cpu(p);
if (node)
- *node = p >> 12;
+ *node = lsl_tscp_to_node(p);
return 0;
}

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 833e229..1fc93da 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -340,19 +340,19 @@ static void vgetcpu_cpu_init(void *arg)
int cpu = smp_processor_id();
struct desc_struct d = { };
unsigned long node = 0;
+ unsigned long cpu_number = 0;
#ifdef CONFIG_NUMA
node = cpu_to_node(cpu);
#endif
+ cpu_number = make_lsl_tscp(cpu, node);
+
if (static_cpu_has(X86_FEATURE_RDTSCP))
- write_rdtscp_aux((node << 12) | cpu);
+ write_rdtscp_aux(cpu_number);
+
+ /* Store cpu number in limit */
+ d.limit0 = cpu_number;
+ d.limit1 = cpu_number >> 16;

- /*
- * Store cpu number in limit so that it can be loaded
- * quickly in user space in vgetcpu. (12 bits for the CPU
- * and 8 bits for the node)
- */
- d.limit0 = cpu | ((node & 0xf) << 12);
- d.limit1 = node >> 4;
d.type = 5; /* RO data, expand down, accessed */
d.dpl = 3; /* Visible to user code */
d.s = 1; /* Not a system segment */
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index e3e788ea..18fec9e 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -224,6 +224,34 @@
#define GDT_ENTRY_TLS_ENTRIES 3
#define TLS_SIZE (GDT_ENTRY_TLS_ENTRIES* 8)

+#ifdef CONFIG_X86_64
+
+/* Bit size and mask of CPU number stored in the per CPU data (and TSC_AUX) */
+#define LSL_TSCP_CPU_SIZE 12
+#define LSL_TSCP_CPU_MASK 0xfff
+
+#ifndef __ASSEMBLY__
+
+/* Helper functions to store/load CPU and node numbers */
+
+static inline unsigned long make_lsl_tscp(int cpu, unsigned long node)
+{
+ return ((node << LSL_TSCP_CPU_SIZE) | cpu);
+}
+
+static inline unsigned int lsl_tscp_to_cpu(unsigned long x)
+{
+ return (x & LSL_TSCP_CPU_MASK);
+}
+
+static inline unsigned int lsl_tscp_to_node(unsigned long x)
+{
+ return (x >> LSL_TSCP_CPU_SIZE);
+}
+
+#endif /* !__ASSEMBLY__ */
+#endif /* CONFIG_X86_64 */
+
#ifdef __KERNEL__

/*
diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index 9cd9036..24e69b3 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -79,8 +79,6 @@ static inline void gtod_write_end(struct vsyscall_gtod_data *s)

#ifdef CONFIG_X86_64

-#define VGETCPU_CPU_MASK 0xfff
-
static inline unsigned int __getcpu(void)
{
unsigned int p;
--
2.7.4


2018-06-20 23:17:56

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v4 5/7] x86/segments/64: Rename PER_CPU segment to CPU_NUMBER

64-bit doesn't use the entry for per CPU data, but for CPU
numbers. The change will clarify the real usage of this
entry in GDT.

Suggested-by: H. Peter Anvin <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
---
arch/x86/entry/vdso/vma.c | 5 ++++-
arch/x86/include/asm/segment.h | 5 ++---
arch/x86/include/asm/vgtod.h | 2 +-
3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 5b8b556..833e229 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -359,7 +359,10 @@ static void vgetcpu_cpu_init(void *arg)
d.p = 1; /* Present */
d.d = 1; /* 32-bit */

- write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_PER_CPU, &d, DESCTYPE_S);
+ write_gdt_entry(get_cpu_gdt_rw(cpu),
+ GDT_ENTRY_CPU_NUMBER,
+ &d,
+ DESCTYPE_S);
}

static int vgetcpu_online(unsigned int cpu)
diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h
index e293c12..e3e788ea 100644
--- a/arch/x86/include/asm/segment.h
+++ b/arch/x86/include/asm/segment.h
@@ -186,8 +186,7 @@
#define GDT_ENTRY_TLS_MIN 12
#define GDT_ENTRY_TLS_MAX 14

-/* Abused to load per CPU data from limit */
-#define GDT_ENTRY_PER_CPU 15
+#define GDT_ENTRY_CPU_NUMBER 15

/*
* Number of entries in the GDT table:
@@ -207,7 +206,7 @@
#define __USER_DS (GDT_ENTRY_DEFAULT_USER_DS*8 + 3)
#define __USER32_DS __USER_DS
#define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS*8 + 3)
-#define __PER_CPU_SEG (GDT_ENTRY_PER_CPU*8 + 3)
+#define __CPU_NUMBER_SEG (GDT_ENTRY_CPU_NUMBER*8 + 3)

#endif

diff --git a/arch/x86/include/asm/vgtod.h b/arch/x86/include/asm/vgtod.h
index fb856c9..9cd9036 100644
--- a/arch/x86/include/asm/vgtod.h
+++ b/arch/x86/include/asm/vgtod.h
@@ -96,7 +96,7 @@ static inline unsigned int __getcpu(void)
alternative_io ("lsl %[p],%[seg]",
".byte 0xf3,0x0f,0xc7,0xf8", /* RDPID %eax/rax */
X86_FEATURE_RDPID,
- [p] "=a" (p), [seg] "r" (__PER_CPU_SEG));
+ [p] "=a" (p), [seg] "r" (__CPU_NUMBER_SEG));

return p;
}
--
2.7.4


2018-06-20 23:18:25

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v4 7/7] x86/vdso: Move out the CPU number store

The CPU (and node) number will be written, as early enough,
to the segment limit of per CPU data and TSC_AUX MSR entry.
The information has been retrieved by vgetcpu in user space
and will be also loaded from the paranoid entry, when
FSGSBASE enabled.

CPU number initialization will be done during each CPU
initialization, before setting up IST.

The redundant setting of the segment in entry/vdso/vma.c
was removed; a substantial code removal. It removes a
hotplug notifier, makes a facility useful to both the kernel
and userspace unconditionally available much sooner.
(Thanks to HPA for suggesting the cleanup)

Signed-off-by: Chang S. Bae <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
arch/x86/entry/vdso/vma.c | 41 +----------------------------------------
arch/x86/kernel/cpu/common.c | 28 ++++++++++++++++++++++++++++
2 files changed, 29 insertions(+), 40 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index 1fc93da..3f9d43f 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -332,43 +332,6 @@ static __init int vdso_setup(char *s)
return 0;
}
__setup("vdso=", vdso_setup);
-#endif
-
-#ifdef CONFIG_X86_64
-static void vgetcpu_cpu_init(void *arg)
-{
- int cpu = smp_processor_id();
- struct desc_struct d = { };
- unsigned long node = 0;
- unsigned long cpu_number = 0;
-#ifdef CONFIG_NUMA
- node = cpu_to_node(cpu);
-#endif
- cpu_number = make_lsl_tscp(cpu, node);
-
- if (static_cpu_has(X86_FEATURE_RDTSCP))
- write_rdtscp_aux(cpu_number);
-
- /* Store cpu number in limit */
- d.limit0 = cpu_number;
- d.limit1 = cpu_number >> 16;
-
- d.type = 5; /* RO data, expand down, accessed */
- d.dpl = 3; /* Visible to user code */
- d.s = 1; /* Not a system segment */
- d.p = 1; /* Present */
- d.d = 1; /* 32-bit */
-
- write_gdt_entry(get_cpu_gdt_rw(cpu),
- GDT_ENTRY_CPU_NUMBER,
- &d,
- DESCTYPE_S);
-}
-
-static int vgetcpu_online(unsigned int cpu)
-{
- return smp_call_function_single(cpu, vgetcpu_cpu_init, NULL, 1);
-}

static int __init init_vdso(void)
{
@@ -378,9 +341,7 @@ static int __init init_vdso(void)
init_vdso_image(&vdso_image_x32);
#endif

- /* notifier priority > KVM */
- return cpuhp_setup_state(CPUHP_AP_X86_VDSO_VMA_ONLINE,
- "x86/vdso/vma:online", vgetcpu_online, NULL);
+ return 0;
}
subsys_initcall(init_vdso);
#endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0df7151..a554e4a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1648,6 +1648,33 @@ static void wait_for_master_cpu(int cpu)
#endif
}

+#ifdef CONFIG_X86_64
+static void setup_cpu_number(int cpu)
+{
+ unsigned long node = early_cpu_to_node(cpu);
+ unsigned long cpu_number = make_lsl_tscp(cpu, node);
+ struct desc_struct d = { };
+
+ /* Store cpu number in limit */
+ d.limit0 = cpu_number;
+ d.limit1 = cpu_number >> 16;
+
+ d.type = 5; /* RO data, expand down, accessed */
+ d.dpl = 3; /* Visible to user code */
+ d.s = 1; /* Not a system segment */
+ d.p = 1; /* Present */
+ d.d = 1; /* 32-bit */
+
+ write_gdt_entry(get_cpu_gdt_rw(cpu),
+ GDT_ENTRY_CPU_NUMBER,
+ &d,
+ DESCTYPE_S);
+
+ if (static_cpu_has(X86_FEATURE_RDTSCP))
+ write_rdtscp_aux(cpu_number);
+}
+#endif
+
/*
* cpu_init() initializes state that is per-CPU. Some data is already
* initialized (naturally) in the bootstrap process, such as the GDT
@@ -1685,6 +1712,7 @@ void cpu_init(void)
early_cpu_to_node(cpu) != NUMA_NO_NODE)
set_numa_node(early_cpu_to_node(cpu));
#endif
+ setup_cpu_number(cpu);

me = current;

--
2.7.4


2018-06-20 23:19:22

by Chang S. Bae

[permalink] [raw]
Subject: [PATCH v4 3/7] x86/fsgsbase/64: Use FS/GS base helpers in core dump

When new FSGSBASE instructions enabled, this read will
become faster.

Based-on-code-from: Andy Lutomirski <[email protected]>
Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Andi Kleen <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Reviewed-by: Andy Lutomirski <[email protected]>
---
arch/x86/include/asm/elf.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 0d157d2..49acefb 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -10,6 +10,7 @@
#include <asm/ptrace.h>
#include <asm/user.h>
#include <asm/auxvec.h>
+#include <asm/fsgsbase.h>

typedef unsigned long elf_greg_t;

@@ -205,7 +206,6 @@ void set_personality_ia32(bool);

#define ELF_CORE_COPY_REGS(pr_reg, regs) \
do { \
- unsigned long base; \
unsigned v; \
(pr_reg)[0] = (regs)->r15; \
(pr_reg)[1] = (regs)->r14; \
@@ -228,8 +228,8 @@ do { \
(pr_reg)[18] = (regs)->flags; \
(pr_reg)[19] = (regs)->sp; \
(pr_reg)[20] = (regs)->ss; \
- rdmsrl(MSR_FS_BASE, base); (pr_reg)[21] = base; \
- rdmsrl(MSR_KERNEL_GS_BASE, base); (pr_reg)[22] = base; \
+ (pr_reg)[21] = read_fsbase(); \
+ (pr_reg)[22] = read_inactive_gsbase(); \
asm("movl %%ds,%0" : "=r" (v)); (pr_reg)[23] = v; \
asm("movl %%es,%0" : "=r" (v)); (pr_reg)[24] = v; \
asm("movl %%fs,%0" : "=r" (v)); (pr_reg)[25] = v; \
--
2.7.4


2018-06-22 14:30:04

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 1/7] x86/fsgsbase/64: Introduce FS/GS base helper functions

On Wed, 20 Jun 2018, Chang S. Bae wrote:
> +void write_fsbase(unsigned long fsbase)
> +{
> + /* set the selector to 0 to not confuse __switch_to */

'to not confuse __switch_to' is not that helpful of a comment as it
requires to stare into __switch_to to figure out what might get confused
there. Please write it out why this needs to be done in technical terms.

> + loadseg(FS, 0);
> + wrmsrl(MSR_FS_BASE, fsbase);
> +}
> +
> +void write_inactive_gsbase(unsigned long gsbase)
> +{
> + /* set the selector to 0 to not confuse __switch_to */

Ditto

> + loadseg(GS, 0);
> + wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
> +}
> +
> +unsigned long read_task_fsbase(struct task_struct *task)
> +{
> + unsigned long fsbase;
> +
> + if (task == current) {
> + fsbase = read_fsbase();
> + } else {
> + /*
> + * XXX: This will not behave as expected if called
> + * if fsindex != 0. This preserves an existing bug
> + * that will be fixed.

I'm late to this party, but let me ask the obvious question:

Why is the existing bug not fixed as the first patch in the series?

We do not preserve bugs when adding new stuff as that makes it a pain to
backport the fix.

> +int write_task_fsbase(struct task_struct *task, unsigned long fsbase)
> +{
> + int cpu;
> +
> + /*
> + * Not strictly needed for fs, but do it for symmetry
> + * with gs
> + */
> + if (unlikely(fsbase >= TASK_SIZE_MAX))
> + return -EPERM;
> +
> + cpu = get_cpu();

What's the point of using get_cpu()? There is nothing at all which needs
'cpu' here. The only point is to prevent preemption, then please use
preempt_disable() and not some random function which happens to disable
preemption underneath.

> + task->thread.fsbase = fsbase;
> + if (task == current)
> + write_fsbase(fsbase);
> + task->thread.fsindex = 0;
> + put_cpu();
> +
> + return 0;
> +}
> +
> +int write_task_gsbase(struct task_struct *task, unsigned long gsbase)
> +{
> + int cpu;
> +
> + if (unlikely(gsbase >= TASK_SIZE_MAX))
> + return -EPERM;
> +
> + cpu = get_cpu();

Ditto

> + task->thread.gsbase = gsbase;
> + if (task == current)
> + write_inactive_gsbase(gsbase);
> + task->thread.gsindex = 0;
> + put_cpu();
> +
> + return 0;
> +}
> +
> int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
> unsigned long arg, struct task_struct *p, unsigned long tls)
> {
> @@ -618,54 +707,27 @@ static long prctl_map_vdso(const struct vdso_image *image, unsigned long addr)
> long do_arch_prctl_64(struct task_struct *task, int option, unsigned long arg2)
> {
> int ret = 0;
> - int doit = task == current;
> - int cpu;
>
> switch (option) {
> - case ARCH_SET_GS:
> - if (arg2 >= TASK_SIZE_MAX)
> - return -EPERM;
> - cpu = get_cpu();

Ah. You copied it from here where it makes no sense either. Copy and paste
is useful, but you really want to think about it.

> diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
> index e2ee403..c53c2bcf6 100644
> --- a/arch/x86/kernel/ptrace.c
> +++ b/arch/x86/kernel/ptrace.c

....

> if (child->thread.fsbase != value)
> - return do_arch_prctl_64(child, ARCH_SET_FS, value);
> + return write_task_fsbase(child, value);

This patch wants to be split into:

1) Adding the new functions

2) Convert vdso

3) Convert ptrace

_AFTER_ fixing the existing bug.

Thanks,

tglx

2018-06-22 14:34:03

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] x86/segments/64: Rename PER_CPU segment to CPU_NUMBER

On Wed, 20 Jun 2018, Chang S. Bae wrote:
> diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
> index 5b8b556..833e229 100644
> --- a/arch/x86/entry/vdso/vma.c
> +++ b/arch/x86/entry/vdso/vma.c
> @@ -359,7 +359,10 @@ static void vgetcpu_cpu_init(void *arg)
> d.p = 1; /* Present */
> d.d = 1; /* 32-bit */
>
> - write_gdt_entry(get_cpu_gdt_rw(cpu), GDT_ENTRY_PER_CPU, &d, DESCTYPE_S);
> + write_gdt_entry(get_cpu_gdt_rw(cpu),
> + GDT_ENTRY_CPU_NUMBER,
> + &d,
> + DESCTYPE_S);

There is no value in using 4 lines for something which fits in one.

Other than that.

Reviewed-by: Thomas Gleixner <[email protected]>

2018-06-22 14:35:34

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 4/7] x86/fsgsbase/64: Factor out load FS/GS segments from __switch_to

On Wed, 20 Jun 2018, Chang S. Bae wrote:

> Instead of open coding the calls to load_seg_legacy(), add a
> load_fsgs() helper to handle fs and gs. When FSGSBASE is enabled,
> load_fsgs() will be updated.
>
> Signed-off-by: Chang S. Bae <[email protected]>
> Reviewed-by: Andi Kleen <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Reviewed-by: Andy Lutomirski <[email protected]>

Reviewed-by: Thomas Gleixner <[email protected]>

2018-06-22 14:38:44

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 2/7] x86/fsgsbase/64: Make ptrace read FS/GS base accurately

On Wed, 20 Jun 2018, Chang S. Bae wrote:

> From: Andy Lutomirski <[email protected]>
>
> ptrace can read FS/GS base using the register access API
> (PTRACE_PEEKUSER, etc) or PTRACE_ARCH_PRCTL. Make both of these
> mechanisms return the actual FS/GS base.

That's the 'BUG' fix, right? Ok, not something which needs to be urgently
backported it seems. It's surely a correctness issue, which improved
debugability, but surely not something crucial.

So the right ordering of the patches is:

1) Introduce task_seg_base() as a standalong patch w/o using it.

2) Add the new fsgs helpers and use task_seg_base() right away instead
of adding this misleading 'preserve bug' comment just to remove it
one patch later.

.....

Thanks,

tglx

2018-06-22 14:44:07

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 3/7] x86/fsgsbase/64: Use FS/GS base helpers in core dump

On Wed, 20 Jun 2018, Chang S. Bae wrote:

> When new FSGSBASE instructions enabled, this read will
> become faster.

The point is that it replaces the open coded access, which prevents using
the enhanced FSGSBASE mechanism.

Please be more careful with your changelog wordings. They really should
explain stuff proper.

> Based-on-code-from: Andy Lutomirski <[email protected]>
> Signed-off-by: Chang S. Bae <[email protected]>
> Reviewed-by: Andi Kleen <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Reviewed-by: Andy Lutomirski <[email protected]>

Other than that:

Reviewed-by: Thomas Gleixner <[email protected]>


2018-06-22 14:54:21

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v4 1/7] x86/fsgsbase/64: Introduce FS/GS base helper functions

On Fri, Jun 22, 2018 at 7:28 AM Thomas Gleixner <[email protected]> wrote:
>
> On Wed, 20 Jun 2018, Chang S. Bae wrote:
> > +void write_fsbase(unsigned long fsbase)
> > +{
> > + /* set the selector to 0 to not confuse __switch_to */
>
> 'to not confuse __switch_to' is not that helpful of a comment as it
> requires to stare into __switch_to to figure out what might get confused
> there. Please write it out why this needs to be done in technical terms.
>
> > + loadseg(FS, 0);
> > + wrmsrl(MSR_FS_BASE, fsbase);
> > +}
> > +
> > +void write_inactive_gsbase(unsigned long gsbase)
> > +{
> > + /* set the selector to 0 to not confuse __switch_to */
>
> Ditto
>
> > + loadseg(GS, 0);
> > + wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
> > +}
> > +
> > +unsigned long read_task_fsbase(struct task_struct *task)
> > +{
> > + unsigned long fsbase;
> > +
> > + if (task == current) {
> > + fsbase = read_fsbase();
> > + } else {
> > + /*
> > + * XXX: This will not behave as expected if called
> > + * if fsindex != 0. This preserves an existing bug
> > + * that will be fixed.
>
> I'm late to this party, but let me ask the obvious question:
>
> Why is the existing bug not fixed as the first patch in the series?

IIRC that was how I did it in the old version of this code. I think
it did it because it was less messy to fix the bug after cleaning up
the code, but I could be remembering wrong.

2018-06-22 15:15:24

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 6/7] x86/vdso: Introduce CPU number helper functions

On Wed, 20 Jun 2018, Chang S. Bae wrote:

> CPU number initialization in vDSO is now a bit cleaned up by
> the new helper functions. The helper functions will take
> care of combing CPU and node number and reading each from

s/combing/combining/ please

> the combined value.
>
> Suggested-by: Andy Lutomirski <[email protected]>
> Signed-off-by: Chang S. Bae <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Acked-by: Andy Lutomirski <[email protected]>
> ---
> arch/x86/entry/vdso/vgetcpu.c | 4 ++--
> arch/x86/entry/vdso/vma.c | 16 ++++++++--------
> arch/x86/include/asm/segment.h | 28 ++++++++++++++++++++++++++++
> arch/x86/include/asm/vgtod.h | 2 --
> 4 files changed, 38 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/entry/vdso/vgetcpu.c b/arch/x86/entry/vdso/vgetcpu.c
> index 8ec3d1f..3284069 100644
> --- a/arch/x86/entry/vdso/vgetcpu.c
> +++ b/arch/x86/entry/vdso/vgetcpu.c
> @@ -18,9 +18,9 @@ __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
> p = __getcpu();

While we are touching this, can we please as a first step change __getcpu()
to something else? I've tripped over this several times in the past and
confused it with a (nonexisting) variant of get_cpu().

>
> if (cpu)
> - *cpu = p & VGETCPU_CPU_MASK;
> + *cpu = lsl_tscp_to_cpu(p);
>
> if (node)
> - *node = p >> 12;
> + *node = lsl_tscp_to_node(p);

Are these new helpers going to be used at some other place than this? If
not, then there is really no point at all. Then just go and make this:

__vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
{
vdso_read_cpu_and_node(cpu, node);
return 0;
}

> diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
> index 833e229..1fc93da 100644
> --- a/arch/x86/entry/vdso/vma.c
> +++ b/arch/x86/entry/vdso/vma.c
> @@ -340,19 +340,19 @@ static void vgetcpu_cpu_init(void *arg)
> int cpu = smp_processor_id();
> struct desc_struct d = { };
> unsigned long node = 0;
> + unsigned long cpu_number = 0;

That's hardly a CPU number. It's encoded CPU and node information.

> #ifdef CONFIG_NUMA
> node = cpu_to_node(cpu);
> #endif

While at it please get rid of the ifdeffery. If CONFIG_NUMA=n then
cpu_to_node(cpu) resolves to (0).

> + cpu_number = make_lsl_tscp(cpu, node);

So the whole thing can be reduced to:

u64 cpudata = vdso_encode_cpu_and_node(cpu, cpu_to_node(cpu));

Or some other sensible function name.

Thanks,

tglx

2018-06-22 15:19:31

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] x86/vdso: Move out the CPU number store

On Wed, 20 Jun 2018, Chang S. Bae wrote:
>
> +#ifdef CONFIG_X86_64
> +static void setup_cpu_number(int cpu)

This function name is really misleading. vdso_setup_cpu_and_node() or
something like that clearly tells what this is about.

Thanks,

tglx

2018-06-22 15:41:50

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 1/7] x86/fsgsbase/64: Introduce FS/GS base helper functions

On Fri, 22 Jun 2018, Andy Lutomirski wrote:
> On Fri, Jun 22, 2018 at 7:28 AM Thomas Gleixner <[email protected]> wrote:
> > > +unsigned long read_task_fsbase(struct task_struct *task)
> > > +{
> > > + unsigned long fsbase;
> > > +
> > > + if (task == current) {
> > > + fsbase = read_fsbase();
> > > + } else {
> > > + /*
> > > + * XXX: This will not behave as expected if called
> > > + * if fsindex != 0. This preserves an existing bug
> > > + * that will be fixed.
> >
> > I'm late to this party, but let me ask the obvious question:
> >
> > Why is the existing bug not fixed as the first patch in the series?
>
> IIRC that was how I did it in the old version of this code. I think
> it did it because it was less messy to fix the bug after cleaning up
> the code, but I could be remembering wrong.

Fair enough. Though the general rule is: Fix bugs first and then do
features, unless you really need the extra step to fix it proper.

Now in that case the real question is whether this is a bug or just a
slight incorrectness which has no practical impact. If it's the latter,
then introduce the new function which does the right thing first and make
the new fs/gs base functions use it without having a blurb about preserving
bugs.

Thanks,

tglx




2018-06-22 16:11:59

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v4 1/7] x86/fsgsbase/64: Introduce FS/GS base helper functions



> On Jun 22, 2018, at 8:39 AM, Thomas Gleixner <[email protected]> wrote:
>
>> On Fri, 22 Jun 2018, Andy Lutomirski wrote:
>> On Fri, Jun 22, 2018 at 7:28 AM Thomas Gleixner <[email protected]> wrote:
>>>> +unsigned long read_task_fsbase(struct task_struct *task)
>>>> +{
>>>> + unsigned long fsbase;
>>>> +
>>>> + if (task == current) {
>>>> + fsbase = read_fsbase();
>>>> + } else {
>>>> + /*
>>>> + * XXX: This will not behave as expected if called
>>>> + * if fsindex != 0. This preserves an existing bug
>>>> + * that will be fixed.
>>>
>>> I'm late to this party, but let me ask the obvious question:
>>>
>>> Why is the existing bug not fixed as the first patch in the series?
>>
>> IIRC that was how I did it in the old version of this code. I think
>> it did it because it was less messy to fix the bug after cleaning up
>> the code, but I could be remembering wrong.
>
> Fair enough. Though the general rule is: Fix bugs first and then do
> features, unless you really need the extra step to fix it proper.
>
> Now in that case the real question is whether this is a bug or just a
> slight incorrectness which has no practical impact. If it's the latter,
> then introduce the new function which does the right thing first and make
> the new fs/gs base functions use it without having a blurb about preserving
> bugs.

The idea was to have one patch that was intended to have no observable effect (pure refactor) and another to change behavior in an easily reviewable way. I should probably not have used the word bug :)

>
> Thanks,
>
> tglx
>
>
>

2018-06-22 16:13:04

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] x86/vdso: Move out the CPU number store


> On Jun 22, 2018, at 8:18 AM, Thomas Gleixner <[email protected]> wrote:
>
>> On Wed, 20 Jun 2018, Chang S. Bae wrote:
>>
>> +#ifdef CONFIG_X86_64
>> +static void setup_cpu_number(int cpu)
>
> This function name is really misleading. vdso_setup_cpu_and_node() or
> something like that clearly tells what this is about.
>

Ah. The big secret is that it’s not just vDSO, or at lease hpa wants to start using it in the entry code.

> Thanks,
>
> tglx

2018-06-22 16:20:39

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] x86/vdso: Move out the CPU number store

On Fri, 22 Jun 2018, Andy Lutomirski wrote:
> > On Jun 22, 2018, at 8:18 AM, Thomas Gleixner <[email protected]> wrote:
> >
> >> On Wed, 20 Jun 2018, Chang S. Bae wrote:
> >>
> >> +#ifdef CONFIG_X86_64
> >> +static void setup_cpu_number(int cpu)
> >
> > This function name is really misleading. vdso_setup_cpu_and_node() or
> > something like that clearly tells what this is about.
> >
> Ah. The big secret is that it’s not just vDSO, or at lease hpa wants to
> start using it in the entry code.

Fair enough. Still setup_cpu_number() sucks.

Thanks,

tglx

2018-06-22 18:03:39

by Chang S. Bae

[permalink] [raw]
Subject: RE: [PATCH v4 7/7] x86/vdso: Move out the CPU number store

> Still setup_cpu_number() sucks.

How about setup_cpu_and_node_number()?

Thanks,
Chang

2018-06-22 18:14:14

by Chang S. Bae

[permalink] [raw]
Subject: RE: [PATCH v4 1/7] x86/fsgsbase/64: Introduce FS/GS base helper functions

> Ah. You copied it from here where it makes no sense either. Copy and paste
> is useful, but you really want to think about it.

That's the matter of fact though, let me revise those you pointed out here.

> This patch wants to be split into:
> 1) Adding the new functions
> 2) Convert vdso
> 3) Convert ptrace
> _AFTER_ fixing the existing bug.

Okay, I think it is reasonable to do that, (regardless of defining it as bug or not).
Thanks for the sharp reviews.

Chang

2018-06-22 18:47:54

by Chang S. Bae

[permalink] [raw]
Subject: RE: [PATCH v4 6/7] x86/vdso: Introduce CPU number helper functions

On Friday, 22 Jun 2018, Thomas Gleixner wrote:
> On Wed, 20 Jun 2018, Chang S. Bae wrote:
> > CPU number initialization in vDSO is now a bit cleaned up by
> > the new helper functions. The helper functions will take
> > care of combing CPU and node number and reading each from

> s/combing/combining/ please

Will fix.

> > @@ -18,9 +18,9 @@ __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
> > p = __getcpu();

> While we are touching this, can we please as a first step change __getcpu()
> to something else? I've tripped over this several times in the past and
> confused it with a (nonexisting) variant of get_cpu().

How about __vdso_get_cpudata()?

> > #ifdef CONFIG_NUMA
> > node = cpu_to_node(cpu);
> > #endif

> While at it please get rid of the ifdeffery. If CONFIG_NUMA=n then
> cpu_to_node(cpu) resolves to (0).

Okay, will fix this.


> >
> > if (cpu)
> > - *cpu = p & VGETCPU_CPU_MASK;
> > + *cpu = lsl_tscp_to_cpu(p);
> >
> > if (node)
> > - *node = p >> 12;
> > + *node = lsl_tscp_to_node(p);
>
> Are these new helpers going to be used at some other place than this? If
> not, then there is really no point at all. Then just go and make this:
>
> __vdso_getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *unused)
> {
> vdso_read_cpu_and_node(cpu, node);
> return 0;
> }

> > @@ -340,19 +340,19 @@ static void vgetcpu_cpu_init(void *arg)
> > int cpu = smp_processor_id();
> > struct desc_struct d = { };
> > unsigned long node = 0;
> > + unsigned long cpu_number = 0;
>
> That's hardly a CPU number. It's encoded CPU and node information.
>
> > + cpu_number = make_lsl_tscp(cpu, node);
>
> So the whole thing can be reduced to:
>
> u64 cpudata = vdso_encode_cpu_and_node(cpu, cpu_to_node(cpu));
>
> Or some other sensible function name.

Now, for these helper function renaming, I would like to check it from Andy.

Thanks,
Chang

2018-06-22 19:10:29

by Thomas Gleixner

[permalink] [raw]
Subject: RE: [PATCH v4 6/7] x86/vdso: Introduce CPU number helper functions

On Fri, 22 Jun 2018, Bae, Chang Seok wrote:
> On Friday, 22 Jun 2018, Thomas Gleixner wrote:
> > Or some other sensible function name.
>
> Now, for these helper function renaming, I would like to check it from Andy.

Sure I just made some up. Find some which are descriptive enough,

Thanks,

tglx

2018-06-22 22:04:37

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] x86/vdso: Move out the CPU number store

On Fri, Jun 22, 2018 at 11:02 AM Bae, Chang Seok
<[email protected]> wrote:
>
> > Still setup_cpu_number() sucks.
>
> How about setup_cpu_and_node_number()?

setup_fast_cpu_node_nr()? setup_cpunr_regs()? I don't have any
brilliant ideas. Thomas?

2018-06-25 15:41:57

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] x86/vdso: Move out the CPU number store

On Fri, 22 Jun 2018, Andy Lutomirski wrote:

> On Fri, Jun 22, 2018 at 11:02 AM Bae, Chang Seok
> <[email protected]> wrote:
> >
> > > Still setup_cpu_number() sucks.
> >
> > How about setup_cpu_and_node_number()?
>
> setup_fast_cpu_node_nr()? setup_cpunr_regs()? I don't have any
> brilliant ideas. Thomas?

It's really hard to find a good one. setup_cpu_node_cache() might work, but
it's not brilliant either.

Thanks,

tglx


2018-06-25 18:04:24

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] x86/vdso: Move out the CPU number store

On 06/25/18 08:40, Thomas Gleixner wrote:
> On Fri, 22 Jun 2018, Andy Lutomirski wrote:
>
>> On Fri, Jun 22, 2018 at 11:02 AM Bae, Chang Seok
>> <[email protected]> wrote:
>>>
>>>> Still setup_cpu_number() sucks.
>>>
>>> How about setup_cpu_and_node_number()?
>>
>> setup_fast_cpu_node_nr()? setup_cpunr_regs()? I don't have any
>> brilliant ideas. Thomas?
>
> It's really hard to find a good one. setup_cpu_node_cache() might work, but
> it's not brilliant either.
>

Since this applies to the getcpu(2) system call (in addition to the
kernel usage), perhaps setup_getcpu()?

-hpa


2018-06-25 19:41:51

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 7/7] x86/vdso: Move out the CPU number store

On Mon, 25 Jun 2018, H. Peter Anvin wrote:

> On 06/25/18 08:40, Thomas Gleixner wrote:
> > On Fri, 22 Jun 2018, Andy Lutomirski wrote:
> >
> >> On Fri, Jun 22, 2018 at 11:02 AM Bae, Chang Seok
> >> <[email protected]> wrote:
> >>>
> >>>> Still setup_cpu_number() sucks.
> >>>
> >>> How about setup_cpu_and_node_number()?
> >>
> >> setup_fast_cpu_node_nr()? setup_cpunr_regs()? I don't have any
> >> brilliant ideas. Thomas?
> >
> > It's really hard to find a good one. setup_cpu_node_cache() might work, but
> > it's not brilliant either.
> >
>
> Since this applies to the getcpu(2) system call (in addition to the
> kernel usage), perhaps setup_getcpu()?

Works for me.