2022-09-02 15:04:29

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH v2 22/59] x86: Put hot per CPU variables into a struct

From: Thomas Gleixner <[email protected]>

The layout of per-cpu variables is at the mercy of the compiler. This
can lead to random performance fluctuations from build to build.

Create a structure to hold some of the hottest per-cpu variables,
starting with current_task.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/include/asm/current.h | 19 ++++++++++++++++---
arch/x86/kernel/cpu/common.c | 14 +++++---------
arch/x86/kernel/process_32.c | 2 +-
arch/x86/kernel/process_64.c | 2 +-
arch/x86/kernel/smpboot.c | 2 +-
5 files changed, 24 insertions(+), 15 deletions(-)

--- a/arch/x86/include/asm/current.h
+++ b/arch/x86/include/asm/current.h
@@ -3,16 +3,29 @@
#define _ASM_X86_CURRENT_H

#include <linux/compiler.h>
-#include <asm/percpu.h>

#ifndef __ASSEMBLY__
+
+#include <linux/cache.h>
+#include <asm/percpu.h>
+
struct task_struct;

-DECLARE_PER_CPU(struct task_struct *, current_task);
+struct pcpu_hot {
+ union {
+ struct {
+ struct task_struct *current_task;
+ };
+ u8 pad[64];
+ };
+};
+static_assert(sizeof(struct pcpu_hot) == 64);
+
+DECLARE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot);

static __always_inline struct task_struct *get_current(void)
{
- return this_cpu_read_stable(current_task);
+ return this_cpu_read_stable(pcpu_hot.current_task);
}

#define current get_current()
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2000,18 +2000,16 @@ static __init int setup_clearcpuid(char
}
__setup("clearcpuid=", setup_clearcpuid);

+DEFINE_PER_CPU_ALIGNED(struct pcpu_hot, pcpu_hot) = {
+ .current_task = &init_task,
+};
+EXPORT_PER_CPU_SYMBOL(pcpu_hot);
+
#ifdef CONFIG_X86_64
DEFINE_PER_CPU_FIRST(struct fixed_percpu_data,
fixed_percpu_data) __aligned(PAGE_SIZE) __visible;
EXPORT_PER_CPU_SYMBOL_GPL(fixed_percpu_data);

-/*
- * The following percpu variables are hot. Align current_task to
- * cacheline size such that they fall in the same cacheline.
- */
-DEFINE_PER_CPU(struct task_struct *, current_task) ____cacheline_aligned =
- &init_task;
-EXPORT_PER_CPU_SYMBOL(current_task);

DEFINE_PER_CPU(void *, hardirq_stack_ptr);
DEFINE_PER_CPU(bool, hardirq_stack_inuse);
@@ -2071,8 +2069,6 @@ void syscall_init(void)

#else /* CONFIG_X86_64 */

-DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task;
-EXPORT_PER_CPU_SYMBOL(current_task);
DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
EXPORT_PER_CPU_SYMBOL(__preempt_count);

--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -207,7 +207,7 @@ EXPORT_SYMBOL_GPL(start_thread);
if (prev->gs | next->gs)
loadsegment(gs, next->gs);

- this_cpu_write(current_task, next_p);
+ raw_cpu_write(pcpu_hot.current_task, next_p);

switch_fpu_finish();

--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -616,7 +616,7 @@ void compat_start_thread(struct pt_regs
/*
* Switch the PDA and FPU contexts.
*/
- this_cpu_write(current_task, next_p);
+ raw_cpu_write(pcpu_hot.current_task, next_p);
this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));

switch_fpu_finish();
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1046,7 +1046,7 @@ int common_cpu_up(unsigned int cpu, stru
/* Just in case we booted with a single CPU. */
alternatives_enable_smp();

- per_cpu(current_task, cpu) = idle;
+ per_cpu(pcpu_hot.current_task, cpu) = idle;
cpu_init_stack_canary(cpu, idle);

/* Initialize the interrupt stack(s) */



2022-09-02 18:16:41

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH v2 22/59] x86: Put hot per CPU variables into a struct

On Fri, Sep 2, 2022 at 3:54 PM Peter Zijlstra <[email protected]> wrote:
> From: Thomas Gleixner <[email protected]>
>
> The layout of per-cpu variables is at the mercy of the compiler. This
> can lead to random performance fluctuations from build to build.
>
> Create a structure to hold some of the hottest per-cpu variables,
> starting with current_task.
[...]
> -DECLARE_PER_CPU(struct task_struct *, current_task);
> +struct pcpu_hot {
> + union {
> + struct {
> + struct task_struct *current_task;
> + };
> + u8 pad[64];
> + };
> +};

fixed_percpu_data::stack_canary is probably also a fairly hot per-cpu
variable on distro kernels with CONFIG_STACKPROTECTOR_STRONG (which
e.g. Debian enables), so perhaps it'd make sense to reuse
fixed_percpu_data as the struct for hot percpu variables? But I don't
have any numbers to actually back up that idea.

2022-09-15 11:36:37

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 22/59] x86: Put hot per CPU variables into a struct

On Fri, Sep 02, 2022 at 08:02:46PM +0200, Jann Horn wrote:
> On Fri, Sep 2, 2022 at 3:54 PM Peter Zijlstra <[email protected]> wrote:
> > From: Thomas Gleixner <[email protected]>
> >
> > The layout of per-cpu variables is at the mercy of the compiler. This
> > can lead to random performance fluctuations from build to build.
> >
> > Create a structure to hold some of the hottest per-cpu variables,
> > starting with current_task.
> [...]
> > -DECLARE_PER_CPU(struct task_struct *, current_task);
> > +struct pcpu_hot {
> > + union {
> > + struct {
> > + struct task_struct *current_task;
> > + };
> > + u8 pad[64];
> > + };
> > +};
>
> fixed_percpu_data::stack_canary is probably also a fairly hot per-cpu
> variable on distro kernels with CONFIG_STACKPROTECTOR_STRONG (which
> e.g. Debian enables), so perhaps it'd make sense to reuse
> fixed_percpu_data as the struct for hot percpu variables? But I don't
> have any numbers to actually back up that idea.

Not a bad idea; but the immediate problem I see with this is that
fixed_percpu_data is x86_64 only.

Also; I'm thinking the current stack-protector thing is somewhat of a
hack due to GCC limitations (per the comment there) and once that gets
cleaned up it can come live in the pcpu_hot thing.