2021-04-22 12:03:23

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 0/8] tick/nohz updates v2

This set brings various interrupts reducing while running in nohz_full:

* Remove one tick interrupt while waking up from idle to a user task
running in nohz_full mode. (thanks Yunfeng Ye).

* Reduce IPIs when running posix cpu timers, only relevant tasks should
be interrupted now instead of all tick nohz CPUs (thanks Marcelo)

And a few other cleanups and improvement.

Changes since last take:

- Remove "tick/nohz: Prevent tick_nohz_get_sleep_length() from returning negative value"
since the issue has been solve on the cpuidle side.

- Remove "timer: Report ignored local enqueue in nohz mode"
and hope that objtool will spot the future offenders.

- Changed "tick/nohz: Add tick_nohz_full_this_cpu()" and provide with
"tick/nohz: Evaluate the CPU expression after the static key" (please
add your SOB on this one).

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
timers/nohz-v2

HEAD: 4546d43a9938f6c7eec024f005cb240b8b73637b

Thanks,
Frederic
---

Frederic Weisbecker (3):
tick/nohz: Remove superflous check for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
tick/nohz: Update nohz_full Kconfig help
tick/nohz: Only wakeup a single target cpu when kicking a task

Marcelo Tosatti (2):
tick/nohz: Change signal tick dependency to wakeup CPUs of member tasks
tick/nohz: Kick only _queued_ task whose tick dependency is updated

Yunfeng Ye (2):
tick/nohz: Conditionally restart tick on idle exit
tick/nohz: Update idle_exittime on actual idle exit

Peter Zijlstra (1):
tick/nohz: Evaluate the CPU expression after the static key


include/linux/sched.h | 2 +
include/linux/tick.h | 26 +++++----
kernel/sched/core.c | 5 ++
kernel/time/Kconfig | 11 ++--
kernel/time/posix-cpu-timers.c | 4 +-
kernel/time/tick-sched.c | 122 +++++++++++++++++++++++++++++------------
6 files changed, 117 insertions(+), 53 deletions(-)


2021-04-22 12:03:26

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 1/8] tick/nohz: Evaluate the CPU expression after the static key

From: Peter Zijlstra <[email protected]>

When tick_nohz_full_cpu() is called with smp_processor_id(), the latter
is unconditionally evaluated whether the static key is on or off. It is
not necessary in the off-case though, so make sure the cpu expression
is executed at the last moment.

Illustrate with the following test function:

int tick_nohz_test(void)
{
return tick_nohz_full_cpu(smp_processor_id());
}

The resulting code before was:

mov %gs:0x7eea92d1(%rip),%eax # smp_processor_id() fetch
nopl 0x0(%rax,%rax,1)
xor %eax,%eax
retq
cmpb $0x0,0x29d393a(%rip) # <tick_nohz_full_running>
je tick_nohz_test+0x29 # jump to below eax clear
mov %eax,%eax
bt %rax,0x29d3936(%rip) # <tick_nohz_full_mask>
setb %al
movzbl %al,%eax
retq
xor %eax,%eax
retq

Now it becomes:

nopl 0x0(%rax,%rax,1)
xor %eax,%eax
retq
cmpb $0x0,0x29d3871(%rip) # <tick_nohz_full_running>
je tick_nohz_test+0x29 # jump to below eax clear
mov %gs:0x7eea91f0(%rip),%eax # smp_processor_id() fetch, after static key
mov %eax,%eax
bt %rax,0x29d3866(%rip) # <tick_nohz_full_mask>
setb %al
movzbl %al,%eax
retq
xor %eax,%eax
retq

Not-Yet-Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Yunfeng Ye <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
include/linux/tick.h | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 7340613c7eff..2258984a0e8a 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -185,13 +185,17 @@ static inline bool tick_nohz_full_enabled(void)
return tick_nohz_full_running;
}

-static inline bool tick_nohz_full_cpu(int cpu)
-{
- if (!tick_nohz_full_enabled())
- return false;
-
- return cpumask_test_cpu(cpu, tick_nohz_full_mask);
-}
+/*
+ * Check if a CPU is part of the nohz_full subset. Arrange for evaluating
+ * the cpu expression (typically smp_processor_id()) _after_ the static
+ * key.
+ */
+#define tick_nohz_full_cpu(_cpu) ({ \
+ bool __ret = false; \
+ if (tick_nohz_full_enabled()) \
+ __ret = cpumask_test_cpu((_cpu), tick_nohz_full_mask); \
+ __ret; \
+})

static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
{
--
2.25.1

2021-04-22 12:03:31

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 2/8] tick/nohz: Conditionally restart tick on idle exit

From: Yunfeng Ye <[email protected]>

In nohz_full mode, switching from idle to a task will unconditionally
issue a tick restart. If the task is alone in the runqueue or is the
highest priority, the tick will fire once then eventually stop. But that
alone is still undesired noise.

Therefore, only restart the tick on idle exit when it's strictly
necessary.

Signed-off-by: Yunfeng Ye <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/time/tick-sched.c | 46 +++++++++++++++++++++++++---------------
1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index e10a4af88737..c888445fb181 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -926,24 +926,30 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
tick_nohz_restart(ts, now);
}

-static void tick_nohz_full_update_tick(struct tick_sched *ts)
+static void __tick_nohz_full_update_tick(struct tick_sched *ts,
+ ktime_t now)
{
#ifdef CONFIG_NO_HZ_FULL
int cpu = smp_processor_id();

- if (!tick_nohz_full_cpu(cpu))
- return;
-
- if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
- return;
-
if (can_stop_full_tick(cpu, ts))
tick_nohz_stop_sched_tick(ts, cpu);
else if (ts->tick_stopped)
- tick_nohz_restart_sched_tick(ts, ktime_get());
+ tick_nohz_restart_sched_tick(ts, now);
#endif
}

+static void tick_nohz_full_update_tick(struct tick_sched *ts)
+{
+ if (!tick_nohz_full_cpu(smp_processor_id()))
+ return;
+
+ if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
+ return;
+
+ __tick_nohz_full_update_tick(ts, ktime_get());
+}
+
static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
{
/*
@@ -1205,18 +1211,24 @@ static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
#endif
}

-static void __tick_nohz_idle_restart_tick(struct tick_sched *ts, ktime_t now)
-{
- tick_nohz_restart_sched_tick(ts, now);
- tick_nohz_account_idle_ticks(ts);
-}
-
void tick_nohz_idle_restart_tick(void)
{
struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);

- if (ts->tick_stopped)
- __tick_nohz_idle_restart_tick(ts, ktime_get());
+ if (ts->tick_stopped) {
+ tick_nohz_restart_sched_tick(ts, ktime_get());
+ tick_nohz_account_idle_ticks(ts);
+ }
+}
+
+static void tick_nohz_idle_update_tick(struct tick_sched *ts, ktime_t now)
+{
+ if (tick_nohz_full_cpu(smp_processor_id()))
+ __tick_nohz_full_update_tick(ts, now);
+ else
+ tick_nohz_restart_sched_tick(ts, now);
+
+ tick_nohz_account_idle_ticks(ts);
}

/**
@@ -1248,7 +1260,7 @@ void tick_nohz_idle_exit(void)
tick_nohz_stop_idle(ts, now);

if (tick_stopped)
- __tick_nohz_idle_restart_tick(ts, now);
+ tick_nohz_idle_update_tick(ts, now);

local_irq_enable();
}
--
2.25.1

2021-04-22 12:03:41

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 4/8] tick/nohz: Update idle_exittime on actual idle exit

From: Yunfeng Ye <[email protected]>

The idle_exittime field of tick_sched is used to record the time when
the idle state was left. but currently the idle_exittime is updated in
the function tick_nohz_restart_sched_tick(), which is not always in idle
state when nohz_full is configured:

tick_irq_exit
tick_nohz_irq_exit
tick_nohz_full_update_tick
tick_nohz_restart_sched_tick
ts->idle_exittime = now;

It's thus overwritten by mistake on nohz_full tick restart. Move the
update to the appropriate idle exit path instead.

Signed-off-by: Yunfeng Ye <[email protected]>
Cc: Yunfeng Ye <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
kernel/time/tick-sched.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 31efd55ed302..ffc13b9dfbe3 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -921,8 +921,6 @@ static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
* Cancel the scheduled timer and restore the tick
*/
ts->tick_stopped = 0;
- ts->idle_exittime = now;
-
tick_nohz_restart(ts, now);
}

@@ -1190,10 +1188,13 @@ unsigned long tick_nohz_get_idle_calls(void)
return ts->idle_calls;
}

-static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
+static void tick_nohz_account_idle_time(struct tick_sched *ts,
+ ktime_t now)
{
unsigned long ticks;

+ ts->idle_exittime = now;
+
if (vtime_accounting_enabled_this_cpu())
return;
/*
@@ -1214,8 +1215,9 @@ void tick_nohz_idle_restart_tick(void)
struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);

if (ts->tick_stopped) {
- tick_nohz_restart_sched_tick(ts, ktime_get());
- tick_nohz_account_idle_ticks(ts);
+ ktime_t now = ktime_get();
+ tick_nohz_restart_sched_tick(ts, now);
+ tick_nohz_account_idle_time(ts, now);
}
}

@@ -1226,7 +1228,7 @@ static void tick_nohz_idle_update_tick(struct tick_sched *ts, ktime_t now)
else
tick_nohz_restart_sched_tick(ts, now);

- tick_nohz_account_idle_ticks(ts);
+ tick_nohz_account_idle_time(ts, now);
}

/**
--
2.25.1

2021-04-22 12:03:45

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 5/8] tick/nohz: Update nohz_full Kconfig help

CONFIG_NO_HZ_FULL behaves just like CONFIG_NO_HZ_IDLE by default.
Reassure distros about it.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Yunfeng Ye <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
---
kernel/time/Kconfig | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 83e158d016ba..6649e1d2dba5 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -117,13 +117,14 @@ config NO_HZ_FULL
the task mostly runs in userspace and has few kernel activity.

You need to fill up the nohz_full boot parameter with the
- desired range of dynticks CPUs.
+ desired range of dynticks CPUs to use it. This is implemented at
+ the expense of some overhead in user <-> kernel transitions:
+ syscalls, exceptions and interrupts.

- This is implemented at the expense of some overhead in user <-> kernel
- transitions: syscalls, exceptions and interrupts. Even when it's
- dynamically off.
+ By default, without passing nohz_full parameter, this behaves just
+ like NO_HZ_IDLE.

- Say N.
+ If you're a distro say Y.

endchoice

--
2.25.1

2021-04-22 12:03:54

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 3/8] tick/nohz: Remove superflous check for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE

The vtime_accounting_enabled_this_cpu() early check already makes what
follows as dead code in the case of CONFIG_VIRT_CPU_ACCOUNTING_NATIVE.
No need to keep the ifdeferry around.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Yunfeng Ye <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
---
kernel/time/tick-sched.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index c888445fb181..31efd55ed302 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1192,7 +1192,6 @@ unsigned long tick_nohz_get_idle_calls(void)

static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
{
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
unsigned long ticks;

if (vtime_accounting_enabled_this_cpu())
@@ -1208,7 +1207,6 @@ static void tick_nohz_account_idle_ticks(struct tick_sched *ts)
*/
if (ticks && ticks < LONG_MAX)
account_idle_ticks(ticks);
-#endif
}

void tick_nohz_idle_restart_tick(void)
--
2.25.1

2021-04-22 12:04:33

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 6/8] tick/nohz: Only wakeup a single target cpu when kicking a task

When adding a tick dependency to a task, its necessary to
wakeup the CPU where the task resides to reevaluate tick
dependencies on that CPU.

However the current code wakes up all nohz_full CPUs, which
is unnecessary.

Switch to waking up a single CPU, by using ordering of writes
to task->cpu and task->tick_dep_mask.

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Yunfeng Ye <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
---
kernel/time/tick-sched.c | 40 +++++++++++++++++++++++++++-------------
1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index ffc13b9dfbe3..45d9a4ea3ee0 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -322,6 +322,31 @@ void tick_nohz_full_kick_cpu(int cpu)
irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
}

+static void tick_nohz_kick_task(struct task_struct *tsk)
+{
+ int cpu = task_cpu(tsk);
+
+ /*
+ * If the task concurrently migrates to another cpu,
+ * we guarantee it sees the new tick dependency upon
+ * schedule.
+ *
+ *
+ * set_task_cpu(p, cpu);
+ * STORE p->cpu = @cpu
+ * __schedule() (switch to task 'p')
+ * LOCK rq->lock
+ * smp_mb__after_spin_lock() STORE p->tick_dep_mask
+ * tick_nohz_task_switch() smp_mb() (atomic_fetch_or())
+ * LOAD p->tick_dep_mask LOAD p->cpu
+ */
+
+ preempt_disable();
+ if (cpu_online(cpu))
+ tick_nohz_full_kick_cpu(cpu);
+ preempt_enable();
+}
+
/*
* Kick all full dynticks CPUs in order to force these to re-evaluate
* their dependency on the tick and restart it if necessary.
@@ -404,19 +429,8 @@ EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cpu);
*/
void tick_nohz_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit)
{
- if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask)) {
- if (tsk == current) {
- preempt_disable();
- tick_nohz_full_kick();
- preempt_enable();
- } else {
- /*
- * Some future tick_nohz_full_kick_task()
- * should optimize this.
- */
- tick_nohz_full_kick_all();
- }
- }
+ if (!atomic_fetch_or(BIT(bit), &tsk->tick_dep_mask))
+ tick_nohz_kick_task(tsk);
}
EXPORT_SYMBOL_GPL(tick_nohz_dep_set_task);

--
2.25.1

2021-04-22 12:05:34

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 8/8] tick/nohz: Kick only _queued_ task whose tick dependency is updated

From: Marcelo Tosatti <[email protected]>

When the tick dependency of a task is updated, we want it to aknowledge
the new state and restart the tick if needed. If the task is not
running, we don't need to kick it because it will observe the new
dependency upon scheduling in. But if the task is running, we may need
to send an IPI to it so that it gets notified.

Unfortunately we don't have the means to check if a task is running
in a race free way. Checking p->on_cpu in a synchronized way against
p->tick_dep_mask would imply adding a full barrier between
prepare_task_switch() and tick_nohz_task_switch(), which we want to
avoid in this fast-path.

Therefore we blindly fire an IPI to the task's CPU.

Meanwhile we can check if the task is queued on the CPU rq because
p->on_rq is always set to TASK_ON_RQ_QUEUED _before_ schedule() and its
full barrier that precedes tick_nohz_task_switch(). And if the task is
queued on a nohz_full CPU, it also has fair chances to be running as the
isolation constraints prescribe running single tasks on full dynticks
CPUs.

So use this as a trick to check if we can spare an IPI toward a
non-running task.

NOTE: For the ordering to be correct, it is assumed that we never
deactivate a task while it is running, the only exception being the task
deactivating itself while scheduling out.

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
Cc: Yunfeng Ye <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
include/linux/sched.h | 2 ++
kernel/sched/core.c | 5 +++++
kernel/time/tick-sched.c | 19 +++++++++++++++++--
3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ef00bb22164c..64dd6f698f3a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1999,6 +1999,8 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)

#endif /* CONFIG_SMP */

+extern bool sched_task_on_rq(struct task_struct *p);
+
/*
* In order to reduce various lock holder preemption latencies provide an
* interface to see if a vCPU is currently running or not.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 98191218d891..08526227d200 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1580,6 +1580,11 @@ static inline void uclamp_post_fork(struct task_struct *p) { }
static inline void init_uclamp(void) { }
#endif /* CONFIG_UCLAMP_TASK */

+bool sched_task_on_rq(struct task_struct *p)
+{
+ return task_on_rq_queued(p);
+}
+
static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
{
if (!(flags & ENQUEUE_NOCLOCK))
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index ad5c3905196a..faba7881048f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -324,8 +324,6 @@ void tick_nohz_full_kick_cpu(int cpu)

static void tick_nohz_kick_task(struct task_struct *tsk)
{
- int cpu = task_cpu(tsk);
-
/*
* If the task concurrently migrates to another cpu,
* we guarantee it sees the new tick dependency upon
@@ -340,6 +338,23 @@ static void tick_nohz_kick_task(struct task_struct *tsk)
* tick_nohz_task_switch() smp_mb() (atomic_fetch_or())
* LOAD p->tick_dep_mask LOAD p->cpu
*/
+ int cpu = task_cpu(tsk);
+
+ /*
+ * If the task is not running, run_posix_cpu_timers
+ * has nothing to elapsed, can spare IPI in that
+ * case.
+ *
+ * activate_task() STORE p->tick_dep_mask
+ * STORE p->on_rq
+ * __schedule() (switch to task 'p') smp_mb() (atomic_fetch_or())
+ * LOCK rq->lock LOAD p->on_rq
+ * smp_mb__after_spin_lock()
+ * tick_nohz_task_switch()
+ * LOAD p->tick_dep_mask
+ */
+ if (!sched_task_on_rq(tsk))
+ return;

preempt_disable();
if (cpu_online(cpu))
--
2.25.1

2021-04-22 12:06:57

by Frederic Weisbecker

[permalink] [raw]
Subject: [PATCH 7/8] tick/nohz: Change signal tick dependency to wakeup CPUs of member tasks

From: Marcelo Tosatti <[email protected]>

Rather than waking up all nohz_full CPUs on the system, only wakeup
the target CPUs of member threads of the signal.

Reduces interruptions to nohz_full CPUs.

Signed-off-by: Marcelo Tosatti <[email protected]>
Cc: Yunfeng Ye <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
---
include/linux/tick.h | 8 ++++----
kernel/time/posix-cpu-timers.c | 4 ++--
kernel/time/tick-sched.c | 15 +++++++++++++--
3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 2258984a0e8a..0bb80a7f05b9 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -211,7 +211,7 @@ extern void tick_nohz_dep_set_task(struct task_struct *tsk,
enum tick_dep_bits bit);
extern void tick_nohz_dep_clear_task(struct task_struct *tsk,
enum tick_dep_bits bit);
-extern void tick_nohz_dep_set_signal(struct signal_struct *signal,
+extern void tick_nohz_dep_set_signal(struct task_struct *tsk,
enum tick_dep_bits bit);
extern void tick_nohz_dep_clear_signal(struct signal_struct *signal,
enum tick_dep_bits bit);
@@ -256,11 +256,11 @@ static inline void tick_dep_clear_task(struct task_struct *tsk,
if (tick_nohz_full_enabled())
tick_nohz_dep_clear_task(tsk, bit);
}
-static inline void tick_dep_set_signal(struct signal_struct *signal,
+static inline void tick_dep_set_signal(struct task_struct *tsk,
enum tick_dep_bits bit)
{
if (tick_nohz_full_enabled())
- tick_nohz_dep_set_signal(signal, bit);
+ tick_nohz_dep_set_signal(tsk, bit);
}
static inline void tick_dep_clear_signal(struct signal_struct *signal,
enum tick_dep_bits bit)
@@ -288,7 +288,7 @@ static inline void tick_dep_set_task(struct task_struct *tsk,
enum tick_dep_bits bit) { }
static inline void tick_dep_clear_task(struct task_struct *tsk,
enum tick_dep_bits bit) { }
-static inline void tick_dep_set_signal(struct signal_struct *signal,
+static inline void tick_dep_set_signal(struct task_struct *tsk,
enum tick_dep_bits bit) { }
static inline void tick_dep_clear_signal(struct signal_struct *signal,
enum tick_dep_bits bit) { }
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 9abe15255bc4..127255a73ea6 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -523,7 +523,7 @@ static void arm_timer(struct k_itimer *timer, struct task_struct *p)
if (CPUCLOCK_PERTHREAD(timer->it_clock))
tick_dep_set_task(p, TICK_DEP_BIT_POSIX_TIMER);
else
- tick_dep_set_signal(p->signal, TICK_DEP_BIT_POSIX_TIMER);
+ tick_dep_set_signal(p, TICK_DEP_BIT_POSIX_TIMER);
}

/*
@@ -1358,7 +1358,7 @@ void set_process_cpu_timer(struct task_struct *tsk, unsigned int clkid,
if (*newval < *nextevt)
*nextevt = *newval;

- tick_dep_set_signal(tsk->signal, TICK_DEP_BIT_POSIX_TIMER);
+ tick_dep_set_signal(tsk, TICK_DEP_BIT_POSIX_TIMER);
}

static int do_cpu_nanosleep(const clockid_t which_clock, int flags,
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 45d9a4ea3ee0..ad5c3905196a 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -444,9 +444,20 @@ EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_task);
* Set a per-taskgroup tick dependency. Posix CPU timers need this in order to elapse
* per process timers.
*/
-void tick_nohz_dep_set_signal(struct signal_struct *sig, enum tick_dep_bits bit)
+void tick_nohz_dep_set_signal(struct task_struct *tsk,
+ enum tick_dep_bits bit)
{
- tick_nohz_dep_set_all(&sig->tick_dep_mask, bit);
+ int prev;
+ struct signal_struct *sig = tsk->signal;
+
+ prev = atomic_fetch_or(BIT(bit), &sig->tick_dep_mask);
+ if (!prev) {
+ struct task_struct *t;
+
+ lockdep_assert_held(&tsk->sighand->siglock);
+ __for_each_thread(sig, t)
+ tick_nohz_kick_task(t);
+ }
}

void tick_nohz_dep_clear_signal(struct signal_struct *sig, enum tick_dep_bits bit)
--
2.25.1

2021-05-04 12:41:24

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/8] tick/nohz: Remove superflous check for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE

On Thu, Apr 22, 2021 at 02:01:53PM +0200, Frederic Weisbecker wrote:
> The vtime_accounting_enabled_this_cpu() early check already makes what
> follows as dead code in the case of CONFIG_VIRT_CPU_ACCOUNTING_NATIVE.
> No need to keep the ifdeferry around.

Somewhat unrelated, but vtime_accounting_enabled_cpu() is missing a ' '...

2021-05-04 15:01:37

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/8] tick/nohz: Evaluate the CPU expression after the static key

On Thu, Apr 22, 2021 at 02:01:51PM +0200, Frederic Weisbecker wrote:
> From: Peter Zijlstra <[email protected]>
>
> When tick_nohz_full_cpu() is called with smp_processor_id(), the latter
> is unconditionally evaluated whether the static key is on or off. It is
> not necessary in the off-case though, so make sure the cpu expression
> is executed at the last moment.
>
> Illustrate with the following test function:
>
> int tick_nohz_test(void)
> {
> return tick_nohz_full_cpu(smp_processor_id());
> }
>
> The resulting code before was:
>
> mov %gs:0x7eea92d1(%rip),%eax # smp_processor_id() fetch
> nopl 0x0(%rax,%rax,1)
> xor %eax,%eax
> retq
> cmpb $0x0,0x29d393a(%rip) # <tick_nohz_full_running>
> je tick_nohz_test+0x29 # jump to below eax clear
> mov %eax,%eax
> bt %rax,0x29d3936(%rip) # <tick_nohz_full_mask>
> setb %al
> movzbl %al,%eax
> retq
> xor %eax,%eax
> retq
>
> Now it becomes:
>
> nopl 0x0(%rax,%rax,1)
> xor %eax,%eax
> retq
> cmpb $0x0,0x29d3871(%rip) # <tick_nohz_full_running>
> je tick_nohz_test+0x29 # jump to below eax clear
> mov %gs:0x7eea91f0(%rip),%eax # smp_processor_id() fetch, after static key
> mov %eax,%eax
> bt %rax,0x29d3866(%rip) # <tick_nohz_full_mask>
> setb %al
> movzbl %al,%eax
> retq
> xor %eax,%eax
> retq
>
> Not-Yet-Signed-off-by: Peter Zijlstra <[email protected]>

Signed-off-by: Peter Zijlstra <[email protected]>

Thanks for writing the Changelog.

2021-05-05 14:13:53

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 0/8] tick/nohz updates v2

On Thu, Apr 22, 2021 at 02:01:50PM +0200, Frederic Weisbecker wrote:
> Frederic Weisbecker (3):
> tick/nohz: Remove superflous check for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
> tick/nohz: Update nohz_full Kconfig help
> tick/nohz: Only wakeup a single target cpu when kicking a task
>
> Marcelo Tosatti (2):
> tick/nohz: Change signal tick dependency to wakeup CPUs of member tasks
> tick/nohz: Kick only _queued_ task whose tick dependency is updated
>
> Yunfeng Ye (2):
> tick/nohz: Conditionally restart tick on idle exit
> tick/nohz: Update idle_exittime on actual idle exit
>
> Peter Zijlstra (1):
> tick/nohz: Evaluate the CPU expression after the static key
>

Acked-by: Peter Zijlstra (Intel) <[email protected]>

2021-05-05 14:13:56

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 6/8] tick/nohz: Only wakeup a single target cpu when kicking a task

On Thu, Apr 22, 2021 at 02:01:56PM +0200, Frederic Weisbecker wrote:
> When adding a tick dependency to a task, its necessary to
> wakeup the CPU where the task resides to reevaluate tick
> dependencies on that CPU.
>
> However the current code wakes up all nohz_full CPUs, which
> is unnecessary.
>
> Switch to waking up a single CPU, by using ordering of writes
> to task->cpu and task->tick_dep_mask.
>
> Suggested-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Yunfeng Ye <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Signed-off-by: Marcelo Tosatti <[email protected]>
> ---
> kernel/time/tick-sched.c | 40 +++++++++++++++++++++++++++-------------
> 1 file changed, 27 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index ffc13b9dfbe3..45d9a4ea3ee0 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -322,6 +322,31 @@ void tick_nohz_full_kick_cpu(int cpu)
> irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
> }
>
> +static void tick_nohz_kick_task(struct task_struct *tsk)
> +{
> + int cpu = task_cpu(tsk);
> +
> + /*
> + * If the task concurrently migrates to another cpu,
> + * we guarantee it sees the new tick dependency upon
> + * schedule.
> + *
> + *
> + * set_task_cpu(p, cpu);
> + * STORE p->cpu = @cpu
> + * __schedule() (switch to task 'p')
> + * LOCK rq->lock
> + * smp_mb__after_spin_lock() STORE p->tick_dep_mask
> + * tick_nohz_task_switch() smp_mb() (atomic_fetch_or())
> + * LOAD p->tick_dep_mask LOAD p->cpu
> + */
> +
> + preempt_disable();
> + if (cpu_online(cpu))
> + tick_nohz_full_kick_cpu(cpu);
> + preempt_enable();
> +}


That had me looking at tick_nohz_task_switch(), does we want the below?


diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9143163fa678..ff45fc513ba7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4207,6 +4207,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
vtime_task_switch(prev);
perf_event_task_sched_in(prev, current);
finish_task(prev);
+ tick_nohz_task_switch();
finish_lock_switch(rq);
finish_arch_post_lock_switch();
kcov_finish_switch(current);
@@ -4252,7 +4253,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
put_task_struct_rcu_user(prev);
}

- tick_nohz_task_switch();
return rq;
}

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 828b091501ca..ea079be9097f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -447,13 +447,10 @@ void tick_nohz_dep_clear_signal(struct signal_struct *sig, enum tick_dep_bits bi
*/
void __tick_nohz_task_switch(void)
{
- unsigned long flags;
struct tick_sched *ts;

- local_irq_save(flags);
-
if (!tick_nohz_full_cpu(smp_processor_id()))
- goto out;
+ return;

ts = this_cpu_ptr(&tick_cpu_sched);

@@ -462,8 +459,6 @@ void __tick_nohz_task_switch(void)
atomic_read(&current->signal->tick_dep_mask))
tick_nohz_full_kick();
}
-out:
- local_irq_restore(flags);
}

/* Get the boot-time nohz CPU list from the kernel parameters. */

2021-05-05 15:11:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 8/8] tick/nohz: Kick only _queued_ task whose tick dependency is updated

On Thu, Apr 22, 2021 at 02:01:58PM +0200, Frederic Weisbecker wrote:

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 98191218d891..08526227d200 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1580,6 +1580,11 @@ static inline void uclamp_post_fork(struct task_struct *p) { }
> static inline void init_uclamp(void) { }
> #endif /* CONFIG_UCLAMP_TASK */
>
> +bool sched_task_on_rq(struct task_struct *p)
> +{
> + return task_on_rq_queued(p);
> +}
> +
> static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
> {
> if (!(flags & ENQUEUE_NOCLOCK))

That's a wee bit sad..

> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index ad5c3905196a..faba7881048f 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -324,8 +324,6 @@ void tick_nohz_full_kick_cpu(int cpu)
>
> static void tick_nohz_kick_task(struct task_struct *tsk)
> {
> - int cpu = task_cpu(tsk);
> -
> /*
> * If the task concurrently migrates to another cpu,
> * we guarantee it sees the new tick dependency upon
> @@ -340,6 +338,23 @@ static void tick_nohz_kick_task(struct task_struct *tsk)
> * tick_nohz_task_switch() smp_mb() (atomic_fetch_or())
> * LOAD p->tick_dep_mask LOAD p->cpu
> */
> + int cpu = task_cpu(tsk);
> +
> + /*
> + * If the task is not running, run_posix_cpu_timers
> + * has nothing to elapsed, can spare IPI in that
> + * case.
> + *
> + * activate_task() STORE p->tick_dep_mask
> + * STORE p->on_rq
> + * __schedule() (switch to task 'p') smp_mb() (atomic_fetch_or())
> + * LOCK rq->lock LOAD p->on_rq
> + * smp_mb__after_spin_lock()
> + * tick_nohz_task_switch()
> + * LOAD p->tick_dep_mask
> + */

That needs indenting, the style is distinctly different from the comment
right above it.

> + if (!sched_task_on_rq(tsk))
> + return;

I'm too tired, but do we really need the task_cpu() load to be before
this?

>
> preempt_disable();
> if (cpu_online(cpu))
> --
> 2.25.1
>

2021-05-10 11:12:05

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 6/8] tick/nohz: Only wakeup a single target cpu when kicking a task

On Wed, May 05, 2021 at 03:43:28PM +0200, Peter Zijlstra wrote:
> That had me looking at tick_nohz_task_switch(), does we want the below?
>
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9143163fa678..ff45fc513ba7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4207,6 +4207,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> vtime_task_switch(prev);
> perf_event_task_sched_in(prev, current);
> finish_task(prev);
> + tick_nohz_task_switch();
> finish_lock_switch(rq);
> finish_arch_post_lock_switch();
> kcov_finish_switch(current);
> @@ -4252,7 +4253,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> put_task_struct_rcu_user(prev);
> }
>
> - tick_nohz_task_switch();
> return rq;
> }
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 828b091501ca..ea079be9097f 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -447,13 +447,10 @@ void tick_nohz_dep_clear_signal(struct signal_struct *sig, enum tick_dep_bits bi
> */
> void __tick_nohz_task_switch(void)
> {
> - unsigned long flags;
> struct tick_sched *ts;
>
> - local_irq_save(flags);
> -
> if (!tick_nohz_full_cpu(smp_processor_id()))
> - goto out;
> + return;
>
> ts = this_cpu_ptr(&tick_cpu_sched);
>
> @@ -462,8 +459,6 @@ void __tick_nohz_task_switch(void)
> atomic_read(&current->signal->tick_dep_mask))
> tick_nohz_full_kick();
> }
> -out:
> - local_irq_restore(flags);
> }
>
> /* Get the boot-time nohz CPU list from the kernel parameters. */


Sure, I'll take your SoB on that too, ok?

2021-05-10 11:52:02

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 6/8] tick/nohz: Only wakeup a single target cpu when kicking a task

On Mon, May 10, 2021 at 12:48:14PM +0200, Peter Zijlstra wrote:
> On Mon, May 10, 2021 at 12:39:01PM +0200, Frederic Weisbecker wrote:
> > On Wed, May 05, 2021 at 03:43:28PM +0200, Peter Zijlstra wrote:
> > > That had me looking at tick_nohz_task_switch(), does we want the below?
> > >
> > >
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 9143163fa678..ff45fc513ba7 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -4207,6 +4207,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> > > vtime_task_switch(prev);
> > > perf_event_task_sched_in(prev, current);
> > > finish_task(prev);
> > > + tick_nohz_task_switch();
> > > finish_lock_switch(rq);
> > > finish_arch_post_lock_switch();
> > > kcov_finish_switch(current);
> > > @@ -4252,7 +4253,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> > > put_task_struct_rcu_user(prev);
> > > }
> > >
> > > - tick_nohz_task_switch();
> > > return rq;
> > > }
> > >
> > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > > index 828b091501ca..ea079be9097f 100644
> > > --- a/kernel/time/tick-sched.c
> > > +++ b/kernel/time/tick-sched.c
> > > @@ -447,13 +447,10 @@ void tick_nohz_dep_clear_signal(struct signal_struct *sig, enum tick_dep_bits bi
> > > */
> > > void __tick_nohz_task_switch(void)
> > > {
> > > - unsigned long flags;
> > > struct tick_sched *ts;
> > >
> > > - local_irq_save(flags);
> > > -
> > > if (!tick_nohz_full_cpu(smp_processor_id()))
> > > - goto out;
> > > + return;
> > >
> > > ts = this_cpu_ptr(&tick_cpu_sched);
> > >
> > > @@ -462,8 +459,6 @@ void __tick_nohz_task_switch(void)
> > > atomic_read(&current->signal->tick_dep_mask))
> > > tick_nohz_full_kick();
> > > }
> > > -out:
> > > - local_irq_restore(flags);
> > > }
> > >
> > > /* Get the boot-time nohz CPU list from the kernel parameters. */
> >
> >
> > Sure, I'll take your SoB on that too, ok?
>
> OK, but please also test it :-) I didn't even ask a compiler it's
> opinion on the thing.

Of course, there will be another version of the patchset + usual testing :-)

2021-05-10 11:52:23

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 8/8] tick/nohz: Kick only _queued_ task whose tick dependency is updated

On Wed, May 05, 2021 at 03:57:08PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 22, 2021 at 02:01:58PM +0200, Frederic Weisbecker wrote:
>
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 98191218d891..08526227d200 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1580,6 +1580,11 @@ static inline void uclamp_post_fork(struct task_struct *p) { }
> > static inline void init_uclamp(void) { }
> > #endif /* CONFIG_UCLAMP_TASK */
> >
> > +bool sched_task_on_rq(struct task_struct *p)
> > +{
> > + return task_on_rq_queued(p);
> > +}
> > +
> > static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
> > {
> > if (!(flags & ENQUEUE_NOCLOCK))
>
> That's a wee bit sad..

I know... But I couldn't find a better way.

>
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index ad5c3905196a..faba7881048f 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -324,8 +324,6 @@ void tick_nohz_full_kick_cpu(int cpu)
> >
> > static void tick_nohz_kick_task(struct task_struct *tsk)
> > {
> > - int cpu = task_cpu(tsk);
> > -
> > /*
> > * If the task concurrently migrates to another cpu,
> > * we guarantee it sees the new tick dependency upon
> > @@ -340,6 +338,23 @@ static void tick_nohz_kick_task(struct task_struct *tsk)
> > * tick_nohz_task_switch() smp_mb() (atomic_fetch_or())
> > * LOAD p->tick_dep_mask LOAD p->cpu
> > */
> > + int cpu = task_cpu(tsk);
> > +
> > + /*
> > + * If the task is not running, run_posix_cpu_timers
> > + * has nothing to elapsed, can spare IPI in that
> > + * case.
> > + *
> > + * activate_task() STORE p->tick_dep_mask
> > + * STORE p->on_rq
> > + * __schedule() (switch to task 'p') smp_mb() (atomic_fetch_or())
> > + * LOCK rq->lock LOAD p->on_rq
> > + * smp_mb__after_spin_lock()
> > + * tick_nohz_task_switch()
> > + * LOAD p->tick_dep_mask
> > + */
>
> That needs indenting, the style is distinctly different from the comment
> right above it.

Ok, I'll fix that.

>
> > + if (!sched_task_on_rq(tsk))
> > + return;
>
> I'm too tired, but do we really need the task_cpu() load to be before
> this?

Nope, it should be fine to put it after.

Thanks!

>
> >
> > preempt_disable();
> > if (cpu_online(cpu))
> > --
> > 2.25.1
> >

2021-05-10 11:53:46

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 6/8] tick/nohz: Only wakeup a single target cpu when kicking a task

On Mon, May 10, 2021 at 12:39:01PM +0200, Frederic Weisbecker wrote:
> On Wed, May 05, 2021 at 03:43:28PM +0200, Peter Zijlstra wrote:
> > That had me looking at tick_nohz_task_switch(), does we want the below?
> >
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 9143163fa678..ff45fc513ba7 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -4207,6 +4207,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> > vtime_task_switch(prev);
> > perf_event_task_sched_in(prev, current);
> > finish_task(prev);
> > + tick_nohz_task_switch();
> > finish_lock_switch(rq);
> > finish_arch_post_lock_switch();
> > kcov_finish_switch(current);
> > @@ -4252,7 +4253,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
> > put_task_struct_rcu_user(prev);
> > }
> >
> > - tick_nohz_task_switch();
> > return rq;
> > }
> >
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 828b091501ca..ea079be9097f 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -447,13 +447,10 @@ void tick_nohz_dep_clear_signal(struct signal_struct *sig, enum tick_dep_bits bi
> > */
> > void __tick_nohz_task_switch(void)
> > {
> > - unsigned long flags;
> > struct tick_sched *ts;
> >
> > - local_irq_save(flags);
> > -
> > if (!tick_nohz_full_cpu(smp_processor_id()))
> > - goto out;
> > + return;
> >
> > ts = this_cpu_ptr(&tick_cpu_sched);
> >
> > @@ -462,8 +459,6 @@ void __tick_nohz_task_switch(void)
> > atomic_read(&current->signal->tick_dep_mask))
> > tick_nohz_full_kick();
> > }
> > -out:
> > - local_irq_restore(flags);
> > }
> >
> > /* Get the boot-time nohz CPU list from the kernel parameters. */
>
>
> Sure, I'll take your SoB on that too, ok?

OK, but please also test it :-) I didn't even ask a compiler it's
opinion on the thing.