2020-02-22 00:53:06

by Thara Gopinath

[permalink] [raw]
Subject: [Patch v10 9/9] sched/fair: Enable tuning of decay period

Thermal pressure follows pelt signals which means the decay period for
thermal pressure is the default pelt decay period. Depending on soc
characteristics and thermal activity, it might be beneficial to decay
thermal pressure slower, but still in-tune with the pelt signals. One way
to achieve this is to provide a command line parameter to set a decay
shift parameter to an integer between 0 and 10.

Signed-off-by: Thara Gopinath <[email protected]>
---
v8->v9:
- Initialized the __shift to 0 in setup_sched_thermal_decay_shift
as per Quentin's suggestion.
v9->v10:
- Added description for sched_thermal_decay_shift in
kernel-parameters.txt following Randy's review comments.

.../admin-guide/kernel-parameters.txt | 16 ++++++++++++++++
kernel/sched/core.c | 2 +-
kernel/sched/fair.c | 15 ++++++++++++++-
kernel/sched/sched.h | 18 ++++++++++++++++++
4 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c7f407eb22d3..7cf12c611fe0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4393,6 +4393,22 @@
incurs a small amount of overhead in the scheduler
but is useful for debugging and performance tuning.

+ sched_thermal_decay_shift=
+ [KNL, SMP] Set a decay shift for scheduler thermal
+ pressure signal. Thermal pressure signal follows the
+ default decay period of other scheduler pelt
+ signals(usually 32 ms but configurable). Setting
+ sched_thermal_decay_shift will left shift the decay
+ period for the thermal pressure signal by the shift
+ value.
+ i.e. with the default pelt decay period of 32 ms
+ sched_thermal_decay_shift thermal pressure decay pr
+ 1 64 ms
+ 2 128 ms
+ and so on.
+ Format: integer between 0 and 10
+ Default is 0.
+
skew_tick= [KNL] Offset the periodic timer tick per cpu to mitigate
xtime_lock contention on larger systems, and/or RCU lock
contention on all systems with CONFIG_MAXSMP set.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 160b5e9e8945..166a3edfad5f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3597,7 +3597,7 @@ void scheduler_tick(void)

update_rq_clock(rq);
thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));
- update_thermal_load_avg(rq_clock_task(rq), rq, thermal_pressure);
+ update_thermal_load_avg(rq_clock_thermal(rq), rq, thermal_pressure);
curr->sched_class->task_tick(rq, curr, 0);
calc_global_load_tick(rq);
psi_task_tick(rq);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 10e867e540ab..454d7735764e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -86,6 +86,19 @@ static unsigned int normalized_sysctl_sched_wakeup_granularity = 1000000UL;

const_debug unsigned int sysctl_sched_migration_cost = 500000UL;

+int sched_thermal_decay_shift;
+static int __init setup_sched_thermal_decay_shift(char *str)
+{
+ int _shift = 0;
+
+ if (kstrtoint(str, 0, &_shift))
+ pr_warn("Unable to set scheduler thermal pressure decay shift parameter\n");
+
+ sched_thermal_decay_shift = clamp(_shift, 0, 10);
+ return 1;
+}
+__setup("sched_thermal_decay_shift=", setup_sched_thermal_decay_shift);
+
#ifdef CONFIG_SMP
/*
* For asym packing, by default the lower numbered CPU has higher priority.
@@ -7577,7 +7590,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done)

decayed = update_rt_rq_load_avg(now, rq, curr_class == &rt_sched_class) |
update_dl_rq_load_avg(now, rq, curr_class == &dl_sched_class) |
- update_thermal_load_avg(rq_clock_task(rq), rq, thermal_pressure) |
+ update_thermal_load_avg(rq_clock_thermal(rq), rq, thermal_pressure) |
update_irq_load_avg(rq, 0);

if (others_have_blocked(rq))
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 211411ac0efa..e6312662679d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1110,6 +1110,24 @@ static inline u64 rq_clock_task(struct rq *rq)
return rq->clock_task;
}

+/**
+ * By default the decay is the default pelt decay period.
+ * The decay shift can change the decay period in
+ * multiples of 32.
+ * Decay shift Decay period(ms)
+ * 0 32
+ * 1 64
+ * 2 128
+ * 3 256
+ * 4 512
+ */
+extern int sched_thermal_decay_shift;
+
+static inline u64 rq_clock_thermal(struct rq *rq)
+{
+ return rq_clock_task(rq) >> sched_thermal_decay_shift;
+}
+
static inline void rq_clock_skip_update(struct rq *rq)
{
lockdep_assert_held(&rq->lock);
--
2.20.1


Subject: [tip: sched/core] sched/fair: Enable tuning of decay period

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 05289b90c2e40ae80f5c70431cd0be4cc8a6038d
Gitweb: https://git.kernel.org/tip/05289b90c2e40ae80f5c70431cd0be4cc8a6038d
Author: Thara Gopinath <[email protected]>
AuthorDate: Fri, 21 Feb 2020 19:52:13 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 06 Mar 2020 12:57:21 +01:00

sched/fair: Enable tuning of decay period

Thermal pressure follows pelt signals which means the decay period for
thermal pressure is the default pelt decay period. Depending on SoC
characteristics and thermal activity, it might be beneficial to decay
thermal pressure slower, but still in-tune with the pelt signals. One way
to achieve this is to provide a command line parameter to set a decay
shift parameter to an integer between 0 and 10.

Signed-off-by: Thara Gopinath <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
Documentation/admin-guide/kernel-parameters.txt | 16 ++++++++++++++-
kernel/sched/core.c | 2 +-
kernel/sched/fair.c | 15 ++++++++++++-
kernel/sched/sched.h | 18 ++++++++++++++++-
4 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c07815d..dac8245 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4392,6 +4392,22 @@
incurs a small amount of overhead in the scheduler
but is useful for debugging and performance tuning.

+ sched_thermal_decay_shift=
+ [KNL, SMP] Set a decay shift for scheduler thermal
+ pressure signal. Thermal pressure signal follows the
+ default decay period of other scheduler pelt
+ signals(usually 32 ms but configurable). Setting
+ sched_thermal_decay_shift will left shift the decay
+ period for the thermal pressure signal by the shift
+ value.
+ i.e. with the default pelt decay period of 32 ms
+ sched_thermal_decay_shift thermal pressure decay pr
+ 1 64 ms
+ 2 128 ms
+ and so on.
+ Format: integer between 0 and 10
+ Default is 0.
+
skew_tick= [KNL] Offset the periodic timer tick per cpu to mitigate
xtime_lock contention on larger systems, and/or RCU lock
contention on all systems with CONFIG_MAXSMP set.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3e620fe..4d76df3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3595,7 +3595,7 @@ void scheduler_tick(void)

update_rq_clock(rq);
thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));
- update_thermal_load_avg(rq_clock_task(rq), rq, thermal_pressure);
+ update_thermal_load_avg(rq_clock_thermal(rq), rq, thermal_pressure);
curr->sched_class->task_tick(rq, curr, 0);
calc_global_load_tick(rq);
psi_task_tick(rq);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index aa51286..79bb423 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -86,6 +86,19 @@ static unsigned int normalized_sysctl_sched_wakeup_granularity = 1000000UL;

const_debug unsigned int sysctl_sched_migration_cost = 500000UL;

+int sched_thermal_decay_shift;
+static int __init setup_sched_thermal_decay_shift(char *str)
+{
+ int _shift = 0;
+
+ if (kstrtoint(str, 0, &_shift))
+ pr_warn("Unable to set scheduler thermal pressure decay shift parameter\n");
+
+ sched_thermal_decay_shift = clamp(_shift, 0, 10);
+ return 1;
+}
+__setup("sched_thermal_decay_shift=", setup_sched_thermal_decay_shift);
+
#ifdef CONFIG_SMP
/*
* For asym packing, by default the lower numbered CPU has higher priority.
@@ -7760,7 +7773,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done)

decayed = update_rt_rq_load_avg(now, rq, curr_class == &rt_sched_class) |
update_dl_rq_load_avg(now, rq, curr_class == &dl_sched_class) |
- update_thermal_load_avg(rq_clock_task(rq), rq, thermal_pressure) |
+ update_thermal_load_avg(rq_clock_thermal(rq), rq, thermal_pressure) |
update_irq_load_avg(rq, 0);

if (others_have_blocked(rq))
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6c839f8..7f1a85b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1127,6 +1127,24 @@ static inline u64 rq_clock_task(struct rq *rq)
return rq->clock_task;
}

+/**
+ * By default the decay is the default pelt decay period.
+ * The decay shift can change the decay period in
+ * multiples of 32.
+ * Decay shift Decay period(ms)
+ * 0 32
+ * 1 64
+ * 2 128
+ * 3 256
+ * 4 512
+ */
+extern int sched_thermal_decay_shift;
+
+static inline u64 rq_clock_thermal(struct rq *rq)
+{
+ return rq_clock_task(rq) >> sched_thermal_decay_shift;
+}
+
static inline void rq_clock_skip_update(struct rq *rq)
{
lockdep_assert_held(&rq->lock);