2022-09-09 13:25:40

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH v3 0/8] Add latency priority for CFS class

This patchset restarts the work about adding a latency priority to describe
the latency tolerance of cfs tasks.

The patches [1-4] have been done by Parth:
https://lore.kernel.org/lkml/[email protected]/

I have just rebased and moved the set of latency priority outside the
priority update. I have removed the reviewed tag because the patches
are 2 years old.

The patch [5] uses latency nice priority to define a latency offset
and then to decide if a cfs task can preempt the current running task. The
patch gives some tests results with cyclictests and hackbench to highlight
the benefit of latency priority for short interactive task or
long intensive tasks.

Patch [6] adds the support of latency_offset to task group by adding a
cpu.latency field. There were discussions for the last version about
using a real unit for the field so I decided to expose directly the
latency offset which reflects the time up to which we can preempt when the
value is negative, or up to which we can defer the preemption when the
value is positive.
The range is [-sysctl_sched_latency:sysctl_sched_latency]

Patch [7] makes sched_core taking into account the latency offset.

Patch [8] adds a rb tree to cover some corner cases where the latency
sensitive task is preempted by high priority task (RT/DL) or fails to
preempt them. This patch ensures that tasks will have at least a
slice of sched_min_granularity in priority at wakeup. The patch gives
results to show the benefit in addition to patch 5

I have also backported the patchset on a dragonboard RB3 with an android
mainline kernel based on v5.18 for a quick test. I have used
the TouchLatency app which is part of AOSP and described to be very good
test to highlight jitter and jank frame sources of a system [1].
In addition to the app, I have added some short running tasks waking-up
regularly (to use the 8 cpus for 4 ms every 37777us) to stress the system
without overloading it (and disabling EAS). The 1st results shows that the
patchset helps to reduce the missed deadline frames from 5% to less than
0.1% when the cpu.latency of task group are set.

[1] https://source.android.com/docs/core/debug/eval_perf#touchlatency

Change since v2:
- Set a latency_offset field instead of saving a weight and computing it
on the fly.
- Make latency_offset available for task group: cpu.latency
- Fix some corner cases to make latency sensitive tasks schedule first and
add a rb tree for latency sensitive task.

Change since v1:
- fix typo
- move some codes in the right patch to make bisect happy
- simplify and fixed how the weight is computed
- added support of sched core patch 7

Parth Shah (4):
sched: Introduce latency-nice as a per-task attribute
sched/core: Propagate parent task's latency requirements to the child
task
sched: Allow sched_{get,set}attr to change latency_nice of the task
sched/core: Add permission checks for setting the latency_nice value

Vincent Guittot (4):
sched/fair: Take into account latency priority at wakeup
sched/fair: Add sched group latency support
sched/core: support latency priority with sched core
sched/fair: Add latency list

include/linux/sched.h | 5 +
include/uapi/linux/sched.h | 4 +-
include/uapi/linux/sched/types.h | 19 ++++
init/init_task.c | 1 +
kernel/sched/core.c | 81 +++++++++++++
kernel/sched/debug.c | 1 +
kernel/sched/fair.c | 189 ++++++++++++++++++++++++++++++-
kernel/sched/sched.h | 35 ++++++
tools/include/uapi/linux/sched.h | 4 +-
9 files changed, 332 insertions(+), 7 deletions(-)

--
2.17.1


2022-09-09 13:27:17

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH v3 2/8] sched/core: Propagate parent task's latency requirements to the child task

From: Parth Shah <[email protected]>

Clone parent task's latency_nice attribute to the forked child task.

Reset the latency_nice value to default value when the child task is
set to sched_reset_on_fork.

Also, initialize init_task.latency_nice value with DEFAULT_LATENCY_NICE
value

Signed-off-by: Parth Shah <[email protected]>
[rebase]
Signed-off-by: Vincent Guittot <[email protected]>
---
init/init_task.c | 1 +
kernel/sched/core.c | 4 ++++
2 files changed, 5 insertions(+)

diff --git a/init/init_task.c b/init/init_task.c
index 73cc8f03511a..225d11a39bc9 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -78,6 +78,7 @@ struct task_struct init_task
.prio = MAX_PRIO - 20,
.static_prio = MAX_PRIO - 20,
.normal_prio = MAX_PRIO - 20,
+ .latency_nice = DEFAULT_LATENCY_NICE,
.policy = SCHED_NORMAL,
.cpus_ptr = &init_task.cpus_mask,
.user_cpus_ptr = NULL,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 603a80ec9b0e..3c5fb04f00e1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4511,6 +4511,9 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
*/
p->prio = current->normal_prio;

+ /* Propagate the parent's latency requirements to the child as well */
+ p->latency_nice = current->latency_nice;
+
uclamp_fork(p);

/*
@@ -4527,6 +4530,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
p->prio = p->normal_prio = p->static_prio;
set_load_weight(p, false);

+ p->latency_nice = DEFAULT_LATENCY_NICE;
/*
* We don't need the reset flag anymore after the fork. It has
* fulfilled its duty:
--
2.17.1

2022-09-09 13:37:29

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH v3 6/8] sched/fair: Add sched group latency support

Task can set its latency priority, which is then used to decide to preempt
the current running entity of the cfs, but sched group entities still have
the default latency offset.

Add a latency field in task group to set the latency offset of the
sched_eneities of the group, which will be used against other entities in
the parent cfs when deciding which entity to schedule first.

Signed-off-by: Vincent Guittot <[email protected]>
---
kernel/sched/core.c | 24 ++++++++++++++++++++++++
kernel/sched/fair.c | 33 +++++++++++++++++++++++++++++++++
kernel/sched/sched.h | 4 ++++
3 files changed, 61 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 13cf794708ee..bfea862a3588 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10860,6 +10860,19 @@ static int cpu_idle_write_s64(struct cgroup_subsys_state *css,
{
return sched_group_set_idle(css_tg(css), idle);
}
+
+static s64 cpu_latency_read_s64(struct cgroup_subsys_state *css,
+ struct cftype *cft)
+{
+ return css_tg(css)->latency_offset;
+}
+
+static int cpu_latency_write_s64(struct cgroup_subsys_state *css,
+ struct cftype *cft, s64 latency)
+{
+ return sched_group_set_latency(css_tg(css), latency);
+}
+
#endif

static struct cftype cpu_legacy_files[] = {
@@ -10874,6 +10887,11 @@ static struct cftype cpu_legacy_files[] = {
.read_s64 = cpu_idle_read_s64,
.write_s64 = cpu_idle_write_s64,
},
+ {
+ .name = "latency",
+ .read_s64 = cpu_latency_read_s64,
+ .write_s64 = cpu_latency_write_s64,
+ },
#endif
#ifdef CONFIG_CFS_BANDWIDTH
{
@@ -11091,6 +11109,12 @@ static struct cftype cpu_files[] = {
.read_s64 = cpu_idle_read_s64,
.write_s64 = cpu_idle_write_s64,
},
+ {
+ .name = "latency",
+ .flags = CFTYPE_NOT_ON_ROOT,
+ .read_s64 = cpu_latency_read_s64,
+ .write_s64 = cpu_latency_write_s64,
+ },
#endif
#ifdef CONFIG_CFS_BANDWIDTH
{
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a20eadb0af97..6cc4f2a9725d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11769,6 +11769,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
goto err;

tg->shares = NICE_0_LOAD;
+ tg->latency_offset = 0;

init_cfs_bandwidth(tg_cfs_bandwidth(tg));

@@ -11867,6 +11868,9 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
}

se->my_q = cfs_rq;
+
+ se->latency_offset = tg->latency_offset;
+
/* guarantee group entities always have weight */
update_load_set(&se->load, NICE_0_LOAD);
se->parent = parent;
@@ -11997,6 +12001,35 @@ int sched_group_set_idle(struct task_group *tg, long idle)
return 0;
}

+int sched_group_set_latency(struct task_group *tg, long latency)
+{
+ int i;
+
+ if (tg == &root_task_group)
+ return -EINVAL;
+
+ if (abs(latency) > sysctl_sched_latency)
+ return -EINVAL;
+
+ mutex_lock(&shares_mutex);
+
+ if (tg->latency_offset == latency) {
+ mutex_unlock(&shares_mutex);
+ return 0;
+ }
+
+ tg->latency_offset = latency;
+
+ for_each_possible_cpu(i) {
+ struct sched_entity *se = tg->se[i];
+
+ WRITE_ONCE(se->latency_offset, latency);
+ }
+
+ mutex_unlock(&shares_mutex);
+ return 0;
+}
+
#else /* CONFIG_FAIR_GROUP_SCHED */

void free_fair_sched_group(struct task_group *tg) { }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c4beddb58ebd..117b23dfb912 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -406,6 +406,8 @@ struct task_group {

/* A positive value indicates that this is a SCHED_IDLE group. */
int idle;
+ /* latency constraint of the group. */
+ int latency_offset;

#ifdef CONFIG_SMP
/*
@@ -519,6 +521,8 @@ extern int sched_group_set_shares(struct task_group *tg, unsigned long shares);

extern int sched_group_set_idle(struct task_group *tg, long idle);

+extern int sched_group_set_latency(struct task_group *tg, long latency);
+
#ifdef CONFIG_SMP
extern void set_task_rq_fair(struct sched_entity *se,
struct cfs_rq *prev, struct cfs_rq *next);
--
2.17.1

2022-09-09 13:38:56

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH v3 1/8] sched: Introduce latency-nice as a per-task attribute

From: Parth Shah <[email protected]>

Latency-nice indicates the latency requirements of a task with respect
to the other tasks in the system. The value of the attribute can be within
the range of [-20, 19] both inclusive to be in-line with the values just
like task nice values.

latency_nice = -20 indicates the task to have the least latency as
compared to the tasks having latency_nice = +19.

The latency_nice may affect only the CFS SCHED_CLASS by getting
latency requirements from the userspace.

Additionally, add debugging bits for newly added latency_nice attribute.

Signed-off-by: Parth Shah <[email protected]>
[rebase]
Signed-off-by: Vincent Guittot <[email protected]>
---
include/linux/sched.h | 1 +
kernel/sched/debug.c | 1 +
kernel/sched/sched.h | 18 ++++++++++++++++++
3 files changed, 20 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 88b8817b827d..a0adb55efa1c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -773,6 +773,7 @@ struct task_struct {
int static_prio;
int normal_prio;
unsigned int rt_priority;
+ int latency_nice;

struct sched_entity se;
struct sched_rt_entity rt;
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index bb3d63bdf4ae..a3f7876217a6 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1042,6 +1042,7 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
#endif
P(policy);
P(prio);
+ P(latency_nice);
if (task_has_dl_policy(p)) {
P(dl.runtime);
P(dl.deadline);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 74130a69d365..b927269b84f9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -124,6 +124,24 @@ extern int sched_rr_timeslice;
*/
#define NS_TO_JIFFIES(TIME) ((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))

+/*
+ * Latency nice is meant to provide scheduler hints about the relative
+ * latency requirements of a task with respect to other tasks.
+ * Thus a task with latency_nice == 19 can be hinted as the task with no
+ * latency requirements, in contrast to the task with latency_nice == -20
+ * which should be given priority in terms of lower latency.
+ */
+#define MAX_LATENCY_NICE 19
+#define MIN_LATENCY_NICE -20
+
+#define LATENCY_NICE_WIDTH \
+ (MAX_LATENCY_NICE - MIN_LATENCY_NICE + 1)
+
+/*
+ * Default tasks should be treated as a task with latency_nice = 0.
+ */
+#define DEFAULT_LATENCY_NICE 0
+
/*
* Increase resolution of nice-level calculations for 64-bit architectures.
* The extra resolution improves shares distribution and load balancing of
--
2.17.1

2022-09-09 13:40:15

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH v3 7/8] sched/core: support latency priority with sched core

Take into account wakeup_latency_gran() when ordering the cfs threads.

Signed-off-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6cc4f2a9725d..7563fb16aba1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11444,6 +11444,10 @@ bool cfs_prio_less(struct task_struct *a, struct task_struct *b, bool in_fi)
delta = (s64)(sea->vruntime - seb->vruntime) +
(s64)(cfs_rqb->min_vruntime_fi - cfs_rqa->min_vruntime_fi);

+ /* Take into account latency prio */
+ delta -= wakeup_latency_gran(sea, seb);
+
+
return delta > 0;
}
#else
--
2.17.1

2022-09-12 23:03:25

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v3 6/8] sched/fair: Add sched group latency support

Hello,

On Fri, Sep 09, 2022 at 03:03:07PM +0200, Vincent Guittot wrote:
> + {
> + .name = "latency",
> + .read_s64 = cpu_latency_read_s64,
> + .write_s64 = cpu_latency_write_s64,
> + },

You're still using the nice value here, right? If so, can you please use the
filename "latency.nice" so that it's consistent with "weight.nice"?

Thanks.

--
tejun

2022-09-13 08:40:10

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v3 6/8] sched/fair: Add sched group latency support

On Tue, 13 Sept 2022 at 00:18, Tejun Heo <[email protected]> wrote:
>
> Hello,
>
> On Fri, Sep 09, 2022 at 03:03:07PM +0200, Vincent Guittot wrote:
> > + {
> > + .name = "latency",
> > + .read_s64 = cpu_latency_read_s64,
> > + .write_s64 = cpu_latency_write_s64,
> > + },
>
> You're still using the nice value here, right? If so, can you please use the
> filename "latency.nice" so that it's consistent with "weight.nice"?

This cpu.latency is not a nice priority but the signed offset used by
the scheduler at wakeup. On previous version you raised concern about
having a nice value for cgroup so I removed it for cgroup and directly
exposed the latency offset with cpu.latency similarly to the weight
that is exposed with cpu.shares.

Vincent
>
> Thanks.
>
> --
> tejun