2019-11-25 09:48:15

by Parth Shah

Subject: [RFC 0/3] Introduce per-task latency_tolerance for scheduler hints

This patch series is based on the discussion started in the thread "Usecases
for the per-task latency-nice attribute" [1].

It introduces a new per-task attribute, latency_tolerance, which gives the
scheduler hints about the latency requirements of the task.

latency_tolerance takes values in the range [-20, 19], both inclusive, which
aligns it with the task nice value.

The value is a hint about the relative latency requirements of tasks: a task
with "latency_tolerance = -20" should see lower latency than tasks with
higher values. Similarly, a task with "latency_tolerance = 19" can tolerate
higher latency, so such tasks may not care much about the latency numbers.

The default value is 0. The usecases defined in [1] can use the range
[-20, 19] of latency_tolerance for their specific purposes. This series
deliberately does not implement any usecase for the attribute, so that a
change in naming or range has little effect on the other (future) patches
that use it. The actual use of latency_tolerance during task wakeup and
load balancing is yet to be coded for each of those usecases.

In my view, this attribute can be used in the following ways for some of the
usecases:

1. Reduce search scan time for select_idle_cpu():
   - Reduce search scans for finding an idle CPU for a waking task with a
     lower latency_tolerance value.

2. TurboSched:
   - Classify tasks with higher latency_tolerance values as small
     background tasks, provided their historic utilization is very low, so
     that the scheduler can search a larger number of cores for task
     packing. A task with latency_tolerance >= some threshold (e.g., >= +18)
     and util <= 12.5% can be treated as a background task.

3. Optimize AVX512 based workloads:
   - Bias the scheduler against placing a task with
     latency_tolerance == -20 on a core occupied by an AVX512 based workload.
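
Usecases 1 and 2 above could be sketched in plain C roughly as follows. This
is only an illustration and not part of the series: the linear scaling in
idle_scan_limit(), the helper names, BG_TOLERANCE_THRESHOLD and the
assumption that utilization is expressed on a 0..1024 scale are all
hypothetical choices made for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define MIN_LATENCY_TOLERANCE (-20)
#define MAX_LATENCY_TOLERANCE 19
#define LATENCY_TOLERANCE_WIDTH \
	(MAX_LATENCY_TOLERANCE - MIN_LATENCY_TOLERANCE + 1)

/* Hypothetical values chosen for this sketch, not defined by the series. */
#define BG_TOLERANCE_THRESHOLD 18
#define CAPACITY_SCALE 1024	/* assumed: utilization of one fully busy CPU */

/*
 * Usecase 1 (hypothetical policy): scale the number of CPUs scanned
 * linearly with latency_tolerance, so that -20 scans a single CPU and
 * 19 scans all nr_cpus.
 */
static inline int idle_scan_limit(int latency_tolerance, int nr_cpus)
{
	int steps = latency_tolerance - MIN_LATENCY_TOLERANCE;	/* 0..39 */

	assert(nr_cpus > 0);
	assert(latency_tolerance >= MIN_LATENCY_TOLERANCE &&
	       latency_tolerance <= MAX_LATENCY_TOLERANCE);

	return 1 + (nr_cpus - 1) * steps / (LATENCY_TOLERANCE_WIDTH - 1);
}

/*
 * Usecase 2: treat a task as a background task when its tolerance is at
 * or above the threshold and its utilization is at most 12.5% (1/8) of
 * the capacity scale.
 */
static inline bool is_background_task(int latency_tolerance, unsigned int util)
{
	return latency_tolerance >= BG_TOLERANCE_THRESHOLD &&
	       util * 8 <= CAPACITY_SCALE;
}
```

With these assumptions, a task at the default value of 0 would scan roughly
half of the CPUs, and only tasks at +18 or +19 with very low utilization
would qualify for packing.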

Series Organization:
======================
- Patch [1]: Add new attribute latency_tolerance to task_struct
- Patch [2]: Clone parent task's attribute on fork
- Patch [3]: Add support to sched_{set,get}attr syscall to modify
latency_tolerance of the task

The patch series can be applied on tip/sched/core at
commit 57abff067a08 ("sched/fair: Rework find_idlest_group()")


References:
===========
[1]. Usecases for the per-task latency-nice attribute,
https://lkml.org/lkml/2019/9/30/215
[2]. Task Latency-nice, "Subhra Mazumdar",
https://lkml.org/lkml/2019/8/30/829



Parth Shah (3):
Introduce latency-tolerance as an per-task attribute
Propagate parent task's latency requirements to the child task
Allow sched_{get,set}attr to change latency_tolerance of the task

include/linux/sched.h | 3 +++
include/linux/sched/latency_tolerance.h | 13 +++++++++++++
include/uapi/linux/sched.h | 4 +++-
include/uapi/linux/sched/types.h | 2 ++
kernel/sched/core.c | 19 +++++++++++++++++++
kernel/sched/sched.h | 1 +
6 files changed, 41 insertions(+), 1 deletion(-)
create mode 100644 include/linux/sched/latency_tolerance.h

--
2.17.2


2019-11-25 09:50:29

by Parth Shah

Subject: [RFC 2/3] Propagate parent task's latency requirements to the child task

Clone the parent task's latency_tolerance attribute to the forked child task.

Reset latency_tolerance to the default value when the child task has
sched_reset_on_fork set.

Signed-off-by: Parth Shah <[email protected]>
---
kernel/sched/core.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7880f4f64d0e..ea7abbf5c1bb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2853,6 +2853,9 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
*/
p->prio = current->normal_prio;

+ /* Propagate the parent's latency requirements to the child as well */
+ p->latency_tolerance = current->latency_tolerance;
+
uclamp_fork(p);

/*
@@ -2869,6 +2872,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
p->prio = p->normal_prio = __normal_prio(p);
set_load_weight(p, false);

+ p->latency_tolerance = DEFAULT_LATENCY_TOLERANCE;
/*
* We don't need the reset flag anymore after the fork. It has
* fulfilled its duty:
--
2.17.2
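
The fork-time behaviour described in patch 2/3 can be modelled in userspace
as a minimal sketch. struct task_model and fork_model() are hypothetical
stand-ins for the two task_struct fields involved and for the relevant part
of sched_fork(); they are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define DEFAULT_LATENCY_TOLERANCE 0

/* Simplified stand-in for the task_struct fields touched by the patch. */
struct task_model {
	int latency_tolerance;
	bool sched_reset_on_fork;
};

/* Mirrors the two hunks: inherit from the parent, then optionally reset. */
static inline struct task_model fork_model(const struct task_model *parent)
{
	struct task_model child;

	assert(parent);

	/* The child starts as a copy of the parent, including the hint. */
	child.latency_tolerance = parent->latency_tolerance;
	child.sched_reset_on_fork = parent->sched_reset_on_fork;

	if (child.sched_reset_on_fork) {
		/* Revert to the default and clear the one-shot flag. */
		child.latency_tolerance = DEFAULT_LATENCY_TOLERANCE;
		child.sched_reset_on_fork = false;
	}

	return child;
}
```

A parent with latency_tolerance = -5 therefore yields a child with -5, unless
sched_reset_on_fork is set, in which case the child starts at 0.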

2019-11-25 11:29:47

by Parth Shah

Subject: [RFC 3/3] Allow sched_{get,set}attr to change latency_tolerance of the task

Add the latency_tolerance attribute to sched_attr and provide a mechanism to
set and read the value through the sched_setattr/sched_getattr syscalls.

Also add a new flag, "SCHED_FLAG_LATENCY_TOLERANCE", which the caller sets in
a sched_setattr syscall to indicate a change in the latency_tolerance of the
task.

Signed-off-by: Parth Shah <[email protected]>
---
include/uapi/linux/sched.h | 4 +++-
include/uapi/linux/sched/types.h | 2 ++
kernel/sched/core.c | 15 +++++++++++++++
kernel/sched/sched.h | 1 +
4 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index b3105ac1381a..73db430d11b6 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -71,6 +71,7 @@ struct clone_args {
#define SCHED_FLAG_KEEP_PARAMS 0x10
#define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
#define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
+#define SCHED_FLAG_LATENCY_TOLERANCE 0x80

#define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \
SCHED_FLAG_KEEP_PARAMS)
@@ -82,6 +83,7 @@ struct clone_args {
SCHED_FLAG_RECLAIM | \
SCHED_FLAG_DL_OVERRUN | \
SCHED_FLAG_KEEP_ALL | \
- SCHED_FLAG_UTIL_CLAMP)
+ SCHED_FLAG_UTIL_CLAMP | \
+ SCHED_FLAG_LATENCY_TOLERANCE)

#endif /* _UAPI_LINUX_SCHED_H */
diff --git a/include/uapi/linux/sched/types.h b/include/uapi/linux/sched/types.h
index c852153ddb0d..960774ac0c70 100644
--- a/include/uapi/linux/sched/types.h
+++ b/include/uapi/linux/sched/types.h
@@ -118,6 +118,8 @@ struct sched_attr {
__u32 sched_util_min;
__u32 sched_util_max;

+ /* latency requirement hints */
+ __s32 sched_latency_tolerance;
};

#endif /* _UAPI_LINUX_SCHED_TYPES_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ea7abbf5c1bb..dfd36ec14404 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4695,6 +4695,9 @@ static void __setscheduler_params(struct task_struct *p,
p->rt_priority = attr->sched_priority;
p->normal_prio = normal_prio(p);
set_load_weight(p, true);
+
+ /* Change latency tolerance of the task if !SCHED_FLAG_KEEP_PARAMS */
+ p->latency_tolerance = attr->sched_latency_tolerance;
}

/* Actually do priority change: must hold pi & rq lock. */
@@ -4852,6 +4855,13 @@ static int __sched_setscheduler(struct task_struct *p,
return retval;
}

+ if (attr->sched_flags & SCHED_FLAG_LATENCY_TOLERANCE) {
+ if (attr->sched_latency_tolerance > MAX_LATENCY_TOLERANCE)
+ return -EINVAL;
+ if (attr->sched_latency_tolerance < MIN_LATENCY_TOLERANCE)
+ return -EINVAL;
+ }
+
if (pi)
cpuset_read_lock();

@@ -4886,6 +4896,9 @@ static int __sched_setscheduler(struct task_struct *p,
goto change;
if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)
goto change;
+ if (attr->sched_flags & SCHED_FLAG_LATENCY_TOLERANCE &&
+ attr->sched_latency_tolerance != p->latency_tolerance)
+ goto change;

p->sched_reset_on_fork = reset_on_fork;
retval = 0;
@@ -5392,6 +5405,8 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
else
kattr.sched_nice = task_nice(p);

+ kattr.sched_latency_tolerance = p->latency_tolerance;
+
#ifdef CONFIG_UCLAMP_TASK
kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0db2c1b3361e..bb181175954b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -21,6 +21,7 @@
#include <linux/sched/nohz.h>
#include <linux/sched/numa_balancing.h>
#include <linux/sched/prio.h>
+#include <linux/sched/latency_tolerance.h>
#include <linux/sched/rt.h>
#include <linux/sched/signal.h>
#include <linux/sched/smt.h>
--
2.17.2
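
The checks that patch 3/3 adds to __sched_setscheduler() can be restated as a
small userspace model. struct attr_model and both helper names are
hypothetical and carry only the two sched_attr fields that matter here; this
is not the real UAPI layout.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_LATENCY_TOLERANCE (-20)
#define MAX_LATENCY_TOLERANCE 19
#define SCHED_FLAG_LATENCY_TOLERANCE 0x80

/* Only the sched_attr fields relevant to this sketch. */
struct attr_model {
	uint64_t sched_flags;
	int32_t sched_latency_tolerance;
};

/* Returns 0 if acceptable, -EINVAL if the flag is set and the value is out of range. */
static inline int check_latency_tolerance(const struct attr_model *attr)
{
	assert(attr);

	/* Without the flag the value is not validated at all. */
	if (!(attr->sched_flags & SCHED_FLAG_LATENCY_TOLERANCE))
		return 0;

	if (attr->sched_latency_tolerance > MAX_LATENCY_TOLERANCE ||
	    attr->sched_latency_tolerance < MIN_LATENCY_TOLERANCE)
		return -EINVAL;

	return 0;
}

/* True when the request differs from the current value and the flag is set. */
static inline bool latency_tolerance_changed(const struct attr_model *attr,
					     int current_value)
{
	assert(attr);

	return (attr->sched_flags & SCHED_FLAG_LATENCY_TOLERANCE) &&
	       attr->sched_latency_tolerance != current_value;
}
```

The second helper corresponds to the condition that takes the "goto change"
path: a request with the flag set but the same value as the task already has
does not, by itself, force a change.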

2019-11-25 12:12:02

by Parth Shah

Subject: [RFC 1/3] Introduce latency-tolerance as an per-task attribute

Latency-tolerance indicates the latency requirements of a task relative to
the other tasks in the system. The attribute takes values in the range
[-20, 19], both inclusive, in line with task nice values.

latency_tolerance = -20 indicates that the task should see the least latency,
compared to tasks with latency_tolerance = +19.

latency_tolerance may affect only the CFS SCHED_CLASS, which obtains the
latency requirements from userspace.

Signed-off-by: Parth Shah <[email protected]>
---
include/linux/sched.h | 3 +++
include/linux/sched/latency_tolerance.h | 13 +++++++++++++
2 files changed, 16 insertions(+)
create mode 100644 include/linux/sched/latency_tolerance.h

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2c2e56bd8913..bcc1c1d0856d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -25,6 +25,7 @@
#include <linux/resource.h>
#include <linux/latencytop.h>
#include <linux/sched/prio.h>
+#include <linux/sched/latency_tolerance.h>
#include <linux/sched/types.h>
#include <linux/signal_types.h>
#include <linux/mm_types_task.h>
@@ -666,6 +667,8 @@ struct task_struct {
#endif
int on_rq;

+ int latency_tolerance;
+
int prio;
int static_prio;
int normal_prio;
diff --git a/include/linux/sched/latency_tolerance.h b/include/linux/sched/latency_tolerance.h
new file mode 100644
index 000000000000..7a00abe05bc4
--- /dev/null
+++ b/include/linux/sched/latency_tolerance.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_SCHED_LATENCY_TOLERANCE_H
+#define _LINUX_SCHED_LATENCY_TOLERANCE_H
+
+#define MAX_LATENCY_TOLERANCE 19
+#define MIN_LATENCY_TOLERANCE -20
+
+#define LATENCY_TOLERANCE_WIDTH \
+ (MAX_LATENCY_TOLERANCE - MIN_LATENCY_TOLERANCE + 1)
+
+#define DEFAULT_LATENCY_TOLERANCE 0
+
+#endif /* _LINUX_SCHED_LATENCY_TOLERANCE_H */
--
2.17.2
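
As a small illustration of the macros in the new header, the sketch below
shows that LATENCY_TOLERANCE_WIDTH evaluates to 40 and how a value could be
mapped onto a 0-based index. latency_tolerance_to_index() is a hypothetical
helper, not something the series defines, and MIN_LATENCY_TOLERANCE is
parenthesized here purely as a defensive habit for the example.

```c
#include <assert.h>

#define MAX_LATENCY_TOLERANCE 19
#define MIN_LATENCY_TOLERANCE (-20)

#define LATENCY_TOLERANCE_WIDTH \
	(MAX_LATENCY_TOLERANCE - MIN_LATENCY_TOLERANCE + 1)

#define DEFAULT_LATENCY_TOLERANCE 0

/* Hypothetical helper: map [-20, 19] onto a 0-based index in [0, 39]. */
static inline int latency_tolerance_to_index(int latency_tolerance)
{
	assert(latency_tolerance >= MIN_LATENCY_TOLERANCE &&
	       latency_tolerance <= MAX_LATENCY_TOLERANCE);

	return latency_tolerance - MIN_LATENCY_TOLERANCE;
}
```

Such an index could, for example, address a lookup table with
LATENCY_TOLERANCE_WIDTH entries, but no such table exists in this series.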

2019-12-03 08:37:47

by Qais Yousef

Subject: Re: [RFC 1/3] Introduce latency-tolerance as an per-task attribute

On 11/25/19 15:16, Parth Shah wrote:
> Latency-tolerance indicates the latency requirements of a task with respect
> to the other tasks in the system. The value of the attribute can be within
> the range of [-20, 19] both inclusive to be in-line with the values just
> like task nice values.
>
> latency_tolerance = -20 indicates the task to have the least latency as
> compared to the tasks having latency_tolerance = +19.
>
> The latency_tolerance may affect only the CFS SCHED_CLASS by getting
> latency requirements from the userspace.
>
> Signed-off-by: Parth Shah <[email protected]>
> ---
> include/linux/sched.h | 3 +++
> include/linux/sched/latency_tolerance.h | 13 +++++++++++++
> 2 files changed, 16 insertions(+)
> create mode 100644 include/linux/sched/latency_tolerance.h
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 2c2e56bd8913..bcc1c1d0856d 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -25,6 +25,7 @@
> #include <linux/resource.h>
> #include <linux/latencytop.h>
> #include <linux/sched/prio.h>
> +#include <linux/sched/latency_tolerance.h>
> #include <linux/sched/types.h>
> #include <linux/signal_types.h>
> #include <linux/mm_types_task.h>
> @@ -666,6 +667,8 @@ struct task_struct {
> #endif
> int on_rq;
>
> + int latency_tolerance;
> +
> int prio;
> int static_prio;
> int normal_prio;
> diff --git a/include/linux/sched/latency_tolerance.h b/include/linux/sched/latency_tolerance.h
> new file mode 100644
> index 000000000000..7a00abe05bc4
> --- /dev/null
> +++ b/include/linux/sched/latency_tolerance.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_SCHED_LATENCY_TOLERANCE_H
> +#define _LINUX_SCHED_LATENCY_TOLERANCE_H

nit: Please add some description here explaining what latency tolerance is.
You can copy-paste some text from your cover letter :)

--
Qais Yousef

> +
> +#define MAX_LATENCY_TOLERANCE 19
> +#define MIN_LATENCY_TOLERANCE -20
> +
> +#define LATENCY_TOLERANCE_WIDTH \
> + (MAX_LATENCY_TOLERANCE - MIN_LATENCY_TOLERANCE + 1)
> +
> +#define DEFAULT_LATENCY_TOLERANCE 0
> +
> +#endif /* _LINUX_SCHED_LATENCY_TOLERANCE_H */
> --
> 2.17.2
>

2019-12-03 08:40:22

by Qais Yousef

Subject: Re: [RFC 3/3] Allow sched_{get,set}attr to change latency_tolerance of the task

On 11/25/19 15:16, Parth Shah wrote:
> Introduce the latency_tolerance attribute to sched_attr and provide a
> mechanism to change the value with the use of sched_setattr/sched_getattr
> syscall.
>
> Also add new flag "SCHED_FLAG_LATENCY_TOLERANCE" to hint the change in
> latency_tolerance of the task on every sched_setattr syscall.
>
> Signed-off-by: Parth Shah <[email protected]>
> ---
> include/uapi/linux/sched.h | 4 +++-
> include/uapi/linux/sched/types.h | 2 ++
> kernel/sched/core.c | 15 +++++++++++++++
> kernel/sched/sched.h | 1 +
> 4 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
> index b3105ac1381a..73db430d11b6 100644
> --- a/include/uapi/linux/sched.h
> +++ b/include/uapi/linux/sched.h
> @@ -71,6 +71,7 @@ struct clone_args {
> #define SCHED_FLAG_KEEP_PARAMS 0x10
> #define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
> #define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
> +#define SCHED_FLAG_LATENCY_TOLERANCE 0x80
>
> #define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \
> SCHED_FLAG_KEEP_PARAMS)
> @@ -82,6 +83,7 @@ struct clone_args {
> SCHED_FLAG_RECLAIM | \
> SCHED_FLAG_DL_OVERRUN | \
> SCHED_FLAG_KEEP_ALL | \
> - SCHED_FLAG_UTIL_CLAMP)
> + SCHED_FLAG_UTIL_CLAMP | \
> + SCHED_FLAG_LATENCY_TOLERANCE)
>
> #endif /* _UAPI_LINUX_SCHED_H */
> diff --git a/include/uapi/linux/sched/types.h b/include/uapi/linux/sched/types.h
> index c852153ddb0d..960774ac0c70 100644
> --- a/include/uapi/linux/sched/types.h
> +++ b/include/uapi/linux/sched/types.h
> @@ -118,6 +118,8 @@ struct sched_attr {
> __u32 sched_util_min;
> __u32 sched_util_max;
>
> + /* latency requirement hints */
> + __s32 sched_latency_tolerance;
> };
>
> #endif /* _UAPI_LINUX_SCHED_TYPES_H */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ea7abbf5c1bb..dfd36ec14404 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4695,6 +4695,9 @@ static void __setscheduler_params(struct task_struct *p,
> p->rt_priority = attr->sched_priority;
> p->normal_prio = normal_prio(p);
> set_load_weight(p, true);
> +
> + /* Change latency tolerance of the task if !SCHED_FLAG_KEEP_PARAMS */
> + p->latency_tolerance = attr->sched_latency_tolerance;
> }
>
> /* Actually do priority change: must hold pi & rq lock. */
> @@ -4852,6 +4855,13 @@ static int __sched_setscheduler(struct task_struct *p,
> return retval;
> }
>
> + if (attr->sched_flags & SCHED_FLAG_LATENCY_TOLERANCE) {
> + if (attr->sched_latency_tolerance > MAX_LATENCY_TOLERANCE)
> + return -EINVAL;
> + if (attr->sched_latency_tolerance < MIN_LATENCY_TOLERANCE)
> + return -EINVAL;
> + }
> +
> if (pi)
> cpuset_read_lock();
>
> @@ -4886,6 +4896,9 @@ static int __sched_setscheduler(struct task_struct *p,
> goto change;
> if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)
> goto change;
> + if (attr->sched_flags & SCHED_FLAG_LATENCY_TOLERANCE &&
> + attr->sched_latency_tolerance != p->latency_tolerance)
> + goto change;
>
> p->sched_reset_on_fork = reset_on_fork;
> retval = 0;
> @@ -5392,6 +5405,8 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
> else
> kattr.sched_nice = task_nice(p);
>
> + kattr.sched_latency_tolerance = p->latency_tolerance;
> +
> #ifdef CONFIG_UCLAMP_TASK
> kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
> kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 0db2c1b3361e..bb181175954b 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -21,6 +21,7 @@
> #include <linux/sched/nohz.h>
> #include <linux/sched/numa_balancing.h>
> #include <linux/sched/prio.h>
> +#include <linux/sched/latency_tolerance.h>

nit: keep in alphabetical order.

The series looks good to me except for the 2 minor nits. Thanks for taking care
of this!

Reviewed-by: Qais Yousef <[email protected]>

Cheers

--
Qais Yousef

> #include <linux/sched/rt.h>
> #include <linux/sched/signal.h>
> #include <linux/sched/smt.h>
> --
> 2.17.2
>

2019-12-03 15:48:23

by Parth Shah

Subject: Re: [RFC 1/3] Introduce latency-tolerance as an per-task attribute



On 12/3/19 2:06 PM, Qais Yousef wrote:
> On 11/25/19 15:16, Parth Shah wrote:
>> Latency-tolerance indicates the latency requirements of a task with respect
>> to the other tasks in the system. The value of the attribute can be within
>> the range of [-20, 19] both inclusive to be in-line with the values just
>> like task nice values.
>>
>> latency_tolerance = -20 indicates the task to have the least latency as
>> compared to the tasks having latency_tolerance = +19.
>>
>> The latency_tolerance may affect only the CFS SCHED_CLASS by getting
>> latency requirements from the userspace.
>>
>> Signed-off-by: Parth Shah <[email protected]>
>> ---
>> include/linux/sched.h | 3 +++
>> include/linux/sched/latency_tolerance.h | 13 +++++++++++++
>> 2 files changed, 16 insertions(+)
>> create mode 100644 include/linux/sched/latency_tolerance.h
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 2c2e56bd8913..bcc1c1d0856d 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -25,6 +25,7 @@
>> #include <linux/resource.h>
>> #include <linux/latencytop.h>
>> #include <linux/sched/prio.h>
>> +#include <linux/sched/latency_tolerance.h>
>> #include <linux/sched/types.h>
>> #include <linux/signal_types.h>
>> #include <linux/mm_types_task.h>
>> @@ -666,6 +667,8 @@ struct task_struct {
>> #endif
>> int on_rq;
>>
>> + int latency_tolerance;
>> +
>> int prio;
>> int static_prio;
>> int normal_prio;
>> diff --git a/include/linux/sched/latency_tolerance.h b/include/linux/sched/latency_tolerance.h
>> new file mode 100644
>> index 000000000000..7a00abe05bc4
>> --- /dev/null
>> +++ b/include/linux/sched/latency_tolerance.h
>> @@ -0,0 +1,13 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#ifndef _LINUX_SCHED_LATENCY_TOLERANCE_H
>> +#define _LINUX_SCHED_LATENCY_TOLERANCE_H
>
> nit: Add some description here explaining what latency tolerance is please. You
> copy paste some text from your cover letter :)

Sure. Will add some text here.

>
> --
> Qais Youesf
>
>> +
>> +#define MAX_LATENCY_TOLERANCE 19
>> +#define MIN_LATENCY_TOLERANCE -20
>> +
>> +#define LATENCY_TOLERANCE_WIDTH \
>> + (MAX_LATENCY_TOLERANCE - MIN_LATENCY_TOLERANCE + 1)
>> +
>> +#define DEFAULT_LATENCY_TOLERANCE 0
>> +
>> +#endif /* _LINUX_SCHED_LATENCY_TOLERANCE_H */
>> --
>> 2.17.2
>>

2019-12-03 16:14:05

by Parth Shah

Subject: Re: [RFC 3/3] Allow sched_{get,set}attr to change latency_tolerance of the task



On 12/3/19 2:09 PM, Qais Yousef wrote:
> On 11/25/19 15:16, Parth Shah wrote:
>> Introduce the latency_tolerance attribute to sched_attr and provide a
>> mechanism to change the value with the use of sched_setattr/sched_getattr
>> syscall.
>>
>> Also add new flag "SCHED_FLAG_LATENCY_TOLERANCE" to hint the change in
>> latency_tolerance of the task on every sched_setattr syscall.
>>
>> Signed-off-by: Parth Shah <[email protected]>
>> ---
>> include/uapi/linux/sched.h | 4 +++-
>> include/uapi/linux/sched/types.h | 2 ++
>> kernel/sched/core.c | 15 +++++++++++++++
>> kernel/sched/sched.h | 1 +
>> 4 files changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
>> index b3105ac1381a..73db430d11b6 100644
>> --- a/include/uapi/linux/sched.h
>> +++ b/include/uapi/linux/sched.h
>> @@ -71,6 +71,7 @@ struct clone_args {
>> #define SCHED_FLAG_KEEP_PARAMS 0x10
>> #define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
>> #define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
>> +#define SCHED_FLAG_LATENCY_TOLERANCE 0x80
>>
>> #define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \
>> SCHED_FLAG_KEEP_PARAMS)
>> @@ -82,6 +83,7 @@ struct clone_args {
>> SCHED_FLAG_RECLAIM | \
>> SCHED_FLAG_DL_OVERRUN | \
>> SCHED_FLAG_KEEP_ALL | \
>> - SCHED_FLAG_UTIL_CLAMP)
>> + SCHED_FLAG_UTIL_CLAMP | \
>> + SCHED_FLAG_LATENCY_TOLERANCE)
>>
>> #endif /* _UAPI_LINUX_SCHED_H */
>> diff --git a/include/uapi/linux/sched/types.h b/include/uapi/linux/sched/types.h
>> index c852153ddb0d..960774ac0c70 100644
>> --- a/include/uapi/linux/sched/types.h
>> +++ b/include/uapi/linux/sched/types.h
>> @@ -118,6 +118,8 @@ struct sched_attr {
>> __u32 sched_util_min;
>> __u32 sched_util_max;
>>
>> + /* latency requirement hints */
>> + __s32 sched_latency_tolerance;
>> };
>>
>> #endif /* _UAPI_LINUX_SCHED_TYPES_H */
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index ea7abbf5c1bb..dfd36ec14404 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4695,6 +4695,9 @@ static void __setscheduler_params(struct task_struct *p,
>> p->rt_priority = attr->sched_priority;
>> p->normal_prio = normal_prio(p);
>> set_load_weight(p, true);
>> +
>> + /* Change latency tolerance of the task if !SCHED_FLAG_KEEP_PARAMS */
>> + p->latency_tolerance = attr->sched_latency_tolerance;
>> }
>>
>> /* Actually do priority change: must hold pi & rq lock. */
>> @@ -4852,6 +4855,13 @@ static int __sched_setscheduler(struct task_struct *p,
>> return retval;
>> }
>>
>> + if (attr->sched_flags & SCHED_FLAG_LATENCY_TOLERANCE) {
>> + if (attr->sched_latency_tolerance > MAX_LATENCY_TOLERANCE)
>> + return -EINVAL;
>> + if (attr->sched_latency_tolerance < MIN_LATENCY_TOLERANCE)
>> + return -EINVAL;
>> + }
>> +
>> if (pi)
>> cpuset_read_lock();
>>
>> @@ -4886,6 +4896,9 @@ static int __sched_setscheduler(struct task_struct *p,
>> goto change;
>> if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)
>> goto change;
>> + if (attr->sched_flags & SCHED_FLAG_LATENCY_TOLERANCE &&
>> + attr->sched_latency_tolerance != p->latency_tolerance)
>> + goto change;
>>
>> p->sched_reset_on_fork = reset_on_fork;
>> retval = 0;
>> @@ -5392,6 +5405,8 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
>> else
>> kattr.sched_nice = task_nice(p);
>>
>> + kattr.sched_latency_tolerance = p->latency_tolerance;
>> +
>> #ifdef CONFIG_UCLAMP_TASK
>> kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
>> kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 0db2c1b3361e..bb181175954b 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -21,6 +21,7 @@
>> #include <linux/sched/nohz.h>
>> #include <linux/sched/numa_balancing.h>
>> #include <linux/sched/prio.h>
>> +#include <linux/sched/latency_tolerance.h>
>
> nit: keep in alphabatical order.

ok.

>
> The series looks good to me except for the 2 minor nits. Thanks for taking care
> of this!

My pleasure. In fact, I'm trying to write patches around what Subhra posted
for reducing wakeup scans https://lkml.org/lkml/2019/8/30/829 and a few ideas
from Peter's patch https://lkml.org/lkml/2018/5/30/632. The aim is to reduce
scans for tasks with lower latency_tolerance; I will soon post patches that
use this feature.

>
> Reviewed-by: Qais Yousef <[email protected]>

Thanks. Will add it.

>
> Cheers
>
> --
> Qais Yousef
>
>> #include <linux/sched/rt.h>
>> #include <linux/sched/signal.h>
>> #include <linux/sched/smt.h>
>> --
>> 2.17.2
>>

2019-12-05 09:25:28

by Dietmar Eggemann

Subject: Re: [RFC 0/3] Introduce per-task latency_tolerance for scheduler hints

On 25/11/2019 10:46, Parth Shah wrote:
> This patch series is based on the discussion started as the "Usecases for
> the per-task latency-nice attribute"[1]
>
> This patch series introduces a new per-task attribute latency_tolerance to
> provide the scheduler hints about the latency requirements of the task.

I forgot but is there a chance to have this as a per-taskgroup attribute
as well?

> Latency_tolerance is a ranged attribute of a task with the value ranging
> from [-20, 19] both inclusive which makes it align with the task nice
> value.
>
> The value should provide scheduler hints about the relative latency
> requirements of tasks, meaning the task with "latency_tolerance = -20"
> should have lower latency than compared to those tasks with higher values.
> Similarly a task with "latency_tolerance = 19" can have higher latency and
> hence such tasks may bot care much about the latency numbers.
>
> The default value is set to 0. The usecases defined in [1] can use this
> range of [-20, 19] for latency_tolerance for the specific purpose. This
> patch does not define any use cases for such attribute so that any change
> in naming or range does not affect much to the other (future) patches using
> this. The actual use of latency_tolerance during task wakeup and
> load-balancing is yet to be coded for each of those usecases.

This can definitely be useful for Android/EAS by replacing the current
proprietary solution in android-google-common android-5.4:

commit 760b82c9b88d ("ANDROID: sched/fair: Bias EAS placement for latency")
commit c28f9d3945f1 ("ANDROID: sched/core: Add a latency-sensitive flag
to uclamp")

which links to usecase 6 (EAS) in [1].

2019-12-05 09:25:45

by Dietmar Eggemann

Subject: Re: [RFC 1/3] Introduce latency-tolerance as an per-task attribute

On 03/12/2019 16:47, Parth Shah wrote:
>
> On 12/3/19 2:06 PM, Qais Yousef wrote:
>> On 11/25/19 15:16, Parth Shah wrote:
>>> Latency-tolerance indicates the latency requirements of a task with respect
>>> to the other tasks in the system. The value of the attribute can be within
>>> the range of [-20, 19] both inclusive to be in-line with the values just
>>> like task nice values.
>>>
>>> latency_tolerance = -20 indicates the task to have the least latency as
>>> compared to the tasks having latency_tolerance = +19.
>>>
>>> The latency_tolerance may affect only the CFS SCHED_CLASS by getting
>>> latency requirements from the userspace.

[...]

>>> diff --git a/include/linux/sched/latency_tolerance.h b/include/linux/sched/latency_tolerance.h

Do we really need an extra header file for this? I know there is
linux/sched/prio.h but couldn't this go into kernel/sched/sched.h?

[...]

2019-12-05 09:25:56

by Dietmar Eggemann

Subject: Re: [RFC 3/3] Allow sched_{get,set}attr to change latency_tolerance of the task

On 03/12/2019 16:51, Parth Shah wrote:
>
> On 12/3/19 2:09 PM, Qais Yousef wrote:
>> On 11/25/19 15:16, Parth Shah wrote:

[...]

>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index ea7abbf5c1bb..dfd36ec14404 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -4695,6 +4695,9 @@ static void __setscheduler_params(struct task_struct *p,
>>> p->rt_priority = attr->sched_priority;
>>> p->normal_prio = normal_prio(p);
>>> set_load_weight(p, true);
>>> +
>>> + /* Change latency tolerance of the task if !SCHED_FLAG_KEEP_PARAMS */

IMHO, this comment seems to be gratuitous.

>>> + p->latency_tolerance = attr->sched_latency_tolerance;
>>> }

[...]

2019-12-05 10:50:39

by Valentin Schneider

Subject: Re: [RFC 0/3] Introduce per-task latency_tolerance for scheduler hints



On 05/12/2019 09:24, Dietmar Eggemann wrote:
> On 25/11/2019 10:46, Parth Shah wrote:
>> This patch series is based on the discussion started as the "Usecases for
>> the per-task latency-nice attribute"[1]
>>
>> This patch series introduces a new per-task attribute latency_tolerance to
>> provide the scheduler hints about the latency requirements of the task.
>
> I forgot but is there a chance to have this as a per-taskgroup attribute
> as well?
>

Peter argued we should go for task attributes first, and then
cgroup/taskgroups later on:

https://lore.kernel.org/lkml/[email protected]/

2019-12-05 11:43:14

by Parth Shah

Subject: Re: [RFC 1/3] Introduce latency-tolerance as an per-task attribute



On 12/5/19 2:54 PM, Dietmar Eggemann wrote:
> On 03/12/2019 16:47, Parth Shah wrote:
>>
>> On 12/3/19 2:06 PM, Qais Yousef wrote:
>>> On 11/25/19 15:16, Parth Shah wrote:
>>>> Latency-tolerance indicates the latency requirements of a task with respect
>>>> to the other tasks in the system. The value of the attribute can be within
>>>> the range of [-20, 19] both inclusive to be in-line with the values just
>>>> like task nice values.
>>>>
>>>> latency_tolerance = -20 indicates the task to have the least latency as
>>>> compared to the tasks having latency_tolerance = +19.
>>>>
>>>> The latency_tolerance may affect only the CFS SCHED_CLASS by getting
>>>> latency requirements from the userspace.
>
> [...]
>
>>>> diff --git a/include/linux/sched/latency_tolerance.h b/include/linux/sched/latency_tolerance.h
>
> Do we really need an extra header file for this? I know there is
> linux/sched/prio.h but couldn't this go into kernel/sched/sched.h?

We can put this in kernel/sched/sched.h itself, unless there are plans to use
it outside the scheduler subsystem. I will move it there in the next
revision.

>
> [...]
>

2019-12-05 14:05:04

by Dietmar Eggemann

Subject: Re: [RFC 0/3] Introduce per-task latency_tolerance for scheduler hints

On 05/12/2019 11:49, Valentin Schneider wrote:
>
> On 05/12/2019 09:24, Dietmar Eggemann wrote:
>> On 25/11/2019 10:46, Parth Shah wrote:
>>> This patch series is based on the discussion started as the "Usecases for
>>> the per-task latency-nice attribute"[1]
>>>
>>> This patch series introduces a new per-task attribute latency_tolerance to
>>> provide the scheduler hints about the latency requirements of the task.
>>
>> I forgot but is there a chance to have this as a per-taskgroup attribute
>> as well?
>>
>
> Peter argued we should go for task attributes first, and then
> cgroup/taskgroups later on:
>
> https://lore.kernel.org/lkml/[email protected]/

OK, I went through this thread again. So either Google or we have to provide
the missing per-taskgroup API via the cpu controller's attributes (as for
uclamp) for the EAS usecase.

After reading:

https://lore.kernel.org/r/[email protected]

IMHO the following mapping of the existing Android (binary)
latency_sensitive per-taskgroup flag makes sense:

latency_sensitive=1 -> latency_tolerance*[-20 .. -1] (less tolerant,
more sensitive)

latency_sensitive=0 -> latency_tolerance[0 .. 19] (more tolerant, less
sensitive)

Default value is 0 so not latency_sensitive.

* Since we use [-20 .. 19] as values for latency_tolerance we could name
it latency_nice. It's shorter ... ?
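
The mapping proposed above, from the ranged attribute back to the binary
Android flag, can be written down as a short sketch. The function name
latency_sensitive_from_tolerance() is hypothetical and used only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MIN_LATENCY_TOLERANCE (-20)
#define MAX_LATENCY_TOLERANCE 19

/*
 * [-20 .. -1] -> latency_sensitive = 1 (less tolerant, more sensitive)
 * [  0 .. 19] -> latency_sensitive = 0 (more tolerant, less sensitive)
 */
static inline bool latency_sensitive_from_tolerance(int latency_tolerance)
{
	assert(latency_tolerance >= MIN_LATENCY_TOLERANCE &&
	       latency_tolerance <= MAX_LATENCY_TOLERANCE);

	return latency_tolerance < 0;
}
```

Under this mapping the default value of 0 is not latency sensitive, which
matches the statement above.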

2019-12-05 17:15:00

by Parth Shah

Subject: Re: [RFC 0/3] Introduce per-task latency_tolerance for scheduler hints



On 12/5/19 7:33 PM, Dietmar Eggemann wrote:
> On 05/12/2019 11:49, Valentin Schneider wrote:
>>
>> On 05/12/2019 09:24, Dietmar Eggemann wrote:
>>> On 25/11/2019 10:46, Parth Shah wrote:
>>>> This patch series is based on the discussion started as the "Usecases for
>>>> the per-task latency-nice attribute"[1]
>>>>
>>>> This patch series introduces a new per-task attribute latency_tolerance to
>>>> provide the scheduler hints about the latency requirements of the task.
>>>
>>> I forgot but is there a chance to have this as a per-taskgroup attribute
>>> as well?
>>>
>>
>> Peter argued we should go for task attributes first, and then
>> cgroup/taskgroups later on:
>>
>> https://lore.kernel.org/lkml/[email protected]/
>
> OK, I went through this thread again. So Google or we have to provide
> the missing per-taskgroup API via cpu controller's attributes (like for
> uclamp) for the EAS usecase.

I suppose many others (including myself) will also be interested in having a
per-taskgroup attribute via the CPU controller.

>
> After reading:
>
> https://lore.kernel.org/r/[email protected]
>
> IMHO the following mapping of the existing Android (binary)
> latency_sensitive per-taskgroup flag makes sense:
>
> latency_sensitive=1 -> latency_tolerance*[-20 .. -1] (less tolerant,
> more sensitive)
>
> latency_sensitive=0 -> latency_tolerance[0 .. 19] (more tolerant, less
> sensitive)
>
> Default value is 0 so not latency_sensitive.
>
> * Since we use [-20 .. 19] as values for latency_tolerance we could name
> it latency_nice. It's shorter ... ?

I discussed the choice of name and possible values for this new attribute
in a separate thread: https://lkml.org/lkml/2019/9/30/215
From that discussion, and looking at Patrick's comment
https://lkml.org/lkml/2019/9/18/678, I picked latency_tolerance
as the appropriate name.
I will still be happy to change it as per the community's needs.

Thanks,
parth

2019-12-06 12:32:50

by Dietmar Eggemann

Subject: Re: [RFC 0/3] Introduce per-task latency_tolerance for scheduler hints

On 05.12.19 18:13, Parth Shah wrote:
>
>
> On 12/5/19 7:33 PM, Dietmar Eggemann wrote:
>> On 05/12/2019 11:49, Valentin Schneider wrote:
>>>
>>> On 05/12/2019 09:24, Dietmar Eggemann wrote:
>>>> On 25/11/2019 10:46, Parth Shah wrote:

[...]

>> OK, I went through this thread again. So Google or we have to provide
>> the missing per-taskgroup API via cpu controller's attributes (like for
>> uclamp) for the EAS usecase.
>
> I suppose many others (including myself) will also be interested in having
> per-taskgroup attribute via CPU controller.

Ok, let us have a look since Android needs it.

[...]

> I kept choosing appropriate name and possible values for this new attribute
> in the separate thread. https://lkml.org/lkml/2019/9/30/215
> From which discussion, looking at Patrick's comment
> https://lkml.org/lkml/2019/9/18/678 I thought of picking latency_tolerance
> as the appropriate name.
> Still will be happy to change as per the community needs.

Yeah, SCHED_FLAG_LATENCY_TOLERANCE seems to be pretty long.

2019-12-06 16:07:33

by Dietmar Eggemann

Subject: Re: [RFC 3/3] Allow sched_{get,set}attr to change latency_tolerance of the task



On 05.12.19 10:24, Dietmar Eggemann wrote:
> On 03/12/2019 16:51, Parth Shah wrote:
>>
>> On 12/3/19 2:09 PM, Qais Yousef wrote:
>>> On 11/25/19 15:16, Parth Shah wrote:
>
> [...]
>
>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>> index ea7abbf5c1bb..dfd36ec14404 100644
>>>> --- a/kernel/sched/core.c
>>>> +++ b/kernel/sched/core.c
>>>> @@ -4695,6 +4695,9 @@ static void __setscheduler_params(struct task_struct *p,
>>>> p->rt_priority = attr->sched_priority;
>>>> p->normal_prio = normal_prio(p);
>>>> set_load_weight(p, true);
>>>> +
>>>> + /* Change latency tolerance of the task if !SCHED_FLAG_KEEP_PARAMS */
>
> IMHO, this comment seems to be gratuitous.
>
>>>> + p->latency_tolerance = attr->sched_latency_tolerance;
>>>> }
>
> [...]
>

This would also require some changes to the UAPI
(include/uapi/linux/sched.h, include/uapi/linux/sched/types.h), see
commit a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support
utilization clamping") and tools headers UAPI
(tools/include/uapi/linux/sched.h), see commit c093de6bd3c5 ("tools
headers UAPI: Sync sched.h with the kernel").
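
A rough sketch of what such a UAPI extension could look like, modelled on
how the uclamp commit appended sched_util_{min,max} to struct sched_attr.
The struct below is a standalone userspace copy under a hypothetical name
(sched_attr_sketch) so that it compiles without kernel headers. The field
name sched_latency_tolerance is taken from the diff quoted above and the
flag name from this thread; the flag value 0x80 and the resulting size
macro are assumptions, not part of the posted patches.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct sched_attr plus one assumed trailing field. */
struct sched_attr_sketch {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;

	/* SCHED_NORMAL, SCHED_BATCH */
	int32_t  sched_nice;

	/* SCHED_FIFO, SCHED_RR */
	uint32_t sched_priority;

	/* SCHED_DEADLINE */
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;

	/* Utilization hints (uclamp) */
	uint32_t sched_util_min;
	uint32_t sched_util_max;

	/* Assumed new field, appended so older layouts stay valid */
	int32_t  sched_latency_tolerance;
};

/* Assumed value: the next free bit after SCHED_FLAG_UTIL_CLAMP_MAX (0x40). */
#define SCHED_FLAG_LATENCY_TOLERANCE	0x80

/* Size up to and including the uclamp fields (SCHED_ATTR_SIZE_VER1). */
#define SKETCH_SCHED_ATTR_SIZE_VER1	56

/* Hypothetical size for a layout that ends after the new field. */
#define SKETCH_SCHED_ATTR_SIZE \
	(offsetof(struct sched_attr_sketch, sched_latency_tolerance) + \
	 sizeof(int32_t))
```

Appending the field at the end keeps the existing size-based versioning
scheme of sched_setattr() intact: callers passing the older, smaller size
would simply not provide the new attribute.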

2019-12-08 05:53:34

by Parth Shah

Subject: Re: [RFC 3/3] Allow sched_{get,set}attr to change latency_tolerance of the task



On 12/6/19 9:34 PM, Dietmar Eggemann wrote:
>
>
> On 05.12.19 10:24, Dietmar Eggemann wrote:
>> On 03/12/2019 16:51, Parth Shah wrote:
>>>
>>> On 12/3/19 2:09 PM, Qais Yousef wrote:
>>>> On 11/25/19 15:16, Parth Shah wrote:
>>
>> [...]
>>
>>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>>> index ea7abbf5c1bb..dfd36ec14404 100644
>>>>> --- a/kernel/sched/core.c
>>>>> +++ b/kernel/sched/core.c
>>>>> @@ -4695,6 +4695,9 @@ static void __setscheduler_params(struct task_struct *p,
>>>>> p->rt_priority = attr->sched_priority;
>>>>> p->normal_prio = normal_prio(p);
>>>>> set_load_weight(p, true);
>>>>> +
>>>>> + /* Change latency tolerance of the task if !SCHED_FLAG_KEEP_PARAMS */
>>
>> IMHO, this comment seems to be gratuitous.
>>
>>>>> + p->latency_tolerance = attr->sched_latency_tolerance;
>>>>> }
>>
>> [...]
>>
>
> This also would require some changes to UAPI
> (include/uapi/linux/sched.h, include/uapi/linux/sched/types.h), see
> commit a509a7cd7974 ("sched/uclamp: Extend sched_setattr() to support
> utilization clamping") and tools headers UAPI
> (tools/include/uapi/linux/sched.h), see commit c093de6bd3c5 ("tools
> headers UAPI: Sync sched.h with the kernel").
>

Ok. Will add it. Thanks

2019-12-08 05:58:56

by Parth Shah

Subject: Re: [RFC 0/3] Introduce per-task latency_tolerance for scheduler hints



On 12/6/19 6:01 PM, Dietmar Eggemann wrote:
> On 05.12.19 18:13, Parth Shah wrote:
>>
>>
>> On 12/5/19 7:33 PM, Dietmar Eggemann wrote:
>>> On 05/12/2019 11:49, Valentin Schneider wrote:
>>>>
>>>> On 05/12/2019 09:24, Dietmar Eggemann wrote:
>>>>> On 25/11/2019 10:46, Parth Shah wrote:
>
> [...]
>
>>> OK, I went through this thread again. So Google or we have to provide
>>> the missing per-taskgroup API via cpu controller's attributes (like for
>>> uclamp) for the EAS usecase.
>>
>> I suppose many others (including myself) will also be interested in having
>> per-taskgroup attribute via CPU controller.
>
> Ok, let us have a look since Android needs it.
>
> [...]
>
>> I kept choosing appropriate name and possible values for this new attribute
>> in the separate thread. https://lkml.org/lkml/2019/9/30/215
>> From which discussion, looking at Patrick's comment
>> https://lkml.org/lkml/2019/9/18/678 I thought of picking latency_tolerance
>> as the appropriate name.
>> Still will be happy to change as per the community needs.
>
> Yeah, SCHED_FLAG_LATENCY_TOLERANCE seems to be pretty long.
>

Hi, I'm thinking of sending v2 of the patch series and, for the sake of
continuity, I will keep the name as it is because I'm expecting further
responses from other developers about latency_nice. I will re-spin the
series with the new name if people agree on it.


Best,
Parth