2019-12-08 06:05:14

by Parth Shah

Subject: [PATCH v2 0/3] Introduce per-task latency_tolerance for scheduler hints

This is the second revision of the patch set introducing latency_tolerance
as a per-task attribute.

The previous version can be found at:
v1: https://lkml.org/lkml/2019/11/25/151

Changes in this revision are:
v1 -> v2:
- Addressed comments from Qais Yousef
- As suggested by Dietmar, moved the content of the newly created
include/linux/sched/latency_tolerance.h to kernel/sched/sched.h
- Extended the sched_setattr() support for latency_tolerance to the tools
UAPI headers


This patch series introduces a new per-task attribute, latency_tolerance, to
provide the scheduler with hints about the latency requirements of a task [1].

latency_tolerance is a ranged attribute whose value lies in [-20, 19], both
inclusive, which aligns it with the range of task nice values.

The value gives the scheduler hints about the relative latency requirements
of tasks: a task with "latency_tolerance = -20" should see lower latency
than tasks with higher values. Conversely, a task with
"latency_tolerance = 19" can tolerate higher latency, i.e. it does not care
much about latency.
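A minimal userspace sketch of the range semantics described above. It
mirrors the [-20, 19] bounds and the default of 0 as local macros; the
helper names latency_tolerance_valid() and wants_lower_latency() are
hypothetical and not part of this series.

```c
#include <stdbool.h>

/* Local mirrors of the bounds and default proposed in patch 1. */
#define MIN_LATENCY_TOLERANCE     (-20)
#define MAX_LATENCY_TOLERANCE     19
#define DEFAULT_LATENCY_TOLERANCE 0

/* True if @lt lies in the inclusive range [-20, 19]. */
static bool latency_tolerance_valid(int lt)
{
	return lt >= MIN_LATENCY_TOLERANCE && lt <= MAX_LATENCY_TOLERANCE;
}

/*
 * Hypothetical helper: true if task A asks for lower latency than task B.
 * A smaller latency_tolerance means a stricter latency requirement.
 */
static bool wants_lower_latency(int lt_a, int lt_b)
{
	return lt_a < lt_b;
}
```

Only the ordering of values is meaningful here; the series leaves the
actual effect of each level to future use-case patches.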

The default value is 0. The use cases discussed below can use the [-20, 19]
range of latency_tolerance for their specific purposes. This series
deliberately does not implement any of those use cases, so that a change in
the attribute's name or range has little impact on the (future) patches
that build on it. The actual use of latency_tolerance during task wakeup
and load balancing is yet to be coded for each use case.

In my view, this attribute can be used in the following ways for some of
the use cases:
1 Reduce search scan time for select_idle_cpu():
- Reduce the number of CPUs scanned when looking for an idle CPU for a
waking task with a lower latency_tolerance value.

2 TurboSched:
- Classify tasks with higher latency_tolerance values as small background
tasks, provided their historic utilization is very low; the scheduler can
then search more cores onto which to pack such tasks. For example, a task
with latency_tolerance >= some_threshold (e.g., >= +18) and util <= 12.5%
can be treated as a background task.

3 Optimize AVX512-based workloads:
- Bias the scheduler against placing a task with (latency_tolerance == -20)
on a core running an AVX512-based workload.
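The three use cases above can be sketched as small predicates. This is
illustrative only: the series implements none of them, the linear
scan-limit mapping in idle_scan_limit() is an assumption of this sketch,
and all helper names are hypothetical. The TurboSched check uses the
example threshold (+18) and the 12.5% utilization bound from the text,
expressed on the kernel's 0..1024 capacity scale.

```c
#include <stdbool.h>

#define MIN_LATENCY_TOLERANCE   (-20)
#define MAX_LATENCY_TOLERANCE   19
#define LATENCY_TOLERANCE_WIDTH (MAX_LATENCY_TOLERANCE - MIN_LATENCY_TOLERANCE + 1)

/* Utilization on the 0..1024 capacity scale; 12.5% of 1024 is 128. */
#define SCHED_CAPACITY_SCALE    1024
#define BG_UTIL_MAX             (SCHED_CAPACITY_SCALE / 8)
#define BG_LATENCY_THRESHOLD    18   /* example threshold from the text */

/*
 * Use case 1 (hypothetical policy): scale the number of CPUs an idle-CPU
 * search visits linearly with a valid latency_tolerance, so -20 scans one
 * CPU and 19 scans all nr_cpus (nr_cpus >= 1).
 */
static int idle_scan_limit(int lt, int nr_cpus)
{
	int steps = lt - MIN_LATENCY_TOLERANCE;	/* 0 .. WIDTH - 1 */

	return 1 + (nr_cpus - 1) * steps / (LATENCY_TOLERANCE_WIDTH - 1);
}

/* Use case 2: TurboSched-style "small background task" classification. */
static bool is_background_task(int lt, unsigned int util)
{
	return lt >= BG_LATENCY_THRESHOLD && util <= BG_UTIL_MAX;
}

/* Use case 3 (hypothetical): keep -20 tasks off AVX512-busy cores. */
static bool avoid_core(int lt, bool core_runs_avx512)
{
	return lt == MIN_LATENCY_TOLERANCE && core_runs_avx512;
}
```

For example, with 40 CPUs idle_scan_limit() yields 1 CPU for
latency_tolerance = -20, 21 for 0, and all 40 for 19.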

Series Organization:
====================
- Patch 1: Add the new attribute latency_tolerance to task_struct.
- Patch 2: Copy the parent task's attribute to the child task on fork.
- Patch 3: Add support to the sched_{set,get}attr syscalls for modifying
the latency_tolerance of a task.

Patch 1 is kept separate for review purposes and may be merged into patch 3.
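A hedged userspace sketch of how a program might request a latency_tolerance
value through sched_setattr() once patch 3 is applied. Because the series
was not merged in this form, SCHED_FLAG_LATENCY_TOLERANCE,
SCHED_ATTR_SIZE_VER2 and the trailing sched_latency_tolerance field are
defined locally from the values in patch 3; struct sched_attr_v2 and the
helper names are hypothetical. The sketch fills sched_policy and sched_nice
explicitly rather than using SCHED_FLAG_KEEP_ALL. This is a conservative
assumption: the posted patch stores the value from __setscheduler_params(),
which appears to be skipped when SCHED_FLAG_KEEP_PARAMS is set.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Values proposed by patch 3; not present in released UAPI headers. */
#define SCHED_FLAG_LATENCY_TOLERANCE 0x80
#define SCHED_ATTR_SIZE_VER2         60

/* Local copy of struct sched_attr with the proposed trailing field. */
struct sched_attr_v2 {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
	int32_t  sched_latency_tolerance;	/* new in VER2 */
};

/* The new field must start right after the 56-byte VER1 layout. */
_Static_assert(offsetof(struct sched_attr_v2, sched_latency_tolerance) == 56,
	       "unexpected sched_attr layout");

/* Prepare an attr for a SCHED_OTHER task with @nice that requests @lt. */
void fill_latency_attr(struct sched_attr_v2 *attr, int nice, int lt)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = SCHED_ATTR_SIZE_VER2;
	attr->sched_policy = 0;			/* SCHED_OTHER */
	attr->sched_nice = nice;
	attr->sched_flags = SCHED_FLAG_LATENCY_TOLERANCE;
	attr->sched_latency_tolerance = lt;
}

/*
 * Returns 0 on success, -1 with errno set otherwise. Expect a failure
 * (e.g. EINVAL) on kernels that do not carry this series.
 */
int set_latency_tolerance(pid_t pid, int nice, int lt)
{
	struct sched_attr_v2 attr;

	fill_latency_attr(&attr, nice, lt);
#ifdef SYS_sched_setattr
	return (int)syscall(SYS_sched_setattr, pid, &attr, 0);
#else
	errno = ENOSYS;
	return -1;
#endif
}
```

A caller would use set_latency_tolerance(0, current_nice, -20) to mark the
calling thread as latency sensitive; pid 0 refers to the caller.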

Although Dietmar's comment at https://lkml.org/lkml/2019/12/6/276 suggests
latency_nice as a shorter name than the currently proposed one, this series
still uses latency_tolerance as the attribute name; it will be renamed once
there are further comments on the naming.


The patch series applies on top of tip/sched/core at
commit de881a341c41 ("Merge branch 'sched/rt' into sched/core, to pick up commit")


References:
===========
[1] Usecases for the per-task latency-nice attribute,
https://lkml.org/lkml/2019/9/30/215
[2] Task Latency-nice, Subhra Mazumdar,
https://lkml.org/lkml/2019/8/30/829


Parth Shah (3):
sched: Introduce latency-tolerance as a per-task attribute
sched/core: Propagate parent task's latency requirements to the child
task
sched: Allow sched_{get,set}attr to change latency_tolerance of the
task

include/linux/sched.h | 1 +
include/uapi/linux/sched.h | 4 +++-
include/uapi/linux/sched/types.h | 19 +++++++++++++++++++
kernel/sched/core.c | 21 +++++++++++++++++++++
kernel/sched/sched.h | 18 ++++++++++++++++++
tools/include/uapi/linux/sched.h | 4 +++-
6 files changed, 65 insertions(+), 2 deletions(-)

--
2.17.2


2019-12-08 06:05:34

by Parth Shah

Subject: [PATCH v2 1/3] sched: Introduce latency-tolerance as a per-task attribute

Latency-tolerance indicates the latency requirements of a task relative to
the other tasks in the system. The attribute takes values in the range
[-20, 19], both inclusive, in line with task nice values.

latency_tolerance = -20 indicates that the task requires the lowest
latency, as compared to tasks with latency_tolerance = +19.

latency_tolerance is intended to affect only the CFS scheduling class, with
the latency requirements supplied from userspace.

Signed-off-by: Parth Shah <[email protected]>
Reviewed-by: Qais Yousef <[email protected]>
---
include/linux/sched.h | 1 +
kernel/sched/sched.h | 18 ++++++++++++++++++
2 files changed, 19 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 84b26d38c929..d5f907d7a91a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -676,6 +676,7 @@ struct task_struct {
int static_prio;
int normal_prio;
unsigned int rt_priority;
+ int latency_tolerance;

const struct sched_class *sched_class;
struct sched_entity se;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 280a3c735935..3d68aaad02cc 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -101,6 +101,24 @@ extern long calc_load_fold_active(struct rq *this_rq, long adjust);
*/
#define NS_TO_JIFFIES(TIME) ((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))

+/*
+ * Latency tolerance is meant to provide scheduler hints about the relative
+ * latency requirements of a task with respect to other tasks.
+ * Thus a task with latency_tolerance == 19 can be hinted as the task with no
+ * latency requirements, in contrast to the task with latency_tolerance == -20
+ * which should be given priority in terms of lower latency.
+ */
+#define MAX_LATENCY_TOLERANCE 19
+#define MIN_LATENCY_TOLERANCE -20
+
+#define LATENCY_TOLERANCE_WIDTH \
+ (MAX_LATENCY_TOLERANCE - MIN_LATENCY_TOLERANCE + 1)
+
+/*
+ * Default tasks should be treated as a task with latency_tolerance = 0.
+ */
+#define DEFAULT_LATENCY_TOLERANCE 0
+
/*
* Increase resolution of nice-level calculations for 64-bit architectures.
* The extra resolution improves shares distribution and load balancing of
--
2.17.2
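LATENCY_TOLERANCE_WIDTH is defined in patch 1 but not used by the hunks
shown in this thread. One plausible use, by analogy with how a nice value
becomes an index into sched_prio_to_weight[] (nice + 20), is mapping a
latency_tolerance value to a 0-based table index. The userspace sketch
below assumes that use; latency_tolerance_to_index() is a hypothetical
helper.

```c
#define MAX_LATENCY_TOLERANCE   19
#define MIN_LATENCY_TOLERANCE   (-20)
#define LATENCY_TOLERANCE_WIDTH (MAX_LATENCY_TOLERANCE - MIN_LATENCY_TOLERANCE + 1)

/* The range has exactly as many levels as nice values (-20..19). */
_Static_assert(LATENCY_TOLERANCE_WIDTH == 40, "range must span 40 values");

/*
 * Hypothetical helper: map a valid latency_tolerance to a 0-based index,
 * so -20 -> 0 and 19 -> LATENCY_TOLERANCE_WIDTH - 1, suitable for indexing
 * a per-level table of LATENCY_TOLERANCE_WIDTH entries.
 */
static int latency_tolerance_to_index(int lt)
{
	return lt - MIN_LATENCY_TOLERANCE;
}
```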

2019-12-08 06:06:03

by Parth Shah

Subject: [PATCH v2 3/3] sched: Allow sched_{get,set}attr to change latency_tolerance of the task

Add the latency_tolerance attribute to sched_attr and provide a mechanism
to set and read its value via the sched_setattr()/sched_getattr() syscalls.

Also add a new flag, SCHED_FLAG_LATENCY_TOLERANCE, which a sched_setattr()
call must set to indicate that the task's latency_tolerance should be
changed.

Signed-off-by: Parth Shah <[email protected]>
Reviewed-by: Qais Yousef <[email protected]>
---
include/uapi/linux/sched.h | 4 +++-
include/uapi/linux/sched/types.h | 19 +++++++++++++++++++
kernel/sched/core.c | 17 +++++++++++++++++
tools/include/uapi/linux/sched.h | 4 +++-
4 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 25b4fa00bad1..a211d397e41d 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -101,6 +101,7 @@ struct clone_args {
#define SCHED_FLAG_KEEP_PARAMS 0x10
#define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
#define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
+#define SCHED_FLAG_LATENCY_TOLERANCE 0x80

#define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \
SCHED_FLAG_KEEP_PARAMS)
@@ -112,6 +113,7 @@ struct clone_args {
SCHED_FLAG_RECLAIM | \
SCHED_FLAG_DL_OVERRUN | \
SCHED_FLAG_KEEP_ALL | \
- SCHED_FLAG_UTIL_CLAMP)
+ SCHED_FLAG_UTIL_CLAMP | \
+ SCHED_FLAG_LATENCY_TOLERANCE)

#endif /* _UAPI_LINUX_SCHED_H */
diff --git a/include/uapi/linux/sched/types.h b/include/uapi/linux/sched/types.h
index c852153ddb0d..4f169c88517f 100644
--- a/include/uapi/linux/sched/types.h
+++ b/include/uapi/linux/sched/types.h
@@ -10,6 +10,7 @@ struct sched_param {

#define SCHED_ATTR_SIZE_VER0 48 /* sizeof first published struct */
#define SCHED_ATTR_SIZE_VER1 56 /* add: util_{min,max} */
+#define SCHED_ATTR_SIZE_VER2 60 /* add: latency_tolerance */

/*
* Extended scheduling parameters data structure.
@@ -96,6 +97,22 @@ struct sched_param {
* on a CPU with a capacity big enough to fit the specified value.
* A task with a max utilization value smaller than 1024 is more likely
* scheduled on a CPU with no more capacity than the specified value.
+ *
+ * Latency Tolerance Attributes
+ * ===========================
+ *
+ * A subset of sched_attr attributes allows to specify the relative latency
+ * requirements of a task with respect to the other tasks running/queued in the
+ * system.
+ *
+ * @sched_latency_tolerance task's latency_tolerance value
+ *
+ * The latency_tolerance of a task can have any value in the range
+ * [MIN_LATENCY_TOLERANCE..MAX_LATENCY_TOLERANCE], i.e. [-20..19].
+ *
+ * A task with latency_tolerance == MIN_LATENCY_TOLERANCE is treated as having
+ * stricter latency requirements (it wants lower latency) than a task with a
+ * higher latency_tolerance.
*/
struct sched_attr {
__u32 size;
@@ -118,6 +135,8 @@ struct sched_attr {
__u32 sched_util_min;
__u32 sched_util_max;

+ /* latency requirement hints */
+ __s32 sched_latency_tolerance;
};

#endif /* _UAPI_LINUX_SCHED_TYPES_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3f9359d0e326..bc8a260223f6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4706,6 +4706,8 @@ static void __setscheduler_params(struct task_struct *p,
p->rt_priority = attr->sched_priority;
p->normal_prio = normal_prio(p);
set_load_weight(p, true);
+
+ p->latency_tolerance = attr->sched_latency_tolerance;
}

/* Actually do priority change: must hold pi & rq lock. */
@@ -4863,6 +4865,13 @@ static int __sched_setscheduler(struct task_struct *p,
return retval;
}

+ if (attr->sched_flags & SCHED_FLAG_LATENCY_TOLERANCE) {
+ if (attr->sched_latency_tolerance > MAX_LATENCY_TOLERANCE)
+ return -EINVAL;
+ if (attr->sched_latency_tolerance < MIN_LATENCY_TOLERANCE)
+ return -EINVAL;
+ }
+
if (pi)
cpuset_read_lock();

@@ -4897,6 +4906,9 @@ static int __sched_setscheduler(struct task_struct *p,
goto change;
if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)
goto change;
+ if (attr->sched_flags & SCHED_FLAG_LATENCY_TOLERANCE &&
+ attr->sched_latency_tolerance != p->latency_tolerance)
+ goto change;

p->sched_reset_on_fork = reset_on_fork;
retval = 0;
@@ -5145,6 +5157,9 @@ static int sched_copy_attr(struct sched_attr __user *uattr, struct sched_attr *a
size < SCHED_ATTR_SIZE_VER1)
return -EINVAL;

+ if ((attr->sched_flags & SCHED_FLAG_LATENCY_TOLERANCE) &&
+ size < SCHED_ATTR_SIZE_VER2)
+ return -EINVAL;
/*
* XXX: Do we want to be lenient like existing syscalls; or do we want
* to be strict and return an error on out-of-bounds values?
@@ -5374,6 +5389,8 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
else
kattr.sched_nice = task_nice(p);

+ kattr.sched_latency_tolerance = p->latency_tolerance;
+
#ifdef CONFIG_UCLAMP_TASK
kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
diff --git a/tools/include/uapi/linux/sched.h b/tools/include/uapi/linux/sched.h
index 99335e1f4a27..5ce62b1be196 100644
--- a/tools/include/uapi/linux/sched.h
+++ b/tools/include/uapi/linux/sched.h
@@ -97,6 +97,7 @@ struct clone_args {
#define SCHED_FLAG_KEEP_PARAMS 0x10
#define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
#define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
+#define SCHED_FLAG_LATENCY_TOLERANCE 0x80

#define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \
SCHED_FLAG_KEEP_PARAMS)
@@ -108,6 +109,7 @@ struct clone_args {
SCHED_FLAG_RECLAIM | \
SCHED_FLAG_DL_OVERRUN | \
SCHED_FLAG_KEEP_ALL | \
- SCHED_FLAG_UTIL_CLAMP)
+ SCHED_FLAG_UTIL_CLAMP | \
+ SCHED_FLAG_LATENCY_TOLERANCE)

#endif /* _UAPI_LINUX_SCHED_H */
--
2.17.2
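A userspace model of the checks that patch 3 adds, for illustration only:
when SCHED_FLAG_LATENCY_TOLERANCE is set, the attribute must be at least
SCHED_ATTR_SIZE_VER2 bytes (sched_copy_attr()) and the value must lie in
[MIN_LATENCY_TOLERANCE, MAX_LATENCY_TOLERANCE] (__sched_setscheduler()).
struct attr_model and struct task_model are hypothetical stand-ins for
struct sched_attr and task_struct. One deviation is deliberate: the posted
__setscheduler_params() hunk appears to copy sched_latency_tolerance
unconditionally, so a call without the flag (where the field is
zero-filled) could reset a previously set value to 0. apply_latency_attr()
instead updates the value only when the flag is set.

```c
#include <errno.h>
#include <stdint.h>

#define SCHED_FLAG_LATENCY_TOLERANCE 0x80
#define SCHED_ATTR_SIZE_VER2         60
#define MIN_LATENCY_TOLERANCE        (-20)
#define MAX_LATENCY_TOLERANCE        19

/* Minimal stand-ins for the kernel structures (hypothetical). */
struct attr_model {
	uint64_t sched_flags;
	uint32_t size;
	int32_t  sched_latency_tolerance;
};

struct task_model {
	int latency_tolerance;
};

/* Returns 0 if the attr is acceptable, -EINVAL otherwise. */
static int check_latency_attr(const struct attr_model *attr)
{
	if (!(attr->sched_flags & SCHED_FLAG_LATENCY_TOLERANCE))
		return 0;		/* nothing requested, nothing to check */
	if (attr->size < SCHED_ATTR_SIZE_VER2)
		return -EINVAL;		/* caller's struct lacks the field */
	if (attr->sched_latency_tolerance < MIN_LATENCY_TOLERANCE ||
	    attr->sched_latency_tolerance > MAX_LATENCY_TOLERANCE)
		return -EINVAL;		/* out of [-20, 19] */
	return 0;
}

/* Flag-guarded update: leave the task's value alone unless requested. */
static void apply_latency_attr(struct task_model *p,
			       const struct attr_model *attr)
{
	if (attr->sched_flags & SCHED_FLAG_LATENCY_TOLERANCE)
		p->latency_tolerance = attr->sched_latency_tolerance;
}
```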

2020-01-15 20:52:32

by Dhaval Giani

Subject: Re: [PATCH v2 0/3] Introduce per-task latency_tolerance for scheduler hints



On 12/7/19 10:04 PM, Parth Shah wrote:
> [...]
>
> As per my view, this defined attribute can be used in following ways for a
> some of the usecases:
> 1 Reduce search scan time for select_idle_cpu():
> - Reduce search scans for finding idle CPU for a waking task with lower
> latency_tolerance values.
>
> 2 TurboSched:
> - Classify the tasks with higher latency_tolerance values as a small
> background task given that its historic utilization is very low, for
> which the scheduler can search for more number of cores to do task
> packing. A task with a latency_tolerance >= some_threshold (e.g, >= +18)
> and util <= 12.5% can be background tasks.
>
> 3 Optimize AVX512 based workload:
> - Bias scheduler to not put a task having (latency_tolerance == -20) on a
> core occupying AVX512 based workload.

Have you been able to adapt any of these use cases to this new interface?

Does the interface translate well to them?

Do you have any code that you can share?

Dhaval



2020-01-16 12:06:03

by Parth Shah

Subject: Re: [PATCH v2 0/3] Introduce per-task latency_tolerance for scheduler hints

Hi Dhaval,

On 1/16/20 2:03 AM, Dhaval Giani wrote:
>
>
> On 12/7/19 10:04 PM, Parth Shah wrote:
>> [...]
>
> Have you been able to adapt any of these use cases to this new interface?
>
> Does the interface translate well to them?
>
> Do you have any code that you can share?

Yes, I have adapted this patch set for TurboSched, and it proves useful for
classifying latency-insensitive tasks and packing them onto fewer cores.
I am able to pack the tasks that have
latency_tolerance == MAX_LATENCY_TOLERANCE.

I will soon send RFC v6 of TurboSched to LKML; it uses
latency_{nice/tolerance}.


Thanks,
Parth

>
> Dhaval
>