2020-02-24 08:59:51

by Parth Shah

[permalink] [raw]
Subject: [PATCH v4 0/4] Introduce per-task latency_nice for scheduler hints

This is the 4th revision of the patch set to introduce latency_nice as a
per task attribute.

The previous version can be found at:
v1: https://lkml.org/lkml/2019/11/25/151
v2: https://lkml.org/lkml/2019/12/8/10
v3: https://lkml.org/lkml/2020/1/16/319

Changes in this revision are:
v3->v4:
- Based on Chris's comment, added security check to refrain non-root user
set lower value than the current latency_nice values.
v2 -> v3:
- This series changes the longer attribute name to "latency_nice" as per
the comment from Dietmar Eggemann https://lkml.org/lkml/2019/12/5/394
v1 -> v2:
- Addressed comments from Qais Yousef
- As per suggestion from Dietmar, moved content from newly created
include/linux/sched/latency_tolerance.h to kernel/sched/sched.h
- Extend sched_setattr() to support latency_tolerance in tools headers UAPI


Introduction:
==============
This patch series introduces a new per-task attribute latency_nice to
provide the scheduler hints about the latency requirements of the task [1].

Latency_nice is a ranged attribute of a task with the value ranging
from [-20, 19] both inclusive which makes it align with the task nice
value.

The value should provide scheduler hints about the relative latency
requirements of tasks, meaning the task with "latency_nice = -20"
should have lower latency requirements than compared to those tasks with
higher values. Similarly a task with "latency_nice = 19" can have higher
latency and hence such tasks may not care much about latency.

The default value is set to 0. The usecases discussed below can use this
range of [-20, 19] for latency_nice for the specific purpose. This
patch does not implement any use cases for such attribute so that any
change in naming or range does not affect much to the other (future)
patches using this. The actual use of latency_nice during task wakeup
and load-balancing is yet to be coded for each of those usecases.

As per my view, this defined attribute can be used in following ways for a
some of the usecases:
1 Reduce search scan time for select_idle_cpu():
- Reduce search scans for finding idle CPU for a waking task with lower
latency_nice values.

2 TurboSched:
- Classify the tasks with higher latency_nice values as a small
background task given that its historic utilization is very low, for
which the scheduler can search for more number of cores to do task
packing. A task with a latency_nice >= some_threshold (e.g, == 19)
and util <= 12.5% can be background tasks.

3 Optimize AVX512 based workload:
- Bias scheduler to not put a task having (latency_nice == -20) on a
core occupying AVX512 based workload.


Series Organization:
====================
- Patch 1-3: Add support for latency_nice attr in the task struct using
sched_{set,get}attr syscall
- Patch 4 : Add permission checks for setting the value.


The patch series can be applied on tip/sched/core at the
commit 000619680c37 ("sched/fair: Remove wake_cap()")


References:
============
[1]. Usecases for the per-task latency-nice attribute,
https://lkml.org/lkml/2019/9/30/215
[2]. Task Latency-nice, "Subhra Mazumdar",
https://lkml.org/lkml/2019/8/30/829
[3]. Introduce per-task latency_tolerance for scheduler hints,
https://lkml.org/lkml/2019/12/8/10



Parth Shah (4):
sched: Introduce latency-nice as a per-task attribute
sched/core: Propagate parent task's latency requirements to the child
task
sched: Allow sched_{get,set}attr to change latency_nice of the task
sched/core: Add permission checks for setting the latency_nice value

include/linux/sched.h | 1 +
include/uapi/linux/sched.h | 4 +++-
include/uapi/linux/sched/types.h | 19 +++++++++++++++++++
kernel/sched/core.c | 25 +++++++++++++++++++++++++
kernel/sched/sched.h | 18 ++++++++++++++++++
tools/include/uapi/linux/sched.h | 4 +++-
6 files changed, 69 insertions(+), 2 deletions(-)

--
2.17.2


2020-02-24 08:59:54

by Parth Shah

[permalink] [raw]
Subject: [PATCH v4 1/4] sched: Introduce latency-nice as a per-task attribute

Latency-nice indicates the latency requirements of a task with respect
to the other tasks in the system. The value of the attribute can be within
the range of [-20, 19] both inclusive to be in-line with the values just
like task nice values.

latency_nice = -20 indicates the task to have the least latency as
compared to the tasks having latency_nice = +19.

The latency_nice may affect only the CFS SCHED_CLASS by getting
latency requirements from the userspace.

Signed-off-by: Parth Shah <[email protected]>
Reviewed-by: Qais Yousef <[email protected]>
---
include/linux/sched.h | 1 +
kernel/sched/sched.h | 18 ++++++++++++++++++
2 files changed, 19 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 467d26046416..0668948fddcd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -676,6 +676,7 @@ struct task_struct {
int static_prio;
int normal_prio;
unsigned int rt_priority;
+ int latency_nice;

const struct sched_class *sched_class;
struct sched_entity se;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 878910e8b299..26b9758075e4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -101,6 +101,24 @@ extern long calc_load_fold_active(struct rq *this_rq, long adjust);
*/
#define NS_TO_JIFFIES(TIME) ((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))

+/*
+ * Latency nice is meant to provide scheduler hints about the relative
+ * latency requirements of a task with respect to other tasks.
+ * Thus a task with latency_nice == 19 can be hinted as the task with no
+ * latency requirements, in contrast to the task with latency_nice == -20
+ * which should be given priority in terms of lower latency.
+ */
+#define MAX_LATENCY_NICE 19
+#define MIN_LATENCY_NICE -20
+
+#define LATENCY_NICE_WIDTH \
+ (MAX_LATENCY_NICE - MIN_LATENCY_NICE + 1)
+
+/*
+ * Default tasks should be treated as a task with latency_nice = 0.
+ */
+#define DEFAULT_LATENCY_NICE 0
+
/*
* Increase resolution of nice-level calculations for 64-bit architectures.
* The extra resolution improves shares distribution and load balancing of
--
2.17.2

2020-02-24 09:00:03

by Parth Shah

[permalink] [raw]
Subject: [PATCH v4 2/4] sched/core: Propagate parent task's latency requirements to the child task

Clone parent task's latency_nice attribute to the forked child task.

Reset the latency_nice value to default value when the child task is
set to sched_reset_on_fork.

Signed-off-by: Parth Shah <[email protected]>
Reviewed-by: Qais Yousef <[email protected]>
---
kernel/sched/core.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 377ec26e9159..65b6c00d6dac 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2860,6 +2860,9 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
*/
p->prio = current->normal_prio;

+ /* Propagate the parent's latency requirements to the child as well */
+ p->latency_nice = current->latency_nice;
+
uclamp_fork(p);

/*
@@ -2876,6 +2879,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
p->prio = p->normal_prio = __normal_prio(p);
set_load_weight(p, false);

+ p->latency_nice = DEFAULT_LATENCY_NICE;
/*
* We don't need the reset flag anymore after the fork. It has
* fulfilled its duty:
--
2.17.2

2020-02-24 09:00:08

by Parth Shah

[permalink] [raw]
Subject: [PATCH v4 3/4] sched: Allow sched_{get,set}attr to change latency_nice of the task

Introduce the latency_nice attribute to sched_attr and provide a
mechanism to change the value with the use of sched_setattr/sched_getattr
syscall.

Also add new flag "SCHED_FLAG_LATENCY_NICE" to hint the change in
latency_nice of the task on every sched_setattr syscall.

Signed-off-by: Parth Shah <[email protected]>
Reviewed-by: Qais Yousef <[email protected]>
---
include/uapi/linux/sched.h | 4 +++-
include/uapi/linux/sched/types.h | 19 +++++++++++++++++++
kernel/sched/core.c | 17 +++++++++++++++++
tools/include/uapi/linux/sched.h | 4 +++-
4 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 4a0217832464..88ce1e8d7553 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -121,6 +121,7 @@ struct clone_args {
#define SCHED_FLAG_KEEP_PARAMS 0x10
#define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
#define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
+#define SCHED_FLAG_LATENCY_NICE 0x80

#define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \
SCHED_FLAG_KEEP_PARAMS)
@@ -132,6 +133,7 @@ struct clone_args {
SCHED_FLAG_RECLAIM | \
SCHED_FLAG_DL_OVERRUN | \
SCHED_FLAG_KEEP_ALL | \
- SCHED_FLAG_UTIL_CLAMP)
+ SCHED_FLAG_UTIL_CLAMP | \
+ SCHED_FLAG_LATENCY_NICE)

#endif /* _UAPI_LINUX_SCHED_H */
diff --git a/include/uapi/linux/sched/types.h b/include/uapi/linux/sched/types.h
index c852153ddb0d..626ab61c1145 100644
--- a/include/uapi/linux/sched/types.h
+++ b/include/uapi/linux/sched/types.h
@@ -10,6 +10,7 @@ struct sched_param {

#define SCHED_ATTR_SIZE_VER0 48 /* sizeof first published struct */
#define SCHED_ATTR_SIZE_VER1 56 /* add: util_{min,max} */
+#define SCHED_ATTR_SIZE_VER2 60 /* add: latency_nice */

/*
* Extended scheduling parameters data structure.
@@ -96,6 +97,22 @@ struct sched_param {
* on a CPU with a capacity big enough to fit the specified value.
* A task with a max utilization value smaller than 1024 is more likely
* scheduled on a CPU with no more capacity than the specified value.
+ *
+ * Latency Tolerance Attributes
+ * ===========================
+ *
+ * A subset of sched_attr attributes allows to specify the relative latency
+ * requirements of a task with respect to the other tasks running/queued in the
+ * system.
+ *
+ * @ sched_latency_nice task's latency_nice value
+ *
+ * The latency_nice of a task can have any value in a range of
+ * [LATENCY_NICE_MIN..LATENCY_NICE_MAX].
+ *
+ * A task with latency_nice with the value of LATENCY_NICE_MIN can be
+ * taken for a task with lower latency requirements as opposed to the task with
+ * higher latency_nice.
*/
struct sched_attr {
__u32 size;
@@ -118,6 +135,8 @@ struct sched_attr {
__u32 sched_util_min;
__u32 sched_util_max;

+ /* latency requirement hints */
+ __s32 sched_latency_nice;
};

#endif /* _UAPI_LINUX_SCHED_TYPES_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 65b6c00d6dac..e1dc536d4ca3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4723,6 +4723,8 @@ static void __setscheduler_params(struct task_struct *p,
p->rt_priority = attr->sched_priority;
p->normal_prio = normal_prio(p);
set_load_weight(p, true);
+
+ p->latency_nice = attr->sched_latency_nice;
}

/* Actually do priority change: must hold pi & rq lock. */
@@ -4880,6 +4882,13 @@ static int __sched_setscheduler(struct task_struct *p,
return retval;
}

+ if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE) {
+ if (attr->sched_latency_nice > MAX_LATENCY_NICE)
+ return -EINVAL;
+ if (attr->sched_latency_nice < MIN_LATENCY_NICE)
+ return -EINVAL;
+ }
+
if (pi)
cpuset_read_lock();

@@ -4914,6 +4923,9 @@ static int __sched_setscheduler(struct task_struct *p,
goto change;
if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)
goto change;
+ if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE &&
+ attr->sched_latency_nice != p->latency_nice)
+ goto change;

p->sched_reset_on_fork = reset_on_fork;
retval = 0;
@@ -5162,6 +5174,9 @@ static int sched_copy_attr(struct sched_attr __user *uattr, struct sched_attr *a
size < SCHED_ATTR_SIZE_VER1)
return -EINVAL;

+ if ((attr->sched_flags & SCHED_FLAG_LATENCY_NICE) &&
+ size < SCHED_ATTR_SIZE_VER2)
+ return -EINVAL;
/*
* XXX: Do we want to be lenient like existing syscalls; or do we want
* to be strict and return an error on out-of-bounds values?
@@ -5391,6 +5406,8 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
else
kattr.sched_nice = task_nice(p);

+ kattr.sched_latency_nice = p->latency_nice;
+
#ifdef CONFIG_UCLAMP_TASK
kattr.sched_util_min = p->uclamp_req[UCLAMP_MIN].value;
kattr.sched_util_max = p->uclamp_req[UCLAMP_MAX].value;
diff --git a/tools/include/uapi/linux/sched.h b/tools/include/uapi/linux/sched.h
index 4a0217832464..fa67e2527169 100644
--- a/tools/include/uapi/linux/sched.h
+++ b/tools/include/uapi/linux/sched.h
@@ -121,6 +121,7 @@ struct clone_args {
#define SCHED_FLAG_KEEP_PARAMS 0x10
#define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
#define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
+#define SCHED_FLAG_LATENCY_NICE 0X80

#define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \
SCHED_FLAG_KEEP_PARAMS)
@@ -132,6 +133,7 @@ struct clone_args {
SCHED_FLAG_RECLAIM | \
SCHED_FLAG_DL_OVERRUN | \
SCHED_FLAG_KEEP_ALL | \
- SCHED_FLAG_UTIL_CLAMP)
+ SCHED_FLAG_UTIL_CLAMP | \
+ SCHED_FLAG_LATENCY_NICE)

#endif /* _UAPI_LINUX_SCHED_H */
--
2.17.2

2020-02-24 09:00:53

by Parth Shah

[permalink] [raw]
Subject: [PATCH v4 4/4] sched/core: Add permission checks for setting the latency_nice value

Since the latency_nice uses the similar infrastructure as NICE, use the
already existing CAP_SYS_NICE security checks for the latency_nice. This
should return -EPERM for the non-root user when trying to set the task
latency_nice value to any lower than the current value.

Signed-off-by: Parth Shah <[email protected]>
---
kernel/sched/core.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e1dc536d4ca3..f883e1d3cd10 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4887,6 +4887,10 @@ static int __sched_setscheduler(struct task_struct *p,
return -EINVAL;
if (attr->sched_latency_nice < MIN_LATENCY_NICE)
return -EINVAL;
+ /* Use the same security checks as NICE */
+ if (attr->sched_latency_nice < p->latency_nice &&
+ !can_nice(p, attr->sched_latency_nice))
+ return -EPERM;
}

if (pi)
--
2.17.2

2020-02-24 13:29:43

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] sched/core: Add permission checks for setting the latency_nice value

On 02/24/20 14:29, Parth Shah wrote:
> Since the latency_nice uses the similar infrastructure as NICE, use the
> already existing CAP_SYS_NICE security checks for the latency_nice. This
> should return -EPERM for the non-root user when trying to set the task
> latency_nice value to any lower than the current value.
>
> Signed-off-by: Parth Shah <[email protected]>

I'm not against this, so I'm okay if it goes in as is.

But IMO the definition of this flag is system dependent and I think it's
prudent to keep it an admin only configuration.

It'd be hard to predict how normal application could use and depend on this
feature in the future, which could tie our hand in terms of extending it.

I can't argue hard about this though. But I do feel going further and have
a sched_feature() for each optimization that uses this flag could be necessary
too.

Thanks

--
Qais Yousef

> ---
> kernel/sched/core.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e1dc536d4ca3..f883e1d3cd10 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4887,6 +4887,10 @@ static int __sched_setscheduler(struct task_struct *p,
> return -EINVAL;
> if (attr->sched_latency_nice < MIN_LATENCY_NICE)
> return -EINVAL;
> + /* Use the same security checks as NICE */
> + if (attr->sched_latency_nice < p->latency_nice &&
> + !can_nice(p, attr->sched_latency_nice))
> + return -EPERM;
> }
>
> if (pi)
> --
> 2.17.2
>

2020-02-24 23:09:31

by Chris Hyser

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] sched/core: Add permission checks for setting the latency_nice value

On 2/24/20 3:59 AM, Parth Shah wrote:
> Since the latency_nice uses the similar infrastructure as NICE, use the
> already existing CAP_SYS_NICE security checks for the latency_nice. This
> should return -EPERM for the non-root user when trying to set the task
> latency_nice value to any lower than the current value.
>
> Signed-off-by: Parth Shah <[email protected]>

Reviewed-by: Chris Hyser <[email protected]>

> ---
> kernel/sched/core.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index e1dc536d4ca3..f883e1d3cd10 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4887,6 +4887,10 @@ static int __sched_setscheduler(struct task_struct *p,
> return -EINVAL;
> if (attr->sched_latency_nice < MIN_LATENCY_NICE)
> return -EINVAL;
> + /* Use the same security checks as NICE */
> + if (attr->sched_latency_nice < p->latency_nice &&
> + !can_nice(p, attr->sched_latency_nice))
> + return -EPERM;
> }
>
> if (pi)
>

2020-02-25 06:43:07

by Pavankumar Kondeti

[permalink] [raw]
Subject: Re: [PATCH v4 2/4] sched/core: Propagate parent task's latency requirements to the child task

On Mon, Feb 24, 2020 at 02:29:16PM +0530, Parth Shah wrote:
> Clone parent task's latency_nice attribute to the forked child task.
>
> Reset the latency_nice value to default value when the child task is
> set to sched_reset_on_fork.
>
> Signed-off-by: Parth Shah <[email protected]>
> Reviewed-by: Qais Yousef <[email protected]>
> ---
> kernel/sched/core.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 377ec26e9159..65b6c00d6dac 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2860,6 +2860,9 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
> */
> p->prio = current->normal_prio;
>
> + /* Propagate the parent's latency requirements to the child as well */
> + p->latency_nice = current->latency_nice;
> +
> uclamp_fork(p);
>
> /*
> @@ -2876,6 +2879,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
> p->prio = p->normal_prio = __normal_prio(p);
> set_load_weight(p, false);
>
> + p->latency_nice = DEFAULT_LATENCY_NICE;
> /*
> * We don't need the reset flag anymore after the fork. It has
> * fulfilled its duty:
> --
> 2.17.2
>

LGTM.

latency_nice value initialization is missing for the init_task.

Thanks,
Pavan

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2020-02-25 06:48:48

by Parth Shah

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] sched/core: Add permission checks for setting the latency_nice value



On 2/24/20 6:59 PM, Qais Yousef wrote:
> On 02/24/20 14:29, Parth Shah wrote:
>> Since the latency_nice uses the similar infrastructure as NICE, use the
>> already existing CAP_SYS_NICE security checks for the latency_nice. This
>> should return -EPERM for the non-root user when trying to set the task
>> latency_nice value to any lower than the current value.
>>
>> Signed-off-by: Parth Shah <[email protected]>
>
> I'm not against this, so I'm okay if it goes in as is.
>
> But IMO the definition of this flag is system dependent and I think it's
> prudent to keep it an admin only configuration.
>
> It'd be hard to predict how normal application could use and depend on this
> feature in the future, which could tie our hand in terms of extending it.
>

I am fine with this going in too. But just to lie down the fact on single
page and starting the discussion, here are the pros and cons for including
this permission checks:

Pros:
=====
- Having this permission checks will allow only root users to promote the
task, meaning lowering the latency_nice of the task. This is required in
case when the admin has increased the latency_nice value of a task and
non-root user can not lower it.
- In absence of this check, the non-root user can decrease the latency_nice
value against the admin configured value.

Cons:
=====
- This permission check prevents the non-root user to lower the value. This
is a problem when the user itself has increased the latency_nice value in
the past but fails to lower it again.
- After task fork, non-root user cannot lower the inherited child task's
latency_nice value, which might be a problem in the future for extending
this latency_nice ideas for different optimizations.


> I can't argue hard about this though. But I do feel going further and have
> a sched_feature() for each optimization that uses this flag could be necessary
> too.

I agree to your point.


Thanks,
Parth

>
> Thanks
>
> --
> Qais Yousef
>
>> ---
>> kernel/sched/core.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index e1dc536d4ca3..f883e1d3cd10 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4887,6 +4887,10 @@ static int __sched_setscheduler(struct task_struct *p,
>> return -EINVAL;
>> if (attr->sched_latency_nice < MIN_LATENCY_NICE)
>> return -EINVAL;
>> + /* Use the same security checks as NICE */
>> + if (attr->sched_latency_nice < p->latency_nice &&
>> + !can_nice(p, attr->sched_latency_nice))
>> + return -EPERM;
>> }
>>
>> if (pi)
>> --
>> 2.17.2
>>

2020-02-25 06:56:38

by Pavankumar Kondeti

[permalink] [raw]
Subject: Re: [PATCH v4 3/4] sched: Allow sched_{get,set}attr to change latency_nice of the task

On Mon, Feb 24, 2020 at 02:29:17PM +0530, Parth Shah wrote:

[...]

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 65b6c00d6dac..e1dc536d4ca3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4723,6 +4723,8 @@ static void __setscheduler_params(struct task_struct *p,
> p->rt_priority = attr->sched_priority;
> p->normal_prio = normal_prio(p);
> set_load_weight(p, true);
> +
> + p->latency_nice = attr->sched_latency_nice;
> }

We don't want this when SCHED_FLAG_LATENCY_NICE is not set in
attr->flags.

The user may pass SCHED_FLAG_KEEP_PARAMS | SCHED_FLAG_LATENCY_NICE to
change only latency nice value. So we have to update latency_nice
outside __setscheduler_params(), I think.

>
> /* Actually do priority change: must hold pi & rq lock. */
> @@ -4880,6 +4882,13 @@ static int __sched_setscheduler(struct task_struct *p,
> return retval;
> }
>
> + if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE) {
> + if (attr->sched_latency_nice > MAX_LATENCY_NICE)
> + return -EINVAL;
> + if (attr->sched_latency_nice < MIN_LATENCY_NICE)
> + return -EINVAL;
> + }
> +
> if (pi)
> cpuset_read_lock();
>
> @@ -4914,6 +4923,9 @@ static int __sched_setscheduler(struct task_struct *p,
> goto change;
> if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)
> goto change;
> + if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE &&
> + attr->sched_latency_nice != p->latency_nice)
> + goto change;
>
> p->sched_reset_on_fork = reset_on_fork;
> retval = 0;
> @@ -5162,6 +5174,9 @@ static int sched_copy_attr(struct sched_attr __user *uattr, struct sched_attr *a
> size < SCHED_ATTR_SIZE_VER1)
> return -EINVAL;
>
> + if ((attr->sched_flags & SCHED_FLAG_LATENCY_NICE) &&
> + size < SCHED_ATTR_SIZE_VER2)
> + return -EINVAL;
> /*
> * XXX: Do we want to be lenient like existing syscalls; or do we want
> * to be strict and return an error on out-of-bounds values?
> @@ -5391,6 +5406,8 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
> else
> kattr.sched_nice = task_nice(p);
>
> + kattr.sched_latency_nice = p->latency_nice;
> +

Can you consider printing latency_nice value in proc_sched_show_task() in this
patch/series?

Thanks,
Pavan

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2020-02-25 08:17:20

by Parth Shah

[permalink] [raw]
Subject: Re: [PATCH v4 2/4] sched/core: Propagate parent task's latency requirements to the child task



On 2/25/20 12:02 PM, Pavan Kondeti wrote:
> On Mon, Feb 24, 2020 at 02:29:16PM +0530, Parth Shah wrote:
>> Clone parent task's latency_nice attribute to the forked child task.
>>
>> Reset the latency_nice value to default value when the child task is
>> set to sched_reset_on_fork.
>>
>> Signed-off-by: Parth Shah <[email protected]>
>> Reviewed-by: Qais Yousef <[email protected]>
>> ---
>> kernel/sched/core.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 377ec26e9159..65b6c00d6dac 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2860,6 +2860,9 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
>> */
>> p->prio = current->normal_prio;
>>
>> + /* Propagate the parent's latency requirements to the child as well */
>> + p->latency_nice = current->latency_nice;
>> +
>> uclamp_fork(p);
>>
>> /*
>> @@ -2876,6 +2879,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
>> p->prio = p->normal_prio = __normal_prio(p);
>> set_load_weight(p, false);
>>
>> + p->latency_nice = DEFAULT_LATENCY_NICE;
>> /*
>> * We don't need the reset flag anymore after the fork. It has
>> * fulfilled its duty:
>> --
>> 2.17.2
>>
>
> LGTM.
>
> latency_nice value initialization is missing for the init_task.

good catch. Will add that part too. Thanks for pointing out.

- Parth

>
> Thanks,
> Pavan
>

2020-02-25 15:10:54

by Parth Shah

[permalink] [raw]
Subject: Re: [PATCH v4 3/4] sched: Allow sched_{get,set}attr to change latency_nice of the task



On 2/25/20 12:24 PM, Pavan Kondeti wrote:
> On Mon, Feb 24, 2020 at 02:29:17PM +0530, Parth Shah wrote:
>
> [...]
>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 65b6c00d6dac..e1dc536d4ca3 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4723,6 +4723,8 @@ static void __setscheduler_params(struct task_struct *p,
>> p->rt_priority = attr->sched_priority;
>> p->normal_prio = normal_prio(p);
>> set_load_weight(p, true);
>> +
>> + p->latency_nice = attr->sched_latency_nice;
>> }
>
> We don't want this when SCHED_FLAG_LATENCY_NICE is not set in
> attr->flags.
>
> The user may pass SCHED_FLAG_KEEP_PARAMS | SCHED_FLAG_LATENCY_NICE to
> change only latency nice value. So we have to update latency_nice
> outside __setscheduler_params(), I think.


AFAICT, passing SCHED_FLAG_KEEP_PARAMS with any other flag will prevent us
from changing the latency_nice value because of the below code flow:

__sched_setscheduler()
__setscheduler()
if (attr->sched_flags & SCHED_FLAG_KEEP_PARAMS) return;
__setscheduler_params()

whereas, one thing we still can do is add if condition when setting the
value, i.e.,

@@ -4724,7 +4724,8 @@ static void __setscheduler_params(struct task_struct *p,
p->normal_prio = normal_prio(p);
set_load_weight(p, true);

- p->latency_nice = attr->sched_latency_nice;
+ if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE)
+ p->latency_nice = attr->sched_latency_nice;
}


>
>>
>> /* Actually do priority change: must hold pi & rq lock. */
>> @@ -4880,6 +4882,13 @@ static int __sched_setscheduler(struct task_struct *p,
>> return retval;
>> }
>>
>> + if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE) {
>> + if (attr->sched_latency_nice > MAX_LATENCY_NICE)
>> + return -EINVAL;
>> + if (attr->sched_latency_nice < MIN_LATENCY_NICE)
>> + return -EINVAL;
>> + }
>> +
>> if (pi)
>> cpuset_read_lock();
>>
>> @@ -4914,6 +4923,9 @@ static int __sched_setscheduler(struct task_struct *p,
>> goto change;
>> if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)
>> goto change;
>> + if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE &&
>> + attr->sched_latency_nice != p->latency_nice)
>> + goto change;
>>
>> p->sched_reset_on_fork = reset_on_fork;
>> retval = 0;
>> @@ -5162,6 +5174,9 @@ static int sched_copy_attr(struct sched_attr __user *uattr, struct sched_attr *a
>> size < SCHED_ATTR_SIZE_VER1)
>> return -EINVAL;
>>
>> + if ((attr->sched_flags & SCHED_FLAG_LATENCY_NICE) &&
>> + size < SCHED_ATTR_SIZE_VER2)
>> + return -EINVAL;
>> /*
>> * XXX: Do we want to be lenient like existing syscalls; or do we want
>> * to be strict and return an error on out-of-bounds values?
>> @@ -5391,6 +5406,8 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
>> else
>> kattr.sched_nice = task_nice(p);
>>
>> + kattr.sched_latency_nice = p->latency_nice;
>> +
>
> Can you consider printing latency_nice value in proc_sched_show_task() in this
> patch/series?

Sure, I will add it.


Thanks,
Parth

2020-02-26 03:45:01

by Pavankumar Kondeti

[permalink] [raw]
Subject: Re: [PATCH v4 3/4] sched: Allow sched_{get,set}attr to change latency_nice of the task

On Tue, Feb 25, 2020 at 08:33:53PM +0530, Parth Shah wrote:
>
>
> On 2/25/20 12:24 PM, Pavan Kondeti wrote:
> > On Mon, Feb 24, 2020 at 02:29:17PM +0530, Parth Shah wrote:
> >
> > [...]
> >
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index 65b6c00d6dac..e1dc536d4ca3 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -4723,6 +4723,8 @@ static void __setscheduler_params(struct task_struct *p,
> >> p->rt_priority = attr->sched_priority;
> >> p->normal_prio = normal_prio(p);
> >> set_load_weight(p, true);
> >> +
> >> + p->latency_nice = attr->sched_latency_nice;
> >> }
> >
> > We don't want this when SCHED_FLAG_LATENCY_NICE is not set in
> > attr->flags.
> >
> > The user may pass SCHED_FLAG_KEEP_PARAMS | SCHED_FLAG_LATENCY_NICE to
> > change only latency nice value. So we have to update latency_nice
> > outside __setscheduler_params(), I think.
>
>
> AFAICT, passing SCHED_FLAG_KEEP_PARAMS with any other flag will prevent us
> from changing the latency_nice value because of the below code flow:
>
> __sched_setscheduler()
> __setscheduler()
> if (attr->sched_flags & SCHED_FLAG_KEEP_PARAMS) return;
> __setscheduler_params()
>

I thought the user space could set SCHED_FLAG_KEEP_ALL | SCHED_FLAG_LATENCY_NICE
and be able to modify the nice value alone. This does not work since we skip
setting the latency nice value when SCHED_FLAG_KEEP_PARAMS is set. So to
change the nice value alone, we first have to do getattr() and modify the nice
value and pass it to setattr(). It is not a big deal. so I will leave it here.

> whereas, one thing we still can do is add if condition when setting the
> value, i.e.,
>
> @@ -4724,7 +4724,8 @@ static void __setscheduler_params(struct task_struct *p,
> p->normal_prio = normal_prio(p);
> set_load_weight(p, true);
>
> - p->latency_nice = attr->sched_latency_nice;
> + if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE)
> + p->latency_nice = attr->sched_latency_nice;
> }
>

Yes, without this, we accidently override latency value when other attributes
are modified.

Thanks,
Pavan

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2020-02-27 11:44:29

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] sched/core: Add permission checks for setting the latency_nice value

On 02/25/20 12:17, Parth Shah wrote:
>
>
> On 2/24/20 6:59 PM, Qais Yousef wrote:
> > On 02/24/20 14:29, Parth Shah wrote:
> >> Since the latency_nice uses the similar infrastructure as NICE, use the
> >> already existing CAP_SYS_NICE security checks for the latency_nice. This
> >> should return -EPERM for the non-root user when trying to set the task
> >> latency_nice value to any lower than the current value.
> >>
> >> Signed-off-by: Parth Shah <[email protected]>
> >
> > I'm not against this, so I'm okay if it goes in as is.
> >
> > But IMO the definition of this flag is system dependent and I think it's
> > prudent to keep it an admin only configuration.
> >
> > It'd be hard to predict how normal application could use and depend on this
> > feature in the future, which could tie our hand in terms of extending it.
> >
>
> I am fine with this going in too. But just to lie down the fact on single
> page and starting the discussion, here are the pros and cons for including
> this permission checks:
>
> Pros:
> =====
> - Having this permission checks will allow only root users to promote the
> task, meaning lowering the latency_nice of the task. This is required in
> case when the admin has increased the latency_nice value of a task and
> non-root user can not lower it.
> - In absence of this check, the non-root user can decrease the latency_nice
> value against the admin configured value.
>
> Cons:
> =====
> - This permission check prevents the non-root user to lower the value. This
> is a problem when the user itself has increased the latency_nice value in
> the past but fails to lower it again.
> - After task fork, non-root user cannot lower the inherited child task's
> latency_nice value, which might be a problem in the future for extending
> this latency_nice ideas for different optimizations.

Worth adding that if we start strict with root (or capable user) only, relaxing
this to allow lowering the nice would still be possible in the future. But the
opposite is not true, we can't reverse the users ability to lower its
latency_nice value once we give it away.

Beside thinking a bit more about it now. If high latency_nice value means
cutting short the idle search for instance, does this prevent someone using
a lower latency nice to be aggressive in some code path to get higher
throughput?

I think this lack of clarity is what makes me think it's better to start
conservative and expand when needed.

My 2p :)

Cheers

--
Qais Yousef

2020-02-27 14:48:53

by Chris Hyser

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] sched/core: Add permission checks for setting the latency_nice value

On 2/27/20 6:44 AM, Qais Yousef wrote:
> On 02/25/20 12:17, Parth Shah wrote:
>>
>>
>> On 2/24/20 6:59 PM, Qais Yousef wrote:
>>> On 02/24/20 14:29, Parth Shah wrote:
>>>> Since the latency_nice uses the similar infrastructure as NICE, use the
>>>> already existing CAP_SYS_NICE security checks for the latency_nice. This
>>>> should return -EPERM for the non-root user when trying to set the task
>>>> latency_nice value to any lower than the current value.
>>>>
>>>> Signed-off-by: Parth Shah <[email protected]>
>>>
>>> I'm not against this, so I'm okay if it goes in as is.
>>>
>>> But IMO the definition of this flag is system dependent and I think it's
>>> prudent to keep it an admin only configuration.
>>>
>>> It'd be hard to predict how normal application could use and depend on this
>>> feature in the future, which could tie our hand in terms of extending it.
>>>
>>
>> I am fine with this going in too. But just to lie down the fact on single
>> page and starting the discussion, here are the pros and cons for including
>> this permission checks:
>>
>> Pros:
>> =====
>> - Having this permission checks will allow only root users to promote the
>> task, meaning lowering the latency_nice of the task. This is required in
>> case when the admin has increased the latency_nice value of a task and
>> non-root user can not lower it.
>> - In absence of this check, the non-root user can decrease the latency_nice
>> value against the admin configured value.
>>
>> Cons:
>> =====
>> - This permission check prevents the non-root user to lower the value. This
>> is a problem when the user itself has increased the latency_nice value in
>> the past but fails to lower it again.
>> - After task fork, non-root user cannot lower the inherited child task's
>> latency_nice value, which might be a problem in the future for extending
>> this latency_nice ideas for different optimizations.
>
> Worth adding that if we start strict with root (or capable user) only, relaxing
> this to allow lowering the nice would still be possible in the future. But the
> opposite is not true, we can't reverse the users ability to lower its
> latency_nice value once we give it away.
>
> Beside thinking a bit more about it now. If high latency_nice value means
> cutting short the idle search for instance, does this prevent someone using
> a lower latency nice to be aggressive in some code path to get higher
> throughput?

Short-cutting an idle cpu search reduces latency. There would be a mapping between the latency_nice values -20..-1 and
the short cut. In that view 0 is the default and performs the full domain search as before and -20 presumably skips the
entire search. Positive values then presumably indicate a trade-off in preference of throughput. I've not thought any
about it till now, but maybe indicates that spending extra time (versus less) finding this task the perfect home to just
sit and crank on throughput would be worth it.

-chrish