2020-04-28 16:43:48

by Qais Yousef

[permalink] [raw]
Subject: [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value

RT tasks by default run at the highest capacity/performance level. When
uclamp is selected this default behavior is retained by enforcing the
requested uclamp.min (p->uclamp_req[UCLAMP_MIN]) of the RT tasks to be
uclamp_none(UCLAMP_MAX), which is SCHED_CAPACITY_SCALE; the maximum
value.

This is also referred to as 'the default boost value of RT tasks'.

See commit 1a00d999971c ("sched/uclamp: Set default clamps for RT tasks").

On battery powered devices, it is desired to control this default
(currently hardcoded) behavior at runtime to reduce energy consumed by
RT tasks.

For example, a mobile device manufacturer where big.LITTLE architecture
is dominant, the performance of the little cores varies across SoCs, and
on high end ones the big cores could be too power hungry.

Given the diversity of SoCs, the new knob allows manufactures to tune
the best performance/power for RT tasks for the particular hardware they
run on.

They could opt to further tune the value when the user selects
a different power saving mode or when the device is actively charging.

The runtime aspect of it further helps in creating a single kernel image
that can be run on multiple devices that require different tuning.

Keep in mind that a lot of RT tasks in the system are created by the
kernel. On Android for instance I can see over 50 RT tasks, only
a handful of which created by the Android framework.

To control the default behavior globally by system admins and device
integrators, introduce the new sysctl_sched_uclamp_util_min_rt_default
to change the default boost value of the RT tasks.

I anticipate this to be mostly in the form of modifying the init script
of a particular device.

Whenever the new default changes, it'd be applied lazily on the next
opportunity the scheduler needs to calculate the effective uclamp.min
value for the task, assuming that it still uses the system default value
and not a user applied one.

Tested on Juno-r2 in combination with the RT capacity awareness [1].
By default an RT task will go to the highest capacity CPU and run at the
maximum frequency, which is particularly energy inefficient on high end
mobile devices because the biggest core[s] are 'huge' and power hungry.

With this patch the RT task can be controlled to run anywhere by
default, and doesn't cause the frequency to be maximum all the time.
Yet any task that really needs to be boosted can easily escape this
default behavior by modifying its requested uclamp.min value
(p->uclamp_req[UCLAMP_MIN]) via sched_setattr() syscall.

[1] 804d402fb6f6: ("sched/rt: Make RT capacity-aware")

Signed-off-by: Qais Yousef <[email protected]>
CC: Jonathan Corbet <[email protected]>
CC: Juri Lelli <[email protected]>
CC: Vincent Guittot <[email protected]>
CC: Dietmar Eggemann <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ben Segall <[email protected]>
CC: Mel Gorman <[email protected]>
CC: Luis Chamberlain <[email protected]>
CC: Kees Cook <[email protected]>
CC: Iurii Zaikin <[email protected]>
CC: Quentin Perret <[email protected]>
CC: Valentin Schneider <[email protected]>
CC: Patrick Bellasi <[email protected]>
CC: Pavan Kondeti <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
---

Changes in v3:

* Do the sync in uclamp_eff_get() (Patrck & Dietmar)
* Rename to sysctl_sched_uclamp_util_min_rt_default (Patrick, Steve,
Dietmar)
* Ensure the sync is applied only to RT tasks (Patrick)

v2 can be found here (apologies forgot to mark it as v2 in the subject)

https://lore.kernel.org/lkml/[email protected]/

include/linux/sched/sysctl.h | 1 +
kernel/sched/core.c | 63 +++++++++++++++++++++++++++++++++---
kernel/sysctl.c | 7 ++++
3 files changed, 66 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index d4f6215ee03f..e62cef019094 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -59,6 +59,7 @@ extern int sysctl_sched_rt_runtime;
#ifdef CONFIG_UCLAMP_TASK
extern unsigned int sysctl_sched_uclamp_util_min;
extern unsigned int sysctl_sched_uclamp_util_max;
+extern unsigned int sysctl_sched_uclamp_util_min_rt_default;
#endif

#ifdef CONFIG_CFS_BANDWIDTH
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9a2fbf98fd6f..17325b4aa451 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -790,6 +790,26 @@ unsigned int sysctl_sched_uclamp_util_min = SCHED_CAPACITY_SCALE;
/* Max allowed maximum utilization */
unsigned int sysctl_sched_uclamp_util_max = SCHED_CAPACITY_SCALE;

+/*
+ * By default RT tasks run at the maximum performance point/capacity of the
+ * system. Uclamp enforces this by always setting UCLAMP_MIN of RT tasks to
+ * SCHED_CAPACITY_SCALE.
+ *
+ * This knob allows admins to change the default behavior when uclamp is being
+ * used. In battery powered devices, particularly, running at the maximum
+ * capacity and frequency will increase energy consumption and shorten the
+ * battery life.
+ *
+ * This knob only affects RT tasks that their uclamp_se->user_defined == false.
+ *
+ * This knob will not override the system default sched_util_clamp_min defined
+ * above.
+ *
+ * Any modification is applied lazily on the next attempt to calculate the
+ * effective value of the task.
+ */
+unsigned int sysctl_sched_uclamp_util_min_rt_default = SCHED_CAPACITY_SCALE;
+
/* All clamps are required to be less or equal than these values */
static struct uclamp_se uclamp_default[UCLAMP_CNT];

@@ -872,6 +892,14 @@ unsigned int uclamp_rq_max_value(struct rq *rq, enum uclamp_id clamp_id,
return uclamp_idle_value(rq, clamp_id, clamp_value);
}

+static void uclamp_sync_util_min_rt_default(struct task_struct *p)
+{
+ struct uclamp_se *uc_se = &p->uclamp_req[UCLAMP_MIN];
+
+ if (unlikely(rt_task(p)) && !uc_se->user_defined)
+ uclamp_se_set(uc_se, sysctl_sched_uclamp_util_min_rt_default, false);
+}
+
static inline struct uclamp_se
uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
{
@@ -907,8 +935,15 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
static inline struct uclamp_se
uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
{
- struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
- struct uclamp_se uc_max = uclamp_default[clamp_id];
+ struct uclamp_se uc_req, uc_max;
+
+ /*
+ * Sync up any change to sysctl_sched_uclamp_util_min_rt_default value.
+ */
+ uclamp_sync_util_min_rt_default(p);
+
+ uc_req = uclamp_tg_restrict(p, clamp_id);
+ uc_max = uclamp_default[clamp_id];

/* System default restrictions always apply */
if (unlikely(uc_req.value > uc_max.value))
@@ -1114,12 +1149,13 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
loff_t *ppos)
{
bool update_root_tg = false;
- int old_min, old_max;
+ int old_min, old_max, old_min_rt;
int result;

mutex_lock(&uclamp_mutex);
old_min = sysctl_sched_uclamp_util_min;
old_max = sysctl_sched_uclamp_util_max;
+ old_min_rt = sysctl_sched_uclamp_util_min_rt_default;

result = proc_dointvec(table, write, buffer, lenp, ppos);
if (result)
@@ -1133,6 +1169,18 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
goto undo;
}

+ /*
+ * The new value will be applied to RT tasks the next time the
+ * scheduler needs to calculate the effective uclamp.min for that task,
+ * assuming the task is using the system default and not a user
+ * specified value. In the latter we shall leave the value as the user
+ * requested.
+ */
+ if (sysctl_sched_uclamp_util_min_rt_default > SCHED_CAPACITY_SCALE) {
+ result = -EINVAL;
+ goto undo;
+ }
+
if (old_min != sysctl_sched_uclamp_util_min) {
uclamp_se_set(&uclamp_default[UCLAMP_MIN],
sysctl_sched_uclamp_util_min, false);
@@ -1158,6 +1206,7 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
undo:
sysctl_sched_uclamp_util_min = old_min;
sysctl_sched_uclamp_util_max = old_max;
+ sysctl_sched_uclamp_util_min_rt_default = old_min_rt;
done:
mutex_unlock(&uclamp_mutex);

@@ -1200,9 +1249,13 @@ static void __setscheduler_uclamp(struct task_struct *p,
if (uc_se->user_defined)
continue;

- /* By default, RT tasks always get 100% boost */
+ /*
+ * By default, RT tasks always get 100% boost, which the admins
+ * are allowed to change via
+ * sysctl_sched_uclamp_util_min_rt_default knob.
+ */
if (unlikely(rt_task(p) && clamp_id == UCLAMP_MIN))
- clamp_value = uclamp_none(UCLAMP_MAX);
+ clamp_value = sysctl_sched_uclamp_util_min_rt_default;

uclamp_se_set(uc_se, clamp_value, false);
}
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8a176d8727a3..64117363c502 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -453,6 +453,13 @@ static struct ctl_table kern_table[] = {
.mode = 0644,
.proc_handler = sysctl_sched_uclamp_handler,
},
+ {
+ .procname = "sched_util_clamp_min_rt_default",
+ .data = &sysctl_sched_uclamp_util_min_rt_default,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = sysctl_sched_uclamp_handler,
+ },
#endif
#ifdef CONFIG_SCHED_AUTOGROUP
{
--
2.17.1


2020-04-28 16:46:07

by Qais Yousef

[permalink] [raw]
Subject: [PATCH v3 2/2] Documentation/sysctl: Document uclamp sysctl knobs

Uclamp exposes 3 sysctl knobs:

* sched_util_clamp_min
* sched_util_clamp_max
* sched_util_clamp_min_rt_default

Document them in sysctl/kernel.rst.

Signed-off-by: Qais Yousef <[email protected]>
CC: Jonathan Corbet <[email protected]>
CC: Juri Lelli <[email protected]>
CC: Vincent Guittot <[email protected]>
CC: Dietmar Eggemann <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ben Segall <[email protected]>
CC: Mel Gorman <[email protected]>
CC: Luis Chamberlain <[email protected]>
CC: Kees Cook <[email protected]>
CC: Iurii Zaikin <[email protected]>
CC: Quentin Perret <[email protected]>
CC: Valentin Schneider <[email protected]>
CC: Patrick Bellasi <[email protected]>
CC: Pavan Kondeti <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
---
Documentation/admin-guide/sysctl/kernel.rst | 48 +++++++++++++++++++++
1 file changed, 48 insertions(+)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 0d427fd10941..e7255f71493c 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -940,6 +940,54 @@ Enables/disables scheduler statistics. Enabling this feature
incurs a small amount of overhead in the scheduler but is
useful for debugging and performance tuning.

+sched_util_clamp_min:
+=====================
+
+Max allowed *minimum* utilization.
+
+Default value is SCHED_CAPACITY_SCALE (1024), which is the maximum possible
+value.
+
+It means that any requested uclamp.min value cannot be greater than
+sched_util_clamp_min, ie: it is restricted to the range
+[0:sched_util_clamp_min].
+
+sched_util_clamp_max:
+=====================
+
+Max allowed *maximum* utilization.
+
+Default value is SCHED_CAPACITY_SCALE (1024), which is the maximum possible
+value.
+
+It means that any requested uclamp.max value cannot be greater than
+sched_util_clamp_max, ie: it is restricted to the range
+[0:sched_util_clamp_max].
+
+sched_util_clamp_min_rt_default:
+================================
+
+By default Linux is tuned for performance. Which means that RT tasks always run
+at the highest frequency and most capable (highest capacity) CPU (in
+heterogeneous systems).
+
+Uclamp achieves this by setting the requested uclamp.min of all RT tasks to
+SCHED_CAPACITY_SCALE (1024) by default. Which effectively boosts the tasks to
+run at the highest frequency and bias them to run on the biggest CPU.
+
+This knob allows admins to change the default behavior when uclamp is being
+used. In battery powered devices particularly, running at the maximum
+capacity and frequency will increase energy consumption and shorten the battery
+life.
+
+This knob is only effective for RT tasks which the user hasn't modified their
+requested uclamp.min value via sched_setattr() syscall.
+
+This knob will not escape the constraint imposed by sched_util_clamp_min
+defined above.
+
+Any modification is applied lazily on the next opportunity the scheduler needs
+to calculate the effective value of uclamp.min of the task.

seccomp
=======
--
2.17.1

2020-04-28 17:48:21

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] Documentation/sysctl: Document uclamp sysctl knobs

Hi--

I have a few corrections for you below:

On 4/28/20 9:41 AM, Qais Yousef wrote:
> Uclamp exposes 3 sysctl knobs:
>
> * sched_util_clamp_min
> * sched_util_clamp_max
> * sched_util_clamp_min_rt_default
>
> Document them in sysctl/kernel.rst.
>
> Signed-off-by: Qais Yousef <[email protected]>
> CC: Jonathan Corbet <[email protected]>
> CC: Juri Lelli <[email protected]>
> CC: Vincent Guittot <[email protected]>
> CC: Dietmar Eggemann <[email protected]>
> CC: Steven Rostedt <[email protected]>
> CC: Ben Segall <[email protected]>
> CC: Mel Gorman <[email protected]>
> CC: Luis Chamberlain <[email protected]>
> CC: Kees Cook <[email protected]>
> CC: Iurii Zaikin <[email protected]>
> CC: Quentin Perret <[email protected]>
> CC: Valentin Schneider <[email protected]>
> CC: Patrick Bellasi <[email protected]>
> CC: Pavan Kondeti <[email protected]>
> CC: [email protected]
> CC: [email protected]
> CC: [email protected]
> ---
> Documentation/admin-guide/sysctl/kernel.rst | 48 +++++++++++++++++++++
> 1 file changed, 48 insertions(+)
>
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 0d427fd10941..e7255f71493c 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -940,6 +940,54 @@ Enables/disables scheduler statistics. Enabling this feature
> incurs a small amount of overhead in the scheduler but is
> useful for debugging and performance tuning.
>
> +sched_util_clamp_min:
> +=====================
> +
> +Max allowed *minimum* utilization.
> +
> +Default value is SCHED_CAPACITY_SCALE (1024), which is the maximum possible
> +value.
> +
> +It means that any requested uclamp.min value cannot be greater than
> +sched_util_clamp_min, ie: it is restricted to the range

i.e., it is

> +[0:sched_util_clamp_min].
> +
> +sched_util_clamp_max:
> +=====================
> +
> +Max allowed *maximum* utilization.
> +
> +Default value is SCHED_CAPACITY_SCALE (1024), which is the maximum possible
> +value.
> +
> +It means that any requested uclamp.max value cannot be greater than
> +sched_util_clamp_max, ie: it is restricted to the range

i.e., it is

> +[0:sched_util_clamp_max].
> +
> +sched_util_clamp_min_rt_default:
> +================================
> +
> +By default Linux is tuned for performance. Which means that RT tasks always run
> +at the highest frequency and most capable (highest capacity) CPU (in
> +heterogeneous systems).
> +
> +Uclamp achieves this by setting the requested uclamp.min of all RT tasks to
> +SCHED_CAPACITY_SCALE (1024) by default. Which effectively boosts the tasks to

by default, which

> +run at the highest frequency and bias them to run on the biggest CPU.

biases them

> +
> +This knob allows admins to change the default behavior when uclamp is being
> +used. In battery powered devices particularly, running at the maximum
> +capacity and frequency will increase energy consumption and shorten the battery
> +life.
> +
> +This knob is only effective for RT tasks which the user hasn't modified their
> +requested uclamp.min value via sched_setattr() syscall.
> +
> +This knob will not escape the constraint imposed by sched_util_clamp_min
> +defined above.
> +
> +Any modification is applied lazily on the next opportunity the scheduler needs
> +to calculate the effective value of uclamp.min of the task.
>
> seccomp
> =======
>

thanks.
--
~Randy

2020-04-29 09:56:07

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] Documentation/sysctl: Document uclamp sysctl knobs

On 04/28/20 10:43, Randy Dunlap wrote:
> Hi--
>
> I have a few corrections for you below:

Thanks Randy. I applied all your suggestions for the next version.

Thanks

--
Qais Yousef

>
> On 4/28/20 9:41 AM, Qais Yousef wrote:
> > Uclamp exposes 3 sysctl knobs:
> >
> > * sched_util_clamp_min
> > * sched_util_clamp_max
> > * sched_util_clamp_min_rt_default
> >
> > Document them in sysctl/kernel.rst.
> >
> > Signed-off-by: Qais Yousef <[email protected]>
> > CC: Jonathan Corbet <[email protected]>
> > CC: Juri Lelli <[email protected]>
> > CC: Vincent Guittot <[email protected]>
> > CC: Dietmar Eggemann <[email protected]>
> > CC: Steven Rostedt <[email protected]>
> > CC: Ben Segall <[email protected]>
> > CC: Mel Gorman <[email protected]>
> > CC: Luis Chamberlain <[email protected]>
> > CC: Kees Cook <[email protected]>
> > CC: Iurii Zaikin <[email protected]>
> > CC: Quentin Perret <[email protected]>
> > CC: Valentin Schneider <[email protected]>
> > CC: Patrick Bellasi <[email protected]>
> > CC: Pavan Kondeti <[email protected]>
> > CC: [email protected]
> > CC: [email protected]
> > CC: [email protected]
> > ---
> > Documentation/admin-guide/sysctl/kernel.rst | 48 +++++++++++++++++++++
> > 1 file changed, 48 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index 0d427fd10941..e7255f71493c 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -940,6 +940,54 @@ Enables/disables scheduler statistics. Enabling this feature
> > incurs a small amount of overhead in the scheduler but is
> > useful for debugging and performance tuning.
> >
> > +sched_util_clamp_min:
> > +=====================
> > +
> > +Max allowed *minimum* utilization.
> > +
> > +Default value is SCHED_CAPACITY_SCALE (1024), which is the maximum possible
> > +value.
> > +
> > +It means that any requested uclamp.min value cannot be greater than
> > +sched_util_clamp_min, ie: it is restricted to the range
>
> i.e., it is
>
> > +[0:sched_util_clamp_min].
> > +
> > +sched_util_clamp_max:
> > +=====================
> > +
> > +Max allowed *maximum* utilization.
> > +
> > +Default value is SCHED_CAPACITY_SCALE (1024), which is the maximum possible
> > +value.
> > +
> > +It means that any requested uclamp.max value cannot be greater than
> > +sched_util_clamp_max, ie: it is restricted to the range
>
> i.e., it is
>
> > +[0:sched_util_clamp_max].
> > +
> > +sched_util_clamp_min_rt_default:
> > +================================
> > +
> > +By default Linux is tuned for performance. Which means that RT tasks always run
> > +at the highest frequency and most capable (highest capacity) CPU (in
> > +heterogeneous systems).
> > +
> > +Uclamp achieves this by setting the requested uclamp.min of all RT tasks to
> > +SCHED_CAPACITY_SCALE (1024) by default. Which effectively boosts the tasks to
>
> by default, which
>
> > +run at the highest frequency and bias them to run on the biggest CPU.
>
> biases them
>
> > +
> > +This knob allows admins to change the default behavior when uclamp is being
> > +used. In battery powered devices particularly, running at the maximum
> > +capacity and frequency will increase energy consumption and shorten the battery
> > +life.
> > +
> > +This knob is only effective for RT tasks which the user hasn't modified their
> > +requested uclamp.min value via sched_setattr() syscall.
> > +
> > +This knob will not escape the constraint imposed by sched_util_clamp_min
> > +defined above.
> > +
> > +Any modification is applied lazily on the next opportunity the scheduler needs
> > +to calculate the effective value of uclamp.min of the task.
> >
> > seccomp
> > =======
> >
>
> thanks.
> --
> ~Randy
>

2020-04-29 11:35:49

by Pavankumar Kondeti

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value

Hi Qais,

On Tue, Apr 28, 2020 at 05:41:33PM +0100, Qais Yousef wrote:

[...]

>
> +static void uclamp_sync_util_min_rt_default(struct task_struct *p)
> +{
> + struct uclamp_se *uc_se = &p->uclamp_req[UCLAMP_MIN];
> +
> + if (unlikely(rt_task(p)) && !uc_se->user_defined)
> + uclamp_se_set(uc_se, sysctl_sched_uclamp_util_min_rt_default, false);
> +}

Unlike system default clamp values, RT default value is written to
p->uclamp_req[UCLAMP_MIN]. A user may not be able to set the uclamp.max to a
lower value than sysctl_sched_uclamp_util_min_rt_default. This is not a
big deal. Just sharing my observation. Is this how you expected it to work?

> +
> static inline struct uclamp_se
> uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> {
> @@ -907,8 +935,15 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> static inline struct uclamp_se
> uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
> {
> - struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
> - struct uclamp_se uc_max = uclamp_default[clamp_id];
> + struct uclamp_se uc_req, uc_max;
> +
> + /*
> + * Sync up any change to sysctl_sched_uclamp_util_min_rt_default value.
> + */
> + uclamp_sync_util_min_rt_default(p);
> +
> + uc_req = uclamp_tg_restrict(p, clamp_id);
> + uc_max = uclamp_default[clamp_id];

We are calling uclamp_sync_util_min_rt_default() unnecessarily for
clamp_id == UCLAMP_MAX case. Would it be better to have a separate
uclamp_default for RT like uclamp_default_rt and select uc_max based
on task policy? Since all tunables are handled in sysctl_sched_uclamp_handler
we can cover the case of uclamp_util_min < uclamp_util_min_rt.

>
> /* System default restrictions always apply */
> if (unlikely(uc_req.value > uc_max.value))
> @@ -1114,12 +1149,13 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> loff_t *ppos)
> {
> bool update_root_tg = false;
> - int old_min, old_max;
> + int old_min, old_max, old_min_rt;
> int result;
>
> mutex_lock(&uclamp_mutex);
> old_min = sysctl_sched_uclamp_util_min;
> old_max = sysctl_sched_uclamp_util_max;
> + old_min_rt = sysctl_sched_uclamp_util_min_rt_default;
>
> result = proc_dointvec(table, write, buffer, lenp, ppos);
> if (result)
> @@ -1133,6 +1169,18 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> goto undo;
> }
>
> + /*
> + * The new value will be applied to RT tasks the next time the
> + * scheduler needs to calculate the effective uclamp.min for that task,
> + * assuming the task is using the system default and not a user
> + * specified value. In the latter we shall leave the value as the user
> + * requested.
> + */
> + if (sysctl_sched_uclamp_util_min_rt_default > SCHED_CAPACITY_SCALE) {
> + result = -EINVAL;
> + goto undo;
> + }
> +
> if (old_min != sysctl_sched_uclamp_util_min) {
> uclamp_se_set(&uclamp_default[UCLAMP_MIN],
> sysctl_sched_uclamp_util_min, false);
> @@ -1158,6 +1206,7 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> undo:
> sysctl_sched_uclamp_util_min = old_min;
> sysctl_sched_uclamp_util_max = old_max;
> + sysctl_sched_uclamp_util_min_rt_default = old_min_rt;
> done:
> mutex_unlock(&uclamp_mutex);
>
> @@ -1200,9 +1249,13 @@ static void __setscheduler_uclamp(struct task_struct *p,
> if (uc_se->user_defined)
> continue;
>
> - /* By default, RT tasks always get 100% boost */
> + /*
> + * By default, RT tasks always get 100% boost, which the admins
> + * are allowed to change via
> + * sysctl_sched_uclamp_util_min_rt_default knob.
> + */
> if (unlikely(rt_task(p) && clamp_id == UCLAMP_MIN))
> - clamp_value = uclamp_none(UCLAMP_MAX);
> + clamp_value = sysctl_sched_uclamp_util_min_rt_default;
>
> uclamp_se_set(uc_se, clamp_value, false);
> }
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 8a176d8727a3..64117363c502 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -453,6 +453,13 @@ static struct ctl_table kern_table[] = {
> .mode = 0644,
> .proc_handler = sysctl_sched_uclamp_handler,
> },
> + {
> + .procname = "sched_util_clamp_min_rt_default",
> + .data = &sysctl_sched_uclamp_util_min_rt_default,
> + .maxlen = sizeof(unsigned int),
> + .mode = 0644,
> + .proc_handler = sysctl_sched_uclamp_handler,
> + },
> #endif
> #ifdef CONFIG_SCHED_AUTOGROUP
> {
> --
> 2.17.1
>

Thanks,
Pavan

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2020-04-29 12:32:49

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value

Hi Pavan

On 04/29/20 17:02, Pavan Kondeti wrote:
> Hi Qais,
>
> On Tue, Apr 28, 2020 at 05:41:33PM +0100, Qais Yousef wrote:
>
> [...]
>
> >
> > +static void uclamp_sync_util_min_rt_default(struct task_struct *p)
> > +{
> > + struct uclamp_se *uc_se = &p->uclamp_req[UCLAMP_MIN];
> > +
> > + if (unlikely(rt_task(p)) && !uc_se->user_defined)
> > + uclamp_se_set(uc_se, sysctl_sched_uclamp_util_min_rt_default, false);
> > +}
>
> Unlike system default clamp values, RT default value is written to
> p->uclamp_req[UCLAMP_MIN]. A user may not be able to set the uclamp.max to a
> lower value than sysctl_sched_uclamp_util_min_rt_default. This is not a
> big deal. Just sharing my observation. Is this how you expected it to work?

This is how I expect it to work yes.

Note that the sysctl_sched_uclamp_util_{min,max}, or 'system default clamp
values' as you called them, are constraints. They define the range any task is
_allowed to_ *request*.

The new sysctl_sched_uclamp_util_min_rt_default is setting the default
*request* for RT tasks. And this was done since historically RT tasks has
always run at the highest performance point in Linux.

Can you think of a scenario where setting the default *request* of uclamp.max
for all RT tasks helps?

I'm thinking, the global constrain of sysctl_sched_uclamp_util_max should
suffice. Or one can use cgroup to selectively cap some tasks.

For example if you don't want any task in the system to be boosted to anything
higher than 800, just do

sysctl_sched_uclamp_util_max = 800

Or if you want to be selective, the same thing can be achieved by creating
a cgroup cpu controller that has uclamp.max = 0.8 and attaching tasks to it.

>
> > +
> > static inline struct uclamp_se
> > uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> > {
> > @@ -907,8 +935,15 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> > static inline struct uclamp_se
> > uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
> > {
> > - struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
> > - struct uclamp_se uc_max = uclamp_default[clamp_id];
> > + struct uclamp_se uc_req, uc_max;
> > +
> > + /*
> > + * Sync up any change to sysctl_sched_uclamp_util_min_rt_default value.
> > + */
> > + uclamp_sync_util_min_rt_default(p);
> > +
> > + uc_req = uclamp_tg_restrict(p, clamp_id);
> > + uc_max = uclamp_default[clamp_id];
>
> We are calling uclamp_sync_util_min_rt_default() unnecessarily for
> clamp_id == UCLAMP_MAX case. Would it be better to have a separate

It was actually intentional to make sure we update the value ASAP. I didn't
think it's a lot of overhead. I can further protect with a check to verify
whether the value has changed if it seems heavy handed.

> uclamp_default for RT like uclamp_default_rt and select uc_max based
> on task policy? Since all tunables are handled in sysctl_sched_uclamp_handler
> we can cover the case of uclamp_util_min < uclamp_util_min_rt.

Hmm I'm not sure I got you.

uclamp_default[] is setting the constraints. I.e. what's the range it's allowed
to take.

What's added here is simply setting the requested uclamp.min value, which if
the constraints allow it will be applied.

For example. If

sysctl_sched_uclamp_util_min = 0
sysctl_sched_uclamp_util_min_rt_default = 400

uclamp_eff_get() will return 0 as effective uclamp.min value for the task. So
while the task still requests to be boosted to 1024, but the system constraint
prevents it from getting it. But as soon as the system constraint has changed,
the task might be able to get what it wants.

For example, if at runtime sysctl_sched_uclamp_util_min was modified to 1024

sysctl_sched_uclamp_util_min = 1024
sysctl_sched_uclamp_util_min_rt_default = 400

uclamp_eff_get() return 400 for the task now, as requested.

Did this help answering your questions?

Thanks

--
Qais Yousef

>
> >
> > /* System default restrictions always apply */
> > if (unlikely(uc_req.value > uc_max.value))
> > @@ -1114,12 +1149,13 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> > loff_t *ppos)
> > {
> > bool update_root_tg = false;
> > - int old_min, old_max;
> > + int old_min, old_max, old_min_rt;
> > int result;
> >
> > mutex_lock(&uclamp_mutex);
> > old_min = sysctl_sched_uclamp_util_min;
> > old_max = sysctl_sched_uclamp_util_max;
> > + old_min_rt = sysctl_sched_uclamp_util_min_rt_default;
> >
> > result = proc_dointvec(table, write, buffer, lenp, ppos);
> > if (result)
> > @@ -1133,6 +1169,18 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> > goto undo;
> > }
> >
> > + /*
> > + * The new value will be applied to RT tasks the next time the
> > + * scheduler needs to calculate the effective uclamp.min for that task,
> > + * assuming the task is using the system default and not a user
> > + * specified value. In the latter we shall leave the value as the user
> > + * requested.
> > + */
> > + if (sysctl_sched_uclamp_util_min_rt_default > SCHED_CAPACITY_SCALE) {
> > + result = -EINVAL;
> > + goto undo;
> > + }
> > +
> > if (old_min != sysctl_sched_uclamp_util_min) {
> > uclamp_se_set(&uclamp_default[UCLAMP_MIN],
> > sysctl_sched_uclamp_util_min, false);
> > @@ -1158,6 +1206,7 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> > undo:
> > sysctl_sched_uclamp_util_min = old_min;
> > sysctl_sched_uclamp_util_max = old_max;
> > + sysctl_sched_uclamp_util_min_rt_default = old_min_rt;
> > done:
> > mutex_unlock(&uclamp_mutex);
> >
> > @@ -1200,9 +1249,13 @@ static void __setscheduler_uclamp(struct task_struct *p,
> > if (uc_se->user_defined)
> > continue;
> >
> > - /* By default, RT tasks always get 100% boost */
> > + /*
> > + * By default, RT tasks always get 100% boost, which the admins
> > + * are allowed to change via
> > + * sysctl_sched_uclamp_util_min_rt_default knob.
> > + */
> > if (unlikely(rt_task(p) && clamp_id == UCLAMP_MIN))
> > - clamp_value = uclamp_none(UCLAMP_MAX);
> > + clamp_value = sysctl_sched_uclamp_util_min_rt_default;
> >
> > uclamp_se_set(uc_se, clamp_value, false);
> > }
> > diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> > index 8a176d8727a3..64117363c502 100644
> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -453,6 +453,13 @@ static struct ctl_table kern_table[] = {
> > .mode = 0644,
> > .proc_handler = sysctl_sched_uclamp_handler,
> > },
> > + {
> > + .procname = "sched_util_clamp_min_rt_default",
> > + .data = &sysctl_sched_uclamp_util_min_rt_default,
> > + .maxlen = sizeof(unsigned int),
> > + .mode = 0644,
> > + .proc_handler = sysctl_sched_uclamp_handler,
> > + },
> > #endif
> > #ifdef CONFIG_SCHED_AUTOGROUP
> > {
> > --
> > 2.17.1
> >
>
> Thanks,
> Pavan
>
> --
> Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2020-04-29 15:24:00

by Pavankumar Kondeti

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value

Hi Qais,

On Wed, Apr 29, 2020 at 01:30:57PM +0100, Qais Yousef wrote:
> Hi Pavan
>
> On 04/29/20 17:02, Pavan Kondeti wrote:
> > Hi Qais,
> >
> > On Tue, Apr 28, 2020 at 05:41:33PM +0100, Qais Yousef wrote:
> >
> > [...]
> >
> > >
> > > +static void uclamp_sync_util_min_rt_default(struct task_struct *p)
> > > +{
> > > + struct uclamp_se *uc_se = &p->uclamp_req[UCLAMP_MIN];
> > > +
> > > + if (unlikely(rt_task(p)) && !uc_se->user_defined)
> > > + uclamp_se_set(uc_se, sysctl_sched_uclamp_util_min_rt_default, false);
> > > +}
> >
> > Unlike system default clamp values, RT default value is written to
> > p->uclamp_req[UCLAMP_MIN]. A user may not be able to set the uclamp.max to a
> > lower value than sysctl_sched_uclamp_util_min_rt_default. This is not a
> > big deal. Just sharing my observation. Is this how you expected it to work?
>
> This is how I expect it to work yes.
>
> Note that the sysctl_sched_uclamp_util_{min,max}, or 'system default clamp
> values' as you called them, are constraints. They define the range any task is
> _allowed to_ *request*.
>
> The new sysctl_sched_uclamp_util_min_rt_default is setting the default
> *request* for RT tasks. And this was done since historically RT tasks has
> always run at the highest performance point in Linux.
>
> Can you think of a scenario where setting the default *request* of uclamp.max
> for all RT tasks helps?
>

It was a hypothetical question :-) Thanks for clearly explaining the
difference between system default clamp values (constraints) vs the newly
introduced sysctl_sched_uclamp_util_min_rt_default (request for RT tasks).

> I'm thinking, the global constrain of sysctl_sched_uclamp_util_max should
> suffice. Or one can use cgroup to selectively cap some tasks.
>
> For example if you don't want any task in the system to be boosted to anything
> higher than 800, just do
>
> sysctl_sched_uclamp_util_max = 800
>
> Or if you want to be selective, the same thing can be achieved by creating
> a cgroup cpu controller that has uclamp.max = 0.8 and attaching tasks to it.
>
> >
> > > +
> > > static inline struct uclamp_se
> > > uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> > > {
> > > @@ -907,8 +935,15 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> > > static inline struct uclamp_se
> > > uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
> > > {
> > > - struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
> > > - struct uclamp_se uc_max = uclamp_default[clamp_id];
> > > + struct uclamp_se uc_req, uc_max;
> > > +
> > > + /*
> > > + * Sync up any change to sysctl_sched_uclamp_util_min_rt_default value.
> > > + */
> > > + uclamp_sync_util_min_rt_default(p);
> > > +
> > > + uc_req = uclamp_tg_restrict(p, clamp_id);
> > > + uc_max = uclamp_default[clamp_id];
> >
> > We are calling uclamp_sync_util_min_rt_default() unnecessarily for
> > clamp_id == UCLAMP_MAX case. Would it be better to have a separate
>
> It was actually intentional to make sure we update the value ASAP. I didn't
> think it's a lot of overhead. I can further protect with a check to verify
> whether the value has changed if it seems heavy handed.
>
> > uclamp_default for RT like uclamp_default_rt and select uc_max based
> > on task policy? Since all tunables are handled in sysctl_sched_uclamp_handler
> > we can cover the case of uclamp_util_min < uclamp_util_min_rt.
>
> Hmm I'm not sure I got you.
>
> uclamp_default[] is setting the constraints. I.e. what's the range it's allowed
> to take.
>
> What's added here is simply setting the requested uclamp.min value, which if
> the constraints allow it will be applied.
>
> For example. If
>
> sysctl_sched_uclamp_util_min = 0
> sysctl_sched_uclamp_util_min_rt_default = 400
>
> uclamp_eff_get() will return 0 as effective uclamp.min value for the task. So
> while the task still requests to be boosted to 1024, but the system constraint
> prevents it from getting it. But as soon as the system constraint has changed,
> the task might be able to get what it wants.
>
> For example, if at runtime sysctl_sched_uclamp_util_min was modified to 1024
>
> sysctl_sched_uclamp_util_min = 1024
> sysctl_sched_uclamp_util_min_rt_default = 400
>
> uclamp_eff_get() return 400 for the task now, as requested.
>

Yes, I did understand the relation between sysctl_sched_uclamp_util_min and
sysctl_sched_uclamp_util_min_rt_default . It is not clear why we are copying
sysctl_sched_uclamp_util_min_rt_default into p->uclamp_req[UCLAMP_MIN] though
user did not explicitly asked for it. In the above reply, you clearly
mentioned that the new knob works like a request from the user space for all
RT tasks.

As we are copying the sysctl_sched_uclamp_util_min_rt_default value into
p->uclamp_req[UCLAMP_MIN], user gets it when sched_getattr() is called though
sched_setattr() was not called before. I guess that is expected behavior with
your definition of this new tunable. Thanks for answering the question in
detail.

Thanks,
Pavan

> Thanks
>
> --
> Qais Yousef
>
> >
> > >
> > > /* System default restrictions always apply */
> > > if (unlikely(uc_req.value > uc_max.value))
> > > @@ -1114,12 +1149,13 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> > > loff_t *ppos)
> > > {
> > > bool update_root_tg = false;
> > > - int old_min, old_max;
> > > + int old_min, old_max, old_min_rt;
> > > int result;
> > >
> > > mutex_lock(&uclamp_mutex);
> > > old_min = sysctl_sched_uclamp_util_min;
> > > old_max = sysctl_sched_uclamp_util_max;
> > > + old_min_rt = sysctl_sched_uclamp_util_min_rt_default;
> > >
> > > result = proc_dointvec(table, write, buffer, lenp, ppos);
> > > if (result)
> > > @@ -1133,6 +1169,18 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> > > goto undo;
> > > }
> > >
> > > + /*
> > > + * The new value will be applied to RT tasks the next time the
> > > + * scheduler needs to calculate the effective uclamp.min for that task,
> > > + * assuming the task is using the system default and not a user
> > > + * specified value. In the latter we shall leave the value as the user
> > > + * requested.
> > > + */
> > > + if (sysctl_sched_uclamp_util_min_rt_default > SCHED_CAPACITY_SCALE) {
> > > + result = -EINVAL;
> > > + goto undo;
> > > + }
> > > +
> > > if (old_min != sysctl_sched_uclamp_util_min) {
> > > uclamp_se_set(&uclamp_default[UCLAMP_MIN],
> > > sysctl_sched_uclamp_util_min, false);
> > > @@ -1158,6 +1206,7 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> > > undo:
> > > sysctl_sched_uclamp_util_min = old_min;
> > > sysctl_sched_uclamp_util_max = old_max;
> > > + sysctl_sched_uclamp_util_min_rt_default = old_min_rt;
> > > done:
> > > mutex_unlock(&uclamp_mutex);
> > >
> > > @@ -1200,9 +1249,13 @@ static void __setscheduler_uclamp(struct task_struct *p,
> > > if (uc_se->user_defined)
> > > continue;
> > >
> > > - /* By default, RT tasks always get 100% boost */
> > > + /*
> > > + * By default, RT tasks always get 100% boost, which the admins
> > > + * are allowed to change via
> > > + * sysctl_sched_uclamp_util_min_rt_default knob.
> > > + */
> > > if (unlikely(rt_task(p) && clamp_id == UCLAMP_MIN))
> > > - clamp_value = uclamp_none(UCLAMP_MAX);
> > > + clamp_value = sysctl_sched_uclamp_util_min_rt_default;
> > >
> > > uclamp_se_set(uc_se, clamp_value, false);
> > > }
> > > diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> > > index 8a176d8727a3..64117363c502 100644
> > > --- a/kernel/sysctl.c
> > > +++ b/kernel/sysctl.c
> > > @@ -453,6 +453,13 @@ static struct ctl_table kern_table[] = {
> > > .mode = 0644,
> > > .proc_handler = sysctl_sched_uclamp_handler,
> > > },
> > > + {
> > > + .procname = "sched_util_clamp_min_rt_default",
> > > + .data = &sysctl_sched_uclamp_util_min_rt_default,
> > > + .maxlen = sizeof(unsigned int),
> > > + .mode = 0644,
> > > + .proc_handler = sysctl_sched_uclamp_handler,
> > > + },
> > > #endif
> > > #ifdef CONFIG_SCHED_AUTOGROUP
> > > {
> > > --
> > > 2.17.1
> > >
> >
> > Thanks,
> > Pavan
> >
> > --
> > Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

2020-04-29 15:42:48

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value

On 04/29/20 20:51, Pavan Kondeti wrote:
> As we are copying the sysctl_sched_uclamp_util_min_rt_default value into
> p->uclamp_req[UCLAMP_MIN], user gets it when sched_getattr() is called though
> sched_setattr() was not called before. I guess that is expected behavior with
> your definition of this new tunable. Thanks for answering the question in
> detail.

Yes. That's the original design without this patch actually. Though before it
was always set to 1024.

Thanks for having a look!

--
Qais Yousef

2020-04-30 10:07:29

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v3 2/2] Documentation/sysctl: Document uclamp sysctl knobs

On 04/30/20 13:06, Tao Zhou wrote:
> Hi,
>
> On Tue, Apr 28, 2020 at 05:41:34PM +0100, Qais Yousef wrote:
> > Uclamp exposes 3 sysctl knobs:
> >
> > * sched_util_clamp_min
> > * sched_util_clamp_max
> > * sched_util_clamp_min_rt_default
>
> > Document them in sysctl/kernel.rst.
> >
> > Signed-off-by: Qais Yousef <[email protected]>
> > CC: Jonathan Corbet <[email protected]>
> > CC: Juri Lelli <[email protected]>
> > CC: Vincent Guittot <[email protected]>
> > CC: Dietmar Eggemann <[email protected]>
> > CC: Steven Rostedt <[email protected]>
> > CC: Ben Segall <[email protected]>
> > CC: Mel Gorman <[email protected]>
> > CC: Luis Chamberlain <[email protected]>
> > CC: Kees Cook <[email protected]>
> > CC: Iurii Zaikin <[email protected]>
> > CC: Quentin Perret <[email protected]>
> > CC: Valentin Schneider <[email protected]>
> > CC: Patrick Bellasi <[email protected]>
> > CC: Pavan Kondeti <[email protected]>
> > CC: [email protected]
> > CC: [email protected]
> > CC: [email protected]
> > ---
> > Documentation/admin-guide/sysctl/kernel.rst | 48 +++++++++++++++++++++
> > 1 file changed, 48 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index 0d427fd10941..e7255f71493c 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -940,6 +940,54 @@ Enables/disables scheduler statistics. Enabling this feature
> > incurs a small amount of overhead in the scheduler but is
> > useful for debugging and performance tuning.
> >
> > +sched_util_clamp_min:
> > +=====================
> > +
> > +Max allowed *minimum* utilization.
> > +
> > +Default value is SCHED_CAPACITY_SCALE (1024), which is the maximum possible
> > +value.
> > +
> > +It means that any requested uclamp.min value cannot be greater than
> ^^^^^^^
>
> Seems that 'greater' should be 'smaller'.
> And the range is [sched_util_clamp_min:SCHED_CAPACITY_SCALE]
> Uclamp request should not below this system value.
> Or I am totally wrong for memory leak.

No the range is [0:sched_util_clamp_min].

It is what it is implemented by this condition:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/core.c?h=v5.7-rc3#n913

So if the requested uclamp.min value is greater than sched_util_clamp_min, we
return sched_util_clamp_min.

Thanks

--
Qais Yousef

>
> Thank you
>
> > +sched_util_clamp_min, ie: it is restricted to the range
> > +[0:sched_util_clamp_min].
> > +
> > +sched_util_clamp_max:
> > +=====================
> > +
> > +Max allowed *maximum* utilization.
> > +
> > +Default value is SCHED_CAPACITY_SCALE (1024), which is the maximum possible
> > +value.
> > +
> > +It means that any requested uclamp.max value cannot be greater than
> > +sched_util_clamp_max, ie: it is restricted to the range
> > +[0:sched_util_clamp_max].
> > +
> > +sched_util_clamp_min_rt_default:
> > +================================
> > +
> > +By default Linux is tuned for performance. Which means that RT tasks always run
> > +at the highest frequency and most capable (highest capacity) CPU (in
> > +heterogeneous systems).
> > +
> > +Uclamp achieves this by setting the requested uclamp.min of all RT tasks to
> > +SCHED_CAPACITY_SCALE (1024) by default. Which effectively boosts the tasks to
> > +run at the highest frequency and bias them to run on the biggest CPU.
> > +
> > +This knob allows admins to change the default behavior when uclamp is being
> > +used. In battery powered devices particularly, running at the maximum
> > +capacity and frequency will increase energy consumption and shorten the battery
> > +life.
> > +
> > +This knob is only effective for RT tasks which the user hasn't modified their
> > +requested uclamp.min value via sched_setattr() syscall.
> > +
> > +This knob will not escape the constraint imposed by sched_util_clamp_min
> > +defined above.
> > +
> > +Any modification is applied lazily on the next opportunity the scheduler needs
> > +to calculate the effective value of uclamp.min of the task.
> >
> > seccomp
> > =======
> > --
> > 2.17.1
> >

2020-04-30 18:24:07

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value

On 29/04/2020 14:30, Qais Yousef wrote:
> Hi Pavan
>
> On 04/29/20 17:02, Pavan Kondeti wrote:
>> Hi Qais,
>>
>> On Tue, Apr 28, 2020 at 05:41:33PM +0100, Qais Yousef wrote:

[...]

>>> @@ -907,8 +935,15 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
>>> static inline struct uclamp_se
>>> uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
>>> {
>>> - struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
>>> - struct uclamp_se uc_max = uclamp_default[clamp_id];
>>> + struct uclamp_se uc_req, uc_max;
>>> +
>>> + /*
>>> + * Sync up any change to sysctl_sched_uclamp_util_min_rt_default value.
>>> + */
>>> + uclamp_sync_util_min_rt_default(p);
>>> +
>>> + uc_req = uclamp_tg_restrict(p, clamp_id);
>>> + uc_max = uclamp_default[clamp_id];
>>
>> We are calling uclamp_sync_util_min_rt_default() unnecessarily for
>> clamp_id == UCLAMP_MAX case. Would it be better to have a separate
>
> It was actually intentional to make sure we update the value ASAP. I didn't
> think it's a lot of overhead. I can further protect with a check to verify
> whether the value has changed if it seems heavy handed.

Users of uclamp_eff_value()->uclamp_eff_get() ((like
rt_task_fits_capacity())) always call both ids.

So calling uclamp_sync_util_min_rt_default() only for UCLAMP_MIN would
make sense. It's overhead in the fast path for rt tasks.

Since changes to sched_util_clamp_min_rt_default will be fairly rare,
you might even want to consider only doing the uclamp_se_set(...,
min_rt_default, ...) in case

uc_se->value != sysctl_sched_uclamp_util_min_rt_default

[...]

2020-05-01 11:07:01

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] sched/uclamp: Add a new sysctl to control RT default boost value

On 04/30/20 20:21, Dietmar Eggemann wrote:
> On 29/04/2020 14:30, Qais Yousef wrote:
> > Hi Pavan
> >
> > On 04/29/20 17:02, Pavan Kondeti wrote:
> >> Hi Qais,
> >>
> >> On Tue, Apr 28, 2020 at 05:41:33PM +0100, Qais Yousef wrote:
>
> [...]
>
> >>> @@ -907,8 +935,15 @@ uclamp_tg_restrict(struct task_struct *p, enum uclamp_id clamp_id)
> >>> static inline struct uclamp_se
> >>> uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
> >>> {
> >>> - struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
> >>> - struct uclamp_se uc_max = uclamp_default[clamp_id];
> >>> + struct uclamp_se uc_req, uc_max;
> >>> +
> >>> + /*
> >>> + * Sync up any change to sysctl_sched_uclamp_util_min_rt_default value.
> >>> + */
> >>> + uclamp_sync_util_min_rt_default(p);
> >>> +
> >>> + uc_req = uclamp_tg_restrict(p, clamp_id);
> >>> + uc_max = uclamp_default[clamp_id];
> >>
> >> We are calling uclamp_sync_util_min_rt_default() unnecessarily for
> >> clamp_id == UCLAMP_MAX case. Would it be better to have a separate
> >
> > It was actually intentional to make sure we update the value ASAP. I didn't
> > think it's a lot of overhead. I can further protect with a check to verify
> > whether the value has changed if it seems heavy handed.
>
> Users of uclamp_eff_value()->uclamp_eff_get() ((like
> rt_task_fits_capacity())) always call both ids.
>
> So calling uclamp_sync_util_min_rt_default() only for UCLAMP_MIN would
> make sense. It's overhead in the fast path for rt tasks.
>
> Since changes to sched_util_clamp_min_rt_default will be fairly rare,
> you might even want to consider only doing the uclamp_se_set(...,
> min_rt_default, ...) in case
>
> uc_se->value != sysctl_sched_uclamp_util_min_rt_default

Okay will do though I'm not convinced this micro optimization makes any
difference. p->uclamp_req[] is already read, so it should be cache hot.
uclamp_set_se() performs 3 writes on it, so I expect no stalls. Even if it
was a write-through cache, there's usually a write buffer that I think should
be able to hide the 3 writes. Write-through + linux is a bad idea in general.

In contrary, the complex if condition might make branch prediction less
effective, hence this micro optimization might create a negative effect.

So I don't see this a clear win and it would be hard to know without proper
measurement really. It is cheaper sometimes to always execute something than
sprinkle more branches to avoid executing this simple code.

I just realized though that I didn't define the
uclamp_sync_util_min_rt_default() as inline, which I should to avoid a function
call. Compiler *should* be smart though :p

Thanks

--
Qais Yousef