2023-04-16 21:43:55

by David Dai

[permalink] [raw]
Subject: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

A userspace service may manage uclamp dynamically for individual tasks and
a child task will unintentionally inherit a pesudo-random uclamp setting.
This could result in the child task being stuck with a static uclamp value
that results in poor performance or poor power.

Using SCHED_FLAG_RESET_ON_FORK is too coarse for this usecase and will
reset other useful scheduler attributes. Adding a
SCHED_FLAG_RESET_UCLAMP_ON_FORK will allow userspace to have finer control
over scheduler attributes of child processes.

Cc: Qais Yousef <[email protected]>
Cc: Quentin Perret <[email protected]>
Cc: Saravana Kannan <[email protected]>
Signed-off-by: David Dai <[email protected]>
---
include/linux/sched.h | 3 +++
include/uapi/linux/sched.h | 4 +++-
kernel/sched/core.c | 6 +++++-
tools/include/uapi/linux/sched.h | 4 +++-
4 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 63d242164b1a..b1676b9381f9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -885,6 +885,9 @@ struct task_struct {
unsigned sched_reset_on_fork:1;
unsigned sched_contributes_to_load:1;
unsigned sched_migrated:1;
+#ifdef CONFIG_UCLAMP_TASK
+ unsigned sched_reset_uclamp_on_fork:1;
+#endif

/* Force alignment to the next boundary: */
unsigned :0;
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 3bac0a8ceab2..7515106e1f1a 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -132,12 +132,14 @@ struct clone_args {
#define SCHED_FLAG_KEEP_PARAMS 0x10
#define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
#define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
+#define SCHED_FLAG_RESET_UCLAMP_ON_FORK 0x80

#define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \
SCHED_FLAG_KEEP_PARAMS)

#define SCHED_FLAG_UTIL_CLAMP (SCHED_FLAG_UTIL_CLAMP_MIN | \
- SCHED_FLAG_UTIL_CLAMP_MAX)
+ SCHED_FLAG_UTIL_CLAMP_MAX | \
+ SCHED_FLAG_RESET_UCLAMP_ON_FORK)

#define SCHED_FLAG_ALL (SCHED_FLAG_RESET_ON_FORK | \
SCHED_FLAG_RECLAIM | \
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0d18c3969f90..f2d5f7911855 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1943,6 +1943,10 @@ static void __setscheduler_uclamp(struct task_struct *p,
uclamp_se_set(&p->uclamp_req[UCLAMP_MAX],
attr->sched_util_max, true);
}
+
+ p->sched_reset_uclamp_on_fork = !!(attr->sched_flags &
+ SCHED_FLAG_RESET_UCLAMP_ON_FORK);
+
}

static void uclamp_fork(struct task_struct *p)
@@ -1956,7 +1960,7 @@ static void uclamp_fork(struct task_struct *p)
for_each_clamp_id(clamp_id)
p->uclamp[clamp_id].active = false;

- if (likely(!p->sched_reset_on_fork))
+ if (likely(!p->sched_reset_on_fork && !p->sched_reset_uclamp_on_fork))
return;

for_each_clamp_id(clamp_id) {
diff --git a/tools/include/uapi/linux/sched.h b/tools/include/uapi/linux/sched.h
index 3bac0a8ceab2..d52c59a2e0d0 100644
--- a/tools/include/uapi/linux/sched.h
+++ b/tools/include/uapi/linux/sched.h
@@ -132,12 +132,14 @@ struct clone_args {
#define SCHED_FLAG_KEEP_PARAMS 0x10
#define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
#define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
+#define SCHED_FLAG_RESET_UCLAMP_ON_FORK 0x80

#define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \
SCHED_FLAG_KEEP_PARAMS)

#define SCHED_FLAG_UTIL_CLAMP (SCHED_FLAG_UTIL_CLAMP_MIN | \
- SCHED_FLAG_UTIL_CLAMP_MAX)
+ SCHED_FLAG_UTIL_CLAMP_MAX | \
+ SCHED_FLAG_RESET_UCLAMP_ON_FORK)

#define SCHED_FLAG_ALL (SCHED_FLAG_RESET_ON_FORK | \
SCHED_FLAG_RECLAIM | \
--
2.40.0.634.g4ca3ef3211-goog


2023-04-19 05:28:47

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

On 16/04/2023 23:34, David Dai wrote:
> A userspace service may manage uclamp dynamically for individual tasks and
> a child task will unintentionally inherit a pesudo-random uclamp setting.
> This could result in the child task being stuck with a static uclamp value

Could you explain this with a little bit more detail? Why isn't the
child task also managed by the userspace service?

The child task will only make a difference if it's on the rq.

Does this issue happen with uclamp mainline or only with Android's
slightly different version (max- vs. sum aggregation)?

> that results in poor performance or poor power.
>
> Using SCHED_FLAG_RESET_ON_FORK is too coarse for this usecase and will
> reset other useful scheduler attributes. Adding a
> SCHED_FLAG_RESET_UCLAMP_ON_FORK will allow userspace to have finer control
> over scheduler attributes of child processes.

[...]

2023-04-19 18:04:47

by Qais Yousef

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

Hi David!

On 04/16/23 14:34, David Dai wrote:
> A userspace service may manage uclamp dynamically for individual tasks and
> a child task will unintentionally inherit a pesudo-random uclamp setting.
> This could result in the child task being stuck with a static uclamp value
> that results in poor performance or poor power.
>
> Using SCHED_FLAG_RESET_ON_FORK is too coarse for this usecase and will
> reset other useful scheduler attributes. Adding a
> SCHED_FLAG_RESET_UCLAMP_ON_FORK will allow userspace to have finer control
> over scheduler attributes of child processes.

Thanks a lot for the patch. This has a been a known limitation for a while but
didn't manage to find the time to push anything yet.

ADPF (Android Dynamic Performance Framework) exposes APIs to manage performance
for a set of pids [1]. Only these tasks belong to the session and any forked
tasked is expected to have its uclamp values reset. But as you pointed out, the
current RESET_ON_FORK resets everything, but we don't want that as these
attributes don't belong to ADPF to decide whether they should be reset too or
not. And not resetting them means we can end up with tasks inheriting random
uclamp values unintentionally. We can't tell these tasks not to fork anything.
If the forked tasks are expected to be part of the session, then their pids
must be added explicitly.

[1] https://developer.android.com/reference/android/os/PerformanceHintManager#createHintSession(int%5B%5D,%20long)

>
> Cc: Qais Yousef <[email protected]>
> Cc: Quentin Perret <[email protected]>
> Cc: Saravana Kannan <[email protected]>
> Signed-off-by: David Dai <[email protected]>
> ---
> include/linux/sched.h | 3 +++
> include/uapi/linux/sched.h | 4 +++-
> kernel/sched/core.c | 6 +++++-
> tools/include/uapi/linux/sched.h | 4 +++-
> 4 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 63d242164b1a..b1676b9381f9 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -885,6 +885,9 @@ struct task_struct {
> unsigned sched_reset_on_fork:1;

nit: can't we convert to a flag and re-use?

> unsigned sched_contributes_to_load:1;
> unsigned sched_migrated:1;
> +#ifdef CONFIG_UCLAMP_TASK
> + unsigned sched_reset_uclamp_on_fork:1;
> +#endif
>
> /* Force alignment to the next boundary: */
> unsigned :0;
> diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
> index 3bac0a8ceab2..7515106e1f1a 100644
> --- a/include/uapi/linux/sched.h
> +++ b/include/uapi/linux/sched.h
> @@ -132,12 +132,14 @@ struct clone_args {
> #define SCHED_FLAG_KEEP_PARAMS 0x10
> #define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
> #define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
> +#define SCHED_FLAG_RESET_UCLAMP_ON_FORK 0x80
>
> #define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \
> SCHED_FLAG_KEEP_PARAMS)
>
> #define SCHED_FLAG_UTIL_CLAMP (SCHED_FLAG_UTIL_CLAMP_MIN | \
> - SCHED_FLAG_UTIL_CLAMP_MAX)
> + SCHED_FLAG_UTIL_CLAMP_MAX | \
> + SCHED_FLAG_RESET_UCLAMP_ON_FORK)

I was considering to have something a bit more generic that allows selecting
which attributes to reset.

For example a syscall with SCHED_FLAG_RESET_ON_FORK_SEL combined with
SCHED_FLAG_UCLAMP_MIN/MAX will only reset those. This should make it extensible
if we have other similar use cases in the future. The downside it *might*
require to be done in a separate syscall to the one that sets these parameter.
But it should be done once.

Maybe there's a better interface, but I think it makes sense to do it in a way
that we won't have to do this again. Would be good to hear from maintainers
first before you take my word for it ;-)


Cheers

--
Qais Yousef

>
> #define SCHED_FLAG_ALL (SCHED_FLAG_RESET_ON_FORK | \
> SCHED_FLAG_RECLAIM | \
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 0d18c3969f90..f2d5f7911855 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1943,6 +1943,10 @@ static void __setscheduler_uclamp(struct task_struct *p,
> uclamp_se_set(&p->uclamp_req[UCLAMP_MAX],
> attr->sched_util_max, true);
> }
> +
> + p->sched_reset_uclamp_on_fork = !!(attr->sched_flags &
> + SCHED_FLAG_RESET_UCLAMP_ON_FORK);
> +
> }
>
> static void uclamp_fork(struct task_struct *p)
> @@ -1956,7 +1960,7 @@ static void uclamp_fork(struct task_struct *p)
> for_each_clamp_id(clamp_id)
> p->uclamp[clamp_id].active = false;
>
> - if (likely(!p->sched_reset_on_fork))
> + if (likely(!p->sched_reset_on_fork && !p->sched_reset_uclamp_on_fork))
> return;
>
> for_each_clamp_id(clamp_id) {
> diff --git a/tools/include/uapi/linux/sched.h b/tools/include/uapi/linux/sched.h
> index 3bac0a8ceab2..d52c59a2e0d0 100644
> --- a/tools/include/uapi/linux/sched.h
> +++ b/tools/include/uapi/linux/sched.h
> @@ -132,12 +132,14 @@ struct clone_args {
> #define SCHED_FLAG_KEEP_PARAMS 0x10
> #define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
> #define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
> +#define SCHED_FLAG_RESET_UCLAMP_ON_FORK 0x80
>
> #define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \
> SCHED_FLAG_KEEP_PARAMS)
>
> #define SCHED_FLAG_UTIL_CLAMP (SCHED_FLAG_UTIL_CLAMP_MIN | \
> - SCHED_FLAG_UTIL_CLAMP_MAX)
> + SCHED_FLAG_UTIL_CLAMP_MAX | \
> + SCHED_FLAG_RESET_UCLAMP_ON_FORK)
>
> #define SCHED_FLAG_ALL (SCHED_FLAG_RESET_ON_FORK | \
> SCHED_FLAG_RECLAIM | \
> --
> 2.40.0.634.g4ca3ef3211-goog
>

2023-04-19 22:54:06

by Saravana Kannan

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

On Wed, Apr 19, 2023 at 10:54 AM Qais Yousef <[email protected]> wrote:
>
> Hi David!
>
> On 04/16/23 14:34, David Dai wrote:
> > A userspace service may manage uclamp dynamically for individual tasks and
> > a child task will unintentionally inherit a pesudo-random uclamp setting.
> > This could result in the child task being stuck with a static uclamp value
> > that results in poor performance or poor power.
> >
> > Using SCHED_FLAG_RESET_ON_FORK is too coarse for this usecase and will
> > reset other useful scheduler attributes. Adding a
> > SCHED_FLAG_RESET_UCLAMP_ON_FORK will allow userspace to have finer control
> > over scheduler attributes of child processes.
>
> Thanks a lot for the patch. This has a been a known limitation for a while but
> didn't manage to find the time to push anything yet.
>
> ADPF (Android Dynamic Performance Framework) exposes APIs to manage performance
> for a set of pids [1]. Only these tasks belong to the session and any forked
> tasked is expected to have its uclamp values reset. But as you pointed out, the
> current RESET_ON_FORK resets everything, but we don't want that as these
> attributes don't belong to ADPF to decide whether they should be reset too or
> not. And not resetting them means we can end up with tasks inheriting random
> uclamp values unintentionally. We can't tell these tasks not to fork anything.
> If the forked tasks are expected to be part of the session, then their pids
> must be added explicitly.
>
> [1] https://developer.android.com/reference/android/os/PerformanceHintManager#createHintSession(int%5B%5D,%20long)
>
> >
> > Cc: Qais Yousef <[email protected]>
> > Cc: Quentin Perret <[email protected]>
> > Cc: Saravana Kannan <[email protected]>
> > Signed-off-by: David Dai <[email protected]>
> > ---
> > include/linux/sched.h | 3 +++
> > include/uapi/linux/sched.h | 4 +++-
> > kernel/sched/core.c | 6 +++++-
> > tools/include/uapi/linux/sched.h | 4 +++-
> > 4 files changed, 14 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 63d242164b1a..b1676b9381f9 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -885,6 +885,9 @@ struct task_struct {
> > unsigned sched_reset_on_fork:1;
>
> nit: can't we convert to a flag and re-use?
>
> > unsigned sched_contributes_to_load:1;
> > unsigned sched_migrated:1;
> > +#ifdef CONFIG_UCLAMP_TASK
> > + unsigned sched_reset_uclamp_on_fork:1;
> > +#endif
> >
> > /* Force alignment to the next boundary: */
> > unsigned :0;
> > diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
> > index 3bac0a8ceab2..7515106e1f1a 100644
> > --- a/include/uapi/linux/sched.h
> > +++ b/include/uapi/linux/sched.h
> > @@ -132,12 +132,14 @@ struct clone_args {
> > #define SCHED_FLAG_KEEP_PARAMS 0x10
> > #define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
> > #define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
> > +#define SCHED_FLAG_RESET_UCLAMP_ON_FORK 0x80
> >
> > #define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \
> > SCHED_FLAG_KEEP_PARAMS)
> >
> > #define SCHED_FLAG_UTIL_CLAMP (SCHED_FLAG_UTIL_CLAMP_MIN | \
> > - SCHED_FLAG_UTIL_CLAMP_MAX)
> > + SCHED_FLAG_UTIL_CLAMP_MAX | \
> > + SCHED_FLAG_RESET_UCLAMP_ON_FORK)
>
> I was considering to have something a bit more generic that allows selecting
> which attributes to reset.
>
> For example a syscall with SCHED_FLAG_RESET_ON_FORK_SEL combined with
> SCHED_FLAG_UCLAMP_MIN/MAX will only reset those. This should make it extensible
> if we have other similar use cases in the future. The downside it *might*
> require to be done in a separate syscall to the one that sets these parameter.
> But it should be done once.

In addition to the downside you mentioned, I'm not a huge fan of this
suggestion since the meaning of the SCHED_FLAG_RESET_ON_FORK_SEL flag
changes based on what other flags or attrs are set. I'd rather we have
explicit flags.

SCHED_FLAG_RESET_ON_FORK_SEL makes it harder to maintain the userspace
code/makes it easy to accidentally introduce bugs. For example, a
syscall could be setting UCLAMP_MIN and RESET_ON_FORK_SEL. Someone
else might come and change the call to also set a nice value but not
remember to split it up into two calls. Whereas with an explicit flag
like David's proposal, we won't hit such an issue.

Also, we'll need to have separate flags internally to track what needs
to be reset on fork vs not. So we really aren't saving anything by
adding RESET_ON_FORK_SEL.

At least, that's my 2 cents.

-Saravana

> Maybe there's a better interface, but I think it makes sense to do it in a way
> that we won't have to do this again. Would be good to hear from maintainers
> first before you take my word for it ;-)
>
>
> Cheers
>
> --
> Qais Yousef
>
> >
> > #define SCHED_FLAG_ALL (SCHED_FLAG_RESET_ON_FORK | \
> > SCHED_FLAG_RECLAIM | \
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 0d18c3969f90..f2d5f7911855 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1943,6 +1943,10 @@ static void __setscheduler_uclamp(struct task_struct *p,
> > uclamp_se_set(&p->uclamp_req[UCLAMP_MAX],
> > attr->sched_util_max, true);
> > }
> > +
> > + p->sched_reset_uclamp_on_fork = !!(attr->sched_flags &
> > + SCHED_FLAG_RESET_UCLAMP_ON_FORK);
> > +
> > }
> >
> > static void uclamp_fork(struct task_struct *p)
> > @@ -1956,7 +1960,7 @@ static void uclamp_fork(struct task_struct *p)
> > for_each_clamp_id(clamp_id)
> > p->uclamp[clamp_id].active = false;
> >
> > - if (likely(!p->sched_reset_on_fork))
> > + if (likely(!p->sched_reset_on_fork && !p->sched_reset_uclamp_on_fork))
> > return;
> >
> > for_each_clamp_id(clamp_id) {
> > diff --git a/tools/include/uapi/linux/sched.h b/tools/include/uapi/linux/sched.h
> > index 3bac0a8ceab2..d52c59a2e0d0 100644
> > --- a/tools/include/uapi/linux/sched.h
> > +++ b/tools/include/uapi/linux/sched.h
> > @@ -132,12 +132,14 @@ struct clone_args {
> > #define SCHED_FLAG_KEEP_PARAMS 0x10
> > #define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
> > #define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
> > +#define SCHED_FLAG_RESET_UCLAMP_ON_FORK 0x80
> >
> > #define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \
> > SCHED_FLAG_KEEP_PARAMS)
> >
> > #define SCHED_FLAG_UTIL_CLAMP (SCHED_FLAG_UTIL_CLAMP_MIN | \
> > - SCHED_FLAG_UTIL_CLAMP_MAX)
> > + SCHED_FLAG_UTIL_CLAMP_MAX | \
> > + SCHED_FLAG_RESET_UCLAMP_ON_FORK)
> >
> > #define SCHED_FLAG_ALL (SCHED_FLAG_RESET_ON_FORK | \
> > SCHED_FLAG_RECLAIM | \
> > --
> > 2.40.0.634.g4ca3ef3211-goog
> >

2023-04-20 01:15:18

by David Dai

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

On Tue, Apr 18, 2023 at 10:18 PM Dietmar Eggemann
<[email protected]> wrote:
>

Hi Dietmar, thanks for your time,

> On 16/04/2023 23:34, David Dai wrote:
> > A userspace service may manage uclamp dynamically for individual tasks and
> > a child task will unintentionally inherit a pesudo-random uclamp setting.
> > This could result in the child task being stuck with a static uclamp value
>
> Could you explain this with a little bit more detail? Why isn't the
> child task also managed by the userspace service?

See Qais’ reply that contains more detail on how it’s being used in
Android. In general, if a dynamic userspace service will adjust uclamp
on the fly for a given task, but has no knowledge or control over if
or when a task forks. Depending on the timing of the fork, a child
task may inherit a very large or a small uclamp_min or uclamp_max
value. The intent of this patch is to provide more flexibility to the
uclamp APIs such that child tasks do not get stuck with a poor uclamp
value when spawned while retaining other sched attributes. When
RESET_ON_FORK is set on the parent task, it will reset uclamp values
for the child but also reset other sched attributes as well.

>
> The child task will only make a difference if it's on the rq.
>
> Does this issue happen with uclamp mainline or only with Android's
> slightly different version (max- vs. sum aggregation)?

I’m using the version of uclamp that’s in Linus’ tree. How uclamp is
aggregated is unrelated to the problem I’m trying to solve with this
patch. Which is to extend the uclamp APIs to have finer control for
the uclamp inheritance of child tasks.

Thanks,
David

>
> > that results in poor performance or poor power.
> >
> > Using SCHED_FLAG_RESET_ON_FORK is too coarse for this usecase and will
> > reset other useful scheduler attributes. Adding a
> > SCHED_FLAG_RESET_UCLAMP_ON_FORK will allow userspace to have finer control
> > over scheduler attributes of child processes.
>
> [...]
>

2023-04-20 09:39:56

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

On 20/04/2023 03:11, David Dai wrote:
> On Tue, Apr 18, 2023 at 10:18 PM Dietmar Eggemann
> <[email protected]> wrote:
>>
>
> Hi Dietmar, thanks for your time,
>
>> On 16/04/2023 23:34, David Dai wrote:
>>> A userspace service may manage uclamp dynamically for individual tasks and
>>> a child task will unintentionally inherit a pesudo-random uclamp setting.
>>> This could result in the child task being stuck with a static uclamp value
>>
>> Could you explain this with a little bit more detail? Why isn't the
>> child task also managed by the userspace service?
>
> See Qais’ reply that contains more detail on how it’s being used in
> Android. In general, if a dynamic userspace service will adjust uclamp
> on the fly for a given task, but has no knowledge or control over if
> or when a task forks. Depending on the timing of the fork, a child
> task may inherit a very large or a small uclamp_min or uclamp_max
> value. The intent of this patch is to provide more flexibility to the
> uclamp APIs such that child tasks do not get stuck with a poor uclamp
> value when spawned while retaining other sched attributes. When
> RESET_ON_FORK is set on the parent task, it will reset uclamp values
> for the child but also reset other sched attributes as well.

OK, in this case, why not just change behavior and always reset the
uclamp values at fork?

Do we anticipate a use-case in which uclamp inheritance would be required?

Let's not over-complicate the sched_[sg]etattr() unnecessarily.

[...]

>> Does this issue happen with uclamp mainline or only with Android's
>> slightly different version (max- vs. sum aggregation)?
>
> I’m using the version of uclamp that’s in Linus’ tree. How uclamp is
> aggregated is unrelated to the problem I’m trying to solve with this
> patch. Which is to extend the uclamp APIs to have finer control for
> the uclamp inheritance of child tasks.

OK, I see.

2023-04-20 13:48:48

by Qais Yousef

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

On 04/19/23 15:49, Saravana Kannan wrote:
> On Wed, Apr 19, 2023 at 10:54 AM Qais Yousef <[email protected]> wrote:
> >
> > Hi David!
> >
> > On 04/16/23 14:34, David Dai wrote:
> > > A userspace service may manage uclamp dynamically for individual tasks and
> > > a child task will unintentionally inherit a pesudo-random uclamp setting.
> > > This could result in the child task being stuck with a static uclamp value
> > > that results in poor performance or poor power.
> > >
> > > Using SCHED_FLAG_RESET_ON_FORK is too coarse for this usecase and will
> > > reset other useful scheduler attributes. Adding a
> > > SCHED_FLAG_RESET_UCLAMP_ON_FORK will allow userspace to have finer control
> > > over scheduler attributes of child processes.
> >
> > Thanks a lot for the patch. This has a been a known limitation for a while but
> > didn't manage to find the time to push anything yet.
> >
> > ADPF (Android Dynamic Performance Framework) exposes APIs to manage performance
> > for a set of pids [1]. Only these tasks belong to the session and any forked
> > tasked is expected to have its uclamp values reset. But as you pointed out, the
> > current RESET_ON_FORK resets everything, but we don't want that as these
> > attributes don't belong to ADPF to decide whether they should be reset too or
> > not. And not resetting them means we can end up with tasks inheriting random
> > uclamp values unintentionally. We can't tell these tasks not to fork anything.
> > If the forked tasks are expected to be part of the session, then their pids
> > must be added explicitly.
> >
> > [1] https://developer.android.com/reference/android/os/PerformanceHintManager#createHintSession(int%5B%5D,%20long)
> >
> > >
> > > Cc: Qais Yousef <[email protected]>
> > > Cc: Quentin Perret <[email protected]>
> > > Cc: Saravana Kannan <[email protected]>
> > > Signed-off-by: David Dai <[email protected]>
> > > ---
> > > include/linux/sched.h | 3 +++
> > > include/uapi/linux/sched.h | 4 +++-
> > > kernel/sched/core.c | 6 +++++-
> > > tools/include/uapi/linux/sched.h | 4 +++-
> > > 4 files changed, 14 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > index 63d242164b1a..b1676b9381f9 100644
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -885,6 +885,9 @@ struct task_struct {
> > > unsigned sched_reset_on_fork:1;
> >
> > nit: can't we convert to a flag and re-use?
> >
> > > unsigned sched_contributes_to_load:1;
> > > unsigned sched_migrated:1;
> > > +#ifdef CONFIG_UCLAMP_TASK
> > > + unsigned sched_reset_uclamp_on_fork:1;
> > > +#endif
> > >
> > > /* Force alignment to the next boundary: */
> > > unsigned :0;
> > > diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
> > > index 3bac0a8ceab2..7515106e1f1a 100644
> > > --- a/include/uapi/linux/sched.h
> > > +++ b/include/uapi/linux/sched.h
> > > @@ -132,12 +132,14 @@ struct clone_args {
> > > #define SCHED_FLAG_KEEP_PARAMS 0x10
> > > #define SCHED_FLAG_UTIL_CLAMP_MIN 0x20
> > > #define SCHED_FLAG_UTIL_CLAMP_MAX 0x40
> > > +#define SCHED_FLAG_RESET_UCLAMP_ON_FORK 0x80
> > >
> > > #define SCHED_FLAG_KEEP_ALL (SCHED_FLAG_KEEP_POLICY | \
> > > SCHED_FLAG_KEEP_PARAMS)
> > >
> > > #define SCHED_FLAG_UTIL_CLAMP (SCHED_FLAG_UTIL_CLAMP_MIN | \
> > > - SCHED_FLAG_UTIL_CLAMP_MAX)
> > > + SCHED_FLAG_UTIL_CLAMP_MAX | \
> > > + SCHED_FLAG_RESET_UCLAMP_ON_FORK)
> >
> > I was considering to have something a bit more generic that allows selecting
> > which attributes to reset.
> >
> > For example a syscall with SCHED_FLAG_RESET_ON_FORK_SEL combined with
> > SCHED_FLAG_UCLAMP_MIN/MAX will only reset those. This should make it extensible
> > if we have other similar use cases in the future. The downside it *might*
> > require to be done in a separate syscall to the one that sets these parameter.
> > But it should be done once.
>
> In addition to the downside you mentioned, I'm not a huge fan of this
> suggestion since the meaning of the SCHED_FLAG_RESET_ON_FORK_SEL flag
> changes based on what other flags or attrs are set. I'd rather we have
> explicit flags.

The concern is that these flags are limited resources. latency_nice hopefully
is coming and I don't see uclamp is an exception to warrant its own unique
reset flag. Do you think we should never ever face similar exception again?

>
> SCHED_FLAG_RESET_ON_FORK_SEL makes it harder to maintain the userspace
> code/makes it easy to accidentally introduce bugs. For example, a
> syscall could be setting UCLAMP_MIN and RESET_ON_FORK_SEL. Someone
> else might come and change the call to also set a nice value but not
> remember to split it up into two calls. Whereas with an explicit flag
> like David's proposal, we won't hit such an issue.

I think this mode of failure exists today and not new. You'll have to remember
to set the right flag to keep policy etc otherwise you can end up with
accidental effect.

That was the first suggestion comes to mind, it could be done another ways
I suppose.

> Also, we'll need to have separate flags internally to track what needs
> to be reset on fork vs not. So we really aren't saving anything by
> adding RESET_ON_FORK_SEL.

I don't get you here. Do you mean in kernel or userspace we'll have to track?
I persume the former. It's just setting a flag in reset_on_fork variable.
I don't see the problem.

If preserving the flag space is not a concern, then yeah potentially this is
okay. Though in principle I think it doesn't make sense to continue to add new
flag for every potential similar exception. History tends to repeat itself.
I'm okay with keeping it simple if the maintainers don't share the concern
about the flag space.

user_check_sched_setscheduler() prevents none privileged users from clearing
reset_on_fork. Shouldn't we do the same?

Also we should make sure to clear it after the fork. Like is done for
reset_on_fork.


Cheers

--
Qais Yousef

2023-04-20 14:04:00

by Vincent Guittot

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

On Thu, 20 Apr 2023 at 11:37, Dietmar Eggemann <[email protected]> wrote:
>
> On 20/04/2023 03:11, David Dai wrote:
> > On Tue, Apr 18, 2023 at 10:18 PM Dietmar Eggemann
> > <[email protected]> wrote:
> >>
> >
> > Hi Dietmar, thanks for your time,
> >
> >> On 16/04/2023 23:34, David Dai wrote:
> >>> A userspace service may manage uclamp dynamically for individual tasks and
> >>> a child task will unintentionally inherit a pesudo-random uclamp setting.
> >>> This could result in the child task being stuck with a static uclamp value
> >>
> >> Could you explain this with a little bit more detail? Why isn't the
> >> child task also managed by the userspace service?
> >
> > See Qais’ reply that contains more detail on how it’s being used in
> > Android. In general, if a dynamic userspace service will adjust uclamp
> > on the fly for a given task, but has no knowledge or control over if
> > or when a task forks. Depending on the timing of the fork, a child
> > task may inherit a very large or a small uclamp_min or uclamp_max
> > value. The intent of this patch is to provide more flexibility to the
> > uclamp APIs such that child tasks do not get stuck with a poor uclamp
> > value when spawned while retaining other sched attributes. When
> > RESET_ON_FORK is set on the parent task, it will reset uclamp values
> > for the child but also reset other sched attributes as well.
>
> OK, in this case, why not just change behavior and always reset the
> uclamp values at fork?
>
> Do we anticipate a use-case in which uclamp inheritance would be required?
>
> Let's not over-complicate the sched_[sg]etattr() unnecessarily.

I was about to ask the same question and I'm aligned with Dietmar.
Use RESET_ON_FORK and set all attributes

>
> [...]
>
> >> Does this issue happen with uclamp mainline or only with Android's
> >> slightly different version (max- vs. sum aggregation)?
> >
> > I’m using the version of uclamp that’s in Linus’ tree. How uclamp is
> > aggregated is unrelated to the problem I’m trying to solve with this
> > patch. Which is to extend the uclamp APIs to have finer control for
> > the uclamp inheritance of child tasks.
>
> OK, I see.

2023-04-20 16:31:08

by Saravana Kannan

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

On Thu, Apr 20, 2023 at 2:37 AM Dietmar Eggemann
<[email protected]> wrote:
>
> On 20/04/2023 03:11, David Dai wrote:
> > On Tue, Apr 18, 2023 at 10:18 PM Dietmar Eggemann
> > <[email protected]> wrote:
> >>
> >
> > Hi Dietmar, thanks for your time,
> >
> >> On 16/04/2023 23:34, David Dai wrote:
> >>> A userspace service may manage uclamp dynamically for individual tasks and
> >>> a child task will unintentionally inherit a pesudo-random uclamp setting.
> >>> This could result in the child task being stuck with a static uclamp value
> >>
> >> Could you explain this with a little bit more detail? Why isn't the
> >> child task also managed by the userspace service?
> >
> > See Qais’ reply that contains more detail on how it’s being used in
> > Android. In general, if a dynamic userspace service will adjust uclamp
> > on the fly for a given task, but has no knowledge or control over if
> > or when a task forks. Depending on the timing of the fork, a child
> > task may inherit a very large or a small uclamp_min or uclamp_max
> > value. The intent of this patch is to provide more flexibility to the
> > uclamp APIs such that child tasks do not get stuck with a poor uclamp
> > value when spawned while retaining other sched attributes. When
> > RESET_ON_FORK is set on the parent task, it will reset uclamp values
> > for the child but also reset other sched attributes as well.
>
> OK, in this case, why not just change behavior and always reset the
> uclamp values at fork?

Personally, I'd have preferred uclamp was never inherited in the first
place, but wouldn't that be considered as breaking UAPI if we change
it now?

-Saravana

> Do we anticipate a use-case in which uclamp inheritance would be required?
>
> Let's not over-complicate the sched_[sg]etattr() unnecessarily.
>
> [...]
>
> >> Does this issue happen with uclamp mainline or only with Android's
> >> slightly different version (max- vs. sum aggregation)?
> >
> > I’m using the version of uclamp that’s in Linus’ tree. How uclamp is
> > aggregated is unrelated to the problem I’m trying to solve with this
> > patch. Which is to extend the uclamp APIs to have finer control for
> > the uclamp inheritance of child tasks.
>
> OK, I see.

2023-04-20 16:31:45

by Saravana Kannan

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

On Thu, Apr 20, 2023 at 6:44 AM Vincent Guittot
<[email protected]> wrote:
>
> On Thu, 20 Apr 2023 at 11:37, Dietmar Eggemann <[email protected]> wrote:
> >
> > On 20/04/2023 03:11, David Dai wrote:
> > > On Tue, Apr 18, 2023 at 10:18 PM Dietmar Eggemann
> > > <[email protected]> wrote:
> > >>
> > >
> > > Hi Dietmar, thanks for your time,
> > >
> > >> On 16/04/2023 23:34, David Dai wrote:
> > >>> A userspace service may manage uclamp dynamically for individual tasks and
> > >>> a child task will unintentionally inherit a pesudo-random uclamp setting.
> > >>> This could result in the child task being stuck with a static uclamp value
> > >>
> > >> Could you explain this with a little bit more detail? Why isn't the
> > >> child task also managed by the userspace service?
> > >
> > > See Qais’ reply that contains more detail on how it’s being used in
> > > Android. In general, if a dynamic userspace service will adjust uclamp
> > > on the fly for a given task, but has no knowledge or control over if
> > > or when a task forks. Depending on the timing of the fork, a child
> > > task may inherit a very large or a small uclamp_min or uclamp_max
> > > value. The intent of this patch is to provide more flexibility to the
> > > uclamp APIs such that child tasks do not get stuck with a poor uclamp
> > > value when spawned while retaining other sched attributes. When
> > > RESET_ON_FORK is set on the parent task, it will reset uclamp values
> > > for the child but also reset other sched attributes as well.
> >
> > OK, in this case, why not just change behavior and always reset the
> > uclamp values at fork?
> >
> > Do we anticipate a use-case in which uclamp inheritance would be required?
> >
> > Let's not over-complicate the sched_[sg]etattr() unnecessarily.
>
> I was about to ask the same question and I'm aligned with Dietmar.
> Use RESET_ON_FORK and set all attributes

That's racy though. If we have an external service (that's only
responsible for setting uclamp) setting all the attributes, the forked
thread could also be trying to set some of the attributes. Also, how
is this external service going to keep track of all the threads being
forked and set the right attributes for all of them?

If it's not considered a UAPI breakage, I'd rather we never inherit uclamp.

-Saravana

>
> >
> > [...]
> >
> > >> Does this issue happen with uclamp mainline or only with Android's
> > >> slightly different version (max- vs. sum aggregation)?
> > >
> > > I’m using the version of uclamp that’s in Linus’ tree. How uclamp is
> > > aggregated is unrelated to the problem I’m trying to solve with this
> > > patch. Which is to extend the uclamp APIs to have finer control for
> > > the uclamp inheritance of child tasks.
> >
> > OK, I see.

2023-04-20 17:40:17

by Qais Yousef

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

On 04/20/23 09:22, Saravana Kannan wrote:
> On Thu, Apr 20, 2023 at 2:37 AM Dietmar Eggemann
> <[email protected]> wrote:
> >
> > On 20/04/2023 03:11, David Dai wrote:
> > > On Tue, Apr 18, 2023 at 10:18 PM Dietmar Eggemann
> > > <[email protected]> wrote:
> > >>
> > >
> > > Hi Dietmar, thanks for your time,
> > >
> > >> On 16/04/2023 23:34, David Dai wrote:
> > >>> A userspace service may manage uclamp dynamically for individual tasks and
> > >>> a child task will unintentionally inherit a pesudo-random uclamp setting.
> > >>> This could result in the child task being stuck with a static uclamp value
> > >>
> > >> Could you explain this with a little bit more detail? Why isn't the
> > >> child task also managed by the userspace service?
> > >
> > > See Qais’ reply that contains more detail on how it’s being used in
> > > Android. In general, if a dynamic userspace service will adjust uclamp
> > > on the fly for a given task, but has no knowledge or control over if
> > > or when a task forks. Depending on the timing of the fork, a child
> > > task may inherit a very large or a small uclamp_min or uclamp_max
> > > value. The intent of this patch is to provide more flexibility to the
> > > uclamp APIs such that child tasks do not get stuck with a poor uclamp
> > > value when spawned while retaining other sched attributes. When
> > > RESET_ON_FORK is set on the parent task, it will reset uclamp values
> > > for the child but also reset other sched attributes as well.
> >
> > OK, in this case, why not just change behavior and always reset the
> > uclamp values at fork?
>
> Personally, I'd have preferred uclamp was never inherited in the first
> place, but wouldn't that be considered as breaking UAPI if we change
> it now?

Why is it okay to inherit priority but not uclamp? I see this is hacky and
setting a policy in the kernel. Let's not tailor it to specific use cases only
please.


--
Qais Yousef

2023-04-20 18:01:54

by Qais Yousef

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

On 04/20/23 09:25, Saravana Kannan wrote:
> On Thu, Apr 20, 2023 at 6:44 AM Vincent Guittot
> <[email protected]> wrote:
> >
> > On Thu, 20 Apr 2023 at 11:37, Dietmar Eggemann <[email protected]> wrote:
> > >
> > > On 20/04/2023 03:11, David Dai wrote:
> > > > On Tue, Apr 18, 2023 at 10:18 PM Dietmar Eggemann
> > > > <[email protected]> wrote:
> > > >>
> > > >
> > > > Hi Dietmar, thanks for your time,
> > > >
> > > >> On 16/04/2023 23:34, David Dai wrote:
> > > >>> A userspace service may manage uclamp dynamically for individual tasks and
> > > >>> a child task will unintentionally inherit a pesudo-random uclamp setting.
> > > >>> This could result in the child task being stuck with a static uclamp value
> > > >>
> > > >> Could you explain this with a little bit more detail? Why isn't the
> > > >> child task also managed by the userspace service?
> > > >
> > > > See Qais’ reply that contains more detail on how it’s being used in
> > > > Android. In general, if a dynamic userspace service will adjust uclamp
> > > > on the fly for a given task, but has no knowledge or control over if
> > > > or when a task forks. Depending on the timing of the fork, a child
> > > > task may inherit a very large or a small uclamp_min or uclamp_max
> > > > value. The intent of this patch is to provide more flexibility to the
> > > > uclamp APIs such that child tasks do not get stuck with a poor uclamp
> > > > value when spawned while retaining other sched attributes. When
> > > > RESET_ON_FORK is set on the parent task, it will reset uclamp values
> > > > for the child but also reset other sched attributes as well.
> > >
> > > OK, in this case, why not just change behavior and always reset the
> > > uclamp values at fork?
> > >
> > > Do we anticipate a use-case in which uclamp inheritance would be required?
> > >
> > > Let's not over-complicate the sched_[sg]etattr() unnecessarily.
> >
> > I was about to ask the same question and I'm aligned with Dietmar.
> > Use RESET_ON_FORK and set all attributes
>
> That's racy though. If we have an external service (that's only
> responsible for setting uclamp) setting all the attributes, the forked
> thread could also be trying to set some of the attributes. Also, how
> is this external service going to keep track of all the threads being
> forked and set the right attributes for all of them?

I didn't understand what Vincent is suggesting. But based on your response it
seems we are supposed to track all the forks and reset everything. If we can
track all the forks and reset everything, then why use RESET_ON_FORK even or
rely on the kernel do anything actually..

This also smells like a hack.


Cheers

--
Qais Yousef

2023-04-21 15:20:45

by Vincent Guittot

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

On Thu, 20 Apr 2023 at 18:26, Saravana Kannan <[email protected]> wrote:
>
> On Thu, Apr 20, 2023 at 6:44 AM Vincent Guittot
> <[email protected]> wrote:
> >
> > On Thu, 20 Apr 2023 at 11:37, Dietmar Eggemann <[email protected]> wrote:
> > >
> > > On 20/04/2023 03:11, David Dai wrote:
> > > > On Tue, Apr 18, 2023 at 10:18 PM Dietmar Eggemann
> > > > <[email protected]> wrote:
> > > >>
> > > >
> > > > Hi Dietmar, thanks for your time,
> > > >
> > > >> On 16/04/2023 23:34, David Dai wrote:
> > > >>> A userspace service may manage uclamp dynamically for individual tasks and
> > > >>> a child task will unintentionally inherit a pesudo-random uclamp setting.
> > > >>> This could result in the child task being stuck with a static uclamp value
> > > >>
> > > >> Could you explain this with a little bit more detail? Why isn't the
> > > >> child task also managed by the userspace service?
> > > >
> > > > See Qais’ reply that contains more detail on how it’s being used in
> > > > Android. In general, if a dynamic userspace service will adjust uclamp
> > > > on the fly for a given task, but has no knowledge or control over if
> > > > or when a task forks. Depending on the timing of the fork, a child
> > > > task may inherit a very large or a small uclamp_min or uclamp_max
> > > > value. The intent of this patch is to provide more flexibility to the
> > > > uclamp APIs such that child tasks do not get stuck with a poor uclamp
> > > > value when spawned while retaining other sched attributes. When
> > > > RESET_ON_FORK is set on the parent task, it will reset uclamp values
> > > > for the child but also reset other sched attributes as well.
> > >
> > > OK, in this case, why not just change behavior and always reset the
> > > uclamp values at fork?
> > >
> > > Do we anticipate a use-case in which uclamp inheritance would be required?
> > >
> > > Let's not over-complicate the sched_[sg]etattr() unnecessarily.
> >
> > I was about to ask the same question and I'm aligned with Dietmar.
> > Use RESET_ON_FORK and set all attributes
>
> That's racy though. If we have an external service (that's only
> responsible for setting uclamp) setting all the attributes, the forked
> thread could also be trying to set some of the attributes. Also, how
> is this external service going to keep track of all the threads being
> forked and set the right attributes for all of them?

My assumption was that you didn't use RESET_ON_FORK because there were
other attributes that you wanted to keep from parent but it doesn't
seem to be the case so use RESET_ON_FORK and if needed the forked
thread will set its other attributes

>
> If it's not considered a UAPI breakage, I'd rather we never inherit uclamp.
>
> -Saravana
>
> >
> > >
> > > [...]
> > >
> > > >> Does this issue happen with uclamp mainline or only with Android's
> > > >> slightly different version (max- vs. sum aggregation)?
> > > >
> > > > I’m using the version of uclamp that’s in Linus’ tree. How uclamp is
> > > > aggregated is unrelated to the problem I’m trying to solve with this
> > > > patch. Which is to extend the uclamp APIs to have finer control for
> > > > the uclamp inheritance of child tasks.
> > >
> > > OK, I see.

2023-04-28 11:51:39

by Qais Yousef

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

Hi Vincent

Sorry for the late response.

On 04/21/23 17:10, Vincent Guittot wrote:
> On Thu, 20 Apr 2023 at 18:26, Saravana Kannan <[email protected]> wrote:
> >
> > On Thu, Apr 20, 2023 at 6:44 AM Vincent Guittot
> > <[email protected]> wrote:
> > >
> > > On Thu, 20 Apr 2023 at 11:37, Dietmar Eggemann <[email protected]> wrote:
> > > >
> > > > On 20/04/2023 03:11, David Dai wrote:
> > > > > On Tue, Apr 18, 2023 at 10:18 PM Dietmar Eggemann
> > > > > <[email protected]> wrote:
> > > > >>
> > > > >
> > > > > Hi Dietmar, thanks for your time,
> > > > >
> > > > >> On 16/04/2023 23:34, David Dai wrote:
> > > > >>> A userspace service may manage uclamp dynamically for individual tasks and
> > > > >>> a child task will unintentionally inherit a pesudo-random uclamp setting.
> > > > >>> This could result in the child task being stuck with a static uclamp value
> > > > >>
> > > > >> Could you explain this with a little bit more detail? Why isn't the
> > > > >> child task also managed by the userspace service?
> > > > >
> > > > > See Qais’ reply that contains more detail on how it’s being used in
> > > > > Android. In general, if a dynamic userspace service will adjust uclamp
> > > > > on the fly for a given task, but has no knowledge or control over if
> > > > > or when a task forks. Depending on the timing of the fork, a child
> > > > > task may inherit a very large or a small uclamp_min or uclamp_max
> > > > > value. The intent of this patch is to provide more flexibility to the
> > > > > uclamp APIs such that child tasks do not get stuck with a poor uclamp
> > > > > value when spawned while retaining other sched attributes. When
> > > > > RESET_ON_FORK is set on the parent task, it will reset uclamp values
> > > > > for the child but also reset other sched attributes as well.
> > > >
> > > > OK, in this case, why not just change behavior and always reset the
> > > > uclamp values at fork?
> > > >
> > > > Do we anticipate a use-case in which uclamp inheritance would be required?
> > > >
> > > > Let's not over-complicate the sched_[sg]etattr() unnecessarily.
> > >
> > > I was about to ask the same question and I'm aligned with Dietmar.
> > > Use RESET_ON_FORK and set all attributes
> >
> > That's racy though. If we have an external service (that's only
> > responsible for setting uclamp) setting all the attributes, the forked
> > thread could also be trying to set some of the attributes. Also, how
> > is this external service going to keep track of all the threads being
> > forked and set the right attributes for all of them?
>
> My assumption was that you didn't use RESET_ON_FORK because there were
> other attributes that you wanted to keep from parent but it doesn't

Correct.

> seem to be the case so use RESET_ON_FORK and if needed the forked
> thread will set its other attributes

Hmm, it seems you think it's the parent who's trying to control the fork for
the child. But the situation we're in is different.

ADPF is a shared library/middleware/system service that provides higher level
API to manage the performance requirements for a group of tasks. They provide
a set of PIDs and a deadline, then continuously tell it how long they took to
finish their work. ADPF uses this info to manipulate uclamp_min dynamically on
their behalf so they finish their work in time. Explaining it very simply and
briefly here. It should make using uclamp_max easier too (though we are not
there yet) since the most efficient point is platform specific and the
middleware can help abstract this better in a portable way.

The problem is these tasks can be RT and/or can have their priorities/nice
values set by something else. But what we want is to make sure that we control
uclamp values only for the PIDs we were asked to manage, and anything forked
must have their uclamp values reset, but not the rest as this is outside of
this service available context. It gets delegated to handle performance hints
only and make using uclamp easier and generic. But this should not interfere
with policy and priority management done outside of its scope.

Apps that manage their own uclamp values wouldn't need this. But if we are to
build a smart middleware that provides a consistent and easier to use higher
level APIs, which is what we have here, we need this selective reset on fork.

Does this explain the problem better?

Note we noticed because we saw problems here. So we are not trying to be
theoretically cautious.


Thanks!

--
Qais Yousef

2023-04-28 12:05:40

by Qais Yousef

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

On 04/19/23 18:54, Qais Yousef wrote:

[...]

> I was considering to have something a bit more generic that allows selecting
> which attributes to reset.
>
> For example a syscall with SCHED_FLAG_RESET_ON_FORK_SEL combined with
> SCHED_FLAG_UCLAMP_MIN/MAX will only reset those. This should make it extensible
> if we have other similar use cases in the future. The downside it *might*
> require to be done in a separate syscall to the one that sets these parameter.
> But it should be done once.
>
> Maybe there's a better interface, but I think it makes sense to do it in a way
> that we won't have to do this again. Would be good to hear from maintainers
> first before you take my word for it ;-)

Actually I think we can do a better and simpler generic interface. We don't
need a new flag. We can just add a new parameter for what to reset on fork.
When this value is 0 (which it should be by default), it means reset
everything.

// pseudo code

#define RESET_ON_FORK_ALL 0
#define RESET_ON_FORK_POLICY BIT(1) // implies resetting priority
#define RESET_ON_FORK_PRIORITY BIT(2)
#define RESET_ON_FORK_UCLAMP BIT(3)

struct sched_attr {
...
__u64 sched_reset_on_fork_flags;
};


Cheers

--
Qais Yousef

2023-04-28 18:17:57

by Saravana Kannan

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

On Fri, Apr 28, 2023 at 4:57 AM Qais Yousef <[email protected]> wrote:
>
> On 04/19/23 18:54, Qais Yousef wrote:
>
> [...]
>
> > I was considering to have something a bit more generic that allows selecting
> > which attributes to reset.
> >
> > For example a syscall with SCHED_FLAG_RESET_ON_FORK_SEL combined with
> > SCHED_FLAG_UCLAMP_MIN/MAX will only reset those. This should make it extensible
> > if we have other similar use cases in the future. The downside it *might*
> > require to be done in a separate syscall to the one that sets these parameter.
> > But it should be done once.
> >
> > Maybe there's a better interface, but I think it makes sense to do it in a way
> > that we won't have to do this again. Would be good to hear from maintainers
> > first before you take my word for it ;-)
>
> Actually I think we can do a better and simpler generic interface. We don't
> need a new flag. We can just add a new parameter for what to reset on fork.
> When this value is 0 (which it should be by default), it means reset
> everything.

Isn't he default NOT to reset everything?

> // pseudo code
>
> #define RESET_ON_FORK_ALL 0
> #define RESET_ON_FORK_POLICY BIT(1) // implies resetting priority
> #define RESET_ON_FORK_PRIORITY BIT(2)
> #define RESET_ON_FORK_UCLAMP BIT(3)
>
> struct sched_attr {
> ...
> __u64 sched_reset_on_fork_flags;
> };
>

Also, honestly I think this is over designing for a hypothetical. We
have approximately 53 unused bits. By the time we run out of those,
we'd have added at least 20-50 more fields. At that point, we can
always add a flags2 field if we need it. I like David's patch as is --
it's clear and simple. Add a flag for explicitly what we are trying to
do and extend as needed.

-Saravana

2023-05-03 13:38:25

by Qais Yousef

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

On 04/28/23 11:12, Saravana Kannan wrote:
> On Fri, Apr 28, 2023 at 4:57 AM Qais Yousef <[email protected]> wrote:
> >
> > On 04/19/23 18:54, Qais Yousef wrote:
> >
> > [...]
> >
> > > I was considering to have something a bit more generic that allows selecting
> > > which attributes to reset.
> > >
> > > For example a syscall with SCHED_FLAG_RESET_ON_FORK_SEL combined with
> > > SCHED_FLAG_UCLAMP_MIN/MAX will only reset those. This should make it extensible
> > > if we have other similar use cases in the future. The downside it *might*
> > > require to be done in a separate syscall to the one that sets these parameter.
> > > But it should be done once.
> > >
> > > Maybe there's a better interface, but I think it makes sense to do it in a way
> > > that we won't have to do this again. Would be good to hear from maintainers
> > > first before you take my word for it ;-)
> >
> > Actually I think we can do a better and simpler generic interface. We don't
> > need a new flag. We can just add a new parameter for what to reset on fork.
> > When this value is 0 (which it should be by default), it means reset
> > everything.
>
> Isn't he default NOT to reset everything?

The default when the RESET_ON_FORK flag is set. This field will not be used
otherwise. Like what happens for the other params.

>
> > // pseudo code
> >
> > #define RESET_ON_FORK_ALL 0
> > #define RESET_ON_FORK_POLICY BIT(1) // implies resetting priority
> > #define RESET_ON_FORK_PRIORITY BIT(2)
> > #define RESET_ON_FORK_UCLAMP BIT(3)
> >
> > struct sched_attr {
> > ...
> > __u64 sched_reset_on_fork_flags;
> > };
> >
>
> Also, honestly I think this is over designing for a hypothetical. We

latency_nice is coming next and most likely to require something similar. It's
not hypothetical nor over designing. I think it's worthwhile spending time to
plan for the future. More interfaces are confusing to the end users. And glibc
already complained about evolution of sched_setattr, that's why we don't have
a wrapper there yet (beside none of us pushed that hard to resolve the concerns
due to lack of bandwidth).

https://public-inbox.org/libc-alpha/[email protected]/

(this thread reminded me linux-api must be CCed)

And there has been various discussions of the need of higher level
wrappers/libraries that exposes simpler interface to app developers. So I'm
actually expecting this to repeat. I think that was at LPC by Len Brown. I can
find this thread on libc mailing list.

https://public-inbox.org/libc-alpha/CAMe9rOpUh1pjfEUqf_hNxce8ZX=4mg6W=n+BbdZSNFHLi7wtkw@mail.gmail.com/

These QoS hints might imply manipulating nice values and I can see ending up
with a similar situation where we need to reset nice on fork without resetting
other params.

Generally I don't think we should restrict users to self-managed model.
A delegated model does make sense, and the latter implies the need for finer
control on what to reset.

There's rtkit by the way which already an example of a delegating model to
enable creating RT tasks by non privileged users.

Should rtkit force resetting uclamp when on fork? I think it's a grey area and
I learn towards it shouldn't.

> have approximately 53 unused bits. By the time we run out of those,
> we'd have added at least 20-50 more fields. At that point, we can
> always add a flags2 field if we need it. I like David's patch as is --
> it's clear and simple. Add a flag for explicitly what we are trying to
> do and extend as needed.

Fair enough. As I said if the maintainers are okay with current proposal, no
objection from my side. Based on my experience I didn't expect them to be. And
I do think a generic solution is not really complicated and is the better
option. You can consider this as a backup plan ;-)


Cheers

--
Qais Yousef

2023-05-03 16:09:39

by Saravana Kannan

[permalink] [raw]
Subject: Re: [RFC PATCH v1] sched/uclamp: Introduce SCHED_FLAG_RESET_UCLAMP_ON_FORK flag

On Wed, May 3, 2023 at 6:29 AM Qais Yousef <[email protected]> wrote:
>
> On 04/28/23 11:12, Saravana Kannan wrote:
> > On Fri, Apr 28, 2023 at 4:57 AM Qais Yousef <[email protected]> wrote:
> > >
> > > On 04/19/23 18:54, Qais Yousef wrote:
> > >
> > > [...]
> > >
> > > > I was considering to have something a bit more generic that allows selecting
> > > > which attributes to reset.
> > > >
> > > > For example a syscall with SCHED_FLAG_RESET_ON_FORK_SEL combined with
> > > > SCHED_FLAG_UCLAMP_MIN/MAX will only reset those. This should make it extensible
> > > > if we have other similar use cases in the future. The downside it *might*
> > > > require to be done in a separate syscall to the one that sets these parameter.
> > > > But it should be done once.
> > > >
> > > > Maybe there's a better interface, but I think it makes sense to do it in a way
> > > > that we won't have to do this again. Would be good to hear from maintainers
> > > > first before you take my word for it ;-)
> > >
> > > Actually I think we can do a better and simpler generic interface. We don't
> > > need a new flag. We can just add a new parameter for what to reset on fork.
> > > When this value is 0 (which it should be by default), it means reset
> > > everything.
> >
> > Isn't he default NOT to reset everything?
>
> The default when the RESET_ON_FORK flag is set. This field will not be used
> otherwise. Like what happens for the other params.
>
> >
> > > // pseudo code
> > >
> > > #define RESET_ON_FORK_ALL 0
> > > #define RESET_ON_FORK_POLICY BIT(1) // implies resetting priority
> > > #define RESET_ON_FORK_PRIORITY BIT(2)
> > > #define RESET_ON_FORK_UCLAMP BIT(3)
> > >
> > > struct sched_attr {
> > > ...
> > > __u64 sched_reset_on_fork_flags;
> > > };
> > >
> >
> > Also, honestly I think this is over designing for a hypothetical. We
>
> latency_nice is coming next and most likely to require something similar. It's
> not hypothetical nor over designing. I think it's worthwhile spending time to
> plan for the future. More interfaces are confusing to the end users. And glibc
> already complained about evolution of sched_setattr, that's why we don't have
> a wrapper there yet (beside none of us pushed that hard to resolve the concerns
> due to lack of bandwidth).

My comment about over designing/hypothetical is about the "running out
of bits" concern you are trying to solve. Totally agree that we'll
likely need finer reset control and never disagreed with that. But we
can add those as we go to the existing flag field instead of
introducing a new one.

> https://public-inbox.org/libc-alpha/[email protected]/
>
> (this thread reminded me linux-api must be CCed)
>
> And there has been various discussions of the need of higher level
> wrappers/libraries that exposes simpler interface to app developers. So I'm
> actually expecting this to repeat. I think that was at LPC by Len Brown. I can
> find this thread on libc mailing list.
>
> https://public-inbox.org/libc-alpha/CAMe9rOpUh1pjfEUqf_hNxce8ZX=4mg6W=n+BbdZSNFHLi7wtkw@mail.gmail.com/
>
> These QoS hints might imply manipulating nice values and I can see ending up
> with a similar situation where we need to reset nice on fork without resetting
> other params.
>
> Generally I don't think we should restrict users to self-managed model.
> A delegated model does make sense, and the latter implies the need for finer
> control on what to reset.
>
> There's rtkit by the way which already an example of a delegating model to
> enable creating RT tasks by non privileged users.
>
> Should rtkit force resetting uclamp when on fork? I think it's a grey area and
> I learn towards it shouldn't.

Nothing to say to any of the stuff above because I didn't disagree on
this in the first place.

> > have approximately 53 unused bits. By the time we run out of those,
> > we'd have added at least 20-50 more fields. At that point, we can
> > always add a flags2 field if we need it. I like David's patch as is --
> > it's clear and simple. Add a flag for explicitly what we are trying to
> > do and extend as needed.
>
> Fair enough. As I said if the maintainers are okay with current proposal, no
> objection from my side. Based on my experience I didn't expect them to be. And
> I do think a generic solution is not really complicated and is the better
> option. You can consider this as a backup plan ;-)

I think the fact I could misunderstand the API and you had to explain
it makes it a bit more complicated (agree, not a lot) over just using
the existing bits. And I think that's what the maintainers are pushing
back on (in addition to even having finer reset control).

-Saravana